Monday, June 3, 2013

Simulation Class: Iterative Matrix Solvers and the Jacobi Method

Iterative Methods for Large Linear Systems

In our previous post, we carefully discretized the equation of motion for a 1D wave equation into a large set of simple linear equations, which we then converted into a sparse matrix notation. It's important to understand that the matrix and vector notation we reached at the end of our derivation, after creating a list of simultaneous linear equations, was just a change of notation, nothing more. Matrices and the notation surrounding linear algebra can sometimes obfuscate and confuse - whenever that is the case, it's helpful to remember that the matrix notation is a shorthand, and that we can always revert to the system of equations the matrix expression represents.

So, armed with this fantastically simple linear matrix equation, we are tasked with solving the system for the unknown solution vector, \(x\):

\[ \begin{equation} A x = b \label{eq:AxEqualsB} \end{equation} \]
Surely we can just construct our matrix \(A\), invert the matrix to produce \(A^{-1}\), and then our solution would be:
\[ \begin{equation} x = A^{-1} b \label{eq:XequalsAinvB} \end{equation} \]
This simple solution is comforting, but sadly, totally unusable in any practical setting. Why?

The first and most important problem we run into with an explicit matrix inversion solution is that the matrices themselves would be incredibly massive for any real-world problem. Imagine we have a fluid simulation in 2D that is \(1024 \times 1024\) in size. In any such simulation, our state array size \(N\) is equal to the total number of points being solved for, so in this case, \(N = 1024 \times 1024 = 1048576\). The matrix \(A\) is, as we've seen, of size \(N \times N\), which in this case is \(1024^4 = 1099511627776\), or just over \(10^{12}\) elements, which would take over 4 TERABYTES of RAM to hold in single-precision floating point.
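Spelling out that arithmetic:

\[ N^2 \times 4 \text{ bytes} = \left(2^{20}\right)^2 \times 4 \text{ bytes} = 2^{42} \text{ bytes} \approx 4.4 \times 10^{12} \text{ bytes} \]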

Sparse Matrices

The obvious immediate solution to the above storage problem is to take advantage of the fact that our matrices are sparse. A Sparse Matrix is one in which most of the matrix elements are zero. In our example, we notice that there are at most three non-zero elements per row, one of which is always the diagonal element. The other non-zero elements are always in the column just before or just after our diagonal element. We could be clever and store our matrix \(A\) as an array of \(N\) "Sparse Rows", where a Sparse Row has a structure something like this:


class SparseMatrixRow
{
    // Column indices and values of up to three non-zero elements.
    int indices[3];
    float values[3];
    int numIndices;
    SparseMatrixRow()
    {
        numIndices = 0;
    }
    void addElement( int i_index, float i_val )
    {
        if ( numIndices < 3 )
        {
            values[numIndices] = i_val;
            indices[numIndices] = i_index;
            ++numIndices;
        }
    }
};

Indeed, structures similar to the one above are the basis of sparse-matrix linear algebra libraries such as Sparse BLAS. With the above representation, our matrix in the 2D height field case would take about 28 megabytes to store, a huge improvement over 4 terabytes.
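As a quick illustration of how such rows might be filled in, here is a hypothetical helper (the name buildWaveMatrix and its parameters are mine, not part of the class code) that assembles the tridiagonal wave matrix \(A\) derived in the previous post (reproduced below), whose diagonal entries are \(1 + 2\kappa\) and whose off-diagonal neighbors are \(-\kappa\), where \(\kappa = c^2 \Delta t^2 / \Delta x^2\):


// Hypothetical helper (names mine): fill an array of N sparse rows with
// the tridiagonal wave matrix, whose diagonal entries are (1 + 2*kappa)
// and whose off-diagonal neighbors are -kappa.
void buildWaveMatrix( SparseMatrixRow[] o_rows, int i_N, float i_kappa )
{
    for ( int i = 0; i < i_N; ++i )
    {
        if ( i > 0 )
        {
            o_rows[i].addElement( i - 1, -i_kappa );
        }
        o_rows[i].addElement( i, 1.0 + ( 2.0 * i_kappa ) );
        if ( i < i_N - 1 )
        {
            o_rows[i].addElement( i + 1, -i_kappa );
        }
    }
}

With three values, three indices, and a count, each row occupies 28 bytes, which is where the 28 megabyte figure for a million-row matrix comes from.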

However, even with the sparse representation of \(A\), how do we invert a huge matrix like this? If you've ever worked through inverting a \(3 \times 3\) or \(4 \times 4\) matrix by hand, you know how complex that can be - how on earth would you go about inverting a square matrix with a million rows? And how would you store your intermediate results, which may not enjoy the simple sparse form of the original matrix \(A\)?

Problems like these are essentially what supercomputers were built to solve, and there are excellent libraries and tools for doing so. However, such systems are arcane in nature, usually written in FORTRAN, and in our case they represent drastically more firepower than we actually need.

Iterative Methods

Our matrix has some nice properties that will allow us to solve the problem much more simply. Firstly, it is symmetric. A Symmetric Matrix is one which is identical under a transposition of the rows and columns. If you take any element to the upper right of the diagonal, and look at the mirror of that element in the same location on the lower left of the diagonal, you'll see the same value. A symmetric matrix as a representation of a physical equation of motion is an expression of Newton's Third Law, that each action has an equal but opposite reaction. It means that the effect point \(i\) has on point \(j\) is the same as the effect point \(j\) has on point \(i\). Conservation of momentum emerges from this law.

Secondly, assuming that our timestep \(\Delta t\) is relatively small compared to the ratio of the spatial discretization \(\Delta x\) to the wave propagation speed \(c\), our off-diagonal elements will be much smaller in absolute magnitude than our diagonal elements. This means that our system is a candidate for The Jacobi Method, which is the simplest iterative \(A x = b\) solver.
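We can make this precise using the coefficients from the previous post's derivation (reproduced below): each row of \(A\) has the diagonal entry \(1 + 2\kappa\) and at most two off-diagonal entries equal to \(-\kappa\), where \(\kappa = c^2 \Delta t^2 / \Delta x^2 > 0\). Since

\[ 1 + 2\kappa > \kappa + \kappa = \sum_{j \neq i} |A_{i,j}|, \]

the matrix is strictly diagonally dominant, which is a sufficient condition for the Jacobi Method to converge.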

Iterative methods, which include Jacobi as well as fancier solvers such as The Conjugate Gradient Method, all involve some variation on the same idea: make an initial guess \(x^{\star 0}\) for the unknown solution vector, use that guess in some way to produce a better guess \(x^{\star 1}\), and so on, until the change between successive guesses falls below some acceptance threshold, or a maximum number of iterations is reached.

For all iterative methods, we will require storage for the next guess and the previous guess, and we'll swap these temporary storage arrays back and forth as we iterate towards the solution. Our generic state-vector implementation easily allows for this, and indeed we already used this facility when we computed our RK4 approximation.
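Before applying this to the wave equation, here is a tiny, self-contained demonstration of that guess-improve-swap pattern, using the Jacobi update we're about to describe. The \(3 \times 3\) system and its values are purely illustrative, not part of our solver:


// A tiny demonstration of the iterate-and-swap pattern: Jacobi
// iteration on the 3x3 diagonally dominant system
//   [ 4 -1  0 ] [x0]   [ 3 ]
//   [-1  4 -1 ] [x1] = [ 2 ]
//   [ 0 -1  4 ] [x2]   [ 3 ]
// whose exact solution is x0 = x1 = x2 = 1.
void tinyJacobiDemo()
{
    float[] b = { 3.0, 2.0, 3.0 };
    float[] xOld = { 0.0, 0.0, 0.0 };
    float[] xNew = { 0.0, 0.0, 0.0 };
    for ( int iter = 0; iter < 25; ++iter )
    {
        // Solve each row for its diagonal unknown, using old guesses
        // for the off-diagonal terms.
        xNew[0] = ( b[0] + xOld[1] ) / 4.0;
        xNew[1] = ( b[1] + xOld[0] + xOld[2] ) / 4.0;
        xNew[2] = ( b[2] + xOld[1] ) / 4.0;
        
        // Swap the old and new guess arrays for the next pass.
        float[] tmp = xOld;
        xOld = xNew;
        xNew = tmp;
    }
    println( xOld[0], xOld[1], xOld[2] ); // approaches 1, 1, 1
}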

Jacobi Method

The Jacobi Method is so simple that we don't even need to construct a sparse matrix or a separate solution vector - we can simply solve for the new estimate of the solution vector \(x\) in place.

Jacobi works by taking each equation and assuming that everything other than the diagonal element is known - which we do by using values from the previous guess of \(x\). At any row \(i\) of the matrix, the new guess for the unknown \(x_i\) is then:

\[ \begin{equation} x^{\star k+1}_i = \frac{b_i - \sum_{j\neq i}A_{i,j} x^{\star k}_{j}}{A_{i,i}} \label{eq:JacobiIteration} \end{equation} \]
In the above Equation \(\eqref{eq:JacobiIteration}\), the new guess iteration \(k+1\) is computed from the values of the old guess iteration \(k\). This is fairly easy to code, and looks like this:


// Jacobi iteration to get temp acceleration
void JacobiIterationAccel( int i_aOld, int o_aNew, int i_hStar, float i_dt )
{
    // Matrix coefficients: aij is the off-diagonal term (-kappa),
    // aii is the diagonal term (1 + 2 kappa), and bgain is the
    // right-hand-side coefficient c^2 / dx^2 (the gamma of the
    // derivation below).
    float aij = -sq( WaveSpeed ) * sq( i_dt ) / sq( DX );
    float aii = 1.0 + ( 2.0 * sq( WaveSpeed ) * sq( i_dt ) / sq( DX ) );
    float bgain = sq( WaveSpeed ) / sq( DX );
    
    for ( int i = 1; i < ArraySize-1; ++i )
    {
        float aLeft = State[i_aOld][i-1];
        float aRight = State[i_aOld][i+1];
        
        float hStarLeft = State[i_hStar][i-1];
        float hStarCen = State[i_hStar][i];
        float hStarRight = State[i_hStar][i+1];
        
        float bi = bgain * ( hStarLeft + hStarRight - ( 2.0 * hStarCen ) );
        
        float sumOffDiag = ( aij * aLeft ) + ( aij * aRight );
        
        State[o_aNew][i] = ( bi - sumOffDiag ) / aii;
    }
    
    EnforceAccelBoundaryConditions( o_aNew );
}

The code above computes a single new guess for the accelerations from an old guess. We can then write our "acceleration from position" function as we had before, as follows. Here we take a fixed number of iterations rather than testing for convergence:


// Solve for acceleration.
void JacobiSolveAccel( int i_hStar, float i_dt )
{  
    // Initialize acceleration to zero.
    FillArray( StateAccelStar, 0.0 );
    
    // Solve from StateJacobiTmp into StateAccelStar, swapping the
    // guess arrays each pass.
    for ( int iter = 0; iter < 20; ++iter )
    {
        int tmp = StateAccelStar;
        StateAccelStar = StateJacobiTmp;
        StateJacobiTmp = tmp;
        
        JacobiIterationAccel( StateJacobiTmp, StateAccelStar, i_hStar, i_dt );
    }
}

void EstimateAccelStar( float i_dt )
{
    JacobiSolveAccel( StateHeightStar, i_dt );
}
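Twenty iterations is a pragmatic choice rather than a principled one. A common alternative, sketched hypothetically below (the function name, the iteration cap, and the tolerance value are mine), is to iterate until the largest change between successive guesses falls below a threshold:


// Hypothetical variant: iterate until converged, up to a maximum count.
// Assumes the same State arrays and helpers as JacobiSolveAccel above.
void JacobiSolveAccelToTolerance( int i_hStar, float i_dt )
{
    // Initialize acceleration to zero.
    FillArray( StateAccelStar, 0.0 );
    
    for ( int iter = 0; iter < 100; ++iter )
    {
        int tmp = StateAccelStar;
        StateAccelStar = StateJacobiTmp;
        StateJacobiTmp = tmp;
        
        JacobiIterationAccel( StateJacobiTmp, StateAccelStar, i_hStar, i_dt );
        
        // Largest change between the previous and current guesses.
        float maxDiff = 0.0;
        for ( int i = 0; i < ArraySize; ++i )
        {
            maxDiff = max( maxDiff,
                abs( State[StateAccelStar][i] - State[StateJacobiTmp][i] ) );
        }
        if ( maxDiff < 1.0e-4 )
        {
            return;
        }
    }
}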

The rest of our implementation looks basically identical. Here's the final timestep, which looks exactly like our previous RK4 implementation:


// Time Step function.
void TimeStep( float i_dt )
{
    // Swap state
    SwapState();
    
    // Initialize estimate. This just amounts to copying
    // the previous values into the current values.
    CopyArray( StateHeightPrev, StateHeight );
    CopyArray( StateVelPrev, StateVel );
    
    // 1
    CopyArray( StateVel, StateVelStar );
    EstimateHeightStar( i_dt );
    EstimateAccelStar( i_dt );
    AccumulateEstimate( i_dt / 6.0 );
    
    // 2
    EstimateVelStar( i_dt / 2.0 );
    EstimateHeightStar( i_dt / 2.0 );
    EstimateAccelStar( i_dt / 2.0 );
    AccumulateEstimate( i_dt / 3.0 );
    
    // 3
    EstimateVelStar( i_dt / 2.0 );
    EstimateHeightStar( i_dt / 2.0 );
    EstimateAccelStar( i_dt / 2.0 );
    AccumulateEstimate( i_dt / 3.0 );
    
    // 4
    EstimateVelStar( i_dt );
    EstimateHeightStar( i_dt );
    EstimateAccelStar( i_dt );
    AccumulateEstimate( i_dt / 6.0 );
    
    // Final boundary conditions on height and vel
    EnforceHeightBoundaryConditions( StateHeight );
    EnforceVelBoundaryConditions( StateVel );

    // Update current time.
    StateCurrentTime += i_dt;
}

And that's it... Let's see how that looks in Processing!

And with that, we're finished... (for now!)

Download

All of the files associated with this class are available via a public github repository, found here: https://github.com/blackencino/SimClass

Simulation Class: Matrix Methods and the Wave Equation

The Wave Equation as a Linear System

In our previous class, we completed our exploration of higher-order explicit time-integration techniques, and moved from a simple spring system to a more complex 1-D Wave Equation. Using the fourth-order Runge-Kutta (RK4) integration scheme, we adapted our spring solver to the wave equation, which seemed stable at first, but as soon as we introduced external forces, the system showed immediate instabilities.

What follows in this post is a step-by-step progression, starting with the original PDE of the 1D Wave Equation, and eventually producing a linear system, represented by the equation \(A x = b\), where \(A\) is a square, sparse matrix of size \(N\times N\), and \(N\) is the number of unknown values in the discretized simulation grid (either heights or accelerations). This post is a lot of math, but all of it is manipulation of simple terms and equations in a linear algebra setting. I've broken things down into the smallest steps because this is such a core concept to how fluid simulations and grid simulations are performed.

The problem in our previous solution is related to how we calculated our fluid vertical acceleration from the fluid's height. To start with, the 1D Wave Equation is written as:

\[ \begin{equation} \frac{\partial^2 h}{\partial t^2} = c^2 \frac{\partial^2 h}{\partial x^2} \label{eq:Wave1D} \end{equation} \]

where \(h\) represents height at a given location along an axis, and \(c^2\) represents the square of the wave propagation speed. I think this is my favorite equation, because of the implied equivalence of spatial and temporal dimensions. It says that, as far as waves are concerned, space and time are interchangeable (assuming you choose your units correctly).
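As a small aside to make that interchangeability concrete: if we measure time by the rescaled variable \(\tau = c\,t\) (one unit of \(\tau\) is the time a wave takes to travel one unit of distance), the chain rule gives \(\frac{\partial^2 h}{\partial t^2} = c^2 \frac{\partial^2 h}{\partial \tau^2}\), and the wave speed drops out entirely:

\[ \frac{\partial^2 h}{\partial \tau^2} = \frac{\partial^2 h}{\partial x^2} \]

Anyway, let's examine how we discretized this equation for a given point in our height array, \(h_i\), and its acceleration \(a_i\):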

\[ \begin{equation} a_i = c^2 \frac{h_{i+1} -\ 2 h_i + \ h_{i-1}}{\Delta x^2} \label{eq:WaveDisc1} \end{equation} \]
The above Equation \(\eqref{eq:WaveDisc1}\) is just a discretization of Equation \(\eqref{eq:Wave1D}\), with \(\frac{\partial^2 h}{\partial t^2}\) taken as acceleration. However, unlike the simple spring system, where the position of the spring varied only in time, our position varies in space as well, which means that every point in the array of heights depends not only on its own previous positions and velocities, but also on the heights, velocities, and accelerations of neighboring points. This was the oversimplification in our previous attempt, and the source of the instability.

In order to understand this system better, let's replace the acceleration with a finite difference approximation, based on the height and previous heights. Starting with the backwards finite difference approximation of \(a_i\):

\[ \begin{equation} a_{i_{t+\Delta t}} \approx \frac{h_{i_{t+\Delta t}} - \ 2 h_{i_{t}} + \ h_{i_{t-\Delta t}}}{\Delta t^2} \label{eq:FiniteDiffA} \end{equation} \]
we can follow the usual convention in height-field solvers, eliminating acceleration by substituting Equation \(\eqref{eq:FiniteDiffA}\) into Equation \(\eqref{eq:WaveDisc1}\) to get the following expression which contains only height values (at different times and positions):
\[ \begin{equation} \frac{h_{i_{t+\Delta t}} - \ 2 h_{i_{t}} + \ h_{i_{t-\Delta t}}}{\Delta t^2} = c^2 \frac{h_{{i+1}_{t+\Delta t}} - \ 2 h_{i_{t+\Delta t}} + \ h_{{i-1}_{t+\Delta t}} } {\Delta x^2} \label{eq:WaveHeightDisc1} \end{equation} \]
However, our RK4 solver and our other time-integration techniques were written to employ a calculation of acceleration based on an estimate of current position and velocity, so it's more helpful if we instead begin with the following implicit relationship between acceleration, velocity, and height:
\[ \begin{equation} h_{i_{t+\Delta t}} = h_{i_t} + \Delta t\ v_{i_{t+\Delta t}} + \Delta t^2\ a_{i_{t+\Delta t}} \label{eq:ImplicitHeightTimeStep} \end{equation} \]
Let's treat the first two terms of the right-hand side of Equation \(\eqref{eq:ImplicitHeightTimeStep}\) as an estimate of the height at time \(t+\Delta t\), denoted \(h^\star\):
\[ \begin{equation} h^\star_{i_{t+\Delta t}} = h_{i_t} + \Delta t\ v_{i_{t+\Delta t}} \label{eq:HeightEstimate} \end{equation} \]
And then we can substitute Equation \(\eqref{eq:HeightEstimate}\) into Equation \(\eqref{eq:ImplicitHeightTimeStep}\) to get:
\[ \begin{equation} h_{i_{t+\Delta t}} = h^\star_{i_{t+\Delta t}} + \Delta t^2\ a_{i_{t+\Delta t}} \label{eq:EstimatedHeightTimeStep} \end{equation} \]
Now, we can substitute Equation \(\eqref{eq:EstimatedHeightTimeStep}\) into Equation \(\eqref{eq:WaveDisc1}\) to get the following equation, written in terms of the current acceleration and an estimate of the current height:
\[ \begin{equation} a_{i_{t+\Delta t}} = c^2 \frac{h^\star_{{i+1}_{t+\Delta t}} - \ 2 h^\star_{i_{t+\Delta t}} + \ h^\star_{{i-1}_{t+\Delta t}} } {\Delta x^2} + c^2 \Delta t^2 \frac{a_{{i+1}_{t+\Delta t}} - \ 2 a_{i_{t+\Delta t}} + \ a_{{i-1}_{t+\Delta t}} } {\Delta x^2} \label{eq:AccelDisc1Subscripted} \end{equation} \]
All of the terms in Equation \(\eqref{eq:AccelDisc1Subscripted}\) are at the same point in time, so we can drop the time subscript. Assuming that the estimates of height are given by some process (which we are free to tinker with), the equation is a linear relationship between accelerations at different points in space. Let's simplify by defining intermediate constants:
\[ \begin{equation} \kappa = \frac{c^2 \Delta t^2}{\Delta x^2}, \ \gamma = \frac{c^2}{\Delta x^2} \label{eq:Constants} \end{equation} \]
Substituting in our constants, moving all of the acceleration terms to the left-hand side, and gathering coefficients, we have the following equation which expresses the relationship between the acceleration at the index \(i\) and its spatially adjacent neighbors:
\[ \begin{equation} (1 + 2 \kappa) a_i + ( -\kappa ) a_{i-1} + ( -\kappa ) a_{i+1} = \gamma ( h^\star_{i+1} - \ 2 h^\star_i + \ h^\star_{i-1} ) \label{eq:AccelOneMatrixRow} \end{equation} \]
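To get a feel for the sizes involved, here is a worked example with illustrative values (my numbers, not from the class code): taking \(c = 1\), \(\Delta x = 0.1\), and \(\Delta t = 1/24\),

\[ \kappa = \frac{(1)^2 (1/24)^2}{(0.1)^2} \approx 0.174, \qquad \gamma = \frac{(1)^2}{(0.1)^2} = 100, \]

so each interior row reads \(1.347\, a_i - 0.174\, a_{i-1} - 0.174\, a_{i+1} = 100\, ( h^\star_{i+1} - 2 h^\star_i + h^\star_{i-1} )\), with the diagonal comfortably dominating the off-diagonal terms.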
The above Equation \(\eqref{eq:AccelOneMatrixRow}\) is written relative to any position in the array of heights, denoted by the subscript variable \(i\), and this relationship exists at each point in the array. We'll next write out the equations at each index explicitly, to get a system of linear equations. We have to take a bit of care at the boundary points. Our boundary condition is that heights at the boundary are equal to the height at the adjacent, non-boundary position, which, when transformed to a boundary condition on acceleration, says that the acceleration at the boundary is zero. Therefore, we have no equation for indices \(i = 0\) and \(i = N-1\), because we've explicitly defined those points. Furthermore, for indices \(i = 1\) and \(i = N-2\), the relationship is slightly changed, which we'll incorporate into the equations.
\[ \begin{eqnarray*} \ &\ & (1+2\kappa)a_1& +& (-\kappa)a_2& =& \gamma ( - h^\star_1 + h^\star_2 ) \\ (-\kappa)a_1& +& (1+2\kappa)a_2& +& (-\kappa)a_3& =& \gamma ( h^\star_1 - 2 h^\star_2 + h^\star_3 ) \\ (-\kappa)a_2& +& (1+2\kappa)a_3& +& (-\kappa)a_4& =& \gamma ( h^\star_2 - 2 h^\star_3 + h^\star_4 ) \\ ... \\ (-\kappa)a_{n-5}& +& (1+2\kappa)a_{n-4}& +& (-\kappa)a_{n-3}& =& \gamma ( h^\star_{n-5} - 2 h^\star_{n-4} + h^\star_{n-3} ) \\ (-\kappa)a_{n-4}& +& (1+2\kappa)a_{n-3}& +& (-\kappa)a_{n-2}& =& \gamma ( h^\star_{n-4} - 2 h^\star_{n-3} + h^\star_{n-2} ) \\ (-\kappa)a_{n-3}& +& (1+2\kappa)a_{n-2}&\ &\ & =& \gamma ( h^\star_{n-3} - h^\star_{n-2} ) \\ \label{eq:AccelLinearSystem} \end{eqnarray*} \]
This above system of linear equations can be expressed as a sparse matrix equation:
\[ \begin{equation} A x = b \end{equation} \]
Where the sparse, symmetric matrix \(A\) is defined as:
\[ \begin{equation} A = \begin{bmatrix} 1+2\kappa & -\kappa & \ & \ & \ & \ & \cdots \\ -\kappa & 1+2\kappa & -\kappa & \ & \ & \ & \cdots \\ \ & -\kappa & 1+2\kappa & -\kappa & \ & \ & \cdots \\ \ \vdots & \ & \ddots & \ddots & \ddots & \ & \cdots \\ \ \cdots & \ & \ & -\kappa & 1+2\kappa & -\kappa & \ \\ \ \cdots & \ & \ & \ & -\kappa & 1+2\kappa & -\kappa \\ \ \cdots & \ & \ & \ & \ & -\kappa & 1+2\kappa \\ \end{bmatrix} \end{equation} \]
and the column vectors \(x\) and \(b\) are defined as:
\[ \begin{equation} x = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_{n-4} \\ a_{n-3} \\ a_{n-2} \\ \end{bmatrix} , b = \begin{bmatrix} \gamma ( - h^\star_1 + h^\star_2 ) \\ \gamma ( h^\star_1 - 2 h^\star_2 + h^\star_3 ) \\ \gamma ( h^\star_2 - 2 h^\star_3 + h^\star_4 ) \\ \vdots \\ \gamma ( h^\star_{n-5} - 2 h^\star_{n-4} + h^\star_{n-3} ) \\ \gamma ( h^\star_{n-4} - 2 h^\star_{n-3} + h^\star_{n-2} ) \\ \gamma ( h^\star_{n-3} - h^\star_{n-2} ) \\ \end{bmatrix} \end{equation} \]
We have thus transformed our acceleration computation problem into an \(A x = b\) matrix problem, a problem which has been carefully studied, and for which many extremely well-understood methods of solution exist. We'll explore simple approaches to solving this system in the next post.
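To connect the column vector \(b\) back to code, here is a hypothetical sketch (the function name and parameters are mine) that fills the right-hand side from the height estimates. Because the boundary condition sets \(h^\star_0 = h^\star_1\) and \(h^\star_{n-1} = h^\star_{n-2}\), the generic three-point formula automatically reduces to the first and last rows shown above:


// Hypothetical sketch: fill the right-hand side b from the height
// estimates hStar, following the column vector above.
// gamma = c^2 / dx^2. Boundary accelerations (indices 0 and n-1) are
// defined to be zero, so only interior rows are filled.
void fillRHS( float[] i_hStar, float[] o_b, float i_gamma )
{
    int n = i_hStar.length;
    for ( int i = 1; i <= n - 2; ++i )
    {
        o_b[i] = i_gamma *
            ( i_hStar[i-1] - ( 2.0 * i_hStar[i] ) + i_hStar[i+1] );
    }
    o_b[0] = 0.0;
    o_b[n-1] = 0.0;
}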

Download

All of the files associated with this class are available via a public github repository, found here: https://github.com/blackencino/SimClass