Iterative Methods for Large Linear Systems
In our previous post, we carefully discretized the equation of motion for a 1D wave equation into a large set of simple linear equations, which we then converted into sparse matrix notation. It's important to understand that the matrix and vector notation we reached at the end of our derivation, after creating a list of simultaneous linear equations, was just a change of notation, nothing more. Matrices and the notation surrounding linear algebra can sometimes be obfuscating and generate confusion - whenever that is the case, it's helpful to just remember that the matrix notation is a short-hand, and that we can always revert to the system of equations that the matrix expression represents. So, armed with this fantastically simple linear matrix equation, we are tasked with solving the system for the unknown solution vector, \(x\):

\(A x = b\)

The tempting direct approach is to explicitly invert the matrix, computing \(x = A^{-1} b\).
The first and most important problem we run into with an explicit matrix inversion solution is that the matrices themselves would be incredibly massive in size for any real-world problem. Imagine we have a fluid simulation in 2D that is \(1024 \times 1024\) in size. In any such simulation, our state array size \(N\) is equal to the total number of points being solved for, so in this case, \(N = 1024 \times 1024 = 1048576\). The matrix \(A\) is, as we've seen, size \(N \times N\), which in this simple case means \(N^2 = 1099511627776\), or just slightly more than \(10^{12}\) elements, which would take just over 4 TERABYTES of RAM to hold in single-precision floating point.
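As a quick sanity check on that arithmetic, here's a tiny helper (the function name is my own, purely for illustration) that computes the dense storage cost:

```cpp
#include <cstdint>

// Bytes needed to store a dense N x N matrix of 4-byte floats.
std::uint64_t denseMatrixBytes( std::uint64_t n )
{
    return n * n * 4ull;
}
```

For \(N = 1048576\), this returns 4398046511104 bytes - about 4.4 trillion bytes, or 4 tebibytes.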
Sparse Matrices
The obvious immediate solution to the above storage problem is to take advantage of the fact that our matrices are sparse. A Sparse Matrix is one in which most of the matrix elements are zero. In our example, we notice that there are at most three non-zero elements per row, one of which is always the diagonal element. The other non-zero elements are always one column before or after our diagonal element. We could be clever and store our matrix \(A\) as an array of \(N\) "Sparse Rows", where a Sparse Row has a structure something like this:
class SparseMatrixRow
{
    // Column indices and values of the non-zero elements.
    int indices[3];
    float values[3];
    int numIndices;

    SparseMatrixRow()
    {
        numIndices = 0;
    }

    void addElement( int i_index, float i_val )
    {
        if ( numIndices < 3 )
        {
            values[numIndices] = i_val;
            indices[numIndices] = i_index;
            ++numIndices;
        }
    }
};
Indeed, structures similar to the one above are the basis of sparse-matrix linear algebra libraries such as Sparse BLAS. With the above representation, our matrix in the 2D height field case would take about 28 megabytes to store, a huge improvement over 4 terabytes.
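To see how a representation like this plugs into actual linear algebra, here is a hedged sketch of the matrix-vector product \(y = A x\) that iterative solvers lean on. The `SparseRow` struct and `multiply` function below are hypothetical stand-ins, not part of any particular library:

```cpp
#include <cstddef>
#include <vector>

// A sparse row stores only its non-zero entries, as in the
// SparseMatrixRow class above: column indices plus values.
struct SparseRow
{
    int indices[3];
    float values[3];
    int numIndices = 0;

    void addElement( int i_index, float i_val )
    {
        if ( numIndices < 3 )
        {
            indices[numIndices] = i_index;
            values[numIndices] = i_val;
            ++numIndices;
        }
    }
};

// y = A x, where A is stored as one SparseRow per matrix row.
// Each output element touches at most three inputs, so the whole
// product is O(N) instead of O(N^2).
std::vector<float> multiply( const std::vector<SparseRow>& i_A,
                             const std::vector<float>& i_x )
{
    std::vector<float> y( i_A.size(), 0.0f );
    for ( std::size_t row = 0; row < i_A.size(); ++row )
    {
        for ( int j = 0; j < i_A[row].numIndices; ++j )
        {
            y[row] += i_A[row].values[j] * i_x[i_A[row].indices[j]];
        }
    }
    return y;
}
```

Multiplying the tridiagonal matrix with rows \((2, -1)\), \((-1, 2, -1)\), \((-1, 2)\) by the vector \((1, 1, 1)\), for example, gives \((1, 0, 1)\).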
However, even with the sparse representation of \(A\), how do we invert a huge matrix like this? If you've ever worked through inverting a \(3 \times 3\) or \(4 \times 4\) matrix by hand, you know how complex that can be - how on earth would you go about inverting a square matrix with a million rows, and how would you store your intermediate results, which may not enjoy the simple sparse form of the original matrix \(A\)?
Problems like the above are what supercomputers were essentially built to solve, and there are excellent libraries and tools for doing this. However, such systems are arcane in nature, usually written in FORTRAN, and in our case, they are drastically more firepower than we actually need.
Iterative Methods
Our matrix has some nice properties that will allow us to solve the problem much more simply. Firstly, it is symmetric. A Symmetric Matrix is one which is identical under a transposition of the rows and columns. If you take any element to the upper right of the diagonal, and look at the mirror of that element in the same location on the lower left of the diagonal, you'll see the same value. A symmetric matrix as a representation of a physical equation of motion is an expression of Newton's Third Law, that each action has an equal but opposite reaction. It means that the effect point \(i\) has on point \(j\) is the same as the effect point \(j\) has on point \(i\). Conservation of energy emerges from this law.

Secondly, assuming that our timestep \(\Delta t\) is small relative to the ratio of the spatial discretization \(\Delta x\) to the wave propagation speed \(c\), our off-diagonal elements will be much smaller in absolute magnitude than our diagonal elements. This means that our system is a candidate for The Jacobi Method, which is the simplest iterative \(A x = b\) solver.
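To make that concrete: as we'll see in the Jacobi code below, each row of our system has diagonal element \(1 + 2k\) and off-diagonal elements \(-k\), where \(k = c^2 \Delta t^2 / \Delta x^2\). A small hypothetical helper (not part of the simulation code) shows how shrinking the timestep strengthens the diagonal:

```cpp
// Ratio of the diagonal element to the summed magnitude of the
// off-diagonal elements in one row of our implicit system.
// A ratio above 1 means the row is diagonally dominant.
double diagonalDominanceRatio( double i_c, double i_dt, double i_dx )
{
    double k = ( i_c * i_c * i_dt * i_dt ) / ( i_dx * i_dx );
    double aii = 1.0 + 2.0 * k;     // diagonal element
    double offDiagSum = 2.0 * k;    // |aij| summed over the row
    return aii / offDiagSum;
}
```

Note that since \(1 + 2k > 2k\) for any positive \(k\), the ratio always exceeds 1; a smaller timestep simply pushes it higher, which speeds up Jacobi's convergence.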
Iterative methods, which include Jacobi, but also include fancier solvers such as The Conjugate Gradient Method, all involve some variation on making an initial guess for the unknown solution vector \(x^{\star0}\), then using that guess in some way to produce a better guess \(x^{\star1}\), and so on, until the change between guesses from one iteration to the next falls below some acceptance threshold, or a maximum number of iterations is reached.
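The guess-and-refine loop can be sketched generically. The following is a minimal, illustrative solver for a small dense system - the function name, the dense storage, and the tolerance-based stopping rule are my own choices for illustration, not code from this series:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Generic Jacobi iteration: refine the guess x until successive
// guesses agree to within i_tol, or i_maxIters is reached.
std::vector<double> jacobiSolve( const std::vector<std::vector<double>>& i_A,
                                 const std::vector<double>& i_b,
                                 int i_maxIters, double i_tol )
{
    std::size_t n = i_b.size();
    std::vector<double> xOld( n, 0.0 );
    std::vector<double> xNew( n, 0.0 );
    for ( int iter = 0; iter < i_maxIters; ++iter )
    {
        double maxChange = 0.0;
        for ( std::size_t i = 0; i < n; ++i )
        {
            // Treat every element except the diagonal as known,
            // using values from the previous guess.
            double sumOffDiag = 0.0;
            for ( std::size_t j = 0; j < n; ++j )
            {
                if ( j != i ) { sumOffDiag += i_A[i][j] * xOld[j]; }
            }
            xNew[i] = ( i_b[i] - sumOffDiag ) / i_A[i][i];
            maxChange = std::max( maxChange, std::abs( xNew[i] - xOld[i] ) );
        }
        // The new guess becomes the next iteration's old guess.
        xOld.swap( xNew );
        if ( maxChange < i_tol ) { break; }
    }
    return xOld;
}
```

For a diagonally dominant system such as \(4x_0 + x_1 = 1\), \(x_0 + 3x_1 = 2\), the iterates converge to the exact solution \((1/11,\ 7/11)\).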
For all iterative methods, we will require storage for the next guess and the previous guess, and we'll swap these temporary storage arrays back and forth as we iterate towards the solution. Our generic state-vector implementation easily allows for this, and indeed we already used this facility when we computed our RK4 approximation.
Jacobi Method
The Jacobi Method is so simple, we don't even need to construct a sparse matrix or a solution vector, we can simply solve for the new estimate of the solution vector \(x\) in place. Jacobi works by taking each equation and assuming that everything other than the diagonal element is known - which we do by using values from the previous guess of \(x\). The new guess for the diagonal element \(x_i\) is then, at any row \(i\) of the matrix:

\(x_i^{\star k+1} = \frac{b_i - \sum_{j \neq i} A_{ij} \, x_j^{\star k}}{A_{ii}}\)
// Jacobi iteration to get temp acceleration
void JacobiIterationAccel( int i_aOld, int o_aNew, int i_hStar, float i_dt )
{
    float aij = -sq( WaveSpeed ) * sq( i_dt ) / sq( DX );
    float aii = 1.0 + ( 2.0 * sq( WaveSpeed ) * sq( i_dt ) / sq( DX ) );
    float bgain = sq( WaveSpeed ) / sq( DX );
    for ( int i = 1; i < ArraySize-1; ++i )
    {
        float aLeft = State[i_aOld][i-1];
        float aRight = State[i_aOld][i+1];

        float hStarLeft = State[i_hStar][i-1];
        float hStarCen = State[i_hStar][i];
        float hStarRight = State[i_hStar][i+1];

        float bi = bgain * ( hStarLeft + hStarRight - ( 2.0 * hStarCen ) );
        float sumOffDiag = ( aij * aLeft ) + ( aij * aRight );

        State[o_aNew][i] = ( bi - sumOffDiag ) / aii;
    }
    EnforceAccelBoundaryConditions( o_aNew );
}
The code above computes a single new guess for the accelerations from an old guess for the accelerations. We can then write our "acceleration from position" function as we had before, as follows. We use a fixed number of iterations rather than measuring convergence:
// Solve for acceleration.
void JacobiSolveAccel( int i_hStar, float i_dt )
{
    // Initialize acceleration to zero.
    FillArray( StateAccelStar, 0.0 );

    // Solve from StateJacobiTmp into StateAccelStar,
    // swapping the two temp arrays each iteration.
    for ( int iter = 0; iter < 20; ++iter )
    {
        int tmp = StateAccelStar;
        StateAccelStar = StateJacobiTmp;
        StateJacobiTmp = tmp;
        JacobiIterationAccel( StateJacobiTmp, StateAccelStar, i_hStar, i_dt );
    }
}
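A fixed iteration count is, of course, a guess about how fast the solver converges. As a sanity check, here's a self-contained sketch (with hypothetical parameter values and function names, mirroring the aii and aij coefficients from JacobiIterationAccel) verifying that Jacobi sweeps really do drive the residual \(\lVert A x - b \rVert\) of our tridiagonal system toward zero:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Jacobi sweeps on the tridiagonal system with diagonal aii = 1 + 2k
// and off-diagonals aij = -k, where k = c^2 dt^2 / dx^2.
std::vector<double> solveTridiagJacobi( const std::vector<double>& i_b,
                                        double i_k, int i_iters )
{
    std::size_t n = i_b.size();
    double aij = -i_k;
    double aii = 1.0 + 2.0 * i_k;
    std::vector<double> xOld( n, 0.0 );
    std::vector<double> xNew( n, 0.0 );
    for ( int iter = 0; iter < i_iters; ++iter )
    {
        for ( std::size_t i = 0; i < n; ++i )
        {
            double left = ( i > 0 ) ? xOld[i-1] : 0.0;
            double right = ( i + 1 < n ) ? xOld[i+1] : 0.0;
            xNew[i] = ( i_b[i] - aij * ( left + right ) ) / aii;
        }
        xOld.swap( xNew );
    }
    return xOld;
}

// Max-norm residual of A x - b for the same tridiagonal system.
double residual( const std::vector<double>& i_x,
                 const std::vector<double>& i_b, double i_k )
{
    double aij = -i_k;
    double aii = 1.0 + 2.0 * i_k;
    double r = 0.0;
    for ( std::size_t i = 0; i < i_x.size(); ++i )
    {
        double left = ( i > 0 ) ? i_x[i-1] : 0.0;
        double right = ( i + 1 < i_x.size() ) ? i_x[i+1] : 0.0;
        r = std::max( r, std::abs( aii * i_x[i] + aij * ( left + right ) - i_b[i] ) );
    }
    return r;
}
```

Because the diagonal dominates, each sweep shrinks the error by a roughly constant factor, so even a modest fixed iteration count gets very close to the exact solution.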
void EstimateAccelStar( float i_dt )
{
    JacobiSolveAccel( StateHeightStar, i_dt );
}
The rest of our implementation looks basically identical. Here's the final timestep, which looks exactly like our previous RK4 implementation:
// Time Step function.
void TimeStep( float i_dt )
{
    // Swap state
    SwapState();

    // Initialize estimate. This just amounts to copying
    // the previous values into the current values.
    CopyArray( StateHeightPrev, StateHeight );
    CopyArray( StateVelPrev, StateVel );

    // 1
    CopyArray( StateVel, StateVelStar );
    EstimateHeightStar( i_dt );
    EstimateAccelStar( i_dt );
    AccumulateEstimate( i_dt / 6.0 );

    // 2
    EstimateVelStar( i_dt / 2.0 );
    EstimateHeightStar( i_dt / 2.0 );
    EstimateAccelStar( i_dt / 2.0 );
    AccumulateEstimate( i_dt / 3.0 );

    // 3
    EstimateVelStar( i_dt / 2.0 );
    EstimateHeightStar( i_dt / 2.0 );
    EstimateAccelStar( i_dt / 2.0 );
    AccumulateEstimate( i_dt / 3.0 );

    // 4
    EstimateVelStar( i_dt );
    EstimateHeightStar( i_dt );
    EstimateAccelStar( i_dt );
    AccumulateEstimate( i_dt / 6.0 );

    // Final boundary conditions on height and vel
    EnforceHeightBoundaryConditions( StateHeight );
    EnforceVelBoundaryConditions( StateVel );

    // Update current time.
    StateCurrentTime += i_dt;
}
And that's it... Let's see how that looks in Processing!
And with that, we're finished... (for now!)