Monday, June 3, 2013

Simulation Class: Iterative Matrix Solvers and the Jacobi Method

Iterative Methods for Large Linear Systems

In our previous post, we carefully discretized the equation of motion for a 1D wave equation into a large set of simple linear equations, which we then converted into a sparse matrix notation. It's important to understand that the matrix & vector notation that we reached at the end of our derivation, after creating a list of simultaneous linear equations, was just a change of notation, nothing more. Matrices and the notation surrounding linear algebra can sometimes be obfuscating and generate confusion - whenever that is the case, it's helpful to just remember that the matrix notation is a short-hand, and that we can always revert to the system of equations that the matrix expression represents.

So, armed with this fantastically simple linear matrix equation, we are tasked with solving the system for the unknown solution vector, \(x\):

\[ \begin{equation} A x = b \label{eq:AxEqualsB} \end{equation} \]
Surely we can just construct our matrix \(A\), invert the matrix to produce \(A^{-1}\), and then our solution would be:
\[ \begin{equation} x = A^{-1} b \label{eq:XequalsAinvB} \end{equation} \]
This simple solution is comforting, but sadly, totally unusable in any practical setting. Why?

The first and most important problem we run into with an explicit matrix inversion solution is that the matrices themselves would be incredibly massive in size for any real-world problem. Imagine we have a fluid simulation in 2D that is \(1024 \times 1024\) in size. In any such simulation, our state array size \(N\) is equal to the total number of points being solved for, so in this case, \(N = 1024 \times 1024 = 1048576\). The matrix \(A\) is, as we've seen, size \(N \times N\), which would in that simple case be \(N^4 = 109951162776\), or just slightly larger than \(10^{12}\) points, which would take just over 4 TERABYTES of RAM to hold, for single precision floating point.

Sparse Matrices

The obvious immediate solution to the above storage problem is to take advantage of the fact that our matrices are sparse. A Sparse Matrix is one in which most of the matrix elements are zero. In our example, we notice that there is at most three non-zero elements per-row, one of which is always the diagonal element. The other non-zero elements are always one column before or after our diagonal element. We could be clever and store our matrix \(A\) as an array of \(N\) "Sparse Rows", where a Sparse Row has a structure something like this:


class SparseMatrixRow
{
    float indices[3];
    float values[3];
    int numIndices;
    SparseMatrixRow()
    {
        numIndices = 0;
    }
    void addElement( int i_index, float i_val )
    {
        if ( numIndices < 3 )
        {
            values[numIndices] = i_val;
            indices[numIndices] = i_index;
            ++numIndices;
        }
    }
};

Indeed, structures similar to the one above are the basis of sparse-matrix linear algebra libraries such as Sparse BLAS. With the above representation, our matrix in the 2D height field case would take about 28 megabytes to store, a huge improvement over 4 terabytes.

However, even with the sparse representation of \(A\), how do we invert a huge matrix like this? If you've ever worked to invert a \(3 \times 3\) or \(4 \times 4\) matrix, you know how complex that could be - how on earth would you go about inverting a square matrix with a million rows, and how would you store your intermediate results, which may not enjoy the simple sparse form that the original matrix \(A\) had?

Problems like the above are what supercomputers were essentially built to solve, and there are excellent libraries and tools for doing this. However, such systems are arcane in nature, usually written in FORTRAN, and in our case, drastically more firepower than we actually need.

Iterative Methods

Our matrix has some nice properties that will allow us to solve the problem much more simply. Firstly, it is symmetric. A Symmetric Matrix is one which is identical under a transposition of the rows and columns. If you take any element to the upper right of the diagonal, and look at the mirror of that element in the same location on the lower left of the diagonal, you'll see the same value. A symmetric matrix as a representation of a physical equation of motion is an expression of Newton's Third Law, that each action has an equal but opposite reaction. It means that the effect point \(i\) has on point \(j\) is the same as the effect point \(j\) has on point \(i\). Conservation of energy emerges from this law.

Secondly, assuming that our timestep \(\Delta t\) is relatively small compared to the ratio of wave propagation speed \(c\) to the spatial discretization \(\Delta x\), our off-diagonal elements will be much smaller in absolute magnitude than our diagonal elements. This means that our system is a candidate for The Jacobi Method, which is the simplest iterative \(A x = b\) solver.

Iterative methods, which include Jacobi, but also include more fancy solvers such as The Conjugate Gradient Method, all involve some variation on making an initial guess for the unknown solution vector \(x^{\star0}\), then using that guess in some way to produce a better guess \(x^{\star1}\), and so on, until the change between guesses from one iteration to the next falls below some acceptance threshold, or a maximum number of iterations is reached.

For all iterative methods, we will require storage for the next guess and the previous guess, and we'll swap these temporary storage arrays back and forth as we iterate towards the solution. Our generic state-vector implementation easily allows for this, and indeed we already used this facility when we computed our RK4 approximation.

Jacobi Method

The Jacobi Method is so simple, we don't even need to construct a sparse matrix or a solution vector, we can simply solve for the new estimate of the solution vector \(x\) in place.

Jacobi works by taking each equation and assuming that everything other than the diagonal element is known - which we do by using values from the previous guess of \(x\). The new guess for the diagonal element \(x\) is then, at any row of the matrix:

\[ \begin{equation} x^{\star k+1}_i = \frac{b_i - \sum_{j\neq i}A_{i,j} x^{\star k}_{j}}{A_{i,i}} \label{eq:JacobiIteration} \end{equation} \]
In the above Equation \(\eqref{eq:JacobiIteration}\), the new guess iteration \(k+1\) is computed from the values of the old guess iteration \(k\). This is fairly easy to code, and looks like this:


// Jacobi iteration to get temp acceleration
void JacobiIterationAccel( int i_aOld, int o_aNew, int i_hStar, float i_dt )
{
    float aij = -sq( WaveSpeed ) * sq( i_dt ) / sq( DX );
    float aii = 1.0 + ( 2.0 * sq( WaveSpeed ) * sq( i_dt ) / sq( DX ) );
    float bgain = sq( WaveSpeed ) / sq( DX );
    
    for ( int i = 1; i < ArraySize-1; ++i )
    {
        float aLeft = State[i_aOld][i-1];
        float aRight = State[i_aOld][i+1];
        
        float hStarLeft = State[i_hStar][i-1];
        float hStarCen = State[i_hStar][i];
        float hStarRight = State[i_hStar][i+1];
        
        float bi = bgain * ( hStarLeft + hStarRight - ( 2.0 * hStarCen ) );
        
        float sumOffDiag = ( aij * aLeft ) + ( aij * aRight );
        
        State[o_aNew][i] = ( bi - sumOffDiag ) / aii;
    }
    
    EnforceAccelBoundaryConditions( o_aNew );
}

The code above computes a single new guess for the accelerations from an old guess for the accelerations. We can then write our "acceleration from position" function as we had before, as follows. We're taking a fixed number of iterations for guessing:


// Solve for acceleration.
void JacobiSolveAccel( int i_hStar, float i_dt )
{  
    // Initialize acceleration to zero.
    FillArray( StateAccelStar, 0.0 );
    
    // Solve from StateJacobiTmp into StateAccel
    for ( int iter = 0; iter < 20; ++iter )
    {
        int tmp = StateAccelStar;
        StateAccelStar = StateJacobiTmp;
        StateJacobiTmp = tmp;
        
        JacobiIterationAccel( StateJacobiTmp, StateAccelStar, i_hStar, i_dt );
    }
}

void EstimateAccelStar( float i_dt )
{
    JacobiSolveAccel( StateHeightStar, i_dt );
}

The rest of our implementation looks basically identical. Here's the final timestep, which looks exactly like our previous RK4 implementation:


// Time Step function.
void TimeStep( float i_dt )
{
    // Swap state
    SwapState();
    
    // Initialize estimate. This just amounts to copying
    // The previous values into the current values.
    CopyArray( StateHeightPrev, StateHeight );
    CopyArray( StateVelPrev, StateVel );
    
    // 1
    CopyArray( StateVel, StateVelStar );
    EstimateHeightStar( i_dt );
    EstimateAccelStar( i_dt );
    AccumulateEstimate( i_dt / 6.0 );
    
    // 2
    EstimateVelStar( i_dt / 2.0 );
    EstimateHeightStar( i_dt / 2.0 );
    EstimateAccelStar( i_dt / 2.0 );
    AccumulateEstimate( i_dt / 3.0 );
    
    // 3
    EstimateVelStar( i_dt / 2.0 );
    EstimateHeightStar( i_dt / 2.0 );
    EstimateAccelStar( i_dt / 2.0 );
    AccumulateEstimate( i_dt / 3.0 );
    
    // 4
    EstimateVelStar( i_dt );
    EstimateHeightStar( i_dt );
    EstimateAccelStar( i_dt );
    AccumulateEstimate( i_dt / 6.0 );
    
    // Final boundary conditions on height and vel
    EnforceHeightBoundaryConditions( StateHeight );
    EnforceVelBoundaryConditions( StateVel );

    // Update current time.
    StateCurrentTime += i_dt;
}

And that's it... Let's see how that looks in Processing!

And with that, we're finished... (for now!)

Download

All of the files associated with this class are available via a public github repository, found here: https://github.com/blackencino/SimClass

Simulation Class: Matrix Methods and the Wave Equation

The Wave Equation as a Linear System

In our previous class, we completed our exploration of higher-order explicit time-integration techniques, and moved from a simple spring system to a more complex 1-D Wave Equation. Using the fourth-order Runge-Kutta (RK4) integration scheme, we adapted our spring solver to the wave equation, which seemed stable at first, but as soon as we introduced external forces, the system showed immediate instabilities.

What follows in this post is a step-by-step progression, starting with the original PDE of the 1D Wave Equation, and eventually producing a linear system, represented by the equation \(A x = b\), where \(A\) is a square, mostly sparse matrix of size \(N\times N\), where \(N\) is the number of unknown values in the discretized simulation grid (either heights, or accelerations). This post is a lot of math, but all of it is manipulation of simple terms and equations in a linear algebra setting. I've broken things down to the smallest steps because this is such a core concept to how fluid simulations and grid simulations are performed.

The problem in our previous solution is related to how we calculated our fluid vertical acceleration from the fluid's height. To start with, the 1D Wave Equation is written as:

\[ \begin{equation} \frac{\partial^2 h}{\partial t^2} = c^2 \frac{\partial^2 h}{\partial x^2} \label{eq:Wave1D} \end{equation} \]

Where \(h\) represents height at a given location along an axis, and \(c^2\) represents the square of the wave propagation speed. I think this is my favorite equation, because of the implied equivalency of spatial and temporal dimensions. It says that, as far as waves are concerned, space and time are interchangeable (assuming you choose your units correctly). Anyway, let's examine how we discretized this solution for a given point in our height array, \(h_i\) and its acceleration \(a_i\).

\[ \begin{equation} a_i = c^2 \frac{h_{i+1} -\ 2 h_i + \ h_{i-1}}{\Delta x^2} \label{eq:WaveDisc1} \end{equation} \]
The above equation, \(\eqref{eq:WaveDisc1}\) is just a discretization of Equation \(\eqref{eq:Wave1D}\) above, with \(\frac{\partial^2 h}{\partial t^2}\) taken as acceleration. However, unlike the simple spring system, where the position of the spring varied only in time, our position varies in space as well, which means we have to consider that every point in the array of heights depends on not only its previous positions and velocities, but the heights and velocities and accelerations of neighboring points. This was the oversimplification in our previous attempt, and the source of the instability.

In order to understand this system better, let's replace the acceleration with a finite difference approximation, based on the height and previous heights. Starting with the backwards finite difference approximation of \(a_i\):

\[ \begin{equation} a_{i_{t+\Delta t}} \approx \frac{h_{i_{t+\Delta t}} - \ 2 h_{i_{t}} + \ h_{i_{t-\Delta t}}}{\Delta t^2} \label{eq:FiniteDiffA} \end{equation} \]
we can follow the usual convention in height-field solvers, eliminating acceleration by substituting Equation \(\eqref{eq:FiniteDiffA}\) into Equation \(\eqref{eq:WaveDisc1}\) to get the following expression which contains only height values (at different times and positions):
\[ \begin{equation} \frac{h_{i_{t+\Delta t}} - \ 2 h_{i_{t}} + \ h_{i_{t-\Delta t}}}{\Delta t^2} = c^2 \frac{h_{{i+1}_{t+\Delta t}} - \ 2 h_{i_{t+\Delta t}} + \ h_{{i-1}_{t+\Delta t}} } {\Delta x^2} \label{eq:WaveHeightDisc1} \end{equation} \]
However, our RK4 solver and our other time-integration techniques were written to employ a calculation of acceleration based on an estimate of current position and velocity, so it's more helpful if we instead begin with the following implicit relationship between acceleration, velocity, and height:
\[ \begin{equation} h_{i_{t+\Delta t}} = h_{i_t} + \Delta t\ v_{i_{t+\Delta t}} + \Delta t^2\ a_{i_{t+\Delta t}} \label{eq:ImplicitHeightTimeStep} \end{equation} \]
Let's treat the first two terms of the right hand side of Equation \(\eqref{eq:ImplicitHeightTimeStep}\) as an estimate of height at time \(t+\Delta t\), denoted by \(h^\star\), indicated as follows:
\[ \begin{equation} h^\star_{i_{t+\Delta t}} = h_{i_t} + \Delta t\ v_{i_{t+\Delta t}} \label{eq:HeightEstimate} \end{equation} \]
And then we can substitute Equation \(\eqref{eq:HeightEstimate}\) into Equation \(\eqref{eq:ImplicitHeightTimeStep}\) to get:
\[ \begin{equation} h_{i_{t+\Delta t}} = h^\star_{i_{t+\Delta t}} + \Delta t^2\ a_{i_{t+\Delta t}} \label{eq:EstimatedHeightTimeStep} \end{equation} \]
Now, we can subsitute Equation \(\eqref{eq:EstimatedHeightTimeStep}\) into Equation \(\eqref{eq:WaveDisc1}\) to get the following equation, written in terms of the current acceleration and an estimate of the current height.
\[ \begin{equation} a_{i_{t+\Delta t}} = c^2 \frac{h^\star_{{i+1}_{t+\Delta t}} - \ 2 h^\star_{i_{t+\Delta t}} + \ h^\star_{{i-1}_{t+\Delta t}} } {\Delta x^2} + c^2 \Delta t^2 \frac{a_{{i+1}_{t+\Delta t}} - \ 2 a_{i_{t+\Delta t}} + \ a_{{i-1}_{t+\Delta t}} } {\Delta x^2} \label{eq:AccelDisc1Subscripted} \end{equation} \]
All of the terms in Equation \(\eqref{eq:AccelDisc1Subscripted}\) are at the same point in time, so we can drop the time subscript. Assuming that the estimates of height are given by some process (which we are free to tinker with), the equation is a linear relationship between accelerations at different points in space. Let's simplify by defining intermediate constants:
\[ \begin{equation} \kappa = \frac{c^2 \Delta t^2}{\Delta x^2}, \ \gamma = \frac{c^2}{\Delta x^2} \label{eq:Constants} \end{equation} \]
Substituting in our constants, moving all of the acceleration terms to the left-hand side, and gathering coefficients, we have the following equation which expresses the relationship between the acceleration at the index \(i\) and its spatially adjacent neighbors:
\[ \begin{equation} (1 + 2 \kappa) a_i + ( -\kappa ) a_{i-1} + ( -\kappa ) a_{i+1} = \gamma ( h^\star_{i+1} - \ 2 h^\star_i + \ h^\star_{i-1} ) \label{eq:AccelOneMatrixRow} \end{equation} \]
The above Equation \(\eqref{eq:AccelOneMatrixRow}\) is written relative to any position in the array of heights, denoted by the subscript variable \(i\), and this relationship exists at each point in the array. We'll next write out the equations at each index explicitly, to get a system of linear equations. We have to take a bit of care at the boundary points. Our boundary condition is that heights at the boundary are equal to the height at the adjacent, non-boundary position, which, when transformed to a boundary condition on acceleration, says that the acceleration at the boundary is zero. Therefore, we have no equation for indices \(i = 0\) and \(i = N-1\), because we've explicitly defined those points. Furthermore, for indices \(i = 1\) and \(i = N-2\), the relationship is slightly changed, which we'll incorporate into the equations.
\[ \begin{eqnarray*} \ &\ & (1+2\kappa)a_1& +& (-\kappa)a_2& =& \gamma ( - h^\star_1 + h^\star_2 ) \\ (-\kappa)a_1& +& (1+2\kappa)a_2& +& (-\kappa)a_3& =& \gamma ( h^\star_1 - 2 h^\star_2 + h^\star_3 ) \\ (-\kappa)a_2& +& (1+2\kappa)a_3& +& (-\kappa)a_4& =& \gamma ( h^\star_2 - 2 h^\star_3 + h^\star_4 ) \\ ... \\ (-\kappa)a_{n-5}& +& (1+2\kappa)a_{n-4}& +& (-\kappa)a_{n-3}& =& \gamma ( h^\star_{n-5} - 2 h^\star_{n-4} + h^\star_{n-3} ) \\ (-\kappa)a_{n-4}& +& (1+2\kappa)a_{n-3}& +& (-\kappa)a_{n-2}& =& \gamma ( h^\star_{n-4} - 2 h^\star_{n-3} + h^\star_{n-2} ) \\ (-\kappa)a_{n-3}& +& (1+2\kappa)a_{n-2}&\ &\ & =& \gamma ( h^\star_{n-3} - h^\star_{n-2} ) \\ \label{eq:AccelLinearSystem} \end{eqnarray*} \]
This above system of linear equations can be expressed as a sparse matrix equation:
\[ \begin{equation} A x = b \end{equation} \]
Where the sparse, symmetric matrix \(A\) is defined as:
\[ \begin{equation} A = \begin{bmatrix} 1+2\kappa & -\kappa & \ & \ & \ & \ & \cdots \\ -\kappa & 1+2\kappa & -\kappa & \ & \ & \ & \cdots \\ \ & -\kappa & 1+2\kappa & -\kappa & \ & \ & \cdots \\ \ \vdots & \ & \ddots & \ddots & \ddots & \ & \cdots \\ \ \cdots & \ & \ & -\kappa & 1+2\kappa & -\kappa & \ \\ \ \cdots & \ & \ & \ & -\kappa & 1+2\kappa & -\kappa \\ \ \cdots & \ & \ & \ & \ & -\kappa & 1+2\kappa \\ \end{bmatrix} \end{equation} \]
and the column vectors \(x\) and \(b\) are defined as:
\[ \begin{equation} x = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_{n-4} \\ a_{n-3} \\ a_{n-2} \\ \end{bmatrix} , b = \begin{bmatrix} \gamma ( - h^\star_1 + h^\star_2 ) \\ \gamma ( h^\star_1 - 2 h^\star_2 + h^\star_3 ) \\ \gamma ( h^\star_2 - 2 h^\star_3 + h^\star_4 ) \\ \vdots \\ \gamma ( h^\star_{n-5} - 2 h^\star_{n-4} + h^\star_{n-3} ) \\ \gamma ( h^\star_{n-4} - 2 h^\star_{n-3} + h^\star_{n-2} ) \\ \gamma ( h^\star_{n-3} - h^\star_{n-2} ) \\ \end{bmatrix} \end{equation} \]
We have thus transformed our acceleration computation problem into an \(A x = b\) matrix problem, a problem which has been carefully studied, and for which many extremely well-understood methods of solution exist. We'll explore simple approaches to solving this system in the next post.

Download

All of the files associated with this class are available via a public github repository, found here: https://github.com/blackencino/SimClass

Tuesday, May 28, 2013

Simulation Class: The Wave Equation

The Wave Equation

In our previous class, we improved our time integration to use the popular fourth-order Runge-Kutta method, and now our simple spring system is looking pretty good. It would be great if we could just use RK4 for any simulation problem we might encounter, and have it just work. Let's explore a system that's a little bit more complicated.

The 1D Wave Equation describes the motion of waves in a height field, and can be written simply as:

\( \frac{\partial^2 h}{\partial t^2} = c^2 \frac{\partial^2 h}{\partial x^2} \)

Just like in our simple spring example, we have an equation of motion in which the acceleration of position (height, in this case) is expressed as a function of the current value of position (again, height). However, we've added a complication! The function of position itself involves a second derivative of height, this time with respect to space, rather than time. The constant \(c^2\) represents the square of the wave propagation speed, and is a parameter to the system. So now we've graduated from our previous simple Ordinary Differential Equations ODE to the much more exciting Partial Differential Equations PDE.

The Wave Equation, in 1D, is extremely similar at its core to the simple spring equation, and indeed, if you explore the wikipedia article, you'll see that it is actually derived by imagining a series of masses, evenly spaced, attached to each other via springs, as seen in this illustration:

State Definition for Wave Equation

We're going to try to use our RK4 implementation from the spring example directly, so we'll need to create space in the state vector for the temporary values created. RK4 sets its final estimate to a weighted average of the four estimates it creates along the way, and each estimate is created from the previous estimate. Therefore, rather than storing all four estimates separately, we can accumulate each estimate directly into our final sum. Here's what our state will look like:

int StateSize = 7;
float[][] State = new float[StateSize][ArraySize];
int StateHeight = 0;
int StateVel = 1;
int StateHeightPrev = 2;
int StateVelPrev = 3;
int StateVelStar = 4;
int StateAccelStar = 5;
int StateHeightTmp = 6;

Initial State

The initial state of the wave equation example is a bit more complex than the pendulum or the spring. For this example, we're going to use several octaves of perlin noise to create a wave-like distribution of crests. Because our initial state is complex and large, we don't want to keep a copy of it around. If we need to reset, we'll simply call the SetInitialState function that we create. Here's that function, which is entirely subject to artistic interpretation:


void SetInitialState()
{
    noiseSeed( 0 );
    for ( int i = 0; i < ArraySize; ++i )
    {
        float worldX = 2341.17 + DX * ( float )i;
        State[StateHeight][i] = 0.5 * anoise( worldX * 0.0625 ) +
                         0.4 * anoise( worldX * 0.125 ) +
                         0.3 * anoise( worldX * 0.25 ) +
                         0.2 * anoise( worldX * 0.5 );
        State[StateVel][i] = 0.0;
    }
    EnforceBoundaryConditions( StateHeight );
    EnforceBoundaryConditions( StateVel );
    StateCurrentTime = 0.0;
}

Boundary Conditions

In order to compute the second derivative of height, spatially, we need to reference values at adjacent points to the left and right. When we reach the sides of the simulation, though, we have a problem - what value do we use when we're looking off the side of the simulation? For these areas, the simulation needs to have an answer without actually storing a state. The areas of the simulation where we already know the answer, or some aspect of the answer, are called Boundary Conditions. The area of the simulation where boundary conditions are applied is called, not surprisingly, the boundary. The boundary in our simulation is the edges of the wave to the left and right, but if we were to have collision objects, for example, the places where the simulation met the collision objects would be considered a boundary also.

There are many different types of boundary conditions, which place varyingly subtle constraints on the values at boundary points. In this simple wave equation simulation, we're going to assert that the values at the boundary are exactly equal to the simulated values just adjacent to the boundary. In other words, we are asserting that the derivative of any value across the boundary is zero. This explicit constraint on a derivative of a value is called a Neumann Boundary Condition. We implement it in code very simply, by just copying the values to the edges:


void EnforceBoundaryConditions( int io_a )
{
    State[io_a][0] = State[io_a][1];
    State[io_a][ArraySize-1] = State[io_a][ArraySize-2];
}

Note that in this implementation, we allow ourselves to affect any part of the state vector. The flexibility of our abstract state field naming again shows its power!

Integration Tools

In the simple spring example, we were able to simply create temporary floats to store temporary values, and we were able to use arithmetic operators to compute the integrated fields. Because our state values are now arrays of data, we need special functions for evaluating equations over the whole array at once. These are called "vectorized" functions. Let's break them down one by one. First, a utility function for copying an array from one part of the state vector to another:


void CopyArray( int i_src, int o_dst )
{
    for ( int i = 0; i < ArraySize; ++i )
    {
        State[o_dst][i] = State[i_src][i];
    }
}

As we move from time step to time step, we need to swap out our "current" height and velocity values into a "previous" array. Because of the way we've indexed our state vector with labels, we don't have to actually copy the values over. You could imagine that in a very large fluid simulation, this copying of millions of points of data would be very expensive. Instead, we just swap the labels:


void SwapHeight()
{
    int tmp = StateHeight;
    StateHeight = StateHeightPrev;
    StateHeightPrev = tmp;
}

void SwapVel()
{
    int tmp = StateVel;
    StateVel = StateVelPrev;
    StateVelPrev = tmp;
}

void SwapState()
{
    SwapHeight();
    SwapVel();
}

The RK4 technique involves, for every estimate other than the first, estimating a temporary value for height using the vStar estimate from the previous estimation iteration, and the height of the previous time step. The time interval over which these temporary height estimates are made varies from iteration to iteration. This function is simple! In our spring example, it looked like this:


float xTmp = State[StatePositionX] + ( i_dt * vStar1 );

In our wave state, this operation on a single float gets expanded to work on the whole state array like so:


// Estimate temp height
void EstimateTempHeight( float i_dt )
{
    for ( int i = 0; i < ArraySize; ++i )
    {
        State[StateHeightTmp][i] = State[StateHeightPrev][i] + 
                ( i_dt * State[StateVelStar][i] );
    }
    EnforceBoundaryConditions( StateHeightTmp );
}

Following along, in the simple spring example we then had to estimate a new velocity, vStar based on the previous iteration's aStar, and the previous time step's velocity. In the spring example, that looks like this:


float vStar2 = State[StateVelocityX] + ( i_dt * aStar1 );

In our wave state, this operation is expanded as follows:


// Estimate vel star
void EstimateVelStar( float i_dt )
{
    for ( int i = 0; i < ArraySize; ++i )
    {
        State[StateVelStar][i] = State[StateVelPrev][i] + 
                ( i_dt * State[StateAccelStar][i] );
    }
    EnforceBoundaryConditions( StateVelStar );
}

Discretizing the Spatial Derivative of Height

Finally, we need to port the computation of acceleration from an estimated position. In the spring example, this was as simple as could be (on purpose):


// Acceleration from Position.
float A_from_X( float i_x )
{
    return -( Stiffness / BobMass ) * i_x;
}
Now that we have a spatial derivative in our setup, though, we need to calculate our acceleration based not only on the position at each point in the grid, but also the neighboring points in the grid. In order to do that, we need to discretize the second spatial spatial derivative, which we can do using finite differences, as we learned several classes ago. The finite difference approximation to the second derivative, in one dimension, looks like this:

\( \frac{\partial^2 h}{\partial x^2} \approx \frac{ ( \frac{ ( h_{i+1} - h_i ) }{\Delta x} - \frac{ ( h_i - h_{i-1} ) }{\Delta x} ) }{\Delta x} \)

Which simplifies to:

\( \frac{\partial^2 h}{\partial x^2} \approx \frac{ ( h_{i+1} + h_{i-1} - 2 h_i ) }{\Delta x^2} \)

We can implement this in code as follows:


void A_from_H( int i_h )
{
    for ( int i = 1; i < ArraySize-1; ++i )
    {
        float hLeft = State[i_h][i-1];
        float hCen = State[i_h][i];
        float hRight = State[i_h][i+1];

        float d2h_dx2 = ( hRight + hLeft - ( 2.0*hCen ) ) / sq( DX );

        State[StateAccelStar][i] = sq( WaveSpeed ) * d2h_dx2;
    }
    
    EnforceBoundaryConditions( StateAccelStar );
}

Note that we only iterate across the center values in the arrays - we skip the very first and the very last points, because they don't have values adjacent to themselves in one direction. Here is where the boundary conditions help to fill in the gaps, and you can see that they're applied at the last step.

The Time Step

Okay, so armed with all the pieces above, we can construct our TimeStep. It should look just like the spring solver's RK4 time-step, with the appropriate adjustments for the array data. Here was the old time step:


// Time Step function.
void TimeStep( float i_dt )
{
    float vStar1 = State[StateVelocityX];
    float aStar1 = A_from_X( State[StatePositionX] );

    float vStar2 = State[StateVelocityX] + ( ( i_dt / 2.0 ) * aStar1 );
    float xTmp2 = State[StatePositionX] + ( ( i_dt / 2.0 ) * vStar1 );
    float aStar2 = A_from_X( xTmp2 );

    float vStar3 = State[StateVelocityX] + ( ( i_dt / 2.0 ) * aStar2 );
    float xTmp3 = State[StatePositionX] + ( ( i_dt / 2.0 ) * vStar2 );
    float aStar3 = A_from_X( xTmp3 );

    float vStar4 = State[StateVelocityX] + ( i_dt * aStar3 );
    float xTmp4 = State[StatePositionX] + ( i_dt * vStar3 );
    float aStar4 = A_from_X( xTmp4 );

    State[StatePositionX] += ( i_dt / 6.0 ) * 
        ( vStar1 + (2.0*vStar2) + (2.0*vStar3) + vStar4 );
    State[StateVelocityX] += ( i_dt / 6.0 ) * 
        ( aStar1 + (2.0*aStar2) + (2.0*aStar3) + aStar4 );

    // Update current time.
    State[StateCurrentTime] += i_dt;
}

And here's the new one, for the wave system:


// Time Step function.
void TimeStep( float i_dt )
{
    // Swap state
    SwapState();
    
    // Initialize estimate. This just amounts to copying
    // The previous values into the current values.
    CopyArray( StateHeightPrev, StateHeight );
    CopyArray( StateVelPrev, StateVel );
    
    // Vstar1, Astar1
    CopyArray( StateVel, StateVelStar );
    A_from_H( StateHeight );
    // Accumulate
    AccumulateEstimate( i_dt / 6.0 );
    
    // Height Temp 2
    EstimateTempHeight( i_dt / 2.0 );
    // Vstar2, Astar2
    EstimateVelStar( i_dt / 2.0 );
    A_from_H( StateHeightTmp );
    // Accumulate
    AccumulateEstimate( i_dt / 3.0 );
    
    // Height Temp 3
    EstimateTempHeight( i_dt / 2.0 );
    // Vstar3, Astar3
    EstimateVelStar( i_dt / 2.0 );
    A_from_H( StateHeightTmp );
    // Accumulate
    AccumulateEstimate( i_dt / 3.0 );
    
     // Height Temp 4
    EstimateTempHeight( i_dt );
    // Vstar3, Astar3
    EstimateVelStar( i_dt );
    A_from_H( StateHeightTmp );
    // Accumulate
    AccumulateEstimate( i_dt / 6.0 );
    
    // Final boundary conditions on height and vel
    EnforceBoundaryConditions( StateHeight );
    EnforceBoundaryConditions( StateVel );

    // Update current time.
    StateCurrentTime += i_dt;
}

It's a little bit longer in code length, but importantly - it doesn't actually contain more operations! If you compare this to the previous example, you'll see that it's basically the same! Let's see what that looks like running in processing with it all added together:

That's not bad! Perhaps we could make it more interesting, by gathering some mouse input. We'll make it so that if you click in the view, it will set the height and velocity of the sim at that point to the height of the click, and zero, respectively. The input grabbing code looks like this:


void GetInput()
{
    if ( mousePressed && mouseButton == LEFT )
    {
        float mouseCellX = mouseX / ( ( float )PixelsPerCell );
        float mouseCellY = ( (height/2) - mouseY ) / ( ( float )PixelsPerCell );
        float simY = mouseCellY * DX;
        
        int iX = ( int )floor( mouseCellX + 0.5 );
        if ( iX > 0 && iX < ArraySize-1 )
        {
            State[StateHeight][iX] = simY;
            State[StateVel][iX] = 0.0;
        }
    }
}

And we just add that directly into our main draw loop, right before the time step:


void draw()
{
    background( 0.9 );
   
    GetInput();
   
    TimeStep( 1.0 / 24.0 );

    // Label.
    fill( 1.0 );
    text( "Wave Equation RK4", 10, 30 );
  
    DrawState();
}

And what happens now? Here it is running:

Uh oh.... Looks like stability has eluded us again. Next class, we'll look into how to fix stability for cases like this, involving multiple derivatives.

Download

All of the files associated with this class are available via a public github repository, found here: https://github.com/blackencino/SimClass

Simulation Class: Runge-Kutta Methods

Student example from Last Week

Florian Hecht used the lessons from last week to create this inspiring example: Florian's Spring Solver

Higher-Order Time Steps

In our previous class, we improved the accuracy and stability of our simulation's time step by exploring a number of different integration techniques, of which the best-performing were the Euler-Cromer and Velocity-Verlet techniques. However, while stable, their solutions drifted from the theoretically correct answer over time.

In each of our previous examples, the system's equations were evaluated just once per time-step, or in some cases twice, which means the solutions were first- or second- order. In order to improve our estimation of the function we're trying to solve for, we can make more than one or two evaluations per time-step, resulting in a higher-order approximation.

Runge-Kutta methods are a general class of explicit and implicit iterative methods for approximating functions from their derivatives. In order to explore this class of techniques, let's quickly revisit the equations which define our simple harmonic oscillator system. We have a mass attached to an ideal spring with a certain stiffness, without friction, resulting in a force equal to a constant times its displacement from the spring's origin, in the direction opposite the displacement. That system is defined like this:

\( x'_t = v_t \\ v'_t = a_t \\ a_t = -\frac{k}{m} x_t \)
The idea behind all of the explicit Runge-Kutta methods is that we will estimate a velocity, a position, and then an acceleration, and then we will improve our estimation by using the previous estimates to make better estimates. We can do this any number of times, and the number of these estimation stages is what we mean by the "order" of the approximation. After all the estimation steps have been taken, the state of the system is finally updated using a weighted average of all the estimates. The only difference between different variations of the explicit Runge-Kutta methods is how the different estimates are weighted, and how each estimate is used to produce the next estimate.

Second-Order Runge-Kutta

To illustrate the concept simply, here is the traditional second-order Runge-Kutta method, applied to our simple system. In this notation, we use the \(\star\) superscript to indicate an estimate, and the numbered subscripts to indicate which iteration of estimation we're in. Note that each estimation iteration only references values with subscripts less than its own: this is what makes the method explicit, it computes new values only from previous values.

\( \begin{array}{l} v^\star_1 = v_t, & a^\star_1 = -\frac{k}{m} x_t, \\ v^\star_2 = v_t + \Delta t\ a^\star_1, & a^\star_2 = -\frac{k}{m}( x_t + \Delta t\ v^\star_1 ), \\ x_{t+\Delta t} = x_t + \frac{\Delta t}{2}(v^\star_1 + v^\star_2), & v_{t+\Delta t} = v_t + \frac{\Delta t}{2}(a^\star_1 + a^\star_2), \\ \end{array} \)

In processing code, we can implement the RK2 timestep as follows:


// Acceleration from Position.
float A_from_X( float i_x )
{
    return -( Stiffness / BobMass ) * i_x;
}

// Time Step function.
void TimeStep( float i_dt )
{
    float vStar1 = State[StateVelocityX];
    float aStar1 = A_from_X( State[StatePositionX] );

    float vStar2 = State[StateVelocityX] + ( i_dt * aStar1 );
    float xTmp = State[StatePositionX] + ( i_dt * vStar1 );
    float aStar2 = A_from_X( xTmp );

    State[StatePositionX] += ( i_dt / 2.0 ) * ( vStar1 + vStar2 );
    State[StateVelocityX] += ( i_dt / 2.0 ) * ( aStar1 + aStar2 );

    // Update current time.
    State[StateCurrentTime] += i_dt;
}

Note that we've extracted the calculation of acceleration from position into a separate function. This is to emphasize the structure of the system and not bog the implementation down by repeated instances of the spring and mass constants. In larger systems, this is usually the most expensive part of the evaluation, so this implementation helps us understand how much actual work we're doing. We can see above that we're evaluating the acceleration twice in this time step. Here's what that looks like running:

RK2 is basically the same as the midpoint method, applied to both the position and velocity estimates.

It's not as bad as forward Euler, but it diverges from the solution quickly, and if you watch it long enough, you will see that it is not entirely stable, and will eventually produce large amplitudes. We could apply the same type of forward-backwards approach that we applied to Euler to get Euler-Cromer to RK2, but the results are still not a good fit, though then they would be at least stable again. We are simply using the second-order formulation to illustrate the technique behind Runge-Kutta methods, so let's explore what happens at higher orders.

Fourth-Order Runge-Kutta

As we increase the number of estimations, the solution improves dramatically. The fourth-order Runge-Kutta approximation is very popular, and with good reason. It makes four sets of estimates, as you'd expect from a fourth-order method, and then uses an unevenly weighted average of those estimates to get the final result. The derivation of exactly how RK4 chooses its estimation timesteps, and how it weights the average of its estimates, is complex. If you're interested in working through the Taylor-series expansion, it is detailed here. For now, we'll trust the coefficients that are given and look at how it works. Here's the formulation, as applied to our system:

\( \begin{array}{l} v^\star_1 = v_t, & a^\star_1 = -\frac{k}{m} x_t, \\ v^\star_2 = v_t + \frac{\Delta t}{2}\ a^\star_1, & a^\star_2 = -\frac{k}{m}( x_t + \frac{\Delta t}{2}\ v^\star_1 ), \\ v^\star_3 = v_t + \frac{\Delta t}{2}\ a^\star_2, & a^\star_3 = -\frac{k}{m}( x_t + \frac{\Delta t}{2}\ v^\star_2 ), \\ v^\star_4 = v_t + \Delta t\ a^\star_3, & a^\star_4 = -\frac{k}{m}( x_t + \Delta t\ v^\star_3 ), \\ x_{t+\Delta t} = x_t + \frac{\Delta t}{6} (v^\star_1 + 2 v^\star_2 + 2 v^\star_3 + v^\star_4), & v_{t+\Delta t} = v_t + \frac{\Delta t}{6} (a^\star_1 + 2 a^\star_2 + 2 a^\star_3 + a^\star_4), \\ \end{array} \)
Here's what that looks like in Processing code, computing temporary positions at each sub-estimation, and again using the separate function for acceleration from position:


// Time Step function.
void TimeStep( float i_dt )
{
    float vStar1 = State[StateVelocityX];
    float aStar1 = A_from_X( State[StatePositionX] );

    float vStar2 = State[StateVelocityX] + ( ( i_dt / 2.0 ) * aStar1 );
    float xTmp2 = State[StatePositionX] + ( ( i_dt / 2.0 ) * vStar1 );
    float aStar2 = A_from_X( xTmp2 );

    float vStar3 = State[StateVelocityX] + ( ( i_dt / 2.0 ) * aStar2 );
    float xTmp3 = State[StatePositionX] + ( ( i_dt / 2.0 ) * vStar2 );
    float aStar3 = A_from_X( xTmp3 );

    float vStar4 = State[StateVelocityX] + ( i_dt * aStar3 );
    float xTmp4 = State[StatePositionX] + ( i_dt * vStar3 );
    float aStar4 = A_from_X( xTmp4 );

    State[StatePositionX] += ( i_dt / 6.0 ) * 
        ( vStar1 + (2.0*vStar2) + (2.0*vStar3) + vStar4 );
    State[StateVelocityX] += ( i_dt / 6.0 ) * 
        ( aStar1 + (2.0*aStar2) + (2.0*aStar3) + aStar4 );

    // Update current time.
    State[StateCurrentTime] += i_dt;
}
Here's what RK4 looks like running in Processing:

Well hot-damn. The popularity of RK4 is easy to understand! However, it comes at a cost - four evaluations of acceleration in the timestep. Futhermore, there is a temporary storage requirement which begins to be an issue in larger simulations where a temporary position or velocity may amount to millions of points. Also, RK4 is not unconditionally stable. In many production cases, the stability of Velocity Verlet or Euler-Cromer may be more attractive (especially with the fewer calculations), and the additional accuracy provided by RK4 may not be visible. These are the trade-offs and design decisions made by simulation designers when implementing simulation engines.

Interesting point - the "correct" solution to our simple system here is just a cosine function of time. Our numerical integrator is achieving almost identical results without ever evaluating anything other than a multiplication or an addition. This illustrates how Taylor-series expansions are used to provide numerical approximations to mathematical functions.

So far, we've only been considering differential equations of a single variable, time. Now that we've achieved a pretty good time step, the next class will explore a system in which the equations feature derivatives in both time and space.

Download

All of the files associated with this class are available via a public github repository, found here: https://github.com/blackencino/SimClass

Monday, May 20, 2013

Simulation Class: Midpoint Method and Velocity Verlet

Higher-Order Difference Equations

In our previous class, we were able to fix the instability of our spring system by switching from the never-correct Forward Euler time step to the Euler-Cromer time step, which made our simulation stable, but did not achieve an accurate result over time when compared to the exact answer.

The solutions we've explored so far are primarily explicit time integrations, in that we're only using values from previous points in time to solve for points at future points in time. When we consider, later, implicit time integration, where values at the current time are solved self-referentially, things will get a lot more complex. So, we'd like to continue exploring techniques that are fundamentally explicit in nature.

The approach that all of the higher-order explicit methods will take will be to use lower-order approximations to compute temporary values, which will be used to compute better derivatives that will then be used to integrate forward. The first, and most common higher-order method is called "The Midpoint Method".

The Midpoint Method

Let's approximate the velocity at half-way through our time step, rather than all the way to the new time, just using the forward finite-difference approximation we used with Forward Euler:
\( a( t ) = f( t, x( t ) ) \\ x( t + \frac{\Delta t}{2} ) \approx x( t ) + \frac{\Delta t}{2} v( t ) \\ v( t + \Delta t ) \approx v( t ) + \Delta t a( t ) \\ x( t + \Delta t ) \approx x( t + \frac{\Delta t}{2} ) + \frac{\Delta t}{2} v( t + \frac{\Delta t}{2} ) \)
Now we've got one extra step of computation at each time step, computing a midpoint position. This isn't a huge improvement over what we had before, and here's what it looks like in code:
// Time Step function.
void TimeStep( float i_dt )
{
    // Compute acceleration from current position.
    float A = ( -Stiffness / BobMass ) * State[StatePositionX];

    // Update position half-way.
    State[StatePositionX] += ( i_dt/2.0 ) * State[StateVelocityX];

    // Update velocity based on acceleration.
    State[StateVelocityX] += i_dt * A;
    
    // Update position half-way.
    State[StatePositionX] += ( i_dt/2.0 ) * State[StateVelocityX];

    // Update current time.
    State[StateCurrentTime] += i_dt;
}

And here that is:

So really we've actually made things worse with this simple higher-order approximation. An alternative midpoint-method approach is called Velocity Verlet, and it looks like this:

\( a( t ) = f( t, x( t ) ) \\ v( t + \frac{\Delta t}{2} ) \approx v(t) + \frac{\Delta t}{2} a( t ) \\ x( t + \Delta t ) \approx x( t ) + \Delta t v( t + \frac{\Delta t}{2} ) \\ a( t + \Delta t ) = f( t+\Delta t, x( t + \Delta t ) \\ v( t + \Delta t ) \approx v( t + \frac{\Delta t}{2} ) + \frac{\Delta t}{2} a( t + \Delta t ) \)
And here's what that looks like in code:


// Time Step function.
void TimeStep( float i_dt )
{
    // Compute acceleration from current position.
    float A = ( -Stiffness / BobMass ) * State[StatePositionX];

    // Update velocity half-way.
    State[StateVelocityX] += ( i_dt/2.0 ) * A;

    // Update position.
    State[StatePositionX] += i_dt * State[StateVelocityX];

    // Re-compute acceleration.
    A = ( -Stiffness / BobMass ) * State[StatePositionX];
    
    // Update velocity half-way.
    State[StateVelocityX] += ( i_dt/2.0 ) * A;

    // Update current time.
    State[StateCurrentTime] += i_dt;
}

And here it is running:

Not bad! Now we're cooking with gas! Still not perfect, though. The next class will explore the fourth-order Runge Kutta approach, and begin to consider implicit methods.

Download

All of the files associated with this class are available via a public github repository, found here: https://github.com/blackencino/SimClass

Simulation Class: The Euler-Cromer Time Step

Forward Euler Problems

In the previous class, we observed that a Forward Euler time step quickly produced an unusable solution to our simple spring system. This illustration from the Wikipedia entry on Forward Euler illustrates how the method works, and also why its solutions drift away from the correct result.

In this illustration, you can see that at each point, the slope of the function is used to construct a line segment that estimates the position of the next point. The error in each estimate is accumulated with each subsequent estimate, and the solution veers off course. In order to study these systems better, we first need some debug tools.

Adding Debug Tools to our Spring Simulation

The Forward Euler Time Step that we added to the Spring Simulation in the previous class produced an oscillatory motion, but did so in a way that produced an unstable result, with the mass oscillating wildly off screen in a short period of time.

The instability shown by our simple spring simulation is a constant problem for most simulations. It's easy to see the problem in this simple example (which is why we're using it!), but the problem affects large-scale fluid and cloth sims, rigid body sims, and so on. Strangely, despite the general knowledge that Forward Euler is not a good time integration tool, it is still used in many commercial tools, including Maya and Houdini.

In order to analyze our improvements, we first need a measuring stick. To quote my good friend Adam Shand, "You can't improve what you can't measure.". We chose the simple spring because it did have an exact answer. Therefore, we'll add some code to our drawing routine that draws the right answer just above the simulated spring. Here's that code - note that this uses the expression for the explicit position of the spring that we established in the previous class. We'll also add some text to the top of the drawn image to remind us which time integration we're using.


void DrawState()
{
    // Compute end of arm.
    float SpringEndX = PixelsPerMeter * State[StatePositionX];

    // Compute the CORRECT position.
    float sqrtKoverM = sqrt( Stiffness / BobMass );
    float x0 = InitState[StatePositionX];
    float v0 = InitState[StateVelocityX];
    float t = State[StateCurrentTime];
    float CorrectPositionX = ( x0 * cos( sqrtKoverM * t ) ) +
        ( ( v0 / sqrtKoverM ) * sin( sqrtKoverM + t ) );
    
    // Compute draw pos for "correct"
    float CorrectEndX = PixelsPerMeter * State[CorrectPositionX];

    // Draw the spring.
    strokeWeight( 1.0 );
    line( 0.0, 0.0, SpringEndX, 0.0 );
          
    // Draw the spring pivot
    fill( 0.0 );
    ellipse( 0.0, 0.0, 
             PixelsPerMeter * 0.03, 
             PixelsPerMeter * 0.03 );
    
    // Draw the spring bob
    fill( 1.0, 0.0, 0.0 );
    ellipse( SpringEndX, 0.0, 
             PixelsPerMeter * 0.1, 
             PixelsPerMeter * 0.1 );

    // Draw the correct bob in blue
    fill( 0.0, 0.0, 1.0 );
    ellips( CorrectEndX, -PixelsPerMeter * 0.25,
            PixelsPerMeter * 0.1,
            PixelsPerMeter * 0.1 );
}

And here that is:

I also added a "reset" capability to the simple spring code by using Processing's keyRelease function. In response to the key 'r' being released in the view (which has ascii-code 114), we copy the initial state onto the state. (See how handy state vectors are turning out to be!) Because we included the Current Time in our state vector, this is easy:


// Reset function. If the key 'r' is released in the display, 
// copy the initial state to the state.
void keyReleased()
{
    if ( key == 114 )
    {
        for ( int i = 0; i < StateSize; ++i )
        {
            State[i] = InitState[i];
        }
    }  
}

The Euler-Cromer Time Step

When we created the forward finite difference approximation to our ODE equation of motion in the previous class, we had to create two forward difference equations - one for velocity from acceleration, and then one for position from velocity. As you remember, we computed acceleration from the current position, then updated position based on the current velocity, and then finally updated velocity based on the acceleration. What if we updated velocity first, and then updated the position from the current velocity? That would look like this:
\( a( t ) = f( t, x( t ) ) = \frac{-k}{m} x( t ) \\ v( t + \Delta t ) \approx v( t ) + \Delta t a( t ) \\ x( t + \Delta t ) \approx x( t ) + \Delta t v( t + \Delta t ) \)
This seems almost trivially different from our Forward-Euler time step. This method is called many different things: Semi-Implicit Euler Method, Symplectic Euler, Semi-Explicit Euler, Euler-Cromer, and Newton–Størmer–Verlet (NSV) (at least). One of the things I've found in years of writing simulations is that just learning the names of all these different techniques, and learning that the different names often mean exactly the same thing, is a huge part of simplifying the giant mess of understanding.

The code changes to implement Euler Cromer are trivial, we just reorder the time step - literally just swap the velocity update line and the position update line:


// Euler-Cromer Time Step function.
void TimeStep( float i_dt )
{
    // Compute acceleration from current position.
    float A = ( -Stiffness / BobMass ) * State[StatePositionX];

    // Update velocity based on acceleration.
    State[StateVelocityX] += i_dt * A;

    // Update position based on current velocity.
    State[StatePositionX] += i_dt * State[StateVelocityX];

    // Update current time.
    State[StateCurrentTime] += i_dt;
}

And let's see how this ONE LINE (okay, TWO-LINE) change affects the sim:

This is not bad! Our simulation seems to have become stable, which is good news. However, if we watch the simulation for a while, it becomes clear that the simulated positions diverge more and more over time from the correct positions. In order to fix this, we're going to need to explore even higher-order time integration techniques, which is the topic of the next class!

Download

All of the files associated with this class are available via a public github repository, found here: https://github.com/blackencino/SimClass