# Assignment 3. Animation

The Animation assignment doesn’t have any checkpoints. Work by yourself or with a partner, as always.

In this assignment you will implement mesh animation with linear blend skinning. For input data we have a few simple test scenes originating from the glTF sample models collection, a few animations we built in Blender (where you can make more!), and, for more complex and realistic motion, some data from a cool research project on human motion modeling [Loper et al., SIGGRAPH Asia 2015]. (The paper video is a nice overview and worth a look.) The authors provide this model as files in the widely used FBX animation format, which we read into Blender to convert to glTF files. For this assignment we will focus on the mesh skinning aspect of the data, so that we can display animations of the average body shape, but it is a relatively straightforward extension to add the ability to change body shape and to use their pose-dependent shape corrections.

There isn’t really any new framework for this assignment, but pull from the course Git repository to get some input scenes that have suitable animations.

# 1 Load some animated scenes

Start with your Pipeline solution; you might want to add an option to render using a simple forward shading pipeline for performance (with animation you will find you suddenly care about the difference between 30 and 60 frames per second).

There are a couple of changes needed to the Assimp importer configuration. Some of the glTF test animations have many bones influencing some vertices, but we need the number to be limited to 4. Assimp will drop the lowest weighted bones and renormalize if you pass the aiProcess_LimitBoneWeights flag. We also want to keep the options to triangulate (many of these scenes have character meshes built from quadrilaterals). So your input-reading call might look like this:

const aiScene* input_scene = importer.ReadFile(scenePath,
    aiProcess_LimitBoneWeights |
    aiProcess_Triangulate |
    aiProcess_SortByPType);

The test scenes in the repo include:

• BoxAnimated.glb: a glTF test animation with no skeletons, just rigid motions.
• CesiumMan.glb and RiggedFigure.glb: these are glTF test animations of human characters with simple animation cycles.
• mosh_cmu_*.glb: these are SMPL animations using motion capture data from the CMU Motion Capture Database.

With the Assimp options above you should be able to load these and display them frozen in their bind poses (example for Cesium Man). The bind poses for these models are in z-up coordinates but the animated poses will be right side up. These scenes don’t all include cameras or lights so you will want to be sure your program provides sensible defaults.

# 2 Apply the node animations

A standard skinned character animation has two parts: animation controls that move a hierarchy of nodes and a deformer that deforms a mesh to follow this motion. You can get these two halves working one at a time starting with the node animations. For this the scene BoxAnimated.glb is a good one to work with because it has a simple animation with just rigid motions.

The information needed to animate the node hierarchy is stored in the array aiScene::mAnimations; each animation contains an array mChannels. Each channel points (by name matching) to one node in the scene, and contains lists of keyframes for translation, scaling, and rotation. Extend your scene reading code to convert this information and store it in some convenient data structures of your devising; I found std::map was an ideal tool for storing keyframes since it is ordered and supports looking up by keys that might fall in between the keys stored in the map. I also found that I needed a map to look up nodes by name, so that I could connect these animation channels to nodes in my already-built scene. Scenes are allowed to have multiple animations (and often do, e.g. for game characters with several behaviors) but for this assignment we can assume the first one is always the one of interest.
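As an illustration of the "keys that fall in between" lookup, here is a minimal sketch of sampling a keyframe track stored in a std::map. The function name sampleLinear and the choice to clamp outside the keyed range are assumptions made for this example, not part of the framework; the track is assumed to be non-empty and keyed by time in ticks.

```cpp
#include <iterator>
#include <map>

// Hypothetical helper: sample a keyframe track (time in ticks -> value) at time t.
// T must support a + (b - a) * float, e.g. float or glm::vec3.
// Assumes the track holds at least one key.
template <typename T>
T sampleLinear(const std::map<double, T>& keys, double t) {
    auto next = keys.upper_bound(t);           // first key strictly after t
    if (next == keys.begin()) {
        return next->second;                   // t is before the first key: clamp
    }
    auto prev = std::prev(next);               // last key at or before t
    if (next == keys.end()) {
        return prev->second;                   // t is at or after the last key: clamp
    }
    // Fraction of the way from prev to next, in [0, 1).
    float u = static_cast<float>((t - prev->first) / (next->first - prev->first));
    return prev->second + (next->second - prev->second) * u;
}
```

The same bracketing logic works for rotations if the final line is swapped for a quaternion interpolation.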

Once you have this info available, you just need to write a function you will call before drawing each frame to update the node transformations for the current time. To do this, you loop over all the channels, and for each one you interpolate in the timeline to compute the current translation, rotation, and scale. You can just use linear interpolation for translation and scale and spherical linear interpolation on quaternions for rotations. In the GLM library, there is support for working with quaternions in the header <glm/gtc/quaternion.hpp>, and support for constructing transformations in <glm/gtc/matrix_transform.hpp>. The class glm::quat is useful for representing quaternions, and the function glm::mix does spherical linear interpolation when given quaternion arguments. Multiply these transformations together in the order T, R, S (with T toward the root and S toward the leaf) and assign that product to the transformation of the node referenced by the channel. This is all quite analogous to what you did back in the CS4620 animation assignment. That’s really all there is to it!
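For reference, here is a minimal standalone sketch of what spherical linear interpolation computes, written without GLM so it can be read on its own. The Quat struct and the slerp function are illustrative names, not GLM's types; in your program glm::quat and glm::mix would take their place. One difference to be aware of: this sketch negates one input when the dot product is negative so that it follows the shorter arc, which to my understanding corresponds to glm::slerp rather than glm::mix.

```cpp
#include <cmath>

// Illustrative quaternion type (w is the scalar part); not glm::quat.
struct Quat { float w, x, y, z; };

// Spherical linear interpolation between unit quaternions a and b, u in [0, 1].
Quat slerp(Quat a, Quat b, float u) {
    float c = a.w * b.w + a.x * b.x + a.y * b.y + a.z * b.z;  // cosine of the angle
    if (c < 0.0f) {
        // q and -q represent the same rotation; flip b to take the shorter arc.
        b = {-b.w, -b.x, -b.y, -b.z};
        c = -c;
    }
    float wa, wb;
    if (c > 0.9995f) {
        // Nearly identical rotations: sin(theta) is close to 0, so blend linearly.
        wa = 1.0f - u;
        wb = u;
    } else {
        float theta = std::acos(c);
        float s = std::sin(theta);
        wa = std::sin((1.0f - u) * theta) / s;
        wb = std::sin(u * theta) / s;
    }
    Quat q = {wa * a.w + wb * b.w, wa * a.x + wb * b.x,
              wa * a.y + wb * b.y, wa * a.z + wb * b.z};
    // Renormalize to guard against drift (matters mostly on the linear path).
    float n = std::sqrt(q.w * q.w + q.x * q.x + q.y * q.y + q.z * q.z);
    return {q.w / n, q.x / n, q.y / n, q.z / n};
}
```

Halfway between the identity and a 90 degree rotation about z, this yields a 45 degree rotation about z, which is the constant-speed behavior you want between rotation keyframes.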

Once you get this working, you should be able to get poses for different times in the animation. You will likely want to implement play/pause and frame forward/backward controls in your application (I just used a few one-liner keyboard handlers). For realtime playback, the functions glfwGetTime and glfwSetTime are convenient: you can start the timer running when the user hits “play” and then just use the actual current time on the timer to fetch each frame to draw.

The aiAnimation class carries two other bits of information that are useful in getting animations to play back sensibly; the times of keyframes are measured in “ticks” and the conversion factor to seconds is called mTicksPerSecond; also, the duration of the animation (in ticks) is in mDuration. Many animations are designed to loop, so that it makes sense to arrange your playback logic so that times beyond the duration are wrapped back into the range between 0 and the duration. You might like to include an additional adjustable scale factor for the playback speed in case you feel like the default speed of some animations is off.
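The conversion and wrapping described above might be sketched as follows. The function name animationTicks and its parameters are hypothetical; the arguments stand for the timer value in seconds, aiAnimation::mTicksPerSecond, aiAnimation::mDuration, and an optional playback speed factor. Assimp leaves mTicksPerSecond at 0 when the file does not specify a rate, so the sketch substitutes a fallback; the value 25 is an assumption, not something the test scenes require.

```cpp
#include <cmath>

// Hypothetical helper: map elapsed seconds to a looping animation time in ticks.
// Assumes durationTicks > 0.
double animationTicks(double seconds, double ticksPerSecond,
                      double durationTicks, double speed = 1.0) {
    if (ticksPerSecond <= 0.0) {
        ticksPerSecond = 25.0;  // assumed fallback when the file gives no rate
    }
    double t = std::fmod(seconds * speed * ticksPerSecond, durationTicks);
    if (t < 0.0) {
        t += durationTicks;     // fmod keeps the sign of its first argument
    }
    return t;
}
```

Handling negative times lets the same function serve frame-backward stepping and negative playback speeds.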

Here is what the box animation looks like for me.

# 3 Animate the mesh

The other half of a mesh animation is the skinning weights and transformations that bind the mesh to the node hierarchy. The mesh comes with a collection of “bones,” each of which refers to a node in the animated hierarchy. Each bone comes with two pieces of information: a list of weights (one for every vertex in the mesh), and a matrix that is the transformation from the coordinates of the skeleton root (the node where the mesh lives) to the local coordinates of the bone in the bind pose. This matrix is known as the inverse bind pose matrix, and the collection of bones is often called a skeleton.

In Assimp, you will find skeletons stored with meshes. Along with vertex attributes like positions and normals, a mesh can contain bones, in the array aiMesh::mBones. The entries in this array point to aiBone objects, each containing a node name, a list of (vertex index, weight) pairs, and the matrix mOffsetMatrix, which is the inverse bind pose matrix. Extend the code for reading meshes into your scene so that it also reads bones. I stored the bone information in two parts: the bones themselves go in a skeleton, which is basically just a list of (node, inverse bind pose matrix) pairs and a reference to the mesh it operates on; and the weights need to be converted into mesh attributes to hand them off to a vertex shader.

The vertex shader for skinning operates on one vertex at a time, and it needs to have the weights and transformations for all bones that influence the vertex it is processing. The way we’ll do this is (1) limit the number of bones influencing each vertex to 4 (we already did this by asking Assimp to take care of it); (2) place the weights for those 4 bones into a vec4-valued vertex attribute and the corresponding bone indices into an ivec4-valued vertex attribute; and (3) put the bone transformations into a uniform array of mat4s. With this information the vertex shader code can very simply evaluate the linear blend skinning equation from lecture:

$\displaystyle \mathbf{v}_i'= \sum_j w_{ij} M_j \mathbf{v}_i$

because in this sum, only 4 terms are nonzero for any particular vertex. The four values of $\textstyle j$ for the current vertex are found in the bone-index attribute, the four values of $\textstyle w_{ij}$ are found in the weights attribute, and the matrices $\textstyle M_j$ are in the uniform array.
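A CPU-side reference for this sum can be handy for checking shader output on a few vertices. The sketch below is one way to write it in plain C++; the Vec3 and Mat4 aliases and the function names are made up for this example, matrices are assumed to be column-major (as in GLM and GLSL) and affine, and only the xyz components are blended, with the point's w treated as an implicit 1.

```cpp
#include <array>
#include <vector>

using Vec3 = std::array<float, 3>;
using Mat4 = std::array<float, 16>;  // column-major: element (row, col) is m[col * 4 + row]

// Apply an affine matrix to a point (implicit w = 1), returning only xyz.
Vec3 transformPoint(const Mat4& m, const Vec3& p) {
    Vec3 r;
    for (int row = 0; row < 3; ++row) {
        r[row] = m[0 + row] * p[0] + m[4 + row] * p[1] + m[8 + row] * p[2] + m[12 + row];
    }
    return r;
}

// v' = sum over the 4 listed bones of w_j * (M_j * v).
// Unused slots are expected to carry weight 0 and a valid index.
Vec3 skinVertex(const Vec3& v, const std::array<int, 4>& boneIds,
                const std::array<float, 4>& boneWts, const std::vector<Mat4>& bones) {
    Vec3 out = {0.0f, 0.0f, 0.0f};
    for (int k = 0; k < 4; ++k) {
        Vec3 p = transformPoint(bones[boneIds[k]], v);
        for (int c = 0; c < 3; ++c) {
            out[c] += boneWts[k] * p[c];
        }
    }
    return out;
}
```

For example, a vertex weighted half and half between an identity bone and a bone translated by 2 units in x ends up moved by 1 unit in x.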

To set up the two additional vertex attribute arrays, you need to traverse Assimp’s weights-per-bone arrays and organize the data into 4-by-$\textstyle n_v$ index and weight matrices similar to the ones you use for the position and normal attributes. Then set these attributes in your mesh, and they will be available in a vertex shader that has in declarations with matching attribute indices. For instance, you might use indices 2 and 3, then use declarations

layout (location = 2) in ivec4 boneIds;
layout (location = 3) in vec4 boneWts;

Note that the meshes from the SMPL project always have exactly 4 bones affecting each vertex, but the other meshes do not, so it’s important to have that Assimp importer option active to limit the number of bones per vertex, and also to be able to tolerate fewer than 4 bones affecting a vertex (typically this just requires ensuring the unused weights are zero and the unused indices are not out of range).
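The regrouping from per-bone weight lists to per-vertex attributes might look like the following sketch. BoneWeights is a hypothetical stand-in for the (vertex index, weight) pairs found in one aiBone, and SkinAttributes and packSkinAttributes are names invented for this example. Unused slots are left at index 0 with weight 0, which keeps them in range and makes them contribute nothing.

```cpp
#include <array>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for one aiBone: its (vertex index, weight) pairs.
using BoneWeights = std::vector<std::pair<unsigned, float>>;

struct SkinAttributes {
    std::vector<std::array<int, 4>> ids;    // per-vertex bone indices
    std::vector<std::array<float, 4>> wts;  // per-vertex bone weights
};

SkinAttributes packSkinAttributes(const std::vector<BoneWeights>& bones,
                                  std::size_t vertexCount) {
    SkinAttributes out;
    // Default every slot to bone 0 with weight 0: in range, and no influence.
    out.ids.assign(vertexCount, std::array<int, 4>{0, 0, 0, 0});
    out.wts.assign(vertexCount, std::array<float, 4>{0.0f, 0.0f, 0.0f, 0.0f});
    std::vector<int> used(vertexCount, 0);  // slots filled so far, per vertex

    for (std::size_t b = 0; b < bones.size(); ++b) {
        for (const auto& vw : bones[b]) {
            unsigned vertex = vw.first;
            int slot = used[vertex];
            if (slot >= 4) {
                continue;  // not expected when aiProcess_LimitBoneWeights is active
            }
            out.ids[vertex][slot] = static_cast<int>(b);
            out.wts[vertex][slot] = vw.second;
            ++used[vertex];
        }
    }
    return out;
}
```

If you set up the attributes with raw OpenGL calls, note that integer attributes such as the ivec4 above are specified with glVertexAttribIPointer rather than glVertexAttribPointer; your framework may already take care of this.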

Once you have added the vertex attribute arrays and written the shader that uses them, the last thing you need is to compute the bone transformations and upload them to the uniform array. This is where you need the skeleton you stored: it is a list of references to nodes with an inverse bind pose matrix for each one. In lecture we talked about how to construct a transformation from the bind-pose space (the coordinates in which the mesh vertices are stored) to the bone’s local space (this is the inverse bind pose transform provided by Assimp), then back to world space using the pose of that bone at the current frame (this is the node-to-world transformation of the node referenced by the bone). The transformations describing how each bone moves from bind pose to the current frame go into the uniform matrix array. (The simple way to upload to an array of uniform matrices is to treat it as a collection of separate variables with names like boneTransformations[3], though it’s also possible to upload them as a single block of data using a uniform buffer.)
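The per-frame composition of the two matrices can be sketched as below, using a plain column-major array in place of glm::mat4 so the example stands alone. The Bone struct and the function names are hypothetical. The order matters: the inverse bind pose matrix is applied to the vertex first, so it sits on the right of the product. Also keep in mind that Assimp's aiMatrix4x4 is stored row-major, so mOffsetMatrix needs to be transposed when it is copied into a column-major matrix type such as GLM's.

```cpp
#include <array>
#include <vector>

using Mat4 = std::array<float, 16>;  // column-major: element (row, col) is m[col * 4 + row]

// Standard 4x4 matrix product r = a * b.
Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};  // zero-initialized
    for (int col = 0; col < 4; ++col) {
        for (int row = 0; row < 4; ++row) {
            for (int k = 0; k < 4; ++k) {
                r[col * 4 + row] += a[k * 4 + row] * b[col * 4 + k];
            }
        }
    }
    return r;
}

// Hypothetical skeleton entry: one bone's current pose and its bind data.
struct Bone {
    Mat4 nodeToWorld;  // node-to-world transform of the referenced node, this frame
    Mat4 inverseBind;  // aiBone::mOffsetMatrix, already converted to column-major
};

// One matrix per bone, ready to upload to the uniform array.
std::vector<Mat4> boneTransformations(const std::vector<Bone>& skeleton) {
    std::vector<Mat4> out;
    out.reserve(skeleton.size());
    for (const Bone& b : skeleton) {
        // Bind-pose space -> bone local space -> world space at the current frame.
        out.push_back(multiply(b.nodeToWorld, b.inverseBind));
    }
    return out;
}
```

A quick sanity check is that when every node sits exactly in its bind pose, each product reduces to the transform that places the unanimated mesh in the world.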

To test whether your transformations are working, before the weights are correct, you can temporarily have your vertex shader transform all vertices by just one bone transformation, and see whether the resulting motion plausibly follows that body part. (Reference results for mosh_cmu_7516 are available for bone 1 (Pelvis) and bone 8 (R_Ankle): the former tracks the overall body motion, and the latter rotates a lot during the kicking part of the motion.)

Once your weights are correct you will see the complete animation!

One minor pitfall: due to rounding, if you blend homogeneous vectors or matrices you can end up with vectors whose final component is not exactly 1, which will cause them to transform slightly incorrectly under the projection to NDC. This can cause weird view-dependent distortions in the mesh. The fix is to blend only the spatial components and explicitly set the final coordinate to 1.

# 4 Handing in

Hand in using the same process as previous assignments.