One could argue that the choice of a particular control mechanism is inherently dependent on the purpose of the animation. Therefore, if facial animation control is partitioned into modules, one should be able to select and combine the techniques that satisfy the specified goals. This approach, for want of a better description, is called ``mix and match''. To illustrate this approach, the control techniques are partitioned into high and low levels of control.
Examples of high-level control techniques might include (1) a speech module taking in text and generating lip motion for a model, (2) a vision module capable of extracting facial motion from an actor and driving a computer-animated facial model, (3) a natural language module taking text and generating expressions, gaze direction, or head movements based on derived meaning, and (4) a script-driven module taking commands such as ``be mad'' or ``look right'' instead of interactive input.
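The script-driven module in (4) can be viewed as a translator from symbolic commands to parameter targets that a lower-level module then realizes. A minimal sketch of this idea follows; the command names, parameter names, and values are all hypothetical, chosen only to illustrate the mapping.

```python
# Hypothetical high-level command table: each symbolic command maps to
# a set of low-level control targets (parameter names are illustrative).
HIGH_LEVEL_COMMANDS = {
    "be mad":     {"brow_lower": 0.8, "lip_press": 0.6},
    "look right": {"gaze_yaw": 30.0, "head_yaw": 10.0},
}

def interpret(command):
    """Translate a script command into low-level control targets."""
    try:
        # Copy so callers can modify the result without touching the table.
        return dict(HIGH_LEVEL_COMMANDS[command])
    except KeyError:
        raise ValueError(f"unknown command: {command!r}")

targets = interpret("look right")
```

Because the module emits only abstract parameter targets, it remains independent of whether a muscle-based or morph-based module ultimately moves the geometry.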
Examples of lower-level control modules are more directly tied to the geometry itself; for instance, (1) a muscle-based system may be used to control facial deformations, or (2) the face may be controlled by morphing software. The former animates from the inside out, where an example command might be to pull on the upper lip muscle to open the mouth. The latter works from the outside in, where an example command might entail moving an upper lip line closer to the nose to achieve mouth opening.
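The inside-out versus outside-in distinction can be made concrete on a toy one-dimensional ``upper lip'' height. The functions below are a hedged sketch, not an actual muscle or morph model: the gain and blending rules are illustrative assumptions.

```python
def muscle_open_mouth(lip_y, contraction, gain=2.0):
    """Inside out: a muscle contraction pulls the lip by gain * contraction.

    The animator specifies a muscle activation; the surface motion follows.
    """
    return lip_y + gain * contraction

def morph_open_mouth(lip_y, target_y, weight):
    """Outside in: blend the lip position toward a target surface shape.

    The animator specifies where the surface should end up; no muscle
    is modeled at all.
    """
    return (1.0 - weight) * lip_y + weight * target_y
```

Both calls can reach the same pose; they differ in what the animator controls: an activation level in the first case, a target geometry in the second.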
Making a picture sequence is usually the last step, using various rendering techniques, which may be either 2D or 3D. However, in some video tracking algorithms the rendered picture sequence is compared with the input video, and a feedback loop is used to further refine the tracked data. The following example scenario illustrates the ``mix and match'' approach:
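The render-compare-refine feedback loop mentioned above can be sketched as follows. This is a toy scalar version under stated assumptions: the error metric is a simple difference, the update rule is a fixed fractional correction, and the ``renderer'' is a stand-in function, none of which corresponds to a specific published tracker.

```python
def refine_tracking(params, observed, render, step=0.5, iters=20):
    """Iteratively adjust tracked parameters so that the rendered
    frame approaches the observed video frame (scalar toy example).

    Each pass renders with the current parameters, measures the
    discrepancy against the input video, and feeds a fraction of
    that error back into the parameters.
    """
    for _ in range(iters):
        error = observed - render(params)  # compare render to input video
        params = params + step * error     # feed the error back
    return params

# Toy "renderer": the rendered value is just the parameter itself,
# so the loop should drive params toward the observed value.
final = refine_tracking(0.0, observed=1.0, render=lambda p: p)
```

With a step of 0.5 the residual error halves every iteration, so the toy parameter converges to the observed value; a real tracker would replace the scalar difference with an image-space error and a model-specific update.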
In essence, one animator's low-level box may be another's high-level box, so the choice of what is low or high level is somewhat arbitrary. The lower levels are modules more closely tied to the specific type of model being used, whereas the higher levels can be model independent.