Next: Text-To-Visual/Auditory Speech
Up: Control Architecture Examples
Previous: Physically Based
Facial expression, head, and eye motion can be
automatically driven from spoken input, thereby providing a high-level
programming interface for 3D facial animation. In this mode of
operation a particular spoken utterance, with its associated intonation
and emotion, can be computed independently of the facial model. Once
the computation is complete, a facial model can be articulated through
the Action Units (AUs) described by the FACS notation system.
The process is as follows:
- Phonemes are characterized by their degree of deformability. For
each deformable segment, the algorithm looks for the nearby segment
whose associated lip shapes influence it, using the look-ahead model
for coarticulation [70].
The properties of muscle contractions are taken into account in two
ways: (1) spatially, by adjusting the sequence of contracting muscles
when antagonist movements (i.e., movements with very different lip
positions, such as a pucker versus a lip extension) succeed each
other, and (2) temporally, by checking whether a muscle has enough
time to contract before the surrounding lip shape, or to relax after
it. Both constraints act on the final computation
of the lip shapes [111].
- Starting from a functional group (lip shapes, conversational
signal, punctuator, regulator or manipulator), algorithms
can incorporate synchrony, and create coarticulation effects,
emotional signals, and eye and head movements
[113]. Rules automatically generate the
facial actions corresponding to an input utterance. A conversational
signal (a movement occurring on an accent, such as raising an eyebrow)
starts and ends with the accented word, while punctuator signals (such
as smiling) coincide with pauses. Blinking is synchronized at the
phoneme level. Head nods and shakes appear on accent and pause. The
head of the speaker turns away from the listener at the beginning of a
speaking turn and turns toward the listener at the end of a speaking
turn to signal a change of turn.
- Facial interaction between agents and synchronization of head and
eye movements to the dialogue for each agent are accomplished using
Parallel Transition Networks (PaT-Nets), which allow facial
coordination rules to be encoded as simultaneously executing
finite-state automata [24]. PaT-Nets can call for action in
the simulation and make state transitions either conditionally or
probabilistically.
All face and eye movement behavior for an individual is encoded in a
single PaT-Net. Each node of the PaT-Net corresponds to one gaze
function. A PaT-Net instance is created to control each agent with
appropriate parameters. Then, as the agents' PaT-Nets synchronize the
agents with the dialogue and interact with the unfolding simulation,
they schedule activity that produces the complex interaction behavior
observed. Probabilities appropriate for each agent given the current
role as listener or speaker are set for the PaT-Net before it
executes. At each turn change, the probabilities affect actions
accordingly.
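
The PaT-Net idea described above can be illustrated with a minimal
Python sketch. This is not the implementation from [24]: the class
names (Transition, Node, PaTNet), the helper make_gaze_net, the two
gaze nodes, and the "world" dictionary are all hypothetical and chosen
only for illustration. Each net is a finite-state automaton whose
nodes call an action in the simulation and whose transitions fire
either conditionally or probabilistically. Simultaneous execution is
approximated here by stepping every agent's net once per simulation
tick.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Transition:
    target: str
    # Conditional transition: only considered when the predicate holds.
    condition: Optional[Callable[[dict], bool]] = None
    # Probabilistic transition: taken with this probability when considered.
    probability: float = 1.0


@dataclass
class Node:
    # Action called in the simulation whenever the net is in this node.
    action: Callable[[str, dict], None]
    transitions: List[Transition] = field(default_factory=list)


class PaTNet:
    """One automaton controlling one agent (hypothetical, simplified)."""

    def __init__(self, agent: str, nodes: Dict[str, Node], start: str, rng=None):
        self.agent = agent
        self.nodes = nodes
        self.state = start
        self.rng = rng or random.Random()

    def step(self, world: dict) -> None:
        node = self.nodes[self.state]
        node.action(self.agent, world)
        for t in node.transitions:
            if t.condition is not None and not t.condition(world):
                continue
            if self.rng.random() < t.probability:
                self.state = t.target
                break


def make_gaze_net(agent: str, p_look_away: float, rng=None) -> PaTNet:
    """Build a two-node gaze net; p_look_away is set per agent role."""

    def log(name: str):
        def action(agent_name: str, world: dict) -> None:
            world["log"].append((agent_name, name))
        return action

    nodes = {
        # Probabilistic transition out of "look_at".
        "look_at": Node(log("look_at"),
                        [Transition("look_away", probability=p_look_away)]),
        # Conditional transition: look back once someone else is speaking.
        "look_away": Node(log("look_away"),
                          [Transition("look_at",
                                      condition=lambda w: w["speaker"] != agent)]),
    }
    return PaTNet(agent, nodes, "look_at", rng)


def run(nets: List[PaTNet], world: dict, steps: int) -> None:
    # Lock-step stepping stands in for simultaneously executing automata.
    for _ in range(steps):
        for net in nets:
            net.step(world)


if __name__ == "__main__":
    world = {"speaker": "A", "log": []}
    speaker = make_gaze_net("A", p_look_away=0.7)
    listener = make_gaze_net("B", p_look_away=0.2)
    run([speaker, listener], world, steps=5)
    for entry in world["log"]:
        print(entry)
```

In this sketch, a turn change would be modeled by updating
world["speaker"] and rebuilding or re-parameterizing each agent's net
with the probabilities appropriate to its new role. The probability
values 0.7 and 0.2 are placeholders, not figures from the source.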