Next: Mix And Match
Up: Control Architecture Examples
Previous: Text-To-Visual/Auditory Speech
In this control scheme, the lips of real human
faces are analyzed to extract parameters that drive continuous
functions fitted to the shape of the lips [6].
Using video analysis, it is then possible to synchronize a lip model
with the natural voice of the speaker.
A distinctive feature of this scheme is that the lip contour
shapes, while highly deformable, follow very regular rules. In
fact, the coefficients of the continuous functions can easily be
predicted from three anatomical parameters measured on the speaker's face:
(1) the horizontal width and (2) the vertical height of the internal lip
contour, and (3) the distance between a vertical profile reference and
the lip contact protrusion.
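
To make the role of the three parameters concrete, here is a minimal sketch in Python. The `LipParams` container, the `internal_contour` function, and the single-cosine contour shape are all illustrative assumptions; the actual fitted functions and their coefficients are those reported in [6] and are not reproduced here.

```python
import math
from dataclasses import dataclass


@dataclass
class LipParams:
    """Hypothetical container for the three anatomical parameters."""
    width: float       # (1) horizontal width of the internal lip contour
    height: float      # (2) vertical height of the internal lip contour
    protrusion: float  # (3) profile reference to lip contact distance


def internal_contour(p, n=9):
    """Sample a toy internal lip contour in the frontal plane.

    Assumed shape (not the fitted functions of [6]):
        y = +/- (height / 2) * cos(pi * x / width)
    which is zero at the lip corners and height / 2 at the midline.
    The protrusion parameter is not used by this frontal projection.
    """
    if p.width <= 0 or n < 2:
        raise ValueError("width must be positive and n at least 2")
    xs = [-p.width / 2 + p.width * i / (n - 1) for i in range(n)]
    upper = [(x, (p.height / 2) * math.cos(math.pi * x / p.width)) for x in xs]
    lower = [(x, -y) for x, y in upper]
    return upper, lower
```

Calling `internal_contour(LipParams(width=4.0, height=1.0, protrusion=0.5))` returns two lists of `(x, y)` points, one for the upper and one for the lower half of the contour. The point of the sketch is only that a closed contour can be regenerated from a few measured values.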
The process is as follows:
- The images of the lips of a real human face uttering coarticulated
phonemes are first recorded and then geometrically analyzed.
- From this analysis, a set of lip-jaw shapes, representing the "labial
space" of a speaker, as well as relevant control parameters, are
extracted [8].
- The three control parameters mentioned above predict a set of continuous
functions (polynomial and sinusoid) that best fit the frontal projection of
the contours of the "viseme" set.
- The analysis is extended to 3D [58], where the
equations of the lip contours in the coronal plane can be derived.
The lip volume is created by linearly interpolating three
intermediate contours between the frontal, internal, and external
contours of the vermilion zone. For each of the five contours, a
function approximates each horizontal projection. Two extra
parameters, the distances between the vertical profile reference
and (4) the lower and (5) the upper lip protrusions, are necessary to
predict all the equations of the 3D model.
- The model is animated and synchronized with the natural voice of
the speaker, whose lip gestures control the model through real-time video
analysis [54].
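
The linear interpolation used to build the lip volume can be sketched as follows. This is a minimal illustration, assuming two boundary contours sampled at corresponding 3D points; the function name `interpolate_contours` and the choice of which contours serve as boundaries are assumptions made here, not details taken from [58].

```python
def interpolate_contours(inner, outer, n_intermediate=3):
    """Return [inner, intermediates..., outer] by pointwise linear blending.

    inner and outer are lists of (x, y, z) tuples with matching lengths,
    where point i on one contour corresponds to point i on the other.
    With n_intermediate=3 the result holds five contours in total.
    """
    if len(inner) != len(outer):
        raise ValueError("contours must have the same number of points")
    steps = n_intermediate + 1
    contours = []
    for k in range(steps + 1):
        t = k / steps  # 0.0 at the inner contour, 1.0 at the outer contour
        contours.append([
            tuple((1 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(inner, outer)
        ])
    return contours
```

With three intermediate contours, the blend weights are 0.25, 0.5, and 0.75, so each intermediate contour is evenly spaced between the two boundaries.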