Even with a powerful set of motion generators, a challenge remains: providing effective and easily learned user interfaces to control, manipulate, and animate virtual humans. Interactive point-and-click systems such as Jack work now, but at a cost in user learning and menu traversal. Such interfaces decouple the human participant's instructions and actions from the avatar through a narrow and ad hoc communication channel of hand and finger motions. A direct programming interface, while powerful, must be rejected as an off-line method that moreover requires specialized computer programming understanding and expertise. The option that remains is a language-based interface.
Perhaps not surprisingly, instructions for people are given in natural language augmented with graphical diagrams and, occasionally, animations. Recipes, instruction manuals, and interpersonal conversations use language as the medium for conveying process and action. While our historical interest in instructions has been in creating animations from them [5,3,45], we have recently begun to examine the inverse process: generating text from the PaT-Net representations of animations. The purpose is primarily to help automate the production of aircraft maintenance instruction orders (manuals) in conjunction with the animation of the tasks themselves. The expectation is that the synthesized text ought to reflect the proper execution of the tasks (which can be visually verified through the animation) and will be consistent across the entire document. By the same principles, being able to process the textual instructions will aid in discovering ambiguities, omitted steps, or inappropriate terminology.
The key to linking language and animation lies in constructing a semantic representation of actions, objects, and agents that is simultaneously suitable for execution (animation) and for natural language expression. We have called this implementable semantics: the representation must have the power of a (parallel) programming language which drives a simulation (in the context of a given set of objects and agents), and yet support the enormous range of expression, nuance, and manner offered by language. The details of this Parameterized Action Representation (PAR) -- which involves PaT-Nets as an implementation language -- are being developed in a companion document.
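To make the dual role of such a representation concrete, the following is a minimal, hypothetical sketch of a PAR-like structure: a single record that can both drive a simulation step and be rendered back into text. All field and function names here are illustrative assumptions, not the actual PAR schema of the companion document.

```python
# Hypothetical sketch of a Parameterized Action Representation (PAR):
# one structure serving both animation dispatch and text generation.
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class PAR:
    verb: str                    # core action, e.g. "walk"
    agent: str                   # performing agent/avatar
    objects: list = field(default_factory=list)  # participating objects
    manner: Optional[str] = None # adverbial modifier, e.g. "slowly"
    # simulation hook: a callable that would drive the animation
    execute: Optional[Callable[["PAR"], str]] = None

    def describe(self) -> str:
        """Render the action back into a simple textual instruction."""
        parts = [self.agent, self.verb + "s"]
        parts += self.objects
        if self.manner:
            parts.append(self.manner)
        return " ".join(parts)

# One action record usable for both purposes.
walk = PAR(verb="walk", agent="avatar",
           objects=["to the door"], manner="slowly",
           execute=lambda p: f"animating {p.verb} ({p.manner})")

print(walk.describe())     # "avatar walks to the door slowly"
print(walk.execute(walk))  # "animating walk (slowly)"
```

The point of the sketch is only that a single parameterized record can be consumed by two very different back ends: an animation engine (via the `execute` hook) and a text generator (via `describe`).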
As a prototype implementation of this language--animation connection, we are constructing JackMOO: a multi-user environment mediated by an existing system called lambdaMOO. The lambdaMOO system is a text-based multi-user world. By adding a Jack system and an additional dialog box to the user interface, the user can instruct his or her avatar to take steps, go into a neighboring room, turn on a TV set, and so on. While the text input is quite constrained by the limited ``verb-object-modifier'' syntax of lambdaMOO, it is nonetheless very efficient for specifying the avatar's actions. JackMOO updates a local client's 3D animated view of the instruction executed on the pilot avatar. Simultaneously, lambdaMOO updates the persistent world view (on the lambdaMOO server) and textually informs other users occupying the same virtual room of the avatar's actions. Should one of those users have a JackMOO interface, the avatar's actions will be mirrored by the user's drone on that client's display. In effect, actions are specified in either a textual or graphical fashion and executed both as a textual and database update and (if possible) as an animated display. It is thus essential that discrete (text-based) instructions have continuous (animation) consequences: the conversion is based on action (``verb'') semantics embedded into object-specific methods stored in the lambdaMOO objects. We are extending this object-oriented system to store the richer semantic information necessitated by the scope and range of human actions an avatar must portray.
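The dispatch path described above -- a constrained ``verb-object-modifier'' command parsed and routed to a verb method stored on the target object, producing both a textual world update and an animation update -- can be sketched as follows. This is an illustrative assumption of the mechanism, not the actual lambdaMOO/JackMOO code; all class, function, and verb names are hypothetical.

```python
# Illustrative sketch: parse a "verb object [modifier]" command and
# dispatch it to an object-specific verb method, yielding both a
# textual update and an animation-side update.

class WorldObject:
    """A MOO-style object that stores its own verb methods."""
    def __init__(self, name):
        self.name = name
        self.verbs = {}  # verb name -> method implementing its semantics

    def define_verb(self, verb, method):
        self.verbs[verb] = method

def dispatch(command, objects):
    """Parse 'verb object [modifier]' and run the object's verb method."""
    parts = command.split()
    verb, obj_name = parts[0], parts[1]
    modifier = parts[2] if len(parts) > 2 else None
    obj = objects[obj_name]
    return obj.verbs[verb](obj, modifier)

# A TV whose "turn" verb produces both a text message (for the
# lambdaMOO world) and a stub animation directive (for the Jack client).
def turn_tv(obj, modifier):
    state = "on" if modifier == "on" else "off"
    return (f"The {obj.name} is now {state}.",      # textual world update
            f"animate: switch {obj.name} {state}")  # animation update

tv = WorldObject("TV")
tv.define_verb("turn", turn_tv)

text_msg, anim_msg = dispatch("turn TV on", {"TV": tv})
print(text_msg)   # "The TV is now on."
print(anim_msg)   # "animate: switch TV on"
```

The design point mirrored here is that action semantics live with the object rather than with the parser, so richer, object-specific behavior can be added without changing the command syntax.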