Classifier Predicate Representations for an English to American Sign Language Machine Translation System Matt Huenerfauth First Annual Symposium of the Penn Working Group in Language April 17, 2004 Computer and Information Science University of Pennsylvania Research Advisors: Mitch Marcus & Martha Palmer

Motivations and Applications Only one half of Deaf high school graduates can read English at a fourth-grade level (Holt, 1991). Most are fluent in ASL; English & ASL are very different languages. Many efforts to help the Deaf access the hearing world forget that English is often their second language. Applications for a Machine Translation System: –TV captioning, teletype telephones. –Computer user-interfaces in ASL. –Educational tools, access to information/media.

American Sign Language What is ASL? –Real language? Who uses it? –Different than Signed English. How is it different than English? –Simultaneous channels of output… –Non-categorical modulation of output… –Objects floating in space – point to them, show them moving around.

Input / Output What’s our input? English Text. What’s our output? Less clear… Imagine a 3D virtual reality human being… One that can perform sign language… But this character needs a set of instructions telling it how to move! Our job: English → These Instructions. (Image: Vcom3D)

Off-the-Shelf Virtual Humans Photos: Seamless Solutions, Inc.; Simon the Signer (Bangham et al. 2000); Vcom3D Corporation.

Two Kinds of Sentences Traditional Sentences: (No classifier predicates.) Which university does Billy attend? wh #BILLY IX x GO-TO UNIVERSITY WHICH Spatially Complex Sentences: (Use classifier predicates.) The car drove down the bumpy road past a cat. CAT ClassPred-bentV-{location of cat} CAR ClassPred-3-{drive on bumpy road}

Classifier Predicates The car drove down the bumpy road past a cat. CAT ClassPred-bentV-{location of cat} CAR ClassPred-3-{drive on bumpy road} Where are the cat, the road, and the car? How close? Where does the path start/stop? How do we show that the path is bumpy, winding, or hilly? Pushing the boundaries of ‘language.’ –Hard to handle with traditional computational linguistic representations (lexicons, grammars).

Previous ASL MT Systems Word-for-Sign direct transliteration. –Produces Signed English, not ASL. Syntactic analysis, transfer, generation. –Handles much of the non-spatial phenomena. All ignore classifier predicates. –Need ASL classifiers to fluently translate many English input texts. –Signers use classifier predicates once per minute in most genres (17x/minute in some) (Morford and MacFarlane, 2003).

Focus and Assumptions Other Systems –Non-Spatial ASL sentences only. This Project –ASL with Classifier Predicates!

Motivating a Design for a Classifier Predicate Generator Four progressively better designs…

Four Designs: Keep Improving
Design 1: Fully Lexicalized. Associate a movement path with English multi-word phrases. Problem: Can’t list all of them.
Design 2: Compositional Rules. Combine a set of morphemes using compositional rules that we must write. Problem: Too many morphemes. Linguistic analogue: Supalla’s Polymorphemic model.
Design 3: Directly Pictorial (Virtual Reality Spatial Model). Invisible world; place hand on top of the moving object. Problem: Overgenerates. Linguistic analogue: DeMatteo’s Visual/Gestural model.
Design 4: Template Lexicon (Virtual Reality Spatial Model). Invisible world to calculate 3D points, then fill a template for some classifier predicate. Linguistic analogue: Liddell’s Templates.

Design 1: List them all… Multi-word English lexical entries. Associate a classifier predicate with each. Exhaustively list them all… Problem? –Anticipate all of them? –ClassPreds are very productive. –Many ways to modulate performance. “…drive up the hill…” –This approach is impractical.
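To make the problem with Design 1 concrete, here is a minimal sketch of a fully lexicalized lookup. The dictionary contents and the `lookup` helper are hypothetical illustrations, not part of any described system; the stored strings simply reuse the gloss notation from the earlier slides.

```python
from typing import Optional

# Hypothetical fully lexicalized table: one stored classifier predicate
# per multi-word English phrase (illustrative entries only).
LEXICON = {
    "drive down the bumpy road": "ClassPred-3-{drive on bumpy road}",
    "drive up the hill": "ClassPred-3-{drive up hill}",
}


def lookup(phrase: str) -> Optional[str]:
    """Return the stored classifier predicate, or None if the phrase was never listed."""
    return LEXICON.get(phrase.lower().strip())


print(lookup("drive up the hill"))        # a listed phrase is found
print(lookup("drive up the steep hill"))  # a small variation is not covered
```

Every small change to the phrase needs its own entry, which is why exhaustively listing a productive construction does not scale.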

Design 2: Composition Rules Identify minimal components of meaning. Corresponding element of movement/shape: –path contour, hand elevation, palm orientation… –e.g. “…which way is the person facing…” Compositional rules to combine these ‘morphemes’ into full classifier predicate.

Design 2: Linguistic Analogy Analogous to Supalla’s polymorphemic model of classifier predicate generation (1978, 1982, 1986). –Every piece of information = morpheme. –Build the predicate = combining lots of them. –E.g. “…two people meet…” (Liddell, 2003) Morpheme count explosion! Not practical.

So, what’s the problem? Every 3D location/path = a new morpheme. –No model of how objects are arranged in space… 3D model = more intuitive. –Easier to select the motion path of our hand. –Need many fewer morphemes. Analyze English text → make a 3D model. 3D coordinates → How to move our hand.
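A rough back-of-the-envelope sketch of the morpheme explosion, under the assumption (made here only for illustration) that the signing space is discretized into a cubic grid and that each location and each straight start-to-end path needs its own morpheme. The grid sizes are arbitrary.

```python
def location_morphemes(points_per_axis: int) -> int:
    """One morpheme per distinguishable 3D location on a cubic grid."""
    return points_per_axis ** 3


def straight_path_morphemes(points_per_axis: int) -> int:
    """One morpheme per ordered (start, end) pair of distinct grid locations."""
    n = location_morphemes(points_per_axis)
    return n * (n - 1)


for k in (3, 5, 10):
    print(k, location_morphemes(k), straight_path_morphemes(k))
```

Even a coarse 10-point-per-axis grid would need 1,000 location morphemes and 999,000 straight-path morphemes, before path contour or orientation is considered. A 3D model sidesteps this by passing coordinates as parameters instead of enumerating them.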

Four Designs: Keep Improving
Design 1: Fully Lexicalized. Associate a movement path with English multi-word phrases. Problem: Can’t list all of them.
Design 2: Compositional Rules. Combine a set of morphemes using compositional rules that we must write. Problem: Too many morphemes. Linguistic analogue: Supalla’s Polymorphemic model.
Design 3: Directly Pictorial (Virtual Reality Spatial Model). Invisible world; place hand on top of the moving object. Problem: Overgenerates. Linguistic analogue: DeMatteo’s Visual/Gestural model.
Design 4: Template Lexicon (Virtual Reality Spatial Model). Invisible world to calculate 3D points, then fill a template for some classifier predicate. Linguistic analogue: Liddell’s Templates.

A Useful Technology… Controlling a virtual reality with English input commands…

English-Controlled 3D Scene AnimNL System –Virtual reality model of characters/objects in 3D. –Input: English sentences. Directions for characters/objects to follow. –Produces an animation: Characters/objects obey the English commands. –Updates the 3D scene to show changes. Badler, Bindiganavale, Allbeck, Schuler, Zhao, Lee, Shin, and Palmer. 2000. Schuler. 2003.

English-Controlled 3D Scene http://hms.upenn.edu/software/PAR/images.html

How it Works English Text → Syntactic Analysis → Select a PAR Template → Fill the PAR Template → “Planning Process” → Animation Output PAR = “Parameterized Action Representation” (on next slide)

Parameterized Action Representation (This is a subset of PAR info. http://hms.upenn.edu/software/PAR)
participants: [ agent: AGENT; objects: OBJECT list ]
semantics: [ motion: {Object, Translate?, Rotate?}; path: {Direction, Start, End, Distance}; termination: CONDITION; duration: TIME-LENGTH; manner: MANNER ]
start: TIME
prep conditions: CONDITION boolean-exp
sub-actions: sub-PARs
parent action: PAR24
previous action: PAR35
next action: PAR64
Example: “Bob tripped on the ball.” (…tripped…) agent: Bob. objects: { ball_1 }. motion: {Bob, translate…, rotate…}. path: specifics of the path taken… manner: Accidentally.
Added modifiers fill further slots: “…until dawn.” gives termination: End at 6am. “…for 3 hours.” gives duration: 3 Hours. “…rapidly.” gives manner: Accidentally, Rapidly.
Planning Operator: an Artificial Intelligence formalism for deciding how to act in a complex situation. What is a “planning” algorithm good for?
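As a rough illustration of the slot structure on this slide, the PAR fields can be sketched as a small record type. This is a simplified, hypothetical rendering that covers only some of the slots named above; the class names, field types, and string values are assumptions, and this is not the actual PAR software interface.

```python
from dataclasses import dataclass, field, replace
from typing import List, Optional


@dataclass(frozen=True)
class Motion:
    obj: str          # the object that moves
    translate: bool   # does it change position?
    rotate: bool      # does it change orientation?


@dataclass(frozen=True)
class PAR:
    # participants
    agent: str
    objects: List[str]
    # semantics (subset of the slots on the slide)
    motion: Motion
    termination: Optional[str] = None
    duration: Optional[str] = None
    manner: List[str] = field(default_factory=list)


# "Bob tripped on the ball."
tripped = PAR(
    agent="Bob",
    objects=["ball_1"],
    motion=Motion(obj="Bob", translate=True, rotate=True),
    manner=["accidentally"],
)

# Modifiers such as "...until dawn", "...for 3 hours", "...rapidly"
# fill further slots of the same template.
modified = replace(
    tripped,
    termination="end at 6am",
    duration="3 hours",
    manner=tripped.manner + ["rapidly"],
)
print(modified)
```

The point of the sketch is that one template per action is enough: modifiers in the English sentence change slot values rather than requiring new templates.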

Adding Detail, Making Animation PAR is missing details needed to create animation. –“…turn the handle…” Use an artificial intelligence “planning” algorithm –Calculate preconditions, physical constraints, sub-actions, effects, etc. of each animation movement. Works out the details needed to build animation.

Diagram of AnimNL

A 3D Spatial Model for American Sign Language Using the virtual reality English-command technology

English-Controlled 3D Scene http://hms.upenn.edu/software/PAR/images.html

Using this technology… An NL-Controlled 3D Scene http://hms.upenn.edu/software/PAR/images.html

Using this technology… An NL-Controlled 3D Scene Original image from: Simon the Signer (Bangham et al. 2000.) Signing Character

“Invisible World” Approach Invisible objects floating in front of the signer. English sentences → commands for virtual reality. Positions, moves, and orients objects in this world. So, we’ve got all these floating invisible objects… What do we do with them?

Using the 3D Virtual Reality Design 3 and Design 4

Design 3: Directly Pictorial Invisible 3D Objects → Classifier Predicate –Put hand in the proper handshape. –Place hand directly on top of (inside of) object in the 3D scene. –Follow the paths objects trace through space. We go along for the ride!
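The “go along for the ride” idea can be sketched in a few lines, assuming the invisible world reports an object’s path as a list of 3D points. The function name, the keyframe format, and the coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def directly_pictorial(handshape: str, object_path: List[Point]) -> List[Dict[str, object]]:
    """Design 3 sketch: the hand copies the object's path point for point."""
    return [{"handshape": handshape, "hand_position": p} for p in object_path]


# Illustrative path of the invisible car through the signing space.
car_path = [(0.0, 0.0, 0.3), (0.1, 0.02, 0.3), (0.2, 0.0, 0.3)]
keyframes = directly_pictorial("3", car_path)
print(keyframes)
```

Note that nothing in this function rejects an illegal handshape/path combination, and the hand can never move differently from the object.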

Diagram of Design 3

The AnimNL Technology

Diagram of Design 3 (The AnimNL Technology): English Text → English Syntax → Filled-In PAR → Animation Script → Animated 3D Scene. Template library: PAR Templates for 3D Object Motions.

Linguistic Analogy / Problems DeMatteo’s gestural model of classifier predicates (1977) –Mental model of scene. –Move hands in topologically analogous manner. –Merely iconic gestural movements. Problem? Overgenerative. –Doesn’t explain conventions/restrictions: legal combinations of handshape/movement; some movements not visually representative; discourse factors / multi-predicate concerns. Design 3 has same problem! (Liddell, 2003)

Diagram of Design 3 This process is harder than it seems.

The Solution? More Templates! Can’t just ‘go along for the ride.’ –Making a ClassPred is more complicated. Our last complicated animation task? –Move 3D objects based on English text. –We used templates and ‘planning’. Can we do something like this again? –This time: how to move the arm to do a ClassPred.

Diagram of Design 3 Insert a template library here… Insert a planning process here…

Diagram of Design 4

A Second PAR Template Library First library of templates: Possible movements of invisible objects in virtual reality. Second library: Possible movements of the signer’s hands while performing classifier predicates to describe these objects. Original image from: Simon the Signer (Bangham et al. 2000.) Library 1 Library 2

Selecting/Filling a Template Big list of prototypical classifier predicates stored as templates. Select a template based upon: –English lexical items –Linguistic features in English sentence –3D coordinates and motion paths of objects Let planning process build animated output. How is this better than design 3?

…leisurely walking along… AnimNL: English Text → Virtual Reality. Parse of Sentence → Select a Template: Leisurely-Walking-Upright-Figure –Specifies handshape, palm orientation, “bouncing” path contour, and speed/timing. –Still needs 3D starting/stopping coordinates. Get coordinates from “invisible world,” fill template, let animation software make output. How’s it better? Invisible world motion path ≠ hand motion path.
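A minimal sketch of filling such a template, assuming the template stores the fixed properties (handshape, palm orientation, bounce contour, speed) and takes only start/stop coordinates from the invisible world. All names and numeric values are hypothetical; the “1” handshape for an upright figure and the sine-based bounce are illustrative assumptions, not the system’s actual representation.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class ClassPredTemplate:
    name: str
    handshape: str
    palm_orientation: str
    bounce_height: float  # amplitude of the "bouncing" path contour
    bounces: int          # number of bounces between start and end
    speed: str


LEISURELY_WALK = ClassPredTemplate(
    name="Leisurely-Walking-Upright-Figure",
    handshape="1",
    palm_orientation="forward",
    bounce_height=0.02,
    bounces=2,
    speed="slow",
)


def fill_template(t: ClassPredTemplate, start: Point, end: Point,
                  steps_per_bounce: int = 4) -> List[Point]:
    """Interpolate from start to end and add the template's bounce to the height (y)."""
    steps = t.bounces * steps_per_bounce
    path = []
    for i in range(steps + 1):
        f = i / steps
        x = start[0] + f * (end[0] - start[0])
        y = start[1] + f * (end[1] - start[1])
        z = start[2] + f * (end[2] - start[2])
        y += t.bounce_height * abs(math.sin(math.pi * t.bounces * f))
        path.append((x, y, z))
    return path


# Start/stop coordinates would come from the invisible world.
hand_path = fill_template(LEISURELY_WALK, (0.0, 0.0, 0.0), (0.4, 0.0, 0.0))
print(hand_path)
```

The object in the invisible world moves in a straight line here, but the hand path bounces: the template, not the scene, decides how the hand moves.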

Linguistic Motivations “Blended Spaces” Lexicalized Classifier Predicate Model of Scott Liddell (2003). –Double-Library design analogous to model of how humans generate classifier predicates. –This model assumes signers imagine entities under discussion occupying space before them. –It argues that classifier predicates are stored as a lexicon of templates that are parameterized on locations/orientations of these spatial entities.

Linguistic Motivations “Blended Spaces” Lexicalized Classifier Predicate Model of Scott Liddell (2003). –Signers imagine objects occupying space. –Classifier predicates stored as: lexicon of templates that are parameterized on locations/orientations of these spatial entities. Both engineering & linguistic motivations.

Why is this a good model? Better than polymorphemic model. –Use of 3D coordinates allows it to generate many motion paths without needing huge numbers of morphemes. Better than visual/gestural model. –Can explain handshape/path restrictions. Don’t add these combinations to the list of templates. –Can explain non-representative movements. Template specifies a movement path for your hand that is different than the movement path of the object.

The Four Designs: Wrap Up
Design 1: Fully Lexicalized. Associate a movement path with English multi-word phrases. Problem: Can’t list all of them.
Design 2: Compositional Rules. Combine a set of morphemes using compositional rules that we must write. Problem: Too many morphemes. Linguistic analogue: Supalla’s Polymorphemic model.
Design 3: Directly Pictorial (Virtual Reality Spatial Model). Invisible world; place hand on top of the moving object. Problem: Overgenerates. Linguistic analogue: DeMatteo’s Visual/Gestural model.
Design 4: Template Lexicon (Virtual Reality Spatial Model). Invisible world to calculate 3D points, then fill a template for some classifier predicate. Exciting possibilities. Linguistic analogue: Liddell’s Templates.

Wrap Up Applications/motivations for ASL MT Classifier predicates are hard to generate. Need a 3D spatial model and generation process: Virtual reality “invisible worlds.” Engineering and Linguistic Motivations.

References
Badler, Bindiganavale, Allbeck, Schuler, Zhao, Lee, Shin, and Palmer. 2000. Parameterized action representations and natural language instructions for dynamic behavior modification of embodied agents. AAAI Spring Symposium.
Bangham, Cox, Lincoln, Marshall. 2000. Signing for the deaf using virtual humans. IEE2000.
DeMatteo, A. 1977. Visual Analogy and the Visual Analogues in American Sign Language. In Lynn Friedman (ed.). On the Other Hand: New Perspectives on American Sign Language. (pp. 109-136). New York: Academic Press.
Holt, J. 1991. Demographic, Stanford Achievement Test - 8th Edition for Deaf and Hard of Hearing Students: Reading Comprehension Subgroup Results.
Liddell. 2003. Sources of Meaning in ASL Classifier Predicates. In Karen Emmorey (ed.). Perspectives on Classifier Constructions in Sign Languages. Workshop on Classifier Constructions, La Jolla, San Diego, California.
Liddell. 2003. Grammar, Gesture, and Meaning in American Sign Language. UK: Cambridge U. Press.
Morford and MacFarlane. 2003. “Frequency Characteristics of ASL.” Sign Language Studies, 3:2.
Schuler. 2003. Using model-theoretic semantic interpretation to guide statistical parsing and word recognition in a spoken language interface. Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL’03), Sapporo, Japan.
Supalla, T. 1978. Morphology of Verbs of Motion and Location. In F. Caccamise and D. Hicks (eds). Proceedings of the Second National Symposium on Sign Language Research and Teaching. (pp. 27-45). Silver Spring, MD: National Association for the Deaf.
Supalla, T. 1982. Structure and Acquisition of Verbs of Motion and Location in American Sign Language. Ph.D. Dissertation, University of California, San Diego.
Supalla, T. 1986. The Classifier System in American Sign Language. In C. Craig (ed.) Noun Phrases and Categorization, Typological Studies in Language, 7. (pp. 181-214). Philadelphia: John Benjamins.

Photo Credits Some images taken from: Seamless Solutions, Inc. Website Vcom3d Company Website J.A. Bangham, S J Cox, M Lincoln, I Marshall. 2000. Signing for the deaf using virtual humans. IEE2000
