
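A Multi-Path Architecture for Machine Translation of English Text into American Sign Language Animation. Matt Huenerfauth, Student Workshop of the Human Language Technologies conference / North American chapter of the Association for Computational Linguistics annual meeting.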


1 A Multi-Path Architecture for Machine Translation of English Text into American Sign Language Animation
Matt Huenerfauth
Computer and Information Science, University of Pennsylvania
Research Advisors: Mitch Marcus & Martha Palmer
Student Workshop of the Human Language Technologies conference / North American chapter of the Association for Computational Linguistics annual meeting. Boston, MA, USA. May 2, 2004.

2 Motivations and Applications
One half of Deaf high school graduates (age 18) read English at or below a fourth-grade level (age 10).
– But most are fluent in ASL. (ASL ≠ English.)
– Many accessibility technologies assume fluency in English.
– ASL is used by 500,000 Deaf people in North America.
Applications for a machine translation system:
– TV captioning, teletype (TTY) telephones.
– Computer user interfaces in ASL.
– Educational tools, access to information and media.

3 MT: Input / Output
What’s the input? English text.
What’s the output? Less clear… Imagine a 3D virtual reality human being, one that can perform sign language. But this character needs a set of instructions telling it how to move!
The task: English → these instructions. (Image: Vcom3D)

4 Off-the-Shelf Virtual Humans
Photos: Seamless Solutions, Inc.; Simon the Signer (Bangham et al., 2000); Vcom3D Corporation.

5 American Sign Language
Sentence without a classifier predicate: Which university does Billy attend?
    wh
    #BILLY IX x GO-TO UNIVERSITY WHICH
Sentence with classifier predicates: The car drove down the bumpy road past a cat.
    CAT ClassPred-bentV-{location of cat} CAR ClassPred-3-{drive on bumpy road}
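The classifier-predicate glosses above pack a handshape and a description of what the hands depict into a single token. As a minimal sketch, assuming tokens follow exactly the `ClassPred-<handshape>-{<description>}` pattern written on this slide, a gloss line could be split into ordinary signs and classifier predicates like this (the regular expression, `Token`, and `tokenize_gloss` are hypothetical, not part of the described system):

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern for glosses such as "ClassPred-3-{drive on bumpy road}":
# a handshape label, then a description of what the hands depict, in braces.
CLASS_PRED = re.compile(r"ClassPred-(?P<handshape>[^-{\s]+)-\{(?P<desc>[^}]*)\}")


@dataclass
class Token:
    gloss: str                         # raw token text
    handshape: Optional[str] = None    # set only for classifier predicates
    description: Optional[str] = None  # set only for classifier predicates

    @property
    def is_classifier_predicate(self) -> bool:
        return self.handshape is not None


def tokenize_gloss(line: str) -> list[Token]:
    """Split a gloss line into ordinary signs and classifier predicates."""
    tokens: list[Token] = []
    pos = 0
    for match in CLASS_PRED.finditer(line):
        # Ordinary signs between classifier predicates are space-separated.
        tokens += [Token(w) for w in line[pos:match.start()].split()]
        tokens.append(Token(match.group(0), match.group("handshape"), match.group("desc")))
        pos = match.end()
    tokens += [Token(w) for w in line[pos:].split()]
    return tokens
```

On the second example sentence, this yields four tokens, two of which (handshapes `bentV` and `3`) are classifier predicates; the first sentence yields only ordinary signs.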

6 Difficult to Generate but Important
The car drove down the bumpy road past the cat.
Where’s the cat? The road? The car? How close are they?
Where does the path start and stop?
How do we show that the path is bumpy vs. windy vs. hilly?
Some English sentences require a classifier predicate to be translated fluently: spatial prepositions, adverbs, other phrases…
Signers use classifier predicates frequently: depending on genre, from one to 17 times per minute.

7 Initial Approaches to ASL MT
Non-statistical direct and transfer MT architectures

8 Why not Statistical MT?
ASL has no written form. Corpora are hard to collect and transcribe.
– Annotating video means tracking multiple simultaneous channels of face, body, hand, and arm movements.
There’s no training data.

9 Machine Translation Pyramid
Options in MT design. Trade-offs: more work; domain size; subtler divergences handled.
[Figure: MT Pyramid (Dorr, 1998). Levels from base to apex: word structure (direct), syntactic structure (syntactic transfer), semantic structure (semantic transfer), interlingua. Source-text side: morphological analysis, syntactic analysis, semantic analysis, semantic composition. Target-text side: semantic decomposition, semantic generation, syntactic generation, morphological generation.]

10 Direct ‘ASL’ MT Systems
Word-for-sign dictionary look-up system.
– Produces Signed English, not ASL.
– Definitely can’t generate classifier predicates.
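A direct system amounts to a dictionary lookup over the English word string. The sketch below uses a tiny hypothetical lexicon and, as an assumption for illustration, falls back to a made-up fingerspelling notation (`fs-...`) for unknown words; it shows why the output keeps English word order:

```python
from typing import Optional

# Hypothetical toy lexicon: English word -> sign gloss (None = no sign produced).
LEXICON: dict[str, Optional[str]] = {
    "which": "WHICH",
    "university": "UNIVERSITY",
    "does": None,
    "billy": "#BILLY",
    "attend": "GO-TO",
}


def fingerspell(word: str) -> str:
    """Fallback for words with no dictionary entry (notation is illustrative)."""
    return "fs-" + "-".join(word.upper())


def direct_translate(sentence: str) -> list[str]:
    """Word-for-sign lookup that keeps English word order."""
    signs = []
    for word in sentence.lower().strip("?.!").split():
        if word not in LEXICON:
            signs.append(fingerspell(word))
        elif LEXICON[word] is not None:
            signs.append(LEXICON[word])
    return signs
```

For "Which university does Billy attend?" this produces WHICH UNIVERSITY #BILLY GO-TO. Compared with the ASL gloss on slide 5 (#BILLY IX x GO-TO UNIVERSITY WHICH), the individual signs match but the order, the pointing sign, and the wh marking do not, which is why the result is Signed English rather than ASL.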

11 Transfer ASL MT Systems
Syntactically analyze English text before crossing over to ASL.
– Capture more divergences and more phenomena.
– Still can’t handle the complex use of space.
– Still can’t generate classifier predicates.
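A transfer system rewrites a syntactic analysis rather than a word string. As a deliberately small illustration, assuming a parser has already produced a structure for questions of the form "Which N does S V?", one hypothetical transfer rule could reproduce the word order of the slide 5 gloss. This is an invented rule for illustration, not the system’s rule set:

```python
from dataclasses import dataclass

# Hypothetical English-word -> sign-gloss table for this one example.
GLOSS = {"which": "WHICH", "university": "UNIVERSITY", "billy": "#BILLY", "attend": "GO-TO"}


@dataclass
class WhQuestion:
    """Assumed parser output for questions of the form 'Which N does S V?'."""
    wh_word: str
    wh_noun: str
    subject: str
    verb: str


def gloss(word: str) -> str:
    return GLOSS.get(word.lower(), word.upper())


def transfer_wh_question(q: WhQuestion) -> tuple[list[str], str]:
    """One illustrative transfer rule: drop do-support, keep subject-verb order,
    and move the wh-phrase (noun, then wh-sign) to the end of the sentence.
    Returns the signs plus the non-manual marker spanning them."""
    signs = [gloss(q.subject), gloss(q.verb), gloss(q.wh_noun), gloss(q.wh_word)]
    return signs, "wh"
```

The rule gets the order right (#BILLY GO-TO UNIVERSITY WHICH), but the pointing sign IX x is still missing, and nothing in a rule like this can place objects in space, which is the limitation the slide names.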

12 When the going gets tough…
…the tough try an interlingua.
– Direct or transfer architectures are insufficient.
– If not an interlingua, then at least an approach with more spatial knowledge and representation.
Of course, there’s a problem.
– It’s hard, perhaps impossible, to build an interlingua system for an open-ended domain.

13 Getting by with a limited domain?
We can identify the sentences that need complex translation (those that need classifier predicates).
When do we use classifier predicates?
– Locations, orientations, or movements
– Spatial verbs, prepositions, adverbs
– Concrete or animate entities
– Don’t worry about abstractions, beliefs, intentions
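The criteria listed above suggest a screening step that decides whether a sentence is a candidate for classifier predicates. A minimal keyword-based sketch, assuming small hand-made word lists (the lists and `needs_classifier_predicate` are hypothetical; a real filter would work from the parse and a lexicon of motion and location verbs):

```python
# Hypothetical word lists standing in for the slide's criteria.
SPATIAL_VERBS = {"drove", "drive", "walked", "rolled", "fell", "tripped", "moved", "sat"}
SPATIAL_PREPS = {"down", "past", "up", "across", "into", "onto", "through", "under", "over", "on"}
CONCRETE_ENTITIES = {"car", "cat", "road", "ball", "bob", "table"}


def needs_classifier_predicate(sentence: str) -> bool:
    """Flag sentences that mention a concrete entity together with a spatial
    verb or preposition (a location, orientation, or movement)."""
    words = set(sentence.lower().strip(".!?").split())
    has_spatial = bool(words & (SPATIAL_VERBS | SPATIAL_PREPS))
    has_concrete = bool(words & CONCRETE_ENTITIES)
    return has_spatial and has_concrete
```

Bare keywords overgenerate, since words such as "on" and "over" also have abstract uses; the sketch only shows the shape of the decision, not its accuracy.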

14 “Multi-Path” MT
Only when needed, use complex, sophisticated MT (interlingua?).
Otherwise, use simpler, easier-to-build MT (transfer).
Use the linguistic ‘breadth’ of one approach and the knowledge/spatial ‘depth’ of the other.

15 “Multi-Path” MT
Only when needed, use complex, sophisticated MT (interlingua?).
Otherwise, use simpler, easier-to-build MT (transfer).
If all else fails, use word-for-sign direct transliteration.
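This routing policy can be sketched as a chain of pathways tried from deepest to shallowest. In the sketch below, which is an assumption rather than the system’s code, each pathway returns a sign sequence or `None` when it declines the input; the interlingua and transfer bodies are placeholder stubs with invented acceptance tests:

```python
from typing import Callable, Optional

# A pathway returns a sign sequence, or None if it cannot handle the input.
Pathway = Callable[[str], Optional[list[str]]]

SPATIAL_WORDS = {"drove", "down", "past", "across", "onto"}  # hypothetical


def interlingua_path(sentence: str) -> Optional[list[str]]:
    """Stub for the deep spatial pathway: accepts only spatial sentences."""
    words = set(sentence.lower().strip(".!?").split())
    return ["<classifier predicates>"] if words & SPATIAL_WORDS else None


def transfer_path(sentence: str) -> Optional[list[str]]:
    """Stub for the transfer pathway: pretends very short inputs fail to parse."""
    return ["<transfer output>"] if len(sentence.split()) >= 3 else None


def direct_path(sentence: str) -> Optional[list[str]]:
    """Word-for-sign transliteration never fails."""
    return [w.upper() for w in sentence.strip(".!?").split()]


PATHWAYS: list[Pathway] = [interlingua_path, transfer_path, direct_path]


def translate(sentence: str, pathways: list[Pathway] = PATHWAYS) -> tuple[str, list[str]]:
    """Try pathways from deepest to shallowest; return the first success."""
    for path in pathways:
        result = path(sentence)
        if result is not None:
            return path.__name__, result
    raise ValueError(f"no pathway could translate {sentence!r}")
```

The order of the list encodes the design: the expensive pathway gets the first chance, but only on inputs it claims, and the direct pathway at the end guarantees some output.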

16 “Pyramidal” MT
No longer a set of options: now a design for a new multi-path architecture.
[Figure: MT Pyramid (Dorr, 1998), as on slide 9.]

17 “Pyramidal” MT
No longer a set of options: now a design for a new multi-path architecture.
[Figure: MT Pyramid (Dorr, 1998), with a pathway assigned to each level.]
– Direct: unanalyzable text
– Transfer: most sentences
– Interlingual: spatial text

18 But what’s our interlingua? And is it really an interlingua?

19 What do human interpreters do?
Listen to English about spatial topics → make a 3D mental model of what’s said → produce ASL classifier predicates.
Using a spatial representation of reality…

20 What could a computer do?
Computer analyzes English text → builds a 3D virtual reality of the scene → uses the VR as the basis for generating the spatial classifier predicate movements.
University of Pennsylvania AnimNL system:
– 3D virtual reality model with characters/objects.
– Input: English directions for characters to follow.
– Builds animation: characters obey commands.
(Badler, Bindiganavale, Allbeck, Schuler, Zhao, Lee, Shin, and Palmer; Schuler)

21 English-Controlled 3D Scene

22 How it Works
English text → syntactic analysis → select a PAR template → fill the PAR template → “planning process” → animation output
PAR = “Parameterized Action Representation” (on next slide)

23 Parameterized Action Representation
This is a subset of PAR info:
    participants: [ agent: AGENT
                    objects: OBJECT list ]
    semantics: [ motion: {Object, Translate?, Rotate?}
                 path: {Direction, Start, End, Distance}
                 termination: CONDITION
                 duration: TIME-LENGTH
                 manner: MANNER ]
    start: TIME
    prep conditions: CONDITION boolean-exp
    sub-actions: sub-PARs
    parent action: PAR24
    previous action: PAR35
    next action: PAR64
Example: Bob tripped on the ball.
    …tripped… → planning operator, linked to 3D VR animated movements
    agent: Bob
    objects: { ball_1 }
    motion: {Bob, translate…, rotate…}
    path: specifics of the path taken…
    manner: Accidentally.
Adding modifiers fills further fields:
    …until dawn. → termination: End at 6am.
    …for 3 hours. → duration: 3 Hours.
    …rapidly. → manner: Accidentally, Rapidly.
The planning algorithm works out the movement details.
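The PAR fields above map naturally onto a record type. A minimal sketch, assuming Python dataclasses and invented field names that follow the slide (this is not AnimNL’s actual data structure); the back-links to neighbouring actions are excluded from `repr` and equality so that linked actions do not recurse into each other:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Motion:
    obj: str
    translate: bool = False
    rotate: bool = False


@dataclass
class PathSpec:
    direction: Optional[str] = None
    start: Optional[str] = None
    end: Optional[str] = None
    distance: Optional[str] = None


@dataclass
class PAR:
    """Subset of a Parameterized Action Representation; names are illustrative."""
    action: str
    agent: str
    objects: list[str] = field(default_factory=list)
    motion: Optional[Motion] = None
    path: PathSpec = field(default_factory=PathSpec)
    termination: Optional[str] = None   # condition that ends the action
    duration: Optional[str] = None      # length of time
    manner: list[str] = field(default_factory=list)
    sub_actions: list["PAR"] = field(default_factory=list)
    # Links to neighbouring actions; excluded from repr/eq to avoid cycles.
    parent: Optional["PAR"] = field(default=None, repr=False, compare=False)
    previous: Optional["PAR"] = field(default=None, repr=False, compare=False)
    next_action: Optional["PAR"] = field(default=None, repr=False, compare=False)


def add_sub_action(parent: PAR, child: PAR) -> None:
    """Attach a sub-action and record its parent."""
    parent.sub_actions.append(child)
    child.parent = parent


def link_sequence(actions: list[PAR]) -> None:
    """Fill previous/next links for actions performed in order."""
    for earlier, later in zip(actions, actions[1:]):
        earlier.next_action = later
        later.previous = earlier
```

For the slide’s example, `PAR(action="trip", agent="Bob", objects=["ball_1"], motion=Motion("Bob", translate=True, rotate=True), manner=["accidentally"])` holds the filled template; a modifier such as "…rapidly." would append to `manner`. The planning step that turns this record into movement is not modeled here.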

24 English-Controlled 3D Scene

25 Using this technology… An NL-Controlled 3D Scene


27 Using this technology… An NL-Controlled 3D Scene; Signing Character (original image from Simon the Signer, Bangham et al., 2000)


29 “Invisible World” Approach
A tiny 3D virtual reality in front of the signer’s hands.
AnimNL: English sentences about locomotion → move invisible objects accordingly.
Put a hand on top of an object: go along for the ride!
We just built a CLASSIFIER PREDICATE.
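"Going along for the ride" reduces to a coordinate transform: once the hand is attached to an invisible object, each frame’s hand position is the object’s position mapped into the small space in front of the signer. A minimal sketch, assuming a uniform scale plus an offset (the `Vec3` type, the 0.01 scale, and the origin point are hypothetical values, not taken from the system):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vec3:
    x: float
    y: float
    z: float


def scene_to_signing_space(p: Vec3, scale: float, origin: Vec3) -> Vec3:
    """Shrink a scene position and place it relative to a point in front of the signer."""
    return Vec3(origin.x + scale * p.x, origin.y + scale * p.y, origin.z + scale * p.z)


def hand_path(object_path: list[Vec3], scale: float = 0.01,
              origin: Vec3 = Vec3(0.0, 1.2, 0.4)) -> list[Vec3]:
    """The hand rides on the object, so it follows the mapped object path."""
    return [scene_to_signing_space(p, scale, origin) for p in object_path]
```

A car moving 20 units along the x axis in the invisible world becomes a hand movement of 0.2 units across signing space; choosing the handshape (for example the 3 handshape for a vehicle) is a separate step not shown.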

30 Classifier Predicate Pathway

31
– Direct: unanalyzable text
– Interlingual: spatial text
– Transfer: most sentences

32 Design Issues and Discussion

33 Is the VR really an interlingua?
It depends on your definition and on how it is implemented.
– Semantic representation: yes, a model for 3D spatial domains.
– Useful for translation: we’ve shown how it can be.
– World knowledge beyond input semantics: yes, in that it handles spatial/physical constraints.
– Language neutral: 3D coordinates are not just interlingual, they are non-lingual.
But it might need other semantic/discourse information…

34 Other Languages
A multi-path design alleviates a tradeoff:
– Domain specificity vs. divergence-handling power.
– Use a deeper approach within a broad-coverage system.
Translate a variety of texts, but perform deeper processing on certain inputs:
– Important or well-understood sentences.
– A sublanguage that requires special handling.
– A transfer or deeper interlingual approach for “special” text and a resource-lighter approach for the rest.

35 Mixing Statistical/Symbolic MT
This system has no statistical pathways.
– Nothing prevents their use with this design.
Use a statistical approach for most inputs; manually override the translation of certain texts.
– Statistical approach for direct (and transfer).
– Hand-build the higher pathways.

36 Project Status
Finishing design specification. Beginning implementation.
Other considerations:
– Evaluation?
– Initial applications?
– How to generate multiple classifier predicates?
– Which representations to use in the transfer pathway?

37 Questions?

38 References
Badler, Bindiganavale, Allbeck, Schuler, Zhao, Lee, Shin, and Palmer. Parameterized action representations and natural language instructions for dynamic behavior modification of embodied agents. AAAI Spring Symposium.
Bangham, Cox, Lincoln, Marshall. Signing for the deaf using virtual humans. IEE 2000.
DeMatteo, A. (1977). Visual Analogy and the Visual Analogues in American Sign Language. In Lynn Friedman (ed.), On the Other Hand: New Perspectives on American Sign Language (pp ). New York: Academic Press.
Holt, J. (1991). Demographic, Stanford Achievement Test - 8th Edition for Deaf and Hard of Hearing Students: Reading Comprehension Subgroup Results.
Liddell. Sources of Meaning in ASL Classifier Predicates. In Karen Emmorey (ed.), Perspectives on Classifier Constructions in Sign Languages. Workshop on Classifier Constructions, La Jolla, San Diego, California.
Liddell. Grammar, Gesture, and Meaning in American Sign Language. UK: Cambridge U. Press.
Morford and MacFarlane. “Frequency Characteristics of ASL.” Sign Language Studies, 3:2.
Schuler. Using model-theoretic semantic interpretation to guide statistical parsing and word recognition in a spoken language interface. Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL’03), Sapporo, Japan.
Supalla, T. (1978). Morphology of Verbs of Motion and Location. In F. Caccamise and D. Hicks (eds.), Proceedings of the Second National Symposium on Sign Language Research and Teaching (pp ). Silver Spring, MD: National Association for the Deaf.
Supalla, T. (1982). Structure and Acquisition of Verbs of Motion and Location in American Sign Language. Ph.D. Dissertation, University of California, San Diego.
Supalla, T. (1986). The Classifier System in American Sign Language. In C. Craig (ed.), Noun Phrases and Categorization, Typological Studies in Language, 7 (pp ). Philadelphia: John Benjamins.

39 Advantages of Virtual Reality
ASL signers can arrange objects under discussion in the space around them.
– The virtual reality model in this system enables sophisticated management of these positioned objects.
The AnimNL system can also control the movements of virtual human figures who participate in the 3D scene. These figures have skills useful for ASL signing, so we can use one as our signer.
– The same technology serves both the signer and the 3D spatial model.
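Managing positioned objects means remembering where each entity was placed so later signs can point back to it, as the IX x in the slide 5 gloss does. A simplified sketch of that bookkeeping, assuming a fixed list of named locations rather than the full 3D coordinates the VR model would provide (the `SigningSpace` class, the location names, and the `IX-<location>` notation are hypothetical):

```python
class SigningSpace:
    """Assigns each entity a location on first mention and reuses it later."""

    def __init__(self, locations: tuple[str, ...] = ("right", "left", "center", "far-right", "far-left")):
        self._free = list(locations)         # locations not yet used
        self._placed: dict[str, str] = {}    # entity -> assigned location

    def locate(self, entity: str) -> str:
        if entity not in self._placed:
            if not self._free:
                raise RuntimeError("no free locations left in signing space")
            self._placed[entity] = self._free.pop(0)
        return self._placed[entity]

    def point_to(self, entity: str) -> str:
        """Gloss for an index (pointing) sign toward the entity's location."""
        return f"IX-{self.locate(entity)}"
```

Because the assignment is remembered, a second mention of the same entity points to the same place; in the proposed system the 3D scene would supply these positions directly instead of a fixed list.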

40 System Diagram

