Handling Spatially Complex English-to-ASL MT with a Multi-Path Pyramidal Architecture Matt Huenerfauth CLUNCH Presentation November 3, 2003.


1 Handling Spatially Complex English-to-ASL MT with a Multi-Path Pyramidal Architecture Matt Huenerfauth CLUNCH Presentation November 3, 2003

2 ASL Machine Translation with Pyramids and Invisible Worlds Matt Huenerfauth CLUNCH Presentation November 3, 2003

3 Today's Talk This is work in progress. ASL Linguistics and Machine Translation Initial Approaches to ASL MT Handling Spatially Complex ASL –A Multi-Path MT Architecture. –Adopting some HMS lab technology. –Interesting Linguistic Motivations. Current and Future Work

4 Motivations and Applications Only half of deaf high school graduates can read English at a fourth-grade level – despite sophisticated ASL fluency. Many efforts to help the deaf access the hearing world forget English is their 2nd language (& different than ASL). Applications for a Machine Translation System: –TV captioning, teletype telephones. –Human interpreters intrusive/expensive. –Educational tools, access to information. –Storage and transmission of ASL.

5 Output: Signing Virtual Humans Virtual reality models of the human form are now articulate & fast enough to produce ASL. ASL Generator produces instructions for the avatar, and the avatar performs the signs -- producing animated output for the user. Our problem is how to build these instructions.

6 Virtual Signing Humans Photos: Seamless Solutions, Inc. Simon the Signer (Bangham et al. 2000.) Vcom3D Corporation

7 ASL Linguistics I What is ASL? –Real language? Who uses it? –Different than SEE or SSE. How is it different than English? –Grammar, Vocabulary, Visual/Spatial. –More than the Hands: Simultaneity! –How signs can be changed: Morphology! –Use of Space around the Signer…

8 ASL Linguistics II Discourse Space –Put discourse entities on shelves for later referential use. –Agreement - Pronouns, Possessives, Verbs. –Don't interpret locations literally. (Bob to the left of Tim.) Three-Dimensional Space –Space around signer is visually analogous to a real scene. –Classifier Predicates Signers describe 3D scenes with their hands. Meaningful handshape and 3D representative movement path.

9 ASL Linguistics III Traditional Sentences: (No classifier predicates.) Where does Billy attend college? wh #BILLY IX x GO-TO UNIVERSITY WHERE Spatially Complex: (Uses classifier predicates.) I parked my car next to his cat. POSS x CAT ClassPred-bent-V-{locate cat in space} POSS 1s CAR ClassPred-3-{park next to cat} The truck drove down the windy road. IX x TRUCK ClassPred-3-{drive on windy road}

10 Initial Approaches to ASL MT Non-statistical Direct and Transfer MT Architectures

11 Corpora for ASL? ASL has no written form; so, there are no newswires or ready-made sources of text. Some groups have attempted to record and annotate video tapes, but the difficulty of creating a useful and consistent manual transcription standard and then performing the transcription makes for very slow work. No statistical approaches to ASL MT.

12 Machine Translation Pyramid MT Pyramid Dorr 1998. Options in MT design. No stats? higher path: –more work –domain size –subtler divergences handled

13 Option 1: Direct Translation What kind of non-statistical translation is possible if all we do is word-level analysis (i.e. morphology, POS & sense tagging)? Word-for-sign dictionary look-up system. Probably not sophisticated enough analysis to produce ASL, but could produce SEE.
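
A minimal sketch of the word-for-sign look-up described on this slide. The lexicon entries, the `fs-` prefix for fingerspelled fallbacks, and the function name `direct_translate` are illustrative assumptions, not part of the talk; a real direct system would also do the morphology, POS, and sense tagging the slide mentions.

```python
# Hypothetical word-to-sign lexicon (assumption); a real one would be far larger.
LEXICON = {
    "i": "IX-1s",
    "my": "POSS-1s",
    "car": "CAR",
    "cat": "CAT",
    "park": "PARK",
}


def direct_translate(sentence, lexicon=LEXICON):
    """Map each English word to a sign gloss, keeping English word order.

    Words missing from the lexicon fall back to a fingerspelled gloss
    (marked here with an assumed "fs-" prefix).
    """
    glosses = []
    for token in sentence.lower().split():
        word = token.strip(".,?!")
        if not word:
            continue
        glosses.append(lexicon.get(word, "fs-" + word.upper()))
    return glosses
```

Because the output keeps English word order, it is closer to SEE than to fluent ASL, which is the limitation the slide points out.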

14 Option 2: Transfer Translation Syntactically analyze English text before crossing over to ASL. –Capture more divergences and handle more complex phenomena. –Can successfully translate many English sentences into ASL. Some previous work along these lines. –some use deep syntax or simple semantics

15 Transfer Issues for ASL ASL Discourse Model: topics, referents in space. Representing & Generating Non-Manual Signals. Computational Model of ASL Phonology –facilitate creation of an ASL lexicon –define morphological and phonological operations Parameterizing ASL Features for Morphology Note: If system couldn't handle a particular input, just fall back on direct translation to produce signing output closer to SEE than fluent ASL.

16 Handling Spatially Complex ASL Failings of direct and transfer approaches to ASL MT.

17 But what's the hard part? Previous ASL generation work has ignored spatially complex ASL sentences. –Classifier predicates and spatial verbs –Very common, very communicatively useful. Difficult to handle in transfer architecture. (More going on than just syntax with these.)

18 Translate to a Classifier Predicate The car drove down the bumpy road past my house. POSS 1s HOUSE ClassPred-C-{locate house} IX x CAR ClassPred-3-{drive on bumpy road} Where are the house, the road, and the car? How close? Where does the path start/stop? How to show that the path is bumpy, winding, or hilly?

19 Paralinguistic? Iconic? Spatial? Linguists debate whether classifier predicates are: –Paralinguistic visually iconic gestural movements –Complex non-spatial polymorphemic constructions –Semantically compositional yet still spatially aware Pushing the boundaries of language… –May involve gradient information, spatial analogy, scene visualization, and a degree of iconicity. –Not clear traditional linguistic approaches can capture. –Still seems linguistic however: many constraints…

20 When the going gets tough… …the tough try an interlingua. –Hard to address using morphological, syntactic, and simple semantic information of the English text. –Direct or transfer architectures appear insufficient. What about an interlingual approach? –Problem: Hard to build interlingua system for unlimited (or even medium-sized) domain. Lots of overhead! –Interlingual systems only for limited domains.

21 Getting by with limited domain? Special about ASL: can identify hard sentences. –Spatially descriptive text: English spatial verbs describing locations, orientations, or movements; spatial prepositions or adverbs; concrete or animate entities; other common motifs or situations when classifier predicates are used (detect lexically). Use broad-coverage transfer approach for most inputs, and detect when we need to use something more powerful when we have a spatially complex English input sentence.
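
The lexical detection idea on this slide could be sketched as below. The cue word lists and the rule (a concrete entity plus a spatial verb or preposition) are assumptions made for illustration; the talk only says such cues can be detected lexically and does not give a concrete rule.

```python
# Hypothetical cue lists (assumptions); a real detector would use larger,
# curated lexicons and probably POS tags rather than raw word matching.
SPATIAL_VERBS = {"drive", "drove", "park", "parked", "walk", "walked", "move", "moved"}
SPATIAL_PREPOSITIONS = {"next", "past", "down", "beside", "behind", "under", "between"}
CONCRETE_ENTITIES = {"car", "truck", "road", "house", "cat"}


def is_spatially_complex(sentence):
    """Return True if the sentence mentions a concrete entity together
    with a spatial verb or a spatial preposition."""
    words = {w.strip(".,?!").lower() for w in sentence.split()}
    has_entity = bool(words & CONCRETE_ENTITIES)
    has_spatial_cue = bool(words & (SPATIAL_VERBS | SPATIAL_PREPOSITIONS))
    return has_entity and has_spatial_cue
```

With these lists, the two spatially complex examples from the earlier slides are flagged, while "Where does Billy attend college?" is not.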

22 Multi-Path MT? Whenever possible, use the simpler, easier-to-build MT approach. Only when needed, use the more sophisticated, resource-intensive one. We take advantage of the breadth of one and the depth of the other. If we add direct translation (to SEE) to the picture, we actually have three pathways.

23 Pyramidal MT? Don't interpret this picture as a set of options anymore… Now it's a skeleton for a multi-path MT architecture. MT Pyramid Dorr 1998.

24 What is our Interlingua? What is the language-neutral representation between the English and ASL when talking about a spatially complex scene? Intuitively, the signer has a visualization of the 3D scene which they are discussing. So, a spatial representation of reality (or the signer's imagination/conception of this reality) is serving as the interlingua. This sounds rather ambitious… How could the computer model spatial reality?

25 What about Virtual Reality? Analyze the English text, construct 3D virtual reality representation of the scene, and use VR as basis for generating the spatially iconic classifier predicate movements. But has anyone ever attempted to construct a 3D virtual reality representation of a changing scene as it is described by English sentences? Actually, the University of Pennsylvania has.

26 A Useful Technology Natural Language Command and Control of Virtual Reality Scenes

27 HMS & NLP Labs: 3D Scene NL-Command Have a virtual reality model of characters and objects in a three-dimensional scene. Accepts English text input (directions for the characters or objects to follow). Produces an animation in which the characters obey the English commands. Updates the 3D scene to show changes. Badler, Bindiganavale, Allbeck, Schuler, Zhao, Lee, Shin, and Palmer. 2000. Schuler. 2003.

28 An NL-Controlled 3D Scene http://hms.upenn.edu/software/PAR/images.html

29 NL Command and Control English Text English Syntax Filled-In PAR Analysis Selecting a PAR Template from the Actionary and Filling-In Slots Animation Script Animated 3D Scene Actionary: PAR Templates for Entity Motions Hierarchical Planning: handle ambiguities, add more detail…

30 NL Command and Control English Text English Syntax Filled-In PAR Analysis Selecting a PAR Template from the Actionary and Filling-In Slots Animation Script Animated 3D Scene Actionary: PAR Templates for Entity Motions What's a PAR? Hierarchical Planning: handle ambiguities, add more detail… Actionary = Action Dictionary = List of PAR Templates

31 Parameterized Action Representation participants: [ agent: AGENT objects: OBJECT list ] semantics: [ motion: {Object, Translate?, Rotate?} path: {Direction, Start, End, Distance} termination: CONDITION duration: TIME-LENGTH manner: MANNER ] start: TIME prep conditions: CONDITION boolean-exp sub-actions: sub-PARs parent action: PAR previous action: PAR next action: PAR This is a subset of PAR info. http://hms.upenn.edu/software/PAR Specify Locomotion Arguments Verb Adjuncts Planning Operator
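
To make the slot structure on this slide concrete, here is a rough sketch of a PAR as a Python data class. The field names follow the slide, but the Python types, the `name` field, and the omission of the parent/previous/next links are simplifying assumptions; this is not the HMS lab's actual data structure.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Path:
    """The path slot from the slide: direction, start, end, distance."""
    direction: Optional[str] = None
    start: Optional[str] = None
    end: Optional[str] = None
    distance: Optional[float] = None


@dataclass
class PAR:
    """A simplified Parameterized Action Representation (subset of slots)."""
    name: str                                   # assumed: name of the actionary template
    agent: Optional[str] = None                 # participants: agent
    objects: List[str] = field(default_factory=list)  # participants: objects
    path: Path = field(default_factory=Path)    # semantics: path
    manner: Optional[str] = None                # semantics: manner
    termination: Optional[str] = None           # semantics: termination condition
    duration: Optional[float] = None            # semantics: duration
    start_time: Optional[float] = None          # start: TIME
    prep_conditions: List[str] = field(default_factory=list)
    sub_actions: List["PAR"] = field(default_factory=list)


# Filling slots for "The car drove down the bumpy road" (values are illustrative).
drive = PAR(
    name="drive",
    agent="car",
    objects=["road"],
    path=Path(direction="down", start="road-start", end="road-end"),
    manner="bumpy",
)
```

An actionary in this sketch would simply be a collection of such templates with their slots left unfilled until the English analysis supplies values.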


35 MT Approach to Classifier Predicates Using the HMS NL Command and Control Technology

36 Using this technology… An NL-Controlled 3D Scene http://hms.upenn.edu/software/PAR/images.html

37 Using this technology… An NL-Controlled 3D Scene

38 Using this technology… An NL-Controlled 3D Scene Original image from: Simon the Signer (Bangham et al. 2000.) Signing Character


40 Invisible World Approach Mini VR scene in front of the signer containing entities from English text. (They're invisible.) Interpret the English sentences as NL commands. Instantiate PARs which position, move, reorient, and otherwise modify the entities in this world. Update VR model. Use hand to show changes in the invisible scene. VR acts as intermediary between English & ASL.
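
The update loop described on this slide can be sketched as a tiny scene model. The class names `InvisibleWorld` and `MoveAction`, the coordinate convention, and the single "move" action type are assumptions for illustration; the actual system would instantiate full PARs and drive a VR engine.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class MoveAction:
    """A stripped-down stand-in for a filled-in PAR that moves one entity."""
    agent: str
    end: Vec3


class InvisibleWorld:
    """Miniature scene in front of the signer: entity name -> 3D position."""

    def __init__(self) -> None:
        self.positions: Dict[str, Vec3] = {}

    def place(self, name: str, position: Vec3) -> None:
        """Introduce an entity mentioned in the English text."""
        self.positions[name] = position

    def apply(self, action: MoveAction) -> Tuple[Vec3, Vec3]:
        """Update the scene and return (start, end) of the movement,
        which a later generation step could map onto a hand movement."""
        if action.agent not in self.positions:
            raise KeyError(f"unknown entity: {action.agent}")
        start = self.positions[action.agent]
        self.positions[action.agent] = action.end
        return start, action.end


world = InvisibleWorld()
world.place("house", (0.3, 0.0, 0.4))
world.place("car", (-0.3, 0.0, 0.2))
start, end = world.apply(MoveAction(agent="car", end=(0.3, 0.0, 0.2)))
```

The returned start and end points are the kind of scene information the ASL side would consult when choosing and parameterizing a classifier predicate.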

41 Original image: MT Pyramid Dorr 1998. Interlingual Pathway for ASL Our MT picture… We now have an interlingual pathway.

42 Interlingual Pathway for ASL

43 The NL-Command Technology

44 Interlingual Pathway for ASL

45 This step is harder than it seems…

46 VR Scene Doesn't Do It All Various factors aside from the movement of the scene itself can affect this generation choice: –conventional motifs of expression e.g. furniture or items in a room –restrictions on use of multiple hands simultaneously –handshape-movement combination constraints e.g. approaching constructions –discourse or semantic concerns/priorities, etc. There's generation work to be done!

47 An NL Engineering Solution How to create the classifier predicates from VR? –Write rules obeying restrictions that inspect the VR scene, consider English text semantics, and combine many small units/morphemes to slowly produce or narrow in on a classifier predicate output. –Easier approach: Lexicalize classifier predicates as much as possible. Define and specify a big list of classifier predicate templates – their performance and semantics. Fill slots based on info in the VR scene. HMS: To define set of possible movement templates, build a PAR actionary specifying the animation possibilities.
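
The lexicalized approach on this slide could look roughly like the sketch below: a small table of classifier predicate templates whose slots are filled from scene positions. The handshapes (3, bent-V, C) come from the glosses on the earlier example slides, but the entity categories, the event labels, and the output format are assumptions, not the talk's specification.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class ClassifierTemplate:
    handshape: str   # e.g. "3" for vehicles in the example glosses
    action: str      # what the hand does, described informally


# Hypothetical template lexicon keyed by (entity category, event type).
TEMPLATES: Dict[Tuple[str, str], ClassifierTemplate] = {
    ("vehicle", "move"): ClassifierTemplate("3", "move hand along path"),
    ("animal", "locate"): ClassifierTemplate("bent-V", "place hand at location"),
    ("building", "locate"): ClassifierTemplate("C", "place hand at location"),
}


def fill_template(category: str, event: str,
                  start: Vec3, end: Optional[Vec3] = None) -> Optional[dict]:
    """Pick a template and fill its location slots from the scene.

    Returns None when the lexicon has no matching template, which models
    the restriction that only listed constructions can be produced.
    """
    template = TEMPLATES.get((category, event))
    if template is None:
        return None
    return {
        "handshape": template.handshape,
        "action": template.action,
        "start": start,
        "end": end if end is not None else start,
    }
```

Returning `None` for a missing template is a deliberate choice in this sketch: it mirrors the point made later in the talk that a template lexicon explains why some imaginable movements are not valid classifier predicates.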

48 A Second Actionary: For ASL The first actionary (list of PAR templates) we saw was used while analyzing the English text. It listed possible types of movements the imaginary entities perform in the virtual reality scene. This second actionary would describe the possible movements of the signer's hands while performing one or more interrelated classifier predicates (& discourse/semantic effects). Original image from: Simon the Signer (Bangham et al. 2000.)

49 Interlingual Pathway for ASL This step could be hard…

50 Interlingual Pathway for ASL We now have an architecture for the interlingual pathway!

51 MT Pyramid Dorr 1998. Direct: Unanalyzable Text Interlingual: Spatial Text Transfer: Most Sentences Multi-Path Pyramidal MT
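
The three-way routing summarized on this slide can be sketched as a small dispatcher. The function name `choose_pathway` and the two predicate arguments are assumptions; the talk does not specify how analyzability or spatial complexity would be tested, nor the order in which the checks run.

```python
from typing import Callable


def choose_pathway(sentence: str,
                   can_analyze: Callable[[str], bool],
                   is_spatial: Callable[[str], bool]) -> str:
    """Route a sentence to one of the three pathways of the pyramid.

    - "direct": text the analyzers cannot handle (SEE-like fallback)
    - "interlingual": spatially complex text (VR scene, classifier predicates)
    - "transfer": everything else, i.e. most sentences
    """
    if not can_analyze(sentence):
        return "direct"
    if is_spatial(sentence):
        return "interlingual"
    return "transfer"


# Toy predicates for demonstration only.
def toy_can_analyze(s: str) -> bool:
    return s.strip().endswith((".", "?"))


def toy_is_spatial(s: str) -> bool:
    return "next to" in s or "drove" in s
```

For example, with the toy predicates, `choose_pathway("I parked my car next to his cat.", toy_can_analyze, toy_is_spatial)` selects the interlingual pathway, while an unpunctuated fragment falls back to direct translation.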

52 A Final Consideration Other motivations for the lexicalized classifier predicate double actionary architecture…

53 Interlingual Pathway for ASL Practical engineering motivations for design: Just a hack? Does relying on template actionary limit output too much?


58 Linguistic Motivations Blended Spaces Lexicalized Classifier Predicate Model of Scott Liddell (2003). –Double-Actionary design analogous to model of how humans generate classifier predicates. –This model assumes signers imagine entities under discussion occupying space before them. –It argues that classifier predicates are stored as a lexicon of templates that are parameterized on locations/orientations of these spatial entities. Both engineering & linguistic motivations.

59 Liddell's Argument for Lexicalization Rejects assertion that spatial model not necessary. –Failings of non-spatial polymorphemic CP models. Unless very many morphemes: under-productive. Rejects naïve visually representative/analogous paralinguistic description of classifier predicates. –These models are over-productive, predicting unseen ASL constructions corresponding to imaginable movements, but the model can't explain these restrictions. Parameterized CP lexicon explains restrictions (template not in lexicon) but incorporates spatial productivity of the visually analogous model.

60 Summary

61 Where we're at… Seen MT approach for ASL classifier predicates. Proposed Multi-Path Pyramidal architecture. Uses HMS lab virtual reality software. Design is analogous to Liddell's recent CP model. –Reached same design from engineering approach. –System could serve as test-bed for the model. Survey, analysis, design draft, and specification. Implementation not started yet… Suggestions?

62 Questions?

63 Is the VR really an interlingua? Depends on your definition & how implemented. –Language neutral: 3D coordinates & VR info: not language specific. But ASL PAR selection/filling might use other info. –Semantic representation: Yes, model for 3D spatial domains. –Useful for translation: We've shown how it can be. –World knowledge beyond input semantics: Yes, in that it handles spatial/physics matters. Let's consider this…

64 Ontology vs. Domain Special property of ASL: easy to identify hard sentences requiring interlingua. –Only need to build interlingual resources to cover these domains (e.g. moving vehicles, furniture layout, etc.). But limited domains all similar: discuss 3D location, movements, and dimensions. –So the ontological expressiveness of this interlingua doesn't have to be nearly as powerful as most systems. –Abstract concepts, beliefs/intentions, quantification… –Not just things – but types of things – are limited.

65 References Cited Badler, Bindiganavale, Allbeck, Schuler, Zhao, Lee, Shin, and Palmer. 2000. Parameterized action representations and natural language instructions for dynamic behavior modification of embodied agents. AAAI Spring Symposium. Bangham, Cox, Lincoln, Marshall. 2000. Signing for the deaf using virtual humans. IEE2000. Liddell. 2003. Sources of Meaning in ASL Classifier Predicates. In Karen Emmorey (ed.). Perspectives on Classifier Constructions in Sign Languages. Workshop on Classifier Constructions, La Jolla, San Diego, California. Schuler. 2003. Using model-theoretic semantic interpretation to guide statistical parsing and word recognition in a spoken language interface. Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL03), Sapporo, Japan.

