
1 Learning Language from Ambiguous Perceptual Context. David L. Chen. Supervisor: Professor Raymond J. Mooney. Ph.D. Dissertation Defense, January 25, 2012.

2 Semantics of Language. The meaning of words, phrases, etc. Learning the semantics of language is one of the ultimate goals of natural language processing. The meanings of many words are grounded in our perception of the physical world: red, ball, cup, run, hit, fall, etc. [Harnad, 1990]. Computer representations should likewise be grounded in real-world perception.

3–4 Grounding Language. "Spanish goalkeeper Casillas blocks the ball" → Block(Casillas)

5–7 Natural Language and Meaning Representation
Natural Language (NL): a language that has evolved naturally, such as English, German, French, or Chinese.
Meaning Representation Language (MRL): a formal language, such as logic or any computer-executable code.
Example: "Spanish goalkeeper Casillas blocks the ball" (NL) ↔ Block(Casillas) (MRL)

8–9 Semantic Parsing and Surface Realization
Semantic Parsing (NL → MRL): maps a natural-language sentence to a complete, detailed semantic representation.
Surface Realization (MRL → NL): generates a natural-language sentence from a meaning representation.
Example: "Spanish goalkeeper Casillas blocks the ball" ↔ Block(Casillas)

10 Applications
– Natural language interfaces: issue commands and queries in natural language; the computer responds in natural language
– Knowledge acquisition
– Computer-assisted tasks

11 Traditional Learning Approach. Manually annotated training corpora (NL/MRL pairs) → Semantic Parser Learner → Semantic Parser (NL → MRL)

12 Example of Annotated Training Corpus

Natural Language (NL)               Meaning Representation Language (MRL)
Alice passes the ball to Bob        Pass(Alice, Bob)
Bob turns the ball over to John     Turnover(Bob, John)
John passes to Fred                 Pass(John, Fred)
Fred shoots for the goal            Kick(Fred)
Paul blocks the ball                Block(Paul)
Paul kicks off to Nancy             Pass(Paul, Nancy)
…
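As a concrete (if simplified) picture of what a learner consumes here, the sketch below stores the corpus as plain (NL, MR) string pairs; the `train_semantic_parser` stub is hypothetical and merely stands in for a learner such as WASP or KRISP.

```python
# A minimal sketch of an annotated corpus as (NL, MR) pairs; real systems use
# richer structured meaning representations than plain strings.
corpus = [
    ("Alice passes the ball to Bob",    "Pass(Alice, Bob)"),
    ("Bob turns the ball over to John", "Turnover(Bob, John)"),
    ("John passes to Fred",             "Pass(John, Fred)"),
    ("Fred shoots for the goal",        "Kick(Fred)"),
    ("Paul blocks the ball",            "Block(Paul)"),
    ("Paul kicks off to Nancy",         "Pass(Paul, Nancy)"),
]

def train_semantic_parser(pairs):
    """Hypothetical stand-in for a learner such as WASP or KRISP."""
    raise NotImplementedError
```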

13 Learning Language from Perceptual Context. Constructing annotated corpora for language learning is difficult. Children acquire language through exposure to linguistic input in the context of a rich, relevant, perceptual environment. Ideally, a computer system could learn language in the same manner.

14 Navigation Example. Alice (to Bob): 在餐廳的地方右轉 ("turn right at the restaurant")

15 Navigation Example. Alice (to Bob): 在醫院的地方右轉 ("turn right at the hospital")

16 Navigation Example. Scenario 1: 在餐廳的地方右轉 ("turn right at the restaurant"). Scenario 2: 在醫院的地方右轉 ("turn right at the hospital").

17 Navigation Example. Common to both scenarios: 在…的地方右轉 ("turn right at the ___").

18 Navigation Example. The shared pattern 在…的地方右轉 corresponds to "Make a right turn (at …)".

20 Navigation Example. The part unique to Scenario 1: 餐廳 ("restaurant").

22 Navigation Example. The part unique to Scenario 2: 醫院 ("hospital").

23 Thesis Contributions
Learning from limited ambiguity:
– Sportscasting task [Chen & Mooney, ICML, 2008; Chen, Kim & Mooney, JAIR, 2010]
– Collected an English corpus of 2,000 sentences that has been adopted by other researchers [Liang et al., 2009; Bordes, Usunier & Weston, 2010; Angeli et al., 2010; Hajishirzi, Hockenmaier, Mueller & Amir, 2011]
Learning from ambiguously supervised relational data:
– Navigation task [Chen & Mooney, AAAI, 2011]
– Organized and semantically annotated the corpus from MacMahon et al. (2006)

24 Overview. Background. Learning from limited ambiguity. Learning from ambiguous relational data. Future directions. Conclusions.

25 Semantic Parser Learners
Learn a function from NL to MR: Semantic Parsing (NL → MR) and Surface Realization (MR → NL).
NL: "Purple3 passes the ball to Purple5"
MR: Pass(Purple3, Purple5)
We experiment with two semantic parser learners:
– WASP [Wong & Mooney, 2006; 2007]
– KRISP [Kate & Mooney, 2006]

26 WASP: Word Alignment-based Semantic Parsing
Uses statistical machine translation techniques:
– Synchronous context-free grammars (SCFG) [Wu, 1997; Melamed, 2004; Chiang, 2005]
– Word alignments [Brown et al., 1993; Och & Ney, 2003]
Capable of both semantic parsing and surface realization.

27 KRISP: Kernel-based Robust Interpretation by Semantic Parsing. Productions of the MR language are treated as semantic concepts. An SVM classifier is trained for each production with a string-subsequence kernel, and these classifiers are used to compositionally build the MRs of sentences. More resistant to noisy supervision, but incapable of language generation.

28 Overview. Background. Learning from limited ambiguity. Learning from ambiguous relational data. Future directions. Conclusions.

29 Tractable Challenge Problem: Learning to Be a Sportscaster. Goal: Learn from realistic data of natural language used in a representative context while avoiding difficult issues in computer perception (i.e. speech and vision). Solution: Learn from textually annotated traces of activity in a simulated environment. Example: Traces of games in the RoboCup simulator paired with textual sportscaster commentary.

30 Sample Human Sportscast in Korean

31 RoboCup Sportscaster Trace

Natural Language Commentary:
– Purple goalie turns the ball over to Pink8
– Pink11 looks around for a teammate
– Pink8 passes the ball to Pink11
– Purple team is very sloppy today
– Pink11 makes a long pass to Pink8
– Pink8 passes back to Pink11

Meaning Representation (extracted events; each sentence is ambiguously paired with several):
badPass(Purple1, Pink8), turnover(Purple1, Pink8), pass(Pink11, Pink8), pass(Pink8, Pink11), ballstopped, pass(Pink8, Pink11), kick(Pink11), kick(Pink8), kick(Pink11), kick(Pink8)


34 RoboCup Sportscaster Trace (anonymized). The same trace as the learner sees it, with predicate and constant symbols anonymized, e.g. badPass(Purple1, Pink8) becomes P6(C1, C19). Event stream: P6(C1, C19), P5(C1, C19), P2(C22, C19), P2(C19, C22), P0, P2(C19, C22), P1(C22), P1(C19), P1(C22), P1(C19).

35 RoboCup Data
Collected human textual commentary for the 4 RoboCup championship games from 2001–2004:
– Avg # events/game = 2,613
– Avg # English sentences/game = 509
– Avg # Korean sentences/game = 499
Each sentence is matched to all events within the previous 5 seconds:
– Avg # MRs/sentence = 2.5 (min 1, max 12)
Manually annotated with correct matchings of sentences to MRs (for evaluation purposes only).
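The pairing rule above is simple enough to sketch. Below is a minimal illustration of matching each commentary sentence to all events in the previous 5 seconds; the event layout and timestamps are assumptions for illustration, not the dissertation's actual code.

```python
# Sketch: pair a commentary sentence with every simulator event from the
# previous 5 seconds (the ambiguous supervision described on this slide).
def candidate_meanings(sentence_time, events, window=5.0):
    """events: list of (timestamp, meaning_representation) tuples."""
    return [mr for t, mr in events if sentence_time - window <= t <= sentence_time]

events = [(12.0, "kick(Pink8)"), (12.4, "pass(Pink8, Pink11)"), (15.9, "ballstopped")]
print(candidate_meanings(16.2, events))
# -> ['kick(Pink8)', 'pass(Pink8, Pink11)', 'ballstopped']
```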

36 Machine Sportscast in English

37 Overview. Background. Learning from limited ambiguity. Learning from ambiguous relational data. Future directions. Conclusions.

38 Navigation Task. Learn to interpret and follow free-form navigation instructions, e.g. "Go down this hall and make a right when you see an elevator to your left." Assume no prior linguistic knowledge; learn by observing how humans follow instructions. Uses virtual worlds and instructor/follower data from MacMahon et al. (2006).

39 Environment. Map legend: H – Hat Rack, L – Lamp, E – Easel, S – Sofa, B – Barstool, C – Chair

40 Environment

41–42 Sample Instructions (start at position 3, end at position 4)
– Take your first left. Go all the way down until you hit a dead end.
– Go towards the coat hanger and turn left at it. Go straight down the hallway and the dead end is position 4.
– Walk to the hat rack. Turn left. The carpet should have green octagons. Go to the end of this alley. This is p-4.
– Walk forward once. Turn left. Walk forward twice.

Observed primitive actions: Forward, Left, Forward, Forward

43 Formal Problem Definition
Given: {(e_1, a_1, w_1), (e_2, a_2, w_2), …, (e_n, a_n, w_n)}
– e_i: a natural-language instruction
– a_i: an observed action sequence
– w_i: a world state
Goal: Build a system that produces the correct a_j given a previously unseen (e_j, w_j).
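A hedged sketch of the data layout implied by this definition, shown below; the field types are illustrative only (the actual world state is a full map representation, not a small dict).

```python
# Sketch: one training triple (e_i, a_i, w_i) from the formal definition.
from dataclasses import dataclass, field

@dataclass
class Example:
    instruction: str          # e_i: natural-language instruction
    actions: list             # a_i: observed action sequence
    world: dict = field(default_factory=dict)  # w_i: world state (simplified)

train = [
    Example("Walk to the couch and turn left", ["Forward", "Left"],
            {"position": 3, "facing": "north"}),
]
# Goal: given an unseen (instruction, world), output the correct action sequence.
```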

44 Challenges. Many different instructions exist for the same task: they describe different actions, use different parameters, and describe the same parameters in different ways. The plans are hidden and must be inferred from observed actions, and the number of possible plans is exponential.

45–52 Learning System for Parsing Navigation Instructions

Training: each observation pairs an Instruction and World State with an Action Trace. The Navigation Plan Constructor turns observed actions into candidate plans, Plan Refinement prunes them, and the Semantic Parser Learner trains a Semantic Parser on the refined (instruction, plan) pairs.

Testing: a new Instruction and World State are fed to the learned Semantic Parser, whose output plan is executed by the Execution Module (MARCO) to produce an Action Trace.

53 Constructing Navigation Plans

Instruction: Walk to the couch and turn left
Action Trace: Forward, Left

Basic plan (directly model the observed actions):
Travel(steps: 1), Turn(LEFT)

Landmarks plan (add interleaving verification steps):
Travel(steps: 1), Verify(at: SOFA), Turn(LEFT), Verify(front: BLUE HALL)
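A hedged sketch of the two constructions, shown below. The node encoding and the `world_at` helper are invented for illustration, and consecutive Forward moves are not merged into a single Travel step as a fuller implementation would do.

```python
# Sketch: build a basic plan straight from observed actions, then a landmarks
# plan by interleaving Verify steps. world_at(i) (hypothetical) returns what
# the follower observes after the i-th action, e.g. {"at": "SOFA"}.
def basic_plan(actions):
    plan = []
    for a in actions:
        if a == "Forward":
            plan.append(("Travel", {"steps": 1}))
        else:
            plan.append(("Turn", {"direction": a.upper()}))
    return plan

def landmarks_plan(actions, world_at):
    plan = []
    for i, a in enumerate(actions):
        plan += basic_plan([a])
        plan.append(("Verify", world_at(i + 1)))
    return plan

print(basic_plan(["Forward", "Left"]))
# -> [('Travel', {'steps': 1}), ('Turn', {'direction': 'LEFT'})]
```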

54–57 Plan Refinement

There are generally many ways to describe the same route; the plan refinement step aims to find the plan that actually corresponds to the instruction.

Plan graph: Turn(LEFT), Verify(front: BLUE HALL), Travel(steps: 2), Verify(at: SOFA, front: SOFA)

Three instructions consistent with this route:
– Turn and walk to the couch
– Face the blue hall and walk 2 steps
– Turn left. Walk forward twice.

58 Plan Refinement. Remove extraneous details from the plans: first learn the meanings of words and short phrases, then use the learned lexicon to remove the parts of each plan unrelated to its instruction.

59 Subgraph Generation Online Lexicon Learning (SGOLL)

Example: "Turn and walk to the couch"
Plan: Turn(LEFT), Verify(front: BLUE HALL), Travel(steps: 2), Verify(at: SOFA, front: SOFA)

1. As an example comes in, break the sentence into n-grams and the graph into connected subgraphs.

60–73 Subgraph Generation Online Lexicon Learning (SGOLL)

"Turn and walk to the couch"
1-grams: turn, and, walk, to, the, couch
2-grams: turn and, and walk, walk to, to the, the couch
3-grams: turn and walk, and walk to, walk to the, to the couch
…
Connected subgraphs of size 1: Turn(LEFT); Verify; …
Connected subgraphs of size 2: Turn(LEFT) → Verify; …
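The n-gram side of step 1 is easy to make concrete. The sketch below generates the n-grams shown on the slide; the `max_n` cap is an assumption for illustration, and subgraph enumeration over the plan graph is analogous but omitted.

```python
# Sketch: enumerate the n-grams of an instruction, as in SGOLL's step 1.
def ngrams(sentence, max_n=3):
    words = sentence.lower().split()
    return [" ".join(words[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(words) - n + 1)]

print(ngrams("Turn and walk to the couch"))
# -> ['turn', 'and', ..., 'turn and', 'and walk', ..., 'turn and walk', ...]
```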


74–76 Subgraph Generation Online Lexicon Learning (SGOLL)

2. Increase the count of each n-gram and the co-occurrence count of each (n-gram, connected-subgraph) pair, hashing the connected subgraphs for efficient updates. For example, the counts for entries associated with "turn" that occur in the current example are incremented (48 → 49 for Turn(LEFT), 22 → 23 for Turn(RIGHT)), while an entry not in the example stays at 26.

77 Subgraph Generation Online Lexicon Learning (SGOLL)

3. Rank the lexicon entries by the scoring function, e.g. for "turn": Turn(LEFT) scores 0.56, Turn(RIGHT) 0.31, and a further entry 0.27.
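Steps 2 and 3 amount to co-occurrence counting plus a ranking score. The sketch below uses the score p(g|w) − p(g|¬w) described for SGOLL in Chen & Mooney (AAAI 2011); the counter layout, and the string keys standing in for hashed subgraphs, are illustrative assumptions rather than the actual implementation.

```python
# Sketch of SGOLL's update (step 2) and scoring (step 3). Subgraphs are
# represented by hashable strings, mirroring the slide's note that subgraphs
# are hashed for efficient updates.
from collections import Counter

ngram_count, graph_count, pair_count = Counter(), Counter(), Counter()
n_examples = 0

def update(ngrams_seen, subgraphs_seen):
    """Step 2: bump counts for each n-gram, subgraph, and co-occurring pair."""
    global n_examples
    n_examples += 1
    for w in set(ngrams_seen):
        ngram_count[w] += 1
    for g in set(subgraphs_seen):
        graph_count[g] += 1
        for w in set(ngrams_seen):
            pair_count[w, g] += 1

def score(w, g):
    """Step 3: p(g|w) - p(g|not w), estimated from the counts."""
    p_given_w = pair_count[w, g] / ngram_count[w]
    p_given_not_w = ((graph_count[g] - pair_count[w, g])
                     / max(1, n_examples - ngram_count[w]))
    return p_given_w - p_given_not_w
```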

78 Refining Plans Using a Lexicon

Walk to the couch and turn left
Plan: Travel(steps: 1), Verify(at: SOFA), Turn(LEFT), Verify(front: BLUE HALL)

Look for the highest-scoring lexicon entry that matches both the words in the instruction and the nodes in the graph.

79 Refining Plans Using a Lexicon. Lexicon entry: "turn left" → Turn(LEFT) (score: 0.84). The matched words and the Turn(LEFT) node are marked.

80 Refining Plans Using a Lexicon. Lexicon entry: "walk to the couch" → Travel, Verify(at: SOFA) (score: 0.75). These words and nodes are marked as well.

81 Refining Plans Using a Lexicon. Only "and" remains; the lexicon is exhausted.

82 Refining Plans Using a Lexicon. Remove all unmarked nodes, leaving: Travel, Verify(at: SOFA), Turn(LEFT)

83 Refining Plans Using a Lexicon. Final refined plan for "Walk to the couch and turn left": Travel, Verify(at: SOFA), Turn(LEFT)
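The walkthrough above condenses into a small greedy procedure. In this sketch the plan is a flat node list rather than a graph, and phrase matching is plain substring search; both are simplifications of the actual graph matching.

```python
# Sketch: greedily apply the highest-scoring lexicon entries that match both
# the instruction and the plan, mark their nodes, then drop unmarked nodes.
def refine(instruction, plan_nodes, lexicon):
    """lexicon: list of (phrase, nodes, score) entries."""
    words, marked = instruction.lower(), set()
    for phrase, nodes, _ in sorted(lexicon, key=lambda e: -e[2]):
        if phrase in words and all(n in plan_nodes for n in nodes):
            marked.update(nodes)
            words = words.replace(phrase, "", 1)  # consume the matched words
    return [n for n in plan_nodes if n in marked]

lexicon = [("turn left", ["Turn(LEFT)"], 0.84),
           ("walk to the couch", ["Travel", "Verify(at: SOFA)"], 0.75)]
plan = ["Travel", "Verify(at: SOFA)", "Turn(LEFT)", "Verify(front: BLUE HALL)"]
print(refine("Walk to the couch and turn left", plan, lexicon))
# -> ['Travel', 'Verify(at: SOFA)', 'Turn(LEFT)']
```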

84 Evaluation Data

3 maps, 6 instructors, 1–15 followers/direction; instructions hand-segmented into single-sentence steps.

Paragraph: "Take the wood path towards the easel. At the easel, go left and then take a right on the blue path at the corner. Follow the blue path towards the chair and at the chair, take a right towards the stool. When you reach the stool, you are at 7."
Actions: Turn, Forward, Turn left, Forward, Turn right, Forward x 3, Turn right, Forward

Single sentence: "Take the wood path towards the easel." / "At the easel, go left and then take a right on the blue path at the corner." / …
Actions per sentence: Turn, Forward / Turn left, Forward, Turn right / …

85 Evaluation Data Statistics

                    Paragraph        Single-Sentence
# Instructions      706              3236
Avg. # sentences    5.0 (±2.8)       1.0 (±0)
Avg. # words        37.6 (±21.1)     7.8 (±5.1)
Avg. # actions      10.4 (±5.7)      2.1 (±2.4)

86 Plan Construction. Tests how well the system infers the correct navigation plans. Gold-standard plans were annotated manually. Partial parse accuracy is the metric: credit for the correct action type (e.g. Turn) and additional credit for correct arguments (e.g. LEFT). The lexicon is learned and tested on the same data from two of the three maps.
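A hedged reading of the metric: one credit for the action type plus one per correct argument. The exact weighting in the dissertation may differ; this sketch only illustrates the idea of partial credit.

```python
# Sketch: partial parse accuracy with credit for action types and arguments.
def partial_score(pred, gold):
    """pred/gold: aligned lists of (action_type, {arg_name: value})."""
    correct = total = 0
    for (p_act, p_args), (g_act, g_args) in zip(pred, gold):
        total += 1 + len(g_args)
        if p_act == g_act:
            correct += 1 + sum(1 for k, v in g_args.items()
                               if p_args.get(k) == v)
    return correct / total

print(partial_score([("Turn", {"direction": "LEFT"})],
                    [("Turn", {"direction": "RIGHT"})]))  # 0.5
```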

87 Plan Construction

                           Precision    Recall    F1
Basic Plans                81.46        55.88     66.27
Landmarks Plans            45.42        85.46     59.29
Refined Landmarks Plans    87.32        72.96     79.49

88 Parse Accuracy. Examines how well the learned semantic parsers parse novel sentences. Semantic parsers are trained on the inferred plans using KRISP [Kate and Mooney, 2006], with a leave-one-map-out approach and partial parse accuracy as the metric. Upper baseline: training on human-annotated plans.

89 Parse Accuracy

                           Precision    Recall    F1
Basic Plans                85.39        50.95     63.76
Landmarks Plans            52.20        50.48     51.11
Refined Landmarks Plans    88.36        57.03     69.31
Human Annotated Plans      90.09        76.29     82.62

90 End-to-end Execution. Tests how well the system performs the overall navigation task, again with a leave-one-map-out approach. Strict metric: a run counts as successful only if the final position matches exactly.

91 End-to-end Execution
Lower baseline: a simple generative model based on the frequency of actions alone.
Upper baselines: training with human-annotated gold plans; the complete MARCO system [MacMahon, 2006]; humans.

92 End-to-end Execution

                           Single Sentences    Paragraphs
Simple Generative Model    11.08%               2.15%
Basic Plans                58.72%              16.21%
Landmarks Plans            18.67%               2.66%
Refined Landmarks Plans    57.28%              19.18%
Human Annotated Plans      62.67%              29.59%
MARCO                      77.87%              55.69%
Human Followers            N/A                 69.64%


96 Example Parse

Instruction: "Place your back against the wall of the 'T' intersection. Turn left. Go forward along the pink-flowered carpet hall two segments to the intersection with the brick hall. This intersection contains a hatrack. Turn left. Go forward three segments to an intersection with a bare concrete hall, passing a lamp. This is Position 5."

Parse: Turn(), Verify(back: WALL), Turn(LEFT), Travel(), Verify(side: BRICK HALLWAY), Turn(LEFT), Travel(steps: 3), Verify(side: CONCRETE HALLWAY)

97 Mandarin Chinese Experiment. All instructions were translated from English to Chinese, and the system was trained and tested in the same way. Since written Chinese does not mark word boundaries with spaces, we compare two segmentations: naively treating each character as a word, and using a trained Chinese word segmenter [Chang, Galley & Manning, 2008].

98 Mandarin Chinese Experiment

                                                  Single Sentences    Paragraphs
Refined Landmarks Plans (segment by character)    58.54%              16.11%
Refined Landmarks Plans (Stanford segmenter)      58.70%              20.13%

99 Collecting Additional Training Data Using Mechanical Turk. Two types of data are needed: navigation instructions, and action traces corresponding to those instructions. We designed 2 Human Intelligence Tasks (HITs) to collect them from Amazon's Mechanical Turk.

100 Instructor Task. Workers are shown a randomly generated simulation containing 1 to 4 actions and asked to write an instruction that would lead someone to perform those actions. Instructions are intended to be single sentences, but this is not enforced.

101 Follower Task. Workers are given an instruction, placed at the starting position, and asked to follow the instruction as best they can.

102 Data Statistics

Collected over a two-month period from 653 workers for $277.92: over 2,400 instructions and 14,000 follower traces. Instructions are filtered to those with more than 5 followers and majority agreement on a single path.

                   MTurk            Original Data
# Instructions     1011             3236
Vocabulary size    590              629
Avg. # words       7.69 (±7.12)     7.8 (±5.1)
Avg. # actions     1.84 (±1.24)     2.1 (±2.4)
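The filtering rule stated above translates directly into code. In this sketch a follower trace is reduced to the path it ends on; the trace representation is an assumption for illustration.

```python
# Sketch: keep an instruction only if it has more than 5 follower traces and
# a strict majority of them agree on a single path.
from collections import Counter

def keep_instruction(traces):
    """traces: list of hashable paths, e.g. tuples of visited positions."""
    if len(traces) <= 5:
        return False
    _, votes = Counter(traces).most_common(1)[0]
    return votes > len(traces) / 2

print(keep_instruction([(3, 4)] * 4 + [(3, 2)] * 2))  # True: 4 of 6 agree
```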

103 Mechanical Turk Training Data Experiment

                                              Single Sentences    Paragraphs
Refined Landmarks Plans                       57.28%              19.18%
+ additional Mechanical Turk training data    57.62%              20.64%

104 Overview. Background. Learning from limited ambiguity. Learning from ambiguous relational data. Future directions. Conclusions.

105 System Improvements
– Tighter integration of the components
– Learning higher-order concepts
– Incorporating syntactic information
– Scaling the system
– Sportscasting task: more detailed meaning representations; entertainment (variability, emotion, etc.)
– Navigation task: generating instructions; a dialogue system for richer interaction

106 Long-term Directions
– Data acquisition framework: let users of the system also provide training data
– Real perception: use computer vision to recognize objects and events
– Machine translation: learn translation models from comparable corpora

107 Conclusion. The traditional language-learning approach relies on expensive annotated training data. We have developed language-learning systems that instead learn by observing language used in a relevant perceptual environment, and devised two test applications that address different scenarios: learning to sportscast (limited ambiguity) and learning to follow navigation instructions (exponential ambiguity).

