
1 Learning for Semantic Parsing
Raymond J. Mooney, Yuk Wah Wong, Ruifang Ge, Rohit Kate
Machine Learning Group, Department of Computer Sciences, University of Texas at Austin

2 Syntactic Natural Language Learning
Most computational research in natural-language learning has addressed “low-level” syntactic processing:
– Morphology (e.g., past-tense generation)
– Part-of-speech tagging
– Shallow syntactic parsing (chunking)
– Syntactic parsing

3 Semantic Natural Language Learning
Learning for semantic analysis has been restricted to relatively “shallow” meaning representations:
– Word sense disambiguation (e.g., SENSEVAL)
– Semantic role assignment (determining agent, patient, instrument, etc.; e.g., FrameNet, PropBank)
– Information extraction

4 Semantic Parsing
A semantic parser maps a natural-language sentence to a complete, detailed semantic representation: a logical form or meaning representation (MR).
For many applications, the desired output is immediately executable by another program.
Two application domains:
– CLang: RoboCup Coach Language
– GeoQuery: a database query application

5 CLang: RoboCup Coach Language
In the RoboCup Coach competition, teams compete to coach simulated soccer players. The coaching instructions are given in a formal language called CLang.
Example (semantic parsing from coach utterance to CLang):
NL: If the ball is in our penalty area, then all our players except player 4 should stay in our half.
CLang: ((bpos (penalty-area our)) (do (player-except our {4}) (pos (half our))))

6 GeoQuery: A Database Query Application
A query application for a U.S. geography database containing about 800 facts [Zelle & Mooney, 1996].
Example (semantic parsing from user question to database query):
NL: How many states does the Mississippi run through?
Query: answer(A, count(B, (state(B), C=riverid(mississippi), traverse(C,B)), A))
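To illustrate what “immediately executable” means here, the toy sketch below evaluates the example query's logic directly in Python against a small invented fact base. The fact base is a made-up subset for illustration only, not the real 800-fact database (the actual Mississippi crosses many more states).

```python
# Toy illustration (not the actual GeoQuery system) of executing the
# logic of answer(A, count(B, (state(B), C=riverid(mississippi),
# traverse(C,B)), A)) against a small, invented fact base.

STATES = {"arkansas", "louisiana", "mississippi", "tennessee"}  # toy subset
TRAVERSE = {  # toy river -> set of states it crosses
    "mississippi": {"arkansas", "louisiana", "mississippi", "tennessee"},
}

def answer_count_states_traversed(river: str) -> int:
    """Denotation of the example query: count states B such that
    state(B) and traverse(riverid(river), B) both hold."""
    return sum(1 for b in STATES if b in TRAVERSE.get(river, set()))

print(answer_count_states_traversed("mississippi"))  # -> 4 on the toy facts
```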

7 Learning Semantic Parsers
Manually programming robust semantic parsers is difficult due to the complexity of the task.
Semantic parsers can instead be learned automatically from sentences paired with their logical forms.
[Diagram: NL→MR training examples feed a semantic-parser learner, which outputs a semantic parser that maps natural language to meaning representations.]

8 Engineering Motivation
– Most computational language-learning research strives for broad coverage while sacrificing depth (“scaling up by dumbing down”).
– Realistic semantic parsing currently entails domain dependence.
– Domain-dependent natural-language interfaces have a large potential market.
– Learning makes developing specific applications more tractable.
– Training corpora can be easily developed by tagging existing corpora of formal statements with natural-language glosses.

9 Cognitive Science Motivation
Most natural-language learning methods require supervised training data that is not available to a child:
– general lack of negative feedback on grammar;
– no treebank or sense- or semantic-role-tagged data.
Assuming a child can infer the likely meaning of an utterance from context, NL→MR pairs are more cognitively plausible training data.

10 Our Semantic-Parser Learners
– CHILL+WOLFIE (Zelle & Mooney, 1996; Thompson & Mooney, 1999, 2003): separates parser learning from semantic-lexicon learning; learns a deterministic parser using ILP techniques.
– COCKTAIL (Tang & Mooney, 2001): improved ILP algorithm for CHILL.
– SILT (Kate, Wong & Mooney, 2005): learns symbolic transformation rules for mapping directly from NL to MR.
– SCISSOR (Ge & Mooney, 2005): integrates semantic interpretation into Collins' statistical syntactic parser.
– WASP (Wong & Mooney, 2006): uses syntax-based statistical machine translation methods.
– KRISP (Kate & Mooney, 2006): uses a series of SVM classifiers employing a string kernel to iteratively build semantic representations.

11 GeoQuery On-Line Demo
http://www.cs.utexas.edu/users/ml/geo.html

12 SCISSOR: Semantic Composition that Integrates Syntax and Semantics to get Optimal Representations
Based on a fairly standard approach to compositional semantics [Jurafsky and Martin, 2000].
– A statistical parser is used to generate a semantically augmented parse tree (SAPT): Collins' head-driven model 2 is augmented to incorporate semantic labels.
– The SAPT is then translated into a complete formal meaning representation (MR).
[Figure: SAPT for “our player 2 has the ball” — leaves tagged PRP$-team, NN-player, CD-unum, VB-bowner, DT-null, NN-null; internal nodes NP-player, NP-null, VP-bowner, S-bowner; MR: bowner(player(our,2)).]

13 Overview of SCISSOR
[Diagram: Training — SAPT training examples are fed to a learner that produces the integrated semantic parser. Testing — an NL sentence is parsed into a SAPT, and ComposeMR turns the SAPT into an MR.]

14 SCISSOR SAPT Parser Implementation
– Semantic labels are added to Bikel's (2004) open-source version of the Collins statistical parser.
– The head-driven derivation of production rules is augmented to also generate semantic labels.
– Parameter estimation during training employs an augmented smoothing technique to account for the additional data sparsity created by semantic labels.
– Test sentences are parsed to find the most probable SAPT using a standard beam-search-constrained version of the CKY chart-parsing algorithm, as sketched below.
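For illustration, here is a minimal sketch of the beam-constrained CKY backbone this slide refers to, over a plain binarized grammar with log probabilities. Collins-style head statistics and the semantic labels themselves are omitted, and all rule and data structures below are invented for the example.

```python
import math
from collections import defaultdict

# Minimal probabilistic CKY with a beam, sketching the chart-parsing
# backbone described above. Binary rules are (parent, (left, right),
# logprob); lexical rules are (parent, (word,), logprob).

def cky_parse(words, binary_rules, lexical_rules, beam=5):
    n = len(words)
    chart = defaultdict(dict)  # (i, j) -> {label: best logprob for span}
    for i, w in enumerate(words):                       # lexical entries
        for parent, (term,), lp in lexical_rules:
            if term == w:
                prev = chart[i, i + 1].get(parent, -math.inf)
                chart[i, i + 1][parent] = max(prev, lp)
    for span in range(2, n + 1):                        # longer spans
        for i in range(n - span + 1):
            j = i + span
            cell = {}
            for k in range(i + 1, j):                   # split points
                for parent, (l, r), lp in binary_rules:
                    if l in chart[i, k] and r in chart[k, j]:
                        score = lp + chart[i, k][l] + chart[k, j][r]
                        if score > cell.get(parent, -math.inf):
                            cell[parent] = score
            # beam: keep only the highest-scoring labels per cell
            chart[i, j] = dict(sorted(cell.items(), key=lambda kv: -kv[1])[:beam])
    return chart[0, n]  # labels spanning the whole sentence

toy_lex = [("NP", ("goalie",), -1.0), ("V", ("stays",), -0.5)]
toy_bin = [("S", ("NP", "V"), -0.2)]
print(cky_parse(["goalie", "stays"], toy_bin, toy_lex))  # {'S': -1.7} (approx.)
```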

15 ComposeMR
[Figure: SAPT for “our player 2 has the ball” showing bare semantic labels — team, player, unum, bowner, null — on leaves and internal nodes.]

16 ComposeMR
[Figure: the same SAPT with argument slots made explicit — player(_,_), bowner(_).]

17 ComposeMR
[Figure: the MR is composed bottom-up — player(team,unum) is instantiated to player(our,2), and bowner(player) to bowner(player(our,2)).]
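Here is a minimal Python sketch of this bottom-up composition. The (label, children) tree encoding and the left-to-right slot-filling convention are hypothetical simplifications of SCISSOR's actual procedure.

```python
# Minimal sketch of ComposeMR: fill the argument slots of a node's
# semantic label with the MRs of its children, bottom-up. The SAPT
# encoding (label, children) is a hypothetical simplification.

def compose_mr(node):
    """node = (semantic_label, children); leaves have children == [].
    Labels like 'player(_,_)' have slots; 'our' and '2' are constants;
    'null' nodes contribute nothing of their own."""
    label, children = node
    child_mrs = [m for m in (compose_mr(c) for c in children) if m is not None]
    if label == "null":
        return child_mrs[0] if child_mrs else None
    if "_" not in label:               # constant leaf, e.g. 'our'
        return label
    for mr in child_mrs:               # fill slots left to right
        label = label.replace("_", mr, 1)
    return label

sapt = ("bowner(_)", [
    ("player(_,_)", [("our", []), ("2", [])]),   # "our player 2"
    ("null", []), ("null", []), ("null", []),     # "has the ball"
])
print(compose_mr(sapt))  # -> bowner(player(our,2))
```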

18 WASP: A Machine Translation Approach to Semantic Parsing
Based on a semantic grammar of the natural language.
Uses machine translation techniques:
– synchronous context-free grammars (SCFG) (Wu, 1997; Melamed, 2004; Chiang, 2005);
– word alignments (Brown et al., 1993; Och & Ney, 2003).
Hence the name: Word Alignment-based Semantic Parsing.

19 Synchronous Context-Free Grammars (SCFG)
Developed by Aho & Ullman (1972) as a theory of compilers that combines syntax analysis and code generation in a single phase.
An SCFG generates a pair of strings in a single derivation.

20 Compiling, Machine Translation, and Semantic Parsing
– SCFG: formal language to formal language (compiling)
– Alignment models: natural language to natural language (machine translation)
– WASP: natural language to formal language (semantic parsing)

21 Context-Free Semantic Grammar
QUERY → What is CITY
CITY → the capital CITY
CITY → of STATE
STATE → Ohio
[Figure: parse tree deriving “What is the capital of Ohio” from QUERY.]

22 Productions of Synchronous Context-Free Grammars
QUERY → What is CITY / answer(CITY)
(NL pattern / MR template)
Referred to as transformation rules in Kate, Wong & Mooney (2005).

23 Synchronous Context-Free Grammars
QUERY → What is CITY / answer(CITY)
CITY → the capital CITY / capital(CITY)
CITY → of STATE / loc_2(STATE)
STATE → Ohio / stateid('ohio')
[Figure: paired derivation trees — the NL side yields “What is the capital of Ohio”; the MR side yields answer(capital(loc_2(stateid('ohio')))).]
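As an illustration of how an SCFG produces an NL string and an MR in one derivation, here is a minimal Python sketch over the four rules above. The dictionary-based grammar and the CITY2 key (standing in for the second CITY rule, to keep the sketch deterministic) are simplifications for illustration, not part of WASP itself.

```python
# Minimal sketch of a synchronous derivation: each rule pairs an NL
# pattern with an MR template, and aligned non-terminals expand in
# lockstep. CITY2 stands in for the second CITY rule so that this
# one-rule-per-key dictionary stays deterministic.

RULES = {
    "QUERY": ("What is CITY", "answer(CITY)"),
    "CITY":  ("the capital CITY2", "capital(CITY2)"),
    "CITY2": ("of STATE", "loc_2(STATE)"),
    "STATE": ("Ohio", "stateid('ohio')"),
}

def derive(symbol):
    """Expand one non-terminal, returning (NL string, MR string)."""
    pattern, template = RULES[symbol]
    nl_tokens = []
    for tok in pattern.split():
        if tok in RULES:                       # non-terminal: recurse
            sub_nl, sub_mr = derive(tok)
            nl_tokens.append(sub_nl)
            template = template.replace(tok, sub_mr)
        else:                                  # terminal NL word
            nl_tokens.append(tok)
    return " ".join(nl_tokens), template

print(derive("QUERY"))
# -> ('What is the capital of Ohio', "answer(capital(loc_2(stateid('ohio'))))")
```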

24 Parsing Model of WASP
N (non-terminals) = {QUERY, CITY, STATE, …}
S (start symbol) = QUERY
Tm (MRL terminals) = {answer, capital, loc_2, (, ), …}
Tn (NL words) = {What, is, the, capital, of, Ohio, …}
L (lexicon) = the set of SCFG productions, e.g.:
QUERY → What is CITY / answer(CITY)
CITY → the capital CITY / capital(CITY)
CITY → of STATE / loc_2(STATE)
STATE → Ohio / stateid('ohio')
λ (parameters of the probabilistic model) = ?

25 Probabilistic Parsing Model
Derivation d1 (STATE reading of “Ohio”) uses:
STATE → Ohio / stateid('ohio')
CITY → capital CITY / capital(CITY)
CITY → of STATE / loc_2(STATE)
[Figure: paired trees for “capital of Ohio” yielding capital(loc_2(stateid('ohio'))).]

26 Probabilistic Parsing Model
Derivation d2 (RIVER reading of “Ohio”) uses:
RIVER → Ohio / riverid('ohio')
CITY → capital CITY / capital(CITY)
CITY → of RIVER / loc_2(RIVER)
[Figure: paired trees for “capital of Ohio” yielding capital(loc_2(riverid('ohio'))).]

27 Probabilistic Parsing Model
Each rule carries a weight λ; a derivation's score is the sum of the weights of the rules it uses.
d1's rule weights (0.5, 0.3, 0.5) sum to 1.3; d2's rule weights (0.5, 0.05, 0.5) sum to 1.05. So:
Pr(d1 | capital of Ohio) = exp(1.3) / Z
Pr(d2 | capital of Ohio) = exp(1.05) / Z
where Z is a normalization constant.
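To make the arithmetic concrete, here is a minimal Python sketch of this computation; it assumes d1 and d2 are the only derivations of “capital of Ohio”, so Z normalizes over just those two.

```python
import math

# Reproducing the slide's arithmetic: a derivation's score is the sum
# of its rule weights, and Pr(d|e) = exp(score(d)) / Z. Here Z sums over
# only d1 and d2, assumed to be the only derivations of the phrase.

d1_weights = [0.5, 0.3, 0.5]    # rules in the STATE reading of "Ohio"
d2_weights = [0.5, 0.05, 0.5]   # rules in the RIVER reading of "Ohio"

scores = [sum(d1_weights), sum(d2_weights)]    # 1.3 and 1.05
Z = sum(math.exp(s) for s in scores)           # normalization constant
probs = [math.exp(s) / Z for s in scores]
print(probs)  # d1 (STATE) is preferred: ~0.56 vs ~0.44
```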

28 Parsing Model of WASP
N (non-terminals) = {QUERY, CITY, STATE, …}
S (start symbol) = QUERY
Tm (MRL terminals) = {answer, capital, loc_2, (, ), …}
Tn (NL words) = {What, is, the, capital, of, Ohio, …}
L (lexicon) = the set of SCFG productions, e.g.:
QUERY → What is CITY / answer(CITY)
CITY → the capital CITY / capital(CITY)
CITY → of STATE / loc_2(STATE)
STATE → Ohio / stateid('ohio')
λ (parameters of the probabilistic model): learned as described in the following slides.

29 Overview of WASP
Training: from the training set {(e, f)} and an unambiguous CFG of the MRL, lexical acquisition produces the lexicon L, and parameter estimation produces a parsing model parameterized by λ.
Testing: semantic parsing maps an input sentence e' to an output MR f'.

30 Lexical Acquisition
Transformation rules are extracted from word alignments between an NL sentence, e, and its correct MR, f, for each training example (e, f).

31 Word Alignments
A mapping from French words to their meanings expressed in English:
English: And the program has been implemented
French: Le programme a été mis en application

32 Lexical Acquisition
– Train a statistical word alignment model (IBM Model 5) on the training set.
– Obtain the most probable n-to-1 word alignments for each training example.
– Extract transformation rules from these word alignments.
– The lexicon L consists of all extracted transformation rules.

33 Word Alignment for Semantic Parsing
How can syntactic tokens such as parentheses be aligned?
NL: The goalie should always stay in our half
MR (tokenized): ( ( true ) ( do our { 1 } ( pos ( half our ) ) ) )

34 Use of MRL Grammar
Instead of aligning to raw MR tokens, NL words are aligned (n-to-1) to the productions in the top-down, left-most derivation of the unambiguous MRL CFG:
RULE → (CONDITION DIRECTIVE)
CONDITION → (true)
DIRECTIVE → (do TEAM {UNUM} ACTION)
TEAM → our
UNUM → 1
ACTION → (pos REGION)
REGION → (half TEAM)
TEAM → our
NL: The goalie should always stay in our half

35 Extracting Transformation Rules
“our” is aligned to the production TEAM → our, so the extracted rule is:
TEAM → our / our

36 Extracting Transformation Rules
“half” is aligned to REGION → (half TEAM); combining it with the already-covered TEAM yields:
REGION → TEAM half / (half TEAM)
In this example, REGION derives (half our).

37 Extracting Transformation Rules
“stay in” is aligned to ACTION → (pos REGION); combining it with the covered REGION yields:
ACTION → stay in REGION / (pos REGION)
In this example, ACTION derives (pos (half our)). A simplified sketch of this extraction step follows.
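As a rough illustration of this step, the sketch below pairs each MRL production with its aligned NL words plus placeholders for its child non-terminals. It is heavily simplified: real WASP extraction respects word spans and ordering constraints (note that the slide orders the REGION rule's pattern as “TEAM half”), and all data structures here are hypothetical.

```python
# Simplified sketch of WASP-style rule extraction: each MRL production in
# the derivation is paired with the NL words aligned to it, plus
# placeholders for its child non-terminals. Word order within the pattern
# is glossed over (words-then-children here), unlike the real algorithm.

def extract_rule(production, aligned_words, child_nonterminals):
    """production: (lhs, mrl_template) from the MRL derivation;
    aligned_words: NL words linked to this production by the alignment;
    child_nonterminals: non-terminals of the production's children."""
    lhs, template = production
    pattern = aligned_words + child_nonterminals
    return f"{lhs} -> {' '.join(pattern)} / {template}"

print(extract_rule(("TEAM", "our"), ["our"], []))
# TEAM -> our / our
print(extract_rule(("REGION", "(half TEAM)"), ["half"], ["TEAM"]))
# REGION -> half TEAM / (half TEAM)   (the slide orders it: TEAM half)
print(extract_rule(("ACTION", "(pos REGION)"), ["stay", "in"], ["REGION"]))
# ACTION -> stay in REGION / (pos REGION)
```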

38 Probabilistic Parsing Model
Based on a maximum-entropy model:
Pr_λ(d | e) = exp(Σ_i λ_i f_i(d)) / Z_λ(e)
The features f_i(d) count the number of times each transformation rule is used in the derivation d.
The output translation is the yield of the most probable derivation.

39 Parameter Estimation
– Maximum conditional log-likelihood criterion.
– Since correct derivations are not included in the training data, the parameters λ* are learned in an unsupervised manner.
– EM algorithm combined with improved iterative scaling, where the hidden variables are the correct derivations (Riezler et al., 2000).

40 KRISP: Kernel-based Robust Interpretation by Semantic Parsing
– Learns a semantic parser from NL sentences paired with their respective MRs, given the MRL grammar.
– Productions of the MRL are treated as semantic concepts.
– An SVM classifier is trained for each production, using a string subsequence kernel (sketched below).
– These classifiers are used to compositionally build the MRs of sentences.
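For intuition, here is a gap-free simplification of a word-level string subsequence kernel: it counts the common (possibly non-contiguous) word subsequences of a given length shared by two sentences. KRISP's actual kernel (following Lodhi et al., 2002) additionally downweights subsequences by the size of their gaps.

```python
from functools import lru_cache

# Gap-free simplification of a string subsequence kernel: count pairs of
# matching (possibly non-contiguous) subsequence occurrences of length k
# shared by word sequences s and t. No gap-decay factor is applied.

def subseq_kernel(s, t, k):
    s, t = tuple(s), tuple(t)

    @lru_cache(maxsize=None)
    def dp(i, j, m):
        # matching occurrence pairs of length m within s[:i] and t[:j]
        if m == 0:
            return 1
        if i < m or j < m:
            return 0
        total = dp(i - 1, j, m) + dp(i, j - 1, m) - dp(i - 1, j - 1, m)
        if s[i - 1] == t[j - 1]:
            total += dp(i - 1, j - 1, m - 1)
        return total

    return dp(len(s), len(t), k)

print(subseq_kernel("our player two has the ball".split(),
                    "the ball is ours".split(), 2))
# -> 1: the single common ordered pair ('the', 'ball')
```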

41 Experimental Corpora
CLang:
– 300 randomly selected pieces of coaching advice from the log files of the 2003 RoboCup Coach Competition
– 22.52 words on average per NL sentence
– 14.24 tokens on average per formal expression
GeoQuery [Zelle & Mooney, 1996]:
– 250 queries for the given U.S. geography database
– 6.87 words on average per NL sentence
– 5.32 tokens on average per formal expression
– also translated into Spanish, Turkish, and Japanese

42 Experimental Methodology
Evaluated using standard 10-fold cross validation.
Correctness criteria:
– CLang: the output exactly matches the correct representation.
– GeoQuery: the resulting query retrieves the same answer as the correct representation.
Metrics: precision and recall (see the learning curves that follow).
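A minimal sketch of this methodology, with hypothetical helper names (train_parser, execute) standing in for the actual systems:

```python
# Sketch of the evaluation setup: 10-fold cross validation with the two
# domain-specific correctness criteria above. train_parser and execute
# are hypothetical stand-ins, not real APIs.

def clang_correct(predicted_mr, gold_mr):
    return predicted_mr == gold_mr          # exact match required

def geoquery_correct(predicted_mr, gold_mr, execute):
    # execute(mr) is assumed to run a query against the geography
    # database and return its answer; bind it with functools.partial
    # to fit the two-argument interface below.
    return execute(predicted_mr) == execute(gold_mr)

def ten_fold_accuracy(examples, train_parser, correct):
    """examples: list of (nl, gold_mr) pairs; train_parser(train) returns
    a function mapping an NL sentence to a predicted MR."""
    folds = [examples[i::10] for i in range(10)]
    scores = []
    for i in range(10):
        train = [ex for j, f in enumerate(folds) if j != i for ex in f]
        parser = train_parser(train)
        scores.append(sum(correct(parser(nl), mr) for nl, mr in folds[i])
                      / len(folds[i]))
    return sum(scores) / len(scores)
```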

43 Precision Learning Curve for CLang [figure]

44 Recall Learning Curve for CLang [figure]

45 Precision Learning Curve for GeoQuery [figure]

46 Recall Learning Curve for GeoQuery [figure]

47 Precision Learning Curve for GeoQuery (WASP) [figure]

48 Recall Learning Curve for GeoQuery (WASP) [figure]

49 Tactical Natural Language Generation
Mapping a formal MR into NL.
Can be done using statistical machine translation:
– previous work focuses on using generation in interlingual MT (Hajič et al., 2004);
– there has been little, if any, research on exploiting statistical MT methods for generation.

50 Tactical Generation
Can be seen as the inverse of semantic parsing:
Semantic parsing: The goalie should always stay in our half → ((true) (do our {1} (pos (half our))))
Tactical generation: ((true) (do our {1} (pos (half our)))) → The goalie should always stay in our half

51 Generation by Inverting WASP
The same synchronous grammar is used for both generation and semantic parsing, e.g.:
QUERY → What is CITY / answer(CITY)
Semantic parsing: NL is the input and MRL the output. Tactical generation: MRL is the input and NL the output.

52 Generation by Inverting WASP
– Same procedure for lexical acquisition.
– The chart generator is very similar to the chart parser, but treats the MRL as input.
– Log-linear probabilistic model inspired by Pharaoh (Koehn et al., 2003), a phrase-based MT system.
– Uses a bigram language model for the target NL.
The resulting system is called WASP⁻¹. A toy sketch of rule inversion follows.
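As a toy illustration of running the same rules in reverse, the sketch below matches an MR against each rule's template and emits the corresponding NL pattern, recursing on sub-expressions. It is deterministic and string-based; the real WASP⁻¹ uses a chart generator with a log-linear model and a bigram language model rather than this greedy matching.

```python
# Toy sketch of generation by rule inversion: match the MR against each
# rule's template, emit the NL pattern, and recurse on the sub-MR.

RULES = [  # (non-terminal, NL pattern, MR template), as on slide 23
    ("QUERY", "What is CITY", "answer(CITY)"),
    ("CITY",  "the capital CITY", "capital(CITY)"),
    ("CITY",  "of STATE", "loc_2(STATE)"),
    ("STATE", "Ohio", "stateid('ohio')"),
]

def generate(mr: str) -> str:
    for _, pattern, template in RULES:
        for nt in ("QUERY", "CITY", "STATE"):
            if nt in template:
                # split template into functor wrapper and argument slot
                prefix, suffix = template.split(nt)
                if mr.startswith(prefix) and mr.endswith(suffix):
                    inner = mr[len(prefix):len(mr) - len(suffix)]
                    return pattern.replace(nt, generate(inner))
        if template == mr:                      # terminal rule
            return pattern
    raise ValueError(f"no rule matches {mr}")

print(generate("answer(capital(loc_2(stateid('ohio'))))"))
# -> What is the capital of Ohio
```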

53 GeoQuery (NIST score; English) [figure]

54 RoboCup (NIST score; English) [figure; note: contiguous phrases only]
Human evaluations gave similar results in terms of fluency and adequacy.

55 Future Work
– Explore methods that can automatically generate SAPTs, to minimize the annotation effort for SCISSOR.
– Learn semantic parsers just from sentences paired with “perceptual context.”

56 Conclusions
– Learning semantic parsers is an important and challenging problem in natural-language learning.
– We have obtained promising results on several applications using a variety of approaches with different strengths and weaknesses.
– One of our semantic parsers has been inverted to produce a generation system.
– Few others have explored this problem; we encourage others to consider it.
– More and larger corpora are needed for training and testing semantic-parser induction.

57 Thank You!
Our papers on learning semantic parsers are on-line at: http://www.cs.utexas.edu/~ml/publication/lsp.html
Our corpora can be downloaded from: http://www.cs.utexas.edu/~ml/nldata.html
Try our GeoQuery demo at: http://www.cs.utexas.edu/~ml/geo.html
Questions?

58 PR Curves
– SCISSOR, WASP, and KRISP assign probabilities to their semantic derivations, which are taken as confidences in the output MRs.
– We plot precision-recall (PR) curves at the last points of the learning curves: the best MR for each sentence is sorted by confidence, and precision is computed at every recall value. (A sketch of this construction follows.)
– The result of COCKTAIL on GeoQuery is shown as a point on the PR curve; its result on CLang is not shown because COCKTAIL failed to run at the last point of the learning curve.
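A minimal sketch of this PR-curve construction, assuming for simplicity that every sentence receives an output MR (sentences without one would lower recall but never appear in the sorted list):

```python
# Sketch of the PR-curve construction: sort each sentence's best MR by
# the parser's confidence, then sweep down the list, computing precision
# among the MRs above the threshold and recall among all sentences.

def pr_curve(results):
    """results: list of (confidence, is_correct) pairs, one per sentence.
    Assumes every sentence produced an MR; otherwise the recall
    denominator should be the full number of test sentences."""
    results = sorted(results, key=lambda r: -r[0])  # most confident first
    total = len(results)
    points, correct = [], 0
    for rank, (_, ok) in enumerate(results, start=1):
        correct += ok
        precision = correct / rank      # among MRs above the threshold
        recall = correct / total        # among all sentences
        points.append((recall, precision))
    return points

print(pr_curve([(0.9, True), (0.8, True), (0.6, False), (0.4, True)]))
# [(0.25, 1.0), (0.5, 1.0), (0.5, 0.667), (0.75, 0.75)] (approx.)
```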

59 PR Curve for CLang [figure]

60 PR Curve for GeoQuery [figure]

61 Precision Learning Curve for GeoQuery (880) [figure]

62 Recall Learning Curve for GeoQuery (880) [figure]

63 PR Curve for GeoQuery (880) [figure]

64 Precision Learning Curve for GeoQuery (WASP with lambda-calculus) [figure]

65 Recall Learning Curve for GeoQuery (WASP with lambda-calculus) [figure]

