
University of Texas at Austin, Machine Learning Group, Department of Computer Sciences. Learning Semantic Parsers Using Statistical Syntactic Parsing Techniques. February 8, 2006. Ruifang Ge. Supervising Professor: Raymond J. Mooney

2 Semantic Parsing Semantic parsing: mapping a natural-language sentence to a complete, detailed, and formal meaning representation (MR) in a meaning representation language Applications –Core component in practical spoken-language systems: JUPITER (MIT weather information system), MERCURY (MIT flight-reservation system) –Advice taking (Kuhlmann et al., 2004)

3 CLang: RoboCup Coach Language In the RoboCup Coach competition, teams compete to coach simulated soccer players The coaching instructions are given in a formal language called CLang Example of semantic parsing into CLang: "If our player 2 has the ball, our player 4 should stay in our half" → ((bowner our {2}) (do our {4} (pos (half our))))

4 Motivating Example Semantic parsing is a compositional process: sentence structure is needed for building the meaning representation. "If our player 2 has the ball, our player 4 should stay in our half" → ((bowner our {2}) (do our {4} (pos (half our))))

5 Roadmap Related work on semantic parsing SCISSOR Experimental results Proposed work Conclusions

6 Category I: Syntax-Based Approaches Meaning composition follows the tree structure of a syntactic parse The meaning of a constituent is composed from the meanings of its sub-constituents in the syntactic parse –specified using syntactic relations and semantic constraints in application domains Examples: Miller et al. (1996), Zettlemoyer & Collins (2005)

7 Category I: Example For "our player 2 has the ball", meaning composition follows the syntactic parse, with a semantic label attached to each node:

S-bowner(player(our,2))
 ├─ NP-player(our,2)
 │   ├─ PRP$-our          our
 │   ├─ NN-player(_,_)    player    [player(team,unum): requires arguments]
 │   └─ CD-2              2
 └─ VP-bowner(_)                    [bowner(player)]
     ├─ VB-bowner(_)      has
     └─ NP-null
         ├─ DT-null       the       [semantically vacuous]
         └─ NN-null       ball

Arguments are filled bottom-up: player(_,_) takes our and 2 to yield player(our,2), which then fills bowner(_) to yield bowner(player(our,2)). (Slides 8–9 animated this composition step by step.)
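The bottom-up argument filling above can be sketched in a few lines of Python. This is a toy illustration, not the thesis system; the tuple encoding of the tree and the `compose` helper are hypothetical:

```python
def compose(node):
    """Recursively build an MR string from a (template, children) tree.

    Each node carries a format-string template whose argument slots are
    filled by the composed meanings of its children; a leaf's template
    is already a complete MR fragment.
    """
    template, children = node
    if not children:                       # leaf: nothing to fill in
        return template
    args = [compose(c) for c in children]  # meanings of sub-constituents
    return template.format(*args)          # fill the template's slots

# "our player 2 has the ball": player(_,_) takes our and 2,
# and the result fills the argument slot of bowner(_).
tree = ("bowner({0})",
        [("player({0},{1})", [("our", []), ("2", [])])])

print(compose(tree))  # bowner(player(our,2))
```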

10 Category II: Purely Semantics-Driven Approaches No syntactic information is used in building tree structures Non-terminals in this category correspond to semantic concepts in application domains Examples: Tang & Mooney (2001), Kate (2005), Wong (2005)

11 Category II: Example For "our player 2 has the ball", the tree is built from semantic concepts only: bowner at the root (triggered by "has the ball"), with child player (from "player"), whose children are our (from "our") and 2 (from "2").

12 Category III: Hybrid Approaches Utilize syntactic information within semantics-driven parsing approaches –syntactic phrase boundaries –syntactic categories of semantic concepts –word dependencies Example: Kate, Wong & Mooney (2005)

13 Our Approach We introduce an approach in Category I: a syntax-driven approach Reason –Employs state-of-the-art statistical syntactic parsing techniques to help build tree structures for meaning composition –State-of-the-art statistical parsers are becoming increasingly robust and accurate (Collins, 1997; Charniak & Johnson, 2005)

14 Roadmap Related work on semantic parsing SCISSOR Experimental results Proposed work Conclusions

15 SCISSOR: Semantic Composition that Integrates Syntax and Semantics to get Optimal Representations

16 SCISSOR An integrated syntax-based approach –Allows both syntax and semantics to be used simultaneously to build meaning representations A statistical parser is used to generate a semantically augmented parse tree (SAPT) A SAPT is translated into a complete formal meaning representation (MR) using a meaning-composition process Example SAPT for "our player 2 has the ball" (MR: bowner(player(our,2))):

S-bowner
 ├─ NP-player
 │   ├─ PRP$-team   our
 │   ├─ NN-player   player
 │   └─ CD-unum     2
 └─ VP-bowner
     ├─ VB-bowner   has
     └─ NP-null
         ├─ DT-null  the
         └─ NN-null  ball

17 SCISSOR (cont.) Allows statistical modeling of semantic selectional constraints in application domains –e.g., (AGENT pass) = PLAYER

18 Overview of SCISSOR Training: SAPT training examples → learner → integrated semantic parser. Testing: NL sentence → integrated semantic parser → SAPT → ComposeMR → MR.

19 Extending Collins' (1997) Syntactic Parsing Model Collins (1997) introduced a lexicalized, head-driven syntactic parsing model Bikel (2004) provides an easily extended open-source version of the Collins parser We extend the parsing model to generate semantic labels simultaneously with syntactic labels, constrained by semantic constraints in application domains

20 Example: Probabilistic Context-Free Grammar (PCFG) Parse of "our player 2 has the ball" with rule probabilities:

S → NP VP            0.4
NP → PRP$ NN CD      0.06
VP → VB NP           0.3
PRP$ → our           0.01
NN → player          …
CD → 2               …
VB → has             0.02
NN → ball            0.01
DT → the             0.1

P(Tree, S) = 0.4 × 0.06 × 0.3 × … × 0.01. Rule probabilities are independent of the words involved.
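The product above can be computed directly. This is a toy sketch: it scores a partial derivation as the product of its rule probabilities, using only the rules whose probabilities survive in the slide:

```python
# A PCFG assigns each rule a probability; the probability of a parse is
# the product over the rules it uses, regardless of lexical context --
# exactly the independence assumption that lexicalization later removes.
from functools import reduce

pcfg = {
    ("S", ("NP", "VP")): 0.4,
    ("NP", ("PRP$", "NN", "CD")): 0.06,
    ("VP", ("VB", "NP")): 0.3,
    ("PRP$", ("our",)): 0.01,
}

def tree_prob(rules_used):
    """P(Tree, S) = product of the probabilities of the rules used."""
    return reduce(lambda p, r: p * pcfg[r], rules_used, 1.0)

used = [("S", ("NP", "VP")), ("NP", ("PRP$", "NN", "CD")),
        ("VP", ("VB", "NP")), ("PRP$", ("our",))]
print(tree_prob(used))  # 0.4 * 0.06 * 0.3 * 0.01
```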

21 Example: Lexicalized PCFG The same parse with head words propagated up the tree (heads shown in parentheses):

S(has)
 ├─ NP(player)   our player 2
 └─ VP(has)
     ├─ VB       has
     └─ NP(ball) the ball

Each non-terminal's head word comes from the head child of its rule.

22 Example: Estimating Rule Probability The expansion of a non-terminal is decomposed into primitive steps, e.g., for S(has) → NP(player) VP(has):

P(NP(player) VP(has) | S(has)) = P(VP(has) | S(has)) × P(NP(player) | S(has), VP(has))

In Collins' model, syntactic subcategorization frames are used to constrain the generation of modifiers; e.g., "has" requires an NP as its subject.

23 Integrating Semantics into the Model Non-terminals now carry both syntactic and semantic labels:

S-bowner(has)
 ├─ NP-player(player)
 │   ├─ PRP$-team   our
 │   ├─ NN-player   player
 │   └─ CD-unum     2
 └─ VP-bowner(has)
     ├─ VB-bowner   has
     └─ NP-null(ball)
         ├─ DT-null  the
         └─ NN-null  ball

24 Estimating Rule Probability Including Semantic Labels Step 1, generate the head child: for the rule S-bowner(has) → NP-player(player) VP-bowner(has), P_h(VP-bowner | S-bowner, has)

25 Estimating Rule Probability Including Semantic Labels (cont.) Step 2, generate the subcategorization frames on each side of the head: P_h(VP-bowner | S-bowner, has) × P_lc({NP}-{player} | S-bowner, VP-bowner, has) × P_rc({}-{} | S-bowner, VP-bowner, has) Here {NP} is the syntactic constraint to the left of the head and {player} the semantic constraint. ("has" also requires an NP as its object, but that is generated within the VP.)

26 Estimating Rule Probability Including Semantic Labels (cont.) Step 3, generate the modifier conditioned on the frame: P_h(VP-bowner | S-bowner, has) × P_lc({NP}-{player} | S-bowner, VP-bowner, has) × P_rc({}-{} | S-bowner, VP-bowner, has) × P_d(NP-player(player) | S-bowner, VP-bowner, has, LEFT, {NP}-{player}) Generating NP-player(player) satisfies and empties the left subcat frame (slide 27 showed the frame emptied after this step).
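The three-step decomposition on slides 24–26 multiplies a head, two subcat-frame, and a modifier probability. The sketch below mirrors that product; the probability values are made up, standing in for the frequency-count estimates the real model uses:

```python
# Toy tables of conditional probabilities (hypothetical values).
# In the real parser these come from frequency counts over SAPTs.
P_h  = {("VP-bowner", "S-bowner", "has"): 0.8}
P_lc = {("{NP}-{player}", "S-bowner", "VP-bowner", "has"): 0.7}
P_rc = {("{}-{}", "S-bowner", "VP-bowner", "has"): 0.9}
P_d  = {("NP-player(player)", "S-bowner", "VP-bowner", "has",
         "LEFT", "{NP}-{player}"): 0.6}

def rule_prob():
    """P(S-bowner(has) -> NP-player(player) VP-bowner(has)) as the
    product: head x left subcat x right subcat x modifier."""
    return (P_h[("VP-bowner", "S-bowner", "has")]
            * P_lc[("{NP}-{player}", "S-bowner", "VP-bowner", "has")]
            * P_rc[("{}-{}", "S-bowner", "VP-bowner", "has")]
            * P_d[("NP-player(player)", "S-bowner", "VP-bowner", "has",
                   "LEFT", "{NP}-{player}")])

print(rule_prob())  # 0.8 * 0.7 * 0.9 * 0.6
```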

28 Parser Implementation Supervised training on annotated SAPTs is just frequency counting An augmented smoothing technique accounts for the additional data sparsity created by semantic labels Test sentences are parsed to find the most probable SAPT using a variant of the standard CKY chart-parsing algorithm
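A minimal probabilistic CKY chart over binary rules gives the flavor of the chart parsing mentioned above; the real parser handles lexicalized, semantically augmented labels and n-ary rules, so this is only a sketch with a toy grammar:

```python
# CKY: fill a chart of best probabilities for each (span, label),
# combining adjacent sub-spans bottom-up.
from collections import defaultdict

def cky(words, lexicon, rules):
    """lexicon: word -> [(tag, prob)]; rules: (B, C) -> [(A, prob)].
    Returns the best probability for each (i, j, label) span."""
    n = len(words)
    chart = defaultdict(float)                 # (i, j, label) -> best prob
    for i, w in enumerate(words):              # seed length-1 spans
        for tag, p in lexicon[w]:
            chart[(i, i + 1, tag)] = p
    for span in range(2, n + 1):               # grow spans bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):          # split point
                for (b, c), parents in rules.items():
                    pb, pc = chart[(i, k, b)], chart[(k, j, c)]
                    if pb and pc:
                        for a, p in parents:
                            cand = p * pb * pc
                            if cand > chart[(i, j, a)]:
                                chart[(i, j, a)] = cand
    return chart

lexicon = {"our": [("PRP$", 1.0)], "player": [("NN", 1.0)]}
rules = {("PRP$", "NN"): [("NP", 0.5)]}
chart = cky(["our", "player"], lexicon, rules)
print(chart[(0, 2, "NP")])  # 0.5
```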

29 Roadmap Related work on semantic parsing SCISSOR Experimental results Proposed work Conclusions

30 Experimental Corpora CLang –300 rules randomly selected from the log files of the 2003 RoboCup Coach Competition –Coaching advice annotated with NL sentences by 4 annotators independently –22.52 words per sentence GeoQuery (Zelle & Mooney, 1996) –250 queries for a U.S. geography database –6.87 words per sentence

31 Experimental Methodology Evaluated using standard 10-fold cross-validation Correctness –CLang: output exactly matches the correct representation –GeoQuery: query retrieves the correct answer

32 Experimental Methodology: Metrics Precision: percentage of produced MRs that are correct Recall: percentage of all test sentences for which a correct MR is produced F-measure: harmonic mean of precision and recall
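These metrics follow the usual convention in this line of work: precision is taken over the sentences the parser produced an output for, recall over all test sentences. A sketch with made-up counts:

```python
# Precision / recall / F-measure for semantic parsing evaluation.
def prf(n_correct, n_parsed, n_total):
    """n_correct: correct MRs; n_parsed: sentences with any output;
    n_total: all test sentences."""
    precision = n_correct / n_parsed
    recall = n_correct / n_total
    f = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f

# Hypothetical counts: 80 correct MRs, 90 parsed, 100 test sentences.
p, r, f = prf(n_correct=80, n_parsed=90, n_total=100)
print(round(p, 3), round(r, 3), round(f, 3))
```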

33 Compared Systems COCKTAIL (Tang & Mooney, 2001) –A purely semantics-driven approach that learns a deterministic shift-reduce parser using inductive logic programming techniques WASP (Wong, 2005) –A purely semantics-driven approach using machine-translation techniques KRISP (Kate, 2005) –A purely semantics-driven approach based on string kernels All of the above learn from sentences paired with meaning representations; SCISSOR needs extra annotation (SAPTs)

34 Precision Learning Curve for CLang [chart omitted; annotation: deterministic parsing caused a memory overflow]

35 Recall Learning Curve for CLang [chart omitted]

36 F-measure Learning Curve for CLang [chart omitted; SCISSOR is significantly better at the 95% confidence level]

37 Results on Sentences in Different Length Ranges How does sentence complexity affect parsing performance? Sentence complexity is difficult to measure, so we use sentence length as an indicator

38 Sentence Length Distribution (CLang)

39 Detailed CLang Results by Sentence Length Syntactic structure is needed for longer sentences, where semantic constraints alone cannot sufficiently eliminate ambiguity

40 Precision Learning Curve for GeoQuery

41 Recall Learning Curve for GeoQuery

42 F-measure Learning Curve for GeoQuery Not significantly better at the 95% confidence interval

43 Zettlemoyer & Collins (2005) Introduces a syntax-based semantic parser built on combinatory categorial grammar (CCG) (Steedman, 2000) Requires a set of hand-built rules specifying the possible syntactic categories for each type of semantic concept

44 Zettlemoyer & Collins (2005) (cont.) Provides results on a larger GeoQuery dataset (880 examples) –Using a different experimental setup –Prec/Recall: 96.25/79.29 (SCISSOR Prec/Recall: 92.08/72.27) Performance on more complex domains such as CLang is unclear –Would require designing another set of hand-built template rules

45 Roadmap Related work on semantic parsing SCISSOR Experimental results Proposed work –Discriminative Reranking for Semantic Parsing –Automating the SAPT-Generation –Other issues Conclusions

46 Reranking for Semantic Parsing Pipeline: input sentence → SCISSOR → current ranked SAPTs (S1, S2, S3, S4; scored with local features) → reranker (global features) → SAPTs after reranking (e.g., S3, S1, S2, S4). Reranking has been used successfully in parsing, tagging, machine translation, …

47 Reranking Features Collins (2000) introduces syntactic features for reranking syntactic parses –One-level rules: f(NP → PRP$ NN CD) = 1 –Bigrams, two-level rules, … To rerank SAPTs, we can introduce a semantic feature type for each syntactic feature type, based on the coupling of syntax and semantics –Example one-level rule, from the subtree NP-PLAYER → PRP$-TEAM NN-PLAYER CD-UNUM: f(PLAYER → TEAM PLAYER UNUM) = 1
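Extracting one-level-rule features from a tree can be sketched as below; the `(label, children)` tuple encoding is hypothetical, chosen only for the illustration:

```python
# Collect one feature per internal node: "parent -> child labels".
from collections import Counter

def one_level_rules(tree, feats=None):
    """Count one-level-rule features for every internal node of a
    (label, children) tree; leaves have a list of word strings."""
    if feats is None:
        feats = Counter()
    label, children = tree
    if children and isinstance(children[0], tuple):  # internal node
        rhs = " ".join(c[0] for c in children)
        feats[f"{label} -> {rhs}"] += 1
        for c in children:
            one_level_rules(c, feats)
    return feats

sapt = ("NP-PLAYER",
        [("PRP$-TEAM", ["our"]),
         ("NN-PLAYER", ["player"]),
         ("CD-UNUM", ["2"])])
print(one_level_rules(sapt))
```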

48 Reranking Evaluation Rerank the top 50 parses generated by SCISSOR Reranking algorithm: averaged perceptron (Collins, 2002) –Simple, fast, and effective CLang results [P/R/F table; numbers lost in transcription — rows: SCISSOR, oracle score, sem (14.0), syn, sem+syn; the best configuration is significantly better] Reranking does not improve the results on GeoQuery
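An averaged perceptron in the style of Collins (2002) can be sketched as follows: among the k-best SAPTs for each sentence, push the weight vector toward the gold candidate's features, and average the weights over all updates. The feature dicts and training data below are toy examples, not the thesis features:

```python
# Averaged-perceptron reranker over k-best candidate lists.
from collections import defaultdict

def train_reranker(data, epochs=3):
    """data: list of (candidates, gold_index), where each candidate is a
    feature dict {name: value}.  Returns averaged weights."""
    w = defaultdict(float)          # current weights
    total = defaultdict(float)      # running sum for averaging
    steps = 0
    for _ in range(epochs):
        for candidates, gold in data:
            score = lambda f: sum(w[k] * v for k, v in f.items())
            pred = max(range(len(candidates)),
                       key=lambda i: score(candidates[i]))
            if pred != gold:        # standard perceptron update
                for k, v in candidates[gold].items():
                    w[k] += v
                for k, v in candidates[pred].items():
                    w[k] -= v
            for k in list(w):       # accumulate for the average
                total[k] += w[k]
            steps += 1
    return {k: total[k] / steps for k in total}

# One sentence, two candidate parses; the second is gold.
data = [([{"bad": 1.0}, {"good": 1.0}], 1)]
w = train_reranker(data)
print(w["good"] > w["bad"])  # True
```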

49 Further Investigation of Reranking Features Semantic role labeling (SRL) features –Identifying the semantic relations, or semantic roles, of a target word in a given sentence –Example: [giver John] gave [entity-given-to Mary] [thing-given a pen]

50 Roadmap Related work on semantic parsing SCISSOR Experimental results Proposed work –Discriminative Reranking for Semantic Parsing –Automating the SAPT-Generation –Other issues Conclusions

51 Automating the SAPT-Generation Correct SAPTs are not available; only MRs are. Pipeline: training set {(NL, MR)} → syntactic parser → {(NL, MR, SynT)} → discriminative learner → {(NL, MR, SAPT)} → SCISSOR (NL: natural-language sentence; MR: meaning representation; SynT: syntactic parse tree; SAPT: semantically augmented parse tree)

52 Step 1: Obtaining Automatic Syntactic Parses Automatically generated syntactic parses have been used successfully in many NLP tasks High-performance parsers: Collins (1997), Charniak (2000), Hockenmaier & Steedman (2000) Charniak & Johnson (2005) reported the highest F-measure on the Penn Treebank: 91.02%

53 Syntactic F-measure Learning Curve for CLang [chart omitted; annotation: statistics inherent in the application domain reduce generalization error]

54 Syntactic F-measure Learning Curve for GeoQuery

55 Step 2: Discriminating Good SAPTs from Bad SAPTs Generate candidate SAPTs given a syntactic parse tree –Initialize each word with its candidate semantic labels using co-occurrence measures, word-alignment systems, or dictionary-learning methods –Label each non-terminal with a semantic label passed up from one of its children, applying a compositional-semantics function recursively
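The candidate-generation step above can be sketched as an enumeration: each word starts with a set of candidate labels, and each internal node takes its label from one of its children. The tree encoding and helper below are hypothetical, chosen only to illustrate the enumeration:

```python
# Enumerate candidate SAPTs: choose one label per word, then one
# head child per internal node to pass its label up.
from itertools import product

def candidate_sapts(tree):
    """tree: internal nodes are (syn, children); leaves are
    (syn, word, candidate_labels).  Returns all fully labeled trees."""
    if len(tree) == 3:                      # leaf: one tree per label
        syn, word, labels = tree
        return [((syn, lab), word) for lab in labels]
    syn, children = tree
    results = []
    for combo in product(*(candidate_sapts(c) for c in children)):
        for head in combo:                  # pass up one child's label
            lab = head[0][1]
            results.append(((syn, lab), list(combo)))
    return results

tree = ("NP", [("PRP$", "our", ["team"]),
               ("NN", "player", ["player", "null"])])
sapts = candidate_sapts(tree)
print(len(sapts))  # 2 label choices x 2 head choices = 4
```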

56 Step 2: Discriminating Good SAPTs from Bad SAPTs (cont.) Discriminative features: semantic labels of words, predicate-argument pairs, … Maximum entropy (ME) models can be used for learning from incomplete data (Riezler, 2002) –Empirical statistics for training the ME model are acquired by treating SAPTs that lead to correct MRs as correct The training process remains integrated: syntactic parse trees that cannot lead to correct MRs are rejected, and the parser can supply an alternative syntactic parse

57 Roadmap Related work on semantic parsing SCISSOR Experimental results Proposed work –Discriminative Reranking for Semantic Parsing –Automating the SAPT-Generation –Other issues Conclusions

58 Future Work: Other Issues Apply to other application domains –Air Travel Information Service (ATIS) data (Price, 1990) Investigate parsers in the CCG formalism (Hockenmaier & Steedman, 2002; Clark & Curran, 2004) –Elegant treatment of a variety of linguistic phenomena Compare WASP, KRISP, and SCISSOR trained on the same amount of supervision –Sentences annotated with tree structures –Sentences only paired with MRs

59 Conclusions Introduced SCISSOR for semantic parsing Evaluated on two real-world corpora Produced more accurate semantic representations than other approaches, especially on long sentences Future work: –Discriminative reranking for semantic parsing –Automating the SAPT-generation –Other issues

60 Thank You! Questions?