Natural Language Generation with Tree Conditional Random Fields
Wei Lu, Hwee Tou Ng, Wee Sun Lee
Singapore-MIT Alliance / National University of Singapore
EMNLP 2009, 6 August 2009

Natural Language Generation

The task: generating Natural Language (NL) paraphrases for Meaning Representations (MR).

Example output: "How many states do not have rivers?"

[Figure: a natural language sentence paired with its meaning representation]

Meaning Representation (MR)

The MR is a tree of productions. For "how many states do not have rivers?":

QUERY:answer(NUM)
  NUM:count(STATE)
    STATE:exclude(STATE STATE)
      STATE:state(all)
      STATE:loc_1(RIVER)
        RIVER:river(all)
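Such an MR tree is straightforward to represent programmatically. A minimal Python sketch, assuming a simple recursive node type (the class and field names are illustrative, not from the talk):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class MRProduction:
    """One node in a meaning representation tree, e.g. NUM:count(STATE)."""
    semantic_type: str   # left-hand-side type, e.g. "NUM"
    function: str        # function symbol, e.g. "count"
    children: List["MRProduction"] = field(default_factory=list)

    def __str__(self) -> str:
        # Leaf productions are written with the argument "all" on the slide.
        args = " ".join(c.semantic_type for c in self.children) or "all"
        return f"{self.semantic_type}:{self.function}({args})"

# The MR tree from the slide, built bottom-up.
river = MRProduction("RIVER", "river")                    # RIVER:river(all)
loc = MRProduction("STATE", "loc_1", [river])             # STATE:loc_1(RIVER)
state = MRProduction("STATE", "state")                    # STATE:state(all)
exclude = MRProduction("STATE", "exclude", [state, loc])  # STATE:exclude(STATE STATE)
count = MRProduction("NUM", "count", [exclude])           # NUM:count(STATE)
query = MRProduction("QUERY", "answer", [count])          # QUERY:answer(NUM)
print(query)  # -> QUERY:answer(NUM)
```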

Previous Work

Chart generation for surface realization:
- Head-Driven Phrase Structure Grammar (HPSG) (Carroll et al., 1999; Carroll and Oepen, 2005; Nakanishi et al., 2005)
- Combinatory Categorial Grammar (CCG) (White and Baldridge, 2003; White, 2004)

WASP^-1 by Wong and Mooney (2007):
- Views the problem as a statistical machine translation task
- Inverts the semantic parser WASP, incorporating models borrowed from PHARAOH

Hybrid Tree Framework

- Aims to bridge natural language sentences and their underlying meaning representations
- On top of this framework, we built a generative model that jointly generates both the natural language sentence and the MR tree
- Details are presented in our EMNLP 2008 paper on semantic parsing:
  Wei Lu, Hwee Tou Ng, Wee Sun Lee, and Luke S. Zettlemoyer. A Generative Model for Parsing Natural Language to Meaning Representations. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (EMNLP 2008), pages 783–

Hybrid Tree

A hybrid tree attaches the words of an NL-MR pair to the nodes of the MR tree, so that each MR production covers a hybrid sequence of NL words and child MR productions:

QUERY:answer(NUM)
  NUM:count(STATE) : how many STATE_1
    STATE:exclude(STATE STATE) : STATE_1 do not STATE_2
      STATE:state(all) : states
      STATE:loc_1(RIVER) : have RIVER_1
        RIVER:river(all) : rivers

The Joint Generative Model

- Assumes the MR tree and the NL sentence are jointly generated by a top-down recursive Markov process
- Able to handle re-ordering of nodes (MR productions) in the MR tree during the generation process
- The generation process results in a hybrid tree
- Shown to be effective for semantic parsing

NLG with Hybrid Trees

To find the most probable NL sentence w for a given MR m:
1. Find the most probable hybrid tree T: T* = argmax_T p(T | m)
2. The most probable NL sentence w is the yield of that hybrid tree: w* = yield(T*)

Different assumptions can be made for finding the most probable hybrid tree T. We present two models:
- Baseline: Direct Inversion Model
- Tree Conditional Random Field Model
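Step 2 can be read directly off the hybrid tree: walk it recursively, substituting each child slot in a node's hybrid sequence with the yield of the corresponding child. A minimal Python sketch, assuming an illustrative node layout in which child slots are written as SEMTYPE_k (the HNode class and slot convention are assumptions, not from the talk):

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class HNode:
    """A hybrid-tree node: an MR production plus its hybrid sequence.
    Children are listed in the order their slots appear in the sequence."""
    production: str
    hseq: Tuple[str, ...]
    children: List["HNode"] = field(default_factory=list)

def tree_yield(node: HNode) -> List[str]:
    """w* = yield(T*): expand child slots recursively, left to right."""
    words, child_iter = [], iter(node.children)
    for token in node.hseq:
        if "_" in token and token.rsplit("_", 1)[-1].isdigit():
            words.extend(tree_yield(next(child_iter)))  # a child slot
        else:
            words.append(token)                          # an NL word
    return words

# The running example tree.
rivers = HNode("RIVER:river(all)", ("rivers",))
have = HNode("STATE:loc_1(RIVER)", ("have", "RIVER_1"), [rivers])
states = HNode("STATE:state(all)", ("states",))
exclude = HNode("STATE:exclude(STATE STATE)",
                ("STATE_1", "do", "not", "STATE_2"), [states, have])
count = HNode("NUM:count(STATE)", ("how", "many", "STATE_1"), [exclude])
print(" ".join(tree_yield(count)))  # -> how many states do not have rivers
```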

Direct Inversion Model

[Figure: the MR tree for the running example, with each production independently generating the phrase below it — "how many" under NUM:count(STATE), "do not" under STATE:exclude, "states", "have", "rivers" — filling the NUM, STATE_1, STATE_2, and RIVER slots]

Direct Inversion Model

Direct inversion of the semantic parser:
- The distance d_β(w1, w2) = -log θ(w2 | β, w1) is non-negative, where β is the current MR production
- Find the most probable sequence from BEGIN to END
- The problem is equivalent to a shortest-path problem

[Figure: a lattice for β = NUM:count(STATE), with paths such as BEGIN → how → many → STATE_1 → END and BEGIN → the → number → of → STATE_1 → END, each edge weighted by d_β]
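A minimal Python sketch of this shortest-path decoding, assuming the bigram model θ(w2 | β, w1) for a fixed production β is given as a nested dictionary (the function and data layout are illustrative, not the authors' implementation):

```python
import heapq
import math

def decode_direct_inversion(theta_beta):
    """Dijkstra search from BEGIN to END.

    theta_beta[w1][w2] is the bigram probability theta(w2 | beta, w1);
    each edge costs -log of that probability, so the shortest path is
    the most probable hybrid sequence for this production.
    """
    frontier = [(0.0, "BEGIN", ["BEGIN"])]  # (cost so far, word, path)
    best = {}
    while frontier:
        cost, w1, path = heapq.heappop(frontier)
        if w1 == "END":
            return path, cost
        if w1 in best and best[w1] <= cost:
            continue  # already reached this word more cheaply
        best[w1] = cost
        for w2, prob in theta_beta.get(w1, {}).items():
            heapq.heappush(frontier, (cost - math.log(prob), w2, path + [w2]))
    return None, math.inf

# Toy model for beta = NUM:count(STATE); probabilities are made up.
theta = {
    "BEGIN": {"how": 0.6, "the": 0.4},
    "how": {"many": 1.0},
    "many": {"STATE_1": 1.0},
    "the": {"number": 1.0},
    "number": {"of": 1.0},
    "of": {"STATE_1": 1.0},
    "STATE_1": {"END": 1.0},
}
print(decode_direct_inversion(theta))  # -> BEGIN how many STATE_1 END
```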

Direct Inversion Model

Problems with the direct inversion model:
- Strong independence assumptions
- Always generates the same phrase below the same MR production, regardless of the context MR productions
- Models dependencies at the word level only

We need to model dependencies between adjacent hybrid sequences.

NLG with Hybrid Trees

Tree Conditional Random Fields (CRF) Model:
- Generates complete phrases instead of words
- Explicitly models dependencies between adjacent phrases

Hybrid sequences for the running example:
  NUM:count(STATE)           : BEGIN how many STATE_1 END
  STATE:exclude(STATE STATE) : BEGIN STATE_1 do not STATE_2 END
  STATE:loc_1(RIVER)         : BEGIN have RIVER_1 END
  STATE:state(all)           : BEGIN states END

Tree CRF Model

Example hybrid tree for "what is the longest river that does not run through texas":

QUERY:answer(RIVER) : what is RIVER_1
  RIVER:longest(RIVER) : the longest RIVER_1
    RIVER:exclude(RIVER_1, RIVER_2) : RIVER_1 that does not RIVER_2
      RIVER:river(all) : river
      RIVER:traverse(STATE) : run through STATE_1
        STATE:stateid(STATENAME) : STATENAME_1
          STATENAME:texas : texas

Four sets of features:
1. Hybrid Sequence Features
2. Two-level Hybrid Sequence Features
3. Three-level Hybrid Sequence Features
4. Adjacent Hybrid Sequence Features

Features for Tree CRF Model

[Four figure slides: the same hybrid tree, highlighting in turn the Hybrid Sequence Features, the Two-level Hybrid Sequence Features, the Three-level Hybrid Sequence Features, and the Adjacent Hybrid Sequence Features]
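A sketch of how these four templates might look as feature extractors for a single node; the pairing of each template with parent, grandparent, and neighboring-sequence context is our illustrative reading of the slide titles, not the paper's exact definitions:

```python
def node_features(production, hseq,
                  parent=None, grandparent=None, neighbor_hseq=None):
    """Return the feature instances fired for one hybrid-tree node.

    `production` is an MR production string, `hseq` its hybrid sequence
    (a tuple of words and child slots); the other arguments supply tree
    context when available.
    """
    feats = [("hseq", production, hseq)]               # 1. hybrid sequence
    if parent is not None:                             # 2. two-level
        feats.append(("hseq-2", parent, production, hseq))
    if parent is not None and grandparent is not None:  # 3. three-level
        feats.append(("hseq-3", grandparent, parent, production, hseq))
    if neighbor_hseq is not None:                      # 4. adjacent sequences
        feats.append(("adjacent", neighbor_hseq, hseq))
    return feats

# Features for RIVER:longest(RIVER) in the example tree.
print(node_features(
    "RIVER:longest(RIVER)", ("the", "longest", "RIVER_1"),
    parent="QUERY:answer(RIVER)",
    neighbor_hseq=("what", "is", "RIVER_1")))
```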

Strengths of the Tree CRF Model

- Allows features that specifically model the dependencies between neighboring hybrid sequences in the tree
- Can efficiently capture long-range dependencies between MR productions and hybrid sequences, since each hybrid sequence is allowed to depend on the entire MR tree

Tree CRF Model

Candidate hybrid sequences:
- Each MR production is associated with a set of candidate hybrid sequences
- In the tree CRF, the correct hybrid sequence for each MR production is hidden

Example candidates for NUM:count(STATE):
  BEGIN how many STATE_1 END
  BEGIN how many STATE_1 are there END
  BEGIN what is the number of STATE_1 END
  BEGIN count the number of STATE_1 END
  BEGIN give me the number of STATE_1 END
  ...

Tree CRF Model

Obtaining candidate hybrid sequences:
- The training set consists of hybrid trees, which are determined with the Viterbi algorithm
- Candidate hybrid sequences for each MR production are extracted from these training hybrid trees

[Figure: the same candidate list for NUM:count(STATE) as on the previous slide]
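A minimal Python sketch of this extraction step, assuming each training hybrid tree is available as a nested dictionary (an illustrative layout, not the authors' code):

```python
from collections import defaultdict

def collect_candidates(training_trees):
    """Walk the Viterbi training hybrid trees and collect, for each MR
    production, the set of hybrid sequences observed below it."""
    candidates = defaultdict(set)

    def walk(node):
        candidates[node["production"]].add(tuple(node["hseq"]))
        for child in node.get("children", []):
            walk(child)

    for tree in training_trees:
        walk(tree)
    return candidates

# One toy training tree; a real run would use all Viterbi hybrid trees.
tree = {"production": "NUM:count(STATE)",
        "hseq": ["BEGIN", "how", "many", "STATE_1", "END"],
        "children": [{"production": "STATE:state(all)",
                      "hseq": ["BEGIN", "states", "END"]}]}
print(dict(collect_candidates([tree])))
```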

Evaluations (I)

Comparison of the two models on two benchmark corpora, GEOQUERY (880) and ROBOCUP (300):
- The tree CRF model performs better than the direct inversion model
- This validates the belief that some long-range dependencies are important for the NLG task
- While the direct inversion model performs well on ROBOCUP, it performs substantially worse on GEOQUERY

[Table: BLEU and NIST scores of the direct inversion model and the tree CRF model on both corpora; numeric scores not preserved in this transcript]

Sample Outputs

GEOQUERY
  Reference:              what is the largest state bordering texas
  Direct inversion model: what the largest states border texas
  Tree CRF model:         what is the largest state that borders texas

ROBOCUP
  Reference:              if DR2C7 is true then players 2, 3, 7 and 8 should pass to player 4
  Direct inversion model: if DR2C7, then players 2, 3 7 and 8 should ball to player 4
  Tree CRF model:         if the condition DR2C7 is true then players 2, 3, 7 and 8 should pass to player 4

Evaluations (II)

Comparison with the previous state-of-the-art model, WASP^-1:
- The previous model optimizes the evaluation metrics directly
- Nevertheless, on both corpora the tree CRF model performs better than the previous model
- This confirms that longer-range dependencies and phrase-level dependencies are important

[Table: BLEU and NIST scores of WASP^-1 and the tree CRF model on GEOQUERY (880) and ROBOCUP (300); numeric scores not preserved in this transcript]

Evaluations (III)

Comparison on other languages (GEOQUERY-250): English, Japanese, Spanish, and Turkish
- The tree CRF model achieves better performance than the previous state-of-the-art system on all four languages

[Table: BLEU and NIST scores of WASP^-1 and the tree CRF model for each language; numeric scores not preserved in this transcript]

Conclusions

- Built two novel models for NLG on top of the hybrid tree framework: the direct inversion model and the tree CRF model
- Evaluation shows that the tree CRF model performs better than the direct inversion model
- Further evaluation shows that the proposed tree CRF model also performs better than a previous state-of-the-art system reported in the literature

Questions?