What is the Jeopardy Model? A Quasi-Synchronous Grammar for Question Answering. Mengqiu Wang, Noah A. Smith and Teruko Mitamura, Language Technologies Institute.

Presentation transcript:

What is the Jeopardy Model? A Quasi-Synchronous Grammar for Question Answering. Mengqiu Wang, Noah A. Smith and Teruko Mitamura, Language Technologies Institute, Carnegie Mellon University

2 The task
High-efficiency document retrieval, followed by high-precision answer ranking.
Q: Who is the leader of France?
Candidate ranking A: 1. Bush later met with French president Jacques Chirac. 2. Henri Hadjenberg, who is the leader of France's Jewish community, … 3. …
Candidate ranking B: 1. Henri Hadjenberg, who is the leader of France's Jewish community, … 2. Bush later met with French president Jacques Chirac. (as of May ) 3. …

3 Challenges
High-efficiency document retrieval vs. high-precision answer ranking.
Q: Who is the leader of France?
1. Bush later met with French president Jacques Chirac.
2. Henri Hadjenberg, who is the leader of France's Jewish community, …
3. …

4 Semantic Transformations
Q: "Who is the leader of France?"
A: Bush later met with French president Jacques Chirac.

5 Syntactic Transformations
[Dependency-tree figure: Q: "Who is the leader of France?" aligned against A: "Bush met with French president Jacques Chirac" (mod relation highlighted).]

6 Syntactic Variations
[Dependency-tree figure: Q: "Who is the leader of France?" aligned against "Henri Hadjenberg, who is the leader of France's Jewish community" (mod relation highlighted).]

7 Two key phenomena in QA
- Semantic transformation: leader (Q) ↔ president (A)
- Syntactic transformation: "leader of France" (Q) ↔ "French president" (A)

8 Existing work in QA
- Semantics: use WordNet as a thesaurus for expansion.
- Syntax: use dependency parse trees, but merely map the features into a dependency-parse feature space; no fundamental changes in the algorithms (edit distance, classifiers, similarity measures).

9 Where else have we seen these transformations?
- Machine Translation (especially syntax-based MT)
- Paraphrasing
- Sentence compression
- Textual entailment

10 Noisy-channel
- Machine Translation: S → E, scored by a language model times a translation model.
- Question Answering: Q → A, scored by a retrieval model times the Jeopardy model.
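The analogy can be written out as a short derivation; a sketch, with S the MT source sentence and p(Q | A) playing the role the slide assigns to the Jeopardy model:

```latex
% Noisy-channel MT: recover target E from observed source S
\hat{E} = \arg\max_{E} \, p(E \mid S)
        = \arg\max_{E} \, \underbrace{p(E)}_{\text{language model}} \;
                          \underbrace{p(S \mid E)}_{\text{translation model}}

% QA analogue: rank candidate answers A for a question Q
\hat{A} = \arg\max_{A} \, \underbrace{p(A)}_{\text{retrieval model}} \;
                          \underbrace{p(Q \mid A)}_{\text{Jeopardy model}}
```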

11 What is Jeopardy!?
- From wikipedia.org: Jeopardy! is a popular international television quiz game show (#2 of the 50 Greatest Game Shows of All Time). Three contestants select clues in the form of an answer, to which they must supply correct responses in the form of a question. The concept of "questioning answers" is original to Jeopardy!.

12 Jeopardy Model  We make use of a formalism called quasi-synchronous grammar [ D. Smith & Eisner ’06 ], originally developed for MT

13 Quasi-Synchronous Grammars
- Based on key observations in MT: translated sentences often share some isomorphic syntactic structure, but usually not in its entirety; the strictness of the isomorphism may vary across words or syntactic rules.
- Key idea: unlike stricter, more rigid synchronous grammars (e.g. SCFG), QG defines a monolingual grammar for the target tree, "inspired" by the source tree.

14 Quasi-Synchronous Grammars
- In other words, we model the generation of the target tree as influenced by the source tree (and their alignment).
- QA can be thought of as extremely free translation within the same language.
- The linkage between question and answer trees in QA is looser than in MT, which gives QG a bigger edge.

15 Jeopardy Model
- Works on labeled dependency parse trees.
- Learns the hidden structure (the alignment between Q and A trees) by summing out ALL possible alignments.
- One particular alignment tells us both the syntactic configurations and the word-to-word semantic correspondences.
- An example…
[Figure labels: question parse tree, answer parse tree, an alignment.]

[Alignment figure. Q tree: who/WP (qword), is/VB, the/DT, leader/NN, France/NNP (location). A tree: Bush/NNP (person), met/VBD, French/JJ (location), president/NN, Jacques Chirac/NNP (person). Dependency labels: subj, obj, det, of, with, nmod; both trees rooted at $.]


[Partial alignment figure: Bush/NNP (person), met/VBD, French/JJ (location), president/NN, Jacques Chirac/NNP (person) on the answer side; is/VB on the question side.]
Our model makes local Markov assumptions to allow efficient computation via dynamic programming (details in paper): given its parent, a word is independent of all other words (including siblings).
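As a rough illustration (not the authors' code), the local Markov assumption licenses an inside-style dynamic program over the question tree: each node's alignment to the answer tree depends only on its parent's alignment. The dict-based tree encoding and the emit/trans probability functions below are illustrative assumptions.

```python
def subtree_prob(q_children, q_node, a_nodes, emit, trans):
    """Sum over all alignments of the question subtree rooted at q_node.

    q_children: dict mapping a question node to its list of children.
    a_nodes: candidate answer-tree nodes a question word may align to.
    emit(q, a): prob. that answer node a generates question word q.
    trans(a_par, a_child): prob. of a child's alignment given its
        parent's (where syntactic configurations would be scored).
    Returns {a: P(subtree rooted at q_node | q_node aligned to a)}.
    """
    result = {}
    for a in a_nodes:
        p = emit(q_node, a)
        for q_child in q_children.get(q_node, []):
            child = subtree_prob(q_children, q_child, a_nodes, emit, trans)
            # Markov assumption: sum out the child's alignment given only
            # the parent's alignment, independently of its siblings.
            p *= sum(trans(a, a2) * pc for a2, pc in child.items())
        result[a] = p
    return result

# Toy run: a 3-node question tree (root 0 with children 1 and 2),
# two candidate answer nodes, uniform probabilities everywhere.
probs = subtree_prob({0: [1, 2]}, 0, [10, 11],
                     emit=lambda q, a: 0.5,
                     trans=lambda a1, a2: 0.5)
```

With this factorization the sum over exponentially many alignments costs only O(|Q| · |A|²), which is what makes "summing out ALL possible alignments" tractable.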

[Animation: the question tree is generated node by node (who, leader, the, France), each new node aligned to a node in the answer tree.]

23 6 types of syntactic configurations  Parent-child

[Alignment figure, highlighting the parent-child configuration.]

Parent-child configuration

26 6 types of syntactic configurations  Parent-child  Same-word

[Alignment figure, highlighting the same-word configuration.]

Same-word configuration

29 6 types of syntactic configurations  Parent-child  Same-word  Grandparent-child

[Alignment figure, highlighting the grandparent-child configuration.]

Grandparent-child configuration

32 6 types of syntactic configurations  Parent-child  Same-word  Grandparent-child  Child-parent  Siblings  C-command (Same as [D. Smith & Eisner ’06])
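A hedged sketch of how the six cases might be distinguished in code: given the answer-tree parent map and the images of a question parent-child pair under the alignment, test each configuration in turn. Treating C-command as the final catch-all case is a simplification in this sketch, not something the slide specifies.

```python
def configuration(a_parent, img_parent, img_child):
    """Classify the answer-side relation between the aligned images of a
    question parent-child pair (one of the six cases listed above).
    a_parent: dict mapping each answer node to its parent (root absent).
    """
    if img_child == img_parent:
        return "same-word"
    if a_parent.get(img_child) == img_parent:
        return "parent-child"
    if a_parent.get(img_parent) == img_child:
        return "child-parent"
    if a_parent.get(a_parent.get(img_child)) == img_parent:
        return "grandparent-child"
    if a_parent.get(img_child) is not None and \
       a_parent.get(img_child) == a_parent.get(img_parent):
        return "siblings"
    return "c-command"  # simplification: every looser relation falls here

# Toy answer tree: 1 is the root; 2 and 3 are its children; 4 hangs off 2.
tree = {2: 1, 3: 1, 4: 2}
```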

34 Modeling alignment  Base model

[Alignment figures illustrating the base model's word-to-word correspondences.]

37 Modeling alignment cont.
- Base model
- Log-linear model: lexical-semantic features from WordNet (identity, hypernym, synonym, entailment, etc.)
- Mixture model
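A minimal sketch of the mixture idea, under assumed interfaces: the base model is a multinomial over (question word, answer word) pairs, the log-linear model exponentiates a weighted feature sum (the toy feature stands in for the WordNet relations named above), and the two are interpolated with a coefficient lam. All names here are illustrative, not the authors' code.

```python
import math

def mixture_prob(q_word, a_word, base, weights, features, lam):
    """P(q_word aligns to a_word) = lam * base + (1 - lam) * log-linear."""
    vocab = {a for (_, a) in base}
    def score(a):
        # unnormalized log-linear score exp(w . f)
        return math.exp(sum(weights.get(name, 0.0) * val
                            for name, val in features(q_word, a).items()))
    z = sum(score(a) for a in vocab)  # normalizer over answer-side words
    p_loglin = score(a_word) / z
    return lam * base.get((q_word, a_word), 0.0) + (1 - lam) * p_loglin

# Toy example with a single "identity" feature standing in for the
# WordNet-derived features (identity, hypernym, synonym, entailment).
base = {("leader", "president"): 0.7, ("leader", "leader"): 0.3}
feats = lambda q, a: {"identity": 1.0 if q == a else 0.0}
p = mixture_prob("leader", "president", base, {"identity": 1.0}, feats, 0.5)
```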

38 Parameter estimation
- Things to be learnt: multinomial distributions in the base model; log-linear model feature weights; the mixture coefficient.
- Training involves summing out hidden structures, and is thus non-convex.
- Solved using conditional Expectation-Maximization.

39 Experiments
- TREC 8-12 data set for training
- TREC 13 questions for development and testing

40 Candidate answer generation
- For each question, we take all documents from the TREC document pool and extract sentences that contain at least one non-stopword keyword from the question.
- For computational reasons (parsing speed, etc.), we only keep answer sentences of at most 40 words.
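The two filters can be sketched in a few lines; the toy stopword list and whitespace tokenization are assumptions, not the authors' setup.

```python
STOPWORDS = {"who", "what", "is", "are", "the", "a", "an", "of", "to", "in"}

def candidate_sentences(question, sentences, max_len=40):
    """Keep sentences that share at least one non-stopword keyword with
    the question and are at most max_len tokens long."""
    keywords = {w.lower() for w in question.split()} - STOPWORDS
    kept = []
    for s in sentences:
        tokens = s.lower().split()
        if len(tokens) <= max_len and keywords & set(tokens):
            kept.append(s)
    return kept

docs = ["Henri Hadjenberg , who is the leader of France 's Jewish community",
        "It rained in Paris today"]
hits = candidate_sentences("Who is the leader of France", docs)
```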

41 Dataset statistics
- Manually labeled 100 questions for training; total: 348 positive Q/A pairs.
- 84 questions for dev; total: 1415 Q/A pairs (3.1+, …).
- 100 questions for testing; total: 1703 Q/A pairs (3.6+, …).
- Automatically labeled another 2193 questions to create a noisy training set, for evaluating model robustness.

42 Experiments cont.
- Each question and answer sentence is tokenized, POS-tagged (MXPOST), parsed (MSTParser) and labeled with named-entity tags (IdentiFinder).

43 Baseline systems (replications)
- [Cui et al. SIGIR '05]: the algorithm behind one of the best-performing systems in TREC evaluations. It uses a mutual-information-inspired score computed over dependency trees and a single fixed alignment between them.
- [Punyakanok et al. NLE '04]: measures the similarity between Q and A by computing tree edit distance.
- Both baselines are high-performing, syntax-based, and the most straightforward to replicate.
- We further enhanced both algorithms by augmenting them with WordNet.

44 Results
[Results table: Mean Average Precision and Mean Reciprocal Rank of Top 1, with entries statistically significantly better than the 2nd-best score in each column marked. Figures visible in the extracted text: 28.2%, 23.9%, 41.2%, 30.3%.]
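For reference, the two metrics in the table can be computed as follows; a standard sketch, with each question's candidates given as a relevance-labeled ranked list.

```python
def average_precision(ranked):
    """AP for one question; ranked[i] is 1 if the i-th candidate is correct."""
    hits, acc = 0, 0.0
    for i, rel in enumerate(ranked, start=1):
        if rel:
            hits += 1
            acc += hits / i  # precision at each correct answer's rank
    return acc / hits if hits else 0.0

def mean_average_precision(runs):
    return sum(average_precision(r) for r in runs) / len(runs)

def mean_reciprocal_rank(runs):
    """MRR of the top correct answer for each question."""
    total = 0.0
    for r in runs:
        rank = next((i for i, rel in enumerate(r, start=1) if rel), None)
        total += 1.0 / rank if rank else 0.0
    return total / len(runs)

# Two toy questions: correct answers at ranks 1 and 3, and at rank 2.
runs = [[1, 0, 1, 0], [0, 1]]
```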

45 Summing vs. Max

46 Conclusion
- We developed a probabilistic model for QA based on quasi-synchronous grammar.
- Experimental results showed that our model is more accurate and robust than state-of-the-art syntax-based QA models.
- The mixture model is shown to be powerful; the log-linear model allows us to use arbitrary features.
- Provides a general framework for many other NLP applications (compression, textual entailment, paraphrasing, etc.).

47 Future Work
- Higher-order Markovization, both horizontally and vertically, allows us to look at more context, at the expense of higher computational cost.
- More features from external resources, e.g. a paraphrase database.
- Extending it to cross-lingual QA: avoid the paradigm of translation as pre- or post-processing; we can naturally fit a lexical or phrase translation probability table into our model to handle translation inherently.
- Taking parsing uncertainty into account.

48 Thank you! Questions?