Answer Extraction Ling573 NLP Systems and Applications May 17, 2011.

Roadmap
- Deliverable 3 discussion: what worked
- Deliverable 4
- Answer extraction: learning answer patterns
- Answer extraction: classification and ranking
- Noisy channel approaches

Reminder: Steve Sim, Career Exploration discussion, after class today

Deliverable #3: Document & Passage Retrieval
What was tried: query processing
- Stopwording: bigger/smaller lists
- Stemming: yes/no; Krovetz/Porter/Snowball
- Targets: concatenation, pronoun substitution
- Reformulation
- Query expansion: WordNet synonym expansion, pseudo-relevance feedback, slight differences

Deliverable #3: What worked (query processing)
- Stopwording (bigger/smaller lists): both, apparently
- Stemming: yes/no; Krovetz/Porter/Snowball
- Targets: concatenation, pronoun substitution: both good
- Reformulation: little effect
- Query expansion: generally degraded results; one group had some improvement in MRR

Deliverable #3: What was tried
- Indexing: Lucene & Indri
- Passage retrieval: Indri passages with different window sizes/steps
- Passage reranking: MultiText, SiteQ, ISI, classification

Deliverable #3: What worked
- Indexing: Lucene & Indri generally comparable in MAP
- Passage retrieval: Indri passages with different window sizes/steps; generally works well with moderate overlap (best results)
- Passage reranking (MultiText, SiteQ, ISI, classification): hard to beat Indri

Deliverable #4: End-to-end QA
- Answer extraction/refinement
- Jeopardy!-style answers: beyond scope; instead, more fine-grained passages
- Specifically, passages of 100, 250, and 1000 characters
- Evaluated on QA 2004, 2005 (held-out) data

Deliverable #4 Output Format
- Factoids only, please
- Top 20 results
- Output lines: Qid run-tag DocID Answer_string
- Answer strings: different lengths
- Please: no carriage returns
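As a concrete illustration of the output format above, here is a minimal Python sketch that writes top-20 factoid answers in the Qid run-tag DocID Answer_string layout. The field order comes from the slide; the whitespace separator, function name, and example data are assumptions, not part of the deliverable spec.

    # Minimal sketch: write factoid answers as "Qid run-tag DocID Answer_string" lines.
    # Assumptions: whitespace-separated fields, one line per candidate, top 20 per question.

    def write_run(results, run_tag, out_path):
        """results: dict mapping qid -> list of (doc_id, answer_string), best first."""
        with open(out_path, "w", encoding="utf8") as out:
            for qid, candidates in sorted(results.items()):
                for doc_id, answer in candidates[:20]:        # top 20 results only
                    answer = " ".join(answer.split())         # strip stray carriage returns/newlines
                    out.write(f"{qid} {run_tag} {doc_id} {answer}\n")

    # Example usage (hypothetical data):
    write_run({"1394": [("APW19990101.0001", "1756")]}, "myrun", "d4.outputs")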

Answer Extraction
- Goal: given a passage, find the specific answer in the passage
- Go from ~1000 characters to a short answer span
- Example:
  Q: What is the current population of the United States?
  Passage: "The United States enters 2011 with a population of more than … million people, according to a U.S. Census Bureau estimate."
  Answer: … million

Challenges: ISI's answer extraction experiment
- Given: question (413 TREC-2002 factoid questions), known answer type, all correct answer passages
- Task: pinpoint the specific answer string
- Accuracy: systems at 68.2%, 63.4%, 56.7%; still missing 30%+ of answers
- Oracle (any of the 3 right): 78.9% (20% miss)

Basic Strategies: Answer-type matching
- Build patterns for answer locations; restrict by answer type
- Information for pattern types:
  - Lexical: word patterns
  - Syntactic/structural: syntactic relations between question and answer
  - Semantic: semantic/argument relations between question and answer
- Combine with machine learning to select

Pattern Matching Example
- Answer type: Definition
- Answer type: Birthdate
  Question: When was Mozart born?
  Answer: "Mozart was born on …"
  Pattern: <QTERM> was born on <ANSWER>
  Pattern: <QTERM> ( <ANSWER> - … )
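To make the birthdate patterns concrete, here is a minimal Python sketch that instantiates the two surface patterns above as regular expressions for a given question term. The placeholder names, the date regex, and the example sentence are illustrative assumptions, not the original system's implementation.

    import re

    def birthdate_patterns(qterm):
        """Instantiate birthdate surface patterns for qterm (illustrative only)."""
        q = re.escape(qterm)
        return [
            re.compile(q + r" was born on (?P<answer>[A-Z][a-z]+ \d{1,2},? \d{4})"),  # <QTERM> was born on <ANSWER>
            re.compile(q + r" \( ?(?P<answer>\d{4}) ?-"),                             # <QTERM> ( <ANSWER> - ...
        ]

    for pat in birthdate_patterns("Mozart"):
        m = pat.search("Wolfgang Amadeus Mozart ( 1756 - 1791 ) was a prolific composer.")
        if m:
            print(m.group("answer"))   # -> 1756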

Basic Strategies: N-gram tiling
- Typically part of answer validation/verification
- Integrated with web-based retrieval; based on retrieved search 'snippets'
- Identifies frequently occurring, overlapping n-grams of the correct type

N-gram Tiling (example): candidate n-grams "Dickens", "Charles Dickens", "Mr Charles" are tiled into the higher-scoring n-gram "Mr Charles Dickens"; scores are merged and the old n-grams discarded; repeat until no more overlap.
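Below is a minimal Python sketch of the tiling step described above: greedily merge the highest-scoring pair of overlapping n-grams, add their scores, and repeat until nothing overlaps. The merging and scoring details are assumptions for illustration, not the exact algorithm of the referenced systems.

    def tile_ngrams(scored):
        """scored: dict mapping n-gram string -> score. Returns the tiled n-grams."""
        grams = dict(scored)
        while True:
            best = None
            for a in grams:
                for b in grams:
                    if a == b:
                        continue
                    merged = overlap_merge(a, b)
                    if merged and (best is None or grams[a] + grams[b] > best[2]):
                        best = (a, b, grams[a] + grams[b], merged)
            if best is None:
                return grams
            a, b, score, merged = best
            del grams[a], grams[b]                                  # discard old n-grams
            grams[merged] = max(score, grams.get(merged, 0))        # merged n-gram keeps the combined score

    def overlap_merge(a, b):
        """Merge b onto the end of a if a word-level suffix of a equals a prefix of b."""
        wa, wb = a.split(), b.split()
        for k in range(min(len(wa), len(wb)), 0, -1):
            if wa[-k:] == wb[:k]:
                return " ".join(wa + wb[k:])
        return None

    print(tile_ngrams({"Charles Dickens": 20, "Mr Charles": 15, "Dickens": 10}))
    # -> {'Mr Charles Dickens': 45}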

Automatic Pattern Learning
- Ravichandran and Hovy 2002; Echihabi et al. 2005
- Inspiration (Soubbotin and Soubbotin '01): best TREC 2001 system
  - Based on an extensive list of surface patterns, mostly manually created
  - Many patterns strongly associated with answer types, e.g. "<NAME> ( <BIRTH> - <DEATH> )" for a person's birth and death

Pattern Learning
- S & S '01 worked well, but manual pattern creation is a hassle, impractical
- Can we learn patterns?
  - Supervised approaches: not much better; have to tag training samples, and need training samples
  - Bootstrapping approaches: promising; guidance from a small number of seed samples; can use answer data from the web

Finding Candidate Patterns
- For a given question type, identify an example with qterm and aterm
- Submit to a search engine; download top N web docs (N = 1000)
- Select only sentences with qterm and aterm
- Identify all substrings and their counts (implemented using suffix trees for efficiency)
- Select only phrases with qterm AND aterm
- Replace qterm and aterm instances with generics
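The following Python sketch illustrates the substring-counting step, assuming the sentences have already been retrieved. It uses a brute-force token-window count rather than the suffix trees mentioned above, and the generic tags <QTERM>/<ANSWER> are illustrative.

    from collections import Counter

    def candidate_patterns(sentences, qterm, aterm, max_len=8):
        """Count token substrings containing both qterm and aterm, then generalize them."""
        counts = Counter()
        for sent in sentences:
            toks = sent.split()
            for i in range(len(toks)):
                for j in range(i + 1, min(i + 1 + max_len, len(toks) + 1)):
                    span = toks[i:j]
                    if qterm in span and aterm in span:
                        counts[" ".join(span)] += 1
        generalized = Counter()                  # replace qterm/aterm instances with generics
        for phrase, c in counts.items():
            generalized[phrase.replace(qterm, "<QTERM>").replace(aterm, "<ANSWER>")] += c
        return generalized

    sentences = ["The great composer Mozart ( 1756 - 1791 ) achieved fame",
                 "Mozart ( 1756 - 1791 ) was a genius",
                 "Indebted to the great music of Mozart ( 1756 - 1791 )"]
    print(candidate_patterns(sentences, "Mozart", "1756").most_common(3))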

Example
- Q: When was Mozart born?  A: Mozart (1756 - …
- Qterm: Mozart; Aterm: 1756
- "The great composer Mozart (1756 - 1791) achieved fame"
- "Mozart (1756 - 1791) was a genius"
- "Indebted to the great music of Mozart (1756 - 1791)"
- Phrase: "Mozart ( 1756 - 1791 )"; count = 3
- Convert to: "<QTERM> ( <ATERM> - 1791 )"

Patterns
- Typically repeat with a few more examples
- Collect more patterns, e.g. for Birthdate:
  a. <QTERM> born in <ANSWER>,
  b. <QTERM> was born on <ANSWER>,
  c. <QTERM> ( <ANSWER> -
  d. <QTERM> ( <ANSWER> - )
- Is this enough? No: some good patterns, but probably lots of junk too; need to filter

Computing Pattern Precision
- For the question type: search only on qterm
- Download top N web docs (N = 1000); select only sentences with qterm
- For each pattern, check whether it (a) matches with any aterm: C_o; (b) matches with the right aterm: C_a
- Compute precision P = C_a / C_o
- Retain the pattern only if it matches > 5 examples
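A minimal Python sketch of the precision computation above, assuming the qterm-only sentences are already downloaded and that patterns are regular expressions with an `answer` group (as in the earlier sketches); the 5-example cutoff follows the slide.

    def pattern_precision(patterns, sentences, correct_aterm, min_matches=5):
        """Return {pattern: precision}, precision = C_a / C_o, keeping patterns with C_o > min_matches."""
        precisions = {}
        for pat in patterns:
            c_o = c_a = 0
            for sent in sentences:
                m = pat.search(sent)
                if m:                                        # matched with some aterm
                    c_o += 1
                    if m.group("answer") == correct_aterm:   # matched with the right aterm
                        c_a += 1
            if c_o > min_matches:
                precisions[pat.pattern] = c_a / c_o
        return precisions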

Pattern Precision Example
- Qterm: Mozart; pattern: <QTERM> was born in <ANSWER>
- Near-miss: "Mozart was born in Salzburg"
- Match: "Mozart was born in 1756."
- Precisions: 1.0 for <QTERM> ( <ANSWER> - ); 0.6 for <QTERM> was born in <ANSWER>; …

Nuances
- Alternative forms: need to allow for alternate forms of question or answer (e.g. dates in different formats, full names, etc.); use alternate forms in pattern search
- Precision assessment: use other examples of the same type to compute precision; cross-checks patterns

Answer Selection by Pattern
- Identify question type and terms
- Filter retrieved passages; replace qterm by a tag
- Try to match patterns and answer spans
- Discard duplicates and sort by pattern precision
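A short Python sketch of the selection step, reusing the pattern-with-precision representation from the sketches above; the data structures, example patterns, and tie-breaking are illustrative assumptions.

    import re

    def select_answers(passages, patterns):
        """patterns: list of (compiled regex with an 'answer' group, pattern precision)."""
        best = {}
        for passage in passages:
            for pat, prec in patterns:
                for m in pat.finditer(passage):
                    ans = m.group("answer")
                    best[ans] = max(best.get(ans, 0.0), prec)   # drop duplicates, keep best precision
        return sorted(best.items(), key=lambda kv: kv[1], reverse=True)

    patterns = [(re.compile(r"Mozart \( ?(?P<answer>\d{4}) ?-"), 1.0),
                (re.compile(r"Mozart was born in (?P<answer>\w+)"), 0.6)]
    print(select_answers(["Mozart ( 1756 - 1791 ) was a composer.",
                          "Mozart was born in Salzburg."], patterns))
    # -> [('1756', 1.0), ('Salzburg', 0.6)]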

Pattern Sets
WHY-FAMOUS
  1.0   <ANSWER> <QTERM> called
  1.0   laureate <ANSWER> <QTERM>
  1.0   by the <ANSWER> <QTERM> ,
  1.0   <QTERM> - the <ANSWER> of
  1.0   <QTERM> was the <ANSWER> of
BIRTHYEAR
  1.0   <QTERM> ( <ANSWER> - )
  0.85  <QTERM> was born on <ANSWER> ,
  0.6   <QTERM> was born in <ANSWER>
  0.59  <QTERM> was born <ANSWER>
  0.53  <ANSWER> <QTERM> was born

Results: pattern-based extraction improves performance, though results are better with web data

Limitations & Extensions
- "Where are the Rockies?"  "… with the Rockies in the background": should restrict to semantic / NE type
- "London, which …, lies on the River Thames": would need <QTERM> word* lies on <ANSWER>; wildcards impractical
- Long-distance dependencies not practical
- Less of an issue in web search: the web is highly redundant, with many local dependencies; many systems (LCC) use the web to validate answers

Limitations & Extensions
- "When was LBJ born?"  "Tower lost to Sen. LBJ, who ran for both the …"
  Requires information about answer length, type, and logical distance (1-2 chunks)
- Also: can only handle single continuous qterms; ignores case; needs to handle canonicalization, e.g. of names/dates

Integrating Patterns II
- Fundamental problem: what if there's no pattern?? No pattern -> no answer!!!
- More robust solution: not JUST patterns
  - Integrate with machine learning: MAXENT!!!
  - Re-ranking approach

Answering w/Maxent
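The figure on this slide is not preserved in the transcript; as a reminder of the general form, a maximum-entropy (log-linear) re-ranker scores each candidate answer a for question q with weighted feature functions f_i and weights lambda_i. This is a sketch of the standard formulation, not necessarily the exact model on the slide:

    P(\mathrm{correct} \mid a, q) \;=\; \frac{\exp\big(\sum_i \lambda_i f_i(a, q)\big)}{\sum_{a'} \exp\big(\sum_i \lambda_i f_i(a', q)\big)}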

Feature Functions
- Pattern fired: binary feature
- Answer frequency / redundancy factor: # times the answer appears in retrieval results
- Answer type match (binary)
- Question word absent (binary): no question words in the answer span
- Word match: sum of ITF of words matching between question and sentence
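A minimal Python sketch of these five feature functions; the inputs (candidate answer span, its sentence, retrieval counts, expected type) and the ITF table are assumptions about how such a re-ranker might be wired up, not the original implementation.

    def answer_features(question, answer, sentence, answer_counts, answer_type, expected_type,
                        pattern_fired, itf):
        """Return the five re-ranking features for one candidate answer (illustrative)."""
        q_words = set(question.lower().split())
        a_words = set(answer.lower().split())
        s_words = set(sentence.lower().split())
        return {
            "pattern_fired": float(pattern_fired),                          # binary: a surface pattern matched
            "redundancy": float(answer_counts.get(answer, 0)),              # # times answer appears in results
            "type_match": float(answer_type == expected_type),              # binary answer-type match
            "qword_absent": float(not (q_words & a_words)),                 # no question words in answer span
            "word_match": sum(itf.get(w, 0.0) for w in q_words & s_words),  # sum of ITF of shared words
        }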

Training & Testing
- Trained on NIST QA questions. Train: TREC 8, 9; cross-validation: TREC …
- … candidate answers/question
- Positive examples: NIST answer pattern matches; negative examples: NIST pattern doesn't match
- Test on TREC-2003: MRR 28.6%; 35.6% exact in top 5

Noisy Channel QA
- Employed for speech, POS tagging, MT, summarization, etc.
- Intuition: the question is a noisy representation of the answer
- Basic approach: given a corpus of (Q, S_A) pairs, train P(Q | S_A); find the sentence and answer span S_{i,A_ij} that maximize P(Q | S_{i,A_ij})
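Written out, the selection rule above picks, over candidate sentences S_i and candidate answer spans A_ij within them, the pair that the trained channel model scores highest. This is a restatement of the slide's rule, not an addition to the model:

    (\hat{S}, \hat{A}) \;=\; \arg\max_{S_i,\; A_{ij}} \; P\big(Q \mid S_i, A_{ij}\big)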

QA Noisy Channel
- A: "Presley died of heart disease at Graceland in 1977, and …"  Q: "When did Elvis Presley die?"
- Goal: align parts of the answer parse tree to the question; mark candidate answers; find the highest-probability answer

Approach
- Alignment issue: answer sentences are longer than questions
  - Minimize the length gap: represent the answer as a mix of words / syntactic / semantic / NE units
  - Create a 'cut' through the parse tree: every word, or one of its ancestors, is in the cut; only one element on the path from root to word
- Example:
  A: "Presley died of heart disease at Graceland in 1977, and …"
  Cut: "Presley died PP PP in DATE , and …"
  Q: "When did Elvis Presley die?"
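To make the 'cut' idea concrete, here is a small Python sketch that enumerates the cuts of a toy parse tree represented as nested (label, children) tuples: each cut keeps, for every leaf, exactly one node on its root-to-leaf path, either the word itself or an ancestor's label. The tree encoding and the toy parse are assumptions for illustration.

    from itertools import product

    def cuts(node):
        """Yield every cut of the tree as a list of labels/words (one node per root-leaf path)."""
        label, children = node
        yield [label]                                    # take this node, covering all leaves below it
        if children:                                     # or recurse: combine one cut per child
            for combo in product(*(cuts(c) for c in children)):
                yield [x for part in combo for x in part]

    # Toy parse: (S (NP Presley) (VP died (PP in (DATE 1977))))
    tree = ("S", [("NP", [("Presley", [])]),
                  ("VP", [("died", []),
                          ("PP", [("in", []), ("DATE", [("1977", [])])])])])
    for cut in cuts(tree):
        print(" ".join(cut))
    # prints 9 cuts, e.g. "S", "NP VP", "Presley died PP", "Presley died in DATE", "Presley died in 1977"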

Approach (cont'd)
- Assign one element in the cut to be the 'Answer'
- Issue: the cut STILL may not be the same length as Q
- Solution (typical MT): assign each element a fertility; 0 = delete the element, > 1 = repeat it that many times
- Replace answer-side words with question words based on the alignment; permute the result to match the original question
- Everything except the cut is computed with off-the-shelf (OTS) MT code

Schematic Assume cut, answer guess all equally likely

Training Sample Generation
- Given question and answer sentences: parse the answer sentence, then create a cut such that:
  - Words appearing in both Q & A are preserved
  - The answer is reduced to an 'A_' + syntactic/semantic class label (e.g. A_DATE)
  - Nodes with no surface children are reduced to their syntactic class
  - All other nodes keep their surface form
- 20K TREC QA pairs; 6.5K web question pairs

Selecting Answers
- For any candidate answer sentence: do the same cut process
- Generate all candidate answer nodes: syntactic/semantic nodes in the tree
- What's a bad candidate answer? Stopwords; question words!
- Create cuts with each answer candidate annotated; select the one with the highest probability under the model
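A small Python sketch of the candidate filtering described above: keep the syntactic/semantic nodes of a cut as candidates, but drop stopwords and words that appear in the question. The stopword list and token representation are illustrative assumptions.

    STOPWORDS = {"the", "a", "an", "of", "in", "on", "at", "and", "to", "did"}   # toy list

    def candidate_answer_nodes(cut, question):
        """Return cut elements that are plausible answer candidates (not stopwords or question words)."""
        q_words = {w.lower() for w in question.split()}
        return [node for node in cut
                if node.lower() not in STOPWORDS        # bad: stopwords
                and node.lower() not in q_words]        # bad: question words

    cut = ["Presley", "died", "PP", "PP", "in", "DATE", ",", "and"]
    print(candidate_answer_nodes(cut, "When did Elvis Presley die?"))
    # -> ['died', 'PP', 'PP', 'DATE', ','] in this toy example (morphology and punctuation handling omitted)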

Example Answer Cuts
- Q: When did Elvis Presley die?
- S_A1: Presley died A_PP PP PP , and …
- S_A2: Presley died PP A_PP PP , and …
- S_A3: Presley died PP PP in A_DATE , and …
- Results: MRR 24.8%; 31.2% in top 5

Error Analysis
- Component-specific errors:
  - Patterns: some question types work better with patterns, typically specific NE categories (NAM, LOC, ORG, …); bad if 'vague'
  - Stats-based: no restrictions on answer type, so frequently returns 'it'
  - Both patterns and stats: 'blatant' errors, selecting 'bad' strings (esp. pronouns) if they fit the position/pattern

Combining Units
- Linear sum of weights? Problematic: misses the components' different strengths/weaknesses
- Learning! (of course): MaxEnt re-ranking, a log-linear combination

Feature Functions (48 in total)
- Component-specific: scores and ranks from different modules (patterns, stats, IR, even Q-A word overlap)
- Redundancy-specific: # times the candidate answer appears (log, sqrt)
- Qtype-specific: some components are better for certain types (type+mod)
- Blatant 'errors': no pronouns; 'when' NOT DoW (day of week)
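As an illustration of how such features could feed a MaxEnt re-ranker, here is a brief sketch using scikit-learn's LogisticRegression (a log-linear model) over per-candidate feature vectors; the feature names, data, and candidates are invented stand-ins, not the 48 features or results from the slides.

    from sklearn.linear_model import LogisticRegression
    import numpy as np

    # Each row: one candidate answer's features, e.g. [pattern_score, stats_score, log_redundancy, is_pronoun]
    X_train = np.array([[0.9, 0.2, 1.6, 0.0],
                        [0.1, 0.4, 0.7, 1.0],
                        [0.6, 0.8, 2.1, 0.0],
                        [0.0, 0.1, 0.3, 1.0]])
    y_train = np.array([1, 0, 1, 0])                    # 1 = correct answer, 0 = incorrect

    reranker = LogisticRegression().fit(X_train, y_train)

    # Re-rank one question's candidates by P(correct | features), highest first.
    candidates = ["1756", "it", "Salzburg"]
    X_test = np.array([[0.8, 0.3, 1.9, 0.0],
                       [0.2, 0.9, 2.5, 1.0],
                       [0.5, 0.2, 1.0, 0.0]])
    scores = reranker.predict_proba(X_test)[:, 1]
    print(sorted(zip(candidates, scores), key=lambda cs: -cs[1]))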

Experiments
- Per-module reranking: use redundancy, qtype, blatant-error, and per-module features
- Combined reranking: all features (after feature selection, down to 31)
- Results (exact answer in top 5):
  - Patterns: 35.6% -> 43.1%
  - Stats: 31.2% -> 41%
  - Manual/knowledge-based: 57%