AQUAINT BBN’s AQUA Project
Ana Licuanan, Jonathan May, Scott Miller, Ralph Weischedel, Jinxi Xu
3 December 2002

2 AQUAINT BBN’s Approach to QA
Theme: use document retrieval, entity recognition, and proposition recognition.
– Analyze the question
  – Reduce the question to propositions and a bag of words
  – Predict the type of the answer
– Rank candidate answers using passage retrieval from the primary corpus (the AQUAINT corpus)
– Optionally use other knowledge sources (e.g., the Web) to rerank answers
– Re-rank candidates based on propositions
– Estimate confidence for answers
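As a toy end-to-end rendering of this flow, the sketch below wires the stages together; the heuristics (word-overlap retrieval, capitalized-token candidate extraction, a trivial rerank) are illustrative stand-ins, not BBN's actual components:

```python
# Toy QA pipeline; every heuristic here is an illustrative stand-in
# for the real component it replaces.

def bag_of_words(text):
    return {w.lower().strip("?.,") for w in text.split()}

def retrieve_passages(question, corpus, k=2):
    # Passage retrieval: rank passages by word overlap with the question.
    q = bag_of_words(question)
    return sorted(corpus, key=lambda p: -len(q & bag_of_words(p)))[:k]

def extract_candidates(passages):
    # Stand-in for name extraction: keep capitalized tokens as candidates.
    return [w.strip(",.") for p in passages for w in p.split() if w[:1].isupper()]

def answer_question(question, corpus):
    candidates = extract_candidates(retrieve_passages(question, corpus))
    # Trivial rerank: prefer candidates that do not echo the question's words.
    q = bag_of_words(question)
    candidates.sort(key=lambda c: c.lower() in q)
    return candidates[0] if candidates else None

corpus = ["Dell, beating Compaq, sold the most PCs in 2001.",
          "The Taj Mahal is in Agra, India."]
print(answer_question("Which company sold the most PCs in 2001?", corpus))  # Dell
```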

3 AQUAINT System Diagram
[System diagram; components shown: question classification, Web search, NP labeling, Treebank, name annotation, name extraction, parsing, description classification, proposition finding, document retrieval, confidence estimation, passage retrieval, regularization, and the Proposition Bank. Input: question; output: answer and confidence score.]

AQUAINT Question Classification

5 AQUAINT Question Classification
A hybrid approach based on rules, statistical parsing, and question templates:
– Match question templates against statistical parses
– Back off to statistical bag-of-words classification
Example features used for classification:
– The type of WHNP starting the question (e.g., “Who”, “What”, “When”, …)
– The headword of the core NP
– WordNet definition
– Bag of words
– Main verb of the question
Performance:
– TREC-8 and TREC-9 questions for training
– ~85% accuracy when testing on TREC-10
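The hybrid strategy can be illustrated with a small sketch: match a few surface templates first, and back off to bag-of-words voting trained on labeled questions. The templates and training data below are hypothetical, and BBN's system matches templates against statistical parses with richer features (core NP headword, WordNet definitions) rather than surface strings:

```python
# Minimal template-plus-backoff question classifier (illustrative only).
import re
from collections import Counter, defaultdict

TEMPLATES = [                        # hypothetical surface templates
    (re.compile(r"^where\b", re.I), "LOCATION"),
    (re.compile(r"^when\b", re.I), "DATE"),
    (re.compile(r"^who\b", re.I), "PERSON"),
]

def train_bow(labeled_questions):
    # Count word/type co-occurrences for the bag-of-words backoff.
    counts = defaultdict(Counter)
    for question, qtype in labeled_questions:
        for w in question.lower().split():
            counts[w][qtype] += 1
    return counts

def classify(question, counts):
    for pattern, qtype in TEMPLATES:       # rules first
        if pattern.search(question):
            return qtype
    votes = Counter()                      # back off to bag-of-words voting
    for w in question.lower().split():
        votes.update(counts.get(w, {}))
    return votes.most_common(1)[0][0] if votes else "OTHER"

counts = train_bow([("Which pianist won the competition?", "PERSON"),
                    ("What city hosted the Olympics?", "GPE")])
print(classify("Where is the Taj Mahal?", counts))          # LOCATION (template)
print(classify("Which composer wrote the opera?", counts))  # PERSON (backoff)
```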

6 AQUAINT Examples of Question Analysis
“Where is the Taj Mahal?”
– WHNP = where
– Answer type: Location or GPE
“Which pianist won the last International Tchaikovsky Competition?”
– Headword of core NP = pianist
– WordNet definition = person
– Answer type: Person

7 AQUAINT Question-Answer Types
ORGANIZATION: CORPORATION, EDUCATIONAL, GOVERNMENT, HOSPITAL, HOTEL, MUSEUM, OTHER, POLITICAL, RELIGIOUS
LOCATION: CONTINENT, LAKE_SEA_OCEAN, OTHER, REGION, RIVER, BORDER
FAC: AIRPORT, ATTRACTION, BRIDGE, BUILDING, HIGHWAY_STREET, OTHER
PRODUCT: DRUG, OTHER, VEHICLE, WEAPON
NATIONALITY: NATIONALITY, OTHER, POLITICAL, RELIGION, LANGUAGE
FAC_DESC: AIRPORT, ATTRACTION, BRIDGE, BUILDING, HIGHWAY_STREET, OTHER
GPE_DESC: CITY, COUNTRY, OTHER, STATE_PROVINCE
ORG_DESC: CORPORATION, EDUCATIONAL, GOVERNMENT, HOSPITAL, HOTEL, MUSEUM, OTHER, POLITICAL, RELIGIOUS
CONTACT_INFO: ADDRESS, OTHER, PHONE
WORK_OF_ART: BOOK, OTHER, PAINTING, PLAY, SONG
Types with no subtypes: GAME, MONEY
*Thanks to the USC/ISI and IBM groups for sharing the conclusions of their analyses.

8 AQUAINT Question-Answer Types (cont’d)
PRODUCT_DESC: OTHER, VEHICLE, WEAPON
EVENT: HURRICANE, OTHER, WAR
SUBSTANCE: CHEMICAL, DRUG, FOOD, OTHER
PRODUCT: OTHER
QUANTITY: 1D, 1D_SPACE, 2D, 2D_SPACE, 3D, 3D_SPACE, ENERGY, OTHER, SPEED, WEIGHT, TEMPERATURE
GPE: CITY, COUNTRY, OTHER, STATE_PROVINCE
DATE: AGE, DATE, DURATION, OTHER
Types with no subtypes: PERSON, PER_DESC, ORDINAL, ANIMAL, DISEASE, CARDINAL, AGE, TIME, PLANT, PERCENT, LAW

9 AQUAINT Frequency of Q Types

AQUAINT Interpretation

11 AQUAINT IdentiFinder™ Status
– Current IdentiFinder performance on the answer types
– IdentiFinder is easily trainable for other languages, e.g., Arabic and Chinese

12 AQUAINT Proposition Indexing
A shallow semantic representation:
– Deeper than bags of words
– But broad enough to cover all the text
Characterizes documents by:
– The entities they contain
– The propositions involving those entities
Resolves all references to entities:
– Whether named, described, or pronominal
Represents all propositions that are directly stated in the text
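A minimal sketch of what such an index might hold for one document. The class names are hypothetical, and reference resolution is assumed to have already mapped every mention, whether a name, a description, or a pronoun, to an entity id:

```python
# Hypothetical proposition-index structures for a single document.
from dataclasses import dataclass, field

@dataclass
class Entity:
    eid: str
    mentions: list        # every resolved mention: names, descriptions, pronouns

@dataclass
class Proposition:
    predicate: str
    args: dict            # role -> entity id, e.g. {"subj": "e1"}

@dataclass
class DocumentIndex:
    entities: dict = field(default_factory=dict)
    propositions: list = field(default_factory=list)

doc = DocumentIndex(
    entities={"e1": Entity("e1", ["Dell", "the company", "it"]),
              "e2": Entity("e2", ["Compaq"])},
    propositions=[Proposition("beat", {"subj": "e1", "obj": "e2"})],
)
print(doc.entities["e1"].mentions)   # all mentions resolved to e1
```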

13 AQUAINT Proposition Finding Example
Question: Which company sold the most PCs in 2001?
Text: “Dell, beating Compaq, sold the most PCs in 2001.”
Propositions:
(e1: “Dell”)
(e2: “Compaq”)
(e3: “the most PCs”)
(e4: “2001”)
(sold subj:e1, obj:e3, in:e4)
(beating subj:e1, obj:e2)
The answer is e1 (“Dell”); passage retrieval alone would select the wrong answer, while matching against the sold proposition identifies the correct one.
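The example reduces to a small matching problem: treat the question's answer slot as a variable and look for a text proposition that binds it. The matcher below is a simplified assumption, and the question's noun phrases are taken as already linked to the text's entity ids:

```python
# Toy proposition matcher for the slide's example (simplified assumption).
entities = {"e1": "Dell", "e2": "Compaq", "e3": "the most PCs", "e4": "2001"}
text_props = [("sold", {"subj": "e1", "obj": "e3", "in": "e4"}),
              ("beating", {"subj": "e1", "obj": "e2"})]

# "Which company sold the most PCs in 2001?", with ?x as the answer slot.
question_prop = ("sold", {"subj": "?x", "obj": "e3", "in": "e4"})

def satisfy(q_prop, props):
    pred, q_args = q_prop
    for p_pred, p_args in props:
        if p_pred != pred:
            continue
        binding, ok = {}, True
        for role, val in q_args.items():
            if val.startswith("?"):            # answer variable: bind it
                if role in p_args:
                    binding[val] = p_args[role]
                else:
                    ok = False
            elif p_args.get(role) != val:      # constants must match exactly
                ok = False
        if ok:
            return binding
    return None

print(entities[satisfy(question_prop, text_props)["?x"]])   # Dell
```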

14 AQUAINT Proposition Recognition Strategy
– Start with a lexicalized probabilistic context-free grammar (LPCFG) parsing model
– Distinguish names by replacing NP labels with NPP
– Currently, rules normalize the parse tree to produce propositions
– At a later date, extend the statistical model to:
  – Predict argument labels for clauses
  – Resolve references to entities
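A toy version of one such normalization rule, run on a parse written as nested tuples rather than a full Treebank-style tree; the rule itself (the NP before the verb becomes subj, the NP after it becomes obj) is an illustrative simplification:

```python
# Toy rule: normalize a clause parse into a proposition (illustrative).
def clause_to_prop(tree):
    label, *children = tree
    # NPP marks a name (an NP whose label was replaced, as described above).
    subj = next(c[1] for c in children if c[0].startswith("NP"))
    vp = next(c for c in children if c[0] == "VP")
    _, (_, verb), (_, obj) = vp
    return (verb, {"subj": subj, "obj": obj})

tree = ("S", ("NPP", "Dell"), ("VP", ("V", "sold"), ("NP", "the most PCs")))
print(clause_to_prop(tree))   # ('sold', {'subj': 'Dell', 'obj': 'the most PCs'})
```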

15 AQUAINT Confidence Estimation
Compute the probability P(correct|Q,A) from the following features:
P(correct|Q,A) ≈ P(correct|type(Q), m, n, PropSat)
– type(Q): question type
– m: question length
– n: number of matched question words in the answer context
– PropSat: whether the answer satisfies the propositions in the question
Confidence for answers found on the Web:
P(correct|Q,A) ≈ P(correct|Freq, InTrec)
– Freq: number of Web hits, using Google
– InTrec: whether A was also a top answer from the AQUAINT corpus
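One simple way to realize such an estimate is a table of empirical frequencies over held-out question/answer pairs, binned by the features above. The binning and the data in this sketch are illustrative assumptions:

```python
# Empirical P(correct | feature bin) from held-out data (illustrative).
from collections import defaultdict

def estimate_table(held_out):
    # held_out: (question type, matched question words n, PropSat, correct?)
    counts = defaultdict(lambda: [0, 0])     # feature bin -> [correct, total]
    for qtype, n, prop_sat, correct in held_out:
        bin_ = (qtype, min(n, 5), prop_sat)  # cap n so bins stay populated
        counts[bin_][0] += int(correct)
        counts[bin_][1] += 1
    return {b: c / t for b, (c, t) in counts.items()}

table = estimate_table([("PERSON", 3, True, True),
                        ("PERSON", 3, True, False),
                        ("LOCATION", 1, False, False)])
print(table[("PERSON", 3, True)])   # 0.5
```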

16 AQUAINT Dependence of Answer Correctness on Question Type

17 AQUAINT Dependence on Proposition Satisfaction

18 AQUAINT Dependence on Number of Matched Words

19 AQUAINT Dependence of Answer Correctness on Web Frequency

20 AQUAINT Official Results of TREC 2002 QA
[Table: unranked average precision, ranked average precision, and upper bound for runs BBN2002A, BBN2002B, and BBN2002C; the numeric scores did not survive in this transcript.]
– BBN2002A did not use the Web; BBN2002B and BBN2002C used the Web
– Unranked average precision: the percentage of questions for which the first answer is correct
– Ranked average precision: the confidence-weighted score, the official metric for TREC 2002
– Upper bound: the confidence-weighted score given perfect confidence estimation
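The confidence-weighted score sorts the questions by the system's confidence in its answers and averages the precision at each rank, so correct answers the system is sure of contribute the most. A small sketch of the computation:

```python
# Confidence-weighted score: mean over ranks i of (correct in top i) / i,
# with answers sorted by the system's confidence.
def confidence_weighted_score(results):
    # results: one (confidence, is_correct) pair per question
    ranked = sorted(results, key=lambda r: -r[0])
    correct_so_far, total = 0, 0.0
    for i, (_, is_correct) in enumerate(ranked, start=1):
        correct_so_far += int(is_correct)
        total += correct_so_far / i
    return total / len(ranked)

# A correct answer placed where confidence is high is worth more:
print(confidence_weighted_score([(0.9, True), (0.5, False)]))   # 0.75
print(confidence_weighted_score([(0.9, False), (0.5, True)]))   # 0.25
```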

21 AQUAINT Recent Progress
In the last six months, we have:
– Retrained our name tagger (IdentiFinder™) for roughly 29 question types
– Distributed the retrained English version of IdentiFinder to other sites
– Participated in the Question Answering track of TREC 2002
– Participated in a pilot evaluation of automatically answering definitional/biographical questions
– Developed a demonstration of our question answering system, AQUA, against streaming news