TextMap: An Intelligent Question-Answering Assistant Project Members: Ulf Hermjakob, Eduard Hovy, Chin-Yew Lin, Kevin Knight, Daniel Marcu, Deepak Ravichandran
State-of-the-art Q&A capabilities [Webclopedia-2001] Question 110: Who killed Lee Harvey Oswald? Qtargets: I-EN-PROPER-PERSON & S-PROPER-NAME, I-EN-PROPER-ORGANIZATION “Belli’s clients have included Jack Ruby, who killed John F. Kennedy assassin Lee Harvey Oswald, and Jim and Tammy Bakker.”
What can current Q&A systems do well? Answer factoid questions –What was the name of the first Russian astronaut to do a spacewalk? –Where is Belize located? –How much folic acid should an expectant mother get daily? –What type of bridge is the Golden Gate Bridge? Best system performance (TREC-10): 66%.
What can current systems not do well? Answer complex questions: –What do you know about Bill Clinton? Answer rhetorical questions: –What were the causes of the war in Yugoslavia? Find answers in foreign-language documents. Assist users in –exploring large textual collections; –aggregating the information they mine to enable subsequent analysis. Adapt to users’ preferences and knowledge.
The TextMap Approach Put the user in the driver’s seat: –let the user decide how complex questions should be decomposed into simple questions and how answers to simple questions should be aggregated; –log all steps to enable automatic learning of complex question decomposition and answer matching. Pre-annotate! –Syntax, Shallow semantics (Named Entities), Ontologies, Discourse.
TextMap Scenarios Scenario 1: –Start with simple questions: When was Mullah Mohammad Rabbani born? Where did Mullah Mohammad Rabbani get his education? What is the highest position Mullah Mohammad Rabbani had in the Afghan government? –Use answers to search “adjacent” information spaces: What are the political views of Mullah Mohammad Rabbani? –Aggregate answers according to user-defined criteria, to form a coherent answer.
TextMap Scenarios Scenario 2: –Start with complex questions: Construct the biography of Mullah Mohammad Rabbani. –Automatically decompose complex questions into simple ones: When was Mullah Mohammad Rabbani born? Where did Mullah Mohammad Rabbani get his education? What is the highest position Mullah Mohammad Rabbani had in the Afghan government? –Automatically aggregate answers, using previously observed / learned patterns.
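The decomposition and aggregation steps in Scenario 2 can be illustrated with a minimal Python sketch. It is not the TextMap implementation: the template list, the function names `decompose_biography` and `aggregate`, and the placeholder answers are all assumptions made for illustration. Only the three simple questions come from the slide.

```python
# Hypothetical question templates for a "biography" request.
# The wording of the three questions is taken from the slide; the idea of
# storing them as fill-in templates is an assumption of this sketch.
BIOGRAPHY_TEMPLATES = [
    "When was {name} born?",
    "Where did {name} get his education?",
    "What is the highest position {name} had in the Afghan government?",
]


def decompose_biography(name):
    """Turn the complex request 'construct the biography of <name>'
    into a list of simple factoid questions."""
    return [template.format(name=name) for template in BIOGRAPHY_TEMPLATES]


def aggregate(answers):
    """Join question/answer pairs into one block of text.

    `answers` maps each simple question to its answer string, or to None
    when the factoid system found nothing; unanswered questions are skipped.
    """
    lines = [f"{question} {answer}"
             for question, answer in answers.items()
             if answer is not None]
    return "\n".join(lines)


questions = decompose_biography("Mullah Mohammad Rabbani")
# Placeholder answers stand in for the output of a factoid Q&A system.
placeholder_answers = {
    questions[0]: "<answer 1>",
    questions[1]: None,
    questions[2]: "<answer 3>",
}
biography = aggregate(placeholder_answers)
```

In the approach described on the slides, the fixed template list would be replaced by decompositions learned from logged user sessions.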
Resources at ISI (1) Webclopedia: Q&A system Software –CONTEX, a syntactic/semantic parser [Hermjakob, 2001] –Query-formation module, which includes stemming, query expansion, and other preprocessing routines –MG, an Information Retrieval engine (Sydney University) –Text segmenters and text rankers to determine the likelihood that segments contain answers –IdentiFinder (BBN’s Named Entity recognizer) –Answer modules to find and present the answers Additional resources –Typology of Question/Answer types –18,000+ questions from answers.com
Resources at ISI (2) Summarization: –SUMMARIST and NeATS (single- and multi-document summarizers). –SEE (Summarization Evaluation Interface). Discourse processing: –Discourse parser and discourse-based summarizer. –Corpus of discourse trees. Machine Translation: –ReWrite: statistical machine translation system (learner + decoder). –Parallel and comparable corpora.
Development plan – Year 1 Build the TextMap Interface and integrate Webclopedia capabilities into it. Annotate massive amounts of text with syntactic, semantic, and discourse tags. Develop rhetorical question-answering capabilities (focus initially on answering causal questions). Develop complex question-answering capabilities (focus initially on event-description and biographical questions). Add query expansion for foreign names, covering spelling variants.
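The last Year 1 item, query expansion over spelling variants of foreign names, can be sketched as follows. This is an illustration under stated assumptions, not the planned TextMap module: the `VARIANTS` table is a tiny hand-written, hypothetical lookup, and `expand_name` is a name invented here. A real system would need a much larger table or a learned transliteration model.

```python
from itertools import product

# Hypothetical lookup of alternative romanizations, keyed by lowercase token.
# The entries are illustrative only.
VARIANTS = {
    "mohammad": ["Mohammad", "Mohammed", "Muhammad"],
    "mullah": ["Mullah", "Mulla"],
}


def expand_name(name):
    """Return every combination of known spelling variants for a name.

    Tokens without an entry in VARIANTS are kept unchanged.
    """
    options = [VARIANTS.get(token.lower(), [token]) for token in name.split()]
    return [" ".join(combination) for combination in product(*options)]


expanded = expand_name("Mullah Mohammad Rabbani")
```

Each string in `expanded` could then be submitted as an alternative query to the retrieval engine, so that documents using a different romanization of the same name are still found.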
Development plan – Year 2 Improve TextMap Interface to learn from user feedback. Extend simple, rhetorical, and complex question-answering capabilities. Exploit system logs in order to learn question-answering decompositions and question-answering patterns. Translation of names, locations, etc., to provide English indices to foreign-language documents.
Main problem We don’t know how to evaluate! –Want to automate if possible, but have to figure out how to remove the user from the task.