
1 15-381 Lecture, Spring 2003: Open-Domain Question Answering
Eric Nyberg, Associate Professor
Carnegie Mellon School of Computer Science
Copyright © 2003, Carnegie Mellon. All Rights Reserved.

2 Outline
What is question answering?
Typical QA pipeline
Unsolved problems
The JAVELIN QA architecture
Related research areas
These slides and links to other background material can be found here:

3 Question Answering
Inputs: a question in English; a set of text and database resources.
Output: a set of possible answers drawn from the resources.
[Diagram: "When is the next train to Glasgow?" → QA SYSTEM (over text corpora & RDBMS) → "8:35, Track 9."]

4 Ancestors of Modern QA
Information Retrieval – retrieve relevant documents from a set of keywords; search engines.
Information Extraction – template filling from text (e.g. event detection); e.g. TIPSTER, MUC.
Relational QA – translate the question to a relational DB query; e.g. LUNAR, FRED.

5 [Figure-only slide; no text content.]

6 Typical TREC QA Pipeline
[Diagram: Question ("a simple factoid question") → Extract Keywords → Query → Search Engine (over corpus) → Docs → Passage Extractor → Answers → Answer Selector → Answer ("a 50-byte passage likely to contain the desired answer", per the TREC QA track).]

7 Sample Results
Mean Reciprocal Rank (MRR): find the ordinal position of the first correct answer in your output (1st answer, 2nd answer, etc.), take its reciprocal (divide one by that position), and average over the entire test suite.
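The computation is easy to pin down in code. A minimal sketch (the data layout is illustrative; this is not JAVELIN code):

```python
def mean_reciprocal_rank(ranked_answers_per_question, gold_answers):
    """MRR: for each question, score 1/rank of the first correct answer
    (0 if none of the returned answers is correct), then average."""
    total = 0.0
    for ranked, gold in zip(ranked_answers_per_question, gold_answers):
        for rank, answer in enumerate(ranked, start=1):
            if answer == gold:
                total += 1.0 / rank
                break
    return total / len(gold_answers)

# Correct answer ranked 1st, 3rd, and absent: MRR = (1 + 1/3 + 0) / 3 = 0.444...
print(mean_reciprocal_rank([["a", "b"], ["x", "y", "a"], ["p"]], ["a", "a", "a"]))
```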

8 Functional Evolution
Traditional QA Systems (TREC):
– Question treated like a keyword query
– Single answers, no understanding
Q: Who is prime minister of India?
A: John Smith is not prime minister of India

9 Functional Evolution [2]
Future QA Systems:
– System understands questions
– System understands answers and interprets which are most useful
– System produces sophisticated answers (list, summarize, evaluate)
Examples: What other airports are near Niletown? Where can helicopters land close to the embassy?

10 Major Research Challenges
Acquiring high-quality, high-coverage lexical resources
Improving document retrieval
Improving document understanding
Expanding to multilingual corpora
Flexible control structure – "beyond the pipeline"
Answer justification – Why should the user trust the answer? Is there a better answer out there?

11 Why NLP is Required
Question: "When was Wendy's founded?"
Passage candidate: "The renowned Murano glassmaking industry, on an island in the Venetian lagoon, has gone through several reincarnations since it was founded in … Three exhibitions of 20th-century Murano glass are coming up in New York. By Wendy Moonan."
Answer: 20th century — a pure keyword match on "Wendy" and "founded" retrieves the wrong passage and the wrong date.

12 Predicate-Argument Structure
Q336: When was Microsoft established?
Difficult because Microsoft tends to establish lots of things: "Microsoft plans to establish manufacturing partnerships in Brazil and Mexico in May."
Need to detect sentences in which 'Microsoft' is the object of 'establish' or a close synonym.
Matching sentence: "Microsoft Corp was founded in the US in 1975, incorporated in 1981, and established in the UK in 1982."
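The object-of-'establish' test is exactly the kind of check a dependency parse supports. A sketch using spaCy as a stand-in for JAVELIN's own parser (the synonym set is illustrative):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
ESTABLISH_SYNONYMS = {"establish", "found", "incorporate"}  # illustrative synonym set

def entity_is_established(sentence, entity="Microsoft"):
    """True if the entity is the thing being established: the direct object
    of an active verb, or the subject of a passive one."""
    doc = nlp(sentence)
    for tok in doc:
        if tok.dep_ in ("dobj", "nsubjpass") and tok.head.lemma_ in ESTABLISH_SYNONYMS:
            phrase = " ".join(t.text for t in tok.subtree)
            if entity in phrase:
                return True
    return False

print(entity_is_established("Microsoft Corp was founded in the US in 1975."))      # True
print(entity_is_established("Microsoft plans to establish partnerships in May."))  # False
```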

13 Why Planning is Required
Question: What is the occupation of Bill Clinton's wife?
– No documents contain these keywords plus the answer.
Strategy: decompose into two questions (see the sketch below):
– Who is Bill Clinton's wife? = X
– What is the occupation of X?
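A toy rendering of that decomposition; the canned answers stand in for a real QA pipeline:

```python
CANNED = {
    "Who is Bill Clinton's wife?": "Hillary Clinton",
    "What is the occupation of Hillary Clinton?": "U.S. senator",
}

def ask(question):
    return CANNED[question]   # placeholder for a full QA run

def answer_composed_question():
    x = ask("Who is Bill Clinton's wife?")         # step 1: resolve the inner entity
    return ask(f"What is the occupation of {x}?")  # step 2: substitute into the outer question

print(answer_composed_question())  # "U.S. senator"
```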

14 JAVELIN: Justification-based Answer Valuation through Language Interpretation
Carnegie Mellon Univ. (Language Technologies Institute)
OBJECTIVES:
– QA as planning, by developing a glass-box planning infrastructure
– Universal auditability, by developing a detailed set of labeled dependencies that form a traceable network of reasoning steps
– Utility-based information fusion
PLAN — address the full QA task:
– Question analysis: question typing, interpretation, refinement, clarification
– Information seeking: document retrieval, entity and relation extraction
– Multi-source information fusion: multi-faceted answers, redundancy and contradiction detection
[Architecture diagram: JAVELIN GUI, Question Analyzer, Planner (domain model, operator/action models), Execution Manager, Retrieval Strategist, Request Filler, Answer Generator, Data Repository (process history and results), search engines & document collections.]

15 JAVELIN Objectives
QA as Planning:
– Create a general QA planning system
– How should a QA system represent its chain of reasoning?
QA and Auditability:
– How can we improve a QA system's ability to justify its steps?
– How can we make QA systems open to machine learning?

16 JAVELIN Objectives [2]
Utility-Based Information Fusion:
– Perceived utility is a function of many different factors
– Create and tune utility metrics, e.g.:
U = argmax_k F( Rel(I,Q,T), Nov(I,T,A), Ver(S, Sup(I,S)), Div(S), Cmp(I,A), Cst(I,A) )
where I: information item, Q: question, S: source, T: task context, A: analyst; the factors are relevance, novelty, veracity/support, diversity, comprehensibility, and cost.
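A sketch of such a metric. The slide leaves F unspecified; the linear form and the weights here are assumptions to make the idea concrete:

```python
# Negative weight on cost: expensive items are penalized.
WEIGHTS = {"rel": 0.4, "nov": 0.2, "ver": 0.2, "div": 0.1, "cmp": 0.1, "cst": -0.2}

def utility(scores):
    """Combine per-factor scores (each in [0, 1]) into a single utility."""
    return sum(WEIGHTS[f] * scores[f] for f in WEIGHTS)

def best_item(items):
    """The argmax_k over candidate information items."""
    return max(items, key=lambda item: utility(item["scores"]))

items = [
    {"id": "passage-1", "scores": {"rel": 0.9, "nov": 0.3, "ver": 0.8,
                                   "div": 0.5, "cmp": 0.7, "cst": 0.2}},
    {"id": "passage-2", "scores": {"rel": 0.7, "nov": 0.9, "ver": 0.6,
                                   "div": 0.8, "cmp": 0.9, "cst": 0.1}},
]
print(best_item(items)["id"])  # passage-2
```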

17 Control Flow
[Figure: control-flow diagram with strategic decision points.]

18 Repository ERD (Entity Relationship Diagram)
[Figure: entity-relationship diagram of the repository.]

19 JAVELIN User Interface
[Figure: screenshot of the JAVELIN GUI.]

20 JAVELIN Architecture
[Architecture diagram: JAVELIN GUI, Question Analyzer, Planner (domain model, operator/action models), Execution Manager, Retrieval Strategist, Information Extractor, Answer Generator, Data Repository (process history and results), search engines & document collections.]
Integrated with XML; modules can run on different servers.

21 Module Integration via XML
DTDs for each object type.
Modules use a simple XML object-passing protocol built on TCP/IP (sketched below).
The Execution Manager takes care of checking objects in/out of the Repository.
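A sketch of the object-passing idea with only the standard library. The framing (one XML document per line) and the tag names are assumptions; the slide says only that the protocol is simple XML over TCP/IP:

```python
import socket
import xml.etree.ElementTree as ET

def send_object(host, port, obj_type, fields):
    """Serialize an object as XML, ship it to another module, read the reply."""
    elem = ET.Element(obj_type)
    for key, value in fields.items():
        ET.SubElement(elem, key).text = str(value)
    with socket.create_connection((host, port)) as sock:
        sock.sendall(ET.tostring(elem) + b"\n")   # one XML document per line
        reply = sock.makefile("rb").readline()    # blocking read of the XML reply
    return ET.fromstring(reply)

# e.g. send_object("localhost", 9000, "RequestObject",
#                  {"qtype": "event-completion", "atype": "time-point"})
```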

22 Sample Log File Excerpt
Components communicate via XML object representations.
[Figure: log file excerpt.]

23 Question Analyzer
Taxonomy of question-answer types and type-specific constraints; knowledge integration; pattern-matching approach for this year's evaluation.
[Pipeline diagram: question input (XML) → Tokenizer → token information extraction (WordNet, KANTOO lexicon, Brill tagger, BBN IdentiFinder, KANTOO lexifier) → QA taxonomy + type-specific constraints → "Get FR?" decision: if yes, Parser (KANTOO grammars) and event/entity template filler; if no, pattern matching; either path ends in the Request Object builder, producing a Request Object + system result (XML).]

24 Question Taxonomies
Q-Types: express relationships between events, entities and attributes; influence Planner strategy.
A-Types: express the semantic type of valid answers.

Question | Q-Type | A-Type
"When did the Titanic sink?" | event-completion | time-point
"Who was Darth Vader's son?" | concept-completion | person-name
"What is thalassemia?" | definition | —

We expect to add more A-types and refine granularity.

25 Sample of Q-Type Hierarchy
[Figure: excerpt of the Q-type hierarchy.]

26 Sample of A-Type Hierarchy
[Figure: excerpt of the A-type hierarchy.]

27 Request Object
Question: Who was the first U.S. president to appear on TV?
– Question type: event-completion
– Answer type: person-name
– Computation element: order 1
– Keyword set: first, U.S. president, appear, TV
– F-structure: (event (subject (person-name ?) (occupation "U.S. president")) (act appear) (order 1) (theme TV))
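A rough transliteration of that object into a dataclass; the field names follow the slide, the types are assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class RequestObject:
    question_type: str            # e.g. "event-completion"
    answer_type: str              # e.g. "person-name"
    computation_element: str      # e.g. "order 1" (the "first" in the question)
    keywords: list = field(default_factory=list)
    f_structure: str = ""         # predicate-argument analysis, kept as text here

ro = RequestObject(
    question_type="event-completion",
    answer_type="person-name",
    computation_element="order 1",
    keywords=["first", "U.S. president", "appear", "TV"],
    f_structure='(event(subject(person-name ?)(occupation "U.S. president"))'
                '(act appear)(order 1)(theme TV))',
)
```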

28 How the Retrieval Strategist Works
Inputs:
– Keywords and keyphrases
– Type of answer desired
– Resource constraints (min/max documents, time, etc.)
Outputs:
– Ranked set of documents
– Locations of keyword matches

29 How the Retrieval Strategist Works
Constructs sequences of queries based on a Request Object:
– Start with very constrained queries: high-quality matches, low probability of success.
– Progressively relax queries until search constraints are met: lower-quality matches, high probability of success.

30 Sample Search Strategy
Inquery operators, from most to least constrained (a sketch of the loop follows):

Operator | Query
#3 | #3(Titanic #syn(sink sank) *date)
#UW20 | #UW20(Titanic #syn(sink sank) *date)
… | …
#PASSAGE250 | #PASSAGE250(Titanic #syn(sink sank) *date)
#SUM | #SUM(Titanic #syn(sink sank) *date)
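A sketch of the relaxation loop over that operator ladder; `run_inquery` is a hypothetical hook standing in for the actual search engine call:

```python
RELAXATION_LADDER = ["#3", "#UW20", "#PASSAGE250", "#SUM"]  # tightest to loosest

def retrieve(terms, min_docs, run_inquery):
    """Issue progressively looser Inquery queries until enough docs are found."""
    results = []
    for op in RELAXATION_LADDER:
        query = f"{op}({' '.join(terms)})"
        for doc in run_inquery(query):
            if doc not in results:       # skip docs already found by tighter queries
                results.append(doc)
        if len(results) >= min_docs:     # stop once the search constraint is met
            break
    return results

# e.g. retrieve(["Titanic", "#syn(sink sank)", "*date"], 30, run_inquery)
```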

31 Retrieval Strategist (RS): TREC Results Analysis
Success: % of questions where at least one answer document was found.
TREC 2002 success — 30 docs: …; 60 docs: …; 120 docs: ~86%.
Reasonable performance for a simple method, but room for improvement.

32 RS: Ongoing Improvements
Improved incremental relaxation:
– Searching for all keywords is too restrictive; use subsets prioritized by discriminative ability.
– Remove duplicate documents from results: don't waste valuable list space.
– 15% fewer failures (229 test questions).
Overall success — 30 docs: 83% (was …); 60 docs: 87% (was 85%).
Larger improvements unlikely without additional techniques, such as constrained query expansion.
Investigate constrained query expansion: WordNet, statistical methods.

33 What Does the Request Filler Do?
Role in JAVELIN: extract possible answers & passages from documents.
Input:
– Request Object (from the QA module)
– Document set (from the RS module)
Output:
– Set of extracted answers matching the desired type (RequestFill objects)
– Confidence scores

34 Request Filler Steps
Filter passages: match answer type? contain sufficient keywords?
Create variations on passages (sketched below):
– POS tagging (Brill)
– Cleansing (punctuation, tags, etc.)
– Expand contractions
– Reduce surface forms to lexemes
Calculate feature values.
A classifier scores the passages, which are output with confidence scores.
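A sketch of the variation step; the contraction table and cleanup rules are illustrative, and lowercasing stands in for real lexeme reduction:

```python
import re

CONTRACTIONS = {"won't": "will not", "can't": "cannot", "n't": " not"}

def normalize(passage):
    text = re.sub(r"<[^>]+>", " ", passage)              # strip markup tags
    for contraction, expansion in CONTRACTIONS.items():
        text = text.replace(contraction, expansion)      # expand contractions
    text = re.sub(r"[^\w\s]", " ", text)                 # drop punctuation
    return text.lower().split()                          # crude surface-form reduction

print(normalize("The company said it <b>won't</b> recover the expenses."))
# ['the', 'company', 'said', 'it', 'will', 'not', 'recover', 'the', 'expenses']
```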

35 Features
Features are self-contained algorithms that score passages in different ways.
Simple features: # keywords present; normalized window size; average distance.
Pattern features: cN [..] cV [..] in/on [date]; [date], iN [..] cV [..]
Any procedure that returns a numeric value is a valid feature!
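Two of the simple features, written out directly; the normalization choices are assumptions:

```python
def keyword_fraction(passage_tokens, keywords):
    """Fraction of the question keywords that appear in the passage."""
    return sum(1 for kw in keywords if kw in passage_tokens) / len(keywords)

def normalized_window_size(passage_tokens, keywords):
    """Span covering every keyword occurrence, scaled by passage length;
    tighter clusters of keywords score closer to 1."""
    positions = [i for i, tok in enumerate(passage_tokens) if tok in keywords]
    if not positions:
        return 0.0
    window = max(positions) - min(positions) + 1
    return 1.0 - window / len(passage_tokens)

tokens = "the titanic sank after striking an iceberg in 1912".split()
print(keyword_fraction(tokens, ["titanic", "sank", "iceberg"]))        # 1.0
print(normalized_window_size(tokens, ["titanic", "sank", "iceberg"]))  # 0.333...
```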

36 Learning an Answer Confidence Function
Supervised learning:
– Answer-type-specific model
– Aggregate model across answer types
Decision tree (C4.5):
– Variable feature dependence
– Fast enough to re-learn from each new instance
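A sketch of the training step. scikit-learn's CART implementation stands in for C4.5 here (an assumption; JAVELIN used C4.5), and the feature rows are invented for illustration:

```python
from sklearn.tree import DecisionTreeClassifier

# columns: [% keywords present, normalized window size, average distance]
X = [[1.00, 0.9, 0.1],   # passage contained the answer
     [0.50, 0.2, 0.8],   # passage did not
     [0.75, 0.7, 0.3],
     [0.25, 0.1, 0.9]]
y = [1, 0, 1, 0]

model = DecisionTreeClassifier().fit(X, y)
confidence = model.predict_proba([[0.8, 0.6, 0.2]])[0][1]  # P(passage holds the answer)
print(confidence)
```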

37 A "When" Q-Type Decision Tree
[Figure: a learned decision tree for the "when" Q-type, with splits on features such as % keywords present in the passage, average distance, and maximum scaled keyword window size.]

38 Semantic Analysis Would Help
Surface string matching misses distinctions that matter:
– Morphological normalization: "The company said it believes the expenses of the restructuring will be recovered by the end of 1992" tags and lemmatizes to The/DT company/NN say/VBD it/PRP believe/VBZ the/DT expense/NNS of/IN the/DT restructuring/NN will/MD be/VB recover/VBN by/IN the/DT end/NN of/IN 1992/CD
– Synonyms: "…the artist expressed…" vs. "…the performer expressed…"
– Entity identity: "The company said it believes…" vs. "Microsoft said it believes…"
– Embedded contexts: "It is a misconception the Titanic sank on April the 15th, 1912" vs. "The Titanic sank on April the 15th, 1912"

39 Information Extractor (IX): TREC Analysis
If the answer is in the doc set returned by the Retrieval Strategist, does the IX module identify it as an answer candidate with a high confidence score?

Inputs | Answer in top 5 | Answer in doc set
TREC … | … | …
TREC … | … | …
TREC … | … | …

40 IX: Current & Future Work
Enrich the feature space beyond surface patterns & surface statistics
Perform A-type-specific learning
Perform adaptive semantic expansion
Enhance training data quantity/quality
Tune the objective function

41 NLP for Information Extraction
Simple statistical classifiers are not sufficient on their own.
Need to supplement the statistical approach with natural language processing to handle more complex queries.

42 Example Question
Question: "When was Wendy's founded?"
Question Analyzer extended output: { temporal(?x), found(*, Wendy's) }
Passage discovered by the retrieval module: "R. David Thomas founded Wendy's in 1969, …"
Conversion to predicate form by the Passage Analyzer: { founded(R. David Thomas, Wendy's), DATE(1969), … }
Unification of QA literals against PA literals:
– Equiv(found(*, Wendy's), founded(R. David Thomas, Wendy's))
– Equiv(temporal(?x), DATE(1969))
– ?x := 1969
Answer: 1969
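A toy unifier for the matching step above. The predicate-equivalence table is a hand-made stand-in for WordNet-style synonymy:

```python
EQUIV = {"found": {"founded"}, "temporal": {"DATE"}}

def unify(query, facts):
    """Match each query literal (pred, args) against the passage literals.
    '?x' binds to anything and records the answer; '*' matches without binding."""
    bindings = {}
    for q_pred, q_args in query:
        for f_pred, f_args in facts:
            if (f_pred == q_pred or f_pred in EQUIV.get(q_pred, set())) \
                    and len(q_args) == len(f_args):
                for q_arg, f_arg in zip(q_args, f_args):
                    if q_arg.startswith("?"):
                        bindings[q_arg] = f_arg      # variable: record the value
                    elif q_arg != "*" and q_arg != f_arg:
                        break                        # constant mismatch
                else:
                    break                            # literal matched; next query literal
    return bindings

query = [("temporal", ["?x"]), ("found", ["*", "Wendy's"])]
facts = [("founded", ["R. David Thomas", "Wendy's"]), ("DATE", ["1969"])]
print(unify(query, facts))   # {'?x': '1969'}
```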

43 Answer Generator
Currently the last module in the pipeline. Main tasks:
– Combining different sorts of evidence for answer verification
– Detecting and combining similar answer candidates to address answer granularity
– Initiating processing loops to gather more evidence
– Generating answers in the required format

44 Answer Generator Input
Analyzed question (RequestObject):
– Question/answer type (qtype/atype)
– Number of expected answers
– Syntactic parse and keywords
Passages (RequestFills):
– Marked candidates of the right semantic type (right NE type)
– Confidences computed using a set of text-based (surface) features such as keyword placement

45 Answer Generator Output
An answer string from a document (for now).
The set of text passages (RequestFills) the Answer Generator decided were supportive of the answer.
Or, requests for more information (exceptions) passed on to the Planner:
– "Not enough answer candidates"
– "Can't distinguish answer candidates"

46 Types of Evidence
Currently implemented: redundancy, frequency counts.
– Preference given to more frequently occurring, normalized answer candidates.
Next step: structural information from the parser.
– Matching question and answer predicate-argument structure.
– Detecting hypotheticals, negation, etc.
Research level: combining collection-wide statistics with 'symbolic' QA.
– Ballpark estimates of the temporal boundaries of events/states.

47 Example
Q: What year did the Titanic sink?
A: 1912
Supporting evidence:
– "It was the worst peacetime disaster involving a British ship since the Titanic sank on the 14th of April, 1912."
– "The Titanic sank after striking an iceberg in the North Atlantic on April 14th, 1912."
– "The Herald of Free Enterprise capsized off the Belgian port of Zeebrugge on March 6, 1987, in the worst peacetime disaster involving a British ship since the Titanic sank in 1912."

48 What Happened?
Different formats for answer candidates detected, normalized and combined:
– 'April 14th, 1912'
– '14th of April, 1912'
Supporting evidence detected and combined:
– '1912' supports 'April 14th, 1912'
Structure of date expressions understood and the correct piece output:
– '1912' rather than 'April 14th, 1912'
Most frequent answer candidate found and output:
– 'April 14th, 1912' rather than something else.

49 Answer Normalization
The Request Filler and Answer Generator are aware of NE types: dates, times, people's names, company names, locations, currency expressions.
'April 14th, 1912', '14th of April 1912', and '14 April 1912' are instances of the same date, but different strings.
For date expressions, the Answer Generator normalizes to ISO 8601 (YYYY-MM-DD).
'summer', 'last year', etc. remain as strings.

50 Answer Normalization
Normalization enables comparison and detection of redundant or complementary answers (a normalization sketch follows).
Define supporting evidence as a piece of text expressing the same or less specific information. E.g., '1912' supports 'April 12th, 1912'.
Complementary evidence: '1912' complements 'April 12th'.
Normalization and support extend to other NE types:
– 'Clinton' supports 'Bill Clinton'
– 'William Clinton' and 'Bill Clinton' normalize to the same answer
– For locations, 'Pennsylvania' supports 'Pittsburgh'
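A sketch of the date case using only the standard library; the accepted surface patterns are illustrative, not JAVELIN's actual grammar:

```python
import re

MONTHS = {m: i + 1 for i, m in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"])}

def to_iso8601(text):
    """'April 14th, 1912' / '14th of April 1912' / '14 April 1912' -> '1912-04-14'."""
    t = re.sub(r"(\d+)(st|nd|rd|th)", r"\1", text.lower()).replace(",", "")
    day = month = year = None
    for tok in t.split():
        if tok in MONTHS:
            month = MONTHS[tok]
        elif tok.isdigit():
            if len(tok) == 4:
                year = int(tok)   # 4-digit number: read as the year
            else:
                day = int(tok)    # shorter number: read as the day
    if None in (day, month, year):
        return None               # 'summer', 'last year', etc. stay as raw strings
    return f"{year:04d}-{month:02d}-{day:02d}"

print(to_iso8601("April 14th, 1912"))    # 1912-04-14
print(to_iso8601("14th of April 1912"))  # 1912-04-14
```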

51 Other Forms of Evidence
Q: Name all the bills that were passed during the Bush administration.
Not likely to find passages mentioning 'bill', 'pass', and 'Bush administration' together. And when was the Bush administration?
'Symbolic' QA: look for an explicit answer in the collection; it might not be present.
'Statistical' QA: look at the distribution of documents mentioning the Bush administration.
Combining evidence of different sorts!

52 Other Forms of Evidence
Can we figure out whether the Bush administration was around when a document was written? Look at tense/aspect/wording (a counting sketch follows).
Forward time references: "the Bush administration will do something"
Backward time references: "the Bush administration has done something"
Hypothesis:
– Backward time references provide information about the onset of an event.
– Forward time references provide information about the end of an event.
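A sketch of the counting idea; the regexes are crude stand-ins for a real tense/aspect analysis:

```python
import re
from collections import Counter

FORWARD = re.compile(r"Bush administration\s+will\b")        # forward reference
BACKWARD = re.compile(r"Bush administration\s+(has|had)\b")  # backward reference

def reference_profile(docs):
    """docs: iterable of (timestamp, text). Per-day counts of forward and
    backward references, for locating the event end and onset."""
    forward, backward = Counter(), Counter()
    for date, text in docs:
        if FORWARD.search(text):
            forward[date] += 1
        if BACKWARD.search(text):
            backward[date] += 1
    return forward, backward

docs = [("1992-11-04", "The Bush administration will propose a new bill."),
        ("1994-03-01", "The Bush administration had passed several bills.")]
print(reference_profile(docs))
```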

53 Other Forms of Evidence
[Figure: Bush administration forward references — #docs mentioning the Bush administration on a given day, plotted against document time stamps; the fall-off in forward references marks the administration change, i.e. the event end.]

54 Other Forms of Evidence
[Figure: Bush administration backward references — #docs mentioning the Bush administration on a given day, plotted against document time stamps; the rise in backward references marks the administration change, i.e. the event onset.]

55 Planning in JAVELIN
Enable generation of new question-answering strategies at run-time.
Improve the ability to recover from bad decisions as information is collected.
Gain insight into when different QA components are most useful.

56 Planner Integration
[Diagram: the Planner, using its domain model and operator (action) models, directs the Execution Manager, which executes modules (exe A, exe E, exe F), stores results and process history in the Data Repository, and exchanges question / answer / ack / dialog-response messages with the JAVELIN GUI.]

57 Current Domain Operators
The QuestionAnalyzer module is called as a precursor to planning. These operators demonstrate generation of multiple search paths and feedback loops.

RESPOND_TO_USER
  pre: (and (interactive_session) (request ?q ?ro)
            (ranked_answers ?ans ?ro ?fills)
            (> (max_ans_score ?ans) 0.1) (> answer_quality 0))

ASK_USER_FOR_ANSWER_TYPE
  pre: (and (interactive_session) (request ?q ?ro)
            (or (and (ranked_answers ?ans ?ro ?fills)
                     (< (max_ans_score ?ans) 0.1))
                (no_docs_found ?ro)
                (no_fills_found ?ro ?docs)))

ASK_USER_FOR_MORE_KEYWORDS
  pre: (and (interactive_session) (request ?q ?ro)
            (or (and (ranked_answers ?ans ?ro ?fills)
                     (< (max_ans_score ?ans) 0.1))
                (no_docs_found ?ro)
                (no_fills_found ?ro ?docs)))

RETRIEVE_DOCUMENTS
  pre: (and (request ?q ?ro) (> (extracted_terms ?ro) 0)
            (> request_quality 0))

EXTRACT_DT_CANDIDATE_FILLS
  pre: (and (retrieved_docs ?docs ?ro)
            (== (expected_atype ?ro) location_t)
            (> docset_quality 0.3))

EXTRACT_KNN_CANDIDATE_FILLS
  pre: (and (retrieved_docs ?docs ?ro)
            (!= (expected_atype ?ro) location_t)
            (> docset_quality 0.3))

RANK_CANDIDATES
  pre: (and (candidate_fills ?fills ?ro ?docs) (> fillset_quality 0))

58 Current Domain Operators
A more detailed operator view:

RETRIEVE_DOCUMENTS (?q - question ?ro - qtype)
  pre: (and (request ?q ?ro) (> (extracted_terms ?ro) 0)
            (> request_quality 0))
  dbind: (?docs (genDocsetID)
          ?dur (estTimeRS (expected_atype ?ro))
          ?pnone (probNoDocs ?ro)
          ?pgood (probDocsHaveAns ?ro)
          ?dqual (estDocsetQual ?ro))
  effects:
    ?pnone            ((no_docs_found ?ro) (scale-down request_quality 2)
                       (assign docset_quality 0) (increase system_time ?dur))
    ?pgood            ((retrieved_docs ?docs ?ro) (assign docset_quality ?dqual)
                       (increase system_time ?dur))
    (1-?pgood-?pnone) ((retrieved_docs ?docs ?ro) (scale-down request_quality 2)
                       (assign docset_quality 0) (increase system_time ?dur))
  execute: (RetrievalStrategist ?docs ?ro)

59 Illustrative Examples
Where is bile produced?
– Overcomes current limitations of the system's "location" knowledge.
– Uses answer-candidate confidence to trigger a feedback loop.
1st iteration — top 3 answers found during the initial pass (with "location" answer type):
1. Moscow (Conf: …)  2. China (Conf: …)  3. Guangdong Province (Conf: …)
2nd iteration — top 3 answers displayed (with user-specified "object" answer type; 'liver' ranked 6th):
1. gallbladder (Conf: …)  2. dollars (Conf: …)  3. stores (Conf: …)

60 Illustrative Examples
Who invented the road traffic cone?
– Overcomes the current inability to relax phrases during document retrieval.
– Uses answer-candidate confidence scores to trigger a feedback loop.
1st iteration — top 3 answers found during the initial pass (using terms 'invented' and 'road traffic cone'):
1. Colvin (Conf: …)  2. Vladimir Zworykin (Conf: …)  3. Angela Alioto (Conf: …)
2nd iteration — top 3 answers displayed (with additional user-specified term 'traffic cone'; correct answer is 'David Morgan'):
1. Morgan (Conf: …)  2. Colvin (Conf: …)  3. Angela Alioto (Conf: …)

61 Multilingual Question Answering
Goals:
– English questions
– Multilingual information sources (Japanese/Chinese)
– English/multilingual answers
Extensions to existing JAVELIN modules:
– Question Analyzer
– Retrieval Strategist
– Information Extractor
– Answer Generator

62 RS Multilingual Architecture
[Diagram: questions pass through the Question Analyzer, supported by a bilingual dictionary module and machine translation; English, Japanese, Chinese, and other-language corpora are run through an encoding converter into per-language indexes; language-specific Information Extractors (1: English, 2: Japanese, 3: Chinese, 4: other) feed the Answer Generator, which produces the answers.]

63 Project Topics
Create more, better RF/IX modules:
– More intelligent feature extractors
– Smarter classifiers
– Train on different answer types
– Plug in and evaluate your work in the context of the larger system
End-to-end QA system:
– Focus on a particular question type
– Utilize the existing RS module for document retrieval
– Evaluate on TREC test suites (subsets)

64 Questions?

