
1 From Question-Answering to Information-Seeking Dialogs
Jerry R. Hobbs
USC Information Sciences Institute, Marina del Rey, California
(with Chris Culy, Douglas Appelt, David Israel, Peter Jarvis, David Martin, Mark Stickel, and Richard Waldinger of SRI)

2 Key Ideas (10/24/02; Principal Investigator: Jerry R. Hobbs, USC/ISI)
1. Logical analysis/decomposition of questions into component questions, using a reasoning engine
2. Bottoming out in a variety of web resources and an information extraction engine
3. Use of component questions to drive subsequent dialog, for elaboration, revision, and clarification
4. Use of analysis of questions to determine, formulate, and present answers

3 Plan of Attack
Inference-Based System:
- Inference for question-answering -- this year
- Inference for dialog structure -- beginning now
Incorporate Resources:
- Geographical reasoning -- this year
- Temporal reasoning -- this summer
- Agent and action ontology -- this summer
- Document retrieval and information extraction for question-answering -- beginning now

4 An Information-Seeking Scenario
How safe is the Mascat harbor for refueling US Navy ships?
- What recent terrorist incidents in Oman?
- Are relations between Oman and the US friendly?
- How secure is the Mascat harbor?
Resources: IR + IE engine for searching recent news feeds; map of the harbor from the DAML-encoded Semantic Web/Intelink; ask analyst.
Question decomposition via logical rules; resources attached to the reasoning process; asking the user is one such resource.

5 Composition of Information from Multiple Sources
How far is it from Mascat to Kandahar?
- What is the lat/long of Mascat? (Alexandria Digital Library Gazetteer)
- What is the lat/long of Kandahar? (Alexandria Digital Library Gazetteer)
- What is the distance between the two lat/longs? (geographical formula, or www.nau.edu/~cvm/latlongdist.html)
Question decomposition via logical rules (GEMINI, SNARK); resources attached to the reasoning process.
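The decomposition on this slide can be sketched in Python. This is an illustrative stand-in, not the project's code: the gazetteer lookup is replaced by a small hardcoded table of approximate coordinates, and the "geographical formula" is the standard haversine great-circle distance.

```python
import math

# Hypothetical stand-in for a gazetteer lookup (approximate coordinates).
LAT_LONG = {
    "Mascat": (23.6, 58.6),    # Muscat, Oman
    "Kandahar": (31.6, 65.7),  # Kandahar, Afghanistan
}

def lat_long(place):
    """Subquestion: what is the lat/long of <place>?"""
    return LAT_LONG[place]

def lat_long_distance(p1, p2):
    """Subquestion: great-circle distance in km between two lat/longs
    (haversine formula on a spherical Earth)."""
    R = 6371.0  # mean Earth radius in km
    lat1, lon1, lat2, lon2 = map(math.radians, (*p1, *p2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * R * math.asin(math.sqrt(a))

def distance_between(x, y):
    """The composed question, decomposed into the subquestions above."""
    return lat_long_distance(lat_long(x), lat_long(y))
```

With these toy coordinates, `distance_between("Mascat", "Kandahar")` comes out a bit over 1100 km.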

6 Composition of Information from Multiple Sources
Show me the region 100 km north of the capital of Afghanistan.
- What is the capital of Afghanistan? (CIA Fact Book)
- What is the lat/long of Kabul? (Alexandria Digital Library Gazetteer)
- What is the lat/long 100 km north? (geographical formula)
- Show that lat/long (TerraVision)
Question decomposition via logical rules; resources attached to the reasoning process.
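The subquestion chain above can likewise be sketched in Python, again with hypothetical hardcoded lookups in place of the CIA Fact Book and the gazetteer, and the simple approximation that one degree of latitude is about 111.32 km.

```python
KM_PER_DEG_LAT = 111.32  # approximate km per degree of latitude

# Hypothetical stand-ins for the Fact Book and gazetteer resources.
CAPITALS = {"Afghanistan": "Kabul"}
LAT_LONG = {"Kabul": (34.53, 69.17)}  # approximate

def point_north(lat, lon, km):
    """Subquestion: lat/long of the point <km> due north of (lat, lon)."""
    return (lat + km / KM_PER_DEG_LAT, lon)

def region_north_of_capital(country, km):
    capital = CAPITALS[country]        # "What is the capital of Afghanistan?"
    lat, lon = LAT_LONG[capital]       # "What is the lat/long of Kabul?"
    return point_north(lat, lon, km)   # "What is the lat/long 100 km north?"
```

The returned lat/long is what would be handed to a display resource such as TerraVision.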

7 Combining Time, Space, and Personal Information
Could Mohammed Atta have met with an Iraqi official between 1998 and 2001?
Logical form:
  meet(a,b,t) & 1998 <= t <= 2001
  at(a,x1,t) & at(b,x2,t) & near(x1,x2) & official(b,Iraq)
  go(a,x1,t), go(b,x2,t)
Question decomposition via logical rules; resources attached to the reasoning process: IE engine, temporal reasoning, geographical reasoning.
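A toy evaluator for this logical form might look as follows. The itinerary facts are entirely hypothetical stand-ins for what an IE engine would extract; temporal reasoning reduces here to interval overlap and geographical reasoning to a nearness table.

```python
# Toy at(person, place, (start_year, end_year)) facts; hypothetical data.
AT = [
    ("a", "Prague", (2000, 2001)),
    ("b", "Prague", (2001, 2001)),
    ("b", "Baghdad", (1998, 2000)),
]
NEAR = {("Prague", "Prague"), ("Baghdad", "Baghdad")}

def could_have_met(a, b, lo, hi):
    """meet(a,b,t) & lo <= t <= hi, decomposed as on the slide:
    at(a,x1,t) & at(b,x2,t) & near(x1,x2)."""
    for pa, xa, (s1, e1) in AT:
        for pb, xb, (s2, e2) in AT:
            if pa != a or pb != b or (xa, xb) not in NEAR:
                continue
            # the two stays overlap within [lo, hi]
            if max(s1, s2, lo) <= min(e1, e2, hi):
                return True
    return False
```

On the toy data, the two could have met in the 1998-2001 window but not in 1998-1999.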

8 System Architecture
Query -> (GEMINI: parsing) -> Logical Form -> (SNARK: decomposition and interpretation) -> Proof with Answer
Attached resources: web resources, other resources

9 Two Central Systems
GEMINI:
- Large unification grammar of English
- Under development for more than a decade
- Fast parser
- Generates logical forms
- Used in ATIS and CommandTalk
SNARK:
- Large, efficient theorem prover
- Under development for more than a decade
- Built-in temporal and spatial reasoners
- Procedural attachment, including for web resources
- Extracts answers from proofs
- Strategic controls for speed-up

10 Linguistic Variation
How far is Mascat from Kandahar?
How far is it from Mascat to Kandahar?
How far is it from Kandahar to Mascat?
How far is it between Mascat and Kandahar?
What is the distance from Mascat to Kandahar?
What is the distance between Mascat and Kandahar?
- GEMINI parses and produces logical forms for most TREC-type queries
- Use TACITUS and FASTUS lexicons to augment the GEMINI lexicon
- Unknown-word guessing based on "morphology" and immediate context

11 "Snarkification"
Problem: GEMINI produces logical forms not completely aligned with what SNARK's theories need.
Current solution: write simplification code to map from one to the other.
Long-term solution: logical forms that are better aligned.

12 Relating Lexical Predicates to Core Theory Predicates
"...distance...", "how far..." --> distance-between
- Need to write these axioms for every domain we deal with
- Have illustrative examples

13 Decomposition of Questions
lat-long(l1,x) & lat-long(l2,y) & lat-long-distance(d,l1,l2)
  --> distance-between(d,x,y)
- Need axioms relating core theory predicates and predicates from available resources
- Have illustrative examples

14 Procedural Attachment
Declaration for certain predicates:
- There is a procedure for proving it
- Which arguments are required to be bound before it is called
  lat-long(l1,x)
  lat-long-distance(d,l1,l2)
When a predicate with those arguments bound is generated in a proof, the procedure is executed.
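A minimal sketch of procedural attachment (a hypothetical API, not SNARK's actual interface): each attached predicate declares which argument positions must be bound before its procedure may run, and the prover defers the literal otherwise.

```python
# Registry of attached procedures: predicate -> (required positions, proc).
ATTACHMENTS = {}

def attach(pred, required, proc):
    ATTACHMENTS[pred] = (required, proc)

def try_prove(pred, args):
    """If all required argument positions are bound (not None), run the
    procedure to fill in the rest; otherwise defer to ordinary inference."""
    required, proc = ATTACHMENTS[pred]
    if any(args[i] is None for i in required):
        return None  # not ready: leave the literal for the prover
    return proc(args)

# lat-long(l, x): position 1 (the place x) must be bound before calling.
GAZETTEER = {"Kabul": (34.53, 69.17)}  # hypothetical lookup
attach("lat-long", required=[1],
       proc=lambda args: (GAZETTEER[args[1]], args[1]))
```

When the proof generates lat-long with the place bound, the gazetteer procedure fires; with the place unbound, the literal is deferred.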

15 Open Agent Architecture
GEMINI -> snarkify -> SNARK, each wrapped as an OAA agent
Resources accessed via OAA agents

16 Use of SMART + TextPro
Question -> Subquestion-1, Subquestion-2, Subquestion-3 (question decomposition via logical rules)
Resources attached to the reasoning process: SMART + TextPro as one resource among many, alongside other resources

17 Information Extraction Engine as a Resource
- Document retrieval for pre-processing
- TextPro: top-of-the-line information extraction engine; recognizes subject-verb-object and coreference relations
- Analyze the NL query with GEMINI and SNARK
- Bottom out in a pattern for TextPro to seek
- Keyword search on a very large corpus
- TextPro runs over the documents retrieved

18 Linking SNARK with TextPro
TextSearch(EntType(?x), Terms(p), Terms(c), WSeq) & Analyze(WSeq, p(?x,c))
  --> p(?x,c)
- TextSearch: the call to TextPro
- EntType(?x): type of the questioned constituent
- Terms(p), Terms(c): synonyms and hypernyms of the words associated with p or c
- WSeq: the answer, an ordered sequence of annotated word strings
- Analyze: match pieces of the annotated answer strings with pieces of the query
- p(?x,c): subquery generated by SNARK during its analysis of the query
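A minimal sketch of how such a rule could bottom out in a text search plus an analysis step. All names and data here are hypothetical; the real TextSearch/Analyze interface is richer (annotated word sequences, term expansion), and the entity-type check is folded into the Analyze stand-in.

```python
# Toy corpus standing in for documents retrieved by keyword search;
# PERSONS stands in for TextPro's entity recognition.
CORPUS = ["Samuel Palmisano is CEO of IBM.",
          "IBM reported quarterly earnings."]
PERSONS = {"Samuel Palmisano"}

def text_search(terms_p, terms_c):
    """TextSearch: return word sequences mentioning all search terms."""
    return [s for s in CORPUS
            if all(t.lower() in s.lower() for t in terms_p + terms_c)]

def analyze(wseq):
    """Analyze: bind ?x to a Person entity found in the word sequence."""
    for person in PERSONS:
        if person in wseq:
            return person
    return None

def prove(pred, c, terms_p, terms_c):
    """The rule: TextSearch(...) & Analyze(...) --> pred(?x, c)."""
    for wseq in text_search(terms_p, terms_c):
        x = analyze(wseq)
        if x is not None:
            return (pred, x, c)
    return None
```

Proving CEO(?x, IBM) against the toy corpus binds ?x to Samuel Palmisano.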

19 Three Modes of Operation for TextPro
1. Search for predefined patterns and relations (ACE-style, e.g., ACE Role and At relations) and translate the relations into SNARK's logic
   Where does the CEO of IBM live?
2. Search for subject-verb-object relations in processed text that match the predicate-argument structure of SNARK's logical expression
   "Samuel Palmisano is CEO of IBM."
3. Search for the passage with the highest density of relevant words and an entity of the right type for the answer
   "Samuel Palmisano.... CEO.... IBM."
   Use coreference links to get the most informative answer

20 First Mode
TextSearch(Person, Terms(CEO), Terms(IBM), WSeq) & Analyze(WSeq, Role(?x,Management,IBM,CEO))
  --> CEO(?x,IBM)
Analyze finds:
  Entity1: {Samuel Palmisano, Palmisano, head, he}
  Entity2: {IBM, International Business Machines, they}
  Relation: Role(Entity1, Entity2, Management, CEO)
Result: CEO(Samuel Palmisano, IBM)

21 Three Modes of Operation for TextPro (repeat of slide 19)

22 Second Mode
TextSearch(Person, Terms(CEO), Terms(IBM), WSeq) & Analyze(WSeq, CEO(?x,IBM))
  --> CEO(?x,IBM)
"Samuel Palmisano heads IBM" --Analyze--> CEO(Samuel Palmisano, IBM)

23 Three Modes of Operation for TextPro (repeat of slide 19)

24 Third Mode
TextSearch(Person, Terms(CEO), Terms(IBM), WSeq) & Analyze(WSeq, CEO(?x,IBM))
  --> CEO(?x,IBM)
"He has recently been rumored to have been appointed Lou Gerstner's successor as CEO of the major computer maker nicknamed Big Blue"
  --coref--> "Samuel Palmisano...."
  --Analyze--> CEO(Samuel Palmisano, IBM)

25 Domain-Specific Patterns
- Decide upon a domain (e.g., nonproliferation)
- Compile a list of the principal properties and relations of interest
- Implement these patterns in TextPro
- Implement the link between TextPro and SNARK, converting between templates and logic

26 Challenges
Cross-document identification of individuals:
  Document 1: Osama bin Laden
  Document 2: bin Laden
  Document 3: Usama bin Laden
  Do entities with the same or similar names represent the same individual?
Metonymy:
  Text: Beijing approved the UN resolution on Iraq.
  Query involves “China”, not “Beijing”
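The cross-document identification problem can be illustrated with a crude heuristic sketch. A real system would use richer alias and transliteration resources; the single normalization rule here is an assumption for illustration only.

```python
def normalize(name):
    """Crude normalization for cross-document name matching: lowercase
    and collapse one common transliteration variant (illustrative only;
    a real system would use full alias resources)."""
    return name.lower().replace("usama", "osama")

def same_individual(a, b):
    """Heuristic: same normalized name, or one name is a suffix of the
    other (so 'bin Laden' matches 'Osama bin Laden')."""
    na, nb = normalize(a), normalize(b)
    return na == nb or na.endswith(nb) or nb.endswith(na)
```

This matches the three document mentions above to one individual while keeping unrelated names apart, though suffix matching obviously over-merges in general (any two people named "bin Laden").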

27 DAML Search Engine
Teknowledge has developed a DAML search engine. Query form: pred: capital; arg1: ?x; arg2: Indonesia (plus namespace).
- Searches the entire (soon to be exponentially growing) Semantic Web
- Also handles conjunctive queries: the population of the capital of Indonesia
- Problem: you have to know logic and RDF to use it

28 DAML Search Engine as AQUAINT Web Resource
The AQUAINT system generates capital(?x,Indonesia); a procedural attachment in SNARK passes it to the DAML search engine as pred: capital; arg1: ?x; arg2: Indonesia (plus namespace).
- Searches the entire (soon to be exponentially growing) Semantic Web
- Solution: you only have to know English to use it; this makes the entire Semantic Web accessible to AQUAINT users

29 Temporal Reasoning: Structure
- Topology of time: start, end, before, between
- Measures of duration: for an hour, ...
- Clock and calendar: 3:45pm, Wednesday, June 12
- Temporal aggregates: every other Wednesday
- Deictic time: last year, ...
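The topology-of-time vocabulary above can be given a minimal model by representing an interval as a pair of comparable time points. This is a sketch, not the DAML-Time axiomatization.

```python
# An interval is a (start, end) pair of comparable time points.
def start(i):
    return i[0]

def end(i):
    return i[1]

def before(i, j):
    """Interval i lies wholly before interval j."""
    return end(i) < start(j)

def between(t, i):
    """Instant t falls inside interval i."""
    return start(i) <= t <= end(i)
```

Measures of duration, clock/calendar terms, and temporal aggregates would be further layers on top of this core.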

30 Temporal Reasoning: Goals
- Develop temporal ontology (DAML)
- Reason about time in SNARK (AQUAINT, DAML)
- Link with the temporal annotation language TimeML (AQUAINT)
- Answer questions with a temporal component (AQUAINT)
(Status: nearly complete / in progress)

31 Convergence
- DAML annotation of temporal information on the Web (DAML-Time)
- Annotation of temporal information in text (TimeML)
- Most information on the Web is in text
- The two annotation schemes should be intertranslatable

32 TimeML Annotation Scheme (An Abstract View)
[Diagram: clock and calendar terms (2001, Sept 11) anchor intervals and instants; durations (6 mos) measure intervals; instantaneous events (warning) relate to them via inclusion and before]

33 TimeML Example
The top commander of a Cambodian resistance force said Thursday he has sent a team to recover the remains of a British mine removal expert kidnapped and presumed killed by Khmer Rouge guerrillas two years ago.
[Diagram: annotated events (resist, command, said, sent, recover, remove, kidnap, presumed killed, remain) anchored to times (now, Thursday, 2 years ago)]

34 Vision
- Manual DAML temporal annotation of web resources
- Manual temporal annotation of a large NL corpus
- Programs for automatic temporal annotation of NL text
- Automatic DAML temporal annotation of web resources

35 Spatial and Geographical Reasoning: Structure
- Topology of space: Is Albania a part of Europe?
- Dimensionality, measures: How large is North Korea?
- Orientation and shape: What direction is Monterey from SF?
- Latitude and longitude: Alexandria Digital Library Gazetteer
- Political divisions: CIA World Fact Book, ...

36 Spatial and Geographical Reasoning: Goals
- Develop spatial and geographical ontology (DAML)
- Reason about space and geography in SNARK (AQUAINT, DAML)
- Attach spatial and geographical resources (AQUAINT)
- Answer questions with a spatial component (AQUAINT)
(Some capability now)

37 Rudimentary Ontology of Agents and Actions
Persons and their properties and relations:
- name, alias, (principal) residence
- family and friendship relationships
- movements and interactions
Actions/events:
- types of actions/events
- preconditions and effects

38 Domain-Dependent Ontologies
- Nonproliferation data and task
- Construct relevant ontologies

39 Dialog Modeling: Approaching It Top Down
Key idea: the system matches the user's utterance with one of several active tasks. Understanding dialog is itself one active task.
Rules of the form:
  property(situation) --> active(Task1)
including:
  utter(u,w) --> active(DialogTask)
  want(u,Task1) --> active(Task1)
Understanding is matching the utterance (a conjunction of predications) with an active task or with the condition of an inactive task.
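The top-down key idea can be sketched as follows: each active task lists the predications it expects, and understanding an utterance means matching its predications against the best active task, asking about whatever is left unmatched. The task inventory here is entirely hypothetical.

```python
# Hypothetical active-task inventory: task -> predications it expects.
TASKS = {
    "DialogTask": {"utter"},
    "ShowMapTask": {"show", "region", "lat-long"},
}
ACTIVE = ["DialogTask", "ShowMapTask"]

def match_utterance(predications):
    """Match an utterance (a set of predications) against the active
    tasks; return the best task and the unmatched predications, which
    the system would then ask about."""
    best = max(ACTIVE, key=lambda t: len(predications & TASKS[t]))
    return best, predications - TASKS[best]
```

An utterance contributing show and region predications matches ShowMapTask, with any leftover predication flagged for a follow-up question.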

40 Dialog Task Model
understand(a,e,t): hear(a,w) & parse(w,e) & match(e,t)
- If the match succeeds: action determined by the utterance and the task
- If not (some x unmatched): ask about x

41 Dialog Modeling: Approaching It Bottom Up
identify[x | p(x)] ==> identify[x | p(x) & q(x)]
- Clarification: Show me St Petersburg. Florida or Russia?
- Refinement: Show me a lake in Israel. Bigger than 100 sq mi.
identify[x | p(x)] ==> identify[x | p1(x)], where p and p1 are related
- Further properties: What's the area of the Dead Sea? The depth?
- Change of parameter: Show me a lake in Israel. Jordan.
- Correction: Show me Bryant, Texas. Bryan.
identify[y | y=f(x)] ==> identify[z | z=g(y)]
- Piping: What is the capital of Oman? What's its population?
Challenge: narrowing in on the information need.
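The refinement move identify[x | p(x)] ==> identify[x | p(x) & q(x)] can be sketched by treating a query as a list of predicates, with a follow-up utterance conjoining one more. The lakes, areas, and predicates are toy illustrations only.

```python
# Toy database: lake -> area in sq mi (approximate, illustrative only).
LAKES = {"Dead Sea": 234, "Sea of Galilee": 64}

def identify(preds):
    """Answer identify[x | preds(x)] over the toy database."""
    return [x for x in LAKES if all(p(x) for p in preds)]

in_israel = lambda x: True               # p(x): "a lake in Israel" (toy)
bigger_100 = lambda x: LAKES[x] > 100    # q(x): "Bigger than 100 sq mi."

first = identify([in_israel])                 # Show me a lake in Israel.
refined = identify([in_israel, bigger_100])   # refined query
```

The first query returns both lakes; conjoining the follow-up constraint narrows the answer set, which is exactly the refinement dialog move.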

42 Fixed-Domain QA Evaluation: Why?
Who is Colin Powell? What is naproxen?
- Broad range of domains ==> shallow processing
- Relatively small fixed domain ==> possibility of deeper processing

43 Fixed-Domain QA Evaluation
- Pick a domain, e.g., nonproliferation
- Pick a set of resources, including a corpus of texts, structured databases, and web services
- Pick 3-4 pages of Text in the domain (to constrain knowledge)
- Have an expert make up 200+ realistic questions, answerable with the Text + non-NL resources + inference (maybe + explicit NL resources)
- Divide the questions into training and test sets
- Give sites a month or more to work on the training set
- Test on the test set and analyze the results

44 Some Issues
- Range of questions from easy to impossible
- Form of questions: question templates? let the data determine -- maybe 90%; manually produced logical forms?
- Form of answers: natural language or XML templates?
- Isolated questions or sequences related to a fixed scenario? Some of each
- Community interest: half a dozen sites might participate if the difficulties are worked out

45 Next Steps
- Pick several candidate Texts
- Researchers and experts generate questions from those Texts

