
From Question-Answering to Information-Seeking Dialogs Jerry R. Hobbs USC Information Sciences Institute Marina del Rey, California (with Chris Culy, Douglas Appelt, David Israel, Peter Jarvis, David Martin, Mark Stickel, and Richard Waldinger of SRI)

10/24/02  Principal Investigator: Jerry R. Hobbs, USC/ISI

Key Ideas
1. Logical analysis/decomposition of questions into component questions, using a reasoning engine
2. Bottoming out in a variety of web resources and an information extraction engine
3. Use of component questions to drive subsequent dialog, for elaboration, revision, and clarification
4. Use of the analysis of questions to determine, formulate, and present answers

Plan of Attack

Inference-Based System:
- Inference for question-answering -- this year
- Inference for dialog structure -- beginning now

Incorporate Resources:
- Geographical reasoning -- this year
- Temporal reasoning -- this summer
- Agent and action ontology -- this summer
- Document retrieval and information extraction for question-answering -- beginning now

An Information-Seeking Scenario

How safe is the Mascat harbor for refueling US Navy ships?

Decomposed via logical rules into component questions:
- What recent terrorist incidents in Oman?
- Are relations between Oman and the US friendly?
- How secure is the Mascat harbor?

Resources attached to the reasoning process:
- IR + IE engine for searching recent news feeds
- Find a map of the harbor on the DAML-encoded Semantic Web / Intelink
- Ask the analyst (asking the user is one such resource)

Composition of Information from Multiple Sources

How far is it from Mascat to Kandahar?

Question decomposition via logical rules (GEMINI parses, SNARK decomposes):
- What is the lat/long of Mascat? -- Alexandrian Digital Library Gazetteer
- What is the lat/long of Kandahar? -- Alexandrian Digital Library Gazetteer
- What is the distance between the two lat/longs? -- geographical formula

Resources are attached to the reasoning process.
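The lat/long-based distance subquestion above bottoms out in a geographical formula; a standard choice is the haversine great-circle distance. This is a minimal sketch: the function name is hypothetical and the coordinates are only approximate, not taken from the slides.

```python
from math import radians, sin, cos, asin, sqrt

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in kilometers."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))  # 6371 km: mean Earth radius

# Approximate coordinates (illustrative): Mascat ~(23.6, 58.6), Kandahar ~(31.6, 65.7)
d = great_circle_km(23.6, 58.6, 31.6, 65.7)
```

With these approximate coordinates the answer comes out a bit over 1100 km.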

Composition of Information from Multiple Sources

Show me the region 100 km north of the capital of Afghanistan.

Question decomposition via logical rules:
- What is the capital of Afghanistan? -- CIA Fact Book
- What is the lat/long of Kabul? -- Alexandrian Digital Library Gazetteer
- What is the lat/long 100 km north? -- geographical formula
- Show that lat/long -- Terravision

Resources are attached to the reasoning process.
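The "lat/long 100 km north" subquestion reduces to simple arithmetic on latitude, since one degree of latitude spans a nearly constant distance. A flat-earth sketch, with an approximate constant, an illustrative Kabul coordinate, and a hypothetical function name:

```python
KM_PER_DEG_LAT = 111.32  # approximate km per degree of latitude

def point_north(lat, lon, km):
    """Lat/long reached by moving `km` due north (flat approximation)."""
    return lat + km / KM_PER_DEG_LAT, lon

# Approximate Kabul coordinates (illustrative)
lat, lon = point_north(34.5, 69.2, 100)
```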

Combining Time, Space, and Personal Information

Could Mohammed Atta have met with an Iraqi official between 1998 and 2001?

Logical form:
  meet(a,b,t) & 1998 <= t <= 2001

Question decomposition via logical rules:
  at(a,x1,t) & at(b,x2,t) & near(x1,x2) & official(b,Iraq)
  go(a,x1,t), go(b,x2,t)

Resources attached to the reasoning process: IE engine, geographical reasoning, temporal reasoning.

System Architecture

Query --(parsing: GEMINI)--> Logical Form --(decomposition and interpretation: SNARK)--> Proof with Answer

During the proof, SNARK draws on web resources and other resources.

Two Central Systems

GEMINI:
- Large unification grammar of English
- Under development for more than a decade
- Fast parser
- Generates logical forms
- Used in ATIS and CommandTalk

SNARK:
- Large, efficient theorem prover
- Under development for more than a decade
- Built-in temporal and spatial reasoners
- Procedural attachment, including for web resources
- Extracts answers from proofs
- Strategic controls for speed-up

Linguistic Variation

- How far is Mascat from Kandahar?
- How far is it from Mascat to Kandahar?
- How far is it from Kandahar to Mascat?
- How far is it between Mascat and Kandahar?
- What is the distance from Mascat to Kandahar?
- What is the distance between Mascat and Kandahar?

GEMINI parses and produces logical forms for most TREC-type queries.
Use the TACITUS and FASTUS lexicons to augment the GEMINI lexicon.
Unknown-word guessing based on "morphology" and immediate context.
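A minimal sketch of collapsing such paraphrases into one canonical logical form. The regular expressions below are illustrative assumptions, not GEMINI's actual grammar; they only cover the variants listed on the slide.

```python
import re

# Hypothetical paraphrase normalizer: maps the variant phrasings above
# to a single canonical form, distance-between(d, A, B).
PATTERNS = [
    r"how far is it (?:from|between) (.+?) (?:to|and) (.+?)\??$",
    r"how far is (.+?) from (.+?)\??$",
    r"what is the distance (?:from|between) (.+?) (?:to|and) (.+?)\??$",
]

def canonicalize(question):
    q = question.strip().lower()
    for pat in PATTERNS:
        m = re.match(pat, q)
        if m:
            a, b = m.groups()
            return f"distance-between(d,{a},{b})"
    return None  # not a distance question this sketch recognizes
```

Distance is symmetric, so the argument order from "X from Y" versus "from X to Y" does not matter here.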

"Snarkification"

Problem: GEMINI produces logical forms not completely aligned with what SNARK theories need.
Current solution: write simplification code to map from one to the other.
Long-term solution: logical forms that are better aligned.

Relating Lexical Predicates to Core Theory Predicates

Lexical predicates such as "... distance ..." and "how far ..." must be related to the core theory predicate distance-between.

- Need to write these axioms for every domain we deal with
- Have illustrative examples

Decomposition of Questions

  lat-long(l1,x) & lat-long(l2,y) & lat-long-distance(d,l1,l2)
    --> distance-between(d,x,y)

- Need axioms relating core theory predicates and predicates from available resources
- Have illustrative examples

Procedural Attachment

Declaration for certain predicates:
- There is a procedure for proving it
- Which arguments are required to be bound before it is called

  lat-long(l1,x)
  lat-long-distance(d,l1,l2)

When a predicate with those arguments bound is generated in the proof, the procedure is executed.
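A toy version of procedural attachment, assuming a simple registry interface (not SNARK's actual API): each predicate is declared with a procedure and the argument positions that must be bound before the procedure may run. Unbound arguments are represented as None, and the gazetteer table is an illustrative stand-in for the Alexandrian Digital Library lookup.

```python
ATTACHMENTS = {}

def attach(pred, required, proc):
    """Declare a procedure for proving `pred`; `required` lists the
    argument positions that must be bound before it can be called."""
    ATTACHMENTS[pred] = (required, proc)

def try_prove(pred, args):
    """Run the attached procedure if its required args are bound;
    return None to signal that ordinary inference should proceed."""
    if pred not in ATTACHMENTS:
        return None
    required, proc = ATTACHMENTS[pred]
    if any(args[i] is None for i in required):
        return None  # not enough bindings yet; defer
    return proc(*[args[i] for i in required])

# Toy gazetteer (approximate, illustrative coordinates)
GAZETTEER = {"Mascat": (23.6, 58.6), "Kandahar": (31.6, 65.7)}

# lat-long(l, x): result l is unbound, place x must be bound
attach("lat-long", [1], lambda place: GAZETTEER.get(place))

result = try_prove("lat-long", [None, "Kandahar"])
```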

Open Agent Architecture

GEMINI, snarkify, and SNARK run as OAA agents; external resources are reached via OAA agents as well.

Use of SMART + TextPro

A question is decomposed via logical rules into subquestions (Subquestion-1, Subquestion-2, Subquestion-3), each sent to a resource attached to the reasoning process. SMART + TextPro is one resource among many, alongside other resources.

Information Extraction Engine as a Resource

- Document retrieval for pre-processing
- TextPro: top-of-the-line information extraction engine; recognizes subject-verb-object and coreference relations
- Analyze the NL query with GEMINI and SNARK
- Bottom out in a pattern for TextPro to seek
- Keyword search on a very large corpus
- TextPro runs over the documents retrieved

Linking SNARK with TextPro

  TextSearch(EntType(?x), Terms(p), Terms(c), WSeq) & Analyze(WSeq, p(?x,c))
    --> p(?x,c)

- TextSearch(...): call to TextPro
- EntType(?x): type of the questioned constituent
- Terms(p), Terms(c): synonyms and hypernyms of the words associated with p and c
- WSeq: the answer, an ordered sequence of annotated strings of words
- Analyze(...): match pieces of the annotated answer strings with pieces of the query
- p(?x,c): subquery generated by SNARK during analysis of the query

Three Modes of Operation for TextPro

1. Search for predefined patterns and relations (ACE-style) and translate the relations into SNARK's logic
   Where does the CEO of IBM live?
2. Search for subject-verb-object relations in processed text that match the predicate-argument structure of SNARK's logical expression
   "Samuel Palmisano is CEO of IBM."
3. Search for the passage with the highest density of relevant words and an entity of the right type for the answer
   "Samuel Palmisano.... CEO.... IBM."
   Use coreference links to get the most informative answer

(Mode 1 uses ACE Role and At relations.)

First Mode

  TextSearch(Person, Terms(CEO), Terms(IBM), WSeq)
    & Analyze(WSeq, Role(?x,Management,IBM,CEO))
    --> CEO(?x,IBM)

Analyze produces:
  Entity1: {Samuel Palmisano, Palmisano, head, he}
  Entity2: {IBM, International Business Machines, they}
  Relation: Role(Entity1, Entity2, Management, CEO)

Result: CEO(Samuel Palmisano, IBM)


Second Mode

  TextSearch(Person, Terms(CEO), Terms(IBM), WSeq) & Analyze(WSeq, CEO(?x,IBM))
    --> CEO(?x,IBM)

Text found: "Samuel Palmisano heads IBM"

Analyze yields: CEO(Samuel Palmisano, IBM)
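The second mode can be sketched as matching extracted subject-verb-object triples against the target predicate. The data shapes and the verb-to-predicate synonym table below are illustrative assumptions, not TextPro's actual output format or lexicon.

```python
# Hypothetical synonym table linking verbs to the predicate SNARK wants proved
VERB_TO_PRED = {"heads": "CEO", "leads": "CEO", "is CEO of": "CEO"}

def analyze(triples, pred, obj):
    """Return subjects x such that pred(x, obj) is supported by some
    extracted (subject, verb, object) triple."""
    answers = []
    for subj, verb, o in triples:
        if VERB_TO_PRED.get(verb) == pred and o == obj:
            answers.append(subj)
    return answers

triples = [("Samuel Palmisano", "heads", "IBM")]
ans = analyze(triples, "CEO", "IBM")
```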


Third Mode

  TextSearch(Person, Terms(CEO), Terms(IBM), WSeq) & Analyze(WSeq, CEO(?x,IBM))
    --> CEO(?x,IBM)

Text found: "He has recently been rumored to have been appointed Lou Gerstner's successor as CEO of the major computer maker nicknamed Big Blue"

A coreference link resolves "He" to "Samuel Palmisano...."; Analyze yields CEO(Samuel Palmisano, IBM).

Domain-Specific Patterns

- Decide upon a domain (e.g., nonproliferation)
- Compile a list of the principal properties and relations of interest
- Implement these patterns in TextPro
- Implement the link between TextPro and SNARK, converting between templates and logic

Challenges

Cross-document identification of individuals:
- Document 1: Osama bin Laden
- Document 2: bin Laden
- Document 3: Usama bin Laden
Do entities with the same or similar names represent the same individual?

Metonymy:
- Text: Beijing approved the UN resolution on Iraq.
- Query involves "China", not "Beijing"
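One simple heuristic for the cross-document identification problem is normalized token overlap: two names may corefer if one name's tokens are a subset of the other's after spelling variants are normalized. The variant table and subset test are illustrative assumptions, a sketch rather than a full solution.

```python
VARIANT = {"usama": "osama"}  # illustrative spelling-variant table

def tokens(name):
    """Lowercased, variant-normalized token set of a name."""
    return {VARIANT.get(t, t) for t in name.lower().split()}

def maybe_same(n1, n2):
    """True if one name's tokens are a subset of the other's."""
    t1, t2 = tokens(n1), tokens(n2)
    return t1 <= t2 or t2 <= t1

same12 = maybe_same("Osama bin Laden", "bin Laden")
same13 = maybe_same("Osama bin Laden", "Usama bin Laden")
```

Metonymy (Beijing standing for China) is not captured by this heuristic; it needs world knowledge linking a capital to its country.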

DAML Search Engine

Teknowledge has developed a DAML search engine that takes queries of the form:

  pred: capital (with namespace)
  arg1: ?x
  arg2: Indonesia

- Searches the entire (soon to be exponentially growing) Semantic Web
- Also handles conjunctive queries: population of capital of Indonesia
- Problem: you have to know logic and RDF to use it

DAML Search Engine as an AQUAINT Web Resource

The AQUAINT system translates English into a logical form, e.g. capital(?x,Indonesia), which a procedural attachment in SNARK passes to the DAML search engine as:

  pred: capital (with namespace)
  arg1: ?x
  arg2: Indonesia

- Searches the entire (soon to be exponentially growing) Semantic Web
- Solution: you only have to know English to use it
- Makes the entire Semantic Web accessible to AQUAINT users

Temporal Reasoning: Structure

- Topology of time: start, end, before, between
- Measures of duration: for an hour, ...
- Clock and calendar: 3:45pm, Wednesday, June 12
- Temporal aggregates: every other Wednesday
- Deictic time: last year, ...
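The topological relations above reduce to simple comparisons once times are represented as instants and intervals. A minimal sketch using calendar dates; the representation and function names are assumptions for illustration, not SNARK's built-in temporal reasoner.

```python
from datetime import date

# Intervals as (start, end) pairs of dates
def before(i1, i2):
    """Interval i1 is wholly before interval i2."""
    return i1[1] < i2[0]

def between(t, i):
    """Instant t falls inside interval i (endpoints included)."""
    return i[0] <= t <= i[1]

# The "between 1998 and 2001" constraint from the Atta example
meeting_window = (date(1998, 1, 1), date(2001, 12, 31))
ok = between(date(2000, 6, 1), meeting_window)
```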

Temporal Reasoning: Goals

- Develop temporal ontology (DAML)
- Reason about time in SNARK (AQUAINT, DAML)
- Link with the temporal annotation language TimeML (AQUAINT)
- Answer questions with a temporal component (AQUAINT)

Nearly complete / in progress.

Convergence

- DAML annotation of temporal information on the Web (DAML-Time)
- Annotation of temporal information in text (TimeML)
- Most information on the Web is in text
- The two annotation schemes should be intertranslatable

TimeML Annotation Scheme (An Abstract View)

[Diagram: a timeline relating intervals & instants via inclusion, before, and duration relations; clock & calendar anchors ("mos", "Sept 11"); instantaneous events such as "warning".]

TimeML Example

The top commander of a Cambodian resistance force said Thursday he has sent a team to recover the remains of a British mine removal expert kidnapped and presumed killed by Khmer Rouge guerrillas two years ago.

[Diagram: the events (resist, command, said, sent, recover, remove, kidnap, presumed killed, remain) anchored to times ("Thursday", "now", "2 years" ago).]

Vision

1. Manual DAML temporal annotation of web resources
2. Manual temporal annotation of a large NL corpus
3. Programs for automatic temporal annotation of NL text
4. Automatic DAML temporal annotation of web resources

Spatial and Geographical Reasoning: Structure

- Topology of space: Is Albania a part of Europe?
- Dimensionality and measures: How large is North Korea?
- Orientation and shape: What direction is Monterey from SF?
- Latitude and longitude: Alexandrian Digital Library Gazetteer
- Political divisions: CIA World Fact Book, ...

Spatial and Geographical Reasoning: Goals

- Develop spatial and geographical ontology (DAML)
- Reason about space and geography in SNARK (AQUAINT, DAML)
- Attach spatial and geographical resources (AQUAINT)
- Answer questions with a spatial component (AQUAINT)

Some capability now.

Rudimentary Ontology of Agents and Actions

Persons and their properties and relations:
- name, alias, (principal) residence
- family and friendship relationships
- movements and interactions

Actions/events:
- types of actions/events
- preconditions and effects

Domain-Dependent Ontologies

- Nonproliferation data and task
- Construct relevant ontologies

Dialog Modeling: Approaching It Top Down

Key idea: the system matches the user's utterance with one of several active tasks. Understanding dialog is itself one active task.

Rules of the form:
  property(situation) --> active(Task1)
including:
  utter(u,w) --> active(DialogTask)
  want(u,Task1) --> active(Task1)

Understanding is matching the utterance (a conjunction of predications) with an active task or with the condition of an inactive task.

Dialog Task Model

  understand(a,e,t): hear(a,w) & parse(w,e) & match(e,t)

- If the match succeeds (yes): the action is determined by the utterance and the task.
- If it fails (no, with x unmatched): ask about x.

Dialog Modeling: Approaching It Bottom Up

identify[x | p(x)] ==> identify[x | p(x) & q(x)]
- Clarification: Show me St Petersburg. Florida or Russia?
- Refinement: Show me a lake in Israel. Bigger than 100 sq mi.

identify[x | p(x)] ==> identify[x | p1(x)], where p and p1 are related
- Further properties: What's the area of the Dead Sea? The depth?
- Change of parameter: Show me a lake in Israel. Jordan.
- Correction: Show me Bryant, Texas. Bryan.

identify[y | y=f(x)] ==> identify[z | z=g(y)]
- Piping: What is the capital of Oman? What's its population?

Challenge: narrowing in on the information need.
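The refinement schemas above can be sketched by treating a query identify[x | P] as a set of constraints P that each dialog move transforms. The string-based constraint representation below is an illustrative assumption, not the project's actual formalism.

```python
def refine(constraints, extra):
    """identify[x | p(x)] ==> identify[x | p(x) & q(x)]
    (clarification / refinement: add a constraint)."""
    return constraints | {extra}

def replace(constraints, old, new):
    """identify[x | p(x)] ==> identify[x | p1(x)]
    (change of parameter / correction: swap a related constraint)."""
    return (constraints - {old}) | {new}

# "Show me a lake in Israel."
q = {"lake(x)", "in(x, Israel)"}
# "Bigger than 100 sq mi."  (refinement)
q = refine(q, "area(x) > 100 sq mi")
# "Jordan."  (change of parameter)
q = replace(q, "in(x, Israel)", "in(x, Jordan)")
```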

Fixed-Domain QA Evaluation: Why?

Who is Colin Powell? What is naproxen?

- Broad range of domains ==> shallow processing
- Relatively small fixed domain ==> possibility of deeper processing

Fixed-Domain QA Evaluation

- Pick a domain, e.g., nonproliferation
- Pick a set of resources, including a corpus of texts, structured databases, and web services
- Pick 3-4 pages of Text in the domain (to constrain knowledge)
- Have an expert make up 200+ realistic questions, answerable with Text + non-NL resources + inference (maybe + explicit NL resources)
- Divide the questions into training and test sets
- Give sites a month or more to work on the training set
- Test on the test set and analyze the results

Some Issues

- Range of questions from easy to impossible
- Form of questions: question templates? let the data determine -- maybe 90%; manually produced logical forms?
- Form of answers: natural language or XML templates?
- Isolated questions or sequences related to a fixed scenario? Some of each
- Community interest: half a dozen sites might participate if the difficulties are worked out

Next Steps

- Pick several candidate Texts
- Researchers and experts generate questions from those Texts