An Introduction to Information Retrieval and Question Answering Jimmy Lin College of Information Studies University of Maryland Wednesday, December 8, 2004

The Information Retrieval Cycle
[Diagram] Resource → Source Selection → Query Formulation (Query) → Search (Ranked List) → Selection (Documents) → Examination (Documents) → Delivery
Feedback loops: query reformulation, vocabulary learning, relevance feedback; source reselection

Supporting the Search Process
[Diagram] The retrieval cycle (Source Selection → Query Formulation → Search → Selection → Examination → Delivery), with the system side that supports it: Acquisition → Collection → Indexing → Index, which the Search stage consults

Types of Information Needs
Ad hoc retrieval: find me documents “like this”
Question answering:
“Factoid”: Who discovered Oxygen? When did Hawaii become a state? Where is Ayer’s Rock located? What team won the World Series in 1992?
“List”: Identify positive accomplishments of the Hubble telescope since it was launched in. Compile a list of mammals that are considered to be endangered, identify their habitat and, if possible, specify what threatens them. What countries export oil? Name U.S. cities that have a “Shubert” theater.
“Definition”: Who is Aaron Copland? What is a quasar?

IR is an Experimental Science!
Formulate a research question (the hypothesis)
Design an experiment to answer the question
Perform the experiment; compare with a baseline “control”
Does the experiment answer the question? Are the results significant?
Report the results! Rinse, repeat…

What experiments?
Example “questions”:
Does morphological analysis improve retrieval performance?
Does expanding the query with synonyms improve retrieval performance?
Corresponding experiments:
Build a “stemmed” index and compare against an “unstemmed” baseline
Expand queries with synonyms and compare against the unexpanded baseline query
What’s missing here?
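The stemming experiment can be sketched end to end. Everything below (the suffix list, the toy documents) is invented for illustration; a real experiment would use a proper stemmer such as Porter’s and a full test collection.

```python
# A toy version of the "stemmed vs. unstemmed" experiment. The stemmer
# is a naive suffix-stripper invented for illustration, not a real
# Porter stemmer.

def naive_stem(word):
    """Strip a few common English suffixes (illustrative only)."""
    for suffix in ("ing", "ies", "es", "ed", "e", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def build_index(docs, stem=False):
    """Map each term to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            if stem:
                term = naive_stem(term)
            index.setdefault(term, set()).add(doc_id)
    return index

def matches(index, query, stem=False):
    """Documents matching any query term (Boolean OR)."""
    terms = [naive_stem(t) if stem else t for t in query.lower().split()]
    return set.union(*(index.get(t, set()) for t in terms))

docs = {1: "retrieving documents", 2: "document retrieval", 3: "query expansion"}
unstemmed = build_index(docs)
stemmed = build_index(docs, stem=True)

print(matches(unstemmed, "retrieve document"))           # {2}
print(matches(stemmed, "retrieve document", stem=True))  # {1, 2}
```

The stemmed run conflates “retrieve”/“retrieving”, so document 1 becomes findable; comparing the two runs against relevance judgments is the experiment.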

IR Test Collections
Three components of a test collection:
Collection of documents (corpus)
Set of information needs (topics)
Sets of documents that satisfy the information needs (relevance judgments)
Metrics for assessing “performance”:
Precision
Recall
Other measures derived therefrom
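Under the set-based view, the two metrics compute as follows (the document ids and counts are hypothetical):

```python
# Set-based precision and recall, the two metrics named above.

def precision_recall(retrieved, relevant):
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical topic: 4 documents retrieved, 3 of them relevant,
# 6 relevant documents in the judgments overall.
print(precision_recall({"d1", "d2", "d3", "d7"},
                       {"d1", "d2", "d3", "d4", "d5", "d6"}))  # (0.75, 0.5)
```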

Where do they come from?
TREC = Text REtrieval Conference
Series of annual evaluations, started in 1992
Organized into “tracks”
Test collections are formed by “pooling”: gather results from all participants
Corpus/topics/judgments can be reused
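Pooling itself is simple to sketch: take the union of the top-k documents from every participant’s ranked run, and judge only that pool. The runs below are invented; TREC traditionally pooled to a depth of around 100.

```python
# Pooling: union of the top-`depth` documents from each ranked run.

def pool(runs, depth=100):
    pooled = set()
    for ranked_list in runs:
        pooled.update(ranked_list[:depth])
    return pooled

runs = [["d1", "d2", "d3"], ["d2", "d4"], ["d5", "d1"]]
print(sorted(pool(runs, depth=2)))  # ['d1', 'd2', 'd4', 'd5']
```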

Roots of Question Answering
Information Retrieval (IR)
Information Extraction (IE)

Information Retrieval (IR)
Can substitute “document” for “information”
IR systems:
Use statistical methods
Rely on frequency of words in query, document, and collection
Retrieve complete documents
Return ranked lists of “hits” based on relevance
Limitations:
Answer questions only indirectly
Do not attempt to understand the “meaning” of the user’s query or the documents in the collection
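As a minimal sketch of that statistical, frequency-based ranking (not any particular system’s model):

```python
from collections import Counter

# A crude term-frequency scorer: rank documents by how often the query
# words occur in them. Real systems also weight by collection statistics
# (e.g. idf); this only illustrates the "statistical methods" point.

def rank(query, docs):
    q_terms = query.lower().split()
    scores = {}
    for doc_id, text in docs.items():
        tf = Counter(text.lower().split())
        scores[doc_id] = sum(tf[t] for t in q_terms)
    return sorted(scores, key=scores.get, reverse=True)  # ranked list of "hits"

docs = {
    "d1": "the nobel peace prize",
    "d2": "peace talks stalled",
    "d3": "a prize for peace the nobel peace prize",
}
print(rank("nobel peace prize", docs))  # ['d3', 'd1', 'd2']
```

Note what the ranked list does not do: it never answers the question, it only orders whole documents by word overlap.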

Information Extraction (IE)
IE systems:
Identify documents of a specific type
Extract information according to pre-defined templates
Place the information into frame-like database records
Templates = pre-defined questions; extracted information = answers
Limitations:
Templates are domain dependent and not easily portable
One size does not fit all!
Example template, weather disaster: Type, Date, Location, Damage, Deaths, ...
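The template idea can be sketched with a few hand-written patterns. The slot names and regexes below are invented for illustration; real IE systems use far richer, domain-engineered (or learned) patterns.

```python
import re

# A hand-written extractor for a cut-down "weather disaster" template.

TEMPLATE_PATTERNS = {
    "type": r"\b(cyclone|hurricane|typhoon|flood)\b",
    "location": r"\b(?:struck|hit)\s+([A-Z][a-z]+)",
    "deaths": r"\b(\d[\d,]*)\s+(?:people\s+)?(?:died|dead|killed)\b",
}

def fill_template(text):
    """Fill each slot with the first matching span, or None."""
    record = {}
    for slot, pattern in TEMPLATE_PATTERNS.items():
        match = re.search(pattern, text, re.IGNORECASE)
        record[slot] = match.group(1) if match else None
    return record

text = "A cyclone struck Bangladesh on Monday; 5,000 people died in the storm."
print(fill_template(text))
# {'type': 'cyclone', 'location': 'Bangladesh', 'deaths': '5,000'}
```

The domain dependence is visible in the code itself: none of these patterns would fire on, say, a corporate-merger story.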

Central Idea of Factoid QA
Determine the semantic type of the expected answer
Retrieve documents that have keywords from the question
Look for named entities of the proper type near the keywords
Example: “Who won the Nobel Peace Prize in 1991?” is looking for a PERSON
Retrieve documents that have the keywords “won”, “Nobel Peace Prize”, and “1991”
Look for a PERSON near the keywords “won”, “Nobel Peace Prize”, and “1991”

An Example
Who won the Nobel Peace Prize in 1991?

“But many foreign investors remain sceptical, and western governments are withholding aid because of the Slorc's dismal human rights record and the continued detention of Ms Aung San Suu Kyi, the opposition leader who won the Nobel Peace Prize in”

“The military junta took power in 1988 as pro-democracy demonstrations were sweeping the country. It held elections in 1990, but has ignored their result. It has kept the 1991 Nobel peace prize winner, Aung San Suu Kyi - leader of the opposition party which won a landslide victory in the poll - under house arrest since July”

“The regime, which is also engaged in a battle with insurgents near its eastern border with Thailand, ignored a 1990 election victory by an opposition party and is detaining its leader, Ms Aung San Suu Kyi, who was awarded the 1991 Nobel Peace Prize.”

“According to the British Red Cross, 5,000 or more refugees, mainly the elderly and women and children, are crossing into Bangladesh each day.”
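The “named entity of the right type near the keywords” idea can be sketched on a passage like the ones above. Named-entity recognition is faked here with a tiny gazetteer, and the proximity window is an arbitrary choice; a real system would use a trained recognizer.

```python
# Toy answer extraction: find a PERSON mention near the query keywords.

PERSONS = {"Aung San Suu Kyi", "Aaron Copland"}  # hypothetical gazetteer

def extract_answer(passage, keywords, window=60):
    text = passage.lower()
    positions = [text.find(k.lower()) for k in keywords]
    if any(p < 0 for p in positions):
        return None  # some keyword is missing from the passage
    center = sum(positions) // len(positions)
    for person in PERSONS:
        idx = text.find(person.lower())
        if idx >= 0 and abs(idx - center) <= window:
            return person
    return None

passage = ("It has kept the 1991 Nobel Peace Prize winner, Aung San Suu Kyi, "
           "leader of the opposition party which won a landslide victory, "
           "under house arrest.")
print(extract_answer(passage, ["won", "Nobel Peace Prize", "1991"]))
# Aung San Suu Kyi
```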

Generic QA Architecture
NL question → Question Analyzer → IR Query + Answer Type
IR Query → Document Retriever → Documents
Documents → Passage Retriever → Passages
Passages + Answer Type → Answer Extractor → Answers

Question analysis
Question word cues:
Who → person, organization, location (e.g., city)
When → date
Where → location
What/Why/How → ??
Head noun cues: what city, which country, what year, which astronaut, what blues band, ...
Scalar adjective cues: how long, how fast, how far, how old, ...
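Those cues translate directly into a rule table. The mappings below are a small illustrative subset, not a complete answer-type taxonomy.

```python
# A minimal rule-based answer-type classifier built from the cues above.

WH_CUES = {"who": "PERSON", "when": "DATE", "where": "LOCATION"}
HEAD_NOUN_CUES = {"city": "CITY", "country": "COUNTRY", "year": "DATE",
                  "astronaut": "PERSON"}
SCALAR_CUES = {"long": "LENGTH_OR_DURATION", "fast": "SPEED",
               "far": "DISTANCE", "old": "AGE"}

def answer_type(question):
    words = question.lower().rstrip("?").split()
    if words[0] == "how" and len(words) > 1 and words[1] in SCALAR_CUES:
        return SCALAR_CUES[words[1]]
    if words[0] in ("what", "which") and len(words) > 1 and words[1] in HEAD_NOUN_CUES:
        return HEAD_NOUN_CUES[words[1]]
    return WH_CUES.get(words[0], "UNKNOWN")

print(answer_type("Who won the Nobel Peace Prize in 1991?"))  # PERSON
print(answer_type("What city hosted the 1992 Olympics?"))     # CITY
print(answer_type("How old is the universe?"))                # AGE
```

The UNKNOWN fallback is exactly the “What/Why/How → ??” problem on the slide: bare what/why/how questions need deeper analysis.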

Using WordNet
“What is the service ceiling of a U-2?” expects a NUMBER: in WordNet, “ceiling” is a kind of “altitude”, which sits under “length” alongside wingspan, diameter, and radius.

Extracting Named Entities
Person: Mr. Hubert J. Smith, Adm. McInnes, Grace Chan
Title: Chairman, Vice President of Technology, Secretary of State
Country: USSR, France, Haiti, Haitian Republic
City: New York, Rome, Paris, Birmingham, Seneca Falls
Province: Kansas, Yorkshire, Uttar Pradesh
Business: GTE Corporation, FreeMarkets Inc., Acme
University: Bryn Mawr College, University of Iowa
Organization: Red Cross, Boys and Girls Club

More Named Entities
Currency: 400 yen, $100, DM 450,000
Linear: 10 feet, 100 miles, 15 centimeters
Area: a square foot, 15 acres
Volume: 6 cubic feet, 100 gallons
Weight: 10 pounds, half a ton, 100 kilos
Duration: 10 days, five minutes, 3 years, a millennium
Frequency: daily, biannually, 5 times, 3 times a day
Speed: 6 miles per hour, 15 feet per second, 5 kph
Age: 3 weeks old, 10-year-old, 50 years of age

How do we extract NEs?
Heuristics and patterns
Fixed lists (gazetteers)
Machine learning approaches
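The first two approaches fit in a few lines: a regex pattern for CURRENCY and a fixed gazetteer for COUNTRY. Both the pattern and the list are invented toys.

```python
import re

# Pattern-based and gazetteer-based NE extraction in miniature.

COUNTRIES = ("France", "Haiti", "USSR")
MONEY_PATTERN = re.compile(r"\$\d[\d,]*|\b\d[\d,]*\s+yen\b")

def extract_entities(text):
    entities = []
    for match in MONEY_PATTERN.finditer(text):
        entities.append(("CURRENCY", match.group()))
    for name in COUNTRIES:
        if name in text:
            entities.append(("COUNTRY", name))
    return entities

print(extract_entities("France paid $100 and 400 yen to Haiti."))
# [('CURRENCY', '$100'), ('CURRENCY', '400 yen'),
#  ('COUNTRY', 'France'), ('COUNTRY', 'Haiti')]
```

Machine learning approaches replace both the patterns and the lists with models trained on annotated text.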

Answer Type Hierarchy

Does it work?
Where do lobsters like to live? → on a Canadian airline
Where do hyenas live? → in Saudi Arabia; in the back of pick-up trucks
Where are zebras most likely found? → near dumps; in the dictionary
Why can't ostriches fly? → Because of American economic sanctions
What’s the population of Maryland? → three

Limitations?

Conclusion Question answering is an exciting research area! Lies at the intersection of information retrieval and natural language processing A real-world application of NLP technologies The dream: a vast repository of knowledge we can “talk to” We’re a long way from there…