AnswerBus Question Answering System. Zhiping Zheng, School of Information, University of Michigan. HLT 2002.


ABSTRACT
AnswerBus is an open-domain question answering system based on sentence-level Web information retrieval. It accepts users' natural-language questions in English, German, French, Spanish, Italian and Portuguese and provides answers in English. Five search engines and directories are used to retrieve relevant Web pages. The system achieves MRR = 70.5% on TREC-8's 200 questions.

Introduction
Researchers have experimented with QA systems based on:
– closed, pre-tagged corpora
– knowledge bases
– Text REtrieval Conference (TREC) tasks
Recent open-domain QA systems on the Web include LCC, QuASM, IONAUT, START and Webclopedia.

AnswerBus
Questions:
– in natural language
– in English, German, French, Spanish, Italian and Portuguese
Answers:
– in English, drawn from the Web
– via Google, Yahoo, WiseNut, AltaVista, and Yahoo News

Working Process of AnswerBus
A simple language recognition module determines whether the question is in English. If not, AltaVista's translation tool BabelFish is used to translate it into English. AnswerBus then:
1. selects two or three of the five search engines for information retrieval
2. contacts the search engines and retrieves the documents referred to at the top of the hit lists
3. extracts sentences that potentially contain answers from the documents
4. ranks the answers and returns the top-choice sentences, with contextual URL links, to the user
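The four stages above can be sketched as a small pipeline. Everything here is a hypothetical stand-in for illustration, not the original AnswerBus code; the language check and translation call in particular are placeholders.

```python
# Hypothetical sketch of the AnswerBus working process; every helper
# here is an illustrative stand-in, not the original implementation.

def detect_language(question):
    # Stand-in check: treat the question as English unless its first
    # word is a known non-English cue word.
    foreign_cues = {"wer", "qui", "quien", "chi", "quem"}
    first_word = question.lower().split()[0]
    return "en" if first_word not in foreign_cues else "other"

def translate_to_english(question):
    # Placeholder for the BabelFish translation call.
    return question

def answer_question(question, select_engines, retrieve, extract, rank, top_n=5):
    """Run the four AnswerBus stages over pluggable components."""
    if detect_language(question) != "en":
        question = translate_to_english(question)
    engines = select_engines(question)         # 1. choose two or three engines
    documents = retrieve(question, engines)    # 2. fetch top-of-hit-list documents
    candidates = extract(question, documents)  # 3. candidate answer sentences
    return rank(question, candidates)[:top_n]  # 4. ranked answers
```

Each stage is passed in as a function, so the later slides (engine selection, extraction, ranking) slot into this skeleton.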

Search Engine Selection
Different search engines or directories may suit different types of questions:
– for current events, Yahoo News may be a better choice than Google
Determining which engines to use:
– pre-answer 2000 questions offline
– record the words in each question together with the number of correct answers returned by each search engine
– given a new query "word1 word2":
word1: (Google, 7 answers), (AltaVista, 4 answers)
word2: (Google, 8 answers), (AltaVista, 6 answers)
– Google (7 + 8 = 15) outscores AltaVista (4 + 6 = 10) and is chosen this time.
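The selection rule amounts to summing, per engine, the answer counts recorded for each query word. A minimal sketch, assuming the statistics are a word → engine → answer-count map built from the 2000 pre-answered questions:

```python
# Sketch of per-word engine-selection scoring. The `stats` map
# (word -> engine -> number of correct answers) is assumed to come
# from the 2000 pre-answered training questions.

def select_engine(query_words, stats):
    """Choose the engine whose summed per-word answer counts are highest."""
    totals = {}
    for word in query_words:
        for engine, count in stats.get(word, {}).items():
            totals[engine] = totals.get(engine, 0) + count
    return max(totals, key=totals.get) if totals else None
```

With the slide's numbers, Google totals 15 against AltaVista's 10, so Google is selected.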

Relevant Document Retrieval
AnswerBus aims to retrieve enough relevant documents from the search engines within an acceptable response time. The main tasks are to select one or more appropriate search engines for the specific user question and then to form the queries:
– functional word deletion (of, in, …)
– frequently used word deletion
– special word deletion (give me, name one, …)
– word form modification (Who did … end? → ended)
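The query-formation steps can be illustrated as below. The word lists are tiny stand-ins for AnswerBus's actual deletion lists, and the word-form rule is narrowed to the slide's single "end → ended" example:

```python
# Hypothetical query-formation sketch. The deletion lists are small
# illustrative stand-ins, not AnswerBus's actual lists.

FUNCTIONAL_WORDS = {"of", "in", "the", "a", "an", "to", "did"}
SPECIAL_PHRASES = ("give me", "name one", "tell me")

def form_query(question):
    q = question.lower().strip("?.! ")
    for phrase in SPECIAL_PHRASES:          # special word deletion
        q = q.replace(phrase, " ")
    words = q.split()
    # Word form modification, mirroring the slide's
    # "Who did ... end?" -> "ended" example.
    if words[:2] == ["who", "did"] and len(words) > 2:
        words = words[2:]
        words[-1] = words[-1] + "ed"
    return [w for w in words if w not in FUNCTIONAL_WORDS]
```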

Candidate Answer Extraction
AnswerBus first parses the documents into sentences and then determines whether each sentence is an answer candidate. There are two classes of words in a question:
– matching words: words also in the query
– non-matching words: words not in the query
Filtering:
– sentences not satisfying the following formula are filtered out.

Filtering Formula
q is the number of matching words in the sentence; Q is the total number of matching words in the query.
– Example: if a query is 3 words long, only sentences that match 2 or more of those words are kept for answer ranking.
Sentences that contain no non-matching words are also dropped.
Sentences ending with '?' are also dropped.
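A sketch of the filter follows. The exact threshold formula is an assumption on my part: q ≥ ⌈(Q + 1)/2⌉ is one plausible rule that reproduces the stated example (a 3-word query keeps sentences matching 2 or more words), and the '?'-sentence rule is assumed to be applied on the raw sentence before tokenization:

```python
import math

# Candidate-sentence filter. The threshold q >= ceil((Q + 1) / 2) is
# an assumed reconstruction consistent with the slide's example, not
# a formula quoted from the paper.

def keep_sentence(sentence_words, query_words):
    sentence, query = set(sentence_words), set(query_words)
    q, Q = len(sentence & query), len(query)
    if q < math.ceil((Q + 1) / 2):
        return False      # too few matching words
    if not (sentence - query):
        return False      # no non-matching words: adds no new information
    return True
```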

Answer Ranking
Besides the primary (word-matching) score, other factors influence ranking:
– the determination of the question type and the use of a QA-specific dictionary
– named entity extraction
– coreference resolution
The final score is a combination of the primary score and the influence of all these factors.

Question Type and QA-Specific Dictionary
"How far …?" and "How close …?" both map to question type DISTANCE.
In the QA-specific dictionary:
– "How close" units: mile, kilometer, light year, inch, centimeter, …
– "How far" units: all of the above except short units such as inch and centimeter
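A minimal sketch of such a dictionary lookup, using only the example entries from the slide (the real dictionary's contents and lookup keys are not given):

```python
# Illustrative QA-specific dictionary; the unit lists are the examples
# given on the slide, not the system's full entries.

QUESTION_TYPE = {"how far": "DISTANCE", "how close": "DISTANCE"}
UNITS = {
    "how close": {"mile", "kilometer", "light year", "inch", "centimeter"},
    # "How far" excludes short units such as inch and centimeter:
    "how far": {"mile", "kilometer", "light year"},
}

def classify(question):
    """Return (question type, expected answer units) for a question."""
    key = " ".join(question.lower().split()[:2])
    return QUESTION_TYPE.get(key), UNITS.get(key, set())
```

Candidate answers can then be checked for one of the expected units before ranking.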

Dynamic Named Entity Extraction
The speed of a typical NE tagging technique is about 100 MB/hour.
– For one question, 50 HTML documents totaling roughly 1 MB would need 36 seconds (1 MB ÷ 100 MB/hour = 36 s).
AnswerBus instead performs dynamic named entity extraction, extracting only the named entities that match the question type.

Coreference Resolution
AnswerBus only resolves coreferences between adjacent sentences ("he", "they", …).
When such a coreference is detected, the later sentence receives part of the score of the previous sentence.
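The score transfer can be sketched as follows. The pronoun list and the one-half transfer fraction are illustrative assumptions; the paper's actual values are not given on the slide:

```python
# Sketch of adjacent-sentence coreference score propagation. The
# pronoun set and the 0.5 transfer fraction are assumptions.

PRONOUNS = {"he", "she", "it", "they"}

def propagate_scores(sentences, scores, fraction=0.5):
    """Give each pronoun-opening sentence a share of its predecessor's score."""
    adjusted = list(scores)
    for i in range(1, len(sentences)):
        first_word = sentences[i].split()[0].lower()
        if first_word in PRONOUNS:
            adjusted[i] += fraction * scores[i - 1]
    return adjusted
```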

Hit Position and Search Engine Confidence
A sentence extracted from the first hit receives the highest score; the score decreases with hit position. Documents returned by different search engines may also receive different scores. Redundant sentences returned by different search engines are removed.
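One way to combine these signals is sketched below. The 1/rank decay and the per-engine confidence weights are assumptions for illustration; the paper's actual scoring function is not given on the slide:

```python
# Illustrative hit-position and engine-confidence scoring. The decay
# shape and the confidence weights are assumed values, not the paper's.

ENGINE_CONFIDENCE = {"Google": 1.0, "Yahoo News": 0.9, "WiseNut": 0.8}

def hit_score(rank, engine):
    """Score a sentence's source document: earlier hits score higher."""
    return ENGINE_CONFIDENCE.get(engine, 0.7) / rank

def deduplicate(sentences):
    """Drop redundant sentences returned by different search engines."""
    seen, unique = set(), []
    for sentence in sentences:
        if sentence not in seen:
            seen.add(sentence)
            unique.append(sentence)
    return unique
```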

Evaluation
Questions: the 200 TREC-8 questions.
Comparison systems (accessed via the Internet): START, LCC, IONAUT, and QuASM.
Answers were judged manually. In the following table, T refers to the response Time and L to the Length of the answers.

The Performance of Online Question Answering Systems

Future Work
Answer generation
– An ideal QA system should be able to extract the exact answer or summarize the potential answers.
QA-specific indexing
– instead of relying on general search engines
New question set
– TREC questions are not designed for Web-based QA systems.