
1 Learning to Find Answers to Questions
Eugene Agichtein (Columbia University), Steve Lawrence (NEC Research), Luis Gravano (Columbia University)

2 Motivation
- Millions of natural language questions are submitted to web search engines daily.
- An increasing number of search services specifically target natural language questions:
  - AskJeeves: databases of precompiled information, metasearching, and other proprietary methods
  - AskMe.com and similar: facilitate interaction with human experts

3 Problem Statement
- Problem/Goal: find documents containing answers to questions within a collection of text documents.
- Collection: pages on the web as indexed by a web search engine.
- General method: transform questions into a set of new queries that maximize the probability of returning answers to the questions, using existing IR systems or search engines.

4 Example
- Question: "What is a hard disk?"
- Current search engines might ignore the stopwords, processing the query {hard, disk}, and may return home pages of hard drive manufacturers.
- A good answer might include a definition or an explanation of what a hard disk is. Such answers are likely to contain phrases such as "… is a …", "… is used to …", etc.
- Submitting queries such as { {hard, disk} NEAR "is a" }, { {hard, disk} NEAR "is used to" }, etc., may bias the search engine to return answers to the original question.
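
A minimal sketch of the rewriting idea above, not the authors' code: the stopword list, the exact NEAR syntax, and the rewrite_question helper are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of the query-rewriting idea on this slide.
# The transform phrases, stopword list, and NEAR syntax are illustrative assumptions.

def rewrite_question(question, transforms, stopwords):
    """Turn a natural language question into transformed queries that
    bias a search engine toward answer-bearing documents."""
    # Keep only the content words, dropping stopwords such as "what", "is", "a".
    content = [w for w in question.lower().rstrip("?").split() if w not in stopwords]
    base = " ".join(content)  # e.g. "hard disk"
    # Pair the content words with each answer-like phrase.
    return [f'{{{base}}} NEAR "{t}"' for t in transforms]

stop = {"what", "is", "a", "an", "the"}
print(rewrite_question("What is a hard disk?", ["is a", "is used to"], stop))
# ['{hard disk} NEAR "is a"', '{hard disk} NEAR "is used to"']
```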

5 Method
Training:
1. Automatically learn generally applicable question-answer transformations based on the question-answer pairs in the training collection.
2. Automatically probe each IR system (e.g., web search engine) to discover which transformations work better than others for that IR system.
Run time:
3. Transform the question using the best transformations from Step 2 and submit it to each IR system.

6 Background/Related Work
- Decades of NLP research on Question Answering:
  - Manual methods: linguistics, parsing, heuristics, …
  - Learning-based methods
- General approach to Q-A:
  - Find candidate documents in a database (our focus)
  - Extract the answer from the documents (traditional focus)
- Text REtrieval Conference (TREC) Question-Answering Track: retrieve a short (50 or 250 byte) answer to a set of test questions.

7 Related Work (cont.)
- Most systems focus on extracting answers from documents.
- Most use variants of standard vector-space or probabilistic retrieval models to retrieve documents, followed by heuristics and/or linguistics-based methods to extract the best passages.
- Evaluation has focused on questions with precise answers.
- Abney et al., Cardie et al., …

8 Related Work (cont.)
- Berger et al. (SIGIR 2000): independently considered statistical models for finding co-occurring terms in question/answer pairs to facilitate answer retrieval.
- Lawrence and Giles (IEEE IC 1998): queries transformed into specific ways of expressing an answer, e.g., "What is …?" is transformed into phrases such as "… is" and "… refers to". Transformations are manually coded and are the same for all search engines.
- Glover et al. (SAINT 2001): category-specific query modification.

9 Our Contributions: The Tritus System
- Introduced a method for automatically learning multiple query transformations optimized for a specific information retrieval system, with the goal of maximizing the probability of retrieving documents containing the answers.
- Developed Tritus, a prototype meta-search engine automatically optimized for real-world web search engines.
- Performed a thorough evaluation of the Tritus system, comparing it to state-of-the-art search engines.

10 Tritus Training Algorithm
Training steps:
1. Generate question phrases
2. Generate candidate transforms
3. Evaluate candidate transforms on target IR system(s)
Data: 30,000 question-answer pairs from 270 Frequently Asked Question (FAQ) files obtained from the FAQFinder project.
Example question/answer pairs:
- Q: What is a Lisp Machine (LISPM)? A: A Lisp machine (or LISPM) is a computer which has been optimized to run lisp efficiently and …
- Q: What is a near-field monitor? A: A near field monitor is one that is designed to be …

11 Training Step 1: Generating Question Phrases
- Generate phrases that identify different categories of questions.
- For example, the phrase "what is a" in the question "what is a virtual private network?" tells us the goal of the question.
- Find commonly occurring n-grams at the beginning of questions.
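
A minimal sketch of this step under stated assumptions: whitespace tokenization, a fixed maximum n-gram length, and a simple frequency cutoff. The function name and parameters are hypothetical, not from the paper.

```python
# Minimal sketch of Step 1: collect frequent n-grams that occur at the
# beginning of training questions. Tokenization and cutoffs are assumptions.
from collections import Counter

def question_phrases(questions, max_n=4, min_count=2):
    counts = Counter()
    for q in questions:
        tokens = q.lower().rstrip("?").split()
        for n in range(1, min(max_n, len(tokens)) + 1):
            counts[" ".join(tokens[:n])] += 1  # only prefixes of the question
    return [phrase for phrase, c in counts.most_common() if c >= min_count]

qs = ["What is a hard disk?", "What is a rainbow?", "How do I improve my vocal range?"]
print(question_phrases(qs))  # ['what', 'what is', 'what is a']
```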

12 Training Step 1 (cont.)
- Limitations: e.g., "How do I find out what a sonic boom is?"
- Advantages of this approach:
  - Very inexpensive to compute (especially important at run time)
  - Domain and language independent, and can be extended in a relatively straightforward fashion to other European languages

13 Training Step 2: Generating Candidate Transforms
- Generate candidate terms and phrases for each of the question phrases from the previous stage.
- For each question in the training data matching the current question phrase, rank the n-grams in the corresponding answers according to co-occurrence frequency (sketched below).
- To reduce domain bias, candidate transforms containing nouns are discarded using a part-of-speech tagger (Brill's); e.g., the term "telephone" is intuitively not very useful for the question "what is a rainbow?"
- Sample candidate transforms when nouns are not excluded, for the question phrase "what is a": "component", "collection of", "a computer", "telephone", "stands for", "unit", …
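
A minimal sketch of candidate-transform generation under stated assumptions: NLTK's default tagger stands in for the Brill tagger used in the paper, the n-gram range and top_k cutoff are illustrative, and the answers are assumed to belong to questions matching the current question phrase.

```python
# Minimal sketch of Step 2: rank answer n-grams by frequency, then discard
# candidates containing nouns. NLTK's tagger is an assumed stand-in for Brill's.
from collections import Counter
import nltk  # assumes the "averaged_perceptron_tagger" data is installed

def candidate_transforms(answers, max_n=3, top_k=50):
    counts = Counter()
    for answer in answers:
        tokens = answer.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1  # co-occurrence frequency
    ranked = [gram for gram, _ in counts.most_common()]
    # Discard transforms containing nouns to reduce domain bias.
    ranked = [g for g in ranked
              if not any(tag.startswith("NN") for _, tag in nltk.pos_tag(g.split()))]
    return ranked[:top_k]
```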

14 Training Step 2: Generating Candidate Transforms (cont.)
- Take the top topKphrases n-grams with the highest frequency counts and apply term weighting.
- Weight calculated as in Okapi BM25, using Robertson/Sparck Jones weights (sketched below):
  w_t = log( ((r + 0.5) / (R - r + 0.5)) / ((n - r + 0.5) / (N - n - R + r + 0.5)) )
  where r = number of relevant documents containing t, R = number of relevant documents, n = number of documents containing t, N = number of documents in the collection.
- This is an estimate of the selectivity/discrimination of a candidate transform with respect to a specific question type.
- The weighting is extended to phrases.
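
A minimal sketch of the selectivity weight described above, using the variable names from this slide (r, R, n, N). The 0.5 smoothing terms follow the usual Robertson/Sparck Jones formulation and are an assumption here.

```python
# Minimal sketch of the Robertson/Sparck Jones term selectivity weight.
import math

def rsj_weight(r, R, n, N):
    # r: relevant documents containing t, R: relevant documents,
    # n: documents containing t, N: documents in the collection.
    return math.log(((r + 0.5) / (R - r + 0.5)) /
                    ((n - r + 0.5) / (N - n - R + r + 0.5)))
```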

15 Training Step 2: Sample Candidate Transforms
Final term selection weight: tw_t = qtf_t * w_t, where qtf_t = frequency of t in the relevant question type, w_t = term selectivity/discrimination weight, and tw_t = resulting candidate transform weight.

Sample candidate transforms for the question phrase qt = "what is a":

  Candidate transform t   qtf_t   w_t    tw_t
  "refers to"             30      2.71   81.3
  "refers"                30      2.67   80.1
  "meets"                 12      3.21   38.5
  "driven"                14      2.72   38.1
  "named after"           10      3.63   36.3
  "often used"            12      3.00   36.0
  "to describe"           13      2.70   35.1
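
A minimal sketch of the final weight formula, tw_t = qtf_t * w_t; the example values below simply reproduce the first row of the sample table.

```python
# Minimal sketch: final candidate-transform weight tw_t = qtf_t * w_t.
def transform_weight(qtf_t, w_t):
    return qtf_t * w_t

print(round(transform_weight(30, 2.71), 1))  # 81.3, as in the "refers to" row above
```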

16 Training Step 3: Evaluate Candidate Transforms on a Target IR System
- Search engines have different ranking methods and treat different queries in different ways (phrases, stop words, etc.).
- Candidate transforms are grouped into buckets according to length, since phrases of different length may be treated differently.
- The top n transforms in each bucket are evaluated on the target IR system.

17 Training Step 3 (cont.)
For each question phrase and search engine (sketched below):
- For up to numExamples QA pairs matching the question phrase, sorted by answer length, test each candidate transform. E.g., for the question phrase "what is a", candidate transform "refers to", and question "what is a VPN", the rewritten query { VPN AND "refers to" } is sent to each search engine.
- The similarity of the retrieved documents to the known answer is computed.
- The final weight for a transform is the average similarity between known answers and retrieved documents, across all matching questions evaluated.
- Query syntax is adapted for each search engine: transforms are encoded as phrases, and the "NEAR" operator is used for AltaVista. [Google reports including term proximity in ranking.]
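
A minimal sketch of this step under stated assumptions: search_engine() and similarity() are hypothetical placeholders for the engine-specific query API and the BM25-based subdocument similarity described on the next slide.

```python
# Minimal sketch of Training Step 3: score one candidate transform against a
# target search engine. Helper callables are hypothetical placeholders.
def evaluate_transform(question_phrase, transform, qa_pairs,
                       search_engine, similarity, num_examples=10):
    scores = []
    for question, known_answer in qa_pairs[:num_examples]:
        # e.g. question phrase "what is a" and question "what is a VPN" -> "VPN"
        content = question.lower().rstrip("?").replace(question_phrase, "", 1).strip()
        docs = search_engine(f'{content} "{transform}"')  # engine-specific syntax in practice
        scores.extend(similarity(known_answer, doc) for doc in docs)
    # Final transform weight: average similarity of retrieved documents
    # to the known answers, across all matching questions evaluated.
    return sum(scores) / len(scores) if scores else 0.0
```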

18 Computing Similarity of Known Answers and Retrieved Documents
- Consider subdocuments of length subdocLen within the retrieved documents, overlapping by subdocLen / 2 (assumption: answers are localized).
- Find the maximum similarity of any subdocument with the known answer:
  docScore(D) = max_i ( BM25_phrase(Answer, D_i) )
  BM25_phrase(Q, D_i) = sum over t in Q of ( w_t * (k_1 + 1) * tf_t * (k_3 + 1) * qtf_t ) / ( (K + tf_t) * (k_3 + qtf_t) )
  where t = term, Q = query, k_1 = 1.2, k_3 = 1000, K = k_1 * ((1 - b) + b * dl / avdl), b = 0.5, dl is the document length in tokens, avdl is the average document length in tokens, w_t is the term relevance weight, tf_t is the frequency of term t in the document, qtf_t is the term frequency within the question phrase (query topic in the original BM25), and terms include phrases.
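
A minimal sketch of the subdocument scoring above; bm25_phrase() is passed in as a placeholder, and a full implementation would use the Okapi weights and the parameters listed on this slide (k_1 = 1.2, k_3 = 1000, b = 0.5).

```python
# Minimal sketch: score a retrieved document as the best-matching overlapping
# subdocument, under the assumption that answers are localized.
def doc_score(answer, document_tokens, subdoc_len, bm25_phrase):
    # Subdocuments of length subdocLen, shifted by subdocLen / 2.
    step = max(1, subdoc_len // 2)
    starts = range(0, max(1, len(document_tokens) - step), step)
    subdocs = [document_tokens[i:i + subdoc_len] for i in starts]
    # The document score is the maximum subdocument similarity to the answer.
    return max(bm25_phrase(answer, " ".join(sd)) for sd in subdocs)
```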

19 Sample Transforms

AltaVista:
  Transform t    tw_t
  "is usually"   377.3
  "refers to"    373.2
  "usually"      371.6
  "refers"       370.1
  "is used"      360.1

Google:
  Transform t    tw_t
  "is usually"   280.7
  "usually"      275.7
  "called"       256.6
  "sometimes"    253.5
  "is one"       253.2

20 Evaluating Queries at Run Time
- Search for matching question phrases, with preference for longer (more specific) phrases.
- Retrieve the corresponding transforms and send the transformed queries to the search engine.
- Compute the similarity of the returned documents with respect to the transformed query.
- If a document is retrieved by multiple transforms, use the maximum similarity.
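
A minimal sketch of run-time evaluation; transforms_for(), search_engine(), and similarity() are hypothetical placeholders for the learned transform table, the engine query API, and the subdocument similarity function.

```python
# Minimal sketch of run-time query evaluation for one question.
def answer_question(question, transforms_for, search_engine, similarity):
    q = question.lower().rstrip("?")
    # Prefer the longest (most specific) matching question phrase.
    phrase, transforms = max(transforms_for(q), key=lambda item: len(item[0]))
    best = {}
    for t in transforms:
        query = f'{q.replace(phrase, "", 1).strip()} "{t}"'
        for doc in search_engine(query):
            score = similarity(query, doc)
            # A document retrieved by several transforms keeps its maximum similarity.
            best[doc] = max(score, best.get(doc, 0.0))
    return sorted(best, key=best.get, reverse=True)
```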

21 Sample Query

22 Experimental Setup/Evaluation
- Real questions from the query log of the Excite search engine from 12/20/99.
- Evaluated the four most common question types, which account for over 90% of natural language questions to Excite: Where, What, How, and Who.
- A random sample of 50 questions was extracted for each question type; potentially offensive queries were removed.
- None of the evaluation queries were used in any part of the training process.
- Results from each search engine were retrieved in advance and shown to evaluators in random order; evaluators did not know which engine produced the results.
- 89 questions were evaluated.

23 Sample Questions Evaluated
- Who was the first Japanese player in baseball?
- Who was the original singer of fly me to the moon?
- Where is the fastest passenger train located?
- How do I replace a Chevy engine in my pickup?
- How do I keep my refrigerator smelling good?
- How do I get a day off school?
- How do I improve my vocal range?
- What are ways people can be motivated?
- What is a sonic boom?
- What are the advantages of being unicellular?

24 Systems Evaluated
- AskJeeves (AJ): search engine specializing in answering natural language questions. Returns different types of responses; we parse each type.
- Google (GO): the Google search engine as is.
- Tritus optimized for Google (TR-GO)
- AltaVista (AV): the AltaVista search engine as is.
- Tritus optimized for AltaVista (TR-AV)

25 Best Performing System
- Percentage of questions for which a system returns the most relevant documents at document cutoff K; in case of ties, all tied engines are counted as best.
- Results for the lowest-performing systems are not statistically significant (very small number of queries where they perform best).

26 Average Precision
Average precision at document cutoff K.

27 Precision by Question Type
- Precision at K for What (a), How (b), Where (c), and Who (d) type questions.
- Results indicate that the advantage of Tritus, and the best underlying search engine to use, vary by question type, but the amount of data limits strong conclusions.

28 Document Overlap
Overlap of documents retrieved by transformed queries with the original system: top 150 (a), top 10 (b), and relevant documents in the top 10 (c).

29 Future Research
- Combining multiple transformations into a single query
- Using multiple search engines simultaneously
- Identifying and routing questions to the best search engines for different question types
- Identifying phrase transforms containing content words from the query
- Dynamic query submission: using the results of initial transformations to guide subsequent transformations

30 Summary
- Introduced a method for learning query transformations that improves the ability to retrieve documents containing answers to questions from an IR system.
- In our approach, we:
  - Automatically classify questions into different question types
  - Automatically generate candidate transforms from a training set of question/answer pairs
  - Automatically evaluate the transforms on the target IR system(s)
- Implemented and evaluated the approach for web search engines.
- A blind evaluation on a set of real queries shows that the method significantly outperforms the underlying search engines for common question types.

31 Additional Information
http://tritus.cs.columbia.edu/
Contact the authors:
http://www.cs.columbia.edu/~eugene/
http://www.neci.nj.nec.com/homepages/lawrence/
http://www.cs.columbia.edu/~gravano/

32 Assumption
For some common types of natural language questions (e.g., "What is", "Who is", etc.), there exist common ways of expressing answers to the question.

