1 Web Logs and Question Answering
Richard Sutcliffe (1), Udo Kruschwitz (2), Thomas Mandl (3)
1 - University of Limerick, Ireland
2 - University of Essex, UK
3 - University of Hildesheim, Germany

2 Outline
- Question Answering (QA)
- Query Log Analysis (QLA)
- Characteristics of QA and QLA
- QA & QLA: 8 Key Questions
- Workshop Papers
- Key Questions addressed by Papers
- Conclusions

3 Question Answering (QA)
- A Question Answering (QA) system takes as input a short natural language question and a document collection, and produces an exact answer to the question, taken from the collection (a toy sketch of this contract follows below)
- Origins go back to TREC-8 (Voorhees and Harman, 1999)
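As a concrete anchor for that definition, here is a deliberately toy sketch of the input/output contract: question plus collection in, text drawn from the collection out. The word-overlap scoring is an invented illustration and far simpler than any real QA system, which would extract an exact answer rather than return a whole document.

```python
from typing import Iterable

def answer(question: str, collection: Iterable[str]) -> str:
    """Toy stand-in: return the document sharing the most words with the
    question. A real QA system extracts an exact answer, not a document."""
    qwords = set(question.lower().split())
    return max(collection, key=lambda doc: len(qwords & set(doc.lower().split())))

docs = ["TREC-8 took place in 1999.", "QA systems return exact answers."]
print(answer("When did TREC-8 take place?", docs))
```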

4 Query Log Analysis (QLA)
- A Query Log is a record of a person’s internet search
- A log comprises the query plus related information (one assumed record format is sketched below)
- Query Log Analysis examines logs, mainly in order to improve search engines
- An early study was Spink and Saracevic (2000)
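To make "query plus related information" concrete, here is a minimal sketch of reading one log record. The tab-separated layout and the particular fields (anonymised user id, timestamp, query, rank of the clicked result) are assumptions; real log schemas vary from engine to engine.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class LogEntry:
    """One record from a hypothetical tab-separated query log."""
    user_id: str        # anonymised user identifier
    timestamp: datetime
    query: str
    clicked_rank: int   # rank of the result the user clicked

def parse_line(line: str) -> LogEntry:
    user_id, ts, query, rank = line.rstrip("\n").split("\t")
    return LogEntry(user_id, datetime.fromisoformat(ts), query, int(rank))

# Example record in the assumed format:
entry = parse_line("u042\t2010-07-25T14:03:11\tcheap flights dublin\t3")
print(entry.query, entry.clicked_rank)
```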

5 Strengths & Weaknesses of QA
- Following TREC, CLEF and NTCIR, we know how to build efficient monolingual factoid QA systems
- However, the range of questions asked is extremely narrow
- Also, the work is based on fixed document collections
- Most evaluation is offline, using artificial queries
- Real users and real information needs have been ignored
- Thus, QA is not a solved problem

6 Strengths & Weaknesses of QLA
- Potentially there is a huge amount of data, increasing all the time
- Queries entered are ‘naturally occurring’ because users do not know they are monitored!
- On the other hand, huge data sets pose problems; manual analysis cannot be used, so machine learning and similar techniques must be used
- We must infer from behaviour what users were thinking, what they wanted, and whether a search succeeded (one crude heuristic is sketched below)
- Also, logs are mostly owned by search engine companies
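The point about inferring success from behaviour can be made concrete with a deliberately crude heuristic. The rule below, treating a session as satisfied if its last query drew a click, is an assumption for illustration, not a method from the talk.

```python
def session_satisfied(session):
    """Crude illustrative heuristic (an assumption, not from the talk):
    treat a session as successful if its final query received a click,
    on the theory that users stop reformulating once satisfied.
    `session` is a time-ordered list of (query, was_clicked) pairs."""
    return bool(session) and session[-1][1]

print(session_satisfied([("limerick weather", False),
                         ("weather limerick ireland", True)]))  # True
```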

7 QA & QLA – 8 Key Questions
1. Can the meaning of queries in logs be deduced?
2. Can NLP techniques such as Named Entity Recognition be applied in QLA?
3. Can QLA tell us about new types of questions for QA research?
4. Can queries within a session be interpreted as a dialogue, with the user giving the questions and the system providing the answers?

8 QA & QLA – 8 Key Questions (cont.)
5. What can logs from real QA systems like lexxe.com, or questions from sites like answers.com, tell us?
6. Are QA logs different from IR logs?
7. Can click-through data enable us to deduce new QA question types?
8. What analysis could be done on logs made from telephone QA systems (e.g. cinema booking)?

9 Papers - 1
Bernardi and Kirschner: From artificial questions to real user interaction logs
- Contrasts real logs with the artificial questions used at TREC etc.
- Three sets (TREC, Bertomeu & BoB) analysed as dialogues
- TREC differs significantly from BoB (query length, number of anaphora)
- Conclusion: future TREC-style evaluation should take these differences into account to make the task more realistic

10 Papers - 2
Leveling: QA evaluation queries vs. real world queries
- Compares queries submitted to a search engine, queries submitted to answers.com, and queries used at TREC and CLEF (six sets)
- Infers the QA question type of a bare IR query (keywords) and converts the query back into a syntactic QA question (sketched below)
- Conclusion: this process could be used to answer IR queries properly with a QA system
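A hypothetical sketch of the direction just described: guess a QA question type for a bare keyword query, then wrap the keywords in a matching question template. The type rules and templates here are invented for illustration; the paper's actual method is more sophisticated.

```python
# Templates and trigger rules are assumptions, not taken from the paper.
TEMPLATES = {
    "PERSON":   "Who is {kw}?",
    "LOCATION": "Where is {kw}?",
    "DATE":     "When was {kw}?",
    "DEF":      "What is {kw}?",
}

def guess_type(keywords: str) -> str:
    kw = keywords.lower()
    if "born" in kw or "birthday" in kw:
        return "DATE"
    if any(w in kw for w in ("capital", "located", "location")):
        return "LOCATION"
    return "DEF"  # fall back to a definition question

def to_question(keywords: str) -> str:
    return TEMPLATES[guess_type(keywords)].format(kw=keywords)

print(to_question("capital of ireland"))  # Where is capital of ireland?
```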

11 Papers - 3
Zhu et al.: Question Answering based on Community QA
- Considers whether Q-A pairs from Yahoo! Answers can be used as a log-like resource to improve QA
- Given an input query, similar queries are identified in the logs (sketched below); sentences from the answers to these are then selected by a summarisation algorithm for use as the response to the query
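A minimal sketch of the retrieval step just described, with token-overlap (Jaccard) similarity standing in for whatever measure the paper actually uses; the Q-A pairs are invented toy data.

```python
def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def similar_questions(query, qa_pairs, k=3):
    """qa_pairs: list of (question, answer) tuples, e.g. from Yahoo! Answers.
    Returns the k pairs whose questions best match the query."""
    ranked = sorted(qa_pairs, key=lambda p: jaccard(query, p[0]), reverse=True)
    return ranked[:k]

pairs = [("how do i renew an irish passport", "Apply online or by post."),
         ("best pizza in essex", "Try the place near the campus.")]
print(similar_questions("renew passport ireland", pairs, k=1))
```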

12 Papers - 4
Momtazi and Klakow: Yahoo! Answers for sentence retrieval
- Two statistical frameworks were developed for capturing relationships between words in Q-A pairs from Yahoo! Answers (an illustrative stand-in is sketched below)
- These were then used in a sentence selection task based on TREC 2006 queries
- Conclusion: the best results exceeded the baseline
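The paper's two frameworks are not reproduced here. As an illustrative stand-in only, one simple way to capture relationships between question and answer words is a co-occurrence table estimating P(answer word | question word), in the spirit of translation-model retrieval; everything below is an assumption for illustration.

```python
from collections import Counter, defaultdict

def train_word_pairs(qa_pairs):
    """Count answer words co-occurring with each question word across
    Q-A pairs, then normalise the counts into probabilities."""
    counts = defaultdict(Counter)
    for q, a in qa_pairs:
        for qw in set(q.lower().split()):
            for aw in set(a.lower().split()):
                counts[qw][aw] += 1
    return {qw: {aw: n / sum(c.values()) for aw, n in c.items()}
            for qw, c in counts.items()}

model = train_word_pairs([("who wrote ulysses", "james joyce wrote ulysses")])
print(model["wrote"])  # e.g. {'james': 0.25, 'joyce': 0.25, ...}
```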

13 Papers - 5
Small and Strzalkowski: Collaborative QA using web trails
- Logs were made of users in an interactive QA study
- The information stored includes the documents users saved
- Docs are placed in a standard order to allow comparison between users; docs saved by different users overlap
- When a previously observed sequence of docs is saved by a user, the rest of that sequence can be presented to the user (sketched below)
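A minimal sketch of that trail-completion idea, assuming trails are stored as lists of document ids in the standard order the slide mentions; the matching rule and the data are invented for illustration.

```python
def complete_trail(saved, known_trails):
    """If the docs saved so far match the start of a previously observed
    trail, suggest the remainder of that trail.
    saved: list of doc ids saved so far; known_trails: list of doc-id lists."""
    for trail in known_trails:
        if len(trail) > len(saved) and trail[:len(saved)] == saved:
            return trail[len(saved):]  # the rest of the matched trail
    return []

trails = [["d1", "d4", "d7", "d9"], ["d2", "d3"]]
print(complete_trail(["d1", "d4"], trails))  # ['d7', 'd9']
```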

14 Papers - 6
Sutcliffe, White and Kruschwitz: NE recognition in an intranet query log
- A log of queries to a university web site was first analysed by hand
- This resulted in a list of topic types and a list of Named Entity types
- Training data for NEs was extracted from web pages and used to train a maximum entropy recogniser (the shape of this step is sketched below)
- The NE recogniser was evaluated; uses of NEs in answering queries were discussed
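A maximum entropy classifier is, in practice, a multinomial logistic regression over discrete features. The sketch below (scikit-learn, with invented toy tokens and tags) shows the shape of the training step only; the paper's actual features and training data are not reproduced here.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def features(token):
    """Toy per-token features; a real recogniser would use context too."""
    return {"lower": token.lower(),
            "is_title": token.istitle(),
            "suffix3": token[-3:].lower()}

tokens = ["Limerick", "University", "module", "Essex", "timetable"]
labels = ["LOC", "ORG", "O", "LOC", "O"]  # invented toy NE tags

model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
model.fit([features(t) for t in tokens], labels)
print(model.predict([features("Hildesheim")]))
```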

15 Papers - 7
Mandl and Schulz: Log-based evaluation resources for QA
- Concerned with the link between query logs and the well-formed questions answered by QA systems
- Proposes a system switching between IR-mode and QA-mode (sketched below)
- Discusses the log resources available and related tracks at CLEF
- Presents a preliminary analysis of question-like queries in the MSN log
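One simple reading of the mode-switching idea: route well-formed questions to a QA component and bare keyword queries to an IR engine. The trigger-word test below is an assumed, deliberately naive stand-in for whatever classifier the paper proposes.

```python
QA_TRIGGERS = ("who", "what", "when", "where", "why", "how", "which")

def route(query: str) -> str:
    """Route a query to QA-mode if it looks like a question, else IR-mode."""
    words = query.strip().lower().split()
    if not words:
        return "IR"
    if query.strip().endswith("?") or words[0] in QA_TRIGGERS:
        return "QA"
    return "IR"

print(route("who founded the university of limerick"))  # QA
print(route("limerick campus map"))                      # IR
```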

16 Papers vs. Workshop Goals - 1
- Bernardi and Kirschner investigate Question 6
- Leveling investigates Question 1 and Question 3
- Momtazi and Klakow look at Question 5
- Zhu et al. also look at Question 5
- Small and Strzalkowski investigate Question 4

17 Papers vs. Workshop Goals - 2
- Sutcliffe et al. look at Question 2
- Mandl and Schulz also look at Question 3
- Only Questions 7 and 8 are not addressed at the workshop!

18 Conclusions
- It looks like an interesting field
- We look forward to your papers
- There will be time at the end for discussion

