Automatic Question Generation from Queries
  Natural Language Computing, Microsoft Research Asia
  Chin-Yew LIN
Generating Questions from Queries
  Where is the next Hannah Montana concert?
  Q2Q as a question generation shared task
Remember Ask Jeeves? “How large is British Columbia?”
Live Search QnA (English)
Naver Knowledge iN (Korea)
  Naver “Knowledge iN” service
  Opened in October 2002
  70 million Knowledge iN DB entries collected
  Number of users: 12 million
  Upper-level users (higher than Kosu): 6,648 (0.05%)
  Distribution of knowledge
    Education, Learning: 17.78%
    Computer, Communication: 12.89%
    Entertainments, Arts: 11.42%
    Business, Economy: 11.42%
    Home, Life: 7.44%
Baidu Zhidao (China)
  17,012,767 resolved questions in two years’ operation
  8,921,610 are knowledge related
  96.7% of questions are resolved
  10,000,000 daily visitors
  71,308 new questions per day
  3.14 answers per question
  (中国人搜索行为研究 [Research on Chinese Search Behavior] / User Research Lab of Chinese Search)
Yahoo! Answers (Global; Marciniak)
  Launched in December
  20 million users in the U.S. (> 90 million worldwide)
  33,557,437 resolved questions (US; April 2008)
  ~70,000* new questions per day (US)
  6.76* answers per question (US)
Question Taxonomy
  ISI’s question answer typology (Hovy et al. & 2002)
    Results of analyzing over 20K online questions
    140 different question types with examples
    language/projects/webclopedia/Taxonomy/taxonomy_toplevel.html
  Liu et al. (COLING 2008)’s cQA question taxonomy
    Derived from Broder’s (SIGIR Forum 2002) web search taxonomy
    Results of analyzing 100 randomly sampled questions from the top 4 Yahoo! Answers categories:
    Entertainment & Music, Society & Culture, Health, and Computer & Internet
Main Task: Q2Q
  Generate questions given a query
    Query: “Hannah Montana concert”
    Questions:
      “How do I get Hannah Montana concert tickets for a really good price?”
      “What should i wear to a hannah montana concert?”
      “How long is the Hannah Montana concert?”
      …
  Subtasks
    Predict user goals
    Learn question templates
    Normalize questions
Data Preparation
  cQA archives
    Live Search QnA
    Yahoo! Answers
    Ask.com
    Other sources
  Query logs
    MSN/Live Search
    Yahoo!
    Ask.com
    TREC and other sources
  Possible process
    Sample queries from search engine query logs
    Ensure broad topic coverage
    Find candidate questions from cQA archives given queries
    Create mapped Q2Q corpus for training and testing
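The "possible process" above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not say how candidate questions are matched to queries, so the sketch assumes a simple rule (a question is a candidate if it contains every query token). The function names `tokenize`, `candidate_questions`, and `build_q2q_corpus` are hypothetical, and the tiny archive reuses the example questions from the slides.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def candidate_questions(query, archive):
    """Return archive questions containing every query token (assumed rule)."""
    query_tokens = set(tokenize(query))
    return [q for q in archive if query_tokens <= set(tokenize(q))]


def build_q2q_corpus(queries, archive):
    """Map each sampled query to its candidate questions."""
    return {query: candidate_questions(query, archive) for query in queries}


# Toy cQA archive built from the example questions in the slides.
archive = [
    "How do I get Hannah Montana concert tickets for a really good price?",
    "What should i wear to a hannah montana concert?",
    "How long is the Hannah Montana concert?",
    "How large is British Columbia?",
]

# Toy stand-in for queries sampled from a search engine query log.
corpus = build_q2q_corpus(["Hannah Montana concert", "British Columbia"], archive)

for query, questions in corpus.items():
    print(query, "->", len(questions), "candidate question(s)")
```

A real pipeline would likely replace the all-tokens rule with a retrieval model and add a sampling step that checks topic coverage; the sketch only shows the shape of the query-to-questions mapping.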
Intrinsic Evaluation
  Given a query term, generate a ranked list of questions related to the query term
  Open set – use pooling approach
    Pool all questions from participants
    Rate each question as relevant or not
    Compute recall/precision/F1 scores
  Closed set – use test set data as gold standard
  Metrics
    Diversity, interestingness, utility, and so on
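The open-set pooling approach can be illustrated with a minimal sketch. The system names, question IDs, and relevance judgments below are made up for the example, and `precision_recall_f1` is a hypothetical helper; the slides specify only that questions are pooled, judged as relevant or not, and scored with recall, precision, and F1.

```python
def precision_recall_f1(ranked, relevant):
    """Score one system's question list against the pooled relevant set."""
    retrieved = set(ranked)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical submissions from two participants for one query.
runs = {
    "system_a": ["q1", "q2", "q3"],
    "system_b": ["q2", "q4"],
}

# Step 1: pool all questions from participants.
pool = set().union(*runs.values())

# Step 2: rate each pooled question as relevant or not (made-up judgments).
judgments = {"q1": True, "q2": True, "q3": False, "q4": True}
relevant = {q for q in pool if judgments[q]}

# Step 3: compute precision, recall, and F1 per system.
scores = {name: precision_recall_f1(run, relevant) for name, run in runs.items()}

for name, (p, r, f1) in scores.items():
    print(f"{name}: P={p:.3f} R={r:.3f} F1={f1:.3f}")
```

Note that recall here is relative to the pool, not to all possible relevant questions, which is the usual limitation of pooled evaluation.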
Extrinsic Evaluation
  A straw man scenario
  Task – online information seeking
  Setup
    1. A user selects a topic (T) she is interested in.
    2. Generate a set of N queries given T and a query log.
    3. The user selects a query (q) from the set.
    4. Generate a set of M questions given q.
    5. The user selects the question (Q) that she has in mind.
    6. If the user does not select any question, record it as not successful.
    7. Send q to a search engine (S); get results X.
    8. Send q, Q, and anything inferred from Q to S; get results Y.
    9. Compare results X and Y using standard IR relevance metrics.
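Steps 6 to 9 of the setup can be sketched as follows. The slides say only "standard IR relevance metrics", so precision at k is an assumed choice here, and the document IDs, the relevant set, and the helper names `precision_at_k` and `evaluate_session` are all hypothetical.

```python
def precision_at_k(results, relevant, k):
    """Fraction of the top-k results that are relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for doc in results[:k] if doc in relevant) / k


def evaluate_session(selected_question, results_x, results_y, relevant, k=3):
    """Compare results for q alone (X) against results for q plus Q (Y)."""
    # Step 6: no selected question means the session is not successful.
    if selected_question is None:
        return {"success": False}
    # Step 9: compare X and Y with an IR metric (precision at k, assumed).
    p_x = precision_at_k(results_x, relevant, k)
    p_y = precision_at_k(results_y, relevant, k)
    return {"success": True, "p_at_k_x": p_x, "p_at_k_y": p_y, "gain": p_y - p_x}


# Made-up result lists standing in for search engine output (steps 7 and 8).
results_x = ["d1", "d2", "d3"]  # results for q alone
results_y = ["d2", "d4", "d1"]  # results for q, Q, and anything inferred from Q
relevant = {"d2", "d4"}         # made-up relevance judgments

outcome = evaluate_session(
    "How long is the Hannah Montana concert?", results_x, results_y, relevant
)
print(outcome)
```

A positive `gain` would suggest that the selected question helped retrieval for that session; averaging it over many sessions, together with the success rate from step 6, would give an overall extrinsic score.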
Summary
  Task: Question generation from queries
  Data:
    Search engine query logs
    cQA question answer archives
    Question taxonomies
  Evaluation:
    Intrinsic – evaluate specific technology areas
    Extrinsic – evaluate its effect on real world scenarios
  Real data, real task, and real impact