1 Statistical source expansion for question answering CIKM’11 Advisor ： Jia Ling, Koh Speaker ： SHENG HONG, CHUNG.

Slides:

Advertisements

Similar presentations

Date: 2013/1/17 Author: Yang Liu, Ruihua Song, Yu Chen, Jian-Yun Nie and Ji-Rong Wen Source: SIGIR12 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang Adaptive.

Advertisements

Term Level Search Result Diversification DATE : 2013/09/11 SOURCE : SIGIR’13 AUTHORS : VAN DANG, W. BRUCE CROFT ADVISOR : DR.JIA-LING, KOH SPEAKER : SHUN-CHEN,

Date: 2014/05/06 Author: Michael Schuhmacher, Simon Paolo Ponzetto Source: WSDM’14 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang Knowledge-based Graph Document.

DQR : A Probabilistic Approach to Diversified Query recommendation Date: 2013/05/20 Author: Ruirui Li, Ben Kao, Bin Bi, Reynold Cheng, Eric Lo Source:

Query Chain Focused Summarization Tal Baumel, Rafi Cohen, Michael Elhadad Jan 2014.

Chapter 5: Introduction to Information Retrieval

Knowledge Base Completion via Search-Based Question Answering

Improved TF-IDF Ranker

Linking Named Entity in Tweets with Knowledge Base via User Interest Modeling Date : 2014/01/22 Author : Wei Shen, Jianyong Wang, Ping Luo, Min Wang Source.

1 Asking What No One Has Asked Before : Using Phrase Similarities To Generate Synthetic Web Search Queries CIKM’11 Advisor ： Jia Ling, Koh Speaker ： SHENG.

WSCD INTRODUCTION  Query suggestion has often been described as the process of making a user query resemble more closely the documents it is expected.

Toward Whole-Session Relevance: Exploring Intrinsic Diversity in Web Search Date: 2014/5/20 Author: Karthik Raman, Paul N. Bennett, Kevyn Collins-Thompson.

Query Dependent Pseudo-Relevance Feedback based on Wikipedia SIGIR ‘09 Advisor: Dr. Koh Jia-Ling Speaker: Lin, Yi-Jhen Date: 2010/01/24 1.

1 Entity Ranking Using Wikipedia as a Pivot (CIKM 10’) Rianne Kaptein, Pavel Serdyukov, Arjen de Vries, Jaap Kamps 2010/12/14 Yu-wen,Hsu.

Explorations in Tag Suggestion and Query Expansion Jian Wang and Brian D. Davison Lehigh University, USA SSM 2008 (Workshop on Search in Social Media)

SLIDE 1IS 240 – Spring 2010 Logistic Regression The logistic function: The logistic function is useful because it can take as an input any.

Iterative Set Expansion of Named Entities using the Web Richard C. Wang and William W. Cohen Language Technologies Institute Carnegie Mellon University.

Automatic Set Expansion for List Question Answering Richard C. Wang, Nico Schlaefer, William W. Cohen, and Eric Nyberg Language Technologies Institute.

Web Archive Information Retrieval Miguel Costa, Daniel Gomes (speaker) Portuguese Web Archive.

Chapter 5: Information Retrieval and Web Search

Quality-Aware Collaborative Question Answering: Methods and Evaluation Maggy Anastasia Suryanto, Ee-Peng Lim, Aixin Sun, and Roger H. L. Chiang. In Proceedings.

1 Hierarchical Tag visualization and application for tag recommendations CIKM’11 Advisor ： Jia Ling, Koh Speaker ： SHENG HONG, CHUNG.

Tag Clouds Revisited Date : 2011/12/12 Source : CIKM’11 Speaker : I- Chih Chiu Advisor : Dr. Koh. Jia-ling 1.

Probabilistic Model for Definitional Question Answering Kyoung-Soo Han, Young-In Song, and Hae-Chang Rim Korea University SIGIR 2006.

SEEKING STATEMENT-SUPPORTING TOP-K WITNESSES Date: 2012/03/12 Source: Steffen Metzger (CIKM’11) Speaker: Er-gang Liu Advisor: Dr. Jia-ling Koh 1.

TREC 2009 Review Lanbo Zhang. 7 tracks Web track Relevance Feedback track (RF) Entity track Blog track Legal track Million Query track (MQ) Chemical IR.

Leveraging Conceptual Lexicon ： Query Disambiguation using Proximity Information for Patent Retrieval Date : 2013/10/30 Author : Parvaz Mahdabi, Shima.

Reyyan Yeniterzi Weakly-Supervised Discovery of Named Entities Using Web Search Queries Marius Pasca Google CIKM 2007.

AnswerBus Question Answering System Zhiping Zheng School of Information, University of Michigan HLT 2002.

UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.

CIKM’09 Date:2010/8/24 Advisor: Dr. Koh, Jia-Ling Speaker: Lin, Yi-Jhen 1.

Exploring Online Social Activities for Adaptive Search Personalization CIKM’10 Advisor ： Jia Ling, Koh Speaker ： SHENG HONG, CHUNG.

A Probabilistic Graphical Model for Joint Answer Ranking in Question Answering Jeongwoo Ko, Luo Si, Eric Nyberg (SIGIR ’ 07) Speaker: Cho, Chin Wei Advisor:

Presenter: Lung-Hao Lee ( 李龍豪 ) January 7, 309.

Effective Query Formulation with Multiple Information Sources

INTERESTING NUGGETS AND THEIR IMPACT ON DEFINITIONAL QUESTION ANSWERING Kian-Wei Kor, Tat-Seng Chua Department of Computer Science School of Computing.

Chapter 6: Information Retrieval and Web Search

Predicting Question Quality Bruce Croft and Stephen Cronen-Townsend University of Massachusetts Amherst.

Contextual Ranking of Keywords Using Click Data Utku Irmak, Vadim von Brzeski, Reiner Kraft Yahoo! Inc ICDE 09’ Datamining session Summarized.

Probabilistic Models of Novel Document Rankings for Faceted Topic Retrieval Ben Cartrette and Praveen Chandar Dept. of Computer and Information Science.

Improving Web Search Results Using Affinity Graph Benyu Zhang, Hua Li, Yi Liu, Lei Ji, Wensi Xi, Weiguo Fan, Zheng Chen, Wei-Ying Ma Microsoft Research.

BioSnowball: Automated Population of Wikis (KDD ‘10) Advisor: Dr. Koh, Jia-Ling Speaker: Lin, Yi-Jhen Date: 2010/11/30 1.

WIRED Week 3 Syllabus Update (next week) Readings Overview - Quick Review of Last Week’s IR Models (if time) - Evaluating IR Systems - Understanding Queries.

Understanding User’s Query Intent with Wikipedia G 여 승 후.

1 Using The Past To Score The Present: Extending Term Weighting Models with Revision History Analysis CIKM’10 Advisor ： Jia Ling, Koh Speaker ： SHENG HONG,

Intelligent Database Systems Lab N.Y.U.S.T. I. M. An information-pattern-based approach to novelty detection Presenter : Lin, Shu-Han Authors : Xiaoyan.

© 2004 Chris Staff CSAW’04 University of Malta of 15 Expanding Query Terms in Context Chris Staff and Robert Muscat Department of.

A Word Clustering Approach for Language Model-based Sentence Retrieval in Question Answering Systems Saeedeh Momtazi, Dietrich Klakow University of Saarland,Germany.

Performance Measures. Why to Conduct Performance Evaluation? 2 n Evaluation is the key to building effective & efficient IR (information retrieval) systems.

AN EFFECTIVE STATISTICAL APPROACH TO BLOG POST OPINION RETRIEVAL Ben He Craig Macdonald Iadh Ounis University of Glasgow Jiyin He University of Amsterdam.

Advantages of Query Biased Summaries in Information Retrieval by A. Tombros and M. Sanderson Presenters: Omer Erdil Albayrak Bilge Koroglu.

Comparing Document Segmentation for Passage Retrieval in Question Answering Jorg Tiedemann University of Groningen presented by: Moy’awiah Al-Shannaq

Mining Dependency Relations for Query Expansion in Passage Retrieval Renxu Sun, Chai-Huat Ong, Tat-Seng Chua National University of Singapore SIGIR2006.

1 Evaluating High Accuracy Retrieval Techniques Chirag Shah,W. Bruce Croft Center for Intelligent Information Retrieval Department of Computer Science.

Date: 2012/11/29 Author: Chen Wang, Keping Bi, Yunhua Hu, Hang Li, Guihong Cao Source: WSDM’12 Advisor: Jia-ling, Koh Speaker: Shun-Chen, Cheng.

LINDEN : Linking Named Entities with Knowledge Base via Semantic Knowledge Date : 2013/03/25 Resource : WWW 2012 Advisor : Dr. Jia-Ling Koh Speaker : Wei.

Query Suggestions in the Absence of Query Logs Sumit Bhatia, Debapriyo Majumdar,Prasenjit Mitra SIGIR’11, July 24–28, 2011, Beijing, China.

Advisor: Koh Jia-Ling Nonhlanhla Shongwe EFFICIENT QUERY EXPANSION FOR ADVERTISEMENT SEARCH WANG.H, LIANG.Y, FU.L, XUE.G, YU.Y SIGIR’09.

26/01/20161Gianluca Demartini Ranking Categories for Faceted Search Gianluca Demartini L3S Research Seminars Hannover, 09 June 2006.

Date: 2013/4/1 Author: Jaime I. Lopez-Veyna, Victor J. Sosa-Sosa, Ivan Lopez-Arevalo Source: KEYS’12 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang KESOSD.

Multi-Aspect Query Summarization by Composite Query Date: 2013/03/11 Author: Wei Song, Qing Yu, Zhiheng Xu, Ting Liu, Sheng Li, Ji-Rong Wen Source: SIGIR.

TO Each His Own: Personalized Content Selection Based on Text Comprehensibility Date: 2013/01/24 Author: Chenhao Tan, Evgeniy Gabrilovich, Bo Pang Source:

SEMANTIC VERIFICATION IN AN ONLINE FACT SEEKING ENVIRONMENT DMITRI ROUSSINOV, OZGUR TURETKEN Speaker: Li, HueiJyun Advisor: Koh, JiaLing Date: 2008/5/1.

Predicting Short-Term Interests Using Activity-Based Search Context CIKM’10 Advisor: Jia Ling, Koh Speaker: Yu Cheng, Hsieh.

Usefulness of Quality Click- through Data for Training Craig Macdonald, ladh Ounis Department of Computing Science University of Glasgow, Scotland, UK.

CiteData: A New Multi-Faceted Dataset for Evaluating Personalized Search Performance CIKM’10 Advisor : Jia-Ling, Koh Speaker : Po-Hsien, Shih.

Enterprise Track: Thread-based Retrieval Enterprise Track: Thread-based Retrieval Yejun Wu and Douglas W. Oard Goal Explore -- document expansion.

Improving Search Relevance for Short Queries in Community Question Answering Date： 2014/09/25 Author ： Haocheng Wu, Wei Wu, Ming Zhou, Enhong Chen, Lei.

Summarizing answers in non-factoid community Question-answering

Preference Based Evaluation Measures for Novelty and Diversity

Presentation transcript:

1 Statistical source expansion for question answering CIKM’11 Advisor ： Jia Ling, Koh Speaker ： SHENG HONG, CHUNG

Outline Introduction Approach – Retrieval – Extraction – Scoring – Merging Experiment – Intrinsic evaluation – Application to QA Conclusion 2

Search engine V.S QA system 3 Query: Data mining

Search engine V.S QA system 4 QA system Jeremy Shu-How Lin (born August 23, 1988) is an American professional basketball player with the New York Knicks of the National Basketball Association (NBA). After receiving no athletic scholarship offers out of high school and being undrafted out of college, the 2010 Harvard University graduate reached a partially guaranteed contract deal later that year with his hometown Golden State Warriors. Source Who is Jeremy lin?

Search engine V.S QA system 5 QA system Source What is Melancholia? No information or lack of information Bad Result Source Expansion

Introduction Question Answering(QA) – Good coverage for a given question domain May not contain the answers to all question Source expansion – Improve coverage – Facilitate extraction and validation of answers – Statistical model 6

7 Seed document Seed document QA system Seed document Seed document Seed document Seed document Source(Wikipedia, Wiktionary) approach Pseudo-document

Approach 8

Retrieval && Extraction Perform Yahoo! Search – Fetch up to 100 web pages links Extraction – Nugget Html paragraphs: … List items: … Table cells: … – Markup-based nugget v.s sentence-based nugget 9

Retrieval && Extraction Extraction – Markup-based nugget v.s sentence-based nugget 10 Jeremy Shu-How Lin (born August 23, 1988) is an American professional basketball player with the New York Knicks of the National Basketball Association (NBA). After receiving no athletic scholarship offers out of high school and being undrafted out of college, the 2010 Harvard University graduate reached a partially guaranteed contract deal later that year with his hometown Golden State Warriors. Markup-based Sentence-based Jeremy Shu-How Lin (born August 23, 1988) is an American professional basketball player with the New York Knicks of the National Basketball Association (NBA). After receiving no athletic scholarship offers out of high school and being undrafted out of college, the 2010 Harvard University graduate reached a partially guaranteed contract deal later that year with his hometown Golden State Warriors.

Approach 11

12

Source nugget 1 Web documents nugget 2 nugget 3 Web documents Web documents Expansion seedPer() corPer()

Source nugget 1 Web documents TopicLRSeed = 0.7/0.1 * 0.6/0.5 * 0.1/0.6 = 1.4 nugget 2 nugget 3 Web documents Web documents Expansion nugget 1 : {w 1,w 2,w 3 } seedPer(w1) = 0.7 seedPer(w2) = 0.6 seedPer(w3) = 0.1 corPer(w1) = 0.1 corPer(w2) = 0.5 corPer(w3) = 0.6

Source nugget 1 Web documents TFIDFSeed = tf(term frequency in seed doc) * idf(inverse document frequency from source document) nugget 2 nugget 3 TFIDFNugget = tf(term frequency in web doc) * idf(inverse document frequency from other web docs) Web documents Web documents Expansion

Source nugget 1 Web documents nugget 2 nugget 3 Web documents Web documents Expansion tf(w) : w frequency in doc df(w) : doc contains w idf(w) : inverse df(w) nugget 1 : {w 1,w 2,w 3 } All doc:{d1~d10} tf(w1) = 7 tf(w2) = 0 tf(w3) = 3 idf(w1) = log(9/1) idf(w2) idf(w3) = log(9/6) TFIDFSeed = (7*log *log1.5)/3 = 2.4 TFIDFNugget = tf(term frequency in web doc) * idf(inverse document frequency from other web docs)

Maximal Marginal relevance Solve redundant problem – Ex: query:” 馬英九 ” – all documents are related to president Combined two metrics – Relevant – Novelty 17

18 query IR system D 1 ~D 10 collection Give a threshold D1D2D4D6D8D9D3D5D1D2D4D6D8D9D3D5 D 7 D 10 threshold R MMR = Sim(D i,Q) λ = 1 Top 5 S:{ ∅ } R/S:{D 1, D 2, D 4, D 6, D 8, D 9, D 3, D 5 } 1-iter S:{D 1 } R/S:{D 2, D 4, D 6, D 8, D 9, D 3, D 5 } 2-iter S:{D 1, D 6 } R/S:{D 2, D 4, D 8, D 9, D 3, D 5 } 3-iter λ sim(D2)- (1-λ)*sim(D1,D2) λ sim(D4)- (1-λ)*sim(D1,D4) λ sim(D6)- (1-λ)*sim(D1,D6) λ sim(D8)- (1-λ)*sim(D1,D8).

Source Web documents Web documents Web documents Expansion 3PPronoun Search absrtact coverd by nugget DocRank && AbstractCoverage Terms in Nugget Nuggetlength Text at the beginning of the doc is more relevant NuggetOffset Lin is a PG Lin is a basketball player Lin studied in Harvard He

20

Scoring Linear regression – y = a 0 + a 1 x 1 Logistic regression model – Non-linear – Logit function p(t) =, [0,1] p(y = a 0 + a 1 x 1 +…+ a 14 x 14 ), [0,1] 21

Approach 22

Merging Drop the nuggets if their relevance scores are below an absolute threshold(0.1) Stop merging if total character length of all nuggets exceeds a threshold(10 times) The remaining nuggets are compiled into a pseudo-document 23

Experiment Intrinsic evaluation Application to QA 24

Intrinsic evaluation 15 Wikipedia articles – Human annotators select relevant substrings – A nugget was considered relevant if any of its tokens were selected by an annotator – Evaluate different relevance models Sentence-level Markup-level 25

Intrinsic evaluation Different relevance models – Random – Round Robin – Search Rank – MMR – LR independent – LR adjacent 26

Intrinsic evaluation 27

Application to QA Dataset – Jeopardy! – TREC questions Expanded sources – Wikipedia && Wiktionary(baseline) – All source Encyclopedias(wikipedia, world book) Dictionaries(wiktionary, thesauri) Newswire(AQUAINT, New York Time archive) Literature 28

Application to QA QA search Experiments – Recall – Answered question / all question 29

Application to QA accuracy 30

Conclusion We proposed a statistical approach for source expansion The proposed method yields significant gains in search performance on all datasets, improving search recall by 4.2–8.6%. SE also improves the accuracy by 7.6–12.9% on Jeopardy! and TREC datasets. 31