Compact Query Term Selection Using Topically Related Text Date : 2013/10/09 Source : SIGIR’13 Authors : K. Tamsin Maxwell, W. Bruce Croft Advisor : Dr.Jia-ling,

Slides:

Advertisements

Similar presentations

Pseudo-Relevance Feedback For Multimedia Retrieval By Rong Yan, Alexander G. and Rong Jin Mwangi S. Kariuki

Advertisements

Term Level Search Result Diversification DATE : 2013/09/11 SOURCE : SIGIR’13 AUTHORS : VAN DANG, W. BRUCE CROFT ADVISOR : DR.JIA-LING, KOH SPEAKER : SHUN-CHEN,

Date: 2014/05/06 Author: Michael Schuhmacher, Simon Paolo Ponzetto Source: WSDM’14 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang Knowledge-based Graph Document.

Multi-Document Person Name Resolution Michael Ben Fleischman (MIT), Eduard Hovy (USC) From Proceedings of ACL-42 Reference Resolution workshop 2004.

Diversity Maximization Under Matroid Constraints Date : 2013/11/06 Source : KDD’13 Authors : Zeinab Abbassi, Vahab S. Mirrokni, Mayur Thakur Advisor :

Diversified Retrieval as Structured Prediction Redundancy, Diversity, and Interdependent Document Relevance (IDR ’09) SIGIR 2009 Workshop Yisong Yue Cornell.

A Phrase Mining Framework for Recursive Construction of a Topical Hierarchy Date ： 2014/04/15 Source ： KDD’13 Authors ： Chi Wang, Marina Danilevsky, Nihit.

Linking Named Entity in Tweets with Knowledge Base via User Interest Modeling Date : 2014/01/22 Author : Wei Shen, Jianyong Wang, Ping Luo, Min Wang Source.

Query Dependent Pseudo-Relevance Feedback based on Wikipedia SIGIR ‘09 Advisor: Dr. Koh Jia-Ling Speaker: Lin, Yi-Jhen Date: 2010/01/24 1.

Language Model based Information Retrieval: University of Saarland 1 A Hidden Markov Model Information Retrieval System Mahboob Alam Khalid.

1 Entity Ranking Using Wikipedia as a Pivot (CIKM 10’) Rianne Kaptein, Pavel Serdyukov, Arjen de Vries, Jaap Kamps 2010/12/14 Yu-wen,Hsu.

DYNAMIC ELEMENT RETRIEVAL IN A STRUCTURED ENVIRONMENT MAYURI UMRANIKAR.

A Markov Random Field Model for Term Dependencies Donald Metzler and W. Bruce Croft University of Massachusetts, Amherst Center for Intelligent Information.

Reference Collections: Task Characteristics. TREC Collection Text REtrieval Conference (TREC) –sponsored by NIST and DARPA (1992-?) Comparing approaches.

Retrieval Models II Vector Space, Probabilistic.  Allan, Ballesteros, Croft, and/or Turtle Properties of Inner Product The inner product is unbounded.

Affinity Rank Yi Liu, Benyu Zhang, Zheng Chen MSRA.

Automatic Indexing (Term Selection) Automatic Text Processing by G. Salton, Chap 9, Addison-Wesley, 1989.

Scalable Text Mining with Sparse Generative Models

The Relevance Model  A distribution over terms, given information need I, (Lavrenko and Croft 2001). For term r, P(I) can be dropped w/o affecting the.

Overview of Search Engines

Information Retrieval in Practice

Query session guided multi- document summarization THESIS PRESENTATION BY TAL BAUMEL ADVISOR: PROF. MICHAEL ELHADAD.

School of Electronics Engineering and Computer Science Peking University Beijing, P.R. China Ziqi Wang, Yuwei Tan, Ming Zhang.

Search Engines and Information Retrieval Chapter 1.

Leveraging Conceptual Lexicon ： Query Disambiguation using Proximity Information for Patent Retrieval Date : 2013/10/30 Author : Parvaz Mahdabi, Shima.

1 Retrieval and Feedback Models for Blog Feed Search SIGIR 2008 Advisor ： Dr. Koh Jia-Ling Speaker ： Chou-Bin Fan Date ：

1 Formal Models for Expert Finding on DBLP Bibliography Data Presented by: Hongbo Deng Co-worked with: Irwin King and Michael R. Lyu Department of Computer.

A Markov Random Field Model for Term Dependencies Donald Metzler W. Bruce Croft Present by Chia-Hao Lee.

When Experts Agree: Using Non-Affiliated Experts To Rank Popular Topics Meital Aizen.

Modern Information Retrieval: A Brief Overview By Amit Singhal Ranjan Dash.

1 Efficient Search Ranking in Social Network ACM CIKM2007 Monique V. Vieira, Bruno M. Fonseca, Rodrigo Damazio, Paulo B. Golgher, Davi de Castro Reis,

Eric H. Huang, Richard Socher, Christopher D. Manning, Andrew Y. Ng Computer Science Department, Stanford University, Stanford, CA 94305, USA ImprovingWord.

A Probabilistic Graphical Model for Joint Answer Ranking in Question Answering Jeongwoo Ko, Luo Si, Eric Nyberg (SIGIR ’ 07) Speaker: Cho, Chin Wei Advisor:

Web Document Clustering: A Feasibility Demonstration Oren Zamir and Oren Etzioni, SIGIR, 1998.

Retrieval Models for Question and Answer Archives Xiaobing Xue, Jiwoon Jeon, W. Bruce Croft Computer Science Department University of Massachusetts, Google,

Effective Query Formulation with Multiple Information Sources

A General Optimization Framework for Smoothing Language Models on Graph Structures Qiaozhu Mei, Duo Zhang, ChengXiang Zhai University of Illinois at Urbana-Champaign.

RANKING SUPPORT FOR KEYWORD SEARCH ON STRUCTURED DATA USING RELEVANCE MODEL Date: 2012/06/04 Source: Veli Bicer(CIKM’11) Speaker: Er-gang Liu Advisor:

Introduction to Digital Libraries hussein suleman uct cs honours 2003.

Date : 2012/10/25 Author : Yosi Mass, Yehoshua Sagiv Source : WSDM’12 Speaker : Er-Gang Liu Advisor : Dr. Jia-ling Koh 1.

Improving Web Search Results Using Affinity Graph Benyu Zhang, Hua Li, Yi Liu, Lei Ji, Wensi Xi, Weiguo Fan, Zheng Chen, Wei-Ying Ma Microsoft Research.

Enhancing Cluster Labeling Using Wikipedia David Carmel, Haggai Roitman, Naama Zwerdling IBM Research Lab (SIGIR’09) Date: 11/09/2009 Speaker: Cho, Chin.

LANGUAGE MODELS FOR RELEVANCE FEEDBACK Lee Won Hee.

Automatic Suggestion of Query-Rewrite Rules for Enterprise Search Date : 2013/08/13 Source : SIGIR’12 Authors : Zhuowei Bao, Benny Kimelfeld, Yunyao Li.

1 Using The Past To Score The Present: Extending Term Weighting Models with Revision History Analysis CIKM’10 Advisor ： Jia Ling, Koh Speaker ： SHENG HONG,

Templated Search over Relational Databases Date: 2015/01/15 Author: Anastasios Zouzias, Michail Vlachos, Vagelis Hristidis Source: ACM CIKM’14 Advisor:

Link Analysis Rong Jin. Web Structure  Web is a graph Each web site correspond to a node A link from one site to another site forms a directed edge 

Carnegie Mellon Novelty and Redundancy Detection in Adaptive Filtering Yi Zhang, Jamie Callan, Thomas Minka Carnegie Mellon University {yiz, callan,

A Word Clustering Approach for Language Model-based Sentence Retrieval in Question Answering Systems Saeedeh Momtazi, Dietrich Klakow University of Saarland,Germany.

Ranking Related Entities Components and Analyses CIKM’10 Advisor: Jia Ling, Koh Speaker: Yu Cheng, Hsieh.

Date: 2012/08/21 Source: Zhong Zeng, Zhifeng Bao, Tok Wang Ling, Mong Li Lee (KEYS’12) Speaker: Er-Gang Liu Advisor: Dr. Jia-ling Koh 1.

Mining Dependency Relations for Query Expansion in Passage Retrieval Renxu Sun, Chai-Huat Ong, Tat-Seng Chua National University of Singapore SIGIR2006.

Date: 2013/6/10 Author: Shiwen Cheng, Arash Termehchy, Vagelis Hristidis Source: CIKM’12 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang Predicting the Effectiveness.

Date: 2013/4/1 Author: Jaime I. Lopez-Veyna, Victor J. Sosa-Sosa, Ivan Lopez-Arevalo Source: KEYS’12 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang KESOSD.

Michael Bendersky, W. Bruce Croft Dept. of Computer Science Univ. of Massachusetts Amherst Amherst, MA SIGIR

Date: 2012/5/28 Source: Alexander Kotov. al(CIKM’11) Advisor: Jia-ling, Koh Speaker: Jiun Jia, Chiou Interactive Sense Feedback for Difficult Queries.

CONTEXTUAL SEARCH AND NAME DISAMBIGUATION IN USING GRAPHS EINAT MINKOV, WILLIAM W. COHEN, ANDREW Y. NG SIGIR’06 Date: 2008/7/17 Advisor: Dr. Koh,

PERSONALIZED DIVERSIFICATION OF SEARCH RESULTS Date: 2013/04/15 Author: David Vallet, Pablo Castells Source: SIGIR’12 Advisor: Dr.Jia-ling, Koh Speaker:

Leveraging Knowledge Bases for Contextual Entity Exploration Categories Date:2015/09/17 Author:Joonseok Lee, Ariel Fuxman, Bo Zhao, Yuanhua Lv Source:KDD'15.

DISTRIBUTED INFORMATION RETRIEVAL Lee Won Hee.

哈工大信息检索研究室 HITIR ’ s Update Summary at TAC2008 Extractive Content Selection Using Evolutionary Manifold-ranking and Spectral Clustering Reporter: Ph.d.

Federated text retrieval from uncooperative overlapped collections Milad Shokouhi, RMIT University, Melbourne, Australia Justin Zobel, RMIT University,

QUERY-PERFORMANCE PREDICTION: SETTING THE EXPECTATIONS STRAIGHT Date : 2014/08/18 Author : Fiana Raiber, Oren Kurland Source : SIGIR’14 Advisor : Jia-ling.

An Effective Statistical Approach to Blog Post Opinion Retrieval Ben He, Craig Macdonald, Jiyin He, Iadh Ounis (CIKM 2008)

Chinese Academy of Sciences, Beijing, China

Compact Query Term Selection Using Topically Related Text

Learning Literature Search Models from Citation Behavior

Feature Selection for Ranking

Enriching Taxonomies With Functional Domain Knowledge

A Neural Passage Model for Ad-hoc Document Retrieval

Presentation transcript:

Compact Query Term Selection Using Topically Related Text Date : 2013/10/09 Source : SIGIR’13 Authors : K. Tamsin Maxwell, W. Bruce Croft Advisor : Dr.Jia-ling, Koh Speaker : Shun-Chen, Cheng

Outline Introduction The PhRank Algorithm Graph Construction Edge Weight Random Walk Vertex weights Term ranking Diversity filter Experiment Conclusions

Introduction Query ： Locations of volcanic activity which occurred within the present day boundaries of the U.S. and its territories.

Introduction long queries contain words that are peripheral or shared across many topics so expansion is prone to query drift. Past ： jointly optimize weights and term selection using both global statistics and local syntactic features Shortcoming ： fail to detect or differentiate informative terms Don’t reflect local query context Don’t identify all the informative relations

Introduction Goal ： novel term ranking algorithm, PhRank, that extends work on Markov chain frameworks for query expansion to select compact and focused terms from within a query itself.

Outline Introduction The PhRank Algorithm Graph Construction Edge Weight Random Walk Vertex weights Term ranking Diversity filter Experiment Conclusions

Principles for Term Selection An informative word ： Is informative relative to a query accurately represent the meaning of a query. Is related to other informative words if one index term is good at discriminating relevant from non-relevant documents, then any closely associated index term is also likely to be good at this Contains informative words all terms must contain informative words. Is discriminative in the retrieval collection A term that occurs many times within a small number of documents gives a pronounced relevance signal.

Graph Construction C ： retrieval collection & English Wikipedia Example ： Q ： a b Top k documents ： d1,d2 (if k=2) N(Neighborhood set) ： {d0,d1,d2} ， d0 ： query encoded Graph G ： c b a f e d1 ： c b ed2 ： a f b

Edge Weight 、： the counts of stem co-occurrence in window size=2 and 10 in N ： the probability of the document in which the stems i and j co- occur given Q With idf-weight ： factor r confirms the importance of a connection between i and j in N

Random Walk H =H = If it starts from node 1 at time=0 Then the probability that walks to node 3 at time= =

Vertex weights Factor s balances exhaustivity with global saliency to identify stems that are poor discriminators been relevant and non- relevant documents frequency of a word wn in N,averaged over k + 1 documents, and normalized by the maximum average frequency of any term in N the number of documents in C containing wn TREC query #840 ： ‘Give the definition, locations, or characteristics of geysers’. => “definition geysers” is not more informative

Example Wn = geysers The avg frequency of “geysers” in N = 12/3 |N| = 3, |C|=35 max avg frequency of any term in N = 4 dfwn = 3 Wn = definition The frequency of “definition” in N = 2/3 max avg frequency of any term in N = 4dfwn = 1

Term ranking Input ： all combinations of 1-3 words in a query that are not stopwords. Output ： Rank list sorted by f(x,Q) score To avoid a bias towards longer terms, a term x is scored by averaging the affinity scores for its component words factor z x that represents the degree to which the term is discriminative in a collection the frequency of xe in C

example Term x = volcanic boundariesTerm x = volcanic U.S Query: Locations of volcanic activity which occurred within the present day boundaries of the U.S. and its territories.

Outline Introduction The PhRank Algorithm Diversity filter Experiment Conclusions

Diversity filter PhRank often assigns a high rank to multi-word terms that contain only one highly informative word For example, query: the destruction of Pan Am Flight 103 over Lockerbie, Scotland term ‘pan flight 103 ’ is informative “pan” is uninformative by itself Example ： on the assumption that the longer term better represents the information need. on the assumption that the shorter terms better represent the information need and the longer term is redundant.. birth rate china. birth rate Way 1 ：. declining birth. birth rate. declining birth rate Way 2 ： Discarded!

Outline Introduction The PhRank Algorithm Diversity filter Experiment Conclusions

Experiment Dataset F ： excluded from features ， T ： include in features

Experiment

TREC description topics TREC title queries

Outline Introduction The PhRank Algorithm Diversity filter Experiment Conclusions

have presented PhRank, a novel term ranking algorithm that extends work on Markov chain frameworks for query expansion to select focused and succinct terms from within a query. For all collections, around 26% of queries have more than 5% decrease in MAP compared to SD Efficiency considerations surrounding the time to construct an affinity graph may be ameliorated by off-line indexing to precompute a language model for each document in a collection.