DivQ: Diversification for Keyword Search over Structured Databases Elena Demidova, Peter Fankhauser, Xuan Zhou and Wolfgang Nejdl L3S Research Center,


DivQ: Diversification for Keyword Search over Structured Databases
Elena Demidova, Peter Fankhauser, Xuan Zhou and Wolfgang Nejdl
L3S Research Center, Hannover, Germany; Fraunhofer IPSE, Darmstadt, Germany; CSIRO ICT Centre, Australia
SIGIR
Jaehui Park

Copyright © 2010 by CEBT

INTRODUCTION
• Keyword search over structured data
  – No single interpretation of a keyword query can satisfy all users.
  – Multiple interpretations may yield overlapping results.
• Diversification
  – Minimize the risk of user dissatisfaction by balancing the relevance and novelty of search results.
• An example
  – Query: "London"
    – location: the capital of the UK
    – name: a book written by Jack London
  – Each occurrence can be viewed as a keyword interpretation with different semantics, offering complementary results.

INTRODUCTION
• Motivation
  – Taking advantage of the structure of the databases
    – Query interpretation in terms of the underlying database
    – To deliver more diverse and orthogonal representations of query results (e.g., by attribute)
• Contributions
  – DivQ
    – A probabilistic query disambiguation model
    – A diversification scheme for generating top-k query interpretations
  – Evaluation metrics for structured data
    – α-nDCG-W
    – WS-recall

The Diversification Scheme
• Query interpretations
  – A keyword query -> a set of structured queries
• Ranking the query interpretations
  – Providing a quick overview of the available classes of results
  – Faceted search: navigate and choose

Q: CONSIDERATION CHRISTOPHER GUEST

Relevance | Top-3 interpretations (ranking)                       | Relevance | Top-3 interpretations (diversification)
0.9       | A director CHRISTOPHER GUEST of a movie CONSIDERATION | 0.9       | A director CHRISTOPHER GUEST of a movie CONSIDERATION
0.5       | A director CHRISTOPHER GUEST                          | 0.4       | An actor CHRISTOPHER GUEST
0.8       | An actor CHRISTOPHER GUEST in a movie CONSIDERATION   | 0.2       | A plot containing CHRISTOPHER GUEST of a movie

(increasing novelty)

The Diversification Scheme
• Bringing Keywords into Structure
  – Keyword interpretations A_i:k_i
    – Map each keyword k_i to an element A_i of an algebraic expression
    – A (predefined) query template T joins the keyword interpretations; it is a structural pattern that is frequently used to query the database
    – An example
      • Keyword query (K): CONSIDERATION CHRISTOPHER GUEST
        director:CHRISTOPHER
        director:GUEST
        movie:CONSIDERATION
      • T: A director X of a movie Y
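The mapping on this slide can be pictured as a small data structure. The following is only a minimal sketch: the class names `KeywordInterpretation` and `QueryInterpretation` and the helper `interpret` are hypothetical and are not taken from the DivQ paper or the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeywordInterpretation:
    """One A_i:k_i pair (hypothetical name, for illustration only)."""
    attribute: str  # A_i, e.g. "director"
    keyword: str    # k_i, e.g. "GUEST"


@dataclass(frozen=True)
class QueryInterpretation:
    """A template T together with its set I of keyword interpretations."""
    template: str
    keyword_interpretations: frozenset


def interpret(mapping, template):
    """Build a query interpretation from a keyword -> attribute mapping."""
    items = frozenset(
        KeywordInterpretation(attribute, keyword)
        for keyword, attribute in mapping.items()
    )
    return QueryInterpretation(template, items)


# The example from the slide: K = CONSIDERATION CHRISTOPHER GUEST
q = interpret(
    {"CHRISTOPHER": "director", "GUEST": "director", "CONSIDERATION": "movie"},
    "A director X of a movie Y",
)
for ki in sorted(q.keyword_interpretations, key=lambda x: x.keyword):
    print(f"{ki.attribute}:{ki.keyword}")
```

Frozen dataclasses are used so that interpretations are hashable and can be compared as sets.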

The Diversification Scheme
• Estimating Query Relevance
  – Relevance of a query interpretation Q to the information need K
    – P(Q|K) = P(I,T|K), where T is the query template and I is a set of keyword interpretations
    – Assumptions
      • Each keyword has one particular interpretation.
      • The probability of a keyword interpretation is independent of the part of the query interpretation the keyword is not interpreted to.
    – Attribute-specific term frequency (e.g., the average number of co-occurrences)
      • Example: a first name and a last name of a person both mapped to the attribute "name" rank higher.
  – Formula annotations (the formula itself is not preserved in the transcript)
    – "the probability that, given that A_j is a part of a query interpretation, keyword interpretation A_j is also a part of the query interpretation"
    – "smoothing factor"

The Diversification Scheme
• Estimating Query Similarity
  – The Jaccard coefficient between the sets of keyword interpretations I contained in Q_1 and Q_2
• Combining Relevance and Similarity
  1. Select the most relevant interpretation as the first interpretation presented to the user.
  2. Select each following interpretation based on both its relevance and its novelty.
  – Formula annotation (the formula itself is not preserved in the transcript): "selected query interpretation set"
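The similarity measure named on this slide can be sketched in a few lines. In this sketch, keyword interpretations are represented as hypothetical `(attribute, keyword)` tuples, and the three example sets are assumptions modelled on the CONSIDERATION CHRISTOPHER GUEST example rather than data from the paper.

```python
def jaccard(q1, q2):
    """Jaccard coefficient between two sets of keyword interpretations."""
    a, b = set(q1), set(q2)
    union = a | b
    # Assumption: two empty interpretation sets are treated as dissimilar (0.0).
    return len(a & b) / len(union) if union else 0.0


# Hypothetical interpretation sets, written as (attribute, keyword) pairs.
director_movie = {("director", "CHRISTOPHER"), ("director", "GUEST"), ("movie", "CONSIDERATION")}
actor_movie = {("actor", "CHRISTOPHER"), ("actor", "GUEST"), ("movie", "CONSIDERATION")}
director_only = {("director", "CHRISTOPHER"), ("director", "GUEST")}

# Shares 1 of 5 distinct keyword interpretations.
print(jaccard(director_movie, actor_movie))
# Shares 2 of 3 distinct keyword interpretations.
print(jaccard(director_movie, director_only))
```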

The Diversification Scheme
• The diversification algorithm
  – Materializes the top-k relevant query interpretations
  – Worst case: O(l*r)
    – l: the number of query interpretations in L
    – r: the number of query interpretations in the result list R
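A greedy selection of this kind might look like the sketch below. The scoring rule is an assumption: the transcript does not preserve the formula, so the sketch uses an MMR-style combination, `lam * relevance - (1 - lam) * average similarity to the selected set`, with a hypothetical trade-off parameter `lam`. The candidate sets are also illustrative. The naive loop does not attempt to reproduce the O(l*r) bound stated on the slide.

```python
def jaccard(a, b):
    """Jaccard coefficient between two sets of keyword interpretations."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def diversify(candidates, k, lam=0.5):
    """Greedily pick k (relevance, interpretation_set) pairs.

    Assumed score: lam * relevance - (1 - lam) * mean Jaccard similarity
    to the interpretations selected so far.
    """
    remaining = sorted(candidates, key=lambda c: c[0], reverse=True)
    selected = []

    def score(candidate):
        relevance, interp = candidate
        if not selected:
            return relevance  # step 1: the first pick is the most relevant one
        avg_sim = sum(jaccard(interp, s[1]) for s in selected) / len(selected)
        return lam * relevance - (1 - lam) * avg_sim

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        remaining.remove(best)
        selected.append(best)
    return selected


# Hypothetical candidates for the query CONSIDERATION CHRISTOPHER GUEST.
candidates = [
    (0.9, frozenset({("director", "CHRISTOPHER"), ("director", "GUEST"), ("movie", "CONSIDERATION")})),
    (0.8, frozenset({("actor", "CHRISTOPHER"), ("actor", "GUEST"), ("movie", "CONSIDERATION")})),
    (0.5, frozenset({("director", "CHRISTOPHER"), ("director", "GUEST")})),
    (0.4, frozenset({("actor", "CHRISTOPHER"), ("actor", "GUEST")})),
    (0.2, frozenset({("plot", "CHRISTOPHER"), ("plot", "GUEST")})),
]

by_relevance = [rel for rel, _ in diversify(candidates, 3, lam=1.0)]
diversified = [rel for rel, _ in diversify(candidates, 3, lam=0.2)]
print(by_relevance, diversified)
```

With `lam=1.0` the sketch degenerates to plain relevance ranking; lowering `lam` trades relevance for novelty, which is the balance the slides describe.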

EVALUATION METRICS
• α-nDCG-W
  – CG_n (Cumulative Gain)
    – Example: CG = 11
  – DCG_i (Discounted Cumulative Gain)
    – Example: DCG_1 = 3, DCG_2 = 3 + 2/log_2 2 = 5
    – [The DCG_3 example and the table of gains for documents D1 to D6 are not preserved in the transcript.]
  – nDCG_i = DCG_i / ideal DCG_i
  – α-nDCG
    – Views a document as a set of information nuggets n
    – Counts how many documents containing n were seen before and discounts the gain of the current document accordingly
    – If α = 0, it is the standard nDCG
    – With increasing α, novelty is rewarded with more credit
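The DCG convention used in the slide's example (first gain undiscounted, gain at rank j divided by log_2 j afterwards) can be sketched as follows. The gain list `[3, 2, 3, 0, 1, 2]` is an assumption: the slide's table is lost, and these values were chosen only because they agree with the surviving figures CG = 11, DCG_1 = 3 and DCG_2 = 5.

```python
import math


def dcg(gains, i):
    """DCG at rank i (1-based, i >= 1): gain at rank j >= 2 is divided by log2(j)."""
    total = gains[0]
    for rank in range(2, i + 1):
        total += gains[rank - 1] / math.log2(rank)
    return total


def ndcg(gains, i):
    """DCG normalised by the DCG of the ideal (descending) ordering.

    Assumes at least one positive gain, so the ideal DCG is non-zero.
    """
    ideal = sorted(gains, reverse=True)
    return dcg(gains, i) / dcg(ideal, i)


# Hypothetical gains for D1..D6, consistent with the figures on the slide.
gains = [3, 2, 3, 0, 1, 2]

print("CG    =", sum(gains))
print("DCG_1 =", dcg(gains, 1))
print("DCG_2 =", dcg(gains, 2))
print("nDCG_6 =", round(ndcg(gains, 6), 3))
```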

EVALUATION METRICS
• α-nDCG-W
  – In databases
    – An information nugget n corresponds to a primary key pk_i
  – The gain
  – The overlap
    – For each primary key pk_i in the result of Q_k, count how many query interpretations with pk_i were seen before, and aggregate the counts
  – Formula annotation (the formula itself is not preserved in the transcript): "overlap factor"

EVALUATION METRICS
• Weighted S-Recall
  – S-recall
    – Instance recall at rank k when search results are related to several subtopics
      • The number of unique subtopics covered by the first k results, divided by the total number of subtopics
    – A primary key corresponds to a subtopic in S-recall
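Plain S-recall, as defined on this slide, can be sketched as below. The function name `s_recall` and the toy primary keys are assumptions. The weighting that turns S-recall into WS-recall is not described in the transcript, so it is deliberately left out.

```python
def s_recall(results, k, all_subtopics):
    """Fraction of all subtopics covered by the first k results.

    results: ranked list of sets; each set holds the subtopics
             (here: primary keys) returned by one query interpretation.
    """
    universe = set(all_subtopics)
    covered = set()
    for subtopics in results[:k]:
        covered |= set(subtopics)
    return len(covered & universe) / len(universe)


# Hypothetical primary keys returned by three ranked query interpretations.
ranked_results = [{1, 2}, {2, 3}, {4}]
all_keys = {1, 2, 3, 4, 5}

for k in (1, 2, 3):
    print(k, s_recall(ranked_results, k, all_keys))
```

The second interpretation adds only one new key because key 2 was already covered, which is the kind of overlap between interpretations that the metric penalises.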

EXPERIMENTS
• IMDB
  – 10,000,000 records
• Lyrics
  – 400,000 records
• Query logs
  – MSN, AOL
  – 200 most frequent queries (single query)
  – 100 queries (complex queries)

EXPERIMENTS
• User Study
  – 16 participants assessed relevance on a two-point Likert scale
    – top-25 interpretations

EXPERIMENTS
• α-nDCG-W
  – α = 0, 0.5, and

EXPERIMENTS
• WS-recall
• Balancing Relevance and Novelty

CONCLUSION
• We present an approach to search result diversification over structured data.
  – A probabilistic query disambiguation model
  – A query similarity measure
  – A greedy algorithm
• Adaptations of established evaluation metrics are proposed.
  – α-nDCG-W and WS-recall
• Evaluation results demonstrate the quality of the proposed model and show that our algorithms can substantially improve the novelty of keyword search results over structured data.