RANKING SUPPORT FOR KEYWORD SEARCH ON STRUCTURED DATA USING RELEVANCE MODEL Date: 2012/06/04 Source: Veli Bicer(CIKM’11) Speaker: Er-gang Liu Advisor:

Presentation transcript:

RANKING SUPPORT FOR KEYWORD SEARCH ON STRUCTURED DATA USING RELEVANCE MODEL
Date: 2012/06/04
Source: Veli Bicer (CIKM’11)
Speaker: Er-gang Liu
Advisor: Dr. Jia-ling Koh

Outline
- Introduction
- Relevance Model
- Edge-Specific Relevance Model
- Edge-Specific Resource Model
- Smoothing
- Ranking
- Experiment
- Conclusion

Introduction - Motivation
SQL (difficult):
    SELECT cname
    FROM Person, Character, Movie
    WHERE Person.id = Character.pid
      AND Character.mid = Movie.id
      AND Person.name = 'Hepburn'
      AND Movie.title = 'Holiday'
Keyword search (not good):
    Q = {Hepburn, Holiday}, Result = {p1, p4, p2m2, m1, m2, m3}

Introduction - Goal
1. Keyword search: Q = {Hepburn, Holiday}, Result = {p1, p4, p2m2, m1, m2, m3}
2. Keyword index

Introduction - Goal
3. Ranking: score and schema
Result (top-k by relevance):
Title | Name
Roman Holiday | Audrey Hepburn
Breakfast at Tiff. | Audrey Hepburn
The Aviator | Katharine Hepburn
The Holiday | Kate Winslet

Introduction - Overview
Pipeline: Query → PRF → Query RM → Resource RM → Resource Score → Result Ranking
Query:
word | p
hepburn | 0.5
holiday | 0.5
Query RM:
word | p
hepburn | 0.21
holiday | 0.15
audrey | 0.13
katharine | 0.09
princess | 0.01
roman | 0.01
... | ...
Resource RM:
word | p
hepburn | 0.12
holiday | 0.18
audrey | 0.11
katharine | 0.05
princess | 0.00
roman | 0.06
... | ...
Resource score: $H(RM_Q \,\|\, RM_R)$
Result:
Title | Name
Roman Holiday | Audrey Hepburn
Breakfast at Tiff. | Audrey Hepburn
The Aviator | Katharine Hepburn
The Holiday | Kate Winslet

- Resource nodes (tuples such as p1, p2, m1, ...) are typed (attribute names).
- Resources have unique ids (primary keys).
- Attribute nodes hold attribute values, e.g. “Audrey Hepburn”.

Methods
- Relevance Model: Edge-Specific Relevance, Edge-Specific Resource
- Smoothing
- Ranking: Ranking JRTs

Relevance Models
Query Q = {Hepburn, Holiday} → Query Model ($RM_Q$)
Documents → Document Models ($D_1, D_2, D_3, \dots$)
Similarity measure: $H(RM_Q \,\|\, D_1)$, $H(RM_Q \,\|\, D_2)$, $H(RM_Q \,\|\, D_3)$, ... → Resource Score

Relevance Models
Query Q = {Hepburn, Holiday} → Query Model ($RM_Q$)
Document Model per attribute: name, birthplace, title, plot
Similarity measure: $H(RM_Q \,\|\, D_1\text{-}A_{title})$, $H(RM_Q \,\|\, D_1\text{-}A_{plot})$ → Resource Score

Edge-Specific Relevance Models
A set of feedback resources $F_R$ is retrieved from an inverted keyword index:
Q = {Hepburn, Holiday}, $F_R$ = {p1, p4, m1, m2, m3, p2m2}
Inverted index:
princess → m1, p1m1
breakfast → m3
hepburn → p1, p4, m1, p2m2
melbourne → m1
holiday → m1, m2, m3
ann → m1
An edge-specific relevance model is built for each unique edge e (name, plot, ...). It combines the probability of a word in the attribute with the importance of the data for the query.
Example resources:
p1: name = Audrey Hepburn; birthplace = Ixelles Belgium
m3: title = The Holiday; plot = Iris swaps her cottage for the holiday along the next two
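The retrieval of feedback resources can be sketched in a few lines of Python. The index entries are copied from the slide; the function name `feedback_resources` and the choice to take the union over all query keywords are assumptions made for illustration, chosen because the union reproduces the $F_R$ shown on the slide.

```python
# Inverted keyword index copied from the slide (term -> resources containing it).
INDEX = {
    "princess": ["m1", "p1m1"],
    "breakfast": ["m3"],
    "hepburn": ["p1", "p4", "m1", "p2m2"],
    "melbourne": ["m1"],
    "holiday": ["m1", "m2", "m3"],
    "ann": ["m1"],
}


def feedback_resources(query, index):
    """Return every resource that matches at least one query keyword.

    Taking the union over keywords is an assumption for this sketch.
    """
    found = set()
    for term in query:
        found.update(index.get(term.lower(), []))
    return found


F_R = feedback_resources(["Hepburn", "Holiday"], INDEX)
print(sorted(F_R))
```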

Edge-Specific Relevance Models
Edge-specific relevance models (one per edge: name, plot, ...) reflect the importance of the data for the query.
Example resources:
p1: name = Audrey Hepburn; birthplace = Ixelles Belgium
m3: title = The Holiday; plot = Iris swaps her cottage for the holiday along the next two
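The formula itself did not survive in the transcript, so the following is only a sketch under a stated assumption: it uses the classic relevance-model form, in which the probability of a word for edge e is proportional to the sum over feedback resources of (word probability in that attribute) times (importance of the resource for the query). The function names, the `floor` parameter, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def term_dist(text):
    """Maximum-likelihood term distribution of a text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def edge_relevance_model(query, resources, edge, floor=1e-3):
    """Assumed form: P(v | Q, e) ~ sum_r P(v | r, e) * prod_q P(q | r).

    `floor` is a hypothetical stand-in for smoothing of unseen query terms.
    """
    model = Counter()
    for attrs in resources.values():
        if edge not in attrs:
            continue
        whole = term_dist(" ".join(attrs.values()))  # all attributes of r
        importance = 1.0
        for q in query:
            importance *= whole.get(q, floor)  # importance of r for the query
        for term, p in term_dist(attrs[edge]).items():  # word prob. in attribute
            model[term] += p * importance
    total = sum(model.values())
    return {t: w / total for t, w in model.items()} if total else {}


# Toy feedback resources based on the slide examples.
F_R = {
    "p1": {"name": "Audrey Hepburn", "birthplace": "Ixelles Belgium"},
    "p4": {"name": "Katharine Hepburn", "birthplace": "Connecticut USA"},
    "m3": {"title": "The Holiday",
           "plot": "Iris swaps her cottage for the holiday along the next two"},
}
rm_name = edge_relevance_model(["hepburn", "holiday"], F_R, "name")
print(rm_name)
```

In this toy run, "hepburn" receives the largest weight in the name model because it occurs in the name attribute of both matching persons.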

Edge-Specific Resource Models
- Each resource (a tuple) is also represented as an RM.
- Final results (joint tuples) are obtained by combining resources.
Edge-specific resource model, example for v = Hepburn:
- $P_{p_1 \to a}$: terms of the attribute (p1: name = Audrey Hepburn)
- $P_{p_1 \to a^*}$: terms in all attributes (p1: name = Audrey Hepburn; birthplace = Ixelles Belgium)
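As a rough illustration of how the two term sources might be combined, the sketch below mixes the distribution over the terms of one attribute with the distribution over the terms in all attributes. The linear interpolation and the weight `alpha` are assumptions made for this example, since the slide's formula is not preserved; `resource_model` and `term_dist` are hypothetical helper names.

```python
from collections import Counter


def term_dist(text):
    """Maximum-likelihood term distribution of a text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def resource_model(attrs, edge, alpha=0.7):
    """Mix P(v | terms of the attribute) with P(v | terms in all attributes).

    The linear mix and the value of alpha are illustrative assumptions.
    """
    p_attr = term_dist(attrs[edge])
    p_all = term_dist(" ".join(attrs.values()))
    vocab = set(p_attr) | set(p_all)
    return {v: alpha * p_attr.get(v, 0.0) + (1 - alpha) * p_all.get(v, 0.0)
            for v in vocab}


p1 = {"name": "Audrey Hepburn", "birthplace": "Ixelles Belgium"}
model = resource_model(p1, "name")
print(model["hepburn"], model["ixelles"])
```

With these toy inputs, terms from the name attribute dominate, while birthplace terms still receive a small non-zero probability.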

Resource Score
The score of a resource is the cross-entropy of the edge-specific RM (Query RM) and the Resource RM.
Query RM:
word | p
hepburn | 0.21
holiday | 0.15
audrey | 0.13
katharine | 0.09
princess | 0.01
roman | 0.01
... | ...
Resource RM:
word | p
hepburn | 0.12
holiday | 0.18
audrey | 0.11
katharine | 0.05
princess | 0.00
roman | 0.06
... | ...
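A minimal sketch of the cross-entropy computation, using only the six terms listed on the slide (the tables are truncated, so the result is illustrative rather than the paper's score). The `eps` floor is a hypothetical guard for the zero probability of "princess" in the Resource RM, which would otherwise make the logarithm undefined.

```python
import math

# Values copied from the slide; both tables are truncated there.
QUERY_RM = {"hepburn": 0.21, "holiday": 0.15, "audrey": 0.13,
            "katharine": 0.09, "princess": 0.01, "roman": 0.01}
RESOURCE_RM = {"hepburn": 0.12, "holiday": 0.18, "audrey": 0.11,
               "katharine": 0.05, "princess": 0.00, "roman": 0.06}


def cross_entropy(query_rm, resource_rm, eps=1e-6):
    """H(RM_Q || RM_R) = -sum_v RM_Q(v) * log RM_R(v) over the listed terms.

    eps is a hypothetical floor that keeps log() finite for zero entries.
    """
    return -sum(p * math.log(max(resource_rm.get(v, 0.0), eps))
                for v, p in query_rm.items())


score = cross_entropy(QUERY_RM, RESOURCE_RM)
print(score)
```

A lower cross-entropy means the resource model is closer to the query model. The zero entry for "princess" shows why unsmoothed models are fragile.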


Smoothing
- Smoothing is a well-known technique to address data sparseness and improve the accuracy of the relevance model.
- It is applied to the core probability used by both the query RM and the resource RM.
Three types of neighborhood: an attribute a' is a neighbor of attribute a when
1. a and a' share the same resources
2. the resources of a and a' are of the same type
3. the resources of a and a' are connected over a foreign key (FK)
Example:
p1 (type Person): name = Audrey Hepburn; birthplace = Ixelles Belgium
p4 (type Person): name = Katharine Hepburn; birthplace = Connecticut USA
c1 (type Character): name = Princess Ann; pid_fk

Smoothing
Smoothing of each type is controlled by weights, where $\gamma_1$, $\gamma_2$, $\gamma_3$ are control parameters set in the experiments. The weights are defined using a sigmoid function and the cosine similarity.
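The two building blocks named on this slide are standard and can be sketched directly. How the slide combines them is not recoverable from the transcript, so `neighbor_weight` below is a purely hypothetical combination, and the value of `gamma` is a placeholder rather than a setting from the paper.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def cosine_similarity(u, v):
    """Cosine similarity of two sparse term-weight vectors given as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def neighbor_weight(gamma, u, v):
    """Hypothetical combination: gamma * sigmoid(cosine(u, v))."""
    return gamma * sigmoid(cosine_similarity(u, v))


# Term vectors for the name attributes of p1 and p4 (same resource type).
name_p1 = {"audrey": 1.0, "hepburn": 1.0}
name_p4 = {"katharine": 1.0, "hepburn": 1.0}
sim = cosine_similarity(name_p1, name_p4)
weight = neighbor_weight(0.5, name_p1, name_p4)
print(sim, weight)
```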


Ranking JRTs
Ranking aggregated JRTs: cross-entropy between the edge-specific RM (Query Model) and the geometric mean of the combined edge-specific Resource Models.
Example JRT:
Title | Plot
Roman Holiday | Iris swaps her cottage for the holiday ...
Name | Birthplace
Audrey Hepburn | Ixelles, BE
Cname
Princess Ann
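A small sketch of this aggregation, under stated assumptions: the geometric mean is taken term by term and left unnormalized, zero probabilities are floored at a hypothetical `eps`, and the resource models contain toy numbers rather than values from the paper.

```python
import math


def geometric_mean_model(models, eps=1e-6):
    """Term-wise geometric mean of several resource models (unnormalized)."""
    vocab = set().union(*models)
    n = len(models)
    return {
        v: math.exp(sum(math.log(max(m.get(v, 0.0), eps)) for m in models) / n)
        for v in vocab
    }


def cross_entropy(query_rm, model, eps=1e-6):
    """H(Q || M) = -sum_v Q(v) * log M(v), with a floor for missing terms."""
    return -sum(p * math.log(max(model.get(v, 0.0), eps))
                for v, p in query_rm.items())


query_rm = {"hepburn": 0.5, "holiday": 0.5}
movie = {"roman": 0.3, "holiday": 0.4, "princess": 0.3}  # toy numbers
person = {"audrey": 0.5, "hepburn": 0.5}                 # toy numbers

jrt_score = cross_entropy(query_rm, geometric_mean_model([movie, person]))
print(jrt_score)
```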

Ranking JRTs
The proposed score is monotonic w.r.t. individual resource scores.
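One way to see why this holds, assuming the geometric mean of the $n$ combined resource models $RM_{r_1}, \dots, RM_{r_n}$ is taken term by term without renormalization (an assumption, since the slide's formula is not preserved in the transcript):

$$
H\Big(RM_Q \,\Big\|\, \prod_{i=1}^{n} RM_{r_i}^{1/n}\Big)
= -\sum_{v} RM_Q(v)\,\log \prod_{i=1}^{n} RM_{r_i}(v)^{1/n}
= \frac{1}{n}\sum_{i=1}^{n} H\big(RM_Q \,\|\, RM_{r_i}\big)
$$

Under that assumption the JRT score is the average of the individual resource scores, so it cannot decrease when any single resource score increases.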

Experiments
- Datasets: subsets of Wikipedia, IMDB and Mondial Web databases
- Queries: 50 queries for each dataset, including “TREC style” queries and “single resource” queries
- Metrics: the number of top-1 relevant results, reciprocal rank, Mean Average Precision (MAP)
- Baselines: BANKS, Bidirectional (proximity); Efficient, SPARK, CoveredDensity (TF-IDF)
- RM-S: the paper's approach

Experiments
[Figures: MAP scores for all queries; reciprocal rank for single resource queries]

[Figure: precision-recall for TREC-style queries on Wikipedia]

Conclusions
- Keyword search on structured data is a popular problem for which various solutions exist.
- We focus on the aspect of result ranking, providing a principled approach that employs relevance models.
- Experiments show that RMs are promising for searching structured data.

MAP (Mean Average Precision)
Topic 1: 4 relevant documents, at ranks 1, 2, 4, 7
Topic 2: 5 relevant documents, at ranks 1, 3, 5, 7, 10
Topic 1 Average Precision: (1/1 + 2/2 + 3/4 + 4/7)/4 = 0.83
Topic 2 Average Precision: (1/1 + 2/3 + 3/5 + 4/7 + 5/10)/5 = 0.67
MAP = (0.83 + 0.67)/2 = 0.75
Reciprocal Rank
Topic 1 Reciprocal Rank: (1 + 1/2 + 1/4 + 1/7)/4 = 0.473
Topic 2 Reciprocal Rank: (1 + 1/3 + 1/5 + 1/7 + 1/10)/5 = 0.355
MRR = (0.473 + 0.355)/2 = 0.414
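The arithmetic can be checked with a short script that evaluates the expressions exactly as they are written for the two topics. The function names are illustrative. Note that the reciprocal-rank variant here averages 1/rank over all relevant documents, as this slide does; the more common definition uses only the rank of the first relevant result.

```python
def average_precision(relevant_ranks, num_relevant):
    """Mean of precision-at-rank over the ranks where relevant documents appear."""
    ranks = sorted(relevant_ranks)
    return sum((i + 1) / r for i, r in enumerate(ranks)) / num_relevant


def mean_reciprocal_of_ranks(relevant_ranks, num_relevant):
    """The slide's variant: average of 1/rank over all relevant documents."""
    return sum(1 / r for r in relevant_ranks) / num_relevant


# (ranks of relevant documents, number of relevant documents) per topic
topics = [([1, 2, 4, 7], 4), ([1, 3, 5, 7, 10], 5)]

aps = [average_precision(r, n) for r, n in topics]
rrs = [mean_reciprocal_of_ranks(r, n) for r, n in topics]

print([round(x, 2) for x in aps], round(sum(aps) / len(aps), 2))  # [0.83, 0.67] 0.75
print([round(x, 3) for x in rrs], round(sum(rrs) / len(rrs), 3))  # [0.473, 0.355] 0.414
```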