Intent Subtopic Mining for Web Search Diversification Aymeric Damien, Min Zhang, Yiqun Liu, Shaoping Ma State Key Laboratory of Intelligent Technology.

Presentation transcript:

Intent Subtopic Mining for Web Search Diversification
Aymeric Damien, Min Zhang, Yiqun Liu, Shaoping Ma
State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, Beijing, China
{z-m, yiqunliu,

CONTENT
1. Introduction
2. Subtopic Mining
   i. External resources based subtopic mining
   ii. Top results based subtopic mining
3. Fusion & Optimization
4. Conclusion

INTRODUCTION

Intent Subtopic Mining
Extraction of topics related to a larger ambiguous or broad topic
“Star Wars”
   => “Star Wars Movies” => “Star Wars Episode 1” …
   => “Star Wars Books” => “The Last Commando” …
   => “Star Wars Video Games” => …
   => “Star Wars Goodies” => …

SUBTOPIC MINING

External Resources Based Subtopic Mining SUBTOPIC MINING

Resources External Resources Based Subtopic Mining

Query Suggestion From Google, Bing and Yahoo

Query Completion From Google, Bing and Yahoo

Google Insights Top Searches

Google Keyword Tools Related Keywords

Wikipedia Disambiguation Feature Sub-Categories

Filtering, Clustering and Ranking External Resources Based Subtopic Mining

Filtering
Keyword Large Inclusion Filtering
o Discard every candidate subtopic that does not contain, in any order, all of the original query's words (stop words excluded)
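The inclusion filter described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the stop-word list, the tokenizer, and the function names are all assumptions made for the example.

```python
import re

# Hypothetical, deliberately tiny stop-word list; the slides do not specify one.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}

def content_words(text):
    """Lowercase the text, split it into words, and drop stop words."""
    return {w for w in re.findall(r"\w+", text.lower()) if w not in STOP_WORDS}

def keyword_inclusion_filter(query, candidates):
    """Keep candidates that contain every non-stop-word of the query, in any order."""
    required = content_words(query)
    return [c for c in candidates if required <= content_words(c)]

candidates = ["star wars movies", "wars of the star kingdom", "star trek movies"]
kept = keyword_inclusion_filter("the star wars", candidates)
```

Word order is ignored because both sides are compared as sets, so "wars of the star kingdom" passes while "star trek movies" is dropped for missing "wars".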

Snippet Based Clustering

Bottom-up hierarchical clustering algorithm with extended Jaccard similarity coefficient
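As a rough sketch of bottom-up clustering with the extended Jaccard (Tanimoto) coefficient, the Python below merges the two most similar clusters until no pair is similar enough. The slide does not give the linkage rule, the stopping criterion, or the snippet representation, so average linkage, the `threshold` parameter, and plain whitespace term counts are assumptions chosen for the example.

```python
from collections import Counter

def extended_jaccard(a, b):
    """Extended Jaccard (Tanimoto) similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    denom = sum(v * v for v in a.values()) + sum(v * v for v in b.values()) - dot
    return dot / denom if denom else 0.0

def cluster_similarity(c1, c2, vectors):
    """Average-link similarity between two clusters (linkage choice is an assumption)."""
    pairs = [(i, j) for i in c1 for j in c2]
    return sum(extended_jaccard(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)

def agglomerative_clusters(snippets, threshold=0.3):
    """Start with one cluster per snippet and merge greedily while similarity >= threshold."""
    vectors = [Counter(s.lower().split()) for s in snippets]
    clusters = [[i] for i in range(len(snippets))]
    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                sim = cluster_similarity(clusters[x], clusters[y], vectors)
                if sim > best_sim:
                    best_sim, best_pair = sim, (x, y)
        if best_sim < threshold:
            break  # no remaining pair is similar enough to merge
        x, y = best_pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters

snippets = [
    "star wars movie trailer",
    "star wars movie review",
    "lego star wars video game",
    "star wars video game cheats",
]
clusters = agglomerative_clusters(snippets, threshold=0.3)
```

The clusters are returned as lists of snippet indices; here the two movie snippets and the two game snippets end up in separate clusters.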

Ranking
Ranking is based on intent subtopic popularity (number of searches per month)
Score source weights
o Jaccard similarity between the subtopic and the original query: 5%
o Normalized Google Insights score: 15%
o Normalized Google Keywords Generator score: 75%
o Belongs to the query suggestions/completions: 5%
Score normalization
o Every subtopic candidate's score is normalized as a percentage of the top subtopic candidate's score from the same resource
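The weighting and normalization above can be illustrated as follows. Only the four weights come from the slide; the function names, the data layout, and the sample numbers are made up for the example, and scores are kept as fractions in [0, 1] rather than percentages.

```python
# Source weights from the slide (they sum to 1.0).
WEIGHTS = {"jaccard": 0.05, "insights": 0.15, "keywords": 0.75, "suggestion": 0.05}

def normalize_to_top(raw_scores):
    """Express each raw score as a fraction of the same resource's top score."""
    top = max(raw_scores.values(), default=0)
    return {k: (v / top if top else 0.0) for k, v in raw_scores.items()}

def rank_subtopics(jaccard, insights_raw, keywords_raw, suggested):
    """Combine the four per-source scores into one weighted score per subtopic."""
    insights = normalize_to_top(insights_raw)
    keywords = normalize_to_top(keywords_raw)
    scores = {}
    for s in jaccard:
        scores[s] = (WEIGHTS["jaccard"] * jaccard[s]
                     + WEIGHTS["insights"] * insights.get(s, 0.0)
                     + WEIGHTS["keywords"] * keywords.get(s, 0.0)
                     + WEIGHTS["suggestion"] * (1.0 if s in suggested else 0.0))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Illustrative numbers only; they are not taken from the presentation.
ranked = rank_subtopics(
    jaccard={"star wars movies": 0.67, "star wars books": 0.67},
    insights_raw={"star wars movies": 80, "star wars books": 40},
    keywords_raw={"star wars movies": 1000000, "star wars books": 250000},
    suggested={"star wars movies"},
)
```

Because the keyword-tool score carries 75% of the weight, a candidate's monthly search volume dominates its final position.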

Evaluation and Results External Resources Based Subtopic Mining

Evaluation
Experimentation Setup
o Based on a 50-query set, used for TREC Web Track 2012
o Annotation of results
o Compute D#-nDCG score
Runs
o Baseline: Query Suggestion + Query Completion
o Run 1: Baseline + Wikipedia
o Run 2: Baseline + Google Insights
o Run 3: Baseline + Google Keywords Generator
o Run 4: Baseline + Google Keywords Generator + Google Insights + Wikipedia
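For readers unfamiliar with the metric, D#-nDCG as it is commonly defined for the NTCIR INTENT task is a linear combination of intent recall (I-rec) and diversified nDCG (D-nDCG). The sketch below assumes that definition with an equal weighting (`gamma=0.5`); the data layout, the cutoff, and the function name are assumptions for illustration and are not taken from the slides.

```python
import math

def d_sharp_ndcg(ranked, intent_probs, gains, cutoff=10, gamma=0.5):
    """Return (D#-nDCG, I-rec, D-nDCG) for one query.

    ranked:       list of item ids in system order
    intent_probs: {intent: P(intent | query)}
    gains:        {item: {intent: gain}} from the annotation
    """
    def global_gain(item):
        # Gain of an item, weighted by the probability of each intent it serves.
        return sum(intent_probs[i] * g for i, g in gains.get(item, {}).items())

    def dcg(items):
        return sum(global_gain(it) / math.log2(r + 1)
                   for r, it in enumerate(items, start=1))

    top = ranked[:cutoff]
    ideal = sorted(gains, key=global_gain, reverse=True)[:cutoff]
    ideal_dcg = dcg(ideal)
    d_ndcg = dcg(top) / ideal_dcg if ideal_dcg else 0.0
    covered = {i for it in top for i, g in gains.get(it, {}).items() if g > 0}
    i_rec = len(covered) / len(intent_probs) if intent_probs else 0.0
    return gamma * i_rec + (1 - gamma) * d_ndcg, i_rec, d_ndcg

intent_probs = {"movies": 0.6, "games": 0.4}
gains = {"a": {"movies": 1}, "b": {"games": 1}, "c": {"movies": 1}}
redundant = d_sharp_ndcg(["a", "c", "b"], intent_probs, gains, cutoff=2)
diverse = d_sharp_ndcg(["a", "b", "c"], intent_probs, gains, cutoff=2)
```

In this toy case the redundant list covers only one of two intents in its top 2, so its I-rec is 0.5, while the diverse list reaches an I-rec of 1.0 and a higher combined score.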

Results
Columns: D#-nDCG (% increase over baseline), I-rec (% increase over baseline), D-nDCG (% increase over baseline)
Rows:
o Baseline
o E.R. Mining Run 1 (Wikipedia)
o E.R. Mining Run 2 (Google Insights)
o E.R. Mining Run 3 (Google Keywords)
o E.R. Mining Run 4 (Insights + Keywords + Wikipedia)
[Table values are missing from the transcript]

Top Results Based Subtopic Mining SUBTOPIC MINING

Subtopics Extraction Top Results Based Subtopic Mining

Subtopic Extraction
From top results pages: extraction of page snippets, incoming anchor texts and h1 tags
Top results pages sources:
o TMiner (THUIR information retrieval system, based on ClueWeb)
o Google
o Yahoo
o Bing
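A minimal sketch of pulling h1 headings and anchor texts out of a page, using only Python's standard `html.parser`. The class name and the output layout are assumptions. The sketch collects the links found in one page, so building the incoming anchor texts of a target page would mean grouping these (href, text) pairs by href across a whole crawl. Snippets would come from the search engine results instead.

```python
from html.parser import HTMLParser

class FragmentExtractor(HTMLParser):
    """Collect h1 texts and (href, anchor text) pairs from one HTML page."""

    def __init__(self):
        super().__init__()
        self.h1_texts = []
        self.anchors = []   # list of (href, anchor text)
        self._tag = None    # tag currently being captured, if any
        self._href = None
        self._buf = []

    def handle_starttag(self, tag, attrs):
        # Only start capturing when not already inside an h1 or a link.
        if tag in ("h1", "a") and self._tag is None:
            self._tag = tag
            self._href = dict(attrs).get("href")
            self._buf = []

    def handle_data(self, data):
        if self._tag is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == self._tag:
            text = " ".join("".join(self._buf).split())  # collapse whitespace
            if text:
                if tag == "h1":
                    self.h1_texts.append(text)
                else:
                    self.anchors.append((self._href, text))
            self._tag = None

page = ('<html><body><h1>Star Wars  Movies</h1>'
        '<p>See <a href="http://example.com/ep1">Star Wars Episode 1</a></p>'
        '</body></html>')
extractor = FragmentExtractor()
extractor.feed(page)
```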

Clustering and Ranking Top Results Based Subtopic Mining

Clustering

Modified K-Medoid Algorithm
In our task, the number of intent subtopics cannot be known in advance, so we adapted the K-Medoid algorithm, which normally requires the number of clusters to be fixed beforehand

Cluster Filtering and Naming
Clusters whose fragments all come from the same source page are discarded, as are clusters containing only one fragment.
To generate a cluster name, we experimentally set a value k and take the most popular words in the fragments, that is, those whose frequency in the cluster is above k.
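The two rules on this slide can be sketched as below. The data layout (a cluster as a list of `(page_url, fragment_text)` pairs) is an assumption, and so is reading "frequency above k" as a raw word count; the slide does not say whether the frequency is absolute or relative.

```python
from collections import Counter

def filter_clusters(clusters):
    """Drop clusters with a single fragment or whose fragments share one source page."""
    kept = []
    for cluster in clusters:
        pages = {url for url, _ in cluster}
        if len(cluster) > 1 and len(pages) > 1:
            kept.append(cluster)
    return kept

def cluster_name(cluster, k):
    """Join the words whose count in the cluster is above k, most frequent first."""
    counts = Counter(w for _, text in cluster for w in text.lower().split())
    return " ".join(w for w, c in counts.most_common() if c > k)

movies = [
    ("http://example.com/1", "star wars movies list"),
    ("http://example.com/2", "all star wars movies"),
    ("http://example.com/3", "star wars movies ranked"),
]
same_page = [("http://example.com/4", "lego sets"), ("http://example.com/4", "lego shop")]
singleton = [("http://example.com/5", "fan fiction")]

kept = filter_clusters([movies, same_page, singleton])
name = cluster_name(movies, k=2)
```

With `k=2`, only the words that appear in all three fragments survive, giving the name "star wars movies".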

Ranking
Fragments are ranked according to the rank of the page from which they were extracted and the diversity of URLs inside each cluster

Evaluation and Results Top Results Based Subtopic Mining

Evaluation
Runs:
o Baseline: Query Suggestion + Query Completion
o Run 1: Baseline + TMiner Snippets
o Run 2: Baseline + TMiner Snippets, Anchor Texts and h1 tags
o Run 3: Baseline + Search Engines Snippets
o Run 4: Baseline + Search Engines & TMiner Snippets
o Run 5: Baseline + Search Engines Snippets + TMiner Snippets, Anchor Texts and h1 tags

Results Great D#-nDCG Improvements

FUSION & OPTIMIZATION

Fusion FUSION & OPTIMIZATION

Evaluation & Results FUSION & OPTIMIZATION

Fusion Performances

This system at NTCIR-10
NTCIR INTENT task: submit a ranked list of subtopics for every query in a 50-query set.
A total of 34 runs were submitted to the NTCIR-10 INTENT task by all participants.
This framework was submitted to that workshop and achieved the best performance: all of its runs scored higher than the other participants' runs.

Run: THUIR-S-E-1A, THUIR-S-E-3A, THUIR-S-E-2A, THUIR-S-E-4A, THUIR-S-E-5A, THCIB-S-E-2A, KLE-S-E-4A, THCIB-S-E-1A, hultech-S-E-1A, THCIB-S-E-3A, THCIB-S-E-5A, THCIB-S-E-4A, KLE-S-E-2A, hultech-S-E-4A, ORG-S-E-4A, SEM12-S-E-1A, SEM12-S-E-2A, SEM12-S-E-4A, SEM12-S-E-5A, ORG-S-E-3A, KLE-S-E-3A, KLE-S-E-1A, ORG-S-E-2A, SEM12-S-E-3A, hultech-S-E-3A, ORG-S-E-1A, …

Optimization FUSION & OPTIMIZATION

Query Type Analysis – D#-nDCG Performances Informational Queries Navigational Queries

Evaluation & Results FUSION & OPTIMIZATION

Optimization Runs & Results
o Optimization 1: Fusion + for navigational queries, only keep Top Results Mining (SE + TMiner Snippets, Anchors and h1 Tags).
o Optimization 2: Fusion + for navigational queries, give a higher weight to subtopics coming from Top Results Mining (SE + TMiner Snippets, Anchors and h1 Tags).
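The two optimizations can be illustrated with a short sketch. The tuple layout, the source labels, and especially the `boost` factor are hypothetical: the slides say only that top-results subtopics get "a higher weight", not how much, and they do not describe how queries are classified as navigational.

```python
def optimize_for_query_type(subtopics, is_navigational, mode, boost=2.0):
    """Re-rank fused subtopics for one query.

    subtopics: list of (text, score, source), source in {"external", "top_results"}
    mode 1:    for navigational queries, keep only top-results subtopics
    mode 2:    for navigational queries, multiply top-results scores by `boost`
    """
    if not is_navigational:
        # Informational queries keep the plain fusion ranking.
        return sorted(subtopics, key=lambda s: s[1], reverse=True)
    if mode == 1:
        kept = [s for s in subtopics if s[2] == "top_results"]
    elif mode == 2:
        kept = [(t, sc * boost if src == "top_results" else sc, src)
                for t, sc, src in subtopics]
    else:
        raise ValueError("mode must be 1 or 2")
    return sorted(kept, key=lambda s: s[1], reverse=True)

fused = [("official site", 0.5, "external"), ("login page", 0.4, "top_results")]
opt1 = optimize_for_query_type(fused, is_navigational=True, mode=1)
opt2 = optimize_for_query_type(fused, is_navigational=True, mode=2)
plain = optimize_for_query_type(fused, is_navigational=False, mode=1)
```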

Evaluation

Optimization Performances for Navigational Queries
Only 6 of the queries are navigational, so the impact on the whole query set is small, but the performance gain on the navigational queries themselves is large.
Columns: Fusion, Optimization 1, Performance Raise, Optimization 2, Performance Raise
Rows: D-nDCG, I-rec, D#-nDCG
[Table values are missing from the transcript]

CONCLUSION

THANKS