Applying Diversity Metrics to Improve the Selection of Web Search Term Refinements CS224N 2008 Tague Griffith, Jan Pfeifer.



Web Search Refinements

Problem
- Redundant refinements in a limited space
- Technical senses dominate others:
  - Java island vs. Java programming language
  - Amazon river/rain forest vs. Amazon the company
- What happens with too much diversity:
  - Amazon grill houston
  - Embraer ERJ 145 Amazon

CBC Word Sense Similarity
- Similarity of terms measured by feature vectors
- Features are a combination of co-occurring words with their syntactic context
  - “wine”: [“sip _” + “Verb-Object”, ...]
- Data from Wikipedia corpus
- Problems:
  - Little overlap between web data and Wikipedia data
  - Hyponym siblings are too similar, yet make good refinements: “planet jupiter” and “planet earth”
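A minimal sketch of comparing terms by feature vectors, as the slide above describes. The slide does not state which similarity function or feature weighting was used, so cosine similarity over raw counts is an assumption here, and the feature counts are made-up toy values rather than Wikipedia data.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two sparse feature-count vectors."""
    dot = sum(weight * b[feature] for feature, weight in a.items() if feature in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical features: (co-occurring word pattern, syntactic context) -> count
wine = Counter({("sip _", "Verb-Object"): 3,
                ("drink _", "Verb-Object"): 5,
                ("red _", "Adj-Noun"): 4})
beer = Counter({("sip _", "Verb-Object"): 1,
                ("drink _", "Verb-Object"): 6,
                ("cold _", "Adj-Noun"): 2})
car = Counter({("drive _", "Verb-Object"): 7,
               ("red _", "Adj-Noun"): 1})

wine_beer = cosine_similarity(wine, beer)
wine_car = cosine_similarity(wine, car)
```

With these toy counts, "wine" comes out closer to "beer" than to "car" because they share verb-object contexts. The same effect would make hyponym siblings such as “planet jupiter” and “planet earth” look nearly identical, which is the problem the slide notes.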

Web Semantic Similarity
- Similarity as a function of web search engine results
- Maximum Marginal Relevance (MMR) greedy algorithm:
  MMR = argmax_x { (1 - a) * popularity(x) + a * diversity(x) }
  - x = candidate refinement
  - popularity(x) given by recent search logs
  - diversity(x) given by overlapping search results
- Clustering of terms demonstrates validity
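The greedy MMR selection above can be sketched as follows. The slide only says that diversity(x) comes from overlapping search results. Defining it as one minus the largest Jaccard overlap with an already selected refinement is an assumption, as are the function names, the popularity values (taken to be normalized to [0, 1]) and the placeholder result identifiers.

```python
def jaccard(a, b):
    """Overlap of two result sets: |a & b| / |a | b|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def select_refinements(candidates, popularity, results, k, a=0.3):
    """Greedily pick k refinements, trading popularity against diversity."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def mmr_score(x):
            # Assumed diversity: 1 - highest overlap with anything already chosen.
            overlap = max((jaccard(results[x], results[s]) for s in selected),
                          default=0.0)
            return (1 - a) * popularity[x] + a * (1.0 - overlap)

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical inputs: popularity from search logs, result sets from a search engine.
candidates = ["java download", "java update", "java island"]
popularity = {"java download": 1.0, "java update": 0.9, "java island": 0.4}
results = {
    "java download": {"u1", "u2", "u3"},
    "java update": {"u1", "u2", "u4"},
    "java island": {"u7", "u8", "u9"},
}

popular_only = select_refinements(candidates, popularity, results, k=2, a=0.0)
diverse = select_refinements(candidates, popularity, results, k=2, a=0.8)
```

With a = 0.0 the two most popular refinements win even though their results overlap. With a = 0.8 the second slot goes to "java island", whose results share nothing with the first pick.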

Tools: demo


AB Editorial Test
- 0.0, 0.3 and 0.8 diversity
- Evaluate utility of refinements
- Scale: definitely better, slightly better, same
- 17 editors
- Mixed results, with high variability

Results
- Problems with increased diversity:
  - Editors penalized long refinements
  - Spam and adult terms have “artificial” diversity under web semantic similarity
  - More mixed-language results
  - Esoteric refinements
- Refinement selection should include:
  - Popularity feature
  - Diversity feature
  - Length feature
  - Category classification feature (spam, adult, etc.)
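One way the four recommended features might be combined is sketched below. The slides do not give a formula, so the linear score, the per-word length penalty, the hard filter on category, and all names and numbers are assumptions for illustration only.

```python
from dataclasses import dataclass

BLOCKED_CATEGORIES = {"spam", "adult"}


@dataclass
class Candidate:
    text: str
    popularity: float  # assumed normalized to [0, 1]
    diversity: float   # assumed normalized to [0, 1]
    category: str      # label from a hypothetical classifier, e.g. "ok", "spam"


def score(c, a=0.3, length_weight=0.05, max_words=3):
    """Popularity/diversity trade-off minus a penalty for each word beyond max_words."""
    extra_words = max(0, len(c.text.split()) - max_words)
    return (1 - a) * c.popularity + a * c.diversity - length_weight * extra_words


def rank(candidates, **weights):
    """Drop blocked categories, then sort the rest by descending score."""
    eligible = [c for c in candidates if c.category not in BLOCKED_CATEGORIES]
    return sorted(eligible, key=lambda c: score(c, **weights), reverse=True)


# Hypothetical candidates for the query "amazon".
candidates = [
    Candidate("amazon free prizes", 0.8, 1.0, "spam"),
    Candidate("amazon grill houston texas menu", 0.3, 1.0, "ok"),
    Candidate("amazon river", 0.6, 0.9, "ok"),
]
ranked_texts = [c.text for c in rank(candidates)]
```

Filtering on category before scoring keeps spam and adult terms from winning on their "artificial" diversity, and the length penalty mirrors the editors' preference for shorter refinements.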