Published by Aja Turnage. Modified over 9 years ago.
1
Applying Diversity Metrics to Improve the Selection of Web Search Term Refinements
CS224N 2008
Tague Griffith, Jan Pfeifer
2
Web Search Refinements
3
Problem
- Redundant refinements compete for limited display space
- Technical senses dominate other senses:
  - Java the island vs. Java the programming language
  - Amazon the river/rain forest vs. Amazon the company
- Too much diversity surfaces obscure refinements, e.g.:
  - "Amazon grill houston"
  - "Embraer ERJ 145 Amazon"
4
CBC Word Sense Similarity
- Similarity of terms is measured by comparing feature vectors
- Features combine co-occurring words with their syntactic context, e.g. "wine": ["sip _" + "Verb-Object", ...]
- Data from a Wikipedia corpus
- Problems:
  - Little overlap between web data and Wikipedia data
  - Hyponym siblings score as too similar, yet make good refinements: "planet jupiter" and "planet earth"
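To make "similarity measured by feature vectors" concrete, here is a minimal Python sketch that compares two sparse vectors keyed by (context, syntactic relation) features using cosine similarity. The counts are invented for illustration, and raw counts are a simplification: CBC (Clustering By Committee, as we understand it) weights features by pointwise mutual information before comparing vectors, which this sketch omits.

```python
import math
from collections import Counter


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse feature vectors.

    Each key is a (context pattern, syntactic relation) pair such as
    ("sip _", "Verb-Object"); each value is a (hypothetical) count.
    """
    dot = sum(u[f] * v[f] for f in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical feature counts; real vectors would come from a parsed corpus.
wine = Counter({("sip _", "Verb-Object"): 3, ("red _", "Adj-Noun"): 5, ("bottle of _", "Noun-PP"): 2})
beer = Counter({("sip _", "Verb-Object"): 2, ("cold _", "Adj-Noun"): 4, ("bottle of _", "Noun-PP"): 3})
print(f"{cosine(wine, beer):.3f}")
```

Under a measure like this, sibling terms such as "jupiter" and "earth" would likely share many syntactic contexts and score high, which is why a diversity metric built on it tends to drop one of two hyponym siblings even when both are good refinements.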
5
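Web Semantic Similarity
- Similarity of terms computed as a function of web search engine results
- Maximal Marginal Relevance (MMR) greedy algorithm:
  $$\mathrm{MMR} = \arg\max_{x} \left[ (1-a)\,\mathrm{popularity}(x) + a\,\mathrm{diversity}(x) \right]$$
  - x = candidate refinement
  - a = weight on diversity, between 0 and 1
  - popularity(x) given by recent search logs
  - diversity(x) given by how little x's search results overlap with those of other refinements
- Clustering terms with this similarity supports the validity of the measure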
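The greedy MMR selection can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: popularity(x) is assumed to be pre-normalized to [0, 1], each refinement's search results are a set of URLs, and diversity(x) is assumed to be 1 minus the largest Jaccard overlap between x's results and those of refinements already chosen. The toy data for "java" is hypothetical.

```python
# Greedy MMR-style selection of search refinements (illustrative sketch).
# Assumptions: popularity is pre-normalized to [0, 1]; search results are
# sets of result URLs; diversity(x) = 1 - max Jaccard overlap with the
# refinements selected so far.


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two result sets: |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def diversity(x: str, selected: list[str], results: dict[str, set[str]]) -> float:
    """1.0 when x's results share nothing with any already-selected refinement."""
    if not selected:
        return 1.0
    return 1.0 - max(jaccard(results[x], results[s]) for s in selected)


def mmr_select(candidates, popularity, results, a, k):
    """Pick up to k refinements, each maximizing (1 - a)*popularity + a*diversity."""
    selected: list[str] = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda x: (1 - a) * popularity[x] + a * diversity(x, selected, results),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data for the query "java".
popularity = {"java download": 1.0, "java tutorial": 0.9, "java island": 0.3, "java coffee": 0.4}
results = {
    "java download": {"u1", "u2", "u3"},
    "java tutorial": {"u2", "u3", "u4"},
    "java island": {"u7", "u8"},
    "java coffee": {"u9", "u10"},
}
cands = list(popularity)
print(mmr_select(cands, popularity, results, a=0.0, k=2))  # ['java download', 'java tutorial']
print(mmr_select(cands, popularity, results, a=0.8, k=2))  # ['java download', 'java coffee']
```

Setting a = 0 reduces to ranking by popularity alone; raising a lets a less popular but non-overlapping sense (here "java coffee") displace a second, largely redundant programming refinement.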
6
Tools: demo http://abstract.homelinux.org:9240/janpf/fp/diversity_demo.php?term=target
7
Tools: demo
9
A/B Editorial Test
- Compared refinements selected at diversity settings 0.0, 0.3 and 0.8
- Editors evaluated the utility of the refinements
- Scale: definitely better, slightly better, same
- 17 editors
- Mixed results, with high variability
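One hypothetical way to summarize judgments on this three-point scale is to map them to signed integers and report the mean and spread per diversity setting. The numeric mapping, the sign convention, and the toy judgments below are assumptions for illustration, not the study's data or its analysis method.

```python
import statistics

# Hypothetical numeric mapping of the three-point scale. The sign records which
# side an editor preferred: + for the diversified set, - for the baseline.
SCALE = {"definitely better": 2, "slightly better": 1, "same": 0}


def to_score(label: str, prefers_diverse: bool) -> int:
    """Signed score for one judgment."""
    value = SCALE[label]
    return value if prefers_diverse else -value


def summarize(judgments):
    """Mean and sample standard deviation of signed scores."""
    scores = [to_score(label, side) for label, side in judgments]
    return statistics.mean(scores), statistics.stdev(scores)


# Invented toy judgments for one diversity setting; not data from the study.
toy = [
    ("slightly better", True),
    ("same", True),
    ("definitely better", False),
    ("definitely better", True),
    ("same", False),
]
mean, spread = summarize(toy)
print(f"mean={mean:.2f} stdev={spread:.2f}")
```

Under this mapping, a mean near zero paired with a large standard deviation is what "mixed results, with high variability" would look like numerically.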
10
Results
- Problems with increased diversity:
  - Editors penalized long refinements
  - Spam and adult terms show "artificial" diversity under web semantic similarity
  - More mixed-language results
  - Esoteric refinements
- Refinement selection should include:
  - Popularity feature
  - Diversity feature
  - Length feature
  - Category classification feature (spam, adult, etc.)
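A selector that combines the four proposed features could be as simple as a weighted linear score. In this hypothetical Python sketch, the weights, the word-count length feature, and the boolean category flag (standing in for a spam/adult classifier) are all assumptions; the candidates and their feature values are invented.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    popularity: float  # hypothetical: normalized query-log frequency in [0, 1]
    diversity: float   # hypothetical: 1 - max result overlap with chosen refinements
    flagged: bool      # hypothetical output of a spam/adult category classifier


def extra_words(text: str, query: str) -> int:
    """Length feature: number of words the refinement adds beyond the query."""
    return max(0, len(text.split()) - len(query.split()))


def score(c: Candidate, query: str, w_pop=0.5, w_div=0.3, w_len=0.1, w_cat=1.0) -> float:
    """Linear combination of the four features; weights are made-up placeholders."""
    return (w_pop * c.popularity
            + w_div * c.diversity
            - w_len * extra_words(c.text, query)
            - w_cat * float(c.flagged))


query = "amazon"
candidates = [
    Candidate("amazon books", 0.9, 0.3, False),
    Candidate("amazon river", 0.6, 0.9, False),
    Candidate("amazon grill houston", 0.1, 1.0, False),
    Candidate("amazon spam-term", 0.8, 1.0, True),
]
ranked = sorted(candidates, key=lambda c: score(c, query), reverse=True)
print([c.text for c in ranked])
# ['amazon river', 'amazon books', 'amazon grill houston', 'amazon spam-term']
```

A linear score keeps each feature's influence visible and easy to tune; in practice the weights would need to be fit against editorial judgments rather than set by hand.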