Ranking Clusters for Web Search
Gianluca Demartini, Paul-Alexandru Chirita, Ingo Brunkhorst, Wolfgang Nejdl
L3S Info Lunch, Hannover, 08 November 2006

Outline
- Introduction
- Ranking Algorithms Considered
- Experimental Setup
- Results
- Conclusions

Introduction (1)
- Web search: results are presented sorted by a score value
- Users should be able to browse the results efficiently
- An interface that clusters documents performs better
- Common task in clustering search engines (SE): ordering the results of the classification
- An efficient ordering of the clusters is beneficial for the user

Introduction (2)
We analyze a set of ten different metrics for ordering clusters of search engine results:
- Ranking by SE scores
- Ranking by query to cluster similarity
- Ranking by intra-cluster similarity
- Measures independent of the documents within the cluster
Two different clustering algorithms are used, to verify that the performance of the cluster rankings does not depend on the clustering algorithm.
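
To make the setting concrete, here is a minimal sketch (Python, not the authors' implementation) of the flow assumed throughout the talk: cluster the result list, score each cluster with one of the metrics discussed below, and show clusters in decreasing score order. The helper names cluster_results and score_cluster are hypothetical.

    # Minimal pipeline sketch (hypothetical helpers, not the paper's code).
    def rank_clusters(results, cluster_results, score_cluster):
        """results: search results in the engine's original order."""
        clusters = cluster_results(results)                    # e.g. SVM/Bayes assignment to ODP categories
        scored = [(score_cluster(cluster), cluster) for cluster in clusters]
        scored.sort(key=lambda pair: pair[0], reverse=True)    # best-scoring cluster first
        return [cluster for _, cluster in scored]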

Related Work (1)
- Several SE already employ such output structuring: Vivisimo, iBoogie, Mooter, Grokker, etc.
- Many techniques exist to cluster web search results, either in a flat manner or hierarchically
- Clustering is useful for clarifying a vague query by showing its dominant themes

Related Work (2)
- How to display search results to users: they find answers faster with a categorized organization
- Faceted search: an alphabetical order is commonly used
- Text classifiers: SVM performs better than Bayesian classifiers for text classification

Outline
- Introduction
- Ranking Algorithms Considered
- Experimental Setup
- Results
- Conclusions

Cluster Ranking Algorithms
10 different ranking algorithms considered:
- Ranking by search engine scores (4)
- Ranking by query to cluster similarity (1)
- Ranking by intra-cluster similarity (2)
- Measures independent of the documents within the cluster (3)

Ranking by search engine scores (1)
- PageRank computation for the page at position x
- Average PageRank
- Total PageRank
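
The slide's formulas are not preserved in the transcript; the sketch below only illustrates the idea, under the assumption that the PageRank of the result at position x is approximated by a simple decreasing function of x (here 1/x). That surrogate is an assumption, not the paper's exact estimate.

    # Hedged sketch: approximate PageRank from the result position (assumed 1/x),
    # then score a cluster by the average or the total over its members' positions.
    def estimated_pagerank(position):
        return 1.0 / position                      # placeholder for the slide's formula

    def average_pagerank(positions):
        return sum(estimated_pagerank(x) for x in positions) / len(positions)

    def total_pagerank(positions):
        return sum(estimated_pagerank(x) for x in positions)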

Ranking by search engine scores (2)
- Average Rank
- Minimum Rank
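
A sketch of the two rank-based scores, assuming positions holds the original Google ranks (1 = top) of the results in a cluster; since lower is better here, clusters would be sorted by ascending score.

    # Rank-based cluster scores (lower is better).
    def average_rank(positions):
        return sum(positions) / len(positions)

    def minimum_rank(positions):
        return min(positions)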

Ranking by Query to Cluster Similarity
- Normalized Logarithmic Likelihood Ratio (NLLR)
- Average Query/Page similarity
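
The exact NLLR definition on the slide is not in the transcript; the sketch below uses one common language-modelling formulation, NLLR(Q, C) = sum over query terms w of P(w|Q) * log(P(w|C) / P(w|Collection)), which is an assumption about the paper's formula. Smoothing of the cluster model is glossed over.

    import math

    # Assumed NLLR formulation (cluster language model vs. collection model).
    def nllr(query_terms, p_word_in_cluster, p_word_in_collection):
        score = 0.0
        for w in set(query_terms):
            p_q = query_terms.count(w) / len(query_terms)          # P(w|Q)
            score += p_q * math.log(p_word_in_cluster[w] / p_word_in_collection[w])
        return score

    # Average query/page similarity: mean of the query's similarity to each page in the cluster.
    def average_query_page_similarity(similarities):
        return sum(similarities) / len(similarities)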

Ranking by Intra Cluster Similarity
Similarity between pages and categories (title + description):
- values returned by the classifiers
- probability that a document belongs to some category
- strength with which every result belongs to its assigned category
Average Intra Cluster Similarity (AvgValue):
- averaged over all the pages that belong to a category
- pushes to the top of the list the clusters whose results are most relevant to their category
Maximum Intra Cluster Similarity (MaxValue):
- focuses on the best matching document of each cluster only
- the results the user views first are those that have been best classified
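
A sketch of the two intra-cluster scores, assuming values holds the classifier confidence (e.g. SVM decision value or Bayes posterior) of each result assigned to the cluster, as described above.

    # AvgValue: mean classifier confidence over the cluster's results.
    def avg_value(values):
        return sum(values) / len(values)

    # MaxValue: confidence of the single best-classified result in the cluster.
    def max_value(values):
        return max(values)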

Other Metrics
Metrics which seem to be used by current commercial web SE, plus a baseline:
Order by Size:
- uses the number of docs belonging to the category
- used by most clustering SE (e.g. Vivisimo)
Alphabetical Order:
- used in faceted search (e.g. Flamenco)
Random Order:
- a baseline against which to compare the other metrics
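
For completeness, a sketch of the three document-independent orderings; the cluster attributes documents and label are hypothetical names.

    import random

    def order_by_size(clusters):
        return sorted(clusters, key=lambda c: len(c.documents), reverse=True)

    def alphabetical_order(clusters):
        return sorted(clusters, key=lambda c: c.label.lower())

    def random_order(clusters):
        shuffled = list(clusters)
        random.shuffle(shuffled)               # baseline ordering
        return shuffled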

Outline
- Introduction
- Basic Concepts
- Ranking Algorithms Considered
- Experimental Setup
- Results
- Conclusions

Experimental Setup (1)
- 20 algorithm combinations (10 rankings x 2 classifiers), 20 people
- Support Vector Machines (SVM) and Bayes as text classifiers
  - to verify that the performance of the ranking algorithms considered does not depend on the clustering algorithm used
- ODP categories (top 3 levels)
  - most frequent terms in ODP titles and descriptions of web pages
  - English categories only
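
Not the authors' code, but a sketch of how the two text classifiers could be set up today with scikit-learn, assuming ODP category titles and descriptions serve as training text and result snippets as the documents to classify.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    def build_classifier(kind="svm"):
        # SVM vs. Bayes, the two classifiers compared in the experiments
        clf = LinearSVC() if kind == "svm" else MultinomialNB()
        return make_pipeline(TfidfVectorizer(), clf)

    # Usage sketch (category_texts/labels and result_snippets are hypothetical names):
    # model = build_classifier("svm")
    # model.fit(category_texts, category_labels)    # ODP titles + descriptions
    # predictions = model.predict(result_snippets)  # title + snippet per result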

Experimental Setup (2)
Each user evaluated each (algorithm, classifier) pair once:
- task: select the first relevant result
- no information about which algorithm was being used
- subjects began the evaluation from different algorithms
- the order of results within a category is the one returned by Google
We measured the time spent searching for the relevant result and the position of the result.
Each user ran 20 queries:
- 12 from the Topic Distillation task of the TREC Web Track 2003
- 8 from the TREC Web Track 2004 (4 of them ambiguous)
- plus one extra query at the beginning to get familiarized with the system

Experimental Setup (3)
Classification:
- retrieved titles and snippets of the top 50 results from Google
- allowed each result to belong to at most three categories (the ones with the best similarity values)
- showed the user only the top 75 results after ranking the clusters, to put the emphasis on the performance of the ranking
- all results were cached to ensure that results from different participants were comparable
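
A sketch of the "at most three categories per result" rule, assuming similarity_by_category maps each candidate category to the classifier's similarity value for one result.

    # Keep the (at most) three categories with the highest similarity values.
    def top_categories(similarity_by_category, k=3):
        ranked = sorted(similarity_by_category.items(), key=lambda kv: kv[1], reverse=True)
        return [category for category, _ in ranked[:k]]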

Outline
- Introduction
- Basic Concepts
- Ranking Algorithms Considered
- Experimental Setup
- Results
- Conclusions

Experimental Results
Time to find the relevant result

Experimental Results
Time to find the relevant result:
- NLLR allowed users to find relevant results fastest, with an average of 31s
- the performance of the Alphabetical and Size-based rankings is rather average
- the Topic Distillation queries were the most difficult: they have a task associated with them
- the ambiguous Web Track queries were the easiest: no specific search task was associated, so the first relevant result was easier to find
- the experiment is statistically significant at a 99% confidence level

Experimental Results
Average of the position of the algorithm for each user

Experimental Results
Average Rank of the Result

Experimental Results
Average Rank of the Cluster

Experimental Results
The results are slightly better when using SVM

Conclusions &amp; Future Work
- Similarity between the user query and the documents seems to be the best approach for ordering search result clusters
- Alphabetical and Size rankings are not so good
- We want to test other algorithms:
  - click-through data
  - clustering algorithms which produce clusters that are more clearly separated from each other

Thanks for your attention! Q&amp;A