Clustering Top-Ranking Sentences for Information Access. Anastasios Tombros, Joemon Jose, Ian Ruthven. University of Glasgow & University of Strathclyde.


Slide 1: Clustering Top-Ranking Sentences for Information Access
Anastasios Tombros, Joemon Jose, Ian Ruthven
University of Glasgow & University of Strathclyde, Glasgow, Scotland

Slide 2: Some Background & Motivation
Challenge: how to provide effective access to information.
Approach: combine clustering and top-ranking sentences (TRS).
- Clustering has been used extensively at the document level.
- TRS are based on single-document summaries.
Overall aim of the work:
- to create a personalised information space
- to use information from users' interaction

Slide 3: Top-Ranking Sentences
Assume a user with a query:
- the query is sent to an IR system
- only the top retrieved documents are considered, e.g. the top 30
- a query-biased sentence extraction model is applied to each of these documents
- a sentence extract of at most 4 sentences is constructed per document
- the set of these sentences over the 30 documents is the set of TRS
- TRS can be ranked by their query-biased scores
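The pipeline above can be sketched in code. This is a minimal illustration only: the scoring function (query-term overlap) and the naive sentence splitter are simplifying assumptions for the example, not the query-biased extraction model actually used in the work.

```python
import re

def top_ranking_sentences(docs, query, n_docs=30, max_per_doc=4):
    """Build the TRS set: score each sentence of the top-ranked
    documents by overlap with the query, keep at most `max_per_doc`
    sentences per document, and rank the pooled set by score.
    `docs` is a list of document strings, assumed already ranked."""
    q_terms = set(query.lower().split())
    trs = []
    for doc_id, text in enumerate(docs[:n_docs]):
        # Naive sentence splitting; a real system would use a proper tokenizer.
        sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
        scored = []
        for sent in sentences:
            overlap = sum(1 for t in sent.lower().split()
                          if t.strip('.,;:!?') in q_terms)
            if overlap:
                scored.append((overlap, sent, doc_id))
        scored.sort(reverse=True)         # best sentences of this document first
        trs.extend(scored[:max_per_doc])  # query-biased extract, max 4 sentences
    trs.sort(reverse=True)                # rank the pooled TRS by query-biased score
    return [(sent, doc_id, score) for score, sent, doc_id in trs]
```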

Slide 4: Top-Ranking Sentences (contd.)
TRS have been shown to be effective in interactive IR on the Web:
- they provide effective access to the retrieved information.
They can be seen as a level of abstraction over the set of retrieved documents. We introduce an extra layer of abstraction by clustering the set of TRS.

Slide 5: Clustering Top-Ranking Sentences
An attempt to create a personalised information space:
- sentences give the local contexts in which query terms occur
- sentences discussing query terms in similar contexts should cluster together
- this structure should facilitate more intuitive and effective access to information
Similarities and differences to document clustering.
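The slides do not name the clustering method used, so as an illustration only, here is one plausible sketch: single-pass clustering of sentences by cosine similarity over term-count vectors. The 0.3 threshold and the choice of raw term counts are assumptions made for this example.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-count Counters."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster_sentences(sentences, threshold=0.3):
    """Single-pass clustering: each sentence joins the most similar
    existing cluster if similarity >= threshold, else starts a new one."""
    clusters = []  # list of (centroid Counter, [member sentences])
    for sent in sentences:
        vec = Counter(sent.lower().split())
        best, best_sim = None, 0.0
        for centroid, members in clusters:
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = (centroid, members), sim
        if best is not None and best_sim >= threshold:
            best[0].update(vec)   # fold the sentence into the centroid
            best[1].append(sent)
        else:
            clusters.append((vec, [sent]))
    return [members for _, members in clusters]
```

Sentences sharing query-term contexts (overlapping vocabulary) end up in the same group, which mirrors the intuition on the slide.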

Slide 6: Comparing TRS and Document Clustering
We used 4 searchers with a total of 16 queries:
- each searcher assessed the utility of the top 30 documents on a scale of 1-10.
For each query:
- we downloaded the top 30 retrieved documents
- we extracted the set of TRS
- we clustered the 30 documents and the set of TRS
- we assigned scores to document and TRS clusters: the sum of the document (or sentence) scores divided by the number of documents (or sentences) in the cluster
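The cluster scoring rule in the last bullet is simply the mean of the members' utility scores:

```python
def cluster_score(member_scores):
    """Score of a cluster, as defined on the slide: the sum of the
    members' utility scores divided by the number of members."""
    return sum(member_scores) / len(member_scores)
```

For example, a document cluster whose three members were rated 8, 6 and 4 by a searcher gets a score of 6.0; the same rule applies to sentence clusters.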

Slide 7: Some Results
Scores of TRS clusters were significantly higher than those of document clusters:
- best cluster averages: 4.78 vs.
- overall averages: 3.2 vs.
Average precision and recall were higher for TRS clusters:
- P & R are defined based on documents with scores ≥ 7
- average P: 0.38 vs.
- average R: 0.73 vs.
Cluster sizes were comparable:
- 5 docs per cluster vs. 5.3 sentences per cluster
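The slide defines relevance as a utility score of at least 7 but does not spell out how P and R are computed per cluster. One plausible reading (an assumption here, not necessarily the authors' exact formulation) is precision = fraction of a cluster's members that are relevant, and recall = fraction of all relevant items captured by the cluster:

```python
RELEVANT_THRESHOLD = 7  # slide: relevant = utility score >= 7

def precision_recall(cluster_scores, all_scores):
    """Precision: fraction of the cluster's members that are relevant.
    Recall: fraction of all relevant items that fall in the cluster.
    Relevance follows the slide's definition (score >= 7).
    This P/R interpretation is an assumption for illustration."""
    rel_in_cluster = sum(1 for s in cluster_scores if s >= RELEVANT_THRESHOLD)
    rel_total = sum(1 for s in all_scores if s >= RELEVANT_THRESHOLD)
    precision = rel_in_cluster / len(cluster_scores) if cluster_scores else 0.0
    recall = rel_in_cluster / rel_total if rel_total else 0.0
    return precision, recall
```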

Slide 8: Conclusions & Future Plans
TRS clusters have the potential to offer more effective information access:
- this is only one aspect of their expected utility.
Integrate TRS clustering into interactive web searching:
- investigate its utility in user-based studies on the live Internet.
We have extended the reported work:
- more searchers & queries, different clustering methods
- inter-sentence similarities, structure of the information space