Clustering Top-Ranking Sentences for Information Access Anastasios Tombros, Joemon Jose, Ian Ruthven University of Glasgow & University of Strathclyde.


1 Clustering Top-Ranking Sentences for Information Access
Anastasios Tombros, Joemon Jose, Ian Ruthven
University of Glasgow & University of Strathclyde
Glasgow, Scotland

2 Some Background & Motivation
Challenge: How to provide effective access to information
Approach: Combine clustering & top-ranking sentences (TRS)
- clustering has been used extensively on the document level
- TRS are based on single document summaries
Overall aim of the work
- to create a personalised information space
- to use information from users’ interaction

3 Top-Ranking Sentences
Assume a user with a query:
- the query is sent to an IR system
- consider only the top retrieved documents, e.g. 30
- apply a query-biased sentence extraction model to each of these documents
- construct a sentence extract of max. 4 sentences per document
- the set of these sentences for the 30 documents is the set of TRS
- TRS can be ranked by their query-biased scores
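
The steps above can be sketched in Python as follows. This is only an illustration: the slides do not specify the query-biased extraction model, so the score used here (a count of query-term occurrences in the sentence) is an assumption, as are the naive sentence splitter, the decision to drop sentences with a zero score, and all function names.

```python
import re

def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def query_biased_score(sentence, query_terms):
    # Hypothetical score: number of tokens in the sentence that are query terms.
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return sum(1 for t in tokens if t in query_terms)

def top_ranking_sentences(documents, query, max_docs=30, max_per_doc=4):
    """Return (score, doc_index, sentence) tuples, best score first."""
    query_terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    trs = []
    for doc_id, text in enumerate(documents[:max_docs]):
        scored = [(query_biased_score(s, query_terms), s)
                  for s in split_sentences(text)]
        # Assumption: sentences with no query terms are not extracted.
        scored = [(score, s) for score, s in scored if score > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # Keep at most max_per_doc sentences from each document.
        for score, s in scored[:max_per_doc]:
            trs.append((score, doc_id, s))
    # Rank the pooled TRS by their query-biased scores.
    trs.sort(key=lambda item: item[0], reverse=True)
    return trs
```

Here `documents` stands for the texts of the retrieved documents in rank order; fetching them from an IR system is outside the scope of the sketch.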

4 Top-Ranking Sentences (cntd.)
TRS have been shown to be effective in interactive IR on the Web
- they provide effective access to the retrieved information
They can be seen as a level of abstraction of the set of retrieved documents
We introduce an extra layer of abstraction by clustering the set of TRS

5 Clustering Top-Ranking Sentences
An attempt to create a personalised information space
- sentences give local contexts in which query terms occur
- sentences discussing query terms in similar contexts should cluster together
- this structure should facilitate a more intuitive and effective access to information
Similarities and differences to document clustering
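
To make the idea of sentences in similar contexts clustering together concrete, here is a minimal Python sketch. The slides do not name the clustering method or the similarity measure, so everything below is an assumption chosen for brevity: bag-of-words vectors, cosine similarity, and a greedy single-pass grouping with a hypothetical `threshold` parameter.

```python
import math
import re
from collections import Counter

def bag_of_words(sentence):
    # Term-frequency vector over lowercase alphanumeric tokens.
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def cluster_sentences(sentences, threshold=0.3):
    """Greedy single-pass clustering; returns lists of sentence indices."""
    vectors = [bag_of_words(s) for s in sentences]
    clusters = []
    for i, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for cluster in clusters:
            # Similarity to a cluster = similarity to its closest member.
            sim = max(cosine(vec, vectors[j]) for j in cluster)
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([i])   # nothing similar enough: start a new cluster
        else:
            best.append(i)
    return clusters
```

For example, sentences that use "jaguar" alongside "cars" and sentences that use it alongside "cat" end up in separate groups, which is the kind of context separation the slide describes.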

6 Comparing TRS and Document Clustering
We used 4 searchers with a total of 16 queries
- each searcher assessed the utility of the top 30 documents on a scale of 1-10
For each query:
- we downloaded the top-30 retrieved documents
- we extracted the set of TRS
- we clustered the 30 documents and the set of TRS
- we assigned scores to document & TRS clusters
  - sum of the document (sentence) scores divided by the number of documents (sentences) in the cluster
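
The cluster score described above is a plain mean of member scores, which can be written as a short Python sketch. The function names are hypothetical. The slides do not say how a sentence obtains its score or how the overall figure is aggregated, so treating the overall average as the mean of the per-cluster scores is an assumption.

```python
def cluster_score(member_scores):
    # Sum of member scores divided by the number of members in the cluster.
    if not member_scores:
        raise ValueError("a cluster must contain at least one member")
    return sum(member_scores) / len(member_scores)

def best_and_overall(clusters):
    """clusters: list of lists of member scores (documents or sentences)."""
    scores = [cluster_score(c) for c in clusters]
    best = max(scores)                    # score of the best cluster
    overall = sum(scores) / len(scores)   # assumption: mean over clusters
    return best, overall
```

With two clusters whose members scored `[8, 6, 7]` and `[2, 4]`, the cluster scores are 7.0 and 3.0.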

7 Some Results
Scores of TRS clusters were significantly higher than those of document clusters (figures below are document clusters vs. TRS clusters)
- best cluster averages: 4.78 vs. 5.82
- overall averages: 3.2 vs. 3.73
Average precision and recall were higher for TRS clusters
- define P & R based on documents with scores ≥ 7
- average P: 0.38 vs. 0.49
- average R: 0.73 vs. 0.77
Cluster sizes were comparable
- 5 docs per cluster vs. 5.3 sentences per cluster
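
The precision and recall definition above can be illustrated with a small Python sketch. Only the relevance threshold (score ≥ 7) comes from the slides; computing P and R for a single cluster against the full set of scored items is an assumption, and the function name is hypothetical.

```python
def precision_recall(cluster_scores, all_scores, threshold=7):
    """P and R of one cluster, treating items scored >= threshold as relevant."""
    relevant_in_cluster = sum(1 for s in cluster_scores if s >= threshold)
    total_relevant = sum(1 for s in all_scores if s >= threshold)
    # Precision: share of the cluster that is relevant.
    precision = relevant_in_cluster / len(cluster_scores) if cluster_scores else 0.0
    # Recall: share of all relevant items that the cluster contains.
    recall = relevant_in_cluster / total_relevant if total_relevant else 0.0
    return precision, recall
```

For a cluster with scores `[9, 7, 3, 5]` drawn from a result set scored `[9, 7, 3, 5, 8, 2, 10, 1]`, two of the four cluster members are relevant and the result set holds four relevant items in total, so both precision and recall are 0.5.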

8 Conclusions & Future Plans
TRS clusters have the potential to offer more effective information access
- only one aspect of their expected utility
Integrate TRS clustering in interactive web searching
- investigate its utility in user-based studies on the live Internet
We have extended the reported work
- more searchers & queries, different clustering methods
- inter-sentence similarities, structure of information space

