Ranking using Multiple Document Types in Desktop Search

1 Ranking using Multiple Document Types in Desktop Search
Jinyoung Kim, W. Bruce Croft SIGIR’10 Speaker: Hsin-Lan, Wang Date: 2011/05/02

2 Outline
- Introduction
- Retrieval Model
  - Type-specific Retrieval
  - Type Prediction
  - Result Merging
- Experiment
- Conclusion

3 Introduction
People have many types of documents on their desktop, each type with a different set of metadata fields: e.g., email → sender, receiver; office document → filename, author.
A desktop search system should be able to predict which type of document the user is looking for given a query.

4 Introduction Main goal: to show how the retrieval effectiveness of a desktop search system can be enhanced by improving type prediction performance.

5 Retrieval Model
- query Q = (q1, …, qm)
- each collection C contains documents of n field types (F1, …, Fn)
- each document d may include fields (f1, …, fn)
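
As a concrete picture of this notation, a minimal Python sketch follows; the collection names, field names, and example values are illustrative assumptions, not taken from the paper.

```python
# Q = (q1, ..., qm): a query is a sequence of terms.
query = ["budget", "meeting", "alice"]

# Each sub-collection C holds documents whose fields follow that collection's
# field types (F1, ..., Fn); a document d carries a subset of those fields
# (f1, ..., fn). Names and values here are made up for illustration.
collections = {
    "email": [
        {"sender": "alice", "subject": "budget meeting", "body": "see attached agenda"},
    ],
    "office": [
        {"filename": "budget.xls", "author": "bob", "content": "q2 budget figures"},
    ],
}
```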

6 Type-specific Retrieval
Goal: to rank documents from each sub-collection.
Probabilistic Retrieval Model for Semi-structured Data:
P_QL(q_i | f_j) = (1 - λ) · P(q_i | f_j) + λ · P(q_i | F_j)
(the field-level term likelihood is smoothed against the collection-wide model for that field type)
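
A minimal sketch of this smoothed field-level query likelihood; the λ value, the lowercase/whitespace tokenization, and the max-over-fields document score are simplifying assumptions (the paper's model maps query terms to fields probabilistically rather than taking the max).

```python
from collections import Counter

def smoothed_field_likelihood(term, field_text, field_type_text, lam=0.5):
    """P_QL(q_i | f_j) = (1 - lam) * P(q_i | f_j) + lam * P(q_i | F_j).
    field_text: text of field f_j in one document.
    field_type_text: concatenated text of field type F_j over the collection.
    lam: smoothing weight (0.5 is an illustrative choice)."""
    term = term.lower()
    doc_tf = Counter(field_text.lower().split())
    col_tf = Counter(field_type_text.lower().split())
    p_doc = doc_tf[term] / max(sum(doc_tf.values()), 1)
    p_col = col_tf[term] / max(sum(col_tf.values()), 1)
    return (1 - lam) * p_doc + lam * p_col

def score_document(query_terms, doc_fields, field_type_texts, lam=0.5):
    """Rank a document within its sub-collection: for each query term take the
    best field-level likelihood and multiply over terms (a simplification; the
    paper's model weights fields by a term-to-field mapping probability)."""
    score = 1.0
    for q in query_terms:
        score *= max(
            smoothed_field_likelihood(q, doc_fields[f], field_type_texts[f], lam)
            for f in doc_fields
        )
    return score
```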

7 Type Prediction
Goal: to score each collection given a user query.
Methods for Type Prediction:
- Query-likelihood of Collection
- Query-likelihood of Query Log
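
A sketch of collection query likelihood, assuming each sub-collection is scored as one long pseudo-document with Dirichlet smoothing against a background model; the smoothing method and μ value are assumptions, not from the paper.

```python
import math
from collections import Counter

def collection_query_likelihood(query_terms, collection_text, background_text, mu=2500):
    """Score a sub-collection by the query likelihood of its concatenated text.
    Dirichlet smoothing and mu=2500 are illustrative choices."""
    c_tf = Counter(collection_text.lower().split())
    c_len = sum(c_tf.values())
    b_tf = Counter(background_text.lower().split())
    b_len = sum(b_tf.values())
    log_p = 0.0
    for q in query_terms:
        q = q.lower()
        p_bg = (b_tf[q] + 1) / (b_len + len(b_tf) + 1)   # add-one smoothed background
        log_p += math.log((c_tf[q] + mu * p_bg) / (c_len + mu))
    return log_p

# The query-log variant scores the concatenated text of queries previously
# associated with each collection instead of the collection's documents.
```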

8 Type Prediction
Methods for Type Prediction (continued):
- Geometric Average
- ReDDE
- Query Clarity
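
A sketch of query clarity for one sub-collection: the KL divergence between a language model estimated from the top-retrieved documents and the collection language model. Weighting the top documents uniformly and add-one smoothing the collection model are simplifications; the usual formulation weights documents by their query likelihood.

```python
import math
from collections import Counter

def query_clarity(top_doc_texts, collection_text):
    """Higher clarity means the query is more focused with respect to this
    sub-collection. top_doc_texts: texts of the top-retrieved documents."""
    rel_model = Counter()
    for text in top_doc_texts:
        tf = Counter(text.lower().split())
        total = sum(tf.values())
        for word, count in tf.items():
            rel_model[word] += count / total / len(top_doc_texts)
    col_tf = Counter(collection_text.lower().split())
    col_len = sum(col_tf.values())
    vocab = len(col_tf) + 1
    clarity = 0.0
    for word, p_rel in rel_model.items():
        p_col = (col_tf[word] + 1) / (col_len + vocab)   # add-one smoothing
        clarity += p_rel * math.log(p_rel / p_col)
    return clarity
```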

9 Type Prediction
Methods for Type Prediction (continued):
- Dictionary-based Matching: build a dictionary for each sub-collection using the names of the collection and its metadata fields.
- Using Document Metadata Fields: a new method, field-based collection query likelihood (FQL).
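
A sketch of dictionary-based matching; the dictionary entries and the match-fraction scoring are illustrative assumptions (the paper builds each dictionary from the collection's name and metadata fields).

```python
def dictionary_match_score(query_terms, type_dictionary):
    """Score a sub-collection by the fraction of query terms found in its
    dictionary of collection-related terms."""
    hits = sum(1 for q in query_terms if q.lower() in type_dictionary)
    return hits / max(len(query_terms), 1)

# Illustrative dictionaries -- the entries are made up, not from the paper.
dictionaries = {
    "email":  {"email", "mail", "sender", "receiver", "alice", "bob"},
    "office": {"document", "report", "author", "filename", "budget"},
}
query = ["email", "from", "alice"]
scores = {t: dictionary_match_score(query, d) for t, d in dictionaries.items()}
# scores -> {'email': 0.666..., 'office': 0.0}
```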

10 Type Prediction
Combining Type Prediction Methods:
- Grid-search of Parameter Values
- Golden Section Search
- Multi-class Classification: MultiSVM (Liblinear Toolkit)
- Rank-learning Method: RankSVM
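
A sketch of the grid-search option: exhaustively trying mixture weights for a linear combination of type-prediction scores and keeping the weights with the best prediction accuracy on training queries. The step size, the simplex constraint, and the data layout are assumptions; golden-section search and the SVM-based learners would replace the search loop, not the combination itself.

```python
import itertools

def grid_search_weights(training_data, step=0.1):
    """training_data: list of (method_scores, relevant_type) pairs, where
    method_scores = {method: {collection: score}} for one query and
    relevant_type is the collection holding the target document."""
    methods = sorted(training_data[0][0])
    grid = [i * step for i in range(int(round(1 / step)) + 1)]
    best_weights, best_acc = None, -1.0
    for weights in itertools.product(grid, repeat=len(methods)):
        if abs(sum(weights) - 1.0) > 1e-6:
            continue                                   # stay on the simplex
        correct = 0
        for method_scores, relevant_type in training_data:
            combined = {
                coll: sum(w * method_scores[m][coll] for w, m in zip(weights, methods))
                for coll in method_scores[methods[0]]
            }
            correct += max(combined, key=combined.get) == relevant_type
        acc = correct / len(training_data)
        if acc > best_acc:
            best_acc, best_weights = acc, dict(zip(methods, weights))
    return best_weights
```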

11 Result Merging
C: collection score (from type prediction)
D: document score (from type-specific retrieval)
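
A sketch of the merging step, assuming the collection score C simply weights each document score D multiplicatively before the per-type rankings are interleaved; the exact combination function and the cutoff k are assumptions.

```python
def merge_results(collection_scores, per_type_rankings, k=10):
    """Merge type-specific rankings into one list, weighting each document
    score D by its collection score C. per_type_rankings maps a collection
    name to a list of (doc_id, score) pairs."""
    merged = []
    for coll, ranking in per_type_rankings.items():
        c = collection_scores[coll]
        for doc_id, d in ranking:
            merged.append((c * d, doc_id, coll))
    merged.sort(key=lambda x: x[0], reverse=True)
    return merged[:k]
```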

12 Experiment: Pseudo-desktop Collection
Generation method:
- collect documents with similar characteristics
- generate queries by statistically sampling terms from each of the target documents
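
A sketch of the query-generation step: sampling terms from the target document, here biased toward frequent terms. The short-word filter, frequency-proportional sampling, and query length are assumptions about how the statistical term sampling might look.

```python
import random
from collections import Counter

def generate_pseudo_query(target_doc_text, num_terms=3, seed=None):
    """Generate a known-item query by sampling terms from the target document,
    with probability proportional to term frequency (with replacement)."""
    rng = random.Random(seed)
    tf = Counter(w for w in target_doc_text.lower().split() if len(w) > 2)
    terms, weights = zip(*tf.items())
    return rng.choices(terms, weights=weights, k=num_terms)

# Example: generate_pseudo_query("quarterly budget report for the sales team", seed=0)
```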

13 Experiment: Pseudo-desktop Collection
- Prediction Accuracy
- Retrieval Performance: Mean Reciprocal Rank (MRR)
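
Mean Reciprocal Rank for known-item search, where each query has exactly one relevant document, can be computed as in the sketch below (function and variable names are illustrative).

```python
def mean_reciprocal_rank(result_lists, relevant_docs):
    """MRR over known-item queries: the reciprocal rank of the single relevant
    document in each merged result list, or 0 if it was not retrieved."""
    reciprocal_ranks = []
    for results, relevant in zip(result_lists, relevant_docs):
        rank = next((i + 1 for i, doc_id in enumerate(results) if doc_id == relevant), None)
        reciprocal_ranks.append(1.0 / rank if rank else 0.0)
    return sum(reciprocal_ranks) / len(reciprocal_ranks)
```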

14 Experiment
- Best: use the retrieval method with the best aggregate performance for each sub-collection.
- Uniform: each collection has the same chance of containing the relevant document.
- Oracle: perfect knowledge of the collection that contains the relevant document.
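
The Uniform and Oracle conditions correspond to fixed collection scores that could be plugged into the merging sketch above in place of predicted scores; a minimal illustration (names are assumptions):

```python
def uniform_collection_scores(collections):
    """Uniform: every collection is equally likely to contain the target."""
    return {c: 1.0 / len(collections) for c in collections}

def oracle_collection_scores(collections, true_collection):
    """Oracle: all weight on the collection that holds the relevant document."""
    return {c: float(c == true_collection) for c in collections}
```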

15 Experiment: CS Collection
Generation method: DocTrack game
Prediction Accuracy

16 Experiment: CS Collection
- Retrieval Performance
- Leave-one-out Prediction Accuracy

17 Conclusion
- Suggest a retrieval model for desktop search in which type-specific retrieval results are merged into the final ranked list based on type prediction scores.
- Introduce FQL, a new type prediction method.

18 Conclusion
- Develop a human computation game (DocTrack) for collecting queries in a more realistic setting.
- Show that the combination method can improve type prediction performance.

