
1 Retrieval Model and Evaluation Jinyoung Kim UMass Amherst CS646 Lecture 1

2 Outline
- Personal Search Overview
- Retrieval Models for Personal Search
- Evaluation Methods for Personal Search
- Associative Browsing Model for Personal Info.
- Experimental Results (optional)

3

4 Personal Search
What
- Searching over the user's own personal information
- Desktop search is the most common form
Why
- Personal information has grown over the years, in both amount and heterogeneity
- Search can help users access their information
Q: Is it the only option? How about browsing?

5 Typical Scenarios
- I'm looking for an email about my last flight
- I want to retrieve everything I've read about the Apple iPad
- I need to find a slide I wrote for the IR seminar
Q: Anything else?

6 Personal Search Example
Query: James Registration

7 Personal Search Example
- User-defined ranking for type-specific results
- Can't we do better than this?

8 Characteristics & Related Problems
- People mostly do 're-finding' -> Known-item search
- Many document types -> Federated search (distributed IR)
- Unique metadata for each type -> Semi-structured document retrieval

9 Research Issues
- How can we exploit document structure (e.g. metadata) for retrieval?
- How can we evaluate personal search algorithms while overcoming privacy concerns?
- What other methods exist for personal information access? e.g. the associative browsing model

10

11 Design Considerations
- Each type has different characteristics: how can we exploit type-specific features? e.g. email has a thread structure
- Knowing which document type the user is looking for would be useful: how can we make this prediction?
- Users want to see a combined result: how would you present it?

12 Retrieval-Merge Strategy
1. Type-specific ranking: use the most suitable algorithm for each type
2. Type prediction: predict which document type the user is looking for
3. Combine into the final result: rank-list merging

13 Type-specific Ranking
- Document-based retrieval model: score each document as a whole
- Field-based retrieval model: combine evidence from each field
[Diagram: document-based scoring matches query terms q1...qm against the document as a whole; field-based scoring matches them against fields f1...fn and combines the field scores with weights w1...wn]

14 Type-specific Ranking
- Document-based method: Document Query-Likelihood (DQL)
- Field-based method: Mixture of Field Language Models (MFLM)
  - The field weight w_j is trained to maximize retrieval performance, e.g. a weight of 1 for one field and 0.5 for another (formulas sketched below)
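The scoring formulas appeared only as images on the original slide; the following is a standard reconstruction of DQL and MFLM in the query-likelihood framework (the smoothing used in the lecture is not specified here).

```latex
% Document Query-Likelihood (DQL): one language model for the whole document
\[ \mathrm{DQL}(Q, D) \;=\; \prod_{i=1}^{m} P(q_i \mid \theta_D) \]

% Mixture of Field Language Models (MFLM): per-field language models mixed
% with field weights w_j learned to maximize retrieval performance
\[ \mathrm{MFLM}(Q, D) \;=\; \prod_{i=1}^{m} \sum_{j=1}^{n} w_j \, P(q_i \mid \theta_{f_j}) \]
```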

15 Type-specific Ranking Example
Query: james registration
Document fields and term distribution P(q|f):

                 james.s  regist.s  james.c  regist.c  james.t  regist.t
  D1 (relevant)    0/10     1/10     1/100     5/100     1/2      0/2
  D2 (non-rel.)    0/10     0/10     5/100    20/100     0/2      0/2

DQL vs. MFLM:
  DQL_1:  (1+1)/112 * (5+1)/112      DQL_2:  5/112 * 20/112      DQL_1 (0.105) < DQL_2 (0.877)
  MFLM_1: (1/100+1/2) * (1/10+5/100) MFLM_2: 5/100 * 20/100      MFLM_1 (0.077) > MFLM_2 (0.01)
(A short script reproducing these numbers follows.)
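A quick sketch, not from the original slides, that reproduces this example with unsmoothed maximum-likelihood estimates. The absolute DQL values on the slide (0.105 and 0.877) appear to be rescaled or smoothed, but the orderings match: DQL prefers the non-relevant D2, while MFLM prefers the relevant D1. The meaning of the field labels s/c/t is not given on the slide; the counts are read off the table above.

```python
# Field lengths: s = 10, c = 100, t = 2 (total document length 112).
FIELD_LEN = {"s": 10, "c": 100, "t": 2}

# COUNTS[doc][term][field] = term frequency in that field, from the table above.
COUNTS = {
    "D1": {"james": {"s": 0, "c": 1, "t": 1}, "regist": {"s": 1, "c": 5, "t": 0}},
    "D2": {"james": {"s": 0, "c": 5, "t": 0}, "regist": {"s": 0, "c": 20, "t": 0}},
}
QUERY = ["james", "regist"]

def dql(doc):
    """Document query-likelihood: the document is a single bag of words."""
    doc_len = sum(FIELD_LEN.values())
    score = 1.0
    for term in QUERY:
        score *= sum(COUNTS[doc][term].values()) / doc_len
    return score

def mflm(doc, weights=None):
    """Mixture of field language models; uniform field weights by default."""
    weights = weights or {f: 1.0 for f in FIELD_LEN}
    score = 1.0
    for term in QUERY:
        score *= sum(weights[f] * COUNTS[doc][term][f] / FIELD_LEN[f] for f in FIELD_LEN)
    return score

print(f"DQL:  D1 = {dql('D1'):.4f}  D2 = {dql('D2'):.4f}")    # D1 < D2: non-relevant doc wins
print(f"MFLM: D1 = {mflm('D1'):.4f}  D2 = {mflm('D2'):.4f}")  # D1 > D2: relevant doc wins
```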

16 Type-specific Ranking
Probabilistic Retrieval Model for Semi-structured data (PRM-S) [KXC09]
Basic idea: use the probabilistic mapping between query words and document fields for weighting
[Diagram: each query term q_i is matched against fields f1...fn, weighted by the mapping probabilities P(F_1|q_i), ..., P(F_n|q_i)]

17 Type-specific Ranking
PRM-S model [KXC09]
- Estimate the implicit mapping of each query word to document fields
- Combine field-level evidence based on the mapping probabilities (formulas below)
(F_j: field of the collection, f_j: field of each document)
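The PRM-S equations were also images on the slide; the following is a reconstruction based on the description in [KXC09]. The mapping probability of a query word to a collection field is estimated from collection statistics, and the final score mixes per-field query-likelihoods with these per-word weights (compare with MFLM, whose weights w_j do not depend on the query word).

```latex
% Mapping probability: how strongly query word w is associated with
% collection field F_j, estimated from collection statistics
\[ P(F_j \mid w) \;=\; \frac{P(w \mid F_j)\, P(F_j)}{\sum_{k=1}^{n} P(w \mid F_k)\, P(F_k)} \]

% PRM-S score: like MFLM, but the fixed field weights w_j are replaced by
% per-query-word mapping probabilities
\[ \mathrm{PRMS}(Q, D) \;=\; \prod_{i=1}^{m} \sum_{j=1}^{n} P(F_j \mid q_i)\, P(q_i \mid \theta_{f_j}) \]
```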

18 Type-specific Ranking
MFLM vs. PRM-S
[Diagram: MFLM combines field scores with fixed weights w_1...w_n; PRM-S combines them with query-specific mapping probabilities P(F_j|q_i)]

19 Type-specific Ranking
Why does PRM-S work?
- A relevant document has query terms in many different fields
- PRM-S boosts P_QL(q|f) when a query term is found in the 'correct' field(s)

20 Type-specific Ranking
PRM-S model [KXC09]: performance in the TREC email search task
- W3C mailing list collection
- 150 known-item queries
[Results chart: Mean Reciprocal Rank]
Q: Will it work for other document types? e.g. webpages and office documents

21 Predicting Document Type
A look at federated search (aka distributed IR)
- There are many information silos (resources)
- Users want to search over all of them
Three major problems:
- Resource representation
- Resource selection
- Result merging

22 Predicting Document Type
Query-Likelihood of Collection (CQL) [Si02]
- Get a query-likelihood score from each collection's language model
- Treat each collection as one big bag of words
- Best performance in a recent evaluation [Thomas09]
Q: Can we exploit the field structure here?

23 Predicting Document Type
Field-based Collection Query-Likelihood (FQL) [KC10]
- Calculate a QL score for each field of a collection
- Combine the field-level scores into a collection score (formulas sketched below)
Why does it work?
- Terms from shorter fields are better represented, e.g. 'James' and 'registration' each match a short field rather than the long body
- Recall why MFLM worked better than DQL
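The CQL and FQL formulas did not survive in the transcript; the following sketch is consistent with the descriptions above, where theta_C is the language model of collection C and theta_{C_j} the language model of its j-th field. The field weights lambda_j are an assumed parameterization, not necessarily the one used in [KC10].

```latex
% CQL: treat collection C as one big bag of words
\[ \mathrm{CQL}(Q, C) \;=\; \prod_{i=1}^{m} P(q_i \mid \theta_C) \]

% FQL: score each collection field separately, then combine the field-level
% scores into a collection score (lambda_j are assumed field weights)
\[ \mathrm{FQL}(Q, C) \;=\; \prod_{i=1}^{m} \sum_{j=1}^{n} \lambda_j \, P(q_i \mid \theta_{C_j}) \]
```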

24 Merging into the Final Rank List
What we have for each collection:
- A type-specific ranking
- A type score
CORI algorithm for merging [Callan95]
- Use normalized collection and document scores (formula below)
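The merging formula itself was shown as an image; the standard CORI result-merging heuristic from [Callan95] is as follows, where C' and D' are the min-max normalized collection and document scores. Whether the lecture used exactly these constants is not confirmed by the transcript.

```latex
% Min-max normalized collection score C' and document score D'
\[ C' \;=\; \frac{C - C_{\min}}{C_{\max} - C_{\min}}, \qquad
   D' \;=\; \frac{D - D_{\min}}{D_{\max} - D_{\min}} \]

% CORI merged document score
\[ D'' \;=\; \frac{D' \;+\; 0.4 \cdot D' \cdot C'}{1.4} \]
```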

25

26 Challenges in Personal Search Evaluation
- Hard to create a 'test collection': each user has different documents and habits
- Privacy concerns: people will not donate their documents and queries for research
Q: Can't we just do a user study?

27 Problems with User Studies
- It's costly: a 'working' system has to be implemented and participants have to use it for a long time, a big barrier for academic researchers
- The data is not reusable by third parties, so the findings cannot be repeated by others
Q: How can we perform a cheap and repeatable evaluation?

28 Pseudo-Desktop Method [KC09]
- Collect documents of reasonable size and variety
- Generate queries automatically: randomly select a target document and take terms from it (see the sketch below)
- Validate the generated queries against manual queries, collected by showing each document and asking: 'What is the query you might use to find this one?'
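A minimal sketch of the kind of query generator the pseudo-desktop method describes. The details (how many terms to sample, how they are weighted, the stopword handling) are choices of this sketch, not necessarily those of [KC09].

```python
import random
import re
from collections import Counter

def generate_known_item_query(documents, query_len=2, seed=None):
    """Pick a random target document and sample query terms from it.

    documents: list of (doc_id, text) pairs.
    Returns (target_doc_id, query_terms).
    """
    rng = random.Random(seed)
    doc_id, text = rng.choice(documents)

    # Tokenize and drop very short tokens; a real generator would also remove
    # stopwords and could bias sampling toward more discriminative terms.
    tokens = [t.lower() for t in re.findall(r"[A-Za-z]+", text) if len(t) > 2]
    tf = Counter(tokens)

    # Sample terms in proportion to their frequency in the target document.
    terms = rng.choices(list(tf.keys()), weights=list(tf.values()), k=query_len)
    return doc_id, terms

# Example usage with toy, made-up documents:
docs = [("email_1", "registration form for James, CS646 seminar"),
        ("slide_7", "retrieval models and evaluation for personal search")]
print(generate_known_item_query(docs, query_len=2, seed=0))
```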

29 DocTrack Game [KC10]
Basic idea:
1. The user is shown a target document
2. The user is asked to find that document
3. A score is given based on the user's search result

30 DocTrack Game [KC10]
Benefits
- Participants are motivated to contribute data
- The resulting queries and logs are reusable
- Free from privacy concerns
- Much cheaper than a traditional user study
Limitations
- Artificial data and task

31

32 Experimental Setting
Pseudo-desktop collections
- Crawl of the W3C mailing list and documents
- Automatically generated queries: 100 queries, average length 2
CS collection
- UMass CS department webpages, emails, etc.
- Human-formulated queries from the DocTrack game: 984 queries, average length 3.97
Other details
- Mean Reciprocal Rank was used for evaluation

33 Collection Statistics
- Pseudo-desktop collections
- CS collection
[Tables: #Docs (Length) for each document type]

34 Type Prediction Performance
- Pseudo-desktop collections
- CS collection
- FQL improves performance over CQL
- Combining features improves performance further
[Charts: % of queries with the correct type predicted]

35 Retrieval Performance (Mean Reciprocal Rank)
- Pseudo-desktop collections
- CS collection
- Best: use the best type-specific retrieval method
- Oracle: predict the correct type perfectly
[Charts: MRR for each collection and method]

36

37 Motivation
- Keyword search doesn't always work: sometimes you don't have a 'good' keyword
- Browsing can help here, yet the hierarchical folder structure is restrictive and you can't tag 'all' your documents
- Associative browsing as a solution: our mind seems to work by association, so let's use a similar model for personal information!

38 Data Model

39 Building the Model
- Concepts are extracted from metadata, e.g. the senders and receivers of emails
- Concept occurrences are found in documents: this gives the links between concepts and documents
- We still need the links between concepts and between documents. There are many ways of doing that; let's build a feature-based model whose weights are adjusted by the user's click feedback (a sketch follows below)
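The slides do not show the actual model, so the following is only an illustrative sketch of one way to realize a feature-based link model with click-adjusted weights. The feature set, the linear scoring form, and the pairwise-preference update are all assumptions of this sketch, not the method described in [KBSC10].

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class AssociativeLinker:
    """Score links between items as a weighted sum of similarity features."""
    weights: Dict[str, float] = field(
        default_factory=lambda: {"term_sim": 1.0, "temporal_sim": 1.0, "co_click": 1.0}
    )
    learning_rate: float = 0.1

    def score(self, features: Dict[str, float]) -> float:
        """Linear combination of per-pair features, e.g. term overlap or time proximity."""
        return sum(self.weights[name] * value for name, value in features.items())

    def rank(self, candidates: List[Tuple[str, Dict[str, float]]]):
        """candidates: list of (item_id, features); returns items sorted by link score."""
        return sorted(candidates, key=lambda c: self.score(c[1]), reverse=True)

    def update_from_click(self, clicked: Dict[str, float],
                          skipped_above: List[Dict[str, float]]) -> None:
        """Promote features of the clicked item and demote items ranked above it
        but skipped (a simple pairwise-preference update, assumed for illustration)."""
        for name, value in clicked.items():
            self.weights[name] += self.learning_rate * value
        for feats in skipped_above:
            for name, value in feats.items():
                self.weights[name] -= self.learning_rate * value

# Example usage with made-up feature values:
linker = AssociativeLinker()
candidates = [("doc_a", {"term_sim": 0.2, "temporal_sim": 0.9, "co_click": 0.0}),
              ("doc_b", {"term_sim": 0.7, "temporal_sim": 0.1, "co_click": 0.0})]
print(linker.rank(candidates))
linker.update_from_click(clicked=candidates[1][1], skipped_above=[candidates[0][1]])
```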

40

41 Summary
Retrieval model
- The retrieval-merge strategy works for personal search
- Exploiting field structure is helpful both for retrieval and for type prediction
Evaluation method
- Evaluation itself is a challenge for personal search
- Reasonable evaluation can be done by simulation or a game-based user study
Associative browsing model
- Search can be combined with other interaction models to enable better information access

42 More Lessons
- Resembling the user's mental process is key to the design of a retrieval model, e.g. the 'mapping' assumption of the PRM-S model
- Language models are useful for many tasks, e.g. document LM / field LM / collection LM / ...
- Each domain requires a specialized retrieval model and evaluation method
- Search is never a solved problem!

43 Major References
[KXC09] Jinyoung Kim, Xiaobing Xue and W. Bruce Croft. A Probabilistic Retrieval Model for Semi-structured Data. ECIR 2009.
[KC09] Jinyoung Kim and W. Bruce Croft. Retrieval Experiments using Pseudo-Desktop Collections. CIKM 2009.
[KC10] Jinyoung Kim and W. Bruce Croft. Ranking using Multiple Document Types in Desktop Search. SIGIR 2010.
[KBSC10] Jinyoung Kim, Anton Bakalov, David A. Smith and W. Bruce Croft. Building a Semantic Representation for Personal Information. CIKM 2010.

44 Further References
- My webpage: http://www.cs.umass.edu/~jykim
- Chapters in [CMS]: Retrieval Models (Ch. 7), Evaluation (Ch. 8)
- Chapters in [MRS]: XML Retrieval (Ch. 10), Language Models (Ch. 12)

