
1 Seesaw Personalized Web Search Jaime Teevan, MIT with Susan T. Dumais and Eric Horvitz, MSR

3 Personalization Algorithms [diagram labels: Standard IR, Query expansion, Document, Query, User, Server, Client]

4 Personalization Algorithms [same diagram, adding: Query expansion v. Result re-ranking]

5 Result Re-Ranking: ensures privacy, gives a good evaluation framework, and can look at a rich user profile. Also look at lightweight user models: collected on the server side, or sent as query expansion.

6 Seesaw Search Engine [diagram: the user's index as weighted terms, e.g. dog 1, cat 10, india 2, mit 4, search 93, amherst 12, vegas 1]

7 Seesaw Search Engine [diagram: a query arrives alongside the user's term counts]

8 Seesaw Search Engine [diagram: the query returns candidate results, each represented by its terms, e.g. "dog cat monkey banana food", "baby infant child boy girl", "csail mit artificial research robot", "web search retrieval ir hunt"]

9 Seesaw Search Engine [diagram: each result is scored against the user model (e.g. 1.6, 0.2, 6.0, 0.2, 2.7, 1.3) and re-ranked on the search results page]

10 Calculating a Document's Score: based on standard tf.idf [diagram: example result "web search retrieval ir hunt" with score 1.3]

11 Calculating a Document's Score: based on standard tf.idf, with the user as relevance feedback (from the Stuff I've Seen index; more is better). Term weight: w_i = log [ (r_i + 0.5)(N - n_i - R + r_i + 0.5) / ((n_i - r_i + 0.5)(R - r_i + 0.5)) ] [diagram: example term weights 1.3, 0.1, 0.5, 0.05, 0.35, 0.3]
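As a concrete illustration of this weighting, here is a minimal Python sketch, assuming N and n_i come from the corpus, R and r_i from the documents the user has seen, and that a document's score is the tf-weighted sum from the slides; the function names are made up for this example:

```python
import math

def rf_term_weight(N, n_i, R, r_i):
    """Relevance-feedback term weight from the slide:
    w_i = log[(r_i+0.5)(N-n_i-R+r_i+0.5) / ((n_i-r_i+0.5)(R-r_i+0.5))]"""
    return math.log(((r_i + 0.5) * (N - n_i - R + r_i + 0.5)) /
                    ((n_i - r_i + 0.5) * (R - r_i + 0.5)))

def document_score(term_freqs, weights):
    """Score = sum over terms of tf_i * w_i."""
    return sum(tf * weights.get(term, 0.0) for term, tf in term_freqs.items())

# Example: a term appearing in 100 of 1,000 corpus documents and 9 of 20 seen documents.
w = rf_term_weight(N=1000, n_i=100, R=20, r_i=9)
print(document_score({"search": 2, "retrieval": 1}, {"search": w, "retrieval": w}))
```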

12 Finding the Score Efficiently. Corpus representation (N, n_i): Web statistics, or the result set. Document representation: download the document, or use the result-set snippet. Efficiency hacks generally OK!
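A small sketch of the result-set approximation, assuming the corpus statistics (N, n_i) are taken from the returned snippets rather than from Web-scale counts; the tokenization here is deliberately naive:

```python
from collections import Counter

def corpus_stats_from_results(snippets):
    """Approximate corpus statistics from the result set:
    N = number of results, n[term] = number of snippets containing the term."""
    N = len(snippets)
    n = Counter()
    for snippet in snippets:
        n.update(set(snippet.lower().split()))
    return N, n

N, n = corpus_stats_from_results(["web search retrieval", "image search engine"])
print(N, n["search"])  # 2 2
```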

13 Evaluating Personalized Search: 15 evaluators, each evaluating 50 results for a query as highly relevant, relevant, or irrelevant. Measure algorithm quality with DCG: DCG(i) = Gain(i) if i = 1, DCG(i-1) + Gain(i)/log(i) otherwise.
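A sketch of the DCG computation as defined on the slide; the gain values for the three relevance levels and the base-2 logarithm are assumptions made for the example:

```python
import math

GAIN = {"highly relevant": 2, "relevant": 1, "irrelevant": 0}  # assumed gains

def dcg(ratings):
    """DCG(i) = Gain(i) if i = 1, else DCG(i-1) + Gain(i)/log(i); log base 2 assumed."""
    total = 0.0
    for i, rating in enumerate(ratings, start=1):
        gain = GAIN[rating]
        total += gain if i == 1 else gain / math.log2(i)
    return total

print(dcg(["relevant", "highly relevant", "irrelevant", "relevant"]))
```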

14 Evaluating Personalized Search. Query selection: chose from 10 pre-selected queries or a previously issued query (examples: cancer, Microsoft, traffic, ...; bison frise, Red Sox, airlines, ...; Las Vegas, rice, McDonalds, ...). 53 pre-selected (2-9 per query); total: 137 [diagram: example users Joe and Mary]

15 Seesaw Improves Text Retrieval [chart: retrieval quality for random ranking, relevance feedback, and Seesaw]

16 Text Features Not Enough

17 Take Advantage of Web Ranking

18 Further Exploration: explore a larger parameter space; learn parameters based on the individual, the query, or the results; give the user control?

19 Making Seesaw Practical: we learn most about personalization by deploying a system, and the best algorithm is reasonably efficient. Merging server and client: use query expansion to get more relevant results into the set to be re-ranked, and design snippets for personalization.

20 User Interface Issues: make personalization transparent; give the user control over personalization (a slider between Web and personalized results, which also allows for background computation). Personalization creates a problem with re-finding: results change as the user model changes (thesis research: the Re:Search Engine).

21 Thank you! teevan@csail.mit.edu

22 END

23 Personalizing Web Search Motivation Algorithms Results Future Work

24 Personalizing Web Search Motivation Algorithms Results Future Work

25 Study of Personal Relevancy: 15 participants (Microsoft employees: managers, support staff, programmers, ...), each evaluating 50 results for a query as highly relevant, relevant, or irrelevant; ~10 queries per person.

26 Study of Personal Relevancy. Query selection: chose from 10 pre-selected queries or a previously issued query (examples: cancer, Microsoft, traffic, ...; bison frise, Red Sox, airlines, ...; Las Vegas, rice, McDonalds, ...). 53 pre-selected (2-9 per query); total: 137 [diagram: example users Joe and Mary]

27 Relevant Results Have Low Rank [chart: highly relevant, relevant, and irrelevant results by rank]

28 Relevant Results Have Low Rank [chart: highly relevant, relevant, and irrelevant results by rank, shown for Rater 1 and Rater 2]

29 Same Results Rated Differently: average inter-rater reliability was 56%, different from previous research (Belkin: 94% IRR in TREC; Eastman: 85% IRR on the Web), because we asked for personal relevance judgments. Some queries were more correlated than others.

30 Same Query, Different Intent. Different meanings: "Information about the astronomical/astrological sign of cancer" vs. "information about cancer treatments". Different intents: "is there any new tests for cancer?" vs. "information about cancer treatments".

31 Same Intent, Different Evaluation. Query: Microsoft ("information about microsoft, the company"; "Things related to the Microsoft corporation"; "Information on Microsoft Corp"). 31/50 results were rated as not irrelevant, but more than one rater agreed on only 6 of the 31, and all three agreed only for www.microsoft.com. Inter-rater reliability: 56%.

32 Search Engines are for the Masses [diagram: example users Joe and Mary]

33 Much Room for Improvement. Group ranking: best improves on the Web ranking by 38%; more people, less improvement.

34 Much Room for Improvement. Group ranking: best improves on the Web ranking by 38%; more people, less improvement. Personal ranking: best improves on the Web ranking by 55%; remains constant.

35 Personalizing Web Search Motivation Algorithms (Seesaw Search Engine) Results Future Work

36 BM25 with Relevance Feedback. Score = Σ tf_i * w_i; before feedback, w_i = log(N / n_i) [diagram: corpus of N documents with n_i containing term i; R relevant documents with r_i containing term i]

37 BM25 with Relevance Feedback. Score = Σ tf_i * w_i; w_i = log [ (r_i + 0.5)(N - n_i - R + r_i + 0.5) / ((n_i - r_i + 0.5)(R - r_i + 0.5)) ] [diagram: N, n_i over the corpus; R, r_i over the relevant documents]

38 User Model as Relevance Feedback. Score = Σ tf_i * w_i; w_i = log [ (r_i + 0.5)(N' - n_i' - R + r_i + 0.5) / ((n_i' - r_i + 0.5)(R - r_i + 0.5)) ], where N' = N + R and n_i' = n_i + r_i.
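A sketch of this adjusted weight, assuming the user's seen documents are simply folded into the corpus counts exactly as the slide's N' = N + R and n_i' = n_i + r_i describe; the names are illustrative:

```python
import math

def user_model_weight(N, n_i, R, r_i):
    """w_i with the user model treated as relevance feedback:
    N' = N + R, n_i' = n_i + r_i."""
    N_prime = N + R
    n_prime = n_i + r_i
    return math.log(((r_i + 0.5) * (N_prime - n_prime - R + r_i + 0.5)) /
                    ((n_prime - r_i + 0.5) * (R - r_i + 0.5)))

# A term common in the user's documents gets a higher weight than a rare one.
print(user_model_weight(N=1000, n_i=100, R=20, r_i=15))
print(user_model_weight(N=1000, n_i=100, R=20, r_i=1))
```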

39 User Model as Relevance Feedback [diagram: the World (N documents, n_i containing term i) and the User (R documents, r_i containing term i)]. Score = Σ tf_i * w_i.

40 User Model as Relevance Feedback [diagram: the World (N, n_i), the User (R, r_i), and the part of the World related to the query]. Score = Σ tf_i * w_i.

41 User Model as Relevance Feedback [diagram: Query Focused Matching; N, n_i and R, r_i taken from the parts of the World and the User related to the query]. Score = Σ tf_i * w_i.

42 User Model as Relevance Feedback [diagram: Query Focused Matching vs. World Focused Matching; the counts (N, n_i, R, r_i) taken either from the query-related portions of the World and the User or from the full collections]. Score = Σ tf_i * w_i.
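To make the distinction concrete, here is a rough sketch under one plausible reading: query-focused matching computes the counts only over documents related to the query, while world-focused matching uses the full collections. Documents are represented as sets of terms, and "related to the query" is a placeholder test, not the deck's actual definition:

```python
def contains(docs, term):
    """How many documents in `docs` contain `term`."""
    return sum(1 for d in docs if term in d)

def matching_counts(world_docs, user_docs, query_terms, term, query_focused=True):
    """Return (N, n_i, R, r_i) for one term, either query-focused or world-focused."""
    if query_focused:
        related = lambda d: any(q in d for q in query_terms)  # placeholder notion of "related"
        world_docs = [d for d in world_docs if related(d)]
        user_docs = [d for d in user_docs if related(d)]
    return (len(world_docs), contains(world_docs, term),
            len(user_docs), contains(user_docs, term))

world = [{"web", "search"}, {"cats", "dogs"}, {"search", "engine"}]
user = [{"search", "retrieval"}, {"hiking", "trails"}]
print(matching_counts(world, user, {"search"}, "web", query_focused=True))   # (2, 1, 1, 0)
print(matching_counts(world, user, {"search"}, "web", query_focused=False))  # (3, 1, 2, 0)
```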

43 Parameters: matching, user representation, world representation, query expansion.

44 Parameters: matching (query focused, world focused), user representation, world representation, query expansion.

45 Parameters: matching (query focused, world focused), user representation, world representation, query expansion.

46 User Representation: Stuff I've Seen (SIS) index (an MSR research project [Dumais, et al.]; an index of everything a user has seen), recently indexed documents, Web documents in the SIS index, query history, or none.

47 Parameters: matching (query focused, world focused), user representation (all SIS, recent SIS, Web SIS, query history, none), world representation, query expansion.

48 Parameters: matching (query focused, world focused), user representation (all SIS, recent SIS, Web SIS, query history, none), world representation, query expansion.

49 World Representation. Document representation: full text, or title and snippet. Corpus representation: the Web, the result set (title and snippet), or the result set (full text).

50 Parameters: matching (query focused, world focused), user representation (all SIS, recent SIS, Web SIS, query history, none), world representation (full text, title and snippet; Web, result set full text, result set title and snippet), query expansion.

51 Parameters: matching (query focused, world focused), user representation (all SIS, recent SIS, Web SIS, query history, none), world representation (full text, title and snippet; Web, result set full text, result set title and snippet), query expansion.

52 Query Expansion: all words in the document, or query focused. Example snippet: "The American Cancer Society is dedicated to eliminating cancer as a major health problem by preventing cancer, saving lives, and diminishing suffering through..."
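A sketch of one plausible reading of the query-focused option: keep only the words that occur near a query term in the snippet. The window size and whitespace tokenization are assumptions made for the example:

```python
def query_focused_terms(snippet, query_terms, window=3):
    """Return the words within `window` positions of any query-term occurrence."""
    words = snippet.lower().split()
    keep = set()
    for i, word in enumerate(words):
        if any(q in word for q in query_terms):
            keep.update(words[max(0, i - window): i + window + 1])
    return keep

snippet = ("The American Cancer Society is dedicated to eliminating cancer as a "
           "major health problem by preventing cancer, saving lives, and "
           "diminishing suffering through...")
print(query_focused_terms(snippet, ["cancer"]))
```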

53 Parameters: matching (query focused, world focused), user representation (all SIS, recent SIS, Web SIS, query history, none), world representation (full text, title and snippet; Web, result set full text, result set title and snippet), query expansion (all words, query focused).

54 Parameters: matching (query focused, world focused), user representation (all SIS, recent SIS, Web SIS, query history, none), world representation (full text, title and snippet; Web, result set full text, result set title and snippet), query expansion (all words, query focused).

55 Personalizing Web Search Motivation Algorithms Results Future Work

56 Best Parameter Settings [table of all parameter values with the best-performing settings highlighted: matching = query focused; user representation = all SIS; document representation = title and snippet; corpus representation = result set (title and snippet); query expansion = query focused]

57 Seesaw Improves Retrieval [chart: retrieval quality with no user model, random ranking, relevance feedback, and Seesaw]

58 Text Alone Not Enough

59 Incorporate Non-text Features

60 Summary: a rich user model is important for search personalization; Seesaw improves text-based retrieval; other features are needed to improve on the Web ranking; lots of room for improvement in the future.

61 Personalizing Web Search Motivation Algorithms Results Future Work: further exploration, making Seesaw practical, user interface issues.

62 Further Exploration: explore a larger parameter space; learn parameters based on the individual, the query, or the results; give the user control?

63 Making Seesaw Practical: we learn most about personalization by deploying a system, and the best algorithm is reasonably efficient. Merging server and client: use query expansion to get more relevant results into the set to be re-ranked, and design snippets for personalization.

64 User Interface Issues: make personalization transparent; give the user control over personalization (a slider between Web and personalized results, which also allows for background computation). Personalization creates a problem with re-finding: results change as the user model changes (thesis research: the Re:Search Engine).

65 Thank you!

66 Search Engines are for the Masses. Best common ranking: sort results by the number of raters marking them highly relevant, then by the number marking them relevant; DCG(i) = Gain(i) if i = 1, DCG(i-1) + Gain(i)/log(i) otherwise. Measure the distance between rankings with Kendall-Tau: the Web ranking is more similar to the common ranking (individual's ranking distance: 0.469; common ranking distance: 0.445).
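A small sketch of the Kendall-Tau distance used here, assuming it is the fraction of result pairs that the two rankings order differently (each ranking is a list of the same result identifiers):

```python
from itertools import combinations

def kendall_tau_distance(ranking_a, ranking_b):
    """Fraction of document pairs ordered differently by the two rankings."""
    pos_a = {doc: i for i, doc in enumerate(ranking_a)}
    pos_b = {doc: i for i, doc in enumerate(ranking_b)}
    pairs = list(combinations(ranking_a, 2))
    discordant = sum(1 for x, y in pairs
                     if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0)
    return discordant / len(pairs)

print(kendall_tau_distance(["d1", "d2", "d3", "d4"], ["d2", "d1", "d3", "d4"]))  # 1/6
```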

