Seesaw Personalized Web Search
Jaime Teevan, MIT, with Susan T. Dumais and Eric Horvitz, MSR



Personalization Algorithms
[Diagram: design space for personalization algorithms — standard IR, query expansion; document, query, and user representations; server vs. client]

Personalization Algorithms
[Diagram, continued: query expansion (server side) v. result re-ranking (client side)]

Result Re-Ranking
- Ensures privacy
- Good evaluation framework
- Can look at rich user profile
(v. lightweight user models: collected on server side, sent as query expansion)

Seesaw Search Engine
[Animation: the user model is a list of term counts (dog 1, cat 10, india 2, mit 4, search 93, amherst 12, vegas 1); a query arrives, each candidate result is represented by its terms (e.g. "web search retrieval ir hunt"), scored against the user model (e.g. 1.3), and the search results page is re-ranked]

Calculating a Document's Score
- Based on standard tf.idf
- w_i = log( ((r_i + 0.5)(N - n_i - R + r_i + 0.5)) / ((n_i - r_i + 0.5)(R - r_i + 0.5)) )
- User as relevance feedback
  - Stuff I've Seen index
  - More is better
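As a rough sketch of the scoring above (function and variable names are illustrative, not from the deck), the relevance-feedback term weight and the tf-weighted document score might look like:

```python
import math

def rf_weight(N, n_i, R, r_i):
    """BM25-style relevance-feedback term weight from the slide:
    w_i = log[(r_i+0.5)(N-n_i-R+r_i+0.5) / ((n_i-r_i+0.5)(R-r_i+0.5))]."""
    return math.log(((r_i + 0.5) * (N - n_i - R + r_i + 0.5)) /
                    ((n_i - r_i + 0.5) * (R - r_i + 0.5)))

def score(doc_tf, weights):
    """Document score = sum of tf_i * w_i over the query/expansion terms."""
    return sum(tf * weights.get(term, 0.0) for term, tf in doc_tf.items())
```

With no feedback (R = r_i = 0) the weight reduces to the familiar idf-like log((N - n_i + 0.5) / (n_i + 0.5)).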

Finding the Score Efficiently
- Corpus representation (N, n_i)
  - Web statistics
  - Result set
- Document representation
  - Download document
  - Use result set snippet
- Efficiency hacks generally OK!

Evaluating Personalized Search
- 15 evaluators
- Evaluate 50 results for a query: highly relevant / relevant / irrelevant
- Measure algorithm quality with DCG:
  DCG(i) = Gain(i) if i = 1; DCG(i-1) + Gain(i)/log(i) otherwise
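The DCG recurrence above unrolls to a simple loop (the log base is not stated on the slide; natural log is assumed here):

```python
import math

def dcg(gains):
    """DCG(1) = Gain(1); DCG(i) = DCG(i-1) + Gain(i)/log(i) for i > 1."""
    total = 0.0
    for i, gain in enumerate(gains, start=1):
        # First position contributes its gain undiscounted; later
        # positions are discounted by log of their rank.
        total += gain if i == 1 else gain / math.log(i)
    return total
```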

Evaluating Personalized Search
- Query selection
  - Chose from 10 pre-selected queries (e.g. cancer, Microsoft, traffic, bison frise, Red Sox, airlines, Las Vegas, rice, McDonalds)
  - Previously issued queries (e.g. Joe's, Mary's)
- 53 pre-selected evaluations (2-9 per query); total: 137

Seesaw Improves Text Retrieval
[Chart: ranking quality for random ordering, relevance feedback, and Seesaw]

Text Features Not Enough

Take Advantage of Web Ranking

Further Exploration
- Explore larger parameter space
- Learn parameters
  - Based on individual
  - Based on query
  - Based on results
- Give user control?

Making Seesaw Practical
- Learn most about personalization by deploying a system
- Best algorithm reasonably efficient
- Merging server and client
  - Query expansion: get more relevant results in the set to be re-ranked
  - Design snippets for personalization

User Interface Issues
- Make personalization transparent
- Give user control over personalization
  - Slider between Web and personalized results
  - Allows for background computation
- Creates problem with re-finding
  - Results change as user model changes
  - Thesis research: Re:Search Engine

Thank you!

END

Personalizing Web Search
- Motivation
- Algorithms
- Results
- Future Work


Study of Personal Relevancy
- 15 participants
  - Microsoft employees
  - Managers, support staff, programmers, …
- Evaluate 50 results for a query: highly relevant / relevant / irrelevant
- ~10 queries per person

Study of Personal Relevancy
- Query selection
  - Chose from 10 pre-selected queries (e.g. cancer, Microsoft, traffic, bison frise, Red Sox, airlines, Las Vegas, rice, McDonalds)
  - Previously issued queries (e.g. Joe's, Mary's)
- 53 pre-selected evaluations (2-9 per query); total: 137

Relevant Results Have Low Rank
[Charts: distribution of highly relevant, relevant, and irrelevant results by Web rank, shown for two raters]

Same Results Rated Differently
- Average inter-rater reliability: 56%
- Different from previous research
  - Belkin: 94% IRR in TREC
  - Eastman: 85% IRR on the Web
- Asked for personal relevance judgments
- Some queries more correlated than others

Same Query, Different Intent
- Different meanings
  - "Information about the astronomical/astrological sign of cancer"
  - "information about cancer treatments"
- Different intents
  - "is there any new tests for cancer?"
  - "information about cancer treatments"

Same Intent, Different Evaluation
- Query: Microsoft
  - "information about microsoft, the company"
  - "Things related to the Microsoft corporation"
  - "Information on Microsoft Corp"
- 31/50 results rated as not irrelevant
  - For only 6 of the 31 does more than one rater agree
  - All three agree only for …
- Inter-rater reliability: 56%

Search Engines are for the Masses
[Chart: a single Web ranking serves different users, e.g. Joe and Mary]

Much Room for Improvement
- Group ranking
  - Best improves on Web by 38%
  - More people, less improvement
- Personal ranking
  - Best improves on Web by 55%
  - Remains constant

Personalizing Web Search
- Motivation
- Algorithms: the Seesaw Search Engine
- Results
- Future Work

BM25 with Relevance Feedback
- Score = Σ tf_i * w_i
- Without feedback, w_i = log(N / n_i)
  (N = number of documents; n_i = documents containing term i; the feedback counts R, r_i enter next)

BM25 with Relevance Feedback
- Score = Σ tf_i * w_i
- w_i = log( ((r_i + 0.5)(N - n_i - R + r_i + 0.5)) / ((n_i - r_i + 0.5)(R - r_i + 0.5)) )
  (R = known relevant documents; r_i = relevant documents containing term i)

User Model as Relevance Feedback
- Treat the user's index as relevance feedback on the corpus: N' = N + R, n_i' = n_i + r_i
- w_i = log( ((r_i + 0.5)(N' - n_i' - R + r_i + 0.5)) / ((n_i' - r_i + 0.5)(R - r_i + 0.5)) )
- Score = Σ tf_i * w_i
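In code, the substitution N' = N + R, n_i' = n_i + r_i is a small change to the weight computation (a sketch; names are illustrative):

```python
import math

def user_model_weight(N, n_i, R, r_i):
    """Relevance-feedback weight with the user's index added to the corpus:
    N' = N + R, n_i' = n_i + r_i, then the standard formula applies."""
    N_prime = N + R        # corpus grows by the user's R documents
    n_prime = n_i + r_i    # term count grows by the user's r_i matches
    return math.log(((r_i + 0.5) * (N_prime - n_prime - R + r_i + 0.5)) /
                    ((n_prime - r_i + 0.5) * (R - r_i + 0.5)))
```

Algebraically the primed terms cancel, leaving log( ((r_i + 0.5)(N - n_i + 0.5)) / ((n_i + 0.5)(R - r_i + 0.5)) ).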

User Model as Relevance Feedback
[Diagram: the world as a set of N documents (n_i containing term i) and the user's documents as a set of R (r_i containing term i); restricting both sets to the portion related to the query gives query-focused matching, while keeping the whole world gives world-focused matching. Score = Σ tf_i * w_i]

Parameters
- Matching: query focused / world focused
- User representation
- World representation
- Query expansion

User Representation
- Stuff I've Seen (SIS) index
  - MSR research project [Dumais, et al.]
  - Index of everything a user's seen
- Recently indexed documents
- Web documents in SIS index
- Query history
- None


World Representation
- Document representation
  - Full text
  - Title and snippet
- Corpus representation
  - Web
  - Result set - title and snippet
  - Result set - full text


Query Expansion
- All words in document
- Query focused, e.g.: "The American Cancer Society is dedicated to eliminating cancer as a major health problem by preventing cancer, saving lives, and diminishing suffering through…"
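One way to read "query focused" is to keep only the words that occur near a query term; a minimal sketch (the whitespace tokenizer and window size are assumptions, not from the deck):

```python
def query_focused_terms(text, query_terms, window=10):
    """Return the tokens that fall within `window` positions of any query term."""
    tokens = text.lower().split()
    query = {t.lower() for t in query_terms}
    # Positions where a query term occurs in the text.
    hits = [i for i, tok in enumerate(tokens) if tok in query]
    keep = set()
    for h in hits:
        keep.update(range(max(0, h - window), min(len(tokens), h + window + 1)))
    return [tokens[i] for i in sorted(keep)]
```

For the cancer snippet above, only the words around each occurrence of "cancer" would feed the expanded query.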


Personalizing Web Search
- Motivation
- Algorithms
- Results
- Future Work

Best Parameter Settings
- Matching: query focused
- User representation: all SIS
- World representation: title and snippet (document); result set - title and snippet (corpus)
- Query expansion: query focused

Seesaw Improves Retrieval
[Chart: ranking quality for no user model, random ordering, relevance feedback, and Seesaw]

Text Alone Not Enough

Incorporate Non-text Features

Summary
- Rich user model important for search personalization
- Seesaw improves text-based retrieval
- Need other features to improve on Web ranking
- Lots of room for improvement in the future

Personalizing Web Search
- Motivation
- Algorithms
- Results
- Future Work
  - Further exploration
  - Making Seesaw practical
  - User interface issues


Search Engines are for the Masses
- Best common ranking
  - DCG(i) = Gain(i) if i = 1; DCG(i-1) + Gain(i)/log(i) otherwise
  - Sort results by number marked highly relevant, then by relevant
- Measure distance with Kendall-Tau
- Web ranking more similar to common ranking
  - Individual's ranking distance: …
  - Common ranking distance: …
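The Kendall-Tau distance mentioned above counts pairs of results that the two rankings order differently; a small sketch (the normalization to [0, 1] is an assumption):

```python
def kendall_tau_distance(rank_a, rank_b):
    """Fraction of item pairs ordered differently by the two rankings."""
    pos_b = {item: i for i, item in enumerate(rank_b)}
    n = len(rank_a)
    discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            # The pair (rank_a[i], rank_a[j]) is discordant if rank_b
            # reverses its order.
            if pos_b[rank_a[i]] > pos_b[rank_a[j]]:
                discordant += 1
    return discordant / (n * (n - 1) / 2)
```

Identical rankings give 0, a fully reversed ranking gives 1.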