Personalizing Web Search Jaime Teevan, MIT with Susan T. Dumais and Eric Horvitz, MSR.

Presentation transcript:

Personalizing Web Search Jaime Teevan, MIT with Susan T. Dumais and Eric Horvitz, MSR

Demo

Personalizing Web Search Motivation Algorithms Results Future Work

Study of Personal Relevancy
- 15 SIS users x ~10 queries
- Evaluate 50 results
  - Highly relevant / Relevant / Irrelevant
- Query selection
  - Previously issued query
  - Chose from 10 pre-selected queries
- Collected evaluations for 137 queries
  - 53 of pre-selected queries (2-9/query)

Relevant Results Have Low Rank
[Chart legend: Highly Relevant, Relevant, Irrelevant]

Same Query, Different Intent
- Different meanings
  - “Information about the astronomical/astrological sign of cancer”
  - “information about cancer treatments”
- Different intents
  - “is there any new tests for cancer?”
  - “information about cancer treatments”

Same Intent, Different Evaluation
- Query: Microsoft
  - “information about microsoft, the company”
  - “Things related to the Microsoft corporation”
  - “Information on Microsoft Corp”
- 31/50 rated as not irrelevant
  - For only 6/31 does more than one person agree
  - All three agree only for

More to Understand
- Do people cluster?
  - Even if they can’t state their intention
- How are the differences reflected?
  - Can they be seen from the information on a person’s computer?
- Can we do better than the ranking that would make everyone the most happy?
  - Best common ranking: +38%
  - Best personalized ranking: +55%

Personalizing Web Search Motivation Algorithms Results Future Work

Personalization Algorithms
- Standard IR [Diagram: Document, Query, User; Server, Client]
- Related to relevance feedback
- Query expansion v. Result re-ranking

Result Re-Ranking
- Takes full advantage of SIS
- Ensures privacy
- Good evaluation framework
- Look at lightweight user models
  - Collected on server side
  - Sent as query expansion

BM25
  $w_i = \log \frac{N}{n_i}$
  $\text{Score} = \sum_i tf_i \cdot w_i$
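
The weighting on the slide above can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual reading of the symbols: N is the number of documents in the corpus, n_i is the number of those documents containing term i, and tf_i is the frequency of term i in the document being scored. The function names, the choice to sum over an explicit list of terms, and the toy numbers are all assumptions made for illustration, not taken from the talk.

```python
import math
from collections import Counter

def term_weight(N, n_i):
    """w_i = log(N / n_i): terms found in fewer documents get larger weights."""
    return math.log(N / n_i)

def score(doc_tokens, terms, doc_freq, N):
    """Score = sum over the given terms of tf_i * w_i."""
    tf = Counter(doc_tokens)
    total = 0.0
    for term in terms:
        n_i = doc_freq.get(term, 0)
        if n_i == 0:
            continue  # term never seen in the corpus: no weight defined, skip it
        total += tf[term] * term_weight(N, n_i)
    return total

# Toy corpus statistics (made up): "cancer" is rare, "the" is in every document.
doc_freq = {"cancer": 10, "the": 1000}
doc = ["the", "cancer", "society", "fights", "cancer"]
print(score(doc, ["cancer", "the"], doc_freq, N=1000))
```

In this toy example the word "the" contributes nothing, because a term that appears in every document has weight log(1) = 0.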

BM25 with Relevance Feedback
  $w_i = \log \frac{(r_i + 0.5)(N - n_i - R + r_i + 0.5)}{(n_i - r_i + 0.5)(R - r_i + 0.5)}$
  $\text{Score} = \sum_i tf_i \cdot w_i$
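
A small sketch of the relevance-feedback weight, assuming R is the number of documents with relevance feedback and r_i is the number of those documents containing term i, with N and n_i as corpus-wide counts. The function name and the sample numbers are hypothetical. The formula as written expects the feedback documents to be part of the corpus, so that n_i >= r_i and N - n_i >= R - r_i.

```python
import math

def rf_weight(N, n_i, R, r_i):
    """BM25 term weight with relevance feedback, with 0.5 smoothing on each count."""
    numerator = (r_i + 0.5) * (N - n_i - R + r_i + 0.5)
    denominator = (n_i - r_i + 0.5) * (R - r_i + 0.5)
    return math.log(numerator / denominator)

# Made-up numbers: a term found in 10 of 1000 documents.
no_feedback = rf_weight(N=1000, n_i=10, R=0, r_i=0)
all_relevant_have_it = rf_weight(N=1000, n_i=10, R=5, r_i=5)
no_relevant_has_it = rf_weight(N=1000, n_i=10, R=5, r_i=0)
print(no_feedback, all_relevant_have_it, no_relevant_has_it)
```

With no feedback (R = 0, r_i = 0) the weight reduces to a smoothed version of log(N / n_i); a term that occurs in every judged-relevant document is weighted up, and one that occurs in none is weighted down.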

User Model as Relevance Feedback
  $w_i = \log \frac{(r_i + 0.5)(N' - n_i' - R + r_i + 0.5)}{(n_i' - r_i + 0.5)(R - r_i + 0.5)}$
  $N' = N + R$, $n_i' = n_i + r_i$
  $\text{Score} = \sum_i tf_i \cdot w_i$
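
The substitution N' = N + R and n_i' = n_i + r_i can be checked with a short sketch. It assumes R and r_i now count documents in the user model (which need not appear in the corpus that N and n_i describe); the function names and numbers are hypothetical. Substituting the primed values, the R and r_i terms cancel inside two of the factors, which the second function makes explicit.

```python
import math

def user_model_weight(N, n_i, R, r_i):
    """Relevance-feedback weight with the corpus extended by the user's documents."""
    N_prime = N + R        # N' = N + R
    n_prime = n_i + r_i    # n_i' = n_i + r_i
    numerator = (r_i + 0.5) * (N_prime - n_prime - R + r_i + 0.5)
    denominator = (n_prime - r_i + 0.5) * (R - r_i + 0.5)
    return math.log(numerator / denominator)

def simplified_weight(N, n_i, R, r_i):
    """Same weight after cancelling: N' - n_i' - R + r_i = N - n_i, n_i' - r_i = n_i."""
    return math.log(((r_i + 0.5) * (N - n_i + 0.5)) /
                    ((n_i + 0.5) * (R - r_i + 0.5)))

# Made-up numbers where r_i > n_i; the unprimed formula would hit a negative factor here.
w = user_model_weight(N=1000, n_i=2, R=50, r_i=10)
print(w, simplified_weight(N=1000, n_i=2, R=50, r_i=10))
```

One consequence of the primed form is that every factor stays positive whenever 0 <= r_i <= R and 0 <= n_i <= N, even if the user's documents are not in the corpus.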

User Model as Relevance Feedback
  [Diagram: World ($N$, $n_i$); User ($R$, $r_i$)]
  $\text{Score} = \sum_i tf_i \cdot w_i$

User Model as Relevance Feedback
  [Diagram: World ($N$, $n_i$); World related to query ($N$, $n_i$); User ($R$, $r_i$)]
  $\text{Score} = \sum_i tf_i \cdot w_i$

User Model as Relevance Feedback
  [Diagram: World ($N$, $n_i$); User ($R$, $r_i$); World related to query ($N$, $n_i$); User related to query ($R$, $r_i$); Query Focused Matching]
  $\text{Score} = \sum_i tf_i \cdot w_i$

User Model as Relevance Feedback
  [Diagram: World ($N$, $n_i$); User ($R$, $r_i$); Web related to query ($N$, $n_i$); User related to query ($R$, $r_i$); Query Focused Matching; World Focused Matching]
  $\text{Score} = \sum_i tf_i \cdot w_i$

Parameters
- Matching
- User representation
- World representation
- Query expansion

Parameters
- Matching: Query focused, World focused
- User representation
- World representation
- Query expansion

User Representation
- Stuff I’ve Seen (SIS) index
- Recently indexed documents
- Web documents in SIS index
- Query history
- Relevance judgments
- None

Parameters
- Matching: Query focused, World focused
- User representation: All SIS, Recent SIS, Web SIS, Query history, Relevance feedback, None
- World representation
- Query expansion

World Representation
- Document Representation
  - Full text
  - Title and snippet
- Corpus Representation
  - Web
  - Result set – title and snippet
  - Result set – full text

Parameters
- Matching: Query focused, World focused
- User representation: All SIS, Recent SIS, Web SIS, Query history, Relevance feedback, None
- World representation: Full text, Title and snippet; Web, Result set – full text, Result set – title and snippet
- Query expansion

Query Expansion
- All words in document
- Query focused
[Example text: “The American Cancer Society is dedicated to eliminating cancer as a major health problem by preventing cancer, saving lives, and diminishing suffering through...”]
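
One way to read the contrast between "all words" and "query focused" is sketched below. It assumes that query-focused expansion keeps only words that occur within a small window of a query term; the window size, the tokenizer, and the function names are assumptions made for illustration, and the talk does not specify them.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def all_words(text):
    """'All words in document': every token is a candidate expansion term."""
    return tokenize(text)

def query_focused(text, query, window=1):
    """Keep only tokens within `window` positions of a query term (assumed definition)."""
    tokens = tokenize(text)
    query_terms = set(tokenize(query))
    kept = []
    for i, token in enumerate(tokens):
        neighbourhood = tokens[max(0, i - window): i + window + 1]
        if any(t in query_terms for t in neighbourhood):
            kept.append(token)
    return kept

snippet = ("The American Cancer Society is dedicated to eliminating cancer "
           "as a major health problem by preventing cancer, saving lives, "
           "and diminishing suffering through")
print(query_focused(snippet, "cancer", window=1))
```

A wider window keeps more context around each query term, and a window large enough to span the whole text falls back to the all-words behavior.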

Parameters
- Matching: Query focused, World focused
- User representation: All SIS, Recent SIS, Web SIS, Query history, Relevance feedback, None
- World representation: Full text, Title and snippet; Web, Result set – full text, Result set – title and snippet
- Query expansion: All words, Query focused

Personalizing Web Search Motivation Algorithms Results Future Work

Baselines
- Best possible
- Random
- Text based ranking
- Web ranking
- URL Boost +1

Best Parameter Settings
- Richer user representation better
  - SIS > Recent > Web > Query History > None
- Suggests rich client important
- Efficiency hacks don’t hurt
  - Snippets query focused
  - Length normalization not an issue
- Query focus good

Text Alone Not Enough
- Better than some baselines
  - Better than random
  - Better than no user representation
  - Better than relevance feedback
- Worse than Web results
- Blend in other features
  - Web ranking
  - URL boost
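
The idea of blending the text-based score with Web ranking and a URL boost might look roughly like the sketch below. Everything specific here is an assumption for illustration: the linear mix controlled by `alpha`, the min-max normalization of text scores, the rank-to-score conversion, and the fixed boost for previously visited URLs are hypothetical choices, not the method used in the talk.

```python
def blend_rerank(results, visited_urls, alpha=0.5, url_boost=1.0):
    """Re-rank results given as (url, web_rank, text_score), with web_rank starting at 1.

    Hypothetical blend: alpha * normalized text score
    + (1 - alpha) * rank-based Web score, plus a boost for visited URLs.
    """
    if not results:
        return []
    scores = [text_score for _, _, text_score in results]
    lowest, highest = min(scores), max(scores)
    span = (highest - lowest) or 1.0  # avoid dividing by zero when all scores tie
    n = len(results)
    scored = []
    for url, web_rank, text_score in results:
        text_component = (text_score - lowest) / span   # scaled to 0..1
        web_component = (n - web_rank + 1) / n           # rank 1 maps to 1.0
        combined = alpha * text_component + (1 - alpha) * web_component
        if url in visited_urls:
            combined += url_boost                        # previously visited URL
        scored.append((combined, url))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in scored]

# Made-up result list: the Web ranking and the personalized text score disagree.
results = [("a.example", 1, 0.0), ("b.example", 2, 5.0), ("c.example", 3, 10.0)]
print(blend_rerank(results, visited_urls=set()))
print(blend_rerank(results, visited_urls={"a.example"}))
```

Setting `alpha` to 0 reproduces the original Web order, and setting it to 1 ranks by text score alone, so the parameter gives a simple dial between the two baselines named on the slide.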

Good, but Lots of Room to Grow
- Best combination: 9.1% improvement
- Best possible: 51.5% improvement
- Assumes best Web combination selected
- Only improves results 2/3 of the time

Personalizing Web Search Motivation Algorithms Results Future Work

Finding the Best Parameter Setting
- Almost always some parameter setting that improves results
- Use learning to select parameters
  - Based on individual
  - Based on query
  - Based on results
- Give user control?

Further Exploration of Algorithms
- Larger parameter space to explore
  - More complex user model subsets
  - Different parsing (e.g., phrases)
  - Tune BM25 parameters
- What is really helping?
  - Generic user model or personal model
  - Use different indices for the queries
- Deploy system

Practical Issues
- Efficiency issues
  - Can interfaces mitigate some of the issues?
- Merging server and client
  - Query expansion
    - Get more relevant results in the set to be re-ranked
  - Design snippets for personalization

Thank you!