Personalizing Search via Automated Analysis of Interests and Activities. Jaime Teevan (MIT), Susan T. Dumais (Microsoft), Eric Horvitz (Microsoft). SIGIR 2005.


1 Personalizing Search via Automated Analysis of Interests and Activities
Jaime Teevan, MIT
Susan T. Dumais, Microsoft
Eric Horvitz, Microsoft
SIGIR 2005

2 Problem
- In IR, a query may have different meanings for different people
  – IR = information retrieval
  – IR = Iran
- People are too lazy to type in long, detailed queries

3 Approach
- Keep a user profile (automated)
- Rank retrieved documents by the information in the user profile
- ≠ explicit relevance feedback, because there is no user interaction
- ≠ pseudo relevance feedback, because the user profile is long-term

4 User Profile?
- Web pages that the user viewed
- E-mail messages viewed or sent
- Calendar items
- Documents (text files) on the user's computer
- The profile is kept on the user's machine for obvious (privacy) reasons
- Treat each piece of information as an ordinary text document
- Think of the user profile as a personal document database

5 System Architecture
Diagram: the user's computer (which contains the user profile) interacts with the search engine.
1. The user's query is sent to the search engine
2. The search engine returns the top-N documents
3. The documents are re-ranked locally on the user's computer
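The flow above can be sketched in a few lines of Python. This is only an illustration of the architecture on the slide; `search_engine.search`, `score_with_profile`, and the `profile` object are hypothetical placeholders, not the paper's actual implementation or any real API.

```python
def personalized_search(query, search_engine, score_with_profile, profile, n=50):
    """Re-rank the engine's top-n results using a locally stored profile."""
    results = search_engine.search(query, top_n=n)       # steps 1-2: send query, get top-N
    return sorted(results,
                  key=lambda doc: score_with_profile(doc, query, profile),
                  reverse=True)                          # step 3: re-rank locally
```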

6 Document Relevance Score
- Ignore the search engine ranking
- Modify the classic BM25 formula
  – w_i = weight of query term i
  – N = total number of documents in the corpus
  – n_i = number of documents containing term i
  – R = total number of relevant documents (identified by feedback)
  – r_i = number of relevant documents containing term i
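The formula image itself did not survive the transcript. The variable list matches the standard BM25 / Robertson-Sparck Jones relevance weight, which with the usual 0.5 smoothing reads (a reconstruction consistent with the definitions above; the constants on the original slide may differ):

```latex
w_i = \log \frac{(r_i + 0.5)\,(N - n_i - R + r_i + 0.5)}
                {(n_i - r_i + 0.5)\,(R - r_i + 0.5)}
```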

7 Alternate Variable Definition (1)
- N ≠ total number of documents in the corpus
  – N = number of documents retrieved by the search engine
- n_i ≠ number of documents containing term i
  – n_i = number of retrieved documents containing term i
- Otherwise, we have to perform one web search for each query term

8 Alternate Variable Definition (2)
- The value of n_i changes depending on how much of each retrieved document is seen:
  – full text
  – title and search engine summary
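As a rough illustration of slides 7-8, the corpus statistics can be computed from the retrieved results alone. The dictionary fields ('title', 'snippet', 'full_text') and the whitespace tokenization are assumptions for the sketch, not details from the paper.

```python
from collections import Counter

def corpus_stats(results, query_terms, representation="snippet"):
    """Estimate N and n_i from the retrieved results only.

    Each result is assumed to be a dict with 'title', 'snippet' and
    'full_text' fields; `representation` controls how much of the
    document is "seen" when counting n_i.
    """
    N = len(results)                                  # N = number of retrieved documents
    n = Counter()
    for doc in results:
        if representation == "full_text":
            text = doc.get("full_text", "")
        else:                                         # title + search-engine summary
            text = doc.get("title", "") + " " + doc.get("snippet", "")
        tokens = set(text.lower().split())
        for term in query_terms:
            if term.lower() in tokens:
                n[term] += 1                          # n_i = retrieved docs containing term i
    return N, n
```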

9 Pseudo-Relevant Documents (1)
- Need to compute the values of R and r_i in the formula
- This is where the user profile comes into play
- There are two ways of defining R:
  – R = total number of documents in the user profile, independent of the query
  – R = number of documents in the user profile that match (Boolean?) the query
- r_i depends on the definition of R
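A minimal sketch of the two definitions of R (and the corresponding r_i), assuming the profile is a list of plain-text strings and that "matching the query" means containing at least one query term; the paper's actual matching rule may be stricter.

```python
def profile_stats(profile_docs, query_terms, query_focused=True):
    """Compute R and r_i from the locally stored user profile.

    profile_docs : list of plain-text strings (web pages, e-mails, files, ...)
    query_focused: if True, only profile documents containing a query term
                   count as relevant; if False, the whole profile counts.
    """
    terms = [t.lower() for t in query_terms]
    tokenized = [set(d.lower().split()) for d in profile_docs]
    if query_focused:
        relevant = [tok for tok in tokenized if any(t in tok for t in terms)]
    else:
        relevant = tokenized
    R = len(relevant)                                            # number of "relevant" documents
    r = {t: sum(1 for tok in relevant if t in tok) for t in terms}
    return R, r
```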

10 Pseudo-Relevant Documents (2)
- We do not have to use all documents in the user profile
- Use only a subset of documents, e.g.:
  – past query strings
  – viewed web pages
  – recently viewed documents

11 Term Frequency
- So far, we have seen several definitions of w_i
- But BM25 also requires a term frequency (tf_i) value
- The value of tf_i depends on how much of a document we see:
  – full text
  – title and search engine summary

12 Query Expansion
- Expand the query with all words in the (pseudo-)relevant documents
- Expand the query with the words surrounding the query terms in the (pseudo-)relevant documents
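A sketch of the second (near-query) expansion strategy; the window size and cutoff are illustrative choices, not values taken from the paper.

```python
from collections import Counter

def near_query_expansion(relevant_docs, query_terms, window=10, top_k=20):
    """Collect candidate expansion terms that occur within `window` tokens
    of a query term in the pseudo-relevant documents."""
    terms = {t.lower() for t in query_terms}
    counts = Counter()
    for doc in relevant_docs:
        tokens = doc.lower().split()
        for i, tok in enumerate(tokens):
            if tok in terms:
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                counts.update(w for w in tokens[lo:hi] if w not in terms)
    return [w for w, _ in counts.most_common(top_k)]
```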

13 Experiment Setup (1)
- 15 participants (Microsoft employees)
- Each participant has 10 queries
- Participants may pick their queries from a pre-selected list and/or make up their own
- The MSN search engine (MSN.com) is used
- Participants judge the top 50 results for each of their queries:
  – highly relevant, relevant, or not relevant

14 Experiment Setup (2)
- Queries with no retrieved documents or no relevant documents are ignored
- 131 queries remain
  – 53 pre-selected, 78 self-generated
- Participants provide the documents on their own computers as user profiles

15 Evaluation Measure
- Discounted Cumulative Gain (DCG)
- Higher-ranking documents affect the score more than lower-ranking documents
  – G(i) = 0 if the document at rank i is not relevant
  – G(i) = 1 if it is relevant
  – G(i) = 2 if it is highly relevant
- Scores are normalized to values between 0 (worst) and 1 (best)
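The slide does not show the formula itself; a common form of DCG consistent with the gains above (the exact discount used in the paper may differ) is:

```latex
\mathrm{DCG}(i) =
\begin{cases}
G(1) & i = 1 \\
\mathrm{DCG}(i-1) + G(i)/\log_2(i) & i > 1
\end{cases}
```

The final score is then divided by the DCG of the ideal ranking to normalize it into [0, 1].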

16 Experiment Combinations (1)
- Corpus representation (N and n_i)
  – Full Text: from the retrieved documents
  – Web: one web search per query term
  – Snippet: title and summary from the retrieved documents
- Query focus
  – No: use all documents in the user profile as relevant documents
  – Yes: use only the profile documents that match the query as relevant documents
- User representation (user profile)
  – No Model: user profile not used
  – Query: past query strings
  – Web: viewed web pages
  – Recent: recently viewed documents
  – Full Index: everything in the user profile

17 Experiment Combinations (2)
- Document representation (tf_i)
  – Snippet: title and summary
  – Full Text
- Query expansion
  – Near Query: use words near the query terms in the relevant documents
  – All Words: use all words in the relevant documents
- Vary only one variable at a time; hold the others constant
- Average the DCG score over all 131 queries

18 Experiment Result
[Results chart not preserved in the transcript; the highlighted combination of settings scores 0.46]

19 Real World Stuff
Legend for the results chart (figure not preserved in the transcript):
- Rand = random ordering
- No = pure BM25
- RF = pseudo-relevance feedback
- PS = personalized search
- URL = URL history boost
- Web = search engine ranking
- Mix = PS + Web, combined by probability of relevance or by rank position

20 Finally
- Slight improvement when merged with the search engine results …