
Personalization with user’s local data: Personalizing Search via Automated Analysis of Interests and Activities. Sungjick Lee, Department of Electrical and Computer Engineering

 Introduction: current Web search engines. Query ‘IR’ on Google: we are studying information retrieval, so we need pages about that! But the top results are something else entirely.

 Introduction: the results we want for the query ‘IR’:
For the information-retrieval researcher, the SIGIR homepage.
For the financial analyst, stock quotes for Ingersoll-Rand.
For the chemist, pages about infrared light.

 Two methodologies (1/2). There are two ways for a Web search engine to incorporate information about a user:
1. The user profile is communicated to the server, which personalizes the ranking.
2. The top-ranked results are downloaded to the client and re-ranked there against the user profile.

 Two methodologies (2/2). This work focuses on the second method (client-side re-ranking) for several reasons:
1. It ensures privacy: the profile never leaves the client machine.
2. It is feasible to include computationally intensive procedures.
3. Re-ranking methods facilitate straightforward evaluation.
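The client-side approach can be sketched in a few lines: the engine's top-ranked results come down to the client, which re-sorts them by a personalized score computed locally. This is a minimal Python sketch; the result list, the profile-match scores, and the `rerank` helper are hypothetical illustrations, not the paper's implementation.

```python
def rerank(results, personal_score):
    """Client-side re-ranking: sort the engine's results by a
    personalized score computed locally, so the user profile
    never leaves the machine."""
    return sorted(results, key=personal_score, reverse=True)

# Hypothetical top-ranked results for the query "IR" and a made-up
# profile-match score for an information-retrieval researcher.
results = ["ingersollrand.com", "infrared.example", "sigir.org"]
profile_match = {"sigir.org": 0.9, "infrared.example": 0.3, "ingersollrand.com": 0.1}
reranked = rerank(results, lambda url: profile_match[url])
# For this profile, the SIGIR homepage moves to the top.
```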

Traditional FB vs. Personal Profile FB. In traditional relevance feedback, n_i counts the documents in the corpus that contain term i, and r_i counts the documents for which relevance feedback has been provided that contain term i; the relevance information (R, r_i) comes from the corpus itself. In personal-profile feedback, the profile is instead derived from the user’s personal store.

 BM25 (Traditional FB): a well-known probabilistic weighting scheme. It essentially sums, over the query terms, the log odds of each term occurring in relevant versus non-relevant documents:

score(d) = sum over query terms i of tf_i * w_i

Without relevance information: w_i = log( (N - n_i + 0.5) / (n_i + 0.5) )
With relevance information: w_i = log( ((r_i + 0.5)(N - n_i - R + r_i + 0.5)) / ((n_i - r_i + 0.5)(R - r_i + 0.5)) )

where tf_i is the frequency with which term i appears in the document, N is the number of documents in the corpus, n_i is the number of documents in the corpus that contain term i, R is the number of documents for which relevance feedback has been provided, and r_i is the number of those documents that contain term i.
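The two weights above can be written as one function, since the no-feedback case is just R = r_i = 0. A minimal Python sketch of the Robertson-Sparck Jones weight and the tf-weighted score; the helper names are ours, not the paper's:

```python
import math

def rsj_weight(N, n_i, R=0, r_i=0):
    """BM25 term weight (Robertson-Sparck Jones form).

    With no relevance information (R = r_i = 0) this reduces to the
    IDF-like weight log((N - n_i + 0.5) / (n_i + 0.5))."""
    return math.log(((r_i + 0.5) * (N - n_i - R + r_i + 0.5)) /
                    ((n_i - r_i + 0.5) * (R - r_i + 0.5)))

def bm25_score(tf, weights):
    """Document score: sum of tf_i * w_i over the query terms."""
    return sum(tf.get(t, 0) * w for t, w in weights.items())
```

Setting R = r_i = 0 makes both 0.5 terms cancel, which is how the "without relevance" formula falls out of the "with relevance" one.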

 Personal Profile FB. Personalization uses information outside of the Web corpus, pulling relevant documents from outside the document space and thereby extending the notion of the corpus. Substituting N' = N + R and n_i' = n_i + r_i into the relevance-weighted formula gives

w_i = log( ((r_i + 0.5)(N - n_i + 0.5)) / ((n_i + 0.5)(R - r_i + 0.5)) )
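The substitution can be checked mechanically. A small sketch with made-up numbers: note that after substituting, n_i' - r_i = n_i, so the weight stays well defined even when a term occurs in more personal documents than Web documents (r_i > n_i), which would break the unsubstituted formula.

```python
import math

def personal_weight(N, n_i, R, r_i):
    """Personalized BM25 weight: substitute N' = N + R and
    n_i' = n_i + r_i, treating the personal store as an
    extension of the Web corpus."""
    N_p, n_p = N + R, n_i + r_i
    return math.log(((r_i + 0.5) * (N_p - n_p - R + r_i + 0.5)) /
                    ((n_p - r_i + 0.5) * (R - r_i + 0.5)))

# Made-up numbers: a term rare on the Web (n_i = 100 of N = 1,000,000)
# but common in the personal index (r_i = 500 of R = 10,000) gets a
# large positive weight.
w = personal_weight(1_000_000, 100, 10_000, 500)
```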

 Representation: Corpus (N, n_i) (1/2). Estimating the statistics:
N, the number of documents on the Web, is estimated by issuing the most frequent word in English, “the”, as a query and taking the reported result count.
n_i, the number of documents on the Web that contain term i, is estimated by probing the Web with one-word queries.

 Representation: Corpus (N, n_i) (2/2). Corpus statistics can be gathered either from all of the documents on the Web or from only the subset of documents that are relevant to the query (referred to as a query focus). For example, for the query “IR”, a query-focused corpus consists only of documents that contain the term “IR”. When the corpus representation is limited to a query focus, the user representation is correspondingly query focused.

 Representation: User (R, r_i) (1/2). A rich index of personal content that captures a user’s interests and computational activities can be obtained from desktop indices such as Google Desktop, Mac Tiger, or Windows Desktop Search. In this paper, all of the information created, copied, or viewed by a user was indexed: Web pages, messages, calendar items, and documents stored on the client machine. The most straightforward way to use this index is to treat every document in it as a source of the user’s interests: R is the number of documents in the index, and r_i is the number of documents in the index that contain term i.
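Counting R and r_i from the index is then straightforward. A toy sketch over a hypothetical three-document index (the paper used full desktop indices, not naive token matching like this):

```python
def profile_counts(index, term):
    """R = number of documents in the personal index;
    r_i = number of those documents that contain the term
    (naive whitespace tokenization, for illustration only)."""
    R = len(index)
    r_i = sum(1 for doc in index if term in doc.lower().split())
    return R, r_i

# Hypothetical personal index.
personal_index = [
    "SIGIR 2005 call for papers",
    "IR reading group notes",
    "Lunch calendar item",
]
R, r_i = profile_counts(personal_index, "sigir")  # R = 3, r_i = 1
```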

 Representation: User (R, r_i) (2/2). The profile can also be limited: restricting the document type to Web pages, or limiting the documents to the most recent ones (this paper compares documents indexed in the last month against the full index). Two lighter-weight representations were also considered: using the query terms the user had issued in the past, and boosting search results whose URLs come from domains the user has visited in the past.

 Representation: Document (tf_i) and Query. Using the full text of the documents in the result set is possible, but accessing the full text of each document takes considerable time, so the authors also experimented with using only the title and the snippet returned by the Web search engine; the snippet is inherently query focused. Query expansion: including all of the terms occurring in the relevant documents, a kind of blind (pseudo-) relevance feedback in which the top-k documents are considered relevant.
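Scoring from snippets then amounts to counting term frequencies in the title-plus-snippet text and summing tf_i * w_i over the expanded term set. A minimal sketch; the weights here are made-up stand-ins for the personalized weights above:

```python
from collections import Counter

def score_snippet(snippet_text, term_weights):
    """Score a result from its title + snippet: sum tf_i * w_i over
    every weighted term (query expansion: terms beyond the query
    itself contribute)."""
    tf = Counter(snippet_text.lower().split())
    return sum(tf[t] * w for t, w in term_weights.items())

weights = {"retrieval": 2.0, "infrared": -1.0}  # hypothetical weights
score = score_snippet("Information Retrieval retrieval tutorial", weights)
# tf("retrieval") = 2 and tf("infrared") = 0, so score = 4.0
```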

 Evaluation Framework (1/4). To build an evaluation collection, 15 participants each evaluated the top 50 Web search results for approximately 10 self-selected queries. For each search result, they were asked to judge whether it was highly relevant, relevant, or not relevant to the query. The Web search results came from MSN Search.

 Evaluation Framework (2/4). The queries to be evaluated were selected in two ways:
1. Users chose a query to mimic a search they had performed earlier that day.
2. Users selected a query from a list formulated to be of general interest (e.g., “cancer”, “Bush”, “Web search”).
A total of 131 queries: 53 were pre-selected and 78 were self-generated.

 Evaluation Framework (3/4). Each participant provided an index of the information on their personal computer, ranging in size from 10,000 to 100,000 items; it was used to compute their personalized term weights. All participants were employees of Microsoft: software engineers, researchers, program managers, and administrators.

 Evaluation Framework (4/4). Rankings are scored with Discounted Cumulative Gain (DCG) [9], an IR evaluation method for retrieving highly relevant documents. Cumulative Gain:

CG(i) = G(1) if i = 1, CG(i-1) + G(i) otherwise

DCG additionally discounts gains that appear lower in the ranking:

DCG(i) = G(1) if i = 1, DCG(i-1) + G(i)/log(i) otherwise
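The recursion unrolls into a simple loop. A sketch assuming log base 2 (a common choice; the slide does not fix the base) and a gain assignment of 2 / 1 / 0 for highly relevant / relevant / not relevant (our assumption for the example):

```python
import math

def dcg(gains):
    """DCG(i) = G(1) if i = 1, else DCG(i-1) + G(i) / log2(i)."""
    total = 0.0
    for i, g in enumerate(gains, start=1):
        total += g if i == 1 else g / math.log2(i)
    return total

# A ranked list judged highly relevant, relevant, not relevant.
# Positions 1 and 2 are undiscounted (log2(2) = 1), so the value
# is 2 + 1 + 0 = 3.0.
value = dcg([2, 1, 0])
```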

Results: Alternative Representations (1/2). [Table contrasting richer and poorer models.] When only documents related to the query are used to represent the corpus, the term weights capture how the user differs from the average person who submits that query.

 Results: Alternative Representations (2/2). A rich user profile is more important than a rich document representation. The best of the 67 combinations tested:
Corpus: approximated by the result-set titles and snippets, which are inherently query focused.
User: built from the user’s entire personal index, query focused.
Document and query: documents represented by the title and snippet returned by the search engine, with query expansion based on words that occur near the query terms.

Results: Baseline Comparisons.

Thank you.