Toward A Session-Based Search Engine Smitha Sriram, Xuehua Shen, ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign.

Presentation transcript:

Toward A Session-Based Search Engine Smitha Sriram, Xuehua Shen, ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign U.S.A.

Motivation
Information retrieval is inherently an interactive process
  – A user’s information need is unlikely to be fully satisfied by a single query execution
  – A user often needs to interact with the system several times through query reformulation and document browsing
  – Thus, in general, a query exists within a search session
A search session provides a great deal of contextual information for a query that can be exploited (e.g., previous queries and clickthrough data)
Such contextual information is mostly ignored in existing search engines
We aim to develop a session-based search engine that exploits such contextual information to improve retrieval

Traditional vs. Session-based Retrieval
“IR” can mean either “information retrieval” or “infrared”
Traditional (“1-query”) retrieval system over the document collection
  – Query = “IR applications”
  – Results: D1 (infrared), D2 (infrared), D3 (retrieval), D4 (infrared), D5 (retrieval)
Session-based retrieval system over the same document collection
  – Query = “IR applications”
  – Previous query = “retrieval systems” …
  – Frequency in viewed docs: infrared: 0, retrieval: 5 …
  – Results: D3 (retrieval), D5 (retrieval)
  – Uses more contextual information
  – Gives more accurate results

Research Issues
What is an appropriate architecture for supporting session-based retrieval?
  – How to manage session information?
How can we detect session boundaries?
What contextual information should we exploit?
How can we exploit such contextual information to improve document ranking?
How can we display search results in the context of a session?

A Client-Server Architecture for Session-based IR
[Figure: client-server architecture diagram. Server side: Docs, Search Engine, Top-N results. Client side: User, Personalized Agent, Session Manager, Search context, User model, Local Collection. Arrows labeled “query” and “results”.]

Advantages of Server-Side Processing
Persistent user profiles (imagine a user who often uses different machines)
Access to global user information
  – Can exploit information about all users to identify common access patterns
  – Can exploit information about similar users to help improve performance for any individual user
Access to all the documents
  – Can perform more powerful statistical analysis (e.g., to identify the most frequently accessed docs)
  – Can improve document representation over time

Advantages of a Client-Side Agent
Can capture more information about the user, and thus model the user more accurately
  – Can exploit the complete interaction history (e.g., easily capture click-through information)
  – Can exploit a user’s other activities (e.g., searching immediately after reading an )
  – Can detect session boundaries more accurately
More scalable (“distributed personalization”)
Alleviates the privacy problem of personalization

Session Boundary Detection
Detection is generally easier if done on the client side
  – More information about the user can be exploited
  – E.g., knowing that “logout” and “login” happened between two queries
The server side has access to query co-occurrence patterns, which can help judge query coherence
Possible clues for session boundary detection
  – Time interval between queries
  – Query coherence (based on word relatedness and/or query log analysis)
  – Activities in between two queries
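The first two clues above (time interval and query coherence) can be combined in a simple heuristic. The following is only a minimal sketch under stated assumptions, not the method used in the talk: the names QueryEvent, is_boundary and segment_sessions are hypothetical, the thresholds are arbitrary, and plain word overlap (Jaccard) stands in for the "word relatedness" measure, which the slides do not specify.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueryEvent:
    text: str
    timestamp: float  # seconds; any monotonically increasing clock works


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity; a crude stand-in for query coherence."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def is_boundary(prev: QueryEvent, cur: QueryEvent,
                hard_gap_s: float = 1800.0,   # assumed: long pause always splits
                soft_gap_s: float = 300.0,    # assumed: short pause never splits
                min_overlap: float = 0.2) -> bool:
    gap = cur.timestamp - prev.timestamp
    if gap > hard_gap_s:
        return True
    if gap <= soft_gap_s:
        return False
    # Medium pause: split only when the two queries look unrelated.
    return jaccard(prev.text, cur.text) < min_overlap


def segment_sessions(events: List[QueryEvent]) -> List[List[QueryEvent]]:
    """Group time-ordered query events into sessions."""
    sessions: List[List[QueryEvent]] = []
    for event in events:
        if sessions and not is_boundary(sessions[-1][-1], event):
            sessions[-1].append(event)
        else:
            sessions.append([event])
    return sessions


events = [
    QueryEvent("retrieval systems", 0.0),
    QueryEvent("IR applications", 60.0),
    QueryEvent("cheap flights chicago", 4000.0),
    QueryEvent("cheap flights boston", 4500.0),
]
sessions = segment_sessions(events)
```

A client-side agent could add the third clue (activities between two queries, such as a logout) as an extra condition in is_boundary; a server-side variant could replace jaccard with a coherence score derived from query logs.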

Useful Session Context Information
Previous queries in the same session
Documents viewed and not viewed so far in the current session
Other user activities during the same time as the current session
Context information collected in a similar session by the current user or other users
…

Session-based Retrieval Models
Framework: the risk minimization retrieval framework [Lafferty & Zhai 01, Zhai 02] can be naturally extended to support session-based retrieval
One possible model (KL-divergence model)
  – Retrieval = estimating a query model + estimating a doc model + computing their KL-divergence
  – Session context information (and any other potentially useful information) can be used to estimate a better (session-based) query model
Refinement of this model leads to specific retrieval formulas
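To make the KL-divergence model concrete, the sketch below scores documents by the negative KL-divergence between a query model and a smoothed document model, using the "IR applications" example from the earlier slide. It is an illustration under assumptions rather than the formula from the talk: the toy documents, the Jelinek-Mercer smoothing with weight lam, and the hand-set session-based query model are all hypothetical choices.

```python
import math
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram distribution over a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def smoothed_doc_prob(doc_tokens, collection_model, lam=0.5, floor=1e-9):
    """Document model interpolated with the collection model (assumed smoothing)."""
    doc_model = unigram_model(doc_tokens)

    def prob(word):
        background = collection_model.get(word, floor)
        return (1.0 - lam) * doc_model.get(word, 0.0) + lam * background

    return prob


def neg_kl_score(query_model, doc_prob):
    """Ranking score: -KL(query model || document model); higher is better."""
    return -sum(p * math.log(p / doc_prob(w))
                for w, p in query_model.items() if p > 0.0)


# Toy collection (hypothetical): one "infrared" document, one "retrieval" document.
docs = {
    "D1": "ir infrared sensor applications".split(),
    "D3": "ir retrieval systems applications".split(),
}
collection_model = unigram_model([t for toks in docs.values() for t in toks])

# Query model from the current query alone.
query_only = {"ir": 0.5, "applications": 0.5}
# Session-based query model: the current query mixed half-and-half with the
# previous query "retrieval systems" (the mixing weight is an assumption).
session_based = {"ir": 0.25, "applications": 0.25,
                 "retrieval": 0.25, "systems": 0.25}


def rank(query_model):
    scores = {name: neg_kl_score(query_model,
                                 smoothed_doc_prob(toks, collection_model))
              for name, toks in docs.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


session_order, session_scores = rank(session_based)
_, query_only_scores = rank(query_only)
```

With the query-only model the two toy documents receive the same score, since both contain "ir" and "applications" equally often; the session-based query model breaks the tie in favor of the retrieval document.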

Session-based Result Presentation
Retrieval results can be displayed in the context of the current session
  – Previous search results in the session can be exploited to show which document has been consistently moving up in ranking as the user reformulates the query
  – All the queries in the session can be combined and analyzed to generate a subtopic space for the user’s information need, and documents can be organized and displayed in this space
Session-based result presentation can
  – Help a user digest the search results more effectively and more efficiently
  – Help a user quickly focus on the important concept/topic dimensions
  – Help a user figure out how to better formulate a query
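The first presentation idea, showing which documents keep moving up as the query is reformulated, only needs the ranked lists returned for each query in the session. The sketch below is one possible way to compute it; the function names and the example rankings are hypothetical and not taken from the talk.

```python
def rank_history(result_lists):
    """Map each doc id to its rank (1-based) under every query in the session.

    result_lists holds one ranked list of doc ids per query, oldest first.
    A doc missing from a list gets None for that query.
    """
    history = {}
    for i, results in enumerate(result_lists):
        for rank, doc in enumerate(results, start=1):
            history.setdefault(doc, [None] * len(result_lists))[i] = rank
    return history


def consistently_rising(history):
    """Docs retrieved for every query whose rank strictly improved each time."""
    rising = []
    for doc, ranks in history.items():
        if len(ranks) < 2 or None in ranks:
            continue
        if all(later < earlier for earlier, later in zip(ranks, ranks[1:])):
            rising.append(doc)
    return sorted(rising)


# Hypothetical rankings for three successive queries in one session.
session_results = [
    ["D1", "D2", "D4", "D3", "D5"],
    ["D1", "D3", "D2", "D5", "D4"],
    ["D3", "D5", "D1", "D2", "D4"],
]
history = rank_history(session_results)
rising = consistently_rising(history)
```

The per-document rank lists in history are also what a display would need in order to show the rank of each document with respect to all previous queries.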

ACES: A Contextual Engine for Search
Architecture: server-side session management
Session-boundary detection: probabilistic measure of query similarity
Session-based ranking: use the KL-divergence retrieval model and estimate a query model based on
  – Original query
  – Displayed title and summary of viewed documents in the same session
  – Previous queries in the same search session
Session-based result display: show the ranks of each doc w.r.t. all the previous queries

ACES System Architecture
[Figure: system architecture diagram. Components: Web Browser, Internet, Web/Application Server, Search Engine, Profile Capture, Text DB, RDBMS, User Profile. Data flows: Query, Clickthrough Data, Search Result, Document Text.]

Details of the Ranking Algorithm
Query model updating using past queries q_1, q_2, …, q_k
Further query model updating using the displayed title and summary of the viewed documents s_1, s_2, …, s_k
A decay factor emphasizes the most recent context
A second parameter controls the influence of the clickthrough data
Currently all parameters are set in an ad hoc way
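The update formulas themselves did not survive in this transcript, so the following is only a hypothetical sketch of how such an update could look, not the ACES formula. It assumes unigram models, an exponential decay over past queries and viewed summaries (parameter decay), and a linear interpolation between the query-based and clickthrough-based models (parameter click_weight); both names and their default values are made up for illustration.

```python
from collections import Counter, defaultdict


def unigram(text):
    """Maximum-likelihood unigram model of a whitespace-tokenized text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def decayed_mixture(texts, decay):
    """Mix the unigram models of texts (oldest first).

    The most recent text gets weight 1, the one before it decay, then
    decay**2, and so on; weights are normalized to sum to 1.
    """
    if not texts:
        return {}
    n = len(texts)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    z = sum(weights)
    mixed = defaultdict(float)
    for text, weight in zip(texts, weights):
        for word, p in unigram(text).items():
            mixed[word] += (weight / z) * p
    return dict(mixed)


def session_query_model(queries, summaries, decay=0.5, click_weight=0.3):
    """Query model from session queries, then updated with viewed summaries.

    queries: q_1..q_k with the current query last.
    summaries: displayed title+summary texts of viewed docs, oldest first.
    """
    q_model = decayed_mixture(queries, decay)
    if not summaries:
        return q_model
    s_model = decayed_mixture(summaries, decay)
    words = set(q_model) | set(s_model)
    return {w: (1.0 - click_weight) * q_model.get(w, 0.0)
               + click_weight * s_model.get(w, 0.0)
            for w in words}


model = session_query_model(
    queries=["retrieval systems", "IR applications"],
    summaries=["Text retrieval systems overview"],
)
```

The resulting distribution can be used directly as the query model in a KL-divergence ranker. In this example the words of the current query keep the largest weights, while "retrieval" and "systems" receive support from both the previous query and the viewed summary.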

Demo: Exploiting Previous Queries in ACES
TREC AP data + Topics judgments
Allows us to compare traditional search and contextual search
ACES is still far from a full-fledged session-based search engine…
Much further research needs to be done…

Architecture of Personalized System
[Figure: client-server architecture diagram. Server side: Docs, Search Engine, Top-N results. Client side: Personalized Agent, Session Manager, Search context, User model, Profile Collection. Arrows labeled “query” and “results”.]

[Figure: generative model diagram. Node labels: C, U, S, θ_Q, θ_D, q, d. Step labels: Model Selection, Query generation, Document generation.]

[Figure: system architecture diagram. Components: Web Browser, Internet, Web/Application Server, Search Engine, Context Capturer, AP Text DB, RDBMS, User Profile. Data flows: Query, Clickthrough Data, Search Result, Document Text.]