
Toward A Session-Based Search Engine Smitha Sriram, Xuehua Shen, ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign.


1 Toward A Session-Based Search Engine Smitha Sriram, Xuehua Shen, ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign U.S.A.

2 Motivation
Information retrieval is inherently an interactive process
– A user’s information need is unlikely to be fully satisfied by a single query execution
– A user often needs to interact with the system several times, reformulating queries and browsing documents
– Thus, in general, a query occurs within a search session
A search session provides rich contextual information for a query that can be exploited (e.g., previous queries and clickthrough data)
Such contextual information is mostly ignored by existing search engines
We aim to develop a session-based search engine that exploits such contextual information to improve retrieval

3 Traditional vs. Session-based Retrieval
“IR” can mean either “information retrieval” or “infrared”
Traditional (“1-query”) retrieval system over the document collection
– Query = “IR applications”
– Results: D1 (infrared), D2 (infrared), D3 (retrieval), D4 (infrared), D5 (retrieval)
Session-based retrieval system
– Query = “IR applications”
– Context: previous query = “retrieval systems”, …; frequency in viewed docs: infrared 0, retrieval 5, …
– Results: D3 (retrieval), D5 (retrieval)
– Uses more contextual information and gives more accurate results
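To make the slide's example concrete, here is a minimal Python sketch, assuming each result is tagged with the sense of “IR” it uses and that the session context is just a bag of word counts drawn from the previous query and the viewed documents. The names `RESULTS`, `build_context`, and `rerank` are hypothetical illustrations, not part of the authors' system; a real engine would score documents with a retrieval model rather than a sense tag.

```python
from collections import Counter

# Hypothetical toy data mirroring the slide: (doc_id, sense of "IR" in the doc),
# listed in the traditional, context-free ranking order.
RESULTS = [("D1", "infrared"), ("D2", "infrared"), ("D3", "retrieval"),
           ("D4", "infrared"), ("D5", "retrieval")]


def build_context(previous_queries, viewed_word_counts):
    """Merge word counts from viewed documents with words from earlier queries."""
    context = Counter(viewed_word_counts)
    for query in previous_queries:
        context.update(query.lower().split())
    return context


def rerank(results, context):
    """Order results by how strongly the session context supports their sense.

    sorted() is stable, so results with equal support keep their original order.
    """
    return sorted(results, key=lambda r: -context[r[1]])


context = build_context(["retrieval systems"], {"infrared": 0, "retrieval": 5})
print([doc_id for doc_id, _ in rerank(RESULTS, context)])
# prints ['D3', 'D5', 'D1', 'D2', 'D4']
```

Because the sort is stable, the infrared documents are not dropped; they simply move below the retrieval documents that the session context supports.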

4 Research Issues
What is an appropriate architecture for supporting session-based retrieval?
– How should session information be managed?
How can we detect session boundaries?
What contextual information should we exploit?
How can we exploit such contextual information to improve document ranking?
How can we display search results in the context of a session?

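5 A Client-Server Architecture for Session-based IR
[Diagram] Server side: a search engine over the document collection returns the top-N results (1. …, 2. …, 3. …) for a query.
Client side: a personalized agent, containing a session manager, a user model, the search context, and a local collection, passes the user's query to the server side and returns the results to the user.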

6 Advantages of Server-Side Processing
Persistent user profiles (useful when a user often switches between machines)
Access to global user information
– Can exploit information about all users to identify common access patterns
– Can exploit information about similar users to improve performance for each individual user
Access to all the documents
– Can perform more powerful statistical analysis (e.g., to identify the most frequently accessed docs)
– Can improve document representation over time

7 Advantages of a Client-Side Agent
Can capture more information about the user, and thus model the user more accurately
– Can exploit the complete interaction history (e.g., easily capture clickthrough information)
– Can exploit a user’s other activities (e.g., searching immediately after reading an email)
– Can detect session boundaries more accurately
More scalable (“distributed personalization”)
Alleviates privacy concerns about personalization

8 Session Boundary Detection
Detection is generally easier on the client side
– More information about the user can be exploited
– E.g., knowing that a “logout” and a “login” happened between two queries
The server side has access to query co-occurrence patterns, which can help judge query coherence
Possible clues for session boundary detection
– Time interval between queries
– Query coherence (based on word relatedness and/or query log analysis)
– Activities between two queries
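The clues above can be combined in many ways. The Python sketch below is one hypothetical rule set: a client-observed logout forces a boundary, a long gap forces one, a short gap never does, and in between it falls back on word overlap as a crude stand-in for query coherence. All thresholds (30 minutes, 5 minutes, 0.1 overlap) and all names are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class QueryEvent:
    text: str
    timestamp: float                 # seconds since some fixed origin
    logged_out_before: bool = False  # client-side clue: logout/login since the previous query


def word_overlap(a, b):
    """Jaccard overlap of the two queries' word sets (a crude coherence proxy)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def is_new_session(prev, cur, max_gap=30 * 60, short_gap=5 * 60, min_overlap=0.1):
    """Decide whether `cur` starts a new session; all thresholds are hypothetical."""
    if cur.logged_out_before:         # activity between the queries
        return True
    gap = cur.timestamp - prev.timestamp
    if gap > max_gap:                 # long time interval
        return True
    if gap <= short_gap:              # quick follow-up: assume a reformulation
        return False
    return word_overlap(prev.text, cur.text) < min_overlap  # query coherence


def segment(events):
    """Split a time-ordered list of QueryEvents into sessions."""
    sessions = []
    for event in events:
        if sessions and not is_new_session(sessions[-1][-1], event):
            sessions[-1].append(event)
        else:
            sessions.append([event])
    return sessions


events = [
    QueryEvent("retrieval systems", 0),
    QueryEvent("IR applications", 60),
    QueryEvent("IR applications survey", 660),
    QueryEvent("cheap flights", 1560),
    QueryEvent("cheap flights paris", 1620, logged_out_before=True),
]
print([len(s) for s in segment(events)])  # prints [3, 1, 1]
```

Plain word overlap would wrongly separate “retrieval systems” from “IR applications”; only the short-gap rule keeps them together here. A better coherence measure would need the word-relatedness or query-log evidence listed on the slide.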

9 Useful Session Context Information
Previous queries in the same session
Documents viewed and not viewed so far in the current session
Other user activities during the current session
Context information collected in similar sessions by the current user or other users
…

10 Session-based Retrieval Models
Framework: the risk minimization retrieval framework [Lafferty & Zhai 01, Zhai 02] can be naturally extended to support session-based retrieval
One possible model: the KL-divergence model
– Retrieval = estimating a query model + estimating a document model + computing their KL-divergence
– Session context information (and any other potentially useful information) can be used to estimate a better (session-based) query model
Refining this model leads to specific retrieval formulas
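Under the KL-divergence model, a document d is ranked for a query model θ_Q by the negative divergence −D(θ_Q ‖ θ_D) = −Σ_w p(w|θ_Q) log [p(w|θ_Q) / p(w|θ_D)]. The Python sketch below assumes Dirichlet-prior smoothing for the document model and builds a session-based query model by simply interpolating the current query with the previous one (weight 0.5). Both choices, the toy documents, and all function names are illustrative assumptions, not the authors' estimators.

```python
import math
from collections import Counter


def mle(tokens):
    """Maximum-likelihood unigram model: p(w) = c(w) / total count."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def interpolate(a, b, weight):
    """Mixture weight * a + (1 - weight) * b over the union of both vocabularies."""
    return {w: weight * a.get(w, 0.0) + (1 - weight) * b.get(w, 0.0)
            for w in set(a) | set(b)}


def doc_model(doc_tokens, collection, mu=2000.0, floor=1e-9):
    """Dirichlet-smoothed p(w | theta_D); `floor` covers words unseen in the collection."""
    counts = Counter(doc_tokens)
    n = len(doc_tokens)
    return lambda w: (counts[w] + mu * collection.get(w, floor)) / (n + mu)


def score(query_model, d_model):
    """-D(theta_Q || theta_D): larger means the document fits the query model better."""
    return -sum(p * math.log(p / d_model(w))
                for w, p in query_model.items() if p > 0)


docs = {
    "D1": "ir infrared sensor applications".split(),
    "D3": "ir information retrieval applications".split(),
}
collection = mle([w for tokens in docs.values() for w in tokens])
current = mle("ir applications".split())
session = interpolate(current, mle("retrieval systems".split()), 0.5)

plain = {d: score(current, doc_model(t, collection, mu=10)) for d, t in docs.items()}
contextual = {d: score(session, doc_model(t, collection, mu=10)) for d, t in docs.items()}
print(plain)
print(contextual)
```

With the current query alone the two documents tie, since each contains “ir” and “applications” once; mixing in “retrieval systems” lets the session-based query model prefer D3.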

11 Session-based Result Presentation
Retrieval results can be displayed in the context of the current session
– Previous search results in the session can be exploited to show which documents have been consistently moving up in the ranking as the user reformulates the query
– All the queries in the session can be combined and analyzed to generate a subtopic space for the user’s information need, and documents can be organized and displayed in this space
Session-based result presentation can
– Help a user digest the search results more effectively and efficiently
– Help a user quickly focus on the important concepts or topic dimensions
– Help a user figure out how to formulate a better query
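Tracking how each document's rank changes across the queries of a session needs only the ranked lists themselves. A minimal Python sketch, assuming each query's results are available as a list of document IDs; treating “not retrieved” as worse than any rank, and calling a document rising when its rank never worsens and ends better than it started, are hypothetical definitions chosen for illustration.

```python
import math


def rank_history(result_lists):
    """Map each document to its 1-based rank under every query in the session.

    result_lists holds one ranked list of doc IDs per query, oldest query first;
    a rank of None means the document was not retrieved for that query.
    """
    docs = {d for results in result_lists for d in results}
    return {d: [results.index(d) + 1 if d in results else None
                for results in result_lists]
            for d in docs}


def rising_documents(history):
    """Documents whose rank never gets worse and ends better than it started.

    "Not retrieved" is treated as worse than any rank (a hypothetical choice).
    """
    rising = []
    for doc, ranks in history.items():
        r = [math.inf if x is None else x for x in ranks]
        never_worse = all(b <= a for a, b in zip(r, r[1:]))
        if never_worse and r[-1] < r[0]:
            rising.append(doc)
    return sorted(rising)


session_results = [
    ["D1", "D2", "D4"],  # results for the 1st query
    ["D3", "D1", "D5"],  # results for the 2nd query
    ["D3", "D5", "D1"],  # results for the 3rd (current) query
]
history = rank_history(session_results)
print(history["D5"])               # prints [None, 3, 2]
print(rising_documents(history))   # prints ['D3', 'D5']
```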

12 ACES: A Contextual Engine for Search
Architecture: server-side session management
Session boundary detection: a probabilistic measure of query similarity
Session-based ranking: use the KL-divergence retrieval model and estimate a query model based on
– The original query
– The displayed titles and summaries of documents viewed in the same session
– Previous queries in the same search session
Session-based result display: show the rank of each document with respect to each previous query
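The slide does not spell out ACES's probabilistic similarity measure. As one hypothetical instance, the Python sketch below scores a new query by the average per-word log-likelihood ratio between a smoothed model of the previous query and a background model; a positive score suggests the new query continues the session. The optional `related` table stands in for word-relatedness evidence such as server-side query-log co-occurrence; its contents, the background probabilities, and the smoothing weight `lam` are made-up assumptions.

```python
import math
from collections import Counter


def query_model(query, related=None):
    """p(w | theta_q): the query's MLE, optionally spread onto related words.

    related: hypothetical table word -> {word: probability}, each row summing
    to 1, e.g. mined from query-log co-occurrence on the server side.
    """
    related = related or {}
    counts = Counter(query.lower().split())
    total = sum(counts.values())
    model = Counter()
    for u, c in counts.items():
        for w, p in related.get(u, {u: 1.0}).items():
            model[w] += (c / total) * p
    return model


def query_similarity(prev_query, cur_query, background, related=None,
                     lam=0.5, floor=1e-4):
    """Average per-word log-likelihood ratio of cur_query under the previous
    query's smoothed model versus the background model alone.

    A positive value means the previous query explains cur_query better than
    the background does, which suggests the same session.
    """
    theta = query_model(prev_query, related)
    words = cur_query.lower().split()
    if not words:
        return 0.0
    total = 0.0
    for w in words:
        p_bg = background.get(w, floor)
        total += math.log((lam * theta[w] + (1 - lam) * p_bg) / p_bg)
    return total / len(words)


background = {"ir": 0.001, "applications": 0.001}  # made-up collection probabilities
related = {"retrieval": {"retrieval": 0.6, "ir": 0.2, "search": 0.2}}
print(query_similarity("retrieval systems", "IR applications", background))
print(query_similarity("retrieval systems", "IR applications", background, related))
```

Without the related-word table the two queries share no words and the score is negative; with it, “ir” becomes likely under the previous query's model and the score turns positive.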

13 ACES System Architecture
[Diagram] The web browser exchanges queries, clickthrough data, search results, and document text over the Internet with the web/application server. Server-side components: search engine, profile capture, text DB, and an RDBMS holding the user profile.

14 Details of the Ranking Algorithm
Query model updating using the past queries q1, q2, …, qk
Further query model updating using the displayed titles and summaries of the viewed documents s1, s2, …, sk
– A decay factor emphasizes the most recent context
– A second parameter controls the influence of the clickthrough data
Currently, all parameters are set ad hoc
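The update formulas themselves are not reproduced in this transcript, so the following Python sketch assumes one plausible form: past queries q1, …, qk are mixed with weights proportional to decay^(k−i), so the newest query counts most; the viewed summaries are mixed the same way; and a parameter `beta` interpolates the summary model into the query model. The names `decay` and `beta`, their default values, and the exact mixing scheme are hypothetical; they only mirror the roles the slide describes.

```python
from collections import Counter


def mle(text):
    """Maximum-likelihood unigram model of a (non-empty) text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def decayed_mixture(texts, decay):
    """Mix the unigram models of texts (oldest first); with k texts, the i-th
    (1-based) gets weight proportional to decay ** (k - i), so the newest counts most."""
    k = len(texts)
    weights = [decay ** (k - 1 - i) for i in range(k)]  # i is 0-based here
    z = sum(weights)
    model = Counter()
    for weight, text in zip(weights, texts):
        for w, p in mle(text).items():
            model[w] += weight / z * p
    return dict(model)


def session_query_model(queries, summaries, decay=0.5, beta=0.3):
    """Hypothetical session-based query model: decayed mixture of past queries,
    interpolated with a decayed mixture of viewed titles/summaries (weight beta)."""
    theta_q = decayed_mixture(queries, decay)
    if not summaries:
        return theta_q
    theta_s = decayed_mixture(summaries, decay)
    return {w: (1 - beta) * theta_q.get(w, 0.0) + beta * theta_s.get(w, 0.0)
            for w in set(theta_q) | set(theta_s)}


model = session_query_model(
    ["retrieval systems", "IR applications"],   # q1, q2 (q2 is the newest)
    ["information retrieval applications"],     # s1: summary of a viewed doc
)
for word, p in sorted(model.items(), key=lambda item: -item[1]):
    print(f"{word:12s} {p:.3f}")
```

With `decay=0.5`, the newer query “IR applications” gets twice the weight of “retrieval systems” before the summaries are mixed in.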

15 Demo: Exploiting Previous Queries in ACES
Data: TREC AP data, topics 1-150, and relevance judgments
Allows us to compare traditional search with contextual search
ACES is still far from a full-fledged session-based search engine; much more research is needed

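16 Architecture of Personalized System
[Diagram] Server side: a search engine over the document collection returns the top-N results (1. …, 2. …, 3. …) for a query.
Client side: a personalized agent, containing a session manager, a user model, the search context, and a profile collection, sends the query to the server side and receives the results.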

17 [Figure: generative model with variables C, U, S, θQ, θD, q, and d; labeled parts: model selection, query generation (θQ → q), document generation (θD → d)]

18 [Diagram] The web browser exchanges queries, clickthrough data, search results, and document text over the Internet with the web/application server. Server-side components: search engine, context capturer, AP text DB, and an RDBMS holding the user profile.

