Published by Buck Booker. Modified over 8 years ago.
1
Towards Inferring Searcher Intent
Eugene Agichtein
2
Intelligent Information Access Lab (IRLab)
Eugene Agichtein, Emory University, IR Lab
– Qi Guo (3rd-year PhD)
– Ablimit Aji (2nd-year PhD)
Text and data mining; modeling information-seeking behavior; Web search and social media search; tools for medical informatics and public health
In collaboration with:
– Beth Buffalo (Neurology)
– Charlie Clarke (Waterloo)
– Ernie Garcia (Radiology)
– Phil Wolff (Psychology)
– Hongyuan Zha (GaTech)
1st-year graduate students: Julia Kiseleva, Dmitry Lagun, Qiaoling Liu, Wang Yu
3
Online Behavior and Interactions
– Information sharing: blogs, forums, discussions
– Search logs: queries, clicks
– Client-side behavior: gaze tracking, mouse movement, scrolling
4
Research Overview
[Figure: discovering models of behavior (machine learning/data mining), applied to information sharing, health informatics, cognitive diagnostics, and intelligent search]
5
Main Application Areas
– Search: ranking, evaluation, advertising, search interfaces, medical search (clinicians, patients)
– Collaborative information sharing: searcher intent, success, expertise, content quality
– Health informatics: self-reporting of drug side effects, co-morbidity, outreach/education
– Automatic cognitive diagnostics: stress, frustration, other impairments, …
6
Talk Outline
– Overview of the Emory IR Lab
– Intent-centric Web search
  – Classifying intent of a query
  – Contextualized search intent detection
7
Web Retrieval Architecture
[Figure: example centralized parallel architecture, with crawlers fetching pages from the Web] [from Baeza-Yates and Jones, WWW 2008 tutorial]
8
Information Retrieval Process (User View)
[Figure: source selection → query formulation → search → selection from the ranked list → examination of documents → delivery, with feedback loops for query reformulation, vocabulary learning, relevance feedback, and source reselection]
9
Some Key Challenges for Web Search
– Query interpretation (infer intent)
– Ranking (high dimensionality)
– Evaluation (system improvement)
– Result presentation (information visualization)
10
Intent Is a “Hidden State” Generating Actions
First (naïve) generative model of user actions:
– Given a state (e.g., “Unsatisfied” with the results), the user generates actions such as query, click, and browse
[Figure: intent “states”: Satisfied, Unsatisfied]
11
Problem Statement
Given: a sequence of user actions and background knowledge
Predict: user intent and future actions (intent classes and actions are defined next)
Example applications:
– Predict document relevance (ranking, result presentation, summarization)
– Predict the next query (query suggestion, spelling correction)
– Predict user satisfaction (market share)
12
Intent Classes (top level only)
User intent taxonomy (Broder 2002):
– Informational: want to learn about something (~40% / 65%)
– Navigational: want to go to that page (~25% / 15%)
– Transactional: want to do something, web-mediated (~35% / 20%), e.g., access a service, downloads, shop
– Gray areas: find a good hub; exploratory search (“see what’s there”)
Example queries: History nonya food; Singapore Airlines; Jakarta weather; Kalimantan satellite images; Nikon Finepix; Car rental Kuala Lumpur
[from SIGIR 2008 tutorial, Baeza-Yates and Jones]
13
Search Actions
– Keystrokes: query, scroll, CTRL-C, …
– GUI: scrolling, button press, clicks
– Mouse: moving, scrolling, down/up
– Browser: new tab, close, back/forward
All of these can be easily captured on the SERP (JavaScript).
14
Problem 1: Detect Query Intent
Query intent detection in multiple dimensions:
– Commercial, if the assumed purpose of the query is to make an immediate or future purchase
– Navigational, if the assumed purpose of the query is to locate a specific website; informational otherwise
Clickthrough calculation: estimating the average ad clickthrough rate for each query type
[Ashkan et al., ECIR 2009]
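A minimal sketch of the per-type clickthrough calculation. It assumes CTR is computed as total ad clicks divided by the number of search impressions for queries of that type, and it uses a hypothetical log of (intent label, ad clicks) pairs; the function name `ctr_by_intent` and the label strings are illustrative, not from the paper.

```python
from collections import defaultdict


def ctr_by_intent(impressions):
    """Average ad clickthrough rate per query intent class.

    impressions: iterable of (intent_label, ad_clicks) pairs, one per
    search impression (a hypothetical log format).
    """
    clicks = defaultdict(int)
    shown = defaultdict(int)
    for label, ad_clicks in impressions:
        shown[label] += 1          # one impression of this intent class
        clicks[label] += ad_clicks  # ad clicks attributed to it
    return {label: clicks[label] / shown[label] for label in shown}


log = [("commercial", 1), ("commercial", 0), ("commercial", 1),
       ("noncommercial", 0), ("noncommercial", 1),
       ("noncommercial", 0), ("noncommercial", 0)]
print(ctr_by_intent(log))
```

In this toy log, commercial impressions have a CTR of 2/3 and noncommercial ones 1/4.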
15
Dataset Construction
Microsoft adCenter search query log:
– ~100M search impressions
– ~8M ad clicks associated with the impressions
Seed: 1,700 queries labeled by three researchers
– Examine the query and the search result page (SRP)
MTurk: 3,000 new queries + 1,000 seed queries
– 40 batches of 100 queries, each with 25 seed and 75 MTurk queries
– Bonus if agreement reaches 75%
Results after resolution:
– 42% commercial; 55% navigational
[Ashkan et al., ECIR 2009]
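A sketch of how the MTurk batches could be assembled and the agreement bonus checked. Assumptions: the seed queries act as gold items hidden among the new ones, and the bonus is paid when a worker's labels match the seed labels on at least 75% of the seed queries they labeled. `make_batches` and `earns_bonus` are hypothetical names.

```python
import random


def make_batches(seed_queries, new_queries, n_batches=40, n_seed=25,
                 n_new=75, rng_seed=0):
    """Mix gold (seed) queries into each MTurk batch of new queries."""
    if (len(seed_queries) < n_batches * n_seed
            or len(new_queries) < n_batches * n_new):
        raise ValueError("not enough queries for the requested batches")
    rng = random.Random(rng_seed)
    batches = []
    for b in range(n_batches):
        batch = (list(seed_queries[b * n_seed:(b + 1) * n_seed])
                 + list(new_queries[b * n_new:(b + 1) * n_new]))
        rng.shuffle(batch)  # workers cannot tell which items are seed queries
        batches.append(batch)
    return batches


def earns_bonus(worker_labels, seed_labels, threshold=0.75):
    """True if the worker agrees with the seed labels on at least
    `threshold` of the seed queries they labeled.

    worker_labels, seed_labels: dicts mapping query -> intent label.
    """
    shared = [q for q in seed_labels if q in worker_labels]
    if not shared:
        return False
    agreed = sum(worker_labels[q] == seed_labels[q] for q in shared)
    return agreed / len(shared) >= threshold
```

With 1,000 seed and 3,000 new queries, the defaults produce exactly 40 batches of 100, matching the slide's arithmetic (40 × 25 = 1,000 and 40 × 75 = 3,000).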
16
Amazon Mechanical Turk Service
17
Use Support Vector Machine (SVM) Classifier
[Figure: separating hyperplane with support vectors and maximized margin]
– SVMs maximize the margin around the separating hyperplane (a.k.a. large-margin classifiers).
– The decision function is fully specified by a subset of training samples, the support vectors.
– Training is a quadratic programming problem.
– Seen by many as the most successful current text classification method.
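The margin-maximization idea on this slide is usually written as the quadratic program below. This is a sketch of the standard soft-margin formulation; the slack variables ξ_i and the penalty C are assumptions, since the slide does not say which SVM variant was used. The margin width is 2/‖w‖, so minimizing ‖w‖² maximizes it. In the dual solution, α_i is nonzero only for the support vectors, which is why the decision function depends on them alone (for a linear kernel, K(x_i, x) = x_iᵀx).

```latex
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\; \frac{1}{2}\lVert\mathbf{w}\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{s.t.}\quad y_i\,(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1 - \xi_i,\;\; \xi_i \ge 0,\;\; i = 1,\dots,n

f(\mathbf{x}) = \operatorname{sign}\Big(\sum_{i \in \mathrm{SV}} \alpha_i\, y_i\, K(\mathbf{x}_i, \mathbf{x}) + b\Big)
```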
18
Features for Classification
Query-specific features:
– Query length: number of characters in the query string
– Query segments: number of words in the query string
– URL element: whether the query string contains any URL element, such as .com or .org
– Organic domain: total number of domains listed among the organic results of which the query string is a substring
Content features:
– SERP: frequency of keywords extracted from the first search result page
Clickthrough features:
– Host #: number of different target ad hosts clicked as results of the query
– Click per host: total number of ad clicks recorded for the query divided by Host #
– Top host significance: number of times a click happens on the most frequent target host as a result of the query, divided by click per host
– Decrease level for top two hosts: number of times a click happens on the most frequent target host divided by the number of times the second most frequent target host receives a click
– Average substring #: number of target hosts of which the query is a substring divided by the total number of different hosts clicked for the query
– Substring ratio: total number of clicks on target hosts of which the query is a substring divided by the total number of ad clicks for the query
– Deliberation time: average time between entering a query and an ad click for that query
[Ashkan et al., ECIR 2009]
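A sketch of several of the query-specific and clickthrough features listed on this slide. It assumes each query's ad clicks are available as (target host, seconds from query to click) pairs. The organic domain and SERP content features need search-result data and are omitted. `URL_ELEMENTS` and the function names are hypothetical, and the "query is a substring of the host" test is assumed to be a plain lowercase containment check.

```python
from collections import Counter
from statistics import mean

# Hypothetical list of URL fragments that count as a "URL element".
URL_ELEMENTS = (".com", ".org", ".net", ".edu", "www.")


def query_features(query):
    return {
        "query_length": len(query),            # characters
        "query_segments": len(query.split()),  # words
        "url_element": any(e in query.lower() for e in URL_ELEMENTS),
    }


def clickthrough_features(query, ad_clicks):
    """ad_clicks: list of (target_host, seconds_from_query_to_click)."""
    if not ad_clicks:
        return {}
    per_host = Counter(host for host, _ in ad_clicks).most_common()
    total = len(ad_clicks)
    n_hosts = len(per_host)
    clicks_per_host = total / n_hosts
    top = per_host[0][1]  # clicks on the most frequent target host
    q = query.lower()
    substring_hosts = [(h, c) for h, c in per_host if q in h.lower()]
    return {
        "host_count": n_hosts,
        "clicks_per_host": clicks_per_host,
        "top_host_significance": top / clicks_per_host,
        # Undefined when only one host was clicked.
        "decrease_level": top / per_host[1][1] if n_hosts > 1 else None,
        "avg_substring": len(substring_hosts) / n_hosts,
        "substring_ratio": sum(c for _, c in substring_hosts) / total,
        "deliberation_time": mean(t for _, t in ad_clicks),
    }
```

For example, the query "nikon" with three clicks on nikon.com and one on another host gets a substring ratio of 0.75 and a decrease level of 3.0.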
19
Intent Classification: Results
[Ashkan et al., ECIR 2009]
20
Clickthrough for Varying Intent
[Ashkan et al., ECIR 2009]
21
Talk Outline
– Overview of the Emory IR Lab
– Intent-centric Web search
  – Classifying intent of a query
  – Contextualized search intent detection
22
How Do We Know the “True” User Intent?
– Ask the user (surveys, field studies, pop-ups): does not scale, and users get annoyed
– Observe user actions and guess: intent is usually obvious to humans, but not always
– Detect signals from the user’s brain (fMRI, EEG) and attempt to interpret neuron activity
Adapted from [Daniel M. Russell, 2007]
23
“Eyes Are a Window to the Soul”
Eye tracking gives information about search interests:
– Eye position
– Pupil diameter
– Saccades and fixations
[Figure: eye-tracking camera; gaze patterns for reading vs. visual search]
24
“An Eye Tracker on Every Table”
And “a nuclear reactor in every backyard”… Unlikely: eye-tracking equipment is bulky and expensive.
Can we infer gaze position from observable actions?
An exploratory study from Google (Rodden et al.) says maybe: mouse position is sometimes related to eye position.
25
Relationship Between Mouse and Gaze Position
Searchers might use the mouse to focus reading attention, to bookmark promising results, or not at all. Behavior varies with task difficulty and user expertise.
[K. Rodden, X. Fu, A. Aula, and I. Spiro, Eye-mouse coordination patterns on web search results pages, Extended Abstracts of ACM CHI 2008]
26
Assume “Transitivity” Holds
Given:
– Gaze position ==> user intent, and
– Mouse movement ==> gaze position
Then: mouse movement ==> user intent
Restated problem: given user actions, infer the current user’s intent, focusing on the individual user’s actions.
27
From Query Type to Search Intent
[Figure: the query “obama” can be navigational or informational]
Other examples:
– Query bookmarks (re-finding): ~40% of queries (J. Teevan et al., SIGIR 2007)
– Research vs. immediate purchase
It is incorrect to classify a query into a single intent; instead, classify the user goal for each query instance.
28
Dataset Creation: EMU
– Firefox + LibX plugin
– Tracks whitelisted sites, e.g., Emory, Google, Yahoo search, …
– All SERP events logged (asynchronous HTTP requests)
– 150 public-use machines, ~5,000 opted-in users
[Figure: HTTP log → HTTP server → usage data → data mining & management → train prediction models]
29
EMU: Querying Behavior Data
30
Playback Example
31
Problem 1: Search Intent Classification
Infer the “personalized” intent {NAV, INFO, TRANSACT} for each search instance using EMU instrumentation
– “obama”: instance 1 is NAV, but instance 2 is INFO
Focus:
– Contribution of client/GUI events (mouse movements)
32
Navigational query: “facebook”
33
Informational query: “spanish wine”
34
Transactional query: “integrator”
35
Mouse Features: Simple
First representation:
– Trajectory length
– Horizontal range
– Vertical range
[Figure: a mouse trajectory annotated with its length and its horizontal and vertical ranges]
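A minimal sketch of the first (simple) representation, assuming the cursor path is logged as a time-ordered list of (x, y) pixel positions. The function name and input format are hypothetical; trajectory length is taken as the sum of straight-line distances between consecutive samples.

```python
import math


def simple_mouse_features(points):
    """Trajectory length plus horizontal and vertical range of a cursor path.

    points: time-ordered list of (x, y) pixel positions (hypothetical format).
    """
    if not points:
        return {"trajectory_length": 0.0, "horizontal_range": 0, "vertical_range": 0}
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # Sum of straight-line distances between consecutive samples.
    length = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    return {
        "trajectory_length": length,
        "horizontal_range": max(xs) - min(xs),
        "vertical_range": max(ys) - min(ys),
    }


print(simple_mouse_features([(0, 0), (3, 4), (3, 10)]))
```

The path in the example moves 5 px and then 6 px, so its trajectory length is 11 px, with a 3 px horizontal range and a 10 px vertical range.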
36
Mouse Features: Full
Second representation:
– 5 segments: initial, early, middle, late, and end
– Each segment: speed, acceleration, rotation, slope, etc.
[Figure: trajectory divided into segments 1–5]
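A sketch of the second (segmented) representation. Several assumptions are made here because the slide does not specify them. Each sample is (timestamp in seconds, x, y). The trajectory is split into five parts with roughly equal numbers of samples. The per-segment quantities are defined as follows: speed = path length / elapsed time; acceleration = (last step speed - first step speed) / elapsed time; rotation = total absolute turning angle in radians; slope = dy/dx between the segment's endpoints. The function names are hypothetical.

```python
import math

SEGMENT_NAMES = ("initial", "early", "middle", "late", "end")


def _segment_stats(seg):
    """Speed, acceleration, rotation, and slope for one segment of (t, x, y) samples."""
    steps = list(zip(seg, seg[1:]))
    dists = [math.dist(a[1:], b[1:]) for a, b in steps]
    step_speeds = [d / (b[0] - a[0]) if b[0] > a[0] else 0.0
                   for d, (a, b) in zip(dists, steps)]
    elapsed = seg[-1][0] - seg[0][0]
    # Heading of each step; rotation is the total absolute change in heading,
    # wrapped to [-pi, pi] so a small turn is not counted as nearly 2*pi.
    headings = [math.atan2(b[2] - a[2], b[1] - a[1]) for a, b in steps]
    rotation = sum(abs(math.remainder(h2 - h1, math.tau))
                   for h1, h2 in zip(headings, headings[1:]))
    dx = seg[-1][1] - seg[0][1]
    dy = seg[-1][2] - seg[0][2]
    return {
        "speed": sum(dists) / elapsed if elapsed > 0 else 0.0,
        "acceleration": ((step_speeds[-1] - step_speeds[0]) / elapsed
                         if elapsed > 0 else 0.0),
        "rotation": rotation,
        "slope": dy / dx if dx != 0 else None,  # undefined for vertical movement
    }


def full_mouse_features(points):
    """Split a time-ordered list of (t_seconds, x, y) samples into five segments."""
    n = len(SEGMENT_NAMES)
    if len(points) < n + 1:
        raise ValueError("need at least six samples")
    m = len(points) - 1
    bounds = [i * m // n for i in range(n + 1)]  # strictly increasing when m >= 5
    features = {}
    for name, lo, hi in zip(SEGMENT_NAMES, bounds, bounds[1:]):
        for key, value in _segment_stats(points[lo:hi + 1]).items():
            features[f"{name}_{key}"] = value
    return features
```

Adjacent segments share their boundary sample, so every step of the path belongs to exactly one segment. The result is a flat dictionary of 20 named values (5 segments × 4 quantities) that can be appended to a classifier's feature vector.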
37
Learning to Recover a Single Search Intent
– Represent the full client-side interactions with each SERP as feature vectors
– Apply standard machine learning classification methods
38
Experimental Setup
Dataset:
– Gathered from mid-January 2008 until mid-March 2008 from the public-use machines in the Emory University libraries
– Consists of ~1,500 initial query instances/search sessions
– 300 initial query instances randomly sampled; only initial queries are used, since behavioral patterns for follow-up queries might differ
39
Creating “Truth” Labels
Use our best guess based on clues:
– Query terms
– Next URL (e.g., clicked result)
– How the user behaves before click/exit
40
Intent Statistics in Labeled Sample
41
Results: Classifying Search Intent
CSIP > CF >> CS > S
42
Results II: {Info/Transact} vs. Nav
All improved. Still, CSIP > CF >> CS > S
43
Salient Features (by Info Gain)
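Information gain ranks features by how much knowing a feature's value reduces uncertainty about the intent label: IG(Y; X) = H(Y) - Σ_v P(X = v) · H(Y | X = v). Below is a minimal sketch for discrete feature values. Continuous features such as trajectory length would first need to be discretized (e.g., thresholded); that step is an assumption and is not shown. The function names and example feature names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - sum_v P(X=v) * H(Y | X=v) for a discrete feature X."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(features, labels):
    """features: dict name -> list of discrete values aligned with labels."""
    return sorted(features, key=lambda name: info_gain(features[name], labels),
                  reverse=True)
```

A feature that perfectly separates NAV from INFO instances scores 1 bit on a balanced sample. A feature whose values are independent of the label scores 0.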
44
Case Studies Summary
CSIP can help identify:
– Relatively rare navigational queries (re-finding queries or queries for obscure websites)
– Informational queries that resemble navigational queries (the query coincides with the name of a website)
45
Outline
– Overview of research at the Emory IR Lab
– Dimensions of (commercial) search intent
– Classifying intent of a query
– Contextualized search intent detection
46
Informational vs. Transactional: Research vs. Purchase Intent
10 users (grad students and staff) were asked to:
1. Search for the best deal on an item they want to purchase immediately (purchase intent)
2. Research a product they want to purchase eventually (research intent)
Eye tracking and browser instrumentation were performed in parallel.
– EyeTech Systems TM3 (integrated)
– At reasonable resolution, samples reliably at ~12–15 Hz
47
Research Intent
48
Purchase Intent
49
Relationship Between Behavior and Intent?
Search intent is contextualized within a search session.
– Implication 1: model session-level state
– Implication 2: improve detection based on client-side interactions
50
Contextualized Intent Inference
Signals: SERP text; mouse trajectory, hovering/dynamics; scrolling; clicks
51
Model: Linear-Chain CRF
52
From HMMs to MEMMs to CRFs
[Figure: graphical structures of an HMM, an MEMM, and a CRF over states S(t-1), S(t), S(t+1) and observations O(t-1), O(t), O(t+1)]
Conditional Random Fields (CRFs) [from Lafferty, McCallum, Pereira 2001]
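In a linear-chain CRF, the conditional probability of a label sequence y given observations x is p(y | x) = exp(Σ_t w · f(y_(t-1), y_t, x, t)) / Z(x). The most likely sequence therefore maximizes the summed scores and can be found with Viterbi decoding. Below is a minimal decoding sketch. It assumes the score at each step decomposes into a per-position term and a label-transition term (the common linear-chain setup) and that these scores have already been computed from learned weights. Training is not shown, and the labels (e.g., receptive/not receptive to ads) are hypothetical.

```python
def viterbi(emission, transition, labels):
    """Highest-scoring label sequence under a linear-chain model.

    emission:   list (one entry per position) of dicts label -> score,
                e.g. the weighted sum of observation features at that step.
    transition: dict (previous_label, label) -> score.
    Scores are log-potentials, so they are added, not multiplied.
    """
    best = {y: emission[0][y] for y in labels}  # best score of a path ending in y
    backpointers = []
    for scores in emission[1:]:
        new_best, pointer = {}, {}
        for y in labels:
            prev, score = max(
                ((p, best[p] + transition[(p, y)]) for p in labels),
                key=lambda item: item[1],
            )
            new_best[y] = score + scores[y]
            pointer[y] = prev
        best = new_best
        backpointers.append(pointer)
    last = max(labels, key=lambda y: best[y])
    path = [last]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    path.reverse()
    return path, best[last]
```

Decoding runs in O(T · |labels|²) time for T positions. This is why the chain structure stays tractable even though the CRF conditions on the whole observation sequence.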
53
Problem 2: Search Ad Receptiveness
Hypothesis: the right time to serve any search ads is when the searcher is receptive to seeing ads.
Receptiveness ≈ some search intent:
– Commercial? (navigational or informational)
– Non-commercial?
– “Background” interest
54
Predict Future Ad Clicks Within Session
55
Dataset: 440 Emory College Students
56
Results: Ad Click Prediction
200%+ precision improvement (within mission)
57
Varying Model Structure
58
Feature Analysis
59
Error Analysis: Mouse Noise
60
Within-Mission Intent Change/Frustration/Digression
61
Current and Future Work
– Unsupervised intent clustering
– User vs. task
– Personalized behavior models
– Long-term interests/effects
– User mental state (frustration, satisfaction, …)
62
Challenges
– Separate context from intent (e.g., smartphones)
– User variability: individual differences, tasks
– Scale of data: representation, compression
– Privacy: client-side data is similar to other PII; it can be abused and must be protected
– Obtaining realistic user data: see above; EMU toolbar tracking since 2007 in Emory Libraries (biased)
63
Other Application Areas
– Search: ranking, evaluation, advertising, search interfaces, medical search (clinicians, patients)
– Collaborative information sharing: searcher intent, success, expertise, content quality
– Health informatics: self-reporting of drug side effects, co-morbidity, outreach/education
– Automatic cognitive diagnostics: stress, frustration, other impairments, …
64
Summary: From Behavior to State of Mind
Approach:
– Machine learning methods for detecting searcher intent
– Calibrated and augmented with lab studies
Foundational contributions:
– Methods to mine and integrate a wide range of interactions
– Data-driven discovery of user state of mind
Impact:
– Intelligent, intuitive search and information sharing
65
Main References
– Azin Ashkan, Charles L. A. Clarke, Eugene Agichtein, and Qi Guo. Classifying and Characterizing Query Intent. In Proc. of ECIR 2009.
– Qi Guo and Eugene Agichtein. Exploring Client-Side Instrumentation for Personalized Search Intent Inference: Preliminary Experiments. In Proc. of AAAI 2008 Workshop on Intelligent Techniques for Web Personalization (ITWP 2008).
– Qi Guo, Eugene Agichtein, Azin Ashkan, and Charles L. A. Clarke. In the Mood to Click? Inferring Searcher Advertising Receptiveness. In Proc. of WI 2009.
Other papers: http://www.mathcs.emory.edu/~eugene/publications.html
66
Thank you!
Yandex (for hosting my visit)
Supported by: