Towards Inferring Searcher Intent Eugene Agichtein.

1 Towards Inferring Searcher Intent Eugene Agichtein

2 Intelligent Information Access Lab (IRLab): Eugene Agichtein, Emory University, IR Lab. Students: Qi Guo (3rd-year PhD), Ablimit Aji (2nd-year PhD); 1st-year graduate students: Julia Kiseleva, Dmitry Lagun, Qiaoling Liu, Wang Yu. Research areas: text and data mining; modeling information-seeking behavior; Web search and social media search; tools for medical informatics and public health. In collaboration with: – Beth Buffalo (Neurology) – Charlie Clarke (Waterloo) – Ernie Garcia (Radiology) – Phil Wolff (Psychology) – Hongyuan Zha (GaTech)

3 Online Behavior and Interactions: – Information sharing: blogs, forums, discussions – Search logs: queries, clicks – Client-side behavior: gaze tracking, mouse movement, scrolling

4 Research Overview: discover models of behavior (machine learning/data mining) for intelligent search, information sharing, health informatics, and cognitive diagnostics

5 Main Application Areas: – Search: ranking, evaluation, advertising, search interfaces, medical search (clinicians, patients) – Collaborative information sharing: searcher intent, success, expertise, content quality – Health informatics: self-reporting of drug side effects, co-morbidity, outreach/education – Automatic cognitive diagnostics: stress, frustration, other impairments …

6 Talk Outline: – Overview of the Emory IR Lab – Intent-centric Web search – Classifying intent of a query – Contextualized search intent detection

7 Web Retrieval Architecture: example of a centralized parallel architecture (crawlers, Web) [from Baeza-Yates and Jones, WWW 2008 tutorial]

8 Information Retrieval Process (user view): Source Selection → (resource) → Query Formulation → (query) → Search → (ranked list) → Selection → (documents) → Examination → (documents) → Delivery. Feedback loops: query reformulation, vocabulary learning, relevance feedback; source reselection.

9 Some Key Challenges for Web Search: – Query interpretation (infer intent) – Ranking (high dimensionality) – Evaluation (system improvement) – Result presentation (information visualization)

10 Intent Is a “Hidden State” Generating Actions. A first (naïve) generative model of user actions: – Given a state (e.g., “Unsatisfied” with the results), the user generates actions such as query, click, browse. Intent “states”: Satisfied, Unsatisfied.
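
The naïve generative view on this slide maps naturally onto a two-state hidden Markov model. The sketch below is only an illustration, not the model used in this work: the state names, the action vocabulary (query, click, browse), and every probability are hypothetical values chosen for demonstration. It runs the forward (filtering) recursion to estimate P(state | actions so far) after each observed action.

```python
# Hypothetical two-state HMM: a hidden intent state generates observable actions.
STATES = ("Satisfied", "Unsatisfied")

# Illustrative, made-up parameters (NOT estimated from any data).
START = {"Satisfied": 0.5, "Unsatisfied": 0.5}
TRANS = {  # P(next state | current state)
    "Satisfied": {"Satisfied": 0.8, "Unsatisfied": 0.2},
    "Unsatisfied": {"Satisfied": 0.3, "Unsatisfied": 0.7},
}
EMIT = {  # P(action | state)
    "Satisfied": {"query": 0.1, "click": 0.3, "browse": 0.6},
    "Unsatisfied": {"query": 0.6, "click": 0.3, "browse": 0.1},
}


def filter_states(actions):
    """Forward algorithm: return P(state_t | actions_1..t) for every step t."""
    posteriors = []
    belief = START
    for t, action in enumerate(actions):
        # Predict: propagate the previous belief through the transition model.
        prior = START if t == 0 else {
            s: sum(belief[r] * TRANS[r][s] for r in STATES) for s in STATES
        }
        # Update: weight by the likelihood of the observed action, then normalize.
        unnorm = {s: prior[s] * EMIT[s][action] for s in STATES}
        z = sum(unnorm.values())
        belief = {s: p / z for s, p in unnorm.items()}
        posteriors.append(belief)
    return posteriors
```

With these made-up parameters, repeated queries push the belief toward “Unsatisfied” and repeated browsing toward “Satisfied”; a real model would need parameters estimated from logged sessions.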

11 Problem Statement. Given: a sequence of user actions and background knowledge. Predict: user intent and future actions (intent classes and actions are defined next). Example applications: – Predict document relevance (ranking, result presentation, summarization) – Predict the next query (query suggestion, spelling correction) – Predict user satisfaction (market share)

12 Intent Classes (top level only). User intent taxonomy (Broder 2002): – Informational: want to learn about something (~40% / 65%) – Navigational: want to go to that page (~25% / 15%) – Transactional: want to do something (Web-mediated), e.g., access a service, downloads, shop (~35% / 20%) – Gray areas: find a good hub; exploratory search (“see what’s there”). Example queries: History nonya food; Singapore Airlines; Jakarta weather; Kalimantan satellite images; Nikon Finepix; Car rental Kuala Lumpur. [from SIGIR 2008 Tutorial, Baeza-Yates and Jones]

13 Search Actions: – Keystrokes: query, scroll, CTRL-C, … – GUI: scrolling, button press, clicks – Mouse: moving, scrolling, down/up – Browser: new tab, close, back/forward. All of these can be easily captured on the SERP (with JavaScript).

14 Problem 1: Detect Query Intent. Query intent detection in multiple dimensions: – Commercial, if the assumed purpose of the query is to make an immediate or future purchase – Navigational, if the assumed purpose of the query is to locate a specific website; informational otherwise. Clickthrough calculation: estimating the average ad clickthrough rate for each query type. [Ashkan et al., ECIR 2009]
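
A minimal sketch of the clickthrough calculation, assuming a log with one record per search impression that carries the predicted query type and the number of ad clicks it received. Averaging ad clicks per impression within each class is one plausible reading of “average ad clickthrough rate”; the record layout and the label strings are hypothetical.

```python
from collections import defaultdict


def ctr_by_query_type(impressions):
    """Average ad clickthrough rate per query type.

    impressions: iterable of (query_type, ad_clicks) pairs, one per search
    impression; query_type is a hypothetical label such as
    ("commercial", "navigational") and ad_clicks is a non-negative count.
    Returns {query_type: total ad clicks / number of impressions}.
    """
    clicks = defaultdict(int)
    shown = defaultdict(int)
    for query_type, ad_clicks in impressions:
        shown[query_type] += 1
        clicks[query_type] += ad_clicks
    return {qt: clicks[qt] / shown[qt] for qt in shown}
```

Keeping both intent dimensions in one tuple key lets the same function compare, for example, commercial-navigational against commercial-informational queries.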

15 Dataset Construction. Microsoft adCenter search query log: – ~100M search impressions – ~8M ad clicks associated with the impressions. Seed: 1,700 queries labeled by three researchers, who examined the query and its search result page (SRP). MTurk: 3,000 new queries + 1,000 seed queries – 40 batches of 100 queries, each with 25 seed and 75 MTurk queries – Bonus if agreement reaches 75%. Results after resolution: – 42% commercial; 55% navigational. [Ashkan et al., ECIR 2009]
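
The batch arithmetic above (40 × 25 = 1,000 seed queries, 40 × 75 = 3,000 new queries) can be sketched as follows. Measuring a worker’s agreement only on the embedded seed queries, and treating “reaches 75%” as ≥ 0.75, are assumptions; the function names and label strings are hypothetical.

```python
def make_batches(seed_queries, new_queries, seeds_per_batch=25, new_per_batch=75):
    """Build fixed-size batches that mix researcher-labeled seed queries with new queries."""
    n_batches = min(len(seed_queries) // seeds_per_batch, len(new_queries) // new_per_batch)
    return [
        seed_queries[b * seeds_per_batch:(b + 1) * seeds_per_batch]
        + new_queries[b * new_per_batch:(b + 1) * new_per_batch]
        for b in range(n_batches)
    ]


def seed_agreement(worker_labels, seed_labels):
    """Fraction of seed queries on which a worker's label matches the researchers' label."""
    seeds = [q for q in worker_labels if q in seed_labels]
    return sum(worker_labels[q] == seed_labels[q] for q in seeds) / len(seeds)


def earns_bonus(worker_labels, seed_labels, threshold=0.75):
    """Assumed bonus rule: agreement on the seed queries of at least `threshold`."""
    return seed_agreement(worker_labels, seed_labels) >= threshold
```

With 25 seeds per batch, 19 matching labels (76%) would qualify and 18 (72%) would not.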

16 Amazon Mechanical Turk Service

17 Use a Support Vector Machine (SVM) Classifier. SVMs maximize the margin around the separating hyperplane (a.k.a. large-margin classifiers). The decision function is fully specified by a subset of the training samples, the support vectors. Training is a quadratic programming problem. Seen by many as the most successful current text classification method.
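
A minimal pure-Python sketch of a linear SVM. Note that it swaps in a different solver: instead of the quadratic program mentioned on the slide, it minimizes the same L2-regularized hinge loss in the primal with the Pegasos stochastic subgradient method (Shalev-Shwartz, Singer, and Srebro). The toy data, `lam`, and `epochs` are assumptions. The bias is folded into `w` as a constant feature, which, unlike the textbook SVM, also regularizes it.

```python
import random


def _score(w, x):
    """f(x) = w . [x, 1] (the last weight acts as the bias)."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic subgradient descent on the L2-regularized hinge loss.

    X: list of feature vectors; y: labels in {-1, +1}.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Gradient step on the regularizer: shrink w.
            w = [(1 - eta * lam) * wi for wi in w]
            # Hinge-loss subgradient applies only when the margin is violated.
            if margin < 1:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    return 1 if _score(w, x) >= 0 else -1


def margin_points(w, X, y):
    """Training points with y * f(x) <= 1: approximately the support vectors."""
    return [x for x, label in zip(X, y) if label * _score(w, x) <= 1.0]
```

Pegasos scales to large sparse feature sets because each step touches one example; an off-the-shelf QP-based or dual solver would be the usual choice in practice.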

18 Features for Classification [Ashkan et al., ECIR 2009]
Query-specific features:
– Query length: number of characters in the query string
– Query segments: number of words in the query string
– URL element: whether the query string contains any URL element, such as .com or .org
– Organic domain: total number of domains listed among the organic results of which the query string is a substring
Content features:
– SERP: frequency of keywords extracted from the first search result page
Clickthrough features:
– Host #: number of different target ad hosts clicked as results of the query
– Click per host: total number of ad clicks recorded for the query, divided by Host #
– Top host significance: number of times a click happens on the most frequent target host as a result of the query, divided by click per host
– Decrease level for top two hosts: number of times a click happens on the most frequent target host, divided by the number of times the second most frequent target host receives a click
– Average substring #: number of target hosts of which the query is a substring, divided by the total number of different hosts clicked for the query
– Substring ratio: total number of clicks on target hosts of which the query is a substring, divided by the total number of ad clicks for the query
– Deliberation time: average time between entering a query and an ad click for that query
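
A sketch that computes the query-specific and clickthrough features from the definitions above. Several details are assumptions not stated on the slide: the click log format (one `(target_host, seconds_to_click)` pair per ad click), the substring test (lowercased query with spaces removed), the list of URL elements, returning `None` for the top-two ratio when only one host was clicked, and requiring at least one ad click. The content (SERP keyword) feature is omitted.

```python
from collections import Counter

URL_ELEMENTS = (".com", ".org", ".net", ".edu", "www.")  # assumed list


def query_features(query, organic_domains):
    q = query.lower()
    compact = q.replace(" ", "")  # assumed normalization for substring tests
    return {
        "query_length": len(query),
        "query_segments": len(query.split()),
        "url_element": any(e in q for e in URL_ELEMENTS),
        "organic_domain": sum(compact in d.lower() for d in set(organic_domains)),
    }


def clickthrough_features(query, ad_clicks):
    """ad_clicks: non-empty list of (target_host, seconds_from_query_to_click)."""
    compact = query.lower().replace(" ", "")
    per_host = Counter(host for host, _ in ad_clicks)
    n_hosts, n_clicks = len(per_host), len(ad_clicks)
    ranked = per_host.most_common(2)
    top = ranked[0][1]
    second = ranked[1][1] if len(ranked) > 1 else None
    click_per_host = n_clicks / n_hosts
    substring_hosts = [h for h in per_host if compact in h.lower()]
    return {
        "host_count": n_hosts,
        "click_per_host": click_per_host,
        "top_host_significance": top / click_per_host,
        "decrease_level_top_two": top / second if second else None,
        "average_substring": len(substring_hosts) / n_hosts,
        "substring_ratio": sum(per_host[h] for h in substring_hosts) / n_clicks,
        "deliberation_time": sum(t for _, t in ad_clicks) / n_clicks,
    }
```

For example, a query such as “nikon camera” whose ad clicks mostly land on a host containing “nikoncamera” gets a high substring ratio, one plausible hint toward navigational intent.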

19 Intent Classification: Results [Ashkan et al., ECIR 2009]

20 Clickthrough for Varying Intent [Ashkan et al., ECIR 2009]

21 Talk Outline: – Overview of the Emory IR Lab – Intent-centric Web search – Classifying intent of a query – Contextualized search intent detection

22 How Do We Know the “True” User Intent? – Ask the user (surveys, field studies, pop-ups): does not scale, and users get annoyed – Observe user actions and guess: intent is usually obvious to humans, but not always – Detect signals from the user’s brain (fMRI, EEG) and attempt to interpret neuron activity. Adapted from [Daniel M. Russell, 2007]

23 “Eyes Are a Window to the Soul”. Eye tracking gives information about search interests: – Eye position – Pupil diameter – Saccades and fixations [figure panels: Reading; Visual Search; Camera]

24 “An Eye Tracker on Every Table”? Like a “nuclear reactor in every backyard”… unlikely. Eye-tracking equipment is bulky and expensive. Can we infer gaze position from observable actions? An exploratory study from Google (Rodden et al.) says maybe: mouse position is sometimes related to eye position.

25 Relationship Between Mouse and Gaze Position. Searchers might use the mouse to focus reading attention, to bookmark promising results, or not at all. Behavior varies with task difficulty and user expertise. [K. Rodden, X. Fu, A. Aula, and I. Spiro, Eye-mouse coordination patterns on web search results pages, Extended Abstracts of ACM CHI 2008]

26 Assume “Transitivity” Holds. Given: – gaze position ==> user intent, and – mouse movement ==> gaze position; therefore mouse movement ==> user intent. Restated problem: given user actions, infer the current user’s intent, focusing on the individual user’s actions.

27 From Query Type to Search Intent. The query “obama” may be navigational or informational. Other examples: – Query bookmarks (re-finding): ~40% of queries (J. Teevan et al., SIGIR 2007) – Research vs. immediate purchase. It is incorrect to classify the query into a single intent; instead, classify user goals for each query instance.

28 Dataset Creation: EMU. Firefox + LibX plugin: – Tracks whitelisted sites, e.g., Emory, Google, Yahoo search… – All SERP events are logged (via asynchronous HTTP requests) – 150 public-use machines, ~5,000 opted-in users. Pipeline: HTTP log → HTTP server → usage data → data mining & management → train prediction models.

29 EMU: Querying Behavior Data

30 Playback Example

31 Problem 1: Search Intent Classification. Infer the “personalized” intent {NAV, INFO, TRANSACT} for each search instance using EMU instrumentation: – “obama”: instance 1 → NAV, but instance 2 → INFO. Focus: – Contribution of client/GUI events (mouse movements)

32 Navigational query: “facebook”

33 Informational query: “spanish wine”

34 Transactional query: “integrator”

35 Mouse Features: Simple. First representation: – Trajectory length – Horizontal range – Vertical range
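
A minimal sketch of the three simple features, assuming the cursor trajectory for one SERP view is available as time-ordered `(x, y)` pixel coordinates; the input format and key names are assumptions.

```python
import math


def simple_mouse_features(points):
    """points: list of (x, y) cursor positions in pixels, in time order."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # Path length: sum of straight-line distances between consecutive samples.
    length = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    return {
        "trajectory_length": length,
        "horizontal_range": max(xs) - min(xs),
        "vertical_range": max(ys) - min(ys),
    }
```

A searcher who sweeps down the result list would show a large vertical range relative to the horizontal range.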

36 Mouse Features: Full. Second representation: – 5 segments: initial, early, middle, late, and end – Each segment: speed, acceleration, rotation, slope, etc.
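
The segment-based representation could be computed roughly as below. The slide names the segments and feature types but not their exact definitions, so every definition here is an assumption: the trajectory is split into five contiguous groups of (nearly) equal numbers of moves; speed is path length over elapsed time; acceleration is the net change in per-move speed over the segment’s duration; rotation is the summed absolute turning angle; slope is net dy/dx (`None` for vertical segments).

```python
import math

SEGMENT_NAMES = ("initial", "early", "middle", "late", "end")


def _turn(a, b):
    """Absolute turning angle (radians, 0..pi) between movement vectors a and b."""
    angle = math.atan2(b[1], b[0]) - math.atan2(a[1], a[0])
    return abs((angle + math.pi) % (2 * math.pi) - math.pi)


def segment_features(steps):
    """steps: list of ((t0, x0, y0), (t1, x1, y1)) consecutive cursor samples, t in ms."""
    duration = sum(q[0] - p[0] for p, q in steps)
    dists = [math.hypot(q[1] - p[1], q[2] - p[2]) for p, q in steps]
    speeds = [d / (q[0] - p[0]) for d, (p, q) in zip(dists, steps)]
    # Ignore zero-length moves when measuring direction changes.
    vectors = [(q[1] - p[1], q[2] - p[2]) for p, q in steps if (q[1], q[2]) != (p[1], p[2])]
    dx = steps[-1][1][1] - steps[0][0][1]
    dy = steps[-1][1][2] - steps[0][0][2]
    return {
        "speed": sum(dists) / duration,                        # px per ms
        "acceleration": (speeds[-1] - speeds[0]) / duration,  # net speed change per ms
        "rotation": sum(_turn(a, b) for a, b in zip(vectors, vectors[1:])),
        "slope": dy / dx if dx else None,
    }


def full_mouse_features(samples):
    """samples: time-ordered (t_ms, x, y), strictly increasing t, at least 5 moves."""
    steps = list(zip(samples, samples[1:]))
    m, k = len(steps), len(SEGMENT_NAMES)
    features = {}
    for i, name in enumerate(SEGMENT_NAMES):
        chunk = steps[i * m // k:(i + 1) * m // k]
        for key, value in segment_features(chunk).items():
            features[f"{name}_{key}"] = value
    return features
```

The result is a flat 20-dimensional feature dictionary (5 segments × 4 features) that can be concatenated with other per-SERP features before classification.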

37 Learning to Recover a Single Search Intent: represent the full client-side interactions with each SERP page as feature vectors, then apply standard machine learning classification methods.

38 Experimental Setup. Dataset: – Gathered from mid-January 2008 until mid-March 2008 from the public-use machines in Emory University libraries – Consists of ~1,500 initial query instances/search sessions – 300 initial query instances randomly sampled. Only initial queries are used: behavioral patterns for follow-up queries might be different.

39 Creating “Truth” Labels. Use our best guess based on clues: – Query terms – Next URL (e.g., the clicked result) – How the user behaves before the click/exit

40 Intent Statistics in Labeled Sample

41 Results: Classifying Search Intent. CSIP > CF >> CS > S

42 Results II: {Info/Transact} vs. Nav. All improved. Still, CSIP > CF >> CS > S

43 Salient Features (by Info Gain)
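
Information gain ranks a feature by how much knowing its value reduces uncertainty about the intent label: IG(Y; X) = H(Y) − Σ_v P(X = v) · H(Y | X = v). A minimal sketch for a discrete feature; continuous mouse features would first need to be binned, and that binning step is an assumption not described on the slides. The label and value strings in the checks are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - sum_v P(X=v) H(Y | X=v) for a discrete (e.g., binned) feature."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional
```

A feature that perfectly separates two balanced intents scores 1 bit; one independent of the label scores 0.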

44 Case Studies Summary. CSIP can help identify: – Relatively rare navigational queries (re-finding queries or queries for obscure websites) – Informational queries that resemble navigational queries (the query coincides with the name of a website)

45 Outline: – Overview of research at the Emory IR Lab – Dimensions of (commercial) search intent – Classifying intent of a query – Contextualized search intent detection

46 Informational vs. Transactional: Research vs. Purchase Intent. 10 users (grad students and staff) were asked to: 1. Search for the best deal on an item they want to purchase immediately (purchase intent) 2. Research a product they want to purchase eventually (research intent). Eye tracking and browser instrumentation were performed in parallel: EyeTech Systems TM3 (integrated), which at reasonable resolution samples reliably at ~12–15 Hz.

47 Research Intent

48 Purchase Intent

49 Relationship Between Behavior and Intent? Search intent is contextualized within a search session. – Implication 1: model session-level state – Implication 2: improve detection based on client-side interactions

50 Contextualized Intent Inference. Inputs: SERP text; mouse trajectory, hovering/dynamics; scrolling; clicks.

51 Model: Linear Chain CRF
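
For reference, the standard linear-chain CRF of Lafferty, McCallum, and Pereira (2001) defines the probability of a state sequence s = (s_1, …, s_T) given observations o through feature functions f_k and weights λ_k, with s_0 a fixed start state. Reading each s_t as the intent state at one SERP and o as the SERP text, mouse, scrolling, and click signals listed on the previous slide is an assumption about how the model is applied here.

```latex
p(\mathbf{s} \mid \mathbf{o}) = \frac{1}{Z(\mathbf{o})}
  \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(s_{t-1}, s_t, \mathbf{o}, t) \Big),
\qquad
Z(\mathbf{o}) = \sum_{\mathbf{s}'}
  \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(s'_{t-1}, s'_t, \mathbf{o}, t) \Big)
```

Because Z(o) sums over entire label sequences, normalization is global rather than per transition.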

52 From HMMs to MEMMs to CRFs [figure: graphical structures of an HMM, an MEMM, and a linear-chain CRF over states S_{t-1}, S_t, S_{t+1} and observations O_{t-1}, O_t, O_{t+1}; Conditional Random Fields (CRFs), from Lafferty, McCallum, Pereira 2001]
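
An HMM models the joint P(S, O), and an MEMM normalizes locally at each transition. A linear-chain CRF instead normalizes over whole state sequences, so the most likely intent sequence can be decoded with Viterbi on unnormalized log-scores without computing Z(o). The sketch below assumes the weighted feature sum splits into a per-position emission score and a state-to-state transition score; the state names and all scores are hypothetical.

```python
STATES = ("research", "purchase")  # hypothetical intent labels


def viterbi(emission_scores, transition_scores, states=STATES):
    """Most likely state sequence under a linear-chain CRF.

    emission_scores[t][s]: weighted feature sum for state s at position t
    transition_scores[r][s]: weight for moving from state r to state s
    Scores are unnormalized log-potentials; Z(o) is not needed for the argmax.
    """
    best = [dict(emission_scores[0])]
    back = []
    for t in range(1, len(emission_scores)):
        scores, pointers = {}, {}
        for s in states:
            # Best predecessor r for state s at position t.
            prev, val = max(
                ((r, best[-1][r] + transition_scores[r][s]) for r in states),
                key=lambda pair: pair[1],
            )
            scores[s] = val + emission_scores[t][s]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path))
```

Positive same-state transition weights encode the session-level stickiness of intent from slide 49: a single ambiguous SERP is pulled toward its neighbors’ intent.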

53 Problem 2: Search Ad Receptiveness. Hypothesis: the right time to serve search ads is when the searcher is receptive to seeing ads. Receptiveness ≈ some search intent: – Commercial? (navigational or informational) – Non-commercial? – “Background” interest?

54 Predict Future Ad Clicks Within Session

55 Dataset: 440 Emory College Students

56 Results: Ad Click Prediction. 200%+ precision improvement (within mission)

57 Varying Model Structure

58 Feature Analysis

59 Error Analysis: Mouse Noise

60 Within-mission intent change/frustration/digression

61 Current and Future Work: – Unsupervised intent clustering – User vs. task – Personalized behavior models – Long-term interests/effects – User mental state (frustration, satisfaction, …)

62 Challenges: – Separate context from intent (e.g., smartphones) – User variability: individual differences, tasks – Scale of data: representation, compression – Privacy: client-side data is similar to other PII; it can be abused and must be protected – Obtaining realistic user data (see above): EMU toolbar tracking since 2007 in Emory Libraries (biased)

63 Other Application Areas: – Search: ranking, evaluation, advertising, search interfaces, medical search (clinicians, patients) – Collaborative information sharing: searcher intent, success, expertise, content quality – Health informatics: self-reporting of drug side effects, co-morbidity, outreach/education – Automatic cognitive diagnostics: stress, frustration, other impairments …

64 Summary: From Behavior to State of Mind. Approach: – Machine learning methods for detecting searcher intent – Calibrated and augmented with lab studies. Foundational contributions: – Methods to mine and integrate a wide range of interactions – Data-driven discovery of the user’s state of mind. Impact: – Intelligent, intuitive search and information sharing

65 Main References:
– Azin Ashkan, Charles L. A. Clarke, Eugene Agichtein, Qi Guo. Classifying and Characterizing Query Intent. In Proc. of ECIR 2009.
– Qi Guo, Eugene Agichtein. Exploring Client-Side Instrumentation for Personalized Search Intent Inference: Preliminary Experiments. In Proc. of AAAI 2008 Workshop on Intelligent Techniques for Web Personalization (ITWP 2008).
– Qi Guo, Eugene Agichtein, Azin Ashkan, Charles L. A. Clarke. In the Mood to Click? Inferring Searcher Advertising Receptiveness. In Proc. of WI 2009.
Other papers: http://www.mathcs.emory.edu/~eugene/publications.html

66 Thank you! Yandex (for hosting my visit). Supported by: [sponsor logos]

