1 Patterns in Web Search
Eugene Agichtein, Mathematics & Computer Science, Emory University

2 Web Search Ranking
Rank pages for a query using hundreds of features:
- Content match, e.g., page terms, anchor text, term weights
- Prior document quality, e.g., web topology, spam features
Evaluate accuracy and tune ranking functions on explicit relevance ratings.
Millions of users interact with the results.

3 Query: SIGIR 2006
Users can help indicate the most relevant results.
[Figure: search results for the query, annotated with clickthrough counts]

4 Outline
- Predicting search result preferences
- Incorporating user behavior into ranking
- Behavior-based query segmentation
- Current research

5 User Interactions
Goal: harness rich user interactions with search results to improve search quality.
- Millions of users submit queries daily and interact with the search results
- Clicks, query refinement, dwell time
- User interactions with search engines are plentiful, but require careful interpretation
Task: predict general user preferences, e.g., a user is likely to prefer Page A over Page B.

6 Interpreting User Interactions
Clickthrough and subsequent browsing behavior of individual users is influenced by many factors:
- Relevance of a result to a query
- Visual appearance and layout
- Result presentation order
- Context, history, etc.
General idea: aggregate interactions across all users and queries, compute the "expected" behavior for any query/page, and recover the relevance signal for a given query.

7 Case Study: Clickthrough
[Figure: clickthrough frequency by result position for all queries in the sample]
More generally, observed behavior can be modelled as a mixture of relevance and other factors: the observed value of a feature is the sum of a position-driven expected component and a relevance component.
Clickthrough(query q, document d, result position p) = expected(p) + relevance(q, d)
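As an illustration, here is a minimal Python sketch of this decomposition; the log format, function names, and values are assumptions for illustration, not from the talk. It estimates the expected curve by aggregating clicks per position, then reads relevance as the residual.

    from collections import defaultdict

    def expected_ctr_by_position(click_log):
        """Estimate the position-driven ("expected") clickthrough curve by
        aggregating observed clicks over all queries at each position."""
        totals, clicks = defaultdict(int), defaultdict(int)
        for query, doc, position, clicked in click_log:
            totals[position] += 1
            clicks[position] += clicked
        return {p: clicks[p] / totals[p] for p in totals}

    def relevance_signal(click_log, query, doc, position, expected):
        """relevance(q, d) = observed clickthrough - expected(p)."""
        shown = [c for q, d, p, c in click_log if q == query and d == doc]
        observed = sum(shown) / len(shown)
        return observed - expected[position]

    # Hypothetical log entries: (query, doc, position, clicked 0/1)
    log = [("sigir 2006", "sigir2006.org", 1, 1),
           ("sigir 2006", "sigir2006.org", 1, 0),
           ("sigir 2006", "acm.org/sigs/sigir/", 2, 1)]
    exp = expected_ctr_by_position(log)
    print(relevance_signal(log, "sigir 2006", "sigir2006.org", 1, exp))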

8 Clickthrough for Queries with Known Position of Top Relevant Result
[Figure: relative clickthrough for queries whose top relevant result is known to be at position 1]
Compare Joachims' controlled experiments, which reversed the order of the top 10 results for a small set of queries.

9 Clickthrough for Queries with Known Position of Top Relevant Result
[Figure: relative clickthrough for queries with the known top relevant result at positions 1 and 3, respectively]
Note the higher clickthrough at the top non-relevant position than at the top relevant document.

10 Deviation from Expected
Relevance component: the deviation from "expected" behavior:
relevance(q, d) = observed(q, d, p) - expected(p)

11 Beyond Clickthrough: Rich User Interaction Space
Represent user interactions as vectors in a "behavior space" with three dimensions:
- Presentation: what a user sees before a click
- Clickthrough: frequency and timing of clicks
- Browsing: what users do after the click
Two kinds of features:
- Observed features: values aggregated over all user interactions for each query and result pair
- Distributional features: deviations from the "expected" behavior for the query

12 Some User Interaction Features
Presentation:
- ResultPosition: position of the URL in the current ranking
- QueryTitleOverlap: fraction of query terms in the result title
Clickthrough:
- DeliberationTime: seconds between the query and the first click
- ClickFrequency: fraction of all clicks landing on the page
- ClickDeviation: deviation from the expected click frequency
Browsing:
- DwellTime: result page dwell time
- DwellTimeDeviation: deviation from the expected dwell time for the query

13 Predicting Result Preferences
Task: predict pairwise preferences (a user will prefer Result A > Result B).
Several preference prediction models:
- Current search engine ranking
- Clickthrough
- Full user behavior model

14 Clickthrough Model SA+N: "Skip Above" and "Skip Next"
Adapted from Joachims et al. [SIGIR'05], motivated by gaze tracking.
Example: clicks on results 2 and 4 yield
- Skip Above: 4 > (1, 3), 2 > 1
- Skip Next: 4 > 5, 2 > 3
[Figure: result list positions 1-8 with results 2 and 4 clicked]
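A small Python sketch of the SA+N heuristics (the function name and data layout are illustrative assumptions); it reproduces the slide's example.

    def skip_above_and_next(clicked_positions, num_results):
        """Generate pairwise preferences from clicks (Joachims-style heuristics).
        Skip Above: a clicked result is preferred over every unclicked result
        ranked above it.  Skip Next: a clicked result is preferred over the
        unclicked result immediately below it."""
        clicked = set(clicked_positions)
        prefs = []
        for c in sorted(clicked):
            # Skip Above: all unclicked results ranked above the click
            prefs += [(c, above) for above in range(1, c) if above not in clicked]
            # Skip Next: the unclicked result immediately below the click
            nxt = c + 1
            if nxt <= num_results and nxt not in clicked:
                prefs.append((c, nxt))
        return prefs

    # Clicks on results 2 and 4, as in the slide's example:
    print(skip_above_and_next([2, 4], 8))
    # -> [(2, 1), (2, 3), (4, 1), (4, 3), (4, 5)]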

15 Distributional Model CD
A distributional model that extends SA+N: a click is counted only if its observed frequency exceeds the expected frequency by more than ε.
Example: the click on result 2 is likely "by chance", so we infer 4 > (1, 2, 3, 5), but not 2 > (1, 3).
[Figure: result list positions 1-8 with clicks on results 2 and 4]
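A sketch of the CD filter under the same assumptions, with made-up observed and expected click frequencies. The surviving clicks then feed the same skip heuristics, which is why 4 > (1, 2, 3, 5) is inferred while result 2's click is discarded.

    def confident_clicks(observed_freq, expected_freq, epsilon=0.1):
        """CD model: keep a click at position p only if its observed frequency
        exceeds the expected (position-driven) frequency by more than epsilon."""
        return [p for p, f in observed_freq.items()
                if f - expected_freq.get(p, 0.0) > epsilon]

    # Hypothetical frequencies: result 2's clicks look like positional chance,
    # while result 4's clicks clearly exceed expectation.
    observed = {2: 0.32, 4: 0.25}
    expected = {1: 0.45, 2: 0.30, 3: 0.15, 4: 0.08, 5: 0.05}
    print(confident_clicks(observed, expected))   # -> [4]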

16 User Behavior Model
Full set of interaction features: presentation, clickthrough, browsing.
Train the model with explicit judgments:
- Input: behavior feature vectors for each query-page pair in rated results
- Use RankNet (Burges et al., [ICML 2005]) to discover model weights
- Output: a neural net that assigns a "relevance" score to a behavior feature vector
Training is on pairs, where the first point is to be ranked higher than or equal to the second, using a cross-entropy cost (a probabilistic model) and gradient descent to train the weights.

17 RankNet for User Behavior
RankNet is a general, scalable, robust neural-net training algorithm and implementation:
- Optimized for ranking: it predicts an ordering of items, not a score for each (more than basic regression)
- Trains on pairs, where the first point is to be ranked higher than or equal to the second
- Extremely efficient
- Uses a cross-entropy cost (probabilistic model)
- Uses gradient descent to set the weights, with restarts to escape local minima

18 RankNet [Burges et al. 2005]
For query results 1 and 2, present a pair of feature vectors and labels, with label(1) > label(2).
[Diagram: Feature Vector 1 and Label 1 fed through the net, producing NN output 1]

19 RankNet [Burges et al. 2005]
[Diagram: Feature Vector 2 and Label 2 fed through the same net, producing NN output 2 alongside NN output 1]

20 RankNet [Burges et al. 2005]
The error is a function of both outputs: we desire output 1 > output 2.
[Diagram: NN outputs 1 and 2 feeding the pairwise error]

21 RankNet [Burges et al. 2005]: Update Feature Weights
- Cost function: f(o1 - o2); details in the Burges et al. paper
- Modified back-propagation: the error is a function of both outputs (desire output 1 > output 2)
[Diagram: gradients from the pairwise error flowing back to both NN outputs]
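A minimal numpy sketch of one pairwise update, using a linear scorer as a stand-in for the neural net. The cost log(1 + exp(-(o1 - o2))) is the pairwise cross-entropy with target probability 1, following Burges et al.'s formulation; the data and learning rate are made up.

    import numpy as np

    def ranknet_pair_update(w, x1, x2, lr=0.1):
        """One RankNet-style update on a pair where x1 should outrank x2.
        Scorer is linear (w . x) as a stand-in for the neural net.
        Cost: C = log(1 + exp(-(o1 - o2))), cross entropy with target P = 1."""
        o1, o2 = w @ x1, w @ x2
        # dC/d(o1 - o2) = sigma(o1 - o2) - 1
        grad = 1.0 / (1.0 + np.exp(-(o1 - o2))) - 1.0
        return w - lr * grad * (x1 - x2)

    # Behavior-feature vectors for two results of one query (made-up values):
    w = np.zeros(3)
    x_better = np.array([0.9, 0.4, 0.7])   # e.g., high ClickFrequency
    x_worse  = np.array([0.2, 0.4, 0.1])
    for _ in range(50):
        w = ranknet_pair_update(w, x_better, x_worse)
    print(w @ x_better > w @ x_worse)       # True: preferred result scores higher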

22 Predicting with RankNet
To predict, present an individual feature vector and get its score.
[Diagram: a single feature vector through the trained net, producing one NN output]

23 Evaluation Metrics
Task: predict user preferences.
Metric: pairwise agreement.
- Precision for a query: fraction of predicted pairs that agree with preferences derived from human ratings
- Recall for a query: fraction of human-rated preferences predicted correctly
Average precision and recall across all queries.
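A sketch of these per-query metrics over preference pairs (the pair encoding is an assumption); in the experiments they are then averaged across queries.

    def pairwise_precision_recall(predicted, rated):
        """predicted, rated: sets of (better, worse) preference pairs for a query.
        Precision: fraction of predicted pairs that agree with the ratings.
        Recall: fraction of rated preferences that were predicted."""
        agree = predicted & rated
        precision = len(agree) / len(predicted) if predicted else 0.0
        recall = len(agree) / len(rated) if rated else 0.0
        return precision, recall

    # Hypothetical pairs for one query:
    pred = {("A", "B"), ("A", "C"), ("C", "B")}
    gold = {("A", "B"), ("B", "C")}
    print(pairwise_precision_recall(pred, gold))  # (0.333..., 0.5)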

24 Datasets
Explicit judgments:
- 3,500 queries, top 10 results; relevance ratings converted to pairwise preferences for each query
User behavior data:
- Opt-in MSN Toolbar instrumentation: anonymized UserID, time, visited page
- Detect queries submitted to the MSN Search engine and the subsequently visited pages
- 120,000 instances of these 3,500 queries, each submitted at least 2 times over 21 days

25 Methods Compared
Preferences inferred by:
- Current search engine ranking (Baseline): a result ranked above another is predicted to be preferred
- Clickthrough model: SA+N
- Clickthrough distributional model: CD
- Full user behavior model: UserBehavior

26 Results: Predicting User Preferences
Baseline < SA+N < CD << UserBehavior
The full user behavior model outperforms the other methods that we and others have tried, with browsing features the most important. Rich user behavior features yield a dramatic improvement.

27 Contribution of Feature Types
- Presentation features are not helpful
- Browsing features: higher precision, lower recall
- Clickthrough features outperform CD: a richer model plus learning

28 Amount of Interaction Data
Prediction accuracy for varying amounts of user interaction per query: a slight increase in recall and a substantial increase in precision.

29 Learning Curve
At a minimum precision of 0.7, recall increases substantially with more days of user interactions.

30 Experiments Summary: Preferences
- Clickthrough distributional model: more accurate than previously published work
- Rich user behavior features: dramatic accuracy improvement
- Accuracy increases for frequent queries and longer observation periods

31 Outline
- Predicting result preferences
- Incorporating behavior into ranking
- Behavior-based query segmentation
- Current research directions

32 Web Search Ranking
Rank pages relevant to a query:
- Content match, e.g., page terms, anchor text, term weights
- Prior document quality, e.g., web topology, spam features
Hundreds of parameters; ranking functions are tuned on explicit document relevance ratings.

33 Web Search Ranking: Revisited
Incorporate user behavior information:
- Millions of users submit queries daily
- Rich user interaction features, complementary to content and web topology
Some challenges:
- User behavior "in the wild" is not reliable
- How to integrate interactions into ranking
- What is the impact over all queries
- Behavior varies with information need!

34 User Behavior Models for Ranking
Use interactions from previous instances of a query: general-purpose (not personalized), and only for queries with past user interactions.
Models:
- Rerank, clickthrough only: reorder results by number of clicks
- Rerank, predicted preferences (all user behavior features): reorder results by predicted preferences
- Integrate directly into the ranker: incorporate user interactions as features for the ranker

35 Rerank, Clickthrough Only
- Promote all clicked results to the top of the result list
- Re-order them by click frequency
- Retain the relative ranking of un-clicked results (see the sketch below)
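A sketch of this reranking rule, with hypothetical URLs and click counts:

    def rerank_by_clicks(ranked_urls, click_counts):
        """Promote all clicked results to the top, ordered by click frequency;
        unclicked results follow in their original relative order."""
        clicked = sorted((u for u in ranked_urls if click_counts.get(u, 0) > 0),
                         key=lambda u: -click_counts[u])
        unclicked = [u for u in ranked_urls if click_counts.get(u, 0) == 0]
        return clicked + unclicked

    ranking = ["a.com", "b.com", "c.com", "d.com"]
    clicks = {"c.com": 120, "a.com": 40}
    print(rerank_by_clicks(ranking, clicks))  # ['c.com', 'a.com', 'b.com', 'd.com']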

36 Rerank, Preference Predictions
Re-order results by a function of the preference prediction score. We experimented with different variants, using the inverse of ranks. Intuition: scores from different models are not comparable, so merge the ranks instead (sketched below).
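A sketch of one such rank-merging variant; the weight w and the exact combination function are assumptions, not the formula from the experiments.

    def merge_by_inverse_ranks(content_ranking, preference_ranking, w=0.5):
        """Scores from different models aren't comparable, so combine the
        ranks: score(url) = w/rank_pref(url) + (1-w)/rank_content(url)."""
        r_content = {u: i + 1 for i, u in enumerate(content_ranking)}
        r_pref = {u: i + 1 for i, u in enumerate(preference_ranking)}
        return sorted(content_ranking,
                      key=lambda u: -(w / r_pref[u] + (1 - w) / r_content[u]))

    content = ["a.com", "b.com", "c.com"]   # original ranking
    prefs   = ["c.com", "a.com", "b.com"]   # order predicted from behavior
    print(merge_by_inverse_ranks(content, prefs))  # ['a.com', 'c.com', 'b.com']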

37 Enhance Ranker Features with User Behavior Features
For a given query:
- Merge the original feature set with user behavior features, when available
- User behavior features are computed from previous interactions with the same query
- Train RankNet [Burges et al., ICML'05] on the enhanced feature set

38 Feature Merging: Details
Query: SIGIR; hypothetical results with illustrative feature values:

Result URL           BM25  PageRank  Clicks  DwellTime
sigir2007.org        2.4   0.5       ?       ?
sigir2006.org        1.4   1.1       150     145.2
acm.org/sigs/sigir/  1.2   2         60      23.5

- Value scaling: binning vs. log-linear vs. linear (e.g., μ=0, σ=1)
- Missing values: impute 0? (For features normalized to μ=0, this means "average behavior".)
- Runtime: significant engineering ("plumbing") problems
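A sketch of one scaling choice from the list above: log-linear scaling followed by standardization to μ=0, σ=1, with missing values imputed as 0 so an unseen result reads as "average behavior" (the imputation policy is an assumption).

    import math

    def zscore_with_missing(values):
        """Log-linear scale, then standardize to mu=0, sigma=1; impute missing
        (None) as 0, which after normalization means 'average behavior'."""
        present = [math.log1p(v) for v in values if v is not None]
        mu = sum(present) / len(present)
        sd = (sum((x - mu) ** 2 for x in present) / len(present)) ** 0.5 or 1.0
        return [0.0 if v is None else (math.log1p(v) - mu) / sd for v in values]

    # Clicks column from the example table; sigir2007.org has no interactions:
    clicks = [None, 150, 60]
    print(zscore_with_missing(clicks))  # [0.0, ~1.0, ~-1.0]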

39 Evaluation Metrics
- Precision at K: fraction of relevant results in the top K
- NDCG at K: normalized discounted cumulative gain; top-ranked results matter most
- MAP: mean average precision. Average precision for a query is the mean of the precision-at-K values computed after each relevant document is retrieved; MAP averages this across queries.
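Minimal sketches of these two metrics (standard formulations; the gain values and relevance flags are illustrative):

    import math

    def ndcg_at_k(gains, k):
        """gains: relevance gains in ranked order (e.g., 0..4 per result)."""
        dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))
        ideal = sorted(gains, reverse=True)
        idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal[:k]))
        return dcg / idcg if idcg > 0 else 0.0

    def average_precision(relevant_flags):
        """Mean of precision@K taken at each rank where a relevant doc appears.
        MAP is this value averaged over all queries."""
        hits, precisions = 0, []
        for i, rel in enumerate(relevant_flags, start=1):
            if rel:
                hits += 1
                precisions.append(hits / i)
        return sum(precisions) / hits if hits else 0.0

    print(ndcg_at_k([3, 2, 0, 1], k=3))     # graded relevance in ranked order
    print(average_precision([1, 0, 1, 0]))  # (1/1 + 2/3) / 2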

40 Datasets
- 8 weeks of user behavior data (2006) from anonymized, opt-in client instrumentation: millions of unique queries and interaction traces
- Random sample of 3,000 queries, gathered independently of user behavior: 1,500 train, 500 validation, 1,000 test
- Explicit relevance assessments for the top 10 results of each query in the sample

41 Methods Compared
- Content only: BM25F
- Full search engine: RN; hundreds of parameters for content match and document quality, tuned with RankNet
Incorporating user behavior:
- Clickthrough: Rerank-CT
- Full user behavior model predictions: Rerank-All
- Integrate all user behavior features directly: +All

42 Content, User Behavior: Precision at K, queries with interactions
BM25 < Rerank-CT < Rerank-All < +All

43 Content, User Behavior: NDCG
BM25 < Rerank-CT < Rerank-All < +All

44 Full Search Engine, User Behavior: NDCG, MAP
Method     MAP    Gain
RN         0.270
RN+All     0.321  0.052 (19.13%)
BM25       0.236
BM25+All   0.292  0.056 (23.71%)

45 User Behavior Complements Content and Web Topology
BM25 (keyword-based ranking) plus user behavior is better than the full model with hundreds of features (keywords, web structure, etc.):

Method                     Score  Gain
RN (Content + Links)       0.632
RN + All (User Behavior)   0.693  0.061 (10%)
BM25                       0.525
BM25 + All                 0.687  0.162 (31%)

46 Impact: All Queries, Precision at K
Fewer than 50% of test queries have prior interactions, yet precision improves over all test queries.

47 Impact: All Queries, NDCG
NDCG over all test queries

48 Which Queries Benefit Most
Most gains are for queries with poor original ranking.

49 Result Summary
- Incorporating user behavior into web search ranking dramatically improves relevance
- Providing rich user interaction features to the ranker is the most effective strategy
- Large improvements shown for up to 50% of test queries

50 Promising Extensions
- Backoff (improve query coverage)
- Model user intent/information need
- Personalization of various degrees
- Query segmentation

51 Identifying “Best Bet” Results by Mining Past User Behavior

52 How can we get the perfect top result for navigational queries?
7,000 unique queries; 1.2 million searches; 10 million user interactions.

53 Not Quite a Ranking Problem
The "best bet" problem: select the most appropriate result to display in the top position, if user behavior clearly indicates a preference for this result over all other results for the query.
- "Navigational" behavior is associated with some queries (e.g., google, hotmail)
- Train a classifier (e.g., a decision tree) on examples rated "Excellent" and "Perfect"
- Classify <query, result> pairs based on the tree

54 Training a Classifier
- Feature set: 30+ features
- Dataset: 7,000 queries with rated results and more than 1 click
- Label: "Perfect" or "Excellent" -> 1; otherwise -> 0
- Method: train a WinMine classifier [M. Chickering] (a sketch with a stand-in classifier follows)
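WinMine is not publicly packaged, so this sketch uses scikit-learn's DecisionTreeClassifier as a stand-in; the features, values, and labels are made up for illustration.

    from sklearn.tree import DecisionTreeClassifier

    # Hypothetical <query, result> behavior features:
    # [ClickFrequency, ClickDeviation, DwellTime], stand-ins for the 30+ used.
    X = [[0.82, 0.41, 210.0],   # dominant "navigational" click pattern
         [0.10, -0.05, 12.0],
         [0.75, 0.38, 180.0],
         [0.05, -0.20, 8.0]]
    y = [1, 0, 1, 0]            # 1 iff rated "Perfect"/"Excellent"

    clf = DecisionTreeClassifier(max_depth=3).fit(X, y)
    print(clf.predict([[0.80, 0.40, 200.0]]))   # -> [1]: flag as a "best bet"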

55 Results

Method                 Precision  Recall  Prec. Gain (%)
RankNet                0.239      -       -
RankNet+UserBehavior   0.331      -       38.5%
BehaviorClassifier     0.753      0.299   216%
DomainAlgorithms       0.758      0.185   218%

BehaviorClassifier exhibits significantly higher precision than RankNet and RankNet+UserBehavior, and comparable precision with higher recall than domain algorithms over similar features.

56 Example Rule
[Figure: example decision-tree rule learned by the classifier]

57 Potential Applications
- Click spam detection
- Search abuse detection
- Personalization
- Domain-specific ranking
- Website optimization

58 Current Work
- Understanding searcher and author behavior in online sources
- Text mining and information extraction for the life sciences
- Inferring social networks: beyond the blogosphere

59 Understanding Searcher and Author Behavior in Online Sources
Searcher behavior: infer models of human inference, decision making, and learning within (and across) query sessions.
- First pass: adapt collaborative filtering techniques to understand how behavior changes with browsed pages; adapt information extraction and content presentation accordingly
Author behavior: beyond statistical language models, toward information content, updates, and information flow.
- Implications for ranking, information extraction, question answering

60 Text Mining and Information Extraction for the Life Sciences
Improving automated diagnosis based on text in patient records (with the School of Medicine):
- Add context for expert-system rules
- Flag possible complications
Public health: early detection and monitoring of epidemics (with the School of Public Health):
- Identify complaints/notes in patient records that tend to co-occur with a syndrome
- Infer real-life social information for more accurate epidemic modelling

61 Inferring Social Networks and Information Flow
- Extend "blogosphere" diffusion work to entities, facts, and events (with GaTech)
- Do the same for non-blog and non-text data
- Question-answer portals (Y! Answers): infer author quality and identify "experts"

62 Summary
- Predicting user preferences
- Incorporating user behavior into ranking
- Behavior-based query segmentation
- Next: author and searcher understanding

63 Primary References
- Improving Web Search Ranking by Incorporating User Behavior, E. Agichtein, E. Brill, and S. Dumais, SIGIR 2006
- Learning User Interaction Models for Predicting Web Search Result Preferences, E. Agichtein, E. Brill, S. Dumais, and R. Ragno, SIGIR 2006
- Identifying "Best Bet" Web Search Results by Mining Past User Behavior, E. Agichtein and Z. Zheng, KDD 2006
- Web Information Extraction and User Modeling: Towards Closing the Gap, E. Agichtein, IEEE Data Engineering Bulletin, Dec. 2006
This and other work on information extraction and text mining: http://www.mathcs.emory.edu/~eugene/

64 Presentation Features
Features in the SIGIR/KDD papers:
- Query terms in title, summary, URL
- Position of result
- Length of URL
- Depth of URL

65 Clickthrough Features
- Fraction of clicks on URL
- Deviation from "expected" clickthrough, given result position
- Time to click
- Time to first click in "session"
- Deviation from average time for the query

66 Browsing Features
- Dwell time; cumulative time on URL (CuriousBrowser)
- Deviation from average time on URL, averaged over the "user" and over all results for the query
- Number of subsequent non-result URLs

