Analysis of Topic Dynamics in Web Search Xuehua Shen (University of Illinois) Susan Dumais (Microsoft Research) Eric Horvitz (Microsoft Research) WWW 2005.

Slides:



Advertisements
Similar presentations
A Comparison of Implicit and Explicit Links for Web Page Classification Dou Shen 1 Jian-Tao Sun 2 Qiang Yang 1 Zheng Chen 2 1 Department of Computer Science.
Advertisements

Web Usage Mining Web Usage Mining (Clickstream Analysis) Mark Levene (Follow the links to learn more!)
Struggling or Exploring? Disambiguating Long Search Sessions
Temporal Query Log Profiling to Improve Web Search Ranking Alexander Kotov (UIUC) Pranam Kolari, Yi Chang (Yahoo!) Lei Duan (Microsoft)
Characterizing Web Content, User Interests, and Search Behavior by Reading Level and Topic Jin Young Kim*, Kevyn Collins-Thompson, Paul Bennett and Susan.
Bringing Order to the Web: Automatically Categorizing Search Results Hao Chen SIMS, UC Berkeley Susan Dumais Adaptive Systems & Interactions Microsoft.
1.Accuracy of Agree/Disagree relation classification. 2.Accuracy of user opinion prediction. 1.Task extraction performance on Bing web search log with.
Studies of the Onset & Persistence of Medical Concerns in Search Logs Ryen White and Eric Horvitz Microsoft Research, Redmond
Searchable Web sites Recommendation Date : 2012/2/20 Source : WSDM’11 Speaker : I- Chih Chiu Advisor : Dr. Koh Jia-ling 1.
1 Learning User Interaction Models for Predicting Web Search Result Preferences Eugene Agichtein Eric Brill Susan Dumais Robert Ragno Microsoft Research.
Estimation of the Number of Relevant Images in Infinite Databases Presented by: Xiaoling Wang Supervisor: Prof. Clement Leung.
1 Web Search and Web Search Overlap: What the Deal? Amanda Spink Queensland University of Technology.
Ryen W. White, Microsoft Research Jeff Huang, University of Washington.
Ryen White, Susan Dumais, Jaime Teevan Microsoft Research {ryenw, sdumais,
CSE 221: Probabilistic Analysis of Computer Systems Topics covered: Statistical inference (Sec. )
CSE 221: Probabilistic Analysis of Computer Systems Topics covered: Statistical inference.
Web Projections Learning from Contextual Subgraphs of the Web Jure Leskovec, CMU Susan Dumais, MSR Eric Horvitz, MSR.
Probabilistic Model of Sequences Bob Durrant School of Computer Science University of Birmingham (Slides: Dr Ata Kabán)
CSE 221: Probabilistic Analysis of Computer Systems Topics covered: Statistical inference.
Cohort Modeling for Enhanced Personalized Search Jinyun YanWei ChuRyen White Rutgers University Microsoft BingMicrosoft Research.
Query Log Analysis Naama Kraus Slides are based on the papers: Andrei Broder, A taxonomy of web search Ricardo Baeza-Yates, Graphs from Search Engine Queries.
Information Re-Retrieval Repeat Queries in Yahoo’s Logs Jaime Teevan (MSR), Eytan Adar (UW), Rosie Jones and Mike Potts (Yahoo) Presented by Hugo Zaragoza.
From Devices to People: Attribution of Search Activity in Multi-User Settings Ryen White, Ahmed Hassan, Adish Singla, Eric Horvitz Microsoft Research,
Learning Structure in Bayes Nets (Typically also learn CPTs here) Given the set of random variables (features), the space of all possible networks.
David Pardoe Doran Chakraborty Peter Stone The University of Texas at Austin Department of Computer Science TacTex-09: A Champion Bidding Agent for Ad.
Topics and Transitions: Investigation of User Search Behavior Xuehua Shen, Susan Dumais, Eric Horvitz.
Modeling Documents by Combining Semantic Concepts with Unsupervised Statistical Learning Author: Chaitanya Chemudugunta America Holloway Padhraic Smyth.
Gradual Adaption Model for Estimation of User Information Access Behavior J. Chen, R.Y. Shtykh and Q. Jin Graduate School of Human Sciences, Waseda University,
Bringing Order to the Web: Automatically Categorizing Search Results Hao Chen, CS Division, UC Berkeley Susan Dumais, Microsoft Research ACM:CHI April.
Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014.
Improved search for Socially Annotated Data Authors: Nikos Sarkas, Gautam Das, Nick Koudas Presented by: Amanda Cohen Mostafavi.
A Simple Unsupervised Query Categorizer for Web Search Engines Prashant Ullegaddi and Vasudeva Varma Search and Information Extraction Lab Language Technologies.
Improving Web Search Ranking by Incorporating User Behavior Information Eugene Agichtein Eric Brill Susan Dumais Microsoft Research.
Fan Guo 1, Chao Liu 2 and Yi-Min Wang 2 1 Carnegie Mellon University 2 Microsoft Research Feb 11, 2009.
Detecting Semantic Cloaking on the Web Baoning Wu and Brian D. Davison Lehigh University, USA WWW 2006.
CIKM’09 Date:2010/8/24 Advisor: Dr. Koh, Jia-Ling Speaker: Lin, Yi-Jhen 1.
Understanding and Predicting Personal Navigation Date : 2012/4/16 Source : WSDM 11 Speaker : Chiu, I- Chih Advisor : Dr. Koh Jia-ling 1.
Understanding Query Ambiguity Jaime Teevan, Susan Dumais, Dan Liebling Microsoft Research.
Personalizing Search on Shared Devices Ryen White and Ahmed Hassan Awadallah Microsoft Research, USA Contact:
Microsoft Research1 Characterizing Alert and Browse Services for Mobile Clients Atul Adya, Victor Bahl, Lili Qiu Microsoft Research USENIX Annual Technical.
Toward A Session-Based Search Engine Smitha Sriram, Xuehua Shen, ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign.
1 Date: 2012/9/13 Source: Yang Song, Dengyong Zhou, Li-wei Heal(WSDM’12) Advisor: Jia-ling, Koh Speaker: Jiun Jia, Chiou Query Suggestion by Constructing.
How Useful are Your Comments? Analyzing and Predicting YouTube Comments and Comment Ratings Stefan Siersdorfer, Sergiu Chelaru, Wolfgang Nejdl, Jose San.
A Statistical Comparison of Tag and Query Logs Mark J. Carman, Robert Gwadera, Fabio Crestani, and Mark Baillie SIGIR 2009 June 4, 2010 Hyunwoo Kim.
Qi Guo Emory University Ryen White, Susan Dumais, Jue Wang, Blake Anderson Microsoft Presented by Tetsuya Sakai, Microsoft Research.
Algorithmic Detection of Semantic Similarity WWW 2005.
Jiafeng Guo(ICT) Xueqi Cheng(ICT) Hua-Wei Shen(ICT) Gu Xu (MSRA) Speaker: Rui-Rui Li Supervisor: Prof. Ben Kao.
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology Mining Logs Files for Data-Driven System Management Advisor.
Analysing Clickstream Data: From Anomaly Detection to Visitor Profiling Peter I. Hofgesang Wojtek Kowalczyk ECML/PKDD Discovery.
Understanding User Goals in Web Search University of Seoul Computer Science Database Lab. Min Mi-young.
 Who Uses Web Search for What? And How?. Contribution  Combine behavioral observation and demographic features of users  Provide important insight.
Learning User Behaviors for Advertisements Click Prediction Chieh-Jen Wang & Hsin-Hsi Chen National Taiwan University Taipei, Taiwan.
Context-Aware Query Classification Huanhuan Cao, Derek Hao Hu, Dou Shen, Daxin Jiang, Jian-Tao Sun, Enhong Chen, Qiang Yang Microsoft Research Asia SIGIR.
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
Why Decision Engine Bing Demos Search Interaction model Data-driven Research Problems Q & A.
Predicting User Interests from Contextual Information R. W. White, P. Bailey, L. Chen Microsoft (SIGIR 2009) Presenter : Jae-won Lee.
Predicting Short-Term Interests Using Activity-Based Search Context CIKM’10 Advisor: Jia Ling, Koh Speaker: Yu Cheng, Hsieh.
Text Information Management ChengXiang Zhai, Tao Tao, Xuehua Shen, Hui Fang, Azadeh Shakery, Jing Jiang.
To Personalize or Not to Personalize: Modeling Queries with Variation in User Intent Presented by Jaime Teevan, Susan T. Dumais, Daniel J. Liebling Microsoft.
The Internet What is the Internet? The Internet is a lot of computers over the whole world connected together so that they can share information. It.
Naïve Bayes Classifier April 25 th, Classification Methods (1) Manual classification Used by Yahoo!, Looksmart, about.com, ODP Very accurate when.
SEARCH AND CONTEXT Susan Dumais, Microsoft Research INFO 320.
Chapter 3: Maximum-Likelihood Parameter Estimation
Assessing the Scenic Route: Measuring the Value of Search Trails in Web Logs Ryen W. White1 Jeff Huang2 1Microsoft Research 1University of Washington.
Topics and Transitions: Investigation of User Search Behavior
FRM: Modeling Sponsored Search Log with Full Relational Model
Personalizing Search on Shared Devices
Ryen White, Ahmed Hassan, Adish Singla, Eric Horvitz
Date: 2012/11/15 Author: Jin Young Kim, Kevyn Collins-Thompson,
INF 141: Information Retrieval
Presentation transcript:

Analysis of Topic Dynamics in Web Search Xuehua Shen (University of Illinois) Susan Dumais (Microsoft Research) Eric Horvitz (Microsoft Research) WWW 2005

Goal  Understand user’s search behaviors by analyzing log data  Start with a large log of queries and URLs visited over a period of five weeks  Each URL has a topical category (e.g., Arts, Business, Computers, etc.) associated with it  Understand the nature of topics that users explore  The consistency of the topics a user visits over time

Model  Marginal Models The marginal models simply use the overall probability distribution for each of the 15 topics.  Markov Models The Markov models explicitly represent the probabilities of transitioning among topics. Consider the probability of moving from one topic to another on successive URL visits. The model has 225 states, each representing transitions from topic to topic (including transitions to the same topic).  We use maximum likelihood techniques to estimate all model parameters, and Jelinek-Mercer smoothing to estimate the probability distributions

User Groups  Individual previous behavior of each individual to predict their current behavior.  Groups we grouped together all individuals who had the same maximally visited topic based on their marginal model.  Population the entire population to predict the current behavior of an individual

Data Set  MSN Search May 22 to June 29, Over a five week period  The basic user actions Client ID = session ID TimeStamp Action (Query, Clicked) Value (a string for Query, a URL for Clicked).  More than 87 million actions from 2.7 million unique users.

Sampling  We selected a sample of 6,153 users who had more than 100 actions (either queries or URL visits)  This data set contains more than 660,000 URL visits for which we could assign topics.

Topic Categories  Open Directory Project (ODP)  For this experiment we used only the first-level categories from ODP.  We omitted the Regional and World categories since we were interested in topical distinctions, and added an Adult category.  The 15 categories we used are: Adult, Arts, Business, Computers, Games, Health, Home, Kids and Teens, News, Recreation, Reference, Science, Shopping, Society and Sports.

Topic Categories  URLs that were in the directory Category tags were automatically assigned to each URL using direct lookup in the ODP  URLs that were not in the directory Using the distribution of categories for the site and sub-site of a URL.

Sample user logs narrow focus on computer and computer games Elapsed Time->

Sample user logs Broader range of topics, arts, home, business, health, etc.

User Information  For the 6153 users  we found that the average number of different topics represented in the URLs selected by an individual was 7.2 with a standard deviation of 2.1.  Very few individuals focused exclusively on one topic  Very few covered the full range of 15 topics over the five week period.

Markov transition probabilities for the Population model

Topic Distribution  The three most frequent topics Arts (16%), Shopping (15%) Society (12%).  Beitzel et al. Adult Entertainment Music  Lau and Horvitz Products and Services Adult Entertainment.

Prediction accuracy (F1) for marginal and Markov models for Individuals, Groups and Population.

Prediction accuracy (F1) for Markov models when using different amounts of training data.

Prediction accuracy (F1) for Markov models for different gaps in time between training and testing

CONCLUSIONS AND FUTURE WORK  We examined topic dynamics in Web search and developed probabilistic models to predict topic transitions.  Group models provide a good balance between predictive accuracy and computational tractability for both marginal and Markov approaches.  We considered the influence on model accuracy of temporal proximity and amount of training data.  We would like to extend the results with a detailed characterization of the automated tagging process.