Topics and Transitions: Investigation of User Search Behavior Xuehua Shen, Susan Dumais, Eric Horvitz.

What’s next for the user?

Outline
Problem
Automatic topic tagging
Predictive models
Evaluation
Experiments and analysis
Conclusion and future directions

Problem
Opportunity: personalizing search
Focus: What topics do users explore? How similar are users to each other, to special groups, and to the population at large?
Data, data, data…
  – MSN search engine log
  – Query & clickthrough
  – 87,449,277 rows, 36,895,634 URLs
  – 5% sample from MSN logs, 05/29-06/29
Create predictive models of the topics of queries and of URLs visited

Automatic Topic Tagging
ODP (Open Directory Project) manually categorizes URLs
MSN extended these methods with heuristics to cover more URLs
We developed a tool to automatically tag every URL in the log
15 top-level categories: Arts, Business, Computers, Games, Health, Home, Kids_and_Teens, News, Recreation, Reference, Science, Shopping, Society, Sports, Adult

A Snippet
Multiple tagging; avg: 1.38 tags per URL
Columns: ActionID, ClientID, ElapsedTime, Action, Value, TopCat
b C http://
b C http://
b Q Birth certificate NULL
Q yaho NULL
C http://
d C http://tv.yahoo.com/news/ap/ / html Arts
d C french translator NULL
d C http://tv.zap2it.com/tveditorial/tve_main/1,1002,271|88515|1|, 00.htm Arts
d C http://tv.zap2it.com/tveditorial/tve_main/1,1002,271|88515|1|, 00.htm Arts
d C http:// =6180 Society
de C http://
de C http://
de C http://
de C http://

Predictive Model: User Perspective
Individual model
  – Use only an individual's clickthrough data to build a model for that user's predictions
Group model
  – Group similar users and build a model for each group's predictions (e.g., group users with the same 'max topic' in their clickthrough)
Population model
  – Use clickthrough data from all users to build a single model for all users' predictions
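The three perspectives can be illustrated with a minimal Python sketch. It assumes a hypothetical input layout (a dict mapping each user ID to the list of topic tags on that user's clicked URLs) and reads 'max topic' as the user's most frequently clicked topic; neither detail is specified in the slides.

```python
from collections import Counter, defaultdict


def topic_distribution(topics):
    """Relative frequency of each topic in a list of topic tags."""
    counts = Counter(topics)
    total = sum(counts.values())
    return {topic: count / total for topic, count in counts.items()}


def build_models(clicks_by_user):
    """Build individual, group, and population topic distributions.

    clicks_by_user: dict of user_id -> non-empty list of topic tags
    (hypothetical layout, assumed for illustration).
    """
    # Individual: one distribution per user, from that user's clicks only.
    individual = {user: topic_distribution(topics)
                  for user, topics in clicks_by_user.items()}

    # Group: pool users who share the same most frequent ("max") topic.
    pooled = defaultdict(list)
    for topics in clicks_by_user.values():
        max_topic = Counter(topics).most_common(1)[0][0]
        pooled[max_topic].extend(topics)
    group = {name: topic_distribution(topics)
             for name, topics in pooled.items()}

    # Population: a single distribution over everyone's clicks.
    all_topics = [t for topics in clicks_by_user.values() for t in topics]
    population = topic_distribution(all_topics)
    return individual, group, population
```

The individual model is the most specific but sees the least data per user; the population model is the opposite, with the group model in between.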

Predictive Model: Considering Time Dependence
Marginal model
  – Base probability for topics
Markov model
  – Probability of moving from one topic to another
Time-interval-specific Markov model
  – User search behavior has two different patterns
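A minimal sketch of the first two models, assuming each user's activity is available as an ordered list of topic tags and that the Markov model is first-order and estimated by maximum likelihood (the slides do not state the order or the estimator):

```python
from collections import Counter, defaultdict


def marginal_model(sequence):
    """Base probability of each topic, ignoring order."""
    counts = Counter(sequence)
    total = len(sequence)
    return {topic: count / total for topic, count in counts.items()}


def markov_model(sequence):
    """First-order transition probabilities P(next topic | current topic)."""
    pair_counts = defaultdict(Counter)
    for current, nxt in zip(sequence, sequence[1:]):
        pair_counts[current][nxt] += 1
    return {
        current: {nxt: count / sum(row.values()) for nxt, count in row.items()}
        for current, row in pair_counts.items()
    }
```

The marginal model predicts the same distribution regardless of what the user just did, while the Markov model conditions on the most recent topic.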

Evaluation Metrics
KL (Kullback-Leibler) divergence
Likelihood
Top K
  – Match the real top K topics against the predicted top K' topics
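Two of these metrics can be sketched as follows. The KL divergence is the standard definition, KL(p || q) = sum over topics of p(t) * log(p(t) / q(t)). The `eps` floor for topics missing from the prediction and the exact top-K scoring rule (the fraction of the real top K topics found among the predicted top K') are assumptions made for illustration; the slides do not specify them.

```python
import math


def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) in nats between two topic distributions given as dicts.

    Topics absent from q are floored at eps (an assumption) so the
    ratio stays finite.
    """
    return sum(
        pv * math.log(pv / max(q.get(topic, 0.0), eps))
        for topic, pv in p.items()
        if pv > 0
    )


def top_k_match(actual, predicted, k, k_prime):
    """Fraction of the real top-k topics that appear in the predicted top-k'."""
    top_actual = sorted(actual, key=actual.get, reverse=True)[:k]
    top_predicted = set(sorted(predicted, key=predicted.get, reverse=True)[:k_prime])
    if not top_actual:
        return 0.0
    return sum(topic in top_predicted for topic in top_actual) / len(top_actual)
```

Lower KL divergence means the predicted distribution is closer to the observed one; a higher top-K match means the predicted ranking of topics agrees better with the real one.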

Experiment
5 weeks of data (05/22-06/29)
Build models based on different subsets of the total data
Predict for a "holdout set": the other weeks' data

Results from Basic Experiment
Marginal model: the individual model has the best performance
Markov model: consistently better than the corresponding marginal model
Markov model: the individual model does not have the best performance. Why?

Results: Training Data Size
With greater amounts of training data, Markov models improve (the same holds for marginal models)
But: the individual Markov model still can't beat the population Markov model

Results: Smoothing
Smoothing with the population Markov model helps the individual Markov model
But: the smoothed individual Markov model still can't outperform the population model
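The slides do not say which smoothing method was used. One common choice, shown here purely as an assumed illustration, is linear interpolation between the individual and population transition tables, with a hypothetical mixing weight `lam`:

```python
def smooth_markov(individual, population, lam=0.5):
    """Interpolate individual and population transition tables.

    Both tables map current topic -> {next topic: probability}.
    lam is a hypothetical weight on the individual model; the
    population model gets weight 1 - lam.
    """
    smoothed = {}
    for current in set(individual) | set(population):
        ind_row = individual.get(current)
        pop_row = population.get(current)
        if ind_row is None:
            # No individual evidence for this topic: use the population row.
            smoothed[current] = dict(pop_row)
        elif pop_row is None:
            # No population evidence for this topic: use the individual row.
            smoothed[current] = dict(ind_row)
        else:
            smoothed[current] = {
                nxt: lam * ind_row.get(nxt, 0.0) + (1 - lam) * pop_row.get(nxt, 0.0)
                for nxt in set(ind_row) | set(pop_row)
            }
    return smoothed
```

Interpolation lets a sparse individual table borrow probability mass for transitions the user has not yet been observed to make, which is the situation where an unsmoothed individual Markov model is weakest.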

Results: Time Decay Effect
As the training data gets older relative to the prediction period, prediction accuracy decreases

Results: Time-Interval-Specific Markov Model
Markov models capture short-time access patterns better
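One way to make a Markov model time-interval-specific is to keep separate transition tables depending on how much time elapsed between consecutive actions. The sketch below assumes events arrive as (timestamp in seconds, topic) pairs and uses a hypothetical 300-second threshold to split "short" from "long" gaps; the slides do not give the actual intervals.

```python
from collections import Counter, defaultdict


def interval_markov(events, threshold_s=300):
    """Estimate separate transition tables for short and long time gaps.

    events: time-ordered list of (timestamp_seconds, topic) pairs.
    threshold_s: hypothetical cutoff between "short" and "long" gaps.
    """
    counts = {"short": defaultdict(Counter), "long": defaultdict(Counter)}
    for (t0, current), (t1, nxt) in zip(events, events[1:]):
        bucket = "short" if t1 - t0 <= threshold_s else "long"
        counts[bucket][current][nxt] += 1
    return {
        bucket: {
            current: {nxt: c / sum(row.values()) for nxt, c in row.items()}
            for current, row in rows.items()
        }
        for bucket, rows in counts.items()
    }
```

At prediction time, the table is chosen by the gap since the user's last action, so quick follow-up clicks and returns after a long pause are modeled separately.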

Conclusion
Use ODP categorization to tag URLs visited by users
Construct marginal and Markov models using tagged URLs
Explore performance of marginal and Markov models in predicting transitions among topics
Set of results relating topic transition behaviors of the population, groups, and specific users

Directions
Study of reliability and failure modes of the automated tagging process (use of expert human taggers)
Combination of query and clickthrough topics
Formulating and studying different groups of people
Topic-centric evaluation
Application of results in personalization of the search experience
  – Interpretation of topics associated with queries
  – Ranking of results
  – Designs for client UI

Acknowledgement
Susan and Eric for great mentoring and discussion
Johnson and Muru for development support
Haoyong for the MSN Search Engine development environment