A Statistical Comparison of Tag and Query Logs
Mark J. Carman, Robert Gwadera, Fabio Crestani, and Mark Baillie
SIGIR 2009
June 4, 2010
Hyunwoo Kim

Contents
• Introduction
• Building a Dataset
• Are the Distributions Similar?
• Investigating Website Content
• Conclusion

Introduction
[Figure not recovered; surviving label: "tags"]

Introduction
• Questions
 1. Are queries and tags similar across URLs?
 2. Can tag data be used to approximate user queries to a search engine?
 3. Can query logs be used to suggest new tags for a particular webpage?
 4. For what types of websites is the correlation between the term distributions for queries and tags the highest?
 5. Which of the distributions, tags or queries, is most closely related to the content of the clicked websites?

Building a Dataset
• AOL query log
 – Sizable
 – Recent (2006)
 – English queries
 – Available to academic researchers
 – 657,426 users
 – A period of 3 months, from March to May 2006
• Delicious tags
 – Collaborative tagging system
• Final dataset: 4145 complete URLs
 – Google query, stemming, pruning

Are the Distributions Similar?
[Figure not recovered; surviving labels: "tags", "or"]

Are the Distributions Similar?
• Kullback-Leibler divergence
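The formula on this slide did not survive extraction. As a rough illustration only, the standard Kullback-Leibler divergence D_KL(P || Q) = sum over terms t of P(t) * log(P(t) / Q(t)) can be sketched as below. The helper names, the toy term lists, and the additive smoothing constant `epsilon` are assumptions made for this example and are not taken from the slides; smoothing is used here only because the divergence is undefined when Q(t) = 0.

```python
import math
from collections import Counter

def term_distribution(terms, vocabulary, epsilon=1e-6):
    """Relative term frequencies over a shared vocabulary.

    Additive smoothing (epsilon) is an assumption of this sketch; it keeps
    every probability non-zero so the KL divergence stays finite.
    """
    counts = Counter(terms)
    total = sum(counts[t] for t in vocabulary) + epsilon * len(vocabulary)
    return {t: (counts[t] + epsilon) / total for t in vocabulary}

def kl_divergence(p, q):
    """D_KL(P || Q) in bits; p and q map the same terms to probabilities."""
    return sum(p[t] * math.log2(p[t] / q[t]) for t in p if p[t] > 0)

# Hypothetical query terms and tags for a single URL.
query_terms = ["cheap", "flights", "flights", "airline"]
tag_terms = ["travel", "flights", "airline", "airline", "booking"]

vocabulary = sorted(set(query_terms) | set(tag_terms))
p_query = term_distribution(query_terms, vocabulary)
p_tag = term_distribution(tag_terms, vocabulary)

kl_qt = kl_divergence(p_query, p_tag)  # queries measured against tags
kl_tq = kl_divergence(p_tag, p_query)  # tags measured against queries
```

The two directions generally give different values, which is why a symmetric measure is often preferred when neither distribution is the natural reference.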

Are the Distributions Similar?
• Jensen-Shannon divergence
 – Symmetric measure
• Overlap coefficient
 – V_q: query logs
 – V_r: tags
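The formulas for these two measures are also missing from the transcript. The sketch below uses the textbook definitions: the Jensen-Shannon divergence as the average KL divergence of each distribution from their midpoint M = (P + Q) / 2, and the overlap coefficient as |V_q ∩ V_r| / min(|V_q|, |V_r|). Reading V_q and V_r as the query and tag vocabularies is an assumption, as are the function names and the toy distributions; the slides may define the coefficient differently.

```python
import math

def kl(p, q):
    """D_KL(P || Q) in bits over the keys of p (q must be non-zero there)."""
    return sum(p[t] * math.log2(p[t] / q[t]) for t in p if p[t] > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits; symmetric and bounded by 1."""
    vocab = set(p) | set(q)
    p_full = {t: p.get(t, 0.0) for t in vocab}
    q_full = {t: q.get(t, 0.0) for t in vocab}
    # The midpoint is non-zero wherever either input is non-zero,
    # so no smoothing is needed here.
    m = {t: 0.5 * (p_full[t] + q_full[t]) for t in vocab}
    return 0.5 * kl(p_full, m) + 0.5 * kl(q_full, m)

def overlap_coefficient(v_q, v_r):
    """|V_q & V_r| / min(|V_q|, |V_r|); 0.0 if either vocabulary is empty."""
    if not v_q or not v_r:
        return 0.0
    return len(v_q & v_r) / min(len(v_q), len(v_r))

# Hypothetical term distributions for one URL.
p_query = {"flights": 0.5, "cheap": 0.25, "airline": 0.25}
p_tag = {"flights": 0.25, "airline": 0.5, "travel": 0.25}

js = js_divergence(p_query, p_tag)
overlap = overlap_coefficient(set(p_query), set(p_tag))
```

In this toy case the vocabularies share two of three terms, so the overlap coefficient is 2/3, and swapping the arguments of `js_divergence` leaves the result unchanged.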

Are the Distributions Similar?
• Open Directory Project

Investigating Website Content

Conclusion
• Similarity between query terms and tags
 – Vocabularies contain a large amount of overlap
 – Term frequency distributions are correlated
 – Similarity is not dependent on the topic area
• Queries are more similar to content than tags are
• Queries and tags are more similar to one another than to content
• Future work
 – Models for automatically removing noise from the tag and query logs
 – Techniques for predicting useful tags from query distributions
 – Techniques for the effective use of tag data to improve different forms of Web search

Thank you