A Statistical Comparison of Tag and Query Logs Mark J. Carman, Robert Gwadera, Fabio Crestani, and Mark Baillie SIGIR 2009 June 4, 2010 Hyunwoo Kim.


1 A Statistical Comparison of Tag and Query Logs Mark J. Carman, Robert Gwadera, Fabio Crestani, and Mark Baillie SIGIR 2009 June 4, 2010 Hyunwoo Kim

2 Contents
• Introduction
• Building a Dataset
• Are the Distributions Similar?
• Investigating Website Content
• Conclusion

3 Introduction tags

4 Introduction
• Questions
1. Are queries and tags similar across URLs?
2. Can tag data be used to approximate user queries to a search engine?
3. Can query logs be used to suggest new tags for a particular webpage?
4. For what types of websites is the correlation between the term distributions for queries and tags the highest?
5. Which of the distributions, tags or queries, is most closely related to the content of the clicked websites?

5 Building a Dataset
• AOL query log
– Sizable
– Recent (2006)
– English queries
– Available to academic researchers
– 657,426 users
– A period of 3 months, from March to May 2006
• Delicious tags
– Collaborative tagging system
• Final dataset: 4145 complete URLs
– Google query, stemming, pruning

6 Are the Distributions Similar? http://www.nytimes.com tags or

7 Are the Distributions Similar?
• Kullback-Leibler divergence
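The formula on this slide did not survive in the transcript. As a reminder, the standard definition is D_KL(P || Q) = sum over terms t of P(t) * log(P(t) / Q(t)). The sketch below shows one way this could be computed for the query and tag term distributions of a single URL. The function names, the additive smoothing constant `alpha`, and the toy term lists are all assumptions made for illustration; the slides do not say how the authors smoothed or estimated the distributions.

```python
import math
from collections import Counter


def term_distribution(terms, vocabulary, alpha=0.5):
    """Additively smoothed term distribution over a shared vocabulary.

    Smoothing (an assumption here) keeps every probability above zero,
    so the divergence below stays finite.
    """
    counts = Counter(terms)
    total = sum(counts.values()) + alpha * len(vocabulary)
    return {t: (counts[t] + alpha) / total for t in vocabulary}


def kl_divergence(p, q):
    """D_KL(p || q) in bits; q must be nonzero wherever p is nonzero."""
    return sum(p[t] * math.log2(p[t] / q[t]) for t in p if p[t] > 0)


# Hypothetical query and tag terms for one URL.
query_terms = ["new", "york", "times", "news", "times"]
tag_terms = ["news", "newspaper", "daily", "news", "times"]
vocab = set(query_terms) | set(tag_terms)

p_query = term_distribution(query_terms, vocab)
p_tag = term_distribution(tag_terms, vocab)

# The measure is asymmetric, so both directions are reported.
print(kl_divergence(p_query, p_tag))
print(kl_divergence(p_tag, p_query))
```

Because the two directions generally differ, a symmetric alternative is often preferred when neither distribution is the natural reference.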

8 Are the Distributions Similar?
• Jensen-Shannon divergence
– Symmetric measure
• Overlap coefficient
– V_q: query logs
– V_r: tags
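The two measures named on this slide can be sketched as follows. The Jensen-Shannon divergence averages the KL divergence of each distribution from their midpoint M = (P + Q) / 2, which makes it symmetric and bounded by 1 bit. For the overlap coefficient, the code assumes the common definition |V_q ∩ V_r| / min(|V_q|, |V_r|); the transcript does not preserve the formula actually shown, so treat that choice, along with the function names and sample data, as an assumption.

```python
import math


def kl_divergence(p, q):
    """D_KL(p || q) in bits; q must be nonzero wherever p is nonzero."""
    return sum(pv * math.log2(pv / q[t]) for t, pv in p.items() if pv > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: symmetric, between 0 and 1."""
    vocab = set(p) | set(q)
    p_full = {t: p.get(t, 0.0) for t in vocab}
    q_full = {t: q.get(t, 0.0) for t in vocab}
    # The midpoint is nonzero wherever either input is nonzero,
    # so no smoothing is needed here.
    m = {t: 0.5 * (p_full[t] + q_full[t]) for t in vocab}
    return 0.5 * kl_divergence(p_full, m) + 0.5 * kl_divergence(q_full, m)


def overlap_coefficient(v_q, v_r):
    """Shared vocabulary size relative to the smaller vocabulary (assumed definition)."""
    v_q, v_r = set(v_q), set(v_r)
    if not v_q or not v_r:
        return 0.0
    return len(v_q & v_r) / min(len(v_q), len(v_r))


# Hypothetical distributions and vocabularies for one URL.
p_query = {"news": 0.5, "times": 0.5}
p_tag = {"news": 0.6, "newspaper": 0.4}

print(js_divergence(p_query, p_tag))
print(overlap_coefficient(p_query.keys(), p_tag.keys()))
```

Unlike KL divergence, this version accepts distributions with different supports, which suits sparse per-URL term counts.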

9 Are the Distributions Similar?

10 Are the Distributions Similar?
• Open Directory Project

11 Are the Distributions Similar?

12 Are the Distributions Similar?

13 Are the Distributions Similar?

14 Are the Distributions Similar?

15 Are the Distributions Similar?

16 Are the Distributions Similar?

17 Investigating Website Content

18 Investigating Website Content

19 Conclusion
• Similarity between query terms and tags
– Vocabularies contain a large amount of overlap
– Term frequency distributions are correlated
– Similarity is not dependent on the topic area
• Queries are more similar to content than to tags
• Queries and tags are more similar to one another than to content
• Future work
– Models for automatically removing noise from the tag and query logs
– Techniques for predicting useful tags from query distributions
– Techniques for the effective use of tag data to improve different forms of Web search

20 Thank you

