Zhichen Xu, Yun Fu, Jianchang Mao, and Difu Su Yahoo! Inc 2821 Mission College Blvd., Santa Clara, CA 95054 {zhichen, yfu, jmao, Towards.

Slides:

Advertisements

Similar presentations

Clustering Categorical Data The Case of Quran Verses

Advertisements

Entity-Centric Topic-Oriented Opinion Summarization in Twitter Date : 2013/09/03 Author : Xinfan Meng, Furu Wei, Xiaohua, Liu, Ming Zhou, Sujian Li and.

HT06, Position Paper, Tagging, Taxonomy, Flickr, Academic Article, ToRead, Presentation Cameron Marlow, Mor Naaman, danah boyd, Marc Davis Yahoo! Research.

Taxonomies, Lexicons and Organizing Knowledge Wendi Pohs, IBM Software Group.

A Linguistic Approach for Semantic Web Service Discovery International Symposium on Management Intelligent Systems 2012 (IS-MiS 2012) July 13, 2012 Jordy.

Measuring Author Contributions to the Wikipedia B. Thomas Adler, Luca de Alfaro, Ian Pye and Vishwanath Raman Computer Science Dept. UC Santa Cruz, CA,

A Self Learning Universal Concept Spotter By Tomek Strzalkowski and Jin Wang Original slides by Iman Sen Edited by Ralph Grishman.

GENERATING AUTOMATIC SEMANTIC ANNOTATIONS FOR RESEARCH DATASETS AYUSH SINGHAL AND JAIDEEP SRIVASTAVA CS DEPT., UNIVERSITY OF MINNESOTA, MN, USA.

An Approach to Evaluate Data Trustworthiness Based on Data Provenance Department of Computer Science Purdue University.

Explorations in Tag Suggestion and Query Expansion Jian Wang and Brian D. Davison Lehigh University, USA SSM 2008 (Workshop on Search in Social Media)

Date:2011/06/08 吳昕澧 BOA: The Bayesian Optimization Algorithm.

Tagging Systems Austin Wester. Tags A keywords linked to a resource (image, video, web page, blog, etc) by users without using a controlled vocabulary.

Tagging Systems Mustafa Kilavuz. Tags A tag is a keyword added to an internet resource (web page, image, video) by users without relying on a controlled.

6/16/20151 Recent Results in Automatic Web Resource Discovery Soumen Chakrabartiv Presentation by Cui Tao.

Information Retrieval Definition: A part of computer science that studies the retrieval of information from a collection of written documents.

Modern Information Retrieval Chapter 2 Modeling. Can keywords be used to represent a document or a query? keywords as query and matching as query processing.

Slide 1 EE3J2 Data Mining Lecture 16 Unsupervised Learning Ali Al-Shahib.

Recommender systems Ram Akella February 23, 2011 Lecture 6b, i290 & 280I University of California at Berkeley Silicon Valley Center/SC.

PROMPT: Algorithm and Tool for Automated Ontology Merging and Alignment Natalya F. Noy and Mark A. Musen.

Enterprise social bookmarking - in a community of practice in IBM, Denmark by Joachim Florentz Boye and Marianne Lykke Nielsen Royal School of Library.

Recommender systems Ram Akella November 26 th 2008.

The Social Web: A laboratory for studying s ocial networks, tagging and beyond Kristina Lerman USC Information Sciences Institute.

Chapter 5: Information Retrieval and Web Search

Tag-based Social Interest Discovery

«Tag-based Social Interest Discovery» Proceedings of the 17th International World Wide Web Conference (WWW2008) Xin Li, Lei Guo, Yihong Zhao Yahoo! Inc.,

Golder and Huberman, 2006 Journal of Information Science Usage Patterns of Collaborative Tagging System.

Adversarial Information Retrieval The Manipulation of Web Content.

Language Identification of Search Engine Queries Hakan Ceylan Yookyung Kim Department of Computer Science Yahoo! Inc. University of North Texas 2821 Mission.

Tag Clouds Revisited Date : 2011/12/12 Source : CIKM’11 Speaker : I- Chih Chiu Advisor : Dr. Koh. Jia-ling 1.

A J Miles Rutherford Appleton Laboratory SKOS Standards and Best Practises for USING Knowledge Organisation Systems ON THE Semantic Web NKOS workshop ECDL.

By : Garima Indurkhya Jay Parikh Shraddha Herlekar Vikrant Naik.

2007. Software Engineering Laboratory, School of Computer Science S E Towards Answering Opinion Questions: Separating Facts from Opinions and Identifying.

No Title, yet Hyunwoo Kim SNU IDB Lab. September 11, 2008.

Computational Intelligence: Methods and Applications Lecture 30 Neurofuzzy system FSM and covering algorithms. Włodzisław Duch Dept. of Informatics, UMK.

Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology A Taxonomy of Similarity Mechanisms for Case-Based Reasoning.

Chapter 6: Information Retrieval and Web Search

Automatic Image Annotation by Using Concept-Sensitive Salient Objects for Image Content Representation Jianping Fan, Yuli Gao, Hangzai Luo, Guangyou Xu.

PREMIS Controlled vocabularies Rebecca Guenther Sr. Networking & Standards Specialist, Library of Congress PREMIS Implementation Fair San.

Job scheduling algorithm based on Berger model in cloud environment Advances in Engineering Software (2011) Baomin Xu,Chunyan Zhao,Enzhao Hua,Bin Hu 2013/1/251.

1 CS 430: Information Discovery Lecture 25 Cluster Analysis 2 Thesaurus Construction.

How Useful are Your Comments? Analyzing and Predicting YouTube Comments and Comment Ratings Stefan Siersdorfer, Sergiu Chelaru, Wolfgang Nejdl, Jose San.

Understanding User’s Query Intent with Wikipedia G 여 승 후.

Content-Based Image Retrieval Using Fuzzy Cognition Concepts Presented by Tienwei Tsai Department of Computer Science and Engineering Tatung University.

ROCK: A Robust Clustering Algorithm for Categorical Attributes Authors: Sudipto Guha, Rajeev Rastogi, Kyuseok Shim Data Engineering, Proceedings.,

Flickr Tag Recommendation based on Collective Knowledge BÖrkur SigurbjÖnsson, Roelof van Zwol Yahoo! Research WWW Summarized and presented.

Vector Space Models.

WEB 2.0 PATTERNS Carolina Marin. Content  Introduction  The Participation-Collaboration Pattern  The Collaborative Tagging Pattern.

Automatic Video Tagging using Content Redundancy Stefan Siersdorfer 1, Jose San Pedro 2, Mark Sanderson 2 1 L3S Research Center, Germany 2 University of.

Harvesting Social Knowledge from Folksonomies Harris Wu, Mohammad Zubair, Kurt Maly, Harvesting social knowledge from folksonomies, Proceedings of the.

+ User-induced Links in Collaborative Tagging Systems Ching-man Au Yeung, Nicholas Gibbins, Nigel Shadbolt CIKM’09 Speaker: Nonhlanhla Shongwe 18 January.

1 Generating Comparative Summaries of Contradictory Opinions in Text (CIKM09’)Hyun Duk Kim, ChengXiang Zhai 2010/05/24 Yu-wen,Hsu.

Information Retrieval and Web Search Link analysis Instructor: Rada Mihalcea (Note: This slide set was adapted from an IR course taught by Prof. Chris.

Improved Video Categorization from Text Metadata and User Comments ACM SIGIR 2011:Research and development in Information Retrieval - Katja Filippova -

Date: 2012/11/29 Author: Chen Wang, Keping Bi, Yunhua Hu, Hang Li, Guihong Cao Source: WSDM’12 Advisor: Jia-ling, Koh Speaker: Shun-Chen, Cheng.

PREMIS Controlled vocabularies Rebecca Guenther Sr. Networking & Standards Specialist, Library of Congress PREMIS Implementation Fair Vienna,

Using decision trees to build an a framework for multivariate time- series classification 1 Present By Xiayi Kuang.

Mining Tag Semantics for Social Tag Recommendation Hsin-Chang Yang Department of Information Management National University of Kaohsiung.

Instance Discovery and Schema Matching With Applications to Biological Deep Web Data Integration Tantan Liu, Fan Wang, Gagan Agrawal {liut, wangfa,

CiteData: A New Multi-Faceted Dataset for Evaluating Personalized Search Performance CIKM’10 Advisor : Jia-Ling, Koh Speaker : Po-Hsien, Shih.

CS791 - Technologies of Google Spring A Webbased Kernel Function for Measuring the Similarity of Short Text Snippets By Mehran Sahami, Timothy.

Applying Link-based Classification to Label Blogs Smriti Bhagat, Irina Rozenbaum Graham Cormode.

Of 24 lecture 11: ontology – mediation, merging & aligning.

On Stability, Clarity, and Co-occurrence of Self-Tagging Aixin Sun and Anwitaman Datta Nanyang Technological University Singapore.

User Modeling for Personal Assistant

Information Organization: Overview

Cristian Ferent and Alex Doboli

Discovery of Blog Communities based on Mutual Awareness

Text Categorization Berlin Chen 2003 Reference:

Information Organization: Overview

Presentation transcript:

Zhichen Xu, Yun Fu, Jianchang Mao, and Difu Su Yahoo! Inc 2821 Mission College Blvd., Santa Clara, CA {zhichen, yfu, jmao, Towards the Semantic Web: Collaborative Tag Suggestions (cited by 74) Workshop on Collaborative Web Tagging, WWW2006 Presented by Jun-Ming Chen 11/28/2008

2 Outline Introduction Collaborative tag suggestion  A taxonomy of tags  Criteria for good tags  Collaborative Tag Suggestions  Tag Spam Elimination  Content-based Tag Suggestions  Tag Normalization  Temporal Tags  Adjustments Examples Conclusions

3 Introduction We propose a collaborative tag suggestion algorithm using these criteria to spot high-quality tags. The proposed algorithm employs a goodness measure for tags derived from collective user authorities to combat spam. The goodness measure is iteratively adjusted by a reward-penalty algorithm, which also incorporates other sources of tags, e.g., content-based auto-generated tags. Our experiments based on My Web 2.0 show that the algorithm is effective.

4 MyWeb 2.0

5 Introduction

6 Collaborative tag suggestion A taxonomy of tags Content-based tags  Autos, Honda Odyssey, batman, open source Context-based tags  tags describing locations and time such as San Francisco, Golden Gate Bridge, Attribute tags  author of a piece of content such as Jeremy’s Blog and Clay Shirky Subjective tags  Tags that express user’s opinion and emotion such as funny or cool Organizational tags  my paper or my work Golder and Huberman have also discussed tag categorization [3] [3] Golder, Scott A., Huberman, Bernardo A. “The Structure of Collaborative Tagging Systems.” HPL Technical Report

7 Collaborative tag suggestion Criteria for good tags An object, a specific tag, a generic tags High coverage of multiple facets High popularity (IR: term frequency) Least-effort (number of tags :minimize ; number of objects: small) Uniformity (normalization) (syntactic variance, synonym) Exclusion of certain types of tags (matching the prefixes of the tags entered by the user before)

8 Collaborative tag suggestion Criteria for good tags

9 Collaborative tag suggestion Collaborative Tag Suggestions & Tag Spam Elimination P s (t i | t j ; o) P a (t i | t j ) (any user) object o (the same user) tjtj titi the overlap in terms of the concepts between t i and t j minimize the overlap of the concepts identified by the suggested tags (Reward)

10 Collaborative tag suggestion Collaborative Tag Suggestions & Tag Spam Elimination S (t, o) a (u) Goodness measure (score) of the tag t to an object o the goodness measure of a (tag, object) pair is the sum of the authority scores of all users who have tagged the object with the tag In a simple case where we assign uniform authority score of 1.0 for every user. the authority score of a given user u the average quality of a given user’s tags object (u) is the set of objects tagged by the user u tag (o, u) denotes the set of tags assigned to object o by user u user (t, o) denotes the set of users who have tagged a given object o with the tag t. User u object tagS( t,o ) applepc1.8 machine0.6 red0.8 carautos2.6 BMW1.3 machine0.4

11 Collaborative tag suggestion Collaborative Tag Suggestions & Tag Spam Elimination At each step, after a tag t i is selected, we adjust the score for each remaining tag t’ as follows:  Penalize tag t’ by removing the redundant information, e.g., by subtracting Pa (t’ | t i ) * S (t i,o) from S (t’, o), i.e., This minimizes the overlap of the concepts identified by the suggested tags.  Reward tag t’ if it co-occurs with the selected tag t i when users tag object o. This rewarding mechanism also improves the uniformity of the suggested tags

12 Collaborative tag suggestion Collaborative Tag Suggestions & Tag Spam Elimination Basic Algorithm

13 Collaborative tag suggestion Content-based Tag Suggestions  analysis and classification of the tagged content and context Tag Normalization (improve tag uniformity, by computing the bi-grams of the tags in the currently chosen tag set)  stemming, edit distance, thesauri Temporal Tags  time sensitive Adjustments  dampening tag coverage score  adding coefficients to the penalizing and rewarding forces

14 Examples three fairly orthogonal facets it also honors the popular demand of users

15 Examples

16 Conclusions We discussed the basic criteria for a good tagging system and proposed a collaborative algorithm for suggesting tags that meet these criteria. Our preliminary experience shows that a simple embodiment of such an algorithm is effective. We have implemented a simplified tag suggestion scheme in My Web 2.0.  Our experience shows that this simple scheme is quite effective in suggesting appropriate tags that possess the properties proposed by us for a good tagging system.

17 Future work Develop metrics to quantitatively measure the quality of suggested tags, and study how tag suggestion can help to facilitate convergence of tag vocabulary Introduce automatically generated content-based tags and also consider the time-sensitivity of tags. Improve tag uniformity by normalizing semantically similar tags that are not similar in letters. The bi-gram method cannot achieve this. This would require incorporating certain linguistic analysis features. Using voting and existing tags alone may prevent new high-quality tags from emerging.

18 Comments How a social network can be used to understand relationships between different taggers How the co-occurrence of tags can be used to cover more facets of an object