By: Garima Indurkhya, Jay Parikh, Shraddha Herlekar, Vikrant Naik


Paper 1: The Structure of Collaborative Tagging Systems
Authors: Golder, S. and Huberman, B., 2005.

Contents
- What is tagging?
- Tagging & Taxonomy
- Aspects of Classification
- Kinds of Tags
- Case Study: Del.icio.us

What is Tagging?
- Marking content with descriptive terms
- Examples:
  - Catalog indexing by a librarian
  - Keywords describing a blog entry or a photo on the web
- Collaborative tagging: the practice of allowing anyone to freely attach keywords or tags to content
- Social bookmark managers:
  - Del.icio.us (http://del.icio.us)
  - Flickr (http://www.flickr.com)
  - CiteULike (http://www.citeulike.org/)
  - Cloudalicious (http://cloudalicio.us/)

Tagging & Taxonomy
- Tagging
  - Non-hierarchical
  - Tags describe the information held within the items they are attached to
  - Tag-based search returns a great variety of things simultaneously
  - For example: the tags for an article about cats in Africa could be cats, africa, animals, cheetahs, etc.
- Taxonomy
  - Hierarchical
  - For example: the taxonomy for the article about cats in Africa could be

Aspects of Classification
Problems to consider while classifying:
- Semantic
  - Polysemy
  - Synonymy
- Cognitive
  - Basic level variation
  - Sense making

Kinds of Tags
Tags perform several kinds of functions for bookmarks:
- Identifying what (or who) it is about
- Identifying what it is
- Identifying who owns it
- Identifying qualities or characteristics
- Self reference
- Task organizing

Case Study: Del.icio.us
- Collaborative tagging system for the web
- Social bookmark manager
- Storage of personal bookmarks
- Public nature of bookmarks

Paper 2: On the Selection of Tags for Tag Clouds
Authors: P. Venetis et al., WSDM, 2011.

Contents
- Tag Cloud
- System Model
- Properties of Tag Cloud
- Algorithms to generate Tag Clouds
- User Models for Tag Clouds
- Experimental Evaluation of algorithms
- Evaluation of User Models
- Conclusion

Tag Cloud
- Definition: a visual representation of social tags, organized into a paragraph-style layout, usually in alphabetical order, where the relative size and weight of the font for each tag corresponds to the relative frequency of its use.
- Compact
- Three dimensions at a time:
  - alphabetical order
  - size indicating importance
  - the tags themselves
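
The size-by-frequency rule in the definition can be sketched in a few lines. This is only an illustration: the linear scaling, the point-size range, and the tag counts are assumptions, not values from the paper.

```python
def font_sizes(tag_counts, min_pt=10, max_pt=32):
    """Map each tag's usage count to a font size by linear interpolation."""
    lo, hi = min(tag_counts.values()), max(tag_counts.values())
    span = hi - lo or 1  # avoid division by zero when all counts are equal
    return {
        tag: min_pt + (count - lo) * (max_pt - min_pt) / span
        for tag, count in tag_counts.items()
    }

# Hypothetical usage counts for the "cats in africa" example.
counts = {"cats": 40, "africa": 25, "animals": 10, "cheetahs": 10}

# Tags are listed alphabetically, as in the definition above.
for tag, size in sorted(font_sizes(counts).items()):
    print(f"{tag}: {size:.1f}pt")
```

Real tag cloud renderers often use logarithmic rather than linear scaling so that a few very frequent tags do not dwarf the rest.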

Tag Cloud
Tag cloud for our example "cats in africa"

Tag Cloud
Uses of tag clouds:
- Summarizing web search results
- Summarizing results over biomedical databases
- Summarizing results of structured queries

Tag Cloud
Example of a tag cloud for summarizing web search results

System Model
Terminology:
- $C$ = set of objects (e.g. web pages or articles)
- $T$ = set of tags
- $C_q$ = set of objects for query $q$
- $|C_q|$ = number of objects in $C_q$
- $T_q$ = set of tags for query $q$
- $A_q(t)$ = association set of a tag $t \in T_q$, i.e. the objects $c \in C_q$ associated with $t$
- $S \subseteq T_q$ = set of tags in the tag cloud
- $|S|$ = number of tags in the tag cloud
- Partial (scoring) function $s(t, c) : T \times C \to [0, 1]$
- Similarity function $\mathrm{sim}(\cdot, \cdot) : C \times C \to [0, 1]$
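
To make the notation concrete, here is a minimal sketch of how the association sets could be built from raw tag assignments. The object ids, the tags, and the helper name `association_sets` are hypothetical, and taking $T_q$ to be the tags attached to at least one object in $C_q$ is an assumption.

```python
from collections import defaultdict

# Hypothetical tag assignments: object id -> set of tags on that object.
assignments = {
    "c1": {"cats", "africa"},
    "c2": {"cats", "animals"},
    "c3": {"africa", "animals", "cheetahs"},
}

def association_sets(result_objects, assignments):
    """Build A_q(t): for each tag, the objects in C_q that carry it."""
    assoc = defaultdict(set)
    for c in result_objects:
        for t in assignments.get(c, ()):
            assoc[t].add(c)
    return dict(assoc)

C_q = {"c1", "c2"}                        # objects returned for some query q
A_q = association_sets(C_q, assignments)  # tag -> association set
T_q = set(A_q)                            # assumed: tags seen on some result

print(sorted(T_q))
```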

Properties of Tag Cloud
- Extent of $S$: the cardinality of $S$, $\mathrm{ext}(S) = |S|$
- Coverage of $S$: the scored size of the objects associated with $S$, where $|C_q|_{s,q}$ is the sum of the scores of every $c \in C_q$
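
A small sketch of these two metrics follows. It assumes that coverage is the scored size of the union of the association sets of the tags in $S$, divided by the scored size of $C_q$; that normalization, the function names, and the example scores are assumptions made for illustration.

```python
def scored_size(objects, score):
    """|X|_{s,q}: sum of the per-object query scores over the objects in X."""
    return sum(score[c] for c in objects)

def extent(S):
    """ext(S) = |S|."""
    return len(S)

def coverage(S, A_q, C_q, score):
    """Scored size of the objects reachable from S, relative to C_q (assumed form)."""
    covered = set().union(*(A_q[t] for t in S))
    return scored_size(covered, score) / scored_size(C_q, score)

# Hypothetical association sets and query scores.
A_q = {"cats": {"c1", "c2"}, "africa": {"c1", "c3"}, "cheetahs": {"c4"}}
C_q = {"c1", "c2", "c3", "c4"}
score = {"c1": 0.5, "c2": 0.25, "c3": 0.25, "c4": 1.0}

S = {"cats", "africa"}
print(extent(S), coverage(S, A_q, C_q, score))
```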

Properties of Tag Cloud
- Overlap of $S$: the extent of redundancy
- Cohesiveness of $S$: how closely related the objects in each association set of $S$ are
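
The slide gives no formula for overlap, so the sketch below substitutes a simple stand-in: the average Jaccard overlap between the association sets of every pair of tags in $S$. This is an assumed measure chosen for illustration, not the paper's definition, and the data is hypothetical.

```python
from itertools import combinations

def pairwise_overlap(S, A_q):
    """Average Jaccard overlap between association sets of tag pairs in S."""
    pairs = list(combinations(sorted(S), 2))
    if not pairs:
        return 0.0  # a cloud with fewer than two tags has no redundancy
    total = 0.0
    for t1, t2 in pairs:
        a, b = A_q[t1], A_q[t2]
        union = a | b
        total += len(a & b) / len(union) if union else 0.0
    return total / len(pairs)

# Hypothetical association sets: "cats" and "animals" are fully redundant.
A_q = {"cats": {"c1", "c2"}, "animals": {"c1", "c2"}, "africa": {"c3"}}

print(pairwise_overlap({"cats", "animals"}, A_q))
print(pairwise_overlap({"cats", "africa"}, A_q))
```

A value near 1 means the tags mostly lead to the same objects; a value near 0 means each tag leads somewhere different.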

Properties of Tag Cloud
- Relevance of $S$: the relevance between the tags in $S$ and the original query $q$
- Popularity of $S$: a tag is more popular if it is associated with many objects in $C_q$

Properties of Tag Cloud
- Independence of $S$: tags are independent if they refer to dissimilar objects
- Balance of $S$: the ratio of the smallest association set size to the largest association set size among the tags in the tag cloud $S$
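
The balance ratio can be sketched directly from that description. The data is hypothetical, and returning 0.0 for an empty cloud is an assumption made here to keep the function total.

```python
def balance(S, A_q):
    """Smallest association-set size divided by the largest, over tags in S."""
    sizes = [len(A_q[t]) for t in S]
    if not sizes or max(sizes) == 0:
        return 0.0  # assumption: treat an empty cloud as fully unbalanced
    return min(sizes) / max(sizes)

# Hypothetical association sets of very different sizes.
A_q = {
    "cats": {"c1", "c2", "c3", "c4"},
    "africa": {"c1", "c2"},
    "cheetahs": {"c5"},
}

print(balance({"cats", "africa"}, A_q))
print(balance({"cats", "cheetahs"}, A_q))
```

A balance of 1 means every tag in the cloud leads to the same number of objects.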

Algorithms to Generate Tag Clouds
- Single- vs. multi-objective tag selection, e.g. achieving high popularity, getting more coverage, being more cohesive, incorporating relevance
- Input to the algorithms: $C_q$, $T_q$, and $S \subseteq T_q$

Algorithms to Generate Tag Clouds
Popularity algorithm (POP):
- The most common algorithm in social information sharing
- A tag is more popular if it is associated with many objects in $C_q$
- It lets the user see what other people are mostly interested in sharing
- For a query $q$ and a parameter $k$, the algorithm returns the top $k$ tags in $T_q$ according to their $|A_q(t)|$
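
A minimal sketch of POP as described above: rank the tags by association set size and keep the top $k$. The alphabetical tie-break and the example data are assumptions.

```python
def pop(A_q, k):
    """Return the k tags in T_q with the largest association sets |A_q(t)|."""
    # Ties are broken alphabetically; the tie-break rule is an assumption.
    ranked = sorted(A_q, key=lambda t: (-len(A_q[t]), t))
    return ranked[:k]

# Hypothetical association sets.
A_q = {
    "cats": {"c1", "c2", "c3"},
    "animals": {"c1", "c2"},
    "africa": {"c4"},
    "cheetahs": {"c3"},
}

print(pop(A_q, 2))
```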

Tf-idf based algorithms (TF, WTF)
- $f(q, t, c) = s(t, c)$ (tf-idf method, TF)
- $f(q, t, c) = s(t, c) \cdot s(q, c)$ (weighted tf-idf method, WTF)
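
The two scoring functions can be compared with a short sketch. The slide does not say how $f$ is aggregated into one score per tag, so summing $f$ over the tag's association set is an assumption here, as are the function name `rank_tags` and all score values.

```python
def rank_tags(A_q, s_tag, s_query, k, weighted=False):
    """Rank tags by the summed f(q, t, c) over their association sets.

    weighted=False uses f = s(t, c)            (TF)
    weighted=True  uses f = s(t, c) * s(q, c)  (WTF)
    """
    def f(t, c):
        base = s_tag[(t, c)]
        return base * s_query[c] if weighted else base

    # Assumption: a tag's score is the sum of f over its association set.
    totals = {t: sum(f(t, c) for c in objs) for t, objs in A_q.items()}
    return sorted(totals, key=lambda t: (-totals[t], t))[:k]

# Hypothetical scores s(t, c) and s(q, c).
A_q = {"cats": {"c1", "c2"}, "africa": {"c3"}}
s_tag = {("cats", "c1"): 0.5, ("cats", "c2"): 0.25, ("africa", "c3"): 0.5}
s_query = {"c1": 0.25, "c2": 0.25, "c3": 1.0}

print(rank_tags(A_q, s_tag, s_query, k=1))                 # TF ranking
print(rank_tags(A_q, s_tag, s_query, k=1, weighted=True))  # WTF ranking
```

With these numbers the two variants pick different tags, because WTF discounts tags whose objects score poorly against the query.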

Maximum Coverage Algorithm (COV)
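
The slide body for COV is not preserved, so the following is only a sketch of the standard greedy heuristic for the maximum coverage problem: repeatedly pick the tag that adds the most not-yet-covered objects. Whether this matches the paper's exact procedure is an assumption; it also counts objects rather than using scored sizes, and the data is hypothetical.

```python
def greedy_cov(A_q, k):
    """Pick up to k tags, each time adding the most not-yet-covered objects."""
    chosen, covered = [], set()
    candidates = set(A_q)
    for _ in range(min(k, len(candidates))):
        # Largest marginal gain first; ties broken alphabetically (assumption).
        best = min(candidates, key=lambda t: (-len(A_q[t] - covered), t))
        if not A_q[best] - covered:
            break  # no remaining tag covers anything new
        chosen.append(best)
        covered |= A_q[best]
        candidates.remove(best)
    return chosen

# Hypothetical association sets: "animals" is popular but redundant with "cats".
A_q = {
    "cats": {"c1", "c2", "c3"},
    "animals": {"c1", "c2"},
    "africa": {"c4"},
}

print(greedy_cov(A_q, 2))
```

On this data a popularity ranking would choose "cats" and "animals", while the greedy coverage choice skips the redundant tag in favor of "africa".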

User Models for Tag Clouds
- Build an ideal user satisfaction model
- Use this model to compare tag clouds
- Base model: coverage
  - The probability that an object is of the user's interest is $r \cdot p$, while the probability that an object is of the user's interest is $p$.

User Models for Tag Clouds
- Incorporating relevance
  - For an object the probability that it is of the user's interest is and for every object the probability that it is of the user's interest is $p$.
- Incorporating cohesiveness
  - For an object the probability that it is of the user's interest is and for every object the probability that it is of the user's interest is $p$.

User Models for Tag Clouds
- Incorporating overlap
  - For an object $c$ that is contained by and no other association sets the probability that it is of the user's interest is the one that can be seen in and for every object the probability that it is of the user's interest is $p$.
- Taking scores into account
- Closing comment

Experimental Evaluation
Datasets:
- CourseRank
- Del.icio.us

Experimental Evaluation of Algorithms: CourseRank
- Most metrics are not correlated
- Only coverage and popularity are correlated
- A tag cloud with high coverage might not be highly relevant
- Algorithms impact the metrics differently

Experimental Evaluation of Algorithms: Del.icio.us
- Results are similar, but the overall range of values for the coverage metric is around 0.2-0.8, much lower than for the CourseRank dataset

Impact on Failure Probability
- Algorithms impact the failure probability differently

Evaluation of User Models
- 80% predicted correctly, even when the difference in failure probability is small
- 100% predicted correctly for a difference of 0.15-0.25, so where there is agreement, we get the best tag cloud

Conclusion
- Metrics are generally not correlated, so they cover different important aspects of a tag cloud
- COV is the best algorithm for finding a tag cloud, followed by POP
- POP works well with relevance and cohesiveness
- The user model is a useful tool for identifying the tag clouds users prefer

Future Work
- Extend the model to capture the balance metric
- Construct an algorithm that minimizes failure probability for a given dataset and extent
- Take into account items with unassigned and spam tags

Thank you!
