Harvesting Social Knowledge from Folksonomies Harris Wu, Mohammad Zubair, Kurt Maly, Harvesting social knowledge from folksonomies, Proceedings of the.

Presentation transcript:

Harvesting Social Knowledge from Folksonomies. Harris Wu, Mohammad Zubair, Kurt Maly. Harvesting social knowledge from folksonomies. Proceedings of the seventeenth conference on Hypertext and hypermedia, August 22-25, 2006, Odense, Denmark.

Folksonomy & Collaborative Tagging Defined. Folksonomy: “a collaboratively generated, open-ended labeling system that enables Internet users to categorize content such as web pages, online photographs and Web links.” Collaborative tagging: “Tagging a collection of documents commonly accessible to a large group rather than tagging contents located all over the web, which is instead called social bookmarking.”

Benefits of Collaborative Tagging. Tags reveal an individual’s structural knowledge about documents, that is, knowledge of how concepts in a domain are interrelated. Tags codify the knowledge relationships between documents and the concepts the tags represent. Tagging is low cost: the work is spread over large groups of people, and there is no complicated, hierarchical nomenclature to learn. Users tag on the fly in plain language.

Benefits of Collaborative Tagging. Open ended: tags respond quickly to changes and developments in the way users group content. Considered democratic metadata generation: tags are generated by both content authors and users. Allows users to search content they have tagged using a personal vocabulary. Users who share interests share vocabularies, so tags made by one user are helpful to another. Tags can use low-frequency keywords not served by a controlled vocabulary. Tags provide dynamic hyperlinks between tags, documents and users.

Challenges. Folksonomies can be seen as an emergent knowledge taxonomy, but the lack of a hierarchy prevents them from being widely adopted by enterprises. They suffer from polysemy (words having multiple related meanings) and synonymy (words that have the same or similar meanings); controlled vocabularies are not vulnerable to this. They invite idiosyncratic tagging, which can create meta-noise and decrease the usability of the system.

Design Components: Community Identification. Much research in the WWW community has been dedicated to identifying evolving topical communities of users and documents. Existing community identification techniques fit into 3 categories. Spectral (global): apply singular value decomposition to large matrices representing relationships among elements in a large collection, and attempt to identify all communities in the collection. Bibliometric (local): identify pairwise affinity among users. Network-flow based (hybrid): can identify broader communities containing a known existing community.

Design Components: Community Identification. The design in this paper uses a spectral technique to identify global communities using the authorship and usage of tags and documents. Documents, tags and users are all nodes in a network. A link is added from each tag to every associated document. A link is added from every user to each tag they created or accessed.
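The network described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tagging events, the `(kind, name)` node encoding and the function name `build_network` are hypothetical and do not come from the paper, and the spectral analysis itself is not shown.

```python
from collections import defaultdict

# Hypothetical tagging events (user, tag, document); not data from the paper.
events = [
    ("alice", "python", "doc1"),
    ("alice", "python", "doc2"),
    ("bob", "python", "doc2"),
    ("bob", "cooking", "doc3"),
]


def build_network(events):
    """Return directed edges as {source_node: set_of_target_nodes}.

    Nodes are (kind, name) pairs so that a user, a tag and a document
    that happen to share a name never collide.
    """
    edges = defaultdict(set)
    for user, tag, doc in events:
        # Link from every user to each tag they created or accessed.
        edges[("user", user)].add(("tag", tag))
        # Link from each tag to every associated document.
        edges[("tag", tag)].add(("doc", doc))
    return dict(edges)


network = build_network(events)
```

In this encoding, users only have outgoing links and documents only have incoming links, while tags sit in between. A spectral method would then operate on the adjacency matrix of such a network.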

Design Components: User and Document Recommendation. It is important for people to be able to find high-quality sources. Sources can be documents or people. Experts tend to use high-quality documents and can better associate documents with concepts. Existing collaborative tagging systems are limited to identifying experts and quality documents by tallying tags or frequency of usage. The authors use a modified version of the HITS algorithm to obtain expert users (hubs) and high-quality documents (authorities) related to a keyword, based on usage and tag structure. HITS is an algorithm known to be effective in finding high-quality sources in hypertext environments.

Design Components: User and Document Recommendation. The root set includes the documents tagged with the keyword. The set is expanded to include all tags associated with any documents in the root set, the documents under these tags, and the users who have accessed these tags. A link is added from each keyword to every document tagged with that keyword, and from each user to every tag they have assigned or used. The link structure is captured in a matrix A, where A_ij shows whether there is a link from node i to node j (nodes are documents, tags and users). Users are sources (nodes with outgoing links only). Documents are sinks (nodes with incoming links only). The hubs calculated are guaranteed to be users, and the authorities documents.
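To make the hub and authority idea concrete, here is a minimal sketch of the standard HITS iteration in Python. It is not the authors' modified algorithm, whose details the slides do not give. As a loudly stated assumption, the sketch collapses the tag layer and links each user directly to the documents they tagged with the keyword; the sample `links`, the function names and the iteration count are all hypothetical.

```python
import math

# Hypothetical links for one keyword: user -> documents that user tagged.
# Assumption: the tag layer is collapsed; this is NOT the paper's exact graph.
links = {
    "alice": {"doc1", "doc2"},
    "bob": {"doc2"},
    "carol": {"doc2", "doc3"},
}


def normalize(scores):
    """Scale scores to unit Euclidean length (no-op if all are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {node: v / norm for node, v in scores.items()}


def hits(edges, iterations=50):
    """Standard HITS: return (hub, authority) score dicts for a directed graph."""
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)
    hub = {node: 1.0 for node in nodes}
    auth = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority of a node = sum of hub scores of nodes linking to it.
        new_auth = {node: 0.0 for node in nodes}
        for src, targets in edges.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        # Hub of a node = sum of authority scores of the nodes it links to.
        new_hub = {
            node: sum(new_auth[dst] for dst in edges.get(node, ()))
            for node in nodes
        }
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return hub, auth


hub, auth = hits(links)
```

Because users have only outgoing links and documents only incoming links in this simplified graph, documents end up with zero hub score and users with zero authority score, which mirrors the slide's claim that hubs are users and authorities are documents. Here `doc2`, tagged by all three users, receives the highest authority.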

Design Components: Ontology Generation. An ontology or hierarchy is a useful structure for navigation. Ontologies can assist with keyword search. An ontology can be used to create a common hierarchy for a large collection of documents. A person’s tags represent their structural knowledge about the documents they have viewed. A common hierarchy represents a form of global knowledge about the larger document collection.

Design Components: Discussion. The paper describes a framework that collects social knowledge from folksonomies. It gains social knowledge from associations between tags and documents as well as from links and user behavior. The end result is a taxonomy of documents rather than a taxonomy of tags. Synonymous and polysemous tags don’t present a problem, since they change the associative routes but not the spectral analyses. Tags are intermediate objects between documents and users.

Evaluation. To test how effective the system was compared to alternative choices, the authors used 3 types of evaluation. Offline studies: paper-based questionnaires and interviews. Participants tagged a set of documents, and taxonomies were generated both with different techniques and manually. The hierarchy produced by the authors’ technique gave better results according to the participants.

Evaluation. Test websites: the authors used existing websites with users and documents, and added a tagging system and a feedback system to rate a tag as “useful” or “not useful”. Based on tags and clicks, the tags and expert users identified by the system had higher-than-average user ratings. The experts identified by the system also had higher-than-average scores, meaning knowledgeable users tend to create high-quality tags.

Evaluation. Pilot systems: the design was applied to the ARCHON digital library, a large knowledge environment. For internal validation, the algorithms are evaluated against the original data. For external validation, human subjects evaluate results from the design solutions through online feedback, questionnaires and interviews. To test scalability, large amounts of user input data are simulated. To test robustness, the impact of statistical sampling and disturbance of the input data is studied.

Conclusion. Collaborative tagging systems have the potential to become infrastructure for gathering social knowledge.

Questions. In what situations have you used collaborative tagging? Did it add value? Why or why not? Collaborative tagging has the potential to expose new ways in which people think about ideas and how they relate; is there a way to do this with a controlled vocabulary? How can folksonomies contribute to well-established hierarchies and ontologies?