An Integrated Approach to Extracting Ontological Structures from Folksonomies
Huairen Lin, Joseph Davis, Ying Zhou, ESWC 2009
Hyewon Lim, October 9th, 2009



Contents
• Introduction
• Related Research
• System Architecture
• Experimental Evaluation
• Conclusion and Future Work

Introduction (1/3)

Introduction (2/3)
• Problems with folksonomies
  – Tags can be idiosyncratic
  – Tags may not be understood by many users
  – Concepts and internal structure are not explicit to the machine
• Various solutions have been proposed
  – Refine the query result: clustering, tag clouds
  – Take an existing upper ontology as the base structure: WordNet
• An integrated approach
  – Knowledge extracted from folksonomies + relevant terms from an existing upper ontology

Introduction (3/3)
• An ontological structure extracted from folksonomies can be useful in many areas of collaborative tagging systems (CTS)
  – Providing multi-dimensional views
  – Cataloguing and indexing
  – Query translation and tagging suggestion
• It can enhance precision and recall
  – By matching the query keywords and the potential results at the level of semantics

Related Research
• Cosine similarity between tags
  – Measure the distance from one tag to another
  – Organize the tags into a hierarchical tree
• Association rule mining has been adopted to analyze and structure folksonomies
  – Output of association rule mining on a folksonomy dataset: association rules of the form A → B
• To discover the relationships among tags within clusters, several existing ontology resources can be used as a reference
  – E.g. WordNet
    • An et al., “Automatic Generation of Ontology from the Deep Web”
    • Laniado et al., “Using WordNet to turn a folksonomy into a hierarchy of concepts”

System Architecture (1/7)
• Vocabularies used in folksonomies
  – Standard tags: genomics
  – Compound tags: evolutionary-genomics
  – Jargon tags: scientometrics, CSCW
  – Other nonsense tags: e.g. misspelled tags

System Architecture (2/7) - Low Support Association Rule Mining
• Aim of association rule mining in CTS
  – Generate associations of the form t_a → t_c between tags t_a and t_c that have support and confidence above certain thresholds
• Traditional association rule mining
  – Sets relatively high support and confidence thresholds
  – This is likely to miss important associations among tags
    • Tags in folksonomies usually follow a Zipf distribution
    • The majority of tags do not occur very frequently in the dataset
• Low Support Association Rule Mining
  – Very low support threshold
  – A lower support threshold may bring a lot of noise into the rule set
    • Cosine similarity is used to filter out possible noise
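The three measures named on this slide can be written out for a tag pair. The slide does not give the formulas, so the following are the standard definitions, assuming binary tag occurrence; the symbols N (number of tagged objects) and O(t) (set of objects carrying tag t) are introduced here for illustration only.

```latex
\begin{align*}
\mathrm{supp}(t_a \rightarrow t_c) &= \frac{|O(t_a) \cap O(t_c)|}{N} \\
\mathrm{conf}(t_a \rightarrow t_c) &= \frac{|O(t_a) \cap O(t_c)|}{|O(t_a)|} \\
\cos(t_a, t_c) &= \frac{|O(t_a) \cap O(t_c)|}{\sqrt{|O(t_a)| \cdot |O(t_c)|}}
\end{align*}
```

Support and cosine are symmetric in the two tags, while confidence is directional, which is why a pair can yield a rule in one direction only.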

System Architecture (3/7) - Low Support Association Rule Mining
• LApriori algorithm (a simplified version of the Apriori algorithm)
  – Only calculates the relationship between tag pairs
• Note on the Apriori algorithm
  – Finds frequent itemsets using candidate generation
  – Finds L_(k-1), the set of frequent (k-1)-itemsets, and uses L_(k-1) to find L_k
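A minimal sketch of pair-only rule mining in the spirit of LApriori, assuming binary tag occurrence and the standard definitions of support, confidence, and cosine similarity. The function name `mine_tag_pair_rules`, the toy posts, and the threshold values are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def mine_tag_pair_rules(tag_sets, min_support, min_confidence, min_cosine):
    """Return rules as (antecedent, consequent, support, confidence, cosine)."""
    n = len(tag_sets)
    tag_count = Counter()
    pair_count = Counter()
    for tags in tag_sets:
        unique = sorted(set(tags))  # sorted so each pair has one canonical order
        tag_count.update(unique)
        pair_count.update(combinations(unique, 2))

    rules = []
    for (a, b), both in pair_count.items():
        support = both / n
        cosine = both / sqrt(tag_count[a] * tag_count[b])
        # Support and cosine are symmetric, so test them once per pair.
        if support < min_support or cosine < min_cosine:
            continue
        # Confidence is directional, so test both rule directions.
        for ante, cons in ((a, b), (b, a)):
            confidence = both / tag_count[ante]
            if confidence >= min_confidence:
                rules.append((ante, cons, support, confidence, cosine))
    return rules


# Toy data (hypothetical): each set holds the tags of one tagged object.
posts = [
    {"wine", "beverage"},
    {"wine", "beverage", "food"},
    {"milk", "beverage"},
    {"food", "beverage"},
]
rules = mine_tag_pair_rules(posts, min_support=0.25,
                            min_confidence=0.8, min_cosine=0.2)
for rule in rules:
    print(rule)
```

Restricting candidates to pairs means only one pass over the data is needed, with no iterative candidate generation from L_(k-1) to L_k.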

System Architecture (4/7) - Standard Tags
• Use WordNet as the upper ontology
  – Compute each semantic relation between tags in terms of the hypernym relation from WordNet
  – Possible semantic relations: more general (⊇), less general (⊆), equivalence (=)
• Additional definitions for folksonomies
  – Essential tags: all distinct tags that appear in the association rules remaining after threshold filtering
  – Candidate hypernyms: hypernyms of a tag that also occur among its related tags

System Architecture (5/7) - Standard Tags
• Folk2Onto algorithm
• Example with the tags {food, beverage, wine, milk}, for the tag “wine”:
  ① U_k = {}, candidate hypernym = {food}: then U_k = {food}
  ② U_k = {beverage}, candidate hypernym = {food}: then U_k = {beverage} (break)
  ③ U_k = {food}, candidate hypernym = {beverage}: then U_k = {beverage}
[Figure: nodes food, beverage, wine, milk, annotated with steps ①, ②, ③]
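The three cases in the example suggest that the parent of a tag is the most specific candidate hypernym found among its related tags. The sketch below illustrates only that selection rule and is not the Folk2Onto algorithm itself; the `HYPERNYM_CHAIN` table is a hypothetical toy stand-in for WordNet lookups, and the function names are invented for illustration.

```python
# Hypothetical toy hypernym lists; a stand-in for WordNet lookups.
HYPERNYM_CHAIN = {
    "wine": ["beverage", "food"],
    "milk": ["beverage", "food"],
    "beverage": ["food"],
    "food": [],
}


def is_hypernym(general, specific):
    """True if `general` is listed as a hypernym of `specific` in the toy table."""
    return general in HYPERNYM_CHAIN.get(specific, [])


def pick_parent(tag, related_tags):
    """Return the most specific related tag that is a hypernym of `tag`."""
    parent = None  # plays the role of U_k in the slide example
    for candidate in related_tags:
        if not is_hypernym(candidate, tag):
            continue  # not a candidate hypernym of this tag
        # Case 1: no parent yet. Case 3: candidate is more specific than parent.
        if parent is None or is_hypernym(parent, candidate):
            parent = candidate
        # Case 2: candidate is more general than the current parent; keep parent.
    return parent


print(pick_parent("wine", ["food", "beverage", "milk"]))
print(pick_parent("wine", ["beverage", "food", "milk"]))
```

Both calls select "beverage" regardless of the order in which the related tags are visited, which matches the outcome of steps ② and ③ in the example.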

System Architecture (6/7) - Compound Tags
• Compound tags are non-standard terms
  – They cannot be processed by WordNet without transformation
• Jawbone (by Mike Wallace)
  – If compound tags match certain defined criteria, they are retained and represented by their base term when searching for a more general parent
  – EndWithFilter
    • The last token is used to represent the whole compound
    • collaborative_tagging → tagging
  – StartsWithFilter
    • The first token is used to represent the whole compound
    • Applied after the EndWithFilter
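A minimal sketch of the two-step base-term lookup described above, written in plain Python rather than against the Jawbone API. The helper name `base_term`, the separator set, and the use of a `known_words` vocabulary as the matching criterion are assumptions made for illustration; the slides do not state the actual criteria.

```python
import re


def base_term(compound_tag, known_words):
    """Return a base term for a compound tag, or None if no token matches."""
    tokens = [t for t in re.split(r"[-_\s]+", compound_tag.lower()) if t]
    if len(tokens) < 2:
        return None  # not a compound tag
    if tokens[-1] in known_words:  # EndWithFilter-style: try the last token first
        return tokens[-1]
    if tokens[0] in known_words:   # StartsWithFilter-style: then the first token
        return tokens[0]
    return None


# Hypothetical vocabulary standing in for "terms that WordNet recognizes".
vocabulary = {"tagging", "genomics"}
print(base_term("collaborative_tagging", vocabulary))
print(base_term("evolutionary-genomics", vocabulary))
```

Trying the last token first reflects the ordering stated on the slide, where the StartsWithFilter is applied only after the EndWithFilter.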

System Architecture (7/7) - Jargon Tags
• Association rules show the relations of jargon tags with other common tags
• Jargon tags are incorporated into the previously built ontological structure by a matcher that uses graph centrality in a similarity graph of tags
  – Each jargon tag is considered the central node of a subgraph
  – If more than one standard tag is associated with the jargon tag, the tag with the highest cosine similarity has priority
    • Example: “folksonomy” with “tagging, plurality, social, ontology”
    • “folksonomy → tagging” was selected (ranked by cosine similarity)
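The priority rule for jargon tags can be sketched as a simple arg-max over cosine similarities. The function name `attach_jargon` is hypothetical, and the similarity values below are invented purely for illustration; the slides report only that "tagging" ranked highest for "folksonomy".

```python
def attach_jargon(jargon_tag, cosine_to_standard):
    """Return (jargon_tag, parent) using the most similar standard tag, or None."""
    if not cosine_to_standard:
        return None  # no associated standard tag, nothing to attach to
    parent = max(cosine_to_standard, key=cosine_to_standard.get)
    return (jargon_tag, parent)


# Made-up similarity scores; only the ranking mirrors the slide example.
similarities = {"tagging": 0.61, "plurality": 0.22, "social": 0.35, "ontology": 0.40}
edge = attach_jargon("folksonomy", similarities)
print(edge)
```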

Experimental Evaluation (1/6)
• CiteULike
  – Crawling keywords: including “science”, “philosophy”, “research”
  – 30,769 rows of data
• Flickr
  – Crawling keyword: “fruit”
  – 18,555 rows of data
• Pre-processing operations were performed to clean up the datasets
  – For the Flickr dataset, only one record was kept for each user
  – Tags called “no-tag” (a system-generated tag for an empty tag) were removed
  – Objects with only one tag were removed
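The cleaning steps can be sketched as a single pass over the records. The record layout `(user, object_id, tags)`, the function name `clean_records`, and the order of the steps (deduplicate by user first, then drop "no-tag", then drop objects left with fewer than two tags) are assumptions; the slides list the operations without specifying these details.

```python
def clean_records(records, one_record_per_user=False):
    """Apply the three clean-up steps to (user, object_id, tags) records."""
    seen_users = set()
    cleaned = []
    for user, obj, tags in records:
        if one_record_per_user:  # used for the Flickr dataset only
            if user in seen_users:
                continue
            seen_users.add(user)
        kept = {t for t in tags if t != "no-tag"}  # drop the system-generated tag
        if len(kept) < 2:
            continue  # objects with zero or one tag carry no co-occurrence
        cleaned.append((user, obj, kept))
    return cleaned


# Toy records (hypothetical).
records = [
    ("u1", "p1", ["apple", "fruit"]),
    ("u1", "p2", ["pear", "fruit"]),
    ("u2", "p3", ["no-tag"]),
    ("u3", "p4", ["banana"]),
    ("u4", "p5", ["fruit", "no-tag", "red"]),
]
flickr_like = clean_records(records, one_record_per_user=True)
citeulike_like = clean_records(records)
print(flickr_like)
```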

Experimental Evaluation (2/6)
• Parameter thresholds
  – Minimum support: 0.02%
  – Minimum confidence: 0.8
  – Minimum cosine similarity: 0.2
• 24,025 rules were obtained from CiteULike at 0.02% minimum support, 0.2 cosine similarity, and 0.8 confidence thresholds
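To give a sense of how low this support threshold is, a rough worked calculation follows. It assumes that support is measured as a fraction of the 30,769 crawled CiteULike rows; the slides do not say how many rows remained after pre-processing, so the figure is only indicative.

```latex
\[
0.02\% \times 30{,}769 = 0.0002 \times 30{,}769 \approx 6.15
\quad\Longrightarrow\quad
\text{a tag pair must co-occur in at least } 7 \text{ rows.}
\]
```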

Experimental Evaluation (3/6)
• Measure how far the extracted ontological structure helps improve the results of certain tasks
  – Multi-dimensional view, cataloguing and indexing
• Multi-dimensional view
  – Results retrieved with the keyword “fruit” were organized into several dimensions

Experimental Evaluation (4/6)
• Multi-dimensional view
  – Comparing the structure to an ontology (sei.cmu.edu)

Experimental Evaluation (5/6)
• Multi-dimensional view
  – Comparing the structure to a clustering result (flickr.com)

Experimental Evaluation (6/6)
• Cataloguing and indexing
  – The catalogues were evaluated manually
  – Compound and jargon terms were observed to have been appropriately incorporated
• In total, 1540 terms were incorporated into the ontological structure
  – 35.65%: standard terms
  – 64.35%: non-standard terms (36.17% compound and 28.18% jargon terms)

Conclusion and Future Work
• Mapping terms to the WordNet ontology is not enough to find the relationships among them
  – WordNet does not cover special domain vocabulary and cannot reflect changes in usage
  – In CTS, many of the tags take the form of jargon and compound terms
• Association rules were applied to find semantically related tags
• Ontological structures could be enriched and deepened using larger tag datasets and more specialized semantic lexical resources
• Representing the extracted ontologies on the web using RDF and SPARQL will enable integration with other web services