Linked Data Profiling
Andrejs Abele
National University of Ireland, Galway
Supervisor: Paul Buitelaar

Overview
- Terminology
- Motivation
- My approach
- Evaluation
- Conclusion
- Future work

Terminology
- Linked Data is about using the Web to connect related data that was not previously linked.
- The Resource Description Framework (RDF) represents data as sets of subject-predicate-object triples, whose elements may be URIs or literals; for example, the predicate foaf:name with the literal object “Andrejs Ābele”.
- The Linked Open Data (LOD) Cloud is a collection of Linked Data resources that are open and freely available.
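To make the triple model concrete, here is a minimal Python sketch that stores RDF triples as plain tuples and writes them out in N-Triples syntax. The subject URI `http://example.org/person/andrejs` is a hypothetical placeholder (the slide gives only the predicate and the literal), and `foaf:name` is expanded to the FOAF namespace `http://xmlns.com/foaf/0.1/`. Typed and language-tagged literals are left out for brevity.

```python
from typing import NamedTuple, Union


class URI(NamedTuple):
    value: str


class Literal(NamedTuple):
    value: str


Term = Union[URI, Literal]


class Triple(NamedTuple):
    subject: URI
    predicate: URI
    obj: Term


def term_to_nt(term: Term) -> str:
    """Serialize one RDF term in N-Triples syntax."""
    if isinstance(term, URI):
        return f"<{term.value}>"
    # Escape the characters N-Triples does not allow unescaped inside a quoted literal.
    escaped = (
        term.value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(triples) -> str:
    """One line per triple: subject predicate object ."""
    return "\n".join(
        f"{term_to_nt(t.subject)} {term_to_nt(t.predicate)} {term_to_nt(t.obj)} ."
        for t in triples
    )


FOAF_NAME = URI("http://xmlns.com/foaf/0.1/name")
PERSON = URI("http://example.org/person/andrejs")  # hypothetical subject URI

print(to_ntriples([Triple(PERSON, FOAF_NAME, Literal("Andrejs Ābele"))]))
# prints: <http://example.org/person/andrejs> <http://xmlns.com/foaf/0.1/name> "Andrejs Ābele" .
```

Non-ASCII characters such as "Ā" can be written directly, since RDF 1.1 N-Triples is UTF-8 encoded.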

Linked Open Data Cloud Diagram
- Publications
- Life Sciences
- Cross-Domain
- Social Networking
- Geographic
- Government
- Media
- User-Generated Content
- Linguistics

Motivation
- Linked Data is hard for humans to understand
- Only a small number of datasets provide a human-readable overview or comprehensive metadata
- When a new dataset is added to the LOD Cloud, links to as many other relevant LOD datasets as possible have to be identified
- The LOD Cloud Diagram relies on manual (human) classification of datasets

Existing solutions for LD profiling
- Loupe [1]
- ProLOD++ [2]
- LOD Laundromat [3]
- LODStat [4]
- Aether [5]
- RDF-stats [6]

Domain identification method using DBpedia
Topic Extraction → Domain Identification → Domain
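The Domain Identification stage could be sketched as follows. This is a minimal illustration under stated assumptions, not the method behind these slides: each DBpedia category found for a dataset is looked up in a category-to-domain table, and the domain with the most votes wins. The `CATEGORY_TO_DOMAIN` table is hypothetical (only the Biology to Life Sciences entry comes from the Bio2RDF-sgd example), and majority voting is an assumed aggregation rule.

```python
from collections import Counter
from typing import Iterable, Optional

# Hypothetical lookup from DBpedia category labels to LOD Cloud domains.
# Only "Biology" -> "Life Sciences" is taken from the slides' example.
CATEGORY_TO_DOMAIN = {
    "Biology": "Life Sciences",
    "Genetics": "Life Sciences",
    "Geography": "Geography",
    "Linguistics": "Linguistics",
}


def identify_domain(categories: Iterable[str]) -> Optional[str]:
    """Return the LOD domain that most of the given categories map to.

    Categories without a known mapping are ignored; None is returned
    when none of them can be mapped.
    """
    votes = Counter(
        CATEGORY_TO_DOMAIN[c] for c in categories if c in CATEGORY_TO_DOMAIN
    )
    if not votes:
        return None
    domain, _count = votes.most_common(1)[0]
    return domain


print(identify_domain(["Biology", "Genetics", "Linguistics"]))  # Life Sciences
```

Ignoring unmapped categories keeps one noisy category from blocking a decision; a real system would also need a policy for ties, which `most_common` breaks here by first occurrence.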

Example
Input: Bio2RDF-sgd
Description: The Saccharomyces Genome Database (SGD) collects and organizes information about the molecular biology and genetics of the yeast Saccharomyces cerevisiae.
1. Most frequent terms: sgd_vocabulary, query, proper, phenotype, experiment
2. Literal containing one of the terms: "protein …
3. Identify the DBpedia concept: …
4. Identify the category: …
5. Identify the domain under which the category fits best: Biology => Life Sciences
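Steps 1 and 2 of the example can be sketched as counting term frequencies over a dataset's text and then keeping the literals that mention a frequent term. This is a minimal sketch under stated assumptions: the ASCII-only tokenizer (lower-cased runs of letters, digits and underscores) and the tiny stop-word list are hypothetical choices, and the slides do not say whether the terms are drawn from literals, URIs, or both. Steps 3 to 5 (DBpedia concept and category lookup) are not shown.

```python
import re
from collections import Counter
from typing import Iterable, List

# Hypothetical stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for"}

# ASCII-only tokens: lower-case letters, digits and underscores.
TOKEN = re.compile(r"[a-z0-9_]+")


def tokenize(text: str) -> List[str]:
    return [t for t in TOKEN.findall(text.lower()) if t not in STOP_WORDS]


def most_frequent_terms(texts: Iterable[str], k: int = 5) -> List[str]:
    """Step 1: the k most frequent terms across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return [term for term, _count in counts.most_common(k)]


def literals_with_terms(literals: Iterable[str], terms: Iterable[str]) -> List[str]:
    """Step 2: literals that contain at least one of the frequent terms."""
    wanted = set(terms)
    return [lit for lit in literals if wanted & set(tokenize(lit))]


literals = [
    "protein kinase phenotype",
    "the phenotype of yeast",
    "experiment on protein",
    "unrelated text",
]
top = most_frequent_terms(literals, k=2)
print(top)                                  # ['protein', 'phenotype']
print(literals_with_terms(literals, top))   # first three literals
```

The literals are made-up stand-ins, not data from Bio2RDF-sgd.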

Datasets
LOD Cloud datasets (annotated in the LOD Cloud Diagram): 405 datasets, 9 domains
- Media: 13
- Linguistics: 34
- Publications: 111
- Social Networking: 41
- Geography: 29
- Government: 65
- Cross Domain: 25
- User Generated: 52
- Life Sciences: 35
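As a quick check on the dataset counts, the per-domain numbers add up to the stated 405 datasets. The largest class, Publications, also fixes the accuracy a trivial majority-class classifier would reach (about 27.4%), which is a useful reference point when reading the classification results. The majority-class figure is added here for context; it is not reported on the slides.

```python
DOMAIN_COUNTS = {
    "Media": 13,
    "Linguistics": 34,
    "Publications": 111,
    "Social Networking": 41,
    "Geography": 29,
    "Government": 65,
    "Cross Domain": 25,
    "User Generated": 52,
    "Life Sciences": 35,
}

total = sum(DOMAIN_COUNTS.values())
# Domain with the most datasets, and its share of all datasets.
majority_domain = max(DOMAIN_COUNTS, key=DOMAIN_COUNTS.get)
majority_share = DOMAIN_COUNTS[majority_domain] / total

print(total, majority_domain, f"{majority_share:.1%}")  # 405 Publications 27.4%
```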

Baseline approach
1. Extract the URIs of properties and classes from the datasets
2. Use the classes and properties as features
3. Classify using a Support Vector Machine (SVM) classifier
4. Use precision and recall as evaluation metrics
Extended baseline
- Enrich the data with human-annotated tags from Linked Open Vocabularies (LOV)
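Steps 1 and 2 of the baseline can be sketched as follows: every predicate URI counts as a property feature, every object of `rdf:type` counts as a class feature, and each dataset becomes a vector of counts over one shared vocabulary. Using counts rather than simple presence flags is an assumption, since the slides do not specify the feature weighting. The vectors would then go to an SVM implementation (for example scikit-learn's `LinearSVC`), which is not shown here.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Triple = Tuple[str, str, str]  # (subject, predicate, object) as plain strings


def schema_features(triples: Iterable[Triple]) -> Counter:
    """Count the property URIs and class URIs used in one dataset."""
    feats = Counter()
    for _subject, predicate, obj in triples:
        feats[predicate] += 1      # every predicate is a property
        if predicate == RDF_TYPE:
            feats[obj] += 1        # objects of rdf:type are classes
    return feats


def vectorize(datasets: Dict[str, Counter]) -> Tuple[List[str], Dict[str, List[int]]]:
    """Map each dataset's feature counts onto one shared, sorted vocabulary."""
    vocab = sorted(set().union(*datasets.values()))
    vectors = {
        name: [feats.get(f, 0) for f in vocab] for name, feats in datasets.items()
    }
    return vocab, vectors


FOAF = "http://xmlns.com/foaf/0.1/"
example = [
    ("http://example.org/a", RDF_TYPE, FOAF + "Person"),   # hypothetical data
    ("http://example.org/a", FOAF + "name", "Alice"),
]
vocab, vectors = vectorize({"example": schema_features(example)})
print(vocab)
print(vectors["example"])  # [1, 1, 1]
```

Literal values such as "Alice" never become features, which mirrors the baseline's restriction to classes and properties.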

Precision and Recall for different domains using SVM

Correctly Classified Instances
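Step 4 of the baseline and the two result slides above rely on per-domain precision and recall and on the share of correctly classified instances. A minimal standard-library sketch of those metrics, computed from lists of true and predicted domain labels, is below; the labels in the usage lines are made up for illustration and are not results from the slides.

```python
from collections import Counter
from typing import Dict, List, Tuple


def per_class_precision_recall(
    y_true: List[str], y_pred: List[str]
) -> Dict[str, Tuple[float, float]]:
    """Precision and recall for every label that occurs in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    pred_count, true_count = Counter(y_pred), Counter(y_true)
    scores = {}
    for label in set(y_true) | set(y_pred):
        precision = true_pos[label] / pred_count[label] if pred_count[label] else 0.0
        recall = true_pos[label] / true_count[label] if true_count[label] else 0.0
        scores[label] = (precision, recall)
    return scores


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Fraction of correctly classified instances."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true) if y_true else 0.0


y_true = ["Media", "Media", "Government", "Life Sciences"]
y_pred = ["Media", "Government", "Government", "Life Sciences"]
print(per_class_precision_recall(y_true, y_pred)["Media"])  # (1.0, 0.5)
print(accuracy(y_true, y_pred))                             # 0.75
```

Reporting precision and recall per domain matters here because the classes are unbalanced: a classifier can score well overall while missing small domains such as Media entirely.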

Conclusion
- Does not require training
- Works with new and customized vocabularies
- Works only if datasets contain literals
- Cannot identify User-Generated Content and Cross-Domain datasets
- Using only classes and properties, it is hard to improve results above 75%

Future work
- Evaluate alternative classification algorithms
- Use literals and URIs for classification
- Classify datasets into more specific subdomains

Thank you!