A Snapshot of the OWL Web

A Snapshot of the OWL Web
Nicolas Matentzoglu, Samantha Bail, and Bijan Parsia
School of Computer Science, University of Manchester, Manchester, UK

Authors
- Nicolas Matentzoglu: PhD Student, University of Manchester. Clinical Systems, OWL, Ontologies
- Samantha Bail: PhD Student, University of Manchester. Semantic Web, Ontologies, OWL, Explanation
- Bijan Parsia: Senior Lecturer in Computer Science, University of Manchester. Artificial Intelligence, Ontologies, Semantic Web, Knowledge Representation

Introduction
Motivation
- Testing and evaluation of proposed techniques form an important part.
- Some tools are specifically tailored towards certain ontologies (for example, the Snorocket reasoner is aimed at classifying the SNOMED CT ontology).
- Therefore, a wide variety of suitable ontologies as input for testing purposes, and detailed characterisations of real ontologies, are required.
Work of this paper
- Survey ontologies as they exist on the web.
- Describe the creation of a corpus of OWL DL ontologies which allows random sampling of ontologies for a variety of empirical applications.

Datasets in practice
Curated ontology repositories
- NCBO BioPortal: an open repository of biomedical ontologies which invites submissions from OWL researchers.
- TONES: created as part of the TONES project as a means of gathering suitable ontologies for testing OWL applications. TONES can be considered rather static in comparison with frequently updated repositories such as BioPortal.
Large-scale crawl-based repositories
- Swoogle: a crawl-based semantic web search engine that was established in 2004 (no public API).
- Billion Triple Challenge: a search engine which indexes documents based on a web crawler that targets semantic web documents (provides APIs which allow users to retrieve lists of search results for a given keyword).

Gathering a Corpus of OWL DL Ontologies
Data collection:
- A standard web crawler was used with a large seed list of URLs obtained from existing repositories and previous crawls.
- The seeds are as follows:
  - 336,414 URLs of potential ontology files obtained directly from a Google search, Swoogle, OBO Foundry, Dumontier Labs, and the Protégé Library.
  - 43,006 URLs obtained from an experimental crawl in 2011.
  - 413 ontologies downloaded from the BioPortal REST API.

Gathering a Corpus of OWL DL Ontologies
Data curation:
- Identifying valid OWL files
  1. Apply syntactic heuristics to discard documents which are clearly not OWL or which do not contain an OWL declaration.
  2. Remove byte-identical files.
  3. Load and save files with the OWL API.
  4. Remove duplicates by excluding files that have a byte-identical OWL/XML serialisation.
- Cluster detection: two filtering steps, one based on similar file sizes and file names, the other based on the source of the OWL file.
- OWL DL filtering; violations encountered:
  1. missing class (77.4%), annotation property (67.8%), object property (34.4%) and data property (15.1%) declarations
  2. the use of reserved vocabulary
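The duplicate-removal step in the curation list above (removing byte-identical files) could be sketched as follows. This is only an illustration, not the authors' implementation: the function names `file_digest` and `remove_byte_identical` are hypothetical, and comparing SHA-256 digests is an assumed stand-in for a direct byte-by-byte comparison.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's raw bytes."""
    hasher = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read in chunks so that large ontology files are not loaded whole.
        for chunk in iter(lambda: handle.read(65536), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def remove_byte_identical(paths):
    """Keep the first file seen for each distinct byte content.

    Assumption: two files with the same SHA-256 digest are treated as
    byte-identical.
    """
    seen = set()
    unique = []
    for path in paths:
        digest = file_digest(path)
        if digest not in seen:
            seen.add(digest)
            unique.append(path)
    return unique
```

Under the same assumption, the function could be applied a second time to the files re-saved as OWL/XML, which would catch duplicates that differ only in their original serialisation.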

Gathering a Corpus of OWL DL Ontologies
Provenance data:
- Domain sources
- File extensions and syntax (interestingly, it appears that only a single file in the corpus had the extension .owx, the recommended extension for OWL/XML serialisations)

Entity usage
- No statement is made about which dataset is ‘better’, as this depends heavily on the purpose.
- The collections in this section are largely left untouched and are not curated in the way the Web Crawl was: they may even contain OWL Full and RDFS.

Entity usage
Logical axioms
- The majority of ontologies in the crawl-based collections (Crawl and Swoogle) are in the lower two bins, which contain fairly small ontologies.

Constructors and Axiom Types
In the crawl:
- The basic constructors in AL (intersection, universal restrictions, existential restrictions of the type ∃r.⊤, and atomic negation) are used by the majority of ontologies.
- Property-based constructors in use include inverse properties I (35% of ontologies) and property hierarchies H (30%).
- Only a very small number of ontologies make use of qualified number restrictions Q (5%) and complex property hierarchies R (4%), which might be explained by the fact that these were only introduced with OWL 2.

Constructors and Axiom Types
- The most frequently used axiom types in the crawl corpus are AnnotationAssertion and SubClassOf axioms.
- The frequencies of domain and range axioms on object properties are roughly pairwise identical across all collections, which may indicate that ontology developers generally add domain and range axioms together when introducing object properties.
- While the clear majority of axioms are fairly ‘trivial’ SubClassOf and ClassAssertion axioms, more complex axiom types occur frequently in these OWL ontologies.

Data Types
A very small number of built-in datatypes occur frequently in the five collections, whereas the remaining types are only used rarely. The most frequently used datatypes are rdf:PlainLiteral and xsd:string.

OWL Profiles
- The OWL 2 profiles are relevant for OWL reasoners, which are only compatible with OWL DL ontologies, or may be tailored towards a specific subset of OWL 2 DL.
- The majority of ontologies in the crawl corpus do not fall into any of the sub-profiles EL, QL, or RL.
- The Swoogle ontologies are largely in a combination of the RL and QL profiles (because they are fairly inexpressive), with only a fraction (5.8%) being more expressive.

Overlap Analysis
- Two ontologies are similar if their overlap (the size of the intersection of their signatures divided by the size of the union of their signatures) is at least 90%.
- There exists a containment relation between two ontologies O1, O2 if sig(O1) ⊆ sig(O2) or sig(O2) ⊆ sig(O1).
- There is a containment overlap (Con.) between the crawl corpus and the Swoogle sample, which is likely to be caused by the heavy use of Swoogle results as seeds for the web crawler.
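The two definitions above (similarity by signature overlap, and containment) can be illustrated with a small sketch. It assumes that a signature is given as a plain collection of entity names; the function names and the choice to treat two empty signatures as having zero overlap are assumptions made for this example, not part of the original work.

```python
def overlap(sig1, sig2):
    """Size of the signature intersection divided by the size of the union."""
    s1, s2 = set(sig1), set(sig2)
    union = s1 | s2
    if not union:
        # Assumption: two empty signatures are given an overlap of 0.
        return 0.0
    return len(s1 & s2) / len(union)


def is_similar(sig1, sig2, threshold=0.9):
    """Two ontologies are similar if their overlap is at least the threshold."""
    return overlap(sig1, sig2) >= threshold


def has_containment(sig1, sig2):
    """True if sig(O1) is a subset of sig(O2) or sig(O2) is a subset of sig(O1)."""
    s1, s2 = set(sig1), set(sig2)
    return s1 <= s2 or s2 <= s1
```

For instance, a signature with three entities and a superset with four entities stand in a containment relation, but their overlap is 3/4, so they would not count as similar under the 90% threshold.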

Conclusions
- Presented an overview of the OWL ontology landscape with a focus on the application of different collections for empirical evaluations.
- Presented an approach to creating a large corpus of OWL DL ontologies suitable for testing and evaluation purposes, characterised the corpus, and compared it to other existing collections of OWL ontologies.
- The direct comparison of these ontology metrics allows OWL tool developers to make an informed decision when selecting a suitable collection of OWL ontologies for testing purposes.
- A careful filtering procedure of a crawl-based corpus brings the resulting set closer to curated repositories in terms of ontology size and expressivity.