A Scalable RDBMS-Based Inference Engine for RDFS/OWL
Oracle New England Development Center



Agenda
- Background
  - 10gR2 RDF
  - 11g RDF/OWL
- 11g OWL support
  - RDFS++, OWLSIF, OWLPrime
- Inference design & implementation in RDBMS
- Performance
- Completeness evaluation through queries
- Future work

Oracle 10gR2 RDF
- Storage
  - Use DMLs to insert triples incrementally:
      insert into rdf_data values (…, sdo_rdf_triple_s(1,,, ));
  - Use the Fast Batch Loader with a Java interface
- Inference (forward-chaining based)
  - Supports RDFS inference
  - Supports user-defined rules
  - PL/SQL API: create_rules_index
- Query using SDO_RDF_MATCH
      select x, y from table(sdo_rdf_match( (?x rdf:type :Protein) (?x :name ?y) ….));
  - Seamless SQL integration
- Shipped in 2005

Oracle 11g RDF/OWL
- New features
  - Bulk loader
  - Native OWL inference support (with optional proof generation)
  - Semantic operators
- Performance improvement
  - Much faster compared to 10gR2: loading, query, inference
- Shipped (Linux platform) in 2007
- Java API support (forthcoming)
  - Jena & Sesame

Oracle 11g OWL is a scalable, efficient, forward-chaining based reasoner that supports an expressive subset of OWL-DL.

Why?
- Why inside RDBMS?
  - The size of semantic data grows really fast.
  - An RDBMS has transactions, recovery, replication, security, …
  - An RDBMS is efficient at processing queries.
- Why OWL-DL?
  - It is a widely adopted ontology language standard.
- Why an OWL-DL subset?
  - Have to support large ontologies (with large ABox): hundreds of millions of triples and beyond
  - No existing reasoner handles complete DL semantics at this scale
  - Neither Pellet nor KAON2 can handle LUBM10 or ST ontologies on a 64-bit machine with a 4GB heap ¹
- Why forward chaining?
  - Efficient query support
  - Can accommodate any graph query pattern

¹ The Summary Abox: Cutting Ontologies Down to Size. ISWC

OWL Subsets Supported
- Three subsets for different applications
  - RDFS++: RDFS plus owl:sameAs and owl:InverseFunctionalProperty
  - OWLSIF (OWL with IF semantics): based on Dr. Horst's pD* vocabulary ¹
  - OWLPrime:
    - rdfs:subClassOf, subPropertyOf, domain, range
    - owl:TransitiveProperty, SymmetricProperty, FunctionalProperty, InverseFunctionalProperty
    - owl:inverseOf, sameAs, differentFrom
    - owl:disjointWith, complementOf
    - owl:hasValue, allValuesFrom, someValuesFrom
    - owl:equivalentClass, equivalentProperty
- Jointly determined with domain experts, customers and partners

[Figure labels: OWL DL, OWL Lite, OWLPrime]

¹ Completeness, decidability and complexity of entailment for RDF Schema and a semantic extension involving the OWL vocabulary

Semantics Characterized by Entailment Rules
- RDFS has 14 entailment rules defined in the spec. E.g. rule:
    aaa rdfs:domain XXX . uuu aaa yyy . => uuu rdf:type XXX .
- OWLPrime has 50+ entailment rules. E.g. rules:
    aaa owl:inverseOf bbb . bbb rdfs:subPropertyOf ccc . ccc owl:inverseOf ddd . => aaa rdfs:subPropertyOf ddd .
    xxx owl:disjointWith yyy . a rdf:type xxx . b rdf:type yyy . => a owl:differentFrom b .
- These rules have efficient implementations in RDBMS
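As a rough illustration of how such rules fire, the sketch below applies the rdfs:domain rule and the owl:disjointWith rule from this slide to a small in-memory set of triples. It is plain Python rather than the SQL used inside the database, and the sample triples (:worksFor, :Person, :Company, :alice, :acme) and function names are assumptions made up for the example.

```python
# Triples are (subject, predicate, object) tuples of strings.

def rdfs_domain_rule(triples):
    """aaa rdfs:domain XXX . uuu aaa yyy . => uuu rdf:type XXX ."""
    domains = [(s, o) for (s, p, o) in triples if p == "rdfs:domain"]
    inferred = set()
    for prop, cls in domains:
        for s, p, o in triples:
            if p == prop:
                inferred.add((s, "rdf:type", cls))
    return inferred - triples  # keep only triples that are actually new


def disjoint_rule(triples):
    """xxx owl:disjointWith yyy . a rdf:type xxx . b rdf:type yyy . => a owl:differentFrom b ."""
    types = [(s, o) for (s, p, o) in triples if p == "rdf:type"]
    inferred = set()
    for x, p, y in triples:
        if p != "owl:disjointWith":
            continue
        for a, class_a in types:
            for b, class_b in types:
                if class_a == x and class_b == y:
                    inferred.add((a, "owl:differentFrom", b))
    return inferred - triples


# Hypothetical sample data, not taken from the talk.
asserted = {
    (":worksFor", "rdfs:domain", ":Person"),
    (":alice", ":worksFor", ":acme"),
    (":Person", "owl:disjointWith", ":Company"),
    (":acme", "rdf:type", ":Company"),
}
step1 = rdfs_domain_rule(asserted)
step2 = disjoint_rule(asserted | step1)
```

The second rule only fires after the first has produced the type of :alice, which is why rule-based inference is normally run repeatedly until nothing new appears.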

Applications of Partial DL Semantics
- "One very heavily used space is that where RDFS plus some minimal OWL is used to enhance data mapping or to develop simple schemas." (James Hendler ¹)
- Complexity distribution of existing ontologies ²
  - Out of 1,200+ real-world OWL ontologies
  - Collected using Swoogle, Google, Protégé OWL Library, DAML ontology library, …
  - 43.7% (or 556) of the ontologies are RDFS
  - 30.7% (or 391) of the ontologies are OWL Lite
  - 20.7% (or 264) of the ontologies are OWL DL
  - The remaining ontologies are OWL Full

² A Survey of the web ontology landscape. ISWC

Support Semantics beyond OWLPrime (1)
- Option 1: add user-defined rules
- Both 10gR2 RDF and 11g RDF/OWL support user-defined rules in this form (filter is supported):
    Antecedents: ?x :parentOf ?y . ?z :brotherOf ?x .
    Consequents: ?z :uncleOf ?y
- E.g. to support the core semantics of owl:intersectionOf:
    1. :FemaleAstronaut rdfs:subClassOf :Female
    2. :FemaleAstronaut rdfs:subClassOf :Astronaut
    3. ?x rdf:type :Female . ?x rdf:type :Astronaut . => ?x rdf:type :FemaleAstronaut

(updated: typo above has been corrected after the talk)
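The antecedent/consequent form of a user-defined rule can be sketched as a small pattern matcher. The Python below is only an illustration of the idea: it does not reproduce Oracle's rule syntax or its filter support, and the helper names (match, apply_rule) and the sample individuals (:ann, :bob, :carl, :val) are invented for the example.

```python
def match(pattern, triple, binding):
    """Extend a variable binding so that pattern equals triple, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was first bound to.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def apply_rule(antecedents, consequent, triples):
    """Return the consequent triples implied by every way of matching the antecedents."""
    bindings = [{}]
    for pattern in antecedents:
        extended = []
        for binding in bindings:
            for triple in triples:
                result = match(pattern, triple, binding)
                if result is not None:
                    extended.append(result)
        bindings = extended
    produced = {tuple(b.get(term, term) for term in consequent) for b in bindings}
    return produced - set(triples)


# Hypothetical sample data.
facts = {
    (":ann", ":parentOf", ":bob"),
    (":carl", ":brotherOf", ":ann"),
    (":val", "rdf:type", ":Female"),
    (":val", "rdf:type", ":Astronaut"),
}

uncles = apply_rule(
    [("?x", ":parentOf", "?y"), ("?z", ":brotherOf", "?x")],
    ("?z", ":uncleOf", "?y"),
    facts,
)
female_astronauts = apply_rule(
    [("?x", "rdf:type", ":Female"), ("?x", "rdf:type", ":Astronaut")],
    ("?x", "rdf:type", ":FemaleAstronaut"),
    facts,
)
```

Rules 1 and 2 of the intersectionOf example are plain assertions, so only rule 3 needs the matcher.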

Support Semantics beyond OWLPrime (2)
- Option 2: separation of TBox and ABox reasoning
- TBox tends to be small in size
  - Generate a class subsumption tree using complete DL reasoners (like Pellet, KAON2, Fact++, Racer, etc.)
- ABox can be arbitrarily large
  - Use Oracle OWL to infer new knowledge based on the class subsumption tree from the TBox

[Diagram labels: TBox, DL reasoner, TBox & complete class tree, ABox, OWLPrime]
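A minimal sketch of the second half of this split, assuming the class subsumption tree has already been produced by an external DL reasoner and is available as (subclass, superclass) pairs. The function name, the class names, and the individual :amy are hypothetical; the code only shows how type assertions in a large ABox can be propagated along a small precomputed tree.

```python
def types_from_class_tree(subclass_of, abox_types):
    """Propagate (individual, class) assertions up a precomputed class tree.

    subclass_of: set of (subclass, superclass) pairs, e.g. from a DL reasoner.
    abox_types:  set of (individual, class) pairs asserted in the ABox.
    Returns only the newly inferred (individual, class) pairs.
    """
    parents = {}
    for sub, sup in subclass_of:
        parents.setdefault(sub, set()).add(sup)

    inferred = set()
    for individual, cls in abox_types:
        seen = {cls}
        stack = [cls]
        while stack:
            for sup in parents.get(stack.pop(), ()):
                if sup not in seen:
                    seen.add(sup)
                    stack.append(sup)
                    if (individual, sup) not in abox_types:
                        inferred.add((individual, sup))
    return inferred


# Hypothetical class tree and ABox.
class_tree = {(":GraduateStudent", ":Student"), (":Student", ":Person")}
abox = {(":amy", ":GraduateStudent")}
new_types = types_from_class_tree(class_tree, abox)
```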

11g OWL Inference PL/SQL API
- SEM_APIS.CREATE_ENTAILMENT(
      Index_name,
      sem_models('GraphTBox', 'GraphABox', …),
      sem_rulebases('OWLPrime'),
      passes,
      Inf_components,
      Options )
  - Use "PROOF=T" to generate inference proof
- SEM_APIS.VALIDATE_ENTAILMENT(
      sem_models('GraphTBox', 'GraphABox', …),
      sem_rulebases('OWLPrime'),
      Criteria,
      Max_conflicts,
      Options )
- The above APIs can be invoked from Java clients through JDBC
- Typical usage:
  1. First load RDF/OWL data
  2. Call create_entailment to generate the inferred graph
  3. Query both the original graph and the inferred data
  The inferred graph contains only new triples! Saves time & resources
- Typical usage:
  1. First load RDF/OWL data
  2. Call create_entailment to generate the inferred graph
  3. Call validate_entailment to find inconsistencies

Advanced Options
- Give users more control over the inference process
  - Selective inference (component based)
    - Allows more focused inference, e.g. "give me only the subClassOf hierarchy"
  - Set the number of passes
    - Normally, inference continues until no further new triples are found
    - Users can set the number of inference passes to see whether what they are interested in has already been inferred
    - E.g. "I want to know whether this person has more than 10 friends"
  - Set the tablespaces used, parallel index build
  - Change the statistics collection scheme
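The effect of the pass limit can be sketched with a generic forward-chaining loop. This is an illustrative Python model, not the database implementation: run_inference, transitive_rule, and the :ancestorOf chain are hypothetical names, and a single transitive-property rule stands in for the full rule set.

```python
def transitive_rule(prop):
    """Build a rule for: a prop b . b prop c . => a prop c ."""
    def rule(triples):
        edges = [(s, o) for (s, p, o) in triples if p == prop]
        return {(a, prop, d) for (a, b) in edges for (c, d) in edges if b == c}
    return rule


def run_inference(asserted, rules, passes=None):
    """Fire all rules repeatedly; stop at a fixpoint or after `passes` passes.

    Returns only the inferred triples, never the asserted ones.
    """
    inferred = set()
    done = 0
    while passes is None or done < passes:
        known = asserted | inferred
        new = set()
        for rule in rules:
            new |= rule(known) - known
        if not new:          # fixpoint: no further new triples found
            break
        inferred |= new
        done += 1
    return inferred


# Hypothetical chain a -> b -> c -> d -> e -> f.
nodes = [":a", ":b", ":c", ":d", ":e", ":f"]
asserted = {(x, ":ancestorOf", y) for x, y in zip(nodes, nodes[1:])}
rules = [transitive_rule(":ancestorOf")]

one_pass = run_inference(asserted, rules, passes=1)
full = run_inference(asserted, rules)
```

With one pass only the two-step ancestors are found; the unrestricted run reaches the full closure, which is why a small pass count can answer a narrow question sooner at the price of completeness.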

11g OWL Usage Example
- Create an application table
    create table app_table(triple sdo_rdf_triple_s);
- Create a semantic model
    exec sem_apis.create_sem_model('family', 'app_table', 'triple');
- Load data (use DML, Bulk loader, or Batch loader)
    insert into app_table (triple) values (sdo_rdf_triple_s('family',,, ));
    …
- Run inference
    exec sem_apis.create_entailment('family_idx', sem_models('family'), sem_rulebases('owlprime'));
- Query both the original model and the inferred data
    select p, o from table(sem_match('( ?p ?o)', sem_models('family'), sem_rulebases('owlprime'), null, null));

After inference is done, what will happen if
- New assertions are added to the graph?
  - Inferred data becomes incomplete. Existing inferred data will be reused if the create_entailment API is invoked again. Faster than a rebuild.
- Existing assertions are removed from the graph?
  - Inferred data becomes invalid. Existing inferred data will not be reused if the create_entailment API is invoked again.

Separate TBox and ABox Reasoning
- Utilize Pellet and Oracle's implementation of the Jena Graph API
  - Create a Jena Graph with an Oracle backend
  - Create a PelletInfGraph on top of it
  - PelletInfGraph.getDeductionsGraph
- Issue encountered: no subsumption for anonymous classes from Pellet inference
  - Solution: create intermediate named classes
- A similar approach applies to Racer Pro, KAON2, Fact, etc. through DIG

Soundness
- Soundness of 11g OWL is verified through
  - Comparison with other well-tested reasoners
  - Proof generation
    - A proof of an assertion consists of a rule (name) and a set of assertions which together deduce that assertion.
    - Option "PROOF=T" instructs 11g OWL to generate the proof

    TripleID1  :Address rdf:type owl:InverseFunctionalProperty .
    TripleID2  :John :Address :John_at_yahoo_dot_com .
    TripleID3  :Johnny :Address :John_at_yahoo_dot_com .
    :John owl:sameAs :Johnny (proof := TripleID1, TripleID2, TripleID3, IFP)
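To make the shape of such a proof concrete, here is a small Python sketch of the inverse-functional-property rule that records which triples justified each conclusion. The data mirrors the slide's example; the function name and the dictionary layout are assumptions, and the output format only imitates the "proof :=" notation above rather than Oracle's actual proof storage.

```python
def ifp_with_proof(triples_by_id):
    """Apply the IFP rule and attach a proof to every inferred owl:sameAs triple.

    triples_by_id maps a triple ID to a (subject, predicate, object) tuple.
    Returns a list of (inferred_triple, proof) pairs, where proof is
    (declaration_id, first_use_id, second_use_id, rule_name).
    """
    results = []
    declarations = [
        (tid, s) for tid, (s, p, o) in triples_by_id.items()
        if p == "rdf:type" and o == "owl:InverseFunctionalProperty"
    ]
    for decl_id, prop in declarations:
        uses = sorted((tid, s, o) for tid, (s, p, o) in triples_by_id.items() if p == prop)
        for i, (id1, s1, o1) in enumerate(uses):
            for id2, s2, o2 in uses[i + 1:]:
                # Two different subjects sharing a value of an IFP are the same individual.
                if o1 == o2 and s1 != s2:
                    results.append(((s1, "owl:sameAs", s2), (decl_id, id1, id2, "IFP")))
    return results


triples_by_id = {
    "TripleID1": (":Address", "rdf:type", "owl:InverseFunctionalProperty"),
    "TripleID2": (":John", ":Address", ":John_at_yahoo_dot_com"),
    "TripleID3": (":Johnny", ":Address", ":John_at_yahoo_dot_com"),
}
proofs = ifp_with_proof(triples_by_id)
```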

Design & Implementation

Design Flow
- Extract rules
- Each rule implemented individually using SQL
- Optimization
  - SQL tuning
  - Rule dependency analysis
  - Dynamic statistics collection
- Benchmarking
  - LUBM
  - UniProt
  - Randomly generated test cases

Tips
- Avoid incremental index maintenance
- Partition data to cut cost
- Maintain up-to-date statistics

Background: Storage Scheme
- Two major tables for storing graph data
  - VALUES table stores the mapping from URIs (etc.) to integers (columns: VALUE, ID)
  - IdTriplesTable stores basically SID, PID, OID

Execution Flow
[Flowchart: Inference Start; Check/Fire Rule 1, Check/Fire Rule 2, …, Check/Fire Rule n; Insert; New triples? (Y / N); Create; Copy; Exchange Table; Exchange Partition. Tables shown: an un-indexed, partitioned temporary table (SID, PID, OID), and IdTriplesTable with the partition for a semantic model and a new partition for the inferred graph.]
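The two-table layout can be mimicked in a few lines of Python. This is a toy model under stated assumptions: the class and method names are invented, IDs are handed out sequentially, and nothing here reflects how Oracle actually assigns or stores identifiers. It only shows why rules can operate on compact integer triples while the strings live in one lookup table.

```python
class ToyTripleStore:
    """Dictionary-encoded triple storage: one value table, one ID-triple table."""

    def __init__(self):
        self.values = {}         # VALUE -> ID (stand-in for the VALUES table)
        self.id_triples = set()  # (SID, PID, OID) rows (stand-in for IdTriplesTable)

    def _id(self, value):
        # Reuse the existing ID, or assign the next sequential one.
        return self.values.setdefault(value, len(self.values) + 1)

    def add(self, s, p, o):
        self.id_triples.add((self._id(s), self._id(p), self._id(o)))

    def decode(self):
        """Translate ID triples back to their original string form."""
        by_id = {i: v for v, i in self.values.items()}
        return {(by_id[s], by_id[p], by_id[o]) for (s, p, o) in self.id_triples}


store = ToyTripleStore()
store.add(":a", "rdf:type", ":C")
store.add(":b", "rdf:type", ":C")
```

Repeated terms such as rdf:type are stored once as a string and referenced by the same integer in every triple.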

Performance Evaluation

Database Setup
- Linux-based commodity PC (1 CPU, 3GHz, 2GB RAM)
- Database installed on machine semperf3
- Two other PCs are just serving storage over the network

[Diagram labels: semperf1, semperf2, semperf3 (Database 11g), Giga-bit Network]

Machine/Database Configuration
- NFS configuration: rw,noatime,bg,intr,hard,timeo=600,wsize=32768,rsize=32768,tcp
- Hard disks: 320GB SATA 7200RPM (much slower than RAID), two on each PC
- Database: 11g release on Linux 32-bit platform

| Parameter | Value | Description |
| --- | --- | --- |
| db_block_size | 8192 | size of a database block |
| memory_target | 1408M | memory area for a server process + memory area for storing data and control information for the database instance |
| workarea_size_policy | auto | enables automatic sizing of areas used by memory-intensive processes |
| statistics_level | TYPICAL | enables collection of statistics for database self-management |

Tablespace Configuration
- Created bigfile (temporary) tablespaces
- LOG files located on semperf3 diskA

| Tablespace | Machine | Disk | Comment |
| --- | --- | --- | --- |
| USER_TBS | semperf2 | diskA | for storing the user's application table. It is only used during data loading. Not relevant for inference. |
| Temporary Tablespace | semperf1 | diskB | Oracle's temporary tablespace is for intermediate stages of SQL execution. |
| UNDO | semperf2 | diskB | for undo segment storage |
| SEM_TS | semperf3 | diskB | for storing graph triples |

Inference Performance

| Ontology (size) | RDFS: # triples inferred (millions) | RDFS: time | OWLPrime: # triples inferred (millions) | OWLPrime: time | OWLPrime + Pellet on TBox: # triples inferred (millions) | OWLPrime + Pellet on TBox: time |
| --- | --- | --- | --- | --- | --- | --- |
| LUBM … million | … | … min 14s | 3.05 | 8m 01s | 3.25 | 8min 21s |
| LUBM … million | … | … h 19min | 61.25 | 7hr 0min | 65.25 | 7h 12m |
| UniProt 20 million | 3.424min 06s (two cells fused) | | 50.8 | 3hr 1min | NA | |

- Results collected on a single-CPU PC (3GHz), 2GB RAM (1.4G dedicated to DB), multiple disks over NFS
- Throughput annotations on the slide: 2.52k triples/s, 6.49k triples/s
- As a reference (not a comparison), BigOWLIM loads, infers, and stores (2GB RAM, P4 3.0GHz, java -Xmx1600)
  - LUBM50 in 26 minutes ¹
  - LUBM1000 in 11h 20min ¹
- Note: Our inference time does not include loading time! Also, the set of rules is different.

¹ From OWLIM Pragmatic OWL Semantic Repository slides, Sept
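The two throughput figures on this slide can be cross-checked with simple arithmetic. The sketch below assumes the fused cells read as 3.25 million triples in 8 min 21 s and 65.25 million triples in 7 h 12 min for the OWLPrime + Pellet column; that reading is an assumption about where the extraction merged the columns, and the function name is made up for the example.

```python
def triples_per_second(millions_inferred, hours=0, minutes=0, seconds=0):
    """Average inference throughput in triples per second."""
    elapsed = hours * 3600 + minutes * 60 + seconds
    return millions_inferred * 1_000_000 / elapsed


# Assumed reading of the OWLPrime + Pellet on TBox column.
small_run = triples_per_second(3.25, minutes=8, seconds=21)
large_run = triples_per_second(65.25, hours=7, minutes=12)

print(f"{small_run / 1000:.2f}k triples/s")
print(f"{large_run / 1000:.2f}k triples/s")
```

Under that reading the results round to 6.49k and 2.52k triples/s, matching the two annotations, which supports the column split used here.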

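Query Answering After Inference
- The LUBM ontology has intersectionOf, Restriction, etc. that are not supported by OWLPrime
- Ontology: LUBM … million & 3+ million inferred

| LUBM Benchmark Queries | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| OWLPrime: # answers | | | | | | | |
| OWLPrime: complete? | Y | Y | Y | Y | Y | N | N |
| OWLPrime + Pellet on TBox: # answers | | | | | | | |
| OWLPrime + Pellet on TBox: complete? | Y | Y | Y | Y | Y | Y | Y |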

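Query Answering After Inference (2)
- Ontology: LUBM … million & 3+ million inferred

| LUBM Benchmark Queries | Q8 | Q9 | Q10 | Q11 | Q12 | Q13 | Q14 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| OWLPrime: # answers | | | | | | | |
| OWLPrime: complete? | N | N | N | Y | N | Y | Y |
| OWLPrime + Pellet on TBox: # answers | | | | | | | |
| OWLPrime + Pellet on TBox: complete? | Y | Y | Y | Y | Y | Y | Y |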

Query Answering After Inference (3)
- Ontology: LUBM … million & 60+ million inferred
- LUBM Benchmark Queries: Q1, Q2, Q3, Q4, Q5, Q6, Q7
- OWLPrime, complete?: Y, Unknown, Y, Y, Y, N, N
- OWLPrime + Pellet on TBox, complete? (column alignment lost; six of seven cells survive): Y, Unknown, Y, Y, Y, Y

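Query Answering After Inference (4)
- Ontology: LUBM … million & 60+ million inferred
- LUBM Benchmark Queries: Q8, Q9, Q10, Q11, Q12, Q13, Q14
- OWLPrime, complete? (column alignment lost; six of seven cells survive): N, N, N, Y, N, Unknown
- OWLPrime + Pellet on TBox, complete? (column alignment lost; five of seven cells survive): Y, Unknown, Y, Y, Y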

Future Work
- Implement more rules to cover even richer DL semantics
- Further improve inference performance
- Seek a standardization of the set of rules, to promote interoperability among vendors
- Look into schemes that cut the size of the ABox
- Look into incremental maintenance

[Diagram labels: richer semantics, scale]

For More Information or semantic technologies

Appendix