What’s New in Oracle Database 12c Graph Database


1 What’s New in Oracle Database 12c Graph Database
Xavier Lopez, Ph.D., Senior Director; Zhe Wu, Ph.D., Architect

2 Agenda
Graph Database Strategy
Customer Use Cases
Oracle Spatial and Graph RDF Graph Features
Future Plans

3 Graph Database Strategy
Support graph data types on all enterprise platforms:
Oracle Database
Oracle NoSQL Database
Oracle Big Data Appliance
Oracle Cloud

4 What Sets Us Apart?
Scalability: Trillions of triples
Transactional: Concurrent loading and updates with ACID properties
Security: Oracle Label Security (OLS) labels at the triple level
Standards based: W3C
Manageable: Use existing DB tools, utilities and expertise
Multi-type support: graph, relational, search, geospatial …
Multi-platform: Relational database, NoSQL, Hadoop

5 RDF Graph v. Property Graph
Property Graphs
  Use case: Social network analysis
  Analytics: Clustering, centrality, page rank, path finding
  Analytics execution: In-memory, in-database
RDF Semantic Graphs
  Use case: Linked data, semantic metadata layer
  Analytics: Pattern matching, inferencing
  Analytics execution: In-database

6 RDF Semantic Graph feature of Oracle Spatial and Graph
For Oracle Database 12c

7 Two Application Use Cases
Linked Data; Entity Analytics
Find related content & relations by navigating connected entities
“Reason” across entities
SPARQL pattern matching
Detecting related entities across large, sparse, disparate collections of data
Inferencing: Applying rules on asserted data
Unified metadata model for distributed data sources
Flexible model for sparse and evolving data
Validate semantic and structural consistency

8 Linked Data in Support of Distributed Data
W3C standard, flexible model for sparse and evolving data
Common vocabulary enables data integration & app development
Relational data stays in place, apps don’t need to change
[Diagram labels: Mid-Tier Server; Application 1, Application 2, Application 3; Graph-based Metadata Layer; SPARQL; Shared Ontologies; RDF Graph, Inventory Graph, Sales Graph; SQL; HR Schema, Inventory Schema, Sales Schema; HR Database, Inventory Database, Sales Database; Database Server]

9 Linked Data in Enterprise
Access & Presentation Layer
Index
Semantic Graph model
Data Servers: Event Server; Hadoop Appliance; Content Mgmt; BI Server; Data Warehouse
Data Sources / Types: Human Sourced Information; Machine Generated Data; Social Media; Subscription Services; Transaction Systems

10 Linked Data / Enterprise Metadata
Industries: Life Sciences; Finance; Media; Networks & Communications; Defense & Intelligence; Police
Hutchinson 3G Austria

11 Novartis Institutes for BioMedical Research (NIBR)
Business Challenge
Link database information on genes, proteins, metabolic pathways, compounds, ligands, etc. to original sources.
Increase productivity for accessing, sharing, searching, navigating, cross-linking, and analyzing internal/external data.
The Novartis Institutes for BioMedical Research (NIBR) is the global pharmaceutical research organization for Novartis, committed to discovering innovative medicines to treat diseases with high unmet medical need. With more than 6,000 scientists, physicians and business professionals around the world, we have an open, entrepreneurial and innovative culture that encourages collaboration as we work to push the boundaries of science to change the practice of medicine.
Solution
Semantic integration layer on RDF graph
Rich domain-specific terminology (biology, chemistry and medicine), 1.6 M terms
Terminology Hub: 8 GB of referential data that cross-references between data repositories.

12 RDF Semantic Graph-based Applications
Linked Data; Entity Analytics
Find related content & relations by navigating connected entities
“Reason” across entities
SPARQL pattern matching
Detecting related entities across large, sparse, disparate collections of data
Inferencing: Applying rules on asserted data
Unified metadata model for distributed data sources
Flexible model for sparse and evolving data
Validate semantic and structural consistency

13 Knowledge Management in Intelligence Domain
National Intelligence Scenario
Data Sources: Financial Data; Telephone Records; Internet Traffic; Enterprise Data; Spatial images; Documents; Contents Repository; Databases; Web resources; Blogs, Mails, news, RSS feeds
Information Extraction: Feature Extraction, Term Extraction
RDF; Intelligence Ontologies; SQL/SPARQL
Search, Presentation, Report, Visualization, Query
Extracted Entities & Relationships
Node labels (in slide order): Person: Abduwali Abdukhadir Muse; Nationality: Somalian; Country: UK; Group: Al Shabab; Ideology: Islamist; Person: ?; Nationality: Pakistani; Country: Pakistan; Group: ?; Person: Chehab Abdouljamid Bouyaly; Country: Morocco; Group: al Qaeda
Edge labels: Currently resides; Member of; Supports; Link ?; Has

14 RDF Semantic Graph Features
Oracle Spatial and Graph RDF Semantic Graph Features

15 Oracle Database 12c RDF Semantic Graph Database
Load / Storage: Native RDF graph data store; Manages billions of triples; Optimized storage architecture
Query: SPARQL-Jena/Joseki, Sesame; SQL/graph query, B-tree indexing; Ontology assisted SQL query
Reasoning: RDFS, OWL2 RL, EL, SKOS; User-defined rules; Incremental, parallel reasoning; User-defined inferencing; Plug-in architecture
Analytics: Semantic indexing framework; Integration with OBIEE, Oracle R Enterprise, Oracle Data Mining
Database services: Exadata ready; Compression & partitioning; Parallel load, inference, query; High availability; Label security: triple-level; W3C standards compliance; Semantic indexing of text; Enterprise Manager
So let’s briefly cover the main features of the Oracle Spatial and Graph option that make these kinds of solutions possible. First, on the left side of this slide you’ll observe the three key services provided by the RDF graph database: storage, query and inferencing. These are three requirements for an RDF graph database. First, we must manage large amounts of graph data, in many cases into the 100s of billions of triples. Second, we support the SPARQL query language used to query RDF data. Third, we provide a native reasoning engine in the database. But what sets Oracle apart from competing RDF databases is the scalability and performance of the database when handling this type of graph data. The reason is that we can leverage all of the database services that are also highlighted on this slide: capabilities like Exadata, compression, partitioning, and parallel load, query and inference. Finally, for manageability, we leverage Data Guard, Enterprise Manager and other important Oracle Database utilities that are well known and understood by developers.

16 Support for Apache Jena and OpenRDF Sesame
Leverage existing investments in open source frameworks
Provides application developers with:
Easy-to-use Java APIs to access Oracle databases and RDF files
A standard-compliant SPARQL web service endpoint (Joseki, Fuseki)
Data loading (RDF/XML, N-TRIPLES, N-QUADS, TriG, Turtle)
JSON output
Oracle-specific extensions for query execution control and management

17 Relational to RDF Mapping
RDB to RDF Mapping
RDF views on relational tables
Enables SPARQL query on distributed resources
Views: Automatic and custom
Aligns with W3C RDB2RDF standard
No duplication of data and storage
RDF Views on Relational Tables
Semantic graph queries provide a powerful capability to discover relationships through pattern matching on RDF graph data. RDF views extend semantic discovery to relational tables without requiring the conversion of relational tables to RDF triples. This removes the need to duplicate data and the associated storage previously required to perform RDF graph queries on relational data sets. Semantic graph queries and RDF views can also be used to enable data integration and discovery within and across relational schemas and RDF graphs in Oracle Database. This simplifies semantic discovery workflows.
RDF views on relational tables, views, and SQL query results are supported. The W3C specifications for automatic mapping (called Direct Mapping), custom mapping (using the W3C R2RML language) and materializing views are supported. RDF views present relational data in RDF triple format so it can be queried using SPARQL and connected with other linked data and RDF graphs to relate and facilitate enterprise data integration.
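The idea of an automatic RDF view over a relational row can be sketched in a few lines of Python. This is a simplified illustration, not Oracle's implementation: the table, columns, base IRI and the `row_to_triples` helper are hypothetical, and the IRI shapes only loosely follow the W3C Direct Mapping conventions (no IRI or literal escaping, no datatypes, no foreign keys).

```python
from typing import Dict, Iterator, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def row_to_triples(base: str, table: str, pk: str,
                   row: Dict[str, object]) -> Iterator[Triple]:
    """Present one relational row as triples without copying the table.

    Hypothetical helper: subject IRI is built from the table name and the
    primary key value, each non-NULL column becomes one predicate/object pair.
    """
    subject = f"<{base}{table}/{pk}={row[pk]}>"
    # One triple stating which table (class) the row belongs to.
    yield (subject, f"<{RDF_TYPE}>", f"<{base}{table}>")
    for column, value in row.items():
        if value is None:
            continue  # a NULL column produces no triple
        yield (subject, f"<{base}{table}#{column}>", f'"{value}"')


# Hypothetical row from an EMP table in an HR schema.
employee = {"EMPNO": 7369, "ENAME": "SMITH", "MGR": None}
triples = list(row_to_triples("http://example.com/hr/", "EMP", "EMPNO", employee))
for s, p, o in triples:
    print(s, p, o, ".")
```

Because the triples are generated on demand from the row, nothing is duplicated in storage, which is the property the slide highlights.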

18 Oracle Label Security Data Classification
Fine grained security through integration with Oracle Label Security
Model level security through GRANT/REVOKE privileges
Oracle Label Security: mandatory access control
Labels assigned to both users and data
Data labels determine the sensitivity of the rows or the rights a person must possess in order to read or write the data.
User labels indicate their access rights to the data records.
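The interplay of user labels and data labels can be illustrated with a small Python sketch. It assumes a single ordered sensitivity level per label; real Oracle Label Security labels also carry compartments and groups, which are left out here. The level names, the sample triples and both helper functions are hypothetical.

```python
# Hypothetical sensitivity levels; a higher number means more sensitive.
LEVELS = {"PUBLIC": 10, "CONFIDENTIAL": 20, "SECRET": 30}

# Each triple carries its own data label (triple-level security).
labeled_triples = [
    (("<:projectX>", "<:status>", '"active"'), "PUBLIC"),
    (("<:projectX>", "<:budget>", '"5M"'), "CONFIDENTIAL"),
    (("<:projectX>", "<:fundedBy>", "<:agencyY>"), "SECRET"),
]


def can_read(user_label: str, data_label: str) -> bool:
    """A user may read a triple if the user's level dominates the data label."""
    return LEVELS[user_label] >= LEVELS[data_label]


def visible_triples(user_label: str):
    """Return only the triples the given user label is allowed to see."""
    return [t for t, label in labeled_triples if can_read(user_label, label)]


for user in ("PUBLIC", "CONFIDENTIAL", "SECRET"):
    print(user, "sees", len(visible_triples(user)), "triple(s)")
```

The same query therefore returns different result sets for different users, without the application filtering anything itself.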

19 Core Inferencing Features
Forward-chaining based inference engine in the database
Native rulebases: RDFS, OWL 2 RL, OWL 2 EL, SKOS
Validation of inferred data
Proof generation
User defined inferencing: Temporal reasoning, Spatial reasoning
Ladder Based Inference: Fine grained security for inference graph
Integration with external OWL 2 reasoners (TrOWL, Pellet)
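Forward chaining means applying rules to the asserted triples repeatedly until no new triples appear. The Python sketch below shows that loop for just two RDFS rules (subclass transitivity and type propagation). It is a toy, in-memory illustration of the technique, not the database engine; the `forward_chain` function and the sample data are hypothetical.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def forward_chain(asserted):
    """Return the triples entailed by two RDFS rules, excluding asserted ones.

    Rules applied until a fixpoint is reached:
      (A subClassOf B), (B subClassOf C)  ->  (A subClassOf C)
      (x type A),       (A subClassOf B)  ->  (x type B)
    """
    graph = set(asserted)
    while True:
        derived = set()
        for (s, p, o) in graph:
            for (s2, p2, o2) in graph:
                # Only subClassOf triples whose subject is our object can fire.
                if p2 != SUBCLASS or s2 != o:
                    continue
                if p == SUBCLASS:
                    derived.add((s, SUBCLASS, o2))
                elif p == TYPE:
                    derived.add((s, TYPE, o2))
        new = derived - graph
        if not new:  # fixpoint: another pass would add nothing
            return graph - set(asserted)
        graph |= new


asserted = {
    (":rex", TYPE, ":Dog"),
    (":Dog", SUBCLASS, ":Mammal"),
    (":Mammal", SUBCLASS, ":Animal"),
}
inferred = forward_chain(asserted)
for triple in sorted(inferred):
    print(triple)
```

Keeping the inferred triples separate from the asserted ones mirrors the slide's distinction between asserted data and the inference graph.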

20 RDF Semantic Graph: Graph Visualization & Modeling Support
Semantic Modeling
Cytoscape
Protégé

21 Analyzing RDF with Oracle BI and Oracle Advanced Analytics

22 Oracle Partner Tools: (IO Informatics)

23 Oracle Partner Tools: Tom Sawyer Social Network Analysis

24 Manageability of RDF Semantic Graph
Built in support from Oracle Database utilities and tools
Ingest / Replicate / Recover
  Bulk load: Apache Jena bulk loader; Oracle external tables & SQL*Loader (Direct Path) w/ PL/SQL Bulk Load API
  Replicate & recover: Data Guard: physical standby; Data Pump: staging tables; Recovery Manager: RMAN
Tune / Analyze
  Tune load/query/inference: Parallelism; B-tree indexing triple/quad; Typed literals indexing; SPARQL query hints; Statistics gathering; Dynamic sampling
  Analyze performance: Enterprise Manager: view optimizer plans, monitor execution / resource usage
Manage
  Control query execution: in database & Jena client
  Create & monitor graph w/ SQL Developer: Semantic Network Models, virtual models; B-tree indexes; Rule bases; Entailments; Security data labels; Semantic index policies
Bulk Load/Replicate/Recover
Apache Jena: load 100,000s of RDF/OWL data files (Turtle, RDF/XML, N-Triples, RDF/JSON, TriG, N-Quads) into Oracle Database using the prepareBulk / completeBulk methods in the OracleBulkUpdateHandler Java class
External Tables / SQL*Loader: load N-TRIPLE / N-QUAD input files into a staging table, then load them into the graph w/ syntax correctness & duplicates checking and default index creation (PSCGM)
Data Guard: physical standby block level replication
Data Pump: Import/Export can move staging tables, created from files or from results of SPARQL queries on the database, between databases
Recovery Manager: RMAN backups & recovery operate at the tablespace/datafile/database level & therefore support RDF data
Tune/Analyze
Parallelism is supported for load, query and inference
B-tree indexing enables non-unique indexing in any combination of subject, predicate, object, named graph and model
Typed literals indexing: index XML, Text, & Spatial OGC Well Known Text literal types to filter SPARQL queries
SPARQL query hints for index choice, join type (hash/nested loop) & join order, order of query basic graph pattern evaluation, etc.
Statistics: Oracle Database optimizer analysis of statistics is critical to the performance of SPARQL queries & OWL inference
Dynamic sampling: may produce better execution plans than static sampling, given the inherent flexibility of the RDF data model
Enterprise Manager: Analyze optimizer query plan & performance of execution / resource usage for loading, querying, entailments
Control query execution: Timeout, abort, SPARQL2SQL translation, hints in SPARQL syntax, property path, result cache, mid-tier cache, user-defined functions
Create & monitor graph: use SQL Developer to run RDF Semantic Graph PL/SQL APIs & issue SQL queries against the views describing graph components

25 Open Geospatial Consortium: GeoSPARQL Support
Defines a Vocabulary for Spatial Query Patterns
Classes: Spatial Object, Feature, Geometry
Properties: Topological relations; Links between features and geometries
Datatypes for geometry literals: ogc:wktLiteral, ogc:gmlLiteral
Query Functions: Topological relations, distance, buffer, intersection, …

26 Graph Support on Oracle NoSQL DB
Brings horizontal scalability to RDF graph applications
RDF Graph for Oracle NoSQL
RDF Graph support in Oracle NoSQL Database Enterprise Edition
High performance Key Value store
SPARQL 1.1 access to graph data
Jena & Joseki SPARQL Web Services
Massive horizontal scalability
Support for World Wide Web Consortium (W3C) Semantic Web standards

27 When to Consider a NoSQL Database for Graphs
Horizontal scalability, low query latency/cost, ease of install & management
RDF Graph for Oracle NoSQL
High volume, simple queries (low latency)
Queries aggregating over most of the graph (e.g. what are the hobbies of the most popular people in the network)
Frequent, large-scale updates
Large Data Centers

28 Quick Steps to Get Started

29 Quick Steps to Get Started
Install Oracle Database 12c or use a prebuilt VM from OTN
Initialize
  Create a tablespace 'ts'
  Run as SYS in SQL*Plus: exec sem_apis.create_sem_network('ts')
  Run as SYS (for only) in SQL*Plus: exec mdsys.enableGeoRaster;
Using SQL/PLSQL APIs
  exec create_sem_model
  insert/delete triples, bulk load, run SEM_MATCH, create_entailment, …
Using Java APIs
  Load/Query/Inference through GraphOracleSem, DatasetGraphOracleSem, OracleBulkUpdateHandler, …
  Configure Joseki/Fuseki web service endpoint: SPARQL Query, SPARQL Update, REST APIs


32 Performance

33 Oracle Spatial and Graph - LUBM 200K on 3-Node RAC X2-4 Load, Inference and Query Performance
48+ billion edges graph
The LUBM 200K graph has 48+ billion triples (edges)
Original graph has 26.6 billion unique triples (quads)
Inference produced another 21.4 billion triples
Data Loading Performance: Triples Loaded and Indexed Per Second (TLIPS): 273K
Inference Performance: Triples Inferred and Indexed Per Second (TIIPS): 327K
SPARQL Query Performance: Query Results Per Second (QRPS): K
Setup:
Hardware: Sun Server X2-4, 3-node RAC
- Each node configured with 1TB RAM, 4 CPU 2.4GHz 10-Core (Intel E7-4870)
- Storage: Dual Node 7420, both heads configured as: Sun ZFS Storage 7420, 4 CPU 2.00GHz 8-Core (Intel E7-4820), 256G Memory, 4x SSD SATA2 512G (READZ), 2x SATA 500G 10K. Four disk trays with 20 x 900GB, 4x SSD 73GB (WRITEZ)
Software: Oracle Database , SGA_TARGET=750G and PGA_AGGREGATE_TARGET=200G
Note: Only one node in this RAC was used for performance test. Test performed in April 2013.
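As a rough sanity check on these figures, the arithmetic can be worked through in Python. The sketch assumes the quoted TLIPS and TIIPS rates are sustained averages over the whole run, which the slide does not state, so the resulting durations are back-of-the-envelope estimates rather than reported results. The helper name `hours_at_rate` is hypothetical.

```python
def hours_at_rate(triples: float, per_second: float) -> float:
    """Hours needed to process `triples` at a constant `per_second` rate."""
    return triples / per_second / 3600.0


ASSERTED = 26.6e9   # unique triples in the original graph (from the slide)
INFERRED = 21.4e9   # triples produced by inference (from the slide)
TLIPS = 273e3       # triples loaded and indexed per second (from the slide)
TIIPS = 327e3       # triples inferred and indexed per second (from the slide)

total = ASSERTED + INFERRED
# Assumption: both rates hold as averages for the entire operation.
load_hours = hours_at_rate(ASSERTED, TLIPS)
infer_hours = hours_at_rate(INFERRED, TIIPS)

print(f"total triples: {total / 1e9:.1f} billion")
print(f"estimated load time: {load_hours:.1f} h")
print(f"estimated inference time: {infer_hours:.1f} h")
```

The sum of asserted and inferred triples is consistent with the "48+ billion" headline number on the slide.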

34 Oracle Spatial and Graph - LUBM 4400K on Exadata X4-2 Load, Inference and Query Performance
1.08 trillion edges graph
Degrees of parallelism: 256*
Data set: LUBM 4400K
Load (B triples/hr): 605.4B / 115.2hrs
OWL Inference: B / 86hrs 30m
Query (B answers/hr): 92.5B / 22.5 hrs
Data Loading Performance: Triples Loaded and Indexed Per Second (TLIPS): 1.420M
Inference Performance: Triples Inferred and Indexed Per Second (TIIPS): M
SPARQL Query Performance: Query Results Per Second (QRPS): M
Setup:
Exadata X4-2 High capacity full rack
ZS3-2 with 2 controllers, 8 trays of disk
Eight compute nodes of Exadata
Oracle DB standard install of Exadata
Open cursors = 1000
Processes = 1000
SGA = 132GB, PGA = 100GB
32K blocksize was given to all graph tablespaces
TEMP group was created with 3 bigfile tablespaces
Test performed in Aug/Sept 2014.
* A mix of DOP used: 296, 256, 192

35 Best Practices in Solving Performance Issues
When there is an underperforming SQL statement in RDF data loading, inference, or query operations, check:
Have you gathered statistics? APIs: export_model_stats, export_entailment_stats, export_network_stats, import_model_stats, import_entailment_stats, import_network_stats
Have you tried parallel execution? Balanced hardware is key.
Have you tried dynamic sampling? (Level 6, 8, 11)
Is there a lack of indexes (including text index)? DO NOT just add indexes without careful & thorough testing

36 Best Practices in Solving Performance Issues (2)
When there is an underperforming SQL statement in RDF data loading, inference, or query operations, check:
Have you looked at the plan?
Is it possible to write the same query in a different way?
Is it possible to simplify? Simpler queries give the optimizer a better chance to find more efficient ways to execute
Tweak the plan through hints
Send a small, reproducible test case with the execution plan to Oracle Support or post it on the Forum

37 Best Practices in Solving Performance Issues (3)
Find the top thread(s) in the Java VM
Are there excessive GC activities? Try -XX:+UseParallelGC, -XX:+UseConcMarkSweepGC, …
Has the heap size been set properly? Try a larger heap size, analyze the heap by performing a heap dump
Send a small, reproducible test case with the thread dump to Oracle Support or post it on the Forum

38 Cool Ongoing Activities:
Enable Oracle Cloud Services: Oracle Social Network
Integration with Oracle business applications and middleware
Ongoing support for RDF Graph on all major platforms: Relational Database; NoSQL Database; Big Data (Hadoop); Cloud

39

40 Appendix

41 W3C Semantic Technology Stack
Core Technologies
URI: Uniform resource identifier
RDF: Resource description framework
RDFS: RDF Schema
OWL: Web ontology language
RIF and SPARQL are also standards

42 What is RDF
A graph data model for web resources and their relationships
The graph can be serialized into RDF/XML, N3, N-TRIPLE, …
Construction unit: Triple (or assertion, or fact)
<http://foobar> <:produces> <:mp3>
Subject Predicate Object
Quads (named graphs) add context, provenance, identification, etc. to assertions
<http://foobar> <:produces> <:mp3> <:ProductGraph> “CA”
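To make the triple and quad structure concrete, here is a minimal Python sketch that stores quads as tuples and answers simple patterns in which unspecified positions act as wildcards, in the spirit of a single SPARQL triple pattern. Apart from the `<http://foobar> <:produces> <:mp3>` assertion taken from the slide, the data and the `match` helper are hypothetical.

```python
from typing import List, Optional, Tuple

# (subject, predicate, object, graph); graph is None for the default graph.
Quad = Tuple[str, str, str, Optional[str]]

QUADS: List[Quad] = [
    ("<http://foobar>", "<:produces>", "<:mp3>", "<:ProductGraph>"),
    # Hypothetical extra assertions for illustration only.
    ("<http://acme>", "<:produces>", "<:mp3>", "<:ProductGraph>"),
    ("<http://foobar>", "<:partnerOf>", "<http://acme>", None),
]


def match(quads: List[Quad],
          s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None, g: Optional[str] = None) -> List[Quad]:
    """Return quads matching the pattern; a None argument is a wildcard."""
    pattern = (s, p, o, g)
    return [
        q for q in quads
        if all(want is None or want == got for want, got in zip(pattern, q))
    ]


# Who produces mp3? (subject left as a wildcard)
producers = [q[0] for q in match(QUADS, p="<:produces>", o="<:mp3>")]
print(producers)

# Everything asserted in the named graph <:ProductGraph>.
in_product_graph = match(QUADS, g="<:ProductGraph>")
print(len(in_product_graph))
```

The fourth element is what turns a triple into a quad: it names the graph an assertion belongs to, which is how context or provenance gets attached.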

