
1 © Copyright Tucana Technologies, Inc. 2003-2004. All rights reserved. T004v03
Massive Scalability for RDF Storage and Analysis
Presented by David Wood, CTO; Tom Adams, Sales Engineer; Andrew Newman, Software Engineer
Tucana Technologies, Inc., Reston, Virginia USA
May 2004

2 Agenda
The Tucana Knowledge Server and Kowari
Where we fit
Performance metrics & scaling
Real-world deployment examples
Where are we headed?

3 Tucana and Kowari
The Tucana Knowledge Server is a secure, distributed, scalable, transaction-safe, native RDF database.
– Stores, manages and analyzes RDF data
– iTQL/RDQL query language support
– Single instance scales to 1B triples
– Federated query capability available
– JRDF & Jena API support
– Pluggable data models (full text, RDBMSs, etc.)
– Commercial (academic licenses available)
– 100% Java 1.4.2
http://www.tucanatech.com/

4 Tucana and Kowari
Kowari is the Open Source basis of the Tucana Knowledge Server
– MPL v1.1
– No security, limited APIs/documentation, no pluggable data models
– Limited data types (string, URI, date, datetime, number)
– Limited scaling (>10M triples on 32-bit, >50M on 64-bit)
– No graph-based analysis algorithm support (graph segment matching)
http://www.kowari.org/
Colophon: Kowari is a small Australian marsupial and Tucana is a constellation in the Southern sky.

5 Tucana Knowledge Server (TKS) in Enterprise Architecture

6 Tucana Knowledge Server (TKS) Data Flow & Federation

7 Tucana System Interfaces
Data Sources
– RDF native
– Structured data sources (e.g. RDBMS) via importation
– Metadata from unstructured data sources via entity extractors
– XML or other tagged formats via XSLT
– Rich Site Summary (RSS) feeds
Access
– Web services (SOAP, WSDL)
– COM (ASP, etc.)
– JavaBean
– Java APIs: JRDF & Jena
– JSP tag library
– XSLT Descriptors
– Query language
– Command line
– Web UI
– RDF/OWL editors/viewers via evolving industry APIs

8 Tucana Supported Platforms
Runs 64- or 32-bit (requires Java 1.4.2)
GNU/Linux on Intel or Opteron
Sun Solaris on SPARC or Intel (Opteron coming Dec ’04)
Windows on Intel
– NT4
– 2000
– XP
Note: AIX, HP/UX, Mac OS X operational
– Future support on roadmap based upon customer demand

9 Performance Metrics
Read/Write comparisons to RDBMSs when storing RDF
Load performance
Query execution performance
Go triple crazy!

10 Read/Write Comparison

11 Load Performance

12 Query Execution Performance

13 Go triple crazy!
32 bit: about 100 million statements (using explicit I/O, which is now the default on 32 bit platforms)
64 bit: about a billion statements (using mapped I/O)
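The difference between explicit and mapped I/O can be sketched as follows. This is a minimal illustration in Python, not the product's Java implementation: the fixed-width record layout (three 64-bit node ids per statement), the file name, and every function name here are assumptions made for the example. It shows only the general technique: explicit I/O seeks and copies each record into a buffer, while mapped I/O maps the file into the process address space and reads it in place, which is why the size of the address space matters.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical record layout: one statement = three little-endian 64-bit node ids.
RECORD = struct.Struct("<qqq")


def write_triples(path, triples):
    """Write each (subject, predicate, object) id triple as one fixed-width record."""
    with open(path, "wb") as f:
        for triple in triples:
            f.write(RECORD.pack(*triple))


def read_explicit(path, index):
    """Explicit I/O: seek to the record and copy its bytes into a buffer."""
    with open(path, "rb") as f:
        f.seek(index * RECORD.size)
        return RECORD.unpack(f.read(RECORD.size))


def read_mapped(path, index):
    """Mapped I/O: map the whole file and decode the record in place."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return RECORD.unpack_from(mapped, index * RECORD.size)


def max_mappable_records(address_bits):
    """Upper bound on records that fit in one address space of the given width.

    Ignores everything else that competes for the address space, so real
    limits are lower.
    """
    return (1 << address_bits) // RECORD.size


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "triples.bin")
    write_triples(path, [(1, 2, 3), (4, 5, 6), (7, 8, 9)])
    print(read_explicit(path, 1), read_mapped(path, 1))
    print(max_mappable_records(32), max_mappable_records(64))
```

Both readers return the same record. The bound computed by `max_mappable_records` applies only to this hypothetical 24-byte layout and is not a reconstruction of the figures on the slide.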

14 Why do we scale?
Designed from the ground up to be scalable
Optimized for reads/very fast writing
Dealing with low level aspects of file system
Have lots of room for further speedups
– Drop indices, increase triple block size, flatten tree
Bottlenecks
– Virtual memory limits of OS
– Thread stacks
– Sharing same area of VM
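The read/write trade-off behind "drop indices" can be illustrated with a toy in-memory store. This is a sketch under stated assumptions, not the product's storage engine: the slides do not say which indices are kept or how they are laid out, so the three orderings (`spo`, `pos`, `osp`), the `TripleStore` class, and its methods are all hypothetical. Each index is a sorted list of the same statements in a different column order. A pattern is answered from the index whose leading columns are bound, and every extra index adds one more sorted insert per write.

```python
from bisect import bisect_left

# Hypothetical column orderings (positions into a (subject, predicate, object) tuple).
ORDERINGS = {"spo": (0, 1, 2), "pos": (1, 2, 0), "osp": (2, 0, 1)}


class TripleStore:
    """Toy store: one sorted list per ordering; fewer orderings means cheaper writes."""

    def __init__(self, orderings=("spo", "pos", "osp")):
        self.indices = {name: [] for name in orderings}

    def add(self, triple):
        # Every index costs one sorted insert per statement written.
        for name, rows in self.indices.items():
            key = tuple(triple[i] for i in ORDERINGS[name])
            pos = bisect_left(rows, key)
            if pos == len(rows) or rows[pos] != key:
                rows.insert(pos, key)

    def match(self, s=None, p=None, o=None):
        """Return statements matching the pattern; None is a wildcard."""
        pattern = (s, p, o)
        # Pick the index with the longest run of bound leading columns.
        best_name, best_len = None, -1
        for name in self.indices:
            bound = 0
            for i in ORDERINGS[name]:
                if pattern[i] is None:
                    break
                bound += 1
            if bound > best_len:
                best_name, best_len = name, bound
        order = ORDERINGS[best_name]
        prefix = tuple(pattern[i] for i in order[:best_len])
        rows = self.indices[best_name]
        results = []
        # Jump to the first key with this prefix, then scan while it still matches.
        for key in rows[bisect_left(rows, prefix):]:
            if key[:best_len] != prefix:
                break
            triple = [None, None, None]
            for value, i in zip(key, order):
                triple[i] = value
            # Filter on any bound columns the chosen index could not cover.
            if all(w is None or w == g for w, g in zip(pattern, triple)):
                results.append(tuple(triple))
        return results


if __name__ == "__main__":
    store = TripleStore()
    store.add(("alice", "knows", "bob"))
    store.add(("bob", "knows", "carol"))
    print(store.match(p="knows"))
```

Constructing the store with `orderings=("spo",)` gives the same answers with a third of the insert work, but a query that binds only the predicate then falls back to scanning the whole index.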

15 Real-World Deployment Examples
Business Needs Satisfied
Enterprise Software Company
Automobile Manufacturer
Genomics Research
Defense Integrator

16 Business Needs Satisfied
Get answers to questions
– Inferencing and discovery
– Change impact and dependency analysis
– Variable views of data elements and their relationships
Unify disparate information sources
– Metadata repositories
– Unstructured information (MSOffice and PDF documents, email, content mgt, web pages, RSS sources/news feeds)
– Other complex data sources
Share and re-use knowledge
– Within and between enterprises

17 Enterprise Software Company
Critical Need: Provide automated document routing based on a business-specific ontology.
Solution: Classify documents against ontology, store classifications and ontology in the Tucana Knowledge Server and build multiple business applications on top.
Result: Standards-based metadata management unifies and delivers change impact analysis across a multi-application distributed, staged, software environment.

18 Auto Manufacturer
Critical Need: Analyze quality test and measurement over time for trends
– Relying on entrenched vendor - a Tucana OEM
– OEM tried RDBMS – not an option
Solution: Embed Tucana Knowledge Server into OEM’s existing product for test and measurement.
Result: Enables high value trend analyses that have not been possible before for customer.

19 Genomics Research
Critical Need: Collaborative project with big pharma
– Concerned with Oracle flexibility & “schema hell”
– Need scalability & secure collaborative environment
Solution: Rapidly analyze data in Tucana Knowledge Server using application they co-develop with integration partner.
Result: Deliver collaborative research system for use with strategic customer to accelerate joint discovery and competitive advantage to both companies.

20 Defense Integrator
Critical Need: Intel agency overwhelmed with data
– Automated analysis to improve decision speed & accuracy
– Prototype software does not scale / agency requires COTS
Solution: Deploy Tucana Knowledge Server with metadata extraction incumbent (SRA NetOwl) and scale to billions of records.
Result: More automated analysis, faster accurate decisions against large data volumes on scalable COTS platform.

21 Analyze Disparate Data - Now
[Diagram labels: Query Engine; RDF; Full Text (Lucene); RSS Feeds; RDF API]

22 Analyze Disparate Data - Soon
[Diagram labels: XML DB; XPath; SQL; RDBMS; Query Engine]

23 Analyze Disparate Data - Soon
[Diagram labels: Single Query Representation; Other Data Sources; RDBMS]
Note: Distributed queries already supported.

24 Thank You
David Wood (david@tucanatech.com)
Tom Adams (tom@tucanatech.com)
Andrew Newman (andrew@tucanatech.com)
Tucana Technologies, Inc.
http://www.tucanatech.com/
http://www.kowari.org/

