© Copyright Tucana Technologies, Inc. 2003-2004. All rights reserved. T004v03 Massive Scalability for RDF Storage and Analysis Presented by David Wood,

Presentation transcript:

Massive Scalability for RDF Storage and Analysis
Presented by David Wood, CTO; Tom Adams, Sales Engineer; Andrew Newman, Software Engineer
Tucana Technologies, Inc., Reston, Virginia, USA
May 2004

Agenda
– The Tucana Knowledge Server and Kowari
– Where we fit
– Performance metrics & scaling
– Real-world deployment examples
– Where are we headed?

Tucana and Kowari
The Tucana Knowledge Server is a secure, distributed, scalable, transaction-safe, native RDF database.
– Stores, manages and analyzes RDF data
– iTQL/RDQL query language support
– Single instance scales to 1B triples
– Federated query capability available
– JRDF & Jena API support
– Pluggable data models (full text, RDBMSs, etc.)
– Commercial (academic licenses available)
– 100% Java
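To make "stores, manages and analyzes RDF data" concrete, here is a minimal Python sketch of what any RDF store does at its core: it holds subject/predicate/object statements and answers patterns in which some terms are left unbound. This is not the Tucana Knowledge Server, its API, or its iTQL engine; the `TripleStore` class, its method names and the `ex:` identifiers are hypothetical and exist only for illustration.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """Hypothetical in-memory store; not the TKS or Kowari API."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        # A set gives RDF's "a statement is either present or not" semantics.
        self._triples.add((subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> Iterator[Triple]:
        # None acts as a wildcard; every bound term must match exactly.
        pattern = (subject, predicate, obj)
        for triple in sorted(self._triples):
            if all(p is None or p == t for p, t in zip(pattern, triple)):
                yield triple


store = TripleStore()
store.add("ex:kowari", "rdf:type", "ex:Marsupial")
store.add("ex:kowari", "ex:livesIn", "ex:Australia")
store.add("ex:tucana", "rdf:type", "ex:Constellation")

# "What has a type?" leaves subject and object unbound.
typed = list(store.match(predicate="rdf:type"))
```

The linear scan in `match` is the part a scalable store has to replace with indexed lookups.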

Tucana and Kowari
Kowari is the Open Source basis of the Tucana Knowledge Server.
– MPL v1.1
– No security, limited APIs/documentation, no pluggable data models
– Limited data types (string, URI, date, datetime, number)
– Limited scaling (>10M triples on 32-bit, >50M on 64-bit)
– No graph-based analysis algorithm support (graph segment matching)
Colophon: Kowari is a small Australian marsupial and Tucana is a constellation in the Southern sky.

Tucana Knowledge Server (TKS) in Enterprise Architecture

Tucana Knowledge Server (TKS) Data Flow & Federation

Tucana System Interfaces
Data Sources
– RDF native
– Structured data sources (e.g. RDBMS) via importation
– Metadata from unstructured data sources via entity extractors
– XML or other tagged formats via XSLT
– Rich Site Summary (RSS) feeds
Access
– Web services (SOAP, WSDL)
– COM (ASP, etc.)
– JavaBean
– Java APIs (JRDF & Jena)
– JSP tag library
– XSLT Descriptors
– Query language
– Command line
– Web UI
– RDF/OWL editors/viewers via evolving industry APIs

Tucana Supported Platforms
Runs 64- or 32-bit (requires Java 1.4.2)
– GNU/Linux on Intel or Opteron
– Sun Solaris on SPARC or Intel (Opteron coming Dec '04)
– Windows on Intel: NT4, 2000, XP
Note: AIX, HP-UX and Mac OS X are operational. Future support is on the roadmap based upon customer demand.

Performance Metrics
– Read/Write comparisons to RDBMSs when storing RDF
– Load performance
– Query execution performance
– Go triple crazy!

Read/Write Comparison

Load Performance

Query Execution Performance

Go triple crazy!
– 32-bit: about 100 million statements (using explicit I/O, which is now the default on 32-bit platforms)
– 64-bit: about a billion statements (using mapped I/O)
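A rough back-of-envelope calculation suggests why memory-mapped I/O suits 64-bit platforms while 32-bit platforms fall back to explicit I/O: mapped files must fit in the process address space. The slides do not give storage parameters, so the figures below (four 8-byte identifiers per statement, six index orderings) are assumptions chosen only to show the shape of the arithmetic, not published Tucana numbers.

```python
def index_bytes(
    statements: int,
    ids_per_statement: int = 4,   # assumed: not stated in the slides
    bytes_per_id: int = 8,        # assumed: 64-bit identifiers
    index_orderings: int = 6,     # assumed: number of sorted copies kept
) -> int:
    """Estimate raw index size, ignoring tree overhead and the string pool."""
    return statements * ids_per_statement * bytes_per_id * index_orderings


ADDRESS_SPACE_32_BIT = 2 ** 32  # 4 GiB upper bound for a 32-bit process
ADDRESS_SPACE_64_BIT = 2 ** 64  # theoretical bound; usable space is smaller

size_100m = index_bytes(100_000_000)
size_1b = index_bytes(1_000_000_000)

fits_in_32_bit = size_100m <= ADDRESS_SPACE_32_BIT
fits_in_64_bit = size_1b <= ADDRESS_SPACE_64_BIT
```

Under these assumptions 100 million statements already need about 19 GB of index, several times what a 32-bit process can map, while a billion statements (about 192 GB) remain far below the 64-bit limit.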

Why do we scale?
– Designed from the ground up to be scalable
– Optimized for reads/very fast writing
– Dealing with low level aspects of file system
– Have lots of room for further speedups
  – Drop indices, increase triple block size, flatten tree
– Bottlenecks
  – Virtual memory limits of OS
  – Thread stacks
  – Sharing same area of VM
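The slide mentions indices and trees without describing them. One common way for a triple store to optimize reads is to keep the same statements sorted in several term orderings, so that any pattern with bound leading terms becomes a short range scan. The Python sketch below uses sorted lists and `bisect` as a stand-in for on-disk trees; the `IndexedStore` class and the choice of three orderings are illustrative assumptions, not Tucana's actual layout.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Each ordering lists term positions (0=subject, 1=predicate, 2=object).
ORDERINGS: Dict[str, Tuple[int, int, int]] = {
    "spo": (0, 1, 2),
    "pos": (1, 2, 0),
    "osp": (2, 0, 1),
}


class IndexedStore:
    """Hypothetical store keeping one sorted list per ordering."""

    def __init__(self) -> None:
        self._indexes: Dict[str, List[Triple]] = {name: [] for name in ORDERINGS}

    def add(self, triple: Triple) -> None:
        # Every write touches every index: more indices, slower writes.
        for name, order in ORDERINGS.items():
            key = tuple(triple[i] for i in order)
            index = self._indexes[name]
            pos = bisect_left(index, key)
            if pos == len(index) or index[pos] != key:
                index.insert(pos, key)

    def scan(self, name: str, *prefix: str) -> List[Triple]:
        """Return triples whose leading terms in ordering `name` equal `prefix`."""
        order = ORDERINGS[name]
        index = self._indexes[name]
        results: List[Triple] = []
        # A shorter tuple sorts before any longer tuple it prefixes,
        # so bisect_left lands on the first candidate key.
        for key in index[bisect_left(index, prefix):]:
            if key[: len(prefix)] != prefix:
                break
            restored = [""] * 3
            for position, term_index in enumerate(order):
                restored[term_index] = key[position]
            results.append((restored[0], restored[1], restored[2]))
        return results


indexed = IndexedStore()
indexed.add(("ex:a", "ex:knows", "ex:b"))
indexed.add(("ex:a", "ex:knows", "ex:c"))
indexed.add(("ex:b", "ex:knows", "ex:c"))

# "Who knows ex:c?" binds only the object, so the object-first index is used.
knows_c = indexed.scan("osp", "ex:c")
```

The sketch also shows the trade-off behind "drop indices": each extra ordering adds work to every write in exchange for cheaper reads.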

Real-World Deployment Examples
– Business Needs Satisfied
– Enterprise Software Company
– Automobile Manufacturer
– Genomics Research
– Defense Integrator

Business Needs Satisfied
Get answers to questions
– Inferencing and discovery
– Change impact and dependency analysis
– Variable views of data elements and their relationships
Unify disparate information sources
– Metadata repositories
– Unstructured information (MSOffice and PDF documents, content mgt, web pages, RSS sources/news feeds)
– Other complex data sources
Share and re-use knowledge
– Within and between enterprises

Enterprise Software Company
Critical Need: Provide automated document routing based on a business-specific ontology.
Solution: Classify documents against ontology, store classifications and ontology in the Tucana Knowledge Server and build multiple business applications on top.
Result: Standards-based metadata management unifies and delivers change impact analysis across a multi-application distributed, staged, software environment.

Auto Manufacturer
Critical Need: Analyze quality test and measurement data over time for trends
– Relying on entrenched vendor, a Tucana OEM
– OEM tried RDBMS: not an option
Solution: Embed Tucana Knowledge Server into OEM's existing product for test and measurement.
Result: Enables high-value trend analyses that were not previously possible for the customer.

Genomics Research
Critical Need: Collaborative project with big pharma
– Concerned with Oracle flexibility & "schema hell"
– Need scalability & secure collaborative environment
Solution: Rapidly analyze data in Tucana Knowledge Server using application they co-develop with integration partner.
Result: Deliver collaborative research system for use with strategic customer to accelerate joint discovery and competitive advantage to both companies.

Defense Integrator
Critical Need: Intel agency overwhelmed with data
– Automated analysis to improve decision speed & accuracy
– Prototype software does not scale / agency requires COTS
Solution: Deploy Tucana Knowledge Server with metadata extraction incumbent (SRA NetOwl) and scale to billions of records.
Result: More automated analysis and faster, accurate decisions against large data volumes on a scalable COTS platform.

Analyze Disparate Data - Now
[Diagram labels: Query Engine; RDF; Full Text (Lucene); RSS Feeds; RDF API]

Analyze Disparate Data - Soon
[Diagram labels: XML DB; XPath; SQL; RDBMS; Query Engine]

Analyze Disparate Data - Soon
[Diagram labels: Single Query Representation; Other Data Sources; RDBMS]
Note: Distributed queries already supported.
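The idea of a single query representation over several data sources can be sketched as follows: one pattern is handed to every source, each source answers in its own way, and the engine merges the results. This is a simplified Python illustration, not how Tucana's distributed queries are implemented; `list_source`, `federated_query` and the sample data are hypothetical.

```python
from typing import Callable, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]
Source = Callable[[Pattern], Iterable[Triple]]


def list_source(triples: List[Triple]) -> Source:
    """Wrap a plain list as a queryable source (stand-in for a real backend)."""

    def query(pattern: Pattern) -> Iterable[Triple]:
        # None in the pattern is a wildcard.
        return [
            t for t in triples
            if all(p is None or p == v for p, v in zip(pattern, t))
        ]

    return query


def federated_query(sources: List[Source], pattern: Pattern) -> List[Triple]:
    """Send the same pattern to every source and merge without duplicates."""
    seen: Set[Triple] = set()
    merged: List[Triple] = []
    for source in sources:
        for triple in source(pattern):
            if triple not in seen:
                seen.add(triple)
                merged.append(triple)
    return merged


rdf_store = list_source([
    ("ex:doc1", "ex:topic", "ex:rdf"),
    ("ex:doc2", "ex:topic", "ex:sql"),
])
rss_feed = list_source([
    ("ex:item9", "ex:topic", "ex:rdf"),
    ("ex:doc1", "ex:topic", "ex:rdf"),  # same statement seen in both sources
])

about_rdf = federated_query([rdf_store, rss_feed], (None, "ex:topic", "ex:rdf"))
```

A relational or XML source would plug in by providing its own function with the same signature, translating the pattern into SQL or XPath behind the scenes.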

Thank You
David Wood, Tom Adams, Andrew Newman
Tucana Technologies, Inc.