Scalable Semantic Web Data Management Using Vertical Partitioning Daniel J. Adam Samuel R. Kate Abadi Marcus Madden MIT Daniel Hurwitz Technion:

Slides:



Advertisements
Similar presentations
1 gStore: Answering SPARQL Queries Via Subgraph Matching Presented by Guan Wang Kent State University October 24, 2011.
Advertisements

GridVine: Building Internet-Scale Semantic Overlay Networks By Lan Tian.
Store RDF Triples In A Scalable Way Liu Long & Liu Chunqiu.
SW-S TORE : A VERTICALLY PARTITIONED DBMS FOR S EMANTIC W EB DATA M ANAGEMENT Surabhi Mithal Nipun Garg Daniel J. Abadi, Adam Marcus, Samuel R. Madden,
SW-S TORE : A VERTICALLY PARTITIONED DBMS FOR S EMANTIC W EB DATA M ANAGEMENT Surabhi Mithal Nipun Garg.
IS 4420 Database Fundamentals Chapter 6: Physical Database Design and Performance Leon Chen.
Comparing path-based and vertically-partitioned RDF databases Preetha Lakshmi & Chris Mueller 12/10/2007 CSCI 8715 Shashi Shekhar.
Comparing path-based and vertically-partitioned RDF databases Preetha Lakshmi & Chris Mueller 12/10/2007 CSCI 8715 Shashi Shekhar.
Presented by Cathrin Weiss, Panagiotis Karras, Abraham Bernstein Department of Informatics, University of Zurich Summarized by: Arpit Gagneja.
Graph Data Management Lab, School of Computer Scalable SPARQL Querying of Large RDF Graphs Xu Bo
Chapter 4 Relational Databases Copyright © 2012 Pearson Education 4-1.
IST Databases and DBMSs Todd S. Bacastow January 2005.
Managing & Integrating Enterprise Data with Semantic Technologies Susie Stephens Principal Product Manager, Oracle
Database System Concepts and Architecture Lecture # 3 22 June 2012 National University of Computer and Emerging Sciences.
Databases From A to Boyce Codd. What is a database? It depends on your point of view. For Manovich, a database is a means of structuring information in.
Rajashree Deka Tetherless World Constellation Rensselaer Polytechnic Institute.
Scaling Jena in a commercial environment The Ingenta MetaStore Project Purpose ● Give an example of a big, commercial app using Jena. ● Share experiences.
Scalable Semantic Web Data Management Using Vertical Partitioning Daniel J. Abadi, Adam Marcus, Samuel R. Madden, Kate Hollenbach VLDB, 2007 Oct 15, 2014.
Course Introduction Introduction to Databases Instructor: Joe Bockhorst University of Wisconsin - Milwaukee.
RDF Storage methods and Systems Nikolaou Charalampos (A.M.: M953)‏ Kotsifakos Alexios (A.M.: M964)‏ Department of Informatics and Telecommunications.
RAJIKA TANDON DATABASES CSE 781 – Database Management Systems Instructor: Dr. A. Goel.
Database System Concepts and Architecture Lecture # 2 21 June 2012 National University of Computer and Emerging Sciences.
Hexastore: Sextuple Indexing for Semantic Web Data Management
TM 7-1 Copyright © 1999 Addison Wesley Longman, Inc. Physical Database Design.
Ultrawrap: SPARQL Execution on Relational Data Juan F. Sequeda, Daniel P. Miranker University of Texas - Austin ISWC 2009 Seoul National University Internet.
DANIEL J. ABADI, ADAM MARCUS, SAMUEL R. MADDEN, AND KATE HOLLENBACH THE VLDB JOURNAL. SW-Store: a vertically partitioned DBMS for Semantic Web data.
NoSQL Databases NoSQL Concepts SoftUni Team Technical Trainers Software University
Chapter 6 1 © Prentice Hall, 2002 The Physical Design Stage of SDLC (figures 2.4, 2.5 revisited) Project Identification and Selection Project Initiation.
PHP and MySQL CS How Web Site Architectures Work  User’s browser sends HTTP request.  The request may be a form where the action is to call PHP.
Lecture2: Database Environment Prepared by L. Nouf Almujally & Aisha AlArfaj 1 Ref. Chapter2 College of Computer and Information Sciences - Information.
Daniel J. Abadi · Adam Marcus · Samuel R. Madden ·Kate Hollenbach Presenter: Vishnu Prathish Date: Oct 1 st 2013 CS 848 – Information Integration on the.
Object Persistence Design Chapter 13. Key Definitions Object persistence involves the selection of a storage format and optimization for performance.
C-Store: How Different are Column-Stores and Row-Stores? Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY May. 8, 2009.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 External Sorting Chapter 13.
Lecture2: Database Environment Prepared by L. Nouf Almujally 1 Ref. Chapter2 Lecture2.
Database Management Systems, R. Ramakrishnan and J. Gehrke 1 External Sorting Chapter 13.
C-Store: Tuple Reconstruction Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Mar 27, 2009.
Efficient RDF Storage and Retrieval in Jena2 Written by: Kevin Wilkinson, Craig Sayers, Harumi Kuno, Dave Reynolds Presented by: Umer Fareed 파리드.
1 Biometric Databases. 2 Overview Problems associated with Biometric databases Some practical solutions Some existing DBMS.
C-Store: Data Model and Data Organization Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY May 17, 2010.
C-Store: RDF Data Management Using Column Stores Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Apr. 24, 2009.
RDF languages and storages part 1 - expressivness Maciej Janik Conrad Ibanez CSCI 8350, Fall 2004.
Indexes and Views Unit 7.
RDF-3X: a RISC-style Engine for RDF Presented by Thomas Neumann, Gerhard Weikum Max-Planck-Institut fur Informatik Saarbrucken, Germany Session 19: System.
Physical Database Design Purpose- translate the logical description of data into the technical specifications for storing and retrieving data Goal - create.
Introduction.  Administration  Simple DBMS  CMPT 454 Topics John Edgar2.
Steven Seida How Does an RDF Knowledge Store Compare to an RDBMS?
Chapter 8 Physical Database Design. Outline Overview of Physical Database Design Inputs of Physical Database Design File Structures Query Optimization.
Triple Storage. Copyright  2006 by CEBT Triple(RDF) Storages  A triple store is designed to store and retrieve identities that are constructed from.
Relational Algebra p BIT DBMS II.
Introduction to Database Systems1 External Sorting Query Processing: Topic 0.
RDF storages and indexes Maciej Janik September 1, 2005 Enterprise Integration – Semantic Web.
RDF languages and storages part 2 - indexing semi-structure data Maciej Janik Conrad Ibanez CSCI 8350, Fall 2004.
Sesame A generic architecture for storing and querying RDF and RDFs Written by Jeen Broekstra, Arjohn Kampman Summarized by Gihyun Gong.
Chapter 04 Semantic Web Application Architecture 23 November 2015 A Team 오혜성, 조형헌, 권윤, 신동준, 이인용.
Scalable Semantic Web Data Management Using Vertical Partitioning Daniel J. Abadi, Adam Marcus, Samuel R. Madden, Kate Hollenbach David Yona Seminar On.
Oracle Announced New In- Memory Database G1 Emre Eftelioglu, Fen Liu [09/27/13] 1 [1]
1 RDF Storage and Retrieval Systems Jan Pettersen Nytun, UiA.
Databases and DBMSs Todd S. Bacastow January
Physical Changes That Don’t Change the Logical Design
MongoDB Er. Shiva K. Shrestha ME Computer, NCIT
Middleware independent Information Service
COMP 430 Intro. to Database Systems
Column Stores For Wide and Sparse Data
RDF Stores S. Sakr and G. A. Naymat.
Physical Database Design
Systems Analysis and Design
INFO/CSE 100, Spring 2006 Fluency in Information Technology
Evaluation of Relational Operations: Other Techniques
Presentation transcript:

Scalable Semantic Web Data Management Using Vertical Partitioning Daniel J. Adam Samuel R. Kate Abadi Marcus Madden MIT Daniel Hurwitz Technion: Israel Institute of Technology Computer Science Dept. Seminar in Databases –

It’s All Semantics, Anyway! “All our work, our whole life is a matter of semantics, …Everything depends on our understanding of them.” – Felix Frankfurter The Semantic Web ▫Tim Berners-Lee: “I have a dream…” ▫Sharing the wealth ▫W3C Technicalities ▫Implementation ▫Access 2

Resource Description Framework(RDF) You’d make a great model Representing information ▫Semantics ▫Graph of resource relations ▫Statements about resources  Subject  Property  Object No storage requirements Remind me how we’re related? 3

RDF Triples Semantic breakdown ▫“Rick Hull wrote Foundations of Databases.” Representation ▫Graph ▫Statement ▫XML format Foundations of DatabasesRick Hull hasAuthor Rick Hull 4

Triples Storage Relational database ▫3-column schema Performance issues ▫Waiting is the rust of the soul ▫One massive triples table ▫Queries require many self-joins SELECT C.obj FROM TRIPLES AS A, TRIPLES AS B, TRIPLES AS C WHERE A.subj = B.subj AND B.subj = C.subj AND A.prop = ‘copyright’ AND A.obj = “2001” AND B.prop = ‘author’ AND B.obj = “Fox, Joe” AND C.prop = ‘title’ 5

Getting Down to Business I.Current State of the Art RDF in RDBMs II.A Simple Alternative Vertically partitioned approach III.Benchmarks Querying the candidates IV.Evaluation Storage requirements and implementation V.Results 6

Current State of the Art Majority use RDBMs Multi-layered architecture Querying: SPARQL converted to SQL RDF layer RDBM Result SetSQL query SPARQL queryRDF in XML/Graph SELECT ?title FROM table WHERE { ?book author “Fox, Joe” ?book copyright “2001” ?book title ?title } SELECT C.obj FROM TRIPLES AS A, TRIPLES AS B, TRIPLES AS C WHERE A.subj = B.subj AND B.subj = C.subj AND A.prop = ‘copyright’ AND A.obj = “2001” AND B.prop = ‘author’ AND B.obj = “Fox, Joe” AND C.prop = ‘title’ 7

Property Table Technique Goal: speed up queries over triple-stores Idea: cluster triples containing properties defined over similar subjects ▫Example: “title”, “author”, “copyright”  Books, journals, CDs, etc. Reduces number of self-joins 8

Clustered Property Table 9

Property-Class Table 10

Property Tables: Issues NULLs Multi-valued attributes Proliferation of unions and joins 11 Rick Hull hasAuthor John Green hasAuthor Foundations of Databases

Property Tables Summary The Good ▫Reduce subject-subject self-joins The Bad ▫Sluggish on cross-table joins ▫How do we cluster property tables? 12

Getting Down to Business I.Current State of the Art RDF in RDBMs II.A Simple Alternative Vertically partitioned approach III.Benchmarks Querying the candidates IV.Evaluation Storage requirements and implementation V.Results 13

Vertically Partitioned Approach Goal: speed up queries over triples-store Idea: one table per property ▫Column 1: Subjects ▫Column 2: Objects 14

Vertically Partitioned Approach 15

Vertically Partitioned Approach: Advantages Support for multi-valued attributes Support for heterogeneous records 16

Vertically Partitioned Approach: Advantages Access requested properties only No need for clustering algorithms Less is more: fewer and faster joins 17

Vertically Partitioned Approach: Disadvantages More joins than property tables ▫Multi-property queries – merge joins Slower insertions into tables ▫Multiple-table access for same-subject statements ▫Solution: batch insertions Standard DBMSs not optimal for this approach 18

DB Orientation: Column vs Row Row-Oriented DBMS Column-Oriented ID1, “XYZ”ID2, “ABC” ID3, “MNO” ID4, “DEF” ID5, “GHI” … DBMS Memory File ID1, ID2, ID3, ID4, ID5 “XYZ”, “ABC”, “MNO”, “DEF”, “GHI” … DBMS Memory File 19

Jargon for the Noggin’ Tuple Tuple metadata ▫Timestamp ▫Number of attributes ▫NULL flags 20

Column-Oriented DBMS + Only relevant columns are retrieved - Slower insertions Advantages for Vertical Partitioning: ▫Separate tuple metadata ▫Fixed-length tuples ▫Column-oriented data compression ▫Optimized merge code 21

Materialized Path Expressions Problem: for a path of length n properties ▫n-1 subject-object joins required E.g. Find books whose authors were born in 1860 BooksAuthors“1860” hasAuthor wasBorn 22

Materialized Path Expressions Goal: eliminate joins across multiple tables How: Combine property paths into a single table SELECT B.subj FROM triples AS A, triples AS B WHERE A.prop = wasBorn AND A.obj = “1860” AND A.subj = B.obj AND B.prop = “Author” SELECT A.subj FROM predtable AS A, WHERE A.author:wasBorn = “1860” BooksAuthors“1860” hasAuthor wasBorn hasAuthor:wasBorn 23

Materialized Path Expressions: Breakdown Precalculate path expression ▫No join at query time ▫Easy implementation in vertically partitioned schema  Simply add table “hasAuthor:wasBorn”  Property Table Technique: Add column “hasAuthor:wasBorn” Added cost: recalculating after insertions 24

Getting Down to Business I.Current State of the Art RDF in RDBMs II.A Simple Alternative Vertically partitioned approach III.Benchmarks Querying the candidates IV.Evaluation Storage requirements and implementation V.Results 25

Benchmark: Dataset Barton Libraries ▫50 million triples  77% multi-valued ▫221 unique properties  37% multi-valued ▫Good representation of Semantic Web data RDF/XML converted into triples 26

Benchmark: Longwell GUI for exploring RDF data User applies filters to property panels Longwell-style queries provide realistic benchmark for testing 27

Benchmark: Longwell GUI 28

Benchmark: Longwell queries 7 queries were chosen Each query represents typical browsing session ▫Exercises on query diversity 29

Getting Down to Business I.Current State of the Art RDF in RDBMs II.A Simple Alternative Vertically partitioned approach III.Benchmarks Querying the candidates IV.Evaluation Storage requirements and implementation V.Results 30

Evaluation: Schema Implementations Performance comparison of all 3 schemas 1.Triple Store 2.Property Table Store 3.Vertically Partitioned Store A.Row-oriented (Postgres) B.Column-oriented (C-Store) 31

Evaluation: Size Matters Memory usage per implementation 1.Triple Store GBytes 2.Property Table store - 14 GBytes 3.Vertically Partitioned Store (Postgres) GBytes 4.Vertically Partitioned Store (C-Store) GBytes 32

Getting Down to Business I.Current State of the Art RDF in RDBMs II.A Simple Alternative Vertically partitioned approach III.Benchmarks Querying the candidates IV.Evaluation Storage requirements and implementation V.Results 33

Results 34

Scalability How does performance scale with size of data? Increased number of triples from 1 million to 50 million. 35

Results: Scalability Vertical partitioning schemes scale linearly Triple-store scales super-linearly ▫Prevalent sorting operations 36

Results: Materialized Path Expressions 37

Results: Further Widening 38

Summary Semantic Web users require fast responses to queries Current triple-stores just don’t cut it ▫Can’t stand up to sluggish self-joins Property tables are good, but have their limitations Vertical partitioning takes the cake ▫Competes with optimal performance of property table solution ▫Step toward an interactive-time Semantic Web 39

Thank you! Questions? 40