Comparing path-based and vertically-partitioned RDF databases Preetha Lakshmi & Chris Mueller 12/10/2007 CSCI 8715 Shashi Shekhar.

Outline
- Motivation
- Background and related work
- Problem statement
- Our contributions
- Assumptions
- Experimental process
- Results
- Conclusions

Motivation
- Semantic Web
- Libraries, scientific databases, industry, social networks
- Computer-to-computer communication

RDF Schema
[Figure: RDF graph with a schema layer and an instance layer]

RDF Schema
[Figure: the same graph expressed as RDF triples]

Related Work
- Triple store
- Property tables
- Class property tables
- Dynamic table model
- Vertically partitioned tables (Abadi et al., 2007)
- Path-based approach (Matono et al., 2005)
- Require more self-joins, normal joins, NULL value storage

Vertical Partitioning
A table is created for each property.

First
Subject   Object
'r1'      'Picasso'
'r4'      'August'

Last
Subject   Object
'r1'      'Picasso'
'r4'      'Rodin'

Paints
Subject   Object
'r1'      'r2'
'r1'      'r3'

... etc.
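As a rough illustration of the layout above, the following Python sketch builds one two-column table per property in an in-memory SQLite database and answers "find the titles of all paintings by Picasso" with one join per edge of the path. SQLite stands in for the database here, and the `title` triples and the helper name `build_vertical_partition` are assumptions made for the example, not taken from the slides.

```python
import sqlite3
from collections import defaultdict

# Sample triples. The two title values are illustrative assumptions.
TRIPLES = [
    ("r1", "last", "Picasso"),
    ("r4", "first", "August"),
    ("r4", "last", "Rodin"),
    ("r1", "paints", "r2"),
    ("r1", "paints", "r3"),
    ("r2", "title", "Guernica"),
    ("r3", "title", "Untitled"),
]

def build_vertical_partition(conn, triples):
    """Create one (subject, object) table per property; return the table names."""
    by_property = defaultdict(list)
    for subject, prop, obj in triples:
        by_property[prop].append((subject, obj))
    for prop, rows in by_property.items():
        # Property names come from the fixed sample above, so using them
        # as quoted identifiers is acceptable in this sketch.
        conn.execute(f'CREATE TABLE "{prop}" (subject TEXT, object TEXT)')
        conn.executemany(f'INSERT INTO "{prop}" VALUES (?, ?)', rows)
    return sorted(by_property)

conn = sqlite3.connect(":memory:")
tables = build_vertical_partition(conn, TRIPLES)

# Path last -> paints -> title: each step costs one join between property tables.
titles = [row[0] for row in conn.execute(
    '''SELECT t.object
       FROM "last" AS l
       JOIN "paints" AS p ON p.subject = l.subject
       JOIN "title"  AS t ON t.subject = p.object
       WHERE l.object = 'Picasso'
       ORDER BY t.object'''
)]
print(tables)
print(titles)
```

The number of joins grows with the length of the path, which is one of the quantities the comparison in this presentation looks at.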

Path-based Model
Path signatures relate to instance data.

Path
pathid   pathexp
1        ''
2        '#first'
3        '#last'
4        '#paints'
5        '#title<#paints'
6        '#sculpts'
7        '#title<#sculpts'

Resource
name        pathid   root
'r1'        1        'r1'
'r2'        4        'r1'
'r3'        4        'r1'
'r4'        1        'r4'
'Picasso'   2        'r1'
'Pablo'     3        'r1'
'August'    2        'r4'
'Rodin'     3        'r4'
...

[Slide callout: Our enhancement]
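A minimal sketch of how a path query might run against these two tables, again using SQLite as a stand-in. The `path` rows follow the slide; the two title rows in `resource` and the helper `names_on_path` are assumptions added for illustration, since the slide elides those entries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE path (pathid INTEGER PRIMARY KEY, pathexp TEXT);
CREATE TABLE resource (name TEXT, pathid INTEGER, root TEXT);
""")
conn.executemany("INSERT INTO path VALUES (?, ?)", [
    (1, ""), (2, "#first"), (3, "#last"), (4, "#paints"),
    (5, "#title<#paints"), (6, "#sculpts"), (7, "#title<#sculpts"),
])
conn.executemany("INSERT INTO resource VALUES (?, ?, ?)", [
    ("r1", 1, "r1"), ("r2", 4, "r1"), ("r3", 4, "r1"), ("r4", 1, "r4"),
    ("Rodin", 3, "r4"),
    # Assumed rows: title entries are not listed on the slide.
    ("Guernica", 5, "r1"), ("The Thinker", 7, "r4"),
])

def names_on_path(conn, pathexp, root=None):
    """Return resource names stored under a path expression, optionally for one root."""
    sql = ("SELECT r.name FROM resource AS r "
           "JOIN path AS p ON p.pathid = r.pathid WHERE p.pathexp = ?")
    params = [pathexp]
    if root is not None:
        sql += " AND r.root = ?"
        params.append(root)
    sql += " ORDER BY r.name"
    return [row[0] for row in conn.execute(sql, params)]

print(names_on_path(conn, "#title<#paints"))
print(names_on_path(conn, "#paints", root="r1"))
```

In this sketch the whole path is matched by a single lookup on `pathexp`, so the query uses one join regardless of path length, in contrast to the per-edge joins of the vertically partitioned layout.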

Problem Statement
Given:
- A set of RDF triples
- Vertical partitioning storage model
- Path-based storage model
Find:
- Query plans for the various categories of queries under these two storage schemes
Objective:
- To determine which query types perform comparatively better or worse in the two storage models
Why is this challenging?
- Need for efficient storage of structured data
- Different application domains use RDF, so generic storage schemes should support a diverse workload

Contributions
- Identification of benchmark queries: schema, instance, path, and aggregate queries
- Enhancement to the path-based schema that addresses different types of workloads
- Comparison of the path-based model and vertical partitioning
- Analysis of cyclic queries

Query Types
Schema queries
- find all types of artists
- list all property names
- list nodes with 2 or more descendants
- find the transitive sub-classes of a class 'sculpture'
- list properties with 2 or more descendants
Instance queries
- find the titles of all paintings by Picasso
- select all nodes within one edge-length of r4
- list all the properties of node r4
[Figure: query taxonomy. Labels: Schema vs Instance; Path, Non-path; Aggregate, Cycle, Relationship, Diameter, Constraints (intermediate node, terminal node), Connection, List]
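One of the schema queries listed above, finding the transitive sub-classes of a class, amounts to computing a closure over sub-class edges. The sketch below shows that closure in plain Python. The class hierarchy in `SUBCLASS_OF` is entirely hypothetical, since the presentation's schema is only shown as a figure.

```python
from collections import defaultdict

# Hypothetical (child, parent) sub-class pairs, assumed for illustration only.
SUBCLASS_OF = [
    ("painting", "artifact"),
    ("sculpture", "artifact"),
    ("relief", "sculpture"),
    ("bas_relief", "relief"),
    ("bust", "sculpture"),
]

def transitive_subclasses(subclass_pairs, root):
    """Return every class that is a direct or indirect sub-class of root."""
    children = defaultdict(set)
    for child, parent in subclass_pairs:
        children[parent].add(child)
    found = set()
    stack = [root]
    while stack:
        current = stack.pop()
        for child in children.get(current, ()):
            if child not in found:  # also guards against cyclic schemas
                found.add(child)
                stack.append(child)
    return found

print(sorted(transitive_subclasses(SUBCLASS_OF, "sculpture")))
```

In a relational setting each level of the hierarchy would typically cost another self-join unless the closure is materialized, which is why such queries are useful for separating storage schemes.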

Query Types
Path queries
- find the title of any painting painted by anyone
- display all the titles of work done by artists
- find the names of all the sculptors
...with constraint on intermediate node
- find an artist's name where the artifact is a painting
...with terminal node constraints
- display all the titles of work done by Picasso

Query Types
Path queries
- connection queries
  - list all the properties of node r4
  - is there a connection between 'Picasso' and 'Guernica'?
- diameter queries
  - select all nodes in the graph within one edge-length of r4
- non-simple path queries
  - detect loops in the dataset starting at 'Picasso'
  - detect loops in the whole dataset
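To make the three query kinds above concrete, here is a small graph-level sketch in Python, independent of either storage scheme. It treats triples as edges and uses breadth-first search for diameter and connection queries and a reachability check for loops. The sample triples, including the `depicts` edge that closes a loop, are hypothetical, as are all function names.

```python
from collections import defaultdict, deque

TRIPLES = [
    ("r1", "last", "Picasso"),
    ("r1", "paints", "r2"),
    ("r1", "paints", "r3"),
    ("r2", "title", "Guernica"),
    ("r3", "depicts", "r1"),  # hypothetical edge closing the loop r1 -> r3 -> r1
    ("r4", "last", "Rodin"),
]

def build_adjacency(triples, directed=True):
    """Map each node to its neighbours; add reverse edges when directed is False."""
    adj = defaultdict(set)
    for subject, _prop, obj in triples:
        adj[subject].add(obj)
        if not directed:
            adj[obj].add(subject)
    return adj

def nodes_within(adj, start, max_edges):
    """Diameter query: nodes reachable from start in at most max_edges steps."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_edges:
            continue
        for nxt in adj.get(node, ()):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return {node for node in depth if node != start}

def is_connected(adj, a, b):
    """Connection query: is there any path from a to b?"""
    return a == b or b in nodes_within(adj, a, float("inf"))

def has_loop_from(adj, start):
    """Non-simple path query: does some directed path lead from start back to start?"""
    return any(
        nxt == start or start in nodes_within(adj, nxt, float("inf"))
        for nxt in adj.get(start, ())
    )

directed = build_adjacency(TRIPLES)
undirected = build_adjacency(TRIPLES, directed=False)
print(sorted(nodes_within(undirected, "r4", 1)))
print(is_connected(undirected, "Picasso", "Guernica"))
print(has_loop_from(directed, "r1"))
```

Loop detection only makes sense on the directed adjacency here, because with reverse edges added every edge would trivially form a loop.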

Query Types
Aggregate queries
- find all nodes with 2 or more properties
- list all subjects that have two instances of a single property
Relationship queries
- find any relationship between r1 and r4
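The two aggregate queries above can be sketched with GROUP BY and HAVING. For brevity this example runs over a single three-column `triples` table in SQLite, which is an assumption of the sketch and is neither of the two storage schemes being compared. Under vertical partitioning the same counts would require combining the per-property tables first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, property TEXT, object TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("r1", "last", "Picasso"),
    ("r1", "paints", "r2"),
    ("r1", "paints", "r3"),
    ("r2", "title", "Guernica"),  # assumed title value
    ("r4", "first", "August"),
    ("r4", "last", "Rodin"),
])

# Nodes with 2 or more (distinct) properties.
multi_property = [row[0] for row in conn.execute(
    """SELECT subject FROM triples
       GROUP BY subject
       HAVING COUNT(DISTINCT property) >= 2
       ORDER BY subject"""
)]

# Subjects that have exactly two instances of a single property.
repeated = conn.execute(
    """SELECT subject, property FROM triples
       GROUP BY subject, property
       HAVING COUNT(*) = 2
       ORDER BY subject, property"""
).fetchall()

print(multi_property)
print(repeated)
```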

Assumptions
- A small dataset is used, on the assumption that the number of joins and the efficiency of the queries will not change significantly with larger datasets
- No explicit storage of the RDF schema in the vertically-partitioned scheme (application independent)
- INSERT, UPDATE, and DELETE are insignificant compared to SELECT
- Key nodes in the path-based model are well-defined
  - In practice, key nodes would be generated dynamically after user load analysis

Experimental Process
- Validation parameters: nodes, edges, number of joins, number of tables, CPU cost, storage bytes
- Set up both schemes in Oracle 10g for the RDF graph shown earlier
- Materialized path lengths in the path-based scheme
- Generated query plans
- Analyzed queries based on the validation parameters
- Cycle queries: joins are not supported

Dataset used for experiment

Experimental Results
* For CPU cost and bytes (storage), the entry in the table indicates which scheme used fewer CPU cycles or occupied less space. Where both required an identical or similar amount of computation or storage, this is indicated with "same". Queries that cannot be answered are indicated by '--'.

Conclusions & Observations
- Vertical partitioning performs well for short path lengths and terminal node constraints.
- Vertical partitioning offers storage benefits for instance queries without path expressions.
- The enhanced path-based model performs well for schema queries, path queries, and cycle queries.
- Queries which the original path-based model could not address and the enhanced model could answer:
  - connection queries and diameter queries
  - path queries with intermediate node constraints

Conclusion (Cont'd)
- Both schemes show the same performance on instance queries without path expressions.
- Neither scheme addresses relationship queries.
- Interesting results for cycle queries:
  - specifying the start node gives worse performance than leaving the start node unspecified
  - specifying the start node uses Oracle Filter

Future Work
- Test large and diverse datasets
- Test vertical partitioning with a column-oriented database like MonetDB
- Pruning strategies for cycle queries
- Impose join indexes
- Find approaches to answer relationship queries
- Storage classification based on the application domain

Thank You
Questions?
Please see for a copy of the report that accompanies this presentation, including a full bibliography