Comparing path-based and vertically-partitioned RDF databases Preetha Lakshmi & Chris Mueller 12/10/2007 CSCI 8715 Shashi Shekhar.



Motivation
  - Need for efficient storage of structured data
  - Semantic Web
  - Libraries, scientific databases, industry
  - Social networks

RDF Schema
  (figure: schema and instance)

RDF Schema
  (figure: RDF triples)

Related Work
  - Triple store
  - Property tables
  - Class property tables
  - Dynamic table model
  - Vertically partitioned tables (Abadi et al., 2007)
  - Path-based approach (Matono et al., 2005)

Vertical Partitioning
A table is created for each property.

  First
  Subject   Object
  'r1'      'Pablo'
  'r4'      'August'

  Last
  Subject   Object
  'r1'      'Picasso'
  'r4'      'Rodin'

  Paints
  Subject   Object
  'r1'      'r2'
  'r1'      'r3'

  ... etc.
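The per-property layout above can be sketched in a few lines of Python. This is only an illustration of the idea, not the implementation used in the study: the function name `vertically_partition` and the sample triples are assumptions made for the example.

```python
from collections import defaultdict

# Illustrative (subject, property, object) triples; sample data only.
TRIPLES = [
    ("r1", "first", "Pablo"),
    ("r1", "last", "Picasso"),
    ("r1", "paints", "r2"),
    ("r1", "paints", "r3"),
    ("r4", "first", "August"),
    ("r4", "last", "Rodin"),
]


def vertically_partition(triples):
    """Group triples into one two-column (subject, object) table per property."""
    tables = defaultdict(list)
    for subject, prop, obj in triples:
        tables[prop].append((subject, obj))
    return dict(tables)


tables = vertically_partition(TRIPLES)
for prop, rows in tables.items():
    print(prop, rows)
```

Each key of `tables` plays the role of one property table, so a query that touches several properties needs one join per additional property.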

Path-based Model
Path signatures relate to instance data.

  Path
  pathid   pathexp
  1        ''
  2        '#first'
  3        '#last'
  4        '#paints'
  5        '#title<#paints'
  6        '#sculpts'
  7        '#title<#sculpts'

  Resource
  name        pathid   root
  'r1'        1        'r1'
  'r2'        4        'r1'
  'r3'        4        'r1'
  'r4'        1        'r4'
  'Pablo'     2        'r1'
  'Picasso'   3        'r1'
  'August'    2        'r4'
  'Rodin'     3        'r4'
  ...

  Our enhancement
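One way to derive the Path and Resource rows from a set of triples is sketched below. It is a rough illustration under several assumptions: the function name `build_path_tables` is hypothetical, path ids are simply assigned in discovery order, each resource is recorded once per root (the first path found), and the `title` triple is invented sample data.

```python
from collections import defaultdict


def build_path_tables(triples, roots):
    """Return (path_ids, resource_rows) for a path-based layout.

    path_ids maps a path expression such as '#title<#paints' to an id;
    resource_rows holds (name, pathid, root) tuples.
    """
    edges = defaultdict(list)
    for subject, prop, obj in triples:
        edges[subject].append((prop, obj))

    path_ids = {}
    resource_rows = []

    def path_id(pathexp):
        # Assign ids in order of first appearance (an assumption).
        if pathexp not in path_ids:
            path_ids[pathexp] = len(path_ids) + 1
        return path_ids[pathexp]

    for root in roots:
        stack = [(root, [])]
        seen = {root}  # guards against revisiting nodes, including via cycles
        while stack:
            node, props = stack.pop()
            # The most recent edge comes first, matching '#title<#paints'.
            pathexp = "<".join("#" + p for p in reversed(props))
            resource_rows.append((node, path_id(pathexp), root))
            for prop, obj in edges.get(node, []):
                if obj not in seen:
                    seen.add(obj)
                    stack.append((obj, props + [prop]))
    return path_ids, resource_rows


# Sample data; the 'title' triple is hypothetical.
sample = [
    ("r1", "first", "Pablo"),
    ("r1", "last", "Picasso"),
    ("r1", "paints", "r2"),
    ("r2", "title", "Guernica"),
]
path_ids, resource_rows = build_path_tables(sample, roots=["r1"])
```

The root node gets the empty path expression, and every other resource is stored with the path that leads to it from that root.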

Problem Statement
  Given:
    - A set of RDF triples
    - Vertical partitioning storage model
    - Path-based storage model
  Find: Query plans for the various categories of queries under these two storage schemes.
  Objective: Determine which query types perform comparatively better or worse under each of the two storage models.
  Why is the problem hard? Different application domains use RDF, so a generic storage scheme must support a diverse workload.

Contributions
  - Identification of benchmark queries: schema, instance, path, and aggregate queries
  - Enhancement to the path-based schema that addresses different types of workloads
  - Comparison of the path-based model and vertical partitioning
  - Analysis of cyclic queries

Query Types
  Schema queries
    - find all types of artists
    - list all property names
    - list nodes with 2 or more descendants
    - find the transitive sub-classes of the class 'sculpture'
    - list properties with 2 or more descendants
  Instance queries
    - find the titles of all paintings by Picasso
    - select all nodes within one edge-length of r4
    - list all the properties of node r4
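To make the join cost of an instance query concrete, the sketch below runs "find the titles of all paintings by Picasso" against per-property tables. It uses an in-memory SQLite database as a stand-in for the Oracle setup used in the study; the table names (`prop_last`, `prop_paints`, `prop_title`) and the title rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One (subject, object) table per property, as in vertical partitioning.
for table in ("prop_last", "prop_paints", "prop_title"):
    conn.execute(f"CREATE TABLE {table} (subject TEXT, object TEXT)")

conn.executemany("INSERT INTO prop_last VALUES (?, ?)",
                 [("r1", "Picasso"), ("r4", "Rodin")])
conn.executemany("INSERT INTO prop_paints VALUES (?, ?)",
                 [("r1", "r2"), ("r1", "r3")])
# Hypothetical titles for r2 and r3.
conn.executemany("INSERT INTO prop_title VALUES (?, ?)",
                 [("r2", "Guernica"), ("r3", "Three Musicians")])

# Two joins: last -> paints -> title.
QUERY = """
SELECT t.object
FROM prop_last AS l
JOIN prop_paints AS p ON p.subject = l.subject
JOIN prop_title AS t ON t.subject = p.object
WHERE l.object = ?
ORDER BY t.object
"""
titles = [row[0] for row in conn.execute(QUERY, ("Picasso",))]
print(titles)
```

The number of joins grows with the length of the path in the query, which is the quantity the study tracks as "number of joins".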

Query Types
  Path queries
    - find the title of any painting painted by anyone
    - display all the titles of work done by artists
    - find the names of all the sculptors
  ...with constraint on intermediate node
    - find an artist's name where the artifact is a painting
  ...with terminal node constraints
    - display all the titles of work done by Picasso
  Connection queries
    - list all the properties of node r4
    - is there a connection between 'Picasso' and 'Guernica'?
  Diameter queries
    - select all nodes in the graph within one edge-length of r4
  Non-simple path queries
    - detect loops in the dataset starting at 'Picasso'
    - detect loops in the whole dataset
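In the path-based layout, a path query can be answered by matching a path expression rather than chaining joins. The sketch below illustrates this with plain Python lists; the `PATH` rows follow the Path table shown earlier, while the function name `resources_on_path` and the two title rows in `RESOURCE` are hypothetical.

```python
# (pathid, pathexp) rows, following the Path table shown earlier.
PATH = [
    (1, ""), (2, "#first"), (3, "#last"), (4, "#paints"),
    (5, "#title<#paints"), (6, "#sculpts"), (7, "#title<#sculpts"),
]

# (name, pathid, root) rows; the two title rows are hypothetical.
RESOURCE = [
    ("r1", 1, "r1"), ("r2", 4, "r1"), ("r3", 4, "r1"), ("r4", 1, "r4"),
    ("Guernica", 5, "r1"), ("The Thinker", 7, "r4"),
]


def resources_on_path(pathexp, root=None):
    """Return resource names stored under a path expression.

    If root is given, keep only resources reached from that root node.
    """
    ids = {pid for pid, exp in PATH if exp == pathexp}
    return sorted(
        name for name, pid, r in RESOURCE
        if pid in ids and (root is None or r == root)
    )


# "Find the title of any painting painted by anyone."
print(resources_on_path("#title<#paints"))
# "Everything r1 paints", using the root column.
print(resources_on_path("#paints", root="r1"))
```

Whatever the path length, the lookup touches only the Path and Resource tables, in contrast to the one-join-per-edge pattern of vertical partitioning.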

Query Types
  Aggregate queries
    - find all nodes with 2 or more properties
    - list all subjects that have two instances of a single property
  Relationship queries
    - find any relationship between r1 and r4
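The two aggregate queries amount to grouping and counting over the triples. A minimal sketch follows, assuming that "2 or more properties" means distinct property names and that "two instances" means at least two; the function names and sample triples are hypothetical.

```python
from collections import Counter, defaultdict

# Illustrative (subject, property, object) triples; sample data only.
TRIPLES = [
    ("r1", "first", "Pablo"),
    ("r1", "last", "Picasso"),
    ("r1", "paints", "r2"),
    ("r1", "paints", "r3"),
    ("r4", "first", "August"),
    ("r4", "last", "Rodin"),
]


def nodes_with_min_properties(triples, minimum=2):
    """Subjects that have at least `minimum` distinct properties."""
    distinct = defaultdict(set)
    for subject, prop, _ in triples:
        distinct[subject].add(prop)
    return sorted(s for s, props in distinct.items() if len(props) >= minimum)


def subjects_with_repeated_property(triples, times=2):
    """Subjects with at least `times` triples for some single property."""
    counts = Counter((subject, prop) for subject, prop, _ in triples)
    return sorted({s for (s, _), n in counts.items() if n >= times})


print(nodes_with_min_properties(TRIPLES))
print(subjects_with_repeated_property(TRIPLES))
```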

Assumptions
  - A small dataset is used, on the assumption that the number of joins and the efficiency of the queries will not change significantly with larger datasets
  - No explicit storage of the RDF schema in the vertically-partitioned scheme
  - INSERT, UPDATE, and DELETE are insignificant compared to SELECT
  - Key nodes in the path-based model are well-defined
  - In practice, key nodes would be generated dynamically after user load analysis

Experimental Process
  - Validation parameters: nodes, edges, number of joins, number of tables, CPU cost, storage bytes
  - Set up both schemes in Oracle 10g for the RDF graph shown earlier
  - Materialized path lengths in the path-based scheme
  - Generated query plans
  - Analyzed queries based on the validation parameters
  - Cycle queries: joins are not supported

Conclusions & Observations
  Vertical partitioning
    - Performs well for short path lengths and terminal node constraints
    - Offers storage benefits for instance queries without path expressions
  Enhanced path-based model
    - Performs well for schema queries, path queries, and cycle queries
  Queries that the original path-based model could not address and the enhanced model can answer
    - Connection queries and diameter queries
    - Path queries with intermediate node constraints

Conclusions (Cont'd)
  - Both schemes show the same performance on instance queries without path expressions
  - Neither scheme addresses relationship queries
  - Interesting results for cycle queries
    - Specifying the start node gives worse performance than leaving the start node unspecified
    - Specifying the start node uses an Oracle Filter
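For readers unfamiliar with the two cycle-query variants (loops from a given start node versus loops anywhere in the dataset), the sketch below shows them as a plain depth-first search in Python. This is not the Oracle-based approach evaluated in the study and says nothing about its performance; the function names and the example graph are hypothetical.

```python
def find_cycle_from(edges, start):
    """Return one cycle through `start` as a list of nodes, or None.

    `edges` maps a node to the list of nodes it points to.
    """
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        for nxt in edges.get(node, []):
            if nxt == start:
                return path + [start]
            if nxt not in path:  # only extend simple paths, so the search ends
                stack.append((nxt, path + [nxt]))
    return None


def has_any_cycle(edges):
    """True if any node with outgoing edges lies on a cycle."""
    return any(find_cycle_from(edges, node) is not None for node in edges)


# Hypothetical graph: r1 -> r2 -> r5 -> r1 forms a loop; r4 -> r6 does not.
edges = {"r1": ["r2"], "r2": ["r5"], "r5": ["r1"], "r4": ["r6"]}
print(find_cycle_from(edges, "r1"))
print(find_cycle_from(edges, "r4"))
print(has_any_cycle(edges))
```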

Future Work
  - Test large and diverse datasets
  - Test vertical partitioning with a column-oriented database such as MonetDB
  - Pruning strategies for cycle queries
  - Impose join indexes
  - Find approaches to answer relationship queries
  - Storage classification based on the application domain

Thank You Questions? Please see for a copy of the report that accompanies this presentation, including a full bibliography