Comparing path-based and vertically-partitioned RDF databases. Preetha Lakshmi & Chris Mueller. 12/10/2007. CSCI 8715, Shashi Shekhar.

1 Comparing path-based and vertically-partitioned RDF databases. Preetha Lakshmi & Chris Mueller. 12/10/2007. CSCI 8715, Shashi Shekhar.

2 Motivation: need for efficient storage of structured data in the Semantic Web, libraries, scientific databases, industry, and social networks.

3 RDF Schema [Figure: schema and instance]

4 RDF Schema [Figure: RDF triples]

5 Related Work: triple store; property tables; class property tables; dynamic table model; vertically partitioned tables (Abadi et al., 2007); path-based approach (Matono et al., 2005).

6 Vertical Partitioning: a table is created for each property. First (subject, object): ('r1', 'Picasso'), ('r4', 'August'). Last (subject, object): ('r1', 'Picasso'), ('r4', 'Rodin'). Paints (subject, object): ('r1', 'r2'), ('r1', 'r3'), ... etc.
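
The vertical partitioning scheme can be sketched in a few lines of Python with the standard-library `sqlite3` module. This is a minimal illustration under stated assumptions, not the authors' Oracle 10g setup: the sample triples are a hypothetical excerpt of the example graph (the titles 'Guernica' and 'title-of-r3' are assumptions, since the slides do not list them), and `vertically_partition` is a hypothetical helper. It creates one two-column table per property and answers "find the titles of all paintings by Picasso" with one join per edge on the path.

```python
import sqlite3

# Hypothetical sample triples shaped like the slide's example graph.
# The titles of r2 and r3 are assumptions; the slides do not list them.
TRIPLES = [
    ("r1", "last", "Picasso"),
    ("r1", "paints", "r2"),
    ("r1", "paints", "r3"),
    ("r2", "title", "Guernica"),
    ("r3", "title", "title-of-r3"),
    ("r4", "first", "August"),
    ("r4", "last", "Rodin"),
]


def vertically_partition(conn, triples):
    """Create one two-column (subject, object) table per distinct property."""
    properties = sorted({p for _, p, _ in triples})
    for prop in properties:
        # Identifiers are quoted because names such as "first" and "last"
        # are SQLite keywords; the names come from trusted sample data.
        conn.execute(f'CREATE TABLE "{prop}" (subject TEXT, object TEXT)')
    for s, p, o in triples:
        conn.execute(f'INSERT INTO "{p}" VALUES (?, ?)', (s, o))
    return properties


conn = sqlite3.connect(":memory:")
tables = vertically_partition(conn, TRIPLES)

# "Find the titles of all paintings by Picasso": one join per edge on the path.
titles = [row[0] for row in conn.execute("""
    SELECT "title".object
    FROM "last"
    JOIN "paints" ON "paints".subject = "last".subject
    JOIN "title"  ON "title".subject  = "paints".object
    WHERE "last".object = 'Picasso'
    ORDER BY "title".object
""")]
print(tables)  # ['first', 'last', 'paints', 'title']
print(titles)  # ['Guernica', 'title-of-r3']
```

Each additional edge in the path adds another join, which fits the observation that vertical partitioning does best on short path lengths.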

7 Path-based Model: path signatures relate to instance data. Path (pathid, pathexp): (1, ''), (2, '#first'), (3, '#last'), (4, '#paints'), (5, '#title<#paints'), (6, '#sculpts'), (7, '#title<#sculpts'). Resource (name, pathid, root): ('r1', 1, 'r1'), ('r2', 4, 'r1'), ('r3', 4, 'r1'), ('r4', 1, 'r4'), ('Picasso', 2, 'r1'), ('Pablo', 3, 'r1'), ('August', 2, 'r4'), ('Rodin', 3, 'r4'), ... [Our enhancement]
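
A minimal sketch of how Path and Resource tables of this shape could be derived from triples and then queried, again using Python's `sqlite3` as a stand-in for Oracle. Assumptions: the sample triples (including the resource 'r5' and all titles) are hypothetical, `ROOTS` plays the role of the key nodes, and `path_expression`/`build_path_tables` are illustrative helpers. Path expressions follow the slide's notation (last edge first, separated by '<'), but path ids are assigned in discovery order, so they do not match the slide's numbering.

```python
import sqlite3

# Hypothetical sample triples; r5 and all titles are assumptions.
TRIPLES = [
    ("r1", "last", "Picasso"),
    ("r1", "paints", "r2"),
    ("r1", "paints", "r3"),
    ("r2", "title", "Guernica"),
    ("r3", "title", "title-of-r3"),
    ("r4", "first", "August"),
    ("r4", "last", "Rodin"),
    ("r4", "sculpts", "r5"),
    ("r5", "title", "title-of-r5"),
]
ROOTS = ["r1", "r4"]  # assumed key (root) nodes


def path_expression(edges):
    """Slide notation: last edge first, e.g. ['paints', 'title'] -> '#title<#paints'."""
    return "<".join("#" + e for e in reversed(edges))


def build_path_tables(triples, roots):
    """Return ({pathexp: pathid}, [(name, pathid, root), ...])."""
    out = {}
    for s, p, o in triples:
        out.setdefault(s, []).append((p, o))
    path_ids, resources = {}, []

    def visit(node, edges, root, on_path):
        pid = path_ids.setdefault(path_expression(edges), len(path_ids) + 1)
        resources.append((node, pid, root))
        for prop, obj in out.get(node, []):
            if obj not in on_path:  # never follow an edge back into the current path
                visit(obj, edges + [prop], root, on_path | {obj})

    for root in roots:
        visit(root, [], root, {root})
    return path_ids, resources


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE path (pathid INTEGER PRIMARY KEY, pathexp TEXT)")
conn.execute("CREATE TABLE resource (name TEXT, pathid INTEGER, root TEXT)")
path_ids, resources = build_path_tables(TRIPLES, ROOTS)
conn.executemany("INSERT INTO path VALUES (?, ?)", [(i, e) for e, i in path_ids.items()])
conn.executemany("INSERT INTO resource VALUES (?, ?, ?)", resources)

# Titles of paintings under every root whose '#last' value is 'Picasso'.
titles = [row[0] for row in conn.execute("""
    SELECT r.name
    FROM resource r JOIN path p ON p.pathid = r.pathid
    WHERE p.pathexp = '#title<#paints'
      AND r.root IN (SELECT k.root
                     FROM resource k JOIN path q ON q.pathid = k.pathid
                     WHERE q.pathexp = '#last' AND k.name = 'Picasso')
    ORDER BY r.name
""")]
print(titles)  # ['Guernica', 'title-of-r3']
```

Instead of one join per edge, the query matches a precomputed path expression, and the root column restricts the results to one artist's subgraph.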

8 Problem Statement. Given: a set of RDF triples, a vertical partitioning storage model, and a path-based storage model. Find: query plans for the various categories of queries under these two storage schemes. Objective: to determine which query types perform comparatively better or worse in each of the two storage models. Why is the problem hard? RDF is used in many different application domains, so generic storage schemes must support a diverse workload.

9 Contributions: identification of benchmark queries (schema, instance, path, and aggregate queries); an enhancement to the path-based schema that addresses different types of workloads; a comparison of the path-based model and vertical partitioning; an analysis of cyclic queries.

10 Query Types. Schema queries: find all types of artists; list all property names; list nodes with 2 or more descendants; find the transitive sub-classes of the class 'sculpture'; list properties with 2 or more descendants. Instance queries: find the titles of all paintings by Picasso; select all nodes within one edge-length of r4; list all the properties of node r4.
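
The schema query "find the transitive sub-classes of a class" needs recursion. One way to express it in SQL is a recursive common table expression, sketched here with `sqlite3`. The `subclass_of` table and its class names are hypothetical, and this is not necessarily how either storage model in the talk answers the query (the slides assume no explicit schema storage in the vertically-partitioned scheme).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical class hierarchy stored as (sub, super) pairs,
# in the spirit of rdfs:subClassOf triples.
conn.execute("CREATE TABLE subclass_of (sub TEXT, super TEXT)")
conn.executemany("INSERT INTO subclass_of VALUES (?, ?)", [
    ("sculpture", "artwork"),
    ("painting", "artwork"),
    ("bronze_sculpture", "sculpture"),
    ("marble_sculpture", "sculpture"),
    ("roman_marble_sculpture", "marble_sculpture"),
])

# Transitive sub-classes of a class. UNION (not UNION ALL) discards rows that
# were already produced, so a cyclic hierarchy cannot make the recursion loop forever.
TRANSITIVE_SUBCLASSES = """
    WITH RECURSIVE descendants(cls) AS (
        SELECT sub FROM subclass_of WHERE super = ?
        UNION
        SELECT s.sub FROM subclass_of s JOIN descendants d ON s.super = d.cls
    )
    SELECT cls FROM descendants ORDER BY cls
"""
subclasses = [row[0] for row in conn.execute(TRANSITIVE_SUBCLASSES, ("sculpture",))]
print(subclasses)
# ['bronze_sculpture', 'marble_sculpture', 'roman_marble_sculpture']
```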

11 Query Types. Path queries: find the title of any painting painted by anyone; display all the titles of work done by artists; find the names of all the sculptors (constraint on an intermediate node); find an artist's name where the artifact is a painting (terminal node constraint); display all the titles of work done by Picasso. Connection queries: list all the properties of node r4; is there a connection between 'Picasso' and 'Guernica'? Diameter queries: select all nodes in the graph within one edge-length of r4. Non-simple path queries: detect loops in the dataset starting at 'Picasso'; detect loops in the whole dataset.
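
Connection, diameter, and non-simple path (loop) queries are graph traversals. The sketch below expresses them with breadth-first and depth-first search over an in-memory adjacency map in plain Python rather than SQL. The triples are hypothetical (the 'influenced' edge exists only to create a loop), and treating edges as undirected for connection and diameter queries is an assumption.

```python
from collections import deque

# Hypothetical sample graph as (subject, property, object) triples.
# The "influenced" edge is an assumption added only to create a loop.
TRIPLES = [
    ("r1", "last", "Picasso"),
    ("r1", "paints", "r2"),
    ("r2", "title", "Guernica"),
    ("r4", "last", "Rodin"),
    ("r4", "sculpts", "r5"),
    ("r5", "influenced", "r4"),
]


def adjacency(triples, directed=True):
    """Map each node to the set of its neighbors."""
    adj = {}
    for s, _, o in triples:
        adj.setdefault(s, set()).add(o)
        adj.setdefault(o, set())
        if not directed:
            adj[o].add(s)
    return adj


def within(adj, start, k):
    """Diameter query: nodes reachable from start in at most k edges (BFS)."""
    depth, queue = {start: 0}, deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == k:
            continue
        for nxt in adj.get(node, ()):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return set(depth) - {start}


def connected(adj, a, b):
    """Connection query: is there any path from a to b?"""
    return b in within(adj, a, len(adj))


def has_cycle(adj, start=None):
    """Loop query: is there a directed cycle (reachable from start, if given)?"""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = dict.fromkeys(adj, WHITE)

    def dfs(node):
        color[node] = GRAY
        for nxt in adj[node]:
            if color[nxt] == GRAY or (color[nxt] == WHITE and dfs(nxt)):
                return True
        color[node] = BLACK
        return False

    starts = [start] if start is not None else list(adj)
    return any(color.get(n) == WHITE and dfs(n) for n in starts)


undirected = adjacency(TRIPLES, directed=False)
directed = adjacency(TRIPLES)
print(within(undirected, "r4", 1))                     # {'Rodin', 'r5'} (set order may vary)
print(connected(undirected, "Picasso", "Guernica"))    # True
print(has_cycle(directed), has_cycle(directed, "r1"))  # True False
```

Note that `has_cycle` with a start node reports any loop reachable from that node, which is one possible reading of "loops starting at" a node.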

12 Query Types. Aggregate queries: find all nodes with 2 or more properties; list all subjects that have two instances of a single property. Relationship queries: find any relationship between r1 and r4.
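
Aggregate queries map naturally onto `GROUP BY ... HAVING`. This sketch assumes a single hypothetical `triples(subject, property, object)` table, i.e. the triple-store layout listed under related work, with sample data; "2 or more properties" is interpreted here as two or more distinct property names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical single-table triple store holding sample data.
conn.execute("CREATE TABLE triples (subject TEXT, property TEXT, object TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("r1", "last", "Picasso"),
    ("r1", "paints", "r2"),
    ("r1", "paints", "r3"),
    ("r2", "title", "Guernica"),
    ("r4", "first", "August"),
    ("r4", "last", "Rodin"),
])

# Nodes with 2 or more distinct properties.
multi_property = [row[0] for row in conn.execute("""
    SELECT subject FROM triples
    GROUP BY subject
    HAVING COUNT(DISTINCT property) >= 2
    ORDER BY subject
""")]

# Subjects that have at least two instances of a single property.
repeated = conn.execute("""
    SELECT subject, property, COUNT(*) FROM triples
    GROUP BY subject, property
    HAVING COUNT(*) >= 2
    ORDER BY subject, property
""").fetchall()

print(multi_property)  # ['r1', 'r4']
print(repeated)        # [('r1', 'paints', 2)]
```

Under vertical partitioning, the first count would have to combine the per-property tables (for example with `UNION ALL`) first, because no single table lists every property of a subject.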

13 Assumptions: We use a small dataset, on the assumption that the number of joins and the efficiency of the queries will not change significantly with larger datasets. The RDF schema is not stored explicitly in the vertically-partitioned scheme. INSERT, UPDATE, and DELETE are insignificant compared to SELECT. Key nodes in the path-based model are well-defined; in practice, key nodes would be generated dynamically after user load analysis.

14 Experimental Process. Validation parameters: nodes, edges, number of joins, number of tables, CPU cost, and storage bytes. We set up both schemes in Oracle 10g for the RDF graph shown earlier, materialized path lengths in the path-based scheme, generated query plans, and analyzed the queries based on the validation parameters. Cycle queries: joins are not supported.
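
The query-plan step can be approximated outside Oracle. This sketch uses SQLite's `EXPLAIN QUERY PLAN` as a stand-in for Oracle 10g's plan output, on empty, hypothetical vertically-partitioned tables. The exact plan text varies between SQLite versions, and counting `SCAN`/`SEARCH` steps is only an assumed, crude proxy for the "number of tables" parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical vertically-partitioned tables (empty: only the plan matters here).
for prop in ("last", "paints", "title"):
    conn.execute(f'CREATE TABLE "{prop}" (subject TEXT, object TEXT)')

QUERY = """
    SELECT "title".object
    FROM "last"
    JOIN "paints" ON "paints".subject = "last".subject
    JOIN "title"  ON "title".subject  = "paints".object
    WHERE "last".object = 'Picasso'
"""

# The last column of every EXPLAIN QUERY PLAN row is a readable description.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
for step in plan:
    print(step)

# Crude "number of tables" proxy: plan steps that read a table or index.
table_reads = sum(1 for step in plan if step.startswith(("SCAN", "SEARCH")))
print(table_reads)
```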

15 Conclusions & Observations: Vertical partitioning performs well for short path lengths and terminal node constraints, and offers storage benefits for instance queries without path expressions. The enhanced path-based model performs well for schema queries, path queries, and cycle queries. Queries that the original path-based model could not address but the enhanced model can answer: connection queries, diameter queries, and path queries with intermediate node constraints.

16 Conclusion (Cont'd): Both schemes show the same performance on instance queries without path expressions. Neither scheme addresses relationship queries. Cycle queries gave an interesting result: specifying the start node gives worse performance than leaving it unspecified, and the query that specifies the start node uses an Oracle Filter.

17 Future Work: test large and diverse datasets; test vertical partitioning with a column-oriented database such as MonetDB; develop pruning strategies for cycle queries; impose join indexes; find approaches to answer relationship queries; classify storage based on the application domain.

18 Thank You. Questions? Please see http://www.cs.umn.edu/~cmueller/cs8715 for a copy of the report that accompanies this presentation, including a full bibliography.

