Daniel J. Abadi · Adam Marcus · Samuel R. Madden ·Kate Hollenbach Presenter: Vishnu Prathish Date: Oct 1 st 2013 CS 848 – Information Integration on the.

1 Daniel J. Abadi · Adam Marcus · Samuel R. Madden · Kate Hollenbach
  Presenter: Vishnu Prathish
  Date: Oct 1st, 2013
  CS 848 – Information Integration on the Web with RDF, OWL and SPARQL
  SW-Store: a vertically partitioned DBMS for Semantic Web data management

2 Overview
  1. The Problem and the Solution
     Motivation
     Current state of the art: RDF in RDBMS and property tables
     Vertically partitioned approach
     Column-oriented DBMS for vertical partitioning
  2. Benchmarks, Comparisons and Results
  3. SW-Store – Design
     System architecture
     Storage system
     Query engine and query translation
     The rest of it
     Conclusion

3 Motivation
  Goal: an efficient storage mechanism for RDF triples
  Query: find the authors of books whose title contains the word “Transaction”
  The easy way: a three-column schema with subject, property and object as labels

4 Motivation
  Goal: an efficient storage mechanism for RDF triples
  Query: find the authors of books whose title contains the word “Transaction”
  Result: a “5-way self-join”
  The easy way: a three-column schema with subject, property and object as labels
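To make the self-join problem concrete, here is a minimal sketch of the three-column triples schema using Python's built-in `sqlite3` module. The table name `triples`, the property names (`type`, `title`, `author`) and the sample rows are illustrative assumptions, not taken from the slides. This simplified form of the query needs only three aliases of the same table; the slide's 5-way figure presumably refers to a query with more triple patterns.

```python
import sqlite3

def build_demo():
    """Create an in-memory triples table with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (subject TEXT, property TEXT, object TEXT)")
    conn.executemany(
        "INSERT INTO triples VALUES (?, ?, ?)",
        [
            ("ID1", "type", "book"),
            ("ID1", "title", "Transaction Processing Basics"),
            ("ID1", "author", "Author A"),
            ("ID2", "type", "book"),
            ("ID2", "title", "Query Optimization Notes"),
            ("ID2", "author", "Author B"),
            ("ID3", "type", "CD"),
            ("ID3", "title", "Transaction Blues"),
        ],
    )
    return conn

def authors_of_books_with_title(conn, word):
    """Every triple pattern in the query costs one more self-join on `triples`."""
    sql = """
        SELECT a.object
        FROM triples AS t
        JOIN triples AS ti ON ti.subject = t.subject
        JOIN triples AS a  ON a.subject  = t.subject
        WHERE t.property  = 'type'   AND t.object = 'book'
          AND ti.property = 'title'  AND ti.object LIKE ?
          AND a.property  = 'author'
        ORDER BY a.object
    """
    return [row[0] for row in conn.execute(sql, (f"%{word}%",))]

if __name__ == "__main__":
    print(authors_of_books_with_title(build_demo(), "Transaction"))
```

Each additional condition on a book (language, copyright year, and so on) adds another alias of `triples` to the join, which is how queries over this schema grow into many-way self-joins.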

5 Property table approach
  Basic idea: create tables with properties as column labels
  Two approaches:
  1. Clustered property table: cluster properties that tend to be defined together
  2. Property-class table: cluster based on the type property of subjects

6 Two sides of the coin
  Advantages:
  Significantly reduces subject-subject self-joins on the triples table
  Opens up the possibility of attribute typing
  Disadvantages:
  Many queries still need joins because they access data from multiple tables
  Unstructured data: subjects won’t have all properties defined
  Multivalued attributes
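The two disadvantages named above (undefined properties and multivalued attributes) can be seen in a small sketch of a clustered property table. The function name `build_property_table` and the "leftover triples" list are assumptions made for illustration; the slides do not specify how a real system handles values that do not fit the wide table.

```python
def build_property_table(triples, clustered_props):
    """Build one wide row per subject for the chosen properties.

    Returns (table, leftover):
      table    -- subject -> {property: value or None}
      leftover -- triples that do not fit the wide table
    """
    table = {}
    leftover = []
    for subject, prop, obj in triples:
        if prop not in clustered_props:
            # Property was not chosen for this table.
            leftover.append((subject, prop, obj))
            continue
        # Every row has every clustered column; missing ones stay None (NULL).
        row = table.setdefault(subject, dict.fromkeys(clustered_props))
        if row[prop] is None:
            row[prop] = obj
        else:
            # A single cell cannot hold the second value of a multivalued attribute.
            leftover.append((subject, prop, obj))
    return table, leftover

if __name__ == "__main__":
    demo = [
        ("b1", "title", "T1"),
        ("b1", "author", "A1"),
        ("b1", "author", "A2"),
        ("b2", "title", "T2"),
        ("b2", "language", "en"),
    ]
    print(build_property_table(demo, ("title", "author")))
```

In this toy data, `b2` gets a `None` author cell and the second author of `b1` is pushed out of the table, so a query on authors has to consult both places.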

7 A simpler alternative: vertical partitioning
  Basic idea: one two-column (subject, object) table for each property
  Advantages:
  Effective handling of multivalued attributes
  Elimination of null values for heterogeneous records
  Only the property tables required by a query need to be read
  No clustering algorithms
  Fewer unions
  But of course:
  The number of joins required just exploded!
  Slower inserts
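The basic idea can be sketched in a few lines: group the triples by property and keep a two-column (subject, object) list per property, sorted by subject. The function name `vertically_partition` and the sample triples are hypothetical; this is an in-memory illustration, not the storage layout of any particular system.

```python
from collections import defaultdict

def vertically_partition(triples):
    """Split (subject, property, object) triples into one two-column table per property."""
    partitions = defaultdict(list)
    for subject, prop, obj in triples:
        partitions[prop].append((subject, obj))
    # Sorting each table by subject is what later allows merge joins between tables.
    return {prop: sorted(rows) for prop, rows in partitions.items()}

if __name__ == "__main__":
    demo = [
        ("b2", "title", "T2"),
        ("b1", "title", "T1"),
        ("b1", "author", "A2"),
        ("b1", "author", "A1"),
        ("b2", "language", "en"),
    ]
    for prop, rows in vertically_partition(demo).items():
        print(prop, rows)
```

A multivalued attribute is simply two rows in the same table, and a subject that lacks a property has no row at all, so no NULLs are stored.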

8 Extending a column-oriented DBMS
  Basic idea: store data as collections of columns rather than collections of rows
  No bandwidth is wasted, because projections happen before data is pulled into main memory
  Record headers are stored in separate columns, which reduces tuple width and lets us choose a different compression technique for each column
  Source: smithal – spatial databases CSCI 8715
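As a rough illustration of per-column storage and compression, the sketch below splits a two-column partition into separate subject and object columns and run-length encodes the sorted subject column. Run-length encoding is an assumed choice made for this example; the slide only says that each column can use its own compression technique. All function names are hypothetical.

```python
from itertools import groupby

def rle_encode(column):
    """Compress a column into (value, run_length) pairs; effective when the column is sorted."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]

def to_columns(rows):
    """Store a (subject, object) table as two separate columns.

    The subject column is run-length encoded; the object column is left as-is.
    """
    subjects = [subject for subject, _ in rows]
    objects = [obj for _, obj in rows]
    return rle_encode(subjects), objects

if __name__ == "__main__":
    rows = [("b1", "A1"), ("b1", "A2"), ("b2", "A3")]
    print(to_columns(rows))
```

A query that only needs the object column can read that list alone, which is the projection-before-memory benefit the slide describes.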

9 Benchmark and Evaluation
  Barton Libraries dataset, provided by the Simile Project at MIT
  A benchmark set of 7 queries of varying type
  Storage schemes compared:
  Triple store
  Property tables
  Vertically partitioned – row-oriented
  Vertically partitioned – column-oriented

10 Results
  Property tables and vertical partitioning outperform the triple store by a factor of 2-3
  C-Store adds another factor of 10 in performance
  For property tables, careful selection of column names is required
  Vertical partitioning represents the best-case and worst-case scenario
  Linear scaling for all tested queries

11 Hybrid storage representation
  Single columned
  Column-oriented sparse compression schemes
  SW-Store – a standalone vertically partitioned database/storage layer

12 Data representation

13 Query engine and query translation
  Each column is scanned to produce tuples that satisfy all three predicates
  The tupleize operator becomes a merge join over two-column vertical partitions
  Query translator converts
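The merge join mentioned above can be sketched as follows, assuming each vertical partition is a list of (subject, object) pairs sorted by subject. This is a generic sort-merge join written for illustration, not SW-Store's operator; the function name `merge_join` and the sample data are hypothetical.

```python
def merge_join(left, right):
    """Join two (subject, object) lists that are both sorted by subject.

    Returns (subject, left_object, right_object) tuples for matching subjects.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        ls, rs = left[i][0], right[j][0]
        if ls < rs:
            i += 1
        elif ls > rs:
            j += 1
        else:
            # Find the run of equal subjects on each side (multivalued attributes).
            i_end = i
            while i_end < len(left) and left[i_end][0] == ls:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == ls:
                j_end += 1
            # Emit every combination within the two runs.
            for _, left_obj in left[i:i_end]:
                for _, right_obj in right[j:j_end]:
                    out.append((ls, left_obj, right_obj))
            i, j = i_end, j_end
    return out

if __name__ == "__main__":
    title = [("b1", "T1"), ("b2", "T2"), ("b3", "T3")]
    author = [("b1", "A1"), ("b1", "A2"), ("b3", "A3")]
    print(merge_join(title, author))
```

Because both inputs are already ordered by subject, the join is a single forward pass over each table with no hashing or re-sorting.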

14 Overflow table to perform updates
  A mechanism to support inserts in batches
  An additional table in the standard triples schema
  Not indexed or read-optimized
  Properties that appear only a small number of times in the overflow table are not merged, due to the cost of merging
  Horizontal “chunks” improve the efficiency of merging
  Disadvantages:
  Queries must go to both the overflow table and the vertical partitions
  A merge must still be performed, which is expensive
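The overflow mechanism can be approximated by the toy class below: inserts land in an unsorted triples list, a batch merge moves only sufficiently frequent properties into the sorted partitions, and lookups consult both places. The class name `OverflowStore`, the `merge_threshold` parameter and its default value are assumptions for illustration; horizontal chunking is not modeled.

```python
from collections import Counter

class OverflowStore:
    """Toy model of vertical partitions plus an unindexed overflow table."""

    def __init__(self, merge_threshold=2):
        self.partitions = {}   # property -> sorted list of (subject, object)
        self.overflow = []     # unsorted (subject, property, object) triples
        self.merge_threshold = merge_threshold

    def insert(self, subject, prop, obj):
        # Inserts are cheap: append to the write-friendly overflow table.
        self.overflow.append((subject, prop, obj))

    def merge(self):
        """Move properties that occur often enough into the partitions."""
        counts = Counter(prop for _, prop, _ in self.overflow)
        remaining = []
        touched = set()
        for subject, prop, obj in self.overflow:
            if counts[prop] >= self.merge_threshold:
                self.partitions.setdefault(prop, []).append((subject, obj))
                touched.add(prop)
            else:
                # Rare properties stay behind because merging them is not worth the cost.
                remaining.append((subject, prop, obj))
        for prop in touched:
            self.partitions[prop].sort()
        self.overflow = remaining

    def lookup(self, prop):
        """Read a property from both the partitions and the overflow table."""
        merged = self.partitions.get(prop, [])
        pending = [(s, o) for s, p, o in self.overflow if p == prop]
        return sorted(merged + pending)

if __name__ == "__main__":
    store = OverflowStore()
    store.insert("b2", "author", "A2")
    store.insert("b1", "author", "A1")
    store.insert("b1", "rare", "X")
    store.merge()
    print(store.partitions, store.overflow)
```

The `lookup` method shows the disadvantage listed above: every read has to scan the overflow table in addition to the partition.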

15 Discussion
  Multivalued attributes cannot be implemented.
  Overflow table: significant overhead?
  “Overflow tables might turn out to be useful while adding very rare predicates”: how?
  “Queries that do not restrict on property values are very rare for RDF applications”: is that so?
  Are there potential scalability issues when the number of properties is high?
  Queries with the unrestricted-property problem were removed from the validation dataset. What would the impact be?
  What if queries are not restricted to a limited number of properties? Are real-world queries like this?

16 Thank you!

