
Scalable Semantic Web Data Management Using Vertical Partitioning 1
Daniel J. Abadi, Adam Marcus, Samuel R. Madden, Kate Hollenbach @ MIT
Daniel Hurwitz, Technion: Israel Institute of Technology, Computer Science Dept.
Seminar in Databases – 18.1.2010

It’s All Semantics, Anyway! 2
“All our work, our whole life is a matter of semantics, … Everything depends on our understanding of them.” – Felix Frankfurter
The Semantic Web
▫ Tim Berners-Lee: “I have a dream…”
▫ Sharing the wealth
▫ W3C Technicalities
▫ Implementation
▫ Access

Resource Description Framework (RDF) 3
You’d make a great model
Representing information
▫ Semantics
▫ Graph of resource relations
▫ Statements about resources
  ▪ Subject
  ▪ Property
  ▪ Object
No storage requirements
Remind me how we’re related?

RDF Triples 4
Semantic breakdown
▫ “Rick Hull wrote Foundations of Databases.”
Representation
▫ Graph
▫ Statement
▫ XML format
[Figure: graph with nodes Foundations of Databases and Rick Hull, joined by an edge labeled hasAuthor]

Triples Storage 5
Relational database
▫ 3-column schema
Performance issues
▫ Waiting is the rust of the soul
▫ One massive triples table
▫ Queries require many self-joins
SELECT C.obj
FROM TRIPLES AS A, TRIPLES AS B, TRIPLES AS C
WHERE A.subj = B.subj AND B.subj = C.subj
  AND A.prop = 'copyright' AND A.obj = '2001'
  AND B.prop = 'author' AND B.obj = 'Fox, Joe'
  AND C.prop = 'title'
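The self-join pattern on this slide can be tried out with a minimal sketch using Python's built-in sqlite3 module. The sample rows and the subject IDs (book1, book2) are made up for illustration; only the table shape and the query come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subj TEXT, prop TEXT, obj TEXT)")

# Hypothetical sample data: the slide shows the query, not the rows.
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("book1", "title", "XYZ"),
        ("book1", "author", "Fox, Joe"),
        ("book1", "copyright", "2001"),
        ("book2", "title", "ABC"),
        ("book2", "author", "Orr, Tim"),
        ("book2", "copyright", "1985"),
    ],
)

# One alias of the triples table per property in the query: three
# properties means a three-way self-join on the subject column.
query = """
SELECT C.obj
FROM triples AS A, triples AS B, triples AS C
WHERE A.subj = B.subj AND B.subj = C.subj
  AND A.prop = 'copyright' AND A.obj = '2001'
  AND B.prop = 'author'    AND B.obj = 'Fox, Joe'
  AND C.prop = 'title'
"""
titles = [row[0] for row in conn.execute(query)]
print(titles)
```

Every additional property in the query adds another alias of the same large table, which is where the self-join cost named on the slide comes from.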

Getting Down to Business 6
I. Current State of the Art
   RDF in RDBMSs
II. A Simple Alternative
   Vertically partitioned approach
III. Benchmarks
   Querying the candidates
IV. Evaluation
   Storage requirements and implementation
V. Results

Current State of the Art 7
Majority use RDBMSs
Multi-layered architecture
Querying: SPARQL converted to SQL
[Figure: RDF layer on top of the RDBMS; arrows labeled SPARQL query, SQL query, Result Set, RDF in XML/Graph]
SELECT ?title
FROM table
WHERE { ?book author "Fox, Joe"
        ?book copyright "2001"
        ?book title ?title }
SELECT C.obj
FROM TRIPLES AS A, TRIPLES AS B, TRIPLES AS C
WHERE A.subj = B.subj AND B.subj = C.subj
  AND A.prop = 'copyright' AND A.obj = '2001'
  AND B.prop = 'author' AND B.obj = 'Fox, Joe'
  AND C.prop = 'title'
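To make the SPARQL-to-SQL step concrete, here is a toy translator, written as a sketch only. It is not how any real RDF layer works: the function name patterns_to_sql, the tuple encoding of triple patterns, and the sample rows are all assumptions made for this example.

```python
import sqlite3

def patterns_to_sql(patterns, select_var):
    """Turn (subject, property, object) patterns into one self-join query.

    Terms starting with '?' are variables; anything else is a constant.
    Returns the SQL text and the list of bound parameters.
    """
    aliases, where, params = [], [], []
    first_seen = {}  # variable -> first column reference it was bound to
    for n, (subj, prop, obj) in enumerate(patterns):
        alias = f"t{n}"
        aliases.append(f"triples AS {alias}")
        where.append(f"{alias}.prop = ?")
        params.append(prop)
        for col, term in (("subj", subj), ("obj", obj)):
            ref = f"{alias}.{col}"
            if not term.startswith("?"):
                where.append(f"{ref} = ?")           # constant: filter
                params.append(term)
            elif term in first_seen:
                where.append(f"{ref} = {first_seen[term]}")  # repeated variable: join
            else:
                first_seen[term] = ref
    sql = (f"SELECT {first_seen[select_var]} FROM {', '.join(aliases)} "
           f"WHERE {' AND '.join(where)}")
    return sql, params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subj TEXT, prop TEXT, obj TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("book1", "title", "XYZ"), ("book1", "author", "Fox, Joe"),
    ("book1", "copyright", "2001"), ("book2", "title", "ABC"),
    ("book2", "author", "Fox, Joe"), ("book2", "copyright", "1985"),
])

sql, params = patterns_to_sql(
    [("?book", "author", "Fox, Joe"),
     ("?book", "copyright", "2001"),
     ("?book", "title", "?title")],
    "?title",
)
result = conn.execute(sql, params).fetchall()
print(sql)
print(result)
```

Each triple pattern becomes one alias of the triples table, and each repeated variable becomes a join condition, which reproduces the shape of the SQL on the slide.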

Property Table Technique 8
Goal: speed up queries over triple-stores
Idea: cluster triples containing properties defined over similar subjects
▫ Example: “title”, “author”, “copyright”
  ▪ Books, journals, CDs, etc.
Reduces number of self-joins
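A property table for the slide's example properties can be sketched by pivoting the triples into one wide row per subject. This is an illustration under assumptions: the table name property_table, the subject IDs and the rows are invented, and the pivot keeps only one value per property, so it presumes single-valued properties.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subj TEXT, prop TEXT, obj TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("book1", "title", "XYZ"),
    ("book1", "author", "Fox, Joe"),
    ("book1", "copyright", "2001"),
    ("cd1", "title", "ABC"),
    ("cd1", "copyright", "1985"),  # no author for cd1: becomes NULL below
])

# Pivot the clustered properties into columns, one row per subject.
# MAX ignores NULLs, so each column keeps the single value present.
conn.execute("""
CREATE TABLE property_table AS
SELECT subj,
       MAX(CASE WHEN prop = 'title'     THEN obj END) AS title,
       MAX(CASE WHEN prop = 'author'    THEN obj END) AS author,
       MAX(CASE WHEN prop = 'copyright' THEN obj END) AS copyright
FROM triples
GROUP BY subj
""")

wide_rows = conn.execute(
    "SELECT subj, title, author, copyright FROM property_table ORDER BY subj"
).fetchall()

# The three-property lookup now needs no self-join at all.
joe_2001 = conn.execute(
    "SELECT title FROM property_table "
    "WHERE author = 'Fox, Joe' AND copyright = '2001'"
).fetchall()
print(wide_rows)
print(joe_2001)
```

The NULL in the cd1 row and the one-value-per-column limit are small instances of the trade-offs that come with wide tables.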

Clustered Property Table 9

Property-Class Table 10

Property Tables: Issues 11
NULLs
Multi-valued attributes
Proliferation of unions and joins
[Figure: graph with nodes Foundations of Databases, Rick Hull and John Green, and two edges labeled hasAuthor]

Property Tables Summary 12
The Good
▫ Reduce subject-subject self-joins
The Bad
▫ Sluggish on cross-table joins
▫ How do we cluster property tables?

Vertically Partitioned Approach 14
Goal: speed up queries over triple-stores
Idea: one table per property
▫ Column 1: Subjects
▫ Column 2: Objects
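The one-table-per-property idea can be sketched with sqlite3 as follows. The sample triples are invented, and building table names from property names with an f-string is acceptable here only because the names are hard-coded in the example.

```python
import sqlite3
from collections import defaultdict

# Hypothetical triples; book1 has two authors (a multi-valued property).
triples = [
    ("book1", "title", "XYZ"),
    ("book1", "author", "Fox, Joe"),
    ("book1", "author", "Green, John"),
    ("book1", "copyright", "2001"),
    ("book2", "title", "ABC"),
    ("book2", "copyright", "1985"),
]

# Group the (subject, object) pairs by property.
by_prop = defaultdict(list)
for subj, prop, obj in triples:
    by_prop[prop].append((subj, obj))

conn = sqlite3.connect(":memory:")
for prop, pairs in by_prop.items():
    # One two-column table per property, loaded in subject order.
    conn.execute(f'CREATE TABLE "{prop}" (subj TEXT, obj TEXT)')
    conn.executemany(f'INSERT INTO "{prop}" VALUES (?, ?)', sorted(pairs))

tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

# The query touches only the three property tables it needs.
titles = conn.execute("""
SELECT t.obj
FROM "title" AS t
JOIN "author" AS a ON a.subj = t.subj
JOIN "copyright" AS c ON c.subj = t.subj
WHERE a.obj = 'Fox, Joe' AND c.obj = '2001'
""").fetchall()
print(tables)
print(titles)
```

The second author of book1 is simply a second row in the author table, and book2 having no author leaves no NULL anywhere.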

Vertically Partitioned Approach 15

Vertically Partitioned Approach: Advantages 16
Support for multi-valued attributes
Support for heterogeneous records

Vertically Partitioned Approach: Advantages 17
Access requested properties only
No need for clustering algorithms
Less is more: fewer and faster joins

Vertically Partitioned Approach: Disadvantages 18
More joins than property tables
▫ Multi-property queries – merge joins
Slower insertions into tables
▫ Multiple-table access for same-subject statements
▫ Solution: batch insertions
Standard DBMSs not optimal for this approach
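The merge joins mentioned on this slide work well when each property table is kept sorted by subject. A minimal sketch of such a join over two sorted (subject, object) lists, with invented data and a hypothetical function name merge_join:

```python
def merge_join(left, right):
    """Join two lists of (subj, obj) pairs on subj.

    Both inputs must already be sorted by subj. Each list is scanned
    once, front to back, without any index lookups.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        ls, rs = left[i][0], right[j][0]
        if ls < rs:
            i += 1
        elif ls > rs:
            j += 1
        else:
            # Find the run of rows with this subject on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == ls:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == ls:
                j_end += 1
            # Emit every pairing within the two runs.
            for _, left_obj in left[i:i_end]:
                for _, right_obj in right[j:j_end]:
                    out.append((ls, left_obj, right_obj))
            i, j = i_end, j_end
    return out

# Hypothetical property tables, each sorted by subject.
author = [("book1", "Fox, Joe"), ("book1", "Green, John"), ("book2", "Orr, Tim")]
title = [("book1", "XYZ"), ("book3", "MNO")]

joined = merge_join(author, title)
print(joined)
```

Handling runs of equal subjects on both sides is what lets the join cope with multi-valued properties such as the two authors of book1.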

DB Orientation: Column vs Row 19
Row-Oriented DBMS
ID1, “XYZ”  ID2, “ABC”  ID3, “MNO”  ID4, “DEF”  ID5, “GHI” …
Column-Oriented DBMS
ID1, ID2, ID3, ID4, ID5  “XYZ”, “ABC”, “MNO”, “DEF”, “GHI” …
[Figure: for each layout, boxes labeled DBMS, Memory and File]
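The two file layouts on this slide can be mimicked with plain Python lists. This is only a sketch of the ordering of values, using the slide's five tuples; it says nothing about how a real DBMS lays out pages or blocks.

```python
# The five (id, value) tuples shown on the slide.
tuples = [("ID1", "XYZ"), ("ID2", "ABC"), ("ID3", "MNO"),
          ("ID4", "DEF"), ("ID5", "GHI")]

# Row-oriented file: all attributes of one tuple sit next to each other.
row_file = [field for tup in tuples for field in tup]

# Column-oriented file: all values of one attribute sit next to each other.
id_column = [tup[0] for tup in tuples]
value_column = [tup[1] for tup in tuples]
column_file = id_column + value_column

# Reading only the value attribute: the row layout must step over every
# ID, while the column layout reads one contiguous slice.
values_from_rows = row_file[1::2]
values_from_columns = column_file[len(tuples):]
print(row_file)
print(column_file)
```

Both reads return the same values; the difference is how much unrelated data lies between them in the file.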

Jargon for the Noggin’ 20
Tuple
Tuple metadata
▫ Timestamp
▫ Number of attributes
▫ NULL flags

Column-Oriented DBMS 21
+ Only relevant columns are retrieved
- Slower insertions
Advantages for Vertical Partitioning:
▫ Separate tuple metadata
▫ Fixed-length tuples
▫ Column-oriented data compression
▫ Optimized merge code

Materialized Path Expressions 22
Problem: for a path of length n properties
▫ n-1 subject-object joins required
E.g. Find books whose authors were born in 1860
[Figure: path Books -> hasAuthor -> Authors -> wasBorn -> “1860”]

Materialized Path Expressions 23
Goal: eliminate joins across multiple tables
How: Combine property paths into a single table
Without the materialized path:
SELECT B.subj
FROM triples AS A, triples AS B
WHERE A.prop = 'wasBorn' AND A.obj = '1860'
  AND A.subj = B.obj AND B.prop = 'Author'
With the materialized path:
SELECT A.subj
FROM predtable AS A
WHERE A."author:wasBorn" = '1860'
[Figure: path Books -> hasAuthor -> Authors -> wasBorn -> “1860”, with a direct edge hasAuthor:wasBorn from Books to “1860”]

Materialized Path Expressions: Breakdown 24
Precalculate path expression
▫ No join at query time
▫ Easy implementation in vertically partitioned schema
  ▪ Simply add table “hasAuthor:wasBorn”
  ▪ Property Table Technique: Add column “hasAuthor:wasBorn”
Added cost: recalculating after insertions
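In a vertically partitioned schema, adding the “hasAuthor:wasBorn” table amounts to storing the result of one subject-object join. A minimal sqlite3 sketch, with invented book and author IDs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "hasAuthor" (subj TEXT, obj TEXT)')
conn.execute('CREATE TABLE "wasBorn" (subj TEXT, obj TEXT)')

# Hypothetical data: two books, two authors.
conn.executemany('INSERT INTO "hasAuthor" VALUES (?, ?)',
                 [("book1", "author1"), ("book2", "author2")])
conn.executemany('INSERT INTO "wasBorn" VALUES (?, ?)',
                 [("author1", "1860"), ("author2", "1912")])

# Without materialization: one subject-object join at query time.
joined_at_query_time = conn.execute("""
SELECT h.subj
FROM "hasAuthor" AS h JOIN "wasBorn" AS w ON h.obj = w.subj
WHERE w.obj = '1860'
""").fetchall()

# Materialize the path once, as an ordinary two-column property table.
conn.execute("""
CREATE TABLE "hasAuthor:wasBorn" AS
SELECT h.subj AS subj, w.obj AS obj
FROM "hasAuthor" AS h JOIN "wasBorn" AS w ON h.obj = w.subj
""")

# With materialization: a single-table lookup, no join.
books_1860 = conn.execute(
    'SELECT subj FROM "hasAuthor:wasBorn" WHERE obj = ?', ("1860",)
).fetchall()
print(joined_at_query_time, books_1860)
```

The materialized table is a snapshot: rows inserted into hasAuthor or wasBorn afterwards are not reflected until it is rebuilt, which is the added cost the slide mentions.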

Benchmark: Dataset 26
Barton Libraries
▫ 50 million triples
  ▪ 77% multi-valued
▫ 221 unique properties
  ▪ 37% multi-valued
▫ Good representation of Semantic Web data
RDF/XML converted into triples

Benchmark: Longwell 27
GUI for exploring RDF data
User applies filters to property panels
Longwell-style queries provide realistic benchmark for testing

Benchmark: Longwell GUI 28

Benchmark: Longwell queries 29
7 queries were chosen
Each query represents typical browsing session
▫ Exercises on query diversity

Evaluation: Schema Implementations 31
Performance comparison of all 3 schemas
1. Triple Store
2. Property Table Store
3. Vertically Partitioned Store
   A. Row-oriented (Postgres)
   B. Column-oriented (C-Store)

Evaluation: Size Matters 32
Memory usage per implementation
1. Triple Store - 8.3 GBytes
2. Property Table store - 14 GBytes
3. Vertically Partitioned Store (Postgres) - 5.2 GBytes
4. Vertically Partitioned Store (C-Store) - 2.7 GBytes

Results 34

Scalability 35
How does performance scale with size of data?
Increased number of triples from 1 million to 50 million.

Results: Scalability 36
Vertical partitioning schemes scale linearly
Triple-store scales super-linearly
▫ Prevalent sorting operations

Results: Materialized Path Expressions 37

Results: Further Widening 38

Summary 39
Semantic Web users require fast responses to queries
Current triple-stores just don’t cut it
▫ Can’t stand up to sluggish self-joins
Property tables are good, but have their limitations
Vertical partitioning takes the cake
▫ Competes with optimal performance of property table solution
▫ Step toward an interactive-time Semantic Web

Thank you! Questions? 40
