Scalable Semantic Web Data Management Using Vertical Partitioning. Daniel J. Abadi, Adam Marcus, Samuel R. Madden, Kate Hollenbach (MIT). Daniel Hurwitz (Technion)


1 Scalable Semantic Web Data Management Using Vertical Partitioning
Daniel J. Abadi, Adam Marcus, Samuel R. Madden, Kate Hollenbach @ MIT
Daniel Hurwitz
Technion: Israel Institute of Technology, Computer Science Dept.
Seminar in Databases – 18.1.2010

2 It’s All Semantics, Anyway!
“All our work, our whole life is a matter of semantics, … Everything depends on our understanding of them.” – Felix Frankfurter
The Semantic Web
  ▫ Tim Berners-Lee: “I have a dream…”
  ▫ Sharing the wealth
  ▫ W3C Technicalities
  ▫ Implementation
  ▫ Access

3 Resource Description Framework (RDF)
You’d make a great model
Representing information
  ▫ Semantics
  ▫ Graph of resource relations
  ▫ Statements about resources
       Subject
       Property
       Object
No storage requirements
Remind me how we’re related?

4 RDF Triples
Semantic breakdown
  ▫ “Rick Hull wrote Foundations of Databases.”
Representation
  ▫ Graph
  ▫ Statement
  ▫ XML format
[Graph: Foundations of Databases --hasAuthor--> Rick Hull]
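
As a minimal sketch of the statement on this slide, the triple can be held as a three-field record and viewed as a graph edge. The `Triple` type, the `as_graph` helper, and the use of plain strings instead of URIs are assumptions made for illustration; they are not part of the presentation.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject --property--> object."""
    subj: str
    prop: str
    obj: str


# "Rick Hull wrote Foundations of Databases." as a single statement.
statement = Triple("Foundations of Databases", "hasAuthor", "Rick Hull")


def as_graph(triples):
    """View triples as a graph: subject -> list of (property, object) edges."""
    graph = {}
    for t in triples:
        graph.setdefault(t.subj, []).append((t.prop, t.obj))
    return graph


graph = as_graph([statement])
```

Nothing here fixes how triples are stored, which matches the slide's point that RDF itself imposes no storage requirements.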

5 Triples Storage
Relational database
  ▫ 3-column schema
Performance issues
  ▫ Waiting is the rust of the soul
  ▫ One massive triples table
  ▫ Queries require many self-joins
    SELECT C.obj
    FROM TRIPLES AS A, TRIPLES AS B, TRIPLES AS C
    WHERE A.subj = B.subj AND B.subj = C.subj
      AND A.prop = 'copyright' AND A.obj = '2001'
      AND B.prop = 'author' AND B.obj = 'Fox, Joe'
      AND C.prop = 'title'
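
The query on this slide can be tried against a toy triples table. The sketch below uses Python's built-in `sqlite3` module; the six sample rows and the subject names `book1` and `book2` are made up for illustration and do not come from the benchmark data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subj TEXT, prop TEXT, obj TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        # Hypothetical sample data: two books, three properties each.
        ("book1", "title", "XYZ"),
        ("book1", "author", "Fox, Joe"),
        ("book1", "copyright", "2001"),
        ("book2", "title", "ABC"),
        ("book2", "author", "Orr, Tim"),
        ("book2", "copyright", "1985"),
    ],
)

# Each property that the query touches needs its own alias of the same
# table, so three properties mean a three-way self-join.
SELF_JOIN_QUERY = """
SELECT C.obj
FROM triples AS A, triples AS B, triples AS C
WHERE A.subj = B.subj AND B.subj = C.subj
  AND A.prop = 'copyright' AND A.obj = '2001'
  AND B.prop = 'author'    AND B.obj = 'Fox, Joe'
  AND C.prop = 'title'
"""
titles = [row[0] for row in conn.execute(SELF_JOIN_QUERY)]
```

Every extra property in the filter adds another alias of the same large table, which is the cost the slide refers to.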

6 Getting Down to Business
I. Current State of the Art
     RDF in RDBMSs
II. A Simple Alternative
     Vertically partitioned approach
III. Benchmarks
     Querying the candidates
IV. Evaluation
     Storage requirements and implementation
V. Results

7 Current State of the Art
Majority use RDBMSs
Multi-layered architecture
Querying: SPARQL converted to SQL
[Diagram labels: SPARQL query, RDF layer, SQL query, RDBMS, Result set, RDF in XML/Graph]
SPARQL query:
    SELECT ?title
    FROM table
    WHERE { ?book author "Fox, Joe"
            ?book copyright "2001"
            ?book title ?title }
Translated SQL query:
    SELECT C.obj
    FROM TRIPLES AS A, TRIPLES AS B, TRIPLES AS C
    WHERE A.subj = B.subj AND B.subj = C.subj
      AND A.prop = 'copyright' AND A.obj = '2001'
      AND B.prop = 'author' AND B.obj = 'Fox, Joe'
      AND C.prop = 'title'
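
The translation step performed by the RDF layer can be sketched for the simple case shown on this slide. The function below is a hypothetical illustration, not the translator of any real RDF store: it takes triple patterns already split into Python tuples (no SPARQL parsing), treats terms starting with `?` as variables, and emits one alias of `triples` per pattern.

```python
def bgp_to_sql(select_var, patterns):
    """Translate triple patterns into one SQL self-join over `triples`.

    Terms starting with '?' are variables; everything else is a constant.
    """
    first_seen = {}   # variable -> column reference where it first appeared
    conditions = []
    aliases = []
    for i, pattern in enumerate(patterns):
        alias = f"t{i}"
        aliases.append(f"triples AS {alias}")
        for column, term in zip(("subj", "prop", "obj"), pattern):
            ref = f"{alias}.{column}"
            if term.startswith("?"):
                if term in first_seen:
                    # A repeated variable becomes a join condition.
                    conditions.append(f"{ref} = {first_seen[term]}")
                else:
                    first_seen[term] = ref
            else:
                escaped = term.replace("'", "''")
                conditions.append(f"{ref} = '{escaped}'")
    return (f"SELECT {first_seen[select_var]} FROM {', '.join(aliases)} "
            f"WHERE {' AND '.join(conditions)}")


sql = bgp_to_sql("?title", [
    ("?book", "author", "Fox, Joe"),
    ("?book", "copyright", "2001"),
    ("?book", "title", "?title"),
])
```

The shared variable `?book` is what turns into the subject-subject join conditions, so the number of self-joins grows with the number of patterns.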

8 Property Table Technique
Goal: speed up queries over triple-stores
Idea: cluster triples containing properties defined over similar subjects
  ▫ Example: “title”, “author”, “copyright”
       Books, journals, CDs, etc.
Reduces number of self-joins

9 Clustered Property Table

10 Property-Class Table

11 Property Tables: Issues
NULLs
Multi-valued attributes
Proliferation of unions and joins
[Graph: Foundations of Databases --hasAuthor--> Rick Hull; Foundations of Databases --hasAuthor--> John Green]
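
The first two issues on this slide show up even in a toy property table. The sketch below uses `sqlite3`; the table name `book_props`, its columns, and the three sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per subject, one column per clustered property.
conn.execute(
    "CREATE TABLE book_props ("
    "subj TEXT PRIMARY KEY, title TEXT, author TEXT, copyright TEXT)"
)
conn.executemany(
    "INSERT INTO book_props VALUES (?, ?, ?, ?)",
    [
        # Hypothetical sample data.
        ("book1", "XYZ", "Fox, Joe", "2001"),
        ("book2", "ABC", "Orr, Tim", None),   # no copyright statement: NULL
        ("cd1", "MNO", None, "1985"),         # no author statement: NULL
    ],
)

# The three-property query needs no self-join at all...
titles = [r[0] for r in conn.execute(
    "SELECT title FROM book_props "
    "WHERE author = 'Fox, Joe' AND copyright = '2001'")]

# ...but subjects that lack a property leave NULL cells behind.
null_cells = conn.execute(
    "SELECT SUM(author IS NULL) + SUM(copyright IS NULL) FROM book_props"
).fetchone()[0]

# A second author for book1 does not fit: one row per subject,
# one cell per property.
try:
    conn.execute(
        "INSERT INTO book_props VALUES ('book1', 'XYZ', 'Green, John', '2001')")
    second_author_fits = True
except sqlite3.IntegrityError:
    second_author_fits = False
```

Working around the multi-valued case means moving `author` into a separate table, which brings back the joins the property table was meant to avoid.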

12 Property Tables Summary
The Good
  ▫ Reduce subject-subject self-joins
The Bad
  ▫ Sluggish on cross-table joins
  ▫ How do we cluster property tables?

13 Getting Down to Business
I. Current State of the Art
     RDF in RDBMSs
II. A Simple Alternative
     Vertically partitioned approach
III. Benchmarks
     Querying the candidates
IV. Evaluation
     Storage requirements and implementation
V. Results

14 Vertically Partitioned Approach
Goal: speed up queries over triple-stores
Idea: one table per property
  ▫ Column 1: Subjects
  ▫ Column 2: Objects

15 Vertically Partitioned Approach
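
The "one table per property" idea can be sketched in a few lines. The function name `vertically_partition` and the sample triples are made up for illustration; sorting each table by subject is an assumption here, chosen because the later slides rely on merge joins.

```python
from collections import defaultdict


def vertically_partition(triples):
    """Split (subj, prop, obj) triples into one (subj, obj) table per property."""
    tables = defaultdict(list)
    for subj, prop, obj in triples:
        tables[prop].append((subj, obj))
    # Assumption: keep each table sorted by subject so joins can be merge joins.
    return {prop: sorted(rows) for prop, rows in tables.items()}


# Hypothetical sample data.
triples = [
    ("book1", "title", "XYZ"),
    ("book1", "author", "Fox, Joe"),
    ("book1", "author", "Green, John"),   # multi-valued: just another row
    ("book2", "title", "ABC"),
    ("cd1", "title", "MNO"),              # no author: no row, hence no NULL
]
tables = vertically_partition(triples)
```

A subject with two authors simply contributes two rows to the `author` table, and a subject without an author contributes none.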

16 Vertically Partitioned Approach: Advantages
Support for multi-valued attributes
Support for heterogeneous records

17 Vertically Partitioned Approach: Advantages
Access requested properties only
No need for clustering algorithms
Less is more: fewer and faster joins

18 Vertically Partitioned Approach: Disadvantages
More joins than property tables
  ▫ Multi-property queries – merge joins
Slower insertions into tables
  ▫ Multiple-table access for same-subject statements
  ▫ Solution: batch insertions
Standard DBMSs not optimal for this approach
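
A multi-property query over this schema joins two per-property tables on subject. Assuming both tables are kept sorted by subject, that join can be a single linear pass, as in the sketch below. The function `merge_join` and the sample tables are illustrative assumptions, not code from the systems being compared.

```python
def merge_join(left, right):
    """Join two (subj, obj) tables, both sorted by subj, on equal subjects.

    Returns (subj, left_obj, right_obj) rows.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        ls, rs = left[i][0], right[j][0]
        if ls < rs:
            i += 1
        elif ls > rs:
            j += 1
        else:
            # Find the run of equal subjects on each side and pair them up,
            # which handles multi-valued properties.
            i_end = i
            while i_end < len(left) and left[i_end][0] == ls:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == ls:
                j_end += 1
            for _, lobj in left[i:i_end]:
                for _, robj in right[j:j_end]:
                    out.append((ls, lobj, robj))
            i, j = i_end, j_end
    return out


# Hypothetical per-property tables, each sorted by subject.
author = [("book1", "Fox, Joe"), ("book1", "Green, John"), ("book2", "Orr, Tim")]
copyright_ = [("book1", "2001"), ("book3", "1985")]
joined = merge_join(author, copyright_)
```

Neither input has to be re-sorted or hashed, which is why these joins are cheap even though there are more of them than in the property-table design.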

19 DB Orientation: Column vs Row
Row-Oriented DBMS
  File: ID1, “XYZ” | ID2, “ABC” | ID3, “MNO” | ID4, “DEF” | ID5, “GHI” …
Column-Oriented DBMS
  File: ID1, ID2, ID3, ID4, ID5 … | “XYZ”, “ABC”, “MNO”, “DEF”, “GHI” …
[Diagram labels: DBMS, Memory, File]
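
The two layouts on this slide can be imitated with flat Python lists standing in for the file. This is only a sketch of the ordering of values; real systems store pages and binary encodings, and the helper name `scan_ids_row_store` is made up.

```python
rows = [("ID1", "XYZ"), ("ID2", "ABC"), ("ID3", "MNO"),
        ("ID4", "DEF"), ("ID5", "GHI")]

# Row-oriented: the values of one tuple are adjacent in the file.
row_layout = [value for row in rows for value in row]

# Column-oriented: the values of one column are adjacent in the file.
column_layout = [list(column) for column in zip(*rows)]


def scan_ids_row_store(layout, width=2):
    """Reading one column from a row store still steps across every tuple."""
    return layout[0::width]


ids_from_rows = scan_ids_row_store(row_layout)
ids_from_columns = column_layout[0]   # one contiguous read
```

Both scans return the same IDs; the difference is that the column store reads them from one contiguous run and never touches the other column.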

20 Jargon for the Noggin’
Tuple
Tuple metadata
  ▫ Timestamp
  ▫ Number of attributes
  ▫ NULL flags

21 Column-Oriented DBMS
+ Only relevant columns are retrieved
- Slower insertions
Advantages for Vertical Partitioning:
  ▫ Separate tuple metadata
  ▫ Fixed-length tuples
  ▫ Column-oriented data compression
  ▫ Optimized merge code
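
To make "column-oriented data compression" concrete, here is a sketch of run-length encoding applied to a sorted subject column. Run-length encoding is used only as one common column-compression scheme; the slides do not say which scheme the column store applies, and the function names and sample column are assumptions.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse runs of equal adjacent values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical subject column of a per-property table, sorted by subject.
subjects = ["book1", "book1", "book1", "book2", "cd1", "cd1"]
encoded = rle_encode(subjects)
```

A sorted column of a two-column table is a good fit for this kind of scheme because multi-valued subjects appear as adjacent repeats.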

22 Materialized Path Expressions
Problem: for a path of length n properties
  ▫ n-1 subject-object joins required
E.g. Find books whose authors were born in 1860
[Path: Books --hasAuthor--> Authors --wasBorn--> “1860”]

23 Materialized Path Expressions
Goal: eliminate joins across multiple tables
How: Combine property paths into a single table
Without materialization:
    SELECT B.subj
    FROM triples AS A, triples AS B
    WHERE A.prop = 'wasBorn' AND A.obj = '1860'
      AND A.subj = B.obj AND B.prop = 'hasAuthor'
With the materialized path:
    SELECT A.subj
    FROM predtable AS A
    WHERE A."hasAuthor:wasBorn" = '1860'
[Path: Books --hasAuthor--> Authors --wasBorn--> “1860”; materialized as Books --hasAuthor:wasBorn--> “1860”]

24 Materialized Path Expressions: Breakdown
Precalculate path expression
  ▫ No join at query time
  ▫ Easy implementation in vertically partitioned schema
       Simply add table “hasAuthor:wasBorn”
       Property Table Technique: Add column “hasAuthor:wasBorn”
Added cost: recalculating after insertions
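
Precalculating a path expression amounts to running the subject-object join once and storing the result as its own two-column table. The sketch below does this for the path hasAuthor:wasBorn; the function `materialize_path` and the sample rows are assumptions made for illustration.

```python
def materialize_path(first, second):
    """Precompute the two-step path first:second.

    `first` and `second` are (subj, obj) tables. The result links each
    subject of `first` to the objects reached through `second`.
    """
    by_subject = {}
    for subj, obj in second:
        by_subject.setdefault(subj, []).append(obj)
    return sorted(
        (subj, end)
        for subj, mid in first
        for end in by_subject.get(mid, [])
    )


# Hypothetical per-property tables.
has_author = [("book1", "author1"), ("book2", "author2"), ("book3", "author1")]
was_born = [("author1", "1860"), ("author2", "1902")]

# Done once, ahead of query time.
has_author_was_born = materialize_path(has_author, was_born)

# At query time the path is a plain filter on one table, with no join.
books_1860 = [subj for subj, year in has_author_was_born if year == "1860"]
```

The stored table has to be recomputed or patched whenever `has_author` or `was_born` changes, which is the added insertion cost the slide mentions.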

25 Getting Down to Business
I. Current State of the Art
     RDF in RDBMSs
II. A Simple Alternative
     Vertically partitioned approach
III. Benchmarks
     Querying the candidates
IV. Evaluation
     Storage requirements and implementation
V. Results

26 Benchmark: Dataset
Barton Libraries
  ▫ 50 million triples
       77% multi-valued
  ▫ 221 unique properties
       37% multi-valued
  ▫ Good representation of Semantic Web data
RDF/XML converted into triples

27 Benchmark: Longwell
GUI for exploring RDF data
User applies filters to property panels
Longwell-style queries provide realistic benchmark for testing

28 Benchmark: Longwell GUI

29 Benchmark: Longwell queries
7 queries were chosen
Each query represents typical browsing session
  ▫ Exercises on query diversity

30 Getting Down to Business
I. Current State of the Art
     RDF in RDBMSs
II. A Simple Alternative
     Vertically partitioned approach
III. Benchmarks
     Querying the candidates
IV. Evaluation
     Storage requirements and implementation
V. Results

31 Evaluation: Schema Implementations
Performance comparison of all 3 schemas
1. Triple Store
2. Property Table Store
3. Vertically Partitioned Store
     A. Row-oriented (Postgres)
     B. Column-oriented (C-Store)

32 Evaluation: Size Matters
Storage used per implementation
1. Triple Store: 8.3 GB
2. Property Table Store: 14 GB
3. Vertically Partitioned Store (Postgres): 5.2 GB
4. Vertically Partitioned Store (C-Store): 2.7 GB

33 Getting Down to Business
I. Current State of the Art
     RDF in RDBMSs
II. A Simple Alternative
     Vertically partitioned approach
III. Benchmarks
     Querying the candidates
IV. Evaluation
     Storage requirements and implementation
V. Results

34 Results

35 Scalability
How does performance scale with size of data?
Increased number of triples from 1 million to 50 million.

36 Results: Scalability
Vertical partitioning schemes scale linearly
Triple-store scales super-linearly
  ▫ Prevalent sorting operations

37 Results: Materialized Path Expressions

38 Results: Further Widening

39 Summary
Semantic Web users require fast responses to queries
Current triple-stores just don’t cut it
  ▫ Can’t stand up to sluggish self-joins
Property tables are good, but have their limitations
Vertical partitioning takes the cake
  ▫ Competes with optimal performance of property table solution
  ▫ Step toward an interactive-time Semantic Web

40 Thank you! Questions?

