Presentation is loading. Please wait.

Presentation is loading. Please wait.

Hexastore: Sextuple Indexing for Semantic Web Data Management

Similar presentations


Presentation on theme: "Hexastore: Sextuple Indexing for Semantic Web Data Management"— Presentation transcript:

1 Hexastore: Sextuple Indexing for Semantic Web Data Management
Presented by Cathrin Weiss, Panagiotis Karras, Abraham Bernstein Department of Informatics, University of Zurich Session: Indexing and Query Processing, VLDB 2008 Summarized by Jaeseok Myung Intelligent Database Systems Lab School of Computer Science & Engineering Seoul National University, Seoul, Korea

2 Overview Hexastore – Sextuple Indexing In this presentation,
A Triple (S, P, O) can be represented in six ways (3! = 6) SPO, SOP, PSO, POS, OSP, OPS Every possible indexing scheme can be materialized Allows quick and scalable query processing Up to five times bigger index space is needed In this presentation, Review conventional RDF storage structures Introduction to Hexastore Discussion Center for E-Business Technology

3 Physical Designs for RDF Storage (1/4)
Giant Triples Table SELECT ?title WHERE { ?book <title> ?title. ?book <author> <Fox, Joe>. ?book <copyright> <2001> } Join! Join! Entire Table Scan! Redundancy! Center for E-Business Technology

4 Physical Designs for RDF Storage (2/4)
Clustered Property Table Contains clusters of properties that tend to be defined together Center for E-Business Technology

5 Physical Designs for RDF Storage (3/4)
Property-Class Table Exploits the type property of subjects to cluster similar sets of subjects together in the same table Unlike clustered property table, a property may exist in multiple property-class tables Values of the type property Center for E-Business Technology

6 Physical Designs for RDF Storage (4/4)
Vertically Partitioned Table The giant table is rewritten into n two column tables where n is the number of unique properties in the data We don’t have to Maintain null values Have a certain clustering algorithm property subject object Center for E-Business Technology

7 Motivation The problem of having non-property-bound queries
Center for E-Business Technology

8 Hexastore: Sextuple Indexing
Center for E-Business Technology

9 Hexastore: Sextuple Indexing
Center for E-Business Technology

10 Five-fold Increase in Index Space
Sharing The Same Terminal Lists SPO-PSO, SOP-OSP, POS-OPS The key of each of the three resources in a triple appears in two headers and two vectors, but only in one list Center for E-Business Technology

11 Mapping Dictionary Replacing all literals by unique IDs using a mapping dictionary Mapping dictionary compresses the triple store Reduced redundancy, Saving a lot of physical space We can concentrate on a logical index structure rather than the physical storage design S P O object214 hasColor blue belongsTo object352 S P O 1 2 3 4 ID Value object214 1 hasColor Center for E-Business Technology

12 Clustered B+-Tree (RDF-3X, VLDB 2008)
Store everything in a clustered B+-Tree Triples are sorted in lexicographical order Allowing the conversion of SPARQL patterns into range scan We don’t have to do entire table scan S P O 1 2 3 4 Actually, we don’t need this table! <Mapping Dictionary> ID Value object214 1 hasColor 002 000 001 002 003 Center for E-Business Technology

13 Argumentation Concise and Efficient Handling of Multi-valued Resources
Index can contain multiple items cf. Multi-valued Property Table Avoidance of NULLs Only those RDF elements that are relevant to a particular other element need to be stored in a particular index No ad-hoc Choices Needed Most other RDF data storage schemes require several ad-hoc decisions about their data representation architecture ex. Clustered Property Table (which properties to be stored together) Center for E-Business Technology

14 Argumentation Reduced I/O cost
Other RDF storage schemes may need to access multiple tables which are irrelevant to a query Queries that are not bounded by property All First-step Pairwise Joins are Fast Merge-Joins The key of resources in all vectors and lists used in a Hexastore are sorted Reduction of Unions and Joins ex. a list of subjects related to two particular objects through any property Hexastore can use osp index Center for E-Business Technology

15 Treating the Path Expression Problem
Select B.subj FROM triples AS A, triples AS B WHERE A.prop = wasBorn AND A.obj = ‘1860’ AND A.subj = B.obj AND B.prop = ‘Author’ A path expression requires (n-1) subject-object self-joins where n is the length of the path Vertical Partitioning Materialized Path Expressions (A.author:wasBorn = ‘1860’) n-1C2 = O(n2) possible additional properties Hexastore (n-1) merge-join using pso and pos indices Center for E-Business Technology

16 Experimental Evaluation
Setup 2.8GHz dual core, 16GB RAM Competitors Column-oriented Vertical Partitioning Approaches COVP1 – PSO Index COVP2 – PSO Index + POS Index (second copy) Hexastore SPO, SOP, PSO, POS, OSP, OPS Datasets Barton, MIT library data, 61 mil. triples, 258 properties LUBM, A synthetic benchmark data set(10 univ.), 6.8 mil. triples, 18 predicates Center for E-Business Technology

17 Performance (Barton Data)
Center for E-Business Technology

18 Performance (LUBM, 10) Center for E-Business Technology

19 Memory Usage In practice, Hexastore requires a four-fold increase in memory in comparison to COVP1, which is an affordable cost for the derived advantages Center for E-Business Technology

20 Conclusion Hexastore: Sextuple-Indexing Scheme My Question
Worst-case five-fold storage increase in comparison to a conventional triples table Quick and scalable general-purpose query processing All pairwise joins in a Hexastore can be rendered as merge joins My Question Main-memory Indexing (Is it possible?) 7GB RAM for 6 mil. triples Other Options? Center for E-Business Technology


Download ppt "Hexastore: Sextuple Indexing for Semantic Web Data Management"

Similar presentations


Ads by Google