Store RDF Triples In A Scalable Way 2012-03-23 Liu Long & Liu Chunqiu.

Store RDF Triples In A Scalable Way 2012-03-23 Liu Long & Liu Chunqiu

Scalable RDF Store Based on HBase and MapReduce 2012-03-23 Liu Chunqiu

OUTLINE RDF Schema Example Conventional RDF Storage Structures Briefly Introduce HBase and MapReduce Scalable RDF Store Based on HBase and MapReduce Conclusion

RDF Schema Example In the example above, the resource "horse" is a subclass of the class "animal". ex:horse rdfs:subClassOf ex:animal

Physical Designs for RDF Storage (1/4) Giant Triples Table SELECT ?title WHERE { ?book ?title. ?book. ?book }  Entire Table Scan!  Redundancy!

Physical Designs for RDF Storage (2/4) Clustered Property Table Contains clusters of properties that tend to be defined together

Physical Designs for RDF Storage (3/4) Property-Class Table Exploits the type property of subjects to cluster similar sets of subjects together in the same table Unlike clustered property table, a property may exist in multiple property-class tables Values of the type property

Physical Designs for RDF Storage (4/4) Vertically Partitioned Table The giant table is rewritten into n two column tables where n is the number of unique properties in the data We don’t have to Maintain null values Have a certain clustering algorithm subject property object

HBase A data row = a sortable rowkey + an arbitrary number of columns Columns are grouped into column families A data cell can hold multiple versions which are distinguished by timestamp HBase is like a multi-dimensional sorted map : (Row Key, Family : Column,Timestamp )-----> Value Row Key: Key1 { Column Family A { Column X: T1 Value1 T2 Value2 Column Y: T3 Value3 } Column Family B { Column Z: T4 Value4 1.Data stored in the same column family are stored together on file system,while might be distributed 2.Hbase provides a B+ tree-like index on row key by default 3.Secondary Indexes on columns can also be created with extra efforts

MapReduce

Scalable RDF Store Based on HBase and MapReduce Present a RDF storage schema on HBase. The storage schema considers characteristics of both RDF data model and HBase structure Present a MapReduce algorithm for SPARQL Basic Graph Pattern Processing Evaluation

The Storage Schema on HBase build up six tables to store RDF triples : S_PO,P_SO, O_SP, PS_O, SO_P and PO_S Data are stored in row keys and column names These six tables have different row key and column name combinations Row Key: Subject URI I { Column Family: VALUE { Column: (predicate1, object1) Column: (predicate2, object2) Column: (predicate3, object3) } S_PO data row Column value is null !

The Storage Schema on HBase P1: (s, p, o) P2: (?s, p, o) P3: (s,? p, o) P4: (s, p,? o) P5: (?s,? p, o) P6: (s, ?p,? o) P7: (?s, p,? o) P8: (?s,? p,? o) S_PO P_SO O_SP SP_O SO_P PO_S P1 and P8 can be handled by any table

The Storage Schema on HBase Advantages Support for multi-valued properties Support for parallel processing Reduced I/O cost Suitable for HBase Disadvantages Requires more storage space The update operation will be more complicated

Query Processing SELECT ?x ?y ?z WHERE { ?x rdf:type ub:GraduateStudent. ?y rdf:type ub:University. ?z rdf:type ub:Department. ?x ub:memberOf ?z. ?z ub:subOrganizationOf ?y. ?x ub:undergraduateDegreeFrom ?y. } LUBM Q2 ( x, y, z ) x y (x, y)(x, y, z) (y, z) (x, z) z Join on x Join on y Join on (x, y) Join on z A MapReduce job

Process of MapReduce Assign unique ids to all triple patterns Retrieve results for all triple patterns and persist results to file system Calculate expressions for triple patterns and final result Create and submit the MapReduce job Mappers receive input (k, v) pairs Reducer retrieve (k, v) pairs Input of mapper : (1, Department0) (2, (Student0, Department0)) (3, (Department0, University0)) Output Of mapper: (Department0, (1, )) (Department0, (2, Student0)) (Department0, (3, University0)) (a) Output of reducer: (Student0,University0, Department0) (b) ? x ub: memberOf ?z (x,z) ?z rdf:type ub:Department. ?x ub:memberOf ?z. ?z ub:subOrganizationOf ?y.

Process of MapReduce Notice: If the query doesn't need to join, then there will not be any MapReduce job LUBM Query6 SELECT ?X WHERE { ?X rdf:type ub:Student } NO need to join!!!! The result of the only triple pattern will be directly used as final output

Evaluation

UniversitiesQ1(sec)Q2(sec)Q6(sec)Q7(sec)Q9(sec) 2017.043.38.650.247.8 4019.155.411.267.756.9 6023.366.017.179.070.0 8029.377.227.398.893.3 10038.2100.435.1131.5136.4 Q6 is the most efficient query

Evaluation UniversitiesQ1(sec)Q2(sec)Q6(sec)Q7(sec)Q9(sec) 2017.043.38.650.247.8 4019.155.411.267.756.9 6023.366.017.179.070.0 8029.377.227.398.893.3 10038.2100.435.1131.5136.4 Q6 is the most efficient query Q1 is the fastest among rest queries

Evaluation UniversitiesQ1(sec)Q2(sec)Q6(sec)Q7(sec)Q9(sec) 2017.043.38.650.247.8 4019.155.411.267.756.9 6023.366.017.179.070.0 8029.377.227.398.893.3 10038.2100.435.1131.5136.4 Q6 is the most efficient query Q1 is the fastest among rest queries Performance result against LUBM(20, 0) is not quite efficient Compare results of LUBM(20,0)and LUBM(100,0),MapReduce shows great scalability when data size grows

Conclude and Think This approach works against large RDF dataset Waste a lot of storage space MapReduce algorithm need to be more efficient Enrich storage schema Use data compression Reduce execution steps during query processing Integrate with other MapReduce algorithm SO

Store RDF Triples In A Scalable Way 2012-03-23 Liu Long & Liu Chunqiu.

Similar presentations

Presentation on theme: "Store RDF Triples In A Scalable Way 2012-03-23 Liu Long & Liu Chunqiu."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Store RDF Triples In A Scalable Way 2012-03-23 Liu Long & Liu Chunqiu.

Similar presentations

Presentation on theme: "Store RDF Triples In A Scalable Way 2012-03-23 Liu Long & Liu Chunqiu."— Presentation transcript:

Similar presentations

About project

Feedback