SW-STORE: A VERTICALLY PARTITIONED DBMS FOR SEMANTIC WEB DATA MANAGEMENT Surabhi Mithal Nipun Garg Daniel J. Abadi, Adam Marcus, Samuel R. Madden,


1 SW-STORE: A VERTICALLY PARTITIONED DBMS FOR SEMANTIC WEB DATA MANAGEMENT. Presented by Surabhi Mithal (4282643) and Nipun Garg (4282567), Group 4. Paper by Daniel J. Abadi, Adam Marcus, Samuel R. Madden, and Kate Hollenbach, The VLDB Journal, 2009. http://www-users.cs.umn.edu/~smithal/

2 OUTLINE Introduction to Semantic Web; Motivation; Problem Statement; Challenges; Major Contributions; Related Work; Key Concepts; Assumptions; Validation Methodology; Results; Improvements

3 INTRODUCTION TO SEMANTIC WEB: AN EXAMPLE Source: http://www.w3.org/People/Ivan/CorePresentations/SWTutorial/. A simplified bookstore dataset (dataset “A”).

4 EXAMPLE (CONT.): GRAPH REPRESENTATION [Figure: RDF graph of dataset “A”. The resource http://…isbn/000651409X has a:title (The Glass Palace), a:year (2000), a:author (a:name Ghosh, Amitav; a:homepage http://www.amitavghosh.com) and a:publisher (a:p_name Harper Collins; a:city London).]

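5 ANOTHER BOOKSTORE DATASET (DATASET “F”)
Spreadsheet with columns A to D (French headers with English glosses in parentheses; $A11$ and $A12$ refer to cells A11 and A12):
Row 1: ID | Titre (title) | Traducteur (translator) | Original
Row 2: ISBN 2020286682 | Le Palais des Miroirs | $A12$ | ISBN 0-00-6511409-X
Row 6: ID | Auteur (author)
Row 7: ISBN 0-00-6511409-X | $A11$
Row 10: Nom (name)
Row 11: Ghosh, Amitav
Row 12: Besse, Christianne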

6 EXAMPLE (CONT.): GRAPH REPRESENTATION [Figure: RDF graph of dataset “F”. The resource http://…isbn/2020386682 has f:titre (Le palais des miroirs), f:original (http://…isbn/000651409X), f:auteur (f:nom Ghosh, Amitav) and f:traducteur (f:nom Besse, Christianne).]

7 DATA INTEGRATION ACROSS THE TWO DATASETS: SEMANTIC WEB [Figure: the dataset “F” graph and the dataset “A” graph drawn side by side; each contains a node for http://…isbn/000651409X.]

8 DATA INTEGRATION ACROSS THE TWO DATASETS: SEMANTIC WEB [Figure: the same two graphs, with the two http://…isbn/000651409X nodes marked as the SAME URI.]

9 DATA INTEGRATION ACROSS THE TWO DATASETS: SEMANTIC WEB [Figure: the two graphs merged on the shared URI http://…isbn/000651409X, so the f: properties of the French edition and the a: properties of the original (title, year, author, publisher) meet at one node.] A user of dataset “F” can now ask queries such as “give me the title of the original”.
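The merge on slides 7 to 9 can be sketched in a few lines of Python, assuming a toy representation of triples as string tuples. The identifiers isbn:000651409X and isbn:2020386682 are hypothetical shorthand for the elided http://…isbn/… URIs, and the blank nodes _:a1 and _:f1 are likewise invented and kept distinct, as RDF requires when graphs are merged. This illustrates the idea only; it is not how SW-Store or any RDF library implements it.

```python
# Hypothetical toy triples: identifiers stand in for the elided slide URIs.
Triple = tuple[str, str, str]

dataset_a: set[Triple] = {
    ("isbn:000651409X", "a:title", "The Glass Palace"),
    ("isbn:000651409X", "a:year", "2000"),
    ("isbn:000651409X", "a:author", "_:a1"),
    ("_:a1", "a:name", "Ghosh, Amitav"),
}

dataset_f: set[Triple] = {
    ("isbn:2020386682", "f:titre", "Le palais des miroirs"),
    ("isbn:2020386682", "f:original", "isbn:000651409X"),
    ("isbn:2020386682", "f:auteur", "_:f1"),
    ("_:f1", "f:nom", "Ghosh, Amitav"),
}


def objects(triples: set[Triple], subject: str, prop: str) -> list[str]:
    """All objects o such that (subject, prop, o) is in the graph."""
    return [o for s, p, o in triples if s == subject and p == prop]


def title_of_original(triples: set[Triple], book: str) -> list[str]:
    """Follow f:original from `book`, then read a:title of that resource."""
    titles = []
    for original in objects(triples, book, "f:original"):
        titles.extend(objects(triples, original, "a:title"))
    return titles


# Merging is a set union: the shared URI connects the two graphs.
merged = dataset_a | dataset_f

print(title_of_original(merged, "isbn:2020386682"))     # ['The Glass Palace']
print(title_of_original(dataset_f, "isbn:2020386682"))  # []
```

Because both datasets name the original edition with the same URI, a plain union is enough to connect them; querying dataset “F” alone returns nothing, since the English title lives only in dataset “A”.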

10 MOTIVATION Goal: integration and sharing of data across different applications and organizations. The Semantic Web's logical data model is the “Resource Description Framework” (RDF). Because of the nature of the data (large collections of fine-grained triples with no fixed schema), the Semantic Web faces scalability and performance problems. Current data management solutions for RDF scale poorly.

11 PROBLEM STATEMENT Input: RDF data in the form of (subject, property, object) triples, e.g. <The Glass Palace, hasAuthor, Amitav Ghosh>. Output: an efficient storage system for RDF data. Objective: improve query performance for complex, real-world queries.

12 CHALLENGES Example query: find all authors of books whose title contains the word “Transaction”. On a single triples table, this requires a 5-way self-join!
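To see where the self-joins come from, here is a minimal sketch using Python's built-in sqlite3 module, assuming a single triples table with a hypothetical vocabulary (type, title, author, name) and made-up books. Every triple pattern in the query adds another copy of the same table; this reduced version already needs four copies, and the full query on the slide needs more.

```python
import sqlite3

# Hypothetical miniature triple store: a single (subj, prop, obj) table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subj TEXT, prop TEXT, obj TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("book1", "type", "Book"),
        ("book1", "title", "Transaction Systems 101"),
        ("book1", "author", "person1"),
        ("person1", "name", "Author A"),
        ("book2", "type", "Book"),
        ("book2", "title", "The Glass Palace"),
        ("book2", "author", "person2"),
        ("person2", "name", "Ghosh, Amitav"),
    ],
)

# One copy of `triples` per triple pattern, joined on shared variables.
query = """
SELECT n.obj
FROM triples AS t                          -- ?book type Book
JOIN triples AS ti ON ti.subj = t.subj     -- ?book title ?title
JOIN triples AS a  ON a.subj  = t.subj     -- ?book author ?person
JOIN triples AS n  ON n.subj  = a.obj      -- ?person name ?name
WHERE t.prop = 'type'    AND t.obj = 'Book'
  AND ti.prop = 'title'  AND ti.obj LIKE '%Transaction%'
  AND a.prop = 'author'
  AND n.prop = 'name'
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Author A',)]
```

Each join here scans the same large table with a different property filter, which is the cost that the vertically partitioned design targets.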

13 MAJOR CONTRIBUTIONS AND NOVELTY Introduces vertical partitioning of RDF data, stored in a column-oriented database, to improve performance and simplicity. Evaluates the performance of the new and existing techniques on a real-world dataset. Proposes SW-Store, a new column-oriented database based on this approach.

14 RELATED WORK: PROPERTY TABLES (HP LABORATORIES, JENA) Two variants: property-clustered tables and property-class tables. Approach 1: a data-clustering approach that groups properties which tend to be defined together into one table. Approach 2: creates tables based on the subject's type. Limitations: accuracy of the clustering algorithms; NULLs in the data; multivalued attributes.

15 SAMPLE DATABASE Source: SW-Store: a vertically partitioned DBMS for Semantic Web data management. Annotation on the figure: too many NULLs.
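The NULL and multivalued-attribute problems of property tables can be illustrated with a toy Python sketch that builds one wide row per subject from hypothetical triples. It is not Jena's implementation; it only counts the empty cells and collects the values that cannot fit in a single-valued column.

```python
# Hypothetical triples: book1 has two authors, other books lack properties.
triples = [
    ("book1", "title", "Book One"),
    ("book1", "author", "Author A"),
    ("book1", "author", "Author B"),  # second value of a multivalued property
    ("book2", "title", "Book Two"),
    ("book2", "isbn", "0000000000"),
    ("book3", "title", "Book Three"),
    ("book3", "year", "2000"),
]

# Property table: one row per subject, one column per property.
columns = sorted({p for _, p, _ in triples})
subjects = sorted({s for s, _, _ in triples})
table = {s: {c: None for c in columns} for s in subjects}

does_not_fit = []  # values whose cell is already occupied
for s, p, o in triples:
    if table[s][p] is None:
        table[s][p] = o
    else:
        does_not_fit.append((s, p, o))

null_cells = sum(v is None for row in table.values() for v in row.values())
total_cells = len(subjects) * len(columns)
print(columns)                                    # ['author', 'isbn', 'title', 'year']
print(f"{null_cells} of {total_cells} cells are NULL")  # 6 of 12 cells are NULL
print("values that do not fit:", does_not_fit)
```

Even in this tiny example half the cells are NULL, and the second author needs a separate structure, which is why property-table schemes add extra tables for multivalued attributes.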

16 KEY CONCEPTS: VERTICAL PARTITIONING AND COLUMN-ORIENTED STORE The data is vertically partitioned, and the partitions are stored in a column-oriented database. Vertical partitioning: one two-column (subject, object) table per property. Advantages: multivalued attributes are handled naturally (one row per value); no NULLs are stored; fewer unions are needed. Column-oriented storage. Advantages: no bandwidth is wasted, because projections are applied before data is pulled into main memory; tuple headers are stored separately rather than with every row, which reduces tuple width; and a different compression technique can be chosen for each column.
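A minimal Python sketch of the vertical-partitioning layout, assuming hypothetical triples: each property gets its own two-column (subject, object) table sorted by subject, so two properties can be combined with a linear merge join. Multivalued properties simply become extra rows and no NULLs are stored. The sketch models the table layout only; it does not model column-oriented storage or compression.

```python
from collections import defaultdict

Triple = tuple[str, str, str]


def vertically_partition(triples: list[Triple]) -> dict[str, list[tuple[str, str]]]:
    """One two-column (subject, object) table per property, sorted by subject."""
    tables: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return {p: sorted(rows) for p, rows in tables.items()}


def merge_join(left, right):
    """Merge join two subject-sorted tables on subject.

    Returns (subject, left_object, right_object) for every matching pair,
    so multivalued properties yield one output row per combination.
    """
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        ls, rs = left[i][0], right[j][0]
        if ls < rs:
            i += 1
        elif ls > rs:
            j += 1
        else:
            # Find the run of rows with this subject on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == ls:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == rs:
                j_end += 1
            for _, lo in left[i:i_end]:
                for _, ro in right[j:j_end]:
                    out.append((ls, lo, ro))
            i, j = i_end, j_end
    return out


# Hypothetical triples: book1 has two authors, other books lack some properties.
triples = [
    ("book1", "title", "Book One"),
    ("book1", "author", "Author A"),
    ("book1", "author", "Author B"),
    ("book2", "title", "Book Two"),
    ("book2", "isbn", "0000000000"),
    ("book3", "title", "Book Three"),
    ("book3", "year", "2000"),
]

tables = vertically_partition(triples)
print(sorted(tables))  # ['author', 'isbn', 'title', 'year']
print(merge_join(tables["title"], tables["author"]))
# [('book1', 'Book One', 'Author A'), ('book1', 'Book One', 'Author B')]
```

Keeping every table sorted by subject is what makes the join a single linear pass instead of a scan of one large triples table per pattern.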

17 KEY CONCEPTS: SW-STORE SW-Store is a column-oriented DBMS optimized for storing RDF. Features: single-column tables for subjects; representation of sparse data; overflow tables.

18 ASSUMPTIONS Postgres is assumed to be the best available row-oriented RDBMS because it handles NULLs effectively. Queries that do not restrict on property values are assumed to be very rare in RDF applications. The RDF store receives only a moderate number of inserts and updates. Critique of the limited insert/update assumption: if the overflow tables fill up rapidly, the batch operation that updates the column-oriented store will run more often, degrading overall performance.
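The insert path behind this critique can be sketched as follows, assuming a hypothetical PropertyColumnStore class (not SW-Store's actual code): new rows land in an unsorted overflow buffer, lookups must check both the sorted table and the buffer, and a full buffer triggers a batch merge into the sorted table. Counting merges shows the concern directly: with a fixed threshold, more inserts mean proportionally more batch merges.

```python
import bisect


class PropertyColumnStore:
    """Hypothetical sketch of one property table with an overflow buffer."""

    def __init__(self, threshold: int):
        self.threshold = threshold  # overflow size that triggers a batch merge
        self.sorted_rows: list[tuple[str, str]] = []  # sorted by subject
        self.overflow: list[tuple[str, str]] = []     # recent inserts, unsorted
        self.batch_merges = 0

    def insert(self, subject: str, obj: str) -> None:
        self.overflow.append((subject, obj))
        if len(self.overflow) >= self.threshold:
            self._batch_merge()

    def _batch_merge(self) -> None:
        # Simplest possible merge: re-sort everything, then empty the buffer.
        self.sorted_rows = sorted(self.sorted_rows + self.overflow)
        self.overflow.clear()
        self.batch_merges += 1

    def lookup(self, subject: str) -> list[str]:
        # Binary search the sorted table, then scan the overflow buffer too.
        i = bisect.bisect_left(self.sorted_rows, (subject,))
        hits = []
        while i < len(self.sorted_rows) and self.sorted_rows[i][0] == subject:
            hits.append(self.sorted_rows[i][1])
            i += 1
        hits.extend(o for s, o in self.overflow if s == subject)
        return hits


# More inserts with the same threshold mean more batch merges.
for n_inserts in (100, 1000):
    store = PropertyColumnStore(threshold=50)
    for k in range(n_inserts):
        store.insert(f"s{k % 10}", f"o{k}")
    print(n_inserts, "inserts ->", store.batch_merges, "batch merges")
# 100 inserts -> 2 batch merges
# 1000 inserts -> 20 batch merges
```

In this sketch each merge also re-sorts the whole table, so merges get more expensive as the table grows; either way, frequent merges compete with query work, which is the degradation the critique describes.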

19 VALIDATION METHODOLOGY Dataset: the Barton Libraries dataset provided by the Simile Project at MIT (http://simile.mit.edu/rdf-test-data/barton). Benchmark: a set of 7 queries based on a browsing session in Longwell, a UI built by the Simile group for querying the library dataset. The queries are executed on: a triple store (a single subject-property-object table on Postgres, with no further optimizations); property tables (on Postgres); vertically partitioned data in a row-oriented store (Postgres); vertically partitioned data in a column-oriented store (C-Store).

20 VALIDATION METHODOLOGY Strengths: real-world data and query scenarios; comparison of the proposed technique against all the existing techniques. Weaknesses: queries involving the unrestricted-property problem, which is particularly problematic for vertically partitioned storage, are avoided; the results depend on the accuracy of the clustering used for the property tables; performance may differ with different underlying databases.

21 RESULTS The results show that the proposed storage scheme outperforms the existing methods in query time.

22 IMPROVEMENTS: SPATIAL PERSPECTIVE Schema design: queries run against the vertically partitioned tables as well as the overflow tables. Because spatial data is heavy, a spatial index such as an R*-tree or a grid should be added to make these queries faster. Restrictive nature: spatial queries are not restricted to specific “properties” (e.g. landmarks), yet property-restricted queries are an important assumption of the paper. Tables should be partitioned more effectively than one property per table, e.g. by grouping similar properties together based on domain knowledge.
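Of the two index types the slide mentions, a uniform grid is the simpler one to sketch. The GridIndex class below is hypothetical and works on plain (x, y) points such as landmark coordinates; it is a grid, not an R*-tree, and it is not part of SW-Store. Points are bucketed by cell, and a rectangular range query only scans the cells that overlap the query box.

```python
import math
from collections import defaultdict


class GridIndex:
    """Hypothetical uniform grid over (x, y) points, e.g. landmark locations."""

    def __init__(self, cell_size: float):
        self.cell_size = cell_size
        self.cells: dict[tuple[int, int], list[tuple[str, float, float]]] = defaultdict(list)

    def _cell(self, x: float, y: float) -> tuple[int, int]:
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def insert(self, key: str, x: float, y: float) -> None:
        self.cells[self._cell(x, y)].append((key, x, y))

    def range_query(self, xmin, ymin, xmax, ymax) -> list[str]:
        """Keys of points inside the box; only overlapping cells are scanned."""
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                # .get() avoids creating empty cells in the defaultdict.
                for key, x, y in self.cells.get((cx, cy), []):
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(key)
        return sorted(hits)


# Hypothetical landmark coordinates.
idx = GridIndex(cell_size=10.0)
idx.insert("landmark_a", 3.0, 4.0)
idx.insert("landmark_b", 12.0, 15.0)
idx.insert("landmark_c", 55.0, 60.0)
print(idx.range_query(0, 0, 20, 20))  # ['landmark_a', 'landmark_b']
```

A grid works well when points are spread fairly evenly; for skewed data an R*-tree adapts its bounding boxes to the data, at the cost of a more complex implementation.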

