GRIN: A Graph Based RDF Index
Octavian Udrea (1), Andrea Pugliese (2), V. S. Subrahmanian (1)
(1) University of Maryland, College Park; (2) Università di Calabria


2 Motivation
- Plenty of large RDF datasets:
  - TAP, GovTrack, ChefMoz, CIA World Factbook
  - Many more (see rdfdata.org)
- Query languages: RDQL, RQL, SPARQL
- DB systems: Jena, Sesame, RDFBroker
- Indexing?
  - Existing approaches are based on relational database indexes
  - An RDF index has to be rooted in the characteristics of the query language

3 Contributions
- Lightweight mechanism for indexing large RDF datasets
  - GRIN: Graph-based RDF INdex
- Query answering algorithms for SPARQL-like queries
- Evaluation on two real-world datasets: TAP (Stanford) and ChefMoz (chefmoz.org)

4 Outline
- RDF data and queries
- The GRIN Index structure
- Answering queries
- Experimental evaluation

5 RDF graph example (ChefMoz)

6 RDF query example

7 Query example in SPARQL
SELECT ?v1 ?v2 ?v3
WHERE {
  { (?v1 attire ?v3). (?v1 cuisine Italian) }
  { (?v2 attire ?v3). (?v2 cuisine Italian). (?v2 location Norfolk) }
  { (Norfolk locatedIn NE/USA) }
}
FROM ChefMoz

8 Native RDF systems: Jena2
- Stores RDF as (subject, property, value) in a relational table
- Indexes on each of the three attributes
- Translates SPARQL/RDQL into SQL
- Problem: 6 self-joins for the example query
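
To make the join cost concrete, here is a minimal sketch of a triple table with one index per attribute, using Python's built-in sqlite3 module. The table and index names, the sample rows, and the restaurant "Luigis" are assumptions made up for illustration; this is not Jena2's actual schema or the SQL it generates. The example query is written with one alias of the triple table per triple pattern, six in total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical triple table: one row per (subject, property, value).
conn.execute("CREATE TABLE triples (subject TEXT, property TEXT, value TEXT)")
# One index per attribute, as described on the slide.
for col in ("subject", "property", "value"):
    conn.execute(f"CREATE INDEX idx_{col} ON triples ({col})")

# Made-up sample data, not taken from ChefMoz.
rows = [
    ("Grivanti", "cuisine", "Italian"),
    ("Grivanti", "attire", "Casual"),
    ("Luigis", "cuisine", "Italian"),
    ("Luigis", "attire", "Casual"),
    ("Luigis", "location", "Norfolk"),
    ("Norfolk", "locatedIn", "NE/USA"),
]
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", rows)

# Each triple pattern of the example query becomes one alias t1..t6.
sql = """
SELECT DISTINCT t1.subject AS v1, t3.subject AS v2, t1.value AS v3
FROM triples t1, triples t2, triples t3, triples t4, triples t5, triples t6
WHERE t1.property = 'attire'
  AND t2.property = 'cuisine'  AND t2.value = 'Italian' AND t2.subject = t1.subject
  AND t3.property = 'attire'   AND t3.value = t1.value
  AND t4.property = 'cuisine'  AND t4.value = 'Italian' AND t4.subject = t3.subject
  AND t5.property = 'location' AND t5.value = 'Norfolk' AND t5.subject = t3.subject
  AND t6.subject = 'Norfolk' AND t6.property = 'locatedIn' AND t6.value = 'NE/USA'
"""
results = sorted(conn.execute(sql).fetchall())
print(results)
```

Every additional triple pattern in a query adds another alias of the same table, which is the growth in joins that the slides identify as the problem.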

9 Native RDF systems: Sesame (Broekstra et al., ISWC 2002)
- The Sesame SAIL API improves on Jena:
  - Supports RDF Schema inference
  - Separates RDFS from the triple table
  - Supports database schema generation based on the underlying RDF schema of a dataset
- The problem of too many joins remains

10 Native RDF systems: RDFBroker (Sintek et al., ESWC 2006)
- The database schema is built based on signatures: the set of properties used on a resource
- Reduces the number of joins between tables

11-15 The human perspective

16 Outline
- RDF data and queries
- The GRIN Index structure
- Answering queries
- Experimental evaluation

17 GRIN intuition
- Resources "closer" in the RDF graph are more likely to be part of the same answer
  - Hence they should appear on the same page
- GRIN will group resources in circles around selected center resources
- Query evaluation:
  - Find the smallest circle that contains the answer
  - Evaluate the query only on resources in that circle

18 The GRIN Index structure
- GRIN is a binary tree in which:
  - Leaf nodes are sets of resources (and the associated triples)
  - Inner nodes are circles consisting of a center resource and a radius
  - Each node is fully contained in its parent
- Distance metric: shortest path distance in the undirected graph
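
The structure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class names `Leaf` and `Circle`, the helper functions, and the sample triples are all assumptions. The distance is computed by breadth-first search over the RDF graph with edge direction and property labels ignored.

```python
from collections import deque
from dataclasses import dataclass

def build_undirected(triples):
    """Adjacency sets that ignore edge direction and property labels."""
    adj = {}
    for subject, _prop, value in triples:
        adj.setdefault(subject, set()).add(value)
        adj.setdefault(value, set()).add(subject)
    return adj

def distances_from(adj, source):
    """Breadth-first search: shortest path length from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adj.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

@dataclass
class Leaf:
    resources: frozenset  # resources stored together on one page

@dataclass
class Circle:
    center: str    # center resource
    radius: int    # every member is within this distance of the center
    left: object   # Leaf or Circle
    right: object  # Leaf or Circle

def circle_members(adj, center, radius):
    """All resources whose shortest path distance to center is at most radius."""
    return {r for r, d in distances_from(adj, center).items() if d <= radius}

# Made-up sample triples for illustration only.
triples = [
    ("Grivanti", "cuisine", "Italian"),
    ("Grivanti", "attire", "Casual"),
    ("Luigis", "cuisine", "Italian"),
    ("Luigis", "attire", "Casual"),
    ("Luigis", "location", "Norfolk"),
    ("Norfolk", "locatedIn", "NE/USA"),
]
adj = build_undirected(triples)
inner = circle_members(adj, "Grivanti", 1)
outer = circle_members(adj, "Grivanti", 3)
root = Circle("Grivanti", 3, Leaf(frozenset(inner)), Leaf(frozenset(outer - inner)))
```

In this toy tree both leaves are subsets of the root circle, which is the containment property the slide requires of every node and its parent.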

19-22 Building the index: clustering

23 Building the index: clustering
- Standard k-medoids clustering (Kaufman & Rousseeuw, 1987)
- How many clusters?
  - R is the set of resources
  - M is the maximum number of resources per page
- Average link gives the best performance for the inter-cluster distance
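
As a rough illustration of medoid-based clustering, the sketch below alternates between assigning each point to its nearest medoid and re-picking each cluster's medoid as the member with the smallest total distance to the others. It is a simplified variant written for this transcript, not the exact procedure of Kaufman & Rousseeuw and not the authors' code. The function name, the explicit initial medoids, and the toy one-dimensional data are assumptions; in GRIN the distance would be the shortest path distance between resources, and the number of clusters would follow from R and M.

```python
def k_medoids(points, dist, medoids, max_iter=100):
    """Simplified k-medoids: alternate assignment and medoid update until stable."""
    medoids = list(medoids)
    clusters = {}
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest medoid.
        clusters = {m: [] for m in medoids}
        for p in points:
            nearest = min(medoids, key=lambda m: dist(p, m))
            clusters[nearest].append(p)
        # Update step: the new medoid minimises total distance within its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(dist(c, q) for q in members))
            for members in clusters.values()
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return clusters

# Toy data: integers on a line with absolute difference as the distance.
points = [0, 1, 2, 10, 11, 12]
clusters = k_medoids(points, lambda a, b: abs(a - b), medoids=[0, 1])
print(clusters)
```

Unlike k-means, the cluster representative is always an actual data item, which matters here because a circle's center must be a resource in the graph.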

24-26 Building the index: the tree

27 Outline
- RDF data and queries
- The GRIN Index structure
- Answering queries
- Experimental evaluation

28 Queries to constraints
- Extract constraints from the query:
  - d(?v1, Italian) ≤ 1
  - d(?v2, Norfolk) ≤ 1
  - d(?v3, Italian) ≤ 2
  - …and so on
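
One way to derive such constraints, sketched under the assumption that they come from shortest paths in the query graph itself: treat the query's triple patterns as an undirected graph and run a breadth-first search from every constant. Any answer maps a query path onto a data path of the same length, so the query distance is an upper bound on the data distance. The function names are hypothetical, and the slides do not confirm that GRIN extracts constraints exactly this way.

```python
from collections import deque

def is_var(term):
    return term.startswith("?")

def extract_constraints(patterns):
    """Return {(variable, constant): upper bound on their distance in any answer}."""
    adj = {}
    for subject, _prop, value in patterns:
        adj.setdefault(subject, set()).add(value)
        adj.setdefault(value, set()).add(subject)
    constraints = {}
    for const in [t for t in adj if not is_var(t)]:
        # Breadth-first search over the query graph, starting at the constant.
        dist = {const: 0}
        queue = deque([const])
        while queue:
            node = queue.popleft()
            for neighbour in adj[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        for term, d in dist.items():
            if is_var(term):
                constraints[(term, const)] = d
    return constraints

# The triple patterns of the example query from slide 7.
patterns = [
    ("?v1", "attire", "?v3"),
    ("?v1", "cuisine", "Italian"),
    ("?v2", "attire", "?v3"),
    ("?v2", "cuisine", "Italian"),
    ("?v2", "location", "Norfolk"),
    ("Norfolk", "locatedIn", "NE/USA"),
]
constraints = extract_constraints(patterns)
print(constraints[("?v3", "Italian")])
```

On the example query this reproduces the three bounds listed on the slide.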

29 Query evaluation
Goal: identify the smallest circle that is guaranteed to contain an answer to the query
1. Perform a depth-first traversal
2. For each index node, evaluate the constraints
3. If the constraints guarantee an answer, perform subgraph matching
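
The traversal can be sketched roughly as below. The `Node` class and the `guaranteed` callback are hypothetical stand-ins: `guaranteed(node)` represents the constraint test of step 2, whatever form it takes in the real index. The sketch explores children before accepting a node, so it returns a deepest node that passes the test; the actual GRIN traversal may prune or order its search differently.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Node:
    name: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def smallest_guaranteed(node: Optional[Node],
                        guaranteed: Callable[[Node], bool]) -> Optional[Node]:
    """Depth-first search for a deepest node whose constraint test passes."""
    if node is None:
        return None
    # Prefer a passing descendant: children are contained in their parent,
    # so a passing child is a smaller region to run subgraph matching on.
    for child in (node.left, node.right):
        found = smallest_guaranteed(child, guaranteed)
        if found is not None:
            return found
    return node if guaranteed(node) else None

# Toy tree: A is the root, B and C its children, D and E the children of B.
tree = Node("A", Node("B", Node("D"), Node("E")), Node("C"))
passing = {"A", "B"}  # pretend only these circles pass the constraint test
chosen = smallest_guaranteed(tree, lambda n: n.name in passing)
print(chosen.name)
```

Subgraph matching (step 3) would then run only on the resources under the chosen node.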

30 Query evaluation

31 Evaluating constraints
- Constraints: d(?v1, Italian) ≤ 1, d(?v2, Norfolk) ≤ 1, d(?v3, Italian) ≤ 2
- Question: is ?v1 in the circle (Grivanti, 3)?
  - d(Grivanti, ?v1) ≤ d(Grivanti, Italian) + d(?v1, Italian) ≤ 1 + 1 = 2
  - Since 2 ≤ 3, ?v1 must be in the circle (Grivanti, 3)

32 Evaluating constraints
- Question: is ?v3 in (Grivanti, 3)?
  - d(Grivanti, ?v3) ≤ d(Grivanti, Italian) + d(Italian, ?v3) ≤ 1 + 2 = 3
  - Since 3 ≤ 3, ?v3 must be in (Grivanti, 3)
  - Similarly, ?v2 is in the same circle
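
The triangle-inequality argument on these two slides can be written as a small check. This is a sketch with hypothetical names: `constraints` maps (variable, constant) pairs to the bounds extracted from the query, and `center_dist` holds known distances from the circle's center to constants. A variable is proven to lie inside the circle if, for some constant, the distance from the center to the constant plus the bound is at most the radius.

```python
def circle_contains_all(variables, constraints, center_dist, radius):
    """True if every variable is provably within `radius` of the circle's center."""
    for var in variables:
        proven = any(
            center_dist[const] + bound <= radius  # triangle inequality bound
            for (v, const), bound in constraints.items()
            if v == var and const in center_dist
        )
        if not proven:
            return False
    return True

# Bounds from the example query; d(?v2, Italian) <= 1 follows from (?v2 cuisine Italian).
constraints = {
    ("?v1", "Italian"): 1,
    ("?v2", "Italian"): 1,
    ("?v2", "Norfolk"): 1,
    ("?v3", "Italian"): 2,
}
center_dist = {"Italian": 1}  # d(Grivanti, Italian) = 1, as on the slide
variables = ["?v1", "?v2", "?v3"]

in_radius_3 = circle_contains_all(variables, constraints, center_dist, radius=3)
in_radius_2 = circle_contains_all(variables, constraints, center_dist, radius=2)
print(in_radius_3, in_radius_2)
```

With radius 3 all three variables are covered, matching the slides; with radius 2 the bound for ?v3 (1 + 2 = 3) no longer fits, so the test cannot guarantee containment. The test is sufficient but not necessary: failing it does not prove that an answer lies outside the circle.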

33 Subgraph matching
- Perform subgraph matching on the resources in the circles guaranteed to contain an answer
  - Algorithm by Cordella et al., IEEE PAMI 26(10), 2004
- Worst-case time complexity of O(N!)
  - Where N is the maximum number of nodes in either graph
  - In practice, GRIN makes N very small
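
To show what the matching step computes, here is a naive backtracking matcher over triple patterns. It is deliberately not the algorithm of Cordella et al.; it is a plain recursive search written for this transcript, and it allows two variables to bind to the same resource. The function names and sample triples are assumptions. The point of the index is that `triples` would contain only the triples of the selected circle, keeping the search space small.

```python
def unify(term, value, binding):
    """Bind a variable to value, or check that a constant or earlier binding matches."""
    if term.startswith("?"):
        return binding.setdefault(term, value) == value
    return term == value

def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = {} if binding is None else binding
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)  # copy so a failed attempt leaves no trace
        if all(unify(t, v, candidate) for t, v in zip(first, triple)):
            yield from match(rest, triples, candidate)

# Made-up triples standing in for the contents of the selected circle.
triples = [
    ("Grivanti", "cuisine", "Italian"),
    ("Grivanti", "attire", "Casual"),
    ("Luigis", "cuisine", "Italian"),
    ("Luigis", "attire", "Casual"),
    ("Luigis", "location", "Norfolk"),
    ("Norfolk", "locatedIn", "NE/USA"),
]
patterns = [
    ("?v1", "attire", "?v3"),
    ("?v1", "cuisine", "Italian"),
    ("?v2", "attire", "?v3"),
    ("?v2", "cuisine", "Italian"),
    ("?v2", "location", "Norfolk"),
    ("Norfolk", "locatedIn", "NE/USA"),
]
answers = {(b["?v1"], b["?v2"], b["?v3"]) for b in match(patterns, triples)}
print(sorted(answers))
```

The search tries every triple for every pattern, so its cost grows quickly with the number of candidate triples, which is why restricting it to a small circle matters.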

34 Outline
- RDF data and queries
- The GRIN Index structure
- Answering queries
- Experimental evaluation

35 Experimental framework
- Comparison between GRIN, Sesame, Jena2 and RDFBroker (in-memory)
  - Index build time
  - Memory consumption at query time
  - Query time
- Two real-world datasets:
  - TAP (Stanford): datasets between 1.5 MB and 300 MB
  - ChefMoz (chefmoz.org): 220 MB

36 Index build time

37 Memory consumption

38 Query time

39 Average degree of a query node

40 Conclusions
- Method for indexing large RDF graphs adapted to the characteristics of RDF queries
- Avoids expensive join operations
- Gives better query times than Jena2, Sesame and RDFBroker
- Current and future work:
  - Disk-based index
  - Analysis of overlap and coverage

