1 RDF Aggregate Queries and Views Edward Hung, Yu Deng, V.S. Subrahmanian University of Maryland, College Park ICDE 2005, April 7, Tokyo, Japan.

2 Maintenance of RDF Aggregate Views
- Introduction of RDF and RDQL
- RDQL Extension for Aggregate Views
- Aggregate View Maintenance Algorithms AMX
- Implementation and Experiments
- Related Work

3 Introduction
Resource Description Framework (RDF)
- W3C Recommendation
- Represents metadata about resources identifiable on the web (by Uniform Resource Identifier (URI))
- Triple: (Resource, Property, Value), e.g.
  (Artist, rdf:type, rdfs:Class)
  (Painter, rdf:type, rdfs:Class)
  (Painter, rdfs:subClassOf, Artist)
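A minimal Python sketch of the triple model, assuming nothing beyond the slide: the graph is a set of (resource, property, value) tuples holding the three example triples, and the hypothetical helper `find` treats `None` as a wildcard in any position.

```python
from typing import Optional, Set, Tuple, List

Triple = Tuple[str, str, str]

# The example triples from the slide, with prefixed names kept as plain strings.
graph: Set[Triple] = {
    ("Artist", "rdf:type", "rdfs:Class"),
    ("Painter", "rdf:type", "rdfs:Class"),
    ("Painter", "rdfs:subClassOf", "Artist"),
}


def find(g: Set[Triple], s: Optional[str] = None,
         p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return the triples matching the given positions; None is a wildcard."""
    return sorted(t for t in g
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))


# Which resources are declared as classes?
classes = [t[0] for t in find(graph, p="rdf:type", o="rdfs:Class")]
print(classes)  # ['Artist', 'Painter']
```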

4 [Figure: RDF Schema and RDF Instance, serialized as RDF/XML with xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" and xmlns:ns1="http://www.auctionschema.com/schema1#"; the instance contains the literal "Guy"]

5 [Figure: the same RDF Schema and RDF Instance drawn as a graph; labels Artist, Painter, String, fname, subClassOf; resource &r1 with fname "Guy", where &r1 = http://www.artist.net#guyrose]

7 RDQL: RDF Query Language
SELECT ?highprice
WHERE (?artist,, "Rose"), (?artist,, "Guy"), (?artist,, ?artifact),
      (?artifact,, ?price), (?price,, ?highprice), (?artifact,, ?date)
AND 2004-04-01 <= ?date <= 2004-04-30
USING ns1 FOR <http://www.auctionschema.com/schema1#>
[Figure: graph pattern]
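The WHERE clause above is a conjunctive graph pattern: every variable binding that maps all six triple patterns onto triples of the graph, and satisfies the AND constraint, is an answer. The sketch below evaluates such a pattern by naive backtracking. It is an illustration only, not the RDQL engine; because the property names were lost from the slide, `ns1:lname`, `ns1:fname`, `ns1:paints`, `ns1:price`, `ns1:highprice` and `ns1:date` and the instance data are hypothetical.

```python
from datetime import date


def is_var(term) -> bool:
    """Terms starting with '?' are query variables."""
    return isinstance(term, str) and term.startswith("?")


def match_triple(pattern, triple, binding):
    """Extend binding so that pattern equals triple; return None on a clash."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if p_term in new and new[p_term] != t_term:
                return None
            new[p_term] = t_term
        elif p_term != t_term:
            return None
    return new


def match_pattern(patterns, graph, binding=None):
    """Yield every binding that maps all triple patterns onto triples of graph."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    for triple in graph:
        extended = match_triple(patterns[0], triple, binding)
        if extended is not None:
            yield from match_pattern(patterns[1:], graph, extended)


# Hypothetical instance data (property names stand in for the lost ones).
graph = {
    ("ex:r1", "ns1:lname", "Rose"), ("ex:r1", "ns1:fname", "Guy"),
    ("ex:r1", "ns1:paints", "ex:a1"), ("ex:a1", "ns1:price", "ex:p1"),
    ("ex:p1", "ns1:highprice", 800000), ("ex:a1", "ns1:date", date(2004, 4, 10)),
    ("ex:r1", "ns1:paints", "ex:a2"), ("ex:a2", "ns1:price", "ex:p2"),
    ("ex:p2", "ns1:highprice", 500000), ("ex:a2", "ns1:date", date(2004, 4, 20)),
    ("ex:r1", "ns1:paints", "ex:a3"), ("ex:a3", "ns1:price", "ex:p3"),
    ("ex:p3", "ns1:highprice", 900000), ("ex:a3", "ns1:date", date(2004, 5, 2)),
}

pattern = [
    ("?artist", "ns1:lname", "Rose"),
    ("?artist", "ns1:fname", "Guy"),
    ("?artist", "ns1:paints", "?artifact"),
    ("?artifact", "ns1:price", "?price"),
    ("?price", "ns1:highprice", "?highprice"),
    ("?artifact", "ns1:date", "?date"),
]

# The AND clause: 2004-04-01 <= ?date <= 2004-04-30
highprices = sorted(
    b["?highprice"]
    for b in match_pattern(pattern, graph)
    if date(2004, 4, 1) <= b["?date"] <= date(2004, 4, 30)
)
print(highprices)  # [500000, 800000]
```

The May artifact (`ex:a3`) matches the graph pattern but is filtered out by the date constraint.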

8 RDQL Extension for Aggregates and Views
CREATEVIEW AS
SELECT max(?highprice)
WHERE (?artist,, "Rose"), (?artist,, "Guy"), (?artist,, ?artifact),
      (?artifact,, ?price), (?price,, ?highprice), (?artifact,, ?date)
AND 2004-04-01 <= ?date <= 2004-04-30
USING ns1 FOR <http://www.auctionschema.com/schema1#>

9 Aggregate Query
- Aggregate operators, e.g. min, max, sum, count, average
- GROUP BY clause
- Output a table of tuples
  - Output can be (i) an RDF instance or (ii) a table
  - Advantage of (i): the result can itself be queried further
  - However, (ii) allows any form of table, including output in the form of an RDF instance when the table consists of a set of RDF triples.
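To make "output a table of tuples" concrete, here is a small Python sketch of evaluating one aggregate operator with an optional GROUP BY over a bag of variable bindings. The helper name `aggregate`, the operator table and the sample rows are hypothetical; the rows stand in for the bindings a graph-pattern match would produce.

```python
from collections import defaultdict
from statistics import mean

# Aggregate operators named on the slide; "avg" is the average.
AGGREGATES = {"min": min, "max": max, "sum": sum, "count": len, "avg": mean}


def aggregate(rows, func, var, group_by=()):
    """Evaluate func(var) over a bag of variable bindings.

    Returns a table mapping each GROUP BY key (a tuple, empty when there is
    no GROUP BY) to the aggregate of that group's values.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[g] for g in group_by)
        groups[key].append(row[var])
    return {key: AGGREGATES[func](values) for key, values in groups.items()}


# Hypothetical bag of bindings.
rows = [
    {"?artist": "ex:r1", "?highprice": 800000},
    {"?artist": "ex:r1", "?highprice": 500000},
    {"?artist": "ex:r2", "?highprice": 60000},
]
print(aggregate(rows, "max", "?highprice"))  # {(): 800000}
print(aggregate(rows, "avg", "?highprice", ("?artist",)))
```

The input is kept as a list rather than a set on purpose: aggregates are computed over a bag, so duplicate values must be counted.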

10 We extend the syntax of RDQL to allow constants in SELECT clauses, which effectively creates new resources from those constants. For example, the previous query can be modified as follows:
CREATEVIEW AS
SELECT ,, max(?highprice)
WHERE (?artist,, "Rose"), (?artist,, "Guy"), (?artist,, ?artifact),
      (?artifact,, ?price), (?price,, ?highprice), (?artifact,, ?date)
AND 2004-04-01 <= ?date <= 2004-04-30
USING ns1 FOR <http://www.auctionschema.com/schema1#>
The result is a valid RDF statement (,, "800000"^^ns1:USD).
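A small Python sketch of why constants in the SELECT list yield an RDF statement: items found in the result row are replaced by their values, everything else is copied as a constant, and a three-item SELECT list therefore produces a triple. The view resource `ex:aprilView` and property `ns1:maxPrice` are hypothetical placeholders for the constants lost from the slide, and `make_statement` / `typed_literal` are illustrative helpers.

```python
from typing import Dict, List, Tuple

Term = str


def make_statement(select_items: List[Term], row: Dict[Term, Term]) -> Tuple[Term, ...]:
    """Instantiate a SELECT list: items present in the result row are replaced
    by their values; every other item is a constant and is copied verbatim."""
    return tuple(row.get(item, item) for item in select_items)


def typed_literal(value: int, datatype: str) -> str:
    """Render a value as a typed literal in Turtle-like syntax."""
    return f'"{value}"^^{datatype}'


# Hypothetical constants standing in for the lost view resource and property.
select_items = ["ex:aprilView", "ns1:maxPrice", "max(?highprice)"]
row = {"max(?highprice)": typed_literal(800000, "ns1:USD")}
statement = make_statement(select_items, row)
print(statement)  # ('ex:aprilView', 'ns1:maxPrice', '"800000"^^ns1:USD')
```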

11 Aggregate View Maintenance
Relational Approach
- Store all triples in a relational table with schema (Resource, Property, Value), OR
- Store the resources and values of each property in a separate relational table with schema (Resource, Value)
- #self-joins = (#triples in where-clause) - 1
- Large number of delta rules during relational view maintenance, which is expensive
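The join count follows from translating each triple pattern into its own alias of the single triple table. The Python sketch below generates such a query; the table name `triples`, its column names, and the predicate names are hypothetical, constants are quoted naively (no escaping), and the date constraint is left out for brevity.

```python
def to_sql(patterns, select_var, table="triples"):
    """Translate a conjunctive triple pattern into one SQL query over a single
    (resource, property, value) table. Each pattern gets its own alias, so a
    pattern with n triples needs n - 1 self-joins."""
    columns = ("resource", "property", "value")
    aliases, conditions, first_seen = [], [], {}
    for i, pattern in enumerate(patterns):
        alias = f"t{i}"
        aliases.append(f"{table} {alias}")
        for column, term in zip(columns, pattern):
            ref = f"{alias}.{column}"
            if term.startswith("?"):
                if term in first_seen:
                    # A repeated variable becomes a join condition.
                    conditions.append(f"{ref} = {first_seen[term]}")
                else:
                    first_seen[term] = ref
            else:
                # A constant becomes a selection (no escaping: sketch only).
                conditions.append(f"{ref} = '{term}'")
    sql = (f"SELECT {first_seen[select_var]} FROM {', '.join(aliases)} "
           f"WHERE {' AND '.join(conditions)}")
    return sql, len(patterns) - 1


# Hypothetical predicate names for the six triples of the RDQL example.
pattern = [
    ("?artist", "ns1:lname", "Rose"),
    ("?artist", "ns1:fname", "Guy"),
    ("?artist", "ns1:paints", "?artifact"),
    ("?artifact", "ns1:price", "?price"),
    ("?price", "ns1:highprice", "?highprice"),
    ("?artifact", "ns1:date", "?date"),
]
sql, self_joins = to_sql(pattern, "?highprice")
print(self_joins)  # 5
```

Maintaining a view defined this way with relational delta rules means deriving a rule for each of the six aliases that a changed row could match, which is the cost the slide points to.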

12 Aggregate View Maintenance
Our Approach
- Localized search in RDF graphs
  - A modified breadth-first search starting at the inserted/deleted edge
- Auxiliary data are needed for certain aggregate views (min, max, avg)

13 Distributive Aggregate Function
An aggregate function f is distributive w.r.t. a source update operation if and only if the updated value can be computed from its old value and the update alone, without reference to the source.
- Examples: count, sum, and average w.r.t. insertion, deletion, and update
  - For average, we need an additional attribute, size, which stores the size of the intermediate result S so that the updated value can be computed correctly (alternatively, average can be derived from sum and count)
- max and min are distributive w.r.t. insertion, but not w.r.t. deletion and update
  - Auxiliary data computed from S avoid the need to refer to the source.
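A Python sketch of the definition, using a hypothetical per-group state class: sum and count are updated from their old values and the changed value alone, the average is derived from them, and max survives insertions but is marked invalid when a deletion may have removed the current maximum, because the new maximum cannot be computed without the source or auxiliary data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GroupState:
    """Hypothetical per-group state for incremental aggregate maintenance."""
    total: int = 0
    count: int = 0
    maximum: Optional[int] = None
    max_valid: bool = True  # False once a deletion may have removed the max

    def insert(self, value: int) -> None:
        # sum, count and max are all distributive w.r.t. insertion.
        self.total += value
        self.count += 1
        if self.max_valid and (self.maximum is None or value > self.maximum):
            self.maximum = value

    def delete(self, value: int) -> None:
        # sum and count are distributive w.r.t. deletion...
        self.total -= value
        self.count -= 1
        # ...but max is not: removing the current max leaves the new max unknown.
        if value == self.maximum:
            self.max_valid = False

    @property
    def average(self) -> Optional[float]:
        """Average derived from sum and count, with no access to the source."""
        return self.total / self.count if self.count else None


state = GroupState()
for v in (800000, 500000, 60000):
    state.insert(v)
state.delete(800000)
print(state.average, state.max_valid)  # 280000.0 False
```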

14 [Figure: graph pattern]

15 [Figure: BAG]

16 [Figure: result 800000]

17 [Figure: SELECT max(?highprice) over BAG 800000, 500000]

18 Compute Aggregates Algorithm CAA
Algorithm CAA(I, Q)
/* Input: RDF graph I, query Q */
/* Output: table T(Q, I) */
1) GP ← BuildGP(Q); X ← aggregate variables of Q;
2) Y ← GROUP BY variables of Q;
3) S ← [VRetrieve(θ, GP, X ∪ Y) | θ ∈ MSearchAll(GP, Q, I)];
4) Return T(Q, I) ← TCompute(S, Q);
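A runnable Python sketch that follows the steps of CAA under simplifying assumptions: the graph pattern is passed in directly (standing in for BuildGP), MSearchAll is a naive backtracking search over the whole graph, VRetrieve projects each matching onto X ∪ Y, and TCompute groups the resulting bag S by the GROUP BY key. All function names here are hypothetical stand-ins, and the predicates and data are illustrative.

```python
from collections import defaultdict


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def extend(pattern, triple, binding):
    """Unify one triple pattern with one triple under binding, or return None."""
    new = dict(binding)
    for p, v in zip(pattern, triple):
        if is_var(p):
            if new.setdefault(p, v) != v:
                return None
        elif p != v:
            return None
    return new


def msearch_all(gp, graph, binding=None):
    """All matchings of graph pattern gp in graph (naive backtracking search)."""
    binding = binding or {}
    if not gp:
        return [binding]
    matches = []
    for triple in graph:
        b = extend(gp[0], triple, binding)
        if b is not None:
            matches.extend(msearch_all(gp[1:], graph, b))
    return matches


def caa(graph, gp, constraint, agg, x, group_by=()):
    """S is the bag of (X ∪ Y) projections of all matchings that satisfy the
    constraints; TCompute then aggregates S per GROUP BY key."""
    S = [{v: theta[v] for v in (x, *group_by)}          # VRetrieve
         for theta in msearch_all(gp, graph) if constraint(theta)]
    groups = defaultdict(list)                           # TCompute
    for row in S:
        groups[tuple(row[g] for g in group_by)].append(row[x])
    return {key: agg(values) for key, values in groups.items()}, S


# Hypothetical pattern and data: max(?highprice) GROUP BY ?artist.
gp = [("?artist", "ns1:paints", "?artifact"),
      ("?artifact", "ns1:highprice", "?highprice")]
I = {("ex:guy", "ns1:paints", "ex:a1"), ("ex:a1", "ns1:highprice", 800000),
     ("ex:guy", "ns1:paints", "ex:a2"), ("ex:a2", "ns1:highprice", 500000),
     ("ex:ann", "ns1:paints", "ex:a3"), ("ex:a3", "ns1:highprice", 60000)}
table, S = caa(I, gp, lambda theta: True, max, "?highprice", ("?artist",))
```

Returning S alongside the table mirrors the later slides, where parts of S (its size, or the bag of x values) are kept as auxiliary data for maintenance.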

19 Aggregate View Maintenance Algorithms AMX
- AMI: Insertion
- AMD: Deletion
- AMT: Triple Modification
- AMR: Resource Modification

20 [Figure: Update: Insertion; BAG 800000, 500000; edge label "paints"]

21 [Figure: BAG 800000, 500000; edge label "paints"]

22 [Figure: SELECT max(?highprice) over BAG 800000, 500000, 60000; edge label "paints"]

23 AMI for Insertion
Algorithm AMI(I, Q, A(Q, I), T(Q, I), t)
/* Input: RDF graph I, query Q, auxiliary data A(Q, I), query result T(Q, I), inserted triple t */
/* Output: table T(Q, I ∪ t), auxiliary data A(Q, I ∪ t) */
1) GP ← BuildGP(Q);
2) X ← aggregate variables of Q;
3) Y ← GROUP BY variables of Q;
4) If TMatch(GP, t) == TRUE, then
   a) ΔS ← [VRetrieve(θ, GP, X ∪ Y) | θ ∈ MSearch(GP, Q, t, I ∪ t)];
   b) return (T(Q, I ∪ t), A(Q, I ∪ t)) ← TMaintain_I(T(Q, I), ΔS, A(Q, I), Q);
5) else, return (T(Q, I ∪ t), A(Q, I ∪ t)) ← (T(Q, I), A(Q, I));

24 Algorithm MSearch(GP, Q, t, I)
/* Input: graph pattern GP, query Q, triple t, RDF graph I */
/* Output: Θ = {θ | θ is a pattern matching} */
1) Θ ← ∅;
2) for each t' ∈ GP s.t. ∃θ', tθ' = t'θ',
   a) for each θ ∈ bSearch(t, t', GP, I),
      i. if θ satisfies the constraints in Q, then Θ ← Θ ∪ {θ};
3) return Θ;
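A Python sketch of the localized search, under stated assumptions: for every pattern triple t' that unifies with t, the unifier seeds a binding, and the rest of the pattern is matched by expanding only along edges incident to nodes already reached (a frontier-style expansion standing in for the slides' bSearch, whose exact details are not shown). The subject/object indexes, helper names, predicates and data are hypothetical; in a real system the index would be maintained persistently rather than rebuilt per call.

```python
from collections import defaultdict


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def extend(pattern, triple, binding):
    """Unify one triple pattern with one triple under binding, or return None."""
    new = dict(binding)
    for p, v in zip(pattern, triple):
        if is_var(p):
            if new.setdefault(p, v) != v:
                return None
        elif p != v:
            return None
    return new


def resolve(term, binding):
    """The node a pattern term denotes, or None for a still-unbound variable."""
    return binding.get(term) if is_var(term) else term


def local_search(remaining, by_subject, by_object, binding):
    """Grow a matching outward from nodes already reached, inspecting only
    edges incident to them."""
    if not remaining:
        yield binding
        return
    for i, (s, _, o) in enumerate(remaining):
        node_s, node_o = resolve(s, binding), resolve(o, binding)
        if node_s is not None or node_o is not None:
            break
    else:
        raise ValueError("graph pattern must be connected")
    edges = by_subject[node_s] if node_s is not None else by_object[node_o]
    rest = remaining[:i] + remaining[i + 1:]
    for triple in edges:
        b = extend(remaining[i], triple, binding)
        if b is not None:
            yield from local_search(rest, by_subject, by_object, b)


def msearch(gp, constraint, t, graph):
    """Every matching of gp in graph that uses triple t."""
    by_subject, by_object = defaultdict(list), defaultdict(list)
    for triple in graph:  # hypothetical index, rebuilt here only for brevity
        by_subject[triple[0]].append(triple)
        by_object[triple[2]].append(triple)
    found = set()
    for i, t_prime in enumerate(gp):
        seed = extend(t_prime, t, {})  # is there θ' with tθ' = t'θ'?
        if seed is None:
            continue
        for theta in local_search(gp[:i] + gp[i + 1:], by_subject, by_object, seed):
            if constraint(theta):
                found.add(frozenset(theta.items()))  # Θ ← Θ ∪ {θ}
    return [dict(theta) for theta in found]


gp = [("?artist", "ns1:fname", "Guy"),
      ("?artist", "ns1:paints", "?artifact"),
      ("?artifact", "ns1:highprice", "?highprice")]
I = {("ex:guy", "ns1:fname", "Guy"),
     ("ex:guy", "ns1:paints", "ex:a1"), ("ex:a1", "ns1:highprice", 800000),
     ("ex:guy", "ns1:paints", "ex:a2"), ("ex:a2", "ns1:highprice", 500000),
     ("ex:a3", "ns1:highprice", 60000)}
t = ("ex:guy", "ns1:paints", "ex:a3")  # the inserted "paints" edge
delta = msearch(gp, lambda theta: True, t, I | {t})
```

Storing matchings as frozensets makes Θ a true set, so a matching reached from two different t' is counted once. For deletion, AMD calls MSearch on I before t is removed, so the same routine finds the matchings that disappear.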

25 Handling GROUP BY
- Given the GROUP BY clause, each tuple in ΔS affects exactly one group.
- TMaintain_I maintains only the affected groups (and their corresponding auxiliary data), using the tuples that affect them.
- Empty groups are deleted and new groups are inserted.

26 TMaintain_I
Handling sum, count, min, max
- No auxiliary data required
- Suppose f(x) is an aggregate function on attribute x, F the original result, and F' the new result:
  $F' = F + \sum_{v \in \pi_x(\Delta S)} v$ if f = sum
  $F' = F + |\Delta S|$ if f = count
  $F' = \min([F] \cup \pi_x(\Delta S))$ if f = min
  $F' = \max([F] \cup \pi_x(\Delta S))$ if f = max
- $\pi_x(\Delta S)$ projects a bag of values of x from ΔS
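The four update rules, combined with the GROUP BY handling of the previous slide, fit in a few lines of Python. This is a sketch with a hypothetical signature: the table maps GROUP BY keys to the current aggregate F, and ΔS is a list of row dictionaries; groups not touched by ΔS are left unchanged.

```python
SUPPORTED = {"sum", "count", "min", "max"}


def tmaintain_insert(table, delta, f, x, group_by=()):
    """Fold the new tuples ΔS into each affected group of the aggregate table."""
    if f not in SUPPORTED:
        raise ValueError(f"unsupported aggregate: {f}")
    new_table = dict(table)
    for row in delta:
        key = tuple(row[g] for g in group_by)
        v = row[x]
        if key not in new_table:          # a new group is inserted
            new_table[key] = 1 if f == "count" else v
        elif f == "sum":                  # F' = F + sum of pi_x(ΔS)
            new_table[key] += v
        elif f == "count":                # F' = F + |ΔS|
            new_table[key] += 1
        elif f == "min":                  # F' = min([F] ∪ pi_x(ΔS))
            new_table[key] = min(new_table[key], v)
        else:                             # F' = max([F] ∪ pi_x(ΔS))
            new_table[key] = max(new_table[key], v)
    return new_table


# Slides 20-22: inserting a matching with highprice 60000 keeps max at 800000.
print(tmaintain_insert({(): 800000}, [{"?highprice": 60000}], "max", "?highprice"))
# {(): 800000}
```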

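27 TMaintain_I
Handling average
- We need the size of S as auxiliary data: $size' = size + |\Delta S|$
- The new average is then $F' = \frac{F \cdot size + \sum_{v \in \pi_x(\Delta S)} v}{size'}$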
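A Python sketch of the average rule, with a hypothetical function name: the stored size |S| lets the old sum be recovered as F · size, so the new average depends only on the old value, the size, and ΔS.

```python
from typing import Optional, Sequence, Tuple


def tmaintain_insert_avg(avg: Optional[float], size: int,
                         delta_values: Sequence[float]) -> Tuple[Optional[float], int]:
    """Update an average under insertion using the auxiliary size |S|."""
    new_size = size + len(delta_values)       # size' = size + |ΔS|
    if new_size == 0:
        return None, 0
    old_total = avg * size if size else 0     # recover sum(S) from avg and size
    return (old_total + sum(delta_values)) / new_size, new_size


# Average of {800000, 500000} is 650000 with size 2; insert 60000.
avg, size = tmaintain_insert_avg(650000, 2, [60000])
print(size)  # 3
```

Storing sum and count instead of average and size, the alternative mentioned on slide 13, avoids repeatedly multiplying and dividing a floating-point average.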

28 [Figure: Update: Deletion; BAG 800000, 500000, 60000; edge label "paints"]

29 [Figure: BAG 800000, 500000, 60000; edge label "paints"]

30 [Figure: SELECT max(?highprice) over BAG 500000, 60000; edge label "paints"]

31 AMD for Deletion
Algorithm AMD(I, Q, A(Q, I), T(Q, I), t)
/* Input: RDF graph I, query Q, auxiliary data A(Q, I), query result T(Q, I), deleted triple t */
/* Output: table T(Q, I - t), auxiliary data A(Q, I - t) */
1) GP ← BuildGP(Q);
2) X ← aggregate variables of Q;
3) Y ← GROUP BY variables of Q;
4) If TMatch(GP, t) == TRUE, then
   a) ΔS ← [VRetrieve(θ, GP, X ∪ Y) | θ ∈ MSearch(GP, Q, t, I)];
   b) return (T(Q, I - t), A(Q, I - t)) ← TMaintain_D(T(Q, I), ΔS, A(Q, I), Q);
5) else, return (T(Q, I - t), A(Q, I - t)) ← (T(Q, I), A(Q, I));

32 TMaintain_D
Handling min, max
- Min and max are not distributive w.r.t. deletion
- We need to store $\pi_x(S)$, which projects a bag of values of x from S
- The new aggregate value F' is obtained by:
  $F' = \min(\pi_x(S - \Delta S))$ if f = min
  $F' = \max(\pi_x(S - \Delta S))$ if f = max
- We need to update $\pi_x(S)$ to become $\pi_x(S) - \pi_x(\Delta S)$
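A Python sketch of this rule, assuming the stored bag π_x(S) is kept as a `collections.Counter` (a multiset). Subtracting a Counter is a bag difference that drops non-positive counts, so duplicate values are handled correctly. The function name is hypothetical.

```python
from collections import Counter


def tmaintain_delete_minmax(bag, delta_values, f):
    """Return (F', new bag): subtract pi_x(ΔS) from the stored bag pi_x(S)
    and recompute min or max from what remains (None if the group is empty)."""
    if f not in ("min", "max"):
        raise ValueError(f"unsupported aggregate: {f}")
    remaining = bag - Counter(delta_values)   # pi_x(S) - pi_x(ΔS)
    if not remaining:
        return None, remaining               # the group becomes empty
    pick = min if f == "min" else max
    return pick(remaining), remaining        # iterates over the distinct values


# Slides 28-30: deleting the matching with 800000 makes the new max 500000.
new_max, new_bag = tmaintain_delete_minmax(Counter([800000, 500000, 60000]), [800000], "max")
print(new_max)  # 500000
```

Recomputing over the whole remaining bag costs time linear in its number of distinct values; a sorted multiset or a heap with lazy deletion would make each update logarithmic at the cost of a more involved structure.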

33 Implementation and Experiment
- Implemented in Java
- Jena: RDQL engine of HP
- Comparison with the relational approach (standard view maintenance algorithm on relational tables)
  - Counting Algorithm in Gupta et al., "Maintaining Views Incrementally", SIGMOD 1993
- Dataset: Chef Moz Project RDF dump
- Data stored in memory

35 Other Related Work
- Volz, Oberle, Studer [DBFUSION'02]
  - The first to introduce a view mechanism for RDF data
  - Their views require that
    1. the results contain class instances (i.e., a subject or object variable), or
    2. the result itself has the pattern of an RDF statement (i.e., a triple containing subject, predicate and object).
- Magkanaraki et al. [ISWC'03]
  - Proposed RVL, a view definition language that can also create virtual RDF schemas and restructure class and property hierarchies, so that new resources, property values, classes and property types can be created.
- None of these works specifically addresses (i) aggregates in RDF or (ii) the problem of maintaining aggregate RDF views.

36 Summary
- Aggregate views are important for RDF applications
- RDQL extension for views and aggregates
- Aggregate view maintenance algorithms AMX
  - Localized search in RDF graphs

37 Thank you very much! Questions and Answers

