Presentation is loading. Please wait.

Presentation is loading. Please wait.

G-CORE: The LDBC Graph Query Language Proposal

Similar presentations


Presentation on theme: "G-CORE: The LDBC Graph Query Language Proposal"— Presentation transcript:

1 G-CORE: The LDBC Graph Query Language Proposal
LDBC GraphQL Task Force: Renzo Angles, Marcelo Arenas, Pablo Barceló, Peter Boncz, George Fletcher, Alastair Green, Tobias Lindaaker, Marcus Paradies, Stefan Plantikow, Arnau Prat, Oskar van Rest, Hannes Voigt

2 What, how and where Goals:
Recommend a query language core that could be incorporated in future versions of industrial graph query languages. Perform deep academic analysis of the expressiveness and complexity of evaluation of the query language To ensure a powerful yet practical query language

3 What, how and where 2 years of work in the GraphQL task force
3-day workshop in Santiago de Chile (Aug 8 – 10, 2017) Writing workshop for SIGMOD industrial paper (Dagstuhl, Nov 26 - Dec 1, 2017)

4 Principle: closed query language
A query should have as input and output a graph Queries can be composed To allow subqueries and views We consider the property graph data model A query can produce a set of paths as output, so the data model needs to be extended

5 Principle: Paths are first class citizens
The property graph data model is extended to have paths as first-class citizens Extended property graphs are defined as: “directed graphs with paths, which can have labels and properties on nodes, edges and paths” In particular, paths can have labels and property values For example, length is a property of paths

6 Extended property graphs
Assume that: L is an infinite set of label names for nodes, edges and paths K is an infinite set of property names V is an infinite set of literals (actual values) Moreover, given a set X, assume that SET(X) is the set of all finite subsets of X, and LIST(X) is the set of all finite lists of elements from X

7 Extended property graphs
An extended property graph is a tuple G = (N, E, P, ρ, δ, λ, σ), where: N is a finite set of nodes E is a finite set of edges ρ : E → (N × N) λ : (N ∪ E ∪ P) → SET(L) σ : (N ∪ E ∪ P) × K → SET(V) We assume that there is a finite set of tuples (a,b) in (N ∪ E ∪ P) × K such that σ(a,b) is not the empty set

8 Extended property graphs
An extended property graph is a tuple G = (N, E, P, ρ, δ, λ, σ), where: P is a finite set of paths We assume that N, E and P are pairwise disjoint δ : P → LIST(N ∪ E) We assume that for every path p ∈ P, either δ(p) = [a] with a ∈ N or δ(p) = [a1, e1, a2, ..., an, en, an+1], with: n ≥ 1 aj ∈ N for every 1 ≤ j ≤ n+1 ej ∈ E for every 1 ≤ j ≤ n ρ(ej) = (aj, aj+1) or ρ(ej) = (aj+1, aj) for every 1 ≤ j ≤ n

9 Principle: efficient evaluation
The query language was carefully designed to have tractable evaluation (in data complexity) We decided to use a shortest path semantics as its interaction with regular expressions does not lead to intractability As opposed to a semantics based on simple paths It connects the practical work done in industrial proposals with the foundational research on graph databases

10 Principle: expressive query language
The query languages includes a powerful graph construction mechanism It also includes path expressions This is a powerful mechanism that allows, among other features, to express regular expressions with variables

11 Syntax of G-CORE: basic notation
Variables: x, y, z, … node: (x) edge: [y] path: /z/ Given a variable u, we use notation: u:L L is a label of a node, edge or path u u:L { p = v } the value of property p is v for u

12 Syntax of G-CORE: basic notation
Given a path variable p, we can use notation p<r> where r is a regular expression It indicates that the labels of the edges of the path form a string conforming to the regular expression r In regular expressions: Kleene star: r* union: r1 + r2 concatenation: r1 r2

13 Syntax of G-CORE: queries
query ::= {  graphClause  }, fullGraphQuery; graphClause ::= “GRAPH”, identifier, “AS“, fullGraphQuery; fullGraphQuery ::= basicGraphQuery,  { setOperation, basicGraphQuery }; setOperation ::=   “UNION”  |  “INTERSECTION”  |  “EXCEPT”;

14 Syntax of G-CORE: basic graph queries
basicGraphQuery ::= constructGraphClause,  matchClause; matchClause uses operator MATCH It matches nodes, edges and paths in a graph, resulting in a binding table constructGraphClause uses operator CONSTRUCT GRAPH It projects a binding table into a graph

15 MATCH clause MATCH binds variables using pattern matching
Its semantics is based on homomorphisms it results in an (imaginary) binding table where the columns are the variables and the rows the bindings We use set semantics for binding tables, so a binding table is a set of tuples, where each tuple has a binding (value) for each variable MATCH (a:Person {age = 21} )-[:knows]->(b)-[:knows]->(c) MATCH (a:Person {age = 21})-/p<:friend*>/->(b:Person {age = 21}) MATCH (a:Person {age = 21} )-/<:knows*>/->(b)-/<:knows*>/->(c) MATCH (a)-/p<:knows*> {confidence = 0.9}/-(b)

16 MATCH clause MATCH distinguishes between an edge label and a regular expressions composed by a single edge label: MATCH (a)-/p<:knows>/-(b) // p binds to a path object MATCH (a)-[p:knows]-(b) // p binds to an edge object It is possible to check for multiple labels: (x:L1:L2), [y:L3:L4:L5] and /z:L6:L7/ The semantics of these expressions is conjunctive

17 MATCH clause Variables are typed depending to which kind of object (node, edge, or path) they are assigned The expression (x)-[x]->(y) is invalid

18 MATCH (x)-[:name]->(y) OPTIONAL (x)-[:phone]->(z)
Optional matchings The use of an operator for obtaining optional matchings has been identified as an important feature A MATCH clause is then of the form: MATCH p1, p2, ..., pk OPTIONAL q1, q2, ..., ql OPTIONAL r1, r2, ..., rm where each pi, qi, and ri is a basic graph pattern Example: MATCH (x)-[:name]->(y) OPTIONAL (x)-[:phone]->(z)

19 Optional matchings There is only one level of nesting for the OPTIONAL operator MATCH p1, p2 OPTIONAL q1, q2 OPTIONAL r1, r is evaluated as follows: Each basic graph pattern pi, qi, ri is matched against the input graph producing binding tables Pi, Qi and Ri The final result is: ((P1 JOIN P2) LEFT OUTER JOIN (Q1 JOIN Q2)) LEFT OUTER JOIN (R1 JOIN R2)

20 Graph source The graph where the matching operates on can be specified by ON after a basic graph pattern The specification is optional. If no graph is specified the matching will operate on the base graph If a graph is specified, the matching will operator on that graph Graphs can be specified by an identifier for named graphs or by a subquery

21 Graph source Recall that in a query we can define several named graphs: query ::= {  graphClause  }, fullGraphQuery; graphClause ::= “GRAPH”, identifier, “AS“, fullGraphQuery; Example: GRAPH Employees AS … MATCH (x)-[:name]->(y) ON Employees

22 Syntax of G-CORE: MATCH clause
matchClause ::= “MATCH”, fullGraphPattern, [ whereClause ], { “OPTIONAL”, fullGraphPattern, [ whereClause ] }; fullGraphPattern ::= basicGraphPatternWithLocation, { “,”, basicGraphPatternWithLocation }; basicGraphPatternWithLocation ::= basicGraphPattern, [ “ON”, subQuery | identifier ];

23 Syntax of G-CORE: MATCH clause
subQuery ::= “(”, fullGraphQuery, ”)”; basicGraphPattern ::= nodePattern, { relationshipPattern, nodePattern }; relationshipPattern ::= edgePattern | pathPattern;

24 Syntax of G-CORE: MATCH clause
nodePattern: (a:Person {age = 21} ) edgePattern: -[:knows]-> -[:knows {confidence = 0.9}]-> pathPattern: -/p<:knows*> {confidence = 0.9}/- -/:Ancestor/->

25 WHERE clause WHERE filters bindings resulting from the MATCH clause whereClause ::= “WHERE”, filterExpression; filterExpression ::= atomicExpression | “(“, filterExpression, “AND”, filterExpression, “)” | “(“, filterExpression, “OR”, filterExpression, “)” | “(“, ”NOT”, filterExpression, “)”;

26 Filter expressions: main components
Logical connectives: AND, OR, NOT Grouping: parenthesis Graph Functions: labels(n), properties(n), nodes(p), rels(p), startNode(r), endNode(r), length(p), bound(n), … Standard Functions: abs( ), sin( ), …

27 Filter expressions: main components
Label tests: n:Person true if n has the label Person n:_ true if n has any label Collections (sets or lists): Membership test: element IN collection Indexing: collection[elem] Slicing: collection[slice]

28 Filter expressions: main components
IN is used to construct more expressive label tests: Person IN labels(n) Student IN labels(n) OR Professor IN labels(n) Student IN labels(n) AND NOT SoccerFan IN labels(n)

29 Filter expressions: main components
Notation x.prop, where x is a variable and prop is a property name, can be used in filter expressions x.first_name = “Peter” It is true if one of the values in x.first_name is “Peter” Recall that x.first_name is a set x.confidence < y.confidence It is true if x.confidence = { value1 }, y.confidence = { value2 }, and value1 < value2

30 Filter expressions: main components
Subqueries can be used in filter conditions: atomicExpression ::= … | existentialSubquery; existentialSubquery ::= "EXISTS", fullGraphQuery; Subqueries can refer to variables from outer scope

31 Examples of existential subqueries
MATCH (x)-[:friend]->(y) WHERE EXISTS { (x)-[:friend]->(z)-[:friend]->(y) } MATCH (x)-[:friend]->()-[:friend]->(z) WHERE NOT EXISTS { (x)-[:friend]->(z) } WHERE clause does not introduce new bindings in the outer scope It just filters MATCH bindings

32 CONSTRUCT GRAPH clause
Recall that a basic graph query is defined as: basicGraphQuery ::= constructGraphClause, matchClause; A CONSTRUCT GRAPH clause is defined as: constructGraphClause ::= “CONSTRUCT GRAPH”, fullProjectionPattern, [ setStatement ], [ removeStatement ] ;

33 Projection patterns fullProjectionPattern ::= basicProjectionPatternWithCondition, { “,”, basicProjectionPatternWithCondition } ; basicProjectionPatternWithCondition ::= basicProjectionPattern, [ “WHEN”, expression ] ;

34 A simple example CONSTRUCT GRAPH (a)-[:knows]->(b)
MATCH (a)-[:friend]->(b) (a)-[:knows]->(b) is a basic projection pattern MATCH clause produces a binding table Each tuple in this table has bindings for variables a and b Basic projection pattern (a)-[:knows]->(b) is instantiated on every tuple of the binding table Each instantiation forms a small graph isomorphic to the basic projection pattern The result of the query is the union of all pattern instantiations on the given binding table

35 A simple example CONSTRUCT GRAPH (a)-[:knows]->(b)
MATCH (a)-[:friend]->(b) Each node pattern and edge pattern with no variable assigned by the user gets an anonymous variable assigned The query above is equivalent to: CONSTRUCT GRAPH (a)-[e:knows]->(b)

36 A simple example CONSTRUCT GRAPH (a)-[e:knows]->(b)
MATCH (a)-[:friend]->(b) Given a tuple in a binding table, the instantiation of the variables in a basic projection pattern is done as follows: If the variable is bound in the tuple (it has a value), then the variable is assigned its value in the tuple If the variable is unbound in the tuple (it does not have a value), then it is assigned a newly created object of the respective kind

37 A simple example CONSTRUCT GRAPH (a)-[e:knows]->(b)
MATCH (a)-[:friend]->(b) In this example, for each tuple in the binding table produced by the MATCH clause: a and b are assigned the values in the tuple e is assigned a newly created object of type edge (a)-[:knows]->(b) is replaced by (a)-[e:knows]->(b) as we need to create an edge between nodes a and b anyway

38 Some more simple examples
CONSTRUCT GRAPH (a)-[e:knows]->(c) MATCH (a)-[:friend]->(b) CONSTRUCT GRAPH (c)-[e:knows]->(d)

39 Objects are immutable An object cannot be modified by the query language We are defining a query language, not an update language The following query is incorrect: CONSTRUCT GRAPH (a:UnrequitedLove) MATCH (a)-[:loves]->(b) WHERE NOT (b)-[:loves]->(a) as it modifies the node object stored in a

40 Objects are immutable Copy patterns like (=n), ()-[=e]->() or (a)-/=p/->(b) can be used in the query language This copies all labels and properties for a created object Additional labels and properties can be specified for the created object The following query is correct: CONSTRUCT GRAPH (=a:UnrequitedLove) MATCH (a)-[:loves]->(b) WHERE NOT (b)-[:loves]->(a)

41 Conditional construction
Assume that property degree for an edge (a)-[:knows]->(b) specifies how well a knows b It is a value from 0 to 1 The following query uses a condition when constructing the output graph: CONSTRUCT GRAPH (a)-[:knowsWell]->(b) WHEN e.degree > 0.8 MATCH (a)-[e:knows]->(b)

42 Use of several WHEN conditions
CONSTRUCT GRAPH (a)-[:knowsWell]->(b) WHEN e.degree > 0.8 (a)-[:barelyKnows]->(b) WHEN e.degree < 0.1 MATCH (a)-[e:knows]->(b)

43 Adding and removing labels and properties
Recall that a CONSTRUCT GRAPH clause is defined as: constructGraphClause ::= “CONSTRUCT GRAPH”,  fullProjectionPattern,      [  setStatement  ], [  removeStatement  ] ; setStatement can be used to add labels and properties to an unbound variable In this context a variable is unbound if it does not occur in the MATCH clause, so no value is assigned to it in the MATCH clause Recall that objects are immutable removeStatement can be used to remove labels and properties from an unbound variable

44 Adding and removing labels and properties
Adding some labels and properties: SET n.value := 1 SET labels(n) := labels(m) SET properties(n) := properties(m) SET n:Person Removing some labels and properties: REMOVE n.value REMOVE n:Person

45 Aggregating graph projection
CONSTRUCT GRAPH (x)-->(y) MATCH (a)-->(b) 2 4 5 1 3 6 7

46 Aggregating graph projection
CONSTRUCT GRAPH (x GROUP a)-->(y) MATCH (a)-->(b) 2 5 1 4 3 6

47 Aggregating graph projection
CONSTRUCT GRAPH (x)-->(y) MATCH (a)-->(b) is equivalent to: CONSTRUCT GRAPH (x GROUP a,b)-->(y) and also to: CONSTRUCT GRAPH (x GROUP a,b)-->(y GROUP a,b)

48 Aggregating graph projection
All grouping variables have to be bound variables in the binding tables Unbound node patterns not specifying grouping variables are implicitly grouped by all variables bound in the binding tables Unbound edge patterns are implicitly grouped by the grouping variables of their adjacent nodes

49 A more elaborated example
We would like to have a query that removes the labels and properties from an input graph It just keeps the structure of the input graph A first failed attempt: CONSTRUCT GRAPH (x)-[u]->(y) MATCH (a)-[e]->(b)

50 A more elaborated example
A second failed attempt: CONSTRUCT GRAPH (=a)-[=e]->(=b) REMOVE … MATCH (a)-[e]->(b) This attempt does not work even if we use the REMOVE clause to remove the labels and properties in a, b and e

51 A more elaborated example
A successful attempt: CONSTRUCT GRAPH (x GROUP a)-[u GROUP e]->(y GROUP b) MATCH (a)-[e]->(b)

52 Existential subqueries revisited
MATCH (x)-[:friend]->()-[:friend]->(z) WHERE NOT EXISTS { (x)-[:friend]->(z) } is a shorthand for: WHERE NOT EXISTS { CONSTRUCT GRAPH (x)-[:friend]->(z) MATCH (x)-[:friend]->(z) }

53 PATH clause We initially define a query as: query ::=
{  graphClause  }, fullGraphQuery; But in the G-CORE grammar is defined as: {  pathClause  |  graphClause  }, fullGraphQuery;

54 PATH clause A PATH clause is used to define path patterns
Syntactically, a path starts on the left with a vertex, and ends at its rightmost vertex PATH name AS ()-[a:A]->()-[b:B]->() The pattern can be augmented to be nonlinear, by using BRANCH PATH name AS (x)-[a:A]->()-[b:B]->(y) BRANCH (x)-[c:C]->(y), (y)-[d:D]->(x)

55 PATH clause A PATH clause can have a COST clause:
PATH name AS ()-[a:A]-()-[b:B]-()-[c:C]-() COST a.value + b.value + c.value A PATH clause can have a WHERE clause: PATH unrequited_love AS (a) -[:loves]->(b) WHERE NOT (b)-[:loves]->(a) which is equivalent to: WHERE NOT EXISTS { (b)-[:loves]->(a) }

56 PATH usage in MATCH Match pairs of friends liking the same type of music: PATH same AS (a:Person)-[:friend]->(b:Person) WHERE a.preferredMusic = b.preferredMusic MATCH (x)-/<~same>/->(y) Pairs of people connected by a path of friends liking the same type of music: MATCH (x)-/<~same*>/->(y)

57 PATH usage in MATCH Chains of friends liking the same type of music:
PATH same AS (a:Person)-[:friend]->(b:Person) WHERE a.preferredMusic = b.preferredMusic MATCH ()-/p<~same*>/->() MATCH ()-/10 SHORTEST p<~same*>/->()

58 PATH usage in MATCH Chains of friends liking the same type of music:
PATH same AS (a:Person)-[:friend]->(b:Person) WHERE a.preferredMusic = b.preferredMusic MATCH ()-/10 DISJUNCT SHORTEST p<~same*>/->()

59 A G-CORE example PATH conf_publ AS ()-[p:PublishedIn]->(:Conference) WHERE NOT p.type = "proceedings" PATH conf_coauthor AS ()<-[:Author]-(p)-[:Author]->() WHERE (p)-/<~conf_publ>/->() CONSTRUCT GRAPH (a)-[e:Coauthor]->(b) SET e.length = length(p) MATCH (a)-/p<~conf_coauthor*>/->(b)

60 Summary G-CORE is a closed query language
Paths are first class citizens in G-CORE The data model used in G-CORE is an extension of property graphs with paths G-CORE was carefully designed to have tractable evaluation G-CORE is an expressive query language   It includes a powerful graph construction mechanism and path expressions

61 Summary Language features:
MATCH matches nodes, edges and paths in a graph, resulting in a binding table WHERE filters the tuples of a binding table CONSTRUCT GRAPH projects a binding table into a graph PATH is used to define path patterns G-CORE also includes CONSTRUCT TABLE to project a binding table into a table

62 CONSTRUCT TABLE clause
query ::= { pathClause | graphClause }, fullQuery; fullQuery ::= fullGraphQuery | tabularQuery; tabularQuery ::= constructTableClause, matchClause;

63 CONSTRUCT TABLE clause
CONSTRUCT TABLE is used in constructTableClause to project a binding table into a table CONSTRUCT TABLE includes a list of items to be returned, where each item is: a variable bound to a graph object (nodes, edges, or paths), or an expression followed by the keyword AS and the name of an unbound variable

64 An example of CONSTRUCT TABLE
CONSTRUCT TABLE a, r.weight, 2*r.weight AS doubleWeight, b.name AS name MATCH (a)-[r]->(b) This returns records with the fields a, r.weight, doubleWeight, name

65 Final comments Multivalued properties can be dealt with by inline aggregation, reducing them to single values An existential subquery in a WHERE clause was defined as: existentialSubquery ::= "EXISTS", fullGraphQuery; but it is defined as follows in G-CORE: existentialSubquery ::= "EXISTS", fullQuery;


Download ppt "G-CORE: The LDBC Graph Query Language Proposal"

Similar presentations


Ads by Google