1
A gentle introduction to graph databases
Michael Green
2
Michael Green
Slides available afterwards
DBA.SE chat
3
Plan of Attack
What are graphs
Databases: graph versus relational
SQL Server’s functionality (demo)
Where it might end up
4
What are Graphs
“In a mathematician's terminology, a graph is a collection of points and lines connecting some .. of them.” [1]
Graph databases have a mathematical basis, as do relational databases.
Point = node = vertex. Line = edge = arc.
Points need not be connected. Edges connect exactly two nodes.
5
Graph Features
A node need not be connected.
There is no upper limit on how many other nodes a node can connect to.
An edge connects exactly two nodes.
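The three features above can be sketched in a few lines of Python. This is only an illustration: the `Graph` class and the node names A to D are made up for this example and come from no particular product.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """A graph as a set of nodes plus a list of edges between them."""
    nodes: set[str] = field(default_factory=set)
    edges: list[tuple[str, str]] = field(default_factory=list)

    def add_node(self, name: str) -> None:
        self.nodes.add(name)

    def add_edge(self, a: str, b: str) -> None:
        # An edge has exactly two endpoints; both must already be nodes.
        if a not in self.nodes or b not in self.nodes:
            raise KeyError("both endpoints must be nodes in the graph")
        self.edges.append((a, b))

    def neighbours(self, name: str) -> set[str]:
        # Edges are treated as undirected here, so look at both ends.
        found = set()
        for a, b in self.edges:
            if a == name:
                found.add(b)
            if b == name:
                found.add(a)
        return found


g = Graph()
for n in ("A", "B", "C", "D"):
    g.add_node(n)
g.add_edge("A", "B")
g.add_edge("A", "C")

print(g.neighbours("A"))  # A is connected to two other nodes
print(g.neighbours("D"))  # D is a node with no connections at all
```

Node D stays in the graph even though no edge touches it, and nothing limits how many edges could be added at A.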
6
Graph Features
Directed / undirected
Cyclic / acyclic
Property graphs (weighted)
Connected / discrete components
Simple / multigraph (a simple graph has at most one edge between two nodes)
Self-connected nodes
Labelled
Some terminology. Directed: “*I* sent” and “*to* you” indicate direction. Labels: multiple labels are OK.
7
A simple example
Nodes: Michael (Person), SQL Saturday (Event), SQL Server (Product)
Edges: Presenting at, Is about, Works with
This is a directed graph. Edges are directed: there is a “from” and a “to”. Both nodes and edges can have properties (the name) and labels. Nodes have an in-degree and an out-degree. A node can be disconnected: in-degree = out-degree = zero. This one is cyclic.
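In-degree and out-degree are easy to compute once the edges are written down as (from, label, to) triples. The sketch below uses the three nodes and three edge labels from this slide. The direction of each edge is an assumption, since the diagram itself is not reproduced here, and the `Unconnected` node is hypothetical, added only to show the zero case.

```python
from collections import Counter

# Each edge is (from_node, label, to_node).
# ASSUMPTION: the arrow directions below are guessed from the labels.
edges = [
    ("Michael", "Presenting at", "SQL Saturday"),
    ("SQL Saturday", "Is about", "SQL Server"),
    ("Michael", "Works with", "SQL Server"),
]
# "Unconnected" is a hypothetical extra node with no edges.
nodes = {"Michael", "SQL Saturday", "SQL Server", "Unconnected"}

out_degree = Counter(src for src, _, _ in edges)  # edges leaving each node
in_degree = Counter(dst for _, _, dst in edges)   # edges arriving at each node

for node in sorted(nodes):
    # Counter returns 0 for nodes that never appear on that side of an edge.
    print(node, "in =", in_degree[node], "out =", out_degree[node])
```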
8
Example: Tree
Directed / undirected
Cyclic / acyclic
Property graphs (weighted)
Connected / discrete components
Simple / multigraph - at most one edge between two nodes
Self-connected nodes
Examples: organisation’s org chart, B-tree, query plan.
An ERD is a graph but not a tree.
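A tree is a graph that is connected and acyclic. One way to test that, shown here as a rough sketch and ignoring edge direction, relies on the fact that a tree on n nodes has exactly n - 1 edges. The function name `is_tree` and the small org chart are invented for the example.

```python
def is_tree(nodes, edges):
    """Return True if the undirected graph is connected and acyclic."""
    nodes = set(nodes)
    if not nodes:
        return False
    # A tree on n nodes has exactly n - 1 edges.
    if len(edges) != len(nodes) - 1:
        return False
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    # With n - 1 edges, connected implies acyclic, so only reachability
    # from an arbitrary start node needs checking.
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(adjacency[current] - seen)
    return seen == nodes


people = {"CEO", "CTO", "CFO", "DBA"}
org_chart = [("CEO", "CTO"), ("CEO", "CFO"), ("CTO", "DBA")]

print(is_tree(people, org_chart))                     # an org chart is a tree
print(is_tree(people, org_chart + [("DBA", "CFO")]))  # one extra edge closes a cycle
```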
9
Example: Roads
Directed / undirected – one-way streets
Cyclic / acyclic
Property graphs (weighted) – speed limits, tolls
Connected / discrete components (Tasmania?)
Simple / multigraph - at most one edge between two nodes
Self-connected nodes
10
Directed? Service versus line.
Weighted (journey time).
Cyclic: mostly a tree, except for the loop (the clue is in the name) and a few others.
Connected: we are not interested in stations to which we cannot take a train.
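When edges carry a weight such as journey time, the natural question is the quickest route between two stations. A minimal sketch using Dijkstra's algorithm follows. The station names A to D and the times in minutes are hypothetical and do not come from any real network.

```python
import heapq


def shortest_journey(network, start, goal):
    """Dijkstra's algorithm: minimum total weight from start to goal, or None."""
    best = {start: 0}
    queue = [(0, start)]
    while queue:
        minutes, station = heapq.heappop(queue)
        if station == goal:
            return minutes
        if minutes > best.get(station, float("inf")):
            continue  # stale queue entry; a shorter route was already found
        for neighbour, leg in network.get(station, {}).items():
            candidate = minutes + leg
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return None  # goal cannot be reached from start


# Hypothetical stations and journey times in minutes, listed in both directions.
network = {
    "A": {"B": 4, "C": 10},
    "B": {"A": 4, "C": 3, "D": 8},
    "C": {"A": 10, "B": 3, "D": 2},
    "D": {"B": 8, "C": 2},
}

print(shortest_journey(network, "A", "D"))  # quickest route is A, B, C, D
```

The weights must be non-negative for this approach to be valid, which holds for journey times.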
11
The internet in 2003 - http://www.opte.org/the-internet/
Directed, unweighted, cyclic, multigraph, self-connected
12
What are Graph Databases
“A database that uses graph structures for semantic queries with nodes, edges and properties to represent and store data.” [1]
If the DBMS presents its interface as nodes & edges, it is a graph database.
13
What are Graph Databases
A node is the “thing”.
An edge is how things are connected.
Both can have properties.
Edges and nodes are “labelled”, i.e. enumerable, i.e. can have a PK.
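As a rough illustration of the points above, a property graph can be modelled with two small record types. The class and field names (`Node`, `Edge`, `node_id` and so on) are invented for this sketch, and the `role` property is a made-up value.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int                 # plays the role of a primary key
    labels: set[str]             # a node may carry more than one label
    properties: dict[str, object] = field(default_factory=dict)


@dataclass
class Edge:
    edge_id: int                 # edges are identifiable too
    label: str
    from_id: int                 # key of the node the edge starts at
    to_id: int                   # key of the node the edge ends at
    properties: dict[str, object] = field(default_factory=dict)


michael = Node(1, {"Person"}, {"name": "Michael"})
event = Node(2, {"Event"}, {"name": "SQL Saturday"})

# The "role" property is hypothetical; it only shows that edges hold data too.
presenting = Edge(10, "Presenting at", michael.node_id, event.node_id,
                  {"role": "speaker"})

print(presenting.label, presenting.properties)
```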
14
What are Graph Databases
It’s about how the DBMS interfaces with consumers.
Many internal representations are possible.
On-disk storage is not a determinant.
Key-value, relational and graph can solve any given problem.
In-memory, columnstore, fixedvar – they’re all relational.
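To make the interface-versus-storage point concrete, here is a small sketch of two stores that keep their data quite differently but answer the same node-and-edge question identically. Both classes are hypothetical and far simpler than any real engine.

```python
class AdjacencyStore:
    """Keeps a dict from each node to the set of nodes it points at."""

    def __init__(self):
        self._out = {}

    def add_edge(self, src, dst):
        self._out.setdefault(src, set()).add(dst)
        self._out.setdefault(dst, set())

    def neighbours(self, node):
        return set(self._out.get(node, ()))


class EdgeTableStore:
    """Keeps a flat list of (src, dst) rows, much like a relational edge table."""

    def __init__(self):
        self._rows = []

    def add_edge(self, src, dst):
        self._rows.append((src, dst))

    def neighbours(self, node):
        # A scan with a filter, comparable to SELECT dst ... WHERE src = node.
        return {dst for src, dst in self._rows if src == node}


stores = [AdjacencyStore(), EdgeTableStore()]
for store in stores:
    store.add_edge("Michael", "SQL Saturday")
    store.add_edge("Michael", "SQL Server")

results = [store.neighbours("Michael") for store in stores]
print(results[0] == results[1])  # same answer from both representations
```

A consumer calling `neighbours` cannot tell which representation sits underneath.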
15
Graph on the persistence spectrum
Flat file, key-value, column family, relational, graph
16
Thinking in graphs
Files & key-value: rows & fields
Relational: sets, selections & projections
Graph: sets & paths
17
Graph DB Features
Directed / undirected
Cyclic / acyclic
Property graphs (weighted)
Connected / discrete components
Simple / multigraph - at most one edge between two nodes
Self-connected nodes
Some terminology. Specifically, graph DBs are directed.
18
Graph versus Relational
Relational: Entity type -> table; Entity instance -> row; Relationship -> FK; Normalisation; DRI.
The graph model enforces no container that corresponds to a table. Products are moving toward stronger schema. Labels take the role of defining types. There is no limit on which nodes an edge can connect (cf. joining on non-FKs, e.g. shoe_size <-> house_number <-> description).
19
Graph versus Relational
Relational: Entity type -> table; Entity instance -> row; Relationship -> FK; Normalisation; DRI.
Graph: Entity type -> ? (label); Entity instance -> node; Relationship -> edge; Multi-role permitted; Not mandated.
The graph model enforces no container that corresponds to a table. Products are moving toward stronger schema. Labels take the role of defining types. Edges can have properties; foreign keys cannot. There is no limit on which nodes an edge can connect (cf. joining on non-FKs, e.g. shoe_size <-> house_number <-> description).
20
Use Cases
Where connectedness is as important as, or more important than, content.
Social – friends of friends (of friends of friends …), especially at indeterminate depth.
Fraud detection – “Is X connected to failed companies?”, “Are the parties in this transaction suspiciously connected?”
Network modelling – “If this router goes down, what services are lost?”
Code dependency analysis – “If I change this data type, where must I re-program?”
Crime detection – metadata.
Many other examples.
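Most of these questions are variations on “what can be reached from here, and in how many hops?”. A breadth-first search answers that without knowing the depth in advance. The sketch below uses a made-up friendship graph. The function name `reachable` and the optional `max_depth` limit are choices made for this example only.

```python
from collections import deque


def reachable(adjacency, start, max_depth=None):
    """Breadth-first search: map each reachable node to its hop count from start."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if max_depth is not None and depth[current] >= max_depth:
            continue  # do not expand beyond the requested depth
        for neighbour in adjacency.get(current, ()):
            if neighbour not in depth:
                depth[neighbour] = depth[current] + 1
                queue.append(neighbour)
    del depth[start]  # the start node is not its own friend
    return depth


# Hypothetical friendships; Eve knows nobody.
friends = {
    "Ann": ["Bob"],
    "Bob": ["Ann", "Cat"],
    "Cat": ["Bob", "Dan"],
    "Dan": ["Cat"],
    "Eve": [],
}

print(reachable(friends, "Ann"))               # everyone connected to Ann, any depth
print(reachable(friends, "Ann", max_depth=2))  # friends and friends of friends only
```

The same traversal, run over a graph of routers and services or of code modules and data types, answers the network and dependency questions as well.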
21
SQL Server Demo
22
Alternatives
MS Graph Engine
MS Azure Cosmos DB
Many other vendors – see DB-Engines, Wikipedia
Neo4j, Cypher query language
DataStax: Graph on Cassandra, Gremlin programming API
NodeXL for MS Excel
Cypher / Gremlin is like SQL / LINQ
23
Neo4j Example
24
Where it might end up
RC1 release blog [1]
ALTER existing tables to graph tables
Extended to temporary, in-memory etc.
Transitive closure
Polymorphism
Improved syntax, as for joins
25
Where it might end up
GPUs – node-wise parallel execution
R / Python / data science & AI
Visualisation in SSMS, SSRS
SSAS – OLAP graphs
LINQ to SQL Graph
Path analytic functions
SSMS, SSRS present geography results differently.
26
Some links
Graph processing with SQL Server
Graph version of Wide World Importers
Graph Engine
Azure Cosmos DB