Presentation is loading. Please wait.

Presentation is loading. Please wait.

COMP5338 – Advanced Data Models

Similar presentations


Presentation on theme: "COMP5338 – Advanced Data Models"— Presentation transcript:

1 COMP5338 – Advanced Data Models
Week 6: Graph Data Model and Introduction to Neo4j

2 Outline Brief Review of Graphs Examples of Graph Data
Graph Engines and Graph Databases Introduction to Neo4j COMP5338 "Advanced Data Models" (A. Fekete & Y. Zhou)

3 Graphs A graph is just a collection of vertices and edges
Vertex is also called Node Edge is also called Arc/Link COMP5338 "Advanced Data Models" (A. Fekete & Y. Zhou)

4 Type of Graphs Undirected graphs Directed graphs
Edges have no orientation (direction) (a, b) is the same as (b, a) Directed graphs Edges have orientation (direction) (a, b) is not the same as (b, a) COMP5338 "Advanced Data Models" (A. Fekete & Y. Zhou)

5 Representing Graph Data
Data structures used to store graphs in programs Adjacency list Adjacency matrix COMP5338 "Advanced Data Models" (A. Fekete & Y. Zhou)

6 Adjacency List Directed Graph Undirected Graph
COMP5338 "Advanced Data Models" (A. Fekete & Y. Zhou)

7 Adjacency matrix – Directed Graph
Undirected Graph COMP5338 "Advanced Data Models" (A. Fekete & Y. Zhou)

8 Outline Brief Review of Graphs Examples of Graph Data
Graph Engines and Graph Databases Introduction to Neo4j COMP5338 "Advanced Data Models" (A. Fekete & Y. Zhou)

9 Examples of graphs Social graphs Computer Network topologies
Organization structure Facebook, LinkedIn, etc. Computer Network topologies Data centre layout Network routing tables Road, Rail and Airline networks Biological data COMP5338 "Advanced Data Models" (A. Fekete & Y. Zhou)

10 Social Graphs and extension
A small social graph Related information can be captured in the graph Page 2 of the graph database book Page 3 of the graph database book COMP5338 "Advanced Data Models" (A. Fekete & Y. Zhou)

11 Social Graph with Various Relationships
Page 19 of the graph database book COMP5338 "Advanced Data Models" (A. Fekete & Y. Zhou)

12 Transaction information
Page 23 of the graph database book COMP5338 "Advanced Data Models" (A. Fekete & Y. Zhou)

13 Outline Brief Review of Graphs Examples of Graph Data
Graph Engines and Graph Databases Introduction to Neo4j COMP5338 "Advanced Data Models" (A. Fekete & Y. Zhou)

14 Typical way to store connected data
Who are Bob’s friends? SELECT p1.Person FROM Person p1 JOIN PersonFriend pf ON pf.FriendID = p1.ID JOIN Person p2 ON pf.PersonID = p2.ID WHERE p2.Person = ”Bob” Page 13 of the graph database book COMP5338 "Advanced Data Models" (A. Fekete & Y. Zhou)

15 RDBMS to store Graphs Who are friends with Bob? SELECT p1.Person
FROM Person p1 JOIN PersonFriend pf ON pf.PersonID = p1.ID JOIN Person p2 ON pf.FriendID = p2.ID WHERE p2.Person = ”Bob” COMP5338 "Advanced Data Models" (A. Fekete & Y. Zhou)

16 RDBMS to store Graphs Who are Alice’s friends-of-friends?
SELECT p1.Person AS PERSON, p2.Person AS FRIEND_OF_FRIEND FROM PersonFriend pf1 JOIN Person p1 ON pf1.PersonID = p1.ID JOIN PersonFriend pf2 ON pf2.PersonID = pf1.FriendID JOIN Person p2 ON pf2.FriendID = p2.ID WHERE p1.Person = ”Alice” AND pf2.FriendID <> p1.ID COMP5338 "Advanced Data Models" (A. Fekete & Y. Zhou)

17 Graph Technologies Graph Processing Graph Databases
take data in any input format and perform graph related operations OLAP – OnLine Analysis Processing of graph data Google Pregel, Apache Giraph Graph Databases manage, query, process graph data support high-level query language native storage of graph data OLTP – OnLine Transaction Processing possible OLAP – also possible COMP5338 "Advanced Data Models" (A. Fekete & Y. Zhou)

18 Data Models of Graph Databases
RDF (Resource Description Framework) Model Express node-edge relation as “subject, predicate, object” triple (RDF statement) SPARQL query language Examples: AllegroGraph, Apache Jena Property Graph Model Express node and edge as object like entities, both can have properties Various query language Examples Apache Titan Support various NoSQL storage engine: BerkeleyDB, Cassandra, HBase Structural query language: Gremlin Neo4j Native storage manager for graph data (Index-free Adjacency) Declarative query language: Cypher query language COMP5338 "Advanced Data Models" (A. Fekete & Y. Zhou)

19 Resource Description Framework
Proposed in 1999 by W3C to describe resources on the web as part of semantic web activity Designed to be read and understood by computers A major use case is to write ontologies and to support inference RDF identifies resources using URI and describe resources with properties and property values Descriptions are a collection of triples Subject – what does it describe (the resource) Predicate – what property of the subject Object – what is the value of the property Example – description of a web page Subject – index.html Predicate – creation date Object – 3 September, 2013 index.html has a creation-date of 3 September, 2013 COMP5338 "Advanced Data Models" (A. Fekete & Y. Zhou)

20 RDF/XML An XML syntax for RDF Describes the following:
<xml version=”1.0”?> <rdf:RDF xmlns:rdf=” xmlns:exterms=” <rdf:Description rdf:about=” <exterms:creation-date>September 3, 2013</exterms:creation-date> </rdf:Description> </rdf:RDF> Describes the following: has a creation-date of September 3, 2013 COMP5338 "Advanced Data Models" (A. Fekete & Y. Zhou)

21 RDF/Triples <rdf:RDF xmlns:rdf=" xmlns:foaf=" xmlns:rdfs=" <foaf:Person rdf:nodeID=“Pi"> <foaf:name>Pi</foaf:name> <foaf:age>14</foaf:age> <foaf:knows> <foaf:Person rdf:nodeID=“R.Parker”> <foaf:name>Richard Parker</foaf:name> </foaf:Person> </foaf:knows> </rdf:RDF> RDF based graph storage system storage and retrieval Triples using SQL-like declarative query language SPARQL Four Triples Pi foaf:name “Pi” Pi foaf:age “14” Pi foaf:knows R.Parker R.Parker foaf:name “Richard Parker” COMP5338 "Advanced Data Models" (A. Fekete & Y. Zhou)

22 Property Graph Model Proposed by Neo technology
No standard definition or specification Both Nodes and Edges can have property RDF model cannot express edge property in a natural and easy to understand way The actual storage varies The query language varies since: 1 year common: sailing COMP5338 "Advanced Data Models" (A. Fekete & Y. Zhou)

23 Outline Brief Review of Graphs Examples of Graph Data
Graph Engines and Graph Databases Introduction to Neo4j COMP5338 "Advanced Data Models" (A. Fekete & Y. Zhou)

24 Neo4j Native graph storage using property graph model
Index-free Adjacency Nodes and Relationships are stored Supports property indexes Replication Single master-multiple slaves replication Neo4j cluster is limited to master/slave replication configuration Database engine is not distributed Cypher – query language Also supports SPARQL We will discuss the internals and indexing next week COMP5338 "Advanced Data Models" (A. Fekete & Y. Zhou)

25 Property Graph Model Property graph has the following characteristics
It contains nodes and relationships Nodes contain properties Properties are stored in the form of key-value pairs A node can have labels Relationships connect nodes Has a direction, a type/name, start node, end node No dangling relationships (can’t delete node with a relationship) Properties Both nodes and relationships have properties Useful in modeling and querying based on properties of relationships COMP5338 "Advanced Data Models" (A. Fekete & Y. Zhou)

26 Property Graph ModelExample
COMP5338 "Advanced Data Models" (A. Fekete & Y. Zhou)

27 Property Graph Model: Nodes
COMP5338 "Advanced Data Models" (A. Fekete & Y. Zhou)

28 Property Graph Model: Relationships
COMP5338 "Advanced Data Models" (A. Fekete & Y. Zhou)

29 Property Graph Model: Properties
COMP5338 "Advanced Data Models" (A. Fekete & Y. Zhou)

30 Property Graph Model: Labels
A node may be labeled with any number of labels, including none, making labels an optional addition to the graph COMP5338 "Advanced Data Models" (A. Fekete & Y. Zhou)

31 Property Graph Model: Paths
A path is one or more nodes with connecting relationships, typically retrieved as a query or traversal result. A path of length zero A path of length one COMP5338 "Advanced Data Models" (A. Fekete & Y. Zhou)

32 Cypher Cypher is a query language specific to Neo4j
Easy to read and understand It uses patterns to represents core concepts in the property graph model It uses clauses to build queries; Certain clauses and keywords are inspired by SQL COMP5338 "Advanced Data Models" (A. Fekete & Y. Zhou)

33 Cypher patterns: node A single node Related nodes Labels
 A node is described using a pair of parentheses, and is typically given an identifier E.g.: (n) means a node n Related nodes Related nodes are expressed using an arrow between two nodes E.g. (a)-->(b); (a)-->(b)<--(c); or (a)-->()<--(c) Labels Label(s) can be attached to a node E.g.: (a:User)-->(b) Specifying properties Properties are a list of name value pairs enclosed in a curly brackets E.g.: (a { name: "Andres", sport: "Brazilian Ju-Jitsu" }) COMP5338 "Advanced Data Models" (A. Fekete & Y. Zhou)

34 Cypher patterns: relationships
Basic Relationships Directions are not important: (a)--(b) Named relationship: (a)-[r]->(b) Named and typed relationship: (a)-[r:REL_TYPE]->(b) Specifying Relationship that may belong to one of a set of types: (a)-[r:TYPE1|TYPE2]->(b) Typed but not named relationship: (a)-[:REL_TYPE]->(b) Relationship of variable lengths (a)-[*2]->(b) describes a path of length 2 between node a and node b (a)-[*3..5]->(b) describes a path of minimum length of 3 and maximum length of 5 between node a and node b Either bound can be omitted (a)-[*3..]->(b), (a)-[*..5]->(b) Both bounds can be omitted as well (a)-[*]->(b) COMP5338 "Advanced Data Models" (A. Fekete & Y. Zhou)

35 Cypher Clauses - Create
Create nodes or relationships with properties Create a node matrix1 with the label Movie CREATE (matrix1:Movie {title:'The Matrix', released:1999, tagline:'Welcome to the Real World'}) Create a node keanu with the label Actor CREATE (keanu:Actor {name:'Keanu Reeves', born:1964}) Create a relationship ACTS_IN CREATE (keanu)-[:ACTS_IN {roles:'Neo'}]->(matrix1) COMP5338 "Advanced Data Models" (A. Fekete & Y. Zhou)

36 Cypher – Read MATCH … RETURN Return all nodes
MATCH (n) RETURN n Return all nodes with a given label MATCH (movie:Movie) RETURN movie Return all actors in the movie “The Matrix” MATCH (a:Actor)-[:ACTS_IN]->(m:Movie{title:"The Matrix"}) RETURN a.name COMP5338 "Advanced Data Models" (A. Fekete & Y. Zhou)

37 Cypher - Update MATCH … SET/REMOVE … RETURN
Set all the age property for all actor nodes MATCH (n:Actor) SET n.age = n.born RETURN n Remove a property REMOVE n.age Remove a label MATCH (n:Actor{name:"Keanu Reeves"}) REMOVE n:Actor COMP5338 "Advanced Data Models" (A. Fekete & Y. Zhou)

38 Cypher - Delete MATCH … DELETE Delete relationship
MATCH (n{name:"Keanu Reeves"})-[r:ACTS_IN]->() DELETE r Delete a node and all possible relationship MATCH (m{title:'The Matrix'})-[r]-() DELETE m,r COMP5338 "Advanced Data Models" (A. Fekete & Y. Zhou)

39 Upcoming Assessment Quiz tonight in the lab!
Group assignment instruction can be viewed in Elearning and edlms The data set can be downloaded from ELearning Work in group of up to 3 students Both design and implementation parts Submit a report and demo your system On week 10 You can form group now An Elearning link will be available for group sign up soon COMP5338 "Advanced Data Models" (A. Fekete & Y. Zhou)

40 References Ian Robinson, Jim Webber and Emil Eifrem, Graph Databases, Second Edition, O’Reilly Media Inc., June 2015 You can download this book from the Neo4j site, will redirect you to The Neo4j Manual v2.2.5 The Neo4j Graph Database ( Cypher Query Language ( Noel Yuhanna, Market Overview: Graph Databases, Forrester White Paper, May, 2015 Renzo Angeles, A Comparison of Current Graph Data Models, ICDE Workshops 2013 (DOI /ICDEW ) Renzo Angeles and Claudio Gutierrez, Survey of Graph Database Models, ACM Computing Surveys, Vol. 40, N0. 1, Article 1, February 2008 (DOI / ) COMP5338 "Advanced Data Models" (A. Fekete & Y. Zhou)


Download ppt "COMP5338 – Advanced Data Models"

Similar presentations


Ads by Google