Graph Database - Neo4j
ISQS3358, Spring 2016
Graph Database
A graph database is a database that uses graph structures for semantic queries, with nodes, edges, and properties to represent and store data. Nodes represent entities such as people, businesses, accounts, or any other item you might want to keep track of. Properties are pertinent information about nodes. Edges are the lines that connect nodes to nodes, or nodes to properties, and they represent the relationship between the two. Most of the important information is stored in the edges.
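The node / edge / property model described above can be sketched in a few lines of Python. This is only an illustrative in-memory structure, not how Neo4j stores data; the class names (`Node`, `Edge`, `PropertyGraph`) and the sample people and business are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity, e.g. a person or a business, with its own properties."""
    node_id: int
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A directed, typed relationship between two nodes."""
    start: int   # node_id of the source node
    end: int     # node_id of the target node
    rel_type: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    def __init__(self):
        self.nodes = {}   # node_id -> Node
        self.edges = []   # all edges, in insertion order

    def add_node(self, node_id, label, **properties):
        self.nodes[node_id] = Node(node_id, label, properties)

    def add_edge(self, start, end, rel_type, **properties):
        self.edges.append(Edge(start, end, rel_type, properties))

    def neighbours(self, node_id, rel_type=None):
        """Nodes reachable over one outgoing edge, optionally filtered by type."""
        return [self.nodes[e.end] for e in self.edges
                if e.start == node_id and rel_type in (None, e.rel_type)]


# Hypothetical sample data.
graph = PropertyGraph()
graph.add_node(1, "Person", name="Alice")
graph.add_node(2, "Person", name="Bob")
graph.add_node(3, "Business", name="Acme")
graph.add_edge(1, 2, "KNOWS", since=2014)
graph.add_edge(1, 3, "WORKS_AT", role="analyst")

friends = [n.properties["name"] for n in graph.neighbours(1, "KNOWS")]
```

Note that the edges carry both a type and their own properties (`since`, `role`), which is where much of the interesting information ends up.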
Graph Database
What are graph databases & When to use a graph database, 3’54”
Graph database case – money laundering, 3’26”
Graph databases: Neo4J, 5’11”
Neo4J
Titan, 4’51”
Titan
GraphX
Use Cases for Neo4j
Neo4j
About Neo4j
Introduced in 2010
Open-source tool
Java-based graph database
Neo4j is a database designed for network-oriented data
It uses Cypher as its graph query language
Neo4j, the Graph Database
A graph database:
  A property graph contains nodes and relationships, with properties on both
  Perfect for highly connected data
A declarative query language, called Cypher
Scalable: could hold a social network of multiple Earths
High performance and reliability, with high availability
Neo4J Model
Neo4j Storage Record Layout
Traversals – how do they work?
Relationship Expanders: given (a path to) a node, return the Relationships to continue traversing from that node
Evaluators: given (a path to) a node, return whether to:
  Continue traversing on that branch (i.e. expand) or not
  Include (the path to) the node in the result set or not
Then a projection to Path, Node or Relationship is applied to each path in the result set
Uniqueness level: policy for when it is OK to revisit a node that has already been visited
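The pieces above (expander, evaluator, projection, uniqueness) can be illustrated with a small pure-Python sketch. It is not Neo4j's traversal API; the toy graph, the function names and the depth limit of 2 are assumptions made for the example. The `unique_nodes` flag loosely mirrors the idea of a node-global uniqueness policy versus no uniqueness at all.

```python
from collections import deque

# Hypothetical toy graph: node -> list of (relationship type, target node).
GRAPH = {
    "a": [("KNOWS", "b"), ("KNOWS", "c")],
    "b": [("KNOWS", "d")],
    "c": [("KNOWS", "d")],
    "d": [("KNOWS", "a")],
}


def expander(path):
    """Given a path to a node, return the relationships to continue along."""
    end = path[-1]
    return [(end, rel, target) for rel, target in GRAPH.get(end, [])]


def evaluator(path):
    """Given a path, return (include_in_result, continue_on_this_branch)."""
    depth = len(path) - 1
    return depth >= 1, depth < 2   # skip the start node, stop at depth 2


def traverse(start, expand, evaluate, unique_nodes=True):
    """Breadth-first traversal returning the list of included paths."""
    results, visited = [], {start}
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        include, keep_going = evaluate(path)
        if include:
            results.append(path)
        if not keep_going:
            continue
        for _source, _rel, target in expand(path):
            if unique_nodes:
                if target in visited:   # never revisit a node
                    continue
                visited.add(target)
            queue.append(path + [target])
    return results


paths = traverse("a", expander, evaluator)
end_nodes = [p[-1] for p in paths]   # projection of each path to a Node
all_paths = traverse("a", expander, evaluator, unique_nodes=False)
```

With uniqueness on, node `d` is reached only once (via `b`); with it off, both `a, b, d` and `a, c, d` appear in the result.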
Cypher - Just convenient traversal descriptions?
Builds on the same infrastructure as Traversals (Expanders), but not on the full Traversal system
Uses graph pattern matching for traversing the graph
Recursive matching with backtracking
START x=... matching x-->y, x-->z, y-->z, z-->a-->b, z-->b
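To make "recursive matching with backtracking" concrete, here is a rough Python sketch that matches the pattern from the slide against a tiny graph. It is not Cypher's actual matcher: the edge set is invented, the start node (left elided on the slide) is arbitrarily taken to be node 1, and, unlike Cypher, nothing stops two variables from binding to the same node.

```python
# Pattern from the slide, as directed (source, target) pairs of variables.
PATTERN = [("x", "y"), ("x", "z"), ("y", "z"),
           ("z", "a"), ("a", "b"), ("z", "b")]

# Hypothetical data graph as a set of directed edges between node ids.
EDGES = {(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (3, 5), (1, 6)}
NODES = {n for edge in EDGES for n in edge}


def match(pattern, binding):
    """Yield every variable binding that satisfies all pattern edges."""
    if not pattern:
        yield dict(binding)   # copy, because binding is mutated later
        return
    (src, dst), rest = pattern[0], pattern[1:]
    for s in ([binding[src]] if src in binding else NODES):
        for d in ([binding[dst]] if dst in binding else NODES):
            if (s, d) not in EDGES:
                continue
            added = {v for v in (src, dst) if v not in binding}
            binding[src], binding[dst] = s, d
            yield from match(rest, binding)   # recurse on remaining edges
            for v in added:                   # backtrack: undo this step
                del binding[v]


# Assumed start: x is bound to node 1, the rest of the pattern is matched.
matches = list(match(PATTERN, {"x": 1}))
```

Candidate bindings such as `y = 6` are tried and then dropped when a later pattern edge cannot be satisfied, which is the backtracking step.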
Neo4j Adoption
Benefits of using Neo4J
Organizes data in networks
Representation is natural and intuitive
High-performance traversal over domain data
Captures semi-structured data easily, which is hard to do in a relational database
Encourages agile methodologies
Lower maintenance costs
Shorter development times
Drawbacks
Because Neo4j uses a navigational model, it is hard to execute arbitrary queries
  Ex: “How many of my customers over age 25 with a last name that starts with an F have purchased items in the last two months?”
Lacks tool and framework support
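One way to see why the example query is awkward for a navigational model: it has no natural start node, so (assuming no index helps) every customer has to be examined. The Python sketch below mimics that scan; the customer data, the fixed "today" date and the 60-day reading of "two months" are all assumptions made for illustration.

```python
from datetime import date, timedelta

# Hypothetical customer nodes and PURCHASED relationships.
CUSTOMERS = {
    1: {"last_name": "Fisher", "age": 31},
    2: {"last_name": "Ford", "age": 22},
    3: {"last_name": "Garcia", "age": 40},
    4: {"last_name": "Flores", "age": 58},
}
PURCHASES = [  # (customer id, purchase date)
    (1, date(2016, 3, 20)),
    (2, date(2016, 3, 25)),
    (3, date(2016, 4, 1)),
    (4, date(2015, 11, 2)),
]


def count_matching_customers(today, window_days=60):
    """Count customers over 25, last name starting with F, recent purchase."""
    cutoff = today - timedelta(days=window_days)
    recent_buyers = {cid for cid, when in PURCHASES if when >= cutoff}
    # No start node is known, so every customer has to be examined.
    return sum(1 for cid, c in CUSTOMERS.items()
               if c["age"] > 25
               and c["last_name"].startswith("F")
               and cid in recent_buyers)


count = count_matching_customers(today=date(2016, 4, 15))
```

A traversal, by contrast, starts from one known node and only touches its neighbourhood.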
From SQL to Cypher
Cypher queries end with a RETURN clause, whereas SQL queries begin with what you want to return
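A side-by-side sketch of the clause order. The SQL runs against an in-memory SQLite table; the Cypher string is only shown for comparison and is not executed. The `person` table, the `Person` label and the sample rows are hypothetical.

```python
import sqlite3

# SQL names the result columns first.
SQL = "SELECT name FROM person WHERE age > 25"

# Roughly equivalent Cypher (for comparison only, not executed here):
# the pattern and the filter come first, RETURN comes last.
CYPHER = "MATCH (p:Person) WHERE p.age > 25 RETURN p.name"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [("Alice", 31), ("Bob", 22), ("Carol", 40)])
names = sorted(row[0] for row in conn.execute(SQL))
conn.close()
```

Reading the two strings left to right shows the difference: SQL says what to return and then where to find it, Cypher describes what to find and then what to return.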
Where is Neo4j used?
Master Data Management
Network and Data Centre
Real-Time Recommendations
Identity and Access Management
Digital Asset Management
Fraud Detection
Social Media
Combining Neo4J and Hadoop
Hadoop is good for data crunching, but the end results are flat files, which makes it hard to visualize network data
Neo4J is perfect for working with networked data
Method:
  Prepare data using Hive queries, which are then transformed into MapReduce jobs
  The MapReduce jobs are used to create nodes and relationships in Neo4J
  Make Neo4J’s batch importer read the files from the cluster directly
  Perform the necessary steps to describe the nodes, relationships and their properties
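The last step, describing nodes, relationships and their properties in files an importer can read, might look roughly like the Python sketch below. The rows stand in for Hive/MapReduce output and are invented. The header names (`personId:ID`, `:LABEL`, `:START_ID`, `:END_ID`, `:TYPE`) follow the CSV header convention of Neo4j's bulk import tool as an assumption; the slides do not say which importer or file layout was used.

```python
import csv
import io

# Hypothetical rows, standing in for the output of the Hive/MapReduce step.
people = [(1, "Alice"), (2, "Bob")]
follows = [(1, 2)]


def to_csv(header, rows):
    """Serialise a header and rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Assumed header layout: an ID column and a label column for nodes,
# start/end ID columns and a type column for relationships.
nodes_csv = to_csv(["personId:ID", "name", ":LABEL"],
                   [(pid, name, "Person") for pid, name in people])
rels_csv = to_csv([":START_ID", ":END_ID", ":TYPE"],
                  [(src, dst, "FOLLOWS") for src, dst in follows])
```

In a real pipeline these would be files written on the cluster rather than in-memory strings.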
Case Study
Demo – Neo4j
Demo..
Install Neo4J
thanks/?edition=community&flavour=winstall64&release=2.3.3 &_ga=
Big Data Exercises