NoSQL: Graph Databases. Databases Why NoSQL Databases?


1 NoSQL: Graph Databases

2 Databases Why NoSQL Databases?

3 Trends in Data

4 Data is getting bigger: “Every 2 days we create as much information as we did up to 2003” – Eric Schmidt, Google

5 Data is more connected: Text, HyperText, RSS, Blogs, Tagging, RDF

6 Trend 2: Connectedness. Information connectivity: Text Documents, Hypertext, Feeds, Blogs, Wikis, UGC, Tagging, Folksonomies, RDFa, Ontologies, GGG (Giant Global Graph)

7 Data is more Semi-Structured: If you tried to collect all the data of every movie ever made, how would you model it? Actors, Characters, Locations, Dates, Costs, Ratings, Showings, Ticket Sales, etc.

8 Architecture Changes Over Time. 1980’s: Single Application (one application, one DB)

9 Architecture Changes Over Time. 1990’s: Integration Database Antipattern (several applications sharing one DB)

10 Architecture Changes Over Time. 2000’s: SOA (each application with its own DB); RESTful, hypermedia, composite apps

11 Side note: RDBMS performance. Salary list, Most Web apps, Social Network, Location-based services

12 NOSQL Not Only SQL

13 Less than 10% of the NOSQL Vendors

14 Four NOSQL Categories

15 Key Value Stores
Most based on Dynamo: Amazon’s Highly Available Key-Value Store
Data Model:
– Global key-value mapping
– Big scalable HashMap
– Highly fault tolerant (typically)
Examples:
– Redis, Riak, Voldemort

16 Key Value Stores: Pros and Cons
Pros:
– Simple data model
– Scalable
Cons:
– Create your own “foreign keys”
– Poor for complex data
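
The "create your own foreign keys" drawback can be illustrated with a minimal sketch. A plain Python dict stands in for the key-value store here; the key scheme (`user:1`), the `friend_keys` field and the `friends_of` helper are all hypothetical names chosen for illustration, not part of any product listed above.

```python
# A plain dict stands in for a key-value store (an assumption for illustration).
# The store knows nothing about the references kept inside the values.
store = {
    "user:1": {"name": "Alice", "friend_keys": ["user:2"]},
    "user:2": {"name": "Bob", "friend_keys": []},
}


def friends_of(store, user_key):
    """Resolve the hand-made "foreign keys" with one extra lookup per reference."""
    user = store[user_key]
    return [store[key]["name"] for key in user["friend_keys"]]


print(friends_of(store, "user:1"))
```

Every hop across a reference is application code, which is why this model becomes awkward once the data is highly interconnected.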

17 Column Family
Most based on BigTable: Google’s Distributed Storage System for Structured Data
Data Model:
– A big table, with column families
– Map Reduce for querying/processing
Examples:
– HBase, HyperTable, Cassandra

18 Column Family: Pros and Cons
Pros:
– Supports Semi-Structured Data
– Naturally Indexed (columns)
– Scalable
Cons:
– Poor for interconnected data

19 Document Databases
Data Model:
– A collection of documents
– A document is a key value collection
– Index-centric, lots of map-reduce
Examples:
– CouchDB, MongoDB

20 Document Databases: Pros and Cons
Pros:
– Simple, powerful data model
– Scalable
Cons:
– Poor for interconnected data
– Query model limited to keys and indexes
– Map reduce for larger queries

21 Graph Databases
Data Model:
– Nodes and Relationships
Examples:
– Neo4j, OrientDB, InfiniteGraph, AllegroGraph

22 Graph Databases: Pros and Cons
Pros:
– Powerful data model, as general as RDBMS
– Connected data locally indexed
– Easy to query
Cons:
– Sharding (lots of people working on this); scales up reasonably well
– Requires rewiring your brain

23 What are graphs good for?
– Recommendations
– Business intelligence
– Social computing
– Geospatial
– Systems management
– Web of things
– Genealogy
– Time series data
– Product catalogue
– Web analytics
– Scientific computing (especially bioinformatics)
– Indexing your slow RDBMS
– And much more!

24 What is a Graph?

25 An abstract representation of a set of objects where some pairs are connected by links. Object: Vertex, Node. Link: Edge, Arc, Relationship.

26 Different Kinds of Graphs: Undirected Graph, Directed Graph, Pseudo Graph, Multi Graph, Hyper Graph

27 More Kinds of Graphs: Weighted Graph, Labeled Graph, Property Graph

28 What is a Graph Database?
– A database with an explicit graph structure
– Each node knows its adjacent nodes
– As the number of nodes increases, the cost of a local step (or hop) remains the same
– Plus an Index for lookups
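
The points on this slide can be sketched in a few lines of Python. This is an in-memory toy under stated assumptions, not how Neo4j or any other product is implemented: `Node`, `TinyGraph`, `relate`, `neighbours` and `find` are hypothetical names. Each node holds direct references to its neighbours, so one hop costs the same however many nodes exist, and a separate index is used only to find a starting node.

```python
class Node:
    """A node that knows its adjacent nodes directly."""

    def __init__(self, **properties):
        self.properties = properties
        self.outgoing = []  # list of (relationship_type, target_node) pairs

    def relate(self, rel_type, other):
        self.outgoing.append((rel_type, other))

    def neighbours(self, rel_type):
        # A local step: only this node's own relationship list is inspected.
        return [target for (rel, target) in self.outgoing if rel == rel_type]


class TinyGraph:
    """Holds the nodes plus a name index used only for lookups."""

    def __init__(self):
        self.nodes = []
        self.name_index = {}

    def create_node(self, name, **properties):
        node = Node(name=name, **properties)
        self.nodes.append(node)
        self.name_index[name] = node
        return node

    def find(self, name):
        return self.name_index[name]


g = TinyGraph()
alice = g.create_node("alice")
bob = g.create_node("bob")
alice.relate("KNOWS", bob)

start = g.find("alice")  # index lookup to find the entry point
print([n.properties["name"] for n in start.neighbours("KNOWS")])
```

The index is consulted once; every later step follows references held by the node itself.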

29 Relational Databases

30 Graph Databases

31

32

33

34 Neo4j Tips
– Each entity table is represented by a label on nodes.
– Each row in an entity table is a node.
– Columns on those tables become node properties.
– Remove technical primary keys, keep business primary keys.
– Add unique constraints for business primary keys, and add indexes for frequently looked-up attributes.

35 Neo4j Tips
– Replace foreign keys with relationships to the other table, and remove the foreign keys afterwards.
– Remove data with default values; there is no need to store those.
– Data in tables that is denormalized and duplicated might have to be pulled out into separate nodes to get a cleaner model.
– Indexed column names (like email1, email2, email3) might indicate an array property.
– Join tables are transformed into relationships; columns on those tables become relationship properties.
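
A rough sketch of these tips, using plain Python data structures rather than any Neo4j API. The tables (`people`, `projects`, `works_on`), their columns, the labels and the `WORKS_ON` relationship type are all hypothetical and exist only to show the mapping: rows become nodes, the technical `id` is dropped in favour of a business key, and join-table rows become relationships that carry the extra column as a property.

```python
# Hypothetical relational data: two entity tables and one join table.
people = [
    {"id": 1, "email": "alice@example.com", "name": "Alice"},
    {"id": 2, "email": "bob@example.com", "name": "Bob"},
]
projects = [{"id": 10, "code": "GRAPH", "title": "Graph migration"}]
works_on = [  # join table with an extra column
    {"person_id": 1, "project_id": 10, "role": "lead"},
    {"person_id": 2, "project_id": 10, "role": "developer"},
]


def rows_to_nodes(rows, label, business_key):
    """Each row becomes a node; the technical primary key 'id' is not stored."""
    nodes = {}
    for row in rows:
        props = {k: v for k, v in row.items() if k != "id"}
        # The old id is kept only as a temporary dict key for wiring relationships.
        nodes[row["id"]] = {
            "label": label,
            "key": props[business_key],
            "properties": props,
        }
    return nodes


person_nodes = rows_to_nodes(people, "Person", "email")
project_nodes = rows_to_nodes(projects, "Project", "code")

# Join-table rows become relationships; the 'role' column becomes a property.
relationships = [
    {
        "start": person_nodes[row["person_id"]]["key"],
        "type": "WORKS_ON",
        "end": project_nodes[row["project_id"]]["key"],
        "properties": {"role": row["role"]},
    }
    for row in works_on
]

for rel in relationships:
    print(rel)
```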

36 Node in Neo4j

37 Relationships in Neo4j Relationships between nodes are a key part of Neo4j.

38 Relationships in Neo4j

39 Twitter and relationships

40 Properties Both nodes and relationships can have properties. Properties are key-value pairs where the key is a string. Property values can be either a primitive or an array of one primitive type. For example String, int and int[] values are valid for properties.
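
The property rules can be expressed as a small check. This is a sketch in Python under stated assumptions, not Neo4j's own validation: `is_valid_property` is a hypothetical helper, Python's `str`, `int`, `float` and `bool` stand in for the primitive types, and a Python list stands in for an array.

```python
PRIMITIVES = (str, int, float, bool)  # stand-ins for the primitive value types


def is_valid_property(key, value):
    """Return True if key is a string and value is a primitive or a
    non-empty list whose items all share one primitive type."""
    if not isinstance(key, str):
        return False
    if isinstance(value, PRIMITIVES):
        return True
    if isinstance(value, list) and value:
        first_type = type(value[0])
        return first_type in PRIMITIVES and all(type(v) is first_type for v in value)
    # Empty lists are rejected in this sketch because their item type is unknown.
    return False


print(is_valid_property("name", "Thomas Anderson"))
print(is_valid_property("scores", [1, 2, 3]))
print(is_valid_property("mixed", [1, "two"]))
```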

41 Properties

42 Paths in Neo4j A path is one or more nodes with connecting relationships, typically retrieved as a query or traversal result.

43 Traversals in Neo4j Traversing a graph means visiting its nodes, following relationships according to some rules. In most cases only a subgraph is visited, as you already know where in the graph the interesting nodes and relationships are found. The Traversal API supports both depth-first and breadth-first ordering.
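
The difference between the two orderings can be shown with a minimal sketch. It uses a plain adjacency dict rather than the Neo4j Traversal API; the `graph` data, the `KNOWS` relationship type and the `traverse` function are hypothetical. The rules here are "follow only one relationship type" and "stop at a maximum depth", so only part of the graph is visited.

```python
from collections import deque

# Hypothetical graph: node name -> list of (relationship_type, target) pairs.
graph = {
    "a": [("KNOWS", "b"), ("KNOWS", "c")],
    "b": [("KNOWS", "d")],
    "c": [("KNOWS", "d")],
    "d": [("KNOWS", "e")],
    "e": [],
}


def traverse(graph, start, rel_type, max_depth, breadth_first=True):
    """Visit nodes reachable from start via rel_type, at most max_depth hops away."""
    frontier = deque([(start, 0)])
    visited = []
    while frontier:
        # A queue (popleft) gives breadth-first order, a stack (pop) depth-first.
        node, depth = frontier.popleft() if breadth_first else frontier.pop()
        if node in visited:
            continue
        visited.append(node)
        if depth == max_depth:
            continue  # rule: do not expand beyond the depth limit
        for rel, target in graph[node]:
            if rel == rel_type:  # rule: follow only the requested relationship type
                frontier.append((target, depth + 1))
    return visited


print(traverse(graph, "a", "KNOWS", max_depth=2, breadth_first=True))
print(traverse(graph, "a", "KNOWS", max_depth=2, breadth_first=False))
```

With a depth limit of 2, node `e` is never visited in either ordering, which is the "only a subgraph is visited" case.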

44 Starting and Stopping

45 Preparing the database

46 Wrap mutating operations in a transaction.
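
The idea of wrapping mutations in a transaction can be sketched generically. This is not the Neo4j transaction API: the `transaction` context manager below is a hypothetical helper over a plain dict, assuming a simple snapshot-and-restore strategy, and it only illustrates the all-or-nothing behaviour.

```python
import copy
from contextlib import contextmanager


@contextmanager
def transaction(graph):
    """Apply all mutations made inside the block, or none if an error occurs."""
    snapshot = copy.deepcopy(graph)  # assumption: the graph is small enough to copy
    try:
        yield graph
    except Exception:
        # Roll back to the state before the block, then re-raise the error.
        graph.clear()
        graph.update(snapshot)
        raise


graph = {"nodes": {}}

with transaction(graph) as tx:
    tx["nodes"]["first"] = {"message": "Hello"}

try:
    with transaction(graph) as tx:
        tx["nodes"]["second"] = {"message": "World"}
        raise ValueError("simulated failure")
except ValueError:
    pass

print(sorted(graph["nodes"]))
```

The first block commits its node; the second fails midway, so its node does not survive.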

47 Creating a small graph

48 Print the data

49 Remove the data

50 The Matrix Graph Database

51 Traversing the Graph

