THE GRAPH REVOLUTION How to change the way you think about NSFs and achieve Nirvana
MISSION Make you more productive Solve all your problems Blow your mind
WHO AM I? Co-founder of OpenNTF.org Principal at Red Pill Development Champion, blogger, loudmouth, etc Lead developer, OpenNTF Domino API
THE NUMBERS PROBLEM Thousands of small data silos (NSFs) Hundreds of indexes in each Thousands of documents in each
THE LOGIC PROBLEM Data schemas are in UI Serialization options are limited Relationships are a lot of work
WHAT IS A GRAPH? Elements (vertexes and edges) Key/Value pairs Index-free adjacency
WHY IS A GRAPH? Speed Scalability Intuitive
PEOPLE GRAPH Each person is a vertex Each relationship is an edge Nathan (v1) knows (e1) Christian (v2)
MOVIE GRAPH Each movie is a vertex Each crew member is a vertex Each character is a vertex The Matrix (v1) stars (e1) Keanu (v2) Keanu (v2) portrays (e2) Neo (v3) Neo (v3) appearsIn (e3) The Matrix (v1)
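The movie graph above can be sketched in a few lines of plain Java to show what index-free adjacency means in practice: each vertex holds its own edges, so a traversal never consults a global index. This is only an illustration; `MovieGraph`, `Vertex`, `Edge` and their methods are hypothetical names, not the TinkerPop or OpenNTF Domino API classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal in-memory graph; every name here is hypothetical. */
class MovieGraph {
    static class Vertex {
        final String name;
        // Index-free adjacency: the edges hang directly off the vertex.
        final List<Edge> outEdges = new ArrayList<>();

        Vertex(String name) { this.name = name; }

        /** Creates a labelled edge from this vertex to the target. */
        Edge addEdge(String label, Vertex target) {
            Edge e = new Edge(label, this, target);
            outEdges.add(e);
            return e;
        }

        /** Follows outgoing edges that carry the given label. */
        List<Vertex> out(String label) {
            List<Vertex> result = new ArrayList<>();
            for (Edge e : outEdges) {
                if (e.label.equals(label)) result.add(e.to);
            }
            return result;
        }
    }

    static class Edge {
        final String label;
        final Vertex from;
        final Vertex to;

        Edge(String label, Vertex from, Vertex to) {
            this.label = label;
            this.from = from;
            this.to = to;
        }
    }

    /** Builds the three vertexes and three edges from the slide; returns The Matrix (v1). */
    static Vertex build() {
        Vertex matrix = new Vertex("The Matrix");
        Vertex keanu = new Vertex("Keanu");
        Vertex neo = new Vertex("Neo");
        matrix.addEdge("stars", keanu);     // e1
        keanu.addEdge("portrays", neo);     // e2
        neo.addEdge("appearsIn", matrix);   // e3
        return matrix;
    }
}
```

Starting from The Matrix, `out("stars")` reaches Keanu, `out("portrays")` reaches Neo, and `out("appearsIn")` arrives back at the starting vertex.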
WORKFLOW GRAPH Each request is a vertex Each task is a vertex Each user is a vertex Request (v1) requires (e1) Submission (v2) User (v3) submits (e2) Submission (v2) Request (v1) requires (e3) Validation (v4) User (v3) assigns (e4) Validation (v4) Validation (v4) assignedTo (e5) User (v5) User (v5) approves (e6) Validation (v4)
WHAT IS AN NSF? Documents Item/Value pairs Extraordinarily bad indexes
GRAPH -> NSF Elements -> Documents Key/Value pairs -> Item/Value pairs Index-free -> Terrible indexing MATCH MADE IN HEAVEN?
TINKERPOP API JDBC for Graphs tinkerpop.blueprints defines structural rules Graphs, Vertex, Edge, Transactions Implementation limited to version 2.6 because 3 requires Java 8
OPENNTF DOMINO API Documents with keys (Serializable -> MD5 -> UNID) Auto-type coercion Document implements Map<String, Object> includes Document.get("fName + lName")
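The key-to-UNID idea works because an MD5 digest is 16 bytes, which is exactly 32 hex characters, the same length as a UniversalID. The sketch below shows that derivation for a plain String key. It is an assumption-laden illustration: `KeyToUnid.toUnid` is a hypothetical helper, and the actual OpenNTF Domino API accepts any Serializable and may encode the key differently.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: derives a UNID-shaped id from a String key. */
class KeyToUnid {
    static String toUnid(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            // Assumption: the key is encoded as UTF-8 before hashing.
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(32);
            for (byte b : digest) {
                hex.append(String.format("%02X", b & 0xFF));
            }
            return hex.toString(); // 16 bytes -> 32 hex characters
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }
}
```

Because the same key always hashes to the same id, a document can be fetched by key without any view lookup.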
IMPLEMENTATION 1.0 Create single NSF Each Vertex is a Document Each Edge is a Document Each Vertex has an IN Edge id (unid) list Each Vertex has an OUT Edge id (unid) list Each Edge has IN id (unid) and OUT id Each Edge has label property Vertex.forAll(IN).getLabel("knows").getVertex(OUT)
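The 1.0 layout can be sketched with plain maps standing in for documents: every vertex document carries an IN and an OUT list of edge ids, and every edge document carries its label and the ids of its two ends. This is a rough model only; `DocGraphV1` and its method names are hypothetical, and a `HashMap` stands in for the NSF.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the 1.0 storage layout. */
class DocGraphV1 {
    // Stand-in for the single NSF: id -> document (item/value pairs).
    final Map<String, Map<String, Object>> docs = new HashMap<>();
    private int nextId = 0;

    String addVertex() {
        String id = "v" + (nextId++);
        Map<String, Object> doc = new HashMap<>();
        doc.put("IN", new ArrayList<String>());   // ids of incoming edges
        doc.put("OUT", new ArrayList<String>());  // ids of outgoing edges
        docs.put(id, doc);
        return id;
    }

    String addEdge(String outVertexId, String label, String inVertexId) {
        String id = "e" + (nextId++);
        Map<String, Object> doc = new HashMap<>();
        doc.put("label", label);
        doc.put("OUT", outVertexId); // vertex the edge leaves
        doc.put("IN", inVertexId);   // vertex the edge enters
        docs.put(id, doc);
        edgeIds(outVertexId, "OUT").add(id);
        edgeIds(inVertexId, "IN").add(id);
        return id;
    }

    @SuppressWarnings("unchecked")
    private List<String> edgeIds(String vertexId, String direction) {
        return (List<String>) docs.get(vertexId).get(direction);
    }

    /** Mirrors Vertex.forAll(IN).getLabel("knows").getVertex(OUT) from the slide. */
    List<String> sourcesOfIncoming(String vertexId, String label) {
        List<String> result = new ArrayList<>();
        for (String edgeId : edgeIds(vertexId, "IN")) {
            Map<String, Object> edge = docs.get(edgeId); // every incoming edge is opened
            if (label.equals(edge.get("label"))) {
                result.add((String) edge.get("OUT"));
            }
        }
        return result;
    }
}
```

Note that the traversal opens every incoming edge document just to check its label, whichever label was asked for.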
EXPERIENCE 1.1 (MAY 2013) Almost all requests for Edges based on label Vertex has an Edge id list for each label Vertex.forAll(IN, "knows").getVertex(OUT) Dramatic performance gains (20x)
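A per-label edge list can be modeled by storing one item per direction and label on the vertex document, so a label-specific request only touches the edges it needs. In the sketch below the item naming scheme (`IN_knows`, `OUT_likes`) and the class `LabelledEdgeLists` are assumptions made for illustration; the `edgeDocsRead` counter exists only to make the saving visible and does not reproduce the 20x figure.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the 1.1 layout: one edge id list per direction and label. */
class LabelledEdgeLists {
    final Map<String, Map<String, Object>> docs = new HashMap<>();
    int edgeDocsRead = 0; // how many edge documents a traversal had to open
    private int nextId = 0;

    String addVertex() {
        String id = "v" + (nextId++);
        docs.put(id, new HashMap<String, Object>());
        return id;
    }

    String addEdge(String outVertexId, String label, String inVertexId) {
        String id = "e" + (nextId++);
        Map<String, Object> edge = new HashMap<>();
        edge.put("label", label);
        edge.put("OUT", outVertexId);
        edge.put("IN", inVertexId);
        docs.put(id, edge);
        edgeIds(outVertexId, "OUT", label).add(id);
        edgeIds(inVertexId, "IN", label).add(id);
        return id;
    }

    /** Returns (creating if needed) the list stored in an item such as "IN_knows". */
    @SuppressWarnings("unchecked")
    private List<String> edgeIds(String vertexId, String direction, String label) {
        Map<String, Object> vertex = docs.get(vertexId);
        String item = direction + "_" + label; // assumed naming scheme
        List<String> ids = (List<String>) vertex.get(item);
        if (ids == null) {
            ids = new ArrayList<>();
            vertex.put(item, ids);
        }
        return ids;
    }

    /** Mirrors Vertex.forAll(IN, "knows").getVertex(OUT) from the slide. */
    List<String> sourcesOfIncoming(String vertexId, String label) {
        List<String> result = new ArrayList<>();
        for (String edgeId : edgeIds(vertexId, "IN", label)) {
            edgeDocsRead++; // only edges with the requested label are opened
            result.add((String) docs.get(edgeId).get("OUT"));
        }
        return result;
    }
}
```

With one "knows" edge and many "likes" edges pointing at the same vertex, a "knows" request opens a single edge document instead of all of them.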
EXPERIENCE 1.2 (JUNE 2013) Vertexes need models get/setProperty just too open-ended
RESULTS Hundreds of thousands of vertexes Millions of edges NO SWEAT
A QUESTION… If each Vertex is a Document, why can’t every Document be a Vertex?
THE DREAM Tens of millions of enterprise documents. Decades of accumulated knowledge. One big warehouse. No migration required.
IMPLEMENTATION 2.0 Vertexes need models; models are hard. Graph must consume many NSFs UniversalID not enough; need MetaversalID Can’t modify some Vertexes
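TINKERPOP.FRAMES Java models for Graph data Interfaces
public interface User extends VertexFrame {
  @Property("firstName") public String getFirstName();
  @Property("firstName") public void setFirstName(String firstName);
  @Incidence(label = "likes") public Iterable<Edge> getLikes();
  @Incidence(label = "likes") public Edge addLikes(Vertex vertex);
}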
GRAPH SHARDING One graph can have many ElementStores (NSFs) Element stores based on Frame interfaces Stores respect ACLs and can cross servers Can store vertexes and/or edges
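One way to picture stores "based on Frame interfaces" is a small router that maps a frame type to the NSF holding its elements, falling back to a default store. The sketch below is hypothetical throughout: `ShardRouter`, the sample frame interfaces and the NSF paths are invented for illustration, and ACLs and cross-server access are not modeled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical router from frame interface to element store (NSF path). */
class ShardRouter {
    // Sample frame interfaces, invented for this sketch.
    interface User {}
    interface Manager extends User {}
    interface Movie {}

    private final Map<Class<?>, String> storeByFrame = new LinkedHashMap<>();
    private final String defaultStore;

    ShardRouter(String defaultStore) {
        this.defaultStore = defaultStore;
    }

    void register(Class<?> frameType, String nsfPath) {
        storeByFrame.put(frameType, nsfPath);
    }

    /** Picks the first registered store whose frame type the given type is or extends. */
    String storeFor(Class<?> frameType) {
        for (Map.Entry<Class<?>, String> entry : storeByFrame.entrySet()) {
            if (entry.getKey().isAssignableFrom(frameType)) {
                return entry.getValue();
            }
        }
        return defaultStore;
    }
}
```

Registering `User` against one NSF and `Movie` against another sends a `Manager` frame to the user store, while unregistered types land in the default store.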
METAVERSALIDS ReplicaID + UniversalID 16 char hex + 32 char hex = bulky 16 char hex = 64-bit number, AKA a long A long can hold the same information NoteCoordinate (x,y,z) Stores as byte[24]
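The arithmetic behind the metaversal id is that 16 hex characters encode 64 bits, so a 16-character ReplicaID fits in one long and a 32-character UniversalID fits in two, for 24 bytes in total. The sketch below assumes x holds the ReplicaID and y and z hold the two halves of the UniversalID; that assignment, and the class `NoteCoordinateSketch` itself, are illustrative assumptions rather than the real NoteCoordinate implementation.

```java
import java.nio.ByteBuffer;

/** Hypothetical packing of ReplicaID + UniversalID into three longs. */
class NoteCoordinateSketch {
    final long x; // assumed: ReplicaID
    final long y; // assumed: first half of the UniversalID
    final long z; // assumed: second half of the UniversalID

    NoteCoordinateSketch(long x, long y, long z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    static NoteCoordinateSketch fromHex(String replicaId, String unid) {
        if (replicaId.length() != 16 || unid.length() != 32) {
            throw new IllegalArgumentException("expected 16 and 32 hex characters");
        }
        return new NoteCoordinateSketch(
                Long.parseUnsignedLong(replicaId, 16),
                Long.parseUnsignedLong(unid.substring(0, 16), 16),
                Long.parseUnsignedLong(unid.substring(16), 16));
    }

    /** 3 longs x 8 bytes = 24 bytes, versus 48 characters of hex. */
    byte[] toBytes() {
        return ByteBuffer.allocate(24).putLong(x).putLong(y).putLong(z).array();
    }

    static NoteCoordinateSketch fromBytes(byte[] bytes, int offset) {
        ByteBuffer buf = ByteBuffer.wrap(bytes, offset, 24);
        return new NoteCoordinateSketch(buf.getLong(), buf.getLong(), buf.getLong());
    }

    /** Restores the 48-character ReplicaID + UniversalID form. */
    String toHex() {
        return String.format("%016X%016X%016X", x, y, z);
    }
}
```

The round trip from hex to bytes and back loses nothing, which is what lets the compact form stand in for the bulky one.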
METAVERSALID LISTS NoteList is a fast List Stores as byte[size*24] Document.writeBinary("knowsList", NoteList.toBytes())
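A list of 24-byte coordinates serializes naturally into one flat array of size*24 bytes, which is the shape the slide describes. The sketch below shows that packing and unpacking with a `ByteBuffer`; `NoteListSketch` is a hypothetical stand-in and does not reflect how the real NoteList is implemented internally.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical list of (x, y, z) coordinates stored as byte[size * 24]. */
class NoteListSketch {
    static final int ENTRY_BYTES = 24; // three 8-byte longs per entry

    private final List<long[]> entries = new ArrayList<>();

    void add(long x, long y, long z) {
        entries.add(new long[] { x, y, z });
    }

    int size() {
        return entries.size();
    }

    long[] get(int index) {
        return entries.get(index);
    }

    /** Packs every entry back to back into a single array. */
    byte[] toBytes() {
        ByteBuffer buf = ByteBuffer.allocate(entries.size() * ENTRY_BYTES);
        for (long[] e : entries) {
            buf.putLong(e[0]).putLong(e[1]).putLong(e[2]);
        }
        return buf.array();
    }

    /** Rebuilds the list from a flat array produced by toBytes(). */
    static NoteListSketch fromBytes(byte[] bytes) {
        if (bytes.length % ENTRY_BYTES != 0) {
            throw new IllegalArgumentException("length must be a multiple of 24");
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        NoteListSketch list = new NoteListSketch();
        while (buf.remaining() >= ENTRY_BYTES) {
            // Java evaluates arguments left to right, so x, y, z are read in order.
            list.add(buf.getLong(), buf.getLong(), buf.getLong());
        }
        return list;
    }
}
```

The resulting array is what would be handed to a binary write such as the `writeBinary` call on the slide.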
MODIFICATION PROBLEM User Vertexes obviously should be NAB Person docs Every User comment, rating or workflow would result in a change to the Person doc Result: admins Hulk out
MODIFICATION SOLUTION ProxyVertexes store Graph data in separate Document Real Document properties are read/write against original doc Proxy NSF defined per ElementStore NSF
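The proxy idea can be sketched as a vertex backed by two documents: ordinary properties go to the original document, while graph bookkeeping such as edge id lists goes to a separate proxy document. In the sketch below maps stand in for documents; `ProxyVertexSketch`, `addEdgeId` and the `OUT_likes` item name are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical vertex that keeps graph data out of the original document. */
class ProxyVertexSketch {
    private final Map<String, Object> original;                  // e.g. a Person doc
    private final Map<String, Object> proxy = new HashMap<>();   // doc in the proxy NSF

    ProxyVertexSketch(Map<String, Object> original) {
        this.original = original;
    }

    /** Real document properties are read from the original document. */
    Object getProperty(String key) {
        return original.get(key);
    }

    /** Real document properties are written to the original document. */
    void setProperty(String key, Object value) {
        original.put(key, value);
    }

    /** Graph data (edge id lists) is written to the proxy document only. */
    @SuppressWarnings("unchecked")
    void addEdgeId(String direction, String label, String edgeId) {
        String item = direction + "_" + label; // assumed naming scheme
        List<String> ids = (List<String>) proxy.get(item);
        if (ids == null) {
            ids = new ArrayList<>();
            proxy.put(item, ids);
        }
        ids.add(edgeId);
    }

    Map<String, Object> proxyDocument() {
        return proxy;
    }
}
```

Adding a "likes" edge to a user therefore leaves the original document untouched.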
EXAMPLES Likes Ratings Comments Workflow Star Wars
THE NUMBERS PROBLEM Thousands of small data silos Hundreds of indexes in each Thousands of documents in each Millions of vertexes across the enterprise No indexes needed
THE LOGIC PROBLEM Data schemas are in UI Serialization options are limited Relationships are a lot of work Schemas are defined with Java interfaces Anything can be written to any key/value pair Relationships are trivial
FUTURES Alternative serialization strategies? Multi-threaded write-backs? Querying & search enhancements? RxJava integration? Index support? Automated model discovery?
MISSION Make you more productive? Solve all your problems? Blow your mind?