Drsql.org How to Implement a Hierarchy in SQL Server Louis Davidson (drsql.org)

1 How to Implement a Hierarchy in SQL Server
Louis Davidson (drsql.org)
drsql@hotmail.com

2 Who am I?
Been in IT for over 19 years
Microsoft MVP for 10 years
Corporate Data Architect
Written five books on database design
– OK, so they were all versions of the same book. They at least had slightly different titles each time
A section of that book inspired me to flesh out examples

3 Contact info
Louis Davidson - louis@drsql.org
Website – http://drsql.org <-- Get slides and code here
Twitter – http://twitter.com/drsql

4 Philosophy
I will spend most of our time describing the algorithms for implementing tree hierarchies
I have fully working code examples for everything we cover
I will minimally introduce the code and show basically how it works
The code is available and being added to over time, as I try new stuff…
Comments are coveted

5 Hierarchies

6 Hierarchies
Trees - Single Parent Hierarchies
Graphs – Multi Parent Hierarchies
– Note: Graphs can be complex to deal with as a whole, but often you can deal with them as a set of trees
(Diagram labels: Screw, Piece of Wood, Wood with Tape, Screw and Tape, Tape)

7 Cycles in Hierarchies
“I’m my own grandpa” setup
Must be understood, or they can cause an infinite loop in processing
Generally disallowed in trees
May be supported in graphs, particularly for establishing relationships
(Diagram labels: Parent, Child, Grandparent)
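A minimal sketch of guarding against the “own grandpa” case before reparenting a node. It is plain Python rather than SQL, and the function name creates_cycle and the parent_of dictionary are hypothetical, used only for illustration. The idea is to walk up from the proposed parent and refuse the change if the node itself appears among its ancestors.

```python
def creates_cycle(parent_of, node, new_parent):
    """Return True if making new_parent the parent of node would form a cycle."""
    seen = set()
    current = new_parent
    while current is not None:
        if current == node:
            return True           # node is already an ancestor of new_parent
        if current in seen:
            return True           # the stored data already contains a cycle
        seen.add(current)
        current = parent_of.get(current)
    return False


# Hypothetical three-level tree matching the slide's labels.
parent_of = {"Grandparent": None, "Parent": "Grandparent", "Child": "Parent"}

print(creates_cycle(parent_of, "Grandparent", "Child"))  # would close the loop
print(creates_cycle(parent_of, "Child", "Grandparent"))  # a legal move
```

In a database the same check would typically live in a trigger or in the procedure that performs the reparenting.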

8 Hierarchy Uses
Trees
– Species
– Jurisdictions
– “Simple” Organizational Charts (Or at least the base manager-employee part of the organization)
– Directory folders
Graph
– Bill of materials
– Complex Organization Chart (all those dotted lines!)
– Genealogies
   Biological (Typically with a limit of parent cardinality 0 to 2)
   Family Tree – (Sky is the limit)
– Social Networking Relationships
   Example: (Bob is connected to Sue, Sue is connected to Fred, Fred is connected to Bob)

9 Implementation of a Hierarchy
“There is more than one way to shave a dog”
– None of which are pleasant for the dog or the shaver
– And the doctor who orders it only asks for a bald dog
Hierarchies are not at all natural to manipulate/query using relational code
– And the natural, recursive processing of a node at a time is horribly difficult and slow in relational code
– So, multiple methods of processing them have arisen through the years
The topic (much like the topic of how cruel it is to shave a dog) inspires religious-like arguments
I find all of the implementation possibilities fascinating, so I set out to try them all and make a presentation and demo code to share

10 Working with Trees - Background
Node recursion
Relational Recursion

11 Tree Processing Algorithms
There are several methods for processing trees in SQL
We will look at
– Fixed Levels
– Adjacency List
– HierarchyId
– Path Technique
– Nested Sets
– Kimball Helper Table
I mean it – None are useless at all…in fact, all are useful

12 Coding for trees
Manipulation:
– Creating a new node
– Moving/Reparenting a node
– Deleting a node (without children)
– Note: No tree algorithms allow for “simple” SQL solutions to all of these problems
Usage
– Getting the children of a node
– Getting the parent of a node
– Aggregating along the tree
We will have demos of all of these operations…available at least

13 Reparenting Example
Starting with: (diagram)
Perhaps ending with: (diagram)
Dragging all of its child nodes along with it

14 Implementing a tree – Fixed Levels
    CREATE TABLE CompanyHierarchy
    (
        Company varchar(100) NOT NULL,
        Headquarters varchar(100) NOT NULL,
        Branch varchar(100) NOT NULL,
        PRIMARY KEY (Company, Headquarters, Branch)
    )
Very limited, but very fast and easy to work with
I will not demo this structure today because its use is both extremely obvious and limited

15 Implementing a tree – Adjacency List
    CREATE TABLE CompanyHierarchy
    (
        Organization varchar(100) NOT NULL PRIMARY KEY,
        ParentOrganization varchar(100) NULL
            REFERENCES CompanyHierarchy (Organization),
        Name varchar(100) NOT NULL
    )
INDEX on (ParentOrganization)
Every row includes the key value of the parent in the row
Parent-less rows have NULL parent value
Data creation code is the easiest to write, by orders of magnitude
Retrieval code is the most complex to write (though not as inefficient as it might seem)

16-18 Adjacency List – Adding a Node
(Diagram sequence: a New Node is attached to the tree)
Simply set the parent and done!
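A rough sketch of both halves of the adjacency list trade-off: the one-statement insert and the recursive retrieval. It runs against SQLite through Python's sqlite3 module, not SQL Server, and the sample organizations, the index name, and the descendants helper are made up for illustration. SQLite spells the recursive common table expression WITH RECURSIVE, while SQL Server uses plain WITH.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CompanyHierarchy (
    Organization       TEXT NOT NULL PRIMARY KEY,
    ParentOrganization TEXT NULL REFERENCES CompanyHierarchy (Organization),
    Name               TEXT NOT NULL
);
CREATE INDEX ParentOrganizationIdx ON CompanyHierarchy (ParentOrganization);
""")

# Hypothetical sample data: one root, two headquarters, two branches.
conn.executemany(
    "INSERT INTO CompanyHierarchy VALUES (?, ?, ?)",
    [
        ("Company", None, "Company"),
        ("HQ1", "Company", "Headquarters 1"),
        ("HQ2", "Company", "Headquarters 2"),
        ("Branch1", "HQ1", "Branch 1"),
        ("Branch2", "HQ1", "Branch 2"),
    ],
)

# Adding a node: simply set the parent and done.
conn.execute(
    "INSERT INTO CompanyHierarchy VALUES (?, ?, ?)",
    ("Branch3", "HQ2", "Branch 3"),
)


def descendants(conn, root):
    """Return (Organization, Level) for every node below root."""
    sql = """
    WITH RECURSIVE Tree (Organization, Level) AS (
        SELECT Organization, 0
        FROM CompanyHierarchy
        WHERE Organization = ?
        UNION ALL
        SELECT c.Organization, t.Level + 1
        FROM CompanyHierarchy AS c
        JOIN Tree AS t ON c.ParentOrganization = t.Organization
    )
    SELECT Organization, Level
    FROM Tree
    WHERE Level > 0
    ORDER BY Level, Organization
    """
    return conn.execute(sql, (root,)).fetchall()


print(descendants(conn, "Company"))
```

The index on ParentOrganization is what keeps each recursive step from scanning the whole table.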

19 Implementing a tree – Path Method
    CREATE TABLE CompanyHierarchy
    (
        OrganizationId int NOT NULL PRIMARY KEY,
        Name varchar(100) NOT NULL,
        Path varchar(900)
    )
INDEX ON Path
Every row includes a representation of the path to its parent
Processing makes use of LIKE and string processing (I have seen a case that used fixed length binary values)
Limitation on path size for string manipulation/indexing
(Callout: 900 Bytes allows for indexed manipulations)

20-22 Path Method – Adding a Node
(Diagram sequence: a New Node is attached to the tree)
New Id = 9
Path from the parent, plus the New Id
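A small sketch of the path method, again using SQLite from Python rather than SQL Server. The backslash-delimited path format (for example \1\4\9\), the sample ids other than 9, and the helper names add_node and descendants are assumptions for illustration; the slides do not show the exact path encoding. A new node's path is the parent's path plus the new id, and descendants are found with a prefix LIKE.

```python
import sqlite3

SEP = "\\"  # assumed path delimiter

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE CompanyHierarchy (
    OrganizationId INTEGER NOT NULL PRIMARY KEY,
    Name           TEXT NOT NULL,
    Path           TEXT
)""")
conn.execute("CREATE INDEX PathIdx ON CompanyHierarchy (Path)")


def add_node(conn, org_id, name, parent_id=None):
    """Insert a node whose path is the parent's path plus its own id."""
    parent_path = SEP
    if parent_id is not None:
        (parent_path,) = conn.execute(
            "SELECT Path FROM CompanyHierarchy WHERE OrganizationId = ?",
            (parent_id,),
        ).fetchone()
    path = f"{parent_path}{org_id}{SEP}"
    conn.execute(
        "INSERT INTO CompanyHierarchy VALUES (?, ?, ?)", (org_id, name, path)
    )
    return path


def descendants(conn, org_id):
    """Return ids of all nodes whose path starts with this node's path."""
    (path,) = conn.execute(
        "SELECT Path FROM CompanyHierarchy WHERE OrganizationId = ?", (org_id,)
    ).fetchone()
    rows = conn.execute(
        "SELECT OrganizationId FROM CompanyHierarchy "
        "WHERE Path LIKE ? AND OrganizationId <> ? ORDER BY Path",
        (path + "%", org_id),
    )
    return [r[0] for r in rows]


add_node(conn, 1, "Company")
add_node(conn, 2, "HQ 1", parent_id=1)
add_node(conn, 4, "HQ 2", parent_id=1)
new_path = add_node(conn, 9, "New Node", parent_id=4)
print(new_path, descendants(conn, 1))
```

Reparenting is where this structure costs more: every descendant's path has to be rewritten with the new prefix.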

24 Implementing a tree – HierarchyId
    CREATE TABLE CompanyHierarchy
    (
        OrganizationId int NOT NULL PRIMARY KEY NONCLUSTERED,
        Name varchar(100) NOT NULL,
        OrgNode hierarchyId NOT NULL,
        OrganizationLevel AS OrgNode.GetLevel()
    )
CLUSTERED INDEX ON (OrganizationLevel, OrgNode)
Somewhat unnatural method to the typical SQL programmer
Similar to the Path Method, and has some of the same limitations when moving nodes around
Node path does not use data natural to the table, but rather position in hierarchy, unless you set the values

25 Implementing a tree – Nested Sets
    CREATE TABLE CompanyHierarchy
    (
        Organization varchar(100) NOT NULL PRIMARY KEY,
        Name varchar(100) NOT NULL,
        [Left] int NOT NULL,
        [Right] int NOT NULL,
        UNIQUE ([Right], [Left])
    )
(Left and Right are reserved words in T-SQL, so the column names are bracketed)
Query processing is done using range queries
Quite slow to maintain because the structure is fragile
Can produce amazing performance for retrieval queries
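To make “range queries” concrete, here is a sketch against SQLite from Python (not SQL Server), with hand-numbered sample rows that are purely hypothetical. A node's descendants are the rows whose Left and Right fall strictly inside its own pair; its ancestors are the rows whose pair encloses it. The column names are double-quoted because LEFT and RIGHT are keywords in SQLite too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CompanyHierarchy (
    Organization TEXT NOT NULL PRIMARY KEY,
    Name         TEXT NOT NULL,
    "Left"       INTEGER NOT NULL,
    "Right"      INTEGER NOT NULL,
    UNIQUE ("Right", "Left")
);
INSERT INTO CompanyHierarchy VALUES
    ('Company', 'Company',  1, 10),
    ('HQ1',     'HQ 1',     2,  7),
    ('Branch1', 'Branch 1', 3,  4),
    ('Branch2', 'Branch 2', 5,  6),
    ('HQ2',     'HQ 2',     8,  9);
""")


def descendants(conn, org):
    """Rows nested strictly inside the node's Left/Right range."""
    sql = """
    SELECT child.Organization
    FROM CompanyHierarchy AS node
    JOIN CompanyHierarchy AS child
      ON child."Left" > node."Left" AND child."Right" < node."Right"
    WHERE node.Organization = ?
    ORDER BY child."Left"
    """
    return [r[0] for r in conn.execute(sql, (org,))]


def ancestors(conn, org):
    """Rows whose Left/Right range encloses the node, root first."""
    sql = """
    SELECT parent.Organization
    FROM CompanyHierarchy AS node
    JOIN CompanyHierarchy AS parent
      ON parent."Left" < node."Left" AND parent."Right" > node."Right"
    WHERE node.Organization = ?
    ORDER BY parent."Left"
    """
    return [r[0] for r in conn.execute(sql, (org,))]


print(descendants(conn, "HQ1"), ancestors(conn, "Branch2"))
```

No recursion is involved, which is where the retrieval speed comes from.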

26-31 Nested Sets – Adding a Node
(Diagram sequence, one step per slide)
– New Node
– Updating Right Values
– And the One Left value right of the new node
– Renumber, leaving gap for child
– The New Node
– Set the New Node’s Left/Right
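The renumbering steps above can be sketched roughly as follows, using SQLite from Python instead of SQL Server. The add_child helper and the sample rows are hypothetical, and this is the textbook ungapped insert, not necessarily the code in the downloads: shift every Right at or beyond the parent's Right by 2, shift every Left beyond it by 2, then place the new node in the gap.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CompanyHierarchy (
    Organization TEXT NOT NULL PRIMARY KEY,
    Name         TEXT NOT NULL,
    "Left"       INTEGER NOT NULL,
    "Right"      INTEGER NOT NULL,
    UNIQUE ("Right", "Left")
);
INSERT INTO CompanyHierarchy VALUES
    ('Company', 'Company', 1, 6),
    ('HQ1',     'HQ 1',    2, 3),
    ('HQ2',     'HQ 2',    4, 5);
""")


def add_child(conn, parent, org, name):
    """Insert org as the last child of parent, renumbering to make room."""
    (parent_right,) = conn.execute(
        'SELECT "Right" FROM CompanyHierarchy WHERE Organization = ?',
        (parent,),
    ).fetchone()
    with conn:  # one transaction: the tree is invalid between the statements
        conn.execute(
            'UPDATE CompanyHierarchy SET "Right" = "Right" + 2 '
            'WHERE "Right" >= ?',
            (parent_right,),
        )
        conn.execute(
            'UPDATE CompanyHierarchy SET "Left" = "Left" + 2 WHERE "Left" > ?',
            (parent_right,),
        )
        conn.execute(
            "INSERT INTO CompanyHierarchy VALUES (?, ?, ?, ?)",
            (org, name, parent_right, parent_right + 1),
        )


add_child(conn, "HQ1", "Branch1", "Branch 1")
tree = conn.execute(
    'SELECT Organization, "Left", "Right" FROM CompanyHierarchy ORDER BY "Left"'
).fetchall()
print(tree)
```

Every insert touches all rows to the right of the insertion point, which is the maintenance cost the slides refer to.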

33 Implementing a tree – Nested Sets With Gaps
I named it GappedNestedSets in the downloads, but it could also be called ChaoticNestedSets
Viewing data in the structure is identical to the Nested Sets basic implementation
There are changes in how you manage the structures. The way I implemented it for the downloads:
– When a child node is inserted, we decide what to maintain based on f(Right – Left) > 3
   False: Make space, leaving a configurable gap. So if GapSize = 100, then Right – Left > 100
   True: Insert the row with Right = Left + 1
– Delete and Reparent do not clean up after themselves
Code not written: cleanup, to avoid hitting the limit of the datatype chosen for the positioning values (Left and Right)
(Diagram callouts: “New node would need no space”; “New node would still require space to be made”)

34 Implementing a tree – Kimball Helper
Developed initially for data warehousing since data is modified all at once with a fixed cost
Basically explodes the hierarchy into a table that turns all hierarchy manipulations into a relational query
Maintenance can be slightly costly, but using the data is extremely fast

35 Implementing a tree – Kimball Helper
For the rows in yellow, expands to the table shown:
ParentId  ChildId  Distance  ParentRootNode  ChildLeafNode
1         1        0         1               0
1         2        1         1               0
1         4        2         1               1
1         5        2         1               1
2         2        0         0               0
2         4        1         0               1
2         5        1         0               1
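A sketch of how such a helper table could be generated from parent-child pairs, in plain Python. The build_helper function is hypothetical, and the input tree (1 is the root, 2 is its child, 4 and 5 are children of 2) is an assumption inferred from the rows shown; the full tree on the slide is not captured here. Each node contributes one row per ancestor, including itself at distance 0.

```python
def build_helper(parent_of):
    """Explode a child -> parent map into Kimball helper rows.

    Each row is (ParentId, ChildId, Distance, ParentRootNode, ChildLeafNode).
    """
    has_children = {p for p in parent_of.values() if p is not None}
    rows = []
    for child in parent_of:
        is_leaf = int(child not in has_children)
        ancestor, distance = child, 0
        while ancestor is not None:
            is_root = int(parent_of[ancestor] is None)
            rows.append((ancestor, child, distance, is_root, is_leaf))
            ancestor = parent_of[ancestor]
            distance += 1
    return sorted(rows)


# Assumed tree: 1 -> 2 -> {4, 5}
parent_of = {1: None, 2: 1, 4: 2, 5: 2}
helper = build_helper(parent_of)

for row in helper:
    print(row)

# All descendants of node 2 become a simple filter, with no recursion.
below_2 = [child for parent, child, dist, _, _ in helper if parent == 2 and dist > 0]
print(below_2)
```

The table grows with the depth of the tree (one row per node per ancestor), which is the maintenance cost traded for fast reads.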

36 Performance Examples and Limitations
The following tests were run multiple times, and the results were taken from one such run. Clearly the results are not scientific, and were done with random data. However, they very much match my expectations from my research.
Load times were captured loading one row at a time.
Test machine (this laptop I am using tonight) was a:
– Lenovo Yoga Pro 2, Haswell ULT i7 (4th Gen Intel Mobile Processor), 2.4 GHz Dual Core (Hyperthreaded), 8 GB RAM, 256 GB SSD
Note: All load times include time to load 5 transactions per node, plus some time variation for surfing the web for Disney World news when the operation was taking forever

37 Performance Example Explanation
For each performance test (for which I will show the code later), I ran three query sets on each data set:
1. Load the tree (until my computer couldn’t do it in a reasonable number of hours and I resorted to a side loading of the data)
2. Fetch all children from the root node
3. Aggregate data for all children at all levels
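As an illustration of the third query set, here is one way to aggregate along an adjacency list so that each node's total includes every level below it. This is a sketch in SQLite via Python, not the test code from the downloads; the Sale table, its columns, and the sample amounts are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CompanyHierarchy (
    Organization       TEXT NOT NULL PRIMARY KEY,
    ParentOrganization TEXT NULL
);
CREATE TABLE Sale (
    Organization TEXT NOT NULL,
    Amount       INTEGER NOT NULL
);
INSERT INTO CompanyHierarchy VALUES
    ('Company', NULL), ('HQ1', 'Company'), ('HQ2', 'Company'), ('Branch1', 'HQ1');
INSERT INTO Sale VALUES
    ('HQ1', 10), ('Branch1', 5), ('Branch1', 7), ('HQ2', 3);
""")

# Pair every node with itself and all of its descendants, then sum per ancestor.
ROLLUP = """
WITH RECURSIVE Expanded (Ancestor, Descendant) AS (
    SELECT Organization, Organization
    FROM CompanyHierarchy
    UNION ALL
    SELECT e.Ancestor, c.Organization
    FROM Expanded AS e
    JOIN CompanyHierarchy AS c ON c.ParentOrganization = e.Descendant
)
SELECT e.Ancestor, COALESCE(SUM(s.Amount), 0) AS Total
FROM Expanded AS e
LEFT JOIN Sale AS s ON s.Organization = e.Descendant
GROUP BY e.Ancestor
ORDER BY e.Ancestor
"""

totals = dict(conn.execute(ROLLUP).fetchall())
print(totals)
```

The Expanded set is essentially the Kimball helper table computed on the fly, which hints at why the precomputed version aggregates so quickly.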

38-48 Performance Comparisons (eleven slides; their content is not captured in this transcript)

49 Method Comparison (slide content not captured in this transcript)

50 Method Applicability
Columns: Adjacency List, HierarchyId, Path Method, Nested Set, Kimball Helper (the column alignment of the * ratings is lost in this transcript)
General Purpose Hierarchies: *** *
VERY Large Hierarchy Queries: *******
Offline Reporting: ****** (Cost of maintaining limits use) ***
OLTP Use: ***** ** (Perhaps slower to load nodes)
Highly Concurrent Modification: *******
Highly Concurrent Queries: ******
Unlimited Hierarchy Size: ** * (Width unlimited, Effective depth limited by 900 byte index limit) ***

51 Demo Code
Example code for all examples available for download.
Will demo hierarchies and graphs.

52 Future Improvements
Use SQL Server 2014 In-Memory Database to help with locking and brute force operations, for all versions of hierarchy algorithm
Implement Gapped Version of Nested Sets – Done!
Load an order of magnitude more data
Try these examples on a “real” computer!

53 Graphs
Generally implemented in same manner as adjacency list
– Can be processed in the same manner as an adjacency list
– Primary difference is child can have > 1 parent node
– Cycles are generally acceptable
Graph structure will always be external to data structure
Graphs are even more natural data structures than trees
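Because cycles are acceptable in a graph, any traversal has to remember what it has already visited or it will loop forever. A minimal sketch in plain Python, using the Bob/Sue/Fred example from the Hierarchy Uses slide; the reachable function and the edge-list representation are assumptions for illustration.

```python
from collections import deque


def reachable(edges, start):
    """Breadth-first walk over directed edges, visiting each node once."""
    neighbors = {}
    for source, target in edges:
        neighbors.setdefault(source, []).append(target)

    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, []):
            if nxt not in seen:      # the guard that breaks cycles
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


# Bob -> Sue -> Fred -> Bob forms a cycle.
edges = [("Bob", "Sue"), ("Sue", "Fred"), ("Fred", "Bob")]
print(reachable(edges, "Bob"))
```

In a recursive SQL query the same guard is usually expressed by carrying the path walked so far and filtering out nodes already on it.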

54 Graphs are Everywhere
Almost any many-to-many relationship can be a graph
(Diagram tables: Movie, Actor, ActingCast, Director, MovieDirector)

55 Contact info
Louis Davidson - louis@drsql.org
Website – http://drsql.org <-- Get slides and code here
Twitter – http://twitter.com/drsql
SQL Blog – http://sqlblog.com/blogs/louis_davidson
Simple Talk Blog – What Counts for a DBA – http://www.simple-talk.com/community/blogs/drsql/default.aspx

56 Thank you
That’s all folks!

