October 15-18, 2013 Charlotte, NC How to Model and Implement a Hierarchy in SQL Server AD-318-S Louis Davidson (drsql.org)


1 October 15-18, 2013 Charlotte, NC How to Model and Implement a Hierarchy in SQL Server AD-318-S Louis Davidson (drsql.org) drsql@hotmail.com


5 Who am I?
- Been in IT for over 19 years
- Microsoft MVP for 10 years
- Corporate Data Architect
- Written five books on database design
- OK, so they were all versions of the same book. They at least had slightly different titles each time

6 Hierarchies

7 Agenda
- Modeling: representing the data structures, primarily for the documentation
- Implementation: optimization of the data structures to meet the needs of the client. Hierarchies, more than most structures, have many ways to implement the same thing
- Proof: code review of the code used to get the numbers and conclusions presented

8 Hierarchies
- Trees: single parent hierarchies
- Graphs: multi parent hierarchies
- Note: graphs can be complex to deal with as a whole, but often you can deal with them as a set of trees
[Diagram nodes: Screw, Piece of Wood, Tape, Wood with Tape, Screw and Tape]

9 Cycles in Hierarchies
- “I’m my own grandpa” syndrome
- Must be understood, or they can cause an infinite loop in processing
- Generally disallowed in trees
- Generally handled in graphs
[Diagram nodes: Grandparent, Parent, Child]
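
One way to keep cycles out of a tree is to check, before reparenting, whether the proposed parent already sits below the node being moved. The following is a minimal sketch of that check, not code from the session; the `parent_of` dictionary and the function name are hypothetical stand-ins for a parent column in a table.

```python
def would_create_cycle(parent_of, node, new_parent):
    """Return True if making new_parent the parent of node would create a cycle.

    parent_of maps each node to its parent (None for a root). This is a
    hypothetical in-memory stand-in for a parent column in a table.
    """
    seen = set()
    current = new_parent
    # Walk upward from the proposed parent; meeting `node` means a cycle.
    while current is not None:
        if current == node:
            return True
        if current in seen:
            # The stored data already contains a cycle; stop instead of looping forever.
            raise ValueError("existing cycle detected in parent_of")
        seen.add(current)
        current = parent_of.get(current)
    return False


# Three generations, matching the labels on the slide.
parent_of = {"Grandparent": None, "Parent": "Grandparent", "Child": "Parent"}

# Making Child the parent of Grandparent is the "I'm my own grandpa" case.
grandpa_case = would_create_cycle(parent_of, "Grandparent", "Child")
```

The walk only touches the ancestors of the proposed parent, so its cost grows with the depth of the tree rather than its size.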

10 Hierarchy Uses
Trees
- Species
- Jurisdictions
- “Simple” organizational charts (or at least the base manager-employee part of the organization)
- Directory folders
Graphs
- Bill of materials
- Complex organization chart (all those dotted lines!)
- Genealogies
  - Biological (typically limiting the cardinality of parents to 2)
  - Family tree (sky is the limit)

11 Modeling a hierarchy
- Typically, there is only one way to model a hierarchy
  - One row is related to another row to indicate the relationship
  - The variation will be in how you implement that relationship
  - No matter how you implement things, the logical view to the user will remain the same
- Modeling choices include:
  - Include the hierarchy structure with user data or not?
  - How many hierarchies need to be represented?
  - How many parents can a node have?
  - Can the same node have the same parent more than once, for the same or a different purpose?

12 Variations on a Theme
- Simple one parent “tree”
- Two simple one parent “trees” (getting closer to a developer reporting hierarchy)
- Simple one parent tree in an external structure
- Graph, allowing Employee to have multiple reports but only one for a Relationship Type
- Graph, allowing Employee to have multiple people they report to for the same reason (REALITY!)

13 Implementation of a Hierarchy
- “There is more than one way to shave a dog”
  - None of which are pleasant for the dog or the shaver
  - And the doctor who orders it only asks for a bald dog
- Hierarchies are not at all natural to manipulate/query using relational code
  - And the natural, recursive processing of a node at a time is horribly difficult and slow in relational code
- So, multiple methods of processing them have arisen through the years
- The topic (much like the topic of how cruel it is to shave a dog) inspires religious-like arguments
- I find all of the implementation possibilities fascinating, so I set out to do an overview of them all…

14 Working with Trees - Background
[Diagrams: Node recursion; Relational recursion]

15 Tree Processing Algorithms
- There are several methods for processing trees in SQL
- We will cover:
  - Fixed Levels
  - Adjacency List
  - HierarchyId
  - Path Technique
  - Nested Sets
  - Kimball Helper Table
- Without giving away too much, pretty much all of the methods have some use…

16 Preconceived Notions
Which method/algorithm do you expect to be fastest?
- Fixed Levels
- Adjacency List
- HierarchyId
- Path Technique
- Nested Sets
- Kimball Helper Table

17 Coding for trees
- Manipulation:
  - Creating a new node
  - Moving/reparenting a node
  - Deleting a node (without children)
  - Note: no tree algorithm allows for “simple” SQL solutions to all of these problems
- Usage:
  - Getting the children of a node
  - Getting the parent of a node
  - Aggregating along the tree
- We will have demos of all of these operations…

18 Reparenting Example
- Starting with: [diagram]
- Perhaps ending with: [diagram]
- The moved node drags all of its child nodes along with it

19 Implementing a tree – Fixed Levels

    CREATE TABLE CompanyHierarchy
    (
        Company      varchar(100) NOT NULL,
        Headquarters varchar(100) NOT NULL,
        Branch       varchar(100) NOT NULL,
        PRIMARY KEY (Company, Headquarters, Branch)
    )

- Very limited, but very fast and easy to work with
- I will not demo this structure today because its use is both extremely obvious and limited

20 Implementing a tree – Adjacency List
- Every row includes the key value of the parent in the row
- Parent-less rows have no parent value
- Code is the most complex to write (though not as inefficient as it might seem)

    CREATE TABLE CompanyHierarchy
    (
        Organization       varchar(100) NOT NULL PRIMARY KEY,
        ParentOrganization varchar(100) NULL
            REFERENCES CompanyHierarchy (Organization),
        Name               varchar(100) NOT NULL
    )
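
To give a feel for the "more complex" code, here is a minimal sketch of fetching a node and everything beneath it with a recursive common table expression. It is not the session's demo code: it runs against SQLite through Python's `sqlite3` module so that it is self-contained, the sample rows are invented, and the `descendants` helper is hypothetical. SQL Server supports recursive CTEs as well, written with `WITH` alone and no `RECURSIVE` keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CompanyHierarchy (
    Organization       TEXT NOT NULL PRIMARY KEY,
    ParentOrganization TEXT NULL REFERENCES CompanyHierarchy (Organization),
    Name               TEXT NOT NULL
);
-- Invented sample rows: one company, two headquarters, two branches.
INSERT INTO CompanyHierarchy VALUES
    ('Company', NULL,      'Company'),
    ('HQ1',     'Company', 'Headquarters 1'),
    ('HQ2',     'Company', 'Headquarters 2'),
    ('Branch1', 'HQ1',     'Branch 1'),
    ('Branch2', 'HQ1',     'Branch 2');
""")


def descendants(conn, root):
    """Return (Organization, Level) for root and every node below it."""
    return conn.execute("""
        WITH RECURSIVE Tree (Organization, Level) AS (
            -- Anchor: the starting node.
            SELECT Organization, 0
            FROM CompanyHierarchy
            WHERE Organization = ?
            UNION ALL
            -- Recursive step: children of the rows found so far.
            SELECT c.Organization, t.Level + 1
            FROM CompanyHierarchy AS c
            JOIN Tree AS t ON c.ParentOrganization = t.Organization
        )
        SELECT Organization, Level FROM Tree ORDER BY Level, Organization
    """, (root,)).fetchall()


hq1_subtree = descendants(conn, "HQ1")
```

Each pass of the recursive step adds one more level of the tree, so the number of passes matches the depth below the starting node.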

21 Adjacency List – Adding a Node
[Diagram: the new node]

22 [Diagram only]

23 Simply set the parent and done!
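
As a rough illustration of why adding and moving nodes is cheap with an adjacency list, the sketch below inserts a node with one `INSERT` and reparents a node with one `UPDATE`. It uses SQLite via Python's `sqlite3` module as a stand-in for SQL Server; the sample rows and the helper names (`add_node`, `reparent`, `parent_of`) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CompanyHierarchy (
        Organization       TEXT NOT NULL PRIMARY KEY,
        ParentOrganization TEXT NULL REFERENCES CompanyHierarchy (Organization),
        Name               TEXT NOT NULL
    )""")
conn.executemany(
    "INSERT INTO CompanyHierarchy VALUES (?, ?, ?)",
    [("Company", None, "Company"),
     ("HQ1", "Company", "Headquarters 1"),
     ("HQ2", "Company", "Headquarters 2")],
)


def add_node(conn, organization, parent, name):
    # Adding a node only needs the key of its parent.
    conn.execute(
        "INSERT INTO CompanyHierarchy (Organization, ParentOrganization, Name) "
        "VALUES (?, ?, ?)",
        (organization, parent, name),
    )


def reparent(conn, organization, new_parent):
    # Moving a node touches one row; its children still point at it.
    conn.execute(
        "UPDATE CompanyHierarchy SET ParentOrganization = ? WHERE Organization = ?",
        (new_parent, organization),
    )


def parent_of(conn, organization):
    row = conn.execute(
        "SELECT ParentOrganization FROM CompanyHierarchy WHERE Organization = ?",
        (organization,),
    ).fetchone()
    return row[0] if row else None


add_node(conn, "Branch3", "HQ2", "Branch 3")
reparent(conn, "HQ2", "HQ1")  # Branch3 is dragged along without being updated
```

A production version would also guard against cycles before the `UPDATE`, which this sketch leaves out.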

24 Implementing a tree – Path Method
- Every row includes a representation of the path to its parent
- Processing makes use of LIKE and string processing (I have seen a case that used fixed-length binary values)
- Limitation on path size for string manipulation/indexing

    CREATE TABLE CompanyHierarchy
    (
        OrganizationId int NOT NULL PRIMARY KEY,
        Name           varchar(100) NOT NULL,
        Path           varchar(900)
    )

25 Path Method – Adding a Node
[Diagram: the new node]

26 New Id = 9

27 Path from the parent, plus the new Id
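
The steps on these slides can be sketched roughly as follows: take the parent's path, append the new row's Id, and then find a subtree with a `LIKE` prefix match. This is an illustrative sketch in SQLite via Python's `sqlite3` module, not the session's code. The `/` delimiter, the sample rows and the helper names are assumptions, and with this sample data the generated Id is 5 rather than the 9 shown on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# INTEGER PRIMARY KEY lets SQLite assign the next Id automatically.
conn.execute("""
    CREATE TABLE CompanyHierarchy (
        OrganizationId INTEGER PRIMARY KEY,
        Name           TEXT NOT NULL,
        Path           TEXT
    )""")
conn.executemany(
    "INSERT INTO CompanyHierarchy VALUES (?, ?, ?)",
    [(1, "Company", "/1/"),
     (2, "Headquarters 1", "/1/2/"),
     (3, "Headquarters 2", "/1/3/"),
     (4, "Branch 1", "/1/2/4/")],
)


def path_of(conn, node_id):
    return conn.execute(
        "SELECT Path FROM CompanyHierarchy WHERE OrganizationId = ?", (node_id,)
    ).fetchone()[0]


def add_node(conn, parent_id, name):
    """Insert a row, then set its path to the parent's path plus the new Id."""
    parent_path = path_of(conn, parent_id)
    new_id = conn.execute(
        "INSERT INTO CompanyHierarchy (Name) VALUES (?)", (name,)
    ).lastrowid
    conn.execute(
        "UPDATE CompanyHierarchy SET Path = ? WHERE OrganizationId = ?",
        (parent_path + str(new_id) + "/", new_id),
    )
    return new_id


def descendants(conn, node_id):
    """Every row whose path starts with this node's path, excluding the node."""
    rows = conn.execute(
        "SELECT OrganizationId FROM CompanyHierarchy "
        "WHERE Path LIKE ? AND OrganizationId <> ? ORDER BY Path",
        (path_of(conn, node_id) + "%", node_id),
    )
    return [r[0] for r in rows]


new_id = add_node(conn, 2, "Branch 2")
```

Reparenting is the expensive operation here, since the stored path of every row in the moved subtree has to be rewritten.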

29 Implementing a tree – HierarchyId
- Somewhat unnatural method to the typical SQL programmer
- Similar to the Path Method, and has some of the same limitations when moving nodes around
- Node path does not use data natural to the table, but rather positional location

    CREATE TABLE CompanyHierarchy
    (
        OrganizationId int NOT NULL PRIMARY KEY,
        Name           varchar(100) NOT NULL,
        OrgNode        hierarchyid NOT NULL
    )

30 Implementing a tree – Nested Sets
- Query processing is done using range queries
- Structure is quite slow to maintain due to its fragile structure
- Can produce excellent performance for queries

    CREATE TABLE CompanyHierarchy
    (
        Organization varchar(100) NOT NULL PRIMARY KEY,
        Name         varchar(100) NOT NULL,
        [Left]       int NOT NULL,
        [Right]      int NOT NULL
    )

(Left and Right are reserved words in T-SQL, so the column names are bracketed.)

31 Nested Sets – Adding a Node
[Diagram: the new node]

32 Updating Right values

33 And the one Left value to the right of the new node

34 Renumber, leaving a gap for the child

35 The new node

36 Set the new node’s Left/Right
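
The renumbering sequence on these slides can be sketched as below: shift every Right value at or beyond the parent's Right by 2, shift every Left value beyond it by 2, then place the new node in the gap. This is a minimal sketch on SQLite via Python's `sqlite3` module with invented sample rows and hypothetical helper names, not the session's stored procedure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CompanyHierarchy (
        Organization TEXT NOT NULL PRIMARY KEY,
        Name         TEXT NOT NULL,
        [Left]       INTEGER NOT NULL,
        [Right]      INTEGER NOT NULL
    )""")
conn.executemany(
    "INSERT INTO CompanyHierarchy VALUES (?, ?, ?, ?)",
    [("Company", "Company", 1, 8),
     ("HQ1", "Headquarters 1", 2, 5),
     ("Branch1", "Branch 1", 3, 4),
     ("HQ2", "Headquarters 2", 6, 7)],
)


def add_node(conn, organization, name, parent):
    """Add a node as the last child of parent by opening a gap of 2."""
    (parent_right,) = conn.execute(
        "SELECT [Right] FROM CompanyHierarchy WHERE Organization = ?", (parent,)
    ).fetchone()
    # Shift Right values of the parent and everything to its right.
    conn.execute(
        "UPDATE CompanyHierarchy SET [Right] = [Right] + 2 WHERE [Right] >= ?",
        (parent_right,),
    )
    # Shift Left values of nodes that start after the insertion point.
    conn.execute(
        "UPDATE CompanyHierarchy SET [Left] = [Left] + 2 WHERE [Left] > ?",
        (parent_right,),
    )
    # The new node takes the two numbers that were just freed.
    conn.execute(
        "INSERT INTO CompanyHierarchy VALUES (?, ?, ?, ?)",
        (organization, name, parent_right, parent_right + 1),
    )


def descendants(conn, organization):
    """Range query: rows whose interval lies strictly inside the parent's."""
    rows = conn.execute("""
        SELECT c.Organization
        FROM CompanyHierarchy AS p
        JOIN CompanyHierarchy AS c
          ON c.[Left] > p.[Left] AND c.[Right] < p.[Right]
        WHERE p.Organization = ?
        ORDER BY c.[Left]""", (organization,))
    return [r[0] for r in rows]


add_node(conn, "Branch2", "Branch 2", "HQ1")
intervals = conn.execute(
    "SELECT Organization, [Left], [Right] FROM CompanyHierarchy ORDER BY [Left]"
).fetchall()
```

A single insert rewrote three of the four existing rows, which is the maintenance cost the Nested Sets slide warns about; the descendant query, by contrast, needs no recursion at all.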

38 Implementing a tree – Kimball Helper
- Developed initially for data warehousing, since data is modified all at once with a fixed cost
- Basically explodes the hierarchy into a table that turns all hierarchy manipulations into a relational query
- Maintenance can be slightly costly, but using the data is extremely fast

39 Implementing a tree – Kimball Helper
For the rows in yellow, expands to the table shown:

    ParentId  ChildId  Distance  ParentRootNode  ChildLeafNode
    1         1        0         1               0
    1         2        1         1               0
    1         4        2         1               1
    1         5        2         1               1
    2         2        0         0               0
    2         4        1         0               1
    2         5        1         0               1
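
A helper table like the one above can be derived from an adjacency list by expanding every ancestor/descendant pair. The sketch below does that with a recursive CTE in SQLite via Python's `sqlite3` module. The table names `Node` and `HierarchyHelper` and the four-node tree (1 above 2, 2 above 4 and 5) are assumptions chosen to reproduce the rows on the slide, not the session's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Node (
    NodeId   INTEGER PRIMARY KEY,
    ParentId INTEGER NULL REFERENCES Node (NodeId)
);
INSERT INTO Node VALUES (1, NULL), (2, 1), (4, 2), (5, 2);

CREATE TABLE HierarchyHelper (
    ParentId       INTEGER NOT NULL,
    ChildId        INTEGER NOT NULL,
    Distance       INTEGER NOT NULL,
    ParentRootNode INTEGER NOT NULL,
    ChildLeafNode  INTEGER NOT NULL,
    PRIMARY KEY (ParentId, ChildId)
);

WITH RECURSIVE Closure (ParentId, ChildId, Distance) AS (
    -- Every node is its own descendant at distance 0.
    SELECT NodeId, NodeId, 0 FROM Node
    UNION ALL
    -- Extend each known pair by one more level of children.
    SELECT c.ParentId, n.NodeId, c.Distance + 1
    FROM Closure AS c
    JOIN Node AS n ON n.ParentId = c.ChildId
)
INSERT INTO HierarchyHelper
SELECT c.ParentId, c.ChildId, c.Distance,
       CASE WHEN p.ParentId IS NULL THEN 1 ELSE 0 END,
       CASE WHEN NOT EXISTS (SELECT 1 FROM Node AS k WHERE k.ParentId = c.ChildId)
            THEN 1 ELSE 0 END
FROM Closure AS c
JOIN Node AS p ON p.NodeId = c.ParentId;
""")

# The rows for parents 1 and 2, in the same order as the slide.
helper_rows = conn.execute("""
    SELECT ParentId, ChildId, Distance, ParentRootNode, ChildLeafNode
    FROM HierarchyHelper
    WHERE ParentId IN (1, 2)
    ORDER BY ParentId, ChildId""").fetchall()

# Using the helper: a subtree is a plain equality filter, no recursion needed.
subtree_of_2 = [r[0] for r in conn.execute(
    "SELECT ChildId FROM HierarchyHelper WHERE ParentId = 2 ORDER BY ChildId")]
```

The recursion happens once, at load time; every later query is an ordinary join or filter against the helper table.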

40 Performance Examples and Limitations
- The following tests were run multiple times, and the results were taken from one such run
- Clearly the results are not scientific, and they were produced with random data. However, they very much match my expectations from my research
- Load times were captured loading one row at a time
- Test machine: Samsung Series 9, Sandy Bridge i5, 1.6 GHz dual core (hyperthreaded), 4 GB RAM, 128 GB SSD
- Note: all load times include the time to load 5 transactions per node

41 Performance Example Explanation
For each performance test (I will show the code later), I ran three query sets on each data set:
1. Load the tree (until my computer couldn’t do it before PASS)
2. Fetch all children from the root node
3. Aggregate data for all children at all levels
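
To make the third query set concrete, here is a rough sketch of aggregating transaction data for every node across all of its descendants, using an adjacency list. It is not the test harness from the session: it uses SQLite via Python's `sqlite3` module, and the `Company` and `Sale` tables, their columns and the sample amounts are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company (
    CompanyId       INTEGER PRIMARY KEY,
    ParentCompanyId INTEGER NULL REFERENCES Company (CompanyId)
);
CREATE TABLE Sale (
    SaleId    INTEGER PRIMARY KEY,
    CompanyId INTEGER NOT NULL REFERENCES Company (CompanyId),
    Amount    INTEGER NOT NULL
);
-- Invented data: node 1 is the root, 2 and 3 are its children, 4 is under 2.
INSERT INTO Company VALUES (1, NULL), (2, 1), (3, 1), (4, 2);
INSERT INTO Sale (CompanyId, Amount) VALUES (2, 10), (3, 20), (4, 5), (4, 7);
""")


def subtree_totals(conn):
    """Total Amount for each node, counting the node and all its descendants."""
    return conn.execute("""
        WITH RECURSIVE Expanded (AncestorId, DescendantId) AS (
            -- Each node counts toward its own total.
            SELECT CompanyId, CompanyId FROM Company
            UNION ALL
            -- Walk down one level at a time, remembering the ancestor.
            SELECT e.AncestorId, c.CompanyId
            FROM Expanded AS e
            JOIN Company AS c ON c.ParentCompanyId = e.DescendantId
        )
        SELECT e.AncestorId, COALESCE(SUM(s.Amount), 0)
        FROM Expanded AS e
        LEFT JOIN Sale AS s ON s.CompanyId = e.DescendantId
        GROUP BY e.AncestorId
        ORDER BY e.AncestorId
    """).fetchall()


totals = subtree_totals(conn)
```

The expansion inside the CTE produces the same ancestor/descendant pairs that a Kimball helper table stores permanently, which hints at why the helper approach pays for itself on read-heavy workloads.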

42 Performance Comparisons
[Chart data label: 157]

43 Performance Comparisons

44 Performance Comparisons
[Chart data labels: 46203, 14618]

45 Performance Comparisons

46 Performance Comparisons

47 Performance Comparisons
Note: HierarchyId dropped as it took over an hour to return results

48 Performance Comparisons
[Chart data label: 175]

49 Performance Comparisons

50 Performance Comparisons
[Chart data labels: 3899720, 1320000]

51 Performance Comparisons
Note: HierarchyId dropped as it took over an hour to return results

52 Performance Comparisons
Note: HierarchyId dropped as it took over an hour to return results

53 Method Comparison
[Chart data label: 444000]

54 Method Applicability
Columns (methods): Adjacency List, HierarchyId, Path Method, Nested Set, Kimball Helper
Rows (applicability), with star ratings as they appear in the transcript:
- General Purpose Hierarchies: ****
- VERY Large Hierarchy Queries: ******
- Offline Reporting: **** (Cost of maintaining limits use) ***
- OLTP Use: ******** (Perhaps slower to load nodes)
- Highly Concurrent Modification: *****
- Highly Concurrent Queries: ****
- Unlimited Hierarchy Size: *** (Very high CPU use) * (Width unlimited, effective depth limited by 900 byte index limit) ***
[The alignment of star ratings to method columns is not recoverable from the transcript]

55 Did I change any of your minds?

56 Graphs
- Generally implemented in the same manner as an adjacency list
- Can be processed in the same manner as an adjacency list
- Primary difference is that a child can have more than one parent node
- Cycles are generally acceptable
- Graph structure will always be external to the data structure
- Graphs are even more natural data structures than trees
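
Because a graph node can have several parents and cycles are allowed, any traversal has to remember where it has already been. The sketch below walks an external edge list breadth-first with a visited set. It is a plain Python illustration, not the session's demo. The function name is hypothetical, and the edges are an assumed reading of the parts diagram from the Hierarchies slide (Tape belonging to two assemblies).

```python
from collections import deque


def reachable(edges, start):
    """Return start and every node reachable from it, in breadth-first order.

    edges is a list of (parent, child) pairs held outside the node data,
    so a child may appear under more than one parent.
    """
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)

    order = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:  # skip nodes already visited, so cycles end
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Assumed bill of materials: Tape is a component of two different assemblies.
parts = [
    ("Screw and Tape", "Screw"),
    ("Screw and Tape", "Tape"),
    ("Wood with Tape", "Piece of Wood"),
    ("Wood with Tape", "Tape"),
]
wood_parts = reachable(parts, "Wood with Tape")

# A deliberately cyclic graph still terminates.
cyclic = [("A", "B"), ("B", "C"), ("C", "A")]
cycle_walk = reachable(cyclic, "A")
```

Starting from a single node and following one direction of the edges is also how a graph can be treated as a set of trees, as noted earlier in the deck.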

57 Graphs are Everywhere
- Almost any many-to-many relationship can be a graph
[Diagram: Movie, Actor and Director, linked through ActingCast and MovieDirector]

58 Demo Setup
For each style of hierarchy, we will see how to:
- Implement a physical model that models the corporate hierarchy of the previous graphics
- Create stored procedures for insert, reparenting and deleting
- Write data queries to access and aggregate the data in the hierarchy

59 Demo Code
- Example code for all examples is available for download
- Will demo hierarchies and graphs

60 Contact info
Louis Davidson - louis@drsql.org
Website – http://drsql.org <-- Get slides here
Twitter – http://twitter.com/drsql
SQL Blog – http://sqlblog.com/blogs/louis_davidson
Simple Talk Blog – What Counts for a DBA – http://www.simple-talk.com/community/blogs/drsql/default.aspx

61 Thank you for attending this session and the 2013 PASS Summit in Charlotte, NC

