Drsql.org How to Implement a Hierarchy in SQL Server Louis Davidson (drsql.org)

Presentation transcript:

Who am I?
- Been in IT for over 19 years
- Microsoft MVP for 10 years
- Corporate Data Architect
- Written five books on database design
  - OK, so they were all versions of the same book. They at least had slightly different titles each time.

Hierarchies

Hierarchies
- Trees - Single Parent Hierarchies
- Graphs - Multi Parent Hierarchies
  - Note: Graphs can be complex to deal with as a whole, but often you can deal with them as a set of trees
[Diagram labels: Screw, Piece of Wood, Tape, Wood with Tape, Screw and Tape]

Cycles in Hierarchies
- “I’m my own grandpa” syndrome
- Must be understood, or can cause an infinite loop in processing
- Generally disallowed in trees
- May be supported in graphs, particularly for establishing relationships
[Diagram labels: Grandparent, Parent, Child]
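
One way to keep the "own grandpa" problem out of a tree is to check, before setting a parent, whether the proposed parent already sits underneath the node. The sketch below is only an illustration, not code from the talk: it uses Python's standard sqlite3 module so it can run anywhere, a cut-down CompanyHierarchy table, and a hypothetical helper named would_create_cycle. In SQL Server the CTE would be written with WITH rather than WITH RECURSIVE, and recursive CTEs there require UNION ALL.

```python
import sqlite3

def would_create_cycle(conn, node, new_parent):
    """True if making new_parent the parent of node would form a cycle."""
    # Walk upward from the proposed parent. If the walk reaches node,
    # node would end up as its own ancestor.
    row = conn.execute(
        """
        WITH RECURSIVE Ancestors(Organization) AS (
            SELECT ?                      -- start at the proposed parent
            UNION                         -- UNION drops repeats, so the walk ends
            SELECT h.ParentOrganization
            FROM CompanyHierarchy AS h
            JOIN Ancestors AS a ON h.Organization = a.Organization
            WHERE h.ParentOrganization IS NOT NULL
        )
        SELECT EXISTS (SELECT 1 FROM Ancestors WHERE Organization = ?)
        """,
        (new_parent, node),
    ).fetchone()
    return bool(row[0])

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CompanyHierarchy (
        Organization       TEXT PRIMARY KEY,
        ParentOrganization TEXT REFERENCES CompanyHierarchy (Organization)
    )""")
conn.executemany(
    "INSERT INTO CompanyHierarchy VALUES (?, ?)",
    [("Grandparent", None), ("Parent", "Grandparent"), ("Child", "Parent")],
)

# Putting Grandparent under Child would close the loop; the reverse is harmless.
cycle_if_grandparent_under_child = would_create_cycle(conn, "Grandparent", "Child")
cycle_if_child_under_grandparent = would_create_cycle(conn, "Child", "Grandparent")
```

A check like this would typically run inside the same transaction as the update that sets the parent.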

Hierarchy Uses
- Trees
  - Species
  - Jurisdictions
  - “Simple” organizational charts (or at least the base manager-employee part of the organization)
  - Directory folders
- Graphs
  - Bill of materials
  - Complex organization chart (all those dotted lines!)
  - Genealogies
    - Biological (typically with the cardinality of parents limited to 2)
    - Family tree (sky is the limit)
  - Social networking relationships
    - Example: Bob is connected to Sue, Sue is connected to Fred, Fred is connected to Bob

Implementation of a Hierarchy
- “There is more than one way to shave a dog”
  - None of which are pleasant for the dog or the shaver
  - And the doctor who orders it only asks for a bald dog
- Hierarchies are not at all natural to manipulate/query using relational code
  - And the natural, recursive processing of a node at a time is horribly difficult and slow in relational code
  - So, multiple methods of processing them have arisen through the years
- The topic (much like the topic of how cruel it is to shave a dog) inspires religious-like arguments
- I find all of the implementation possibilities fascinating, so I set out to do an overview of them all…

Working with Trees - Background
- Node recursion
- Relational recursion

Tree Processing Algorithms
- There are several methods for processing trees in SQL
- We will look at:
  - Fixed Levels
  - Adjacency List
  - HierarchyId
  - Path Technique
  - Nested Sets
  - Kimball Helper Table
- Without giving away too much, pretty much all of the methods have some use…

Coding for trees
- Manipulation:
  - Creating a new node
  - Moving/Reparenting a node
  - Deleting a node (without children)
  - Note: No tree algorithms allow for “simple” SQL solutions to all of these problems
- Usage:
  - Getting the children of a node
  - Getting the parent of a node
  - Aggregating along the tree
- We will have demos of all of these operations…available at least

Reparenting Example
- Starting with: [diagram of the original tree]
- Perhaps ending with: [diagram of the tree after the move]
- The moved node drags all of its child nodes along with it

Implementing a tree - Fixed Levels

    CREATE TABLE CompanyHierarchy
    (
        Company      varchar(100) NOT NULL,
        Headquarters varchar(100) NOT NULL,
        Branch       varchar(100) NOT NULL,
        PRIMARY KEY (Company, Headquarters, Branch)
    );

- Very limited, but very fast and easy to work with
- I will not demo this structure today because its use is both extremely obvious and limited

Implementing a tree - Adjacency List
- Every row includes the key value of the parent in the row
- Parent-less rows have a NULL parent value
- Code is the most complex to write (though not as inefficient as it might seem)

    CREATE TABLE CompanyHierarchy
    (
        Organization       varchar(100) NOT NULL PRIMARY KEY,
        ParentOrganization varchar(100) NULL
            REFERENCES CompanyHierarchy (Organization),
        Name               varchar(100) NOT NULL
    );
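
Getting all the children of a node from an adjacency list is usually done with a recursive common table expression. What follows is a minimal sketch under stated assumptions rather than the demo code from the talk: it runs the SQL through Python's standard sqlite3 module so it is runnable as-is, the sample rows (HQ, East, West, Boston) are made up, and descendants is a hypothetical helper. SQL Server spells the CTE with plain WITH instead of WITH RECURSIVE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CompanyHierarchy (
        Organization       TEXT NOT NULL PRIMARY KEY,
        ParentOrganization TEXT NULL REFERENCES CompanyHierarchy (Organization),
        Name               TEXT NOT NULL
    )""")
conn.executemany("INSERT INTO CompanyHierarchy VALUES (?, ?, ?)", [
    ("HQ", None, "Headquarters"),       # root row: NULL parent
    ("East", "HQ", "East Region"),
    ("West", "HQ", "West Region"),
    ("Boston", "East", "Boston Branch"),
])

def descendants(conn, root):
    """Return (Organization, Level) for every node below root, nearest first."""
    return conn.execute("""
        WITH RECURSIVE Tree(Organization, Level) AS (
            -- anchor: the starting node itself at level 0
            SELECT Organization, 0
            FROM CompanyHierarchy
            WHERE Organization = ?
            UNION ALL
            -- recursive step: rows whose parent is already in the result
            SELECT c.Organization, t.Level + 1
            FROM CompanyHierarchy AS c
            JOIN Tree AS t ON c.ParentOrganization = t.Organization
        )
        SELECT Organization, Level
        FROM Tree
        WHERE Level > 0
        ORDER BY Level, Organization""", (root,)).fetchall()

children_of_hq = descendants(conn, "HQ")
children_of_east = descendants(conn, "East")
```

Getting the parent of a node needs no recursion at all: it is the ParentOrganization column of that node's row.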

Adjacency List - Adding a Node
[Diagram label: New Node]

Simply set the parent and done!
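
"Set the parent and done" applies to reparenting as well as to adding: because each child only points at its immediate parent, moving a node moves its whole subtree. A small sketch of both operations, again through Python's sqlite3 module with invented sample rows and hypothetical helper names (add_node, reparent, parent_of):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite only enforces REFERENCES when asked
conn.execute("""
    CREATE TABLE CompanyHierarchy (
        Organization       TEXT NOT NULL PRIMARY KEY,
        ParentOrganization TEXT NULL REFERENCES CompanyHierarchy (Organization),
        Name               TEXT NOT NULL
    )""")
conn.executemany("INSERT INTO CompanyHierarchy VALUES (?, ?, ?)", [
    ("HQ", None, "Headquarters"),
    ("East", "HQ", "East Region"),
    ("West", "HQ", "West Region"),
    ("Boston", "East", "Boston Branch"),
])

def add_node(conn, organization, parent, name):
    """Adding a node is one INSERT that names the parent."""
    conn.execute("INSERT INTO CompanyHierarchy VALUES (?, ?, ?)",
                 (organization, parent, name))

def reparent(conn, organization, new_parent):
    """Moving a node is one UPDATE; no child rows are touched."""
    conn.execute(
        "UPDATE CompanyHierarchy SET ParentOrganization = ? WHERE Organization = ?",
        (new_parent, organization))

def parent_of(conn, organization):
    return conn.execute(
        "SELECT ParentOrganization FROM CompanyHierarchy WHERE Organization = ?",
        (organization,)).fetchone()[0]

add_node(conn, "Salem", "East", "Salem Branch")
reparent(conn, "East", "West")   # Boston and Salem follow East under West
```

The cost shows up on the read side instead: any question about a whole subtree needs a recursive query.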

Implementing a tree - Path Method
- Every row includes a representation of the path to its parent
- Processing makes use of LIKE and string processing (I have seen a case that used fixed-length binary values)
- Limitation on path size for string manipulation/indexing

    CREATE TABLE CompanyHierarchy
    (
        OrganizationId int          NOT NULL PRIMARY KEY,
        Name           varchar(100) NOT NULL,
        Path           varchar(900)  -- 900 bytes allows for indexed manipulations
    );

Path Method - Adding a Node
[Diagram label: New Node]

New Id = 9

Path from the parent, plus the New Id
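
The two steps on these slides (take the parent's path, append the new Id) can be sketched as follows. This is an illustration only: the slash-delimited path format such as /1/2/9/ is an assumption, since the transcript does not show the actual encoding, and the helper names add_node and subtree_ids are hypothetical. The code runs the SQL through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CompanyHierarchy (
        OrganizationId INTEGER NOT NULL PRIMARY KEY,
        Name           TEXT NOT NULL,
        Path           TEXT
    )""")
# Assumed path format: every ancestor Id, then the node's own Id,
# each followed by a slash.
conn.executemany("INSERT INTO CompanyHierarchy VALUES (?, ?, ?)", [
    (1, "HQ", "/1/"),
    (2, "East", "/1/2/"),
    (3, "West", "/1/3/"),
    (20, "North", "/1/20/"),
])

def add_node(conn, new_id, name, parent_id):
    """New path = path from the parent, plus the new Id."""
    (parent_path,) = conn.execute(
        "SELECT Path FROM CompanyHierarchy WHERE OrganizationId = ?",
        (parent_id,)).fetchone()
    path = f"{parent_path}{new_id}/"
    conn.execute("INSERT INTO CompanyHierarchy VALUES (?, ?, ?)",
                 (new_id, name, path))
    return path

def subtree_ids(conn, node_id):
    """Everything below a node shares that node's path as a prefix."""
    return [row[0] for row in conn.execute("""
        SELECT c.OrganizationId
        FROM CompanyHierarchy AS p
        JOIN CompanyHierarchy AS c ON c.Path LIKE p.Path || '%'
        WHERE p.OrganizationId = ?
          AND c.OrganizationId <> p.OrganizationId
        ORDER BY c.Path""", (node_id,))]

new_path = add_node(conn, 9, "Boston", 2)   # New Id = 9, as on the slide
east_subtree = subtree_ids(conn, 2)
hq_subtree = subtree_ids(conn, 1)
```

The trailing delimiter matters: without it, the prefix for node 2 would also match node 20. Reparenting is where this method gets expensive, because every path in the moved subtree has to be rewritten.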

Implementing a tree - HierarchyId
- A somewhat unnatural method for the typical SQL programmer
- Similar to the Path Method, and has some of the same limitations when moving nodes around
- Node path does not use data natural to the table, but rather positional location

    CREATE TABLE CompanyHierarchy
    (
        OrganizationId int          NOT NULL PRIMARY KEY,
        Name           varchar(100) NOT NULL,
        OrgNode        hierarchyid  NOT NULL
    );

Implementing a tree - Nested Sets
- Query processing is done using range queries
- Structure is quite slow to maintain due to fragile structure
- Can produce excellent performance for queries

    CREATE TABLE CompanyHierarchy
    (
        Organization varchar(100) NOT NULL PRIMARY KEY,
        Name         varchar(100) NOT NULL,
        [Left]       int          NOT NULL,  -- LEFT and RIGHT are reserved words in T-SQL,
        [Right]      int          NOT NULL   -- so the column names are bracketed
    );

Nested Sets - Adding a Node
[Diagram label: New Node]

Updating Right Values

And the one Left value to the right of the new node

Renumber, leaving gap for child

The New Node

Set the New Node’s Left/Right
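
The renumbering walked through on these slides can be sketched as a single function. This is one common way to insert into a nested-set tree, not necessarily the exact procedure from the demo code: open a gap of two at the parent's Right value, then place the new node in the gap. The sample tree and the helper names add_child and descendants are hypothetical, and the SQL runs through Python's sqlite3 module, which accepts the same bracketed column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CompanyHierarchy (
        Organization TEXT NOT NULL PRIMARY KEY,
        Name         TEXT NOT NULL,
        [Left]       INTEGER NOT NULL,
        [Right]      INTEGER NOT NULL
    )""")
conn.executemany("INSERT INTO CompanyHierarchy VALUES (?, ?, ?, ?)", [
    ("HQ", "Headquarters", 1, 8),
    ("East", "East Region", 2, 5),
    ("Boston", "Boston Branch", 3, 4),
    ("West", "West Region", 6, 7),
])

def add_child(conn, parent, organization, name):
    """Insert a new last child under parent, renumbering to leave a gap."""
    (parent_right,) = conn.execute(
        "SELECT [Right] FROM CompanyHierarchy WHERE Organization = ?",
        (parent,)).fetchone()
    with conn:  # all three statements commit or roll back together
        # Right values at or beyond the insertion point shift by 2 ...
        conn.execute(
            "UPDATE CompanyHierarchy SET [Right] = [Right] + 2 WHERE [Right] >= ?",
            (parent_right,))
        # ... as do Left values to the right of the new node.
        conn.execute(
            "UPDATE CompanyHierarchy SET [Left] = [Left] + 2 WHERE [Left] > ?",
            (parent_right,))
        # The new node takes the two numbers that were freed up.
        conn.execute(
            "INSERT INTO CompanyHierarchy VALUES (?, ?, ?, ?)",
            (organization, name, parent_right, parent_right + 1))

def descendants(conn, organization):
    """A subtree is a plain range query on Left/Right; no recursion needed."""
    return [row[0] for row in conn.execute("""
        SELECT c.Organization
        FROM CompanyHierarchy AS p
        JOIN CompanyHierarchy AS c
          ON c.[Left] > p.[Left] AND c.[Left] < p.[Right]
        WHERE p.Organization = ?
        ORDER BY c.[Left]""", (organization,))]

add_child(conn, "East", "Salem", "Salem Branch")
numbering = conn.execute(
    "SELECT Organization, [Left], [Right] FROM CompanyHierarchy ORDER BY [Left]"
).fetchall()
east_children = descendants(conn, "East")
```

Adding one leaf touched three of the four existing rows here, which is the maintenance cost the slides warn about; the payoff is that reads are simple range scans.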

Implementing a tree - Kimball Helper
- Developed initially for data warehousing, since data is modified all at once with a fixed cost
- Basically explodes the hierarchy into a table that turns all hierarchy manipulations into a relational query
- Maintenance can be slightly costly, but using the data is extremely fast

Implementing a tree - Kimball Helper
For the rows in yellow, expands to the table shown:
[Table columns: ParentId, ChildId, Distance, ParentRootNode, ChildLeafNode]
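
To make the "explode the hierarchy" idea concrete, the sketch below builds a helper table with the five columns named on the slide from a small adjacency list, then aggregates through it with an ordinary join. Several things here are assumptions rather than facts from the talk: the table names HierarchyHelper and Sale, the sample data, and the choice to include a Distance 0 row pairing each node with itself. The SQL runs through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CompanyHierarchy (
        OrganizationId       INTEGER PRIMARY KEY,
        ParentOrganizationId INTEGER NULL
            REFERENCES CompanyHierarchy (OrganizationId)
    );
    INSERT INTO CompanyHierarchy VALUES (1, NULL), (2, 1), (3, 1), (4, 2);

    -- Hypothetical transaction table to aggregate over.
    CREATE TABLE Sale (OrganizationId INTEGER NOT NULL, Amount INTEGER NOT NULL);
    INSERT INTO Sale VALUES (1, 10), (2, 20), (3, 30), (4, 40);

    CREATE TABLE HierarchyHelper (
        ParentId       INTEGER NOT NULL,
        ChildId        INTEGER NOT NULL,
        Distance       INTEGER NOT NULL,
        ParentRootNode INTEGER NOT NULL,  -- 1 when ParentId has no parent
        ChildLeafNode  INTEGER NOT NULL,  -- 1 when ChildId has no children
        PRIMARY KEY (ParentId, ChildId)
    );

    -- One row per (ancestor, descendant) pair, including each node with itself.
    WITH RECURSIVE Pairs(ParentId, ChildId, Distance) AS (
        SELECT OrganizationId, OrganizationId, 0 FROM CompanyHierarchy
        UNION ALL
        SELECT p.ParentId, c.OrganizationId, p.Distance + 1
        FROM Pairs AS p
        JOIN CompanyHierarchy AS c ON c.ParentOrganizationId = p.ChildId
    )
    INSERT INTO HierarchyHelper
    SELECT pr.ParentId, pr.ChildId, pr.Distance,
           (SELECT ParentOrganizationId IS NULL
              FROM CompanyHierarchy
             WHERE OrganizationId = pr.ParentId),
           NOT EXISTS (SELECT 1
                         FROM CompanyHierarchy
                        WHERE ParentOrganizationId = pr.ChildId)
    FROM Pairs AS pr;
""")

helper_rows = conn.execute(
    "SELECT * FROM HierarchyHelper ORDER BY ParentId, ChildId").fetchall()

# Aggregating along the tree is now a plain join plus GROUP BY.
totals = dict(conn.execute("""
    SELECT h.ParentId, SUM(s.Amount)
    FROM HierarchyHelper AS h
    JOIN Sale AS s ON s.OrganizationId = h.ChildId
    GROUP BY h.ParentId"""))
```

The recursion happens once, when the helper is loaded. Every later query is non-recursive, which fits the "maintenance can be costly, but using the data is extremely fast" trade-off described above.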

Performance Examples and Limitations
- The following tests were run multiple times, and the results were taken from one such run. Clearly the results are not scientific, and were done with random data. However, they very much match my expectations from my research.
- Load times were captured loading one row at a time.
- Test machine (this laptop I am using tonight) was a:
  - Lenovo Yoga Pro 2, Haswell ULT i7 (4th Gen Intel Mobile Processor), 2.4 GHz Dual Core (Hyperthreaded), 8 GB RAM, 256 GB SSD
- Note: All load times include time to load 5 transactions per node

Performance Example Explanation
For each performance test (I will show the code later), I ran three query sets on each data set:
1. Load the tree (until my computer couldn’t do it in a reasonable number of hours)
2. Fetch all children from the root node
3. Aggregate data for all children at all levels

Performance Comparisons
[Slides 35-45: performance comparison charts, not included in the transcript]

Method Comparison

Demo Code
- Example code for all examples available for download.
- Will demo hierarchies and graphs.

Method Applicability
Methods compared (table columns): Adjacency List, HierarchyId, Path Method, Nested Set, Kimball Helper
Star ratings per use (cell boundaries did not survive extraction, so each row is shown as extracted):
- General Purpose Hierarchies: *** *
- VERY Large Hierarchy Queries: *******
- Offline Reporting: ****** (Cost of maintaining limits use) ***
- OLTP Use: ***** ** (Perhaps slower to load nodes)
- Highly Concurrent Modification: *******
- Highly Concurrent Queries: ******
- Unlimited Hierarchy Size: ** * (Width unlimited, effective depth limited by 900 byte index limit) ***

Future Improvements
- Use SQL Server 2014 In-Memory Database to help with locking and brute force operations
- Adjust Nested Sets to use fractional numbers to reduce load time costs
- Load an order of magnitude more data
- Try these examples on a “real” computer!

Graphs
- Generally implemented in the same manner as an adjacency list
  - Can be processed in the same manner as an adjacency list
  - Primary difference is that a child can have > 1 parent node
  - Cycles are generally acceptable
- Graph structure will always be external to the data structure
- Graphs are even more natural data structures than trees
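
A minimal sketch of a graph kept "external to the data structure": the nodes live in one table and the relationships in a separate edge table, so a node can have more than one parent and cycles can be stored. The table names Person and Connection, the extra person Ann, and the helper reachable are assumptions for illustration; the Bob, Sue, Fred cycle is the social networking example from the Hierarchy Uses slide. The SQL runs through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (Name TEXT PRIMARY KEY);

    -- The relationship lives in its own table, outside the Person rows.
    CREATE TABLE Connection (
        FromName TEXT NOT NULL REFERENCES Person (Name),
        ToName   TEXT NOT NULL REFERENCES Person (Name),
        PRIMARY KEY (FromName, ToName)
    );

    INSERT INTO Person VALUES ('Bob'), ('Sue'), ('Fred'), ('Ann');
    INSERT INTO Connection VALUES
        ('Bob', 'Sue'), ('Sue', 'Fred'), ('Fred', 'Bob'),  -- a cycle
        ('Ann', 'Sue');                                    -- Sue has two parents
""")

def reachable(conn, start):
    """Everyone reachable from start by following connections."""
    return [row[0] for row in conn.execute("""
        WITH RECURSIVE Reach(Name) AS (
            SELECT ToName FROM Connection WHERE FromName = ?
            UNION          -- discards names already seen, so cycles terminate
            SELECT c.ToName
            FROM Connection AS c
            JOIN Reach AS r ON c.FromName = r.Name
        )
        SELECT Name FROM Reach ORDER BY Name""", (start,))]

from_bob = reachable(conn, "Bob")
from_ann = reachable(conn, "Ann")
```

The UNION is what keeps the cycle from looping forever in this sketch. SQL Server's recursive CTEs require UNION ALL, so there the visited nodes have to be tracked some other way, for example by carrying the path walked so far in a column.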

Graphs are Everywhere
- Almost any many-to-many relationship can be a graph
[Diagram labels: Movie, Actor, ActingCast, Director, MovieDirector]

Demo Setup
For each style of hierarchy, we will see how to:
- Implement a physical model that models the corporate hierarchy of the previous graphics
- Create stored procedures for inserting, reparenting, and deleting data
- Write queries to access and aggregate the data in the hierarchy

Contact info
Louis Davidson
- Website: http://drsql.org (get slides here)
- Twitter
- SQL Blog
- Simple Talk Blog: What Counts for a DBA

Thank you
That’s all folks!