1
Trees
General principles, ways of thinking
Chapters 17 & 18 in DS&PS; Chapter 4 in DS&AA
2
Applications
Coding
–Huffman, prefix codes
Parsing/Compiling
–a tree is the standard internal representation for code
Information Storage/Retrieval
–binary search trees, AA-trees, AVL, Red-Black, Splay
Game-Playing (scenario analysis)
–virtual trees
–alpha-beta search
Decision Trees
–representation of choices
–automatically constructed from data
3
General Trees
Tree definition
–distinguished root node
–every other node has a unique, sole parent
Depth of a node:
–number of edges from the root to the node
Height of a node:
–number of edges from the node to its deepest descendant
Balanced:
–Goal: O(log n) insert/delete/find
–heights of the subtrees of any node differ by at most 1 (or at most some constant k)
K-arity:
–nodes have at most k sons
4
Depth of a Node
(figure: example tree with each node labeled by its depth)
Often convenient to add another field to the node structure for additional information such as: depth, height, visited, cost, father, number of visits, number of nodes below, etc.
5
Height of a Node
(figure: example tree with each node labeled by its height)
6
Simple Relationships
Leaf height is 0
Height of a node is 1 + maximum height of its sons
Root depth is 0
Depth of a node is 1 + depth of its father
These can be computed recursively.
7
Three Tree Representations
List representation (variable number of children):
–Object value;
–NodeList children;
Sibling representation (variable number of children):
–Object value;
–Node child;    // the leftmost child
–Node sibling;  // each node points to its next sibling
Array representation (k is a bound on the number of children):
–Object value;
–Node[k] children;
A sketch of these three node classes follows below.
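A minimal Java sketch of the three node layouts, assuming the stored value is an Object as on the slide; the class and field names are illustrative, not the course's code.

import java.util.ArrayList;
import java.util.List;

// List representation: a node keeps a growable list of children.
class ListNode {
    Object value;
    List<ListNode> children = new ArrayList<>();
}

// Sibling (first-child / next-sibling) representation.
class SiblingNode {
    Object value;
    SiblingNode child;    // leftmost child, or null
    SiblingNode sibling;  // next sibling to the right, or null
}

// Array representation: k is a fixed bound on the number of children.
class ArrayNode {
    static final int K = 4;                    // illustrative bound
    Object value;
    ArrayNode[] children = new ArrayNode[K];
}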
8
Sibling Representation
(figure: the same example tree drawn with child and sibling links)
9
Depth of node (list rep)
Recall depth(node) is the number of links from the node to the root.
Idea:
–depth of a son is 1 + depth of its father
Call depth(root, 0)
Define depth(Node n, int d)
–mark the depth at node n as d
–for each son of n, call depth(son, d+1)  (use an iterator)
Marking can be done in two ways:
–have an additional field (int depth) in each node
–have an array int depth[number of nodes]
A sketch appears below.
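A minimal sketch of the recursion under the list representation, using the "additional field" option for marking; the names are illustrative.

import java.util.ArrayList;
import java.util.List;

class DepthDemo {
    static class ListNode {
        Object value;
        List<ListNode> children = new ArrayList<>();
        int depth;                              // the "additional field" option
    }

    // Call computeDepth(root, 0); each son gets 1 + its father's depth.
    static void computeDepth(ListNode n, int d) {
        n.depth = d;
        for (ListNode son : n.children) computeDepth(son, d + 1);
    }
}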
10
Depth of node (sibling rep)
Compute the depth of a node.
Recall depth(node) is the number of links from the node to the root.
Idea:
–depth of the left son is 1 + depth of its father
–depth of a sibling is the same as the depth of the node itself
Call depth(root, 0)
Define depth(Node n, int d)
–mark the depth at node n as d
–call depth(n.leftson, d+1)
–call depth(n.sibling, d)
11
Height of Node
List representation:
–if the node is a leaf, height = 0
–else height = 1 + max(heights of its sons)
Sibling representation (here the recursion returns the max height over a node and its right siblings):
–if the node is a leaf with no sibling, height = 0
–else height = max(1 + height of leftson, max of heights of siblings)
A sketch of both versions follows below.
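A sketch of height for both representations. For the sibling representation it computes the height of a single node directly by walking its children through the child/sibling links, rather than using the slide's combined recursion; node classes mirror the earlier sketch and are illustrative.

import java.util.ArrayList;
import java.util.List;

class HeightDemo {
    static class ListNode {
        Object value;
        List<ListNode> children = new ArrayList<>();
    }
    static class SiblingNode {
        Object value;
        SiblingNode child, sibling;
    }

    // List representation: a leaf has height 0, otherwise 1 + max over sons.
    static int height(ListNode n) {
        int h = 0;
        for (ListNode son : n.children) h = Math.max(h, 1 + height(son));
        return h;
    }

    // Sibling representation: walk n's children via child/sibling links.
    static int height(SiblingNode n) {
        int h = 0;
        for (SiblingNode c = n.child; c != null; c = c.sibling)
            h = Math.max(h, 1 + height(c));
        return h;
    }
}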
12
Virtual Trees
Trees are often conceptual objects, but take too much room to store. Store only what is needed.
Representation: Node:
–Object value
–Node nextSon(): returns null if no more sons, else returns the next son
In this representation you generate sons on the fly.
E.g. in game playing the tree is typically generated only down to a limited depth, never stored in full.
13
Standard Operations
Copying
Traversals
–preorder, inorder, postorder, level-order
–illustrated with printing, but any processing is fine
find(Object o)
insert(Object o)
delete(Object o)
Complexity of these operations varies with the constraints / structure of the tree that must be preserved.
14
Binary Trees
Object representation: node has
–Object value;
–Node left, right;
Array representation:
–use Object[]
–requires that you know the size of the tree, or use growable arrays
–no pointer overhead
–Trick: root stored at index 1; if a node is stored at i, then its left son is stored at 2*i, its right son at 2*i+1, and its father at i/2
Generalizes to k-ary trees naturally.
A sketch of the array trick follows below.
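A minimal sketch of the array trick (root at index 1); the class name and the starting array size are illustrative.

class ArrayBinaryTree {
    Object[] a = new Object[16];     // grow (e.g. double the length) when full

    static int left(int i)   { return 2 * i; }      // left son of node i
    static int right(int i)  { return 2 * i + 1; }  // right son of node i
    static int father(int i) { return i / 2; }      // father of node i (root has none)

    Object value(int i) { return a[i]; }             // the node stored at index i
}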
15
Binary Search Trees
Left < node < Right
–i.e. every value in a node's left subtree is less than the node's value, which is less than every value in its right subtree.
Operations: let d be the depth of the tree
–Object find(Key k)   (sometimes key and object are the same)
–insert(Object o) or insert(Key k, Object o)
–Object findMin()
–removeMin()
–removeElement(Object o)
Cost: all O(d) via separate and conquer.
A find/insert sketch appears below.
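A minimal sketch of find and insert, assuming the values implement Comparable; the class and method names are illustrative, not the course's BinaryNode code.

class BST<T extends Comparable<T>> {
    private static class Node<T> {
        T value;
        Node<T> left, right;
        Node(T v) { value = v; }
    }
    private Node<T> root;

    // O(d): go left or right depending on the comparison (separate and conquer).
    public boolean find(T x) {
        Node<T> n = root;
        while (n != null) {
            int c = x.compareTo(n.value);
            if (c == 0) return true;
            n = (c < 0) ? n.left : n.right;
        }
        return false;
    }

    public void insert(T x) { root = insert(root, x); }

    private Node<T> insert(Node<T> n, T x) {
        if (n == null) return new Node<>(x);
        int c = x.compareTo(n.value);
        if (c < 0)      n.left  = insert(n.left, x);
        else if (c > 0) n.right = insert(n.right, x);
        // c == 0: duplicate, ignored in this sketch
        return n;
    }
}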
16
Removing elements is tricky
How would you remove the value at the root?
Plan for remove(Object o):
1. Find o, i.e. let n be the node in the tree with value o
2. Keep a ptr to the father of n
3. If (n.right == null) ptr.son = n.left   // idea, not code
4. Else
  a. find the min in n.right
  b. remove the min from n.right
  c. ptr.son = new Node(min, n.left, n.right)
Assumes an appropriate constructor. Make pictures of the cases.
17
Support routines
BinaryNode findMin(BinaryNode n), recursively:
–if (n.left == null) return n
–else return findMin(n.left)
–O(d) time and space
BinaryNode findMin(BinaryNode n), iteratively:
–while (n.left != null) n = n.left
–return n
–O(d) time, O(1) space
18
Remove Min
removeMin(BinaryNode n): idea
–Node n' = findMin(n)
–father(n').left = n'.right   // idea ok, but not quite code
–What if the minimum is the root (i.e. n' has no father)?
BinaryNode removeMin(BinaryNode n)
–if (n.left != null) n.left = removeMin(n.left)
–else n = n.right
–return n
19
removeMin Examples
(figures: before/after trees for removeMin)
20
Remove Node Examples
(figure: example trees for removeNode)
21
removeNode
BinaryNode removeNode(Comparable x, BinaryNode n)   // remove value x from the tree rooted at n
  if (x < n.data) n.left = removeNode(x, n.left)
  else if (x > n.data) n.right = removeNode(x, n.right)
  // now x == n.data
  else if (n.left != null && n.right != null)   // two children
    n.data = findMin(n.right).data
    n.right = removeMin(n.right)
  else   // left or right is empty
    n = (n.left != null) ? n.left : n.right
  return n
A runnable version is sketched below.
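A compilable sketch of removeNode together with its helpers, assuming Comparable data; BinaryNode here is an illustrative stand-in for the course's class.

class BinaryNode {
    Comparable data;
    BinaryNode left, right;

    static BinaryNode findMin(BinaryNode n) {
        while (n.left != null) n = n.left;
        return n;
    }

    static BinaryNode removeMin(BinaryNode n) {
        if (n.left != null) { n.left = removeMin(n.left); return n; }
        return n.right;                          // the min is n itself; splice it out
    }

    static BinaryNode removeNode(Comparable x, BinaryNode n) {
        if (n == null) return null;              // x not found
        int c = x.compareTo(n.data);
        if (c < 0)      n.left  = removeNode(x, n.left);
        else if (c > 0) n.right = removeNode(x, n.right);
        else if (n.left != null && n.right != null) {   // two children
            n.data  = findMin(n.right).data;            // copy the successor up
            n.right = removeMin(n.right);               // delete the successor
        } else {                                        // at most one child
            n = (n.left != null) ? n.left : n.right;
        }
        return n;
    }
}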
22
Find a node (three meanings)
Search tree:
–given a node id, find it in the tree
Search tree:
–find a node with a specific property, e.g. the kth largest element (order statistic)
–separate and conquer answers this in O(log n) time
Arbitrary tree:
–find a node with a specific property
–e.g. a node is a position in a game tree; find a win
–e.g. a node is a particular tour; find the node (tour) with least cost
23
Separate and Conquer
Finding the kth smallest (case analysis)
Where can it be? Let i be the number of nodes in the left subtree (the right subtree has N-i-1 nodes).
–If i == k-1, the answer is the root.
–If i < k-1, search for the (k-i-1)th smallest in the right subtree.
–If i >= k, search for the kth smallest in the left subtree.
Complexity: depth of the tree (O(log n) if balanced).
A sketch follows below; it assumes each node stores the size of its subtree.
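A sketch of the selection, assuming each node maintains the size of its subtree (the "additional field" idea from earlier); node and field names are illustrative.

class OrderStatNode {
    int value;
    int size = 1;                     // number of nodes in the subtree rooted here
    OrderStatNode left, right;

    static int sizeOf(OrderStatNode n) { return n == null ? 0 : n.size; }

    // Returns the kth smallest (k >= 1), or null if k is out of range.
    static OrderStatNode kthSmallest(OrderStatNode n, int k) {
        if (n == null || k < 1 || k > n.size) return null;
        int i = sizeOf(n.left);        // nodes smaller than n.value
        if (k == i + 1) return n;      // the root of this subtree is the answer
        if (k <= i)     return kthSmallest(n.left, k);
        return kthSmallest(n.right, k - i - 1);
    }
}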
24
Analysis Definitions
Problem: what is the average time to find or insert an element?
The definitions follow from the problem.
Internal path length of a binary tree (IPL):
–sum of the depths of all nodes = ipl
–average cost of a successful search = average depth + 1 = ipl/N + 1 (cost = number of nodes you look at)
External path length of a binary tree (EPL):
–sum of the depths of all N+1 null references = epl
–average cost of an insertion or a failed search = epl/(N+1)
25
Example of IPL and EPL
(figure: a 5-node tree with node depths 0, 1, 1, 2, 2; the null references hang off the nodes)
IPL = 0+1+1+2+2 = 6
EPL = 2+2+3+3+3+3 = 16 = IPL + 2*5 = IPL + 2N
What happens if you remove a leaf?
26
Picture Proof: IPL related to the IPL of the subtrees
(figure: an N-node tree split into an i-node left subtree and an (N-i-1)-node right subtree)
Each non-root node (N-1 of them) is one edge deeper in the whole tree than in its own subtree, so
IPL(tree) = IPL(left subtree) + IPL(right subtree) + N - 1.
27
Some Theorems
The average internal path length of a (randomly built) binary search tree is about 1.38 N log N.
Proof that it is O(N log N):
–Let D(N) = average ipl of a tree with N nodes
–D(0) = D(1) = 0
–Average over all N equally likely splits into an i-node left subtree and an (N-i-1)-node right subtree (draw the picture):
  D(N) = (1/N) * sum over i of [ D(i) + D(N-i-1) ] + N - 1
       = (2/N) * ( D(0) + ... + D(N-1) ) + N - 1
–Same recurrence as the quicksort analysis (to be done): O(N log N)
Why does EPL = IPL + 2N? (induction)
28
Analysis
Goal: express f(n) in terms of f(n-1), then expand.
D(n) = (2/n)( D(0) + ... + D(n-1) ) + n
Multiply by n:  n*D(n) = 2*( D(0) + ... + D(n-1) ) + n^2
Goal: compare with the previous case, subtract and hope.
(n-1)*D(n-1) = 2*( D(0) + ... + D(n-2) ) + (n-1)^2
Subtracting:  n*D(n) - (n-1)*D(n-1) = 2*D(n-1) + 2n - 1
So (dropping the -1):  n*D(n) = (n+1)*D(n-1) + 2n
Divide by n(n+1):  D(n)/(n+1) = D(n-1)/n + 2/(n+1)
EUREKA! Expand:
D(n)/(n+1) = 2/(n+1) + 2/n + ... + 2/1 = 2*(harmonic series), which is O(log n)
Conclusion: D(n) is O(n*log n)
29
1/1 + 1/2 + ... + 1/n is O(log n)
General trick: a sum approximates an integral, and vice versa.
The area under the function 1/x is given by log(x).
(figure: unit-width rectangles of height 1/k overlaid on the curve 1/x)
The bound is worked out below.
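A worked version of the integral bound (a standard calculus fact, written in LaTeX; not from the slides):

H_n = \sum_{k=1}^{n} \frac{1}{k}
    \;\le\; 1 + \int_{1}^{n} \frac{dx}{x} \;=\; 1 + \ln n,
\qquad
H_n \;\ge\; \int_{1}^{n+1} \frac{dx}{x} \;=\; \ln(n+1).

So H_n is Theta(log n), and the previous slide's D(n)/(n+1) = 2*H_{n+1} (roughly) is O(log n).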
30
Balanced Trees
The depth of the tree controls the amount of work for many operations, so...
Goal: keep the depth small
–what does that mean?
–what can be achieved?
–what needs to be achieved?
AVL: 1962 - very balanced
B-trees: 1972 (reduce disk accesses)
Red-Black: 1978
AA: 1993, a little simpler and faster
Splay trees: amortized balanced (self-adjusting on finds)
All use rotations.
31
AVL Tree
Recall the height of the empty tree is -1.
In an AVL tree, for all nodes, the heights of the left and right subtrees differ by at most 1.
AVL trees have logarithmic height.
Fibonacci numbers: F[1]=1; F[2]=1; F[3]=2; F[4]=3; ...
Let S[h] = size of the smallest AVL tree of height h
S[0] = 1; S[1] = 2 (why?)
Induction strikes. Thm: S[h] >= F[h+3] - 1
Base: S[0] = 1 >= F[3]-1 and S[1] = 2 >= F[4]-1
S[h] = S[h-1] + S[h-2] + 1 >= F[h+2]-1 + F[h+1]-1 + 1 = F[h+3] - 1
Hence the number of nodes grows exponentially with the height, i.e. the height is O(log N).
A quick numeric check appears below.
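A quick numeric check of the bound for small h (illustrative only, not part of the proof):

public class AvlSizeBound {
    public static void main(String[] args) {
        long[] F = new long[40];
        F[1] = 1; F[2] = 1;
        for (int i = 3; i < F.length; i++) F[i] = F[i-1] + F[i-2];

        long sPrev = 1, sCur = 2;               // S[0], S[1]
        for (int h = 2; h <= 20; h++) {
            long sNext = sCur + sPrev + 1;      // S[h] = S[h-1] + S[h-2] + 1
            sPrev = sCur; sCur = sNext;
            System.out.println("h=" + h + "  S[h]=" + sCur
                               + "  F[h+3]-1=" + (F[h+3] - 1));
        }
    }
}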
32
On Insertion, what can go wrong?
The tree is balanced before the insertion.
(figure: subtrees of heights H-1 and H before an insertion pushes the height difference to 2)
33
Insertion
After an insertion, there are 4 ways the tree can be unbalanced. Check it out.
Outside imbalance: handled by single rotations.
Inside imbalance: handled by double rotations.
(figures: the outside and inside imbalance cases)
34
Maintaining Balance
Rebalancing: single and double rotations.
Left rotation: applied after an insertion.
(figure: a single left rotation at node 2 with child 1 and subtrees a, b, c)
35
Another View 2 1 ab c Left 1 2 a bc 1 2 a b c 2 1 a b c Notice what happens to heights Right
36
Another View
(figure: the left and right rotations shown side by side on nodes 1 and 2 with subtrees a, b, c)
Notice what happens to the heights. For a LEFT rotation in general: a goes up 1, b stays the same, c goes down 1.
37
Single (left) rotation
Switches parent and child. In the diagram (k2 is the parent, k1 its left child):
static Node leftRotate(Node k2)
  k1 = k2.left
  k2.left = k1.right
  k1.right = k2
  return k1
Appropriate test question:
–do it, i.e. given a sequence of insertions such as 6, 2, 7, 1, -1, etc., show the succession of trees after each insert and rotation.
Similar for the right rotation. A compilable sketch follows below.
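A minimal compilable sketch of both single rotations, following the slide's naming ("left rotation" promotes the left child); the class and method names are illustrative.

class AvlNode {
    int value;
    AvlNode left, right;

    // The slide's leftRotate: k2's left child k1 becomes the new root.
    static AvlNode leftRotate(AvlNode k2) {
        AvlNode k1 = k2.left;
        k2.left  = k1.right;   // k1's right subtree moves under k2
        k1.right = k2;         // k2 becomes k1's right child
        return k1;             // new root of this subtree
    }

    // The mirror image: k1's right child k2 becomes the new root.
    static AvlNode rightRotate(AvlNode k1) {
        AvlNode k2 = k1.right;
        k1.right = k2.left;
        k2.left  = k1;
        return k2;
    }
}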
38
Double Rotation (left)
(figure: node 3 with left child 1 and inside grandchild 2; after the double rotation, 2 becomes the root with children 1 and 3)
Out of balance: split.
39
In Steps
(figure: the double rotation performed as two single rotations, on nodes 1, 2, 3 with subtrees a, b, c, d)
40
Double Rotation Code (left-right)
Idea: first rotate the left child with its right child, then rotate the node with its new left child.
static BinaryNode doubleLeft(BinaryNode n)
  n.left = rotateRight(n.left);
  return rotateLeft(n);
Analogous code for the other inside case.
All rotations are O(1) operations.
Out-of-balance is checked after insertions and after deletions; each check is O(1).
For AVL trees, d is O(log N), so all operations are O(log N).
A self-contained sketch follows below.
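A sketch of the double rotation as two single rotations, repeating the single rotations so the block stands alone; the names are illustrative and follow the slide's convention (leftRotate promotes the left child).

class AvlRotations {
    int value;
    AvlRotations left, right;

    static AvlRotations leftRotate(AvlRotations k2) {   // left child rises
        AvlRotations k1 = k2.left;
        k2.left = k1.right;  k1.right = k2;
        return k1;
    }
    static AvlRotations rightRotate(AvlRotations k1) {  // right child rises
        AvlRotations k2 = k1.right;
        k1.right = k2.left;  k2.left = k1;
        return k2;
    }
    // Left-right (inside) case: the imbalance is in the right subtree of the left child.
    static AvlRotations doubleLeft(AvlRotations n) {
        n.left = rightRotate(n.left);   // step 1: rotate the left child with its right child
        return leftRotate(n);           // step 2: rotate n with its new left child
    }
}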
41
Red-Black Trees
Every node is red or black.
The root is black.
If a node is red, its children are black.
Every path from a node to a null reference has the same number of black nodes.
This is the implementation used in the Java collections library (java.util.TreeSet/TreeMap, since JDK 1.2) for search trees.
Insertion can be done in a single top-down pass, which makes it a bit faster than AVL.
Depth is typically about the same as for AVL trees.
The code has many cases - skipping it.
Red-black trees are what you get via new TreeSet(), and you can supply or change the comparator (see the sketch below).
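A small usage sketch: TreeSet is backed by a red-black tree, and a Comparator passed to the constructor replaces the elements' natural ordering. The class name and sample data are illustrative.

import java.util.TreeSet;

public class TreeSetDemo {
    public static void main(String[] args) {
        // Order the strings case-insensitively instead of by natural ordering.
        TreeSet<String> words = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        words.add("Banana");
        words.add("apple");
        words.add("cherry");
        System.out.println(words.first());   // apple  (smallest, ignoring case)
        System.out.println(words);           // [apple, Banana, cherry]
    }
}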
42
AA Trees
A simpler variant of red-black trees
–simpler = more efficient
Add two more rules:
5. Left children may not be red.
6. Remove the colors and use levels instead:
–leaves are at level 1
–if a node is red, its level is the level of its parent
–if a node is black, its level is the level of its parent minus 1
The code still has many special cases.
43
B-tree of order M
Goal: reduce the number of disk accesses.
A generalization of binary search trees.
Method: keep the top of the tree in memory and use a large branching factor.
A disk access is >1000 times slower than a memory access.
An M-ary tree yields O( log_{M/2} N ) disk accesses.
Data is stored only at the leaves.
Non-leaves store up to M-1 keys.
The root is a leaf or has 2...M children.
All other internal nodes have ceil(M/2)...M children.
All leaves are at the same depth and hold ceil(L/2)...L data items.
Often set L = M.
A practical algorithm, but the code is longish (many cases).
44
B-Tree Picture: internal node
(figure: an internal node holding M-1 keys and M pointers)
Goal: store as many keys as possible.
The keys are in sorted order.
An internal node has M-1 keys and M pointers.
Space = M*ptrSize + (M-1)*keySize
45
Representation
Leaf nodes are arrays of size L (or linked lists).
Internal nodes are:
–an array of size M-1 of keys
–an array of size M of pointers to nodes
The keys are in sorted order.
The choice of M depends on the machine architecture and the problem. M is the largest value satisfying:
–keySize*(M-1) + ptrSize*M <= blockSize
46
Example Analysis (all on disk)
Suppose a disk block holds 8,192 bytes.
Suppose each key is 32 bytes, each branch (pointer) is 4 bytes, and each data record is 256 bytes.
L = 8192/256 = 32 data records per leaf.
If the B-tree has order M, an interior node holds M-1 keys and M branches:
32*(M-1) + 4*M = 36M - 32 bytes.
The largest M with 36M - 32 <= 8192 is M = 228.
The arithmetic is checked in the sketch below.
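A tiny sketch that recomputes the slide's numbers; the class and method names are illustrative.

public class BTreeOrder {
    // Largest M with keySize*(M-1) + ptrSize*M <= blockSize.
    static int largestOrder(int blockSize, int keySize, int ptrSize) {
        return (blockSize + keySize) / (keySize + ptrSize);
    }

    public static void main(String[] args) {
        int blockSize = 8192, keySize = 32, ptrSize = 4, recordSize = 256;
        System.out.println("M = " + largestOrder(blockSize, keySize, ptrSize)); // 228
        System.out.println("L = " + blockSize / recordSize);                    // 32
    }
}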
47
Splay Trees
Like splay lists: self-adjusting rather than explicitly balanced.
Goal: minimize access time.
Method: no rebalancing on insert; the tree reorganizes itself on finds (as in splay lists).
Rotating the inserted (or found) node up to the root with single rotations moves it to the root but can leave the tree unbalanced.
Instead use the double rotations zig-zag and zig-zig; this rebalances the tree.
Guarantees O(M log N) total cost for any sequence of M operations, i.e. amortized O(log N) per operation.
48
Summary
The depth of the tree determines the overall costs.
Balancing is achieved by rotations.
AVL trees require 2 passes for insertions/deletions:
–a pass down to find the point
–a pass up to do the corrections
Red-Black and AA trees require 1 pass.
B-Trees are used for accessing information that won't fit in memory.
General themes: CASE ANALYSIS, separate and conquer.