CPSC 221: Data Structures Lecture #5 Branching Out


1 CPSC 221: Data Structures Lecture #5 Branching Out
Steve Wolfman 2010W2

2 Learning Goals After this unit, you should be able to...
Determine if a given tree is an instance of a particular type (e.g., binary search tree, heap, etc.)
Describe and use pre-, in-, and post-order traversal algorithms
Describe the properties of binary trees, binary search trees, and more general trees
Implement iterative and recursive algorithms for navigating them in C++
Compare and contrast ordered versus unordered trees in terms of complexity and scope of application
Insert and delete elements from a binary tree

3 Today’s Outline
Some Tree Review (here for reference, not discussed)
Binary Trees
Dictionary ADT
Binary Search Trees
Deletion
Some troubling questions

4 Trees Family Trees Organization Charts Classification trees
what kind of flower is this? is this mushroom poisonous?
File directory structure: folders and subfolders in Windows; directories and subdirectories in UNIX
Non-recursive procedure call chains
Anyone not seen a tree? I mean the kind that grows _down_ from the root: the computer science tree. Here are some applications. What do these have in common? They have a root. Each entry is either a leaf or it has a number of children, and each child is also a tree. Trees are defined recursively! A tree is a leaf or an internal node with one or more children, where each child is a tree.

5 Tree Terminology
root: leaf: child: parent: sibling: ancestor: descendent: subtree:
(The example tree has root A and nodes B through N.)
Let's review the words:
root: A
leaf: D, E, F, J, K, L, M, N, and I
child: C is a child of A; K is a child of H (leaves have no children)
parent: A is the parent of C; H is the parent of L (the root has no parent)
sibling: D, E, and F are siblings, as are J, K, L, M, and N
grandparent: A is the grandparent of G
grandchild: H and I are grandchildren of C
ancestor: the node's parent or any of its parent's ancestors
descendent: the node's child or any of its children's descendents
subtree: a node and all its descendents

6 Tree Terminology Reference
root: the single node with no parent
leaf: a node with no children
child: a node pointed to by me
parent: the node that points to me
sibling: another child of my parent
ancestor: my parent or my parent's ancestor
descendent: my child or my child's descendent
subtree: a node and its descendents
We sometimes use degenerate versions of these definitions that allow NULL as the empty tree. (This can be very handy for recursive base cases!)
(The example tree and answers are the same as on the previous slide.)

7 More Tree Terminology
depth: # of edges along the path from root to node
depth of H?
Depth: the number of edges along the path from the root to the node
height: the number of edges along the longest path from the node to a leaf
degree: the number of children of the node
branching factor: the maximum degree of any node (or sometimes the average)
preorder traversal: running through all the nodes in the tree, starting with the parent, then all the children
postorder traversal: running through all the nodes starting with the children and then the parents

8 More Tree Terminology
height: # of edges along the longest path from node to leaf (or, for the whole tree, from root to leaf)
height of tree?

9 More Tree Terminology
degree: # of children of a node
degree of B?

10 More Tree Terminology
branching factor: maximum degree of any node in the tree
2 for binary trees, our usual concern; 5 for this weird example tree

11 One More Tree Terminology Slide
binary: branching factor of 2 (each node has at most 2 children)
n-ary: branching factor of n (each node has at most n children)
complete: "packed" binary tree; as many nodes as possible for its height
nearly complete: complete plus some nodes on the left at the bottom
Complete means kind of "triangle shaped" (but don't be fooled; this is NOT really a triangle and doesn't have an "area" of base * height / 2). That's because each row has TWICE as many nodes as the last, not ONE (or a constant) more than the last. In fact, the last row has one more node than the rest of the tree!
HOW DEEP CAN A COMPLETE TREE BE? D(n) <= D(n/2) + 1, which solves to D(n) = O(log n).

12 Trees and (Structural) Recursion
A tree is either: the empty tree, or a root node and an ordered list of subtrees. Trees are a recursively defined structure, so it makes sense to operate on them recursively. OK, let's get a bit of practice working with trees. Almost anything we do on a tree will be most easily thought of as recursive (even if we implement it iteratively). Let's try to find the longest undirected path through the tree. That is, the path goes from node to node, never repeating a node, and it can either go up a link or down it. Thinking recursively, what might the longest path in this example tree be? EITHER the longest path in one of the subtrees, OR the longest path that goes through the root. To implement that, we can return both the height of the tree and the longest path in the tree. The height overall is the max subtree height + 1. The longest path is max(longest subtree path, sum of the two largest subtree heights + 2 edges through the root).

13 Tree Calculations
Find the longest undirected path in a tree. Might be:
EITHER the longest path in one of the subtrees,
OR the longest path that goes through the root.
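The recipe in these notes (return both the height and the longest path seen so far) can be sketched in C++. This code is not from the slides: it assumes a minimal binary Node with just child pointers, and it measures paths in edges.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical minimal binary-tree node (keys omitted; not needed here).
struct Node { Node* left; Node* right; };

// Returns the height of root in edges (-1 for an empty tree) and
// updates longest with the longest undirected path found in the subtree.
int heightAndLongest(Node* root, int& longest) {
    if (root == NULL) return -1;
    int hl = heightAndLongest(root->left, longest);
    int hr = heightAndLongest(root->right, longest);
    // The best path bending through this node goes down both sides.
    longest = std::max(longest, (hl + 1) + (hr + 1));
    return std::max(hl, hr) + 1;
}

int longestPath(Node* root) {
    int longest = 0;
    heightAndLongest(root, longest);
    return longest;
}
```

For example, with root A having children B (with leaves D and E) and C, the longest path D-B-A-C has 3 edges.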

14 Tree Calculations Example
OK, what's the longest path in this tree (the example tree with nodes A through N)?

15 Today’s Outline
Some Tree Review (here for reference, not discussed)
Binary Trees
Dictionary ADT
Binary Search Trees
Deletion
Some troubling questions

16 Binary Trees
A binary tree is:
an empty tree (NULL, in our case)
or, a root node with two subtrees
Properties (for a tree of height h):
max # of leaves: 2^h
max # of nodes: 2^(h+1) - 1
Representation: recursive definition!
Alright, we'll focus today on one type of tree: binary trees. Here's one now (nodes A through J). Is this binary tree complete? Why not? (C has just one child, and the right side is much deeper than the left.) What's the maximum # of leaves a binary tree of depth d can have? What's the max # of nodes a binary tree of depth d can have? Minimum? We won't go into this, but if you take N nodes and assume all distinct trees of the nodes are equally likely, you get an average depth of SQRT(N). Is that bigger or smaller than log n? Bigger, so it's not good enough!

17 Representation
struct Node {
  KTYPE key;
  DTYPE data;
  Node * left;
  Node * right;
};
Each node holds a key (and possibly data) plus left and right child pointers; the pictured tree with nodes A through F is built by linking such nodes together.

18 Today’s Outline
Some Tree Review (here for reference, not discussed)
Binary Trees
Dictionary ADT
Binary Search Trees
Deletion
Some troubling questions

19 What’s wrong with Lists?
What We Can Do So Far:
Stack: Push, Pop
Queue: Enqueue, Dequeue
List: Insert, Remove, Find
Priority Queue: DeleteMin
Alright, now we're armed with the tree expertise we'll need. So, what can we already do? Well, we can push, pop, enqueue, dequeue, insert, remove, find, and deleteMin. OUTDATED (don't do decrease key, IIRC): There's something wrong with our pqueues. Remember decreaseKey? We need to give the location in the heap to perform a decrease. Why? There's something wrong with our lists as well; what is it?

20 Dictionary ADT
Dictionary operations: create, destroy, insert, find, delete
Stores values associated with user-specified keys
values may be any (homogeneous) type
keys may be any (homogeneous) comparable type
Example entries:
midterm: would be tastier with brownies
prog-project: so painful... who invented templates?
wolf: the perfect mix of oomph and Scrabble value
insert(brownies): tasty
find(wolf) returns: wolf, the perfect mix of oomph and Scrabble value
Dictionaries associate some key with a value, just like a real dictionary (where the key is a word and the value is its definition). In this example, I've stored 221 data associated with text reviews. This is probably the most valuable and widely used ADT we'll hit. I'll give you an example in a minute that should firmly entrench this concept.
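As a point of comparison (not part of the lecture), the STL's std::map is a ready-made Dictionary whose usual implementation is itself a balanced binary search tree. A sketch of the slide's operations using it:

```cpp
#include <map>
#include <string>

typedef std::map<std::string, std::string> Dictionary;

// A find that never exposes internals: returns the stored value,
// or notFound when the key is absent.  std::map::find is O(log n).
std::string findOr(const Dictionary& d, const std::string& key,
                   const std::string& notFound) {
    Dictionary::const_iterator it = d.find(key);
    return it == d.end() ? notFound : it->second;
}
```

After d["wolf"] = "the perfect mix of oomph and Scrabble value", findOr(d, "wolf", "NOT FOUND") returns the stored review; d.erase(key) is the ADT's delete.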

21 Search/Set ADT
Dictionary operations: create, destroy, insert, find, delete
Stores keys
keys may be any (homogeneous) comparable type
quickly tests for membership
Example keys: Berner, Whippet, Alsatian, Sarplaninac, Beardie, Sarloos, Malamute, Poodle
insert(Min Pin)
find(Wolf) returns: NOT FOUND
This is a scaled-back version of the dictionary ADT in which we essentially drop the values and leave only the keys. We'll focus on this when looking at binary search trees. BUT, how hard would it be to move to a Dictionary ADT implementation from a Search ADT implementation? BTW: does anyone know what those keys are?

22 A Modest Few Uses
Arrays and "Associative" Arrays
Sets
Dictionaries
Router tables
Page tables
Symbol tables
C++ Structures
Python's __dict__ that stores fields/methods
Our ADT algorithm says to look at some applications at this point. I think the first app pretty much says it all. We move on from there, however, to other incredibly widely used applications. I know I've said this before, but this is probably the most important and one of the most widely used ADTs we'll look at. For those keeping track, priority queues are _not_ as widely used as Dictionaries.

23 Desiderata
Fast insertion
Fast searching
Fast deletion
runtime: ?
OK, how fast do we want these to be? In general, we want finds in particular (and inserts and deletes if possible) to be fast. O(log n) for all of them would be great. We'll see later how to get an expected O(1) time for all of them. But can anyone think right now of a data structure that gives constant time search, at least, for dictionaries? ARRAYS!

24 Naïve Implementations
                insert    find      delete
Linked list     O(1)      O(n)      O(n)
Unsorted array  O(1)      O(n)      O(n)
Sorted array    O(n)      O(log n)  O(n)
Sorted array is oh-so-close: O(log n) find time and almost O(log n) insert time. What's wrong? Let's look at how that search goes: draw the recursive calls (and potential recursive calls) in binary search. Note how it starts looking like a binary tree where the left subtrees have smaller elements and the right subtrees have bigger elements. What if we could store the whole thing in the structure this recursive search is building? So close!
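The notes' picture of binary search "building a tree" can be made concrete. This sketch (not from the slides) shows that each midpoint probed acts as the root of a subtree, with smaller elements on its left and bigger elements on its right:

```cpp
// Binary search over a sorted array.  The sequence of midpoints visited
// is exactly a path down the binary search tree the notes describe.
int binarySearch(const int a[], int lo, int hi, int key) {
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;         // the "root" of a[lo..hi]
        if (key < a[mid])      hi = mid - 1;  // descend into the "left subtree"
        else if (key > a[mid]) lo = mid + 1;  // descend into the "right subtree"
        else                   return mid;    // found
    }
    return -1;                                // not present
}
```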

25 Today’s Outline
Some Tree Review (here for reference, not discussed)
Binary Trees
Dictionary ADT
Binary Search Trees
Deletion
Some troubling questions

26 Binary Search Tree Dictionary Data Structure
Binary tree property:
each node has <= 2 children
result: storage is small, operations are simple, average depth is small
Search tree property:
all keys in the left subtree are smaller than the root's key
all keys in the right subtree are larger than the root's key
result: easy to find any given key
(Example BST: root 8; children 5 and 11; then 2, 6, 10, 12; then 4, 7, 9, 14, 13.)
A binary search tree is a binary tree in which all nodes in the left subtree of a node have lower values than the node, and all nodes in the right subtree of a node have higher values than the node. It's like making that recursion into the data structure! I'm storing integers at each node. Does everybody think that's what I'm _really_ going to store? What do I need to know about what I store? (comparison, equality testing)
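Checking the search tree property makes a nice exercise. This checker is not from the slides; it assumes int keys, and it passes bounds down the tree because comparing a node only with its direct children is not enough (a key deep in a subtree must still respect every ancestor above it):

```cpp
#include <climits>
#include <cstddef>

struct Node { int key; Node* left; Node* right; };

// True iff every key in root's left subtree is below root's key and
// every key in its right subtree is above it, at every level.
bool isBST(Node* root, int lo, int hi) {
    if (root == NULL) return true;
    if (root->key <= lo || root->key >= hi) return false;
    return isBST(root->left, lo, root->key) &&
           isBST(root->right, root->key, hi);
}

bool isBST(Node* root) { return isBST(root, INT_MIN, INT_MAX); }
```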

27 Getting to Know BSTs: Example and Counter-Example
(Two example trees are shown; the left one is a BINARY SEARCH TREE, the right one is NOT A BINARY SEARCH TREE.)
Why is the one on the left a BST? It's not complete! (Because BSTs don't need to be complete.) Why isn't the one on the right a BST? 5 has three children. 20 has a left child (21) larger than it. What's wrong with 11? Even though 15 isn't a direct child, it _still_ needs to be less than 11!

28 Getting to Know All About BSTs
In Order Listing
struct Node { // constructors omitted
  KTYPE key;
  DTYPE data;
  Node *left, *right;
};
(Example tree: root 10; 5, 15; 2, 9, 20; 7, 17, 30.)
Anyone notice anything interesting about that in-order listing? Everything in the left subtree is listed first. Then the root. Then everything in the right subtree. OK, let's work out the code to make the in-order listing:
void inorder(Node * n) {
  if (n != NULL) {
    inorder(n->left);
    cout << n->key << " ";
    inorder(n->right);
  }
}
Is there an iterative version that doesn't use its own stack? Not really, no. So, recursion is probably OK here. And if a "bushy" tree's too deep for recursion, you must have a huge amount of data.
In-order listing: 2 5 7 9 10 15 17 20 30
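The in-order routine as a compilable sketch (not the course's exact code: keys are ints, and output goes into a vector so the sorted-order claim can be checked):

```cpp
#include <cstddef>
#include <vector>

struct Node { int key; Node* left; Node* right; };

// In-order: everything in the left subtree, then the root, then
// everything in the right subtree.  On a BST this is sorted order.
void inorder(Node* n, std::vector<int>& out) {
    if (n != NULL) {
        inorder(n->left, out);
        out.push_back(n->key);
        inorder(n->right, out);
    }
}
```

On the slide's tree (root 10), the result is 2 5 7 9 10 15 17 20 30.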

29 Getting to Like BSTs: Finding a Node
Node *& find(Comparable key, Node *& root) {
  if (root == NULL)
    return root;
  else if (key < root->key)
    return find(key, root->left);
  else if (key > root->key)
    return find(key, root->right);
  else
    return root;
}
(Example tree: root 10; 5, 15; 2, 9, 20; 7, 17, 30.)
runtime: O(1)? O(lg n)? O(n)? O(n lg n)? None of these?
IMPORTANT: Does anyone think it would be a good idea to have this find as the interface to a Search ADT? NO! It exposes really nasty details. But it's fine for internal use, and it can easily be called by the real external find. Now, let's try finding a node. Find 9. This time I'll supply the code. This should look a _lot_ like binary search! How long does it take? Log n is an easy answer, but what if the tree is very lopsided? So really, this is worst case O(n)! A better answer is theta of the depth of the node sought. If we can bound the depth of that node, we can bound the length of time a search takes. What about the code? All those &s and *s should look pretty scary. Let's talk through them.

30 Getting to Like BSTs: Finding a Node (continued)
(Same find code and example tree as the previous slide.)
WARNING: Much fancy footwork with refs (&) coming. You can do all of this without refs... just watch out for special cases.

31 Getting to Hope BSTs Like You
Iterative Find
Node * find(Comparable key, Node * root) {
  while (root != NULL && root->key != key) {
    if (key < root->key)
      root = root->left;
    else
      root = root->right;
  }
  return root;
}
OK, find 9 again (same example tree). Look familiar? (It's trickier to get the ref return to work here. We won't worry.)
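Here is the iterative find as a self-contained sketch, with int keys substituted for the slides' Comparable template parameter, run against the example tree:

```cpp
#include <cstddef>

struct Node { int key; Node* left; Node* right; };

// Walk down from the root, going left or right by comparison,
// until the key is found or we fall off the bottom of the tree.
Node* find(int key, Node* root) {
    while (root != NULL && root->key != key)
        root = (key < root->key) ? root->left : root->right;
    return root;  // NULL if the key is absent
}
```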

32 Insert
void insert(Comparable key, Node *& root) {
  Node *& target(find(key, root));
  assert(target == NULL);
  target = new Node(key, data);  // data: the value being inserted alongside key
}
Let's do some inserts: insert(8), insert(11), insert(31). (Example tree: root 10; 5, 15; 2, 9, 20; 7, 17, 30.)
runtime: ?
Funky game we can play with the *& version.
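The "funky game" is that find returns a reference to the pointer where the key lives, or to the NULL pointer where it would live, so insert can simply assign through it. A minimal sketch (int keys and no data field, both simplifications of mine):

```cpp
#include <cstddef>

struct Node {
    int key;
    Node* left;
    Node* right;
    Node(int k) : key(k), left(NULL), right(NULL) {}
};

// Returns a reference to the pointer holding key's node,
// or to the NULL pointer where key's node belongs.
Node*& find(int key, Node*& root) {
    if (root == NULL || root->key == key) return root;
    return (key < root->key) ? find(key, root->left)
                             : find(key, root->right);
}

void insert(int key, Node*& root) {
    Node*& target = find(key, root);
    if (target == NULL)          // not already present
        target = new Node(key);  // writing through the reference links it in
}
```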

33 Reminder: Value vs. Reference Parameters
Value parameters (Object foo):
copies the parameter
no side effects
Reference parameters (Object & foo):
shares the parameter
can affect the actual value
use when the value needs to be changed
Const reference parameters (const Object & foo):
cannot affect the actual value
use when the value is too expensive to copy by value
A momentary digression. I did some tricky stuff with reference variables there.

34 BuildTree for BSTs
Suppose the data 1, 2, 3, 4, 5, 6, 7, 8, 9 is inserted into an initially empty BST:
in order
in reverse order
median first, then left median, right median, etc.
OK, we had a buildHeap; let's buildTree. How long does this take? Well, IT DEPENDS! Let's say we want to build a tree from 1 through 9. What happens if we insert in order? Reverse order? What about 5, then 3, then 7, then 2, then 1, then 6, then 8, then 9?
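The insertion orders are easy to compare empirically. A sketch (not from the slides; int keys, height measured in edges):

```cpp
#include <algorithm>
#include <cstddef>

struct Node {
    int key;
    Node* left;
    Node* right;
    Node(int k) : key(k), left(NULL), right(NULL) {}
};

// Standard BST insert; duplicates are ignored.
void insert(int key, Node*& root) {
    if (root == NULL)          root = new Node(key);
    else if (key < root->key)  insert(key, root->left);
    else if (key > root->key)  insert(key, root->right);
}

int height(Node* root) {  // in edges; -1 for the empty tree
    if (root == NULL) return -1;
    return 1 + std::max(height(root->left), height(root->right));
}
```

Inserting 1 through 9 in order (or reverse order) produces a height-8 "linked list", while a median-first order such as 5, 3, 8, 2, 4, 7, 9, 1, 6 produces height 3.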

35 Analysis of BuildTree
Worst case: O(n^2), as we've seen.
Average case, assuming all orderings equally likely, turns out to be O(n lg n).
Average runtime is equal to the average depth of a node in the tree. We'll calculate the average depth by finding the sum of all depths in the tree and dividing by the number of nodes. What's the sum of all depths?
D(n) = D(i) + D(n - i - 1) + n - 1
(The left subtree has i nodes and the root is 1 node, so the right has n - i - 1. D(i) is the total depth of the left subtree; each of its nodes sits 1 deeper in the overall tree, and the same goes for the right, for a total of i + (n - i - 1) = n - 1 extra depth.)
For random insertion orders, all subtree sizes are equally likely (the root is equally likely to be any of the n keys, and the rest fall on the left or right deterministically). Each subtree then averages 1/n * sum over j from 0 to n-1 of D(j). Or... just use the quicksort analysis again!

36 Bonus: FindMin/FindMax
Find minimum: follow left children until there is no left child.
Find maximum: follow right children until there is no right child.
(Example tree: root 10; 5, 15; 2, 9, 20; 7, 17, 30.)
Every now and then everyone succumbs to the temptation to really overuse color.

37 Double Bonus: Successor
Find the next larger node in this node's subtree.
// Note: If no succ, returns (a useful) NULL.
Node *& succ(Node *& root) {
  if (root->right == NULL)
    return root->right;
  else
    return min(root->right);
}

Node *& min(Node *& root) {
  if (root->left == NULL)
    return root;
  else
    return min(root->left);
}
(Example tree: root 10; 5, 15; 2, 9, 20; 7, 17, 30.)
Here's a little digression. Maybe it'll even have an application at some point. Find the next larger node in 10's subtree. Can we define it in terms of min and max? It's the min of the right subtree!

38 More Double Bonus: Predecessor
Find the next smaller node in this node's subtree.
Node *& pred(Node *& root) {
  if (root->left == NULL)
    return root->left;
  else
    return max(root->left);
}

Node *& max(Node *& root) {
  if (root->right == NULL)
    return root;
  else
    return max(root->right);
}
(Example tree: root 10; 5, 15; 2, 9, 20; 7, 17, 30.)
Predecessor is just the mirror problem.
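Both bonuses, runnable. This sketch uses plain pointers instead of the slides' Node *& references (the references matter for deletion, not for merely locating a node), with int keys assumed:

```cpp
#include <cstddef>

struct Node { int key; Node* left; Node* right; };

Node* findMin(Node* root) {                    // keep going left
    while (root->left != NULL) root = root->left;
    return root;
}

Node* findMax(Node* root) {                    // keep going right
    while (root->right != NULL) root = root->right;
    return root;
}

// Next larger key in root's subtree: the min of the right subtree.
Node* succ(Node* root) {
    return root->right == NULL ? NULL : findMin(root->right);
}

// Next smaller key in root's subtree: the max of the left subtree.
Node* pred(Node* root) {
    return root->left == NULL ? NULL : findMax(root->left);
}
```

On the example tree, succ of 10 is 15 and pred of 10 is 9.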

39 Today’s Outline
Some Tree Review (here for reference, not discussed)
Binary Trees
Dictionary ADT
Binary Search Trees
Deletion
Some troubling questions

40 Deletion
(Example tree: root 10; 5, 15; 2, 9, 20; 7, 17, 30.)
Why might deletion be harder than insertion?
And now for something completely different. Let's say I want to delete a node. Why might it be harder than insertion? It might happen in the middle of the tree instead of at a leaf. Then I have to fix up the BST.

41 Lazy Deletion (“Tombstones”)
Instead of physically deleting nodes, just mark them as deleted.
Plusses:
simpler
physical deletions done in batches
some adds just flip the deleted flag
Minuses:
extra memory for the "tombstone"
many lazy deletions slow finds
some operations may have to be modified (e.g., min and max)
Now, before we move on to all the pains of true deletion, let's do it the easy way. We'll just pretend we delete deleted nodes. This has some real advantages:

42 Lazy Deletion
Delete(17), Delete(15), Delete(5), Find(9), Find(16), Insert(5), Find(17)
(Example tree: root 10; 5, 15; 2, 9, 20; 7, 17, 30.)
OK, let's do some lazy deletions. Everybody yawn, stretch, and say “Mmmm… doughnut” to get in the mood. Those of you who are already asleep have the advantage.

43 Real Deletion - Leaf Case
Delete(17)
(Example tree: root 10; 5, 15; 2, 9, 20; 7, 17, 30.)
Alright, we did it the easy way, but what about real deletions? Leaves are easy; we just prune them.

44 Real Deletion - One Child Case
Delete(15)
(Tree after the previous deletion: root 10; 5, 15; 2, 9, 20; 7, 30.)
Single-child nodes we remove and... do what? We can just pull up their children. Is the search tree property intact? Yes.

45 Real Deletion - Two Child Case
Delete(5)
(Tree now: root 10; 5, 20; 2, 9, 30; 7.)
Ah, now the hard case. How do we delete a two-child node? We remove it and replace it with what? It has all these left and right children that need to be greater and less than the new value (respectively). Is there any value that is guaranteed to be between the two subtrees? Two of them: the successor and the predecessor! So, let's just replace the node's value with its successor and then delete the succ.

46 Finally…
(Final tree: root 10; 7, 20; 2, 9, 30.)
This slide is just for closure.

47 Delete Code
void remove(Comparable key, Node *& root) { // "delete" is a reserved word in C++
  Node *& handle(find(key, root));
  if (handle != NULL) {
    Node * toDelete = handle;
    if (handle->left == NULL) {           // Leaf or one child
      handle = handle->right;
    } else if (handle->right == NULL) {   // One child
      handle = handle->left;
    } else {                              // Two child case
      Node *& successor(succ(handle));
      handle->key = successor->key;
      handle->data = successor->data;
      toDelete = successor;
      successor = successor->right;       // Succ has <= 1 child
    }
    delete toDelete;
  }
}
Here's the code for deletion using lots of confusing reference pointers BUT no leaders or fake nodes. The iterative version of this can get somewhat messy, but it's not really any big deal. Refs make this short and "elegant"... but it could be done without them with a bit more work.
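Putting the three cases together as a compilable sketch. This is an adaptation, not the slides' exact code: int keys, the operation renamed remove (delete is reserved in C++), and the successor located with a reference-returning findMin:

```cpp
#include <cstddef>

struct Node {
    int key;
    Node* left;
    Node* right;
    Node(int k) : key(k), left(NULL), right(NULL) {}
};

// Reference to the pointer holding key's node, or to the NULL slot where it belongs.
Node*& find(int key, Node*& root) {
    if (root == NULL || root->key == key) return root;
    return (key < root->key) ? find(key, root->left)
                             : find(key, root->right);
}

void insert(int key, Node*& root) {
    Node*& target = find(key, root);
    if (target == NULL) target = new Node(key);
}

Node*& findMin(Node*& root) {              // reference to the min node's pointer
    return root->left == NULL ? root : findMin(root->left);
}

void remove(int key, Node*& root) {
    Node*& handle = find(key, root);
    if (handle == NULL) return;            // not present
    Node* toDelete = handle;
    if (handle->left == NULL)              // leaf, or right child only
        handle = handle->right;
    else if (handle->right == NULL)        // left child only
        handle = handle->left;
    else {                                 // two children
        Node*& successor = findMin(handle->right);
        handle->key = successor->key;      // pull the successor's key up
        toDelete = successor;
        successor = successor->right;      // successor has no left child
    }
    delete toDelete;
}
```

This reproduces the Delete(17), Delete(15), Delete(5) sequence from the preceding slides.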

48 Today’s Outline
Some Tree Review (here for reference, not discussed)
Binary Trees
Dictionary ADT
Binary Search Trees
Deletion
Some troubling questions

49 Thinking about Binary Search Trees
Observations:
Each operation views two new elements at a time
Elements (even siblings) may be scattered in memory
Binary search trees are fast if they're shallow
Realities:
For large data sets, disk accesses dominate runtime
Some deep and some shallow BSTs exist for any data
OK, let's think about BSTs in the same way we thought about heaps. Indeed, some of the same ideas come up. One more piece of bad news: what happens to a balanced tree after many insertions/deletions?

50 Solutions? Reduce disk accesses? Keep BSTs shallow?
How might we solve these problems? Reduce disk accesses: we need to have a bigger branching factor, just like with heaps. BUT what does the search tree property mean when the branching factor is above 2? To keep BSTs shallow, we can insist on one of the better arrangements.

51 Coming Up
Self-balancing Binary Search Trees
Huge Search Tree Data Structure
Possibly hashing in between…

