
1 CSE 326: Data Structures Lecture #7 Binary Search Trees
Henry Kautz Winter Quarter 2002

2 Binary Trees

Notation: depth(tree) = MAX {depth(leaf)} = height(root)

Properties:
max # of leaves = 2^depth(tree)
max # of nodes = 2^(depth(tree)+1) − 1
max depth = n − 1
average depth for n nodes = O(√n) (averaged over all possible binary trees)

For example, a binary tree of depth 3 has at most 2^3 = 8 leaves and 2^4 − 1 = 15 nodes.

Representation: each node stores a data field plus a left pointer and a right pointer. [Figure: an example binary tree with nodes labeled A through J, and a node layout diagram showing the data, left pointer, and right pointer fields.]

Alright, we'll focus today on one type of tree called binary trees. Here's one now. Is this binary tree complete? Why not? (C has just one child, and the right side is much deeper than the left.) What's the maximum # of leaves a binary tree of depth d can have? What's the max # of nodes a binary tree of depth d can have? Minimum? We won't go into this, but if you take N nodes and assume all distinct trees of the nodes are equally likely, you get an average depth of √N. Is that bigger or smaller than log N? Bigger, so it's not good enough!

3 Dictionary & Search ADTs

Operations: create, destroy, insert, find, delete

Dictionary: stores values associated with user-specified keys
- keys may be any (homogeneous) comparable type
- values may be any (homogeneous) type
- implementation: the data field is a struct with two parts

Search ADT: keys = values

Example:
kim chi - spicy cabbage
kreplach - tasty stuffed dough
kiwi - Australian fruit
insert: kohlrabi - upscale tuber
find(kreplach) returns: kreplach - tasty stuffed dough

Dictionaries associate some key with a value, just like a real dictionary (where the key is a word and the value is its definition). In this example, I've stored user-IDs associated with descriptions of their coolness level. This is probably the most valuable and widely used ADT we'll hit. I'll give you an example in a minute that should firmly entrench this concept.
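A minimal sketch of what this Dictionary ADT might look like as a C++ interface; the class name and method signatures here are illustrative, not from the lecture:

    // Hypothetical interface sketch for the Dictionary ADT described above.
    // Key must support comparison (<, >, ==); Value can be any copyable type.
    template <typename Key, typename Value>
    class Dictionary {
    public:
        virtual ~Dictionary() {}                            // "destroy"
        virtual void insert(const Key & k, const Value & v) = 0;
        virtual Value * find(const Key & k) = 0;            // NULL if absent
        virtual void remove(const Key & k) = 0;             // "delete"
    };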

4 Naïve Implementations

                           unsorted array    sorted array    linked list
insert (w/o duplicates)    (if no stretch)
find
delete                     (if no shrink)

Goal: fast find like a sorted array, dynamic inserts/deletes like a linked list.

5 Naïve Implementations

                           unsorted array    sorted array    linked list
insert (w/o duplicates)    find + O(1)       O(n)            find + O(1)
                           (if no stretch)
find                       O(n)              O(log n)        O(n)
delete                     find + O(1)       O(n)            find + O(1)
                           (if no shrink)

Goal: fast find like a sorted array, dynamic inserts/deletes like a linked list.

6 Binary Search Tree Dictionary Data Structure

Search tree property:
- all keys in the left subtree are smaller than the root's key
- all keys in the right subtree are larger than the root's key
Result: it is easy to find any given key, and inserts/deletes work by changing links.

[Figure: an example BST containing the keys 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, with 8 at the root.]

A binary search tree is a binary tree in which all nodes in the left subtree of a node have lower values than the node, and all nodes in the right subtree of a node have higher values than the node. It's like building that recursion into the data structure! I'm storing integers at each node. Does everybody think that's what I'm _really_ going to store? What do I need to know about what I store? (comparison, equality testing)
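A minimal sketch of the node type the code on the following slides assumes; the Comparable typedef is a stand-in, but the field names key, left, and right match the later slides:

    // Node type assumed by the find/insert/delete code on the following slides.
    typedef int Comparable;   // stand-in; any type with <, >, == works

    struct Node {
        Comparable key;       // the stored key (and associated value, if any)
        Node * left;          // all keys in this subtree are smaller than key
        Node * right;         // all keys in this subtree are larger than key
        Node(Comparable k) : key(k), left(NULL), right(NULL) {}
    };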

7 In Order Listing

visit left subtree; visit node; visit right subtree

[Figure: example BST. Root 10; children 5 and 15; 5's children are 2 and 9; 9's left child is 7; 15's right child is 20; 20's children are 17 and 30.]

    void inorder(Node * n) {
        if (n != NULL) {
            inorder(n->left);
            cout << n->key << " ";   // print the key rather than the pointer
            inorder(n->right);
        }
    }

Anyone notice anything interesting about that in-order listing? Everything in the left subtree is listed first. Then the root. Then everything in the right subtree. OK, let's work out the code to make the in-order listing. Is there an iterative version that doesn't use its own stack? Not really, no. So recursion is probably OK here. Anyway, if the tree's too deep for recursion, you must have a huge amount of data.

In order listing:

8 In Order Listing

Same tree and code as the previous slide; the traversal yields:

In order listing: 2 5 7 9 10 15 17 20 30
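As a sanity check, a self-contained sketch that hard-codes the pictured tree (its shape inferred from the listing above) and prints the in-order listing; it assumes the Node struct and inorder routine sketched earlier:

    #include <iostream>
    using namespace std;

    // Assumes the Node struct and inorder() sketched earlier in this transcript.
    int main() {
        Node * root = new Node(10);
        root->left = new Node(5);
        root->left->left = new Node(2);
        root->left->right = new Node(9);
        root->left->right->left = new Node(7);
        root->right = new Node(15);
        root->right->right = new Node(20);
        root->right->right->left = new Node(17);
        root->right->right->right = new Node(30);

        inorder(root);      // prints: 2 5 7 9 10 15 17 20 30
        cout << endl;
        return 0;
    }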

9 Finding a Node

    Node *& find(Comparable x, Node * & root) {
        // root is passed by reference so the returned alias can be
        // assigned through (used by the tricky insert on a later slide)
        if (root == NULL)
            return root;          // not found: alias of the NULL link
        else if (x < root->key)
            return find(x, root->left);
        else if (x > root->key)
            return find(x, root->right);
        else
            return root;          // found it
    }

[Figure: the same example BST as before.]

runtime: Θ(depth of the node sought); worst case O(n).

Now, let's try finding a node. Find 9. This time I'll supply the code. This should look a _lot_ like binary search! How long does it take? Log n is an easy answer, but what if the tree is very lopsided? So really, this is worst case O(n)! A better answer is theta of the depth of the node sought. If we can bound the depth of that node, we can bound the length of time a search takes. What about the code? All those &s and *s should look pretty scary. Let's talk through them.

10 Insert

Concept: proceed down the tree as in find; if the new key is not found, insert a new node at the last spot traversed.

    void insert(Comparable x, Node * root) {
        assert( root != NULL );
        if (x < root->key) {
            if (root->left == NULL)
                root->left = new Node(x);
            else
                insert( x, root->left );
        }
        else if (x > root->key) {
            if (root->right == NULL)
                root->right = new Node(x);
            else
                insert( x, root->right );
        }
        // if x == root->key, it's a duplicate: do nothing
    }

Let's do some inserts: insert(8), insert(11), insert(31).

11 Tricky Insert

C++ trick: use reference parameters.

    void insert(Comparable x, Node * & root) {
        if ( root == NULL )
            root = new Node(x);
        else if (x < root->key)
            insert( x, root->left );
        else if (x > root->key)
            insert( x, root->right );
        // equal keys: duplicate, so do nothing
    }

This works even when called with an empty tree:

    Node * myTree = NULL;
    insert( something, myTree );

sets the variable myTree to point to the newly created node.

runtime: same as find, O(depth of the tree).

12 Digression: Value vs. Reference Parameters

Value parameters (Object foo)
- copies the parameter
- no side effects

Reference parameters (Object & foo)
- shares the parameter
- can affect the actual value
- use when the value needs to be changed

Const reference parameters (const Object & foo)
- cannot affect the actual value
- use when the value is too intricate (expensive) to copy by value

A momentary digression. I did some tricky stuff with reference variables there. Does anyone think it would be a good idea to have the find I described back there as the interface to a Search ADT? NO! It exposes really nasty details. But it's fine for internal use, and it can easily be called by the real external find. Here's a brief description of value and reference parameters and when you really want to use them. (See the sketch below.)
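A small illustrative sketch of the three parameter-passing styles; the function names are made up for the example:

    #include <string>
    using std::string;

    void byValue(string s)       { s += "!"; }   // operates on a copy; caller unaffected
    void byReference(string & s) { s += "!"; }   // shares the caller's object; side effect visible
    void byConstRef(const string & s) {          // no copy made, but s cannot be modified
        // s += "!";                             // error: s is const
    }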

13 Really Tricky Insert

    void insert(Comparable x, Node * & root) {
        Node * & target = find(x, root);
        if ( target == NULL )
            target = new Node(x);
    }

This relies on find returning a reference to the NULL link where the search fell off the tree, so assigning to target splices the new node in place.

14 BuildTree for BSTs

Suppose a1, a2, …, an are inserted into an initially empty BST. How long does it take when:
- a1, a2, …, an are in increasing order?
- a1, a2, …, an are in decreasing order?
- a1 is the median of all, a2 is the median of the elements less than a1, a3 is the median of the elements greater than a1, etc.? (a sketch of this strategy follows below)
- the data is randomly ordered?

OK, we had a buildHeap, so let's buildTree. How long does this take? Well, IT DEPENDS! Let's say we want to build a tree from a fixed set of values. What happens if we insert them in order? In reverse order? What about 5, then 3, then 7, then 2, then 1, then 6, then 8, then 9?
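A sketch of the median-first strategy from the third bullet; the buildTree name and the sorted-array input are assumptions for illustration, and it reuses the Node struct sketched earlier:

    // Builds a balanced BST from a sorted array by inserting medians first.
    // a[lo..hi] must be sorted in increasing order.
    Node * buildTree(const Comparable a[], int lo, int hi) {
        if (lo > hi) return NULL;
        int mid = lo + (hi - lo) / 2;             // median of the current range
        Node * root = new Node(a[mid]);           // becomes the subtree's root
        root->left  = buildTree(a, lo, mid - 1);  // medians of the smaller elements
        root->right = buildTree(a, mid + 1, hi);  // medians of the larger elements
        return root;
    }

This version places each element once, so given sorted input it runs in O(n); feeding the same median-first sequence to the slide's insert routine instead costs O(n log n), because the tree stays balanced and each insert is O(log n).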

15 Analysis of BuildTree

Worst case is O(n²): 1 + 2 + 3 + … + n = O(n²)

Average case, assuming all input sequences are equally likely, is O(n log n); equivalently, the average depth of a node is O(log n). Proof: see Introduction to Algorithms, Cormen, Leiserson & Rivest.

The average runtime equals the average depth of a node in the tree. We'll calculate the average depth by finding the sum of all depths in the tree and dividing by the number of nodes. What's the sum of all depths? D(N) = D(I) + D(N − I − 1) + N − 1 (the left subtree has I nodes and the root is 1 node, so the right subtree has N − I − 1; D(I) is the depth sum of the left subtree, whose nodes all sit 1 deeper in the overall tree, and the same goes for the right, for a total of I + (N − I − 1) = N − 1 extra depth). For BSTs built from random data, all subtree sizes are equally likely (the first element inserted becomes the root and is equally likely to be any of the N elements, and the rest fall to its left or right deterministically). Each subtree term then averages (1/N) · Σ from j = 0 to N−1 of D(j).

16 Proof that the Average Depth of a Node in a BST Constructed from Random Data is O(log n)

Calculate the sum of all depths and divide by the number of nodes.

D(n) = sum of the depths of all nodes in a random BST containing n nodes
D(n) = D(left subtree) + D(right subtree) + 1 · (number of nodes in the left and right subtrees)
D(n) = D(L) + D(n − L − 1) + (n − 1)

For random data, all subtree sizes L are equally likely.
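Averaging the recurrence over the n equally likely values of L completes the proof; a sketch of the remaining steps, which the slide leaves implicit:

\[
D(n) \;=\; \frac{2}{n}\sum_{j=0}^{n-1} D(j) \;+\; (n-1)
\]

This is the same recurrence as for quicksort's expected comparison count, and it solves to \(D(n) = O(n \log n)\); dividing by the n nodes gives an average depth of \(O(\log n)\).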

17 Deletion

[Figure: the same example BST: root 10; children 5 and 15; then 2, 9, 20; then 7, 17, 30.]

Why might deletion be harder than insertion?

And now for something completely different. Let's say I want to delete a node. Why might it be harder than insertion? It might happen in the middle of the tree instead of at a leaf, and then I have to fix up the BST.

18 FindMin/FindMax

    Node * min(Node * root) {
        if (root->left == NULL)
            return root;
        else
            return min(root->left);
    }

[Figure: the same example BST.]

Every now and then everyone succumbs to the temptation to really overuse color. How many children can the min of a node have?
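The slide shows only min; the mirror-image max (which the pred routine on a later slide calls) would presumably look like this:

    // Mirror image of min: the maximum is the rightmost node.
    Node * max(Node * root) {
        if (root->right == NULL)
            return root;
        else
            return max(root->right);
    }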

19 Successor

Find the next larger node in this node's subtree (not the next larger in the entire tree).

    Node * succ(Node * root) {
        if (root->right == NULL)
            return NULL;
        else
            return min(root->right);
    }

[Figure: the same example BST.]

Here's a little digression. Maybe it'll even have an application at some point. Find the next larger node in 10's subtree. Can we define it in terms of min and max? It's the min of the right subtree! How many children can the successor of a node have?

20 Predecessor

Find the next smaller node in this node's subtree.

    Node * pred(Node * root) {
        if (root->left == NULL)
            return NULL;
        else
            return max(root->left);
    }

[Figure: the same example BST.]

Predecessor is just the mirror problem.

21 Deletion - Leaf Case

Delete(17)

[Figure: the example BST before and after pruning the leaf 17.]

Alright, we did it the easy way, but what about real deletions? Leaves are easy; we just prune them.

22 Deletion - One Child Case

Delete(15)

[Figure: the tree after the previous deletion; node 15 has the single child 20, which is pulled up to replace it.]

Single-child nodes we remove and… do what? We can just pull up their children. Is the search tree property intact? Yes.

23 Deletion - Two Child Case

Delete(5)

Replace the node with a value guaranteed to be between the left and right subtrees: the successor. Could we have used the predecessor instead?

[Figure: the tree after the previous deletions: root 10; 5 on the left with children 2 and 9 (9 has left child 7); 20 on the right with child 30.]

Ah, now the hard case. How do we delete a two-child node? We remove it and replace it with what? It has all these left and right children that need to be greater and less than the new value (respectively). Is there any value that is guaranteed to be between the two subtrees? Two of them: the successor and the predecessor! So let's just replace the node's value with its successor and then delete the successor.

24 Deletion - Two Child Case

Delete(5), continued. It's always easy to delete the successor: it always has either 0 or 1 children!

25 Deletion - Two Child Case

Delete(5), finished. Finally, copy the data value from the deleted successor (7) into the original node.

[Figure: the resulting tree: root 10; 7 on the left with children 2 and 9; 20 on the right with child 30.]
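Putting the three cases together, a hedged sketch of a complete remove routine; the slides never show one, so this simply combines their leaf, one-child, and two-child recipes, reusing the min helper and the reference-parameter trick from the tricky insert:

    // Sketch of BST deletion combining the three cases from the slides.
    void remove(Comparable x, Node * & root) {
        if (root == NULL) return;                 // not found: nothing to do
        if (x < root->key)
            remove(x, root->left);
        else if (x > root->key)
            remove(x, root->right);
        else if (root->left != NULL && root->right != NULL) {
            // two-child case: copy the successor's value, then delete it
            root->key = min(root->right)->key;
            remove(root->key, root->right);
        } else {
            // leaf or one-child case: pull up the child (possibly NULL)
            Node * old = root;
            root = (root->left != NULL) ? root->left : root->right;
            delete old;
        }
    }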

26 Lazy Deletion

Instead of physically deleting nodes, just mark them as deleted.
Plusses:
- simpler
- physical deletions can be done in batches
- some inserts just flip the deleted flag back off
Minuses:
- extra memory for the deleted flag
- many lazy deletions slow down finds
- some operations may have to be modified (e.g., min and max)

[Figure: the example BST with some nodes marked as deleted.]

Now, before we move on to all the pains of true deletion, let's do it the easy way. We'll just pretend to delete deleted nodes. This has some real advantages:
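A minimal sketch of what the node type and find might look like under lazy deletion; the deleted field and this variant of find are illustrative, not from the slides:

    // Hypothetical lazily-deleting variant: each node carries a deleted flag.
    struct LazyNode {
        Comparable key;
        bool deleted;            // true = logically removed, still in the tree
        LazyNode * left;
        LazyNode * right;
        LazyNode(Comparable k)
            : key(k), deleted(false), left(NULL), right(NULL) {}
    };

    // find ignores nodes marked deleted; delete just sets the flag.
    LazyNode * find(Comparable x, LazyNode * root) {
        if (root == NULL) return NULL;
        if (x < root->key) return find(x, root->left);
        if (x > root->key) return find(x, root->right);
        return root->deleted ? NULL : root;   // present only if not marked
    }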

27 Dictionary Implementations

          unsorted array    sorted array    linked list    BST
insert    find + O(1)       O(n)            find + O(1)    O(Depth)
find      O(n)              O(log n)        O(n)           O(Depth)
delete    find + O(1)       O(n)            find + O(1)    O(Depth)

BSTs look good when the tree is shallow, i.e. when the depth D is small (log n); otherwise they're as bad as a linked list!

