CSE 326: Data Structures Lecture #7 Binary Search Trees


CSE 326: Data Structures Lecture #7 Binary Search Trees Henry Kautz Winter Quarter 2002

Binary Trees

Properties (notation: depth(tree) = MAX {depth(leaf)} = height(root)):
- max # of leaves = 2^depth(tree)
- max # of nodes = 2^(depth(tree)+1) - 1
- max depth = n - 1
- average depth for n nodes = O(sqrt(n)) (over all possible binary trees)

Representation: each node holds its data plus left and right child pointers.

[Figure: an example binary tree with nodes labeled A through J.]

Alright, we'll focus today on one type of tree called binary trees. Here's one now. Is this binary tree complete? Why not? (C has just one child, and the right side is much deeper than the left.) What's the maximum # of leaves a binary tree of depth d can have? What's the max # of nodes a binary tree of depth d can have? Minimum? We won't go into this, but if you take n nodes and assume all distinct trees of the nodes are equally likely, you get an average depth of sqrt(n). Is that bigger or smaller than log n? Bigger, so it's not good enough!
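A minimal sketch of the node representation the slide describes, with each node holding its data plus left and right pointers (the name TreeNode and the char payload are illustrative choices, not names from the lecture):

struct TreeNode {
    char data;        // the value stored at this node (A, B, C, ... in the figure)
    TreeNode* left;   // left subtree, or NULL if none
    TreeNode* right;  // right subtree, or NULL if none
    TreeNode(char d) : data(d), left(NULL), right(NULL) {}
};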

Dictionary & Search ADTs

Operations: create, destroy, insert, find, delete.

Dictionary: stores values associated with user-specified keys
- keys may be any (homogeneous) comparable type
- values may be any (homogeneous) type
- implementation: the data field is a struct with two parts

Search ADT: keys = values

Example contents: kim chi -> spicy cabbage; kreplach -> tasty stuffed dough; kiwi -> Australian fruit. insert(kohlrabi, "upscale tuber") adds an entry; find(kreplach) returns kreplach -> tasty stuffed dough.

Dictionaries associate some key with a value, just like a real dictionary (where the key is a word and the value is its definition). In this example, I've stored user-IDs associated with descriptions of their coolness level. This is probably the most valuable and widely used ADT we'll hit. I'll give you an example in a minute that should firmly entrench this concept.
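A sketch of what the Dictionary ADT's interface might look like in C++, fixing keys and values to strings for concreteness (the lecture leaves both generic; "remove" stands in for delete, which is a C++ keyword):

#include <string>

class Dictionary {
public:
    virtual ~Dictionary() {}                                // destroy
    virtual void insert(const std::string& key,
                        const std::string& value) = 0;      // insert
    virtual bool find(const std::string& key,
                      std::string& value) const = 0;        // fills in value; false if key absent
    virtual void remove(const std::string& key) = 0;        // delete
};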

Naïve Implementations

operation                  unsorted array                sorted array   linked list
insert (w/o duplicates)    find + O(1) (if no stretch)   O(n)           find + O(1)
find                       O(n)                          O(log n)       O(n)
delete                     find + O(1) (if no shrink)    O(n)           find + O(1)

Goal: fast find like the sorted array, with dynamic inserts/deletes like the linked list.

Binary Search Tree Dictionary Data Structure

Search tree property:
- all keys in the left subtree are smaller than the root's key
- all keys in the right subtree are larger than the root's key
- result: it's easy to find any given key; inserts/deletes work by changing links

[Figure: an example BST with root 8, children 5 and 11, and descendants 2, 4, 6, 7, 9, 10, 12, 13, 14.]

A binary search tree is a binary tree in which all nodes in the left subtree of a node have lower values than the node, and all nodes in the right subtree of a node have higher values than the node. It's like making that recursion into the data structure! I'm storing integers at each node. Does everybody think that's what I'm _really_ going to store? What do I need to know about what I store? (comparison, equality testing)
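The code on the upcoming slides assumes a node type roughly like the following. This declaration is a guess (the lecture never shows it), with Comparable standing in for any key type supporting < and >, fixed to int here so the sketch is self-contained:

typedef int Comparable;   // any type with <, >, == would do

struct Node {
    Comparable key;       // the search key (and the value, in a Search ADT)
    Node* left;           // subtree of keys smaller than key
    Node* right;          // subtree of keys larger than key
    Node(Comparable k) : key(k), left(NULL), right(NULL) {}
};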

In Order Listing

visit left subtree, then visit the node, then visit right subtree:

if (n != NULL) {
    inorder(n->left);
    cout << n->key;
    inorder(n->right);
}

[Figure: an example BST with root 10; 5 and 15 below it; 2, 9, and 20 below those; then 7, 17, and 30.]

In order listing: 2 5 7 9 10 15 17 20 30

Anyone notice anything interesting about that in-order listing? Everything in the left subtree is listed first. Then the root. Then everything in the right subtree. OK, let's work out the code to make the in-order listing. Is there an iterative version that doesn't use its own stack? Not really, no. So recursion is probably OK here. Anyway, if the tree's too deep for recursion, you must have a huge amount of data.
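As a runnable sketch, here is the traversal as real C++, building one tree consistent with the slide's figure and its listing (the Node type is the hypothetical declaration from above):

#include <iostream>

struct Node {
    int key;
    Node *left, *right;
    Node(int k) : key(k), left(NULL), right(NULL) {}
};

void inorder(Node* n) {
    if (n != NULL) {
        inorder(n->left);             // everything smaller first
        std::cout << n->key << " ";   // then this node
        inorder(n->right);            // then everything larger
    }
}

int main() {
    Node* root = new Node(10);
    root->left = new Node(5);
    root->right = new Node(15);
    root->left->left = new Node(2);
    root->left->right = new Node(9);
    root->left->right->left = new Node(7);
    root->right->right = new Node(20);
    root->right->right->left = new Node(17);
    root->right->right->right = new Node(30);
    inorder(root);                    // prints: 2 5 7 9 10 15 17 20 30
    std::cout << std::endl;
    return 0;                         // (nodes leak; fine for a demo)
}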

Finding a Node

Node *& find(Comparable x, Node *& root) {
    if (root == NULL)
        return root;
    else if (x < root->key)
        return find(x, root->left);
    else if (x > root->key)
        return find(x, root->right);
    else
        return root;
}

[Figure: the example BST, shown twice.]

runtime: Θ(depth of the node sought) — O(log n) in a balanced tree, but O(n) in the worst case

Now, let's try finding a node. Find 9. This time I'll supply the code. This should look a _lot_ like binary search! How long does it take? Log n is an easy answer, but what if the tree is very lopsided? So really, this is worst case O(n)! A better answer is theta of the depth of the node sought. If we can bound the depth of that node, we can bound the length of time a search takes. What about the code? All those &s and *s should look pretty scary. Let's talk through them.

Insert

Concept: proceed down the tree as in find; if the new key is not found, insert a new node at the last spot traversed.

void insert(Comparable x, Node * root) {
    assert( root != NULL );
    if (x < root->key) {
        if (root->left == NULL)
            root->left = new Node(x);
        else
            insert( x, root->left );
    } else if (x > root->key) {
        if (root->right == NULL)
            root->right = new Node(x);
        else
            insert( x, root->right );
    }
}

Let's do some inserts: insert(8), insert(11), insert(31).

Tricky Insert

C++ trick: use reference parameters.

void insert(Comparable x, Node *& root) {
    if ( root == NULL )
        root = new Node(x);
    else if (x < root->key)
        insert( x, root->left );
    else
        insert( x, root->right );
}

Works even when called with an empty tree:

Node * myTree = NULL;
insert( something, myTree );

sets the variable myTree to point to the newly created node.

runtime: Θ(depth of the insertion point), just like find

Digression: Value vs. Reference Parameters

- Value parameters (Object foo): copies the parameter; no side effects.
- Reference parameters (Object & foo): shares the parameter; can affect the actual value; use when the value needs to be changed.
- Const reference parameters (const Object & foo): cannot affect the actual value; use when the value is too expensive to copy by value.

A momentary digression. I did some tricky stuff with reference variables there. Does anyone think it would be a good idea to have the find I described back there as the interface to a Search ADT? NO! It exposes really nasty details. But it's fine for internal use, and it can easily be called by the real external find. Here's a brief description of value and reference parameters and when you really want to use them.
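A small illustrative example of the three modes (the function names are mine, not the lecture's):

#include <iostream>
#include <string>

void byValue(std::string s)      { s += "!"; }   // modifies a private copy only
void byReference(std::string& s) { s += "!"; }   // modifies the caller's object
void byConstRef(const std::string& s) {          // no copy made, no modification allowed
    std::cout << s << std::endl;
    // s += "!";                                 // would not compile: s is const
}

int main() {
    std::string msg = "hello";
    byValue(msg);        // msg is still "hello"
    byReference(msg);    // msg is now "hello!"
    byConstRef(msg);     // prints "hello!" without copying
    return 0;
}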

Really Tricky Insert

void insert(Comparable x, Node *& root) {
    Node *& target = find(x, root);
    if ( target == NULL )
        target = new Node(x);
}

This works because find returns a reference to the actual pointer in the tree (possibly the NULL pointer where the search fell off), so assigning to target links the new node into place.

BuildTree for BSTs

Suppose a1, a2, …, an are inserted into an initially empty BST. How long does that take when:
- a1, a2, …, an are in increasing order?
- a1, a2, …, an are in decreasing order?
- a1 is the median of all, a2 is the median of the elements less than a1, a3 is the median of the elements greater than a1, etc.?
- the data is randomly ordered?

OK, we had a buildHeap, so let's buildTree. How long does this take? Well, IT DEPENDS! Let's say we want to build a tree from 1, 2, 3, 4, 5, 6, 7, 8, 9. What happens if we insert in order? Reverse order? What about 5, then 3, then 7, then 2, then 1, then 6, then 8, then 9?
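A quick sketch that makes the dependence on insertion order concrete, inserting the slide's eight example keys in sorted order versus the median-first order and comparing the resulting tree heights (insert is the reference-parameter version from above; height is a hypothetical helper):

#include <iostream>

struct Node {
    int key;
    Node *left, *right;
    Node(int k) : key(k), left(NULL), right(NULL) {}
};

void insert(int x, Node*& root) {
    if (root == NULL) root = new Node(x);
    else if (x < root->key) insert(x, root->left);
    else if (x > root->key) insert(x, root->right);
}

int height(Node* root) {                  // -1 for an empty tree
    if (root == NULL) return -1;
    int l = height(root->left), r = height(root->right);
    return 1 + (l > r ? l : r);
}

int main() {
    int sorted[]      = {1, 2, 3, 5, 6, 7, 8, 9};   // same key set, sorted
    int medianFirst[] = {5, 3, 7, 2, 1, 6, 8, 9};   // the slide's order
    Node *a = NULL, *b = NULL;
    for (int i = 0; i < 8; ++i) {
        insert(sorted[i], a);
        insert(medianFirst[i], b);
    }
    std::cout << "sorted order:       height " << height(a) << std::endl;  // 7: a chain
    std::cout << "median-first order: height " << height(b) << std::endl;  // 3: bushy
    return 0;
}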

Analysis of BuildTree

- Worst case: O(n²), since 1 + 2 + 3 + … + n = O(n²).
- Average case, assuming all input sequences are equally likely: O(n log n)
  - equivalently: the average depth of a node is O(log n)
  - proof: see Introduction to Algorithms (Cormen, Leiserson & Rivest)

The average runtime is equal to the average depth of a node in the tree. We'll calculate the average depth by finding the sum of all depths in the tree and dividing by the number of nodes. What's the sum of all depths? D(N) = D(I) + D(N - I - 1) + N - 1 (the left subtree has I nodes and the right has N - I - 1; D(I) is the depth sum of the left subtree on its own, and each of its nodes sits one level deeper in the overall tree, and the same goes for the right, for a total of I + (N - I - 1) = N - 1 extra depth). For BSTs, all subtree sizes are equally likely (because the root is equally likely to be any of the input elements, and the rest fall to its left or right deterministically). Each of D(I) and D(N - I - 1) then averages (1/N) · Σ_{j=0..N-1} D(j).

Proof that the Average Depth of a Node in a BST Constructed from Random Data is O(log n)

Calculate the sum of all depths and divide by the number of nodes.
- Let D(n) = the sum of the depths of all nodes in a random BST containing n nodes.
- D(n) = D(left subtree) + D(right subtree) + 1 · (number of nodes in the left and right subtrees)
- D(n) = D(L) + D(n - L - 1) + (n - 1), where L is the size of the left subtree.
- For random data, all subtree sizes L are equally likely.
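Averaging over the n equally likely values of L finishes the argument; the remaining steps are the standard ones, not spelled out on the slide:

D(n) = (2/n) · [ D(0) + D(1) + … + D(n-1) ] + (n - 1)

This is the same recurrence that governs quicksort's average comparison count, and the usual telescoping argument gives D(n) = O(n log n). Dividing by the n nodes yields an average depth of O(log n).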

Deletion

[Figure: the example BST with root 10.]

Why might deletion be harder than insertion?

And now for something completely different. Let's say I want to delete a node. Why might it be harder than insertion? The deletion might happen in the middle of the tree instead of at a leaf, and then I have to fix up the BST.

FindMin/FindMax

Node * min(Node * root) {
    if (root->left == NULL)
        return root;
    else
        return min(root->left);
}

[Figure: the example BST.]

Every now and then everyone succumbs to the temptation to really overuse color. How many children can the min of a node have?

Successor

Find the next larger node in this node's subtree (not the next larger in the entire tree).

Node * succ(Node * root) {
    if (root->right == NULL)
        return NULL;
    else
        return min(root->right);
}

[Figure: the example BST.]

Here's a little digression. Maybe it'll even have an application at some point. Find the next larger node in 10's subtree. Can we define it in terms of min and max? It's the min of the right subtree! How many children can the successor of a node have?

Predecessor

Find the next smaller node in this node's subtree.

Node * pred(Node * root) {
    if (root->left == NULL)
        return NULL;
    else
        return max(root->left);
}

[Figure: the example BST.]

Predecessor is just the mirror problem.

Deletion - Leaf Case

Delete(17)

[Figure: the example BST before and after pruning the leaf 17.]

Alright, we did it the easy way, but what about real deletions? Leaves are easy; we just prune them.

Deletion - One Child Case

Delete(15)

[Figure: 15 has the single child 20, which gets pulled up into 15's place.]

Single-child nodes we remove and… do what? We can just pull up their children. Is the search tree property intact? Yes.

Deletion - Two Child Case

Delete(5)

[Figure: the current tree: 10 at the root, 5 and 20 below it, then 2, 9, and 30, with 7 under 9.]

Replace the node with a value guaranteed to be between the left and right subtrees: the successor. Could we have used the predecessor instead?

Ah, now the hard case. How do we delete a two-child node? We remove it and replace it with what? It has all these left and right children that need to be greater and less than the new value (respectively). Is there any value that is guaranteed to be between the two subtrees? Two of them: the successor and predecessor! So let's just replace the node's value with its successor and then delete the successor.

Deletion - Two Child Case, continued

Delete(5): it is always easy to delete the successor, because it always has either 0 or 1 children!

Deletion - Two Child Case, continued

Delete(5): finally, copy the data value from the deleted successor into the original node.

[Figure: the finished tree: 10 at the root, 7 and 20 below it, then 2, 9, and 30.]
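Putting the three cases together, here is a sketch of a complete deletion routine. This is not code from the lecture, just one way to combine the leaf, one-child, and successor-copy strategies above; it reuses min from the FindMin slide and the reference-parameter trick from insert:

void remove(Comparable x, Node *& root) {
    if (root == NULL)                        // key not in tree: nothing to do
        return;
    if (x < root->key)
        remove(x, root->left);
    else if (x > root->key)
        remove(x, root->right);
    else if (root->left != NULL && root->right != NULL) {
        // Two children: copy the successor's value into this node,
        // then delete the successor (which has 0 or 1 children).
        root->key = min(root->right)->key;
        remove(root->key, root->right);
    } else {
        // Zero or one child: splice this node out of the tree.
        Node * old = root;
        root = (root->left != NULL) ? root->left : root->right;
        delete old;
    }
}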

Lazy Deletion

Instead of physically deleting nodes, just mark them as deleted.
+ simpler
+ physical deletions can be done in batches
+ some adds just flip the deleted flag back off
- extra memory for the deleted flag
- many lazy deletions slow down finds
- some operations may have to be modified (e.g., min and max)

[Figure: the example BST with some nodes marked as deleted.]

Now, before we move on to all the pains of true deletion, let's do it the easy way. We'll just pretend we delete deleted nodes. This has some real advantages: …
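A sketch of what the node and the modified find might look like under lazy deletion (the field and function names here are illustrative, not from the lecture):

struct LazyNode {
    int key;
    bool deleted;                         // the extra memory lazy deletion pays for
    LazyNode *left, *right;
    LazyNode(int k) : key(k), deleted(false), left(NULL), right(NULL) {}
};

LazyNode * find(int x, LazyNode * root) {
    if (root == NULL) return NULL;
    if (x < root->key) return find(x, root->left);
    if (x > root->key) return find(x, root->right);
    return root->deleted ? NULL : root;   // found only if not marked deleted
}

void lazyRemove(int x, LazyNode * root) { // "deletion" is just a flag flip
    LazyNode * n = find(x, root);
    if (n != NULL) n->deleted = true;
}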

Dictionary Implementations

operation   unsorted array   sorted array   linked list   BST
insert      find + O(1)      O(n)           find + O(1)   O(Depth)
find        O(n)             O(log n)       O(n)          O(Depth)
delete      find + O(1)      O(n)           find + O(1)   O(Depth)

BSTs look good for shallow trees, i.e., when the depth D is small (O(log n)); otherwise they are as bad as a linked list!