CSE 326: Data Structures Lecture #7 Dendrology


CSE 326: Data Structures Lecture #7 Dendrology What is dendrology?? Bart Niswonger Summer Quarter 2001

Today’s Outline Correction & Clarification Basic Tree Data Structure Dictionary & Search ADTs Binary Search Trees Today we’ll start to cover trees in more detail, but let's start with a correction – thank you Ashish! – and a clarification

Clarification: height & depth Defined The length of a path equals the number of edges on the path. height(n): length of the longest path from n to a leaf. depth(n): length of the path from the root to n. The height of a tree equals height(root).

Correction: Right Path in a Leftist Tree is Short If the right path has length at least r, the tree has at least 2^(r+1) - 1 nodes. Proof by induction. Basis: r = 1. The tree has at least three nodes: 2^(1+1) - 1 = 3. Inductive step: assume true for r' < r. The right subtree has a right path of length at least r - 1, so it has at least 2^r - 1 nodes. The left subtree must also have a right path of length at least r - 1 (otherwise the tree would not be leftist), so the left likewise has at least 2^r - 1 nodes. All told, there are at least (2^r - 1) + (2^r - 1) + 1 = 2^(r+1) - 1 nodes. So a leftist tree with n nodes has a right path of length at most log(n + 1) ~ log n. Why do we have this leftist property? Because it guarantees that the right path is really short compared to the number of nodes in the tree, without being as stodgy as the heap structure property. Here’s a proof which boils down to the fact that if the right path is r long, the null path length must be at least r (otherwise somewhere along there, there’s a node with a larger right npl than left). Since the tree is complete at least to depth r, there are at least 2^r - 1 nodes in the tree.

A Generic Tree A I D H J B F K L E C G Here is a generic tree – nothing special about the branching factor, nothing special about the node ordering; incomplete, unbalanced, not even leftist – it's just a tree.

A Generic Tree Data Structure next_sibling first_child A B C D E In general, the branching factor of a tree can be quite high. To efficiently store such a tree, we use a structure like this. Why? It allows nodes to have an arbitrary number of children, but is efficient even if they have only one. What about child order? Every level is a linked list – maybe that is a useful way of thinking about it.

A Generic Tree in A Generic Tree Data Structure B C D E H I J Here is our generic tree stored in our generic data structure… F G K L

Combined View of Tree A B C D E F G H I J K L To help visualize…. Traversals: Pre-order: a b f g k c d h i l j e. Post-order: f k g b c h l i j d e a. Level-order: a b c d e f g h i j k l.

Traversals Many algorithms involve walking through a tree, and performing some computation at each node Walking through a tree is called a traversal Common kinds of traversal Pre-order Post-order Level-order We will take a moment to review traversals with more examples than last time – this should be review, review that we have reviewed before even! Do these remind you of anything? Pre/post-order: DFS; level-order: BFS

Binary Trees A binary tree is: a root, a left subtree (maybe empty), and a right subtree (maybe empty). Properties: max # of leaves: 2^h; max # of nodes: 2^(h+1) - 1; average height for N nodes: sqrt(N). Representation: each node holds its data plus left and right pointers. A B C D E F G H I J Alright, we’ll focus today on one type of tree called binary trees. Here’s one now. Is this binary tree complete? Why not? (C has just one child; the right side is much deeper than the left.) What’s the maximum # of leaves a binary tree of depth d/height h can have? 2^h. What’s the max # of nodes? 2^(h+1) - 1. Minimum? 2^(h-1) + 1; 2^h. We won’t go into this, but if you take N nodes and assume all distinct trees of the nodes are equally likely, you get an average depth/height of sqrt(N). Is that bigger or smaller than log n? Bigger, so it’s not good enough! We will see we need to impose structure to get the bounds we want.

Representation A B C D E F Each node: data, left pointer, right pointer. Here is one common way to represent binary trees – notice that this is no more or less efficient than the previous technique (still two pointers) and that, by being tailored to the structure at hand, it is more intuitive.

Dictionary ADT Dictionary operations: create, destroy, insert, find, delete. Stores values associated with user-specified keys; values may be any (homogeneous) type; keys may be any (homogeneous) comparable type. kim chi spicy cabbage Krispy Kreme tasty doughnut kiwi Australian fruit kale leafy green Kool Aid fruit (?) drink insert kohlrabi - upscale tuber find(kiwi) kiwi - Australian fruit Dictionaries associate some key with a value, just like a real dictionary (where the key is a word and the value is its definition). In this example, I’ve stored foods associated with short descriptions. This is probably the most valuable and widely used ADT we’ll hit. I’ll give you an example in a minute that should firmly entrench this concept.

Search ADT Same operations as the dictionary: create, destroy, insert, find, delete. Stores keys only; keys may be any (homogeneous) comparable type; quickly tests for membership. Klee Matisse Rodin Whistler Heartfield Pollock Gross insert Hopper find(Rodin) Rodin This is a scaled-back version of the dictionary ADT in which we essentially drop the values and leave only the keys. We’ll focus on this when looking at binary search trees. BUT, how hard would it be to move to a Dictionary ADT implementation from a Search ADT implementation?

A Modest Few Uses Arrays Sets Dictionaries Router tables Page tables Symbol tables C++ Structures Our ADT algorithm says to look at some applications at this point. I think the first app pretty much says it all. We move on from there however, to other incredibly widely used applications. I know I’ve said this before, but this is probably the most important and one of the most widely used ADTs we’ll look at. For those keeping track, priority queues are _not_ as widely used as Dictionaries.

Naïve Implementations (insert, find, delete): Linked list: O(1), O(n), O(n). Unsorted array: O(1), O(n), O(n). Sorted array: O(n), O(log n), O(n). Sorted array is oh-so-close: O(log n) find time and almost O(log n) insert time. What’s wrong? Let’s look at how that search goes: draw the recursive calls (and potential recursive calls) in binary search. Note how it starts looking like a binary tree where the left subtrees have smaller elements and the right subtrees have bigger elements. What if we could store the whole thing in the structure this recursive search is building? So close!

Naïve Implementations

                 unsorted array              sorted array    linked list
    insert       O(1)                        find + O(n)     O(1)
    find         O(n)                        O(log n)        O(n)
    delete       find + O(1) (if no shrink)  find + O(n)     find + O(1)

Goal: fast find like the sorted array, dynamic inserts/deletes like the linked list.

Binary Search Tree Dictionary Data Structure Binary tree property: each node has at most 2 children; result: storage is small, operations are simple, average depth is small. Search tree property: all keys in the left subtree are smaller than the root’s key; all keys in the right subtree are larger than the root’s key; easy to find any given key. 8 5 11 2 6 10 12 4 7 9 14 13 A binary search tree is a binary tree in which all nodes in the left subtree of a node have lower values than the node, and all nodes in the right subtree of a node have higher values than the node. It’s like making that recursion into the data structure! I’m storing integers at each node. Does everybody think that’s what I’m _really_ going to store? What do I need to know about what I store? (comparison, equality testing)

Getting to Know BSTs: Example and Counter-Example 8 5 11 2 7 6 10 18 4 15 20 21 3 Why is the one on the left a BST? It’s not complete! (Because BSTs don’t need to be complete.) Why isn’t the one on the right a BST? 5 has three children; 20 has a left child (21) larger than it; and what’s wrong with 11? Even though 15 isn’t a direct child of 11, it _still_ needs to be less than 11! BINARY SEARCH TREE / NOT A BINARY SEARCH TREE

Getting to Know All About BSTs: In-Order Listing 10 5 15 2 9 20 7 17 30

    if (n != NULL) {
      inorder(n->left);
      cout << n->key;
      inorder(n->right);
    }

In-order listing: 2 5 7 9 10 15 17 20 30 Anyone notice anything interesting about that in-order listing? Everything in the left subtree is listed first. Then the root. Then everything in the right subtree. OK, let’s work out the code to make the in-order listing. Is there an iterative version that doesn’t use its own stack? Not really, no. So, recursion is probably OK here. Anyway, if the tree’s too deep for recursion, you must have a huge amount of data.

Finding a Node: Getting to Like BSTs 10 5 15 2 9 20 7 17 30

    Node *& find(Comparable key, Node *& root) {
      if (root == NULL)
        return root;
      else if (key < root->key)
        return find(key, root->left);
      else if (key > root->key)
        return find(key, root->right);
      else
        return root;
    }

runtime: Now, let’s try finding a node. Find 9. This time I’ll supply the code. This should look a _lot_ like binary search! How long does it take? Log n is an easy answer, but what if the tree is very lopsided? So really, this is worst case O(n)! A better answer is theta of the depth of the node sought. If we can bound the depth of that node, we can bound the length of time a search takes. What about the code? All those &s and *s should look pretty scary. Let’s talk through them.

Getting to Hope BSTs Like You: Insert

    void insert(Comparable key, Node *& root) {
      Node *& target = find(key, root);
      assert(target == NULL);
      target = new Node(key);
    }

Note that root is passed by reference so that inserting into an empty tree updates the caller’s pointer. 10 5 15 2 9 20 7 17 30 Let’s do some inserts: insert(8), insert(11), insert(31). runtime:

Digression: Value vs. Reference Parameters Value parameters (Object foo): copies the parameter; no side effects. Reference parameters (Object & foo): shares the parameter; can affect the actual value; use when the value needs to be changed. Const reference parameters (const Object & foo): cannot affect the actual value. A momentary digression. I did some tricky stuff with reference variables there. Does anyone think it would be a good idea to have the find I described back there as the interface to a Search ADT? NO! It exposes really nasty details. But, it’s fine for internal use, and it can easily be called by the real external find. Here’s a brief description of value and reference parameters and when you really want to use them.

BuildTree for BSTs Suppose the data 1, 2, 3, 4, 5, 6, 7, 8, 9 is inserted into an initially empty BST: in order; in reverse order; median first, then left median, right median, etc. OK, we had a buildHeap, let’s buildTree. How long does this take? Well, IT DEPENDS! Let’s say we want to build a tree from 1 through 9. What happens if we insert in order? Reverse order? What about 5, then 3, then 7, then 2, then 1, then 6, then 8, then 9?

Analysis of BuildTree Worst case: O(n^2), as we’ve seen. Average case, assuming all orderings equally likely: O(n log n). The average runtime equals the average depth of a node in the tree. We’ll calculate the average depth by finding the sum of all depths in the tree and dividing by the number of nodes. What’s the sum of all depths? D(N) = D(i) + D(N - i - 1) + N - 1 (the left subtree has i nodes and the root is 1 node, so the right has N - i - 1. D(i) is the depth sum of the left subtree; every node there sits 1 deeper in the overall tree, and the same goes for the right, for a total of i + (N - i - 1) = N - 1 extra depth). For BSTs, all left-subtree sizes i are equally likely, so averaging over i gives D(N) = (2/N) * (sum for j = 0 to N-1 of D(j)) + N - 1, which solves to O(N log N) (proof in chapter 7). Expected depth is O(log N) for any node.

Bonus: FindMin/FindMax Find minimum Find maximum 10 5 15 2 9 20 Every now and then everyone succumbs to the temptation to really overuse color. 7 17 30