CSE 326: Data Structures Lecture #7 Dendrology


1 CSE 326: Data Structures Lecture #7 Dendrology
What is dendrology? (Dendrology is the study of trees – fitting for today's topic.) Bart Niswonger, Summer Quarter 2001

2 Today’s Outline Correction & Clarification Basic Tree Data Structure
Dictionary & Search ADTs Binary Search Trees Today we’ll start to cover trees in more detail. But let’s start with a correction – thank you Ashish! – and a clarification.

3 Clarification: height & depth Defined The length of a path equals the number of edges on the path. height(n): length of the longest path from n to a leaf. depth(n): length of the path from the root to n. The height of a tree equals height(root).

4 Right Path in a Leftist Tree is Short
Correction: If the right path has length at least r, the tree has at least 2^(r+1) – 1 nodes. Proof by induction. Basis: r = 1. The tree has at least three nodes: 2^2 – 1 = 3. Inductive step: assume the claim holds for all r’ < r. The right subtree has a right path of length at least r – 1, so it has at least 2^r – 1 nodes. The left subtree must also have a right path of length at least r – 1 (otherwise the tree would not be leftist), so it too has at least 2^r – 1 nodes. All told, then, there are at least (2^r – 1) + (2^r – 1) + 1 = 2^(r+1) – 1 nodes. So a leftist tree with n nodes has a right path of length at most log(n + 1) – 1 ≈ log n. Why do we have this leftist property? Because it guarantees that the right path is really short compared to the number of nodes in the tree, without being as rigid as the heap structure property. The proof boils down to the fact that if the right path is r long, the null path length of the root must be at least r (otherwise somewhere along the right path there is a node with a larger right npl than left npl). Since the tree is then complete at least to depth r, it has at least 2^(r+1) – 1 nodes.

5 A Generic Tree A I D H J B F K L E C G
Here is a generic tree – nothing special about the branching factor, nothing special about the node ordering; incomplete, unbalanced, not even leftist. It’s just a tree.

6 A Generic Tree Data Structure
next_sibling first_child A In general, the branching factor of a tree can be quite high. To efficiently store such a tree, we use a structure like this… why? It allows nodes to have an arbitrary number of children, but is efficient even if they have only one What about child order? Every level is a linked list… maybe that is a useful way of thinking about it B C D E

7 A Generic Tree in A Generic Tree Data Structure
B C D E H I J Here is our generic tree stored in our generic data structure… F G K L

8 Combined View of Tree A B C D E F G H I J K L To help visualize….
Traversals: Pre-order: a b f g k c d h i l j e. Post-order: f k g b c h l i j d e a. Level-order: a b c d e f g h i j k l.

9 Traversals Many algorithms involve walking through a tree, and performing some computation at each node Walking through a tree is called a traversal Common kinds of traversal Pre-order Post-order Level-order We will take a moment to review traversals with more examples than last time – this should be review, review that we have reviewed before even! Do these remind you of anything? Pre/post-order: DFS; level-order: BFS

10 Binary Trees
A binary tree is: a root, a left subtree (maybe empty), and a right subtree (maybe empty). Properties: max # of leaves for height h: 2^h; max # of nodes: 2^(h+1) – 1; average height for N nodes: O(sqrt(N)). Representation: each node stores its data plus left and right pointers. Alright, we’ll focus today on one type of tree called binary trees. Here’s one now. Is this binary tree complete? Why not? (C has just one child; the right side is much deeper than the left.) What’s the maximum # of leaves a binary tree of height h can have? 2^h. What’s the max # of nodes a binary tree of height h can have? 2^(h+1) – 1. The minimums are 1 leaf and h + 1 nodes. We won’t go into this, but if you take N nodes and assume all distinct trees of the nodes are equally likely, you get an average height of O(sqrt(N)). Is that bigger or smaller than log n? Bigger, so it’s not good enough! We will see we need to impose structure to get the bounds we want.

11 Representation
Each node holds its data and left/right child pointers. Here is one common way to represent binary trees – notice that this is no more or less efficient than the previous technique (still two pointers per node) and that, by being tailored to the structure at hand, it is more intuitive.

12 Dictionary ADT Dictionary operations
create destroy insert find delete Stores values associated with user-specified keys; values may be any (homogeneous) type; keys may be any (homogeneous) comparable type. kim chi: spicy cabbage. Krispy Kreme: tasty doughnut. kiwi: Australian fruit. kale: leafy green. Kool Aid: fruit (?) drink. insert(kohlrabi, upscale tuber). find(kiwi) returns kiwi: Australian fruit. Dictionaries associate some key with a value, just like a real dictionary (where the key is a word and the value is its definition). In this example, I’ve stored foods associated with short descriptions. This is probably the most valuable and widely used ADT we’ll hit. I’ll give you an example in a minute that should firmly entrench this concept.

13 Search ADT Dictionary operations Stores keys create destroy insert
find delete Stores keys; keys may be any (homogeneous) comparable type; quickly tests for membership. Klee Matisse Rodin Whistler Heartfield Pollock Gross. insert(Hopper). find(Rodin) returns Rodin. This is a scaled-back version of the dictionary ADT in which we essentially drop the values and leave only the keys. We’ll focus on this when looking at binary search trees. BUT, how hard would it be to move to a Dictionary ADT implementation from a Search ADT implementation?

14 A Modest Few Uses Arrays Sets Dictionaries Router tables Page tables
Symbol tables C++ Structures Our ADT algorithm says to look at some applications at this point. I think the first app pretty much says it all. We move on from there however, to other incredibly widely used applications. I know I’ve said this before, but this is probably the most important and one of the most widely used ADTs we’ll look at. For those keeping track, priority queues are _not_ as widely used as Dictionaries.

15 Naïve Implementations
insert / find / delete costs: Linked list: O(1), O(n), O(n). Unsorted array: O(1), O(n), O(n). Sorted array: O(n), O(log n), O(n). The sorted array is oh-so-close: O(log n) find time and almost O(log n) insert time. What’s wrong? Let’s look at how that search goes: draw the recursive calls (and potential recursive calls) in binary search. Note how it starts looking like a binary tree where the left subtrees have smaller elements and the right subtrees have bigger elements. What if we could store the whole thing in the structure this recursive search is building? So close!

16 Naïve Implementations

              unsorted array              sorted array     linked list
  insert      find + O(1)                 find + O(n)      find + O(1)
  find        O(n)                        O(log n)         O(n)
  delete      find + O(1) (if no shrink)  find + O(n)      find + O(1)

Goal: fast find like the sorted array, dynamic inserts/deletes like the linked list.

17 Binary Search Tree Dictionary Data Structure
Binary tree property: each node has ≤ 2 children; result: storage is small, operations are simple, average depth is small. Search tree property: all keys in the left subtree are smaller than the root’s key; all keys in the right subtree are larger than the root’s key; easy to find any given key. Example tree: 8 5 11 2 6 10 12 4 7 9 14 13. A binary search tree is a binary tree in which all nodes in the left subtree of a node have lower values than the node, and all nodes in the right subtree of a node have higher values than the node. It’s like making that recursion into the data structure! I’m storing integers at each node. Does everybody think that’s what I’m _really_ going to store? What do I need to know about what I store? (comparison, equality testing)

18 Getting to Know BSTs: Example and Counter-Example
15 8 4 8 5 11 1 7 11 2 7 6 10 18 3 4 15 20 21 (left: BINARY SEARCH TREE; right: NOT A BINARY SEARCH TREE). Why is the one on the left a BST even though it’s not complete? Because BSTs don’t need to be complete. Why isn’t the one on the right a BST? 5 has three children, and 20 has a left child larger than it. What’s wrong with 11? Even though 15 isn’t a direct child of 11, it _still_ needs to be less than 11!

19 Getting to Know All About BSTs
In Order Listing. Tree: 10 5 15 2 9 20 7 17 30. Anyone notice anything interesting about that in-order listing? Everything in the left subtree is listed first. Then the root. Then everything in the right subtree. OK, let’s work out the code to make the in-order listing: if (n != NULL) { inorder(n->left); cout << n->key; inorder(n->right); } Is there an iterative version that doesn’t use its own stack? Not really, no. So, recursion is probably OK here. Anyway, if the tree’s too deep for recursion, you must have a huge amount of data. In order listing: 2 5 7 9 10 15 17 20 30

20 Getting to Like BSTs: Finding a Node
Node *& find(Comparable key, Node *& root) { if (root == NULL) return root; else if (key < root->key) return find(key, root->left); else if (key > root->key) return find(key, root->right); else return root; } Tree: 10 5 15 2 9 20 7 17 30. Now, let’s try finding a node. Find 9. This time I’ll supply the code. This should look a _lot_ like binary search! How long does it take? Log n is an easy answer, but what if the tree is very lopsided? So really, this is worst case O(n)! A better answer is theta of the depth of the node sought. If we can bound the depth of that node, we can bound the length of time a search takes. What about the code? All those &s and *s should look pretty scary. Let’s talk through them. runtime: O(depth of the node sought) – O(log n) if balanced, O(n) worst case.

21 Getting to Hope BSTs Like You
Insert void insert(Comparable key, Node *& root) { Node *& target = find(key, root); assert(target == NULL); target = new Node(key); } Tree: 10 5 15 2 9 20 7 17 30. Let’s do some inserts: insert(8), insert(11), insert(31). runtime: same as find – O(depth of the insertion point).

22 Digression: Value vs. Reference Parameters
Value parameters (Object foo): copies the parameter; no side effects. Reference parameters (Object & foo): shares the parameter; can affect the actual value; use when the value needs to be changed. Const reference parameters (const Object & foo): shares the parameter but cannot affect the actual value. A momentary digression. I did some tricky stuff with reference variables there. Does anyone think it would be a good idea to have the find I described back there as the interface to a Search ADT? NO! It exposes really nasty details. But it’s fine for internal use, and it can easily be called by the real external find. Here’s a brief description of value and reference parameters and when you really want to use them.

23 BuildTree for BSTs Suppose the data 1, 2, 3, 4, 5, 6, 7, 8, 9 is inserted into an initially empty BST: in order; in reverse order; median first, then left median, right median, etc. OK, we had a buildHeap; let’s buildTree. How long does this take? Well, IT DEPENDS! Let’s say we want to build a tree from 1 through 9. What happens if we insert in order? Reverse order? What about 5, then 3, then 7, then 2, then 1, then 6, then 8, then 9?

24 Analysis of BuildTree Worst case: O(n^2), as we’ve seen
Average case, assuming all insertion orderings are equally likely: the average runtime equals the average depth of a node in the tree. We’ll calculate the average depth by finding the sum of all depths in the tree and dividing by the number of nodes. What’s the sum of all depths? D(N) = D(i) + D(N - i - 1) + N - 1 (the left subtree has i nodes, the root is 1 node, so the right has N - i - 1; D(i) is the depth sum of the left subtree on its own, and each of its nodes sits 1 deeper in the overall tree, same for the right, for a total of i + (N - i - 1) = N - 1 extra depth). For BSTs, all subtree sizes are equally likely (the root is equally likely to be any of the N elements, and the rest fall on the left or right deterministically), so averaging over i gives D(N) = (2/N)(sum of D(j) for j = 0 to N - 1) + N - 1, which solves to O(N log N) (proof in chapter 7). Expected depth is therefore O(log N) for any node.

25 Bonus: FindMin/FindMax
Find minimum Find maximum 10 5 15 2 9 20 Every now and then everyone succumbs to the temptation to really overuse color. 7 17 30

