Algorithms and Data Structures, lecture 5
Binary Search Trees. Treaps. Skip Lists.
Szymon Grabowski, Łódź, 2016
Trees – basic notions
A (free) tree: a connected acyclic undirected graph. If an undirected graph is acyclic but possibly disconnected, we call it a forest. From [Cormen, McGraw-Hill, 2000, p.91]
Trees – basic notions, cont'd
A rooted tree: a free tree with one vertex, the root, being distinguished. Vertices = nodes. Denote the root by r. Node y is an ancestor of node x iff y is on the unique path from r to x (x is then called a descendant of y). If also x ≠ y, we use the terms proper ancestor / proper descendant. Other obvious terms: parent (or father), child (or son), leaf (or external node), non-leaf (or internal node), subtree rooted at node x.
Trees – basic notions, cont'd
Siblings: two (or more) nodes having the same parent. Degree of x: the # of its children. Depth of x: the length of the path from the root r to x. Height of T: the max depth in T. Ordered tree: a rooted tree in which the children of each node are ordered. From [Cormen, McGraw-Hill, 2000, p.94]
Trees – basic properties
[Figure: properties of free trees; V – set of vertices, E – set of edges. From Cormen, McGraw-Hill, 2000, p.91]
Trees – basic properties, cont'd
In a free tree, any two nodes are connected by a unique simple path. From [Cormen, McGraw-Hill, 2000, p.92]
Other representations of trees
(a) nested sets, (b) nested parentheses, (c) indentation. From [Knuth, WNT, 2002, vol. 1, p.324]
Tree representation of a math expression
First traverse the left subtree of the root, then the root, then the right subtree – i.e., an inorder walk. From [Knuth, WNT, 2002, vol. 1, p.325]
Binary tree – a tree of degree 2
Full binary tree: every node has 0 or 2 children. Complete binary tree: all leaves at depth k or k-1 (for some k); at the deepest level all nodes are as far left as possible. A full complete (= perfect) binary tree of height h has 2^h - 1 internal nodes; all its leaves are at the same depth. From [Cormen, McGraw-Hill, 2000, p.96]
Binary search tree (BST) – a simple yet efficient dynamic dictionary structure
Common interface for search trees: minimum, maximum, predecessor, successor, insert, delete. A BST supports those operations in O(log n) average time – and in O(n) worst-case time... Each node x in a BST has (at least) 4 fields: key, left, right and p. left (right) – a pointer to the left (right) child of x (may be NIL). p – a pointer to x's parent. (Can it be NIL?)
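A minimal sketch of such a node in Python (the field names key, left, right and p follow the slide; the class itself and None-as-NIL are choices of this sketch):

    class Node:
        """One BST node with a parent pointer."""
        def __init__(self, key):
            self.key = key       # the search key
            self.left = None     # left child; None plays the role of NIL
            self.right = None    # right child
            self.p = None        # parent; None only for the root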
Binary search tree
In the left subtree of x everything is ≤ key[x]; in the right subtree of x everything is ≥ key[x]. Many possible BSTs exist for the same set of keys. A BST has a recursive nature: every internal node is the root of a BST too. From [Cormen, McGraw-Hill, 2000, p.245]
Traversing a tree
A tree (not only a BST) can be traversed in 3 ways:
inorder walk – prints first all the keys from the left subtree of the root r, then r itself, and finally all the keys from the right subtree of r;
preorder walk – prints the root before the values in any subtree;
postorder walk – prints the root after the values in its subtrees.
The recursive routine for the inorder tree walk takes Θ(n) time. From [Cormen, McGraw-Hill, 2000, p.245]
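A direct Python transcription of that recursive inorder walk (assuming the Node class sketched above):

    def inorder_tree_walk(x):
        """Print the keys of the subtree rooted at x in nondecreasing order."""
        if x is not None:
            inorder_tree_walk(x.left)
            print(x.key)
            inorder_tree_walk(x.right)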
Counting and storing the # of descendants of each node (Algorithms and Theory of Computation Handbook, 1999, Chap. 6.3)
An extra field desc[·] stores the number of descendants of a node (incl. the node itself). Running time of Postorder-Tree-Walk(root[T]): O(n).
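A postorder sketch in Python that fills such a field (the desc name follows the slide; the traversal is the standard postorder pattern):

    def count_descendants(x):
        """Set x.desc for every node of the subtree rooted at x; return it."""
        if x is None:
            return 0
        x.desc = 1 + count_descendants(x.left) + count_descendants(x.right)
        return x.desc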
Find min / max in a BST
Pretty simple: go left as far as you can for the min element, go right as far as you can for the max element. Both need O(h) time, h – the height of x (of course, x can be the root). From [Cormen, McGraw-Hill, 2000, p.248]
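In Python (a standard transcription of the pseudocode the slide cites):

    def tree_minimum(x):
        """Leftmost node of the subtree rooted at x."""
        while x.left is not None:
            x = x.left
        return x

    def tree_maximum(x):
        """Rightmost node of the subtree rooted at x."""
        while x.right is not None:
            x = x.right
        return x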
BST, searching a key
Looking for 13: path 15 → 6 → 7 → 13. Looking for 14: path 15 → 6 → 7 → 13 → ??? The item with key 13 has no right son, hence the answer "key 14 not found". O(h) worst-case time, h – tree height. From [Cormen, McGraw-Hill, 2000, p.247]
BST, searching a key, pseudo codes
Recursive version and iterative version. From [Cormen, McGraw-Hill, 2000, p.248]
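Both versions in Python (standard transcriptions; the pseudocode itself is in the cited figure):

    def tree_search(x, k):
        """Recursive version: the node with key k in the subtree of x, or None."""
        if x is None or k == x.key:
            return x
        if k < x.key:
            return tree_search(x.left, k)
        return tree_search(x.right, k)

    def iterative_tree_search(x, k):
        """Iterative version: same result, no recursion."""
        while x is not None and k != x.key:
            x = x.left if k < x.key else x.right
        return x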
BST, finding successor(x)
Given the pointer to x, we wish to find the successor of x. The structure of a BST enables finding it without even comparing the keys. Question: can it return NIL? From [Cormen, McGraw-Hill, 2000, p.249]
BST, finding successor(x), cont'd
Successor(x): either the min in the right subtree, or (if the right subtree doesn't exist) the lowest ancestor of x whose left child is also an ancestor of x... or, finally, NIL. Question: instead of "the lowest ancestor of x whose left child is also an ancestor of x", can we just say "such an item that x is the maximum in its left subtree"? Successor(x) needs O(h) worst-case time. Predecessor(x) – analogously.
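A Python sketch of exactly this two-case rule (reusing tree_minimum from above):

    def tree_successor(x):
        """The node with the smallest key greater than x.key, or None."""
        if x.right is not None:
            return tree_minimum(x.right)  # case 1: min of the right subtree
        y = x.p
        while y is not None and x is y.right:
            x = y                         # climb while we are a right child
            y = y.p
        return y                          # lowest ancestor via its left child, or None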
Updating a BST: insert(T, x)
Simple and straightforward: just find the position in T where x belongs (going down from the root) and update the pointers to / from its parent. Do not shift any other item. Insert(x) needs O(h) worst-case time. Question: what if T is empty? From [Cormen, McGraw-Hill, 2000, p.251]
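In Python (a transcription of the standard Tree-Insert; the tree object T with a root field is an assumption of this sketch):

    def tree_insert(T, z):
        """Attach the new node z at its proper leaf position in tree T."""
        y, x = None, T.root
        while x is not None:              # walk down, remembering the parent
            y = x
            x = x.left if z.key < x.key else x.right
        z.p = y
        if y is None:
            T.root = z                    # T was empty
        elif z.key < y.key:
            y.left = z
        else:
            y.right = z

Usage: tree_insert(T, Node(13)).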
Updating a BST: insert(T, x), example
From [Cormen, McGraw-Hill, 2000, p.252]
Updating a BST: delete(T, x)
3 cases are possible:
x has no children – removing x is trivial;
x has one child – x's parent should become the parent of x's child;
x has two children – see the example on the next slides (and the sketch after this list)...
Delete(T, x) needs, like the other operations, O(h) worst-case time. It is easy to notice that h = Θ(log n) at best and Θ(n) at worst.
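A Python sketch of the classic splice-out deletion covering all three cases (CLRS-style; assumes the Node, tree_successor and T.root conventions from the sketches above, and that a node carries only its key):

    def tree_delete(T, z):
        """Remove node z from tree T; return the node actually spliced out."""
        # y: the node to splice out – z itself, or (in the two-children case) z's successor
        if z.left is None or z.right is None:
            y = z
        else:
            y = tree_successor(z)         # has at most one child
        x = y.left if y.left is not None else y.right
        if x is not None:
            x.p = y.p                     # bypass y from below
        if y.p is None:
            T.root = x                    # y was the root
        elif y is y.p.left:
            y.p.left = x
        else:
            y.p.right = x
        if y is not z:
            z.key = y.key                 # move the successor's key into z
        return y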
Updating a BST: delete(T, x) (cont'd)
The easiest case. The item with key 13 is pointed to by z (we don't look for it). From [Cormen, McGraw-Hill, 2000, p.252]
Updating a BST: delete(T, x) (cont'd)
Still simple... Think about the case when z is the left son but has only a right son (or vice versa). From [Cormen, McGraw-Hill, 2000, p.252]
Updating a BST: delete(T, x) (cont'd)
A bit harder... The successor of z (y here) has at most one child; why? From [Cormen, McGraw-Hill, 2000, p.252]
Updating a BST: delete(T, x) (cont'd)
Determine the node y to splice out (the third case only). From [Cormen, McGraw-Hill, 2000, p.253]
Randomly created BST
Theorem: the average height of a randomly built binary search tree on n distinct keys is O(log n). But the worst case is O(n), which spoils all the complexities (search, successor, etc.). Fortunately, there exist BST variants which guarantee O(log n) tree height. The bad news is that they are quite complicated (that is, the tree structure may even be unchanged, but the tree maintenance is intricate).
Range queries on a BST
Find 9, then perform succ() multiple times, until finding 24? Yes, it works, but telling the complexity is maybe not so easy...
Range queries on a BST in O(h + occ) time
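A recursive range-report sketch in Python (occ = the number of reported keys; this standard approach visits O(h + occ) nodes; the function name is my own):

    def range_query(x, a, b, out):
        """Append to out all keys from [a, b] in the subtree rooted at x."""
        if x is None:
            return
        if a < x.key:
            range_query(x.left, a, b, out)   # the left subtree may hold keys >= a
        if a <= x.key <= b:
            out.append(x.key)                # x itself is in range
        if x.key < b:
            range_query(x.right, a, b, out)  # the right subtree may hold keys <= b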
BST, pros and cons
Advantages: all basic operations (incl. insert & delete) in O(log n) time on avg; O(n) space (extra space only for 3 pointers per item); simplicity; range queries (i.e., find all keys from [a, b]) handled naturally in O(log n + occ) avg time or O(h + occ) worst-case time.
Disadvantages: O(n) worst-case time complexities; in practice, there exist faster solutions.
Balanced search trees
AVL trees (Adel'son-Vel'skii & Landis, 1962).
Red-black trees (Bayer, 1972; Guibas & Sedgewick, 1978): each node has a flag ("color") and some properties of the tree are controlled, which guarantees that the longest path from the root is never more than twice as long as the shortest path from the root.
2-3 trees (Hopcroft, 1970) (and, in general, (a,b)-trees).
Splay trees (Sleator & Tarjan, 1983): aka "self-adjusting binary search trees". Nodes accessed more frequently are moved toward the root.
Quickie
Question: how to achieve O(1) avg times (ins, del, search) and O(log n) worst-case time complexities? Hint: a theoretical hybrid of a hash table and... ???
Red-black trees, idea
An RB-tree is a binary search tree which also satisfies the following properties: every node is either red or black; every leaf (NIL) is black; if a node is red, then both of its children are black; every path from the root to a leaf contains the same number of black nodes (the black-height of the tree).
Red-black tree, another example
Red-black trees, properties
Theorem: a red-black tree with n internal nodes has height at most 2 lg(n+1). An RB-tree is a BST; max, min, search, successor and predecessor need O(h) worst-case time in any BST, which implies O(log n) complexities for a red-black tree. Insert & delete also need O(log n) worst-case time, but they are complicated, esp. delete. We therefore say that an RB-tree is balanced.
Treap (randomized BST variant)
There exist balanced binary search trees, i.e. BSTs with guaranteed O(log n) operations in the worst case, but they are all quite complicated. Fortunately, it is possible to have O(log n) operations with high probability using relatively simple (and practical) structures. One such idea is the treap (Vuillemin, 1980; Seidel & Aragon, 1989).
(binary search) TRee + hEAP = TREAP
A treap is a binary tree where each node has a search key and a priority. The search keys fulfill the BST property (smaller on the left, larger on the right). The priority values fulfill the (min-)heap property (a parent is less than or equal to its children). The priorities are random (i.e. when we insert a new element, we set a random value in its priority field). So even if the input keys are very non-random (e.g., an increasing sequence), the random priorities make it possible to rebalance the tree.
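A compact Python sketch of a treap and the rotate-up insertion illustrated on the next slides (the recursive formulation and all names are choices of this sketch):

    import random

    class TreapNode:
        def __init__(self, key):
            self.key = key
            self.prio = random.random()  # random priority, drawn once per node
            self.left = None
            self.right = None

    def rotate_right(y):
        """The left child of y moves up; BST order is preserved."""
        x = y.left
        y.left, x.right = x.right, y
        return x

    def rotate_left(y):
        """The right child of y moves up; BST order is preserved."""
        x = y.right
        y.right, x.left = x.left, y
        return x

    def treap_insert(root, node):
        """BST-insert node, then rotate it up while the min-heap order is violated."""
        if root is None:
            return node
        if node.key < root.key:
            root.left = treap_insert(root.left, node)
            if root.left.prio < root.prio:   # heap order violated on the left
                root = rotate_right(root)
        else:
            root.right = treap_insert(root.right, node)
            if root.right.prio < root.prio:  # heap order violated on the right
                root = rotate_left(root)
        return root

Usage: root = treap_insert(root, TreapNode(42)).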
Inserting an item into a treap (1/4) [http://www.win.tue...]
Letters – keys; numbers – priorities.
Inserting an item into a treap (2/4) [http://www.win.tue...]
A rotation was needed; now the priorities of R and S are in the correct order.
Inserting an item into a treap (3/4) [http://www.win.tue...]
That was easy, wasn't it?
Inserting an item into a treap (4/4) [http://www.win.tue...]
Before this step S had 2 children, so this was a bit more complicated...
A pathological sequence for a (plain) BST
Upper numbers – keys; bottom numbers – (random) priorities. Nothing interesting after the first 2 keys: 68 > 28 (min-heap order holds), i.e. no need for a rotation.
1..6 example, cont'd
Finding the key position... Oops, this must be corrected, since the heap order is violated. Both orders preserved? No. Now 29 > 14, so another rotation is needed.
1..6 example, cont'd
1, 2, 3 – done. Luckily, this time no correction is needed.
1..6 example, cont'd
Again, nice!
1..6 example, almost final...
Needs a rotation since 10 < 57. A bit better, but...
1..6 example, final slide
Hooray!
Delete from a treap [http://www.win.tue...]
Deleting a node is exactly like inserting a node, but in reverse order. Suppose we want to delete node z. As long as z is not a leaf, perform a rotation at the child of z with the smaller priority. This moves z down a level and its smaller-priority child up a level; the choice of which child to rotate preserves the heap property everywhere except at z. When z becomes a leaf, chop it off. Clearly, O(h) rotations, each of O(1) cost – i.e. again O(log n) time with high probability.
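A recursive Python sketch of this rotate-down deletion, consistent with the insert sketch above:

    def treap_delete(root, key):
        """Rotate the node holding key down until it has at most one child, then drop it."""
        if root is None:
            return None                      # key not present
        if key < root.key:
            root.left = treap_delete(root.left, key)
        elif key > root.key:
            root.right = treap_delete(root.right, key)
        else:
            if root.left is None:
                return root.right            # zero or one child: just bypass
            if root.right is None:
                return root.left
            # two children: rotate at the child with the smaller priority
            if root.left.prio < root.right.prio:
                root = rotate_right(root)
                root.right = treap_delete(root.right, key)
            else:
                root = rotate_left(root)
                root.left = treap_delete(root.left, key)
        return root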
Splitting a treap [http://www.win.tue...]
Sometimes we want to split a treap T into two treaps according to some pivot p: T< should have the keys less than p, and T> the keys greater than p. How to do it? Solution: insert a dummy node with key value p and priority –INF. The priority of this node is the minimum, of course, hence it will be moved to the root. Then everything in the left subtree is the required T<, and everything in the right subtree is T>. (Then delete the dummy node – i.e., the root.)
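In terms of the sketches above (priorities come from [0, 1), so float('-inf') is below every real priority; the function name is my own):

    def treap_split(root, p):
        """Split into (keys < p, keys > p) via a dummy minimum-priority root."""
        dummy = TreapNode(p)
        dummy.prio = float('-inf')       # guaranteed to rotate up to the root
        root = treap_insert(root, dummy)
        # returning the two subtrees discards the dummy root, i.e. deletes it
        return root.left, root.right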
Treaps, final notes
In the worst case, all the operations (in particular: search, insert and delete) are O(n) with treaps. But this is very, very unlikely. A treap is exactly the binary tree resulting from inserting the nodes one by one into an initially empty tree, in order of increasing priority, using the straight BST insertion algorithm. Treaps are practical search trees. The expected length of a search path in a treap is 2 ln n + O(1) ≈ 1.39 lg n + O(1).
Skip list [W. Pugh, 1990]
Skip list – another simple, randomized alternative to a balanced tree. Randomization is used for inserts only. Imagine having TWO sorted linked lists; each element appears in one or both lists. "Express" and "local" subway lines: one of those lists is used to traverse the collection fast (jump from an express stop to another express stop), and the other is used to find the exact position of the required key.
Skip list search, rough idea. Static scenario
Let's look for key 86; we use the 'express lane' first: 14 – too small, go next; 34 – too small, go next; 42 – too small, go next; 72 – too small, go next; 96 – too large (we went too far), so go down from the previous stop. Now we use the 'slow lane': 72, 79, 86 – we got it!
Search complexity?
Call the express lane S1 and the slow lane S0. What should the optimal length of S1 be? Roughly speaking, the number of visited nodes is |S1| + |S0| / |S1| = |S1| + n / |S1|. This is minimized for… |S1| = n / |S1|, i.e., for |S1| = sqrt(n). The # of visited nodes is then 2 sqrt(n), which gives O(sqrt(n)) search time complexity – in the worst case and on average.
Multi-level idea [http://courses.csail.mit.edu/6...]
With 2 levels, we have 2 n^(1/2) steps. With 3 levels: 3 n^(1/3) steps. With k levels: k n^(1/k) steps. What's the limit? With lg n levels: lg(n) · n^(1/lg n) = lg(n) · 2 = Θ(log n).
Multi-level, searching for a key
Search for 78. Note the –INF / +INF sentinels at the boundaries of each list. Algorithm for searching key x: we start from the top list (S3 here) and scan it until curr_item = x (we found it; return its position) or curr_item > x (too far; drop down from the prev node, and continue alike). In S0 there is no drop-down, so if curr_item > x there, then return "no such key".
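A Python sketch of this search over a grid of right/down pointers (the node layout with right and down links is one common implementation choice, not fixed by the slide):

    class SkipNode:
        def __init__(self, key, right=None, down=None):
            self.key = key
            self.right = right   # next node on the same level
            self.down = down     # the same key, one level lower

    def skip_search(top_head, x):
        """Start at the top-left -INF sentinel; return the S0 node with key x, or None."""
        node = top_head
        while True:
            while node.right is not None and node.right.key <= x:
                node = node.right                    # scan forward
            if node.down is None:                    # bottom list S0 reached
                return node if node.key == x else None
            node = node.down                         # drop down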
Problems
1. The structure is not static: how to handle insert / delete fast? 2. How to provide fast search after a series of updates?
Insertion: new elements are always added to the bottom-level list, but they may also be added to one or more higher-level lists. The idea is to flip a coin – a straight, fair coin, with independent tosses. With prob. 1/2 an element newly added to S0 will also be added to S1. Heads: add the element to S1 and flip the coin again (now, with prob. 1/2 the same element may also be added to S2). And so on... (A sketch of this coin-flipping insert follows below.)
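A Python sketch under the right/down layout from the search sketch above; it also applies the height cap described two slides below (at most one new level per insert):

    import random

    def skip_insert(top_head, x):
        """Insert key x; return the (possibly new) top-level -INF head."""
        # collect the rightmost node <= x on every level, top to bottom
        preds = []
        node = top_head
        while node is not None:
            while node.right is not None and node.right.key <= x:
                node = node.right
            preds.append(node)
            node = node.down
        # always insert into S0; climb one level per heads
        down = None
        grew = False
        while True:
            if preds:
                pred = preds.pop()                   # predecessor on this level
            else:
                # grew past the current height: add one fresh level on top
                # (+INF tail omitted; the search sketch handles right=None)
                top_head = SkipNode(float('-inf'), down=top_head)
                pred = top_head
                grew = True
            pred.right = SkipNode(x, right=pred.right, down=down)
            down = pred.right
            if grew or random.random() >= 0.5:       # tails (prob. 1/2): stop
                break
        return top_head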
Insert example (1/2)
Let the key to insert be 15, and let i be the # of coin flips with heads (may be 0); say i = 2 in our case. Step 1: finding the location of key 15 in S0.
Insert example (2/2) [http://cpp.datastructures.net/presentations/SkipLists.pdf]
Step 2: inserting 15 also into S1 and S2. Note that the height (# of levels) of the structure grows in our example.
Insert, preventing a too high structure
We additionally require that each inserted element may increase the height of the skip list by only 1. Note that this prevents e.g. having 10 levels when there are only 2 keys in the structure... and limits the worst-case height to n + 1 (note the empty skip list has one level, i.e. height 1).
Delete [http://cpp.datastructures.net/presentations/SkipLists.pdf]
Algorithm for removing x: find x on all the lists it occurs in, and remove it from them; if needed, remove all but one of the lists containing only the two special keys (–INF, +INF). Key 34 removed.
Space usage (1/2) [http://cpp.datastructures.net/presentations/SkipLists.pdf]
On avg, half of the elements get to S1, 1/4 of the elements reach S2, 1/8 of the elements reach S3, .... This is because the probability of getting i consecutive heads is 1/2^i. Simple fact: if each of n items is present in a set with prob. p, and those probabilities are independent, then the expected set size is np. Back to the skip list: we insert an item into list Si with prob. 1/2^i. The expected size of Si is thus n/2^i – in particular, 1 for i = lg n.
Space usage (2/2) [http://cpp.datastructures.net/presentations/SkipLists.pdf]
Conclusion #1: the average height of a skip list is Θ(log n). Conclusion #2: the expected number of nodes in a skip list is n + n/2 + n/4 + ... < 2n, i.e. its space usage is Θ(n) on avg. (But in the worst case: h = n + 1, space quadratic!!! Luckily, VERY improbable.)
Pathological behavior: how improbable? [http://cpp.datastructures.net/presentations/SkipLists.pdf]
Let's assume that the height h is at least 3 log2 n. What's the probability of such an event? It means that for at least one of the n items we had a series of 3 log2 n heads. This prob. is (upper-bounded by) n / 2^(3 lg n) = n / n^3 = 1/n^2. In other words, the probability that a skip list with n items has height at most 3 log2 n is (at least, approx.) 1 – 1/n^2.
Search and update times
Search time is proportional to the # of drop-down steps plus the # of scan-forward steps. # of drop-downs: O(log n) with high prob. The avg gap between nearest successive elements in a list Si that also exist in list Si+1 is 2. Why? Because the expected # of coin flips required to get heads is 2. Hence at each level we spend O(1) time on avg, which gives the overall O(log n) avg search time. The same holds for updates.
Skip lists, conclusions
A simple and practical randomized dictionary structure. Expected search / ins / del times: O(log n) (hold with high probability). Expected space usage (with high prob.): O(n). Implementing a skip list may require quad-nodes (4 pointers per node: E, W, N, S).
Deterministic skip list
A deterministic skip list variant (O(log n) times in the worst case) also exists (Munro et al., 1992), and is also quite simple. The basic idea is to insist that between any pair of elements above a given height there is a small number of elements of precisely that height. The desired behaviour can be achieved either by using some extra space for pointers, or by adding the constraint that the physical sizes of the nodes be exponentially increasing.