
1 The Power of Incorrectness A Brief Introduction to Soft Heaps

2 The Problem A heap (priority queue) is a data structure that stores elements with keys drawn from a totally ordered set (e.g. the integers). We need to support the following operations:
- Insert an element
- Update (decrease the key of an element)
- Extract-min (find and delete the element with minimum key)
- Merge (optional)
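For a concrete point of reference, here is this interface exercised through Python's standard-library binary heap, heapq (the decrease-key workaround in the comment is a common idiom, not part of the slides):

```python
import heapq  # Python's built-in array-based binary min-heap

h = []
heapq.heappush(h, 5)        # insert
heapq.heappush(h, 2)
heapq.heappush(h, 9)
print(heapq.heappop(h))     # extract-min -> 2
# decrease-key has no direct heapq support; one common workaround is to push
# a fresh entry and lazily skip the stale one (compare the next slide).
```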

3 A Note on Notation We evaluate algorithm speed using big-O notation. Most of the upper bounds on runtime given here are also lower bounds, but we use just big-O to simplify notation. Some of the runtimes given are amortized, meaning they are averaged over a sequence of operations; they are stated as ordinary bounds to reduce confusion. All logarithms are base 2 and written lg. N is the number of elements in our heap at any time; we also use it to denote the number of operations. We work in the comparison model.

4 What about delete? Note that we can implement delete in a 'lazy' style by marking elements as deleted with a flag. Whenever the minimum element is already marked, we perform repeated extract-mins to discard it. So delete doesn't need to be treated any differently from extract-min.
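A minimal sketch of this lazy scheme on top of heapq, assuming distinct keys (names are illustrative):

```python
import heapq

heap = []
deleted = set()   # keys flagged as deleted

def insert(key):
    heapq.heappush(heap, key)

def delete(key):
    deleted.add(key)                          # just flag it; no structural work

def extract_min():
    while heap and heap[0] in deleted:        # minimum already marked deleted:
        deleted.discard(heapq.heappop(heap))  # discard it and retry
    return heapq.heappop(heap) if heap else None
```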

5 The General Approach We store the elements in a tree with a constant branching factor. Heap condition: the key of any node is at least the key of its parent. *Exercise: show that we can perform insert and extract-min in time proportional to the height of the tree.
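One solution to the exercise, for a binary tree stored in an array (a sketch; it assumes the heap is nonempty when extract_min is called):

```python
def insert(a, key):
    a.append(key)
    i = len(a) - 1
    while i > 0 and a[(i - 1) // 2] > a[i]:      # sift up: one swap per level
        a[i], a[(i - 1) // 2] = a[(i - 1) // 2], a[i]
        i = (i - 1) // 2

def extract_min(a):
    a[0], a[-1] = a[-1], a[0]                    # move the last element to the root
    m = a.pop()
    i, n = 0, len(a)
    while True:                                  # sift down: one swap per level
        c = min((j for j in (2 * i + 1, 2 * i + 2) if j < n),
                key=a.__getitem__, default=None) # smaller child, if any
        if c is None or a[i] <= a[c]:
            return m
        a[i], a[c] = a[c], a[i]
        i = c
```

Both loops do constant work per level, which gives the O(height) bound.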

6 Binary Heaps We use a perfectly balanced tree with a constant branching factor. The height of the tree is O(lgN), so insert/update/extract-min all take O(lgN) time. Merge is not supported as a 'basic' operation.

7 Binomial Heap Binomial heaps use a branching factor of O(lgN) and also support merge in O(lgN) time. The main idea is to keep a forest of trees, each with a number of nodes that is a power of two, and no two of the same size. When we merge two trees of the same size, we get another tree whose size is a power of two. We can merge two such forests in O(lgN) time in a manner analogous to binary addition.
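A sketch of the binary-addition analogy, with tree contents elided: the forest is kept as a dict mapping rank to tree, and link is a placeholder for the real 'attach the root with the larger key under the other' step (all names are illustrative):

```python
def link(t1, t2):
    # Placeholder: really, the root with the larger key becomes a child of
    # the other root, producing a single tree of the next rank.
    return ("linked", t1, t2)

def merge(f1, f2):
    out, carry = {}, None
    top = max(list(f1) + list(f2), default=-1)
    for r in range(top + 2):                        # +2 so a final carry is placed
        trees = [t for t in (f1.get(r), f2.get(r), carry) if t is not None]
        carry = None
        if len(trees) >= 2:                         # two trees of rank r:
            carry = link(trees.pop(), trees.pop())  # 'carry' into rank r+1
        if trees:
            out[r] = trees.pop()                    # at most one tree per rank remains
    return out
```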

8 Structure of Binomial Heaps We typically describe a binomial tree as one binomial tree attached to the root of another of the same size. Let the rank of a tree in a binomial heap be the log of the number of nodes it has.

9 Even More Heap If we get 'lazy' with binomial heaps, saving all the work until we perform extract-min, we get O(1) per insert and merge, but O(lgN) for extract-min and update. Fibonacci heaps (by Fredman and Tarjan) can do insert, update, and merge in O(1) per operation, but still require O(lgN) for a delete. Can we get rid of the O(lgN) factor?

10 No! WHY?

11 A Bound on Sorting We can't sort N numbers in the comparison model faster than O(NlgN) time. Sketch of proof:
- There are N! possible permutations.
- Each operation in the comparison model can have only 2 possible results, true/false.
- So for our algorithm to distinguish all N! inputs, we need lg(N!) = O(NlgN) operations.
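The estimate in the last step, spelled out (a standard bound; Stirling's approximation gives the same result):

```latex
\lg(N!) \;=\; \sum_{i=1}^{N} \lg i
        \;\ge\; \sum_{i=\lceil N/2\rceil}^{N} \lg\frac{N}{2}
        \;\ge\; \frac{N}{2}\,\lg\frac{N}{2},
\qquad\text{while}\qquad
\lg(N!) \;\le\; N \lg N .
```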

12 Now Apply to Heaps Given an array of N elements, we can insert them into a heap in N inserts. Performing extract-min once gives the 1st element of the sorted list, a 2nd time gives the 2nd element, and so on. So we can perform extract-min N times to get a sorted list back. Hence one of insert or extract-min must take at least O(lgN) time per operation.
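The reduction as running code, again with heapq standing in for the heap:

```python
import heapq

def heap_sort(xs):
    h = []
    for x in xs:
        heapq.heappush(h, x)               # N inserts
    return [heapq.heappop(h) for _ in xs]  # N extract-mins, in sorted order

print(heap_sort([5, 1, 4, 2, 3]))          # [1, 2, 3, 4, 5]
```

If every heap operation ran in o(lgN) time, this would sort with o(NlgN) comparisons, contradicting the previous slide.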

13 Is There a Way Around This? Note that there is a hidden assumption in the proof on the previous slide: the result given by every call of extract-min must be correct.

14 The Idea We sacrifice some correctness to get a better runtime. More specifically, we allow a fraction of the answers given by extract-min to be incorrect.

15 Soft Heaps Supports insert, update, extract-min, and merge in O(1) amortized time (for any fixed ε). No more than εN (0<ε<0.5) of all elements have their keys raised at any point.

16 The Motivation: Car Pooling

17 No, I Meant This:

18 The Idea in Words We modify the binomial heap described earlier:
- Trees don't have to be full anymore.
- The idea of rank carries over.
We put multiple elements on the same node, which is what makes the trees non-full. This allows a reduction in the height of the tree.

19 The Catch If a node has multiple elements stored on it, how do we track which one is the minimum? Solution: we assign all the elements in the list the same key, so some of the keys get raised. This is where the error rate comes in.
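A sketch of a node under this scheme (field names are illustrative, not Chazelle's):

```python
class Node:
    def __init__(self, item, key):
        self.ckey = key        # the shared ('common') key: >= every true key in items
        self.items = [item]    # elements all reported with key ckey
        self.children = []     # subtrees, as in a binomial tree
        self.rank = 0          # rank carried over from binomial heaps
```

An element whose true key is below ckey is exactly one whose key has been 'raised'.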

20 Example A modified binomial heap with 8 elements. Two of the nodes hold 2 elements instead of one. Note that 2's and 3's key values are raised, but two nodes in the deeper parts of the tree are no longer there.

21 Outline of the Algorithm Insert is done through merging of heaps. We merge as we do in binomial heaps, in a manner not so different from adding binary numbers. When inserting, we do not have to change any of the lists stored in the nodes; all we have to do is maintain heap order when merging trees.

22 Extract-Min If the root's list is not empty, we just take something 'close' to the minimum, remove it, and reduce the size of the list by 1. (Recall that we are allowed to be wrong sometimes.) Things are a bit trickier when the list is empty; in that case we 'siphon' elements from below the root to refill the root's list, using a separate procedure called 'sift'.
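In terms of the Node sketch above (r is the global sift threshold introduced on slide 25, and sift itself is sketched after that slide):

```python
def extract_min(root, r):
    if not root.items:
        sift(root, r)        # refill the root's list from below (slides 23-25)
    return root.items.pop()  # 'close' to the minimum: its true key <= root.ckey
```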

23 Sift We pull elements from below up into the current node's list, concatenating item lists when two lists collide. Then we perform sift on one of the children of the current node; up to this point we're doing the same thing as in a binary heap. However, in some cases we also call sift on another child of the node, which makes the sift calls truly branching. The question is when to do this.

24 How Many Elements Do We Sift? This is tricky. If we never branch, the height (and thus the runtime) remains O(lgN). But if we sift too much, we can get more than εN elements with raised keys. We use a combination of the size of the tree and the size of the current list to decide when to sift and when to destroy nodes, so the branching condition is key.

25 Sift Loop Condition We call sift twice when the rank of the current tree is large enough (>r for some threshold r) and the rank is odd. The 'rank being odd' condition ensures we never call sift more than twice. The constant r is used to globally control how much we sift; a sketch follows below.
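Putting slides 22-25 together, an illustrative sketch of sift over the Node type from before. The real procedure (Chazelle 2000) differs in details such as how empty subtrees are pruned and how ckey is chosen; this version only shows the shape of the recursion:

```python
import math

def sift(node, r):
    node.items, node.ckey = [], -math.inf
    calls = 2 if (node.rank > r and node.rank % 2 == 1) else 1  # odd-rank rule
    for _ in range(calls):                      # at most two branching calls
        if not node.children:
            break                               # leaf: nothing left to pull up
        child = min(node.children, key=lambda c: c.ckey)
        node.items += child.items               # concatenate lists ('car pooling')
        node.ckey = max(node.ckey, child.ckey)  # the larger key becomes the common key
        sift(child, r)                          # recursively refill that child
        if not child.items:                     # subtree exhausted:
            node.children.remove(child)         # destroy the node
```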

26 One More Detail We need to keep a rank invariant, which states that a node has at least half as many children as its rank. This prevents excessive merging of lists. We can keep this condition as follows: every time we find a violation at a root, we dismantle that root and meld the subtrees below it back into the heap.

27 Result of the Analysis The total cost of merging is O(N), by an argument similar to counting the number of carries resulting from incrementing a binary counter N times. Result on sift from the paper (no proof):
- Let r = 2 + 2lg(1/ε); then sift runs in O(r) per call, which is O(1) per operation as ε is a constant. We can also show that a runtime of O(lg(1/ε)) is optimal if at most εN elements may have their keys raised.
- Note that if we set ε = 1/2N, no errors can occur and we get a normal heap back.
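Spelling out the ε = 1/2N remark: the error bound then permits fewer than one raised key (so none at all), and the per-call cost of sift degrades to the familiar logarithmic bound:

```latex
\varepsilon N \;=\; \frac{N}{2N} \;=\; \frac{1}{2} \;<\; 1,
\qquad
r \;=\; 2 + 2\lg\frac{1}{\varepsilon} \;=\; 2 + 2\lg(2N) \;=\; O(\lg N).
```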

28 Is This Any Use? Don't ever submit this as your CS assignment and expect it to give right answers.

29 A Problem Given a list of N numbers, we want to find the kth largest in O(N) time. Randomized quickselect does it in expected O(N) time, but it's randomized. The best-known deterministic algorithm for this involves finding the median of groups of 5 numbers and taking the median of medians... basically a mess.

30 A Simple Deterministic Solution We insert all N elements into a soft heap with error rate ε = 1/3 and perform extract-min N/3 times. The largest number deleted then has rank between N/3 and 2N/3. So we can remove N/3 numbers from consideration each round (the ones on the other side of k) and handle the rest recursively. Runtime: N + (2/3)N + (2/3)²N + (2/3)³N + ... = O(N).
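The recursion structure in code, selecting the k-th smallest (the k-th largest case is symmetric). For illustration, heapq (an exact heap) stands in for the soft heap, so this version runs in O(NlgN) rather than O(N) and the pivot is exactly the m-th smallest; the pruning logic is the point. It assumes distinct values:

```python
import heapq

def select(xs, k):
    """Return the k-th smallest of xs (1-based); assumes distinct values."""
    if len(xs) <= 1:
        return xs[0]
    h = list(xs)
    heapq.heapify(h)
    m = max(1, len(xs) // 3)
    # With a soft heap (eps = 1/3) the pivot's rank lies between n/3 and 2n/3;
    # with the exact heap used here it is simply the m-th smallest.
    pivot = max(heapq.heappop(h) for _ in range(m))
    lo = [x for x in xs if x < pivot]
    hi = [x for x in xs if x > pivot]
    if k <= len(lo):
        return select(lo, k)               # answer lies below the pivot
    if k == len(lo) + 1:
        return pivot
    return select(hi, k - len(lo) - 1)     # answer lies above the pivot

print(select([9, 1, 8, 2, 7, 3, 6], 4))    # 6
```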

31 Other Applications Approximate sorting: sort N numbers so that they're 'nearly' ordered. Dynamic maintenance of percentiles. Minimum spanning trees:
- This is the problem soft heaps were designed to solve, and where they give the best algorithm to date.
- With a soft heap (and another 5-6 pages of work), we can get an O(Eα(E)) algorithm for minimum spanning trees. (α(E) is the inverse Ackermann function.)

32 Bibliography
Chazelle, Bernard. The Soft Heap: An Approximate Priority Queue with Optimal Error Rate.
Chazelle, Bernard. A Minimum Spanning Tree Algorithm with Inverse-Ackermann Type Complexity.
Wikipedia.

