The Power of Incorrectness: A Brief Introduction to Soft Heaps

The Problem
A heap (priority queue) is a data structure that stores elements with keys drawn from a totally ordered set (e.g. the integers). We need to support the following operations:
- Insert an element
- Update (decrease the key of an element)
- Extract-min (find and delete the element with minimum key)
- Merge (optional)

A Note on Notation
We evaluate algorithm speed using big-O notation. Most of the upper bounds on runtime given here are also lower bounds, but we use just big-O to simplify notation. Some of the runtimes given are amortized, meaning they are averages over a sequence of operations; we state them as normal bounds to reduce clutter. All logs are base 2 and written lg. N is the number of elements in the heap at any time; we also use it for the number of operations. We work in the comparison model.

What about delete?
Note we can perform delete in a 'lazy' style by marking elements as deleted using a flag. Then, whenever extract-min returns an element that is already marked, we discard it and extract again. So delete doesn't need to be treated any differently from extract-min.
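The lazy-deletion trick can be sketched with Python's heapq (an illustrative implementation, assuming keys are distinct so a set suffices to mark deletions):

```python
import heapq

class LazyHeap:
    """Min-heap with lazy deletion: delete only marks; extract-min skips marked items."""
    def __init__(self):
        self._heap = []
        self._deleted = set()

    def insert(self, key):
        heapq.heappush(self._heap, key)

    def delete(self, key):
        self._deleted.add(key)  # mark only; no restructuring of the heap

    def extract_min(self):
        # Pop until we find an element that is not marked deleted.
        while self._heap:
            key = heapq.heappop(self._heap)
            if key not in self._deleted:
                return key
            self._deleted.discard(key)  # marked element finally leaves the heap
        raise IndexError("extract_min from empty heap")
```

For example, after inserting 5, 3, 8 and deleting 3, extract-min returns 5: the marked 3 is popped and silently discarded on the way.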

The General Approach
We store the elements in a tree with a constant branching factor. Heap condition: the key of any node is always at least the key of its parent. Exercise: show we can perform insert and extract-min in time proportional to the height of the tree.

Binary Heaps
We use a perfectly balanced tree with a constant branching factor, so the height of the tree is O(lg N). Thus insert, update, and extract-min all take O(lg N) time. Merge is not supported as a 'basic' operation.
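A minimal array-based sketch of the two O(lg N) operations (an illustrative implementation, not tied to any particular textbook's layout):

```python
def heap_insert(h, key):
    """Append then sift up: O(lg N) swaps along the root path."""
    h.append(key)
    i = len(h) - 1
    while i > 0 and h[(i - 1) // 2] > h[i]:
        h[i], h[(i - 1) // 2] = h[(i - 1) // 2], h[i]  # swap with parent
        i = (i - 1) // 2

def heap_extract_min(h):
    """Swap root with last element, pop it, then sift the new root down: O(lg N)."""
    h[0], h[-1] = h[-1], h[0]
    m = h.pop()
    i, n = 0, len(h)
    while True:
        smallest = i
        for c in (2 * i + 1, 2 * i + 2):      # the two children of i
            if c < n and h[c] < h[smallest]:
                smallest = c
        if smallest == i:
            return m
        h[i], h[smallest] = h[smallest], h[i]
        i = smallest
```

Both operations walk a single root-to-leaf path, which is exactly the height of the tree.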

Binomial Heaps
Binomial heaps use a branching factor of O(lg N) and also support merge in O(lg N) time. The main idea is to keep a forest of trees, each with a number of nodes that is a power of two, and no two of the same size. When we merge two trees of the same size, we get another tree whose size is a power of two. We can merge two such forests in O(lg N) time, in a manner analogous to binary addition.
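The binary-addition analogy can be sketched as follows. This is an illustrative fragment, not a full binomial heap: a tree is a (key, children) pair, a forest is a list whose slot k holds the rank-k tree or None, and `link`/`meld` are names chosen here for the two steps.

```python
def link(t1, t2):
    """Link two rank-k trees into one rank-(k+1) tree: the tree with the
    larger root key becomes a child of the other, preserving heap order."""
    if t1[0] > t2[0]:
        t1, t2 = t2, t1
    return (t1[0], t1[1] + [t2])

def meld(f1, f2):
    """Merge two forests rank by rank, like binary addition with carries."""
    n = max(len(f1), len(f2)) + 1
    result, carry = [], None
    for k in range(n):
        trees = [t for t in (f1[k] if k < len(f1) else None,
                             f2[k] if k < len(f2) else None,
                             carry) if t is not None]
        if len(trees) == 3:          # 1 + 1 + carry: keep one, carry the link
            result.append(trees[0])
            carry = link(trees[1], trees[2])
        elif len(trees) == 2:        # 1 + 1: slot becomes 0, carry the link
            result.append(None)
            carry = link(trees[0], trees[1])
        elif len(trees) == 1:        # 1 + 0: slot keeps the lone tree
            result.append(trees[0])
            carry = None
        else:                        # 0 + 0
            result.append(None)
            carry = None
    while result and result[-1] is None:
        result.pop()
    return result
```

Inserting a single element is then just melding with a one-tree forest, e.g. `f = meld(f, [(x, [])])` for each new key x.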

Structure of Binomial Heaps
We typically describe binomial trees recursively: a binomial tree is one binomial tree attached to the root of another of the same size. The rank of a tree in a binomial heap is the log of the number of nodes it has.

Even More Heaps
If we get 'lazy' with binomial heaps, saving all the work until we perform extract-min, we get O(1) per insert and merge, but O(lg N) for extract-min and update. Fibonacci heaps (by Fredman and Tarjan) do insert, update, and merge in O(1) per operation, but still require O(lg N) for delete/extract-min. Can we get rid of the O(lg N) factor?

No! WHY?

A Bound on Sorting
We can't sort N numbers in the comparison model faster than O(N lg N) time. Sketch of proof:
- There are N! possible permutations.
- Each operation in the comparison model has only 2 possible results, true/false.
- So for our algorithm to distinguish all N! inputs, we need lg(N!) = Θ(N lg N) operations.
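The bound lg(N!) = Θ(N lg N) used in the last step can be checked directly, without Stirling's formula:

```latex
\lg(N!) \;=\; \sum_{i=1}^{N} \lg i \;\le\; N \lg N,
\qquad
\lg(N!) \;\ge\; \sum_{i=N/2}^{N} \lg i \;\ge\; \frac{N}{2}\,\lg\frac{N}{2} \;=\; \Omega(N \lg N).
```

The upper bound replaces every factor by N; the lower bound keeps only the top half of the factors, each of which is at least N/2.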

Now Apply to Heaps
Given an array of N elements, we can insert them into a heap with N inserts. Performing extract-min once gives the 1st element of the sorted list; a 2nd time gives the 2nd element. So we can perform extract-min N times to get a sorted list back. Hence one of insert or extract-min must take at least Ω(lg N) time per operation.
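For illustration, here is that reduction from sorting to heap operations, written with Python's heapq:

```python
import heapq

def heap_sort(items):
    """Sort by N inserts followed by N extract-mins.
    If both operations were o(lg N), this would beat the sorting lower bound."""
    heap = []
    for x in items:                 # N inserts
        heapq.heappush(heap, x)
    return [heapq.heappop(heap)     # N extract-mins, in increasing key order
            for _ in range(len(heap))]
```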

Is There a Way Around This?
Note there is a hidden assumption in the proof on the previous slide: the result given by every call of extract-min must be correct.

The Idea
We sacrifice some correctness to get a better runtime. To be more specific, we allow a fraction of the answers provided by extract-min to be incorrect.

Soft Heaps
Support insert, update, extract-min, and merge in O(1) amortized time (for any fixed ε). No more than εN (0 < ε < 0.5) of all elements have their keys raised at any point.

The Motivation: Car Pooling

No, I Meant This:

The Idea in Words
We modify the binomial heap described earlier:
- Trees don't have to be full anymore.
- The idea of rank carries over.
We put multiple elements on the same node, which is what causes the non-fullness. This allows a reduction in the height of the tree.

The Catch
If a node has multiple elements stored on it, how do we track which one is the minimum? Solution: we assign all the elements in the node's list the same key. Some of the keys will therefore be raised; this is where the error rate comes in.
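A hypothetical sketch of such a node (the names SoftNode and absorb are illustrative, not from Chazelle's paper): every item in the list shares the node's common key, and concatenating two lists raises the keys of the items that were smaller than the new common key.

```python
class SoftNode:
    """Illustrative node: a list of items all carrying one common key."""
    def __init__(self, key, items=None):
        self.key = key                              # common key for every item
        self.items = items if items is not None else []
        self.children = []

    def absorb(self, other):
        """Concatenate another node's item list under a single common key.
        Items whose true key is below the new common key become 'corrupted':
        their effective key has been raised."""
        new_key = max(self.key, other.key)
        corrupted = [it for it in self.items + other.items if it < new_key]
        self.key = new_key
        self.items = self.items + other.items
        return corrupted                            # the newly corrupted items
```

The error-rate bookkeeping of the real structure amounts to ensuring that, summed over all nodes, lists like `corrupted` never exceed εN items.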

Example
A modified binomial heap with 8 elements. Two of the nodes hold 2 elements instead of one. Note that 2's and 3's keys are raised, but two nodes in the deeper parts of the tree are no longer there.

Outline of the Algorithm
Insert is done through merging of heaps. We merge as we do in binomial heaps, in a manner not so different from adding binary numbers. When inserting, we do not have to change any of the lists stored in the nodes; all we have to do is maintain heap order when merging trees.

Extract-Min
If the root's list is not empty, we just take something 'close' to the minimum, remove it, and reduce the size of the list by 1. Recall that we are allowed to be wrong some of the time. Things are trickier when the list is empty. In both cases we 'siphon' elements from below the root to append to the root's list, using a separate procedure called 'sift'.

Sift
We pull some of the elements of the current node's list up the tree, concatenating the item lists when two lists collide. Then we perform sift on one of the children of the current node; up to this point we're doing the same thing as in a binary heap. However, in some cases we call sift on another child of the node as well, which makes the sift calls truly branching. The question is when to do this.

How Many Elements Do We Sift?
This is tricky. If we don't sift enough, the height (and thus the runtime) becomes O(lg N). But if we sift too much, we can get more than εN elements with raised keys. We use a combination of the size of the tree and the size of the current list to decide when to sift and when to destroy nodes; the branching condition is the key.

Sift Loop Condition
We call sift twice when the rank of the current tree is large enough (> r, for some parameter r) and the rank is odd. The 'rank is odd' condition ensures we never call sift more than twice. The parameter r globally controls how much we sift.

One More Detail
We need to maintain a rank invariant, which states that a node has at least half as many children as its rank. This prevents excessive merging of lists. We can maintain this condition as follows: every time we find a violation at a root, we dismantle that node and merge its subtrees (along with the elements of its list) back into the heap.

Result of the Analysis
The total cost of merging is O(N), by an argument similar to counting the number of carries when incrementing a binary counter N times. Result on sift from the paper (no proof): let r = 2 + 2lg(1/ε); then sift runs in O(r) per call, which is O(1) per operation since ε is a constant. One can also show that a runtime of O(lg(1/ε)) is optimal if at most εN elements may have their keys raised. Note that if we set ε = 1/(2N), no errors can occur and we get a normal heap back.

Is This Any Use?
Don't ever submit this for your CS assignment and expect it to give right answers.

A Problem
Given a list of N numbers, we want to find the kth largest in O(N) time. Randomized quickselect does it in O(N) expected time, but it's randomized. The best-known deterministic algorithm for this involves finding medians of groups of 5 numbers and taking the median of those medians... basically a mess.

A Simple Deterministic Solution
We insert all N elements into a soft heap with error rate ε = 1/3 and perform extract-min N/3 times. The largest number deleted then has rank between N/3 and 2N/3. So each round removes N/3 numbers from consideration (the ones on the far side of k) and recurses on the rest. Runtime: N + (2/3)N + (2/3)^2 N + (2/3)^3 N + ... = O(N).
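The final runtime claim is just the geometric series: each recursive call costs a constant times its input size, and the input shrinks by a factor of 2/3 per level, so

```latex
T(N) \;\le\; cN + T\!\left(\tfrac{2N}{3}\right)
\;\le\; cN \sum_{i=0}^{\infty} \left(\tfrac{2}{3}\right)^{i}
\;=\; 3cN \;=\; O(N).
```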

Other Applications
- Approximate sorting: sort N numbers so they're 'nearly' ordered.
- Dynamic maintenance of percentiles.
- Minimum spanning trees. This is the problem soft heaps were designed to solve, and it gives the best algorithm to date: with a soft heap (and another 5-6 pages of work), we get an O(Eα(E)) algorithm for minimum spanning trees, where α is the inverse Ackermann function.

Bibliography
Chazelle, Bernard. The Soft Heap: An Approximate Priority Queue with Optimal Error Rate.
Chazelle, Bernard. A Minimum Spanning Tree Algorithm with Inverse-Ackermann Type Complexity.
Wikipedia