# Sum Selection in Arrays: Allan Grønlund Jørgensen, Qualification Exam

## Presentation transcript


Progress Report

Fault Tolerance:

- Priority Queues Resilient to Memory Faults, with Moruz and Mølhave (WADS 07)
- Optimal Resilient Dictionaries, with Brodal, Fagerberg, Finocchi, Grandoni, Italiano, Moruz and Mølhave (ESA 07)
- Comparison Based Dictionaries: Fault Tolerance versus I/O Efficiency, with Brodal and Mølhave (manuscript, ICALP 08)

Sum Selection:

- A Linear Time Algorithm for the k Maximal Sums Problem, with Brodal (MFCS 07)
- Sum Selection, with Brodal (manuscript, ICALP 08)


Outline

- Introduction
- The k maximal sums problem
- Length constrained k maximal sums problem
- Sum selection problem
- Summary and plans for the future

The Maximum Sum Problem: given an array of numbers, find the largest sum of a contiguous subarray. In the slide's example the answer is written as the triple (4,7,9): the subarray from index 4 to index 7, with sum 9.

Kadane's Algorithm ('77): scan the array from the left and in step i update:

- the largest suffix sum (the largest sum ending at A[i])
- the largest sum so far (the largest sum in A[1,…,i])

[Figure: step-by-step run of the scan on the example array.]
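The two quantities maintained by the scan can be sketched in a few lines of Python. This is a minimal sketch and not code from the talk. The input array is an assumption: it is reconstructed from the (start, end, sum) triples shown on the slides, because the array itself is garbled in the transcript.

```python
def kadane(a):
    """Return the largest sum of a non-empty contiguous subarray of a."""
    best_suffix = a[0]   # largest sum ending at the current index
    best_so_far = a[0]   # largest sum seen in the prefix scanned so far
    for x in a[1:]:
        # Either extend the best sum ending at the previous index, or restart at x.
        best_suffix = max(x, best_suffix + x)
        best_so_far = max(best_so_far, best_suffix)
    return best_so_far


# Assumed example array, reconstructed from the triples on the slides.
A = [-3, 7, -12, 1, 6, -3, 5, -2]
print(kadane(A))  # 9, matching the triple (4,7,9)
```

The scan uses O(n) time and O(1) additional space.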


The k Maximal Sums Problem: given an array of numbers, find the k largest sums (they may overlap). In the slide's example with k=2, the two largest sums are 9 and 8.
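As a reference point for the problem statement, a brute-force version simply enumerates all sums and keeps the k largest. This is a quadratic-time sketch for illustration only, not the algorithm of the talk. The function name and the example array (reconstructed from the triples on the slides) are assumptions.

```python
import heapq


def k_maximal_sums_bruteforce(a, k):
    """Return the k largest subarray sums as (sum, start, end), 1-based inclusive."""
    sums = []
    for i in range(len(a)):
        running = 0
        for j in range(i, len(a)):
            running += a[j]                      # sum of a[i..j]
            sums.append((running, i + 1, j + 1))
    return heapq.nlargest(k, sums)


A = [-3, 7, -12, 1, 6, -3, 5, -2]  # assumed example array
print(k_maximal_sums_bruteforce(A, 2))  # [(9, 4, 7), (8, 5, 7)]
```

The two reported subarrays overlap, which the problem statement allows.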

Goal: an optimal O(n+k) time algorithm that outputs the k maximal sums.

Main Idea (Intuition): build all sums and insert them into a heap-ordered binary tree. Then find the k largest sums using Frederickson's heap selection algorithm ('93) in O(k) time.

Example (k=4): [Figure: heap-ordered binary tree holding the sums of an example array, with the four largest nodes marked in red.] Frederickson's algorithm finds the red nodes in O(k) time (in no particular order).
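Frederickson's O(k) algorithm is intricate. A much simpler way to see why heap order helps is a best-first search that keeps a priority queue of candidate nodes. The sketch below uses that simpler technique, which runs in O(k log k) time rather than O(k). The array encoding of the tree and the function name are assumptions made for illustration.

```python
import heapq


def k_largest_in_heap(heap, k):
    """Select the k largest keys of an array-encoded max-heap.

    The children of index i are at 2i+1 and 2i+2. Only nodes whose parent
    has already been selected are ever inspected, so at most 2k+1 nodes are touched.
    """
    if not heap or k <= 0:
        return []
    result = []
    frontier = [(-heap[0], 0)]  # negated keys turn heapq's min-heap into a max-heap
    while frontier and len(result) < k:
        neg_key, i = heapq.heappop(frontier)
        result.append(-neg_key)
        for child in (2 * i + 1, 2 * i + 2):
            if child < len(heap):
                heapq.heappush(frontier, (-heap[child], child))
    return result


print(k_largest_in_heap([9, 8, 6, 4, 7, 3, -1], 4))  # [9, 8, 7, 6]
```

Unlike Frederickson's algorithm, this sketch returns the keys in sorted order, which is more than the problem asks for and is where the extra logarithmic factor goes.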

The Iheap: a heap-ordered binary tree that supports insertions in amortized constant time.

Inserting 7 in an Iheap: [Figure: an Iheap with keys 9, 3, 4, 5 and subtrees T1 to T4, shown before and after the key 7 is inserted.]

Main Issue: there are n(n+1)/2 = Θ(n²) sums. Constructing and inserting Θ(n²) sums into a heap-ordered binary tree takes Θ(n²) time.

Grouping Sums: the sums are grouped by their endpoint in the array. Written as (start, end, sum), the sums ending at index 4 in the example are Q_4 = {(1,4,-7), (2,4,-4), (3,4,-11), (4,4,1)}.

Constructing Q_5 from Q_4: add A[5] = 6 to every sum in Q_4 and insert the new one-element sum (5,5,6). From Q_4 = {(1,4,-7), (2,4,-4), (3,4,-11), (4,4,1)} this gives Q_5 = {(1,5,-1), (2,5,2), (3,5,-5), (4,5,7), (5,5,6)}.
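The step from one Q set to the next can be written out directly on the triples. This is a naive sketch that touches every element of the previous set, so it takes linear time per step. The function name `next_q` is an assumption.

```python
def next_q(q_prev, j, x):
    """Build Q_j from Q_{j-1}, where x = A[j].

    Triples are (start, end, sum) with 1-based inclusive indices.
    """
    extended = [(start, j, s + x) for (start, _end, s) in q_prev]
    return extended + [(j, j, x)]  # the new sum consisting of A[j] alone


Q4 = [(1, 4, -7), (2, 4, -4), (3, 4, -11), (4, 4, 1)]
Q5 = next_q(Q4, 5, 6)
print(Q5)  # [(1, 5, -1), (2, 5, 2), (3, 5, -5), (4, 5, 7), (5, 5, 6)]
```

Doing this for every j costs quadratic time in total, which is why the talk moves on to a representation that avoids rewriting every element.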

Main Idea Continued: represent each Q set as a heap-ordered binary tree H. Combine all heaps by assembling them into one big heap using dummy infinity keys.

The Assembled Heap: [Figure: the heaps H_1, …, H_5 assembled into one big heap.]

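Representing Q Sets: each set Q_j is represented by a pair ⟨δ_j, H_j⟩, where H_j is an Iheap with one element for each of the j sums in Q_j, and δ_j is a number that must be added to all elements of H_j. We get the following construction equation, with ⟨δ_1, H_1⟩ = ⟨A[1], {0}⟩:

$$\langle \delta_{j+1},\, H_{j+1} \rangle = \langle \delta_j + A[j+1],\; H_j \cup \{-\delta_j\} \rangle$$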

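Example, for an array starting -3, 7, -12. Each set Q_j is obtained by adding the number δ_j to every element of H_j:

| j | δ_j | H_j | Q_j |
|---|---|---|---|
| 1 | -3 | {0} | {-3} |
| 2 | 4 | {0, 3} | {4, 7} |
| 3 | -8 | {0, 3, -4} | {-8, -5, -12} |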
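The offset trick in this example can be sketched as follows: each step stores one new element relative to the running offset instead of rewriting the whole set. The sketch uses a plain Python list in place of an Iheap and copies the list at every step to keep all versions, so it only illustrates the values and not the constant-time bound. The function name is an assumption, and the input must be non-empty.

```python
def build_pairs(a):
    """Return [(offset_j, H_j)] for j = 1..n, where Q_j = {offset_j + h for h in H_j}."""
    offset, stored = a[0], [0]
    pairs = [(offset, list(stored))]
    for x in a[1:]:
        # The new one-element sum x must equal new_offset + h with new_offset = offset + x,
        # so the stored value is h = -offset.
        stored.append(-offset)
        offset += x
        pairs.append((offset, list(stored)))  # copy: a stand-in for keeping old versions
    return pairs


for offset, stored in build_pairs([-3, 7, -12]):
    print(offset, stored, [offset + h for h in stored])
# -3 [0] [-3]
# 4 [0, 3] [4, 7]
# -8 [0, 3, -4] [-8, -5, -12]
```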

Analysis of Pair Construction: building each pair takes amortized constant time (one insertion into an Iheap). Problem: the insertion modifies the heap, so the old version disappears. Solution: partial persistence (Driscoll et al. '89). [Figure: inserting 7 turns version i of the Iheap, with keys 9, 3, 4, 5 and subtrees T1 to T4, into version i+1.]

Partial Persistence:

- Allows queries to any version.
- Using node copying, the extra cost per update to the newest version is, amortized, the cost of copying O(1) original nodes.
- Query time is the same.

Reference: Driscoll, Sarnak, Sleator, and Tarjan. Making Data Structures Persistent ('89).

Before and After: [Figure: the ephemeral version and the partially persistent version of the example structure, representing the sets {-3}, {4,7} and {-8,-5,-12}.]

Last Problem: the resulting data structure is no longer a heap-ordered binary tree. This is fixed by constructing it incrementally while following Frederickson's algorithm, which works top down.

Recap:

- Build all pairs in O(n) time.
- Join them into a single heap in O(n) time.
- Use Frederickson's algorithm to get the k+n-1 largest elements and discard the dummies, in O(n+k) time.

Result: an O(n+k) time algorithm.

Space Reduction:

- The current algorithm uses O(n+k) time and O(n+k) additional space.
- The input array is considered read-only.
- Kadane's algorithm uses O(1) additional space.
- Goal: reduce the additional space usage to O(k).

Algorithm:

- Build the k heaps and find the k largest sums as before.
- Repeat on the next k elements, using the last heap built. The rest can be discarded.
- Merge the two sets using selection.
- Problem: the size of the last heap keeps growing. Only the k best suffixes are useful, so find these in the last heap and build a new heap from them.
- Repeat on the next k elements, and so on.

[Figure: the array split into chunks of k elements, showing the k best for 1..k, the k best for k+1..2k, and the merged k best for 1..2k.]

Flash-Back: for k=1 this algorithm and Kadane's algorithm from the introduction are the same.

Higher Dimensions: the problem can be reduced to the 1D case. The slide states one bound for an m x n matrix and one for the general case. [Formulas not preserved in the transcript.]


Length Constrained k Maximal Sums Problem: each sum must be an aggregate of at least l numbers and at most u numbers. In the slide's example with l=3 and u=5, the best sum is 19 and the best valid sum is 13. [Figure: example array with both subarrays marked.]

Goal: an optimal O(n+k) time algorithm that outputs the k maximal sums with length between l and u.

First Approach: use the same idea as before, but redefine Q to match the length criteria. The construction equation is almost identical but requires a deletion.

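Constructing Q Sets Using Deletions (l=3, u=6): the valid sums ending at index 6 are (1,6,56), (2,6,61), (3,6,44), (4,6,2). Extending them to index 7 gives (1,7,46), (2,7,51), (3,7,34), (4,7,-8). The sum (1,7,46) now has length 7 > u and is deleted, and the new shortest sum (5,7,2) is inserted. [Figure: example array with the two windows marked.]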
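The extend, delete, insert step can be sketched on the triples. This is a naive list-based sketch and not the heap-based construction from the talk. The function name is an assumption. The array is also partly an assumption: entries 5 and 6 cannot be recovered from the transcript, only their sum 12, so 0 and 12 are placeholders.

```python
def next_q_constrained(q_prev, a, j, l, u):
    """Sums ending at j (1-based) with length in [l, u], built from those ending at j-1."""
    x = a[j - 1]
    q = [(start, j, s + x) for (start, _end, s) in q_prev]  # extend every sum by A[j]
    q = [t for t in q if t[1] - t[0] + 1 <= u]              # delete the sum that got too long
    start = j - l + 1
    if start >= 1:
        # Insert the new shortest sum; prefix sums would make this O(1) instead of O(l).
        q.append((start, j, sum(a[start - 1:j])))
    return q


A = [-5, 17, 42, -10, 0, 12, -10]  # entries 5 and 6 are placeholders with sum 12
Q6 = [(1, 6, 56), (2, 6, 61), (3, 6, 44), (4, 6, 2)]
print(next_q_constrained(Q6, A, 7, l=3, u=6))
# [(2, 7, 51), (3, 7, 34), (4, 7, -8), (5, 7, 2)]
```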

Result: the same algorithm as before, using the new way of constructing the next heap. Deleting an element from a heap of size n with constant-time insertion takes O(log n) time. This gives an O(n log(u-l) + k) time algorithm.

A Better Way of Constructing the Q Sets (u=8, l=4):

- Divide the array into slabs of size u-l+1.
- For each slab build two sets of heaps: one from the left (L) and one from the right (R).
- For each index j, group all sums of length between l and u ending at j+l-1, using the sets from above and two constants.

Example, j=3 in slab 2: 0+693=693, -10+693=683, 1+680=681, 11+680=691, 13+680=693. [Figure: example array divided into Slab 1 and Slab 2, with the positions l-1 and j+l-1 marked.]

Result: the same algorithm, using the new way to group sums. Building the L and R sets takes O(u-l) time for each slab. This gives an O(n+k) time algorithm.


Sum Selection: given an array of numbers, find the k'th largest sum. Example with k=5: the 15 sums in sorted order are -56, -52, -50, -43, -14, -13, -6, -4, 2, 7, 9, 29, 36, 38, 42, so the answer is 9. [Figure: example array with its sums.]
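A brute-force version makes the definition concrete: list all sums, sort them, and pick the k'th largest. This is a quadratic sketch for illustration only. The input array is an assumption: the slide's array is garbled in the transcript, so the one below was chosen because its 15 sums are exactly the sorted list above. The function names are also assumptions.

```python
def all_subarray_sums(a):
    """Return the sums of all non-empty contiguous subarrays of a."""
    sums = []
    for i in range(len(a)):
        running = 0
        for j in range(i, len(a)):
            running += a[j]
            sums.append(running)
    return sums


def kth_largest_sum_bruteforce(a, k):
    """Return the k'th largest subarray sum (k is 1-based)."""
    return sorted(all_subarray_sums(a), reverse=True)[k - 1]


A = [-52, 2, 7, -13, 42]  # assumed array; its sums match the slide's sorted list
print(sorted(all_subarray_sums(A)))
print(kth_largest_sum_bruteforce(A, 5))  # 9
```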

First Solution: use the algorithm for the k maximal sums to find the k largest sums and output the smallest of these. This uses O(n+k) time. But what if k is large? It can be as large as n(n+1)/2.

Lower Bound: reduction from the Cartesian sum problem (X+Y), which has a lower bound of Ω(|Y| + |Y| log(k/|Y|)) (Frederickson and Johnson '82). Example sets: X = 7, -5, 9, 13 and Y = 2, 12, 1, -3, 8.

Reduction: [Figure: the sets X = 7, -5, 9, 13 and Y = 2, 12, 1, -3, 8 encoded as a single array. The surviving annotations read "= -4" and "113 = 117 - 4".]
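One way to realise such a reduction is sketched below. It is a reconstruction from the numbers that survive on this slide and not necessarily the exact construction used in the talk. The left part of the array holds differences of X, the right part holds differences of Y, and the middle entry is x_last + y_first + c for a large constant c. Every subarray that contains the middle entry then sums to x + y + c for exactly one pair. The value c = 102 is an assumption inferred from the surviving number 117 = 13 + 2 + 102, and the function names are hypothetical.

```python
def cartesian_to_array(xs, ys, c):
    """Encode X + Y in an array; c must be large enough that every sum
    containing the middle entry beats every sum that avoids it."""
    left = [xs[i] - xs[i + 1] for i in range(len(xs) - 1)]   # telescopes to x_i - x_last
    right = [ys[j + 1] - ys[j] for j in range(len(ys) - 1)]  # telescopes to y_j - y_first
    middle = xs[-1] + ys[0] + c
    return left + [middle] + right


def all_subarray_sums(a):
    """Return the sums of all non-empty contiguous subarrays of a."""
    return [sum(a[i:j]) for i in range(len(a)) for j in range(i + 1, len(a) + 1)]


X, Y, C = [7, -5, 9, 13], [2, 12, 1, -3, 8], 102
arr = cartesian_to_array(X, Y, C)
print(arr)  # [12, -14, -4, 117, 10, -11, -4, 11]

# The |X||Y| largest subarray sums, shifted by C, are exactly the multiset X + Y.
top = sorted(all_subarray_sums(arr), reverse=True)[:len(X) * len(Y)]
assert sorted(s - C for s in top) == sorted(x + y for x in X for y in Y)
```

With this encoding, selecting the k'th largest subarray sum for k at most |X||Y| answers the k'th largest element of X+Y, which is how the lower bound carries over.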

Result: an Ω(n + n log(k/n)) lower bound for the sum selection problem.

Goal: an optimal O(n + n log(k/n)) time algorithm for selecting the k'th largest sum.

Algorithm:

- Reduce the problem to selection in sorted arrays and weight-balanced search trees.
- Frederickson and Johnson ('82) already solved selection in n sorted arrays in optimal O(n + n log(k/n)) time.
- Adapt this algorithm so that it also works on weight-balanced trees.

Block Heap:

- A heap-ordered binary tree.
- Each node stores B sorted elements.
- Inserting a block of B elements takes O(B) time.

[Figure: a block heap with B = 3 and blocks (54,49,42), (39,31,25), (24,12,7), (23,22,21), (17,13,11), (10,5,1), (9,6,3).]

Reducing Sum Selection to Selection in Arrays and Trees:

- Divide the array into slabs of size k/n.
- Each index j is associated with two data structures that together cover all sums ending at index j.
- The first, named WB_j, holds all sums starting in the current slab.
- The second, named BH_j, holds the rest.

The example shows two cases: extending within a slab, and extending to a new slab, where a block of k/n elements is inserted into BH. [Figure: contents of WB and BH for the example array before and after each kind of extension.]

Reducing Problem:

- There is one insert into a tree per step and one insert into a block heap every k/n steps.
- This gives n trees of size at most k/n and n block heaps.
- Join all block heaps together and use Frederickson's algorithm to find the 4n blocks with the largest minimum.
- What is left is n trees and O(n) sorted arrays.

Result: selection in O(n) trees and sorted arrays storing O(k) elements can be done in O(n + n log(k/n)) time. The result is an O(n + n log(k/n)) time algorithm for sum selection.


Summary of Results, Sum Selection:

| Problem | Time Complexity |
|---|---|
| k Maximal Sums | O(n+k) |
| Length Constrained k Maximal Sums | O(n+k) |
| Sum Selection | Θ(n + n log(k/n)) |

Summary of Results, Fault Tolerant Data Structures (symbols lost in the transcript are shown as ?):

| Data Structure | Query | Update |
|---|---|---|
| Priority Queue | ?(log(n) + ?) | O(log(n) + ?) amortized |
| Sorted Array | O(log(n) + ?) | - |
| Randomized Dictionary | ?(log(n) + ?) expected | O(log(n) + ?) expected |
| Dictionary | ?(log(n) + ?) | O(log(n) + ?) amortized |
| I/O Dictionary | ? I/O | ? I/O |

Progress and Future: [Figure: timeline running from the PhD start past the qualification exam, with two tracks, Fault Tolerance and Sums in Arrays. Topics shown: Priority Queue, Searching, I/O Eff. Search, Dictionary, k Max Sums, Sum Selection, (l,u) k Max Sums, I/O Eff. Sorting, Cache Oblivious, Selection in arb. Trees, MIT.]
