Binary Merge-Sort


1 Binary Merge-Sort
Merge-Sort(A,i,j)
  if (i < j) then
    m = (i+j)/2;            Divide
    Merge-Sort(A,i,m);      Conquer
    Merge-Sort(A,m+1,j);
    Merge(A,i,m,j)          Combine
[Figure: the sorted runs 1 2 8 10 and 7 9 13 19 being merged; output so far 1 2 7]
Merge is linear in the #items to be merged
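The pseudocode on this slide can be turned into a small runnable sketch. The version below is one possible Python rendering, not the presenter's code: it assumes 0-based inclusive indices `i` and `j`, integer division for the midpoint, and a `merge` helper that copies the two halves into temporary lists (the slide does not say how Merge is implemented).

```python
def merge(a, i, m, j):
    """Merge the sorted slices a[i..m] and a[m+1..j] (inclusive) in place."""
    left, right = a[i:m + 1], a[m + 1:j + 1]
    p = q = 0
    k = i
    # Repeatedly move the smaller head item to the output position k.
    while p < len(left) and q < len(right):
        if left[p] <= right[q]:
            a[k] = left[p]
            p += 1
        else:
            a[k] = right[q]
            q += 1
        k += 1
    # Exactly one side still has items; copy them over unchanged.
    a[k:j + 1] = left[p:] if p < len(left) else right[q:]


def merge_sort(a, i, j):
    """Sort a[i..j] (inclusive) following the slide's Merge-Sort(A,i,j)."""
    if i < j:
        m = (i + j) // 2          # divide
        merge_sort(a, i, m)       # conquer left half
        merge_sort(a, m + 1, j)   # conquer right half
        merge(a, i, m, j)         # combine


runs = [1, 2, 8, 10, 7, 9, 13, 19]   # the two runs from the slide's figure
merge(runs, 0, 3, 7)
print(runs)                           # [1, 2, 7, 8, 9, 10, 13, 19]
```

Each item is moved once per call to `merge`, which is the sense in which merging is linear in the number of items merged.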

2 Few key observations
Items = (short) strings = atomic...
On English Wikipedia, about 10^9 tokens to sort
Θ(n log n) memory accesses (I/Os ??)
[5ms] * n log_2 n ≈ 3 years
In practice it is faster. Why?

3 Recursion
[Figure: recursion tree that splits the input 10 2 5 1 13 19 9 7 15 4 8 3 12 17 6 11 into halves, then quarters, then pairs, then single items; the tree has log_2 N levels]

4 Implicit Caching…
Merge levels, bottom-up (log_2 N in total):
  2 10 | 1 5 | 13 19 | 7 9 | 4 15 | 3 8 | 12 17 | 6 11
  1 2 5 10 | 7 9 13 19 | 3 4 8 15 | 6 11 12 17
  1 2 5 7 9 10 13 19 | 3 4 6 8 11 12 15 17
  1 2 3 4 5 6 7 8 9 10 11 12 13 15 17 19
Runs up to size M: N/M runs, each sorted in internal memory (no I/Os)
Each of the remaining log_2 (N/M) levels: 2 passes (one Read/one Write) = 2 * (N/B) I/Os
I/O-cost for binary merge-sort is ≈ 2 (N/B) log_2 (N/M)
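The cost formula on this slide can be checked with a few lines of arithmetic. The helper below is a hypothetical illustration (the name `binary_merge_sort_ios` and the sample sizes are assumptions, not from the slides); it simply evaluates 2 (N/B) log_2 (N/M), rounding the number of levels and pages up.

```python
import math


def binary_merge_sort_ios(n, m, b):
    """Approximate I/Os of binary merge-sort: 2 * (N/B) * log2(N/M).

    n: items to sort, m: items that fit in memory, b: items per disk page.
    """
    if n <= m:
        return 0  # everything is sorted in internal memory, as on the slide
    levels = math.ceil(math.log2(n / m))   # merge levels that touch the disk
    pages = math.ceil(n / b)               # pages read (and written) per level
    return 2 * pages * levels


# Assumed sizes: N = 2^33 items, M = 2^30 items, B = 2^15 items per page.
print(binary_merge_sort_ios(2**33, 2**30, 2**15))   # 1572864 = 2 * 2^18 * 3
```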

5 A key inefficiency
Runs being merged: 1 2 4 7 9 10 13 19 and 3 5 6 8 11 12 15 17, read one page of B items at a time
After a few steps, every run is longer than B!
We are using only 3 pages
But memory contains M/B pages ≈ 2^30/2^15 = 2^15
[Figure labels: Output Buffer (1, 2, 3), Disk, Output Run (4, ...)]

6 Multi-way Merge-Sort
Sort N items with main-memory M and disk-pages B:
  Pass 1: Produce (N/M) sorted runs.
  Pass i: merge X = M/B - 1 runs → log_X (N/M) passes
[Figure: main memory holds buffers of B items each: Pg for run 1, Pg for run 2, ..., Pg for run X, plus one Out Pg; the runs live on disk]
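The two kinds of pass described on this slide can be sketched in Python. This is a minimal in-memory simulation under stated assumptions, not an external-memory implementation: runs are plain lists rather than disk files, the standard-library `heapq.merge` stands in for the X-way merge with one page buffer per run, and the function name `multiway_merge_sort` is made up for illustration.

```python
import heapq


def multiway_merge_sort(items, m, b):
    """Simulate multi-way merge-sort with memory of m items and pages of b items.

    Returns the sorted list and the number of merge passes performed.
    """
    x = m // b - 1                      # fan-in: one page per run, one output page
    if x < 2:
        raise ValueError("need M/B - 1 >= 2 to merge at least two runs")

    # Pass 1: produce ceil(N/M) sorted runs, each small enough to sort in memory.
    runs = [sorted(items[s:s + m]) for s in range(0, len(items), m)]

    # Later passes: merge up to X runs at a time until a single run remains.
    merge_passes = 0
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[g:g + x]))
                for g in range(0, len(runs), x)]
        merge_passes += 1

    return (runs[0] if runs else []), merge_passes


data = [10, 2, 5, 1, 13, 19, 9, 7, 15, 4, 8, 3, 12, 17, 6, 11]
result, passes = multiway_merge_sort(data, m=4, b=1)   # X = 3, four initial runs
print(result, passes)
```

With these toy sizes the four initial runs need two merge passes, matching the rounded-up value of log_3 4.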

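7 How it works
[Figure: X runs such as 1 2 5 10 and 7 9 13 19 are merged at a time (1 2 5 7 ...) until the single sorted sequence 1 2 3 4 5 6 7 8 9 10 11 12 13 15 17 19 is produced]
N/M runs of size M, each sorted in internal memory = 2 (N/B) I/Os
Each merge level: 2 passes (one Read/one Write) = 2 * (N/B) I/Os
Number of merge levels: log_X (N/M)
I/O-cost for X-way merge is ≈ 2 (N/B) I/Os per level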

8 Cost of Multi-way Merge-Sort
Number of passes = log_X (N/M) ≈ log_{M/B} (N/M)
Total I/O-cost is Θ( (N/B) log_{M/B} (N/M) ) I/Os
Large fan-out (M/B) decreases #passes
In practice M/B ≈ 10^5 → #passes = 1 → few mins
Tuning depends on disk features
Compression would decrease the cost of a pass!
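To see how the fan-out changes the number of passes, the count can be computed directly. The sketch below is illustrative only: `merge_passes` is a hypothetical helper, the sizes N = 2^40, M = 2^30, B = 2^15 are assumed for the example, and the loop counts passes with integer arithmetic instead of evaluating the logarithm.

```python
def merge_passes(n, m, b, fan_in=None):
    """Count merge passes needed to reduce ceil(N/M) runs to one run.

    fan_in defaults to X = M/B - 1 (multi-way); pass fan_in=2 for binary merging.
    """
    x = fan_in if fan_in is not None else m // b - 1
    if x < 2:
        raise ValueError("fan-in must be at least 2")
    runs = -(-n // m)            # ceil(N/M) initial sorted runs
    passes = 0
    while runs > 1:
        runs = -(-runs // x)     # each pass merges groups of up to x runs
        passes += 1
    return passes


n, m, b = 2**40, 2**30, 2**15    # assumed sizes, in items
print(merge_passes(n, m, b, fan_in=2))   # 10 passes with binary merging
print(merge_passes(n, m, b))             # 1 pass with fan-in X = 2^15 - 1
```

Since every pass costs about 2 (N/B) I/Os, going from ten passes to one cuts the merge I/O by the same factor in this example.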

