Binary Merge-Sort

Merge-Sort(A, i, j)
    if (i < j) then
        m = ⌊(i+j)/2⌋;           // Divide
        Merge-Sort(A, i, m);     // Conquer
        Merge-Sort(A, m+1, j);   // Conquer
        Merge(A, i, m, j)        // Combine

Divide, Conquer, Combine. Merge is linear in the number of items to be merged.
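A direct Python transcription of the pseudocode, offered as a sketch: indices are inclusive as in Merge-Sort(A, i, j), and the hypothetical helper `merge` copies the two halves into temporary lists, which is one common way to make Merge linear in the number of items (the slides do not say how Merge is implemented).

```python
def merge(a, i, m, j):
    """Merge the sorted slices a[i..m] and a[m+1..j] (inclusive) back into a.

    Uses temporary lists, so the work is linear in j - i + 1.
    """
    left = a[i:m + 1]
    right = a[m + 1:j + 1]
    p = q = 0
    k = i
    while p < len(left) and q < len(right):
        if left[p] <= right[q]:   # <= keeps equal items in order (stable)
            a[k] = left[p]
            p += 1
        else:
            a[k] = right[q]
            q += 1
        k += 1
    # Copy whichever side still has items left.
    for x in left[p:]:
        a[k] = x
        k += 1
    for x in right[q:]:
        a[k] = x
        k += 1


def merge_sort(a, i, j):
    """Sort a[i..j] (inclusive bounds) in place, mirroring Merge-Sort(A, i, j)."""
    if i < j:
        m = (i + j) // 2          # Divide
        merge_sort(a, i, m)       # Conquer left half
        merge_sort(a, m + 1, j)   # Conquer right half
        merge(a, i, m, j)         # Combine


data = [5, 2, 9, 1, 5, 6]
merge_sort(data, 0, len(data) - 1)
print(data)  # [1, 2, 5, 5, 6, 9]
```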
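Few key observations

Items are (short) strings, treated as atomic...
On English Wikipedia there are about $10^9$ tokens to sort.
Merge-sort performs $n \log n$ memory accesses (I/Os??).
If every access were a disk I/O: $5\,\text{ms} \times n \log_2 n \approx 1.5 \times 10^{8}\,\text{s}$, i.e. about 5 years.
In practice it is much faster. Why?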
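The back-of-the-envelope estimate can be checked in a few lines. This sketch keeps the slide's pessimistic assumption that each of the $n \log_2 n$ accesses costs one 5 ms disk I/O; the variable names are illustrative.

```python
import math

n = 10**9                    # about 10^9 tokens to sort
seconds_per_access = 0.005   # 5 ms per access, assuming each access is a disk I/O

accesses = n * math.log2(n)  # n log2 n memory accesses
seconds = accesses * seconds_per_access
years = seconds / (365 * 24 * 3600)

print(f"{accesses:.2e} accesses, {seconds:.2e} s, {years:.1f} years")
```

The result is roughly 4.7 years, which is why the cost model must account for caching and disk pages rather than single accesses.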
Recursion

(Figure: the recursion tree of Merge-Sort, with $\log_2 N$ levels.)
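The depth of the recursion tree can be verified with a small sketch. It assumes the split $m = \lfloor (i+j)/2 \rfloor$ from the pseudocode, under which the left half holds $\lceil n/2 \rceil$ items (the larger side); the function name `recursion_depth` is hypothetical.

```python
import math

def recursion_depth(n):
    """Levels of splitting below the root when Merge-Sort is called on n items.

    With m = (i + j) // 2 the left half has ceil(n / 2) items, the larger side,
    so following it down gives the deepest path.
    """
    depth = 0
    while n > 1:
        n = (n + 1) // 2   # size of the larger (left) half
        depth += 1
    return depth

for n in (8, 1000, 10**9):
    # depth and ceil(log2 n) agree
    print(n, recursion_depth(n), math.ceil(math.log2(n)))
```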
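Implicit Caching…

(Figure: the recursion tree of depth $\log_2 N$; every subtree over $M$ items fits in internal memory.)
The bottom levels produce $N/M$ runs, each sorted in internal memory (no I/Os).
Each of the remaining $\log_2 (N/M)$ levels takes 2 passes (one Read, one Write) $= 2\,(N/B)$ I/Os.
Hence the I/O-cost for binary merge-sort is $\approx 2\,(N/B) \log_2 (N/M)$.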
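The cost argument can be mirrored by a small simulation. This is a sketch under an assumed cost model: in-memory lists stand in for the disk, `binary_external_sort` is a hypothetical name, and every level that touches all $N$ items is charged one read and one write of $\lceil N/B \rceil$ pages. It also charges the pass that forms the initial runs, which the slide's approximation omits.

```python
import heapq
import math

def binary_external_sort(items, M, B):
    """Simulate binary merge-sort with runs of M items and pages of B items.

    Returns (sorted items, number of merge levels, page I/Os charged).
    """
    N = len(items)
    pages = math.ceil(N / B)
    # Form ceil(N/M) runs, each sorted in internal memory.
    runs = [sorted(items[k:k + M]) for k in range(0, N, M)]
    io = 2 * pages            # read the input once, write the runs once
    levels = 0
    while len(runs) > 1:
        # Merge the runs two at a time (a lone last run is copied through).
        runs = [list(heapq.merge(*runs[k:k + 2])) for k in range(0, len(runs), 2)]
        io += 2 * pages       # this level reads and writes every page once
        levels += 1
    return (runs[0] if runs else []), levels, io


data = list(range(16, 0, -1))          # 16 items in reverse order
out, levels, io = binary_external_sort(data, M=4, B=2)
print(out == sorted(data), levels, io)  # True 2 48
```

Here $N/M = 4$ runs need $\log_2 4 = 2$ merge levels costing $2 \cdot 8 \cdot 2 = 32$ I/Os, plus 16 I/Os for forming the runs.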
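A key inefficiency

After a few steps, every run is longer than B!
Binary merging uses only 3 pages (two input buffers and one output buffer),
but memory contains $M/B \approx 2^{30}/2^{15} = 2^{15}$ pages.
(Figure: two input runs read through buffers of B items, merged into an output buffer of B items that is flushed to disk as the output run.)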
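Multi-way Merge-Sort

Sort N items with main memory M and disk pages of B items:
Pass 1: produce $N/M$ sorted runs.
Pass i: merge $X = M/B - 1$ runs at a time (one page per input run, one page for the output), so $\log_X (N/M)$ merge passes are needed.
(Figure: main-memory buffers of B items: one page for each of run 1, run 2, …, run X, plus one output page, all exchanging pages with the disk.)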
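One X-way merge step can be sketched as follows, assuming a min-heap to pick the smallest front item among the X input pages (a common choice; the slide does not specify the selection structure). The function `multiway_merge` and its read/write counters are hypothetical, and in-memory lists stand in for runs on disk.

```python
import heapq

def multiway_merge(runs, B):
    """Merge sorted runs, holding one page (B items) per run plus one output page.

    Returns (merged list, pages read, pages written).
    """
    reads = writes = 0
    buffers = [[] for _ in runs]   # current in-memory page of each run
    next_page = [0] * len(runs)    # index of the next page to load for each run
    heap = []                      # entries: (value, run index, offset in page)

    def load_page(r):
        nonlocal reads
        start = next_page[r] * B
        page = runs[r][start:start + B]
        next_page[r] += 1
        if page:
            reads += 1
        buffers[r] = page
        return page

    for r in range(len(runs)):
        if load_page(r):
            heapq.heappush(heap, (buffers[r][0], r, 0))

    output, out_page = [], []
    while heap:
        value, r, off = heapq.heappop(heap)
        out_page.append(value)
        if len(out_page) == B:        # output page full: flush it to "disk"
            output.extend(out_page)
            out_page = []
            writes += 1
        off += 1
        if off == len(buffers[r]):    # input page used up: fetch the next one
            if not load_page(r):
                continue              # run r is exhausted
            off = 0
        heapq.heappush(heap, (buffers[r][off], r, off))
    if out_page:                      # flush the last, partially filled page
        output.extend(out_page)
        writes += 1
    return output, reads, writes


runs = [[1, 4, 7, 10], [2, 5, 8], [3, 6, 9, 11, 12]]
merged, reads, writes = multiway_merge(runs, B=2)
print(merged, reads, writes)
```

Each input page is read exactly once and each output page written once, so a pass costs about $2\,(N/B)$ I/Os regardless of the fan-out.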
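How it works…

Form $N/M$ runs, each of $M$ items sorted in internal memory: $2\,(N/B)$ I/Os.
Each merge level takes 2 passes (one Read, one Write) $= 2\,(N/B)$ I/Os.
The merge tree has fan-out $X$ and $\log_X (N/M)$ levels, so the I/O-cost of X-way merging is $\approx 2\,(N/B)$ I/Os per level.
(Figure: merge tree with fan-out X over runs of size M.)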
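The whole procedure can be simulated end to end. This sketch assumes the same simplified cost model as before (each pass reads and writes all $\lceil N/B \rceil$ pages), uses `heapq.merge` for the in-memory merge of each group, and names the function `multiway_external_sort` hypothetically.

```python
import heapq
import math

def multiway_external_sort(items, M, B):
    """Simulate X-way merge-sort with X = M // B - 1.

    Pass 1 forms ceil(N/M) sorted runs; each later pass merges groups of X runs.
    Returns (sorted items, number of merge passes, page I/Os charged).
    """
    X = M // B - 1
    if X < 2:
        raise ValueError("need M/B >= 3 pages for a multi-way merge")
    N = len(items)
    pass_io = 2 * math.ceil(N / B)    # one read and one write of every page
    runs = [sorted(items[k:k + M]) for k in range(0, N, M)]
    io, passes = pass_io, 0
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[k:k + X])) for k in range(0, len(runs), X)]
        io += pass_io
        passes += 1
    return (runs[0] if runs else []), passes, io


data = list(range(999, -1, -1))
out, passes, io = multiway_external_sort(data, M=10, B=2)
print(out == sorted(data), passes, io)   # True 4 5000
```

With $X = 4$ and $N/M = 100$ runs, the run count drops 100, 25, 7, 2, 1, i.e. $\lceil \log_4 100 \rceil = 4$ merge passes, whereas binary merging would need $\lceil \log_2 100 \rceil = 7$.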
Cost of Multi-way Merge-Sort

Number of passes $= \log_X (N/M) \approx \log_{M/B} (N/M)$.
Total I/O-cost is $O\big((N/B) \log_{M/B} (N/M)\big)$ I/Os.
A large fan-out ($M/B$) decreases the number of passes.
In practice $M/B \approx 10^5$, so #passes $= 1$ and sorting takes a few minutes.
Tuning depends on disk features.
Compression would decrease the cost of a pass!
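The pass count can be computed exactly with integer arithmetic, avoiding floating-point logarithms. This sketch uses a hypothetical helper `merge_passes` and an illustrative value of 1000 runs for $N/M$ (not a figure from the slides) to compare binary merging with a fan-out of $M/B - 1 \approx 10^5$.

```python
def merge_passes(num_runs, fan_out):
    """Merge passes needed to reduce num_runs sorted runs to one,
    merging fan_out runs at a time: ceil(log_fan_out(num_runs))."""
    passes = 0
    while num_runs > 1:
        num_runs = -(-num_runs // fan_out)   # ceiling division
        passes += 1
    return passes


runs = 1000                                # illustrative N/M
print(merge_passes(runs, 2))               # binary merge-sort: 10
print(merge_passes(runs, 10**5 - 1))       # fan-out M/B - 1 with M/B ≈ 10^5: 1
```

A single merge pass suffices whenever $N/M \le M/B - 1$, i.e. as long as there are fewer than about $10^5$ initial runs.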