
1 Sorting by the Numbers
Sorting, Part Four

2 Question
Suppose you are given the task of writing an application to sort a big data file. What do you need to know to pick a good solution?
• File size = 1 GB
• Record size = 250 bytes
• Available memory = ¼ GB

3 How Many Runs? How Big Is Each Run?
Total records to process
• 1 GB = 1 billion bytes in the file
• 250 bytes for each record
• 1 billion ÷ 250 = 4 million records in the file
Run size
• 1 GB file ÷ ¼ GB memory = 4 runs
• ¼ GB = 250 million bytes, and 250 million ÷ 250 = 1 million records per run
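The run arithmetic above can be checked with a small sketch. The function name `plan_runs` is hypothetical, and like the slide it uses decimal units (1 GB = 1 billion bytes).

```python
def plan_runs(file_size, record_size, memory):
    """Return (total records, records per run, number of runs) for an external sort.

    Sizes are in bytes; decimal units are assumed (1 GB = 1,000,000,000 bytes).
    """
    total_records = file_size // record_size
    records_per_run = memory // record_size   # how many records fit in memory at once
    # Ceiling division: a partial final run would still need its own sort pass.
    num_runs = -(-total_records // records_per_run)
    return total_records, records_per_run, num_runs


GB = 1_000_000_000
print(plan_runs(file_size=GB, record_size=250, memory=GB // 4))
# (4000000, 1000000, 4)
```

Changing only `memory` shows the trade-off explored later: with ½ GB of memory the same file splits into 2 runs of 2 million records.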

4 Time to Create the Runs
Sorting one run
• Using either Quicksort or an ordered binary tree: about $N \log_2 N$ comparisons on average
• Since $\log_2(1\text{ million}) \approx 20$: 1 million × 20 ≈ 20 million comparisons of internal memory locations
Sorting four runs
• 4 × 20 million = 80 million internal memory comparisons
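A quick sketch of the $N \log_2 N$ estimate, assuming the average-case behaviour of Quicksort or a reasonably balanced tree. The helper `estimated_compares` is hypothetical; the slide rounds $\log_2(1\text{ million}) \approx 19.93$ up to 20.

```python
import math


def estimated_compares(n):
    """Rough average-case comparison count, N log2 N, for Quicksort or a tree sort."""
    return n * math.log2(n)


per_run = estimated_compares(1_000_000)
print(f"one run:   {per_run / 1e6:.1f} million compares")
print(f"four runs: {4 * per_run / 1e6:.1f} million compares")
```

This prints about 19.9 and 79.7 million, which the slide rounds to 20 and 80 million.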

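5 Refresher on Merging Files
Merging two sorted files of N random records each takes about 2N compares (at most 2N − 1), because almost every record written out needs a comparison first.
When the runs were built from an already sorted file, every key in one file comes before every key in the other, so the merge needs only N compares: once the first file is used up, the rest of the second file is copied without comparing.
Random (interleaved) keys, N = 5, 9 compares:
File One: 1 3 5 7 9
File Two: 2 4 6 8 10
Runs from a sorted file, N = 5, 5 compares:
File One: 1 2 3 4 5
File Two: 6 7 8 9 10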
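A minimal two-way merge that counts key comparisons reproduces both examples on this slide. The name `merge_count` is hypothetical, and in-memory lists stand in for the two files.

```python
def merge_count(a, b):
    """Merge two sorted lists; return (merged list, number of key comparisons)."""
    merged, compares = [], 0
    i = j = 0
    while i < len(a) and j < len(b):
        compares += 1                 # one comparison per record chosen here
        if a[i] <= b[j]:
            merged.append(a[i])
            i += 1
        else:
            merged.append(b[j])
            j += 1
    # Once one input is exhausted, the remainder is copied with no comparisons.
    merged.extend(a[i:])
    merged.extend(b[j:])
    return merged, compares


print(merge_count([1, 3, 5, 7, 9], [2, 4, 6, 8, 10])[1])   # 9
print(merge_count([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])[1])   # 5
```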

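6 Merging the Four Files
[Figure: two ways to merge runs R1, R2, R3, R4 using temporary files T1 and T2]
• One run at a time: R1 + R2 → T1 (2 million compares), T1 + R3 → T2 (3 million compares), T2 + R4 → sorted file (4 million compares), for 9 million compares in total
• In pairs: R1 + R2 → T1 (2 million compares), R3 + R4 → T2 (2 million compares), T1 + T2 → sorted file (4 million compares), for 8 million compares in total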
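The two merge orders can be compared with a rough cost model, assuming (as in the merging refresher) that merging files of a and b random records costs about a + b compares. The function names are hypothetical, and run sizes are given in millions of records.

```python
def sequential_merge_cost(run_sizes):
    """Merge runs one at a time into a growing temporary file."""
    total, current = 0, run_sizes[0]
    for size in run_sizes[1:]:
        total += current + size       # about one compare per record written
        current += size
    return total


def pairwise_merge_cost(run_sizes):
    """Merge runs in pairs, then merge the results, until one file remains."""
    total, files = 0, list(run_sizes)
    while len(files) > 1:
        next_round = []
        for k in range(0, len(files) - 1, 2):
            merged = files[k] + files[k + 1]
            total += merged
            next_round.append(merged)
        if len(files) % 2:            # an odd file out waits for the next round
            next_round.append(files[-1])
        files = next_round
    return total


runs = [1, 1, 1, 1]                   # four runs of 1 million records
print(sequential_merge_cost(runs))    # 9  (2 + 3 + 4 million compares)
print(pairwise_merge_cost(runs))      # 8  (2 + 2 + 4 million compares)
```

Merging in pairs wins because each record passes through only two merges, while the one-at-a-time order keeps re-reading the growing temporary file.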

7 Total Processing Time
• Time to create the 4 runs: 80 million comparisons (in memory)
• Time to merge the 4 runs: 8 million comparisons (reading from files)
• Assuming a file read takes just 100 times longer than a memory read:
  Total time = 80 million + (100 × 8 million) = 880 million time units
• Note: we have omitted the time to read the runs into memory and to write the runs to temporary files.
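The slide's cost model fits in one line of arithmetic. In this sketch the name `total_time` is hypothetical; an in-memory compare costs 1 unit and each merge compare is charged as a file read of `file_cost` units, with run I/O ignored as on the slide.

```python
def total_time(memory_compares, file_compares, file_cost=100):
    """Time units: 1 per in-memory compare, `file_cost` per compare during the file merge.

    Reading runs into memory and writing temporary files are not counted.
    """
    return memory_compares + file_cost * file_compares


print(total_time(80_000_000, 8_000_000))   # 880000000
```

The merge phase dominates: 800 of the 880 million units come from just 8 million file-bound compares.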

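8 Second Example
2 runs of 2 million records each (each run needs 2 million × 250 bytes = ½ GB of memory)
• Internal sorting: $N \log_2 N \approx$ 2 million × 21 = 42 million compares per run, since $\log_2(2\text{ million}) \approx 21$, so 84 million to create both runs
• File merging: 4 million compares, at 100 time units each = 400 million time units
• Total time = 84 million + 400 million = 484 million time units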
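Here is a hedged sketch that recomputes both examples under the same cost model. The function `estimate` is hypothetical; it assumes equal-sized runs, a power-of-two number of runs merged in pairs, and rounds $\log_2$ of the run size to the nearest integer (20 for 1 million records, 21 for 2 million, since $\log_2(2{,}000{,}000) \approx 20.9$).

```python
import math


def estimate(total_records, num_runs, file_cost=100):
    """Return (sort compares, merge compares, total time units) for an external sort.

    Assumes equal-sized runs, a power-of-two number of runs merged in pairs,
    and about one compare per record per merge pass.
    """
    if num_runs < 1 or num_runs & (num_runs - 1):
        raise ValueError("this sketch assumes a power-of-two number of runs")
    run = total_records // num_runs
    sort_compares = num_runs * run * round(math.log2(run))
    passes = int(math.log2(num_runs))           # pairwise merge passes
    merge_compares = passes * total_records     # every record is merged once per pass
    return sort_compares, merge_compares, sort_compares + file_cost * merge_compares


for runs in (4, 2):
    print(runs, estimate(4_000_000, runs))
# 4 (80000000, 8000000, 880000000)
# 2 (84000000, 4000000, 484000000)
```

With only two runs the file-merge phase drops from 8 million to 4 million compares, so the second configuration is cheaper overall even though each in-memory sort is larger.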

9 Next in This Course
So how much time does it take to access the disk?

