Chapter 7: Cosequential Processing and the Sorting of Large Files


1 Chapter 7: Cosequential Processing and the Sorting of Large Files

2 Cosequential Operations
Coordinated processing of two or more sequential lists to produce a single output list.
Kinds of operations: merging (union), matching (intersection), and combinations of these. K.O. Lee

3 Matching Operation
Output the names common to the two lists (a match, or intersection).
Four steps: 1. Initializing 2. Synchronizing 3. Handling end-of-file conditions 4. Recognizing errors

4 Matching Operation (2)
Algorithm: p. 261, Figure 7.2.
The core is a three-way conditional statement: if NAME_1 < NAME_2, read the next name from LIST_1; else if NAME_1 > NAME_2, read the next name from LIST_2; else (the names match) output the name and read the next name from both lists.
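The three-way comparison above can be sketched in Python as follows. This is a minimal illustration, not the textbook's Figure 7.2: it assumes both lists are already sorted, contain no duplicate names, and fit in memory as Python lists; the function name `match` and the sample names are hypothetical.

```python
def match(list_1, list_2):
    """Return the names common to two sorted lists (their intersection)."""
    out = []
    i = j = 0
    # Stop as soon as either list reaches its end: no further match is possible.
    while i < len(list_1) and j < len(list_2):
        name_1, name_2 = list_1[i], list_2[j]
        if name_1 < name_2:
            i += 1                  # read the next name from LIST_1
        elif name_1 > name_2:
            j += 1                  # read the next name from LIST_2
        else:
            out.append(name_1)      # names match: output once
            i += 1                  # and read the next name from both lists
            j += 1
    return out
```

For example, `match(["ADAMS", "CARTER", "DAVIS"], ["ADAMS", "BECH", "DAVIS", "ROSEN"])` returns `["ADAMS", "DAVIS"]`.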

5 Matching Operation (3)
Key of the algorithm: after every step, control always returns to the head of the main loop.
End-of-file condition: the loop tests the MORE_NAMES_EXIST flag and runs until either of the two lists reaches end-of-file.

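6 Merging Two Lists
Based on the matching operation (p. 264, Figure 7.5).
Differences: the merge must read each of the lists completely, so the behavior of MORE_NAMES_EXIST changes: it stays true until both lists are exhausted. HIGH_VALUE is a value that comes after all legal input values in the file's ordered sequence; it stands in for the current name of a list that has reached end-of-file.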
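A minimal Python sketch of the merge with a HIGH_VALUE sentinel follows. It is an illustration under stated assumptions rather than the textbook's Figure 7.5: names are strings, `"\uffff"` is assumed to sort after every legal name, and the helper `next_name` is hypothetical.

```python
HIGH_VALUE = "\uffff"  # assumption: sorts after every legal input name


def next_name(names):
    """Return the next name from an iterator, or HIGH_VALUE at end-of-file."""
    return next(names, HIGH_VALUE)


def merge(list_1, list_2):
    """Merge two sorted lists into one sorted list, writing shared names once."""
    it_1, it_2 = iter(list_1), iter(list_2)
    name_1, name_2 = next_name(it_1), next_name(it_2)
    out = []
    # MORE_NAMES_EXIST: true until BOTH lists have reached end-of-file.
    while name_1 != HIGH_VALUE or name_2 != HIGH_VALUE:
        if name_1 < name_2:
            out.append(name_1)
            name_1 = next_name(it_1)
        elif name_1 > name_2:
            out.append(name_2)
            name_2 = next_name(it_2)
        else:
            out.append(name_1)
            name_1 = next_name(it_1)
            name_2 = next_name(it_2)
    return out
```

Because an exhausted list reports HIGH_VALUE, the other list always wins the comparison, so the loop needs no separate end-of-file branch.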

7 Cosequential Processing Model
Assumption: two or more input files are processed in a parallel fashion. Comment: the output may be the same file as one of the input files.
Assumption: each file is sorted. Comment: it is not necessary that all files have the same record structure.

8 Cosequential Processing Model (2)
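Assumption: there must exist a high key value and a low key value. Comment: these are not strictly necessary, but they decrease complexity.
Assumption: records are processed in logical sorted order. Comment: physical ordering is not part of the model, but it can have a large impact on processing efficiency.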

9 Cosequential Processing Model (3)
Assumption: for each file, there is only one current record. Comment: this does not prohibit looking ahead or looking back, but such operations should be restricted to subprocedures.
Assumption: records should be manipulated only in internal memory. Comment: a program cannot alter a record in place on secondary storage.

10 Cosequential Processing Model (4)
Components.
Initialization: read the first record from each of the files.
Synchronization: one main loop that continues as long as relevant records remain.
Selection: made inside the main synchronization loop.
High values are used as the end-of-file condition, so no special code is needed to deal with end-of-file.

11 Cosequential Processing Model (5)
Components (cont'd): I/O and error detection are relegated to subprocedures, which hide the details and keep the main loop simple and robust.
Example: General Ledger Program, pp. 268-276.

12 Multiway Merging
K-way merge: merge K input lists to create a single, ordered output list (p. 277, Figure 7.16).
Finding the minimum by direct comparison works well when K is less than 8 or so.
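For small K, the merge can be sketched as a loop that scans the current item of every list for the minimum. This is an illustrative sketch, not the textbook's Figure 7.16: the name `k_way_merge` is hypothetical, the lists are in-memory stand-ins for files, and duplicate keys are all kept in the output.

```python
def k_way_merge(lists):
    """Merge K sorted lists by scanning the K current items for the minimum."""
    positions = [0] * len(lists)    # index of the current item in each list
    out = []
    while True:
        # Lists that have not yet reached their end.
        live = [k for k in range(len(lists)) if positions[k] < len(lists[k])]
        if not live:
            return out
        # Linear scan: up to K - 1 comparisons per output item.
        k_min = min(live, key=lambda k: lists[k][positions[k]])
        out.append(lists[k_min][positions[k_min]])
        positions[k_min] += 1
```

The scan costs on the order of K comparisons for every item written, which is why this direct approach is only attractive for small K.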

13 Multiway Merging (2)
Selection tree for a K-way merge: when K is large, the set of comparisons needed to find the minimum key becomes expensive. The selection tree is a time vs. space trade-off. It is a kind of tournament tree in which each higher-level node represents the winner of its two descendant keys. The depth of the tree is log2 K (rounded up).
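The tournament idea can be sketched with an array-based winner tree. This is a minimal sketch under assumptions of my own: the leaf count is padded to a power of two, exhausted or padding leaves hold `None` instead of a high value, the smaller key wins, and the names `winner` and `selection_tree_merge` are hypothetical.

```python
def winner(a, b):
    """Return the smaller of two (key, source) entries; None means 'no key'."""
    if a is None:
        return b
    if b is None:
        return a
    return a if a <= b else b


def selection_tree_merge(lists):
    """K-way merge that replays one leaf-to-root path per output key."""
    k = len(lists)
    n = 1
    while n < k:                    # pad the number of leaves to a power of two
        n *= 2
    iters = [iter(lst) for lst in lists]

    def read(i):
        """Next (key, source) pair from list i, or None when it is exhausted."""
        if i >= k:
            return None
        try:
            return (next(iters[i]), i)
        except StopIteration:
            return None

    tree = [None] * (2 * n)         # tree[1] is the root; leaves are n .. 2n-1
    for i in range(n):
        tree[n + i] = read(i)
    for node in range(n - 1, 0, -1):
        tree[node] = winner(tree[2 * node], tree[2 * node + 1])

    out = []
    while tree[1] is not None:
        key, src = tree[1]
        out.append(key)
        node = n + src
        tree[node] = read(src)      # replace the winning leaf with its successor
        node //= 2
        while node >= 1:            # replay only the matches on the path to the root
            tree[node] = winner(tree[2 * node], tree[2 * node + 1])
            node //= 2
    return out
```

Each output key costs about log2 K comparisons instead of K, at the price of storing the tree.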

14 Selection Tree

15 Sorting in RAM
Can we improve on the time of a RAM sort? Idea: perform some parts of the work in parallel.
A selection tree is good, but it cannot be used to sort an entire file.
Heapsort: sorting and reading can occur in parallel, with all of the keys kept in a heap.

16 Heapsort
Heap: each child node's key is greater than or equal to its parent's key; the children of node i are nodes 2i and 2i+1 (Fig. 7.20, Fig. 7.21).
Overlapping processing with I/O: use more than one buffer (p. 284, Figure 7.22) and fill the next buffer while the heap is being built.
Procedure for outputting: Fig. 7.23.
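The heap property and the 2i / 2i+1 indexing can be sketched as follows. This is an illustrative sketch, not the code of Figures 7.21 or 7.23: slot 0 of the list is left unused so that node i has children 2i and 2i+1, and the function names are hypothetical.

```python
def heap_insert(heap, key):
    """Add key at the end, then swap it upward while it is smaller than its parent."""
    heap.append(key)
    i = len(heap) - 1
    while i > 1 and heap[i] < heap[i // 2]:
        heap[i], heap[i // 2] = heap[i // 2], heap[i]
        i //= 2


def heap_remove_min(heap):
    """Remove and return the root, then restore the heap by sifting down."""
    smallest = heap[1]
    last = heap.pop()
    if len(heap) > 1:               # keys remain: move the last key to the root
        heap[1] = last
        i, n = 1, len(heap) - 1
        while 2 * i <= n:
            child = 2 * i
            if child + 1 <= n and heap[child + 1] < heap[child]:
                child += 1          # pick the smaller of the two children
            if heap[i] <= heap[child]:
                break
            heap[i], heap[child] = heap[child], heap[i]
            i = child
    return smallest


def heapsort(keys):
    """Sort by inserting every key, then repeatedly removing the minimum."""
    heap = [None]                   # index 0 is a placeholder
    for key in keys:                # each insert could overlap with reading input
        heap_insert(heap, key)
    return [heap_remove_min(heap) for _ in range(len(keys))]
```

Because `heap_insert` handles one key at a time, the heap can be built while later keys are still being read, which is the overlap with I/O that the slide refers to.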

17 Sorting Large Files on Disk
Keysort shortcomings: the cost of seeking, and it cannot sort a really large file because all key/pointer pairs must fit in RAM.
Alternative: the multiway merge algorithm applied to runs, where a run is a sorted subfile.

18 Sorting Large Files on Disk (2)

19 Sorting Large Files on Disk (3)
Multiway merging can be extended to files of any size.
Reading during the run-creation step is sequential, so it requires no seeking.
Reading each run and writing the output during merging are also sequential.
I/O can be overlapped with sorting by using heapsort.
Because access is sequential, tape can be used.
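The two phases, run creation followed by a multiway merge, can be sketched in a few lines. This is a simplified in-memory illustration: Python lists stand in for the input file and for the runs on disk, `run_size` stands in for the number of records that fit in RAM, and the function names are hypothetical. The merge step uses the standard library's `heapq.merge`.

```python
import heapq


def create_runs(records, run_size):
    """Sort phase: cut the input into chunks of run_size and sort each chunk."""
    runs = []
    for start in range(0, len(records), run_size):
        runs.append(sorted(records[start:start + run_size]))
    return runs


def merge_sort_file(records, run_size):
    """Merge phase: a K-way merge over all runs produces the sorted output."""
    runs = create_runs(records, run_size)
    return list(heapq.merge(*runs))
```

For example, `create_runs([5, 2, 9, 1, 7, 3, 8], 3)` gives the runs `[[2, 5, 9], [1, 3, 7], [8]]`, and `merge_sort_file([5, 2, 9, 1, 7, 3, 8], 3)` gives `[1, 2, 3, 5, 7, 8, 9]`.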

20 How Much Time Does a Merge Sort Take?
Merge sort vs. keysort: pp. 287-290 (on the order of 10 minutes versus 5 hours).
Four steps: 1. reading records and forming runs 2. writing sorted runs 3. reading sorted runs for merging 4. writing the sorted file

21 Sorting a Very Large File
Kinds of I/O.
Sort phase: sequential if heapsort is used, so there is no room for improvement.
Merge phase: random access (proportional to the number of runs).
Way to improve performance: cut down the number of random accesses in the merge phase.

22 Cost of Increasing the File Size
For a K-way merge of K runs, the buffer available for each run is (1/K) × size of RAM = (1/K) × size of each run, so each run takes K seeks to read and the merge operation requires K^2 seeks.
Measured in seeks, merge sort is an O(K^2) operation.
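The K^2 growth can be checked with a small calculation. The function below is a sketch that counts only the seeks needed to read the runs during the merge, assuming every run is exactly the size of RAM; the function name and the run counts used in the example are illustrative.

```python
def merge_seeks(k):
    """Seeks needed to read K runs in one K-way merge.

    Each run gets a buffer of 1/K of RAM, and a run is as large as RAM,
    so every run is read in K buffer-sized pieces, one seek per piece.
    """
    seeks_per_run = k
    return k * seeks_per_run
```

With 80 runs this gives `merge_seeks(80) == 6400`; with 800 runs, `merge_seeks(800) == 640000`. A file 10 times larger costs 100 times as many seeks.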

23 Cost of Increasing the File Size (2)
Ways to reduce time:
Use more hardware.
Perform the merge in more than one step, reducing the order of each merge and increasing the buffer size for each run.
Increase the length of the initial sorted runs.
Overlap I/O operations.

24 Hardware-based Improvements
Possible configurations: increasing the amount of RAM; increasing the number of disk drives; increasing the number of I/O channels.

25 Multiple-Step Merging
Break the original set of runs into small groups and merge the runs in these groups separately, then merge the resulting intermediate runs.
This takes fewer seeks but adds transmission time in the second pass: every record is read twice, once to form the intermediate runs and once to form the final sorted file.

26 Multiple-Step Merging (2)
Essence of multiple-step merging: it increases the available buffer space for each run, trading an extra pass over the data for a decrease in random accesses.
More than two steps? The trade-off is reduced seek and rotational delay times versus increased transmission times.
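The trade-off can be made concrete by counting read seeks for a two-step merge. This is a sketch under simplifying assumptions: every initial run is the size of RAM, the runs divide evenly into groups, and only seeks for reading are counted. The function name and the figures 800 and 32 are illustrative.

```python
def two_step_merge_seeks(total_runs, group_size):
    """Read seeks for merging in two steps instead of one.

    Step 1: each group of `group_size` runs is merged separately, which costs
            group_size ** 2 seeks per group.
    Step 2: the intermediate runs are merged. Each one is group_size times
            the size of RAM and is read through a buffer of 1/groups of RAM,
            which takes group_size * groups = total_runs seeks per run.
    """
    groups = total_runs // group_size       # assumes an exact division
    first_step = groups * group_size ** 2
    second_step = groups * total_runs
    return first_step + second_step
```

For 800 initial runs merged as 25 groups of 32, `two_step_merge_seeks(800, 32)` gives 25,600 + 20,000 = 45,600 seeks, against 800 × 800 = 640,000 for a single 800-way merge. The price is that every record is read and written one extra time.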

27 Increasing Run Lengths
Longer initial runs mean fewer total runs, which means bigger buffers for each run and fewer seeks.
Technique: replacement selection.

28 Replacement Selection
Idea: always select the key in memory that has the lowest value, output that key, and replace it with a new key from the input list.
Implementation: p. 299; p. 300, Figure 7.27.

29 Replacement Selection (2)
What about a key that arrives in memory too late to be output into its proper position? Hold it in a second heap, which supplies the keys for the next run (p. 301, Figure 7.28).
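Replacement selection with a second heap can be sketched as follows. This is an illustration rather than the textbook's procedure: it uses the standard `heapq` module for both heaps, keeps the two heaps in separate lists instead of sharing one array, and assumes `p >= 1`. The function name and the sample keys are hypothetical.

```python
import heapq
from itertools import islice

_END = object()  # marks the end of the input


def replacement_selection(keys, p):
    """Form sorted runs from keys using p memory locations."""
    it = iter(keys)
    current = list(islice(it, p))   # primary heap: keys for the current run
    heapq.heapify(current)
    pending = []                    # second heap: keys that arrived too late
    runs, run = [], []
    while current:
        smallest = heapq.heappop(current)
        run.append(smallest)        # output the lowest key in memory
        incoming = next(it, _END)   # replace it with a new key from the input
        if incoming is not _END:
            if incoming >= smallest:
                heapq.heappush(current, incoming)   # still fits in this run
            else:
                heapq.heappush(pending, incoming)   # must wait for the next run
        if not current:             # current run is complete: start the next one
            runs.append(run)
            run = []
            current, pending = pending, []
    return runs
```

With three memory locations, `replacement_selection([33, 18, 24, 58, 14, 17, 7, 21, 67, 12, 5, 47, 16], 3)` produces the runs `[18, 24, 33, 58]`, `[7, 14, 17, 21, 67]`, and `[5, 12, 16, 47]`, each longer than the three keys that fit in memory at once.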

30 Replacement Selection (4)
Two questions:
1. Given P locations in memory, how long a run can we expect replacement selection to produce, on average? (pp. 301-302)
2. What are the costs of using replacement selection? (pp. 303-304) Less than 1/3 as many seeks as with RAM sorting.
