CJD CJD
The Sort Phase

Phase 1: The Sort Phase
Divide the records of a file into several runs, internally sort the records in each run, and distribute the runs "evenly" to two external files, file_1 and file_2.

The Merge Phase

Phase 2: The Merge Phase
For each pair of runs, one from file_1 and another from file_2, merge the pair, resulting in a longer run.
Store the new resulting run in a third external file, file_3.
Redistribute the runs in file_3 evenly to file_1 and file_2.
Repeat Phase 2 until all records are in one long run.
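A minimal Python sketch of this three-file scheme, assuming in-memory lists stand in for the external files (each "file" is a list of sorted runs). The helper names `merge_two`, `distribute`, and `two_way_sort_merge` are hypothetical, and each merge-plus-redistribution cycle is counted as one pass.

```python
from typing import List, Tuple

Run = List[int]


def merge_two(a: Run, b: Run) -> Run:
    """Merge two sorted runs into one longer sorted run."""
    out: Run = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])  # at most one of these two tails is non-empty
    out.extend(b[j:])
    return out


def distribute(runs: List[Run]) -> Tuple[List[Run], List[Run]]:
    """Deal runs alternately to file_1 and file_2 (file_1 gets any extra run)."""
    return runs[0::2], runs[1::2]


def two_way_sort_merge(records: List[int], run_size: int) -> Tuple[Run, int]:
    """Return (sorted records, total passes) for the three-file 2-way sort merge."""
    # Phase 1 (one pass): cut into runs, sort each run internally, distribute.
    runs = [sorted(records[i:i + run_size]) for i in range(0, len(records), run_size)]
    file_1, file_2 = distribute(runs)
    passes = 1
    # Phase 2: merge pairs into file_3, redistribute, repeat until one run is left.
    while len(file_1) + len(file_2) > 1:
        file_3: List[Run] = []
        for idx, run in enumerate(file_1):
            partner = file_2[idx] if idx < len(file_2) else []  # unpaired run: plain copy
            file_3.append(merge_two(run, partner))
        file_1, file_2 = distribute(file_3)
        passes += 1
    return (file_1[0] if file_1 else []), passes


if __name__ == "__main__":
    data = [50, 110, 95, 10, 100, 36, 153, 40, 120, 60, 70, 130, 22, 140, 80]
    print(two_way_sort_merge(data, 3))
```

The redistribution step is what the balanced variant removes: here every merged run is written to file_3 and then copied back out to file_1 and file_2.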
The Sort Phase

Phase 1: The Sort Phase
Divide the records of a file into several runs, internally sort the records in each run, and distribute the runs "evenly" to two external files, file_1 and file_2.

The Merge Phase

Phase 2: The Merge Phase
For each pair of runs, one from file_1 and another from file_2, merge the pair, producing longer runs, and store the resulting runs alternately in the external files file_3 and file_4.
Repeat Phase 2 until all records are in one long run. Alternate the roles of file_1 and file_2 with file_3 and file_4, depending on which files need to be merged and which would hold the redistributed, longer resultant runs.
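One way to sketch the four-file balanced version in Python, again assuming lists stand in for the external files. The standard library's `heapq.merge` does the pairwise merge; `show` and `balanced_two_way` are hypothetical names. Because merged runs go straight to alternating output files and the two file pairs then swap roles, no separate redistribution copy is needed. With `trace=True` it prints each file's runs after every pass.

```python
import heapq
from typing import Dict, List, Tuple

Run = List[int]


def show(pass_no: int, files: Dict[str, List[Run]]) -> None:
    """Print the runs held by each simulated file after a pass."""
    print(f"Pass {pass_no}")
    for name, runs in files.items():
        text = " - ".join(" ".join(map(str, r)) for r in runs) or "empty"
        print(f"  {name}: {text}")


def balanced_two_way(records: List[int], run_size: int,
                     trace: bool = False) -> Tuple[Run, int]:
    """Balanced 2-way sort merge over four simulated files."""
    files: Dict[str, List[Run]] = {"File 1": [], "File 2": [], "File 3": [], "File 4": []}
    # Pass 1 (sort phase): sort each run internally, deal runs alternately to File 1/File 2.
    runs = [sorted(records[i:i + run_size]) for i in range(0, len(records), run_size)]
    files["File 1"], files["File 2"] = runs[0::2], runs[1::2]
    src, dst = ("File 1", "File 2"), ("File 3", "File 4")
    passes = 1
    if trace:
        show(passes, files)
    # Merge phase: one pass per iteration until a single run remains.
    while len(files[src[0]]) + len(files[src[1]]) > 1:
        a, b = files[src[0]], files[src[1]]
        # Pair the idx-th run of each input file; an unpaired run is just copied.
        merged = [list(heapq.merge(a[i], b[i] if i < len(b) else [])) for i in range(len(a))]
        # Store the merged runs alternately in the two output files.
        files[dst[0]], files[dst[1]] = merged[0::2], merged[1::2]
        files[src[0]], files[src[1]] = [], []  # the inputs have been consumed
        src, dst = dst, src                    # swap the roles of the file pairs
        passes += 1
        if trace:
            show(passes, files)
    return (files[src[0]][0] if files[src[0]] else []), passes


if __name__ == "__main__":
    data = [50, 110, 95, 10, 100, 36, 153, 40, 120, 60, 70, 130, 22, 140, 80]
    balanced_two_way(data, 3, trace=True)
```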
Let NR be the number of runs initially.

If NR = 1:
Total Passes = 1 (Sort Phase only)

Suppose NR > 1:
Total Passes = $\lceil \log_k NR \rceil + 1$

The sort phase takes 1 pass: it sorts each run but does not reduce the number of runs.
Each execution of the merge phase is composed of a merge step and a distribution step. It divides the number of runs by k (by 2 for a 2-way merge) until there is only 1 run.
The number of divisions needed to go from $k^j$ runs to 1 run is j.
So the number of merge passes is $j = \log_k NR$.
And the total number of passes, including the one for the sort phase, is $j + 1 = \log_k NR + 1$.
If NR is not a power of k, the number of passes is $\lceil \log_k NR \rceil + 1$. For example, when NR = 5 and k = 3, the sort requires $\lceil \log_3 5 \rceil + 1 = 3$ passes instead of the $\lceil \log_2 5 \rceil + 1 = 4$ passes of a balanced 2-way sort merge.

Total No. of Passes: 14 8 6 5

Question: What conclusion(s) can you draw from the above table?
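The pass formula can be checked with a small function that avoids floating-point logarithms (whose rounding can push an exact power of k just past an integer). Instead it mirrors what each merge pass does: divide the run count by k, rounding up, until one run is left. `total_passes` is a hypothetical helper name.

```python
def total_passes(nr: int, k: int) -> int:
    """Passes for a balanced k-way sort merge: 1 sort pass + ceil(log_k nr) merge passes."""
    if nr < 1 or k < 2:
        raise ValueError("need nr >= 1 and k >= 2")
    merges, runs = 0, nr
    while runs > 1:
        runs = -(-runs // k)  # ceil(runs / k): one merge pass combines up to k runs into one
        merges += 1
    return merges + 1  # add the single sort-phase pass


print(total_passes(5, 3), total_passes(5, 2))  # 3 4
```

After j merge passes the run count is $\lceil NR / k^j \rceil$, so the loop stops at the smallest j with $k^j \ge NR$, which is exactly $\lceil \log_k NR \rceil$.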
More Exercises

Exercise 2: Using the Balanced 3-way Sort Merge algorithm, sort the given master file with the following records. Assume that the size of the run is 3. Determine the total number of passes.
File : 28 17 79 38 5 70 24 91 37 3 19 63 15 44 8

Exercise 3: Using the Balanced 3-way Sort Merge algorithm, sort the given master file with the following records. Assume that the size of the run is 4. Determine the total number of passes.
File : 50 110 95 10 100 36 153 40 120 60 70 130 22 140 80

Exercise 4: Using the Balanced 4-way Sort Merge algorithm, sort the given master file with the following records. Assume that the size of the run is 3. Determine the total number of passes.
File : 50 110 95 10 100 36 153 40 120 60 70 130 22 140 80

Challenges

What if each sorted run from the sort phase is distributed to a separate file and all such files are merged into one output file? What are the implications? What factors make this approach possible? Impossible?
There are limitations on main memory and on the number of file devices.

How do you implement a k-way merge efficiently if k > 2?
If k is large, use priority queues (CS101, or an advanced CS101 course).

The realistic sort/merge situation is somewhere between the basic balanced two-way sort merge and the idealistic balanced k-way sort/merge, which uses k input files for k runs and merges them into one output file.
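One common way to follow the priority-queue hint: keep a min-heap holding the current front record of each run, repeatedly pop the smallest, and push the next record from the same run. Each output record then costs O(log k) heap work instead of an O(k) scan over all run fronts. This sketch uses Python's standard `heapq` module; `k_way_merge` is a hypothetical name, and the standard library's `heapq.merge` provides the same behavior ready-made.

```python
import heapq
from typing import List


def k_way_merge(runs: List[List[int]]) -> List[int]:
    """Merge k sorted runs using a min-heap with one candidate record per run."""
    # Heap entries are (value, run index, position in run); ties fall back to run index.
    heap = [(run[0], r, 0) for r, run in enumerate(runs) if run]
    heapq.heapify(heap)
    out: List[int] = []
    while heap:
        value, r, i = heapq.heappop(heap)  # smallest front record among all runs
        out.append(value)
        if i + 1 < len(runs[r]):           # refill from the run that was just consumed
            heapq.heappush(heap, (runs[r][i + 1], r, i + 1))
    return out


print(k_way_merge([[10, 36, 100], [50, 95, 110], [40, 120, 153]]))
```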
Algorithm Simulation (1)

File : 50 110 95 10 100 36 153 40 120 60 70 130 22 140 80
File Size = 15 records
Size of Run = 3 initially
Number of Runs = 5 initially

Recall: With a balanced 2-way sort merge:

Sort Phase:
Pass 1
50 110 95 - 10 100 36 - 153 40 120 - 60 70 130 - 22 140 80
File 1: 50 95 110 – 40 120 153 – 22 80 140
File 2: 10 36 100 – 60 70 130
File 3, File 4: empty

Algorithm Simulation (2)

Merge Phase:
Pass 2
File 1, File 2: empty
File 3: 10 36 50 95 100 110 – 22 80 140 (trivial merge: a copy)
File 4: 40 60 70 120 130 153

Pass 3
File 1: 10 36 40 50 60 70 95 100 110 120 130 153
File 2: 22 80 140 (another trivial merge: a copy)
File 3, File 4: empty

Pass 4
File 1, File 2: empty
File 3: 10 22 36 40 50 60 70 80 95 100 110 120 130 140 153
File 4: empty
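The balanced 2-way procedure in the simulation above generalizes naturally if one assumes that a balanced k-way sort merge uses k input files and k output files (2k in total), deals runs round-robin, and swaps the two groups after each pass; that file arrangement is an assumption here, not something these notes spell out. The sketch below, with the hypothetical name `balanced_k_way`, can be used to check the 4 passes of the simulation and to compare pass counts for the exercises against $\lceil \log_k NR \rceil + 1$.

```python
import heapq
from typing import List, Tuple

Run = List[int]


def balanced_k_way(records: List[int], run_size: int, k: int) -> Tuple[Run, int]:
    """Balanced k-way sort merge over 2k simulated files; returns (sorted records, passes)."""
    if k < 2 or run_size < 1:
        raise ValueError("need k >= 2 and run_size >= 1")
    # Pass 1 (sort phase): sort each run internally, deal runs round-robin to k input files.
    runs = [sorted(records[i:i + run_size]) for i in range(0, len(records), run_size)]
    inputs: List[List[Run]] = [runs[f::k] for f in range(k)]
    passes = 1
    # Merge phase: each pass turns up to k runs into one, until a single run remains.
    while sum(len(f) for f in inputs) > 1:
        outputs: List[List[Run]] = [[] for _ in range(k)]
        # Input file 0 always holds the most runs, so it sets the number of merge groups.
        for idx in range(len(inputs[0])):
            group = [f[idx] for f in inputs if idx < len(f)]  # idx-th run of each file
            merged = list(heapq.merge(*group))
            outputs[idx % k].append(merged)  # merged runs dealt round-robin to outputs
        inputs = outputs  # output files become the next pass's input files
        passes += 1
    return (inputs[0][0] if inputs[0] else []), passes


if __name__ == "__main__":
    data = [50, 110, 95, 10, 100, 36, 153, 40, 120, 60, 70, 130, 22, 140, 80]
    print(balanced_k_way(data, 3, 2)[1])  # pass count for the balanced 2-way simulation
```

Setting k = 2 reproduces the four-file behavior traced in the simulation; larger k needs more simultaneously open files, which is the file-device limitation raised in the Challenges.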