Presentation is loading. Please wait.

Presentation is loading. Please wait.

External Storage Primary Storage : Main Memory (RAM). Secondary Storage: Peripheral Devices –Disk Drives –Tape Drives Secondary storage is CHEAP. Secondary.

Similar presentations


Presentation on theme: "External Storage Primary Storage : Main Memory (RAM). Secondary Storage: Peripheral Devices –Disk Drives –Tape Drives Secondary storage is CHEAP. Secondary."— Presentation transcript:

1 External Storage Primary Storage : Main Memory (RAM). Secondary Storage: Peripheral Devices –Disk Drives –Tape Drives Secondary storage is CHEAP. Secondary storage is SLOW, about 250,000 times slower. (i.e. can do 250,000 CPU instructions in one I/O time). Primary storage is volatile. Secondary storage is permanent.

2 Golden Rule of File Processing Minimize the number of disk accesses!! Arrange information so that you get what you want with few disk accesses. Arrange information so you minimize future disk accesses. An organization for data on disk is often called a file structure. Disk based space/time tradeoff: Compress information to save time by reducing disk accesses.

3 Disk Drives Track - a circle around a disk that holds information. Sector - arc portion of a track. Interleaving factor - Physical distance between logically adjacent sectors on a track, to allow for processing of sector data. Locality of Reference - If a record is read from a disk, the next request is likely to come from near the same place in the file. Cluster - smallest unit of allocation; usually several sectors. Extent - group of physically contiguous clusters Internal Fragmentation - wasted space within a sector if the record size is not the sector size.

4 Access Time Seek Time - time for I/O head to reach desired track. Based on distance between I/O head and desired track. –F(n)=t*n+s where t is time to traverse one track and s is the startup time for the I/O head. Rotational delay (latency) - time for data to rotate to I/O head position. Determined by disk RPM. Transfer time - time for data to move under the I/O head. Determined by RPM and size of information to transfer.

5 Disk Access time example 675 Mbyte disk drive –15 platters --> 45 Mbyte/platter –612 tracks/platter –150 sectors/track --> 512 bytes/sector –8 sectors/cluster -> 4K/cluster -> 18 clusters/track –1 track seek -->.08ms –seek startup 3 ms –1 revolution --> 16.7ms –Interleaving factor of 3 -> 3 revolutions to read 1 track (50.1 msec) How long to read a file of 128K divided into 256 records of 512 bytes? –Uses 2 tracks (150 sectors on one 106 on other) –The average distance for seek is disk size/3 (not 2!)

6 Access Time total time = initial seek + second seek +2*(rotational delay+transfer time) total = (612/3*.08+3) + (.08+3) + 2*(.5+3)*16.7= 139.3 Assumes clusters are contiguous and tracks are contiguous If clusters are randomly spread across disk –time = 32*(612/3*.08+3+16.7/2+24/150*16.7) = 969.6 msec –24/150 is the part of the disk to read (8 sectors with an interleaving factor of 3).

7 Buffers Read time for one track –612/3*.08 + 3 + (3+.5)*16.7 = 77.8 Read time for one sector –612/3*.08 +3 + (.5+1/150)*16.7 = 27.8 Read time for one byte –612/3*.08 +3 + (.5)*16.7 = 27.7 Nearly all disk drives read/write one sector every time there is an I/O access. The information in a sector is stored in a buffer in the operating system.

8 Buffer Pools A series of buffers used by an application to cache disk data is called a buffer pool. Double buffering - read data from disk while CPU is processing the previous buffer. Same with writing; store written data into buffer while previous buffer is being physically placed on disk. Many buffers - allows for large differences between I/O time and CPU time. Sometimes I/O may fill a buffer faster than CPU can empty it.

9 Programmer’s View of Files Logical view of files: –An array of bytes. –A file pointer marks the current position. Three fundamental operations: –Read bytes from current position (move file pointer). –Write byes to current position (move file pointer). –Set file pointer to specified position.

10 External Sorting Problem: sort data sets too large to fit in main memory. Assume the data is stored on a disk drive. To sort, portions of the data must be brought into main memory, processed, and returned to disk. An external sort should minimize disk accesses.

11 External Computation Model Secondary storage is divided into equal sized blocks. The basic I/O operation transfers one block of information. Under certain circumstances, reading blocks of a file in sequential order is more efficient. –Minimize seek time. –File physically sequential. –Head does not move between accesses - no timesharing

12 More Model Typically, the time to perform a single block I/O operation is enough to Quicksort the contents of the block. Most systems have single drive, so must sort on a single drive. Need to minimize the number of block I/O operations.

13 Key Sorting Often records are large while keys are small Approach 1 –read in records, sort them, write them out Approach 2 (resembles pointer sort) –Read in only the key values –Store with each key the location on disk of its associated record. –Read in records in key order and write them out in sorted order

14 External sort: Simple Mergesort Quicksort requires random access to the entire set of records A better process for external data is a modified Mergesort algorithm This processes n elements in O(log n) passes. A group of sorted records is called a run.

15 External Mergesort Algorithm Split the file into two files. Read a block from each file. Take first record from each block, output them in sorted order. Take next record from each block, output them to a second file in sorted order. Repeat until finished, alternating between output files. Read new input blocks as needed. Now have runs of size 2 Repeat steps 2-5 except the input files have groups of 2 that need to be merged. Each pass provides runs of twice the size.

16 Problems Is each pass through then input and output files truly sequential? –If the files are all on the same disk, then the heads will be moving from one input file to another input file to an output file to another output file. –Very helpful if each file has its own disk head. How do we reduce the number of passes. –At the beginning, read in as much data as possible and sort it internally and just write it out. Can merge more than 2 runs at a time. - multiway merging.

17 THE MERGESORT The Mergesort process has 2 phases: –Break the file into large initial runs. –Merge the runs together to make a single sorted list

18 Replacement Selection This method tries to maximize the size of initial runs Break available memory into an array for a heap, an input buffer and an output buffer. Fill the array from disk. Make a min-heap Send the smallest value to the output buffer Read next key If new key is greater than last output value –replace the root with this key and “heapify” else –replace the root with the last key and “heapify” –add next record to a new heap (end of the array).


Download ppt "External Storage Primary Storage : Main Memory (RAM). Secondary Storage: Peripheral Devices –Disk Drives –Tape Drives Secondary storage is CHEAP. Secondary."

Similar presentations


Ads by Google