Chapter 5 Storage There are two general types of storage media used with computers: –Primary Storage - all storage media that the CPU can operate on directly (RAM, L1 and L2 cache memory) –Secondary Storage - hard drives, CDs, and tape.
Memory Hierarchies & Storage Devices The memory hierarchy is based on speed of access. This speed comes with a price tag that varies inversely with the access time of the memory. As with cars, the faster the memory access, the more it costs.
Primary Storage Level of Memory The primary storage level of memory is generally made up of 3 levels: –L1 cache, which is located on the CPU –L2 cache, which is located near the CPU –Main memory, which is the RAM figure often quoted in computer advertisements
Secondary Storage Level of Memory The secondary storage level of memory may be made up of 4 levels: –Flash memory (EEPROM) –Hard drives –CD-ROMs –Tape
Figure 5.1
Terms Used in the Hardware Description of Hard Drives Capacity - the number of bytes the device can store. Single-sided vs. double-sided - whether the disk/platter is written on one side or both. Disk pack - a collection of disks/platters assembled together into a pack. Track - a circle of small width on a disk surface; a surface has many tracks.
Terms Used in the Hardware Description of Hard Drives Sector - a segment or arc of a track. Block - a division of a track into equal-sized portions by the operating system. Interblock gaps - fixed-size segments that separate the blocks. Read/write head - actually reads/writes the information to the disk.
Terms Used in the Hardware Description of Hard Drives Cylinder - the set of tracks with the same diameter located on the disk surfaces of a disk pack.
Figure 5.2
Terms Used in Measuring Disk Operations Seek time (s) - the time it takes to position the read/write head on the desired track. It will be given in any problem that needs it. Rotational delay (rd) - the average time it takes the desired block to rotate into position under the read/write head: rd = (1/2)(1/p) min, where p is the rotational speed of the disk in rpm.
Terms Used in Measuring Disk Operations Transfer rate (tr) - the rate at which information can be transferred to or from the disk: tr = (track size)/(1/p min). Block transfer time (btt) - the time it takes to transfer the data once the read/write head has been positioned: btt = B/tr msec, where B is the block size in bytes.
Terms Used in Measuring Disk Operations Bulk transfer rate (btr) - the rate at which multiple blocks can be read/written to contiguous blocks: btr = (B/(B+G)) * tr bytes/msec, where G is the interblock gap size in bytes. Rewrite time (T_rw) - the time it takes after a block is read to write that same block back to the disk, i.e. the time for one revolution.
Computing Times Given: –Seek time (s) = 10 msec –Rotational speed = 3600 rpm –Track size = 50 KB –Block size (B) = 512 bytes –Interblock gap (G) = 128 bytes
Problems for Disk Operations Compute the average time it takes to transfer 1 block on this system. Compute the average time it takes to transfer 20 non-contiguous blocks located on the same track. Compute the average time it takes to transfer 20 contiguous blocks.
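The three problems above can be worked directly from the formulas on the preceding slides. This is a sketch, assuming 1 KB = 1024 bytes (the slides do not say which convention the track size uses):

```python
# Worked solutions using the given parameters and the formulas for
# rd, tr, btt, and btr from the preceding slides.

s = 10.0                      # seek time, msec (given)
rpm = 3600
rev = 60_000.0 / rpm          # one revolution ~ 16.67 msec
rd = rev / 2                  # average rotational delay ~ 8.33 msec
track = 50 * 1024             # track size in bytes (assumes 1 KB = 1024 B)
B, G = 512, 128               # block and interblock-gap sizes, bytes

tr = track / rev              # transfer rate, bytes/msec
btt = B / tr                  # block transfer time, msec
btr = (B / (B + G)) * tr      # bulk transfer rate, bytes/msec

one_block = s + rd + btt                  # seek + rotate + transfer
noncontig_20 = s + 20 * (rd + btt)        # each block pays a new rotational delay
contig_20 = s + rd + 20 * B / btr         # one delay, then a bulk transfer

print(round(one_block, 2), round(noncontig_20, 2), round(contig_20, 2))
# 18.5 180.0 22.5
```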
Parallelizing Disk Access Using RAID RAID stands for Redundant Arrays of Inexpensive Disks or Redundant Arrays of Independent Disks. RAIDs are used to provide increased reliability, increased performance, or both.
RAID Levels Level 0 - has no redundancy and the best write performance, but its read performance is not as good as level 1. Level 1 - uses mirrored disks, which provide redundancy and improved read performance. Level 2 - provides redundancy using Hamming codes.
RAID Levels Level 3 - uses a single parity disk. Levels 4 and 5 - use block-level data striping, with level 5 distributing the data and parity across all the disks. Level 6 - uses the P + Q redundancy scheme, making use of Reed-Solomon codes to protect against the failure of 2 disks.
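The single-parity idea behind levels 3-5 can be illustrated with bytewise XOR. This is a minimal sketch of the principle, not any particular RAID implementation:

```python
# The parity block is the bytewise XOR of the data blocks, so any ONE
# lost block can be reconstructed by XOR-ing all the survivors.

def parity(blocks):
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, b in enumerate(blk):
            out[i] ^= b
    return bytes(out)

data = [b"disk0data", b"disk1data", b"disk2data"]
p = parity(data)

# Simulate losing disk 1: recover it from the other data disks plus parity.
recovered = parity([data[0], data[2], p])
print(recovered == data[1])   # True
```

Level 6's P + Q scheme extends this idea with a second, Reed-Solomon-coded check block so that two simultaneous failures survive.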
Figure 5.4
Fig 5.5
Fig 5.6
Records Records is the term used to refer to a number of related values or items. Each value or item is stored in a field of a specific data type. Records may be of either fixed or variable length.
Variable Length Records in Files There are several reasons records of the same record type may be of variable length: –Variable-length fields –Repeating fields For efficiency reasons, different record types may be clustered in a file.
Fig 5.7
Spanned vs. Unspanned Records When the records in a file are stored on a disk, they are placed in blocks of a fixed size, which will rarely match the record size. So when the record size is smaller than the block size and the block size is not a multiple of the record size, a decision must be made: store each record entirely in one block and leave unused space (unspanned), or allow a record to span two blocks (spanned).
Fig 5.8
File Operations Files may be stored either in contiguous blocks or by linking the blocks together; there are advantages and disadvantages to both methods. Operations on files can be grouped into two types: retrieval and update. A retrieval involves only a read, while an update involves a read, modification, and write.
File Structures Heap (Pile) Files Hash (Direct) Files Ordered (Sorted) Files B-Trees
Once the data has been brought into memory, it can be accessed by an instruction in a fraction of a microsecond on a machine running at 25 MIPS. The disparity between memory access time and disk access time is enormous: we can execute 625,000 instructions in the time it takes to read/write one disk page. To put this in human terms: suppose you are typing a letter for your boss and find a word you cannot make out, so you leave him a voice mail message. Since you were told to do nothing else, you patiently wait for his reply, doing nothing. Unfortunately, he has just left on vacation and does not get your message for 3 WEEKS. This is similar to the computer waiting 0.025 seconds to get the needed data into memory from a disk read.
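The 625,000 figure follows directly from the numbers quoted, as this one-line check shows:

```python
# At 25 MIPS (25 million instructions per second), how many instructions
# fit into one 0.025-second disk access?
mips = 25
disk_access_s = 0.025
instructions = mips * 1_000_000 * disk_access_s
print(int(instructions))   # 625000
```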
Heap (Pile) Files (Unordered) Insertion - very efficient. Search - very inefficient (linear search). Deletion - very inefficient. –Lazy deletion Problems? When are they used?
Ordered (Sorted) Files Records are stored based on the value contained in one of their fields, called the ordering field. If the ordering field is also a key field, then it is better described as an ordering key.
Advantages of Ordered Files Reading the records in order of the ordering field is extremely efficient. Finding the next record is fast. Finding records based on a query of the ordering field is efficient (binary search). Binary search may be done on the blocks as well.
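The two-level search mentioned above, binary search over the blocks followed by binary search within the chosen block, can be sketched as follows. The in-memory lists here are stand-ins for real disk blocks:

```python
from bisect import bisect_left

def find(blocks, key):
    """blocks: list of sorted key lists, globally sorted across blocks.
    Returns (block index, offset in block) or None if absent."""
    # Step 1: binary search on the blocks, comparing against each
    # block's last (largest) key.
    last_keys = [blk[-1] for blk in blocks]
    b = bisect_left(last_keys, key)
    if b == len(blocks):
        return None
    # Step 2: binary search inside the selected block.
    i = bisect_left(blocks[b], key)
    if i < len(blocks[b]) and blocks[b][i] == key:
        return (b, i)
    return None

blocks = [[2, 5, 9], [12, 14, 20], [23, 31, 40]]
print(find(blocks, 14))   # (1, 1)
print(find(blocks, 15))   # None
```

Only step 2 touches a data block, which is the point: the search over the blocks narrows the disk access to a single block.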
Disadvantages of Ordered Files Searches on non-ordering fields are inefficient. Insertion and deletion of records are very expensive. Solutions to these problems?
Hashing Techniques This is where a record's placement is determined by the value in its hash field. A hash (randomizing) function is applied to this value, yielding the address of the disk block where the record is stored. For most records, we need only a single block access to retrieve that record.
Internal Hashing Internal hashing is implemented as a hash table through the use of an array of records in memory, with an array index range of 0 to M-1. A function that transforms the hash field value into an integer between 0 and M-1 is used; a common one is h(K) = K mod M.
Internal Hashing (cont) Collisions occur when the hash field value of a record being inserted hashes to an address that already contains a different record. The process of finding another position for this record is called collision resolution.
Collision Resolution Open addressing - places the record to be inserted in the first available position subsequent to the hash address. Chaining - a pointer field is added to each record location; when an overflow occurs, this pointer is set to point to overflow blocks, forming a linked list.
Collision Resolution (cont) Multiple hashing - if an overflow occurs, a second hash function is used to find a new location. If that location is also filled, either another hash function is applied or open addressing is used.
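Internal hashing with h(K) = K mod M and the simplest of the three resolution strategies, open addressing with linear probing, can be sketched as a minimal in-memory table:

```python
M = 7                       # table size; a prime, per the guidance below
table = [None] * M

def insert(key):
    pos = key % M                       # h(K) = K mod M
    for step in range(M):
        slot = (pos + step) % M         # probe subsequent positions
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("table full")

def search(key):
    pos = key % M
    for step in range(M):
        slot = (pos + step) % M
        if table[slot] is None:         # empty slot: key cannot be further on
            return None
        if table[slot] == key:
            return slot
    return None

insert(10)        # 10 mod 7 = 3
insert(17)        # also hashes to 3 -> collision, placed in slot 4
print(search(17)) # 4
```

Chaining and multiple hashing differ only in the probe step: chaining follows a pointer to an overflow area, and multiple hashing applies a second hash function instead of scanning forward.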
Fig 5.10 Page 140
Goals of the Hash Function The goals of a good hash function are to distribute the records uniformly over the address space while minimizing collisions, so as to avoid wasting space. Research has shown: –a 70% to 90% fill ratio is best –when using a mod function, M should be a prime number
External Hashing for Disk Files External hashing makes use of buckets, each of which can hold multiple records. A bucket is either a block or a cluster of contiguous blocks. The hash function maps a key into a relative bucket number rather than an absolute block address for the bucket.
Types of External Hashing Using a fixed address space is called static hashing. Dynamically changing the address space: –Extendible hashing (with a directory) –Linear hashing (without a directory)
Static Hashing Under static hashing, a fixed number of buckets (M) is allocated. Based on the hash value, a bucket number is determined, which indexes the block directory array to yield the block address. If n records fit into each block, this method allows up to n*M records to be stored.
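A sketch of static external hashing, with M fixed buckets of capacity n and an overflow chain per bucket; the Python lists stand in for primary blocks and overflow blocks on disk:

```python
M, n = 4, 2                            # M fixed buckets, n records per block
buckets = [[] for _ in range(M)]       # primary area, capacity n each
overflow = [[] for _ in range(M)]      # per-bucket overflow chains

def insert(key):
    b = key % M                         # relative bucket number
    if len(buckets[b]) < n:
        buckets[b].append(key)
    else:
        overflow[b].append(key)         # primary bucket full -> chain

for k in (8, 12, 16, 5):
    insert(k)

print(buckets[0], overflow[0])   # [8, 12] [16]
```

The fixed M is the weakness: once many buckets overflow, chains grow and the single-block-access property is lost, which motivates the dynamic schemes on the next slides.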
Fig 5.11 Page 143
Fig 5.12 Page 144
Extendible Hashing In extendible hashing, a type of directory is maintained as an array of 2^d bucket addresses, where d, called the global depth of the directory, is the number of high-order (leftmost) bits of the hash value used to index it. However, there does NOT have to be a DISTINCT bucket for each directory entry. A local depth d' is stored with each bucket to indicate the number of bits actually used for that bucket.
Figure 5.13 Page 146
Overflow (Bucket Splitting) When an overflow occurs in a bucket, that bucket is split. This is done by dynamically allocating a new bucket and redistributing the contents of the old bucket between the old and new buckets based on the increased local depth d'+1 of both buckets.
Overflow (Bucket Splitting) Now the new bucket's address must be added to the directory. If the overflow occurred in a bucket whose new local depth d' is less than or equal to the global depth d, adjust the directory entries accordingly. (No change in the directory size is made.)
Overflow (Bucket Splitting) If the overflow occurred in a bucket whose new local depth d' is greater than the global depth d, you must increase the global depth accordingly. This results in a doubling of the directory size each time d is increased by 1, with appropriate adjustment of the entries.
Slide showing how buckets are split under extendible hashing.
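The splitting mechanics above can be sketched compactly. For simplicity this sketch indexes the directory with the low-order d bits of the key (key mod 2^d) rather than the high-order bits the slides describe; the splitting and directory-doubling logic is the same. It also assumes redistribution never itself overflows, which a full implementation would recheck:

```python
CAP = 2                             # records per bucket

class Bucket:
    def __init__(self, depth):
        self.depth = depth          # local depth d'
        self.keys = []

d = 1                               # global depth
directory = [Bucket(1), Bucket(1)]  # 2**d entries

def insert(key):
    global d, directory
    while True:
        b = directory[key % (2 ** d)]
        if len(b.keys) < CAP:
            b.keys.append(key)
            return
        if b.depth == d:            # no spare directory bits: double it
            directory = directory + directory
            d += 1
        # split the overflowing bucket on its next distinguishing bit
        b.depth += 1
        new = Bucket(b.depth)
        old_keys, b.keys = b.keys, []
        for i, e in enumerate(directory):
            if e is b and (i >> (b.depth - 1)) & 1:
                directory[i] = new
        for k in old_keys:          # redistribute the old contents
            directory[k % (2 ** d)].keys.append(k)
        # loop retries the triggering key against the updated directory

for k in (0, 4, 8, 12, 1):
    insert(k)

print(d, directory[0].keys)   # 3 [0, 8]
```

Inserting 0, 4, and 8 (all congruent mod 4) forces the directory to double twice, exactly the case described above where the local depth would exceed the global depth.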
Shrinking Extendible Hashing Files The generally used principle for shrinking extendible hashing files is that the directory may be halved when d > d' for all buckets after a deletion occurs. Buckets may be combined when each of the buckets to be combined is less than half full and they have the same bit pattern with the exception of the d'-th bit, e.g. d' = 3 and the bit patterns 110 and 111.
Linear Hashing Linear hashing allows the hash file to expand and shrink its number of buckets dynamically without needing a directory. It starts with M buckets numbered 0 to M-1 and uses the mod hash function h_i(K) = K mod M as the initial hash function.
Linear Hashing (cont) Overflow is handled by chaining individual overflow chains for each bucket. The file grows by methodically splitting the original buckets, starting with bucket 0: the contents of bucket 0 are redistributed between bucket 0 and bucket M (the new bucket) using a secondary hash function h_{i+1}(K) = K mod 2M.
Linear Hashing (cont) This splitting of buckets is done in order (0, 1, ..., M-1) REGARDLESS of which bucket the overflow occurred in. To keep track of the next bucket to be split we use n; after the first split, n is incremented to 1. When a record hashes to a bucket less than n, we use the secondary hash function to determine which of the two buckets it belongs in.
Linear Hashing (cont) When all of the original M buckets have been split, we have 2M buckets and n = M. We then reset M to 2M and n to 0, and the secondary hash function becomes our primary hash function. Shrinking of the file is done based on the load factor, using the reverse of splitting.
Slide showing how to split using linear hashing.
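The round-robin splitting described above can be sketched as follows. Here a split is triggered whenever a bucket's chain grows past CAP records; real systems may instead split when the load factor passes a threshold. The Python lists stand in for buckets plus their overflow chains:

```python
CAP = 2
M0 = 4
M, n = M0, 0                    # current round size, next bucket to split
buckets = [[] for _ in range(M0)]

def address(key):
    b = key % M                 # primary hash h_i
    if b < n:                   # bucket already split this round?
        b = key % (2 * M)       # then use h_{i+1}
    return b

def insert(key):
    global M, n
    b = address(key)
    buckets[b].append(key)      # overflow simply chains within the list
    if len(buckets[b]) > CAP:   # trigger a split of bucket n (NOT b!)
        buckets.append([])      # new bucket, number M + n
        old, buckets[n] = buckets[n], []
        n += 1                  # advance BEFORE re-addressing
        for k in old:
            buckets[address(k)].append(k)
        if n == M:              # all original buckets split: new round
            M, n = 2 * M, 0

for k in (0, 4, 8, 12, 5):
    insert(k)

print(buckets, n)   # [[0, 8], [5], [], [], [4, 12]] 1
```

Note that inserting 8 overflows bucket 0 and, since n = 0, it happens to be bucket 0 that splits; had the overflow been elsewhere, bucket 0 would still have been the one split, which is the defining property of linear hashing.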