Chapter 5 Record Storage & Primary File Organizations

Presentation transcript:

1 Chapter 5 Record Storage & Primary File Organizations

2 Storage There are two general types of storage media used with computers: Primary Storage - all storage media that can be operated on directly by the CPU (RAM, L1 and L2 cache memory). Secondary Storage - hard drives, CDs, and tape. Chapter 5

3 Memory Hierarchies & Storage Devices
The Memory Hierarchy is based upon speed of access. However, this speed comes with a price tag that varies inversely with the access time of the memory. As with cars, the faster the memory access, the more it costs.

4 Primary Storage Level of Memory
The Primary Storage Level of Memory is generally made up of three levels: L1 cache, which is located on the CPU; L2 cache, which is located near the CPU; and Main Memory, which is the RAM figure often quoted in computer advertisements.

5 Secondary Storage Level of Memory
The Secondary Storage Level of Memory may be made up of four levels: Flash Memory (EEPROM), hard drives, CD-ROMs, and tape.

6 Figure 5.1

7 Terms Used in the Hardware Description of Hard Drives
Capacity - The number of bytes the disk can store. Single-sided vs. Double-sided - Whether the disk/platter is written on one or both sides. Disk Pack - A collection of disks/platters assembled together into a pack. Track - A circle of small width on a disk surface; a disk surface has many tracks.

8 Terms Used in the Hardware Description of Hard Drives
Sector - A segment or arc of a track. Block - The division of a track into equal-sized portions by the operating system. Interblock Gaps - Fixed-size segments that separate the blocks. Read/Write Head - Actually reads/writes the information to the disk.

9 Terms Used in the Hardware Description of Hard Drives
Cylinder - The set of tracks with the same diameter across the disk surfaces of a disk pack.

10 Figure 5.2

11 Terms Used in Measuring Disk Operations
Seek Time (s) - The time it takes to position the read/write head on the desired track. It will be given in any problem that needs it. Rotational Delay (rd) - The average amount of time it takes the desired block to rotate into position under the read/write head: rd = (1/2)*(1/p) min, where p is the rpm of the disk.

12 Terms Used in Measuring Disk Operations
Transfer Rate (tr) - The rate at which information can be transferred to or from the disk: tr = (track size)/(1/p min). Block Transfer Time (btt) - The time it takes to transfer the data once the read/write head has been positioned: btt = B/tr msec, where B is the block size in bytes.

13 Terms Used in Measuring Disk Operations
Bulk Transfer Rate (btr) - The rate at which multiple blocks can be read/written to contiguous blocks: btr = (B/(B+G)) * tr bytes/msec, where G is the interblock gap size in bytes. Rewrite Time (Trw) - The time after a block is read until that same block can be written back to the disk, i.e. the time for one revolution.

14 Computing Times Given: Seek Time (s) = 10 msec
Rotational speed = 3600 rpm Track size = 50 KB Block size (B) = 512 bytes Interblock Gap (G) = 128 bytes

15 Problems for Disk Operations
Compute the average time it takes to transfer 1 block on this system. Compute the average time it takes to transfer 20 non-contiguous blocks that are located on the same track. Compute the average time it takes to transfer 20 contiguous blocks.
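The three problems above can be worked through with the formulas from the previous slides. This is a sketch, not the textbook's solution; it assumes 1 KB = 1000 bytes and times in milliseconds, and all variable names are mine.

```python
# Disk timing example using the parameters from slide 14.
# Assumptions (mine, not the slides'): 1 KB = 1000 bytes, times in msec.

SEEK = 10.0             # seek time s, msec
RPM = 3600              # rotational speed p
TRACK = 50_000          # track size, bytes
B = 512                 # block size, bytes
G = 128                 # interblock gap, bytes

rev = 60_000 / RPM      # one revolution, msec (16.67)
rd = rev / 2            # average rotational delay, msec (8.33)
tr = TRACK / rev        # transfer rate, bytes/msec (3000)
btt = B / tr            # block transfer time, msec (0.17)

one_block = SEEK + rd + btt              # problem 1: seek, rotate, transfer
noncontig = SEEK + 20 * (rd + btt)       # problem 2: re-rotate for each block
contig = SEEK + rd + 20 * (B + G) / tr   # problem 3: blocks and gaps stream past

print(f"1 block: {one_block:.2f} ms")          # ~18.50 ms
print(f"20 non-contiguous: {noncontig:.2f} ms")  # ~180.08 ms
print(f"20 contiguous: {contig:.2f} ms")         # ~22.60 ms
```

Note how reading contiguous blocks costs only one seek and one rotational delay, which is why the bulk transfer rate matters.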

16 Parallelizing Disk Access Using RAID
RAID - Stands for Redundant Arrays of Inexpensive Disks or Redundant Arrays of Independent Disks. RAID is used to provide increased reliability, increased performance, or both.

17 RAID Levels Level 0 - has no redundancy and the best write performance but its read performance is not as good as level 1. Level 1 - uses mirrored disks which provide redundancy and improved read performance. Level 2 - provides redundancy using Hamming Codes Chapter 5

18 RAID Levels Level 3 - uses a single parity disk.
Levels 4 and 5 - use block-level data striping, with level 5 distributing the parity across all the disks. Level 6 - uses the P + Q redundancy scheme, making use of Reed-Solomon codes to protect against the failure of 2 disks.
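The parity idea behind levels 3-5 can be illustrated with a small sketch (mine, not from the slides): the parity block is the bitwise XOR of the corresponding data blocks, so any single failed block can be reconstructed from the survivors.

```python
# Parity sketch for RAID levels 3-5. The "disks" here are short byte
# strings; in a real array they would be full blocks or stripes.

def xor_blocks(blocks):
    """XOR equal-length blocks together byte by byte."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

data = [b"disk", b"raid", b"stor"]   # three data disks (illustrative)
parity = xor_blocks(data)            # stored on the parity disk

# Suppose disk 1 fails: rebuild it from the surviving disks plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])   # True
```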

19 Figure 5.4

20 Fig 5.5

21 Fig 5.6

22 Records A record is a collection of related values or items. Each value or item is stored in a field of a specific data type. Records may be of either fixed or variable length.

23 Variable Length Records in Files
There are several reasons records of the same record type may be of variable length: variable-length fields; repeating fields; and, for efficiency reasons, different record types clustered in the same file.

24 Fig 5.7

25 Spanned Vs Unspanned Records
When the records in a file are stored on a disk, they are placed in blocks of a fixed size, which will rarely match the record size. So when the record size is smaller than the block size and the block size is not a multiple of the record size, a decision must be made: store each record entirely in one block and leave unused space (unspanned), or allow a record to span two blocks (spanned).

26 Fig 5.8

27 File Operations Files may be stored either in contiguous blocks or by linking the blocks together; there are advantages and disadvantages to both methods. Operations on files can be grouped into two types: retrieval and update. Retrieval involves only reads, while an update involves reads, writes, and modification.

28 File Structure Heap (Pile) Files Hash (Direct) Files
Ordered (Sorted) Files B-Trees

29 Once the data has been brought into memory, it can be accessed by an instruction in nanoseconds on a machine running at 25 MIPS. The disparity between memory access time and disk access time is enormous: we can perform 625,000 instructions in the time it takes to read/write one disk page. To put this in human terms: suppose you are typing a letter for your boss and find a word you cannot make out, so you leave him a voice mail message. Since you were told to do nothing else, you patiently wait for his reply, doing nothing. Unfortunately, he has just left on vacation and does not get your message for three weeks. This is similar to the computer waiting .025 seconds to get the needed data into memory from a disk read.
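The arithmetic behind the slide's numbers checks out, assuming the stated 25 MIPS and a 25 ms disk access:

```python
# How many instructions fit in one disk access at the slide's rates?
mips = 25_000_000        # 25 MIPS = 25 million instructions per second
disk_access = 0.025      # seconds for one disk read/write (from the slide)

instructions_per_access = int(mips * disk_access)
print(instructions_per_access)   # 625000, matching the slide
```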

30 Heap (Pile) Files (Unordered)
Insertions - Very efficient Search - Very inefficient (linear search) Deletion - Very inefficient; lazy deletion. Problems? When are they used?

31 Ordered (Sorted Files) Records
Records are stored based on the value contained in one of their fields, called the ordering field. If the ordering field is also a key field, then the field is better described as an ordering key.

32 Advantages of Ordered Files
Reading the records in order of the ordering field is extremely efficient. Finding the next record is fast. Finding records based on a query on the ordering field is efficient (binary search), and the binary search may be done on the blocks as well.
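The block-level binary search mentioned above can be sketched as follows. The file model (a Python list of sorted blocks) and the function name are illustrative, not from the slides.

```python
# Binary search over an ordered file: first locate the block by
# comparing against its first and last keys, then search inside it.

from bisect import bisect_left

def find(blocks, key):
    """Return (block_no, offset) of key, or None if absent."""
    lo, hi = 0, len(blocks) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]
        if key < block[0]:         # key lies in an earlier block
            hi = mid - 1
        elif key > block[-1]:      # key lies in a later block
            lo = mid + 1
        else:                      # key would be inside this block
            i = bisect_left(block, key)
            return (mid, i) if block[i] == key else None
    return None

blocks = [[2, 5, 9], [12, 14, 20], [23, 31, 40]]
print(find(blocks, 14))   # (1, 1)
print(find(blocks, 15))   # None
```

Only O(log n) blocks are touched, which is why ordered files answer ordering-field queries so cheaply compared with a linear scan of a heap file.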

33 Disadvantages of Ordered Files
Searches on non-ordering fields are inefficient. Insertion and deletion of records are very expensive. Solutions to these problems?

34 Hashing Techniques This is where a record's placement is determined by the value in its hash field. A hash (randomizing) function is applied to this value, yielding the address of the disk block where the record is stored. For most records, we need only a single block access to retrieve that record.

35 Internal Hashing Internal hashing is implemented as a hash table through the use of an in-memory array of records, with an index range of 0 to M-1. A function that transforms the hash field value into an integer between 0 and M-1 is used; a common one is h(K) = K mod M.

36 Internal Hashing (con’t)
Collisions occur when the hash field value of a record being inserted hashes to an address that already contains a different record. The process of finding another position for this record is called collision resolution.

37 Collision Resolution Open Addressing - places the record to be inserted in the first available position subsequent to the hash address. Chaining - a pointer field is added to each record location; when an overflow occurs, this pointer is set to point to overflow blocks, making a linked list.

38 Collision Resolution (con’t)
Multiple hashing - if an overflow occurs, a second hash function is used to find a new location. If that location is also filled, either another hash function is applied or open addressing is used.
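Internal hashing with open addressing can be sketched as below. This is a minimal illustration, not the textbook's code; M = 7 and the keys are arbitrary choices of mine.

```python
# Internal hash table with h(K) = K mod M and open addressing
# (linear probing) for collision resolution.

M = 7
EMPTY = None
table = [EMPTY] * M

def insert(key):
    pos = key % M                    # hash address h(K)
    for step in range(M):            # probe subsequent positions, wrapping
        slot = (pos + step) % M
        if table[slot] is EMPTY:
            table[slot] = key
            return slot
    raise RuntimeError("table full")

def search(key):
    pos = key % M
    for step in range(M):
        slot = (pos + step) % M
        if table[slot] is EMPTY:     # an empty slot means key is absent
            return None
        if table[slot] == key:
            return slot
    return None

for k in (10, 17, 24):   # all three hash to 3; probing resolves collisions
    insert(k)
print(search(17))   # 4: one slot past its home address
```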

39 Fig 5.10 Page 140

40 Goals of the Hash Function
The goals of a good hash function are to distribute the records uniformly over the address space while minimizing collisions, without wasting space. Research has shown that a fill ratio of 70% to 90% is best, and that when a mod function is used, M should be a prime number.

41 External Hashing for Disk Files
External hashing makes use of buckets, each of which can hold multiple records. A bucket is either a block or a cluster of contiguous blocks. The hash function maps a key into a relative bucket number, rather than an absolute block address for the bucket.

42 Types of External Hashing
Using a fixed address space is called static hashing. Dynamically changing the address space: Extendible hashing (with a directory) Linear hashing (without a directory)

43 Static Hashing Under static hashing, a fixed number of buckets M is allocated. Based on the hash value, a bucket number is determined from the bucket directory array, which yields the block address. If n records fit into each block, this method allows up to n*M records to be stored. Problem: as this fixed space fills up, more and more collisions take place, causing chaining; reorganization takes a significant amount of time and requires a new hash function.

44 Fig 5.11 Page 143

45 Fig 5.12 Page 144

46 Extendible Hashing In extendible hashing, a directory is maintained as an array of 2^d bucket addresses, where d is the number of high-order (leftmost) bits used and is referred to as the global depth of the directory. However, there does NOT have to be a DISTINCT bucket for each directory entry. A local depth d' is stored with each bucket to indicate the number of bits actually used for that bucket.
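A directory lookup can be sketched as follows; the bucket layout (d = 2, one bucket with local depth 1) is an illustrative example of mine, chosen to show several directory entries sharing a bucket.

```python
# Extendible-hashing directory with global depth d = 2 (4 entries).
# Bucket b0 has local depth d' = 1: it serves every hash starting
# with bit 0, so directory entries 00 and 01 both point to it.

d = 2   # global depth: directory has 2**d entries

class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth   # d': bits this bucket actually uses
        self.records = []

b0  = Bucket(local_depth=1)
b10 = Bucket(local_depth=2)
b11 = Bucket(local_depth=2)
directory = [b0, b0, b10, b11]   # indexed by the first d bits: 00 01 10 11

def lookup(hash_bits):
    """Index the directory by the first d high-order bits of the hash."""
    return directory[int(hash_bits[:d], 2)]

print(lookup("0010") is lookup("0111"))   # True: both land in shared b0
```

When b0 overflows and splits, only its two directory entries are updated; the directory itself doubles only when a bucket with d' = d splits, as the next slides describe.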

47 Figure 5.13 Page 146

48 Overflow (Bucket Splitting)
When an overflow occurs in a bucket, that bucket is split: a new bucket is dynamically allocated, and the contents of the old bucket are redistributed between the old and new buckets based on the increased local depth d'+1 of both these buckets.

49 Overflow (Bucket Splitting)
Now the new bucket's address must be added to the directory. If the overflow occurred in a bucket whose new local depth d' is less than or equal to the global depth d, adjust the directory entries accordingly. (No change in the directory size is made.)

50 Overflow (Bucket Splitting)
If the overflow occurred in a bucket whose new local depth d' is greater than the global depth d, you must increase the global depth accordingly. This results in a doubling of the directory size each time d is increased by 1, with appropriate adjustment of the entries.

51 Slide showing how buckets are split under Extendible Hashing.

52 Shrinking Extendible Hashing Files
The generally used principle for shrinking extendible hashing files is to reduce the global depth when d > d' for all buckets after a deletion occurs. Buckets may be combined when each of the buckets to be combined is less than half full and they have the same bit pattern except for the last (d'th) bit, e.g. d' = 3 and the bit patterns 110 and 111.

53 Linear Hashing Linear hashing allows the hash file to expand and shrink its number of buckets dynamically without needing a directory. It starts with M buckets numbered 0 to M-1 and uses the mod hash function h_i(K) = K mod M as the initial hash function.

54 Linear Hashing (Con’t)
Overflow is handled by chaining: individual overflow chains are kept for each bucket. The scheme works by methodically splitting the original buckets, starting with bucket 0: the contents of bucket 0 are redistributed between bucket 0 and bucket M (the new bucket) using a secondary hash function h_{i+1}(K) = K mod 2M.

55 Linear Hashing (Con’t)
This splitting of buckets is done in order (0, 1, ..., M-1) REGARDLESS of which bucket the collision occurred in. To keep track of the next bucket to be split, we use n; after bucket 0 is split, n is incremented to 1. When a record hashes to a bucket less than n, we use the secondary hash function to determine which of the two buckets it belongs in.

56 Linear Hashing (Con’t)
When all of the original M buckets have been split, we have 2M buckets and n = M. We then reset M to 2M and n to 0, and the secondary hash function becomes the primary hash function. Shrinking of the file is done based on the load factor, using the reverse of splitting.
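The bucket-selection rule described on the previous slides can be sketched in a few lines; the function name and the example values (M = 4, n = 2) are illustrative choices of mine.

```python
# Linear hashing bucket selection: buckets below the split pointer n
# have already been split, so keys hashing there must be re-hashed
# with the secondary function K mod 2M.

def bucket_for(key, M, n):
    b = key % M                 # primary hash h_i
    if b < n:                   # this bucket was already split
        b = key % (2 * M)       # secondary hash h_{i+1}
    return b

# With M = 4 and n = 2, buckets 0 and 1 have been split into 0/4 and 1/5.
print(bucket_for(9, 4, 2))    # 9 % 4 = 1 < 2, so re-hash: 9 % 8 = 1
print(bucket_for(13, 4, 2))   # 13 % 4 = 1 < 2, so re-hash: 13 % 8 = 5
print(bucket_for(6, 4, 2))    # 6 % 4 = 2, not yet split, stays 2
```

Note that no directory is consulted: the pair (M, n) alone determines where every key lives, which is the point of linear hashing.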

57 Slide showing how to split using linear hashing.
