1
File Organizations and Indexes ISYS 464
2
Disk Devices
Disk drive: read/write head and access arm.
Single-sided, double-sided, disk pack.
Track, sector, cylinder (the set of tracks with the same diameter on the various disks).
Page, block, or physical record: the unit of transfer between disk and primary storage, and vice versa.
Blocking factor: the number of records in a block.
3
Disk Speed
RPM: revolutions per minute
–2400, 3600, 7200 rpm
Ex. At 2400 rpm, each revolution takes 1/2400 min.
–60 * 1000 / 2400 = 25 msec per revolution
4
Time Required to Read One Block
Seek time
Rotational delay
–On average, half a revolution
Block transfer time
Time to read one block = seek time + rotational delay + block transfer time
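A minimal Python sketch of the disk-speed and block-read arithmetic, assuming the average rotational delay is half a revolution. The seek time (10 ms) and transfer time (1 ms) in the usage lines are hypothetical values, not figures from the slides.

```python
def revolution_ms(rpm):
    """Time for one full revolution in milliseconds (60 s * 1000 ms / rpm)."""
    return 60 * 1000 / rpm

def block_read_ms(seek_ms, rpm, transfer_ms):
    """Seek time + average rotational delay (half a revolution) + block transfer time."""
    return seek_ms + revolution_ms(rpm) / 2 + transfer_ms

print(revolution_ms(2400))         # 25.0 ms per revolution
print(block_read_ms(10, 2400, 1))  # 10 + 12.5 + 1 = 23.5 ms (hypothetical seek/transfer)
```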
5
Example
A student file contains 20,000 records of 113 bytes each. Assuming each block is 512 bytes, how many blocks are needed?
–Blocking factor = floor(block size / record size) = floor(512/113) = 4
–Number of blocks = ceiling(number of records / blocking factor) = ceiling(20,000/4) = 5,000 blocks
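The two formulas can be checked with a short Python sketch. It assumes, as the example implicitly does, that records are unspanned (a record never crosses a block boundary), which is why the blocking factor uses integer division.

```python
def blocking_factor(block_size, record_size):
    """Whole records that fit in one block (assumes records do not span blocks)."""
    return block_size // record_size

def blocks_needed(num_records, block_size, record_size):
    """ceiling(num_records / blocking factor), using integer arithmetic."""
    bfr = blocking_factor(block_size, record_size)
    return -(-num_records // bfr)

print(blocking_factor(512, 113))        # 4
print(blocks_needed(20_000, 512, 113))  # 5000
```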
6
Linear Search, Binary Search, and Direct Access
Assume seek time = s, rotational delay = r, block transfer time = tr, and a file of 5,000 blocks stored contiguously.
Linear search: on average half of the blocks are read, so the time is s + r + tr * (half of the blocks) = s + r + 2500 * tr
Binary search (file ordered by a key field): each probe is a random block access, so the time is (s + r + tr) * log2 5000, about 13 block accesses
Direct access (an index is available): s + r + tr
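A Python sketch comparing the three cost formulas. The timings (s = 10 ms, r = 12.5 ms, tr = 1 ms) are hypothetical, and rounding log2 up to a whole number of probes is an assumption; the slide leaves log2 5000 unrounded.

```python
import math

def linear_search_time(s, r, tr, blocks):
    # one seek and rotational delay, then read half of the contiguous blocks on average
    return s + r + tr * blocks / 2

def binary_search_time(s, r, tr, blocks):
    # every probe is a random block access; probes rounded up to a whole number
    return (s + r + tr) * math.ceil(math.log2(blocks))

def direct_access_time(s, r, tr):
    # one random block access (the cost of reading the index is not counted)
    return s + r + tr

s, r, tr = 10, 12.5, 1   # hypothetical timings in ms
print(linear_search_time(s, r, tr, 5000))  # 2522.5
print(binary_search_time(s, r, tr, 5000))  # 305.5
print(direct_access_time(s, r, tr))        # 23.5
```

Even with these made-up timings, the gap between thousands of sequential block reads and roughly a dozen random accesses is what motivates ordered files and indexes.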
7
Updating a Record
1. Read the block into main memory.
2. Change the record in main memory.
3. Write the block back to disk.
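The three steps can be sketched in Python against any binary file-like object. The block and record sizes are hypothetical (borrowed from the earlier example), and `update_record` is an illustrative name, not part of any DBMS API. An `io.BytesIO` stands in for the disk file here.

```python
import io

BLOCK_SIZE = 512    # bytes per block (hypothetical, matching the earlier example)
RECORD_SIZE = 113   # bytes per fixed-length record (hypothetical)

def update_record(f, block_no, slot, new_record):
    """Update one record by reading, changing, and rewriting its whole block."""
    if len(new_record) != RECORD_SIZE:
        raise ValueError("record must be exactly RECORD_SIZE bytes")
    # 1. read the block into main memory
    f.seek(block_no * BLOCK_SIZE)
    block = bytearray(f.read(BLOCK_SIZE))
    # 2. change the record in main memory
    start = slot * RECORD_SIZE
    block[start:start + RECORD_SIZE] = new_record
    # 3. write the block back to disk
    f.seek(block_no * BLOCK_SIZE)
    f.write(block)

disk = io.BytesIO(bytes(2 * BLOCK_SIZE))    # an in-memory stand-in for a two-block file
update_record(disk, block_no=1, slot=2, new_record=b"x" * RECORD_SIZE)
```

The same function should work on a real file opened with `open(path, "r+b")`, since it only uses `seek`, `read`, and `write`.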
8
File Organization
File organization: the physical arrangement of data in a file into records and pages on secondary storage.
Access method: the steps involved in storing and retrieving records from a file.
–Searching and updating
9
Unordered Files (Heap Files)
Records are placed in the file in the order in which they are inserted.
Searching: a linear search is required if no index is available.
Updating:
–Insertion: read the last page, append the record to it, then write the page back.
–Modification: search for the record, read its block into main memory, change the record, and write the block back.
–Deletion: mark the record as deleted (deletion flag) and periodically reorganize the file to reclaim the space.
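A minimal in-memory Python model of the heap-file operations, assuming fixed-capacity pages held as Python lists; `HeapFile` and its methods are illustrative names, and real page I/O is only indicated in comments.

```python
class HeapFile:
    """In-memory model of a heap file: pages of [record, deleted_flag] slots."""

    def __init__(self, records_per_page):
        self.records_per_page = records_per_page
        self.pages = [[]]

    def insert(self, record):
        last = self.pages[-1]                       # read the last page
        if len(last) == self.records_per_page:      # last page full: start a new one
            last = []
            self.pages.append(last)
        last.append([record, False])                # append, then "write the page back"

    def find(self, predicate):
        # linear search over every page, skipping records flagged as deleted
        for p, page in enumerate(self.pages):
            for s, (record, deleted) in enumerate(page):
                if not deleted and predicate(record):
                    return p, s
        return None

    def modify(self, predicate, new_record):
        loc = self.find(predicate)
        if loc is not None:
            p, s = loc
            self.pages[p][s][0] = new_record        # change in memory, then write back

    def delete(self, predicate):
        loc = self.find(predicate)
        if loc is not None:
            p, s = loc
            self.pages[p][s][1] = True              # set the deletion flag only

    def reorganize(self):
        # periodically rebuild the file without the flagged records
        live = [rec for page in self.pages for rec, deleted in page if not deleted]
        self.pages = [[]]
        for rec in live:
            self.insert(rec)

h = HeapFile(records_per_page=2)
for key in [30, 10, 20]:
    h.insert(key)
h.delete(lambda rec: rec == 10)
h.reorganize()
print(h.pages)   # [[[30, False], [20, False]]]
```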
10
Ordered Files
Records are kept in order of a key field, which enables binary search.
Insertion: may use a temporary overflow file that is periodically merged into the ordered file.
Deletion: may require periodic reorganization.
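A Python sketch of an ordered file with an overflow file, simplified to work on keys rather than blocks. The class name `OrderedFile` is hypothetical. Searches binary-search the ordered part and then scan the small, unordered overflow part.

```python
import bisect

class OrderedFile:
    """Sorted main file plus an unsorted overflow file that is merged periodically."""

    def __init__(self, keys):
        self.main = sorted(keys)
        self.overflow = []

    def insert(self, key):
        self.overflow.append(key)            # avoids shifting records in the main file

    def contains(self, key):
        i = bisect.bisect_left(self.main, key)      # binary search of the ordered part
        if i < len(self.main) and self.main[i] == key:
            return True
        return key in self.overflow                 # linear search of the overflow file

    def merge(self):
        # periodic reorganization: fold the overflow records into the ordered file
        self.main = sorted(self.main + self.overflow)
        self.overflow = []

f = OrderedFile([40, 10, 30])
f.insert(20)
print(f.contains(20))   # True (found in the overflow file)
f.merge()
print(f.main)           # [10, 20, 30, 40]
```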
11
Hash Files (Direct Files)
The page in which a record is stored is determined by a hash function.
The hash function calculates the page address from the key field of the record:
–Address = H(Key)
Typical hash function: division/remainder:
–H(Key) = Key Mod M, so 0 <= H(Key) <= M-1
–where M is the number of blocks
12
Figure: Disk blocks. H(K) maps a key to a block number from 0 to 7; each block number corresponds to a physical block address.
13
Hash File Example
8 blocks, each block holds 2 records
Hash function: Key Mod 8
Record keys (Key, Key Mod 8):
–1821, 5
–7115, 3
–2428, 4
–4750, 6
–1620, 4
–4692, 4
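The home blocks in the example follow from the division/remainder function; a one-function Python check:

```python
def home_block(key, m=8):
    """Division/remainder hashing: H(key) = key mod M, a block number in 0..M-1."""
    return key % m

keys = [1821, 7115, 2428, 4750, 1620, 4692]
print({k: home_block(k) for k in keys})
# {1821: 5, 7115: 3, 2428: 4, 4750: 6, 1620: 4, 4692: 4}
```

Three keys hash to block 4, but a block holds only two records, so the third one collides.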
14
Collision Resolution
Collision: occurs when a record's home block is full.
Open addressing (linear probing): place the record in the next block that has space, wrapping around to block 0 after the last block.
15
Searching a Hash File
Home block = H(SearchKey)
If the record is found in the home block, the search is successful.
Else:
–Search the next block, until the record is found or a block with empty space is reached (the record is not in the file).
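A Python sketch of insertion with linear probing and the search rule, using the example's 8 blocks of 2 records. The function names `build` and `search` are illustrative, and the early stop at a non-full block assumes records are never deleted from the file.

```python
NUM_BLOCKS = 8
CAPACITY = 2   # records per block

def build(keys):
    blocks = [[] for _ in range(NUM_BLOCKS)]
    for key in keys:
        b = key % NUM_BLOCKS
        # linear probing: move to the next block (wrapping around) until one has space
        for _ in range(NUM_BLOCKS):
            if len(blocks[b]) < CAPACITY:
                blocks[b].append(key)
                break
            b = (b + 1) % NUM_BLOCKS
        else:
            raise RuntimeError("file is full")
    return blocks

def search(blocks, key):
    """Return (found, number of blocks accessed)."""
    b = key % NUM_BLOCKS
    for accessed in range(1, NUM_BLOCKS + 1):
        if key in blocks[b]:
            return True, accessed
        if len(blocks[b]) < CAPACITY:   # a block with empty space ends the search
            return False, accessed
        b = (b + 1) % NUM_BLOCKS
    return False, NUM_BLOCKS

blocks = build([1821, 7115, 2428, 4750, 1620, 4692])
print(blocks[4], blocks[5])   # [2428, 1620] [1821, 4692]
print(search(blocks, 4692))   # (True, 2)
```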
16
Hash File Performance
Average search length = (total # of blocks accessed to find all records) / (number of records in the file)
Using the previous example, where 4692 is stored in block 5 because its home block 4 is full:
–(1 + 1 + 1 + 1 + 1 + 2)/6 = 7/6
Time needed to find a record in this file:
–(s + r + tr) * 7/6
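The 7/6 average can be reproduced exactly with Python's `fractions` module. The per-record access counts come from the example. The timings used to turn the average into milliseconds are hypothetical.

```python
from fractions import Fraction

# blocks accessed to find each record; 4692 needs its home block 4 plus block 5
accesses = {1821: 1, 7115: 1, 2428: 1, 4750: 1, 1620: 1, 4692: 2}
avg = Fraction(sum(accesses.values()), len(accesses))
print(avg)   # 7/6

s, r, tr = 10, 12.5, 1            # hypothetical timings in ms
print(float((s + r + tr) * avg))  # expected time to find one record, in ms
```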
17
Factors Affecting Hash File Performance
The hash function should spread the records evenly over the disk space.
Use a low load factor:
–Load factor = (# of records)/(# of available record spaces)
Allow each block to hold more records.
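The load factor for the earlier example (6 records in 8 blocks of 2 slots each) works out as follows in Python:

```python
def load_factor(num_records, num_blocks, records_per_block):
    """(# of records) / (# of available record spaces)."""
    return num_records / (num_blocks * records_per_block)

print(load_factor(6, 8, 2))   # 0.375
```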
18
Limitations of Hash Files
Records cannot be accessed in any other order:
–Direct access only
A fixed amount of space is allocated to the file:
–Static hashing
–Wastes space and is hard to grow
Inappropriate for retrievals based on ranges of values:
–Find EmpID = 123 (suitable)
–Find EmpID > 123 (not suitable)
19
Dynamic Hashing
Dynamic hashing allows the file size to change to accommodate growth and shrinkage of the file.
20
A Dynamic Hashing Example
Use the binary form of the hash value (Key Mod 8):
–1821, 5 -> 101
–7115, 3 -> 011
–2428, 4 -> 100
–4750, 6 -> 110
–1620, 4 -> 100
–4692, 4 -> 100
Assume the file begins with one block and each block holds two records. The first two records, 1821 and 7115, fill that block.
When the third record (2428) arrives, use the first binary digit to split the three records into two blocks, and create a block index:
–Index 0 -> block containing 7115
–Index 1 -> block containing 1821, 2428
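One way to realize the block index is a binary trie on the hash bits. This is an assumption made for the sketch; the slide only says "create a block index". When a block overflows, it is split on its next unused bit. If all 3 bits are used up (as happens later with 1620 and 4692, which share 100), this simplified version just lets the block exceed its capacity.

```python
BITS = 3       # Key Mod 8 gives a 3-bit hash value
CAPACITY = 2   # records per block

def hash_bits(key):
    return format(key % 8, "03b")

class Node:
    def __init__(self, depth):
        self.depth = depth      # number of leading bits used to reach this node
        self.records = []       # keys in this block (leaf nodes only)
        self.children = None    # {"0": Node, "1": Node} after a split

def insert(root, key):
    bits = hash_bits(key)
    node = root
    while node.children is not None:            # follow the block index bit by bit
        node = node.children[bits[node.depth]]
    node.records.append(key)
    # split an overflowing block on its next bit while unused bits remain
    while len(node.records) > CAPACITY and node.depth < BITS:
        node.children = {"0": Node(node.depth + 1), "1": Node(node.depth + 1)}
        for k in node.records:
            node.children[hash_bits(k)[node.depth]].records.append(k)
        node.records = []
        node = node.children[bits[node.depth]]  # the new key's block may still overflow
    # if all bits are used, the block is simply left over capacity (simplification)

def blocks(node, prefix=""):
    """Map each index entry (bit prefix) to the keys stored in its block."""
    if node.children is None:
        return {prefix: node.records}
    out = {}
    for bit, child in node.children.items():
        out.update(blocks(child, prefix + bit))
    return out

root = Node(0)
for key in [1821, 7115, 2428]:
    insert(root, key)
print(blocks(root))   # {'0': [7115], '1': [1821, 2428]}
```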
21
Index
A data structure that allows the DBMS to locate particular records in a file more quickly.
Index file:
–Entries of the form IndexField + RecordPointer
–Ordered according to the indexing field
22
Types of Index
Primary index: the data file is ordered by a key field, and the index is built on the ordering key field.
Secondary index: an index on a non-ordering field of the data file.
Clustering index: the data file is ordered by a non-key field, and the index is built on that non-key field.
Dense index: has an index record for every record in the file.
–Record pointer
Sparse index: has index records for only some of the records (e.g., one for every block, or one for every distinct value of the indexing field) rather than one for every record in the file.
–Block pointer
23
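Index type by indexing field:
–Key field, ordering field: Primary index
–Key field, nonordering field: Secondary index (key)
–Nonkey field, ordering field: Clustering index
–Nonkey field, nonordering field: Secondary index (nonkey)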
24
Number of index entries, and dense or sparse:
–Primary: # of blocks in data file; sparse
–Clustering: # of distinct index field values; sparse
–Secondary (key): # of records in data file; dense
–Secondary (nonkey): # of records or # of distinct index field values; dense or sparse
25
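Index on Ordering Key Field
Figure: the data file is stored in SID order, three records per block (S05, S07, S10 | S12, S15, S20 | S25, S27, S30); the index on SID holds one entry per block (S05, S12, S25), each with a block pointer.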
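A Python sketch of a lookup through a sparse primary index, using the SIDs from the figure. It assumes each index entry holds the first (anchor) key of its block. The search finds the last anchor that is less than or equal to the target and then looks only in that block.

```python
import bisect

# data file ordered by SID, three records per block (keys only, as in the figure)
data_blocks = [["S05", "S07", "S10"], ["S12", "S15", "S20"], ["S25", "S27", "S30"]]

# sparse index: one (anchor key, block pointer) entry per block
index = [(blk[0], block_no) for block_no, blk in enumerate(data_blocks)]
anchor_keys = [k for k, _ in index]

def lookup(sid):
    """Return the block number holding sid, or None if it is not in the file."""
    i = bisect.bisect_right(anchor_keys, sid) - 1   # last anchor <= sid
    if i < 0:
        return None
    block_no = index[i][1]
    return block_no if sid in data_blocks[block_no] else None

print(lookup("S15"))   # 1
```

Comparing the SIDs as strings works here only because they all have the same width.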
26
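Index on Nonordering Key Field
Figure: the data file (records S12, S25, S47, S20, S22, S05, S30, S33, S27) is not ordered by SID; the index on SID holds one entry per record in SID order (S05, S12, S20, S22, ...), each with a record pointer.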
27
Index on Nonordering Nonkey Field
Figure: the student records (SID, Major) are not ordered by Major; the index on Major (values ACCT, CIS, FIN, MKT) holds record pointers to the matching records, so one value such as CIS leads to several records.
28
Physical Pointer vs. Logical Pointer
When an index on the key field is available, an index on a nonkey field can use record keys as logical pointers.
Figure: the index on Major stores SIDs (e.g., S12, S22, S25, S05, S27, S47) instead of record pointers; each SID is then located through the index on SID.
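A Python sketch of logical pointers. The records and their majors below are hypothetical, not read from the figure. The Major index stores SIDs, and each SID is resolved to a physical block through the SID index. One consequence of this design is that if a record moves, only the SID index entry has to change.

```python
# hypothetical records: SID -> (block number, Major)
records = {"S05": (1, "FIN"), "S12": (0, "CIS"), "S22": (1, "CIS"), "S25": (0, "ACCT")}

# index on the key field: SID -> physical location (block number)
sid_index = {sid: block for sid, (block, _) in records.items()}

# index on the nonkey field Major, using SIDs as logical pointers
major_index = {}
for sid, (_, major) in sorted(records.items()):
    major_index.setdefault(major, []).append(sid)

def blocks_for_major(major):
    # follow each logical pointer (SID) through the SID index to a physical block
    return [sid_index[sid] for sid in major_index.get(major, [])]

print(major_index["CIS"])        # ['S12', 'S22']
print(blocks_for_major("CIS"))   # [0, 1]
```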
29
Searching with Index
A file has 30,000 records of 100 bytes each; the block size is 1024 bytes.
–Data file blocking factor = floor(1024/100) = 10
–Data file blocks = ceiling(30,000/10) = 3,000 blocks
The key field has 9 bytes and a physical pointer has 6 bytes, so each index entry has 15 bytes.
–Index file blocking factor = floor(1024/15) = 68
–Index file blocks = ceiling(30,000/68) = 442 blocks
Time to search for a record with the index:
–Binary search of the index: ceiling(log2 442) = 9 block accesses
–One data file block access
–Time = (s + r + tr) * (1 + ceiling(log2 442)) = (s + r + tr) * 10
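The same arithmetic as a Python function. It assumes a dense index (one entry per record, as in the example) and rounds the binary-search probes up to whole block accesses.

```python
import math

def index_search_accesses(num_records, record_size, key_size, ptr_size, block_size):
    data_bfr = block_size // record_size
    data_blocks = math.ceil(num_records / data_bfr)
    index_bfr = block_size // (key_size + ptr_size)
    index_blocks = math.ceil(num_records / index_bfr)   # dense: one entry per record
    probes = math.ceil(math.log2(index_blocks))          # binary search over index blocks
    return data_blocks, index_blocks, probes + 1         # +1 access for the data block

print(index_search_accesses(30_000, 100, 9, 6, 1024))   # (3000, 442, 10)
```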
30
Tree
Nodes:
–Regular nodes (internal nodes): nodes with a parent and children
–Root node: the node with no parent
–Leaf nodes: nodes with no children
Level: the length of the path from the root to a node.
–Root: level 0
Balanced tree: all leaf nodes are at the same level.
31
B-Trees
If a node can store n pointers (n-1 keys), then each node except the root and the leaf nodes has at least ceiling(n/2) pointers.
Each key in the tree is stored together with its record pointer (key + RecordPointer).
All leaf nodes are at the same level.
When a node splits, it splits into two nodes at the same level, and the middle key is moved up to the parent node.
32
B-Tree Examples
A B-Tree with 3 pointers (2 keys) in a node; insert keys: 8, 5, 1, 7, 3, 12, 9, 6, 4
A B-Tree with 4 pointers (3 keys) in a node; insert keys: 23, 65, 37, 60, 46, 92, 48, 71, 56, 59, 100, 95
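A Python sketch of B-tree insertion with the split rule above, run on the first example. Record pointers are omitted for brevity. When a full node overflows with an even number of keys, the slides do not say which key counts as "middle". This sketch assumes the lower middle, so results for the 4-pointer example may differ from a hand trace that picks the upper one.

```python
import bisect

class BNode:
    def __init__(self, keys=None, children=None):
        self.keys = keys if keys is not None else []
        self.children = children if children is not None else []   # empty for a leaf

class BTree:
    """B-tree where a node holds at most n pointers (n - 1 keys)."""

    def __init__(self, n):
        self.max_keys = n - 1
        self.root = BNode()

    def insert(self, key):
        split = self._insert(self.root, key)
        if split:                          # the root split: tree grows by one level
            mid, right = split
            self.root = BNode([mid], [self.root, right])

    def _insert(self, node, key):
        i = bisect.bisect_left(node.keys, key)
        if not node.children:              # leaf: insert the key in order
            node.keys.insert(i, key)
        else:
            split = self._insert(node.children[i], key)
            if split:                      # child split: take its middle key
                mid, right = split
                node.keys.insert(i, mid)
                node.children.insert(i + 1, right)
        if len(node.keys) <= self.max_keys:
            return None
        # overflow: split into two nodes at the same level and move the middle key up
        m = (len(node.keys) - 1) // 2      # assumption: lower middle when the count is even
        right = BNode(node.keys[m + 1:], node.children[m + 1:])
        mid = node.keys[m]
        node.keys, node.children = node.keys[:m], node.children[:m + 1]
        return mid, right

    def levels(self):
        out, level = [], [self.root]
        while level:
            out.append([n.keys for n in level])
            level = [c for n in level for c in n.children]
        return out

t = BTree(3)
for k in [8, 5, 1, 7, 3, 12, 9, 6, 4]:
    t.insert(k)
print(t.levels())   # [[[5]], [[3], [8]], [[1], [4], [6, 7], [9, 12]]]
```

The second example can be run the same way with `BTree(4)` and its key list.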
33
B+ Trees
Record pointers are stored only in the leaf nodes.
–More keys fit in a node, giving a shorter path
Every key must exist in a leaf node.
Every leaf node contains a pointer to the next leaf node.
Node split:
–Leaf node split: keep the middle key in the left node and duplicate it in the parent node.
–Internal node split: move the middle key up, as in a B-Tree.
34
B+ Tree Examples
A B+ Tree with 3 pointers (2 keys) in a node; insert keys: 8, 5, 1, 7, 3, 12, 9, 6
A B+ Tree with 4 pointers (3 keys) in a node; insert keys: 23, 65, 37, 60, 46, 92, 48, 71, 56, 59, 100, 95
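A Python sketch of B+ tree insertion with the two split rules, run on the first example. Record pointers are omitted. Internal nodes route with `bisect_left`, so a key equal to a separator goes to the left child, matching the rule that the middle key stays in the left leaf. As in a B-tree, the choice of the lower middle for even counts is an assumption.

```python
import bisect

class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []   # internal nodes only
        self.next = None     # leaf nodes only: pointer to the next leaf

class BPlusTree:
    """B+ tree where a node holds at most n pointers (n - 1 keys)."""

    def __init__(self, n):
        self.max_keys = n - 1
        self.root = Node(leaf=True)

    def insert(self, key):
        split = self._insert(self.root, key)
        if split:                              # root split: tree grows by one level
            sep, right = split
            new_root = Node(leaf=False)
            new_root.keys, new_root.children = [sep], [self.root, right]
            self.root = new_root

    def _insert(self, node, key):
        i = bisect.bisect_left(node.keys, key)   # keys <= separator go left
        if node.leaf:
            node.keys.insert(i, key)
        else:
            split = self._insert(node.children[i], key)
            if split:
                sep, right = split
                node.keys.insert(i, sep)
                node.children.insert(i + 1, right)
        if len(node.keys) <= self.max_keys:
            return None
        m = (len(node.keys) - 1) // 2          # assumption: lower middle when the count is even
        sep, right = node.keys[m], Node(node.leaf)
        if node.leaf:
            # leaf split: the middle key stays in the left node and is copied up
            right.keys, node.keys = node.keys[m + 1:], node.keys[:m + 1]
            right.next, node.next = node.next, right
        else:
            # internal split: the middle key moves up, as in a B-tree
            right.keys, right.children = node.keys[m + 1:], node.children[m + 1:]
            node.keys, node.children = node.keys[:m], node.children[:m + 1]
        return sep, right

    def levels(self):
        out, level = [], [self.root]
        while level:
            out.append([n.keys for n in level])
            level = [c for n in level for c in n.children]
        return out

    def leaf_keys(self):
        node = self.root
        while not node.leaf:
            node = node.children[0]
        out = []
        while node is not None:                # walk the leaf chain left to right
            out.extend(node.keys)
            node = node.next
        return out

t = BPlusTree(3)
for k in [8, 5, 1, 7, 3, 12, 9, 6]:
    t.insert(k)
print(t.levels())     # [[[5]], [[3], [7, 8]], [[1, 3], [5], [6, 7], [8], [9, 12]]]
print(t.leaf_keys())  # [1, 3, 5, 6, 7, 8, 9, 12]
```

Walking the leaf chain returns every key in sorted order without visiting the internal nodes again.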
35
B+ Tree Advantages
Shorter tree: because internal nodes do not include record pointers, they can hold more keys.
The keys in the leaf nodes are already in sorted order, and the leaves are linked, which supports sequential and range access.
A B+ Tree can be used to store the data file itself.
36
Too many indexes slow down update operations, because every insertion, deletion, or change to an indexed field must also update each affected index.
37
Redundant Arrays of Inexpensive (Independent) Disks (RAID)
RAID groups two or more drives so that they appear as a single logical drive.
38
RAID 0
Creating a stripe set without parity: spreads the data out over several disks
Disk 0 | Disk 1 | Disk 2 | Disk 3
1A | 2A | 3A | 4A
1B | 2B | 3B | 4B
1C | 2C | 3C | 4C
No redundancy
Best write performance: the disks can be accessed in parallel
Unreliable: the failure of any one disk loses data
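A Python sketch of RAID 0 address mapping. It assumes consecutive logical blocks are assigned round-robin across the disks, which is consistent with the figure's layout (1A to 4A on Disks 0 to 3, then row B, then row C). `raid0_location` is an illustrative name.

```python
def raid0_location(logical_block, num_disks=4):
    """Round-robin striping without parity: logical block -> (disk, stripe row)."""
    return logical_block % num_disks, logical_block // num_disks

# reproduce the figure's labels: chunk number = disk + 1, stripe letter = row
for lb in range(12):
    disk, row = raid0_location(lb)
    print(f"logical block {lb}: {disk + 1}{'ABC'[row]} on Disk {disk}")
```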
39
RAID 1
Mirror set
–Primary disk and mirror disk
–Every write is done twice, once to each disk
–Data can be read from either disk
–Fault tolerance: if one disk fails, the other still holds all the data
40
RAID 5
Creating a stripe set with parity; the parity block rotates across the disks
Disk 0 | Disk 1 | Disk 2 | Disk 3
ParityA | 1A | 2A | 3A
1B | ParityB | 2B | 3B
1C | 2C | ParityC | 3C
1D | 2D | 3D | ParityD
41
Creating Parity with XOR
Disk 0 | Disk 1 | Disk 2 | Disk 3
ParityA | 1A | 2A | 3A
1A = 1010, 2A = 0100, 3A = 1100
ParityA = (1A XOR 2A) XOR 3A = 1110 XOR 1100 = 0010
If Disk 0 fails: recover ParityA = (1A XOR 2A) XOR 3A
If Disk 1 fails: recover 1A = (ParityA XOR 2A) XOR 3A = 0110 XOR 1100 = 1010
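The parity and recovery calculation in Python, with the 4-bit chunks held as ints. The same XOR of the surviving chunks rebuilds whichever single chunk was lost, whether it was a data chunk or the parity chunk.

```python
from functools import reduce
from operator import xor

def xor_all(chunks):
    """XOR a list of equal-width chunks (held here as ints) together."""
    return reduce(xor, chunks)

a1, a2, a3 = 0b1010, 0b0100, 0b1100       # 1A, 2A, 3A from the slide
parity_a = xor_all([a1, a2, a3])
print(format(parity_a, "04b"))                     # 0010
# Disk 1 (holding 1A) fails: XOR the surviving chunks, parity included
print(format(xor_all([parity_a, a2, a3]), "04b"))  # 1010
```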