1 Query Processing Part 1: Managing Disks. 2 Main Topics on Query Processing Running-time analysis Indexes (e.g., search trees, hashing) Efficient algorithms.

1 Query Processing Part 1: Managing Disks

2 Main Topics on Query Processing Running-time analysis Indexes (e.g., search trees, hashing) Efficient algorithms for the relational operators Optimizing the evaluation of a whole query The characteristics of disks are different from those of main memory

3 Disks (HDDs) HDD – Hard Disk Drive Not to be confused with the newer SSD (Solid-State Drive)

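4 Typical Computer [Figure: boxes labeled P (processor), M (memory) and C (controller) attached to a bus; disks (and other secondary storage devices) are attached to the controllers] The processor (CPU), main memory (RAM) and controllers are connected by a bus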

5 Processor Speed: 100 → 500 → 1000 MIPS (MIPS = Million Instructions per Second) Memory Access time: 10^-6 → 10^-9 sec (1 μs → 1 ns)

6 “Typical Disk” [Figure: a disk with its platters and heads labeled] Terms: Platter, Head, Actuator, Cylinder, Track, Sector (physical), Block (logical), Gap

7 Top (& Bottom) View of a Disk Platter Tracks are concentric circles, divided into sectors All sectors have the same number of bytes (typically 512) Gaps between sectors and between tracks

8 More Details Both surfaces of each platter are used –There is a head for each surface All tracks with the same radius form a cylinder –The heads move together and are always over the same cylinder A block consists of N contiguous sectors –N is determined when the OS formats the disk –The DBMS may choose a different value for N

9 Memory vs. Disk Memory is fast and –time to read (or write) a byte is fixed –can read (or write) just what is needed Disk is slow and –must read (or write) at least one block –time to read (or write) a block varies Memory is volatile whereas a disk keeps the data even without electricity [Slide annotation: not exactly true …]

10 Disk Access Time [Figure: the user needs block X; is block X in memory?]

11 Seek Time – the time it takes to move the heads to the cylinder where the block is Rotational Delay – the time it takes until the beginning of the block arrives under the head Transfer Time – the time it takes to actually read the block Time = Seek Time + Rotational Delay + Transfer Time + Other …

12 Seek Time [Plot: seek time versus cylinders traveled (1 to N); the time axis is marked x and 3x or 5x]

13 Average Seek Time Can be measured empirically Alternatively, it can be proved that the average distance from one random cylinder to another is 1/3 of the maximal distance (i.e., from innermost to outermost track) –Hence, the average seek time is about 1/3 of the maximum Typical average seek time is about 10 msec –For the fastest disks it is about 3 msec
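
The 1/3 figure can be checked numerically. The following is a minimal sketch, not part of the original slides; the function name `average_seek_fraction` and the choice of 1,000 cylinders are assumptions made for illustration. It averages the distance |i - j| over all ordered pairs of cylinders and divides by the maximal distance.

```python
def average_seek_fraction(num_cylinders: int) -> float:
    """Average |i - j| over all ordered cylinder pairs, as a fraction of the maximal distance."""
    n = num_cylinders
    # For each distance d in 1..n-1 there are 2 * (n - d) ordered pairs (i, j) with |i - j| = d.
    total = sum(2 * (n - d) * d for d in range(1, n))
    average_distance = total / (n * n)
    return average_distance / (n - 1)  # the maximal distance is n - 1 cylinders

fraction = average_seek_fraction(1000)
print(f"{fraction:.4f}")
```

The result approaches 1/3 as the number of cylinders grows. Turning this into a seek *time* additionally assumes that seek time grows roughly in proportion to the distance traveled.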

14 Rotational Delay (Latency) [Figure: the current position of the head on the track versus the position of the needed block] The average latency is ½ of the time of one revolution It is 4.17 msec (7200 rpm) –Only 2 msec for the fastest disks (15,000 rpm)
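
The two latency figures follow directly from the rotation speed. A small sketch (the function name is an assumption, not from the slides) that computes half a revolution in milliseconds:

```python
def average_rotational_latency_ms(rpm: float) -> float:
    """Half the time of one full revolution, in milliseconds."""
    revolution_ms = 60_000 / rpm  # one minute is 60,000 msec
    return revolution_ms / 2

for rpm in (7200, 15_000):
    print(rpm, round(average_rotational_latency_ms(rpm), 2))
```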

15 Transfer Time The transfer time can be computed from the sustained transfer rate, which is measured in MB/sec A transfer time of 0.1 msec for a 4KB block amounts to a rate of 40 MB/sec –This is a conservative estimate with respect to recent models of disks

16 Other Delays CPU time to issue I/O Contention for controller Contention for bus, memory We ignore these delays

17 Time to Read The time to read a block of 4KB is avgSeek + avgLatency + transferTime = 10 + 4.17 + 0.1 = 14.27 msec If we read 11 sequential blocks (on the same track), then –Seek & latency are needed just for the first block –So, the time is 10 + 4.17 + 1.1 = 15.27 msec
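
The arithmetic above can be reproduced with a short sketch. The constants are the typical values quoted on the preceding slides; the function name `read_time_ms` is an assumption, and the model ignores the "other delays" just as the slides do.

```python
AVG_SEEK_MS = 10.0      # typical average seek time
AVG_LATENCY_MS = 4.17   # average rotational delay at 7200 rpm
TRANSFER_MS = 0.1       # transfer time of one 4KB block (about 40 MB/sec)

def read_time_ms(num_sequential_blocks: int) -> float:
    """Time to read a run of blocks on one track: one seek, one latency, then pure transfer."""
    return AVG_SEEK_MS + AVG_LATENCY_MS + num_sequential_blocks * TRANSFER_MS

one_block = read_time_ms(1)
eleven_blocks = read_time_ms(11)
print(round(one_block, 2), round(eleven_blocks, 2), round(eleven_blocks / 11, 2))
```

The last value is the average cost per block when the 11 blocks are read sequentially, which is roughly a tenth of the cost of a random read.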

18 Summary Random I/O is expensive –Average per 4KB block is ~15 msec Sequential I/O costs much less –Average per 4KB block is ~1.4 msec (15.27 / 11, when reading 11 sequential blocks) However, even sequential I/O is slower than memory by at least a factor of 100

19 Writing and Updating Cost of writing is similar to reading Unless we want to verify –If so, add 1 revolution + transfer time To update a block, we must read it into memory, modify it, and then write it back to the disk

20 Typical DB Application
while blocks to read do
    read next block from disk
    process the block
    write some result to disk
end
The CPU can execute tens of thousands (if not millions) of instructions while the controller reads or writes a single block

21 Running-Time Analysis: I/O Cost We only count the number of blocks that are read from or written to the disk The CPU time is negligible in comparison Furthermore, the controller can read from and write to the disk while the CPU is processing other blocks –So, the CPU time that can actually influence an exact analysis is even more negligible The goal is to minimize the number of blocks that we read and write

22 We Count Blocks, But What is the cost (in time) of each block? –Cannot tell whether a block was read randomly or sequentially (with other blocks) We should organize data on disks and write programs so that the I/O will be sequential as much as possible –The DBMS helps a lot in this task! –It is also capable of minimizing the number of accessed blocks when processing queries –And it tries to keep the controller busy while the CPU processes blocks that are already in memory

23 Best-Case Analysis Read B1 blocks from the disk Compute the result and write it back to the disk –Suppose that the size of the result is B2 blocks What is the best possible I/O cost? What is needed to achieve the best I/O cost?

24 Summary The running time of an algorithm is the I/O cost We measure the I/O cost in terms of the number of blocks that are read or written –A block that is read and then written is counted as 2

25 Arranging Data on Disks

26 The Goal Arrange data on disks so that –Queries and updates can be performed by reading and writing as few blocks as possible, and –Blocks would usually be read sequentially (harder to achieve) Optimal arrangement depends on the typical queries and updates that are going to be executed

27 Addresses of Records on Disks

28 Addresses for Records on Disks We need the ability to refer to a particular record In fact, some records have pointers to other records or to blocks –Pointers are inherent to object-relational database systems –Even in purely relational systems, pointers are needed in indexes The DBMS stores indexes – not just relations!

29 Several Types of Addresses How does one refer to records? Many options, ranging from Physical to Indirect

30 Purely Physical Record Address = Block ID + Offset in Block, where Block ID = (Device ID, Cylinder #, Track #, Block #)

31 Fully Indirect (Record IDs) Record ID is a bit string (assigned by the system) that can be translated to a physical address by means of a map table [Figure: a map table with columns Rec ID and Physical addr.; the Rec ID for record R maps to the physical address A]

32 Tradeoff Flexibility to move records (for deletions, insertions) vs. cost of indirection Physical addresses limit the ability to move records or use their space when deleting them – why? Logical addresses have the cost of indirection
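
The fully indirect scheme and its tradeoff can be illustrated with a small sketch. Everything here is hypothetical: the record ID `"rec-17"`, the block names, and the use of a Python dictionary as the map table are assumptions for illustration, not how any particular DBMS stores its map.

```python
# Hypothetical map table: record ID -> physical address (block ID, offset in block).
map_table = {"rec-17": ("block-3", 128)}

def resolve(record_id):
    """Translate a record ID to its current physical address (the extra lookup is the cost of indirection)."""
    return map_table[record_id]

def move_record(record_id, new_address):
    """Moving a record only updates the map; pointers that store the record ID stay valid."""
    map_table[record_id] = new_address

pointer = "rec-17"  # what another record or an index entry would store
move_record("rec-17", ("block-9", 0))
print(resolve(pointer))
```

With purely physical addresses there is no such table, so every stored pointer to the record would have to be found and changed when the record moves.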

33 Half & Half Approach Many options in between Physical and Indirect … One option: physical address of the block + logical address inside the block

34 Illustration: [Figure: a block consisting of a Header (Fixed Part + Array A), then Free Space, then the records R7, R5, R8, R6 packed at the end] The address of R6 is the pair (P, 2), where P is the physical address of the block Given (P, 2), we go to the block having the address P and then follow the pointer in A[2]

35 More Details on Half & Half One field of the fixed part (of the header) contains the size of the array A The header is at the beginning of the block Any record R can be moved freely inside the block –Only need to change the pointer to R in A All records are packed at the end of the block Available free space is between the header and the records –Why do we want the free space to be contiguous?

36 Insertions Insert a new record R at the end of the free space, and add to the array A a pointer to R The address of R is determined when space is allocated to R [Figure: the block layout of slide 34]

37 Deletions To delete a record R, put a null in the entry of A for R – why do we need to do that? Move records toward the end to fill gaps and update their entries in A [Figure: the block layout of slide 34]
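
The half & half layout, with its insertions and deletions, can be sketched as follows. This is a simplified illustration under stated assumptions, not the layout of any specific DBMS: the class name `Block` is hypothetical, the array A is kept as a Python list instead of being serialized into the header, and the space taken by the header itself is not accounted for.

```python
class Block:
    """A block whose header holds the slot array A; records are packed at the end."""

    def __init__(self, size: int = 4096):
        self.data = bytearray(size)
        self.slots = []        # array A: slot number -> (offset, length), or None after deletion
        self.free_end = size   # records occupy data[free_end:]

    def insert(self, record: bytes) -> int:
        """Place the record at the end of the free space and return its slot number."""
        if len(record) > self.free_end:
            raise ValueError("not enough free space in block")
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        self.slots.append((self.free_end, len(record)))
        return len(self.slots) - 1

    def read(self, slot: int) -> bytes:
        """Follow the pointer in A[slot] to the record."""
        offset, length = self.slots[slot]
        return bytes(self.data[offset:offset + length])

    def delete(self, slot: int) -> None:
        """Null the slot, then slide records toward the end of the block to close the gap."""
        offset, length = self.slots[slot]
        self.slots[slot] = None  # keep the entry so other slot numbers do not change
        # Records stored at lower offsets shift toward the end by `length` bytes.
        self.data[self.free_end + length:offset + length] = self.data[self.free_end:offset]
        for i, entry in enumerate(self.slots):
            if entry is not None and entry[0] < offset:
                self.slots[i] = (entry[0] + length, entry[1])
        self.free_end += length

block = Block(size=64)
first = block.insert(b"aaaa")
second = block.insert(b"bbb")
third = block.insert(b"cc")
block.delete(second)
print(block.read(first), block.read(third), block.free_end)
```

A record's address is the pair (P, slot number), so compacting the block after a deletion changes only the offsets stored in A, never the addresses that other records hold.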

38 Updates Can be done in-place, except when: –The record grows in size: we may have to move the record or parts of it to another block if there is not enough space –We update a field that is used to keep the file in sorted order: we may have to move the record to another block, as dictated by the sorted order; this case is really like a deletion followed by an insertion

39 Types of Files

40 Arranging a File on Disk Try to allocate a contiguous portion of the disk to the file In a heap, records are packed into blocks in no particular order In a sorted file (also called sequential file), records are inserted in sorted order according to some field(s) [Figure: the chain of blocks for the file] It is a good idea to chain the file’s blocks in both directions Why the name “sequential file”?

41 Heap Easy to insert – records can be added either at the end or in any block that has available space –I/O cost of insertion is 2 (not 1!): read the block, then write it back Assume that the file has 1,000,000 blocks Suppose there are 100 records for “Levy” –What if we want to read all of them? I/O cost is 1,000,000 blocks (must read all blocks) –How much time will it take if we have the IDs of all those records? In the worst case, each record is in a different block, so I/O cost is 100

42 Sorted File Must insert a new record in the location dictated by the order How much time does it take (the file has N blocks)? What if each block has some free space – does it help?

43 Sorted File Must insert a new record in the location dictated by the order How much time does insertion take (the file has N blocks)? –We assume that binary search can be done (what is needed to make it possible?) Need to read about log N (base 2) blocks to find the location –On average we have to read and write half of the file’s blocks to make room for the new record (if existing blocks are full) I/O cost is N (N/2 reads + N/2 writes), where N is the number of blocks of the file –To avoid this high cost, use overflow blocks
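
To get a feel for the numbers, here is a rough sketch of the insertion cost. The function name `sorted_insert_io_cost` is an assumption, and the model is the one on the slide: a binary search over the blocks followed by shifting, on average, half of the file.

```python
import math

def sorted_insert_io_cost(num_blocks: int) -> tuple[int, int]:
    """Return (blocks read by binary search, average blocks read and written to make room)."""
    search_reads = math.ceil(math.log2(num_blocks))
    # On average half of the blocks are read and the same half is written back: N/2 + N/2 = N.
    shift_io = num_blocks
    return search_reads, shift_io

print(sorted_insert_io_cost(1_000_000))
```

For a file of 1,000,000 blocks the search is about 20 block reads, which is dwarfed by the cost of shifting; that gap is the motivation for overflow blocks.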

44 Overflow Blocks We need to insert 350, 490 and 600, but the block is full: [100 200 300 400 500] Use an overflow block: the main block holds [100 200 300 350 400] and its header points to an overflow block holding [490 500 600] What is the problem with overflow?

45 Interesting Problems How much free space to leave in each block, track, cylinder? How often to reorganize file + overflow?

46 Heap vs. Sorted File A file with 100 records for “Levy” (each has a size of 320 bytes) and 1,000,000 blocks (each is 4K bytes long) We have the IDs of all the records for “Levy” and need to read them If the file is organized as heap, then in the worst case the I/O cost is 100 blocks If the file is sorted on Name, then –The records for “Levy” occupy a minimum of 8 blocks and 9 in the worst case, so the I/O cost is 9 –In the best case, the system will read starting with the first “Levy” (in sequential order) and will use read-ahead buffering, so in this case all 9 blocks will be read sequentially

47 Comment The previous slide says: –The records for “Levy” occupy a minimum of 8 blocks and 9 in the worst case, so the I/O cost is 9 Does this statement assume that records can span blocks? If so, what are the numbers for the minimum and worst cases if records cannot span blocks? Unless explicitly stated otherwise, we assume that records do not span blocks
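
One way to work through the question is the sketch below. It is an illustration under explicit assumptions that the slides do not state: "4K" is taken to mean 4096 bytes, block headers are ignored, and the blocks of the sorted file are assumed to be full.

```python
import math

RECORD_BYTES = 320
BLOCK_BYTES = 4096   # assumption: "4K" means 4096 bytes, headers ignored
NUM_RECORDS = 100

# Records may span blocks: the 100 records form one contiguous byte range.
total_bytes = NUM_RECORDS * RECORD_BYTES
spanning_min = math.ceil(total_bytes / BLOCK_BYTES)
# Worst case: the range starts at the last byte of a block.
spanning_worst = math.ceil((BLOCK_BYTES - 1 + total_bytes) / BLOCK_BYTES)

# Records may not span blocks: only whole records fit in a block.
records_per_block = BLOCK_BYTES // RECORD_BYTES
unspanned_min = math.ceil(NUM_RECORDS / records_per_block)
# Worst case: the first block holds just one of the matching records.
unspanned_worst = 1 + math.ceil((NUM_RECORDS - 1) / records_per_block)

print(spanning_min, spanning_worst, unspanned_min, unspanned_worst)
```

Under these assumptions the figures 8 and 9 correspond to records that can span blocks; without spanning, 12 records fit per block and the counts become 9 and 10.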

48 Variable-Length Records Reasons for variable-length records: –Repeating fields (e.g., data about children) –Variable format (e.g., a record of a person with data about medical tests) –Fields whose size varies, for example: address of a person; BLOB (binary large object), e.g., a video clip Also, long fixed-length records cause a problem if they cannot be spanned across blocks

49 Handling Variable-Length Records Several options for arranging variable-length records in blocks –Read the textbook –Read about how it is done in a specific DBMS you may want to use You need to understand these things to achieve optimal performance

50 Simple Example How to store data about students and the courses they take? –Fixed-length records (S#,C#), or –One variable-length record per student (S#,C#*): could save space (disk & memory), and all the courses for a given student can be found very efficiently Questions about the variable-length option: Does the system allocate space in each record for the max number of courses? Does the system use truly variable-length records, but with overflow blocks? How efficient is it to search on C#?

51 Addresses of Records on Disks are Different from Addresses in Main Memory So, what happens when a block of records is read into main memory?

52 Pointer Swizzling [Figure: the disk holds block 1 (with Rec B) and block 2 (with Rec A); memory holds a copy of block 1] Block 1 was read into memory and record B continues to point to record A on the disk

53 Now We Also Read Block 2 [Figure: the disk holds block 1 (with Rec B) and block 2 (with Rec A); memory now holds copies of both blocks] When reading block 2 into memory, we need to change (swizzle) the pointer to A in record B

54 A Table Translates DB Addresses to Memory Addresses [Table columns: DB Addr., Memory Addr.] This table is just for the DB addresses that are currently in memory –One entry per record or per block? This table is different from the one that translates logical addresses to physical ones (Slide 31)

55 Several Approaches to Swizzling Automatic swizzling –When reading a block into memory, the pointers in that block are swizzled if they are in the table –Is this enough? Swizzling on demand (lazy approach) No swizzling (i.e., use the table all the time) [Figure: a pointer consists of an address plus a bit indicating whether this is a DB address or a memory address]

56 Unswizzling At some point, a block B is removed from memory –To make room for another block If B was changed (while in memory), then first it has to be written to disk –Need to unswizzle the pointers in the block Must also update the table, and unswizzle pointers in memory that are pointing to B –Need a list of all the pointers in memory that point to B
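
The swizzling and unswizzling steps can be sketched in a few lines. This is a simplified illustration, not the mechanism of any particular DBMS: the names `Pointer`, `read_block` and `evict_block`, the string DB addresses and the numeric memory addresses are all hypothetical. To stay short, the sketch rescans every in-memory pointer instead of keeping, per block, the list of pointers that point into it.

```python
class Pointer:
    """An address plus a flag saying whether it is a DB address or a memory address."""
    def __init__(self, address):
        self.address = address
        self.swizzled = False

translation = {}          # DB address -> memory address, only for records now in memory
in_memory_pointers = []   # every pointer stored in a block that is now in memory

def read_block(records, pointers):
    """records: {db_address: memory_address}; pointers: the Pointer objects stored in the block."""
    translation.update(records)
    in_memory_pointers.extend(pointers)
    for p in in_memory_pointers:  # also fixes pointers in blocks that were read earlier
        if not p.swizzled and p.address in translation:
            p.address, p.swizzled = translation[p.address], True

def evict_block(records, pointers):
    """Unswizzle the block's own pointers and all in-memory pointers into it, then update the table."""
    reverse = {mem: db for db, mem in translation.items()}
    for p in pointers:  # the block must go back to disk with DB addresses
        if p.swizzled:
            p.address, p.swizzled = reverse[p.address], False
        in_memory_pointers.remove(p)
    evicted = {translation[db] for db in records}
    for p in in_memory_pointers:  # pointers elsewhere in memory that point into the block
        if p.swizzled and p.address in evicted:
            p.address, p.swizzled = reverse[p.address], False
    for db in records:
        del translation[db]

b_to_a = Pointer("disk:block2/A")                 # record B points to record A
read_block({"disk:block1/B": 0x1000}, [b_to_a])   # A is not in memory yet
read_block({"disk:block2/A": 0x2000}, [])         # now the pointer in B gets swizzled
print(b_to_a.swizzled, hex(b_to_a.address))
evict_block({"disk:block2/A": 0x2000}, [])
print(b_to_a.swizzled, b_to_a.address)
```

The full scan in `evict_block` is what the list of pointers to B mentioned on the slide is meant to avoid.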
