Presentation is loading. Please wait.

Presentation is loading. Please wait.

Storing Data: Disks and Buffers

Similar presentations


Presentation on theme: "Storing Data: Disks and Buffers"— Presentation transcript:

1 Storing Data: Disks and Buffers
R & G 9.1, 9.3, 9.4 Talk starts with wrap up of previous lecture Acknowledgement: Slides are adopted from the Berkeley course CS186 by Joey Gonzalez and Joe Hellerstein

2 Files and Access Methods
Block diagram of a DBMS SQL Client Query Optimization and Execution Relational Operators Files and Access Methods Buffer Management Disk Space Management DB Concurrency Control and Recovery

3 Database Management System
Last few lectures: SQL SQL Client Database Management System Database First few lectures: out-of-core sort Next few weeks: How is a SQL query Executed?

4 Database Management System
Architecture of a DBMS SQL Client Database Management System Parse, check, and verify the SQL expression SELECT s.sid, s.sname, r.bid FROM Sailors s, Reserves r WHERE s.sid = r.sid AND s.age > 30 Query Parsing & Optimization Heap Scan Reserves Indexed Scan Sailors Indexed Join GroupBy (Age) and translate into an efficient relational query plan Database

5 Database Management System
Architecture of a DBMS SQL Client Database Management System Execute the dataflow by operating on records and files Query Parsing & Optimization Relational Operators Heap Scan Reserves Indexed Scan Sailors Indexed Join GroupBy (Age) Database

6 Database Management System
Architecture of a DBMS SQL Client Database Management System Organizing tables and records as groups of pages in a logical file. Query Parsing & Optimization Relational Operators Bob M 32 94703 Harmon Name Sex Age Zip Addr Alice F 33 Mabel Jose 31 94110 Chavez Jane 30 Files and Index Management Page Header Database

7 Database Management System
Architecture of a DBMS SQL Client Database Management System Query Parsing & Optimization Illusion of operating in memory RAM Relational Operators Frame Frame Frame Page 1 Page 4 Page 2 Files and Index Management Buffer Management Disk Space Management Database

8 Query Parsing & Optimization
Architecture of a DBMS SQL Client Query Parsing & Optimization Translates page requests into physical bytes on one or more device(s) Relational Operators Disk Space Mngmt. Page 1 Page 2 Page 3 Page 4 Page 5 Page 6 Files and Index Management Buffer Management Disk Space Management Database

9 Query Parsing & Optimization
Architecture of a DBMS SQL Client Organized in layers Each layer abstracts the layer below Manage complexity Perf. Assumptions Example of good systems design Database Disk Space Management Buffer Management Files and Index Management Relational Operators Query Parsing & Optimization

10 Disk Space & Buffer Management
Frame Page 4 Page 2 Page 1 Read Page Write Page Database operates on in memory pages. Disk Space Mngmt. Page 1 Page 2 Page 3 Page 4 Page 5 Page 6

11 Overview Table Record File Byte Rep. Record Slotted Page Bob M 32
94703 Harmon Name Sex Age Zip Addr Alice F 33 Mabel Jose 31 94110 Chavez Jane 30 Table Record Bob M 32 94703 Harmon Varchar Char Int File Page 1 Page 2 Page 3 Page 4 Page 5 Page 6 Page 7 Page 8 Byte Rep. Record Header M 32 94703 Bob Harmon Slotted Page Page Header

12 Overview: Files of Pages of Records
Tables stored as a logical files consisting of pages each containing a collection of records Pages are managed in memory by the buffer manager: higher levels of database only operate in memory on disk by the disk space manager: reads and writes pages to physical disk/files Big Ideas in this Lecture Block/Pages granularity reasoning Exploit access patterns in memory management Efficient binary representation of data

13 Query Parsing & Optimization
Architecture of a DBMS SQL Client Query Parsing & Optimization Translates page requests into physical bytes on on or more device(s) Relational Operators Disk Space Mngmt. Page 1 Page 2 Page 3 Page 4 Page 5 Page 6 Files and Index Management Buffer Management Disk Space Management Database

14 Disk Space Management Buffer Management Disk Space Mngmt.
Frame Frame Frame Page 1 Page 4 Disk Space Mngmt. Frame Frame Frame Page 2 Page 1 Page 2 Database operates on in memory pages. Page 3 Page 4 Read Page Write Page Page 5 Page 6 Page 7 Page 8

15 Recall: Disks and Files
DBMS stores information on Disks and SSDs. Disks are a mechanical anachronism (slow!) 10ms Random Read Latency Spinning Platters Arm Movement Arm Assembly

16 Recall: Arranging Pages on Disk
“Next” page concept: pages on same track, followed by pages on same cylinder, followed by pages on adjacent cylinder Arrange file pages sequentially on disk minimize seek and rotational delay. For a sequential scan, pre-fetch several pages at a time! Read large consecutive blocks

17 Recall: SSDs DBMS stores information on Disks and SSDs.
Disks are a mechanical anachronism (slow!) SSDs faster, slow relative to memory, costly writes Issues in current generation (NAND) 4-8K reads, 1-2MB writes So... read is fast and predictable Single read access time: 0.03 ms 4KB random reads: ~500MB/sec Sequential reads: ~525MB/sec But.. write is not! Slower for random Single write access time: 0.03ms 4KB random writes: ~120MB/sec Sequential writes: ~480MB/sec Write amplification: big units, need to reorg for garbage collection & wear

18 Disks and Files DBMS operate at Block Level
DBMS stores information on Disks and SSDs. Disks are a mechanical anachronism (slow!) SSDs faster, slow relative to memory, costly writes DBMS operate at Block Level Read and Write large chunks seq. bytes Leverage cache hierarchy and HW pre-fetch Amortize seek delays on HDDs and Writes on SSD Sequentially: Next disk block is fastest Maximize usage of data per R/W Organize data for fast in memory processing (i.e., mapping)

19 A note on (confusing) terminology
Block = Unit of transfer for disk read/write 64KB - 128KB is a good number today Book says 4KB Page = Fixed size contiguous chunk of memory Assume same size as block Refer to corresponding blocks on disk For simplicity we use Block and Page interchangeably.

20 Disk Space Management Lowest layer of DBMS, manages space on disk
Mapping pages to locations on disk Loading pages from disk to memory Saving pages back to disk & ensuring writes Higher levels call upon this layer to: read/write a pages allocate/de-allocate logical pages Request for a sequence of pages best satisfied by pages stored sequentially on disk Physical details hidden from higher levels of system Higher levels may assume Next Page is fast! Lowest layer could operate directly on the /dev/”files” these are not files but commands directly to device. You wouldn’t bypass a filesystem all together “roll your own” In the old days this might have been a good idea. Then you would be writing your own device driver for each new device that you want to run your database on “eek”. Now people typically run databases on large files sitting on the filesystem. So you would allocate a large file to store the database. Side note: If this was an SSD why would that be a bad idea?

21 Disk Space Management Implementation
Proposal 1: Talk to the device directly Could be very fast if you knew the device well What happens when devices change? Proposal 2: Run over filesystem (FS) Allocate single large “contiguous” file and assume sequential / nearby byte access are fast Most FS optimize for sequential access and temporal locality (buffer cache on hot items) Sometimes disable FS buffering May span multiple files on multiple disks / machines Lowest layer could operate directly on the /dev/”files” these are not files but commands directly to device. You wouldn’t bypass a filesystem all together “roll your own” In the old days this might have been a good idea. Then you would be writing your own device driver for each new device that you want to run your database on “eek”. Now people typically run databases on large files sitting on the filesystem. So you would allocate a large file to store the database. Side note: If this was an SSD why would that be a bad idea?

22 Typically sits on top of local file system
Get Page 4 Get Page 5 Disk Space Management Big File 1 Big File 2 Big File 3 Page 1 Page 2 Page 3 Page 4 Page 5 Page 6 Page 7 Page 8 Page 9 File System File System File System

23 Disk Space Management Provide API to read and write pages to device
Pages: block level organization of bytes on disk Ensures next locality and abstracts FS/Device details Disk Space Mngmt. Page 1 Page 2 Page 3 Page 4 Page 5 Page 6

24 Query Parsing & Optimization
Architecture of a DBMS SQL Client Database Disk Space Management Buffer Management Files and Index Management Relational Operators Query Parsing & Optimization Illusion of operating in memory RAM Frame Frame Frame Page 1 Page 4 Page 2 Disk Space Management

25 Mapping Pages into Memory
Disk Space Mngmt. Page 1 Page 2 Page 3 Page 4 Page 5 Page 6 Main Memory (Buffer Pool) Frame Frame Frame Page 1 Page 4 Database operates on in memory pages. Load Page Write Page

26 Mapping Pages into Memory
Disk Space Mngmt. Page 1 Page 2 Page 3 Page 4 Page 5 Page 6 Main Memory (Buffer Pool) Data must be in RAM for DBMS to operate on it Buffer Mgr. hides the fact that not all data is in RAM Frame Frame Frame Page 1 Page 4 Database operates on in memory pages. Load Page Write Page

27 Mapping Pages into Memory
Disk Space Mngmt. Page 1 Page 2 Page 3 Page 4 Page 5 Page 6 Main Memory (Buffer Pool) Frame Frame Frame Page 2 Load Page Page 3 Write Page Page 5

28 Mapping Pages into Memory
Disk Space Mngmt. Page 1 Page 2 Page 3 Page 4 Page 5 Page 6 Main Memory (Buffer Pool) Frame Frame Frame Page 5 Page 2 Page 2 Page 3 Load Page Write Page

29 Mapping Pages into Memory
Disk Space Mngmt. Page 1 Page 2 Page 3 Page 4 Page 5 Page 6 Main Memory (Buffer Pool) Frame Frame Frame Page 5 ? Page 2 ? Page 3 ? Load Page Request Page 1 Write Page

30 Challenges for Buffer Manager
What happens if a page is modified? Writing “dirty” pages back to disk No free frames  evict which page? Replacement Policy What if page is being used? Pinning Concurrent page operations? Solve this in lock manager …

31 When a Page is Requested ...
Buffer pool information “table” contains: <frame#, pageid, pin_count, dirty> If requested page is not in pool: Choose a frame for replacement. Only “un-pinned” pages are candidates! If frame “dirty”, write current page to disk Read requested page into frame Pin the page and return its address. If requests can be predicted (e.g., sequential scans) pages can be pre-fetched several pages at a time!

32 Mapping Pages into Memory
Files and Index Management Disk Space Mngmt. Page 1 Page 2 Page 3 Page 4 Page 5 Page 6 Noticed pages changed from earlier example Buffer Pool Pointer * Pinned Pinned Frame Frame Frame Page 5 Page 1 Page 6 Load Page Request Page 6 Write Page

33 After Requestor Finishes
Requestor of page must: indicate whether page was modified via dirty bit. unpin it (soon preferably!) why? Page in pool may be requested many times, a pin count is used. To pin a page: pin_count++ A page is a candidate for replacement iff pin count == 0 (“unpinned”) CC & recovery may do additional I/Os upon replacement. Write-Ahead Log protocol; more later!

34 Mapping Pages into Memory
Disk Space Mngmt. Page 1 Page 2 Page 3 Page 4 Page 5 Page 6 Noticed pages changed from earlier example Buffer Pool 1 2 1 Frame Frame Frame Page 5 Page 2 Page 3 Load Page Finished Pointer * Page 2 Write Page

35 Page Replacement Policy
Page is chosen for replacement by a replacement policy: Least-recently-used (LRU), Clock Most-recently-used (MRU) Policy can have big impact on #I/O’s; Depends on the access pattern. Once a hot area of research but LRU/Clock has become pretty popular

36 LRU Replacement Policy
Least Recently Used (LRU) Pinned Frame: not available to replace track time each frame last unpinned (end of use) replace the frame which least recently unpinned Very common policy: intuitive and simple Works well for repeated accesses to popular pages (temporal locality) Can be costly. Why? Need to maintain heap data-structure Solution?

37 Clock Replacement Policy
Empty Frame Approx. to LRU Arrange frames in logical cycle Introduce “Ref. Bit” Clock arm advances until request satisfied

38 Clock Replacement Policy
Pinned Empty Frame A Want to insert page: Current frame has pin-count > 0  Skip Pinned G F B E C D

39 Clock Replacement Policy
Pinned Empty Frame A Want to insert page: Not Pinned and Ref. bit is set  Clear Ref. Bit Skip Pinned G F B E C D

40 Clock Replacement Policy
Pinned Empty Frame A Want to insert page: Not pinned and Ref. bit unset: Evict Copy Page Set pinned Set ref. bit Advanced clock Pinned G F B E C Pinned D

41 Clock Replacement Policy
Pinned Empty Frame A Request for Page C: Cache Hit!: Pin (inc.) Set ref bit. Pinned F B Pinned E C Pinned G

42 Is LRU/Clock Always Best?
Very common policy: intuitive and simple Works well for repeated accesses to popular pages temporal locality LRU can be costly  Clock policy is cheap When might it perform poorly What about repeated scans of big files?

43 Repeated Scan of Big File (LRU)
Kind of big File Empty Frame Buffer A B C D Scan Cache Hits: Attempts:

44 Repeated Scan of Big File (LRU)
Empty Frame Buffer A B C D Scan A Scan Cache Hits: Attempts: 1

45 Repeated Scan of Big File (LRU)
Empty Frame Buffer A B C D A B Scan Scan Cache Hits: Attempts: 2

46 Repeated Scan of Big File (LRU)
Empty Frame Buffer A B C D A B C Scan Scan Cache Hits: Attempts: 3

47 Repeated Scan of Big File (LRU)
Empty Frame Buffer A B C D A B C LRU Scan Cache Hits: Attempts: 3 4 D Scan

48 Repeated Scan of Big File (LRU)
Empty Frame Buffer A B C D A Scan D B C LRU Scan Cache Hits: Attempts: 5 4

49 Repeated Scan of Big File (LRU)
Empty Frame Buffer A B C D D A C B Scan LRU Scan Cache Hits: Attempts: 5 6

50 Repeated Scan of Big File (LRU)
Empty Frame Buffer A B C D Sequential Flooding D A B Scan LRU Scan No Cache Hits! Cache Hits: Attempts: 6

51 Repeated Scan of Big File (MRU)
Most Recently Used File Empty Frame Buffer A B C D Scan Scan Cache Hits: Attempts:

52 Repeated Scan of Big File (MRU)
Empty Frame Buffer A B C D Scan A Scan Cache Hits: Attempts: 1

53 Repeated Scan of Big File (MRU)
Empty Frame Buffer A B C D A B Scan Scan Cache Hits: Attempts: 2

54 Repeated Scan of Big File (MRU)
Empty Frame Buffer A B C D A B C Scan Scan Cache Hits: Attempts: 3

55 Repeated Scan of Big File (MRU)
Empty Frame Buffer A B C D A B C MRU Scan Cache Hits: Attempts: 4 3 D Scan

56 Repeated Scan of Big File (MRU)
Empty Frame Buffer A B C D Hit! Scan A B D MRU Scan 1 Cache Hits: Attempts: 5

57 Repeated Scan of Big File (MRU)
Empty Frame Buffer A B C D Hit! A B D Scan MRU Scan 2 Cache Hits: Attempts: 6

58 Repeated Scan of Big File (MRU)
Empty Frame Buffer A B C D A B D MRU Scan C 2 Scan Cache Hits: Attempts: 6 7

59 Repeated Scan of Big File (MRU)
Empty Frame Buffer A B C D Hit! A C D MRU Scan 3 Improved Cache Hit Rate! Cache Hits: Attempts: 8 Scan

60 Repeated Scan of Big File (MRU)
Empty Frame Buffer A B C D Hit! What else might we do for sequential scans? A B D MRU Scan 3 Improved Cache Hit Rate! Cache Hits: Attempts: 8 Scan

61 Background Prefetching
File Empty Frame Buffer A B C D Prefetched Prefetched Scan A B C Scan Load (prefetch) “next” pages in a background thread Why does this help? Disk Scheduling & Parallel IO Interleave IO and compute

62 DBMS vs. OS Buffer Cache OS has page/buffer management. Why not let OS manage these tasks? Buffer management in DBMS requires ability to: pin page in buffer pool, force page to disk, order writes important for implementing CC & recovery adjust replacement policy, and pre-fetch pages based on access patterns in typical DB operations.

63 Query Parsing & Optimization
Architecture of a DBMS SQL Client Database Disk Space Management Buffer Management Files and Index Management Relational Operators Query Parsing & Optimization Illusion of operating in memory RAM Frame Frame Frame Page 1 Page 4 Page 2 Disk Space Management


Download ppt "Storing Data: Disks and Buffers"

Similar presentations


Ads by Google