1 CSE232A: Database System Principles Hardware. Data + Indexes Database System Architecture Query ProcessingTransaction Management SQL query Parser Query.

Slides:



Advertisements
Similar presentations
CPS216: Data-Intensive Computing Systems Data Access from Disks Shivnath Babu.
Advertisements

- Dr. Kalpakis CMSC Dr. Kalpakis 1 Outline In implementing DBMS we need to answer How should the system store and manage very large amounts of data?
Storing Data: Disks and Files: Chapter 9
CS 277 – Spring 2002Notes 21 CS 277: Database System Implementation Notes 02: Hardware Arthur Keller.
CS 245Notes 21 CS 245: Database System Principles Notes 02: Hardware Hector Garcia-Molina.
Data Storage John Ortiz. Lecture 17Data Storage2 Overview  Database stores data on secondary storage  Disk has distinct storage and access characteristics.
The Memory Hierarchy fastest, perhaps 1Mb
Storage. The Memory Hierarchy fastest, but small under a microsecond, random access, perhaps 2Gb Typically magnetic disks, magneto­ optical (erasable),
CS4432: Database Systems II Data Storage - Lecture 2 (Sections 13.1 – 13.3) Elke A. Rundensteiner.
1 Advanced Database Technology February 12, 2004 DATA STORAGE (Lecture based on [GUW ], [Sanders03, ], and [MaheshwariZeh03, ])
IELM 230: File Storage and Indexes Agenda: - Physical storage of data in Relational DB’s - Indexes and other means to speed Data access - Defining indexes.
CPSC-608 Database Systems Fall 2008 Instructor: Jianer Chen Office: HRBB 309B Phone: Notes #6.
1 Storing Data: Disks and Files Yanlei Diao UMass Amherst Feb 15, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
CS 4432lecture #31 CS4432: Database Systems II Lecture #3 Professor Elke A. Rundensteiner.
1 Storage Hierarchy Cache Main Memory Virtual Memory File System Tertiary Storage Programs DBMS Capacity & Cost Secondary Storage.
1 CS143: Disks and Files. 2 System Architecture CPU Main Memory Disk Controller... Disk Word (1B – 64B) ~ x GB/sec Block (512B – 50KB) ~ x MB/sec System.
SECTIONS 13.1 – 13.3 Sanuja Dabade & Eilbroun Benjamin CS 257 – Dr. TY Lin SECONDARY STORAGE MANAGEMENT.
CS4432: Database Systems II Lecture 2 Timothy Sutherland.
CPSC-608 Database Systems Fall 2009 Instructor: Jianer Chen Office: HRBB 309B Phone: Notes #5.
Disks.
CPSC-608 Database Systems Fall 2010 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #5.
Physical Storage Organization. Advanced DatabasesPhysical Storage Organization2 Outline Where and How data are stored? –physical level –logical level.
CS 4432lecture #31 CS4432: Database Systems II Lecture #3 Using the Disk, and Disk Optimizations Professor Elke A. Rundensteiner.
1 CS222: Principles of Database Management Fall 2010 Professor Chen Li Department of Computer Science University of California, Irvine Notes 01.
SECTIONS 13.1 – 13.3 Sanuja Dabade & Eilbroun Benjamin CS 257 – Dr. TY Lin SECONDARY STORAGE MANAGEMENT.
CPSC 231 Secondary storage (D.H.)1 Learning Objectives Understanding disk organization. Sectors, clusters and extents. Fragmentation. Disk access time.
Introduction to Database Systems 1 The Storage Hierarchy and Magnetic Disks Storage Technology: Topic 1.
ICOM 6005 – Database Management Systems Design Dr. Manuel Rodríguez-Martínez Electrical and Computer Engineering Department Lecture 5 – Storage Organization.
CS4432: Database Systems II Data Storage (Better Block Organization) 1.
Chapter 2 Data Storage How does a computer system store and manage very large volumes of data ?
Lecture 11: DMBS Internals
Physical Storage Organization. Advanced DatabasesPhysical Storage Organization2 Outline Where and How are data stored? –physical level –logical level.
Chapter 111 Chapter 11: Hardware (Slides by Hector Garcia-Molina,
Chapter 2. Data Storage Chapter 2.
Database Management Systems,Shri Prasad Sawant. 1 Storing Data: Disks and Files Unit 1 Mr.Prasad Sawant.
External Storage Primary Storage : Main Memory (RAM). Secondary Storage: Peripheral Devices –Disk Drives –Tape Drives Secondary storage is CHEAP. Secondary.
File Processing : Storage Media 2015, Spring Pusan National University Ki-Joune Li.
1 Data Storage (Chap. 11) Based on Hector Garcia-Molina’s slides.
Physical Storage Organization. Advanced DatabasesPhysical Storage Organization2 Outline Where and How data are stored? –physical level –logical level.
11.1Database System Concepts. 11.2Database System Concepts Now Something Different 1st part of the course: Application Oriented 2nd part of the course:
DMBS Internals I. What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently.
CS4432: Database Systems II Data Storage 1. Storage in DBMSs DBMSs manage large amounts of data How does a DBMS store and manage large amounts of data?
Database Systems Disk Management Concepts. WHY DO DISKS NEED MANAGING? logical information  physical representation bigger databases, larger records,
CPSC-608 Database Systems Fall 2015 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #5.
CS 101 – Sept. 28 Main vs. secondary memory Examples of secondary storage –Disk (direct access) Various types Disk geometry –Flash memory (random access)
Section 13.2 – Secondary storage management (Former Student’s Note)
DBMS 2001Notes 2: Hardware1 Principles of Database Management Systems Pekka Kilpeläinen (after Stanford CS245 slide originals by Hector Garcia-Molina,
DMBS Internals I February 24 th, What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the.
DMBS Internals I. What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently.
DMBS Architecture May 15 th, Generic Architecture Query compiler/optimizer Execution engine Index/record mgr. Buffer manager Storage manager storage.
CPSC 231 Secondary storage (D.H.)1 Learning Objectives Understanding disk organization. Sectors, clusters and extents. Fragmentation. Disk access time.
CPS216: Advanced Database Systems Notes 03: Data Access from Disks Shivnath Babu.
1 Lecture 16: Data Storage Wednesday, November 6, 2006.
CS 245Notes 21 Database System Principles Notes 02: Hardware.
1 Query Processing Exercise Session 1. 2 The system (OS or DBMS) manages the buffer Disk B1B2B3 Bn … … Program’s private memory An application program.
Data Storage and Querying in Various Storage Devices.
File organization Secondary Storage Devices Lec#7 Presenter: Dr Emad Nabil.
TYPES OF MEMORY.
Storage and Disks.
Lecture 16: Data Storage Wednesday, November 6, 2006.
Database Management Systems (CS 564)
CS 554: Advanced Database System Notes 02: Hardware
CPSC-608 Database Systems
File Processing : Storage Media
Lecture 11: DMBS Internals
Lecture 9: Data Storage and IO Models
Sanuja Dabade & Eilbroun Benjamin CS 257 – Dr. TY Lin
File Processing : Storage Media
CPS216: Advanced Database Systems Notes 04: Data Access from Disks
CS 245: Database System Principles Notes 02: Hardware
Presentation transcript:

1 CSE232A: Database System Principles Hardware

Data + Indexes Database System Architecture Query ProcessingTransaction Management SQL query Parser Query Rewriter and Optimizer Execution Engine relational algebra View definitions Statistics & Catalogs & System Data query execution plan Buffer Manager Transaction Manager Calls from Transactions (read,write) Concurrency Controller Lock Table Recovery Manager Log Hardware aspects of storing and retrieving data

3 Memory Hierarchy Cache memory –On-chip and L2 –Caching outside control of DB system RAM –Addressable space includes virtual memory but DB systems avoid it Disk –Access speed & Transfer rate –Winchester, arrays,… Tertiary storage –Tapes, jukeboxes, DVDs Cost per byte Capacity Access Speed

4 Storage Cost access time (sec) cache electronic main electronic secondary magnetic optical disks online tape nearline tape & optical disks offline tape typical capacity (bytes)

5 Storage Cost access time (sec) cache electronic main electronic secondary magnetic optical disks online tape nearline tape & optical disks offline tape dollars/MB from Gray & Reuter

6 Volatile Vs Non-Volatile Storage Persistence important for transaction atomicity and durability Even if database fits in main memory changes have to be written in non- volatile storage Hard disk RAM disks w/ battery Flash memory

7 Cost of Disk Access: Non-trivial part of estimating performance on secondary storage How many blocks were accessed ? Clustered/consecutive ? Such complexities also apply to flash, even main memory Learn to analyze them when you make the next generation of secondary storage data structures

8 Moore’s Law: Different Rates of Improvement Lead to Reconsiderations Processor speed Main memory bit/$ Disk bit/$ RAM access speed Disk access speed Disk transfer rate Disk Transfer Rate Disk Access Time Clustered/sequential access-based algorithms become relatively better

9 Moore’s Law: Same Phenomenon Applies to RAM RAM Transfer Rate RAM Access Time Algorithms that access memory sequentially have better constant factors than algorithms that access randomly

10 Moore’s Law: Different Rates of Improvement Cache Capacity RAM Capacity Disk Access Time Cost of “miss” increases

11 Focus on: “Typical Disk” Terms: Platter, Head, Actuator Cylinder, Track Sector (physical), Block (logical), Gap … Disk Controller BUS

12 Top View Track Gap Sector Block (typically multiple sectors) Often different numbers of sectors per track

13 “Typical” Numbers Diameter: 1 inch  15 inches Cylinders:100  Surfaces:1 (CDs)  (Tracks/cyl) 2 (floppies)   5 (typical hd)  30 Sector Size:512B  50K Capacity:360 KB (old floppy)  200 GB

14 block x in memory ? I want block X Key performance metric: Time to fetch block Time = Seek Time (locate track) + Rotational Delay (locate sector)+ Transfer Time (fetch block) + Other (disk controller, …)

15 Seek Delay Track Where Head is Track Where Head must go

16 Rotational Delay Head Here Block I Want

17 Seek Time 3 or 5x x 1N Cylinders Traveled Time Few ms

18 Average Random Seek Time   SEEKTIME (i  j) S = N(N-1) N N i=1 j=1 j  i “Typical” S: 10 ms  40 ms

19 Average Rotational Delay R = 1/2 revolution “typical” R = 8.33 ms (7200 RPM) Assume we have to start reading from start of first sector

20 Transfer Rate: t “typical” t: 1  3 MB/second transfer time: block size t

21 Other Delays CPU time to issue I/O Contention for controller Contention for bus, memory “Typical” Value: 0

22 Homework Practice Problem Single surface Rotation speed 7200rpm 16,384 tracks 128 sectors/track 4096 bytes/sector 4 sectors/block (16,384 bytes/block) SEEKTIME (i  j) = [ (j-i)] μs Neglect gaps Calculate minimum, maximum, average time to fetch one block

23 Practice Problem: Minimum Time Head is at the start of the first sector of the block Just compute transfer time 4 sectors cover 4/128 of a track 1 full rotation takes 60/7200=8.33ms Transfer time is 8.33 * 4 /128 = 0.26ms

24 Practice Problem: Maximum Time Assume read must start at the first sector Head is at innermost, required track is the outermost Seek time = … Head just missed the beginning Rotational delay = … Transfer time = …

25 Practice problem: Average time Solve…

26 So far: Random Block Access What about: Reading “Next” block? Time to get = Block Size + Negligible block t - skip gap - switch track - once in a while, next cylinder

27 Rule ofRandom I/O: Expensive Thumb Sequential I/O: Much less Ex:1 KB Block »Random I/O:  20 ms. »Sequential I/O:  1 ms.

28 Practice Problem cont’d: Sustained Bandwidth over Track Assume required blocks are consecutive on single track What is the approximate sustained bandwidth of fetching consecutive blocks? 128 sectors/track * 4KB/sector in 8.33ms/track full rotation = 512KB/8.33ms = 61.46KB/ms

29 Suggested optimization Cluster data in consecutive blocks Give an extra point to algorithms that –exploit data clustering by avoiding “random” accesses –Read/write consecutive blocks

30 An Algorithm with Little Random Access: 2-Phase Merge Sort P KA D L E Z W J C R H Main Memory: 4 blocks SORT A DK P A DE K READ WRITE MERGE Y F X I P KA D L E Z W L DK P PW Z A DK P A DE K L DK P PW Z C F H I J R X Y A DK P A CD F … Improve by bringing max number of blocks in memory in Phase 2

31 Cost for Writing similar to Reading …. unless we want to verify! need to add (full) rotation + Block size t

32 To Modify a Block? To Modify Block: (a) Read Block (b) Modify in Memory (c) Write Block [(d) Verify?]

33 Block Address: Physical Device Cylinder # Surface # Sector Once upon a time DBs had access to such – now it is the OS’s domain

34 Optimizations (in controller or O.S.) Disk Scheduling Algorithms –e.g., elevator algorithm Pre-fetch Arrays

35 Double Buffering Problem: Have a File » Sequence of Blocks B1, B2 Have a Program » Process B1 » Process B2 » Process B3...

36 Single Buffer Solution (1) Read B1  Buffer (2) Process Data in Buffer (3) Read B2  Buffer (4) Process Data in Buffer...

37 SayP = time to process/block R = time to read in 1 block n = # blocks Single buffer time = n(P+R)

38 Double Buffering Memory: Disk: ABCDGEFA B done process A C B done

39 Say P  R What is processing time? P = Processing time/block R = IO time/block n = # blocks Double buffering time = R + nP Single buffering time = n(R+P) Improvement much more dramatic if consequtive blocks: …

40 Block Size Selection? Big Block  Amortize I/O Cost Big Block  Read in more useless stuff! and takes longer to read Unfortunately...

41 Trend memory prices drop and memory capacities increase, transfer rates increase Disk access times do not increase that much  blocks get bigger...

42 Summary Secondary storage, mainly disks I/O times I/Os should be avoided, especially random ones….. Summary