CPSC-608 Database Systems


CPSC-608 Database Systems Fall 2017 Instructor: Jianer Chen Office: HRBB 315C Phone: 845-4259 Email: chen@cse.tamu.edu Notes #7

[Figure: architecture of the Graduate Database DBMS. Labels: DDL language, DDL compiler, DML (query) language, DML compiler, query execution engine, index/file manager, buffer manager, main memory buffers, secondary storage (disks), file manager, transaction manager, concurrency control, lock table, logging & recovery, file administrator, database programmer.]


Computer Memory Hierarchy

A Typical Computer
[Figure: CPU, main memory, and disk controller attached to a bus; the disk controller connects to the secondary storage (disks).]
Video: https://www.youtube.com/watch?v=4iaxOUYalJU

Main Memory: fast; small capacity (gigabytes); volatile
Disks: slow; large capacity (100's of gigabytes); non-volatile

Typical Disk … Terms: Platter, Head, Cylinder, Track, Sector, Gap

Top View Track Sector Gap

Top view of a 36 GB, 10,000 RPM, IBM SCSI server hard disk (the IBM Ultrastar 36ZX), with its top cover removed. Note the height of the drive and the 10 stacked platters. Original image © IBM Corporation.
Video: https://www.youtube.com/watch?v=9eMWG3fwiEU

A “typical” disk
5 platters (thus 10 surfaces)
A surface has 20,000 tracks
A track has 500 sectors (on the order of a million bytes)
A sector has several thousand bytes
The disk makes 5000 revolutions per minute (so one rotation takes 12 milliseconds, i.e., on the order of 10 ms)
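The figures on this slide can be multiplied out as a quick sanity check. The sketch below is only illustrative: the sector size of 4096 bytes is an assumption (the slide says only "several thousand bytes"), and the function names are made up for this example.

```python
# Parameters of the "typical" disk from the slide.
SURFACES = 10                # 5 platters, two surfaces each
TRACKS_PER_SURFACE = 20_000
SECTORS_PER_TRACK = 500
SECTOR_BYTES = 4096          # ASSUMPTION: the slide says "several thousand bytes"
RPM = 5000


def track_bytes(sectors=SECTORS_PER_TRACK, sector_bytes=SECTOR_BYTES):
    """Bytes stored on one track."""
    return sectors * sector_bytes


def disk_capacity_bytes(surfaces=SURFACES, tracks=TRACKS_PER_SURFACE):
    """Total capacity: surfaces x tracks per surface x bytes per track."""
    return surfaces * tracks * track_bytes()


def rotation_time_ms(rpm=RPM):
    """Time for one full revolution, in milliseconds."""
    return 60_000 / rpm


if __name__ == "__main__":
    print(f"track:    {track_bytes() / 1e6:.2f} MB")
    print(f"disk:     {disk_capacity_bytes() / 1e9:.1f} GB")
    print(f"rotation: {rotation_time_ms():.1f} ms")
```

Under the assumed sector size, the capacity lands in the hundreds of gigabytes, which agrees with the "100's gigabytes" figure quoted for disks elsewhere in these notes.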

Blocks
A (logical) block = one or several sectors (typical size 16KB)
Block address: Physical device, Cylinder #, Surface #, Sector

Disk Access Time
Request: "I want block X". Result: block X in memory.
Time = Seek Time + Rotational Delay + Transfer Time + Other

Seek Time
[Figure: seek time plotted against the number of cylinders traveled (1 to N); the time axis is marked x and 3x or 5x.]

Average Random Seek Time
$$S = \frac{\sum_{i=1}^{N} \sum_{j=1,\, j \neq i}^{N} \mathrm{SEEKTIME}(i \to j)}{N(N-1)}$$
typical seek time: 10 ms to 40 ms
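The average random seek time is the mean of SEEKTIME(i → j) over all ordered pairs of distinct cylinders. A minimal sketch of that average follows; the linear seek model and its constants (`startup_ms`, `ms_per_cylinder`) are hypothetical placeholders, not values from the slides.

```python
def average_seek_time(n_cylinders, seek_time):
    """Mean of seek_time(i, j) over all ordered pairs i != j, 1 <= i, j <= N."""
    total = 0.0
    for i in range(1, n_cylinders + 1):
        for j in range(1, n_cylinders + 1):
            if i != j:
                total += seek_time(i, j)
    return total / (n_cylinders * (n_cylinders - 1))


def linear_seek_time(i, j, startup_ms=1.0, ms_per_cylinder=0.001):
    """HYPOTHETICAL model: fixed startup cost plus a cost per cylinder traveled."""
    return startup_ms + ms_per_cylinder * abs(i - j)


if __name__ == "__main__":
    # With seek_time = distance, the average distance works out to (N + 1) / 3.
    print(average_seek_time(100, lambda i, j: abs(i - j)))
    print(average_seek_time(100, linear_seek_time))
```

With a pure distance model the average is (N + 1) / 3, so a random seek travels about one third of the cylinders.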

Rotational Delay
[Figure: disk surface showing the current head position ("Head here") and the target block ("Block I want").]
Average Rotational Delay R = 1/2 revolution
typical rotational delay = 8 ms

Transfer Rate
typical: t = 80 MB/second = 80 KB/millisecond
transfer time: block size / t, e.g., 16 KB / (80 KB/ms) = 0.2 ms, well under 1 ms

Other Delays
CPU time to issue I/O
Contention for controller
Contention for bus, memory
Typical value: ≈ 0

Thus, reading a block of 16K bytes:
Time = Seek Time + Rotational Delay + Transfer Time + Other
~ 30 ms + 8 ms + 16/80 ms + 0 ~ 40 ms
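The sum above can be written as a small function. This is a sketch using the slide's typical values as defaults; the function and parameter names are chosen for illustration only.

```python
def block_access_time_ms(block_kb,
                         seek_ms=30.0,
                         rotational_delay_ms=8.0,
                         transfer_kb_per_ms=80.0,
                         other_ms=0.0):
    """Time = Seek Time + Rotational Delay + Transfer Time + Other (all in ms)."""
    transfer_ms = block_kb / transfer_kb_per_ms
    return seek_ms + rotational_delay_ms + transfer_ms + other_ms


if __name__ == "__main__":
    # A 16 KB block: the transfer term is tiny next to seek and rotation.
    print(f"{block_access_time_ms(16):.1f} ms")
```

The transfer term contributes only a fraction of a millisecond, which is why the slides round the total to about 40 ms and treat seek time and rotational delay as the dominant costs.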

Main Memory: fast (read/write: 10-100 nanoseconds); small capacity (gigabytes); volatile
Disks: slow (read/write: 1-40 milliseconds); large capacity (100's of gigabytes); non-volatile
Disks are about 10^5 to 10^6 times slower than main memory.

I/O Model of Computation
Dominance of I/O cost: if a block needs to be moved between disk and main memory, then the time taken to perform the read/write is much larger than the time likely to be used to manipulate that data in main memory.
Hence the number of disk block reads/writes is a good approximation of the running time of the entire computation.


Disk I/O Optimization: Example I
Optimizing Disk Seek Time (by disk controller):
On a (dynamic) sequence of disk I/O requests, how do we order the requests to minimize the seek time? (Seeking tracks is the most time-consuming component of disk I/O, and moving to a nearer track takes less time.)
Elevator Algorithm: Let the head move along its current direction, processing each request it encounters on the way, until no request lies ahead. Then reverse the direction.

Elevator Algorithm
Keep an UpperQ and a LowerQ, and the elevator's current Direction and Position.
Repeat
  1. If Direction = Up Then
       If UpperQ ≠ ∅ Then x = Min(UpperQ); Position = x; Delete(UpperQ, x);
       Else Direction = Down;
  2. Else
       If LowerQ ≠ ∅ Then x = Max(LowerQ); Position = x; Delete(LowerQ, x);
       Else Direction = Up.
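The elevator pseudocode can be sketched in Python as follows. This is a simplified illustration, not the controller's actual implementation: it assumes a static set of pending requests (the slides describe a dynamic sequence), and the names `elevator_order` and `total_travel` are invented for this example.

```python
def elevator_order(requests, start, direction="up"):
    """Return the order in which pending cylinder requests are served.

    ASSUMPTION: all requests are known up front; a real controller would
    keep inserting new requests into UpperQ / LowerQ while the head moves.
    """
    upper = sorted(r for r in requests if r >= start)   # UpperQ
    lower = sorted(r for r in requests if r < start)    # LowerQ
    order = []
    while upper or lower:
        if direction == "up":
            if upper:
                order.append(upper.pop(0))   # x = Min(UpperQ)
            else:
                direction = "down"           # nothing ahead: reverse
        else:
            if lower:
                order.append(lower.pop())    # x = Max(LowerQ)
            else:
                direction = "up"
    return order


def total_travel(order, start):
    """Total number of cylinders the head crosses when serving `order`."""
    travel, position = 0, start
    for cylinder in order:
        travel += abs(cylinder - position)
        position = cylinder
    return travel


if __name__ == "__main__":
    pending = [50, 10, 90, 30, 70]
    served = elevator_order(pending, start=40)
    print(served, total_travel(served, 40))
    print(pending, total_travel(pending, 40))   # first-come-first-served
```

Comparing `total_travel` for the elevator order against the arrival order shows how much head movement the reordering saves on a given request set.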

Disk I/O Optimization: Example II
Reducing the number of disk I/Os. Example: sorting on disk
Each tuple (with a key) takes 160 bytes
Each block holds 100 tuples (16KB)
A relation R has 10M tuples (1.6 GB, 100K blocks)
Main memory has 100MB (6400 blocks)
A disk read/write: 40 ms

Main memory sorting algorithms (applied directly to R on disk)
heap sort: 10M * log2(10M) = 230M disk block reads/writes = 9200M ms = 9,200,000 seconds > 100 days
quick sort and merge sort: 2 * 100K (blocks) * log2(10M) = 4.6M disk block reads/writes = 184M ms = 184,000 seconds > 2 days
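The arithmetic behind these two estimates can be reproduced as below. This is a sketch of the slide's calculation only; like the slide, it rounds log2(10M) ≈ 23.25 down to 23, and the function names are illustrative.

```python
import math

IO_MS = 40               # one disk block read/write
TUPLES = 10_000_000      # tuples in relation R
BLOCKS = 100_000         # blocks in relation R

# The slide rounds log2(10M) ~ 23.25 to 23.
LOG_FACTOR = round(math.log2(TUPLES))


def ios_to_seconds(n_ios, io_ms=IO_MS):
    """Convert a number of block I/Os to elapsed seconds."""
    return n_ios * io_ms / 1000


def heap_sort_ios():
    """Roughly one block I/O per tuple access: N * log2(N)."""
    return TUPLES * LOG_FACTOR


def merge_sort_ios():
    """Each of log2(N) passes reads and writes every block: 2 * B * log2(N)."""
    return 2 * BLOCKS * LOG_FACTOR


if __name__ == "__main__":
    for name, ios in [("heap sort", heap_sort_ios()),
                      ("quick/merge sort", merge_sort_ios())]:
        secs = ios_to_seconds(ios)
        print(f"{name}: {ios:,} I/Os, {secs:,.0f} s, {secs / 86_400:.1f} days")
```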

Two-phase Multiway MergeSort
Phase 1. Making sorted sublists
  repeat: fill the main memory with remaining tuples in R and sort them; write the sorted sublist (of 6400 blocks) back to disk
Phase 2. Merging
  bring in a block from each of the sorted sublists; merge them and put the result in an “output” block; write the “output” block back to disk when it is full
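The two phases can be simulated in memory to see where the block I/Os come from. The sketch below is a toy model under stated assumptions: the "disk" is a list of blocks (each a list of keys), I/Os are merely counted rather than performed, and the function name and interface are invented for illustration.

```python
import heapq


def two_phase_merge_sort(blocks, memory_blocks):
    """Sort a 'disk-resident' list of blocks; return (sorted_blocks, io_count).

    ASSUMPTIONS: `blocks` is non-empty, each block is a list of keys, and
    one memory buffer is reserved for the output block in phase 2.
    """
    block_size = max(len(b) for b in blocks)
    io = 0

    # Phase 1: fill memory, sort, write the sorted sublist back to "disk".
    sublists = []
    for start in range(0, len(blocks), memory_blocks):
        chunk = blocks[start:start + memory_blocks]
        io += len(chunk)                                   # block reads
        keys = sorted(k for block in chunk for k in block)
        run = [keys[i:i + block_size] for i in range(0, len(keys), block_size)]
        io += len(run)                                     # block writes
        sublists.append(run)

    if len(sublists) > memory_blocks - 1:
        raise ValueError("too many sublists to merge in a single pass")

    # Phase 2: hold one block per sublist, merge into an output block.
    def read_sublist(run):
        nonlocal io
        for block in run:
            io += 1                                        # block read
            yield from block

    output, out_block = [], []
    for key in heapq.merge(*(read_sublist(run) for run in sublists)):
        out_block.append(key)
        if len(out_block) == block_size:                   # output block full
            output.append(out_block)
            io += 1                                        # block write
            out_block = []
    if out_block:
        output.append(out_block)
        io += 1
    return output, io


if __name__ == "__main__":
    keys = [(i * 7) % 16 for i in range(16)]               # a permutation of 0..15
    disk = [keys[i:i + 2] for i in range(0, 16, 2)]        # 8 blocks of 2 keys
    result, io_count = two_phase_merge_sort(disk, memory_blocks=4)
    print(result, io_count)
```

When every block is full, each block is read and written once per phase, so the I/O count is four times the number of blocks in the relation.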

[Animation: Two-phase Multiway MergeSort, showing main memory and disk. First phase: main memory is repeatedly filled from disk, sorted ("Sort it"), and written back to disk as a sorted sublist. Second phase: main memory holds one block per sublist while the sublists are merged back to disk.]

Two-phase Multiway MergeSort
# sublists = 100K/6400 ≈ 16
Thus, in phase 2, we can easily hold a block for each sublist in the main memory.
Disk block reads/writes: 100K (blocks) * 4 = 400K disk block reads/writes = 16M ms = 16,000 seconds < 4.5 hours
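The same estimate, written as a small function so the parameters can be varied. This is a sketch of the calculation on the slide; the function name is illustrative, and the check that the sublists fit in memory assumes one buffer is kept for the output block.

```python
import math


def tpmms_cost(relation_blocks, memory_blocks, io_ms=40):
    """Return (number of sublists, block I/Os, seconds) for the two-phase sort."""
    sublists = math.ceil(relation_blocks / memory_blocks)
    # ASSUMPTION: phase 2 needs one buffer per sublist plus one output buffer.
    if sublists > memory_blocks - 1:
        raise ValueError("relation too large for a two-phase sort")
    ios = 4 * relation_blocks        # read + write in each of the two phases
    return sublists, ios, ios * io_ms / 1000


if __name__ == "__main__":
    sublists, ios, seconds = tpmms_cost(100_000, 6400)
    print(f"{sublists} sublists, {ios:,} I/Os, {seconds / 3600:.2f} hours")
```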


Read Chapter 13 for more details on memory structures