CPSC-608 Database Systems Fall 2015 Instructor: Jianer Chen Office: HRBB 315C Phone: 845-4259 Notes #5.



[Figure: DBMS architecture (Graduate Database). Components shown: database administrator (DDL language), database programmer (DML (query) language), DDL compiler, DML compiler, query execution engine, index/file manager, buffer manager, main memory buffers, file manager, secondary storage (disks), transaction manager, concurrency control, lock table, logging & recovery.]


Computer Memory Hierarchy

[Figure: A Typical Computer. Components shown: CPU, main memory, bus, disk controller, disks (secondary storage).]

Main Memory: fast; small capacity (gigabytes); volatile.
Disks: slow; large capacity (100s of gigabytes); non-volatile.

Typical Disk Terms: Platter, Head, Cylinder, Track, Sector, Gap …

[Figure: top view of a disk surface, showing Track, Sector, and Gap.]

A “typical” disk:
5 platters (thus 10 surfaces).
A surface has 20,000 tracks.
A track has 500 sectors (about a million bytes).
A sector has several thousand bytes.
The disk makes 5000 revolutions per minute (60,000 ms / 5000 = 12 ms per rotation, so about 10 milliseconds).
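The figures above can be combined into a rough capacity estimate. The sketch below is only illustrative: the slide says a sector has "several thousand bytes", so the value 2,048 bytes per sector is an assumption chosen here, and the function names are hypothetical.

```python
# Geometry of the "typical" disk from the slide.
PLATTERS = 5
SURFACES = 2 * PLATTERS            # two surfaces per platter
TRACKS_PER_SURFACE = 20_000
SECTORS_PER_TRACK = 500
RPM = 5_000

# ASSUMPTION: the slide only says "several thousand bytes" per sector.
BYTES_PER_SECTOR = 2_048


def track_bytes():
    """Bytes stored on one track."""
    return SECTORS_PER_TRACK * BYTES_PER_SECTOR


def disk_capacity_bytes():
    """Total bytes on the disk: surfaces x tracks x bytes per track."""
    return SURFACES * TRACKS_PER_SURFACE * track_bytes()


def rotation_time_ms():
    """Time for one full revolution, in milliseconds."""
    return 60_000 / RPM


if __name__ == "__main__":
    print("bytes per track:", track_bytes())
    print("capacity (GB):", disk_capacity_bytes() / 1e9)
    print("ms per rotation:", rotation_time_ms())
```

With the assumed sector size, a track holds about a million bytes and the whole disk a few hundred gigabytes, which is in line with the "100s of gigabytes" quoted for disks.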

Blocks: A (logical) block = one or several sectors (typical size 16 KB).
Block address: physical device, cylinder #, surface #, sector.

Disk Access Time
[Figure: a request "I want block X" is issued; how long until block X is in memory?]
Time = Seek Time + Rotational Delay + Transfer Time + Other

[Figure: Seek Time. Plot of time against cylinders traveled (1 to N); the time axis is marked x and 3x or 5x.]

Average Random Seek Time

$$S = \frac{\sum_{i=1}^{N} \sum_{j=1,\, j \neq i}^{N} \mathrm{SEEKTIME}(i \to j)}{N(N-1)}$$

typical seek time: 10 ms to 40 ms
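The average above is a plain mean over all ordered pairs of distinct cylinders. A minimal sketch, assuming a hypothetical linear seek model (a fixed startup cost plus a per-cylinder cost); the slide does not give the actual SEEKTIME function, so `linear_seek_ms` and its parameters are made up for illustration only.

```python
def average_seek_time(n_cylinders, seek_time):
    """Mean of seek_time(i, j) over all ordered pairs i != j, 1 <= i, j <= N.

    Requires n_cylinders >= 2, since the divisor is N(N-1).
    """
    total = 0.0
    for i in range(1, n_cylinders + 1):
        for j in range(1, n_cylinders + 1):
            if i != j:
                total += seek_time(i, j)
    return total / (n_cylinders * (n_cylinders - 1))


def linear_seek_ms(i, j, startup_ms=1.0, ms_per_cylinder=0.01):
    """HYPOTHETICAL seek model: startup cost plus a cost per cylinder traveled."""
    return startup_ms + ms_per_cylinder * abs(i - j)


if __name__ == "__main__":
    # Small N keeps the O(N^2) double loop fast.
    print(average_seek_time(1_000, linear_seek_ms))
```

The double loop mirrors the double sum directly; for a real disk with tens of thousands of cylinders one would use a closed form or sampling instead.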

Rotational Delay
[Figure: the head is at one position on the track ("Head here"); the target block is at another ("Block I want").]
Average Rotational Delay R = 1/2 revolution; typical rotational delay = 8 ms

Transfer Rate: typical: t = 80 MB/second = 80 KB/millisecond.
Transfer time: block size / t ~ 10/80 ms < 1 ms (for a block of about 10 KB).

Other Delays: CPU time to issue I/O; contention for controller; contention for bus, memory. Typical value: ≈ 0

Thus, reading a block of 16K bytes:
Time = Seek Time + Rotational Delay + Transfer Time + Other
~ 30 ms + 8 ms + 16/80 ms + 0 ~ 40 ms
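The estimate above is a sum of four terms, which can be sketched as a small function. The default values are the typical numbers quoted on the slides; the function name and parameter names are hypothetical.

```python
def block_access_time_ms(block_kb, seek_ms=30.0, rotational_ms=8.0,
                         transfer_kb_per_ms=80.0, other_ms=0.0):
    """Time = Seek Time + Rotational Delay + Transfer Time + Other (all in ms)."""
    transfer_ms = block_kb / transfer_kb_per_ms
    return seek_ms + rotational_ms + transfer_ms + other_ms


if __name__ == "__main__":
    print(block_access_time_ms(16))
```

For the 16 KB block this comes to a little over 38 ms, which the slide rounds to about 40 ms. Seek time and rotational delay dominate; the transfer itself contributes well under a millisecond.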

Main Memory: fast (read/write: nanoseconds); small capacity (gigabytes); volatile.
Disks: slow (read/write: 1~40 milliseconds); large capacity (100s of gigabytes); non-volatile.
Disks are about $10^5$ to $10^6$ times slower than main memory.

I/O Model of Computation
Dominance of I/O cost: if a block needs to be moved between disk and main memory, then the time taken to perform the read/write is much larger than the time likely to be used to manipulate that data in main memory.
Thus the number of disk block reads/writes is a good approximation of the running time of the entire computation.

Example. Sorting on disk
Each tuple (with a key) takes 160 bytes.
Each block holds 100 tuples (16 KB).
A relation R has 10M tuples (1.6 GB, 100K blocks).
Main memory has 100 MB (6400 blocks).
A disk read/write: 40 ms.

Main memory sorting algorithms
heap sort: 10M * log_2(10M) ≈ 230M disk block read/write = 9200M ms = 9.2M seconds > 100 days
quick sort and merge sort: 2 * 100K (blocks) * log_2(10M) ≈ 4.6M disk block read/write = 184M ms = 184K seconds > 2 days
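The two estimates above can be recomputed directly. This is a sketch under the slide's assumptions (10M tuples, 100K blocks, 40 ms per block I/O); the function names are hypothetical. The slide rounds log_2(10M) to 23, so the exact values come out slightly larger than the rounded ones.

```python
import math

TUPLES = 10_000_000   # tuples in relation R
BLOCKS = 100_000      # blocks occupied by R
IO_MS = 40            # cost of one disk block read/write, in ms


def heap_sort_ios():
    """One block I/O per tuple access: n * log2(n)."""
    return TUPLES * math.log2(TUPLES)


def quick_or_merge_sort_ios():
    """Read and write every block once per pass, log2(n) passes."""
    return 2 * BLOCKS * math.log2(TUPLES)


def ios_to_days(ios):
    """Convert a block I/O count to elapsed days at IO_MS per I/O."""
    seconds = ios * IO_MS / 1000
    return seconds / 86_400


if __name__ == "__main__":
    print("heap sort, days:", ios_to_days(heap_sort_ios()))
    print("quick/merge sort, days:", ios_to_days(quick_or_merge_sort_ios()))
```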

Two-phase Multiway MergeSort
Phase 1. Making sorted sublists
repeat: fill the main memory with remaining tuples in R and sort them; write the sorted sublist (of 6400 blocks) back to disk.
Phase 2. Merging
repeat: bring in a block from each of the sorted sublists; merge them and put the result in an “output” block; write the “output” block back to disk when it is full.
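The two phases above can be sketched as a small simulation. This is not the course's implementation: Python lists stand in for disk blocks, `io_count` counts simulated block reads and writes, all names are hypothetical, and every input block is assumed to be non-empty. The merge uses the standard-library `heapq.merge`.

```python
import heapq


def two_phase_multiway_merge_sort(blocks, memory_blocks):
    """Sort `blocks` (a list of lists of keys) using at most `memory_blocks`
    blocks of memory. Returns (sorted_blocks, io_count)."""
    if not blocks:
        return [], 0
    block_size = max(len(b) for b in blocks)  # assumes non-empty blocks
    io_count = 0

    # Phase 1: fill memory, sort, write each sorted sublist back to "disk".
    sublists = []
    for start in range(0, len(blocks), memory_blocks):
        chunk = blocks[start:start + memory_blocks]
        io_count += len(chunk)                       # read the chunk
        keys = sorted(k for block in chunk for k in block)
        run = [keys[i:i + block_size]
               for i in range(0, len(keys), block_size)]
        io_count += len(run)                         # write the sublist
        sublists.append(run)

    # Phase 2 needs one input block per sublist plus one output block.
    if len(sublists) + 1 > memory_blocks:
        raise ValueError("too many sublists for a single merge pass")

    def stream(run):
        """Yield keys of one sublist, reading one block at a time."""
        nonlocal io_count
        for block in run:
            io_count += 1                            # read one block
            yield from block

    # Phase 2: merge the sublists through a single output block.
    output, out_block = [], []
    for key in heapq.merge(*(stream(run) for run in sublists)):
        out_block.append(key)
        if len(out_block) == block_size:
            output.append(out_block)                 # write the full block
            io_count += 1
            out_block = []
    if out_block:
        output.append(out_block)                     # write the last block
        io_count += 1
    return output, io_count


if __name__ == "__main__":
    data = [[9, 3], [7, 1], [8, 2], [6, 4], [5, 0]]
    sorted_blocks, ios = two_phase_multiway_merge_sort(data, memory_blocks=3)
    print(sorted_blocks)
    print("block I/Os:", ios)
```

In the small example each of the 5 blocks is read and written once in each phase, so the simulated I/O count is 4 per block.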

[Figure sequence: Two-phase Multiway MergeSort, showing main memory and disk. First Phase: main memory is repeatedly filled from disk, sorted ("Sort it"), and the sorted sublist is written back to disk. Second Phase: one block per sublist is held in main memory and the blocks are merged.]

# sublists = 100K/6400 ≈ 16; thus, in phase 2, we can easily hold a block for each sublist in the main memory.
Disk block read/write: each block is read once and written once in each of the two phases, so 100K (blocks) * 4 = 400K disk block read/write = 16M ms = 16,000 seconds < 4.5 hours
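The same arithmetic as a short sketch, using the slide's numbers (100K blocks, 6400 blocks of main memory, 40 ms per block I/O). The function name and its return layout are hypothetical.

```python
import math


def two_phase_cost(total_blocks, memory_blocks, io_ms=40):
    """Return (sublists, fits_in_one_merge, block_ios, hours)."""
    sublists = math.ceil(total_blocks / memory_blocks)
    # Phase 2 needs one block per sublist plus one output block in memory.
    fits_in_one_merge = sublists + 1 <= memory_blocks
    # Each block is read and written once in each of the two phases.
    block_ios = 4 * total_blocks
    hours = block_ios * io_ms / 1000 / 3600
    return sublists, fits_in_one_merge, block_ios, hours


if __name__ == "__main__":
    print(two_phase_cost(100_000, 6_400))
```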


Read Chapter 13 for more details on memory structures