1 Lecture 16: Data Storage Wednesday, November 6, 2006.

Slides:



Advertisements
Similar presentations
Storing Data: Disk Organization and I/O
Advertisements

Storing Data: Disks and Files
Physical DataBase Design
Storing Data: Disks and Files: Chapter 9
Database Management Systems, R. Ramakrishnan and J. Gehrke1 External Sorting Chapter 11.
1 External Sorting Chapter Why Sort?  A classic problem in computer science!  Data requested in sorted order  e.g., find students in increasing.
CS4432: Database Systems II Data Storage - Lecture 2 (Sections 13.1 – 13.3) Elke A. Rundensteiner.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Query Evaluation Chapter 11 External Sorting.
CS 4432lecture #31 CS4432: Database Systems II Lecture #3 Professor Elke A. Rundensteiner.
1 Storage Hierarchy Cache Main Memory Virtual Memory File System Tertiary Storage Programs DBMS Capacity & Cost Secondary Storage.
SECTIONS 13.1 – 13.3 Sanuja Dabade & Eilbroun Benjamin CS 257 – Dr. TY Lin SECONDARY STORAGE MANAGEMENT.
CS4432: Database Systems II Lecture 2 Timothy Sutherland.
CPSC-608 Database Systems Fall 2009 Instructor: Jianer Chen Office: HRBB 309B Phone: Notes #5.
Using Secondary Storage Effectively In most studies of algorithms, one assumes the "RAM model“: –The data is in main memory, –Access to any item of data.
CPSC-608 Database Systems Fall 2010 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #5.
External Sorting 198:541. Why Sort?  A classic problem in computer science!  Data requested in sorted order e.g., find students in increasing gpa order.
SECTIONS 13.1 – 13.3 Sanuja Dabade & Eilbroun Benjamin CS 257 – Dr. TY Lin SECONDARY STORAGE MANAGEMENT.
Introduction to Database Systems 1 The Storage Hierarchy and Magnetic Disks Storage Technology: Topic 1.
CSE 444: Lecture 24 Query Execution Monday, March 7, 2005.
Layers of a DBMS Query optimization Execution engine Files and access methods Buffer management Disk space management Query Processor Query execution plan.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Storing Data: Disks and Files Chapter 9.
Lecture 11: DMBS Internals
Physical Storage Susan B. Davidson University of Pennsylvania CIS330 – Database Management Systems November 20, 2007.
Introduction to Database Systems 1 Storing Data: Disks and Files Chapter 3 “Yea, from the table of my memory I’ll wipe away all trivial fond records.”
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Storing Data: Disks and Files Chapter 7 “ Yea, from the table of my memory I ’ ll wipe away.
Lecture #19 May 13 th, Agenda Trip Report End of constraints: triggers. Systems aspects of SQL – read chapter 8 in the book. Going under the lid!
1 Chapter 17 Disk Storage, Basic File Structures, and Hashing Chapter 18 Index Structures for Files.
ICS 321 Fall 2011 Overview of Storage & Indexing (i) Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 11/9/20111Lipyeow.
Sorting.
Database Management Systems,Shri Prasad Sawant. 1 Storing Data: Disks and Files Unit 1 Mr.Prasad Sawant.
External Storage Primary Storage : Main Memory (RAM). Secondary Storage: Peripheral Devices –Disk Drives –Tape Drives Secondary storage is CHEAP. Secondary.
Chapter Ten. Storage Categories Storage medium is required to store information/data Primary memory can be accessed by the CPU directly Fast, expensive.
CPSC 404, Laks V.S. Lakshmanan1 External Sorting Chapter 13: Ramakrishnan & Gherke and Chapter 2.3: Garcia-Molina et al.
1 External Sorting. 2 Why Sort?  A classic problem in computer science!  Data requested in sorted order  e.g., find students in increasing gpa order.
DMBS Internals I. What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently.
CS4432: Database Systems II Data Storage 1. Storage in DBMSs DBMSs manage large amounts of data How does a DBMS store and manage large amounts of data?
1 Lecture 16: Data Storage Friday, November 5, 2004.
Lecture 24 Query Execution Monday, November 28, 2005.
CPSC-608 Database Systems Fall 2015 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #5.
Datalog Another formalism for expressing queries: - cleaner - closer to a “logic” notation - more convenient for analysis - equivalent in power to relational.
Introduction to Database Systems1 External Sorting Query Processing: Topic 0.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 External Sorting Chapters 13: 13.1—13.5.
DMBS Internals I February 24 th, What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the.
DMBS Internals I. What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently.
DMBS Architecture May 15 th, Generic Architecture Query compiler/optimizer Execution engine Index/record mgr. Buffer manager Storage manager storage.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 External Sorting Chapter 11.
External Sorting. Why Sort? A classic problem in computer science! Data requested in sorted order –e.g., find students in increasing gpa order Sorting.
COSC 6340: Disks 1 Disks and Files DBMS stores information on (“hard”) disks. This has major implications for DBMS design! » READ: transfer data from disk.
1 Lecture 15: Data Storage, Recovery Monday, February 13, 2006.
What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently and safely. Provide.
Data Storage and Querying in Various Storage Devices.
The very Essentials of Disk and Buffer Management.
CS522 Advanced database Systems
Lecture 16: Data Storage Wednesday, November 6, 2006.
Database Management Systems (CS 564)
End of XQuery DBMS Internals
CPSC-608 Database Systems
Disks and Files DBMS stores information on (“hard”) disks.
Lecture 11: DMBS Internals
Lecture 9: Data Storage and IO Models
Database Management Systems (CS 564)
Lecture 15: Midterm Review Data Storage
CSE 444: Lecture 25 Query Execution
Files and access methods
Lecture 18: DMBS Overview and Data Storage
Lecture 17: Data Storage and Recovery
Lecture 15: Data Storage Tuesday, February 20, 2001.
Lecture 20: Representing Data Elements
Presentation transcript:

1 Lecture 16: Data Storage Wednesday, November 6, 2006

2 Outline Data Storage The memory hierarchy – 11.2 Disks – 11.3 Merge sort – 11.4

3 The Memory Hierarchy Main Memory Disk Tape Volatile limited address spaces expensive average access time: nanoseconds 5-10 MB/S transmission rates 2-10 GB storage average time to access a block: msecs. Need to consider seek, rotation, transfer times. Keep records “close” to each other. 1.5 MB/S transfer rate 280 GB typical capacity Only sequential access Not for operational data Cache: access time 10 nano’s

4 Main Memory Fastest, most expensive Today: 512MB are common on PCs Many databases could fit in memory –New industry trend: Main Memory Database –E.g TimesTen, DataBlitz Main issue is volatility –Still need to store on disk

5 Secondary Storage Disks Slower, cheaper than main memory Persistent !!! Used with a main memory buffer

6 Buffer Management in a DBMS Data must be in RAM for DBMS to operate on it! Table of pairs is maintained. LRU is not always good. DB MAIN MEMORY DISK disk page free frame Page Requests from Higher Levels BUFFER POOL choice of frame dictated by replacement policy

7 Buffer Manager Manages buffer pool: the pool provides space for a limited number of pages from disk. Needs to decide on page replacement policy LRU Clock algorithm Enables the higher levels of the DBMS to assume that the needed data is in main memory.

8 Buffer Manager Why not use the Operating System for the task?? - DBMS may be able to anticipate access patterns - Hence, may also be able to perform prefetching - DBMS needs the ability to force pages to disk.

9 Tertiary Storage Tapes or optical disks Extremely slow: used for long term archiving only

10 The Mechanics of Disk Mechanical characteristics: Rotation speed (5400RPM) Number of platters (1-30) Number of tracks (<=10000) Number of bytes/track(10 5 ) Platters Spindle Disk head Arm movement Arm assembly Tracks Sector Cylinder

11 Disk Access Characteristics Disk latency = time between when command is issued and when data is in memory Disk latency = seek time + rotational latency –Seek time = time for the head to reach cylinder 10ms – 40ms –Rotational latency = time for the sector to rotate Rotation time = 10ms Average latency = 10ms/2 Transfer time = typically 10MB/s Disks read/write one block at a time (typically 4kB)

12 Average Seek Time Suppose we have N tracks, what is the average seek time ? Getting from cylinder x to y takes time |x-y|

13 The I/O Model of Computation In main memory algorithms we care about CPU time In databases time is dominated by I/O cost Assumption: cost is given only by I/O Consequence: need to redesign certain algorithms Will illustrate here with sorting

14 Sorting Illustrates the difference in algorithm design when your data is not in main memory: –Problem: sort 1Gb of data with 1Mb of RAM. Arises in many places in database systems: –Data requested in sorted order (ORDER BY) –Needed for grouping operations –First step in sort-merge join algorithm –Duplicate removal –Bulk loading of B+-tree indexes.

15 2-Way Merge-sort: Requires 3 Buffers Pass 1: Read a page, sort it, write it. –only one buffer page is used Pass 2, 3, …, etc.: – three buffer pages used. Main memory buffers INPUT 1 INPUT 2 OUTPUT Disk

16 Two-Way External Merge Sort Each pass we read + write each page in file. N pages in the file => the number of passes So total cost is: Improvement: start with larger runs Sort 1GB with 1MB memory in 10 passes Input file 1-page runs 2-page runs 4-page runs 8-page runs PASS 0 PASS 1 PASS 2 PASS 3 9 3,4 6,2 9,48,75,63,1 2 3,4 5,62,64,97,8 1,32 2,3 4,6 4,7 8,9 1,3 5,62 2,3 4,4 6,7 8,9 1,2 3,5 6 1,2 2,3 3,4 4,5 6,6 7,8

17 Can We Do Better ? We have more main memory Should use it to improve performance

18 Cost Model for Our Analysis B: Block size M: Size of main memory N: Number of records in the file R: Size of one record

19 External Merge-Sort Phase one: load M bytes in memory, sort –Result: runs of length M/R records M bytes of main memory Disk... M/R records

20 Phase Two Merge M/B – 1 runs into a new run Result: runs have now M/R (M/B – 1) records M bytes of main memory Disk... Input M/B Input 1 Input 2.. Output

21 Phase Three Merge M/B – 1 runs into a new run Result: runs have now M/R (M/B – 1) 2 records M bytes of main memory Disk... Input M/B Input 1 Input 2.. Output

22 Cost of External Merge Sort Number of passes: Think differently –Given B = 4KB, M = 64MB, R = 0.1KB –Pass 1: runs of length M/R = Have now sorted runs of records –Pass 2: runs increase by a factor of M/B – 1 = Have now sorted runs of 10,240,000,000 = records –Pass 3: runs increase by a factor of M/B – 1 = Have now sorted runs of records Nobody has so much data ! Can sort everything in 2 or 3 passes !

23 External Merge Sort The xsort tool in the XML toolkit sorts using this algorithm Can sort 1GB of XML data in about 8 minutes