ICS 321 Fall 2011 Overview of Storage & Indexing (i) Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 11/9/20111Lipyeow.

Slides:



Advertisements
Similar presentations
Storing Data: Disk Organization and I/O
Advertisements

Disk Storage, Basic File Structures, and Hashing
Databasteknik Databaser och bioinformatik Data structures and Indexing (II) Fang Wei-Kleiner.
Storing Data: Disks and Files: Chapter 9
Tutorial 8 CSI 2132 Database I. Exercise 1 Both disks and main memory support direct access to any desired location (page). On average, main memory accesses.
Storage. The Memory Hierarchy fastest, but small under a microsecond, random access, perhaps 2Gb Typically magnetic disks, magneto­ optical (erasable),
CS4432: Database Systems II Data Storage - Lecture 2 (Sections 13.1 – 13.3) Elke A. Rundensteiner.
ICS 624 Spring 2011 Multi-Dimensional Clustering Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 1/24/20111Lipyeow.
Advance Database System
IELM 230: File Storage and Indexes Agenda: - Physical storage of data in Relational DB’s - Indexes and other means to speed Data access - Defining indexes.
1 Overview of Storage and Indexing Chapter 8 (part 1)
1 Storing Data: Disks and Files Yanlei Diao UMass Amherst Feb 15, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
METU Department of Computer Eng Ceng 302 Introduction to DBMS Disk Storage, Basic File Structures, and Hashing by Pinar Senkul resources: mostly froom.
Murali Mani Overview of Storage and Indexing (based on slides from Wisconsin)
1 Storage Hierarchy Cache Main Memory Virtual Memory File System Tertiary Storage Programs DBMS Capacity & Cost Secondary Storage.
SECTIONS 13.1 – 13.3 Sanuja Dabade & Eilbroun Benjamin CS 257 – Dr. TY Lin SECONDARY STORAGE MANAGEMENT.
CS4432: Database Systems II Lecture 2 Timothy Sutherland.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Chapter 13 Disk Storage, Basic File Structures, and Hashing.
1 Overview of Storage and Indexing Chapter 8 1. Basics about file management 2. Introduction to indexing 3. First glimpse at indices and workloads.
©Silberschatz, Korth and Sudarshan11.1Database System Concepts Chapter 11: Storage and File Structure Overview of Physical Storage Media Magnetic Disks.
DISK STORAGE INDEX STRUCTURES FOR FILES Lecture 12.
Introduction to Database Systems 1 The Storage Hierarchy and Magnetic Disks Storage Technology: Topic 1.
Layers of a DBMS Query optimization Execution engine Files and access methods Buffer management Disk space management Query Processor Query execution plan.
1 Lecture 7: Data structures for databases I Jose M. Peña
Lecture 11: DMBS Internals
Storage and File Structure. Architecture of a DBMS.
Physical Storage Organization. Advanced DatabasesPhysical Storage Organization2 Outline Where and How are data stored? –physical level –logical level.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 17 Disk Storage, Basic File Structures, and Hashing.
1 Physical Data Organization and Indexing Lecture 14.
1 IT420: Database Management and Organization Storage and Indexing 14 April 2006 Adina Crăiniceanu
Introduction to Database Systems 1 Storing Data: Disks and Files Chapter 3 “Yea, from the table of my memory I’ll wipe away all trivial fond records.”
1 Chapter 17 Disk Storage, Basic File Structures, and Hashing Chapter 18 Index Structures for Files.
Database Management Systems,Shri Prasad Sawant. 1 Storing Data: Disks and Files Unit 1 Mr.Prasad Sawant.
External Storage Primary Storage : Main Memory (RAM). Secondary Storage: Peripheral Devices –Disk Drives –Tape Drives Secondary storage is CHEAP. Secondary.
IDA / ADIT Databasteknik Databaser och bioinformatik Data structures and Indexing (I) Fang Wei-Kleiner.
1 Overview of Storage and Indexing Chapter 8 (part 1)
Storage and Indexing1 Overview of Storage and Indexing.
Physical Storage Organization. Advanced DatabasesPhysical Storage Organization2 Outline Where and How data are stored? –physical level –logical level.
Chapter Ten. Storage Categories Storage medium is required to store information/data Primary memory can be accessed by the CPU directly Fast, expensive.
Chapter 13 Disk Storage, Basic File Structures, and Hashing. Copyright © 2004 Pearson Education, Inc.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Overview of Storage and Indexing Chapter 8.
1 Overview of Storage and Indexing Chapter 8. 2 Data on External Storage  Disks: Can retrieve random page at fixed cost  But reading several consecutive.
Chapter 12 Query Processing (1) Yonsei University 2 nd Semester, 2013 Sanghyun Park.
DMBS Internals I. What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently.
File Structures. 2 Chapter - Objectives Disk Storage Devices Files of Records Operations on Files Unordered Files Ordered Files Hashed Files Dynamic and.
CS4432: Database Systems II Data Storage 1. Storage in DBMSs DBMSs manage large amounts of data How does a DBMS store and manage large amounts of data?
CPSC 461 Final Review I Hessam Zakerzadeh Dina Said.
Database Systems Disk Management Concepts. WHY DO DISKS NEED MANAGING? logical information  physical representation bigger databases, larger records,
DMBS Internals I February 24 th, What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the.
DMBS Internals I. What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently.
DMBS Architecture May 15 th, Generic Architecture Query compiler/optimizer Execution engine Index/record mgr. Buffer manager Storage manager storage.
Chapter 5 Record Storage and Primary File Organizations
COSC 6340: Disks 1 Disks and Files DBMS stores information on (“hard”) disks. This has major implications for DBMS design! » READ: transfer data from disk.
What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently and safely. Provide.
1 Lecture 16: Data Storage Wednesday, November 6, 2006.
1 CS122A: Introduction to Data Management Lecture #14: Indexing Instructor: Chen Li.
1 Components of the Virtual Memory System  Arrows indicate what happens on a lw virtual address data physical address TLB page table memory cache disk.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Disks and Files.
File Organization Record Storage and Primary File Organization
CS522 Advanced database Systems
Lecture 16: Data Storage Wednesday, November 6, 2006.
Database Management Systems (CS 564)
End of XQuery DBMS Internals
Oracle SQL*Loader
9/12/2018.
Disks and Files DBMS stores information on (“hard”) disks.
Lecture 11: DMBS Internals
Disk Storage, Basic File Structures, and Buffer Management
Disk storage Index structures for files
Lecture 18: DMBS Overview and Data Storage
Presentation transcript:

ICS 321 Fall 2011 Overview of Storage & Indexing (i) Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 11/9/20111Lipyeow Lim -- University of Hawaii at Manoa

Data Storage Main Memory – Random access – Volatile Flash Memory – Random access – Random writes are expensive Disk – Random access – Sequential access cheaper Tapes – Only sequential access – Archiving 11/9/2011Lipyeow Lim -- University of Hawaii at Manoa2 Cache Main Memory CPU Disk Tapes Optical Disks Tertiary Storage

Relational Tables on Disk Record -- a tuple or row of a relational table RIDs – record identifiers that uniquely identify a record across memory and disk Page – a collection of records that is the unit of transfer between memory and disk Bufferpool – a piece of memory used to cache data and index pages. Buffer Manager – a component of a DBMS that manages the pages in memory Disk Space Manager – a component of a DBMS that manages pages on disk 11/9/2011Lipyeow Lim -- University of Hawaii at Manoa3 record page disk Bufferpool

Magnetic Disks A disk or platter contains multiple concentric rings called tracks. Tracks of a fixed diameter of a spindle of disks form a cylinder. Each track is divided into fixed sized sectors (ie. “arcs”). Data stored in units of disk blocks (in multiples of sectors) An array of disk heads moves as a single unit. Seek time: time to move disk heads over the required track Rotational delay: time for desired sector to rotate under the disk head. Transfer time: time to actually read/write the data 11/9/2011Lipyeow Lim -- University of Hawaii at Manoa4 sector tracks spindle rotates Arms move over tracks

Accessing Data on Disk Seek time: time to move disk heads over the required track Rotational delay: time for desired sector to rotate under the disk head. – Assume uniform distribution, on average time for half a rotation Transfer time: time to actually read/write the data 11/9/2011Lipyeow Lim -- University of Hawaii at Manoa5 sector tracks spindle rotates Arms move over tracks

Example: Barracuda 1TB HDD (ST AS) What is the average time to read 2048 bytes of data ? = Seek time + rotational latency + transfer time = 8.5 msec msec + ( 2048 / 512 ) / 63 * ( msec / 7200 rpm ) = /9/2011Lipyeow Lim -- University of Hawaii at Manoa6 cylinders Bytes/cylinder16065*512 Blocks/cylinder8029 Sectors/track63 Heads255 Spindle Speed7200 rpm Average Latency 4.16 msec Random read seek time < 8.5 msec Random read Write time < 9.5 msec

File Organizations How do we organize records in a file ? Heap files: records not in any particular order – Good for scans Sorted files: records sorted by particular fields – scans in the sorted order or range scans in the sorted order Indexes: Data structures to organize records via trees or hashing. – Like sorted files, they speed up searches for a subset of records, based on values in certain (“search key”) fields – Updates are much faster than in sorted files 11/9/2011Lipyeow Lim -- University of Hawaii at Manoa7

Comparing File Organizations Consider an employee table with search key Scans : fetch all records in the file Point queries: find all employees who are 30 years old (let’s assume there’s only one such employee) Range queries: find all employees aged above 65. Insert a record. Delete a record given its RID. 11/9/2011Lipyeow Lim -- University of Hawaii at Manoa8

Analysis of Algorithms Computation model – CPU comparison operation – General: most expensive operation Worst-case – How bad can it get ? Average-case – Assumption about probabilities Analysis: count the number of some operation w.r.t. some input size Asymptotics: Big “O” – Constants don’t matter – 500n = O(n) 11/9/2011Lipyeow Lim -- University of Hawaii at Manoa9 SELECT * FROM Employees E WHERE E.age=30 For each tuple t in Employees { if (t.age==30) { output t } What is the worse case number of output tuples ? Assume input size : n tuples What is the worse case running time in the number of comparisons ?

Search Algorithms on Sorted Data 11/9/2011Lipyeow Lim -- University of Hawaii at Manoa10 SELECT * FROM Employees E WHERE E.age=30 For each tuple t in Employees { if (t.age==30) { output t } elsif ( t.age > 30 ) { exit } Tuples are sorted by age What is the worse case running time in the number of comparisons ? Shortcircuited Linear Search (lo, hi) = (0,n-1) mid = lo+(hi-lo)/2 While(hi>lo && E[mid].age!=30) { if (E[mid].age < 30) { lo=mid } else { hi=mid } mid = lo+(hi-lo)/2 } Output all satisfying tuples around E[mid] Binary Search

Analysis of Binary Search Number tuples searched per iteration = n, n/2, n/4,... 1 Hence the number of iterations = O( log n ) Therefore number of comparisons = O(log n) 11/9/2011Lipyeow Lim -- University of Hawaii at Manoa11 (lo, hi) = (0,n-1) mid = lo + (hi-lo)/2 While(hi>lo && E[mid].age!=30) { if (E[mid].age < 30) { lo=mid } else { hi=mid } mid = lo + (hi-lo)/2 } Output all satisfying tuples around E[mid] 16, 19, 19, 20, 26, 30, 30, 31, 36 2 cmp 16, 19, 19, 20, 30, 31, 31, 31, 36 What is the worse case?

Analysis of DBMS Algorithms 11/9/2011Lipyeow Lim -- University of Hawaii at Manoa12 SELECT * FROM Employees WHERE age=30 for each page p of Employees table { if (p not in bufferpool) { Fetch p from disk } for each tuple t in page p { if (t.age==40) { output t } What is the most expensive operation ? Worst case running time = + time to fetch all pages of Employees from disk + time to compare age + time to output result Table Scan How would you estimate these times ? What is the worst case number of disk access ?

Analysis Model B : number of data pages R : number of records per page D : average time to read/write a disk page – From previous calculations, if a page is 2K bytes, D is about 13 milliseconds C : average time to process a record – For the 1 Ghz processors we have today, assuming it takes 100 cyles, C is about 100 nanoseconds 11/9/2011Lipyeow Lim -- University of Hawaii at Manoa13

Table Scans on Heap Files 11/9/2011Lipyeow Lim -- University of Hawaii at Manoa14 SELECT * FROM Employees for each page p of Employees table { if (p not in bufferpool) { Fetch p from disk } for each tuple t in page p { output t if (t.age==30) { output t } if (t.age>20 && t.age<30) { output t } SELECT * FROM Employees WHERE age > 20 and age < 30 SELECT * FROM Employees WHERE age=30 O(B) pages get fetched + O(B*R) tuples processed