CENG 351 File Structures


File Structures
How are the database tables stored on disk?
We will study data processing in databases:
– Storage of data
– Organization of data
– Access to data
– Processing of data

Where do File Structures fit in Computer Science?
Figure: layered diagram with the layers Application, DBMS, File system, Operating System, and Hardware.

Computer Architecture
Main Memory (RAM): data is manipulated here
– Semiconductors
– Fast, expensive, volatile, small
Secondary Storage: data is stored here
– Disks, tape
– Slow, cheap, stable, large
Data is transferred between main memory and secondary storage.

Advantages
– Main memory is fast
– Secondary storage is big (because it is cheap)
– Secondary storage is stable (non-volatile), i.e. data is not lost during power failures
Disadvantages
– Main memory is small. Many databases are too large to fit in main memory (MM).
– Main memory is volatile, i.e. data is lost during power failures.
– Secondary storage is slow (10,000 times slower than MM)

How fast is main memory?
Typical time for getting info from:
– Main memory: ~120 nanosec = 120 × 10^-9 sec
– Magnetic disks: ~30 millisec = 30 × 10^-3 sec
An analogy keeping the same time proportion as above:
– Looking at the index of a book: 20 sec
versus
– Going to the library: 58 days
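The proportion behind the book-versus-library analogy can be checked numerically. The sketch below assumes a main-memory access of about 120 ns (the figure consistent with the 58-day result) and a disk access of about 30 ms; the function names are illustrative only.

```cpp
#include <cassert>

// Ratio between a disk access and a main-memory access (both in seconds).
double slowdown_ratio(double disk_seconds, double memory_seconds) {
    assert(memory_seconds > 0.0);
    return disk_seconds / memory_seconds;
}

// Scale a human-scale "fast" action by the same ratio; result in days.
double scaled_days(double fast_action_seconds, double ratio) {
    const double seconds_per_day = 86400.0;
    return fast_action_seconds * ratio / seconds_per_day;
}
```

With 30 ms and 120 ns the ratio is about 250,000, so a 20-second lookup scales to roughly 58 days.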

Normal Arrangement
– Secondary storage (SS) provides reliable, long-term storage for large volumes of data.
– At any given time, we are usually interested in only a small portion of the data.
– This data is loaded temporarily into main memory, where it can be rapidly manipulated and processed.
– As our interests shift, data is transferred automatically between MM and SS, so the data we are focused on is always in MM.

Goal of file structures
– Minimize the number of trips to the disk needed to get the desired information.
– Group related information so that we are likely to get everything we need with only one trip to the disk.

File Systems
– Data is not scattered hither and thither on disk.
– Instead, it is organized into files.
– Files are organized into records.
– Records are organized into fields.

Example
A Sailors file may be a collection of sailor records, one record for each sailor.
Each sailor record may have several fields, such as:
– Sailor id
– Sailor name
– Rating
– Age
– Gender
– Address
– …
Typically, each record in a file has the same fields.

Secondary Storage Devices

Secondary Storage Devices
Two major types of storage devices:
1. Direct Access Storage Devices (DASDs)
– Magnetic disks: hard disks (high capacity, low cost per bit) and floppy disks (low capacity, slow, cheap)
– Solid state disks
– Optical disks: CD-ROM (compact disc, read-only memory) and DVD
2. Serial devices
– Magnetic tapes (very fast sequential access)

Magnetic Disks
– Bits of data (0’s and 1’s) are stored on circular magnetic platters called disks.
– A disk rotates rapidly (and never stops).
– A disk head reads and writes bits of data as they pass under the head.
– Often, several platters are organized into a disk pack (or disk drive).

Figure: top view of a 36 GB, 10,000 RPM, IBM SCSI server hard disk, with its top cover removed. Note the height of the drive and the 10 stacked platters. (The IBM Ultrastar 36ZX.)


Components of a Disk
Figure labels: platters, spindle, disk head, arm assembly, arm movement, tracks, sector.

Looking at a surface
Figure: surface of a disk showing tracks and sectors.

Organization of Disks
– A disk contains concentric tracks.
– Tracks are divided into sectors.
– A sector is the smallest addressable unit on a disk.
– Sectors are addressed by: surface #, cylinder (track) #, sector #

Accessing Data
– When a program reads a byte from the disk, the operating system locates the surface, track and sector containing that byte, and reads the entire sector into a special area in main memory called a buffer.
– The bottleneck of a disk access is moving the read/write arm. So it makes sense to store a file in tracks that are directly above/below each other on different surfaces, rather than in several tracks on the same surface.

Cylinders
– A cylinder is the set of tracks at a given radius of a disk pack, i.e. the set of tracks that can be accessed without moving the disk arm.
– All the information on a cylinder can be accessed without moving the read/write arm.

Figure: cylinders of a disk pack.

Estimating Capacities
– Track capacity = # of sectors/track * bytes/sector
– Cylinder capacity = # of tracks/cylinder * track capacity
– Drive capacity = # of cylinders * cylinder capacity
– Number of cylinders = # of tracks on a surface
Knowing these relationships allows us to compute the amount of disk space a file is likely to require.

Exercise
Store a file of records on a disk with the following characteristics:
– # of bytes per sector = 512
– # of sectors per track = 40
– # of tracks per cylinder = 12
– # of cylinders = 1331
Q1. How many cylinders does the file require if each data record requires 256 bytes?
Q2. What is the total capacity of the disk?
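The capacity relationships translate directly into code. The following is a minimal sketch, not a definitive implementation: the struct and function names are illustrative, and `cylinders_needed` assumes records are stored whole within sectors (no record spans a sector boundary). The exercise does not state the number of records, so any record count passed in is the caller's assumption.

```cpp
#include <cassert>
#include <cstdint>

struct DiskGeometry {
    std::uint64_t bytes_per_sector;
    std::uint64_t sectors_per_track;
    std::uint64_t tracks_per_cylinder;
    std::uint64_t cylinders;
};

// Track capacity = sectors/track * bytes/sector
std::uint64_t track_capacity(const DiskGeometry& g) {
    return g.sectors_per_track * g.bytes_per_sector;
}

// Cylinder capacity = tracks/cylinder * track capacity
std::uint64_t cylinder_capacity(const DiskGeometry& g) {
    return g.tracks_per_cylinder * track_capacity(g);
}

// Drive capacity = cylinders * cylinder capacity
std::uint64_t drive_capacity(const DiskGeometry& g) {
    return g.cylinders * cylinder_capacity(g);
}

// Cylinders needed for record_count fixed-size records, assuming whole
// records per sector (an assumption made here, not stated in the exercise).
std::uint64_t cylinders_needed(const DiskGeometry& g,
                               std::uint64_t record_count,
                               std::uint64_t record_bytes) {
    assert(record_bytes > 0 && record_bytes <= g.bytes_per_sector);
    std::uint64_t records_per_sector = g.bytes_per_sector / record_bytes;
    std::uint64_t sectors =
        (record_count + records_per_sector - 1) / records_per_sector;
    std::uint64_t sectors_per_cylinder =
        g.sectors_per_track * g.tracks_per_cylinder;
    return (sectors + sectors_per_cylinder - 1) / sectors_per_cylinder;
}
```

For the disk in the exercise, `drive_capacity` gives 327,106,560 bytes. With a hypothetical file of 20,000 records of 256 bytes each, `cylinders_needed` returns 21.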

Clusters
– Another view of sector organization is the one maintained by the O.S.’s file manager.
– It views the file as a series of clusters of sectors.
– The file manager uses a file allocation table (FAT) to map logical sectors of the file to the physical clusters.

Extents
– If there is a lot of room on a disk, it may be possible to make a file consist entirely of contiguous clusters. Then we say that the file is one extent (very good for sequential processing).
– If there isn’t enough contiguous space available to contain an entire file, the file is divided into two or more noncontiguous parts. Each part is an extent.

Fragmentation
Internal fragmentation: loss of space within a sector or a cluster.
1) Due to records not fitting exactly in a sector: e.g. sector size is 512 bytes and record size is 300 bytes. Either
– store one record per sector, or
– allow records to span sectors.
2) Due to the use of clusters: if the file size is not a multiple of the cluster size, then the last cluster will be only partially used.
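Both sources of internal fragmentation come down to a remainder. The sketch below uses illustrative function names and assumes that records never span sectors in the first case.

```cpp
#include <cassert>
#include <cstdint>

// Bytes left unused in each sector when only whole records are stored
// and no record spans a sector boundary.
std::uint64_t wasted_per_sector(std::uint64_t sector_bytes,
                                std::uint64_t record_bytes) {
    assert(record_bytes > 0 && record_bytes <= sector_bytes);
    return sector_bytes % record_bytes;
}

// Bytes left unused in the last cluster allocated to a file.
std::uint64_t wasted_in_last_cluster(std::uint64_t file_bytes,
                                     std::uint64_t cluster_bytes) {
    assert(cluster_bytes > 0);
    std::uint64_t remainder = file_bytes % cluster_bytes;
    return remainder == 0 ? 0 : cluster_bytes - remainder;
}
```

For the 512-byte sector and 300-byte record above, 212 bytes (about 41% of each sector) are lost if one record is stored per sector.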

Choice of cluster size
Some operating systems allow the system administrator to choose the cluster size.
– When to use a large cluster size?
– What about a small cluster size?

The Cost of a Disk Access
The time to access a sector in a track on a surface is divided into 3 components:
– Seek time: time to move the read/write arm to the correct cylinder
– Rotational delay (or latency): time it takes for the disk to rotate so that the desired sector is under the read/write head
– Transfer time: once the read/write head is positioned over the data, the time it takes to transfer the data

Seek time
Seek time is the time required to move the arm to the correct cylinder. It is the largest cost component. Typically:
– 5 ms (milliseconds) to move from one track to the next (track-to-track)
– 50 ms maximum (from inside track to outside track)
– 30 ms average (from one random track to another random track)

Average Seek Time (s)
– Since it is usually impossible to know exactly how many tracks will be traversed in every seek, we usually try to determine the average seek time (s) required for a particular file operation.
– If the starting and ending positions for each access are random, it turns out that the average seek traverses one third of the total number of cylinders.
– Manufacturers’ specifications for disk drives often list this figure as the average seek time for the drives.
– Most hard disks today have s of less than 10 ms, and high-performance disks have s as low as 7.5 ms.
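The one-third figure can be checked by brute force. The sketch below (the function name is illustrative) averages the distance |i - j| over all ordered pairs of start and target cylinders, each assumed to be chosen uniformly at random, and expresses the result as a fraction of the cylinder count.

```cpp
#include <cassert>
#include <cstdint>

// Mean seek distance as a fraction of the cylinder count n, when start
// and target cylinders are each uniformly random among n cylinders.
double mean_seek_fraction(std::uint32_t n) {
    assert(n > 0);
    std::uint64_t total = 0;
    // For each distance d there are 2 * (n - d) ordered pairs (i, j)
    // with |i - j| == d.
    for (std::uint64_t d = 1; d < n; ++d) {
        total += 2 * d * (n - d);
    }
    double pairs = static_cast<double>(n) * n;
    return static_cast<double>(total) / pairs / n;
}
```

The exact value is (n² - 1) / (3n²), which approaches 1/3 as the number of cylinders grows.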

Latency (rotational delay)
Latency is the time needed for the disk to rotate so the sector we want is under the read/write head.
Hard disks usually rotate at about 5000 rpm, which is one revolution per 12 msec.
Note:
– Min latency = 0
– Max latency = time for one disk revolution
– Average latency (r) = (min + max) / 2 = max / 2 = time for ½ disk revolution
Typically 6 to 8 ms average.
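The relationship between spindle speed and latency is a one-line conversion. A minimal sketch with illustrative function names:

```cpp
#include <cassert>

// Time for one full revolution, in milliseconds.
double revolution_ms(double rpm) {
    assert(rpm > 0.0);
    return 60000.0 / rpm;  // 60,000 ms per minute
}

// Average rotational delay: half a revolution.
double average_latency_ms(double rpm) {
    return revolution_ms(rpm) / 2.0;
}
```

At 5000 rpm this gives a 12 ms revolution and a 6 ms average latency, matching the figures above.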

Transfer Time
Transfer time is the time for the read/write head to pass over a block.
The transfer time is given by the formula:
Transfer time = (number of bytes transferred / number of bytes on a track) × rotation time
e.g. if there are 63 sectors per track, the time to transfer one sector would be 1/63 of a revolution.

Exercise
Given the following disk:
– 20 surfaces, 800 tracks/surface, 25 sectors/track, 512 bytes/sector
– 3600 rpm (revolutions per minute)
– 7 ms track-to-track seek time, 28 ms avg. seek time, 50 ms max seek time
Find:
a) Average latency
b) Disk capacity
c) Time to read the entire disk, one cylinder at a time

Solution
a) Average latency:
3600 rev/min = 60 rev/sec, so 1/60 sec/rev = 0.0167 sec = 16.7 ms
Average latency = r = 16.7/2 = 8.3 ms
b) Disk capacity:
25 * 512 * 800 * 20 = 204,800,000 bytes = 204.8 MB
c) Time to read the disk:
Track read time = 1 revolution time = 16.7 ms
Cylinder read time = 20 * 16.7 = 334 ms
Total read time = 800 cylinder reads + 799 cylinder switches = 800 * 334 ms + 799 * 7 ms ≈ 267.2 sec + 5.6 sec ≈ 272.8 sec
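The whole-disk read estimate can be sketched as follows. The struct and function names are illustrative; like the solution above, the sketch ignores the initial seek and rotational delay and charges one track-to-track seek per cylinder switch.

```cpp
#include <cassert>

struct Drive {
    int surfaces;
    int tracks_per_surface;
    int sectors_per_track;
    int bytes_per_sector;
    double rpm;
    double track_to_track_seek_ms;
};

// Time for one revolution, in milliseconds.
double revolution_ms(const Drive& d) {
    assert(d.rpm > 0.0);
    return 60000.0 / d.rpm;
}

// Total capacity in bytes.
long long capacity_bytes(const Drive& d) {
    return static_cast<long long>(d.sectors_per_track) * d.bytes_per_sector *
           d.tracks_per_surface * d.surfaces;
}

// Time to read every cylinder in turn: one revolution per track, plus a
// track-to-track seek between consecutive cylinders.
double full_read_ms(const Drive& d) {
    double cylinder_read = d.surfaces * revolution_ms(d);
    int cylinders = d.tracks_per_surface;
    return cylinders * cylinder_read +
           (cylinders - 1) * d.track_to_track_seek_ms;
}
```

Without rounding the revolution time to 16.7 ms, the total comes out near 272.3 seconds; the small difference from the hand calculation is due to rounding only.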

Exercise
Disk characteristics:
– Average seek time = 8 msec
– Average rotational delay = 3 msec
– Maximum rotational delay = 6 msec
– Spindle speed = 10,000 rpm
– Sectors per track = 170
– Sector size = 512 bytes
Q) What is the average time to read one sector?

Solution
Average time to read one sector: s + r + btt
What is btt? btt (block transfer time) = revolution time / # of sectors per track
Revolution time = 60/10000 = 0.006 sec = 6 ms
btt = 0.006/170 sec ≈ 0.035 ms
s + r + btt = 8 + 3 + 0.035 = 11.035 ms
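The same calculation as a small sketch, with illustrative function names and with one sector treated as one block:

```cpp
#include <cassert>

// Block (sector) transfer time: one revolution divided by the number of
// sectors on a track, in milliseconds.
double block_transfer_ms(double rpm, int sectors_per_track) {
    assert(rpm > 0.0 && sectors_per_track > 0);
    return 60000.0 / rpm / sectors_per_track;
}

// Average time to read one sector: seek + rotational delay + transfer.
double average_sector_read_ms(double avg_seek_ms, double avg_latency_ms,
                              double btt_ms) {
    return avg_seek_ms + avg_latency_ms + btt_ms;
}
```

Calling `average_sector_read_ms(8.0, 3.0, block_transfer_ms(10000.0, 170))` gives about 11.035 ms; seek time and rotational delay dominate, and the transfer itself is almost negligible.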

Sequential Reading
Given the following disk:
– s = 16 ms
– r = 8.3 ms
– Block transfer time (btt) = 0.84 ms
a) Calculate the time to read 10 sequential blocks
b) Calculate the time to read 100 sequential blocks

Solution
a) Reading 10 sequential blocks: s + r + 10 * btt = 16 + 8.3 + 10 * 0.84 = 32.7 ms
b) 100 blocks: s + r + 100 * btt = 16 + 8.3 + 100 * 0.84 = 108.3 ms

Random Reading
Given the same disk,
a) Calculate the time to read 10 blocks randomly
b) Calculate the time to read 100 blocks randomly

Solution
a) Reading 10 blocks randomly: 10 * (s + r + btt) = 10 * (16 + 8.3 + 0.84) = 251.4 ms
b) 100 blocks: 100 * (16 + 8.3 + 0.84) = 2514 ms

Fast Sequential Reading
– We assume that blocks are arranged so that there is no rotational delay in transferring from one track to another within the same cylinder. This is possible if consecutive track beginnings are staggered (like running races on circular race tracks).
– We also assume that consecutive blocks are arranged so that when the next block is on an adjacent cylinder, there is no rotational delay after the arm is moved to the new cylinder.
– Fast sequential reading: no rotational delay after finding the first block.

Consequently …
Reading b blocks:
i. Sequentially: s + r + b * btt ≈ b * btt (s + r is insignificant for large files)
ii. Randomly: b * (s + r + btt)
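The two cost formulas can be compared directly. A minimal sketch with illustrative function names; all times are in milliseconds:

```cpp
#include <cassert>

// Sequential read of `blocks` blocks: one seek, one rotational delay,
// then the blocks stream past the head.
double sequential_read_ms(int blocks, double s, double r, double btt) {
    assert(blocks >= 0);
    return blocks == 0 ? 0.0 : s + r + blocks * btt;
}

// Random read of `blocks` blocks: every block pays seek and latency.
double random_read_ms(int blocks, double s, double r, double btt) {
    assert(blocks >= 0);
    return blocks * (s + r + btt);
}
```

With s = 16, r = 8.3 and btt = 0.84, reading 100 blocks takes about 108.3 ms sequentially and 2514 ms randomly, a factor of roughly 23.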

Exercise
Given a file of 30,000 records, 1600 bytes each, and block size 2400 bytes, how does record placement affect sequential reading time?
i) Empty space in blocks.
ii) Records overlap block boundaries.

Solution
i) Empty space in blocks (one record per block):
b = # of blocks = n = # of records = 30,000
Time = 30,000 * 0.84 ms = 25.2 sec
ii) Records overlap block boundaries:
Bfr = blocking factor = 2400/1600 = 3/2
b = 30,000/1.5 = 20,000 blocks
Time = 20,000 * 0.84 ms = 16.8 sec (1/3 faster)
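The effect of record placement on block count can be sketched as below. The function names are illustrative, and the timing function assumes fast sequential reading, so only the block transfer time (0.84 ms per block in this solution) is charged.

```cpp
#include <cassert>
#include <cstdint>

// Blocks needed when only whole records are stored in each block
// (leftover space in a block stays empty).
std::uint64_t blocks_unspanned(std::uint64_t records,
                               std::uint64_t record_bytes,
                               std::uint64_t block_bytes) {
    assert(record_bytes > 0 && record_bytes <= block_bytes);
    std::uint64_t per_block = block_bytes / record_bytes;
    return (records + per_block - 1) / per_block;
}

// Blocks needed when records may overlap block boundaries.
std::uint64_t blocks_spanned(std::uint64_t records,
                             std::uint64_t record_bytes,
                             std::uint64_t block_bytes) {
    assert(block_bytes > 0);
    std::uint64_t total_bytes = records * record_bytes;
    return (total_bytes + block_bytes - 1) / block_bytes;
}

// Fast sequential read time in ms: transfer time only.
double fast_sequential_ms(std::uint64_t blocks, double btt_ms) {
    return static_cast<double>(blocks) * btt_ms;
}
```

For 30,000 records of 1600 bytes and 2400-byte blocks, the unspanned layout needs 30,000 blocks and the spanned layout 20,000, which reproduces the 25.2 sec and 16.8 sec figures.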

Exercise
Specifications of a 300 MB disk drive:
– Min seek time = 6 ms
– Average seek time = 18 ms
– Rotational delay = 8.3 ms
– Transfer rate = 16.7 ms/track or 1229 bytes/ms
– Bytes per sector = 512
– Sectors per track = 40
– Tracks per cylinder = 12
– Tracks per surface = 1331
– Interleave factor = 1
– Cluster size = 8 sectors
– Extent size = 5 clusters
Q) How long will it take to read a 2048 KB file that is divided into 8000 256-byte records?
i) Access the file sequentially
ii) Access the file randomly

Solution
First find the # of extents:
1 cluster = 8 sectors = 8 * 512 = 4096 bytes, so 16 records per cluster
File contains 8000/16 = 500 clusters
Extent size = 5 clusters = 1 track, so the file contains 100 extents => 100 tracks
i) Access the file sequentially:
For 1 track = s + r + track transfer time = 18 + 8.3 + 16.7 = 43 ms
100 tracks = 4300 ms = 4.3 sec
ii) Access the file randomly (8000 records):
For each record: s + r + read 1 cluster = 18 + 8.3 + 1/5 * 16.7 = 29.6 ms
8000 records => 8000 * 29.6 ms = 236.8 sec
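The steps of this solution can be sketched as follows. The names are illustrative, and the model follows the solution's own assumptions: each extent costs one seek and one rotational delay in the sequential case, and each record costs a seek, a rotational delay and one cluster transfer in the random case.

```cpp
#include <cassert>

struct FileLayout {
    int records_per_cluster;
    int clusters;
    int extents;
};

// Derive cluster and extent counts for a file of fixed-size records.
FileLayout layout(int records, int record_bytes, int sector_bytes,
                  int sectors_per_cluster, int clusters_per_extent) {
    assert(record_bytes > 0 && clusters_per_extent > 0);
    int cluster_bytes = sector_bytes * sectors_per_cluster;
    int per_cluster = cluster_bytes / record_bytes;
    assert(per_cluster > 0);
    int clusters = (records + per_cluster - 1) / per_cluster;
    int extents = (clusters + clusters_per_extent - 1) / clusters_per_extent;
    return FileLayout{per_cluster, clusters, extents};
}

// Sequential access: seek + latency + transfer, once per extent (ms).
double sequential_ms(int extents, double s, double r,
                     double extent_transfer_ms) {
    return extents * (s + r + extent_transfer_ms);
}

// Random access: seek + latency + one cluster transfer per record (ms).
double random_ms(int records, double s, double r,
                 double cluster_transfer_ms) {
    return records * (s + r + cluster_transfer_ms);
}
```

Using the unrounded per-record time of 29.64 ms, the random case comes to about 237.1 sec, versus 4.3 sec sequentially.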

Secondary Storage Devices: Magnetic Tapes

Characteristics
– No direct access, but very fast sequential access.
– Resistant to different environmental conditions.
– Easy to transport and store; cheaper than disk.
– Tape was once widely used to store application data; nowadays, it is mostly used for backups or archives.

Magnetic tapes
– A sequence of bits is stored on magnetic tape.
– For storage, the tape is wound on a reel.
– To access the data, the tape is unwound from one reel to another.
– As the tape passes the head, bits of data are read from or written onto the tape.

Figure: tape running from Reel 1 to Reel 2 past the read/write head.

Tracks
– Typically data on tape is stored in 9 separate bit streams, or tracks.
– Each track is a sequence of bits.
– Recording density = # of bits per inch (bpi). Typically 800 or 1600 bpi, with higher densities on some recent devices.

In detail
Figure: a ½" wide tape with 9 parallel tracks; each position across the tape holds 8 bits (= 1 byte) plus a parity bit.

Tape Organization
Figure: layout of a 2400' tape, showing the BOT marker, the header block (describes data blocks), data blocks made up of logical records, interblock gaps (for acceleration and deceleration of the tape), and the EOT marker.

Data Blocks and Records
– Each data block is a sequence of contiguous records.
– A record is the unit of data that a user’s program deals with.
– The tape drive reads an entire block of records at once.
– Unlike a disk, a tape starts and stops.
– When stopped, the read/write head is over an interblock gap.

Example: tape capacity
Given the following tape:
– Recording density = 1600 bpi
– Tape length = 2400'
– Interblock gap = ½"
– 512 bytes per record
– Blocking factor = 25
How many records can we write on the tape? (Ignore the BOT and EOT markers and the header block for simplicity.)

Solution
1 block = 512 * 25 = 12,800 bytes
12,800/1600 = 8 inches per block (with 9 parallel tracks, 1600 bpi stores 1600 bytes per inch)
8 + ½" gap = 8.5 inches (total space for 1 block)
Tape length = 2400 ft * 12 inches/ft = 28,800 inches
28,800/8.5 = 3388 blocks on tape (rounded down)
25 * 3388 = 84,700 records
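The tape capacity calculation can be sketched as a single function. The name and parameter list are illustrative. The sketch takes 1 foot = 12 inches (so a 2400 ft tape is 28,800 inches long), treats the recording density as bytes per inch of tape (one byte per position across the nine tracks), and ignores markers and the header block.

```cpp
#include <cassert>
#include <cstdint>

// Number of whole records that fit on a tape, given its length in feet,
// recording density in bytes per inch, interblock gap in inches, record
// size in bytes, and blocking factor (records per block).
std::uint64_t tape_record_capacity(double tape_feet, double bpi,
                                   double gap_inches,
                                   std::uint64_t record_bytes,
                                   std::uint64_t blocking_factor) {
    assert(bpi > 0.0 && record_bytes > 0 && blocking_factor > 0);
    double tape_inches = tape_feet * 12.0;
    double block_inches =
        static_cast<double>(record_bytes * blocking_factor) / bpi;
    // Only whole blocks (each followed by a gap) count.
    std::uint64_t blocks = static_cast<std::uint64_t>(
        tape_inches / (block_inches + gap_inches));
    return blocks * blocking_factor;
}
```

For the tape in the example (2400 ft, 1600 bpi, ½ inch gap, 512-byte records, blocking factor 25) this yields 3388 blocks and 84,700 records. A larger blocking factor wastes less tape on gaps.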

Secondary Storage Devices: CD-ROM

Physical Organization of CD-ROM
– Compact Disk – read only memory (write once)
– Data is encoded and read optically with a laser
– Can store around 600 MB of data
– Digital data is represented as a series of pits and lands:
– Pit = a little depression, forming a lower level in the track
– Land = the flat part between pits, or the upper level in the track

Organization of data
Reading a CD is done by shining a laser at the disc and detecting changing reflection patterns.
– 1 = change in height (land to pit or pit to land)
– 0 = a “fixed” amount of time between 1’s
Figure: a track profile alternating LAND, PIT, LAND, PIT, LAND.
Note: we cannot have two 1’s in a row! => uses the Eight to Fourteen Modulation (EFM) encoding table.

Properties
Note that since 0's are represented by the length of time between transitions, we must travel at constant linear velocity (CLV) on the tracks.
– Sectors are organized along a spiral
– Sectors have the same linear length
– Advantage: uses all the storage space available
– Disadvantage: has to change rotational speed when seeking (slower towards the outside)

Addressing
– 1 second of play time is divided up into 75 sectors.
– Each sector holds 2 KB.
– 60 min CD: 60 min * 60 sec/min * 75 sectors/sec = 270,000 sectors = 540,000 KB ≈ 540 MB
– A sector is addressed by Minute:Second:Sector, e.g. 16:22:34
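The Minute:Second:Sector scheme maps onto a linear sector number. The sketch below uses illustrative names and assumes zero-based numbering of all three fields, which the slide does not specify.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kSectorsPerSecond = 75;
constexpr std::uint32_t kSecondsPerMinute = 60;

// Linear sector number for a Minute:Second:Sector address
// (zero-based fields assumed).
std::uint32_t sector_index(std::uint32_t minute, std::uint32_t second,
                           std::uint32_t sector) {
    assert(second < kSecondsPerMinute && sector < kSectorsPerSecond);
    return (minute * kSecondsPerMinute + second) * kSectorsPerSecond + sector;
}

// Total number of sectors on a disc with the given play time.
std::uint32_t total_sectors(std::uint32_t minutes) {
    return minutes * kSecondsPerMinute * kSectorsPerSecond;
}
```

Under these assumptions the address 16:22:34 corresponds to linear sector 73,684, and a 60-minute disc has 270,000 sectors.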

DVD (Digital Video Disc) Characteristics
– A DVD disc has the same physical size as a CD disc, but it can store from 4.7 to 17 GB of data.
– Like a CD disc, data is recorded on a DVD disc in a spiral trail of tiny pits separated by lands.
– The DVD’s larger capacity is achieved by making the pits smaller and the spiral tighter, and by recording the data on as many as four layers, two on each side of the disc.
– To read these tightly packed discs, lasers that produce a shorter-wavelength beam of light are required, together with more accurate aiming and focusing mechanisms.
– In fact, the focusing mechanism is the technology that allows data to be recorded on two layers. To read the second layer, the reader simply focuses the laser a little deeper into the disc, where the second layer of data is recorded.

Buffer Management

A journey of a byte
Suppose in our program we wrote:
    outfile << c;
– This causes a call to the file manager (a part of the O.S. responsible for I/O operations).
– The O.S. (file manager) makes sure that the byte is written to the disk.
Pieces of software/hardware involved in I/O:
– Application program
– Operating system / file manager
– I/O processor
– Disk controller

Application program
– Requests the I/O operation
Operating system / file manager
– Keeps tables for all opened files
– Brings the appropriate sector to the buffer
– Writes the byte to the buffer
– Instructs the I/O processor to write data from this buffer to the correct place on disk
– Note: the buffer is an exact image of a cluster on disk.
I/O processor
– A separate chip; runs independently of the CPU
– Finds a time when the drive is available to receive data and puts the data in the proper format for the disk
– Sends data to the disk controller
Disk controller
– A separate chip; instructs the drive to move the R/W head
– Sends the byte to the surface when the proper sector comes under the R/W head

Buffer Management
– Buffering means working with large chunks of data in main memory so the number of accesses to secondary storage is reduced.
– Today, we’ll discuss the system I/O buffers. These are beyond the control of application programs and are manipulated by the O.S.
– Note that the application program may implement its own “buffer”, i.e. a place in memory (variable, object) that accumulates large chunks of data to be later written to disk as a chunk.

System I/O Buffer
Figure: Secondary Storage, Buffer, Program. Data is transferred by blocks between secondary storage and the buffer, and by records between the buffer and the program. The buffer is temporary storage in MM for one block of data.

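Buffer Bottlenecks
Consider the following program segment:
    while (1) {
        infile >> ch;
        if (infile.fail()) break;
        outfile << ch;
    }
What happens if the O.S. used only one I/O buffer? => Buffer bottleneck: the single buffer would have to switch between holding input data and holding output data on every pass through the loop.
Most operating systems have an input buffer and an output buffer.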

Buffering Strategies
Double buffering: two buffers can be used to allow processing and I/O to overlap.
– Suppose that a program is only writing to a disk.
– The CPU wants to fill a buffer at the same time that I/O is being performed.
– If two buffers are used and I/O-CPU overlapping is permitted, the CPU can be filling one buffer while the other buffer is being transmitted to disk.
– When both tasks are finished, the roles of the buffers can be exchanged.
The actual management is done by the O.S.

Other Buffering Strategies
Multiple buffering: instead of two buffers, any number of buffers can be used to allow processing and I/O to overlap.
Buffer pooling:
– There is a pool of buffers.
– When a request for a sector is received, the O.S. first checks whether that sector is already in some buffer.
– If it is not there, it brings the sector into some free buffer. If no free buffer exists, it must choose an occupied buffer to replace (usually an LRU, least recently used, strategy is used).
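Buffer pooling with LRU replacement can be sketched in a few lines. This is a simplified illustration, not how any particular operating system implements it: the class name `BufferPool` and its methods are hypothetical, sectors are identified by plain integers, and no actual disk I/O or write-back of modified buffers is modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>

class BufferPool {
public:
    explicit BufferPool(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Returns true if the sector was already buffered (a hit), false if
    // it had to be brought in (a miss), evicting the LRU sector if full.
    bool request(long sector) {
        auto it = index_.find(sector);
        if (it != index_.end()) {
            // Hit: mark as most recently used by moving it to the front.
            lru_.splice(lru_.begin(), lru_, it->second);
            return true;
        }
        if (lru_.size() == capacity_) {
            // No free buffer: evict the least recently used sector.
            index_.erase(lru_.back());
            lru_.pop_back();
        }
        lru_.push_front(sector);
        index_[sector] = lru_.begin();
        return false;
    }

    // True if the sector currently occupies a buffer.
    bool holds(long sector) const { return index_.count(sector) != 0; }

private:
    std::size_t capacity_;
    std::list<long> lru_;  // front = most recently used
    std::unordered_map<long, std::list<long>::iterator> index_;
};
```

With a pool of two buffers, requesting sectors 1, 2, 1, 3 in that order evicts sector 2, because sector 1 was used more recently. The list plus hash map combination keeps both lookup and replacement at constant time per request.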