Spatiotemporal Database Laboratory, Pusan National University. File Processing: Storage Media. 2004, Spring. Ki-Joune Li.



Major Functions of a Computer
- Computation
- Storage
- Communication
- Presentation

Storage of Data: Major Challenges
- How to store and manage a large amount of data
  - Example: more than 100 petabytes for the EOS Project
- How to represent sophisticated data

Modeling and Representation of the Real World
- Example: building a DB about Korean history
  - Very complicated, and dependent on viewpoint
- Database course: 2004 Fall semester
[Figure: Real World and Computer World]

Managing Large Volume of Data
- Large volume of data
  - Cost of storage media: not very important, negligible
  - Processing time
    - Comparison between main memory and disk access time
      - RAM: several nanoseconds (10^-9 sec)
      - Disk: several milliseconds (10^-3 sec)
    - Time is the most valuable resource
    - Example: retrieving a piece of data from a 100-petabyte DB

Managing Large Volume of Data
- Management of data
  - Secure management
    - From hacking
    - From any kind of disaster
  - Consistency of data
    - Examples: failure during a flight reservation transaction; concurrent transactions

Goals of File Systems
To provide:
1. Efficient data structures for storing large and complex data
2. Access methods for rapid search
3. Query processing methods
4. Robust management of transactions

Memory Hierarchy
- A large data volume cannot be stored in main memory; it is kept in secondary memory
- Memory hierarchy (faster toward the top, cheaper toward the bottom)
  - Cache memory: 256 K bytes
  - Main memory: 512 M bytes
  - Secondary memory: 40 G bytes
  - Tertiary memory: 100 tera bytes

Flash Memory
- Non-volatile
  - Data survives power failure
  - Data can be written at a location only once, but the location can be erased and written to again
    - Supports only a limited number of write/erase cycles
    - Erasing has to be done on an entire bank of memory
- Speed
  - Reads are roughly as fast as main memory
  - Writes are slow (a few microseconds); erase is slower
- Cost per unit of storage is roughly similar to main memory
- Widely used in embedded devices such as digital cameras

Optical Storage
- Non-volatile: data is read optically from a spinning disk using a laser
  - CD-ROM (640 MB), DVD (4.7 to 17 GB)
  - CD-R, DVD-R
  - CD-RW, DVD-RW, and DVD-RAM
- Speed
  - Reads and writes are slower than with magnetic disk
- Juke-box systems
  - Large numbers of removable disks, few drives, and a mechanism for automatic loading/unloading of disks
  - For storing large volumes of data

Tape
- Non-volatile
  - Primarily used for backup
- Speed
  - Sequential access: much slower than disk
- Cost
  - Very high capacity (40 to 300 GB tapes available)
  - Tape can be removed from the drive
  - Drives are expensive
- Tape jukeboxes: hundreds of terabytes to even a petabyte

Data Access with Secondary Memory
- Access request
  - If the data is in main memory: get the data
  - If it is not in main memory: access the disk, load the data into main memory, then get the data
- Hit ratio: r_h = n_h / n_a (accesses served from main memory over all accesses)
- How to increase the hit ratio?

Why Is the Hit Ratio So Important?
- Example
    for(int i=0;i<1000;i++) Nbytes=read(fd,buf,100);
- 1000 disk accesses?
  - When r_h = 0 (every read goes to disk): 1000 * 10^-2 sec = 10 sec
  - When r_h = 1 (every read is served from main memory): 1000 main-memory accesses, which at the nanosecond scale given earlier take on the order of microseconds
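The arithmetic on this slide can be sketched as a small C function. The name `total_access_time` and its parameters are hypothetical. The 10^-2 sec disk cost used in the usage note is what the slide's 10-second total for 1000 accesses implies, and the 10^-9 sec memory cost is an assumption taken from the deck's earlier "several nanoseconds" figure for RAM.

```c
#include <assert.h>

/* Estimated total time (seconds) for n_a accesses, of which a fraction
 * r_h (the hit ratio) is served from main memory and the rest from disk.
 * t_mem and t_disk are assumed per-access costs in seconds. */
double total_access_time(long n_a, double r_h, double t_mem, double t_disk)
{
    assert(r_h >= 0.0 && r_h <= 1.0);
    return (double)n_a * (r_h * t_mem + (1.0 - r_h) * t_disk);
}
```

Calling `total_access_time(1000, 0.0, 1e-9, 1e-2)` gives about 10 seconds, while `total_access_time(1000, 1.0, 1e-9, 1e-2)` gives about a microsecond, which is why even a small drop in hit ratio dominates the running time.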

Physical Structure of Disk
[Figure: disk structure, with labels "512 bytes" (sector), "200~400 sectors" (track), and "2 * n DF"]

Disk Access Time
t = t_S + t_R + t_T, where
- t_S: seek time
  - Time to reposition the head over the correct track
  - Average seek time is 1/2 the worst-case seek time
  - 4 to 10 milliseconds on typical disks
- t_R: rotational latency
  - Time to reposition the head over the correct sector
  - Average rotational latency: 1/2 r (to find the index point) + 1/2 r = r, where r is the time of one rotation
  - At 15000 rpm: r = 1 * 60 sec / 15000 = 4 msec
- t_T: transfer time
  - Time to transfer data from disk to main memory via the channel
  - Proportional to the number of sectors to read
  - The real transfer time is negligible
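As a rough illustration of the formula t = t_S + t_R + t_T, the sketch below computes the rotation time from the spindle speed and adds the three terms. The function names are hypothetical. The code follows this slide in taking the average rotational latency as one full rotation r; many textbooks use 1/2 r instead, so treat that choice as the slide's assumption.

```c
/* Time for one full rotation, in milliseconds, at the given spindle speed. */
double rotation_ms(double rpm)
{
    return 60.0 * 1000.0 / rpm;
}

/* Access time t = t_S + t_R + t_T in milliseconds.
 * Following the slide, t_R is taken as one full rotation r
 * (1/2 r to find the index point + 1/2 r to reach the sector). */
double disk_access_ms(double seek_ms, double rpm, double transfer_ms)
{
    return seek_ms + rotation_ms(rpm) + transfer_ms;
}
```

With a 4 msec seek, 15000 rpm, and a negligible transfer time, `disk_access_ms(4.0, 15000.0, 0.0)` comes to about 8 msec per access.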

Block-Oriented Disk Access
- Example
    for(int i=0;i<1000;i++) Nbytes=read(fd,buf,10);
  - The program reads 10 bytes, 1000 times
- The disk is accessed one block at a time (e.g. 1024 bytes), through a buffer in main memory
- Number of disk accesses: one block serves about 100 reads of 10 bytes, so about 10 disk accesses are needed rather than 1000
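The count of disk accesses for a sequential read loop like this one can be sketched as follows. The function name `blocks_needed` is hypothetical, and the sketch assumes each block is fetched from disk exactly once and then served from the buffer.

```c
/* Number of block transfers needed to serve n_reads sequential reads of
 * read_size bytes each, when the disk is accessed block_size bytes at a
 * time and each block is fetched once (rounded up to whole blocks). */
long blocks_needed(long n_reads, long read_size, long block_size)
{
    long total = n_reads * read_size;
    return (total + block_size - 1) / block_size;
}
```

For the loop above, `blocks_needed(1000, 10, 1024)` is 10, compared with 1000 accesses if every `read` went to the disk on its own.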

Disk Block
- Unit of disk access
- Block size
  - Normally a multiple of the sector size
  - 1K, 4K, 16K, or 64K bytes depending on configuration
- Why not a large block?
  - Limited by the size of available main memory
  - Too large: unnecessary accesses of sectors
    - e.g. only 100 bytes are needed, but the block size is 64K
    - 1 block: 128 sectors (about 1/2 track, 1/2 rotation, 2 msec)
    - Too wasteful

Buffer
- Buffer
  - Temporary memory used to transfer a chunk of data
  - 1 buffer: multiple blocks
- Page
  - A piece of the buffer (main memory) corresponding to a block
- Page replacement when the buffer is full

Buffer Management: Read
- Read request (access request to main memory)
  - If the block is in main memory: get the data
  - If it is not in main memory: access the disk, load the block into main memory (with replacement), then get the data

Buffer Management: Write
- Write request (access request to main memory)
  - If the block is in main memory: write the data
  - If it is not in main memory: access the disk, load the block into main memory (with replacement), then write the data
- Write the data on disk

Buffer Manager: Replacement Policy
- LRU
  - Replace the least recently used block
  - Used by most operating systems and buffer managers
  - Idea behind LRU: use the past pattern of block references as a predictor of future references
- Prediction of future references
  - Queries have well-defined access patterns (such as sequential scans), and a database system can use the information in a user's query to predict future references
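A minimal sketch of LRU replacement, assuming a tiny buffer pool of three pages and a logical clock that records the last reference to each page. `BufferPool`, `pool_init`, and `pool_access` are hypothetical names, and a real buffer manager would also track dirty and pinned pages.

```c
#define POOL_SIZE 3            /* assumed number of buffer pages */

typedef struct {
    int  block[POOL_SIZE];     /* block number held by each page, -1 = empty */
    long last_used[POOL_SIZE]; /* logical time of the most recent reference */
    long clock;                /* incremented on every reference */
    long disk_reads;           /* number of misses (blocks loaded from disk) */
} BufferPool;

void pool_init(BufferPool *p)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        p->block[i] = -1;
        p->last_used[i] = 0;
    }
    p->clock = 0;
    p->disk_reads = 0;
}

/* Reference a block. Returns the block number evicted to make room,
 * or -1 if nothing was evicted (a hit, or an empty page was available). */
int pool_access(BufferPool *p, int block_no)
{
    int victim = 0;
    p->clock++;
    for (int i = 0; i < POOL_SIZE; i++) {
        if (p->block[i] == block_no) {   /* hit: just refresh the timestamp */
            p->last_used[i] = p->clock;
            return -1;
        }
    }
    /* Miss: take an empty page if there is one, else the least recently used. */
    for (int i = 0; i < POOL_SIZE; i++) {
        if (p->block[i] == -1) { victim = i; break; }
        if (p->last_used[i] < p->last_used[victim]) victim = i;
    }
    int evicted = p->block[victim];
    p->block[victim] = block_no;
    p->last_used[victim] = p->clock;
    p->disk_reads++;
    return evicted;
}
```

For the reference string 1, 2, 3, 1, 4 the last access evicts block 2, because block 1 was touched again more recently than block 2.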

Buffer Manager: Replacement Policy
- Pinned block: a memory block that is not allowed to be written back to disk
- Toss-immediate strategy: frees the space occupied by a block as soon as the final tuple of that block has been processed
- Most recently used (MRU) strategy: the system must pin the block currently being processed. After the final tuple of that block has been processed, the block is unpinned, and it becomes the most recently used block, which is the first candidate for replacement.

Logical Structure of File
- File, block, record (tuple), field
- Fixed size record vs. variable size record

Fixed Size Record
- Fixed size
  - Fixed number of fields, and fixed size of each field
- Easy to implement
  - Disk address of the n-th record: (n-1)*s, where s is the record size
- Deletion of a record
  - Like an array, but without moving records
    - Free record list, or
    - Pointer to next record
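The address formula and the free record list can be sketched together. This is a toy in-memory model under stated assumptions: `FixedFile`, `file_init`, `file_insert`, and `file_delete` are hypothetical names, the file holds at most eight records, and only the slot bookkeeping is modeled, not the record bytes.

```c
#include <assert.h>

#define MAX_RECORDS 8    /* assumed file capacity, in records */
#define NO_SLOT    (-1)

typedef struct {
    int in_use[MAX_RECORDS];    /* 1 if the slot holds a live record */
    int next_free[MAX_RECORDS]; /* link to the next free slot */
    int free_head;              /* first free slot, or NO_SLOT */
} FixedFile;

/* Byte offset of the n-th record (n counted from 1) with record size s. */
long record_offset(long n, long s)
{
    return (n - 1) * s;
}

void file_init(FixedFile *f)
{
    for (int i = 0; i < MAX_RECORDS; i++) {
        f->in_use[i] = 0;
        f->next_free[i] = (i + 1 < MAX_RECORDS) ? i + 1 : NO_SLOT;
    }
    f->free_head = 0;
}

/* Take a slot from the free list; returns its 1-based record number,
 * or NO_SLOT if the file is full. */
int file_insert(FixedFile *f)
{
    int slot = f->free_head;
    if (slot == NO_SLOT) return NO_SLOT;
    f->free_head = f->next_free[slot];
    f->in_use[slot] = 1;
    return slot + 1;
}

/* Delete record n without moving any other record: the slot is simply
 * pushed onto the front of the free list. */
void file_delete(FixedFile *f, int n)
{
    int slot = n - 1;
    assert(slot >= 0 && slot < MAX_RECORDS && f->in_use[slot]);
    f->in_use[slot] = 0;
    f->next_free[slot] = f->free_head;
    f->free_head = slot;
}
```

After inserting three records and deleting record 2, the next insert reuses slot 2, so the other records keep their addresses.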

Variable Length Record
- Variable length due to
  - Variable number of fields, or
  - Variable size of each field
- Complicated to implement
- Implementation
  - Delimiter, size, or pointer
  - Slotted page
  - Fixed length
    - Overflow area
    - Reserved space

Delimiters
[Figure: records separated by delimiters, and records located by pointer/size]
- Difficult to handle deletions and insertions

Slotted Page
- Records can be moved around within a page to keep them contiguous, with no empty space between them; the entry in the header must then be updated.
- Pointers should not point directly to a record, but to the entry for the record in the header.
[Figure: pointer to record]
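A minimal slotted-page sketch, assuming a 256-byte page and eight header entries. All names (`SlottedPage`, `page_insert`, `page_get`, `page_delete`) are hypothetical. For brevity the header arrays sit beside the data area rather than inside it; in a real page the header and the records share the same block.

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE 256   /* assumed page size, kept small for illustration */
#define MAX_SLOTS 8     /* assumed number of header entries */

typedef struct {
    int  offset[MAX_SLOTS]; /* header entry: record start, -1 = unused */
    int  length[MAX_SLOTS]; /* header entry: record length in bytes */
    int  free_end;          /* records occupy data[free_end .. PAGE_SIZE) */
    char data[PAGE_SIZE];
} SlottedPage;

void page_init(SlottedPage *p)
{
    for (int i = 0; i < MAX_SLOTS; i++) { p->offset[i] = -1; p->length[i] = 0; }
    p->free_end = PAGE_SIZE;
}

/* Store a record at the end of the free space; returns its slot number
 * (the stable handle that outside pointers should use), or -1 if full. */
int page_insert(SlottedPage *p, const void *rec, int len)
{
    if (len <= 0 || len > p->free_end) return -1;
    for (int s = 0; s < MAX_SLOTS; s++) {
        if (p->offset[s] == -1) {
            p->free_end -= len;
            memcpy(p->data + p->free_end, rec, (size_t)len);
            p->offset[s] = p->free_end;
            p->length[s] = len;
            return s;
        }
    }
    return -1;
}

/* Look a record up through its header entry; NULL if the slot is unused. */
const char *page_get(const SlottedPage *p, int slot)
{
    if (slot < 0 || slot >= MAX_SLOTS || p->offset[slot] == -1) return NULL;
    return p->data + p->offset[slot];
}

/* Delete a record and close the hole so records stay contiguous. Only the
 * header entries of the moved records change; slot numbers stay valid. */
void page_delete(SlottedPage *p, int slot)
{
    assert(page_get(p, slot) != NULL);
    int off = p->offset[slot], len = p->length[slot];
    memmove(p->data + p->free_end + len, p->data + p->free_end,
            (size_t)(off - p->free_end));
    for (int s = 0; s < MAX_SLOTS; s++)
        if (p->offset[s] != -1 && p->offset[s] < off) p->offset[s] += len;
    p->offset[slot] = -1;
    p->length[slot] = 0;
    p->free_end += len;
}
```

Because callers hold slot numbers rather than byte addresses, deleting one record can shift its neighbours without invalidating any outside reference.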

Reserved Space
[Figure label: maximum # of fields]

Overflow Area
[Figure labels: first field of record; rest records]

Binary Large Object (BLOB)
- If size(field) > size(block)
  - e.g. image or video
- BLOB: a type of field whose size is greater than the block size
  - cf. CLOB: text rather than binary
[Figure: a record with fields Name, ID#, and Photo, where Photo exceeds the block size; Name and ID# stay in the record, and contiguous blocks are reserved for the BLOB]

File System
- Example
    fd=open("data.txt",O_RDONLY,0);
    Nbytes=read(fd,buf,100);
  - How does the OS process these functions?

i (index)-node: information about a file
- Attributes
  - Name
  - Permission
  - Ownership
  - Last updated date/time
  - Created date/time
  - Type: directory, data, special
- Pointers to data blocks
[Figure: i-node pointing to data blocks]

i-node: Pointers to Data Blocks
- Attributes
- Direct pointers to data blocks (0-9: up to 40K bytes)
- Single indirect pointer: points to a pointer block (1024 blocks), whose entries point to data blocks
- Double indirect pointer: points to a pointer block (1024 blocks), whose entries point to further pointer blocks
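To make the pointer levels concrete, the sketch below classifies a byte offset by the kind of pointer that reaches it. The function name `pointer_level` is hypothetical. The 4096-byte block size is an assumption inferred from "0-9: up to 40K bytes" (10 pointers), and the 1024 entries per pointer block come from the slide.

```c
#define BLOCK_SIZE      4096LL  /* assumed: 10 direct pointers cover 40K bytes */
#define DIRECT_PTRS     10LL    /* direct pointers 0-9 */
#define PTRS_PER_BLOCK  1024LL  /* pointers held by one pointer block */

/* Returns 0 if the byte offset is reached through a direct pointer,
 * 1 through the single indirect pointer, 2 through the double indirect
 * pointer, and -1 if it lies outside what this layout can address. */
int pointer_level(long long offset)
{
    if (offset < 0) return -1;
    long long block = offset / BLOCK_SIZE;
    if (block < DIRECT_PTRS) return 0;
    block -= DIRECT_PTRS;
    if (block < PTRS_PER_BLOCK) return 1;
    block -= PTRS_PER_BLOCK;
    if (block < PTRS_PER_BLOCK * PTRS_PER_BLOCK) return 2;
    return -1;
}
```

Under these assumptions the first 40K bytes need no extra disk access for pointers, the next 4M bytes cost one pointer-block read, and everything beyond that costs two.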

Block Configuration for i-node
- Block 0: boot block
- Block 1: super block
- Block 2: i-nodes 1 ~ 40
- Block 3: i-nodes 41 ~ 80
- ...
- Data blocks ...
- The boot, super, and i-node blocks are reserved blocks, given by formatting; the data blocks are user space

Implementation of File Hierarchy
Example path: /usr/lik/data.txt
- i-node 1: i-node for the root directory (attributes)
- Root directory block: bin, dev, lib, etc, usr, tmp
- i-node 6: i-node for /usr (attributes)
- Directory block for /usr: ..., lik, kimmk, parksh
- i-node for /usr/lik (attributes)
- Directory block for /usr/lik: ..., data.txt
- i-node 107: i-node for /usr/lik/data.txt (attributes)
- Data block for /usr/lik/data.txt

FAT (File Allocation Table)
- DOS or MS-Windows 98
- Same purpose as the i-node in UNIX

fd=open("data.txt",O_RDONLY,0);
Nbytes=read(fd,buf,100);
- open
  - Step 1: Find the i-node for "data.txt", starting from the i-node of the root or current directory
  - Step 2: Check owner and access rights
  - Step 3: Register it in the OpenFileTable
    - Initialize entry values: e.g. offset, mode
    - fd: array index into this table
    - Some entries are reserved for stdin, stdout, stderr, etc.
- read
  - Step 4: Check ownership and rights
  - Step 5: Read 100 bytes to buf
    - Read 100 bytes starting from OpenFileTable[fd].offset
    - OpenFileTable[fd].offset += 100;
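Steps 3 and 5 can be imitated with a toy open file table. This is a sketch under loud assumptions: `toy_open` and `toy_read` are hypothetical stand-ins, not the real system calls; the "file" is an in-memory byte array instead of data blocks found through an i-node; and the permission checks of steps 2 and 4 are omitted.

```c
#include <string.h>

#define MAX_OPEN  16
#define FIRST_FD  3      /* entries 0..2 reserved, as for stdin/stdout/stderr */

typedef struct {
    const char *data;    /* stands in for the file's data blocks */
    long size;           /* file size in bytes */
    long offset;         /* current read position */
    int  in_use;
} OpenFile;

static OpenFile open_file_table[MAX_OPEN];

/* "open": register the file in the table and return its index as the fd. */
int toy_open(const char *contents, long size)
{
    for (int fd = FIRST_FD; fd < MAX_OPEN; fd++) {
        if (!open_file_table[fd].in_use) {
            open_file_table[fd].data = contents;
            open_file_table[fd].size = size;
            open_file_table[fd].offset = 0;   /* initialize entry values */
            open_file_table[fd].in_use = 1;
            return fd;
        }
    }
    return -1;   /* table full */
}

/* "read": copy up to n bytes starting at the current offset, then advance
 * the offset by the number of bytes actually read. */
long toy_read(int fd, char *buf, long n)
{
    if (fd < FIRST_FD || fd >= MAX_OPEN || n < 0) return -1;
    OpenFile *f = &open_file_table[fd];
    if (!f->in_use) return -1;
    long remaining = f->size - f->offset;
    if (n > remaining) n = remaining;
    memcpy(buf, f->data + f->offset, (size_t)n);
    f->offset += n;
    return n;
}
```

Reading a 250-byte file in 100-byte requests returns 100, 100, 50, and then 0, because each call starts where the previous one stopped.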

Data Dictionary: What Does It Contain?
- The data dictionary (also called system catalog) stores metadata
- Information about relations
  - names of relations
  - names and types of attributes of each relation
  - names and definitions of views
- User and accounting information, including passwords
- Statistical and descriptive data
  - number of tuples in each relation
- Physical file organization information
  - How the relation is stored (sequential/hash/…)
  - Physical location of the relation
    - operating system file name, or
    - disk addresses of blocks containing records of the relation
- Information about indices

Data Dictionary: How to Represent It
- Data structure: two alternatives
  - specialized data structures designed for efficient access
  - a set of relations, with existing system features used to ensure efficient access
- The latter alternative is usually preferred
  - Relation-metadata (relation-name, number-of-attributes, storage-organization, location)
  - Attribute-metadata (attribute-name, relation-name, domain-type, position, length)
  - User-metadata (user-name, encrypted-password, group)
  - Index-metadata (index-name, relation-name, index-type, index-attributes)
  - View-metadata (view-name, definition)

Persistent Object
- Objects in a C++ program
  - Volatile object: disappears with the termination of the program
  - Persistent object
    - Non-volatile object: keeps its state despite the termination of the program
    - A necessary condition for object-oriented databases
- Object vs. record

OID: Object Identifier
- ID given by the system
  - The only way to identify an object
  - One ID per object
- Logical OID vs. physical OID
  - Logical OID
    - No direct mapping from the OID to a physical location
    - Needs an index that maps an OID to the object's actual location
  - Physical OID
    - Encodes the physical location of the object
    - Physical OIDs typically have the following parts:
      - a volume or file identifier
      - a page identifier within the volume or file
      - an offset within the page
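One possible encoding of a physical OID is sketched below. The 16/32/16-bit split and the names `PhysicalOid`, `oid_pack`, and `oid_unpack` are assumptions made for illustration; the slide only lists the three parts and does not fix their widths.

```c
#include <stdint.h>

typedef struct {
    uint16_t volume;  /* volume or file identifier */
    uint32_t page;    /* page identifier within the volume or file */
    uint16_t offset;  /* offset within the page */
} PhysicalOid;

/* Pack the three parts into one 64-bit value: volume | page | offset. */
uint64_t oid_pack(PhysicalOid o)
{
    return ((uint64_t)o.volume << 48) | ((uint64_t)o.page << 16) | (uint64_t)o.offset;
}

/* Recover the three parts from a packed OID. */
PhysicalOid oid_unpack(uint64_t v)
{
    PhysicalOid o;
    o.volume = (uint16_t)(v >> 48);
    o.page   = (uint32_t)((v >> 16) & 0xFFFFFFFFu);
    o.offset = (uint16_t)(v & 0xFFFFu);
    return o;
}
```

Because the location is encoded in the identifier itself, no index lookup is needed to find the object, which is the trade-off against a logical OID: the object cannot move without changing its OID.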

Pointer Swizzling
[Figure: objects in disk space refer to each other by OID; when loaded into main memory, the OID is replaced by a memory pointer (pointer swizzling)]