Em Spatiotemporal Database Laboratory Pusan National University File Processing : Storage Media 2004, Spring Pusan National University Ki-Joune Li.


1 File Processing: Storage Media
Spring 2004, Spatiotemporal Database Laboratory, Pusan National University, Ki-Joune Li

2 Major Functions of Computer
- Computation
- Storage
- Communication
- Presentation

3 Storage of Data
Major challenges:
- How to store and manage a large amount of data
  - Example: more than 100 petabytes for the EOS Project
- How to represent sophisticated data

4 Modeling and Representation of Real World
- Example: building a DB about Korean history
  - Very complicated, and dependent on viewpoint
- Database Course: 2004 Fall semester
- Figure labels: Real World, Computer World

5 Managing Large Volume of Data
Large volume of data:
- Cost of storage media
  - Not very important; negligible
- Processing time
  - Comparison between main memory and disk access time
    - RAM: several nanoseconds (10^-9 sec)
    - Disk: several milliseconds (10^-3 sec)
  - Time is the most valuable resource
  - Example: retrieving a piece of data from a 100-petabyte DB

6 Managing Large Volume of Data
Management of data:
- Secure management
  - From hacking
  - From any kind of disaster
- Consistency of data
  - Example: failure during a flight reservation transaction
  - Concurrent transactions

7 Goals of File Systems
To provide:
1. Efficient data structures for storing large and complex data
2. Access methods for rapid search
3. Query processing methods
4. Robust management of transactions

8 Memory Hierarchy
- A large data volume cannot be stored in main memory; it is kept in secondary memory
- Memory hierarchy (faster toward the top, cheaper toward the bottom):
  - Cache memory: 256 K bytes
  - Main memory: 512 M bytes
  - Secondary memory: 40 G bytes
  - Tertiary memory: 100 Tera bytes

9 Flash Memory
- Non-volatile: data survives power failure
- Data can be written at a location only once, but the location can be erased and written to again
  - Can support only a limited number of write/erase cycles
  - Erasing has to be done on an entire bank of memory
- Speed
  - Reads are roughly as fast as main memory
  - Writes are slow (a few microseconds); erase is slower
- Cost per unit of storage is roughly similar to main memory
- Widely used in embedded devices such as digital cameras

10 Optical Storage
- Non-volatile: data is read optically from a spinning disk using a laser
- CD-ROM (640 MB), DVD (4.7 to 17 GB), CD-R, DVD-R, CD-RW, DVD-RW, and DVD-RAM
- Speed: reads and writes are slower than with magnetic disk
- Juke-box systems
  - Large numbers of removable disks, few drives, and a mechanism for automatic loading/unloading of disks
  - For storing large volumes of data

11 Tape
- Non-volatile; primarily used for backup
- Speed: sequential access, much slower than disk
- Cost
  - Very high capacity (40 to 300 GB tapes available)
  - Tape can be removed from the drive
  - Drives are expensive
- Tape jukeboxes: hundreds of terabytes to even a petabyte

12 Data Access with Secondary Memory
- Access request goes to main memory first
  - If the data is in main memory: get data
  - If not in main memory: access the disk, load into main memory, then get data
- Hit ratio: r_h = n_h / n_a (n_h: accesses served from main memory, n_a: all accesses)
- How to increase the hit ratio?

13 Why Is the Hit Ratio So Important?
Example:
  for(int i=0;i<1000;i++) Nbytes=read(fd,buf,100);
- 1000 disk accesses?
- When r_h = 0: 1000 * 10^-2 sec = 10 sec
- When r_h = 1: 1000 * 10^-8 sec = 10^-5 sec
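The arithmetic on this slide can be sketched as a small C function. This is only an illustration: the function name and parameters are made up here, and the per-access costs (10^-2 sec for disk, 10^-8 sec for memory) are the values used on the slide.

```c
#include <assert.h>

/* Expected total time (seconds) for n_a accesses with hit ratio r_h.
   t_mem and t_disk are the assumed per-access costs in seconds. */
double total_access_time(long n_a, double r_h, double t_mem, double t_disk)
{
    assert(n_a >= 0 && r_h >= 0.0 && r_h <= 1.0);
    double n_h    = r_h * (double)n_a;     /* accesses served from main memory */
    double n_miss = (double)n_a - n_h;     /* accesses that go to disk */
    return n_h * t_mem + n_miss * t_disk;
}
```

Calling total_access_time(1000, 0.0, 1e-8, 1e-2) gives the 10 sec case, and a hit ratio of 1.0 gives the 10^-5 sec case.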

14 Physical Structure of Disk
Figure labels: 512 bytes; 200~400 sectors; 2 * n DF

15 Disk Access Time
t = t_S + t_R + t_T, where
- t_S: seek time
  - Time to reposition the head over the correct track
  - Average seek time is 1/2 the worst-case seek time
  - 4 to 10 milliseconds on typical disks
- t_R: rotational latency
  - Time to reposition the head over the correct sector
  - Average rotational latency: ½ r (to find index point) + ½ r = r
  - In case of 15000 rpm: r = 60 sec / 15000 = 4 msec
- t_T: transfer time
  - Time to transfer data from disk to main memory via channel
  - Proportional to the number of sectors to read
  - Real transfer time is negligible
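A minimal sketch of the formula t = t_S + t_R + t_T in C. The function names are hypothetical, and the rotational term follows this slide's convention of charging one full rotation r (other treatments charge only half a rotation on average).

```c
#include <assert.h>

/* Time for one full rotation r, in milliseconds. */
double rotation_ms(double rpm)
{
    assert(rpm > 0.0);
    return 60.0 * 1000.0 / rpm;
}

/* t = t_S + t_R + t_T in milliseconds.
   Assumption taken from the slide: t_R is one full rotation r. */
double disk_access_ms(double seek_ms, double rpm, double transfer_ms)
{
    return seek_ms + rotation_ms(rpm) + transfer_ms;
}
```

For a 15000 rpm disk, rotation_ms returns 4 msec; with a 6 msec seek and negligible transfer time, one access costs about 10 msec.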

16 Block-Oriented Disk Access
Example:
  for(int i=0;i<1000;i++) Nbytes=read(fd,buf,10);
- 1000 reads of 10 bytes each
- Buffer in main memory: 1 block (e.g. 1024 bytes)
- Number of disk accesses: 10 (each block serves about 100 reads)
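The count of disk accesses can be sketched as a ceiling division. The function below is illustrative only; it assumes the reads are sequential, start at a block boundary, and that one block is buffered at a time.

```c
#include <assert.h>

/* Disk accesses needed for n_reads sequential reads of read_size bytes
   when each disk access brings in one block of block_size bytes. */
long disk_accesses(long n_reads, long read_size, long block_size)
{
    assert(n_reads >= 0 && read_size > 0 && block_size > 0);
    long total_bytes = n_reads * read_size;
    return (total_bytes + block_size - 1) / block_size;   /* ceiling division */
}
```

With 1000 reads of 10 bytes and 1024-byte blocks, 10000 bytes span 10 blocks, so only 10 disk accesses are needed instead of 1000.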

17 Disk Block
- Unit of disk access
- Block size
  - Normally a multiple of the sector size
  - 1K, 4K, 16K or 64K bytes depending on configuration
- Why not a large block?
  - Limited by the size of available main memory
  - Too large: unnecessary accesses of sectors
    - e.g. only 100 bytes are needed, when the block size is 64K
    - 1 block: 128 sectors (about ½ track, ½ rotation, 2 msec)
    - Too wasteful

18 Buffer
- Buffer: temporary memory to transfer a chunk of data
  - 1 buffer: multiple blocks
- Page: a piece of buffer (main memory) corresponding to a block
- Page replacement when the buffer is full

19 Buffer Management: Read
- Read request: access request to main memory
  - If in main memory: get data
  - If not in main memory: access to disk, load into main memory (with replacement), get data

20 Buffer Management: Write
- Write request: access request to main memory
  - If in main memory: write data
  - If not in main memory: access to disk, load into main memory (with replacement), write data
- Write data on disk

21 Buffer Manager: Replacement Policy
- LRU: replace the block least recently used
  - Used by most operating systems and buffer managers
  - Idea behind LRU: use the past pattern of block references as a predictor of future references
- Prediction of future references
  - Queries have well-defined access patterns (such as sequential scans), and a database system can use the information in a user's query to predict future references
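A minimal sketch of LRU replacement in C, assuming a tiny pool of three pages and a logical clock to record recency. All names (BufferPool, pool_access, POOL_SIZE) are hypothetical; a real buffer manager would also handle dirty pages and pinning.

```c
#include <assert.h>

#define POOL_SIZE 3                 /* assumed number of buffer pages */

typedef struct {
    int  block_no[POOL_SIZE];       /* disk block held by each page, -1 if empty */
    long last_used[POOL_SIZE];      /* logical time of the most recent reference */
    long clock;                     /* incremented on every reference */
    long hits, accesses;            /* for computing the hit ratio */
} BufferPool;

void pool_init(BufferPool *p)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        p->block_no[i] = -1;
        p->last_used[i] = 0;
    }
    p->clock = 0;
    p->hits = 0;
    p->accesses = 0;
}

/* Reference a block. Returns 1 on a hit, 0 on a miss (simulated disk read). */
int pool_access(BufferPool *p, int block)
{
    assert(block >= 0);
    int victim = 0;
    p->clock++;
    p->accesses++;
    for (int i = 0; i < POOL_SIZE; i++) {
        if (p->block_no[i] == block) {          /* hit: refresh recency */
            p->last_used[i] = p->clock;
            p->hits++;
            return 1;
        }
        if (p->last_used[i] < p->last_used[victim])
            victim = i;                         /* least recently used so far */
    }
    p->block_no[victim] = block;                /* miss: replace the LRU page */
    p->last_used[victim] = p->clock;
    return 0;
}
```

Referencing blocks 1, 2, 3, 1, 4 in that order evicts block 2 on the last reference, because block 2 is the one used least recently.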

22 Buffer Manager: Replacement Policy
- Pinned block: memory block that is not allowed to be written back to disk
- Toss-immediate strategy: frees the space occupied by a block as soon as the final tuple of that block has been processed
- Most recently used (MRU) strategy: the system must pin the block currently being processed. After the final tuple of that block has been processed, the block is unpinned, and it becomes the most recently used block.

23 Logical Structure of File
- Figure labels: File, Block, Record (Tuple), Field
- Fixed-size record
- Variable-size record

24 Fixed Size Record
- Fixed size: fixed number of fields, and fixed size of each field
- Easy to implement
  - Disk address of the n-th record: (n-1)*s, where s is the record size
- Deletion of a record
  - Like an array, but without moving records
  - Free record list, or
  - Pointer to next record
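A short sketch of both ideas in C: the (n-1)*s address computation, and a free record list that reuses deleted slots without moving other records. The names and the MAX_RECORDS limit are assumptions for illustration, and s is taken to be the record size in bytes.

```c
#include <assert.h>

/* Byte offset of the n-th record (n starts at 1), record size s. */
long record_offset(long n, long s)
{
    assert(n >= 1 && s > 0);
    return (n - 1) * s;
}

#define MAX_RECORDS 8               /* assumed capacity of the file */

/* Free record list: deleted slots are chained together instead of
   shifting the remaining records. */
typedef struct {
    int next_free[MAX_RECORDS];     /* next free slot, -1 ends the chain */
    int free_head;                  /* first free slot, -1 if none */
} FreeList;

void freelist_init(FreeList *f) { f->free_head = -1; }

void record_delete(FreeList *f, int slot)
{
    assert(slot >= 0 && slot < MAX_RECORDS);
    f->next_free[slot] = f->free_head;   /* push the slot onto the chain */
    f->free_head = slot;
}

/* Returns a reusable slot, or -1 if no deleted slot is available. */
int record_allocate(FreeList *f)
{
    int slot = f->free_head;
    if (slot != -1)
        f->free_head = f->next_free[slot];
    return slot;
}
```

An insertion first asks record_allocate for a deleted slot and appends at the end of the file only when it returns -1.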

25 Variable Length Record
- Variable length due to a variable number of fields, or a variable size of each field
- Complicated to implement
- Implementation
  - Delimiter (or size, or pointer)
  - Slotted page
  - Fixed length
    - Overflow area
    - Reserved space

26 Delimiters
- Figure labels: record with delimiters; record with pointer/size
- Difficult to handle deletions and insertions

27 Slotted Page
- Records can be moved around within a page to keep them contiguous with no empty space between them; the entry in the header must be updated.
- Pointers should not point directly to a record, but to the entry for the record in the header.
- Figure label: Pointer to Record
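A minimal sketch of a slotted page in C, under simplifying assumptions: the header (slot entries) is kept in separate struct fields rather than inside the same byte array, deleted slot numbers are not reused, and PAGE_SIZE and MAX_SLOTS are arbitrary. External references hold a slot number, so records can move within the page without invalidating them.

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE 256
#define MAX_SLOTS 8

typedef struct {
    int offset;                 /* start of the record in data[], -1 if deleted */
    int length;
} Slot;

typedef struct {
    int  n_slots;               /* number of header entries in use */
    int  free_end;              /* records are packed from free_end to PAGE_SIZE */
    Slot slots[MAX_SLOTS];      /* header: one entry per record */
    char data[PAGE_SIZE];
} Page;

void page_init(Page *p) { p->n_slots = 0; p->free_end = PAGE_SIZE; }

/* Returns the slot number of the new record, or -1 if there is no room. */
int page_insert(Page *p, const char *rec, int len)
{
    assert(len > 0);
    if (p->n_slots == MAX_SLOTS || len > p->free_end) return -1;
    p->free_end -= len;
    memcpy(p->data + p->free_end, rec, (size_t)len);
    p->slots[p->n_slots].offset = p->free_end;
    p->slots[p->n_slots].length = len;
    return p->n_slots++;
}

/* Delete a record and close the hole by moving the records stored
   below it; only header entries change, slot numbers stay valid. */
void page_delete(Page *p, int s)
{
    assert(s >= 0 && s < p->n_slots && p->slots[s].offset >= 0);
    int off = p->slots[s].offset, len = p->slots[s].length;
    memmove(p->data + p->free_end + len, p->data + p->free_end,
            (size_t)(off - p->free_end));
    for (int i = 0; i < p->n_slots; i++)
        if (p->slots[i].offset >= 0 && p->slots[i].offset < off)
            p->slots[i].offset += len;          /* update header entry */
    p->free_end += len;
    p->slots[s].offset = -1;
    p->slots[s].length = 0;
}

/* Look up a record through its header entry; NULL if it was deleted. */
const char *page_get(const Page *p, int s, int *len)
{
    assert(s >= 0 && s < p->n_slots);
    if (p->slots[s].offset < 0) return NULL;
    *len = p->slots[s].length;
    return p->data + p->slots[s].offset;
}
```

After deleting the middle of three records, the record in slot 2 has moved, but page_get(&page, 2, &len) still finds it because only the header entry was updated.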

28 Reserved Space
- Maximum # of fields

29 Overflow Area
- First field of record
- Rest records

30 Binary Large Object Block (BLOB)
- If size (field) > size (block), e.g. image or video
- BLOB: type of field whose size is greater than the block size
  - cf. CLOB: text rather than binary
- Figure labels: record with Name, ID#, Photo; block size; Name and ID# with a contiguous reserved block for the BLOB

31 File System
Example:
  fd=open("data.txt",O_RDONLY,0);
  Nbytes=read(fd,buf,100);
- How does the OS process these functions?

32 i (index)-node: information about a file
- The i-node holds attributes and pointers to data blocks
- Attributes
  - Name
  - Permission
  - Ownership
  - Last updated date/time
  - Created date/time
  - Type: directory, data, special

33 i-node: Pointers to data blocks
- Attributes
- Pointers to data blocks (0-9: up to 40K bytes)
- Single indirect pointer: points to a pointer block (1024 blocks), whose entries point to data blocks
- Double indirect pointer: points to a pointer block (1024 blocks), whose entries point to further pointer blocks
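The layout on this slide can be sketched in C to see which pointer level serves a given byte offset. The constants are inferred from the slide and should be read as assumptions: 10 direct pointers covering 40K bytes imply 4K-byte blocks, and each pointer block is taken to hold 1024 pointers. Only the single and double pointer-block levels shown on the slide are modeled.

```c
#include <assert.h>

#define BLOCK_SIZE   4096LL     /* assumed: 10 direct pointers cover 40K bytes */
#define N_DIRECT     10LL       /* pointers 0-9 in the i-node */
#define PTRS_PER_BLK 1024LL     /* pointers held by one pointer block */

typedef enum { DIRECT, SINGLE_LEVEL, DOUBLE_LEVEL, OUT_OF_RANGE } PtrLevel;

/* Which kind of pointer in the i-node leads to the block holding offset? */
PtrLevel level_for_offset(long long offset)
{
    assert(offset >= 0);
    long long blk = offset / BLOCK_SIZE;        /* logical block number */
    if (blk < N_DIRECT) return DIRECT;
    blk -= N_DIRECT;
    if (blk < PTRS_PER_BLK) return SINGLE_LEVEL;
    blk -= PTRS_PER_BLK;
    if (blk < PTRS_PER_BLK * PTRS_PER_BLK) return DOUBLE_LEVEL;
    return OUT_OF_RANGE;
}

/* Largest file addressable with these three levels, in bytes. */
long long max_file_size(void)
{
    return (N_DIRECT + PTRS_PER_BLK + PTRS_PER_BLK * PTRS_PER_BLK) * BLOCK_SIZE;
}
```

Under these assumptions the first 40K bytes need no extra disk access for pointers, the next 4M bytes need one pointer block, and the maximum file size is a little over 4G bytes.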

34 Block configuration for i-node
- Block 0: boot block
- Block 1: super block
- Block 2: i-nodes 1 ~ 40
- Block 3: i-nodes 41 ~ 80
- ...
- Data blocks ...
- Reserved blocks: given by formatting
- Data blocks: user space

35 Implementation of File Hierarchy
Figure: locating /usr/lik/data.txt
- i-node 1 (i-node for root directory): attributes, pointer to the root directory block
  - Root directory block: 1 ., 1 .., 4 bin, 7 dev, 14 lib, 9 etc, 6 usr, 8 tmp
- i-node 6 (i-node for /usr): attributes, pointer to the directory block for /usr
  - Directory block for /usr: 6 ., 1 .., 19 lik, 30 kimmk, 54 parksh
- i-node 19 (i-node for /usr/lik): attributes, pointer to the directory block for /usr/lik
  - Directory block for /usr/lik: 19 ., 6 .., 107 data.txt
- i-node 107 (i-node for /usr/lik/data.txt): attributes, pointer to the data block for /usr/lik/data.txt
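The walk from the root i-node to /usr/lik/data.txt can be sketched in C with an in-memory table standing in for the directory blocks. The i-node numbers are read off the slide's figure; pairing them with names in the order shown is an assumption, and the table and function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* One directory entry: (directory i-node, name) -> i-node of the entry. */
typedef struct { int dir_inode; const char *name; int inode; } DirEntry;

static const DirEntry entries[] = {
    {1, "bin", 4}, {1, "dev", 7}, {1, "lib", 14},
    {1, "etc", 9}, {1, "usr", 6}, {1, "tmp", 8},
    {6, "lik", 19}, {6, "kimmk", 30}, {6, "parksh", 54},
    {19, "data.txt", 107},
};

#define ROOT_INODE 1

/* Search one directory for a name of the given length; -1 if absent. */
int dir_lookup(int dir_inode, const char *name, size_t len)
{
    for (size_t i = 0; i < sizeof entries / sizeof entries[0]; i++)
        if (entries[i].dir_inode == dir_inode &&
            strlen(entries[i].name) == len &&
            strncmp(entries[i].name, name, len) == 0)
            return entries[i].inode;
    return -1;
}

/* Resolve an absolute path to an i-node number, one component at a time. */
int resolve_path(const char *path)
{
    assert(path != NULL && path[0] == '/');
    int inode = ROOT_INODE;
    const char *p = path;
    while (*p != '\0' && inode != -1) {
        while (*p == '/') p++;              /* skip separators */
        size_t len = strcspn(p, "/");       /* length of next component */
        if (len == 0) break;
        inode = dir_lookup(inode, p, len);
        p += len;
    }
    return inode;
}
```

Each path component costs one directory lookup, which is why resolving /usr/lik/data.txt touches i-nodes 1, 6, 19 and finally 107.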

36 FAT (File Allocation Table)
- Used in DOS and MS-Windows 98
- Serves the same purpose as the i-node in UNIX

37 Processing open and read
  fd=open("data.txt",O_RDONLY,0);
  Nbytes=read(fd,buf,100);
open:
- Step 1: Find the i-node for "data.txt" via the i-node of the root or current directory
- Step 2: Check owner and access rights
- Step 3: Register it in OpenFileTable
  - Initialize entry values: e.g. offset, mode
  - fd: array index of this table
  - Some entries are reserved for stdin, stdout, stderr, etc.
read:
- Step 4: Check ownership and rights
- Step 5: Read 100 bytes to buf
  - Read 100 bytes from OpenFileTable[fd].offset
  - OpenFileTable[fd].offset += 100;
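Steps 3 and 5 can be sketched as a small simulation in C. This is not how any particular kernel implements them: the table layout, sim_open and sim_read are hypothetical, the file is an in-memory byte array, and the i-node lookup and permission checks (steps 1, 2 and 4) are left out.

```c
#include <assert.h>
#include <string.h>

#define MAX_OPEN      16
#define FIRST_USER_FD 3        /* entries 0, 1, 2 reserved for standard streams */

typedef struct {
    int         in_use;
    const char *data;          /* stands in for the file's data blocks */
    long        size;
    long        offset;        /* current position, advanced by each read */
} OpenFile;

static OpenFile open_file_table[MAX_OPEN];

/* Step 3: register the file and return the table index as fd (-1 if full). */
int sim_open(const char *data, long size)
{
    for (int fd = FIRST_USER_FD; fd < MAX_OPEN; fd++) {
        if (!open_file_table[fd].in_use) {
            open_file_table[fd].in_use = 1;
            open_file_table[fd].data = data;
            open_file_table[fd].size = size;
            open_file_table[fd].offset = 0;
            return fd;
        }
    }
    return -1;
}

/* Step 5: copy up to n bytes starting at the entry's offset, then advance it.
   Returns the number of bytes read, 0 at end of file, -1 for a bad fd. */
long sim_read(int fd, char *buf, long n)
{
    assert(n >= 0);
    if (fd < FIRST_USER_FD || fd >= MAX_OPEN || !open_file_table[fd].in_use)
        return -1;
    OpenFile *f = &open_file_table[fd];
    long left = f->size - f->offset;
    if (n > left) n = left;
    memcpy(buf, f->data + f->offset, (size_t)n);
    f->offset += n;
    return n;
}
```

Two consecutive sim_read calls on the same fd return consecutive parts of the file, because the offset lives in the table entry rather than in the caller.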

38 Data Dictionary: What Does It Contain?
The data dictionary (also called system catalog) stores metadata:
- Information about relations
  - Names of relations
  - Names and types of attributes of each relation
  - Names and definitions of views
- User and accounting information, including passwords
- Statistical and descriptive data
  - Number of tuples in each relation
- Physical file organization information
  - How the relation is stored (sequential/hash/…)
  - Physical location of the relation: operating system file name, or disk addresses of the blocks containing records of the relation
- Information about indices

39 Data Dictionary: How to Represent It
- Data structure: two alternatives
  - Specialized data structures designed for efficient access
  - A set of relations, with existing system features used to ensure efficient access
  - The latter alternative is usually preferred
- Relation-metadata (relation-name, number-of-attributes, storage-organization, location)
- Attribute-metadata (attribute-name, relation-name, domain-type, position, length)
- User-metadata (user-name, encrypted-password, group)
- Index-metadata (index-name, relation-name, index-type, index-attributes)
- View-metadata (view-name, definition)

40 Persistent Object
- Objects in a C++ program
  - Volatile object: disappears with the termination of the program
  - Persistent (non-volatile) object: keeps its state after the program terminates
- A necessary condition for object-oriented databases
- Object vs. record

41 OID: Object Identifier
- ID given by the system
  - The only way to identify an object
  - One ID per object
- Logical OID vs. physical OID
- Logical OID
  - No direct mapping from the OID to a physical location
  - Needs an index that maps an OID to the object's actual location
- Physical OID
  - Encodes the physical location of the object
  - Physical OIDs typically have the following parts:
    - a volume or file identifier
    - a page identifier within the volume or file
    - an offset within the page
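A physical OID with the three parts listed above can be sketched as a packed 64-bit value in C. The field widths (16-bit volume, 32-bit page, 16-bit offset) and the function names are assumptions chosen for illustration; real systems pick their own layouts.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint16_t volume;    /* volume or file identifier */
    uint32_t page;      /* page identifier within the volume or file */
    uint16_t offset;    /* offset within the page */
} PhysicalOID;

/* Pack the three parts into one 64-bit identifier. */
uint64_t oid_pack(PhysicalOID o)
{
    return ((uint64_t)o.volume << 48) | ((uint64_t)o.page << 16) | (uint64_t)o.offset;
}

/* Recover the physical location from the identifier; no index is needed. */
PhysicalOID oid_unpack(uint64_t v)
{
    PhysicalOID o;
    o.volume = (uint16_t)(v >> 48);
    o.page   = (uint32_t)(v >> 16);
    o.offset = (uint16_t)v;
    return o;
}
```

A logical OID, by contrast, would be an opaque number looked up in an index that returns such a location.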

42 Pointer Swizzling
Figure labels: object in main memory (pointer); object in disk space (OID); pointer swizzling between OID and pointer
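One common reading of pointer swizzling is that a reference stored as an OID is replaced by a main-memory pointer the first time it is followed. The C sketch below illustrates that reading only; the arrays standing in for disk and memory, and the names Ref, load_object and deref, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int oid; int value; } Object;

#define N_OBJECTS 2
static const Object disk[N_OBJECTS] = { {101, 7}, {102, 9} };  /* stand-in for disk space */
static Object memory[N_OBJECTS];                                /* stand-in for main memory */
static int disk_loads = 0;                                      /* counts simulated disk reads */

/* Load an object from "disk" into "memory"; NULL if the OID is unknown. */
Object *load_object(int oid)
{
    for (int i = 0; i < N_OBJECTS; i++) {
        if (disk[i].oid == oid) {
            memory[i] = disk[i];
            disk_loads++;
            return &memory[i];
        }
    }
    return NULL;
}

/* A reference is either an OID (unswizzled) or a pointer (swizzled). */
typedef struct {
    int swizzled;
    union { int oid; Object *ptr; } ref;
} Ref;

/* Follow a reference, swizzling it on first use. */
Object *deref(Ref *r)
{
    assert(r != NULL);
    if (!r->swizzled) {
        Object *o = load_object(r->ref.oid);
        if (o == NULL) return NULL;
        r->ref.ptr = o;          /* replace the OID with a memory pointer */
        r->swizzled = 1;
    }
    return r->ref.ptr;
}
```

The second deref on the same reference does not touch the simulated disk, which is the point of swizzling.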

