Slide 1: Disks and Files
Vivek Pai, Princeton University

Slide 2: Why Files
Physical reality
- Block oriented
- Physical sector numbers
- No protection among users of the system
- Data might be corrupted if the machine crashes
Filesystem model
- Byte oriented
- Named files
- Users protected from each other
- Robust to machine failures

Slide 3: File Structures
- Byte sequence: read or write a number of bytes; unstructured or linear
- Record sequence: fixed- or variable-length records; read or write a number of records
- Tree: records with keys; read, insert, or delete a record (typically using a B-tree)

Slide 4: File Structures Today
Stream of bytes
- Simplest to implement in the kernel
- Easy to manipulate into other forms
- Little performance loss
More complicated structures
- Hardware assist fell out of favor
- Special-purpose hardware was slower and more costly

Slide 5: File Types
- ASCII: plain text
- A Unix executable file:
  - Header: magic number, sizes, entry point, flags
  - Text (code)
  - Data
  - Relocation bits
  - Symbol table
- Devices
- Everything else in the system

Slide 6: So What Makes Filesystems Hard?
- Files grow and shrink in pieces
- Little a priori knowledge
- File sizes span 6 orders of magnitude
- Overcoming disk performance behavior
- Desire for efficiency
- Coping with failure

Slide 7: File System Components
- Disk management: arrange a collection of disk blocks into files
- Naming: the user gives a file name, not a track or sector number, to locate data
- Security: keep information secure
- Reliability/durability: when the system crashes, whatever is in memory is lost, but files should be durable
(Figure: layered structure, top to bottom: User, File naming, File access, Disk management, Disk drivers)

Slide 8: Some Definitions
- File descriptor (fd): an integer used to represent an open file; easier than using names
- Metadata: data about data; bookkeeping data used to eventually access the "real" data
- Open file table: system-wide list of descriptors in use

Slide 9: Kinds of Metadata
- inode: index node, a specific set of information kept about each file
  - Two forms: on disk and in memory
- Directory: names and location information for files and subdirectories
  - Note: stored in files in Unix
- Superblock: contains information describing the file system and disk layout
  - Information about free blocks/inodes on disk

Slide 10: Contents of an Inode
Disk inode:
- File type, size, blocks on disk
- Owner, group, permissions (r/w/x)
- Reference count
- Times: creation, last access, last modification
- Inode generation number
- Padding and other fields
128 bytes on classic Unix
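To make the list concrete, here is a hypothetical C layout for a 128-byte disk inode. The field names and widths, and the choice of 13 block pointers (matching the multi-level scheme on slide 24), are assumptions for illustration rather than any particular Unix's definition; the union pads each record to exactly 128 bytes.

```c
#include <stdint.h>

#define NADDR 13   /* hypothetical: 10 direct + single, double, triple indirect */

/* Fields of a hypothetical on-disk inode (names and widths are illustrative). */
struct dinode_fields {
    uint16_t type;                 /* regular file, directory, device, ... */
    uint16_t perms;                /* r/w/x bits for owner, group, other */
    uint32_t uid, gid;             /* owner and group */
    uint32_t nlink;                /* reference count: directory entries naming this inode */
    uint64_t size;                 /* file size in bytes */
    uint32_t crtime, atime, mtime; /* creation, last access, last modification */
    uint32_t generation;           /* inode generation number */
    uint32_t addr[NADDR];          /* blocks on disk */
};

/* Pad every on-disk inode to exactly 128 bytes, so 8 fit in a 1 KB block. */
struct dinode {
    union {
        struct dinode_fields f;
        uint8_t raw[128];
    } u;
};

_Static_assert(sizeof(struct dinode_fields) <= 128, "fields must fit in 128 bytes");
_Static_assert(sizeof(struct dinode) == 128, "disk inode must be 128 bytes");
```

A fixed record size is what lets the file system compute where inode number i lives on disk with simple arithmetic instead of a search.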

Slide 11: Directories in Unix
- Stored like regular files
- Contents are file names and inode numbers
- Names are nul-terminated strings
Logic
- Separates the file from its location in the tree
- A file can appear in multiple places
What are the drawbacks?
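Because a directory is just a file of (name, inode number) records, looking up a name is a scan of that file's contents. The sketch below assumes a hypothetical fixed-size entry with a 28-byte name field and inode number 0 marking an unused slot; dir_lookup is an illustrative name. It also shows how two names can refer to the same inode, which is what lets a file appear in multiple places.

```c
#include <stdint.h>
#include <string.h>

#define NAME_FIELD 28   /* hypothetical: names up to 27 chars plus the terminating nul */

/* One directory entry as it might be stored inside a directory file. */
struct dir_entry {
    uint32_t inum;               /* inode number; 0 marks an unused slot */
    char name[NAME_FIELD];       /* nul-terminated file name */
};

/* Linear scan of a directory's contents; returns the inode number, or 0 if absent. */
uint32_t dir_lookup(const struct dir_entry *entries, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (entries[i].inum != 0 && strcmp(entries[i].name, name) == 0)
            return entries[i].inum;
    }
    return 0;
}
```

One drawback is visible right away: lookup cost grows linearly with directory size.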

Slide 12: Effects of Corruption
- inode: the file gets "damaged"; a block that is actually free may be read as part of the file
- Directory: files/directories get "lost"; you might be able to read deleted files
- Superblock: you can't figure out anything; this is why the superblock is replicated

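Slide 13: Data Structures for a Typical File System
(Figure: the process control block holds an open file pointer array; its entries point into the system-wide open file table; open file entries point to in-memory inodes, which are copies of disk inodes.)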

Slide 14: Opening a File
1. File name lookup and authentication
2. Copy the file metadata into the in-memory data structure, if it is not there yet
3. Create an entry in the open file table (system-wide) if there isn't one
4. Create an entry in the PCB
5. Link up the data structures
6. Return a descriptor (an index into the PCB's open file pointer array) to the user
(Figure: fd = open(FileName, access) starts in the PCB; the file name is looked up and authenticated against the file system on disk; the open file table entry and metadata are allocated and linked up.)
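The open steps and the chain of structures from slide 13 can be sketched in C. Everything here is hypothetical: the table sizes, the names iget and sketch_open, and the assumption that name lookup and permission checks have already produced an inode number. Following the Unix convention, each open() call gets its own open file entry (and so its own offset), while opens of the same file share one in-memory inode.

```c
#include <stddef.h>
#include <stdint.h>

#define NINODE 8    /* hypothetical table sizes */
#define NFILE  16
#define NOFILE 4

struct mem_inode {          /* in-memory copy of a disk inode */
    uint32_t inum;
    int refcnt;             /* open file entries pointing here; 0 = free slot */
};

struct open_file {          /* system-wide open file table entry */
    struct mem_inode *ip;
    uint64_t offset;        /* current read/write position */
    int mode;
    int refcnt;             /* 0 means the slot is free */
};

struct proc {               /* only the part of the PCB used here */
    struct open_file *ofile[NOFILE];   /* per-process open file pointer array */
};

static struct mem_inode inode_table[NINODE];
static struct open_file file_table[NFILE];

/* Find the in-memory inode, "copying it in from disk" if it is not cached yet. */
static struct mem_inode *iget(uint32_t inum)
{
    struct mem_inode *empty = NULL;
    for (int i = 0; i < NINODE; i++) {
        if (inode_table[i].refcnt > 0 && inode_table[i].inum == inum) {
            inode_table[i].refcnt++;
            return &inode_table[i];
        }
        if (empty == NULL && inode_table[i].refcnt == 0)
            empty = &inode_table[i];
    }
    if (empty == NULL)
        return NULL;
    empty->inum = inum;     /* a real kernel would read the disk inode here */
    empty->refcnt = 1;
    return empty;
}

/* Open the file whose (already authenticated) name lookup produced inum. */
int sketch_open(struct proc *p, uint32_t inum, int mode)
{
    int fd = -1;
    for (int i = 0; i < NOFILE; i++)
        if (p->ofile[i] == NULL) { fd = i; break; }
    if (fd < 0) return -1;

    struct open_file *f = NULL;
    for (int i = 0; i < NFILE; i++)
        if (file_table[i].refcnt == 0) { f = &file_table[i]; break; }
    if (f == NULL) return -1;

    struct mem_inode *ip = iget(inum);
    if (ip == NULL) return -1;

    f->ip = ip; f->offset = 0; f->mode = mode; f->refcnt = 1;
    p->ofile[fd] = f;       /* link PCB -> open file entry -> in-memory inode */
    return fd;
}
```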

Slide 15: Reading and Writing
What happens when you:
- read 10 bytes from a file?
- write 10 bytes into an existing file?
- write 1024 bytes into a file?
The disk works on blocks (sectors)
- Can have temporary (ephemeral) buffers
- Longer-lasting buffers = disk cache
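One way to answer these questions is to work out which blocks a byte range touches and whether any of them is only partly covered: a partly covered block has to be read before it is written (read-modify-write). The sketch assumes 512-byte blocks; plan_io is a hypothetical helper.

```c
#include <stdint.h>

#define BLOCK_SIZE 512u   /* assumed block (sector) size */

struct io_plan {
    uint64_t first_block, last_block;  /* logical blocks touched by the request */
    int partial_first, partial_last;   /* 1 if that block is only partly covered */
};

/* Plan a read or write of len > 0 bytes starting at byte offset. */
struct io_plan plan_io(uint64_t offset, uint64_t len)
{
    struct io_plan p;
    uint64_t end = offset + len;                       /* one past the last byte */
    p.first_block = offset / BLOCK_SIZE;
    p.last_block = (end - 1) / BLOCK_SIZE;
    /* A block is fully covered only if the range starts at or before its first
       byte and ends at or after its last byte. */
    p.partial_first = !(offset % BLOCK_SIZE == 0 &&
                        end >= (p.first_block + 1) * BLOCK_SIZE);
    p.partial_last = !(end % BLOCK_SIZE == 0 &&
                       offset <= p.last_block * BLOCK_SIZE);
    return p;
}
```

For example, writing 10 bytes at offset 100 touches one partial block (one read plus one write); writing 1024 bytes at offset 0 covers two whole blocks and needs no reads; the same 1024 bytes at offset 100 span three blocks, two of them partial.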

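Slide 16: Reading a Block
(Figure: read(fd, userBuf, size) goes through the PCB, the open file table, and the metadata to map the logical block to a physical one; read(device, phyBlock, size) then goes through the buffer cache to the disk device driver; the physical block is brought into sysBuf and copied to userBuf.)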
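The bottom half of this path can be sketched with a tiny buffer cache. Assumptions: an in-memory array stands in for the disk and its driver, the cache holds four 512-byte buffers with least-recently-used replacement, bread and read_block_bytes are hypothetical names, and the logical-to-physical mapping has already produced the physical block number.

```c
#include <stdint.h>
#include <string.h>

#define BSIZE 512
#define NBUF  4
#define NDISK 64   /* blocks on the simulated device */

static uint8_t disk[NDISK][BSIZE];   /* stands in for the disk device driver */
static int disk_reads;                /* counts trips to the "device" */

struct buf {
    int valid;
    uint32_t blockno;
    uint8_t data[BSIZE];              /* the system buffer (sysBuf) */
    unsigned long last_use;           /* for LRU replacement */
};

static struct buf cache[NBUF];
static unsigned long clock_ticks;

/* Return the cached copy of a physical block (assumes blockno < NDISK). */
static struct buf *bread(uint32_t blockno)
{
    struct buf *victim = &cache[0];
    for (int i = 0; i < NBUF; i++) {
        if (cache[i].valid && cache[i].blockno == blockno) {
            cache[i].last_use = ++clock_ticks;
            return &cache[i];                         /* hit: no device access */
        }
        if (!cache[i].valid || cache[i].last_use < victim->last_use)
            victim = &cache[i];                       /* empty or least recently used */
    }
    memcpy(victim->data, disk[blockno], BSIZE);       /* miss: read from the device */
    disk_reads++;
    victim->valid = 1;
    victim->blockno = blockno;
    victim->last_use = ++clock_ticks;
    return victim;
}

/* Copy size bytes at byte off of physical block phys into the user's buffer
   (assumes off + size <= BSIZE). */
void read_block_bytes(uint32_t phys, uint32_t off, void *user_buf, uint32_t size)
{
    struct buf *b = bread(phys);            /* get physical block into sysBuf */
    memcpy(user_buf, b->data + off, size);  /* copy to userBuf */
}
```

Repeated small reads of the same block then cost one device access instead of one per read() call.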

Slide 17: A Disk Layout for a File System
(Figure: Boot block | Superblock | File metadata (i-node in Unix) | File data blocks)
The superblock defines a file system:
- Size of the file system
- Size of the file descriptor area
- Free list pointer, or pointer to bitmap
- Location of the file descriptor of the root directory
- Other metadata such as permissions and various times
For reliability, replicate the superblock
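Given such a layout, finding any inode is arithmetic. The sketch assumes 1 KB blocks, 128-byte inodes, the boot block at block 0, the superblock at block 1, and the inode area starting at block 2; the superblock fields and function names are hypothetical.

```c
#include <stdint.h>

#define BLOCK_SIZE 1024u                          /* assumed block size */
#define INODE_SIZE 128u                           /* one on-disk inode */
#define INODES_PER_BLOCK (BLOCK_SIZE / INODE_SIZE)

/* Fixed positions in this hypothetical layout. */
#define BOOT_BLOCK  0u
#define SUPER_BLOCK 1u
#define INODE_START 2u                            /* inode area follows the superblock */

/* Hypothetical superblock: field names are illustrative. */
struct superblock {
    uint32_t fs_blocks;       /* size of the file system, in blocks */
    uint32_t inode_blocks;    /* size of the file descriptor (inode) area, in blocks */
    uint32_t free_list_head;  /* first free block (or the bitmap's location) */
    uint32_t root_inum;       /* inode number of the root directory */
};

/* Block that holds inode inum, and the byte offset of the inode inside it. */
uint32_t inode_block(uint32_t inum)  { return INODE_START + inum / INODES_PER_BLOCK; }
uint32_t inode_offset(uint32_t inum) { return (inum % INODES_PER_BLOCK) * INODE_SIZE; }

/* Data blocks start right after the inode area. */
uint32_t first_data_block(const struct superblock *sb)
{
    return INODE_START + sb->inode_blocks;
}
```

This is also why losing the superblock is so damaging: without inode_blocks and friends, none of these offsets can be computed.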

Slide 18: File Usage Patterns
How do users access files?
- Sequential: bytes read in order
- Random: read/write elements out of the middle of arrays
- Whole file or partial file
How are files used?
- Most files are small
- Large files use up most of the disk space
- Large files account for most of the bytes transferred
Bad news: everything needs to be efficient

Slide 19: Data Structures for Disk Management
- A "header" for each file (part of the file metadata)
  - Disk sectors associated with each file
- A data structure to represent free space on disk
  - Bitmap: 1 bit per block (sector); blocks numbered in cylinder-major order. Why?
  - Linked list
  - Others?
How much space does a bitmap need for a 4 GB disk?
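The bitmap needs one bit per block. Assuming "4G" means 4 GB = 2^32 bytes: with 512-byte sectors as blocks there are 2^32 / 2^9 = 2^23 blocks, so the bitmap takes 2^23 bits = 1 MB; with 4 KB blocks it shrinks to 2^20 bits = 128 KB. The sketch below computes that size and shows lowest-numbered-first allocation from a packed bitmap; the function names are hypothetical.

```c
#include <stdint.h>

/* Bytes of bitmap needed: one bit per block, rounded up to whole bytes. */
uint64_t bitmap_bytes(uint64_t disk_bytes, uint64_t block_size)
{
    uint64_t nblocks = disk_bytes / block_size;
    return (nblocks + 7) / 8;
}

/* Allocate the lowest-numbered free block: find a 0 bit, set it, return its index. */
int64_t bitmap_alloc(uint8_t *map, uint64_t nblocks)
{
    for (uint64_t b = 0; b < nblocks; b++) {
        if (!(map[b / 8] & (1u << (b % 8)))) {
            map[b / 8] |= (uint8_t)(1u << (b % 8));
            return (int64_t)b;
        }
    }
    return -1;   /* disk full */
}

/* Return block b to the free pool by clearing its bit. */
void bitmap_free(uint8_t *map, uint64_t b)
{
    map[b / 8] &= (uint8_t)~(1u << (b % 8));
}
```

Because adjacent bits are adjacent blocks, a run of zero bits is directly visible as contiguous free space.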

Slide 20: Linked Files (Alto)
- File header points to the 1st block on disk
- Each block points to the next
Pros
- Can grow files dynamically
- Free list is similar to a file
Cons
- Random access: horrible
- Unreliable: losing a block means losing the rest
(Figure: file header → block → block → ... → null)
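To see why random access is horrible, this sketch models the pointer stored inside each block as an array next_ptr (a hypothetical stand-in for actually reading the block) and counts one disk read per hop: reaching the n-th block of a file costs n reads.

```c
#include <stdint.h>

#define NBLOCKS 16
#define NIL UINT32_MAX   /* end of file ("null" in the figure) */

/* next_ptr[b] models the "next block" pointer stored inside disk block b. */
static uint32_t next_ptr[NBLOCKS];

/* Walk the chain to find the n-th block of a file; each hop is one disk read. */
uint32_t linked_nth_block(uint32_t first, uint32_t n, unsigned *reads)
{
    uint32_t b = first;
    *reads = 0;
    while (n-- > 0 && b != NIL) {
        b = next_ptr[b];   /* must read block b to learn where the next one is */
        (*reads)++;
    }
    return b;              /* NIL if the file has fewer than n + 1 blocks */
}
```

The same walk shows the reliability problem: if one block's pointer is corrupted, every block after it is unreachable.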

Slide 21: Contiguous Allocation
- Request the size of the file in advance
- Search the bitmap or linked list to locate a space
- File header: first sector in file, number of sectors
Pros
- Fast sequential access
- Easy random access
Cons
- External fragmentation
- Hard to grow files
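A first-fit sketch of contiguous allocation, assuming a hypothetical byte-per-block free map (nonzero = free) rather than a packed bitmap. It also shows why random access is easy: with only (first sector, number of sectors) in the header, any byte offset maps to a sector with one division. All names are illustrative.

```c
#include <stdint.h>

/* First fit: find the lowest run of `want` free blocks, mark it used, return its start. */
int64_t contiguous_alloc(uint8_t *free_map, uint64_t nblocks, uint64_t want)
{
    if (want == 0)
        return -1;
    uint64_t run = 0;
    for (uint64_t b = 0; b < nblocks; b++) {
        run = free_map[b] ? run + 1 : 0;      /* length of the free run ending at b */
        if (run == want) {
            uint64_t start = b + 1 - want;
            for (uint64_t i = start; i <= b; i++)
                free_map[i] = 0;              /* mark the run as allocated */
            return (int64_t)start;
        }
    }
    return -1;   /* no hole big enough */
}

/* The file header: first sector and number of sectors. */
struct contig_header { uint64_t first; uint64_t count; };

/* Map a byte offset to its sector, or -1 if it lies past the end of the file. */
int64_t contig_sector(const struct contig_header *h, uint64_t byte_off, uint64_t sector_size)
{
    uint64_t s = byte_off / sector_size;
    return s < h->count ? (int64_t)(h->first + s) : -1;
}
```

With free blocks 0, 2, 3, 5, 6, 7 on a 9-block disk, a 3-block request gets blocks 5 to 7; a second 3-block request then fails even though three blocks (0, 2, 3) remain free, which is external fragmentation.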

Slide 22: Single-Level Indexed Files or Extent-based Filesystems
- A user declares the maximum size
- A file header holds an array of pointers to disk blocks
Pros
- Can grow up to a limit
- Random access is fast
Cons
- Clumsy to grow beyond the limit
- Periodic cleanup of new files
- Up-front declaration is a real pain
(Figure: file header with a pointer array → disk blocks)

Slide 23: File Allocation Table (FAT) Approach
- A section of disk for each partition is reserved
- One entry for each block
- A file is a linked list of blocks
- A directory entry points to the 1st block of the file
Pros
- Simple
Cons
- Always have to go to the FAT
- Wasted space
(Figure: directory entry "foo" → block 217; in the FAT, entry 217 → 619, entry 619 → 399, entry 399 → EOF; entry 0 is at the top of the table.)
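Reading the slide's figure as directory entry foo starting at block 217, with FAT entries 217 → 619 → 399 → EOF, here is a sketch of walking a FAT chain. FAT_EOF, the table size, and fat_chain are hypothetical; real FAT variants define their own end-of-chain markers and entry widths. Unlike the Alto scheme, each hop reads the table rather than the data block, which is why the FAT is usually worth keeping in memory.

```c
#include <stdint.h>

#define FAT_EOF 0xFFFFFFFFu   /* hypothetical end-of-chain marker */
#define FAT_ENTRIES 1024u     /* hypothetical: one entry per block in the partition */

static uint32_t fat[FAT_ENTRIES];   /* fat[b] = next block of the file after b */

/* Collect up to max block numbers of a file by following the FAT from its first block. */
int fat_chain(uint32_t first, uint32_t *out, int max)
{
    int n = 0;
    for (uint32_t b = first; b != FAT_EOF && b < FAT_ENTRIES && n < max; b = fat[b])
        out[n++] = b;
    return n;   /* number of blocks found */
}
```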

Slide 24: Multi-Level Indexed Files (Unix)
13 pointers in a header
- 1-10: direct pointers
- 11: 1-level indirect
- 12: 2-level indirect
- 13: 3-level indirect
Pros and cons
- In favor of small files
- Can grow
- Limit is 16 GB, and lots of seeks
What happens to reach blocks 23, 5, 340?
(Figure: pointers 1 through 10 point directly to data blocks; pointers 11, 12, and 13 reach data through one, two, and three levels of indirect blocks.)
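A sketch of the lookup behind the slide's question, assuming 1 KB blocks and 4-byte pointers (256 pointers per indirect block). Under those assumptions the largest file is (10 + 256 + 256^2 + 256^3) blocks × 1 KB, about 16 GB, which matches the slide's limit. Indices here are 0-based, so inode slot 10 is the slide's pointer 11; bmap_path is a hypothetical name.

```c
#include <stdint.h>

#define NDIRECT 10u
#define PTRS_PER_BLOCK 256u   /* assumed: 1 KB blocks, 4-byte block pointers */

/* Describe how logical block lbn is reached. Returns the number of indirect levels
   (0 = direct, 1..3), or -1 if lbn is past the maximum file size.
   idx[0] is the slot in the inode; idx[1..level] are slots in successive indirect blocks. */
int bmap_path(uint64_t lbn, uint32_t idx[4])
{
    if (lbn < NDIRECT) { idx[0] = (uint32_t)lbn; return 0; }
    lbn -= NDIRECT;
    uint64_t span = PTRS_PER_BLOCK;            /* blocks reachable at this level */
    for (int level = 1; level <= 3; level++) {
        if (lbn < span) {
            idx[0] = NDIRECT + (uint32_t)level - 1;     /* inode slots 10, 11, 12 */
            for (int i = level; i >= 1; i--) {          /* base-256 digits of lbn */
                idx[i] = (uint32_t)(lbn % PTRS_PER_BLOCK);
                lbn /= PTRS_PER_BLOCK;
            }
            return level;
        }
        lbn -= span;
        span *= PTRS_PER_BLOCK;
    }
    return -1;
}
```

Block 5 is direct (slot 5); block 23 goes through the single indirect block (slot 13 in it); block 340 goes through the double indirect block (slot 0 at the first level, slot 74 at the second). Each level of indirection is one more disk read, hence the seeks.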

Slide 25: Challenges
- The Unix filesystem has great flexibility
- Extent-based filesystems have speed
- Seeks kill performance: locality matters
- Bitmaps show contiguous free space
- Linked lists are easy to search
- How do you perform backup/restore?

Slide 26: Bigger, Faster, Stronger
- Making individual disks larger is hard
- Throw more disks at the problem
  - Capacity increases
  - Effective access speed may increase
  - Probability of failure also increases
- Use some disks to provide redundancy
- Generally assume a fail-stop model (fail-stop versus Byzantine failures)
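A rough way to quantify "probability of failure also increases": assume disk failures are independent and each of the N disks has an exponentially distributed lifetime with failure rate λ = 1/MTTF_disk. The array (without redundancy) survives to time t only if every disk does, so

```latex
\Pr[\text{no failure by } t] = \left(e^{-\lambda t}\right)^{N} = e^{-N\lambda t}
\;\Longrightarrow\;
\mathrm{MTTF}_{\text{array}} = \frac{1}{N\lambda} = \frac{\mathrm{MTTF}_{\text{disk}}}{N},
\qquad \text{e.g. } \frac{50{,}000\ \text{h}}{100} = 500\ \text{h} \approx 3\ \text{weeks}.
```

The 50,000-hour and 100-disk figures are illustrative, not from the slides; the point is that adding disks without redundancy divides the expected time to the first failure by N.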

Slide 27: RAID (Redundant Array of Inexpensive Disks)
Main idea: store the error-correcting codes on other disks
- General error-correcting codes are too powerful
- Use XORs or single parity
- Upon a single-disk failure, the entire lost block can be recovered by XORing the parity with the blocks on the surviving disks
Pros
- Reliability
- High bandwidth
Cons
- The controller is complex
(Figure: a RAID controller computing XOR across the disks)
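The XOR idea in a few lines of C: parity is the XOR of the data blocks, and because x ^ x = 0, XORing the parity with every surviving block leaves exactly the missing one. The function names are hypothetical, and the sketch assumes a single failed disk whose identity is known (the fail-stop model).

```c
#include <stddef.h>
#include <stdint.h>

/* parity = data[0] ^ data[1] ^ ... ^ data[ndisks-1], byte by byte */
void xor_parity(const uint8_t *const data[], size_t ndisks, size_t len, uint8_t *parity)
{
    for (size_t i = 0; i < len; i++) {
        uint8_t p = 0;
        for (size_t d = 0; d < ndisks; d++)
            p ^= data[d][i];
        parity[i] = p;
    }
}

/* Rebuild the block of disk `lost` from the parity and every surviving data block;
   data[lost] is never read. */
void xor_rebuild(const uint8_t *const data[], size_t ndisks, size_t lost,
                 const uint8_t *parity, size_t len, uint8_t *out)
{
    for (size_t i = 0; i < len; i++) {
        uint8_t v = parity[i];
        for (size_t d = 0; d < ndisks; d++)
            if (d != lost)
                v ^= data[d][i];
        out[i] = v;
    }
}
```

Single parity only detects which byte values are inconsistent; it is the fail-stop assumption (knowing which disk died) that turns it into a correcting code.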

Slide 28: Synopsis of RAID Levels
- RAID Level 0: Non-redundant striping
- RAID Level 1: Mirroring
- RAID Level 2: Bit-interleaved, ECC (Hamming code)
- RAID Level 3: Byte-interleaved, parity
- RAID Level 4: Block-interleaved, parity
- RAID Level 5: Block-interleaved, distributed parity
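The difference between levels 4 and 5 is where parity lives. This sketch maps a logical block to a disk under one simple rotating layout (an assumption; real controllers and software RAID offer several): parity for stripe s sits on disk N - 1 - (s mod N), and the stripe's data blocks fill the remaining disks in order. Fixing parity_disk at N - 1 instead would give RAID 4, where every small write must update the same parity disk. raid5_map is a hypothetical name.

```c
/* Location of one logical data block in an N-disk RAID 5 array. */
struct raid5_loc { unsigned stripe, data_disk, parity_disk; };

struct raid5_loc raid5_map(unsigned lbn, unsigned ndisks)
{
    struct raid5_loc loc;
    unsigned per_stripe = ndisks - 1;                  /* data blocks per stripe */
    loc.stripe = lbn / per_stripe;
    loc.parity_disk = (ndisks - 1) - (loc.stripe % ndisks);   /* rotates every stripe */
    unsigned k = lbn % per_stripe;                     /* position among the data blocks */
    loc.data_disk = k < loc.parity_disk ? k : k + 1;   /* skip over the parity disk */
    return loc;
}
```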

Slide 29: Did RAID Work?
- Performance: yes
- Reliability: yes
- Cost: no
  - Controller design is complicated
  - Fewer economies of scale
  - High-reliability environments don't care
- Now also software implementations
