CS 5204 Operating Systems Disks & File Systems Godmar Back.


Filesystems

Files vs Disks
File abstraction
  Byte oriented
  Names
  Access protection
  Consistency guarantees
Disk abstraction
  Block oriented
  Block #s
  No protection
  No guarantees beyond block write
CS 5204 Fall 2015

Filesystem Requirements
Naming
  Should be flexible, e.g., allow multiple names for the same file
  Support hierarchy for ease of use
Persistence
  Want to be sure data has been written to disk in case a crash occurs
Sharing/Protection
  Want to restrict who has access to files
  Want to share files with other users

FS Requirements (cont’d)
Speed & efficiency for different access patterns
  Sequential access
  Random access
  Sequential is most common, random next
  Another pattern is keyed access (not usually provided by OS)
Minimum space overhead
  Disk space needed to store metadata is lost for user data
Twist: all metadata that is required to do translation must be stored on disk
  Translation scheme should minimize number of additional accesses for a given access pattern
  Harder than, say, page tables, where we assumed page tables themselves are not subject to paging!

Filesystems: Software Architecture (including in-memory data structures)

Overview
File operations: create(), unlink(), open(), read(), write(), close()
  Use names for files
  View files as sequences of bytes
File system
  Must implement translation (file name, file offset) -> (disk id, disk sector, sector offset)
  Must manage free space on disk
Buffer cache
  Uses disk id + sector indices
Device driver

The Big Picture
[Figure: in-memory and on-disk data structures used to keep track of open files]
PCB: per-process file descriptor table (entries 1, 2, 3, 4, 5, …)
Open file table: data structures to keep track of open files
  struct file: inode + position + …
  struct dir: inode + position
  struct inode
Buffer cache: cached data and metadata
On-disk data structures: filesystem information, file descriptors (inodes), directory data, file data

Steps in Opening & Reading a File
Lookup (via directory): find on-disk file descriptor’s block number
Find entry in open file table (struct inode list in Pintos)
  Create one if none, else increment ref count
Find where file data is located
  By reading on-disk file descriptor
Read data & return to user

Open File Table
inode – represents file
  At most 1 in-memory instance per unique file
  Number of openers & other properties
file – represents one or more processes using a file
  With separate offsets for byte-stream
dir – represents an open directory file
Generally:
  None of the data in the OFT is persistent
  Reflects how processes are currently using files
  Lifetime of objects determined by open/close
  Reference counting is used
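
The open file table described above can be sketched in a few lines. This is only an illustration, not Pintos code: the class names `InMemoryInode`, `OpenFile`, and `OpenFileTable` are hypothetical, and the on-disk inode is identified by an assumed sector number.

```python
class InMemoryInode:
    """In-memory inode: at most one instance per unique on-disk file."""
    def __init__(self, sector):
        self.sector = sector      # assumed: sector of the on-disk inode
        self.open_count = 0       # number of openers (reference count)


class OpenFile:
    """One open instance of a file, with its own byte offset."""
    def __init__(self, inode):
        self.inode = inode
        self.pos = 0


class OpenFileTable:
    def __init__(self):
        self._inodes = {}         # sector -> InMemoryInode

    def open(self, sector):
        # Reuse the existing in-memory inode if the file is already open,
        # otherwise create one; either way, count the new opener.
        inode = self._inodes.get(sector)
        if inode is None:
            inode = InMemoryInode(sector)
            self._inodes[sector] = inode
        inode.open_count += 1
        return OpenFile(inode)

    def close(self, open_file):
        # Drop one reference; the in-memory inode goes away with its last opener.
        inode = open_file.inode
        inode.open_count -= 1
        if inode.open_count == 0:
            del self._inodes[inode.sector]

    def __len__(self):
        return len(self._inodes)
```

Two opens of the same file share one inode object but keep separate positions, which matches the "separate offsets for byte-stream" point above.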

File Descriptors (“inodes”)
Term “inode” can refer to 3 things:
1. In-memory inode
   Stores information about an open file, such as how many openers; corresponds to on-disk file descriptor
2. On-disk inode
   Region on disk, entry in file descriptor table, that stores persistent information about a file – who owns it, where to find its data blocks, etc.
3. On-disk inode, when cached in buffer cache
   A bytewise copy of 2. in memory
Q.: Should in-memory inode store a pointer to cached on-disk inode? (Answer: No.)

Filesystems: On-Disk Data Structures and Allocation Strategies

Filesystem Information
Contains:
Superblock
  Stores information such as size of entire filesystem, etc.
  Location of file descriptor table & free map
Free block map
  Bitmap used to find free blocks
  Typically cached in memory
Superblock & free map often replicated in different positions on disk
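
A free block map of the kind described above might look like the following sketch. The class `FreeMap` and its first-fit scan are assumptions for illustration; real filesystems use smarter allocation policies and must also write the bitmap back to disk.

```python
class FreeMap:
    """Bitmap with one bit per disk block: 0 = free, 1 = in use."""

    def __init__(self, nblocks):
        self.nblocks = nblocks
        self.bits = bytearray((nblocks + 7) // 8)

    def _in_use(self, block):
        return (self.bits[block // 8] >> (block % 8)) & 1

    def allocate(self):
        # First-fit scan: return the lowest-numbered free block, or None.
        for block in range(self.nblocks):
            if not self._in_use(block):
                self.bits[block // 8] |= 1 << (block % 8)
                return block
        return None

    def release(self, block):
        # Clear the block's bit, keeping the byte within 0..255.
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF

    def free_count(self):
        return sum(1 for b in range(self.nblocks) if not self._in_use(b))
```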

File Allocation Strategies
Contiguous allocation
Linked files
Indexed files
Multi-level indexed files

Contiguous Allocation
[Figure: File A, File B]
Idea: allocate files in contiguous blocks
File descriptor = (first block, length)
Good sequential & random access
Problems:
  Hard to extend files – may require expensive compaction
  External fragmentation
  Analogous to segmentation-based VM
Pintos’s baseline implementation does this
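
With a (first block, length) descriptor, translating a file offset into a disk location is a single division, which is why random access is cheap. A minimal sketch, assuming 512-byte sectors and a hypothetical helper name `contiguous_lookup`:

```python
SECTOR_SIZE = 512  # assumed sector size in bytes


def contiguous_lookup(first_block, length, offset, sector_size=SECTOR_SIZE):
    """Map a byte offset to (disk sector, offset within sector).

    `first_block` and `length` (in blocks) form the file descriptor.
    """
    if offset < 0:
        raise ValueError("negative offset")
    block, within = divmod(offset, sector_size)
    if block >= length:
        raise ValueError("offset beyond end of file")
    return first_block + block, within
```

For example, byte 1000 of a 4-block file starting at block 100 lies in sector 101 at offset 488.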

Linked Files
[Figure: File A Part 1, File B Part 1, File A Part 2, File B Part 2]
Idea: implement linked list
  Either with variable sized blocks
  Or fixed sized blocks (“clusters”)
Solves fragmentation problem, but now
  Need lots of seeks for sequential accesses and random accesses
  Unreliable: lose first block, may lose file
Solution: keep linked list in memory
  DOS: FAT (File Allocation Table)

DOS FAT
FAT stored at beginning of disk & replicated for redundancy
FAT cached in memory
Size: n-bit entries, m-bit blocks -> 2^(m+n) limit
  n = 12, 16, 28
  m = 9 … 15 (0.5KB-32KB)
As disk size grows, m & n must grow
  Growth of n means larger in-memory table
[Figure: example FAT contents and a directory table with columns Filename, Length, First Block; “a” has length 2 and first block 1, “b” has length 4 and first block 3]
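
Following a file through the FAT amounts to walking a linked list kept in a table. The sketch below uses a made-up FAT (not the one from the slide's figure) stored as a dictionary, with -1 as the assumed end-of-chain marker; the function names are hypothetical.

```python
END_OF_CHAIN = -1  # assumed end-of-file marker in the table


def fat_chain(fat, first_block):
    """Return the list of blocks of a file, in file order."""
    chain = []
    block = first_block
    while block != END_OF_CHAIN:
        chain.append(block)
        block = fat[block]
    return chain


def fat_lookup(fat, first_block, block_index):
    """Find the disk block holding the file's block_index-th block (0-based).

    Random access costs block_index table lookups, which is only cheap
    because the FAT is cached in memory.
    """
    block = first_block
    for _ in range(block_index):
        block = fat[block]
        if block == END_OF_CHAIN:
            raise ValueError("block index past end of file")
    return block


# Hypothetical table: one file occupies blocks 1 -> 6, another 3 -> 5 -> 4 -> 7.
example_fat = {1: 6, 6: END_OF_CHAIN, 3: 5, 5: 4, 4: 7, 7: END_OF_CHAIN}
```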

DOS FAT Scalability Limits
FAT-12 uses 12-bit entries, max of 4096 clusters
FAT-16: 2^16 (65,536) clusters
FAT-32 uses 28 bits, so theoretical max of 2^28 (256 Mi) clusters
Floppy disk, say 1.4MB: FAT-12, 1K clusters, need 1,400 entries, 2 bytes each -> 2.8KB
Modern disk, say ~2 TB (2^41 bytes)
  At 4 KB cluster size, would need 2^29 entries. Each entry at 4 bytes, would need 2^31 bytes, or 2GB, of RAM just to hold the FAT.
  At 32 KB cluster size, would need only 1/8 of that, but still 256MB of RAM to hold the FAT; simple operations, such as determining how much space is free on disk, require reading the entire FAT
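
The table-size arithmetic above is easy to check mechanically. A small sketch, with `fat_size_bytes` as a hypothetical helper and the in-memory entry sizes (2 or 4 bytes) taken from the figures above:

```python
def fat_size_bytes(disk_bytes, cluster_bytes, entry_bytes):
    """Memory needed to hold a FAT with one entry per cluster."""
    entries = disk_bytes // cluster_bytes
    return entries * entry_bytes


# Floppy: 1,400 clusters of 1 KB, 2 bytes per entry.
floppy = fat_size_bytes(1400 * 1024, 1024, 2)

# Disk of 2^41 bytes with 4-byte entries, at 4 KB and 32 KB cluster sizes.
disk_4k = fat_size_bytes(2**41, 4 * 1024, 4)
disk_32k = fat_size_bytes(2**41, 32 * 1024, 4)
```

Here `floppy` comes to 2,800 bytes, `disk_4k` to 2^31 bytes (2 GB), and `disk_32k` to 2^28 bytes (256 MB).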

Blocksize Trade-Offs
[Figure: chart from Tanenbaum, Modern Operating Systems]
Chart assumes all files are 2KB in size (observed median file size is about 2KB)
Larger blocks: faster reads (because seeks are amortized & more bytes per transfer)
More wastage (2KB file in 32KB block means 15/16th are unused)
Source: Tanenbaum, Modern Operating Systems

Indexed Allocation
[Figure: File A Index, File A Part 1, File A Part 2, File A Part 3]
Single-index: specify maximum filesize, create index array, then note blocks in index
Random access ok – one translation step
Sequential access requires more seeks – depending on contiguous allocation
Drawback: hard to grow beyond maximum

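Multi-Level Indices
Used in Unix & (possibly) Pintos (P4)
[Figure: inode with direct block entries 1 … N plus first-, second-, and third-level index pointers (FLI, SLI, TLI)]
Direct blocks: file blocks 1 … N
Indirect block (index): file blocks N+1 … N+I
Double indirect block (index^2 -> index): file blocks N+I+1 … N+I+I^2
Triple indirect block (index^3 -> index^2 -> index): file blocks beyond N+I+I^2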

Logical View (Per File) and Physical View (On Disk)
[Figure: the logical view shows offsets in one file; the physical view shows sector numbers on disk (ignoring other files) for the inode, index blocks, index^2 block, and data blocks]

Multi-Level Indices
If filesz < N * BLKSIZE, can store all information in direct block array
  Biased in favor of small files (ok because most files are small…)
Assume index block stores I entries
  If filesz < (I + N) * BLKSIZE, 1 indirect block suffices
Q.: What’s the maximum size before we need triple-indirect block?
Q.: What’s the per-file overhead (best case, worst case?)
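
One way to work through the first question is to write the translation out. The sketch below classifies a 0-based file block index by index level and returns the slot to follow in each index block; the function names and the sample parameters (N = 10 direct entries, I = 128 entries per index block, 512-byte blocks) are assumptions, not values from the lecture.

```python
def locate_block(block_index, n_direct, per_index):
    """Return (level, path) for a 0-based file block index.

    level: 0 = direct, 1 = indirect, 2 = double indirect, 3 = triple indirect.
    path:  slot to follow in each index block, outermost first.
    """
    if block_index < 0:
        raise ValueError("negative block index")
    if block_index < n_direct:
        return 0, (block_index,)
    i = block_index - n_direct
    if i < per_index:
        return 1, (i,)
    i -= per_index
    if i < per_index ** 2:
        return 2, divmod(i, per_index)
    i -= per_index ** 2
    if i < per_index ** 3:
        return 3, (i // per_index ** 2, (i // per_index) % per_index, i % per_index)
    raise ValueError("file too large for a triple-indirect scheme")


def max_size_without_triple(n_direct, per_index, blksize):
    """Largest file size (bytes) addressable without the triple-indirect block."""
    return (n_direct + per_index + per_index ** 2) * blksize
```

Under this scheme the triple-indirect block is first needed once the file exceeds (N + I + I^2) * BLKSIZE bytes.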

Extents
Index-tree based scheme avoids external fragmentation, and is efficient for small files, but incurs relatively high metadata overhead for large files
Extents can improve that – store (bnum, length) pair to denote that file occupies blocks [bnum, … , bnum+length-1]
But complicates offset -> sector translation
Used in ext4.
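
The extra complexity in offset -> sector translation comes from having to find which extent covers a given file block. A minimal sketch, assuming the extents are kept as a list of (bnum, length) pairs in file order; `extent_lookup` is a hypothetical name, and this is not how ext4 actually stores its extent tree.

```python
from bisect import bisect_right


def extent_lookup(extents, block_index):
    """Map a 0-based file block index to a disk block number.

    `extents` is a list of (bnum, length) pairs in file order.
    """
    # starts[k] = index of the first file block covered by extent k.
    starts = []
    total = 0
    for _, length in extents:
        starts.append(total)
        total += length
    if not 0 <= block_index < total:
        raise ValueError("block index outside file")
    # Binary search for the last extent starting at or before block_index.
    k = bisect_right(starts, block_index) - 1
    bnum, _ = extents[k]
    return bnum + (block_index - starts[k])


# Hypothetical file: 4 blocks at 100, then 2 blocks at 50, then 10 blocks at 200.
example_extents = [(100, 4), (50, 2), (200, 10)]
```

A real implementation would keep the running starts precomputed rather than rebuilding them on every lookup.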

Storing Inodes
Unix v7, BSD 4.3
  Layout: Superblock | I0 I1 I2 I3 I4 ….. | Rest of disk for files & directories
FFS (BSD 4.4)
  Cylinder groups have superblock + bitmap + inode list + file space
  Try to allocate file & inode in same cylinder group to improve access locality
  Layout (cylinder groups CGi): SB1 I0 I1 … Files … | SB2 I3 I4 ….. Files … | SB3 I8 I9 ….. Files …

Positioning Inodes
Putting inodes in fixed place makes finding inodes easier
  Can refer to them simply by inode number
  After crash, there is no ambiguity as to what are inodes vs. what are regular files
Disadvantage: limits the number of files per filesystem at creation time
Use “df -ih” on Linux/ext3 to see how many inodes are used/free

Filesystems: Directories and Name Resolution

Directories
Need to find file descriptor (inode), given a name
Approaches:
  Single directory (old PCs)
  Two-level approaches with 1 directory per user
  Now exclusively hierarchical approaches: file system forms a tree (or DAG)
How to tell regular file from directory?
  Set a bit in the inode
Data structures
  Linear list of (inode, name) pairs
  B-Trees that map name -> inode
  Combinations thereof

Using Linear Lists
[Figure: directory file as a list of entries starting at offset 0, e.g. inode # 23 “multi-oom”, inode # 15 “sample.txt”]
Advantage: (relatively) simple to implement
Disadvantages:
  Scan makes lookup (& delete!) really slow for large directories
  Could cause fragmentation (though not a problem in practice)

Using B+-Trees
Advantages:
  Scalable to large number of files: in growth, in lookup time
Disadvantages:
  Complex
  Overhead for small directories (some filesystems switch to B+-Tree only for large directories)
Note: some filesystems use B+-Tree not only for directory files, but for block indexes as well.
  HFS’s ‘catalog’ – single B+-Tree that stores inodes + directories.
  Also done in NTFS, XFS, ReiserFS, ZFS, and Btrfs (Source: Wikipedia)

Absolute Paths
How to resolve a path name such as “/usr/bin/ls”?
Split into tokens using “/” separator
Find inode corresponding to root directory (how? Use fixed inode # for root)
(*) Look up “usr” in root directory, find inode
If not last component in path, check that inode is a directory. Go to (*), looking for next component
If last component in path, check inode is of desired type, return
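
The resolution loop above can be sketched as follows. Everything here is illustrative: directories are modeled as in-memory dictionaries instead of on-disk lists or B-trees, `Node` and `resolve` are hypothetical names, and the root is assumed to live at inode number 1. The check for the desired type of the last component is left to the caller.

```python
ROOT_INO = 1  # assumed fixed inode number of the root directory


class Node:
    """Toy inode: a directory flag plus, for directories, name -> inode number."""
    def __init__(self, is_dir, entries=None):
        self.is_dir = is_dir
        self.entries = entries or {}


def resolve(inodes, path, root=ROOT_INO):
    """Resolve an absolute path to an inode number."""
    if not path.startswith("/"):
        raise ValueError("not an absolute path")
    ino = root
    # Split into tokens on "/" and walk one component at a time.
    for name in (token for token in path.split("/") if token):
        node = inodes[ino]
        if not node.is_dir:
            raise NotADirectoryError(name)
        if name not in node.entries:
            raise FileNotFoundError(name)
        ino = node.entries[name]
    return ino


# Hypothetical tree: / -> usr -> bin -> ls
example_inodes = {
    1: Node(True, {"usr": 2}),
    2: Node(True, {"bin": 3}),
    3: Node(True, {"ls": 4}),
    4: Node(False),
}
```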

Name Resolution
Must have a way to scan an entire directory without other processes interfering -> need a “lock” function
  But don’t need to hold lock on /usr when scanning /usr/bin
Directories can only be removed if they’re empty
  Requires synchronization also
Most OSes cache translations in a “namei” cache – maps absolute pathnames to inodes
  Must keep namei cache consistent if files are deleted

Current Directory
Relative pathnames are resolved relative to current directory
  Provides default context
  Every process has one in Unix/Pintos
chdir(2) changes current directory
  cd tmp; ls; pwd vs (cd tmp; ls); pwd
Lookup algorithm the same, except starts from current dir
  Process should keep current directory open
  Current directory inherited from parent

Hard & Soft Links
Provides aliases (different names) for a file
Hard links: (Unix: ln)
  Two independent directory entries have the same inode number, refer to same file
  Inode contains a reference count
  Disadvantage: alias only possible within the same filesystem
Soft links: (Unix: ln -s)
  Special type of file (noted in inode); content of file is absolute or relative pathname – stored inside inode instead of direct block list
Windows: “junctions” & “shortcuts”
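
The difference between the two kinds of link can be observed from user space. The sketch below assumes a POSIX system where the temporary directory supports both hard and symbolic links; `link_demo` is a hypothetical helper built on the standard `os` and `tempfile` modules.

```python
import os
import tempfile


def link_demo():
    """Create a file, a hard link, and a soft link; report what they share."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "a.txt")
        hard = os.path.join(d, "hard.txt")
        soft = os.path.join(d, "soft.txt")
        with open(target, "w") as f:
            f.write("hello")

        os.link(target, hard)        # second directory entry, same inode
        os.symlink("a.txt", soft)    # separate file whose content is a pathname

        same_inode = os.stat(target).st_ino == os.stat(hard).st_ino
        link_count = os.stat(target).st_nlink   # the inode's reference count
        soft_content = os.readlink(soft)

        # Removing the original name leaves the hard link usable,
        # but the soft link now points at a name that no longer exists.
        os.unlink(target)
        with open(hard) as f:
            hard_data = f.read()
        soft_dangling = not os.path.exists(soft)

        return same_inode, link_count, soft_content, hard_data, soft_dangling
```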
