1
Unit 7: File Systems
Reading:
- Text: 6.1.2 & 6.1.3
- File Systems section from any recent book on Operating Systems (such as Tanenbaum's, in the course reference books)
Original slides by Patrice Belleville; changes by George Tsiknis
2
Unit Outline
- Disk characteristics
  - Rotating disks (6.1.2)
  - Solid state disks (6.1.3)
- Files and Directories
- File System implementation and layout
  - ISO 9660 (CD-ROMs)
  - MS-DOS
  - Linux
- Virtual File Systems
- Robustness and recovery
3
Main Memory vs Disk
Differences between memory and disk access:
- Memory locations are accessible individually; data on disk can only be accessed one chunk (block) at a time. The block size is typically between 512 bytes and 8 KB.
- Naming: variables are accessed using their address, whereas data on disk is normally accessed through a file name. The OS translates the name and offset into a logical block number, and the disk controller maps that number to a location on the disk.
Block size issues:
- If a file is small, part of its block remains unused.
- But larger blocks can be read more efficiently.
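A minimal Python sketch of the block-size arithmetic, assuming a file always occupies whole blocks and ignoring metadata (the function names are illustrative only). It also shows the offset half of the translation described above: a byte offset becomes a block index within the file plus an offset inside that block.

```python
def blocks_needed(file_size, block_size):
    # Ceiling division: a partly filled block still occupies a whole block.
    return -(-file_size // block_size)


def wasted_bytes(file_size, block_size):
    # Unused space left in the file's last block.
    return blocks_needed(file_size, block_size) * block_size - file_size


def block_and_offset(offset, block_size):
    # Which block of the file holds byte `offset`, and where inside that block.
    return divmod(offset, block_size)


for bs in (512, 4096, 8192):
    print(f"{bs:>5}-byte blocks: {blocks_needed(1000, bs)} block(s), "
          f"{wasted_bytes(1000, bs)} bytes unused")
```

For a 1000-byte file, 512-byte blocks waste 24 bytes, while 8 KB blocks waste 7192 bytes.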
4
A Disk Drive
[Figure: photo of a disk drive with labelled spindle, arm, platters, actuator, electronics (including a processor and memory!) and SCSI connector. Image courtesy of Seagate Technology]
5
Disk Structure
Hard disks: platter view (from the side)
[Figure: platters 0, 1 and 2 stacked on the spindle, with surfaces 0 to 5 (two per platter); cylinder k is the set of tracks at the same position on every surface]
6
Disk Structure
Hard disks: surface layout
[Figure: a surface divided into concentric tracks around the spindle; track k is divided into sectors separated by gaps]
7
Disk Structure
A hard disk in action:
[Figure: animation of a disk rotating about its spindle]
Three factors affect how quickly data can be read (put a mark on the disk and ask what they are):
1. how long it takes the disk to rotate and bring the data under the head;
2. how long it takes to move the head to the appropriate track;
3. the rate at which the data can be read off the disk.
8
Disks Operation
What affects the time needed to retrieve data from a hard disk?
- Seek time: time to position the arm over the right track.
  T_avg_seek ≈ 9 ms
- Rotational latency: time to position the head at the right sector (on average, half a rotation).
  T_avg_rotation = ½ × (1/RPM) × 60 s/min
- Average transfer time: time to transfer one sector.
  T_avg_transfer = (1/RPM) × (1/(avg # sectors per track)) × 60 s/min
Then T_access = T_avg_seek + T_avg_rotation + T_avg_transfer
Example: disk with ____ RPM, 10 ms avg seek time and 500 sectors/track.
T_access = ____
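The three-term model can be checked numerically. A minimal Python sketch; the function name is hypothetical, and since the example's rotation speed is not given, the call assumes a 7200 RPM disk purely for illustration.

```python
def access_time_ms(rpm, avg_seek_ms, sectors_per_track):
    """T_access = T_avg_seek + T_avg_rotation + T_avg_transfer, all in milliseconds."""
    ms_per_rev = 60_000 / rpm                   # one full rotation: (1/RPM) * 60 s
    rotation = 0.5 * ms_per_rev                 # on average, wait half a rotation
    transfer = ms_per_rev / sectors_per_track   # time for one sector to pass the head
    return avg_seek_ms + rotation + transfer


# Hypothetical 7200 RPM disk with the slide's 10 ms seek and 500 sectors/track.
print(round(access_time_ms(7200, 10, 500), 3))  # → 14.183
```

With these numbers the seek dominates: rotation adds about 4.2 ms and transferring one sector only about 0.02 ms.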
9
Logical Disk Blocks
Modern disks present a simpler, abstract view of the complex sector geometry:
- The set of available sectors is modeled as a sequence of b-sized logical blocks (0, 1, 2, ...).
- The mapping between logical blocks and actual (physical) sectors is maintained by a hardware/firmware device called the disk controller.
- The controller converts requests for logical blocks into (surface, track, sector) triples.
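A sketch of the kind of mapping the controller performs, under simplifying assumptions that real drives do not satisfy: each logical block is one sector, every track holds the same number of sectors, and there are no spare or remapped sectors (real disks use zoned recording and remap bad sectors). The function name and the geometry are hypothetical.

```python
def lba_to_physical(lba, surfaces, sectors_per_track):
    """Idealized controller mapping: logical block -> (surface, track, sector)."""
    blocks_per_cylinder = surfaces * sectors_per_track
    track = lba // blocks_per_cylinder                # same track number on every surface
    surface = (lba // sectors_per_track) % surfaces   # fill each surface's track in turn
    sector = lba % sectors_per_track
    return surface, track, sector


# Hypothetical geometry: 6 surfaces (3 platters), 500 sectors per track.
for lba in (0, 499, 500, 3000):
    print(lba, lba_to_physical(lba, surfaces=6, sectors_per_track=500))
```

Filling a whole cylinder before moving to the next track number means consecutive logical blocks can usually be read without a seek.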
10
Accessing Disk: Direct Memory Access (DMA)
[Figure: the CPU, main memory and the disk controller, with three numbered transfers:
1: PIO data transfer, CPU -> controller, initiated by the CPU
2: DMA data transfer, controller <-> memory, initiated by the controller
3: Interrupt (control transfer), controller -> CPU, initiated by the controller]
- The disk controller transfers data to/from main memory independently of the CPU.
- The process is initiated by the CPU using PIO: it sends a request to the controller with the addresses and sizes.
- Data is transferred to memory without CPU involvement.
- The controller signals the CPU with an interrupt when the transfer is complete.
- Large amounts of data can be transferred with one request.
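To make the division of labour concrete, here is a toy Python simulation of the three numbered steps: a thread plays the role of the controller hardware and a callback stands in for the interrupt. `DiskController` and `start_dma` are hypothetical names, not a real driver API.

```python
import threading


class DiskController:
    """Toy disk controller; a thread stands in for the controller hardware."""

    def __init__(self, disk, block_size):
        self.disk = disk
        self.block_size = block_size

    def start_dma(self, block_no, memory, mem_addr, nblocks, interrupt):
        # 1: the CPU's only job is this call, handing over addresses and sizes (PIO).
        def transfer():
            start = block_no * self.block_size
            n = nblocks * self.block_size
            # 2: the controller copies straight into memory; the CPU is not involved.
            memory[mem_addr:mem_addr + n] = self.disk[start:start + n]
            # 3: signal completion, like raising an interrupt.
            interrupt()

        worker = threading.Thread(target=transfer)
        worker.start()
        return worker


disk = bytes(range(32))          # four 8-byte "blocks"
memory = bytearray(16)
done = threading.Event()
DiskController(disk, block_size=8).start_dma(
    block_no=1, memory=memory, mem_addr=4, nblocks=1, interrupt=done.set)
# ... the CPU could run other work here ...
done.wait(timeout=5)
print(memory.hex())  # → 0000000008090a0b0c0d0e0f00000000
```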
12
Solid State Disks (SSDs)
[Figure: requests to read and write logical disk blocks arrive over the I/O bus at the solid state disk (SSD), where a flash translation layer maps them onto flash memory made of blocks 0 to B-1, each containing pages 0 to P-1]
- Used in USB sticks, digital cameras, iPods, etc.
- Pages: 512 B to 4 KB; blocks: 32 to 128 pages.
- Data is read/written in units of pages.
- A page can be written only after its block has been erased.
- A block wears out after roughly 100,000 repeated writes.
13
SSD Performance Characteristics
Sequential read throughput    250 MB/s
Sequential write throughput   170 MB/s
Random read throughput        140 MB/s
Random write throughput        14 MB/s
Random read access             30 µs
Random write access           300 µs
Why are random writes so slow?
- Erasing a block is slow (around 1 ms).
- Before a page can be rewritten, all the useful pages in its block must be preserved:
  - find a used block (the new block) and erase it;
  - write the page into the new block;
  - copy the other pages from the old block to the new block.
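The slow path for random writes can be sketched as a toy flash translation layer in Python. It assumes one simple policy (each logical block mapped to a whole physical block, copied on rewrite) and picks the least-erased free block as a crude form of wear leveling; real FTLs are far more elaborate, and all names here are hypothetical.

```python
class ToyFTL:
    def __init__(self, nblocks, pages_per_block):
        self.P = pages_per_block
        self.flash = [[None] * pages_per_block for _ in range(nblocks)]  # None = erased
        self.erase_count = [0] * nblocks
        self.mapping = {}                    # logical block -> physical block

    def _erase(self, pb):
        self.flash[pb] = [None] * self.P     # the slow operation (~1 ms on the slide)
        self.erase_count[pb] += 1

    def _spare_block(self):
        used = set(self.mapping.values())
        free = [pb for pb in range(len(self.flash)) if pb not in used]
        # Crude wear leveling: reuse the free block that has been erased least.
        return min(free, key=lambda pb: self.erase_count[pb])

    def write(self, lpage, value):
        lb, off = divmod(lpage, self.P)
        pb = self.mapping.get(lb)
        if pb is None:                       # first write to this logical block
            pb = self._spare_block()
            self._erase(pb)
            self.mapping[lb] = pb
        if self.flash[pb][off] is None:      # fast path: page still erased
            self.flash[pb][off] = value
            return
        # Slow path, as on the slide: erase a new block, write the page,
        # then copy the other pages over from the old block.
        new = self._spare_block()
        self._erase(new)
        self.flash[new][off] = value
        for i, old_value in enumerate(self.flash[pb]):
            if i != off:
                self.flash[new][i] = old_value
        self.mapping[lb] = new               # the old block is now free for reuse

    def read(self, lpage):
        lb, off = divmod(lpage, self.P)
        pb = self.mapping.get(lb)
        return None if pb is None else self.flash[pb][off]


ftl = ToyFTL(nblocks=4, pages_per_block=4)
ftl.write(0, "a")            # first write: erase a block, program page 0
ftl.write(1, "b")            # page 1 of that block is still erased: fast
ftl.write(0, "c")            # rewrite: erase another block and copy "b" over
print(ftl.read(0), ftl.read(1), sum(ftl.erase_count))  # → c b 2
```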
14
SSD Tradeoffs vs Rotating Disks
Advantages
- No moving parts: faster, less power.
Disadvantages
- Have the potential to wear out.
  - Mitigated by “wear leveling logic” in the flash translation layer.
  - E.g. Intel X25 guarantees 1 petabyte (10^15 bytes) of random writes before it wears out.
- In 2010, about 100 times more expensive per byte.
Applications
- MP3 players, smart phones, laptops.
- Beginning to appear in desktops and servers.
16
File System Issues
What issues are relevant to the design of a file system?
- How files are named.
- Where information about a file is stored: not the file contents, but its length, owner, etc.
- How to find a file's data, given its name.
- How space for new files is allocated: if a file can be stored in multiple places, how do we decide where to store it?
- How to recover from hardware and software failures. If the hard drive dies, you're in trouble. But what if it's just the disk controller? Or a power failure?
17
Files
In both Windows and Unix, a file is a sequence of bytes.
- Very flexible: these bytes are given meaning by user programs.
How do we determine the type of data in a file?
- Using the file name (e.g. the file extension in Windows). Windows will refuse to execute a file with the wrong extension.
- By looking at the first few bytes (e.g. Unix). Unix doesn't usually care about the extension.
Attributes are associated with each file; these vary depending on the operating system.
Unix: devices are also accessed as files by the user, e.g. /dev/sda, /dev/cdrom, /dev/pts/0.
- The user doesn't need to know the details of the device, apart from whether it is accessed one character at a time or one block at a time.
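Sniffing the first few bytes takes only a table of signatures. A minimal Python sketch; the signatures listed are real, well-known ones, but the table is a tiny illustrative subset of what the Unix `file` command checks, and `sniff` / `sniff_file` are hypothetical names.

```python
# A few well-known "magic number" prefixes (illustrative subset only).
MAGIC = [
    (b"\x7fELF", "ELF executable"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"%PDF-", "PDF document"),
    (b"\x1f\x8b", "gzip compressed data"),
    (b"#!", "script"),
]


def sniff(header):
    """Guess a file's type from its first bytes, ignoring its name entirely."""
    for magic, kind in MAGIC:
        if header.startswith(magic):
            return kind
    return "unknown"


def sniff_file(path):
    with open(path, "rb") as f:
        return sniff(f.read(16))


print(sniff(b"\x7fELF\x02\x01\x01"))  # → ELF executable
```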
18
Common File Attributes
- File size
- File owner and group
- Location of the file's data
- Time of creation/last access/last update
- File permissions (who can read/write/execute it)
- Assorted flags (hidden/system/archive/lock/etc.)
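On Unix-like systems most of these attributes can be read with `os.stat`. A short Python sketch; POSIX `stat` has no portable creation time (on Linux `st_ctime` is the time of the last inode change), so only access and update times are shown. The `attributes` helper is hypothetical.

```python
import os
import stat
import tempfile
import time


def attributes(path):
    """Collect the common attributes of a file via os.stat (POSIX fields)."""
    st = os.stat(path)
    return {
        "size": st.st_size,                          # bytes
        "owner_uid": st.st_uid,                      # owner
        "group_gid": st.st_gid,                      # group
        "permissions": stat.filemode(st.st_mode),    # e.g. '-rw-r--r--'
        "is_regular_file": stat.S_ISREG(st.st_mode),
        "last_access": time.ctime(st.st_atime),
        "last_update": time.ctime(st.st_mtime),
    }


with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello")
info = attributes(f.name)
os.remove(f.name)
print(info["size"], info["permissions"])
```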
19
File Names
A file is accessed using its name. The rules for names depend on the operating system.
- MS-DOS/Windows up to Windows ME (1981)
  - 8 ASCII characters, followed by “.” and a 3-character extension.
  - Case insensitive (that is, MYFILE.DOC is the same as myfile.doc).
- ISO 9660 CD-ROM (1988)
  - Same as for MS-DOS: the design goal was to support the lowest common denominator.
  - Extensions allow file names for Windows NT to 7, and for Unix/Linux:
    - Rock Ridge extension (named after the favorite movie town of a committee member): allows Unix file systems to be stored on CD-ROMs.
    - Joliet extensions: allow Windows file systems to be stored on CD-ROMs (despite Microsoft's own OS being the reason for most of the idiotic constraints on filenames).
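A validator for MS-DOS 8.3 names is essentially one regular expression. This sketch assumes only letters, digits, `_` and `-` are allowed (MS-DOS accepted a few more symbols) and ignores reserved device names such as CON; the function names are hypothetical.

```python
import re

# Assumption: base name and extension use only letters, digits, '_' and '-'.
_NAME_83 = re.compile(r"[A-Z0-9_-]{1,8}(\.[A-Z0-9_-]{1,3})?")


def is_valid_83(name):
    # MS-DOS is case insensitive, so check the upper-case form it would store.
    return _NAME_83.fullmatch(name.upper()) is not None


def same_83_name(a, b):
    return a.upper() == b.upper()


print(is_valid_83("myfile.doc"), is_valid_83("longfilename.txt"))  # → True False
print(same_83_name("MYFILE.DOC", "myfile.doc"))                    # → True
```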
20
File Names (cont')
- Windows NT to 7 (1993)
  - 255 Unicode characters, case sensitive (can be switched off).
  - Many Windows tools are case insensitive!
- Unix/Linux
  - 255 bytes (any character except NUL and /), case sensitive.
  - UTF-8 can be used with recent versions of Linux.
  - There is nothing special about . in a file name.
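The Unix/Linux rule fits in one function. This sketch assumes the usual Linux limit of 255 bytes per name (as on ext2/3/4), so a UTF-8 name made of multi-byte characters holds fewer than 255 characters. It also rejects `.` and `..`, which already exist in every directory. The function name is hypothetical.

```python
NAME_MAX = 255  # assumed per-name limit in bytes (ext2/3/4 and most Linux file systems)


def is_valid_unix_name(name):
    """Any bytes except NUL and '/', at most NAME_MAX of them; '.' has no special meaning."""
    raw = name.encode("utf-8")
    return (0 < len(raw) <= NAME_MAX
            and b"/" not in raw
            and b"\0" not in raw
            and name not in (".", ".."))   # these two names already exist in every directory


print(is_valid_unix_name("archive.tar.gz"), is_valid_unix_name("a/b"))  # → True False
print(is_valid_unix_name("é" * 127), is_valid_unix_name("é" * 128))      # → True False
```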
21
Directories
- A directory is just a file whose data contains a list of entries.
- Each entry contains information about one file or directory.
- Each file or directory is an entry in some directory, except for the top-level directory.
Some really old operating systems did not support directories (e.g. RT11: you created a fixed-size file and mounted it as a separate volume).
23
ISO9660 CD-ROM File System
- CD-ROMs are read-only, which makes the file system layout simpler.
- A CD-ROM consists of a sequence of blocks of 2048 data bytes.
- Files are stored using contiguous blocks.
- A CD-ROM contains:
  - 16 blocks with various info set by the manufacturer;
  - a primary volume descriptor block containing the directory entry for the root directory.
Directory entry:
[Figure: fields of a directory entry, in order: entry length, extended attribute record length, location, size, date/time, flags, CD#, name length, name;version]
- All binary fields are encoded twice: once in little-endian format, once in big-endian format.
- Flags indicate hide/show entry, whether the entry is a directory, etc.
- CD# shows which CD the entry is on; this allows a system to be built from multiple CDs.
- Name: max 8+3 characters.
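The both-endian encoding is easy to see when parsing a directory record. A Python sketch using `struct`; the byte offsets follow the ISO 9660 (ECMA-119) directory record layout, but the helper names and the sample record are made up for illustration.

```python
import struct


def both_endian_u32(buf, off):
    """A 32-bit field stored twice: little-endian, then big-endian."""
    (le,) = struct.unpack_from("<I", buf, off)
    (be,) = struct.unpack_from(">I", buf, off + 4)
    if le != be:
        raise ValueError("both-endian copies disagree")
    return le


def parse_dir_record(rec):
    flags = rec[25]
    name_len = rec[32]
    return {
        "record_length": rec[0],
        "ext_attr_length": rec[1],
        "location": both_endian_u32(rec, 2),    # first block of the contiguous file
        "size": both_endian_u32(rec, 10),       # file size in bytes
        "hidden": bool(flags & 0x01),
        "is_directory": bool(flags & 0x02),
        "cd_number": struct.unpack_from("<H", rec, 28)[0],  # volume sequence number
        "name": rec[33:33 + name_len].decode("ascii"),
    }


def both(fmt_le, fmt_be, value):
    return struct.pack(fmt_le, value) + struct.pack(fmt_be, value)


# Hypothetical record for README.TXT;1 stored at block 20, 4096 bytes long.
name = b"README.TXT;1"
pad = b"\0" if len(name) % 2 == 0 else b""   # keeps the record length even
rec = (bytes([33 + len(name) + len(pad), 0])
       + both("<I", ">I", 20)                # location
       + both("<I", ">I", 4096)              # size
       + bytes(7)                            # date/time (zeros here)
       + bytes([0x00, 0, 0])                 # flags, interleave unit size, gap size
       + both("<H", ">H", 1)                 # CD#
       + bytes([len(name)]) + name + pad)
print(parse_dir_record(rec))
```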
25
MS-DOS File System
No longer normally used on computers, but still used in:
- most digital cameras
- MP3 players
- iPods (unless reformatted differently)
Directory entry (32 bytes):
[Figure: fields File name, Extension, Attributes, Unused, Time, Date, First cluster #, Size]
- Attributes: bits for read-only, hidden, system file, etc.
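The 32-byte entry maps directly onto a `struct` format. A Python sketch assuming the classic FAT12/FAT16 layout (FAT32 stores the high half of the cluster number in part of the reserved area, which this ignores); the helper names and the sample entry are hypothetical.

```python
import struct

# name(8) ext(3) attributes(1) reserved(10) time(2) date(2) first cluster(2) size(4)
ENTRY = struct.Struct("<8s3sB10sHHHI")
ATTR_BITS = {0x01: "read-only", 0x02: "hidden", 0x04: "system",
             0x08: "volume label", 0x10: "directory", 0x20: "archive"}


def parse_entry(raw):
    name, ext, attr, _reserved, t, d, cluster, size = ENTRY.unpack(raw)
    return {
        "name": name.decode("ascii").rstrip(),
        "ext": ext.decode("ascii").rstrip(),
        "attributes": [label for bit, label in ATTR_BITS.items() if attr & bit],
        "time": (t >> 11, (t >> 5) & 0x3F, (t & 0x1F) * 2),     # h, m, s (2 s steps)
        "date": (1980 + (d >> 9), (d >> 5) & 0x0F, d & 0x1F),   # y, m, d
        "first_cluster": cluster,
        "size": size,
    }


raw = ENTRY.pack(b"MYFILE  ", b"DOC", 0x21, bytes(10),
                 (14 << 11) | (30 << 5) | 5,               # 14:30:10
                 ((2004 - 1980) << 9) | (7 << 5) | 15,     # 2004-07-15
                 42, 1234)
print(parse_entry(raw))
```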
26
MS-DOS File System (cont')
- Space is managed using a File Allocation Table (FAT).
  - Each block (called a cluster) is represented by a 12-, 16- or 32-bit word.
  - The word contains the number of the next block in the file.
  - In other words: each file is a linked list of blocks.
- Two copies (usually) of the FAT are stored on disk, and a copy is always kept in memory.
  - The two copies are stored next to each other on disk, so this is not great in case of a crash.
- Microsoft uses “cluster” instead of “block”; each cluster contains 1 or more sectors.
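Following a file through the FAT is a linked-list walk. A Python sketch in which the table is a plain list and `END_OF_CHAIN = -1` is a stand-in for the real end-of-chain values; the function names and the sample table are hypothetical.

```python
END_OF_CHAIN = -1  # stand-in for the real end-of-chain values (0xFFF8 to 0xFFFF in FAT16)


def cluster_chain(fat, first):
    """Follow one file's linked list of clusters through the FAT."""
    chain, cluster = [], first
    while cluster != END_OF_CHAIN:
        if len(chain) > len(fat):
            raise ValueError("loop in FAT chain (corrupted table?)")
        chain.append(cluster)
        cluster = fat[cluster]
    return chain


def nth_cluster(fat, first, n):
    """Random access: reaching cluster n of a file means following n links."""
    cluster = first
    for _ in range(n):
        cluster = fat[cluster]
    return cluster


# Hypothetical 10-entry FAT holding one file: 4 -> 7 -> 2 -> end.
fat = [0] * 10
fat[4], fat[7], fat[2] = 7, 2, END_OF_CHAIN
print(cluster_chain(fat, 4))   # → [4, 7, 2]
print(nth_cluster(fat, 4, 2))  # → 2
```

`nth_cluster` shows why random access is slow: reaching cluster n costs n table lookups.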
27
MS-DOS File System (cont')
Pros:
- Simple to implement.
Cons:
- The FAT takes a lot of memory space (a 32 GB partition with 4 KB blocks has 8M entries; at 4 bytes each, that is 32 MB of RAM).
- Random access to large files is slow.
- Fragmentation can occur frequently: the blocks of a file are scattered all over the disk.
28
MS-DOS File System (cont')
Fragmentation example
[Figure: a row of numbered disk blocks showing how files become scattered, in steps:
- Initial state: 5 files (sizes 6, 6, 12, 12 and 7 blocks)
- Step 1: deleting the green file
- Step 2: creating a new 7-block file
- Step 3: creating a new 8-block file
- Step 4: creating a new 1-block file
- Step 5: deleting the blue file
- Step 6: appending 4 blocks to the gray file]
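A toy Python simulation reproduces the effect with a simplified policy: each new file gets the lowest-numbered free blocks, wherever they are (a FAT chain lets a file's blocks lie anywhere). The file names, sizes and disk size are hypothetical and do not match the figure.

```python
def allocate(disk, name, n):
    """Give `name` the n lowest-numbered free blocks (None marks a free block)."""
    free = [i for i, owner in enumerate(disk) if owner is None]
    if len(free) < n:
        raise OSError("disk full")
    for b in free[:n]:
        disk[b] = name
    return free[:n]


def delete(disk, name):
    for i, owner in enumerate(disk):
        if owner == name:
            disk[i] = None


def fragments(blocks):
    """Number of contiguous runs in a sorted list of block numbers."""
    return sum(1 for i, b in enumerate(blocks) if i == 0 or b != blocks[i - 1] + 1)


disk = [None] * 20
for name in ("A", "B", "C"):
    allocate(disk, name, 4)        # A: 0-3, B: 4-7, C: 8-11
delete(disk, "B")                  # leaves a 4-block hole at 4-7
d_blocks = allocate(disk, "D", 6)  # fills the hole, then continues after C
print(d_blocks, fragments(d_blocks))  # → [4, 5, 6, 7, 12, 13] 2
```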
30
Linux File System: Overall disk structure
[Figure: the disk is divided into block groups 0 to n; each block group holds a super block, group attributes, a block bitmap, an inode bitmap, an inode table and data blocks]
- The super block contains information about the file system. Each block group contains a copy of the superblock, so if one copy is damaged the information can be recovered.
- Information about free/occupied blocks is kept separate from the information used to locate data.
  - If the FAT on an MS-DOS disk is damaged, the disk is toast.
  - With Unix, you can recover if either the block bitmap or the inode bitmap is damaged.
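A sketch of bitmap-based allocation in Python. It assumes bit i of the bitmap lives in byte i // 8, least significant bit first (the ordering used by ext2), and a search that starts at a "goal" block so new blocks land near a file's existing ones; all names are hypothetical.

```python
def bit_is_set(bitmap, i):
    return bool(bitmap[i // 8] & (1 << (i % 8)))


def set_bit(bitmap, i, used=True):
    if used:
        bitmap[i // 8] |= 1 << (i % 8)
    else:
        bitmap[i // 8] &= ~(1 << (i % 8)) & 0xFF


def allocate_block(bitmap, nbits, goal=0):
    """Find and mark a free block, searching from `goal` and wrapping around."""
    for k in range(nbits):
        i = (goal + k) % nbits
        if not bit_is_set(bitmap, i):
            set_bit(bitmap, i)
            return i
    raise OSError("no free blocks in this group")


bm = bytearray(4)   # a 32-block group, all free
print(allocate_block(bm, 32), allocate_block(bm, 32, goal=20))  # → 0 20
```

Because the bitmap only records which blocks are used, it can be rebuilt from the inodes if it is damaged.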
31
Linux File System (cont')
Example of a superblock:
Filesystem OS type: Linux
Inode count:
Block count:
Reserved block count: 805659
Free blocks:
Free inodes:
First block:
Block size: 4096
Blocks per group: 32768
Inodes per group: 16384
Inode blocks per group: 512
First inode: 11
Inode size: 128
Why is access faster than for MS-DOS?
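With fixed-size inodes and a fixed number per group, finding an inode is pure arithmetic. A Python sketch using the values from the superblock above; the function names are hypothetical, and real ext2 code also reads the group descriptor to find where each group's inode table starts, which is omitted here.

```python
def inode_blocks_per_group(inodes_per_group, inode_size, block_size):
    return inodes_per_group * inode_size // block_size


def locate_inode(ino, inodes_per_group=16384, inode_size=128, block_size=4096):
    """(block group, block within that group's inode table, byte offset in that block)."""
    index = ino - 1                                  # inode numbers start at 1
    group, slot = divmod(index, inodes_per_group)
    block, offset = divmod(slot * inode_size, block_size)
    return group, block, offset


print(inode_blocks_per_group(16384, 128, 4096))  # → 512
print(locate_inode(11))                          # → (0, 0, 1280)
```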
32
Linux File Structure
A file consists of:
- An inode
  - Contains the file's attributes (but not its name).
  - Contains direct and indirect pointers to data blocks.
  - A disk block contains multiple inodes.
- Indirect blocks
  - These contain pointers to data blocks, or to other indirect blocks.
- Data blocks
DOS and Windows keep the name and attributes together; Unix/Linux keeps them apart. So we have hard links: many names can refer to the same file.
33
Linux File Structure
[Figure: an inode holds the type/permissions, owner info, file size, timestamps, 12 direct data block numbers, and the numbers of an indirect block, a 2-indirect block and a 3-indirect block; indirect blocks point to data blocks, 2-indirect blocks point to indirect blocks, and the 3-indirect block points to 2-indirect blocks]
- How big can a file get before it needs to use an indirect block (assume 4 KB blocks and 32-bit block numbers)?
- How big can a file get?
- NTFS: small files can be stored in NTFS's equivalent of the inode.
- Ext3: 2 TB max file size, 16 TB max file system size.
- Ext4: 16 TB max file size, 1 EB max file system size, using 48-bit block numbers.
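Both questions come down to counting pointers. A Python sketch assuming 4 KB blocks, 4-byte block numbers and 12 direct pointers; it counts only what the pointer structure can address, and real ext2/ext3 limits are lower because of other fields in the on-disk format. The function name is hypothetical.

```python
def max_file_size(block_size=4096, ptr_bytes=4, direct=12):
    """Return (bytes reachable through direct pointers, bytes reachable in total)."""
    p = block_size // ptr_bytes                  # block numbers per indirect block
    blocks = direct + p + p**2 + p**3            # direct + single + double + triple
    return direct * block_size, blocks * block_size


direct_limit, pointer_limit = max_file_size()
print(direct_limit // 1024, "KB before an indirect block is needed")
print(round(pointer_limit / 2**40, 2), "TiB addressable through pointers")
```

That gives 48 KB through direct pointers alone, and about 4 TiB in total.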
34
Linux Directories
- A directory contains entries for other directories or files.
- A directory entry consists of the file name and the inode number for the file; the directory contains no other information.
- The first entry of every directory is . : a reference to the directory itself.
- The second entry of every directory is .. : a reference to the parent directory.
- Large directories are implemented as some type of B-tree in some file systems (NTFS, ext4).
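Translating a path into an inode number is one directory lookup per component. A Python sketch over a hypothetical in-memory image (directories as name-to-inode dictionaries); inode 2 for the root follows the ext2/3/4 convention, while the other inode numbers are made up.

```python
ROOT_INO = 2  # ext2/3/4 convention: the root directory is inode 2

# Hypothetical image: directory inode number -> {name: inode number}.
directories = {
    2:  {".": 2, "..": 2, "bin": 12, "home": 20},
    12: {".": 12, "..": 2, "bzip2": 31, "bzcat": 31, "bunzip2": 31},
    20: {".": 20, "..": 2},
}


def resolve(path, cwd=ROOT_INO):
    """Look each path component up in the current directory's entries."""
    ino = ROOT_INO if path.startswith("/") else cwd
    for name in path.split("/"):
        if not name:
            continue                      # leading '/' or repeated '//'
        entries = directories.get(ino)
        if entries is None:
            raise NotADirectoryError(name)
        if name not in entries:
            raise FileNotFoundError(name)
        ino = entries[name]
    return ino


print(resolve("/bin/bzip2"), resolve("/bin/../home"))  # → 31 20
```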
35
Sharing Files in Linux
- Several directory entries can refer to the same inode. This is called a hard link.
  - This is the case for . and ..
- Hard links can be used to give a program several names.
  - Example: all three entries below refer to the same inode.
- Hard links can be used to share files.
- All files must belong to the same file system. Why?
%ls -ali /bin
-rwxr-xr-x 3 root root :48 bunzip2*
-rwxr-xr-x 3 root root :48 bzcat*
-rwxr-xr-x 3 root root :48 bzip2*
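Hard links can be observed from Python on a POSIX system with `os.link` and `os.stat`; the file names are illustrative only. The same experiment hints at the answer to the question: inode numbers are only meaningful inside one file system, and `link` fails with `EXDEV` if the two names are on different file systems.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    original = os.path.join(d, "bzip2")
    alias = os.path.join(d, "bunzip2")
    with open(original, "w") as f:
        f.write("data")
    os.link(original, alias)          # a second directory entry for the same inode
    a, b = os.stat(original), os.stat(alias)
    same_inode = a.st_ino == b.st_ino
    links_before = a.st_nlink         # the inode's reference count
    os.remove(original)               # removes one name only
    links_after = os.stat(alias).st_nlink
    with open(alias) as f:
        content = f.read()            # the data is still reachable via the other name

print(same_inode, links_before, links_after, content)  # → True 2 1 data
```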
36
Sharing Files in Linux (cont')
- Unix/Linux also support symbolic (soft) links.
  - A symbolic link is a file f whose contents are the name of another file.
  - The second file may be on a different file system.
- Example:
%ls -al /lib
-rw-r--r-- 1 root root :02 libm so
lrwxrwxrwx 1 root root :42 libm.so.6 -> libm so
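The difference between following a symbolic link and examining the link itself can be seen with `os.symlink`, `os.readlink`, `os.lstat` and `os.stat` (POSIX). The file names echo the listing but are purely illustrative.

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "libm.so")
    link = os.path.join(d, "libm.so.6")
    with open(target, "w") as f:
        f.write("library")
    os.symlink("libm.so", link)            # the link's contents: just a name
    stored_name = os.readlink(link)
    is_symlink = stat.S_ISLNK(os.lstat(link).st_mode)  # lstat examines the link itself
    followed_size = os.stat(link).st_size  # stat follows the name to the target
    os.remove(target)
    dangling = os.path.lexists(link) and not os.path.exists(link)

print(stored_name, is_symlink, followed_size, dangling)  # → libm.so True 7 True
```

Unlike a hard link, removing the target leaves a dangling link: the name survives, but nothing it refers to does.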
37
Reading Data
To read data from a file myfile.txt:
1. Find the directory containing myfile.txt.
2. Read the inode for myfile.txt.
3. Read the data, either by accessing the direct blocks or by going through up to 3 layers of indirect blocks.
Random access to large files is much faster than for the MS-DOS file system. Why?
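Step 3 can be made precise: given a block number within the file, which pointers must be followed? A Python sketch assuming 12 direct pointers and 1024 block numbers per indirect block (4 KB blocks, 32-bit block numbers); the function name is hypothetical.

```python
def block_path(n, direct=12, per_block=1024):
    """Pointer path to logical block n of a file: the level, then the indices to follow."""
    if n < direct:
        return ("direct", n)
    n -= direct
    if n < per_block:
        return ("single", n)
    n -= per_block
    if n < per_block ** 2:
        return ("double",) + divmod(n, per_block)
    n -= per_block ** 2
    if n < per_block ** 3:
        i, rest = divmod(n, per_block ** 2)
        return ("triple", i) + divmod(rest, per_block)
    raise ValueError("block number too large for this inode layout")


for n in (5, 12, 12 + 1024, 12 + 1024 + 1024**2):
    print(n, block_path(n))
```

Each level adds at most one extra block read, so even the deepest block needs at most three indirect-block reads, whereas the FAT requires following one link per preceding cluster.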
38
Fragmentation
- Unlike the MS-DOS file system, modern file systems (NTFS, etc.) try to keep files together.
- For Linux:
  - files are kept within a block group if possible;
  - large files are written to large free areas, whereas small files are stored in smaller free areas.
- Fragmentation still happens, but much more slowly, and normally only becomes a problem when the file system is very full.
40
Virtual File Systems
How do we handle multiple disks, devices, or partitions of one disk, possibly with different file systems (e.g. NTFS, FAT32, CD-ROM)?
MS-DOS, Windows
- Each disk is assigned a letter name:
  - A:\ : floppy disk
  - C:\ : primary hard disk
  - Z:\ : drive on a server somewhere on the network
- This letter is used to decide which file system to pass the request to.
- Hence the user must know which file system contains the file he/she wants to access.
- Recent versions of Windows provide something called a “mounted drive”, but they are rarely used and most people don't know about them.
41
Virtual File Systems Unix/Linux
- There is a root file system / at the top of the hierarchy.
- Every other file system appears as a subdirectory in that file system.
- General form: mount -t <type> <device> <dir>
- Example:
# mount -t iso9660 /dev/cdrom /mnt/cdrom
mount: block device /dev/sr0 is write-protected, mounting read-only
# ls /mnt/cdrom
Autorun.arn  Autorun.exe  Autorun.inf  docs  forms  ReadMe.txt
- The user need not even be aware that multiple file systems are involved.
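Deciding which file system handles a path can be sketched as a longest-prefix match over a mount table. This is a simplification (the Linux kernel actually switches file systems while walking the path, at directories marked as mount points), and the mount table below is hypothetical.

```python
import posixpath

# Hypothetical mount table: mount point -> file system type.
MOUNTS = {"/": "ext4", "/mnt/cdrom": "iso9660", "/home": "ext4"}


def find_mount(path):
    """Longest mount point that is a whole-component prefix of `path`."""
    path = posixpath.normpath(path)
    best = "/"
    for mp in MOUNTS:
        if path == mp or path.startswith(mp.rstrip("/") + "/"):
            if len(mp) > len(best):
                best = mp
    return best, MOUNTS[best], posixpath.relpath(path, best)


print(find_mount("/mnt/cdrom/docs/ReadMe.txt"))  # → ('/mnt/cdrom', 'iso9660', 'docs/ReadMe.txt')
```

Matching on whole components matters: `/mnt/cdromx` must not be sent to the CD-ROM file system.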
42
Virtual File Systems
How this is done:
- User programs make system calls to access various operations.
- A layer called the Virtual File System (VFS) performs the parts of the operations that are common to all file systems.
- The virtual file system calls low-level functions to accomplish specific tasks.
- Each file system must implement these low-level functions appropriately.
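The split between common VFS code and file-system-specific functions can be sketched with an abstract base class. In this Python sketch, `vfs_read` does the shared work (splitting and walking the path) and each toy file system supplies only `root`, `lookup` and `read`; the interface and both file systems are hypothetical and far simpler than the Linux VFS operation tables.

```python
from abc import ABC, abstractmethod


class FileSystem(ABC):
    """Low-level operations every concrete file system must supply (hypothetical)."""

    @abstractmethod
    def root(self):
        """Handle for the root directory."""

    @abstractmethod
    def lookup(self, directory, name):
        """Handle for entry `name` in `directory`; FileNotFoundError if absent."""

    @abstractmethod
    def read(self, node):
        """Whole contents of a regular file."""


class NestedDictFS(FileSystem):
    """Toy file system: directories are dicts, files are bytes."""

    def __init__(self, tree):
        self.tree = tree

    def root(self):
        return self.tree

    def lookup(self, directory, name):
        if not isinstance(directory, dict) or name not in directory:
            raise FileNotFoundError(name)
        return directory[name]

    def read(self, node):
        if isinstance(node, dict):
            raise IsADirectoryError("is a directory")
        return node


class FlatTableFS(FileSystem):
    """Toy file system: one table keyed by full path; directories are implicit."""

    def __init__(self, files):
        self.files = files

    def root(self):
        return ""

    def lookup(self, directory, name):
        path = f"{directory}/{name}"
        if path in self.files or any(p.startswith(path + "/") for p in self.files):
            return path
        raise FileNotFoundError(path)

    def read(self, node):
        if node not in self.files:
            raise IsADirectoryError(node)
        return self.files[node]


def vfs_read(fs, path):
    """Common VFS work: walk the path, delegating each step to `fs`."""
    node = fs.root()
    for name in path.split("/"):
        if name:                      # skip empty components ("/a//b", leading "/")
            node = fs.lookup(node, name)
    return fs.read(node)


print(vfs_read(NestedDictFS({"etc": {"hostname": b"box\n"}}), "/etc/hostname"))
print(vfs_read(FlatTableFS({"/docs/ReadMe.txt": b"hello"}), "/docs/ReadMe.txt"))
```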
43
Virtual File Systems
Pictorially:
[Figure: user programs 1 ... n on top, the VFS layer beneath them, and the ISO 9660, Ext4 and VFAT file systems below]
45
Robustness and Recovery
- File systems contain critical information.
- Events may occur that cause updates to fail:
  - operating system crash (caused by a bug);
  - mechanical/electrical failures of the disk;
  - power failures (for older hard disks, the head may destroy the current track).
- Consequences:
  - the information about to be written may be lost;
  - the file system may become inconsistent;
  - there is a risk of losing other information.
46
File System Consistency Check
- When the operating system shuts down, it saves a file-system-is-clean bit to disk.
- During the boot process, the operating system checks this bit.
  - If it's not set, the file system may be in an inconsistent state, so it needs to be fixed.
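The clean-bit protocol is a tiny state machine. A Python sketch with a hypothetical superblock dictionary; ext2 keeps a similar state flag in its real superblock, but the names here are illustrative.

```python
def mount(sb, check):
    """sb: hypothetical superblock dict; check: callable that repairs the file system."""
    if not sb["clean"]:
        check()               # last shutdown was not clean: run the consistency check
    sb["clean"] = False       # on disk, the state may now be temporarily inconsistent


def unmount(sb):
    # (a real OS flushes all cached blocks to disk before setting the bit)
    sb["clean"] = True


checks = []
sb = {"clean": True}
mount(sb, lambda: checks.append("fsck"))   # clean: no check
unmount(sb)
mount(sb, lambda: checks.append("fsck"))   # clean again: no check
# ... power fails here, so unmount(sb) never runs ...
mount(sb, lambda: checks.append("fsck"))   # bit not set: check runs
print(checks)  # → ['fsck']
```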
47
File System Consistency Check
Example: Linux file system check (e2fsck), which works in 5 stages.
- Stage 1: reads the inodes and determines:
  - which inodes are in use;
  - the type of file each inode is used for;
  - whether blocks are in use or free;
  - which blocks contain directories;
  - which blocks are used by no inode or by more than one inode.
- Stage 2: verifies that directory entries are valid:
  - all of the fields must have sensible values;
  - entries for . and .. should be present.
(Information taken from the source code; I wasn't able to find this on the web.)
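Stage 1's block bookkeeping can be sketched as a single pass over the inode table. The input format (a dictionary of in-use inodes with a type and a block list) is a hypothetical simplification of what e2fsck reads from disk.

```python
from collections import defaultdict


def pass1(inodes):
    """inodes: inode number -> {"type": "file" or "dir", "blocks": [...]} for in-use inodes."""
    owners = defaultdict(list)          # block number -> inodes that claim it
    dir_blocks = set()
    for ino, info in inodes.items():
        for b in info["blocks"]:
            owners[b].append(ino)
            if info["type"] == "dir":
                dir_blocks.add(b)
    in_use = set(owners)                # every other block should be free
    shared = {b: inos for b, inos in owners.items() if len(inos) > 1}
    return in_use, dir_blocks, shared


inodes = {2:  {"type": "dir",  "blocks": [100]},
          12: {"type": "file", "blocks": [101, 102]},
          13: {"type": "file", "blocks": [102, 103]}}   # block 102 claimed twice
in_use, dir_blocks, shared = pass1(inodes)
print(sorted(in_use), dir_blocks, shared)  # → [100, 101, 102, 103] {100} {102: [12, 13]}
```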
48
File System Consistency Check
- Stage 3: checks the directory structure.
  - It must form a tree, so reconnect disconnected pieces and break any loop.
- Stage 4: checks and corrects reference counts.
  - Multiple directory entries can point to the same inode, and the inode keeps track of this number. Why? To know when the inode becomes available (its count drops to 0).
  - Make sure the reference count in the inode is correct.
- Stage 5: checks bitmaps.
  - Compare the block and inode bitmaps computed in the earlier stages against the on-disk bitmaps.
  - Update these if necessary.
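Stages 4 and 5 amount to recomputing values from scratch and comparing them with what is stored. A Python sketch over hypothetical in-memory structures; counting every entry, including . and .., gives the usual Unix link count for directories as well as files.

```python
from collections import Counter


def pass4(directories, stored_links):
    """Return {inode: (stored, actual)} for every inode whose link count is wrong."""
    actual = Counter()
    for entries in directories.values():
        for ino in entries.values():    # every entry, including . and .., is one link
            actual[ino] += 1
    return {ino: (stored, actual[ino])
            for ino, stored in stored_links.items() if stored != actual[ino]}


def pass5(in_use_blocks, on_disk_bitmap):
    """Blocks whose on-disk bitmap bit disagrees with the usage found in stage 1."""
    return {b for b, used in on_disk_bitmap.items() if used != (b in in_use_blocks)}


directories = {2:  {".": 2, "..": 2, "bin": 12},
               12: {".": 12, "..": 2, "bzip2": 31, "bzcat": 31}}
print(pass4(directories, {2: 3, 12: 2, 31: 1}))                      # → {31: (1, 2)}
print(sorted(pass5({100, 101}, {100: True, 101: False, 102: True})))  # → [101, 102]
```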
49
File System Recovery
- File system recovery:
  - takes a long time for large file systems;
  - does not always restore the file system perfectly.
- How do databases handle this problem?
  - They log the transactions being performed.
  - If a transaction is interrupted, it can be undone or redone by executing the logged operations.
- Some file systems do the same thing.
50
Journaling File Systems
- A journaling file system has a hidden file called a journal (NTFS, Linux ext3).
- Each operation is broken down into atomic steps. Example: to delete a file,
  1. free each data block;
  2. decrement the inode's reference count (free it if it becomes 0);
  3. remove the directory entry for the file.
- Before performing the operation:
  - write the sequence of steps to the journal;
  - add an end-of-operation indicator to the journal.
- After the operation completes, the steps can be deleted from the journal.
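The ordering rule (journal first, then modify) can be sketched in Python. The state dictionary, the step names and the `crash_after` parameter that simulates a crash are all hypothetical, and a real journal must be flushed to disk before the real structures are touched.

```python
def apply(state, step):
    """Carry out one journaled step on the (toy) file-system state."""
    op, *args = step
    if op == "free_block":
        state["free_blocks"].add(args[0])
    elif op == "dec_refcount":
        ino = args[0]
        state["refcount"][ino] -= 1
        if state["refcount"][ino] == 0:
            state["free_inodes"].add(ino)
    elif op == "remove_entry":
        dir_ino, name = args
        state["dirs"][dir_ino].pop(name, None)


def run_operation(journal, state, steps, crash_after=None):
    journal.extend(steps)              # 1. write the sequence of steps to the journal
    journal.append(("END",))           # 2. then the end-of-operation indicator
    for i, step in enumerate(steps):   # 3. only now modify the real structures
        if i == crash_after:
            return False               # simulated crash part-way through
        apply(state, step)
    journal.clear()                    # 4. operation complete: discard its entries
    return True


def fresh_state():
    return {"free_blocks": set(), "free_inodes": set(),
            "refcount": {31: 1}, "dirs": {12: {"bzip2": 31}}}


steps = [("free_block", 500), ("free_block", 501),
         ("dec_refcount", 31), ("remove_entry", 12, "bzip2")]
journal, state = [], fresh_state()
run_operation(journal, state, steps, crash_after=1)
print(journal[-1], state["free_blocks"])   # → ('END',) {500}
```

After the simulated crash, the journal still holds the complete, terminated operation, so recovery knows exactly what remains to be done.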
51
Journaling File Systems
- Each step must be idempotent: executing the step multiple times should have the same effect as executing it only once. Why?
  - Examples (good or bad?):
    - Increment reference count for inode #786453
    - Set reference count for inode # to 2
    - Mark block # free
- When the file system isn't clean on reboot:
  - replay every operation from the journal that has an end-of-operation indicator;
  - this is much faster than a full check.
- Issues with journaling systems: what to log (metadata vs actual changes) and when.
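Replay after a crash, and why idempotence matters, can be sketched in Python. `replay` re-applies every operation that reached its end marker and ignores an unfinished tail; the journal format and step names are hypothetical.

```python
def apply(state, step):
    op, *args = step
    if op == "set_refcount":          # idempotent: same result however often it runs
        ino, value = args
        state["refcount"][ino] = value
    elif op == "mark_free":           # idempotent: adding to a set twice changes nothing
        state["free"].add(args[0])
    elif op == "inc_refcount":        # NOT idempotent: every replay adds one more
        state["refcount"][args[0]] += 1


def replay(journal, state):
    """Re-apply every operation that reached its end-of-operation marker."""
    pending = []
    for entry in journal:
        if entry == ("END",):
            for step in pending:
                apply(state, step)
            pending = []
        else:
            pending.append(entry)
    # Whatever is left in `pending` never committed, so it is ignored.


journal = [("set_refcount", 7, 2), ("mark_free", 90), ("END",),
           ("mark_free", 91)]                     # last operation never committed
state = {"refcount": {7: 5}, "free": set()}
replay(journal, state)
replay(journal, state)          # e.g. a second crash during recovery: replay again
print(state)                    # → {'refcount': {7: 2}, 'free': {90}}

bad = [("inc_refcount", 7), ("END",)]
state2 = {"refcount": {7: 1}, "free": set()}
replay(bad, state2)
replay(bad, state2)
print(state2["refcount"][7])    # → 3
```

Replaying the idempotent journal twice is harmless; replaying the increment twice leaves a count of 3 instead of 2.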