Transactions and Reliability. File system components Disk management Naming Reliability  What are the reliability issues in file systems? Security.


1 Transactions and Reliability

2 File system components: disk management; naming; reliability (what are the reliability issues in file systems?); security.

3 File system reliability issues: File systems cache heavily to achieve higher performance, but machines crash all the time. How can the file system stay consistent after the machine crashes? Disks can crash too. How can the data be kept even when a disk breaks?

4 Caching issue: File systems have lots of metadata: free blocks, directories, file headers, indirect blocks. Metadata is heavily cached for performance, so a modification to the metadata may not be on disk right away.

5 Problem: When the system crashes, all cached data are lost! The OS needs to ensure that the file system does not reach an inconsistent state. Example: moving a file between directories means (1) removing the file from the old directory and (2) adding the file to the new directory. What happens when a crash occurs in the middle?

6 UNIX File System (Ad Hoc Failure-Recovery): Metadata handling uses a synchronous write-through caching policy: a call to update metadata does not return until the changes are propagated to disk. Updates are ordered. When a crash occurs, fsck is run to repair in-progress operations.

7 Some Examples of Metadata Handling: Undo effects not yet visible to users: if a new file was created but not yet added to the directory, delete the file. Continue effects that are visible to users: if file blocks are already allocated but not recorded in the bitmap, update the bitmap.
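The two repair rules on this slide can be sketched in a few lines of Python. This is a toy model, not how fsck is actually implemented: the `directory`, `file_headers`, and `bitmap` structures and the `repair` function are hypothetical names chosen for illustration, and the bitmap is simply rebuilt from the surviving file headers.

```python
def repair(directory, file_headers, bitmap):
    """Toy consistency check (hypothetical structures, not real fsck).

    directory:    file name -> file id
    file_headers: file id -> list of block numbers owned by the file
    bitmap:       list of booleans, True = block allocated
    """
    # Undo: a header that no directory entry points to was never visible
    # to users, so the half-created file is deleted.
    visible_ids = set(directory.values())
    surviving = {fid: blocks for fid, blocks in file_headers.items()
                 if fid in visible_ids}

    # Continue: every block owned by a surviving file must be marked
    # allocated. Here the bitmap is rebuilt from scratch for simplicity.
    fixed = [False] * len(bitmap)
    for blocks in surviving.values():
        for block in blocks:
            fixed[block] = True
    return surviving, fixed


directory = {"notes.txt": 1}
file_headers = {1: [0, 2], 2: [3]}     # file 2 was created but never linked
bitmap = [True, False, False, True]    # block 2 allocated but not recorded

surviving, fixed = repair(directory, file_headers, bitmap)
print(surviving)  # {1: [0, 2]}
print(fixed)      # [True, False, True, False]
```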

8 UFS User Data Handling: Uses a write-back policy. Modified blocks are written to disk at 30-second intervals, unless a user issues the sync system call. Data updates are not ordered. In many cases, consistent metadata is good enough.

9 Current solution: the Transaction Approach (Journaling). A transaction groups operations as a unit, with the following characteristics. Atomic: either all operations in a transaction happen or none do (no partial operations). Serializable: transactions appear to happen one after the other. Durable: once a transaction happens, it is recoverable and can survive crashes.

10 More on Transactions: A transaction is not done until it is committed. Once committed, a transaction is durable. If a transaction fails to complete, it must roll back as if it had not happened at all.

11 Transaction Implementation (One Thread): Example: money transfer. Begin transaction; x = x - $100; y = y + $100; Commit.

12 Transaction Implementation (One Thread): Common implementations involve the use of a log, a journal that is never erased. A file system uses a write-ahead log to track all transactions: information is written to the log before it is written to its final location on disk.

13 Transaction Implementation (One Thread): Once the changes to accounts x and y are in the log, the log is committed to disk in a single write. The actual changes to those accounts are made later.

14 Transaction Illustrated: [Diagram: two copies of the accounts, both holding x = 1; y = 1.]

15 Transaction Illustrated: [Diagram: one copy of the accounts holds x = 1; y = 1, the other holds x = 0; y = 2.]

16 Transaction Illustrated: [Diagram: one copy of the accounts holds x = 1; y = 1, the other holds x = 0; y = 2. Log: begin transaction; old x: 1; old y: 1; new x: 0; new y: 2; commit.] Commit the log to disk before updating the actual values on disk.

17 Transaction Steps: (1) Mark the beginning of the transaction. (2) Log the changes to account x. (3) Log the changes to account y. (4) Commit. (5) Modify account x on disk. (6) Modify account y on disk.
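The six steps above can be sketched as follows. This is a minimal in-memory model, assuming a hypothetical `Disk` class that stands in for stable storage and a hypothetical record format (`op`, `key`, `old`, `new`); a real file system would write these records to an on-disk journal.

```python
class Disk:
    """Toy stand-in for stable storage: account values plus a log."""

    def __init__(self, accounts):
        self.accounts = dict(accounts)
        self.log = []


def transfer(disk, src, dst, amount):
    old_src, old_dst = disk.accounts[src], disk.accounts[dst]

    # Step 1: mark the beginning of the transaction.
    disk.log.append({"op": "begin"})
    # Steps 2-3: log old and new values for both accounts.
    updates = [
        {"op": "update", "key": src, "old": old_src, "new": old_src - amount},
        {"op": "update", "key": dst, "old": old_dst, "new": old_dst + amount},
    ]
    disk.log.extend(updates)
    # Step 4: commit. Nothing has touched the account values yet.
    disk.log.append({"op": "commit"})
    # Steps 5-6: only now modify the accounts themselves.
    for rec in updates:
        disk.accounts[rec["key"]] = rec["new"]


disk = Disk({"x": 1, "y": 1})
transfer(disk, "x", "y", 1)
print(disk.accounts)                    # {'x': 0, 'y': 2}
print([rec["op"] for rec in disk.log])  # ['begin', 'update', 'update', 'commit']
```

The values match the illustration on the earlier slides: x goes from 1 to 0 and y from 1 to 2, and the commit record is in the log before either account is changed.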

18 Scenarios of Crashes: If a crash occurs after the commit, recovery replays the log to update the accounts. If a crash occurs before the commit, recovery rolls back and discards the transaction. A crash cannot occur during the commit, because the commit is built as an atomic operation, e.g., writing a single sector on disk.
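A recovery pass covering both scenarios might look like the sketch below. The record format (`op`, `key`, `old`, `new`) and the `recover` function are assumptions made for illustration; the slides do not specify a log layout.

```python
def recover(log, accounts):
    """Replay committed transactions and roll back uncommitted ones."""
    recovered = dict(accounts)
    pending = []                      # updates of the transaction in progress
    for rec in log:
        if rec["op"] == "begin":
            pending = []
        elif rec["op"] == "update":
            pending.append(rec)
        elif rec["op"] == "commit":
            # Crash after commit: redo every logged change.
            for upd in pending:
                recovered[upd["key"]] = upd["new"]
            pending = []
    # Crash before commit: restore the logged old values and discard.
    for upd in pending:
        recovered[upd["key"]] = upd["old"]
    return recovered


committed_log = [
    {"op": "begin"},
    {"op": "update", "key": "x", "old": 1, "new": 0},
    {"op": "update", "key": "y", "old": 1, "new": 2},
    {"op": "commit"},
]
# The crash hit after x was modified on disk but before y was.
after_commit = recover(committed_log, {"x": 0, "y": 1})
print(after_commit)   # {'x': 0, 'y': 2}

uncommitted_log = committed_log[:2]   # begin + one update, no commit record
before_commit = recover(uncommitted_log, {"x": 1, "y": 1})
print(before_commit)  # {'x': 1, 'y': 1}
```

Replaying is safe to repeat because each record stores the absolute new value rather than a delta, so applying it twice gives the same result.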

19 Two-Phase Locking (Multiple Threads): Logging alone is not enough to prevent multiple transactions from trashing one another (they are not serializable). Solution: two-phase locking. 1. Acquire all locks. 2. Perform updates and release all locks. Thread A cannot see thread B’s changes until thread B commits and releases its locks.
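A small threaded sketch of the two phases, using Python's standard `threading.Lock`. The account names and the `transfer` function are illustrative, and acquiring locks in sorted order is an added assumption (a common way to avoid deadlock) that the slide does not mention.

```python
import threading

accounts = {"x": 100, "y": 100}
locks = {name: threading.Lock() for name in accounts}


def transfer(src, dst, amount):
    # Phase 1: acquire every lock the transaction needs before any update.
    # Sorted order is an assumption added here to avoid deadlock.
    needed = sorted({src, dst})
    for name in needed:
        locks[name].acquire()
    try:
        accounts[src] -= amount
        accounts[dst] += amount
    finally:
        # Phase 2: release locks only after all updates are done, so no
        # other thread sees a half-finished transfer.
        for name in reversed(needed):
            locks[name].release()


threads = [threading.Thread(target=transfer, args=("x", "y", 1)) for _ in range(50)]
threads += [threading.Thread(target=transfer, args=("y", "x", 1)) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(accounts)  # {'x': 100, 'y': 100}
```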

20 Transactions in File Systems: Many recent file systems use write-ahead logging: Windows NT, Solaris, ext3 (Linux), etc. + Eliminates running fsck after a crash. + Write-ahead logging provides reliability. - All modifications need to be written twice.

21 Log-Structured File System (LFS): If logging is so great, why don’t we treat everything as log entries? In a log-structured file system, everything is a log entry (file headers, directories, data blocks), and the log is written only once. Version stamps distinguish between old and new entries.

22 More on LFS: New log entries are always appended to the end of the existing log, so all writes are sequential. Seeks occur only during reads, which is not so bad thanks to temporal locality and caching. Problem: more contiguous free space needs to be created all the time.
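The append-only idea with version stamps can be sketched as a tiny in-memory model. The `LogFS` class, its in-memory `index`, and the `live_entries` helper are hypothetical names for illustration; they are not taken from any real LFS implementation.

```python
class LogFS:
    """Toy log-structured store: every write appends a new entry."""

    def __init__(self):
        self.log = []       # append-only list of (version, name, data)
        self.index = {}     # name -> position of the newest entry in the log
        self.version = 0

    def write(self, name, data):
        # Writes never overwrite in place; they go to the end of the log.
        self.version += 1
        self.log.append((self.version, name, data))
        self.index[name] = len(self.log) - 1

    def read(self, name):
        # Reads may have to jump around in the log to find the newest entry.
        return self.log[self.index[name]][2]

    def live_entries(self):
        # Entries still referenced by the index; everything else is stale
        # and occupies space that would have to be reclaimed.
        return [self.log[i] for i in sorted(self.index.values())]


fs = LogFS()
fs.write("a", "v1")
fs.write("b", "x")
fs.write("a", "v2")        # supersedes the first entry for "a"

print(fs.read("a"))        # v2
print(len(fs.log))         # 3
print(fs.live_entries())   # [(2, 'b', 'x'), (3, 'a', 'v2')]
```

The stale first entry for "a" stays in the log, which illustrates why contiguous free space has to be created continually.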

23 RAID: dealing with disk crashes. RAID (redundant array of independent disks) is a standard way of organizing disks and classifying the reliability of multi-disk systems. General methods: data duplication, parity, and error-correcting codes (ECC).

24 RAID 0: No redundancy. A failure causes data loss.

25 Non-Redundant Disk Array Diagram (RAID Level 0) [Diagram labels: open(foo), read(bar), write(zoo); File System]

26 Mirrored Disks (RAID Level 1): Each disk has a second disk that mirrors its contents.

27 Mirrored Disks (RAID Level 1): Writes go to both disks. + Reliability is doubled. + Read access is faster. - Write access is slower. - Expensive and inefficient.

28 Mirrored Disk Diagram (RAID Level 1) [Diagram labels: open(foo), read(bar), write(zoo); File System]

29 Memory-Style ECC (RAID Level 2): Some disks in the array are used to hold ECC. With Hamming codes as the ECC, correcting a one-bit error in a 4-bit data word requires 3 redundant bits.
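The 4-data-bit, 3-check-bit case is the classic Hamming(7,4) code. The sketch below assumes the standard layout with check bits at positions 1, 2, and 4; the function names are illustrative. In a RAID 2 setting, each of the seven bit positions would live on a different disk.

```python
def encode(data):
    """Encode 4 data bits into a 7-bit Hamming code word."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4      # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4      # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4      # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def correct(codeword):
    """Fix at most one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The syndrome, read as a binary number, is the 1-based position of
    # the bad bit (0 means no error was detected).
    position = s1 + 2 * s2 + 4 * s4
    if position:
        c[position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


word = encode([1, 0, 1, 1])
print(word)               # [0, 1, 1, 0, 0, 1, 1]

damaged = list(word)
damaged[4] ^= 1           # flip the bit at position 5
print(correct(damaged))   # [1, 0, 1, 1]
```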

30 Memory-Style ECC (RAID Level 2): + More efficient than mirroring. + Can correct, not just detect, errors. - Still fairly inefficient, e.g., 4 data disks require 3 ECC disks.

31 Memory-Style ECC Diagram (RAID Level 2) [Diagram labels: open(foo), read(bar), write(zoo); File System]

32 Bit-Interleaved Parity (RAID Level 3): One disk in the array stores parity for the other disks. This is enough to correct an error when the disk controller reports which disk has failed. + More efficient than Levels 1 and 2. - The parity disk doesn’t add bandwidth.

33 Parity Method: Disk 1: 1001; Disk 2: 0101; Disk 3: 1000; Parity: 0100 (even parity: the number of 1’s in each bit position, counting the parity bit, is even). How can disk 2 be recovered?
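One way to answer the question on this slide: with even parity, the parity word is the XOR of the data words, so a lost disk is the XOR of everything that survives. A short sketch using the slide's own values (the helper name `parity` is illustrative):

```python
from functools import reduce
from operator import xor


def parity(blocks):
    """XOR all blocks together; this is even parity per bit position."""
    return reduce(xor, blocks)


disk1, disk2, disk3 = 0b1001, 0b0101, 0b1000

p = parity([disk1, disk2, disk3])
print(format(p, "04b"))                # 0100

# Disk 2 fails: XOR the surviving data disks with the parity disk.
recovered_disk2 = parity([disk1, disk3, p])
print(format(recovered_disk2, "04b"))  # 0101
```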

34 Bit-Interleaved RAID Diagram (Level 3) [Diagram labels: open(foo), read(bar), write(zoo); File System]

35 Block-Interleaved Parity (RAID Level 4): Like bit-interleaved parity, but data is interleaved in blocks.

36 Block-Interleaved Parity (RAID Level 4): + More efficient data access than Level 3. - The parity disk can be a bottleneck: every write needs to write the parity disk. - Small writes require 4 I/Os: read the old block, read the old parity, write the new block, write the new parity.
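The four I/Os of a small write can be sketched as below. The key observation is that the new parity can be computed from the old parity, the old block, and the new block alone, without reading the other data disks. The list-based disks and the `small_write` function are a toy model for illustration.

```python
def small_write(data_disks, parity_disk, disk_idx, block_idx, new_block):
    """Update one block and its parity using exactly four I/Os."""
    old_block = data_disks[disk_idx][block_idx]    # I/O 1: read old block
    old_parity = parity_disk[block_idx]            # I/O 2: read old parity
    data_disks[disk_idx][block_idx] = new_block    # I/O 3: write new block
    # XOR-ing out the old block and XOR-ing in the new one updates parity.
    parity_disk[block_idx] = old_parity ^ old_block ^ new_block  # I/O 4


data_disks = [[0b1001], [0b0101], [0b1000]]
parity_disk = [0b0100]

small_write(data_disks, parity_disk, disk_idx=1, block_idx=0, new_block=0b1111)
print(format(parity_disk[0], "04b"))  # 1110
```

The result agrees with recomputing parity from scratch: 1001 XOR 1111 XOR 1000 is 1110.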

37 Block-Interleaved Parity Diagram (RAID Level 4) [Diagram labels: open(foo), read(bar), write(zoo); File System]

38 Block-Interleaved Distributed-Parity (RAID Level 5): Sort of the most general level of RAID. Spreads the parity out over all disks. + No parity disk bottleneck.

39 Block-Interleaved Distributed-Parity (RAID Level 5): + All disks contribute read bandwidth. - Requires 4 I/Os for small writes.
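To illustrate how parity can be spread over all disks, the sketch below rotates the parity block one disk to the left on each successive stripe. This particular rotation is an assumption chosen for illustration; the slides do not specify a layout, and real arrays use several variants.

```python
def parity_disk(stripe, num_disks):
    """Disk holding the parity block for a stripe (assumed rotation)."""
    return (num_disks - 1 - stripe) % num_disks


def layout(num_stripes, num_disks):
    """One row per stripe: 'P' marks the parity block, 'D' a data block."""
    rows = []
    for stripe in range(num_stripes):
        p = parity_disk(stripe, num_disks)
        rows.append(["P" if disk == p else "D" for disk in range(num_disks)])
    return rows


for row in layout(num_stripes=4, num_disks=4):
    print(" ".join(row))
# D D D P
# D D P D
# D P D D
# P D D D
```

Because every disk holds parity for some stripes and data for the others, parity writes are not funneled through one disk and all disks can serve reads.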

40 Block-Interleaved Distributed-Parity Diagram (RAID Level 5) [Diagram labels: open(foo), read(bar), write(zoo); File System]

