1 FFS, LFS, and RAID
Andy Wang, COP 5611 Advanced Operating Systems

2 UNIX Fast File System
Designed to improve the performance of UNIX file I/O
Two major areas of improvement:
Bigger block sizes
Better on-disk layout for files

3 Block Size Improvement
Quadrupling the block size quadrupled the amount of data fetched per disk access
But larger blocks could lead to fragmentation problems
So fragments were introduced
Small files are stored in fragments
Fragments are addressable, but not independently fetchable
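As a worked illustration of the space savings (the 4 KB block and 1 KB fragment sizes below are illustrative assumptions, not figures from the slides), here is a minimal sketch of how storing a file's tail in fragments reduces its on-disk footprint:

```c
#include <stdio.h>

/* Illustrative sizes only: block = 4096 bytes, fragment = 1024 bytes. */
#define BLOCK_SIZE 4096
#define FRAG_SIZE  1024

/* Space consumed on disk if the last partial block is stored in fragments. */
static long space_used(long file_size)
{
    long full_blocks = file_size / BLOCK_SIZE;
    long tail        = file_size % BLOCK_SIZE;
    long tail_frags  = (tail + FRAG_SIZE - 1) / FRAG_SIZE;  /* round up */
    return full_blocks * BLOCK_SIZE + tail_frags * FRAG_SIZE;
}

int main(void)
{
    /* A 5000-byte file: one full block plus one 1 KB fragment = 5120 bytes,
       instead of two full blocks (8192 bytes) without fragments. */
    printf("%ld bytes on disk\n", space_used(5000));
    return 0;
}
```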

4 Disk Layout Improvements
Aimed at avoiding disk seeks
Bad if finding related files takes many seeks
Very bad if finding all the blocks of a single file requires seeks
Spatial locality: keep related things close together on disk

5 Cylinder Groups
A cylinder group: a set of consecutive disk cylinders in FFS
Files in the same directory are stored in the same cylinder group
Within a cylinder group, FFS tries to keep things contiguous
But it must not let a cylinder group fill up

6 Locations for New Directories
Put a new directory in a relatively empty cylinder group
What is "empty"?
Many free i_nodes
Few directories already there
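A minimal sketch of the placement heuristic just described; the cylinder-group summary fields and the tie-breaking rule are hypothetical simplifications rather than FFS's actual code:

```c
/* Hypothetical summary info kept per cylinder group. */
struct cg_summary {
    int free_inodes;  /* many free i_nodes -> good candidate */
    int ndirs;        /* few directories   -> good candidate */
};

/* Pick a cylinder group for a new directory: prefer groups with many
 * free i_nodes, breaking ties in favor of fewer existing directories. */
static int pick_cg_for_new_dir(const struct cg_summary *cg, int ncg)
{
    int best = 0;
    for (int i = 1; i < ncg; i++) {
        if (cg[i].free_inodes > cg[best].free_inodes ||
            (cg[i].free_inodes == cg[best].free_inodes &&
             cg[i].ndirs < cg[best].ndirs))
            best = i;
    }
    return best;
}
```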

7 The Importance of Free Space
FFS must not run too close to capacity
Otherwise there is no room for new files
And the layout policies become ineffective when too few blocks are free
Typically, FFS needs about 10% of the total blocks free to perform well

8 Performance of FFS
4 to 15 times the bandwidth of the old UNIX file system, depending on disk block size
Performance of the original file system was limited by CPU speed, due to memory-to-memory buffer copies

9 FFS Not the Ultimate Solution
Based on the technology of the early 80s
And the file usage patterns of those times
In modern systems, FFS achieves only ~5% of raw disk bandwidth

10 The Log-Structured File System
Good, large buffer caches can absorb almost all reads
But most writes still have to go to disk
So file system performance can be limited by writes
So, build a file system that writes quickly
Like an append-only log

11 Basic LFS Architecture
Buffer writes in memory, then send them sequentially to disk
Data blocks
Attributes
Directories
And almost everything else
This converts small synchronous writes into large asynchronous writes
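A minimal sketch of this buffering, assuming a hypothetical write_segment() that issues one large sequential (and asynchronous) disk write; block and segment sizes are illustrative:

```c
#include <string.h>

#define BLOCK_SIZE   4096
#define SEG_BLOCKS   128                       /* illustrative: 512 KB segment */
#define SEG_SIZE     (BLOCK_SIZE * SEG_BLOCKS)

static char   segment[SEG_SIZE];   /* in-memory staging buffer */
static size_t seg_used;            /* bytes buffered so far    */

/* Hypothetical device layer: one large sequential, asynchronous write. */
void write_segment(const void *buf, size_t len);

/* Every kind of update (data, i_nodes, directories) is appended here. */
void lfs_append(const void *block)
{
    if (seg_used + BLOCK_SIZE > SEG_SIZE) {    /* segment full: flush it */
        write_segment(segment, seg_used);
        seg_used = 0;
    }
    memcpy(segment + seg_used, block, BLOCK_SIZE);
    seg_used += BLOCK_SIZE;
}
```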

12 A Simple Log Disk Structure
[Diagram: a log of blocks from many files (File A Block 7, File Z Block 1, File M Block 202, ..., File L Block 25) appended sequentially, with the head of the log at the end]

13 Key Issues in Log-Based Architecture
1. Retrieving information from the log
No matter how well you cache, sooner or later you have to read
2. Managing free space on the disk
You need contiguous space to write; in the long run, how do you get more?

14 Finding Data in the Log
Example requests: "Give me block 25 of file L" or "Give me block 1 of file F"
[Diagram: the same log as before; satisfying either request means locating the right block somewhere in the log]

15 Retrieving Information From the Log
Must avoid sequential scans of the disk to read files
Solution: store index structures in the log
The index is essentially the most recent version of the i_node

16 Finding Data in the Log
How do you find all the blocks of file Foo?
[Diagram: the log contains several blocks of Foo, including an old, superseded copy of one block]

17 Finding Data in the Log with an I_node
[Diagram: an i_node in the log points to the current Foo Block 1, Block 2, and Block 3, skipping the old copy of Block 1]

18 How Do You Find a File’s I_node?
You could search the log sequentially
LFS instead writes i_node maps to the log
The i_node map points to the most recent version of each i_node
The i_node map for a whole file system spans multiple blocks

19 How Do You Find the Inode?
[Diagram: the i_node map, which points to the most recent version of each i_node]

20 How Do You Find Inode Maps?
Use a fixed region on disk that always points to the most recent i_node map blocks
But cache the i_node maps in main memory
They are small enough that few disk accesses are needed to find them (see the lookup sketch below)
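A hedged sketch of the resulting lookup chain (fixed region → i_node map → i_node → data block); all the structure layouts and the read_block() helper are hypothetical simplifications:

```c
#include <stdint.h>

typedef uint64_t blkaddr_t;                          /* log address of a block */

struct fixed_region { blkaddr_t imap_block; };       /* fixed spot on disk      */
struct imap         { blkaddr_t inode_addr[1024]; }; /* i_node number -> i_node */
struct inode        { blkaddr_t block_addr[12]; };   /* file block -> log addr  */

/* Hypothetical: read one block of the log into memory. */
void read_block(blkaddr_t addr, void *buf);

/* Find block 'blkno' of the file with i_node number 'ino'. */
blkaddr_t lfs_lookup(const struct fixed_region *fr, unsigned ino, unsigned blkno)
{
    struct imap  imap;
    struct inode inode;

    read_block(fr->imap_block, &imap);          /* 1. i_node map (usually cached) */
    read_block(imap.inode_addr[ino], &inode);   /* 2. most recent i_node          */
    return inode.block_addr[blkno];             /* 3. address of the data block   */
}
```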

21 Finding I_node Maps
[Diagram: the fixed region points to the new i_node map blocks; an old i_node map block remains in the log but is no longer referenced]

22 Reclaiming Space in the Log
Eventually, the log reaches the end of the disk partition
So LFS must reuse disk space holding superseded data, such as overwritten or deleted blocks
Space can be reclaimed in the background or on demand
The goal is to maintain large free extents on disk

23 Example of Need for Reuse
[Diagram: the head of the log has reached the end of the disk, but there is new data waiting to be logged]

24 Major Alternatives for Reusing Log
Threading
+ Fast
- Fragmentation
- Slower reads
[Diagram: new data threaded into freed holes behind the head of the log]

25 Major Alternatives for Reusing Log
Copying
+ Simple
+ Avoids fragmentation
- Expensive
[Diagram: live data copied forward so new data can be appended to a large contiguous free region]

26 LFS Space Reclamation Strategy
A combination of copying and threading
Copy live data to free up large fixed-size segments
Thread the free segments together
Try to collect long-lived data permanently into its own segments

27 A Threaded, Segmented Log
[Diagram: the log divided into fixed-size segments, with free segments threaded together and the head of the log advancing within the current segment]

28 Cleaning a Segment
1. Read several segments into memory
2. Identify the live blocks
3. Write the live data back into (hopefully) a smaller number of segments
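A minimal sketch of these three steps; the segment layout and the helper functions (block_is_live(), sketched after the next slide, and lfs_append()) are hypothetical:

```c
#define SEG_BLOCKS 128   /* illustrative segment size, in blocks */

struct block   { char data[4096]; };
struct segment { struct block blk[SEG_BLOCKS]; };

/* Hypothetical helpers. */
int  block_is_live(const struct segment *seg, int idx);  /* see the next sketch */
void lfs_append(const void *block);                      /* re-log a live block */

/* Clean a batch of segments: read, filter live blocks, rewrite compactly. */
void clean_segments(struct segment *segs, int nsegs)
{
    for (int s = 0; s < nsegs; s++)              /* 1. segments already read into memory */
        for (int b = 0; b < SEG_BLOCKS; b++)
            if (block_is_live(&segs[s], b))      /* 2. keep only the live blocks         */
                lfs_append(&segs[s].blk[b]);     /* 3. write them back to the log        */
    /* The nsegs input segments can now be marked free. */
}
```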

29 Identifying Live Blocks
It is clearly not feasible to track down the live blocks of all files
Instead, each segment maintains a segment summary block
The summary identifies the owning file and block number of each block in the segment
Blocks are crosschecked against the owning i_node's block pointers
The summary is written at the end of the log write, for low overhead
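A hedged sketch of that crosscheck; the summary layout and get_inode() are hypothetical, but the rule matches the slide: a block is live only if its owning i_node still points at this copy of it:

```c
#include <stdint.h>

typedef uint64_t blkaddr_t;

/* Hypothetical per-segment summary: who owns each block in the segment. */
struct summary_entry { unsigned ino; unsigned file_blkno; };
struct segment_summary {
    struct summary_entry entry[128];
    blkaddr_t            block_addr[128];   /* log address of each block */
};

struct inode { blkaddr_t block_addr[12]; };

/* Hypothetical: fetch the current i_node via the i_node map. */
const struct inode *get_inode(unsigned ino);

/* A block is live iff its owning i_node still points to this copy of it. */
int summary_block_is_live(const struct segment_summary *sum, int idx)
{
    const struct summary_entry *e = &sum->entry[idx];
    const struct inode *ip = get_inode(e->ino);
    return ip != 0 && ip->block_addr[e->file_blkno] == sum->block_addr[idx];
}
```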

30 Segment Cleaning Policies
What are the important questions?
When do you clean segments?
How many segments do you clean at a time?
Which segments do you clean?
How do you group blocks in their new segments?

31 When to Clean
Periodically
Continuously
During off-hours
When the disk is nearly full
On demand
LFS uses a threshold system

32 How Many Segments to Clean
The more segments cleaned at once, the better the disk can be reorganized
But the higher the cost of cleaning
LFS cleans a few tens of segments at a time
Until the disk drops below the threshold value
Empirically, LFS is not very sensitive to this factor

33 Which Segments to Clean?
Cleaning segments with lots of dead data gives the greatest benefit
Some segments are hot, some segments are cold
But "cold" free space is more valuable than "hot" free space
Since cold blocks tend to stay cold

34 Cost-Benefit Analysis
u = segment utilization (fraction of live data)
A = age of the youngest block in the segment
Benefit to cost = (1 - u) * A / (1 + u)
Clean cold segments once they have some free space, but hot segments only when they have a lot of free space
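A minimal sketch of applying this rule when choosing the next segment to clean, assuming the per-segment utilization and age statistics above are tracked by the cleaner:

```c
/* Hypothetical per-segment statistics kept by the cleaner. */
struct seg_stat {
    double u;    /* utilization: fraction of live bytes, 0..1 */
    double age;  /* age of the youngest data in the segment   */
};

/* Benefit-to-cost ratio: free space generated (1-u), weighted by age,
 * divided by the cost of reading and rewriting the segment (1+u). */
static double benefit_to_cost(const struct seg_stat *s)
{
    return (1.0 - s->u) * s->age / (1.0 + s->u);
}

/* Pick the segment with the highest benefit-to-cost ratio to clean next. */
static int pick_segment_to_clean(const struct seg_stat *seg, int nseg)
{
    int best = 0;
    for (int i = 1; i < nseg; i++)
        if (benefit_to_cost(&seg[i]) > benefit_to_cost(&seg[best]))
            best = i;
    return best;
}
```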

35 What to Put Where?
Given a set of live blocks and some cleaned segments, which block goes where?
Order the blocks by age
Write them to segments oldest first
The goal is segments that are very cold and highly utilized

36 Goal of LFS Cleaning
[Chart: segment fullness (empty to 100% full) vs. number of segments; the goal is a bimodal distribution, with most segments either nearly full or empty]

37 Performance of LFS
On the modified Andrew benchmark, 20% faster than FFS
LFS can create and delete 8 times as many files per second as FFS
LFS can read 1.5 times as many small files in the same time
LFS is slower than FFS at sequential reads of randomly written files

38 Logical Locality vs. Temporal Locality
Logical (spatial) locality: normal file systems keep a file's data blocks close together
Temporal locality: LFS keeps data written at the same time close together
When temporal locality matches logical locality, the two systems perform about the same

39 Major Innovations of LFS
Abstraction: everything is a log
Temporal locality
Use of caching to shape disk access patterns
Most reads are absorbed by the cache
Writes are optimized
Separation of full and empty segments

40 Where Did LFS Look For Performance Improvements?
Minimized disk accesses
Write only when segments fill up
Increased the size of data transfers
Write whole segments at a time
Improved locality
Assuming temporal locality, a file's blocks are all adjacent on disk
And temporally related files are nearby

41 Parallel Disk Access and RAID
One disk can only deliver data at its maximum rate
So to get more data faster, fetch it from multiple disks simultaneously
This also saves on rotational latency and seek time

42 Utilizing Disk Access Parallelism
Some parallelism is available just from having several disks
But not much
Instead of satisfying each access from one disk, use multiple disks for each access
Store part of each data block on several disks

43 Disk Parallelism Example
[Diagram: open(foo), read(bar), and write(zoo) requests arriving at the file system and being served by several disks in parallel]

44 Data Striping
Transparently distributing data over multiple disks
Benefits: increased disk parallelism and faster response for big requests
The major parameters are the number of disks and the size of the interleaving unit (see the mapping sketch below)
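A minimal sketch of the address mapping behind striping, assuming an interleaving unit of one block and purely illustrative names:

```c
#include <stdint.h>

struct stripe_loc {
    int      disk;     /* which disk holds the block    */
    uint64_t offset;   /* block offset within that disk */
};

/* Map a logical block number to a (disk, offset) pair for an array of
 * ndisks disks with a stripe unit of one block (RAID 0-style layout). */
static struct stripe_loc stripe_map(uint64_t logical_block, int ndisks)
{
    struct stripe_loc loc;
    loc.disk   = (int)(logical_block % ndisks);  /* round-robin across disks */
    loc.offset = logical_block / ndisks;         /* row within the stripe    */
    return loc;
}
```

With 4 disks, logical blocks 0-3 land on disks 0-3 at offset 0, blocks 4-7 at offset 1, and so on.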

45 Fine-Grained Vs. Coarse-Grained Data Interleaving
Fine-grained interleaving
High data rate for all requests
But only one request can be served by the array at a time
Lots of time is spent positioning
Coarse-grained interleaving
Large requests still access many disks
Many small requests can be handled at once
Small I/O requests access only a few disks

46 Reliability of Disk Arrays
Without disk arrays, failure of one disk among N loses 1/Nth of the data
With disk arrays (data striped fine-grained across all N disks), failure of one disk loses all the data
An array of N disks is roughly 1/Nth as reliable as a single disk
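As a rough, hedged illustration of that claim (assuming independent failures, 10 disks, and a nominal 100,000-hour per-disk MTTF, none of which come from the slides):

```latex
\mathrm{MTTF}_{\mathrm{array}} \approx \frac{\mathrm{MTTF}_{\mathrm{disk}}}{N}
  = \frac{100{,}000\ \text{hours}}{10\ \text{disks}}
  = 10{,}000\ \text{hours} \approx 14\ \text{months}
```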

47 Adding Reliability to Disk Arrays
Buy more reliable disks
Or build redundancy into the disk array
Multiple levels of disk array redundancy are possible
Most redundant organizations can prevent any data loss from a single disk failure

48 Basic Reliability Mechanisms
Duplicating the data
Parity, for error detection
Error-correcting codes, for detection and correction

49 Parity Methods
Parity can be used to detect errors
But it is typically used to detect a single error
If hardware errors are self-identifying (the failed disk is known), parity can also be used to correct errors
When data is written, the parity must be written, too
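A minimal sketch of XOR parity as used in disk arrays: computing the parity block when data is written, and rebuilding a lost block when the failure is self-identifying. Block size and layout are illustrative:

```c
#include <stddef.h>

#define BLOCK_SIZE 4096

/* Parity block = XOR of the corresponding bytes of all data blocks. */
static void compute_parity(char parity[BLOCK_SIZE],
                           char data[][BLOCK_SIZE], int ndisks)
{
    for (size_t i = 0; i < BLOCK_SIZE; i++) {
        char p = 0;
        for (int d = 0; d < ndisks; d++)
            p ^= data[d][i];
        parity[i] = p;
    }
}

/* If we know which disk failed, XORing the parity with all surviving
 * data blocks reconstructs the lost block. */
static void rebuild_block(char lost[BLOCK_SIZE], const char parity[BLOCK_SIZE],
                          char data[][BLOCK_SIZE], int ndisks, int failed)
{
    for (size_t i = 0; i < BLOCK_SIZE; i++) {
        char b = parity[i];
        for (int d = 0; d < ndisks; d++)
            if (d != failed)
                b ^= data[d][i];
        lost[i] = b;
    }
}
```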

50 Error-Correcting Code
Mostly based on Hamming codes
Such codes not only detect an error, but also identify which bit is wrong

51 RAID Architectures
RAID: Redundant Arrays of Independent Disks
Basic architectures for organizing disks into arrays
Assuming independent control of each disk
The standard classification scheme divides the architectures into levels

52 Non-Redundant Disk Arrays (RAID Level 0)
No redundancy at all
So, exactly what we just talked about
Any disk failure causes data loss

53 Non-Redundant Disk Array Diagram (RAID Level 0)
[Diagram: open(foo), read(bar), and write(zoo) requests striped across all disks with no redundancy]

54 Mirrored Disks (RAID Level 1)
Each disk has a second disk that mirrors its contents
Writes go to both disks
No data striping
+ Reliability is doubled
+ Read access is faster
- Write access is slower
- Expensive and inefficient

55 Mirrored Disk Diagram (RAID Level 1)
[Diagram: each write is sent to both a primary disk and its mirror; reads can be served by either copy]

56 Memory-Style ECC (RAID Level 2)
Some disks in the array are used to hold ECC
E.g., 4 data disks require 3 ECC disks
+ More efficient than mirroring
+ Can correct, not just detect, errors
- Still fairly inefficient

57 Memory-Style ECC Diagram (RAID Level 2)
[Diagram: requests striped across the data disks, with additional disks dedicated to ECC]

58 Bit-Interleaved Parity (RAID Level 3)
Each disk stores one bit of each data block
One disk in the array stores the parity for the other disks
+ More efficient than Levels 1 and 2
- The parity disk doesn't add bandwidth
- Can't correct errors

59 Bit-Interleaved RAID Diagram (Level 3)
[Diagram: each data block spread bit-by-bit across the data disks, with one dedicated parity disk]

60 Block-Interleaved Parity (RAID Level 4)
Like bit-interleaved parity, but data is interleaved in blocks of arbitrary size
The block size is called the striping unit
Small read requests access only 1 disk
+ More efficient data access than Level 3
+ Satisfies many small requests at once
- The parity disk can be a bottleneck
- Small writes require 4 I/Os (see the read-modify-write sketch below)
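A hedged sketch of why a small write costs 4 I/Os: read the old data and old parity, XOR the old data out of the parity and the new data in, then write both back. The read_blk()/write_blk() helpers are hypothetical:

```c
#define BLOCK_SIZE 4096

/* Hypothetical single-block disk I/O helpers. */
void read_blk(int disk, long blkno, char buf[BLOCK_SIZE]);
void write_blk(int disk, long blkno, const char buf[BLOCK_SIZE]);

/* Small write to one data block in a parity-protected stripe: 2 reads + 2 writes. */
void small_write(int data_disk, int parity_disk, long blkno,
                 const char new_data[BLOCK_SIZE])
{
    char old_data[BLOCK_SIZE], parity[BLOCK_SIZE];

    read_blk(data_disk, blkno, old_data);        /* I/O 1: old data   */
    read_blk(parity_disk, blkno, parity);        /* I/O 2: old parity */

    for (int i = 0; i < BLOCK_SIZE; i++)         /* new parity = old parity  */
        parity[i] ^= old_data[i] ^ new_data[i];  /*   ^ old data ^ new data  */

    write_blk(data_disk, blkno, new_data);       /* I/O 3: new data   */
    write_blk(parity_disk, blkno, parity);       /* I/O 4: new parity */
}
```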

61 Block-Interleaved Parity Diagram (RAID Level 4)
[Diagram: data interleaved in blocks across the data disks, with one dedicated parity disk]

62 Block-Interleaved Distributed-Parity (RAID Level 5)
In some sense, the most general level of RAID
Parity is spread out over all the disks
So there is no parity disk bottleneck
And all disks contribute read bandwidth
Small writes still require 4 I/Os
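A minimal sketch of rotating the parity across disks so no single disk carries all the parity traffic; this particular rotation is just one common style of layout, assumed for illustration (real RAID 5 implementations vary):

```c
/* For a stripe (row) number and an array of ndisks disks, return which
 * disk holds the parity block for that stripe: parity rotates so each
 * disk takes a 1/ndisks share of the parity traffic. */
static int parity_disk(long stripe, int ndisks)
{
    return (int)((ndisks - 1) - (stripe % ndisks));
}

/* The data blocks of the stripe occupy the remaining ndisks-1 disks. */
static int data_disk(long stripe, int data_index, int ndisks)
{
    int p = parity_disk(stripe, ndisks);
    return (data_index < p) ? data_index : data_index + 1;  /* skip the parity disk */
}
```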

63 Block-Interleaved Distributed-Parity Diagram (RAID Level 5)
[Diagram: data and parity blocks distributed across all disks, with the parity location rotating from stripe to stripe]

64 Where Did RAID Look For Performance Improvements?
Parallel use of disks
Improve overall delivered bandwidth by getting data from multiple disks
The biggest problem is small-write performance
But we know how to deal with small writes . . .

