The Design and Implementation of a Log-Structured File System


1 The Design and Implementation of a Log-Structured File System
M. Rosenblum and J. K. Ousterhout, University of California, Berkeley, 1991
Presented by Abeer Hakeem, The College of William & Mary, Computer Science Department

2 OUTLINE
Introduction
Existing File System Problems
Log-Structured File Systems
File Location and Reading
Free Space Management
Segment Cleaning
Simulation Results for Sprite LFS
Crash Recovery
Experience with Sprite LFS
Summary

3 INTRODUCTION
Motivation:
CPU speeds have increased dramatically while disk access times have not, so applications are becoming disk-bound.
Reads are largely handled by the main-memory cache, so disk traffic is dominated by writes, yet traditional disk layouts are optimized for reads.
Solution:
Focus on the performance of small-file writes.
Write all new data sequentially to a log, eliminating almost all seeks.
Construct a prototype: Sprite LFS.

4 EXISTING FILE SYSTEM PROBLEMS
Information is spread around the disk: i-nodes are stored separately from file data, and FFS requires at least five seeks to create a new file, giving poor bandwidth utilization.
Metadata writes are synchronous, and metadata updates dominate the traffic.
Crash recovery is expensive: the entire disk must be scanned for inconsistencies.

5 LOG-STRUCTURED FILE SYSTEM
Key issue: increase disk write performance by eliminating seeks.
Main idea: write all modifications to disk sequentially in a log-like structure, converting many small random writes into a single large sequential write for workloads with many small files.
The information written to disk includes file data blocks, attributes, index blocks, directories, and other information used to manage the file system.

6 LOG-STRUCTURED FILE SYSTEM
Challenges:
How to retrieve information from the log.
How to manage free space so that large extents are always available for new writes.
Contributions:
A log-structured file system used as the primary on-disk storage structure.
A sophisticated cleaning policy.
A working implementation (Sprite LFS).

7 LOG-STRUCTURED FILE SYSTEM
The "log" is the only structure on disk. It contains i-nodes and data blocks, and includes indexing information so that files can be read back from the log relatively efficiently.
Main assumption: most reads will access data that are already in the main-memory cache.

8 LOG-STRUCTURED FILE SYSTEM
File location and reading (first challenge)
(Figure: disk layouts of Sprite LFS and Unix FFS.)
Sprite LFS writes i-nodes to the log rather than at fixed locations, and an i-node map records the current log address of each i-node; given that map, reads proceed much as in Unix (a toy sketch follows).
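A minimal sketch of that lookup path, using a plain Python list to stand in for the on-disk log; all structure and field names here are illustrative, not Sprite LFS's actual on-disk formats:

```python
# Toy model of file location in LFS: the i-node map records the log address
# of the newest copy of each i-node, and the i-node points at data blocks,
# so a read is "i-node map -> i-node -> data block".

log = []          # stands in for the on-disk log
inode_map = {}    # i-node number -> index of newest i-node copy in the log

def append(entry):
    """All writes go at the tail of the log; return the entry's address."""
    log.append(entry)
    return len(log) - 1

def write_file(inum, blocks):
    addrs = [append({"kind": "data", "bytes": b}) for b in blocks]
    inode_map[inum] = append({"kind": "inode", "data_addrs": addrs})

def read_block(inum, block_no):
    inode = log[inode_map[inum]]                        # lookup 1: i-node map
    return log[inode["data_addrs"][block_no]]["bytes"]  # lookup 2: data block

write_file(7, [b"hello", b"world"])   # two data blocks + one i-node, appended
assert read_block(7, 1) == b"world"
```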

9 LOG-STRUCTURED FILE SYSTEM
Main advantages:
Replaces many small random writes with fewer, larger sequential writes.
Faster recovery after a crash: all recently written blocks are at the tail of the log, so there is no need to scan the whole file system for inconsistencies, as Unix must.

10 FREE SPACE MANAGEMENT: SEGMENTS (second challenge)
Goal: keep large extents of free space available for writing new data. There are two basic choices:
Threading: leave live data in place and thread the log through the free extents. This causes free space to become severely fragmented, and the log-structured file system would be no faster than a traditional file system.
Copying: copy live data out of the log to leave large free extents for writing. Copying has a cost, particularly for long-lived files that get copied over and over.

11 FREE SPACE MANAGEMENT: SEGMENTS (second challenge)
LFS solution: use a combination of threading and copying.
Divide the disk into fixed-length segments (512 KB or 1 MB).
Segments are always written sequentially, and all live data must be copied out of a segment before the segment can be rewritten.
The log is threaded on a segment-by-segment basis: if the system can collect long-lived data into segments, those segments can be skipped over, so their data does not have to be copied repeatedly.

12 SEGMENT CLEANING MECHANISM
Segment cleaning is the process of copying live data out of a segment:
Read a number of segments into memory.
Identify the live data.
Write only the live data back to a smaller number of clean segments.
The segments that were read are then marked as clean. (A minimal sketch follows.)
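A toy version of that loop, under the simplifying assumption that each block already carries a liveness flag (the next slide shows how liveness is actually determined); the segment size and all names are illustrative:

```python
# Toy cleaning pass: read dirty segments, keep only live blocks, repack
# them densely into fewer clean segments, and free the segments just read.

SEGMENT_BLOCKS = 4    # blocks per segment (tiny, for readability)

def clean(dirty_segments):
    live = [blk for seg in dirty_segments for blk in seg if blk["live"]]
    # Repack the survivors into as few segments as possible.
    repacked = [live[i:i + SEGMENT_BLOCKS]
                for i in range(0, len(live), SEGMENT_BLOCKS)]
    freed = len(dirty_segments) - len(repacked)   # net gain in clean segments
    return repacked, freed

segs = [[{"live": True}, {"live": False}, {"live": False}, {"live": True}],
        [{"live": False}, {"live": True}, {"live": False}, {"live": False}]]
repacked, freed = clean(segs)
assert len(repacked) == 1 and freed == 1    # 2 dirty segments became 1
```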

13 SEGMENT CLEANING MECHANISM
Basic mechanism: a segment summary block identifies each piece of information in the segment; for a file data block it records the file's i-node number and the block's position within the file.
Liveness can be determined by checking the file's i-node; a uid of (i-node number, version) lets the cleaner avoid some of these checks, since a stale version means the block is dead. (See the sketch below.)
One consequence: there is no free-block list or bitmap, which saves memory and disk space and simplifies crash recovery.
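A sketch of the liveness test, assuming in-memory dicts stand in for the i-node map and the per-file version numbers; all names are illustrative:

```python
# A block is live iff the i-node it belonged to still points at this
# address. The (i-node number, version) uid in the summary block lets the
# cleaner declare a block dead without reading the i-node at all when the
# file has since been deleted or truncated (its version was incremented).

def is_live(entry, addr, log, inode_map, inode_versions):
    inum = entry["inode"]
    if inode_versions.get(inum) != entry["version"]:
        return False                  # stale version: dead, no i-node check
    inode = log[inode_map[inum]]
    addrs = inode["data_addrs"]
    block_no = entry["block_no"]
    return block_no < len(addrs) and addrs[block_no] == addr
```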

14 SEGMENT CLEANING POLICIES
Four policy issues:
When should the segment cleaner execute? When the number of clean segments drops below a threshold value (a simple threshold works well).
How many segments should it clean at a time? A few tens of segments; the more segments cleaned at once, the more opportunities to rearrange data.
Which segments should be cleaned? A greedy or a cost-benefit policy.
How should live blocks be grouped when they are written out? Sort the blocks by the time they were last modified and group blocks of similar age into new segments.

15 SEGMENT CLEANING POLICIES
The write cost metric is a way of comparing cleaning policies.
Intuition: the average amount of time the disk is busy per byte of new data written, including cleaning overheads.
The definition includes cleaning overhead and depends on segment utilization; with large segments, seek and rotational latency are negligible, so the write cost reduces to a ratio of bytes moved to new bytes written (derived below).
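The paper's derivation: to make room for new data the cleaner reads N whole segments, writes back their N·u live bytes, and the freed N·(1−u) of space receives new data, where u is the utilization (fraction of live data) of the segments being cleaned:

```latex
\text{write cost}
  = \frac{\text{total bytes read and written}}{\text{new data written}}
  = \frac{N + N u + N(1 - u)}{N(1 - u)}
  = \frac{2}{1 - u} \qquad (0 < u < 1)
```

In the special case u = 0 the segments need not be read at all, so the write cost is 1.0, the ideal.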

16 SEGMENT CLEANING POLICIES
Low utilization means low write cost: as u approaches 0 the write cost approaches its minimum of 1.0, and as u approaches 1 it grows without bound.

17 SEGMENT CLEANING POLICIES
Log-structured file systems present a cost-performance trade-off:
If disk space is underutilized, high performance can be achieved, but at a high cost per usable byte.
If disk capacity utilization is increased, storage cost is reduced, but so is performance.
The key to achieving high performance at low cost is to force the disk into a bimodal segment distribution: most segments are nearly full, a few are empty or nearly empty, and the cleaner can almost always work with the nearly empty ones.

18 SEGMENT CLEANING POLICY (I)
Greedy policy: always clean the least-utilized segments, with no reorganization of the data.
Age-sort variant: sort the live blocks by the time they were last modified and group blocks of similar age together into new segments. (A sketch of both follows.)
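A minimal sketch of both, assuming each segment knows its utilization and each live block its last-modified time (the field names are illustrative):

```python
# Greedy choice: clean whichever segments currently hold the least live
# data. Age sort: order survivors by last-modified time so that blocks of
# similar age end up in the same new segment.

def choose_greedy(segments, n):
    return sorted(segments, key=lambda s: s["utilization"])[:n]

def age_sort(live_blocks):
    return sorted(live_blocks, key=lambda b: b["mtime"])

picked = choose_greedy([{"id": 0, "utilization": 0.9},
                        {"id": 1, "utilization": 0.2},
                        {"id": 2, "utilization": 0.5}], n=2)
assert [s["id"] for s in picked] == [1, 2]   # least-utilized first
```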

19 SEGMENT CLEANING POLICY (I)
Two file access patterns are simulated:
Uniform: each file has an equal likelihood of being selected at each step.
Hot-and-cold: files are divided into two groups; the hot group contains 10% of the files and receives 90% of the accesses, while the cold group contains the other 90% of the files and receives 10% of the accesses. (A sketch of the two generators follows.)
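A sketch of the two workload generators, using the paper's 90/10 split; `random` draws stand in for the simulator's file selection:

```python
import random

def uniform(n_files):
    """Every file is equally likely at every step."""
    return random.randrange(n_files)

def hot_and_cold(n_files):
    """10% of the files (the hot group) receive 90% of the accesses."""
    hot_count = max(1, n_files // 10)
    if random.random() < 0.9:
        return random.randrange(hot_count)            # hot group
    return random.randrange(hot_count, n_files)       # cold group
```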

20 SIMULATION RESULTS (I)
Greedy policy, LFS uniform: the variance in segment utilization allows a substantially lower write cost than would be predicted from the overall disk capacity utilization.
Greedy policy, LFS hot-and-cold: the same cleaning policy, except that live blocks were sorted by age before being written out. This approach was expected to produce the desired bimodal distribution of segment utilizations.

21 SIMULATION RESULTS (I)
Greedy policy results: locality and grouping result in worse performance than the system with no locality.
The reason: a segment does not get cleaned until it becomes the least utilized of all segments. Utilization drops very slowly in cold segments, so cold segments tend to tie up large numbers of free blocks for long periods of time.

22 SEGMENT CLEANING POLICY (II)
Cost-benefit cleaning policy:
Intuition: free space in "cold" (more stable) segments is more valuable than free space in hot segments.
Assumption: the stability of a segment is proportional to the age of its youngest block (older means colder).
Implementation: a cost-benefit analysis; clean the segments with the highest benefit-to-cost ratio (defined below), and still group live blocks by age before rewriting them.
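The paper's ratio: cleaning a segment costs 1 + u (one unit to read the whole segment, u to write back its live data) and yields 1 − u of free space, weighted by the age of the youngest data in the segment as a proxy for stability; the cleaner picks the segments with the highest ratio:

```latex
\frac{\text{benefit}}{\text{cost}}
  = \frac{\text{free space generated} \times \text{age of data}}{\text{cost}}
  = \frac{(1 - u) \cdot \text{age}}{1 + u}
```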

23 EFFECTS OF COST-BENEFIT POLICY
Cold segments are cleaned at about 75% utilization, while hot segments are cleaned at about 15% utilization.
The implementation is supported by a segment usage table that records, for each segment, the number of live bytes and the most recent modification time of any of its blocks. (A selection sketch follows.)
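A sketch of the selection step, driven by exactly the two fields the segment usage table keeps; the names and the 1 MB default segment size are illustrative:

```python
import time

def benefit_cost(seg, now, segment_bytes):
    """The paper's ratio ((1-u) * age) / (1+u) from the usage-table fields."""
    u = seg["live_bytes"] / segment_bytes        # fraction still live
    if u == 0.0:
        return float("inf")                      # already empty: free to reuse
    age = now - seg["last_mtime"]                # youngest data's age (proxy)
    return ((1.0 - u) * age) / (1.0 + u)

def choose_cost_benefit(usage_table, n, segment_bytes=1 << 20):
    now = time.time()
    return sorted(usage_table,
                  key=lambda s: benefit_cost(s, now, segment_bytes),
                  reverse=True)[:n]              # highest ratios cleaned first
```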

24 CRASH RECOVERY
Sprite LFS uses a two-pronged approach to recovery:
Checkpoints, which define consistent states of the file system.
Roll-forward, which recovers information written since the last checkpoint.

25 CRASH RECOVERY (I): Checkpoints
A checkpoint is a position in the log at which all file system structures are consistent and complete.
Sprite LFS uses a two-phase process to create a checkpoint:
First, it writes out all modified information to the log: data blocks, indirect blocks, i-nodes, and blocks of the i-node map and segment usage table.
Second, it writes a checkpoint region to a special fixed position on disk, containing the addresses of all the blocks in the i-node map and segment usage table, the current time, and a pointer to the last segment written.

26 CRASH RECOVERY (I): Checkpoints
To handle a crash during a checkpoint, there are two checkpoint regions, and checkpoint operations alternate between them. During reboot, the system reads both regions and uses the one with the most recent time to initialize its main-memory data structures. (A sketch follows.)
Sprite LFS performs checkpoints at periodic intervals (thirty seconds) or when the file system is unmounted or shut down. An alternative to periodic checkpointing is to checkpoint after a given amount of new data has been written to the log.
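A sketch of the reboot-time choice, assuming each checkpoint region carries a timestamp and some validity check for detecting an interrupted checkpoint write; the dict fields are illustrative:

```python
def pick_checkpoint(region_a, region_b):
    """Initialize from the newer of the two valid checkpoint regions.

    Alternating writes guarantee that at least one region survives a crash
    intact, so recovery always has a consistent starting state.
    """
    valid = [r for r in (region_a, region_b) if r and r["valid"]]
    return max(valid, key=lambda r: r["time"])

cp = pick_checkpoint({"valid": True, "time": 1000},
                     {"valid": True, "time": 1030})
assert cp["time"] == 1030   # the more recent checkpoint wins
```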

27 CRASH RECOVERY (II): Roll-forward
Recovering only to the latest checkpoint would discard too many recently written data blocks, so Sprite LFS scans through the log segments written after the last checkpoint.
During roll-forward, Sprite LFS uses the information in segment summary blocks to recover recently written file data: when a summary block indicates the presence of a new i-node, Sprite LFS updates the i-node map it read from the checkpoint and incorporates the file's new data blocks into the recovered file system.

28 CRASH RECOVERY (II): Roll-forward
Directory entry and i-node consistency: each i-node contains a count of the number of directory entries referring to it; when the count reaches zero, the file is deleted.
To restore this consistency, LFS writes a special record in the log for each directory change. The record includes an operation code, the location of the directory entry, the contents of the directory entry, and the new reference count for the i-node named in the entry. These records form the directory operation log, and each appears in the log before the corresponding directory block or i-node.
If a log record appears but the i-node and directory block were not both written, roll-forward updates the directory and/or i-node to complete the operation. (A sketch follows.)
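A sketch of what replaying one directory-operation-log record could look like, using only the fields the slide names (operation code, entry location and contents, new reference count); the structures and op codes here are hypothetical, not Sprite LFS's real record format:

```python
def replay_dirop(record, directories, inodes):
    """Redo a logged directory change whose effects may be missing on disk."""
    entries = directories.setdefault(record["dir"], {})
    if record["op"] == "create":
        entries[record["name"]] = record["inode"]   # (re)create the entry
    elif record["op"] == "unlink":
        entries.pop(record["name"], None)           # remove the entry
    # Bring the i-node's reference count to the logged value; a count of
    # zero means the file itself is deleted.
    inode = inodes.get(record["inode"])
    if inode is not None:
        inode["nlink"] = record["ref_count"]
        if inode["nlink"] == 0:
            del inodes[record["inode"]]
```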

29 EXPERIENCE WITH SPRITE LFS
Developed for the Sprite OS in 1991, with about one year of development; evaluated against SunOS, whose file system is based on Unix FFS.
All of the features described have been implemented in Sprite LFS except roll-forward: the production disks use a 30-second checkpoint interval and discard all information written after the last checkpoint when they reboot.

30 EXPERIENCE WITH SPRITE LFS
Micro-benchmarks, small files (1 KB): best-case performance, with no cleaning.
Sprite LFS vs. SunOS (based on Unix FFS); Sprite LFS uses a 1 MB segment size and 4 KB blocks, SunOS uses 8 KB blocks.
Sprite LFS kept the disk only 17% busy while saturating the CPU; SunOS kept the disk 85% busy, yet only 1.2% of the disk's potential bandwidth was used for new data. Sprite LFS wins!

31 EXPERIENCE WITH SPRITE LFS
Micro-benchmarks, large file (100 MB): sequential rereading requires seeks in Sprite LFS, so its performance is lower than SunOS's in this case.
A traditional file system exploits logical locality (an assumed access pattern); a log-structured file system exploits temporal locality (it groups recently created or modified data).

32 CLEANING OVERHEAD
Statistics collected over a four-month period show better performance than predicted by the simulations: write costs stay in the low range.
The segment utilization of the /user6 partition shows a large number of fully utilized and totally empty segments.

33 SUMMARY
A log-structured file system writes much larger amounts of new data to disk per disk I/O and uses most of the disk's bandwidth.
Free space management is done by dividing the disk into fixed-size segments.
The lowest segment cleaning overhead is achieved with the cost-benefit policy.

34 QUESTIONS

