ZFS: The Last Word in File Systems - Is It? Swaminathan Sundararaman, Sriram Subramanian
ZFS: Zettabyte File System The last word in file systems "We've rethought everything and rearchitected it," - Jeff Bonwick, Sun distinguished engineer and chief architect of ZFS. "We've thrown away 20 years of old technology that was based on assumptions no longer true today."
Our Goal To uncover interesting policies of ZFS Focus on: How ZFS automatically chooses block sizes to match the workload Policy and performance analysis of ZFS under synchronous workloads
Methodology Semantic Block Analysis [Prabhakaran et al. '05] [diagram: Application → File System → Pseudo Device Driver → Disk, with block inference on the traced workload]
Preliminary Results Naïve block allocation policy: does not work well for random workloads Dynamic block resizing merges small block writes, but suffers from read-modify-write for some workloads Poor ZFS Intent Log (ZIL) block allocation policy Dynamically changes the block-writing mechanism based on workload (under investigation)
Outline Infrastructure Block Classification Strategy Policies Block Allocation Dynamic block resizing ZFS Intent Log (ZIL) Conclusion
Infrastructure Pseudo Device Driver Implemented a Block Driver using Layered Device Interface (LDI) Ioctls to control collection of statistics Issue: Solaris did not allow us to issue ioctls to pseudo block drivers Solution: Indirection Wrote a dummy character driver and redirected the ioctl requests to our block device
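The indirection described above can be sketched in miniature. This is a toy Python model of the idea only, not Solaris LDI code; all class and command names are ours. The character driver exists solely to accept ioctl requests (which Solaris does deliver to character drivers) and forward them to the block driver that actually collects statistics.

```python
class PseudoBlockDriver:
    """Toy stand-in for the pseudo block driver: it exposes a control
    entry point that toggles statistics collection."""
    def __init__(self):
        self.collecting = False

    def control(self, cmd: str) -> bool:
        if cmd == "START_STATS":
            self.collecting = True
        elif cmd == "STOP_STATS":
            self.collecting = False
        else:
            return False
        return True


class DummyCharDriver:
    """Toy stand-in for the dummy character driver: its only job is to
    receive ioctls and redirect them to the block driver."""
    def __init__(self, block_driver: PseudoBlockDriver):
        self.block_driver = block_driver

    def ioctl(self, cmd: str) -> bool:
        # Indirection: forward the request to the block device.
        return self.block_driver.control(cmd)
```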
Infrastructure (Contd.) Selective classification Log files for offline block analysis Negligible performance overhead: records are written asynchronously to the log file
Block Classification Strategy Uber blocks: 1024-byte blocks, identified by their magic number Data blocks: identified by a special pattern repeated at every 512-byte offset; individual data blocks identified by sequentially increasing numbers
Block Classification Strategy ZIL blocks: identified by their magic number Meta-data blocks: the rest of the blocks
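The classification rules on these two slides can be sketched as a single function. This is our reconstruction for illustration: `0x00bab10c` is the real ZFS uber-block magic, but the ZIL magic, the field layout, and the 4-byte data marker are stand-ins, not the analyzer's actual code.

```python
import struct

UBER_MAGIC = 0x00bab10c   # the ZFS uber-block magic number
ZIL_MAGIC = 0x0210da7a    # hypothetical stand-in for the ZIL magic

def classify_block(block: bytes) -> str:
    """Classify a raw block the way the offline analyzer might:
    uber blocks by size and magic, ZIL blocks by magic, data blocks by
    the workload's marker repeated at every 512-byte offset, and
    everything else as metadata."""
    magic = struct.unpack_from("<Q", block.ljust(8, b"\0"))[0]
    if len(block) == 1024 and magic == UBER_MAGIC:
        return "uber"
    if magic == ZIL_MAGIC:
        return "zil"
    # Data blocks written by the workload carry a marker at every
    # 512-byte offset (assumed 4 bytes wide here).
    marker = block[:4]
    if len(block) >= 512 and len(block) % 512 == 0 and all(
        block[off:off + 4] == marker for off in range(0, len(block), 512)
    ):
        return "data"
    return "metadata"
```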
Sequential Write of a 1GB file (block size: 4K) ZFS caches small block writes and issues large sequential 128K block writes
Random writes inside a 4GB file (block size: 4K) A large 128K block write for every small 4K write
Random Writes of 4K blocks [plot: block size vs. file offset]
Random Writes of 512 bytes [plot: block size vs. file offset]
Inference Block allocation is purely based on file offsets Block size is set to 128K for offsets >= 128K Block size is a multiple of 512 bytes for offsets < 128K NOT based on dynamic workload characteristics
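The inferred policy can be written down as a tiny function. This is our reconstruction from the traces, not ZFS source: the function name and the rule that sub-128K writes are rounded up to a 512-byte multiple are our reading of the observations.

```python
KB = 1024

def inferred_block_size(offset: int, write_size: int) -> int:
    """Block size as a function of file offset alone (the policy we
    infer from the traces); the dynamic workload plays no role."""
    if offset >= 128 * KB:
        return 128 * KB                       # fixed 128K blocks past 128K
    return ((write_size + 511) // 512) * 512  # multiple of 512 below 128K
```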
Small Sequential Writes of 4K Write a 4K block Sleep 10 sec Write the next block
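The microbenchmark loop on this slide can be scripted as follows. This is a sketch of the workload, not the original benchmark code; the payload bytes and the fsync after each write (to push data past the page cache between sleeps) are our assumptions.

```python
import os
import time

def small_sequential_writes(path: str, block: int = 4096,
                            count: int = 16, pause: float = 10.0) -> None:
    """Append one block, wait long enough for the file system to flush
    its pending writes, then append the next block."""
    with open(path, "ab") as f:
        for i in range(count):
            f.write(bytes([i % 256]) * block)  # distinguishable payload
            f.flush()
            os.fsync(f.fileno())               # force the write out
            time.sleep(pause)
```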
Small Sequential Writes of 32KB
Unmount after every write
Dynamic Resizing of Blocks Applies while the file size is < 128K Appending data to small files is inefficient If the data is not in memory, a small append is converted into a read-modify-write
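A small cost model makes the read-modify-write penalty concrete. This is our model of the observed behavior, not ZFS code: a sub-128K file occupies one block rounded up to 512 bytes, the whole block is rewritten at its new size on append, and an uncached append must first read the old block back.

```python
KB = 1024

def _round512(n: int) -> int:
    return ((n + 511) // 512) * 512

def small_append_io(file_size: int, append_size: int, cached: bool):
    """Return (bytes_read, bytes_written) for one append to a file
    still under 128K: the enlarged block is rewritten in full, and if
    the old contents are not cached they are read back first, i.e. a
    read-modify-write."""
    assert file_size + append_size < 128 * KB  # sketch covers small files only
    bytes_read = 0 if cached or file_size == 0 else _round512(file_size)
    bytes_written = _round512(file_size + append_size)
    return bytes_read, bytes_written

# An uncached 4K append to a 60K file re-reads 60K just to write 64K.
```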
COW in ZFS Copy-on-write design makes most disk writes sequential Multiple block sizes, automatically chosen to match workload
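The sequential-writes property of copy-on-write can be shown with a toy block store (our illustration, not ZFS's actual allocator): a write never overwrites the old copy, it takes the next free physical block and updates the logical-to-physical map, so even random logical updates land sequentially on disk.

```python
class CowStore:
    """Toy copy-on-write block store illustrating why COW turns random
    logical writes into sequential physical writes."""
    def __init__(self):
        self.next_phys = 0
        self.map = {}        # logical block number -> physical block number
        self.write_log = []  # physical blocks in the order written

    def write(self, logical: int) -> int:
        phys = self.next_phys        # always the next sequential block
        self.next_phys += 1
        self.map[logical] = phys     # old copy is simply abandoned
        self.write_log.append(phys)
        return phys
```

Whatever order the logical block numbers arrive in, `write_log` is strictly sequential.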
ZIL Block Chaining
ZIL Block Allocation
ZIL Block Allocation [figure annotation: 33K]
Conclusions Block Allocation Purely based on file offsets, NOT on dynamic workload characteristics Dynamic Resizing of Blocks Applies while the file size is < 128K Appending data to small files is inefficient ZFS Intent Log Internal fragmentation Bad block allocation policy Block chaining mechanism
Conclusion ZFS: The Last Word in File Systems? It might be the latest word, but it is definitely not the last word!
Questions ?