Presentation on theme: "Rich Mazzolini Jeff Thompson Vic Lawson Kyle Wisniewski"— Presentation transcript:
1Rich Mazzolini Jeff Thompson Vic Lawson Kyle Wisniewski Ext4 – Linux FilesystemRich MazzoliniJeff ThompsonVic LawsonKyle Wisniewski
2Beginnings and Ext2 EXT: Extended File System First one created in 1992 for LinuxEXT2 created one year after EXTXIAFS also created at the time, but lost to EXT2 due to EXT2’s better longevity and flexibility.EXT2 expanded the maximum filesystem size from 2 GB to 32 TB.
3EXT3In 2001 EXT3 was created to enable journaling within the filesystem.Journaling: writing all filesystem changes to a temporary location, or journal, before writing permanently to the filesystem.Allows for better recovery.EXT2 can be converted to EXT3 without having to backup and restore file.
4EXT4 Not an entirely new filesystem, but rather a fork of EXT3. Main improvements: Journal Checksums and delayed allocation of memoryThis meant the system waits until right before it writes the file permanently to allocate memory. This allows for better decision making.EXT4 is backwards compatible with all other versions of EXT.EXT4 accepts up to 1 exbibyte (260 bytes) volume sizes
5Features of Ext4 Compatibility Existing Ext3 systems can be updated to Ext4 by running only a few commands, however only new data will be stored in the new data structuresLarger filesystems and larger file sizesFilesystems can be a maximum of 1 EiBFiles can be as large as 16TiBMore subdirectoriesExt4 allows for subdirectories
6Features of Ext4(cont.) Extents An extent tells the filesystem how many subsequent physical blocks are used by the fileThis allows to allocate the file into an extent of that size, rather than mapping each individual blockIE a 100MB file can be a single extent or a mapping of blocks at a block size of 4KB
7Features of Ext4(cont.) Multiblock allocation Ext3 would call an allocator for each blockExt4 only calls the allocator once for each fileReturning to the previous example:A 100MB file would need to call the allocator times for each individual block in Ext3In Ext4, the allocator is called only once to allocate the blocks
8Features of Ext4(cont.) Delayed Allocation Fast fsck Allocation is delayed until the file is being written to the diskIn Ext3 the allocation happened as soon as possible, even if the file was to sit in cache for some timeFast fsckIn Ext4, fsck does not check unused inodes, speeding up this command greatly
9Features of Ext4(cont.) Journal Checksumming “No Journaling” Mode Ext4 uses checksumming to make sure that the journal blocks are not failing or corrupting.The journal blocks are some of the most used on the disk which means that they are more prone to hardware failures.“No Journaling” ModeExt4 allows for the disabling of the journal to remove the little of overhead that it takes
10Features of Ext4(cont.) Online Defragmentation Inodes Allows for defragmentation while a filesystem is still in useInodesExt4 has a larger default inode size, allowing for more information about each fileExt4 will automatically reserve several inodes when a directory is created in anticipation of the directory holding filesExt4 uses nanosecond resolution timestamps over Ext3 use of second resolution timestamps
11Features of Ext4(cont.) Persistant Preallocation Barriers Ext4 and later kernel versions of Ext3 allow applications to reserve space for tasks without having to write the data immediatelyAn example would be a file download that may take hours. The space is already allocated as the data comes inBarriersOn by default.A barrier forbids any writing of data passed the barrier until all data before the barrier has been committed.
12Ext4 Design Design for ext4 (on-disk format) Ext4 Extents Ext3 default w/ large file system scalabilityOn-disk format changecorruptionMulti-block allocationdelayed allocationExt4 ExtentsSet of logically contiguous blocks w/i a fileExternal References
13Ext4 Disk Layout (Overview) array of logical blocks: reduces overhead, increases throughputEach files block stored within same groupLayout (redundant superblock and group blocks)Flexible block groups (multiple block groups tied together)Meta block groups (cluster block groups in single disk block)lazy block groups (uninitialize inode bitmap/table)Special i-nodes (defective blocks, root, boot loader,…)Block and inode Allocation (locality, delayed allocation, keep inode, directory, block group in same groupChecksums (detect/isolate errors)Bigalloc (allocate disk blocks in multiple blocks-reduce frag)Group 0 Paddingext4 Super BlockGroup DescriptorsReserved GDT BlocksData Block Bitmapinode Bitmapinode TableData Blocks1024 bytes1 blockmany blocksmany more blocks
14Ext4 Disk Layout (Block) Super blocksblock counts, inode counts, supported features, maintenance features,…)https://ext4.wiki.kernel.org/index.php/Ext4_Disk_LayoutGroup descriptorsContains full copy of block group descriptor tableBlock bitmaptracks usage of data blocks w/i group)i-node bitmaprecords which inode table entries are being used)i-node tablefile metadata (FAT) + file type info)
15Ext4 Disk Layout (directory,extended) i-block contentsfile block indexes and special purposes)Symbolic links (< 60 bytes) or extents (> 60 bytes)Direct/indirect block addressing (ext 2/3 = no efficientExtent tree (allows very large file alloc with single extent)Directory entriesLinear directories (series of data blocks with linear array of directory entires)Hash tree directories –ext3 performance improved with tree keyed off of directory entry name hashExtended attributesFile ACLs, security data, user info.,…
16Ext4 Disk Layout (mount protection,journal) Multiple mount protectionProtect against multiple hosts simultaneous accessJournal (blocks)Protects files system against corruption from system crashLayoutBlock header – 12 byte header with magic number, description of what block contains, transaction idSuper block – size of journal and start of transaction logDescriptor block – journal block tag array of final data block locationsData block – magic number data block replaced with 0’sRevocation block – data blocks that supersede older blockscommit block – indicates transaction completely writtenSuperblock [(descriptor_block data_blocks|revocation_block) [more data or revocations] commmit_block] [more transactions...]
17Ext4 Metadata Checksums OverviewProtect against corruptionStore checksums of metadata objectsPrevents broken metadata shredding file systemTL:DR – too long, didn’t readTL;DR Instructions32-bit file systems run out of space, thus format 64-bitAlgorithm:CRC32 polynomial stronger error detection than CRC16CRC stuffing – user CRC32 function + 16 bit checksumBenchmarking (see chart)https://ext4.wiki.kernel.org/index.php/Ext4_Metadata_Checksums#Stuff_Darrick_Hasn.27t_Thought_Hard_Enough_AboutMetadata check-summingBlock group detector protected by CRC16Journal checksum ensures integrity of journal entries
18Ext4 Metadata Checksums (cont) On-disk structure modificationsSuperblock i-nodes (patch required)i-node/block bitmap 64–bit: each bitmap has crc32c checksumi-node/block bitmap 32 bit: may use crc16c checksumExtent tree: enough space to store crc32c checksum at endDirectory block – most have 12 byte entry at end for checksumH-tree – crc32c checksum stored at end of hash tree blockExtended attributes: separate disk block and inode have 12 bytes for checksumMetadata not upgraded – direct, indirect and triple indirect block maps do not allow for checksumsTool updates: user should be able to:Turn on checksum with simple –o parameterConvert file systems simplyDisable metadata check-summingDisplay checksums in debug
20TODO List for Ext4Improve recovery from bad transaction checksums in the journalTest journal checksums under power failuresAdd inode checksumsImprove the ability to resize while onlineImplement SSD Trim supportImprove merging of extentsImprove defragmentation
21Compatibility with Windows It is possible to use software to allow certain operations in an Ext4 system from Windows, however there are no drivers available yet that allow all features of Ext4 to be usedExt2Fsd is a driver that will allow write operationsExtents must be turned offExt2Read will allow read operations in Windows with extents enabled
22Compatibility with OS X OS X has full compatibility with Ext4 filesystems through the use of Paragon ExtFS.This is a commercial software and must be purchased.Free solutions are extremely limitedext4fuse is a free solution but is limited to read only
23Getting Ext4Once you have upgraded to e2fsprogs 1.41 or later. Simply type:# mke2fs -t ext4 /dev/DEV or# mkfs.ext4 /dev/DEVOnce the filesystem is created, it can be mounted as follows:# mount -t ext4 /dev/DEV /wherever
24To enable the ext4 features on an existing ext3 filesystem, use the command: # tune2fs -O extents,uninit_bg,dir_index /dev/DEVWARNING: Once you run this command, the filesystem will no longer be mountable using the ext3 filesystem!After running this command, you MUST run fsck to fix up some on-disk structures that tune2fs has modified:# e2fsck -fDC0 /dev/DEV