Download presentation
Presentation is loading. Please wait.
1
Ext4: The Next Generation of Ext2/3
Theodore Ts'o IBM
2
Agenda New features in ext3 you might not know about
What's cool about ext3 What's not Why fork ext3->ext4? New ext4 features on deck How to get involved Despite its age, ext3 is actually growing in popularity among enterprise users/vendors. Why? This is because of its robustness, good recoverability, and expansion characteristics. Ext2/3's fundamental design principle is to avoid fragile data structures. This extremely conservative design principle makes ext2/3 filesystem very robust. However, Sometimes, this also leds to the ext2/3 filesystem's reputation of being a little boring, and perhaps not the fastest or the most scalable filesystem that Linux have today. Since ext3 has large number of users and community, it's important to make ext3 faster, better and also remains robust. In the last few years, the ext2/3 development community has been working hard to To bring a fast & scalable yet still robust filesystem to the large under users.. The initial outline of plans to ``modernize'' the ext2/3 filesystem was documented by TedTso and Stephen Tweedie in their Freenix Paper. Three years later, it is time to revisit those plans, see what has been accomplished, what still remains to be done, and what further extensions are now under consideration by the ext 2/3 development community.
3
New features in ext3 (you might not know about)
Directory indexing (hash tree directories) To enable: tune2fs -O dir_index /dev/XX e2fsck -fD /dev/XX Can make some workloads slower Online resizing Need to create filesystem with -O resize_inode Requires e2fsprogs 1.36 or greater, default in e2fsprogs 1.39 Various performance enhacements
4
What's cool about ext3 Very large user community
Very large developer community From a large number of companies: Red Hat, IBM, Bull, Clusterfs, Google, NEC, others Emphasis on robustness above all else Simple filesystem format “PC Class hardware sucks”
5
What's not so cool about ext3?
16TB filesystem size limitation (32-bit block numbers) Second resolutiom timestamps 32,768 limit on subdirectories Performance limitations
6
Why Ext4? No development 2.7 tree
... and changes take longer than the 2-3 months between 2.6 releases Large userspace community People like davem, rusty, akpm, torvalds get cranky if their source trees get trashed Many changes on-deck require format changes Allows more experimentation than if the work is done outside of mainline Make sure users understand that ext4 is risky: mount -t ext4dev Downsides bug fixes must be applied to two code bases smaller testing community
7
Features (under development)
Abillity to use > 16TB filesystems (going beyond 32-bit block numbers) Replacing indirect blocks with extents More efficient block allocation Allow greater than 32k subdirectories Nanosecond timestamps Metadata checksumming Uninitialized groups to speed up mkfs/fsck Persistent file allocation Online defragmentation
8
Extents Indirect block maps are incredibly inefficient
One extra block read (and seek) every 1024 blocks Really obvious when deleting big CD/DVD image files
9
Ext2/3 Indirect Block Map
disk blocks ... 200 201 213 1239 65533 i_data 1 ... 11 12 13 14 200 201 ... 211 212 1237 65530 213 ... 1236 1239 ... 1238 ... Currently, the ext2/ext3 filesystem, like other traditional filesystems, uses a direct, indirect, double indirect, and triple indirect blocks to map file offsets to on-disk blocks. This scheme, sometimes simply called an indirect block mapping scheme, is not efficient for large files, especially large file deletion. In order to address this problem, many modern filesystems (including XFS and JFS on Linux) use some form of extent maps instead of the traditional indirect block mapping scheme. 65531 ... 65532 ... 65533 ... direct block indirect block double indirect block triple indirect block
10
Extents logical length physical 1000 200
Extents is an efficient way to represent large file An extent is a single descriptor for a range of contiguous blocks logical length physical 1000 200 extent maps are a more efficient way to represent the mapping between logical and physical blocks for large files. An {\em extent} is a single descriptor for a range of contiguous blocks, instead of using, say, hundreds of entries to describe each block individually, for example, using such a structure, it becomes possible to efficiently encode the information, ``Logical block 0(and following blocks) can be found starting at physical block 200.''
11
Extent Map disk blocks i_data header 1001 2000 6000 ... 1000 200 200
201 ... 1199 6000 6001 6199 header 1000 200 1001 2000 6000 This graph shows how the logical to physical block mapping looks like with extent. Most files in a typical Linux system will only need a few extents to describe all of their logical to physical block mapping, and so most of the time, these extent maps could be stored in the inode's direct blocks ...
12
Extent Tree leaf node disk blocks i_data index node ... header root
i_data index node ... header root ... ... The extent tree information can be stored in the inode's \ident{i_data}. An inode flag indicates whether the inode's {i_data} array should be interpreted using the traditional indirect block mapping scheme, or as an extent data structure. If the entire extent information can be stored in the i_data field, then it will be treated as a single leaf node of the extent tree; otherwise, it will be treated as the root node of inode's extent tree, and additional filesystem blocks serve as intermediate or leaf nodes in the extent tree. extents extents index ... node header
13
Extent Related Works Multiple block allocation
Allocate contiguous blocks together Reduce fragmentation, reduce extent meta-data Stripe aligned allocations Delayed allocation Defer block allocation to writeback time Improve chances allocating contiguous blocks, reducing fragmentation Trickier to implement in ordered mode With extents work in place, it make sense to have a multiple block allocator, instead of current single block allocator, which is quite inefficient especially with extent. In addition, to increase the chance that individual single block allocation requests being clustered, delayed allocation for extents is also being worked on.
15
64-bit block numbers Most of the hard work done as part of the extents changes Original lustre patches provide 48-bit block numbers. Enough for a 2**60 filesystems Other changes Superblock fields Block group descriptors (required doubling their size) Journal inode (new jbd2 layer)
16
Expanded inode Inode size is normally 128 bytes in ext3
But can be 256, 512, 1024, etc. up to filesystem blocksize Extra space used for fast extended attributes 256 bytes needed for ext4 features Nanosecond timestamps Inode change version # for Lustre, nfsv4
17
Persistent file allocation
Allow applications to preallocate blocks for a file without having to initialize them Contiguous allocation to reduce fragmentation Irrespective of order that blocks are written While avoidng overhead of zeroing blocks Guaranteed space allocation Useful for Streaming audio/video, databases Implemented as unitialized extents MSB of ee_len used to flag “invalid” extents Reads return zero Writes split the extent into valid and invalid extents
18
Metadata checksuming Proof of concept implementation described in the Iron Filesystem paper (from University of Wisconsin) Storage trends: reliability and seek times not keeping up with capacity increases Add checksums to extents, superblock, block group descriptors, inodes, journal
19
Getting involved Mailing list: linux-ext4@vger.kernel.org
Wiki: Still needs work; anyone want to jump in and help, talk to us Weekly conference call; minutes on the wiki Contact us if you'd like dial in IRC channel: irc.oftc.net, /join #linuxfs
20
The Ext4 Development Team
Alex Thomas Andreas Dilger Theodore Tso Stephen Tweedie Mingming Cao Dave Kleikamp Badari Pulavarathy Avantikia Mathur Andrew Morton Laurent Vivier Alexandre Ratchov Eric Sandeen Takashi Sato
21
Conclusion Ext4 work just beginning
Extents merged, other patches on deck Should be a lot of fun! Watch this space....
Similar presentations
© 2025 SlidePlayer.com Inc.
All rights reserved.