Category: Linux
2009-02-23 17:34:24
Reference from:
http://www.ibm.com/developerworks/linux/library/l-anatomy-ext4/index.html?S_TACT=105AGX03&S_CMP=EDU
The Linux kernel 2.6.28 release brings the first stable version of the ext4 file system, the next generation of the extended file system. The new ext4 file system introduces various improvements and innovations. They span new functionality (new features), scalability (scaling beyond current file system limits), reliability (in the face of failures), and, of course, performance.
history of the extended file system
Minix: the first file system supported by Linux; low performance
ext (ext1): introduced in Linux 0.96c in April 1992; built on the virtual file system (VFS) layer; supported file systems up to 2GB
ext2: January 1993; supported file systems up to 2TB (the 2.6 kernel extends that to 32TB)
ext3: November 2001; introduced journaling
ext4: introduced in kernel 2.6.19; declared stable in 2.6.28 (December 2008)
new functionality
Forward and backward compatibility
Ext4 is forward compatible in that you can mount an ext3 file system as an ext4 file system. You can also mount an ext4 file system as ext3 (backward compatible), but only if the ext4 file system does not use extents.
Improving timestamp resolution and range
Ext4 has essentially future-proofed timestamps by extending them to nanosecond resolution. The time range has also been extended with two additional bits, which push the upper limit from 2038 to the year 2446 (roughly another 400 years).
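To make the two additional bits concrete, here is a minimal Python sketch of decoding such a timestamp. It assumes the layout described in the kernel's ext4 on-disk format documentation: a signed 32-bit seconds field plus a 32-bit "extra" field whose low 2 bits extend the epoch and whose upper 30 bits hold nanoseconds. The function names are hypothetical, chosen for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

EPOCH_BITS = 2                         # extra bits that extend the seconds range
EPOCH_MASK = (1 << EPOCH_BITS) - 1     # 0b11: low 2 bits of the "extra" field
NSEC_MASK = 0xFFFFFFFF & ~EPOCH_MASK   # upper 30 bits: nanoseconds


def to_signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def decode_time(base: int, extra: int) -> Tuple[int, int]:
    """Return (seconds since 1970, nanoseconds) from the two raw fields."""
    seconds = to_signed32(base) + ((extra & EPOCH_MASK) << 32)
    nanoseconds = (extra & NSEC_MASK) >> EPOCH_BITS
    return seconds, nanoseconds


def encode_time(seconds: int, nanoseconds: int) -> Tuple[int, int]:
    """Inverse of decode_time for timestamps inside the representable range."""
    base = seconds & 0xFFFFFFFF
    epoch = (seconds - to_signed32(base)) >> 32   # number of 2^32-second wraps
    if not 0 <= epoch <= EPOCH_MASK or not 0 <= nanoseconds < 10**9:
        raise ValueError("timestamp out of range")
    return base, (nanoseconds << EPOCH_BITS) | epoch


def to_datetime(seconds: int) -> datetime:
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(seconds=seconds)


# Latest representable time: max signed 32-bit seconds plus 3 epoch wraps.
latest, _ = decode_time(0x7FFFFFFF, EPOCH_MASK)
print(to_datetime(latest).year)   # 2446
```

Without the epoch bits the same fields stop at 2038; each epoch value adds another 2^32 seconds (about 136 years).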
scalability
Extending file system limits
Ext4 supports file systems of up to 1 exabyte (1000 petabytes). Files within ext4 may be up to 16TB in size (assuming 4KB blocks), eight times the 2TB limit in ext3. The subdirectory limit was also raised: ext3 allows at most about 32,000 subdirectories in a single directory, whereas ext4 allows a virtually unlimited number. That may appear extreme, but consider the directory hierarchy of a file system that consumes an exabyte of storage. Directory indexing was also optimized into a hashed B-tree-like structure, so even with much larger limits, ext4 supports very fast lookups.
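The headline size limits follow from the widths of block-address fields. The sketch below reproduces them with plain arithmetic, assuming 4KB blocks, 48-bit physical block numbers and 32-bit logical block numbers in ext4, and an ext3 per-file limit set by a 32-bit block counter measured in 512-byte sectors. These field widths are our reading of the on-disk formats, not figures from the article.

```python
BLOCK_SIZE = 4096        # 4KB blocks, as the text assumes
TiB = 1 << 40
EiB = 1 << 60

# ext4: 48-bit physical block numbers bound the file system size.
ext4_fs_max = (1 << 48) * BLOCK_SIZE
# ext4: 32-bit logical block numbers in extents bound the size of one file.
ext4_file_max = (1 << 32) * BLOCK_SIZE
# ext3 (assumed): a 32-bit counter of 512-byte sectors bounds the file size.
ext3_file_max = (1 << 32) * 512

print(ext4_fs_max // EiB, "EiB file system")         # 1 EiB file system
print(ext4_file_max // TiB, "TiB per file (ext4)")   # 16 TiB per file (ext4)
print(ext3_file_max // TiB, "TiB per file (ext3)")   # 2 TiB per file (ext3)
print(ext4_file_max // ext3_file_max, "x larger")    # 8 x larger
```

The results come out in binary units (EiB, TiB), which the text rounds to exabytes and terabytes.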
Extents
Ext4 replaces ext3's indirect block mapping scheme with extents to improve allocation and support a more efficient storage structure. An extent is simply a way to represent a contiguous sequence of blocks. This shrinks metadata: instead of recording where each individual block is stored, an extent records where a long run of contiguous blocks starts and how long it is, reducing the overall metadata storage.
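A minimal Python sketch of the idea: collapse a per-block map (one physical block number per logical block, as an indirect-block scheme would record) into extents, then translate logical blocks through the extent list. The names are hypothetical; the 32768-block cap per extent mirrors ext4's limit for initialized extents (128MB with 4KB blocks), and a real ext4 extent tree is more elaborate than a flat list.

```python
from typing import List, NamedTuple, Optional


class Extent(NamedTuple):
    logical_start: int    # first block number within the file
    physical_start: int   # first block number on disk
    length: int           # number of contiguous blocks


MAX_EXTENT_LEN = 32768    # assumed cap on blocks per extent


def blocks_to_extents(physical_blocks: List[int]) -> List[Extent]:
    """Collapse a per-block map (index = logical block) into extents."""
    extents: List[Extent] = []
    for logical, physical in enumerate(physical_blocks):
        if extents:
            last = extents[-1]
            contiguous = physical == last.physical_start + last.length
            if contiguous and last.length < MAX_EXTENT_LEN:
                extents[-1] = last._replace(length=last.length + 1)
                continue
        extents.append(Extent(logical, physical, 1))
    return extents


def lookup(extents: List[Extent], logical_block: int) -> Optional[int]:
    """Translate a logical block number into a physical one (None if unmapped)."""
    for ext in extents:
        if ext.logical_start <= logical_block < ext.logical_start + ext.length:
            return ext.physical_start + (logical_block - ext.logical_start)
    return None


block_map = list(range(1000, 2000)) + [5000, 5001]   # 1002 per-block entries
extents = blocks_to_extents(block_map)
print(len(block_map), "block entries ->", len(extents), "extents")  # 1002 block entries -> 2 extents
print(lookup(extents, 1001))   # 5001
```

The metadata saving grows with contiguity: a fully contiguous file needs one extent per 32768 blocks instead of one entry per block.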
reliability
Checksumming the file system journal
Ext4 checksums the journal so that corrupted or incomplete transactions can be detected during recovery, ensuring that only valid changes make their way to the underlying file system.
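The principle can be sketched in a few lines of Python: each committed transaction stores a checksum of its contents, and recovery replays transactions only while the checksums verify, stopping at the first corrupted one. This is a simplified model with hypothetical names; it uses `zlib.crc32` as a stand-in, whereas the real JBD2 journal defines its own on-disk layout and checksum algorithms.

```python
import zlib
from typing import Dict, List, Tuple

Write = Tuple[int, bytes]                # (block number, new contents)
Transaction = Tuple[List[Write], int]    # (writes, stored checksum)


def checksum(writes: List[Write]) -> int:
    """CRC over every block number and payload in the transaction."""
    crc = 0
    for block_no, data in writes:
        crc = zlib.crc32(block_no.to_bytes(8, "little"), crc)
        crc = zlib.crc32(data, crc)
    return crc


def commit(writes: List[Write]) -> Transaction:
    """Record a transaction together with its checksum."""
    return writes, checksum(writes)


def replay(journal: List[Transaction], disk: Dict[int, bytes]) -> int:
    """Apply transactions in order, stopping at the first that fails verification."""
    applied = 0
    for writes, stored in journal:
        if checksum(writes) != stored:
            break          # corrupted: discard it and everything after it
        for block_no, data in writes:
            disk[block_no] = data
        applied += 1
    return applied


journal = [commit([(10, b"inode update")]), commit([(11, b"bitmap update")])]
_, good_crc = journal[1]
journal[1] = ([(11, b"bitmap upd@te")], good_crc)   # simulate on-disk corruption
disk: Dict[int, bytes] = {}
print(replay(journal, disk), disk)   # 1 {10: b'inode update'}
```

Without the checksum, recovery would have no way to tell the damaged second transaction from a valid one and would write it to the file system.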
Online defragmentation
An online defragmentation tool (e4defrag) can defragment both the file system as a whole and individual files for improved performance. The online defragmenter is a simple tool that copies a file's data into a new ext4 inode whose extents are contiguous.
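A toy model of the copy-into-contiguous-extents idea, in Python: find a free run large enough for the whole file, copy the fragments into it in logical order, then release the old blocks. The disk, bitmap, and function names are hypothetical, and this is not how e4defrag itself is implemented; it only illustrates the effect on the file's extent list.

```python
from typing import List, Optional, Tuple

Extent = Tuple[int, int]   # (first physical block, length in blocks)


def find_free_run(used: List[bool], length: int) -> Optional[int]:
    """First-fit search for `length` free contiguous blocks."""
    run = 0
    for block, in_use in enumerate(used):
        run = 0 if in_use else run + 1
        if run == length:
            return block - length + 1
    return None


def defragment(disk: List[bytes], used: List[bool],
               extents: List[Extent]) -> List[Extent]:
    """Copy a fragmented file into one contiguous run; return its new extents."""
    if len(extents) <= 1:
        return extents                       # already contiguous
    total = sum(length for _, length in extents)
    start = find_free_run(used, total)
    if start is None:
        return extents                       # no room: leave the file as it is
    dest = start
    for phys, length in extents:             # copy the data in logical order
        for i in range(length):
            disk[dest] = disk[phys + i]
            used[dest] = True
            dest += 1
    for phys, length in extents:             # release the old fragments
        for i in range(length):
            used[phys + i] = False
    return [(start, total)]


disk = [b""] * 16
used = [False] * 16
for blk, data in zip([1, 2, 5, 6], [b"A", b"B", b"C", b"D"]):
    disk[blk], used[blk] = data, True        # a 4-block file in two fragments
used[3] = used[7] = True                     # blocks owned by other files
new = defragment(disk, used, [(1, 2), (5, 2)])
print(new, disk[8:12])   # [(8, 4)] [b'A', b'B', b'C', b'D']
```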
A related reliability improvement is the reduced time required for a file system check (fsck). Ext4 marks unused block groups and unused portions of the inode table so that fsck can skip them entirely, which speeds up the check. When the file system has to be validated because of internal corruption (which becomes more likely as file systems grow in size and distribution), this faster check, together with ext4's overall design, improves overall reliability.
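To see why skipping unused groups matters, here is a small Python model, loosely based on ext4's uninit_bg feature (group-descriptor flags plus a count of never-used inode-table entries). The class, field names, and group sizes are hypothetical; the sketch only counts how many inodes a checker would have to read with and without the skip.

```python
from dataclasses import dataclass
from typing import List

INODES_PER_GROUP = 8192   # assumed value for the example


@dataclass
class GroupDesc:
    inode_table_uninit: bool   # group never used: nothing to check
    itable_unused: int         # trailing inode-table entries never used


def inodes_to_scan(groups: List[GroupDesc], skip_unused: bool) -> int:
    """Count the inodes a checker must read."""
    total = 0
    for g in groups:
        if not skip_unused:
            total += INODES_PER_GROUP                  # read every inode
        elif not g.inode_table_uninit:
            total += INODES_PER_GROUP - g.itable_unused
    return total


groups = [GroupDesc(False, 8000), GroupDesc(False, 7000)] + [GroupDesc(True, 8192)] * 6
print(inodes_to_scan(groups, skip_unused=False))  # 65536
print(inodes_to_scan(groups, skip_unused=True))   # 1384
```

On a mostly empty file system, the work shrinks from "every inode that could exist" to "every inode that was ever used".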
performance
Ext4 provides a number of enhancements for improved performance:
File-level preallocation
Certain applications, such as databases or content streaming, benefit from files stored in contiguous blocks (to exploit the sequential read optimizations of drives and to maximize the number of blocks transferred per read command). Although extents can provide segments of contiguous blocks, a more brute-force method is to preallocate very large contiguous sections of the desired size (as XFS did in the past). Ext4 implements this through a new system call, fallocate(), which preallocates space for a file of a given size (the preallocated blocks read back as zeros). The application can then write its data into that space and get bounded, predictable read performance over it.
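From Python on Linux, `os.posix_fallocate` is a convenient way to request this kind of preallocation: glibc implements `posix_fallocate` with the fallocate() system call when the file system supports it and emulates it otherwise. The helper name, file name, and 16MB size below are illustrative assumptions. The temporary directory may not be on ext4, so the sketch only checks the file's size and contents, not its on-disk layout.

```python
import os
import tempfile


def preallocate(path: str, size: int) -> None:
    """Reserve `size` bytes for `path` before any data is written (Linux)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.posix_fallocate(fd, 0, size)   # offset 0, length `size`
    finally:
        os.close(fd)


SIZE = 16 * 1024 * 1024   # example: 16MB

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "stream.dat")
    preallocate(path, SIZE)
    size_on_disk = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(4096)

print(size_on_disk)          # 16777216
print(head == bytes(4096))   # True: preallocated space reads back as zeros
```

An application that knows its final file size would call this once, then write into the reserved space.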
Delaying block allocation
Another allocation optimization is delayed allocation, which postpones the allocation of physical disk blocks until the data is flushed to disk. The key to this optimization is that by the time the blocks must be written, the file system knows about more of the data and can allocate and write it in contiguous blocks. This is similar to persistent preallocation, except that the file system performs the task automatically. If the size of the file is known beforehand, however, persistent preallocation is the better route.
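A simplified Python model shows the effect. Two files receive interleaved one-block writes. With immediate allocation each write grabs the next free block, so the files interleave on disk. With delayed allocation the writes are only counted, and each file's blocks are allocated together at flush time. The allocator is a hypothetical bump pointer, far simpler than ext4's.

```python
from typing import Dict, List, Tuple

# Writes arrive interleaved: (file name, number of blocks written).
WRITES: List[Tuple[str, int]] = [("a", 1), ("b", 1)] * 4


def count_fragments(blocks: List[int]) -> int:
    """Number of contiguous runs in a file's block list."""
    return sum(1 for i, b in enumerate(blocks) if i == 0 or b != blocks[i - 1] + 1)


def immediate_allocation(writes: List[Tuple[str, int]]) -> Dict[str, List[int]]:
    next_free, layout = 0, {}
    for name, n in writes:                   # allocate at write() time
        layout.setdefault(name, []).extend(range(next_free, next_free + n))
        next_free += n
    return layout


def delayed_allocation(writes: List[Tuple[str, int]]) -> Dict[str, List[int]]:
    pending: Dict[str, int] = {}
    for name, n in writes:                   # only remember how much is dirty
        pending[name] = pending.get(name, 0) + n
    next_free, layout = 0, {}
    for name, n in pending.items():          # allocate at flush time
        layout[name] = list(range(next_free, next_free + n))
        next_free += n
    return layout


for policy in (immediate_allocation, delayed_allocation):
    layout = policy(WRITES)
    print(policy.__name__, {f: count_fragments(b) for f, b in layout.items()})
# immediate_allocation {'a': 4, 'b': 4}
# delayed_allocation {'a': 1, 'b': 1}
```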
Multi-block allocation
A final optimization, again related to contiguous blocks, is ext4's block allocator. In ext3, the block allocator allocated a single block at a time, so when multiple blocks were needed, contiguous data could end up in non-contiguous blocks. Ext4 fixes this with a block allocator that allocates multiple blocks at a time, which are likely to be contiguous on disk. Like the previous optimizations, this keeps related data together on disk to benefit sequential reads.
The other aspect of multi-block allocation is the processing required to allocate blocks. Recall that ext3 allocated one block at a time; in the simplest case, that meant one call to the block allocator for each block. Allocating multiple blocks at once requires far fewer calls to the block allocator, resulting in faster allocation and less processing.
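Both effects of multi-block allocation can be seen in a toy Python model: two files of 1000 blocks each request space alternately from a shared allocator, either one block per call or all their blocks in one call. The allocator is a hypothetical bump pointer, not ext4's real allocator. The sketch reports the number of allocator calls and how many fragments the first file ends up in.

```python
from typing import Dict, List, Tuple


class ToyAllocator:
    """Hands out the next free blocks of a linear disk and counts its calls."""

    def __init__(self) -> None:
        self.next_free = 0
        self.calls = 0

    def alloc(self, count: int) -> List[int]:
        self.calls += 1
        start = self.next_free
        self.next_free += count
        return list(range(start, start + count))


def fragments(blocks: List[int]) -> int:
    """Number of contiguous runs in a block list."""
    return sum(1 for i, b in enumerate(blocks) if i == 0 or b != blocks[i - 1] + 1)


def write_two_files(chunk: int, blocks_per_file: int = 1000) -> Tuple[int, int]:
    """Two files allocate alternately, `chunk` blocks per allocator call."""
    alloc = ToyAllocator()
    layout: Dict[str, List[int]] = {"a": [], "b": []}
    while len(layout["a"]) < blocks_per_file or len(layout["b"]) < blocks_per_file:
        for name in ("a", "b"):
            n = min(chunk, blocks_per_file - len(layout[name]))
            if n > 0:
                layout[name].extend(alloc.alloc(n))
    return alloc.calls, fragments(layout["a"])


print(write_two_files(chunk=1))      # (2000, 1000): one block per call
print(write_two_files(chunk=1000))   # (2, 1): multi-block allocation
```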