Category: Linux
2008-10-19 10:49:50
To support files larger than 2 GiB on 32-bit systems, e.g. x86, PowerPC and MIPS, a number of changes had to be made to the kernel and the C library. This is called Large File Support (LFS). LFS should now be complete in Linux, and this article gives a short overview of the current status.
64-bit systems like Alpha, IA64 and x86-64 have no problems with large files but support the new interfaces as well. On these systems the new interface is mainly an alias for the normal interface.
LFS is implemented by the Linux kernel and the GNU C library (aka glibc).
LFS raises the maximum file size. On 32-bit systems the limit is 2^31 bytes (2 GiB), but applications that use the LFS interface on filesystems that support LFS can handle files as large as 2^63 bytes.
On 64-bit systems the file size limit is 2^63 bytes unless a filesystem (like NFSv2) supports less.
The LFS interface in glibc 2.1.3 is complete, but the implementation is not. The implementation in 2.1.3 also contains some bugs, e.g. ftello64 is broken. If you want to use the LFS interface, you need a glibc that has been compiled against headers from a kernel with LFS support.
Since glibc 2.1.3 was released before LFS support went into Linux 2.3.X/2.4.0-testX, some fixes had to be made to glibc to support the kernel routines. The current stable release of glibc is glibc 2.2.3 (2.2 was released in November 2000) and it does support all the features from Linux 2.4.0. Glibc 2.2.x is now used by most of the major distributions in their latest release (e.g. SuSE 7.2, Red Hat 7.1). glibc 2.2 supports the following features that glibc 2.1.3 doesn't support:
Programs compiled against glibc 2.1.3 will work on an LFS system; there is no need to recompile them (with the exception of programs using the 64-bit fcntl locking). Only glibc needs to be updated to support LFS.
Note that glibc 2.0 and libc5 do not support LFS at all.
Locking via fcntl/lockf doesn't work with large files in glibc 2.1.3. Kernel support was added in Linux 2.4.0-test7 and required incompatible changes to glibc; only glibc 2.2 handles them. This means:
Since Linux 2.4.0-test7 most of the LFS interface is included in the kernel. The open problems and restrictions are described below.
We can separate two levels of LFS compliance in the file systems:
At least the second level should be generally reachable, but it takes some work to audit all the weird file systems.
Some bugs in NFSv2 regarding (2) have been fixed already, but some are missing (like the O_LARGEFILE check). Other file systems probably miss it too. A complete audit of all file systems is needed (see also the 2.4 kernel TODO page at ).
The situation about the different filesystems used in Linux 2.4.0 and later can be summarized as follows:
When a file larger than 2 GiB is created on an ext2 file system, ext2 sets a read-only compatibility flag, and older kernels will then mount that file system read-only.
Chris Mason wrote:
Disks formatted with the current 2.2 code are called our 3.5 disk format. They will not support large files under any kernel (even the 2.4 code).
But, you can mount a 3.5 disk format under the 2.4 kernel code, and use -o conv. This will turn on large file support for the old disks, but only new files will be allowed to grow past 2 GiB.
Once you mount with -o conv, you can't mount under 2.2 any more. We are testing a back port of the LFS disk format to 2.2, it should be ready soon. It has the same -o conv mount option that our 2.4 code has, so all the same rules will apply.
The Linux kernel doesn't support a 64-bit rlimit system call yet. glibc provides getrlimit64 and setrlimit64 but wraps values that are too large to RLIM_INFINITY.
For using LFS in user programs, the programs have to use the LFS API. This involves recompilation and changes of programs. The API is documented in the glibc manual (the libc info pages) which can be read with e.g. "info libc".
In a nutshell for using LFS you can choose either of the following:
Complete documentation of the feature test macros like _FILE_OFFSET_BITS and _LARGEFILE_SOURCE is in the glibc manual (run e.g. "info libc 'Feature Test Macros'").
The LFS API is also documented in the LFS standard which is available at .
Be careful when using _FILE_OFFSET_BITS=64 to compile a program that calls a library, or to compile a library itself, if any of the interfaces uses off_t. With _FILE_OFFSET_BITS=64, glibc changes the type of off_t to off64_t. You can either change the interface to always use off64_t, or provide a different function when _FILE_OFFSET_BITS=64 is used (as glibc does). Otherwise, take care that both library and program are built with the same _FILE_OFFSET_BITS setting. glibc itself is aware of the _FILE_OFFSET_BITS setting, so it causes no problems there, but other libraries may be affected.
Release 7.0 of SuSE Linux supports LFS on all supported platforms. The kernel of SuSE 7.0 is based on Linux 2.2.16.
The LFS support in the SuSE Linux kernel is the same as in the development kernel 2.4.0-test1 for the file systems that are in both kernels, and glibc supports all the features of the kernel. The file systems that differ are ReiserFS (so far only in SuSE; the 2.2 port doesn't support LFS) and NFSv3 (not available in SuSE 7.0). This means that you need to use ext2 as the file system for LFS.
Both Linux 2.4.0-test1 and SuSE 7.0 do not support the getdents64 system call and the 64 bit locking interface. These are only implemented in Linux 2.4.0-test8 and newer.
Release 7.1 of SuSE Linux supports LFS on all supported platforms. SuSE 7.1 comes with kernels based on 2.4.0 and 2.2.18.
The 2.2.18 kernel supports LFS with the ext2 file system. The 2.4.0 kernel supports LFS with the ext2 and NFSv3 filesystems, and additionally with ReiserFS if the new ReiserFS format (incompatible with the 2.2 format) is used instead of the default 2.2 format.
SuSE 7.1 comes with glibc 2.2, which supports the full LFS interface. Only the 2.2.18 kernel lacks support for 64-bit file locking and the getdents64 call.
The kernel support for LFS is like the one in 7.1.
Since I can't verify each and every distribution, I have to trust others for the following information.
The current stable release (Debian 3.0, codename "woody") has LFS support.
The beta called Fisher was the first to have LFS support (thanks to Russ Marshall). Current Red Hat releases like Red Hat 8 have LFS support.
Tim Small
The 'ulimit' command built into bash 1.x (the default for Red Hat 6.2) uses the 32-bit versions of the system calls. The way glibc currently behaves, requests to the 32-bit setrlimit or getrlimit translate 'unlimited' to '2^31 - 1' in both directions (I would argue that setting a limit to RLIM_INFINITY using the 32-bit interface should end up in a call to the 64-bit setrlimit variant with the 64-bit RLIM_INFINITY).
The default PAM configuration for sshd (/etc/pam.d/sshd), includes the line:
session required /lib/security/pam_limits.so
Which fiddles about with various limits (using the 32bit versions of the calls).
If you log in using ssh and use bash 1.x to view the limits, you will be told that your file size is unlimited, when it is in fact set to 2097151 (1024-byte) blocks!
Workaround:
I don't have any other information yet. Feel free to send detailed information about distributions that support LFS.
Please send me information to fill in the missing bits.
Filesystem | File Size Limit | Filesystem Size Limit |
---|---|---|
ext2/ext3 with 1 KiB blocksize | 16448 MiB (~ 16 GiB) | 2048 GiB (= 2 TiB) |
ext2/3 with 2 KiB blocksize | 256 GiB | 8192 GiB (= 8 TiB) |
ext2/3 with 4 KiB blocksize | 2048 GiB (= 2 TiB) | 8192 GiB (= 8 TiB) |
ext2/3 with 8 KiB blocksize (Systems with 8 KiB pages like Alpha only) | 65568 GiB (~ 64 TiB) | 32768 GiB (= 32 TiB) |
ReiserFS 3.5 | 2 GiB | 16384 GiB (= 16 TiB) |
ReiserFS 3.6 (as in Linux 2.4) | 1 EiB | 16384 GiB (= 16 TiB) |
XFS | 8 EiB | 8 EiB |
JFS with 512 Bytes blocksize | 8 EiB | 512 TiB |
JFS with 4KiB blocksize | 8 EiB | 4 PiB |
NFSv2 (client side) | 2 GiB | 8 EiB |
NFSv3 (client side) | 8 EiB | 8 EiB |
Note Kernel Limitations: The table above describes limitations of the on-disk format. The following kernel limits exist:
Note in the above: 1024 Bytes = 1 KiB; 1024 KiB = 1 MiB; 1024 MiB = 1 GiB; 1024 GiB = 1 TiB; 1024 TiB = 1 PiB; 1024 PiB = 1 EiB (check http://physics.nist.gov/cuu/Units/binary.html)
An IDE disk has 64 minors; one is used for the full disk, so 63 partitions are possible. A SCSI disk has 16 minors and therefore at most 15 partitions.
Thanks to Andi Kleen, Matti Aarnio, Rogier Wolff, Chris Mason, Andreas Schwab, Lenz Grimmer, Andries Brouwer, Urban Widmark, Bruce Allen and Jana Jaeger for additions to and comments on the contents of this page.