Large File Support in Linux
To support files larger than 2 GiB on 32-bit systems such as x86,
PowerPC and MIPS, a number of changes had to be made to the kernel and
the C library. This is called Large File Support (LFS). LFS should
now be complete in Linux, and this article gives a short overview of
the current status.
64-bit systems like Alpha, IA64 and x86-64 have no problems
with large files but also support the new interfaces. On these systems
the new interface is mainly an alias for the normal interface.
LFS is implemented by the Linux kernel together with the GNU C library
(aka glibc).
Limits
LFS raises the maximum file size. On 32-bit systems the
limit is 2^31 bytes (2 GiB), but applications that use the LFS interface
on filesystems that support LFS can handle files as large as 2^63 bytes.
On 64-bit systems the file size limit is 2^63 bytes unless a
filesystem (like NFSv2) supports less.
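The two limits follow directly from the width of the signed file offset type. The sketch below is only an illustration; the helper name max_offset_for_size is made up for this example. It computes the largest offset that a signed offset type of a given size in bytes can hold: 4 bytes gives 2^31 - 1 and 8 bytes gives 2^63 - 1.

```c
#define _FILE_OFFSET_BITS 64   /* must come before any system header */
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical helper: largest value a signed offset type that is
   `size` bytes wide can hold. Sizes of 8 bytes or more are capped
   at the 64-bit maximum. */
static int64_t max_offset_for_size(size_t size)
{
    if (size == 0)
        return 0;
    if (size >= 8)
        return INT64_MAX;                        /* 2^63 - 1 */
    return ((int64_t)1 << (size * 8 - 1)) - 1;   /* 4 bytes: 2^31 - 1 */
}
```

Calling max_offset_for_size(sizeof(off_t)) shows which of the two limits applies to a program with the current compilation settings.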
LFS in Glibc 2.1.3 and Glibc 2.2
The LFS interface in glibc 2.1.3 is complete, but the implementation
is not. The 2.1.3 implementation also contains some bugs,
e.g. ftello64 is broken. If you want to use the LFS interface, you
need to use a glibc that has been compiled against headers from a
kernel with LFS support in it.
Since glibc 2.1.3 was released before LFS support went into Linux
2.3.X/2.4.0-testX, some fixes had to be made to glibc to support the
kernel routines. The current stable release of glibc is glibc 2.2.3
(2.2 was released in November 2000), and it supports all the features of
Linux 2.4.0. Glibc 2.2.x is now used by most of the major distributions
in their latest releases (e.g. SuSE 7.2, Red Hat 7.1).
glibc 2.2 supports the following features that glibc 2.1.3 doesn't support:
- getdents64 system call
- 64 bit file locking interface (see below for details)
Programs compiled against glibc 2.1.3 will work on an LFS system;
there is no need to recompile them (with the exception of programs
using 64 bit fcntl locking). Only glibc needs to be updated to support LFS.
Note that glibc 2.0 and libc5 do not support LFS at all.
Locking on Large Files is Not Supported with fcntl/lockf in Glibc 2.1.x
Locking via fcntl/lockf doesn't work with large files in
glibc 2.1.3. Kernel support was added in Linux 2.4.0-test7 and
required incompatible changes to glibc; only glibc 2.2 handles
them. This means:
- You can't use the flags F_GETLK64, F_SETLK64
and F_SETLKW64 with fcntl when you use glibc 2.1.x.
Programs that use them now will fail, and they need to be
recompiled against glibc 2.2, which supports these fcntl
flags.
- lockf64 only works on files < 2 GiB with glibc 2.1.x;
it works with glibc 2.2 and no recompilation is needed.
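As an illustration of 64 bit locking, here is a minimal sketch that takes a write lock on a byte range through fcntl. It assumes glibc 2.2 or newer and a kernel with the 64 bit locking interface. The helper name lock_range is made up for this example. With _FILE_OFFSET_BITS=64 the program keeps using the plain F_SETLK command and struct flock, and the offsets in the structure are 64 bits wide.

```c
#define _FILE_OFFSET_BITS 64   /* must come before any system header */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: try to take a write lock on `len` bytes
   starting at absolute offset `start`. Does not block.
   Returns 0 on success, -1 on failure with errno set by fcntl. */
static int lock_range(int fd, off_t start, off_t len)
{
    struct flock fl = {0};

    fl.l_type = F_WRLCK;     /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;  /* l_start is an absolute offset */
    fl.l_start = start;      /* may be larger than 2 GiB */
    fl.l_len = len;
    return fcntl(fd, F_SETLK, &fl);
}
```

For example, lock_range(fd, (off_t)3 << 30, 4096) asks for a lock on 4 KiB starting at the 3 GiB mark. A range lock may extend past the current end of the file, so the file does not have to be that large.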
LFS in the Linux Kernel
Since Linux 2.4.0-test7 most of the LFS kernel interface has been
included in the kernel. The open problems and restrictions are described
below.
File Systems
We can separate two levels of LFS compliance in the file systems:
- Full support for files > 2 GiB and O_LARGEFILE
- Limited LFS support: it gives proper EINVAL/EFBIG/EOVERFLOW error messages when
you try to use O_LARGEFILE or positions > 2 GiB.
At least the second level should be generally reachable, but it is
some work to audit all the weird file systems.
Some bugs in NFSv2 regarding the second level have been fixed already,
but some fixes are still missing (like the O_LARGEFILE check). Other file
systems probably miss it too. A complete audit of all file systems is
needed (see also the 2.4 kernel TODO page).
The situation about the different filesystems used in Linux
2.4.0 and later can be summarized as follows:
- ext2/ext3
- Full support for LFS
- NFSv2
- Cannot handle LFS due to protocol restrictions
(limited to 2 GiB - 1); limited LFS support but expect some bugs
- NFSv3
- The protocol is ok, but I'm not sure about the
Linux implementation status
- ReiserFS 3.5.x (not part of the kernel, separate patch)
- Does not support LFS
- ReiserFS 3.6.x (part of kernel 2.4.1 and newer)
- Full support for LFS if the new on-disk format is used. This
format is incompatible with the format used by 3.5.x (see below for some
more details).
- coda
- Does not work with LFS (local cache issues, protocol is
ok)
- UFS
- Full support for LFS (although not complete
vs. O_LARGEFILE flag use)
- minix
- limited to 2 GiB - 1 (the file size is limited to 65804 MiB;
note that the filesystem size is limited to 64 MiB, but holes are allowed)
- SysV (aka SCO)
- limited to 2 GiB - 1
- msdos
- limited to 2 GiB - 1
- umsdos
- based on msdos, limited to 2 GiB - 1
- smbfs
- Older protocols are limited to 4 GiB - 1. SMB extensions allow 64 bit
filesystems. Linux smbfs implementation is currently limited to 2 GiB - 1.
- NCPfs
- protocol is limited to 4 GiB - 1, Linux implementation
to 2 GiB - 1
- JFS
- Should work with LFS (for details about JFS see http://oss.software.ibm.com/developer/opensource/jfs)
- XFS
- Should work with LFS
- other file systems
- I don't have any information yet, feel free
to send me updates.
Note for ext2
When files > 2 GiB are created on an ext2 file system, a read-only
compatibility flag is set, and older kernels will then mount the
file system read-only.
Note for ReiserFS
Chris Mason wrote:
Disks formatted with the current 2.2 code are called our 3.5 disk format.
They will not support large files under any kernel (even the 2.4 code).
But, you can mount a 3.5 disk format under the 2.4 kernel code, and
use -o conv. This will turn on large file support for the
old disks, but only new files will be allowed to grow past 2 GiB.
Once you mount with -o conv, you can't mount under 2.2
any more. We are testing a back port of the LFS disk format to 2.2,
it should be ready soon. It has the same -o conv mount
option that our 2.4 code has, so all the same rules will apply.
rlimit64 Is Not Supported
The Linux kernel doesn't support a 64-bit rlimit system call
yet. glibc supports getrlimit64 and setrlimit64 but
wraps values that are too large to RLIM_INFINITY.
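Because of this wrapping, it can be useful for a program to look at the file size limit it is actually running under before it writes a large file. The following is a minimal sketch using getrlimit with RLIMIT_FSIZE. The helper name fits_fsize_limit is made up for this example.

```c
#define _FILE_OFFSET_BITS 64   /* must come before any system header */
#include <sys/resource.h>

/* Hypothetical helper: returns 1 if a file of `bytes` bytes fits
   under the current soft file size limit, 0 if it does not, and
   -1 if the limit could not be read. */
static int fits_fsize_limit(unsigned long long bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_FSIZE, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)   /* no limit set */
        return 1;
    return bytes <= (unsigned long long)rl.rlim_cur;
}
```

A call such as fits_fsize_limit(3ULL << 30) tells the program whether a 3 GiB file can be written under the current limit.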
Using LFS
For using LFS in user programs, the programs have to use the LFS API.
This involves recompilation and changes of programs. The API is
documented in the glibc manual (the libc info pages) which can be read
with e.g. "info libc".
In a nutshell, to use LFS you can choose one of the following:
- Compile your programs with "gcc -D_FILE_OFFSET_BITS=64". This forces all file access calls to use
the 64 bit variants. Several types also change, e.g. off_t
becomes off64_t. It is therefore important to always use the
correct types and not to use
e.g. int instead of off_t.
For portability with other platforms you should use
getconf LFS_CFLAGS which will return
-D_FILE_OFFSET_BITS=64 on Linux platforms but might return
something else on e.g. Solaris. For linking, you should use the link
flags that are reported via getconf LFS_LDFLAGS. On Linux
systems, you do not need special link flags.
- Define _LARGEFILE_SOURCE and
_LARGEFILE64_SOURCE. With these defines you can use the LFS
functions like open64 directly.
- Use the O_LARGEFILE flag with open to operate
on large files.
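The first option can be sketched as follows. This is only an illustration and assumes a filesystem that supports LFS and sparse files. The helper name write_byte_at is made up for this example. The program is written against the normal interface (open, lseek, fstat, off_t), and defining _FILE_OFFSET_BITS as 64, either in the source before the first include or on the compiler command line, selects the 64 bit variants.

```c
#define _FILE_OFFSET_BITS 64   /* same effect as gcc -D_FILE_OFFSET_BITS=64 */
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create (or truncate) `path`, write a single
   byte at offset `pos` and return the resulting file size, or -1 on
   any error. Offsets are kept in off_t, never in int. */
static off_t write_byte_at(const char *path, off_t pos)
{
    struct stat st;
    off_t size = -1;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    if (lseek(fd, pos, SEEK_SET) == pos &&   /* seek past 2 GiB */
        write(fd, "x", 1) == 1 &&            /* leaves a hole before pos */
        fstat(fd, &st) == 0)
        size = st.st_size;
    close(fd);
    return size;
}
```

With pos set to (off_t)1 << 31 the function returns 2 GiB + 1 on a filesystem with full LFS support. On a filesystem or C library without LFS support the call is expected to fail instead.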
Complete documentation of the feature test macros like
_FILE_OFFSET_BITS and _LARGEFILE_SOURCE is in the
glibc manual (run e.g. "info libc 'Feature Test Macros'").
The LFS API is also documented in the LFS standard.
LFS and Libraries other than Glibc
Be careful when using _FILE_OFFSET_BITS=64 to compile a
program that calls a library, or to compile a library, if any of the
interfaces involved uses off_t. With _FILE_OFFSET_BITS=64 glibc
changes the type of off_t to off64_t. You can either
change the interface to always use off64_t, or provide a different
function when _FILE_OFFSET_BITS=64 is used (as glibc does).
Otherwise, make sure that both library and program are built with the same
_FILE_OFFSET_BITS setting. glibc itself is aware of the
_FILE_OFFSET_BITS setting, so there is no problem with it, but
there might be problems with other libraries.
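A minimal sketch of the first approach, an interface whose types do not depend on _FILE_OFFSET_BITS, could look like this. The function mylib_file_size is made up for this example, and it uses int64_t in place of off64_t. That substitution is an assumption of the sketch, chosen so that callers do not need _LARGEFILE64_SOURCE to see the type.

```c
#define _FILE_OFFSET_BITS 64   /* used inside the library only */
#include <stdint.h>
#include <sys/stat.h>

/* Hypothetical library function. The public signature uses int64_t
   and never off_t, so it has the same shape whatever
   _FILE_OFFSET_BITS setting the calling program was built with.
   Returns the size of `path` in bytes, or -1 on error. */
int64_t mylib_file_size(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return (int64_t)st.st_size;
}
```

A program compiled without _FILE_OFFSET_BITS=64 can call this function and still get correct sizes for files larger than 2 GiB, because no off_t value crosses the library boundary.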
Distributions with LFS Support
SuSE 7.0
Release 7.0 of SuSE Linux supports LFS on all supported platforms.
The kernel of SuSE 7.0 is based on Linux 2.2.16.
The LFS support in the SuSE Linux kernel is the same as in the
development kernel 2.4.0-test1 for the file systems which are in both
kernels, and glibc supports all the features of the kernel. The
filesystems that differ are ReiserFS (so far only in SuSE; the 2.2 port
doesn't support LFS) and NFSv3 (not available in SuSE 7.0). This means that
you need to use ext2 as the file system for LFS.
Neither Linux 2.4.0-test1 nor SuSE 7.0 supports the
getdents64 system call or the 64 bit locking interface.
These are only implemented in Linux 2.4.0-test8 and newer.
SuSE 7.1
Release 7.1 of SuSE Linux supports LFS on all supported platforms.
SuSE 7.1 comes with kernels based on 2.4.0 and 2.2.18.
The 2.2.18 kernel supports LFS with the ext2 file system. The 2.4.0
kernel supports LFS with the ext2 and NFSv3 filesystems and
additionally with the ReiserFS filesystem if the new ReiserFS format
(incompatible with the 2.2 format) is used instead of the default 2.2
format.
SuSE 7.1 comes with glibc 2.2, which supports the full LFS interface.
However, the 2.2.18 kernel does not support 64-bit file locking or
the getdents64 call.
SuSE 7.2 and newer
The kernel support for LFS is like the one in 7.1.
Other Distributions
Since I can't verify each and every distribution, I have to trust
others for the following information.
Debian
The current stable release (Debian 3.0, codename "woody") has LFS
support.
Red Hat
The beta called Fisher was the first to have LFS support (thanks
to Russ Marshall). Current Red Hat releases like Red Hat 8 have LFS support.
Tim Small sent the following special
combo-gotcha for Red Hat 6.2 (and probably other older distros as
well):
The 'ulimit' command which is built into bash 1.x (the default for
Red Hat 6.2) uses the 32 bit versions of the system calls. The way
that glibc currently behaves means that requests to the 32bit
setrlimit, or getrlimit will translate 'unlimited' to '2^31 - 1' in
both directions (I would argue that setting a limit to RLIM_INFINITY
using the 32bit interface should end up in a call to the 64 bit
setrlimit variant with the 64 bit RLIM_INFINITY).
The default PAM configuration for sshd (/etc/pam.d/sshd), includes the line:
session required /lib/security/pam_limits.so
Which fiddles about with various limits (using the 32bit versions of
the calls).
If you log-in using ssh, and use bash 1.x to view the limits, you will
be told that your file size is unlimited, when it is in fact set to
2097151 (1024 byte) blocks!
Workaround:
- Either:
- Comment out the line in /etc/pam.d/sshd (note that limits set
in /etc/security/limits.conf will no longer be effective for ssh
logins)
- Or: Rebuild the pam package with 64 bit support
- Install the bash2 RPM
- Either:
- rename the old bash, and symlink /bin/bash2 to /bin/bash (you
may want to keep /bin/sh pointing at the old bash, if you are
worried about compatibility)
- Or: use vipw to change users over to /bin/bash2
Other...
I don't have any other information yet. Feel free to send detailed information about
distributions that support LFS.
Some Other Often Requested Data about Filesystems
Please send me information to fill in the missing bits.
Maximum On-Disk Sizes of the Filesystems
Filesystem | File Size Limit | Filesystem Size Limit
ext2/ext3 with 1 KiB blocksize | 16448 MiB (~ 16 GiB) | 2048 GiB (= 2 TiB)
ext2/3 with 2 KiB blocksize | 256 GiB | 8192 GiB (= 8 TiB)
ext2/3 with 4 KiB blocksize | 2048 GiB (= 2 TiB) | 8192 GiB (= 8 TiB)
ext2/3 with 8 KiB blocksize (Systems with 8 KiB pages like Alpha only) | 65568 GiB (~ 64 TiB) | 32768 GiB (= 32 TiB)
ReiserFS 3.5 | 2 GiB | 16384 GiB (= 16 TiB)
ReiserFS 3.6 (as in Linux 2.4) | 1 EiB | 16384 GiB (= 16 TiB)
XFS | 8 EiB | 8 EiB
JFS with 512 Bytes blocksize | 8 EiB | 512 TiB
JFS with 4KiB blocksize | 8 EiB | 4 PiB
NFSv2 (client side) | 2 GiB | 8 EiB
NFSv3 (client side) | 8 EiB | 8 EiB
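Some of the ext2/ext3 file size figures can be reproduced with a short calculation. The sketch below is not taken from the kernel sources. It assumes the classic ext2 inode layout of 12 direct block pointers plus one single, one double and one triple indirect block, with 4-byte block pointers. The function name ext2_max_file_bytes is made up for this example. Under these assumptions it yields about 16448 MiB for 1 KiB blocks and about 65568 GiB for 8 KiB blocks. The table value for 4 KiB blocks is lower than this formula gives, so a separate limit that the sketch does not model must apply there.

```c
#include <stdint.h>

/* Hypothetical helper: maximum file size in bytes that can be
   addressed through an inode, ASSUMING 12 direct pointers, one
   single, one double and one triple indirect block, and 4-byte
   block pointers. Other limits are not modelled. */
static uint64_t ext2_max_file_bytes(uint64_t block_size)
{
    uint64_t p = block_size / 4;                   /* pointers per block */
    uint64_t blocks = 12 + p + p * p + p * p * p;  /* addressable blocks */

    return blocks * block_size;
}
```

For a block size of 1024 this gives 16843020 blocks of 1 KiB each, which is the 16448 MiB shown in the first row of the table.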
Note Kernel Limitations: The table above describes
limitations of the on-disk format. The following kernel limits
exist:
- On 32-bit systems with Kernel 2.4.x: The size of a file and a
block device is limited to 2 TiB. By using LVM several block
devices can be combined enabling the handling of larger file
systems.
- 64-bit systems: The sizes of a filesystem and of a file are
limited to 2^63 bytes (8 EiB). But there might be hardware
driver limits that do not allow access to such large devices.
- Kernel 2.6: For both 32-bit systems with option CONFIG_LBD set
and for 64-bit systems: The size of a file system is limited to
2^73 bytes (far too much for today). On 32-bit systems
(without CONFIG_LBD set) the size of a file is limited to 2 TiB.
Note that not all filesystems and hardware drivers might handle
such large filesystems.
Note in the above:
1024 Bytes = 1 KiB; 1024 KiB = 1 MiB; 1024 MiB = 1 GiB;
1024 GiB = 1 TiB; 1024 TiB = 1 PiB; 1024 PiB = 1 EiB
(check http://physics.nist.gov/cuu/Units/binary.html)
Maximum Number of Partitions
An IDE disk has 64 minors; one is used for the full disk, and therefore
63 partitions are possible. A SCSI disk has 16 minors and therefore
at most 15 partitions.
Thanks
Thanks to Andi Kleen, Matti Aarnio, Rogier Wolff, Chris Mason,
Andreas Schwab, Lenz Grimmer, Andries Brouwer, Urban Widmark, Bruce
Allen and Jana Jaeger for additions to and comments on the contents of
this page.
Last modified: Tue Feb 15 12:59:13 CET 2005