2008-10-10 11:07:40

[QUOTE]Linux: Journaling Filesystems

The lack of a journaling filesystem was often cited as one of the major factors holding back the widespread adoption of Linux at the enterprise level. However, this objection is no longer valid, as there are now four such filesystems from which to choose.

Journaling filesystems offer several important advantages over non-journaling filesystems, such as ext2. In particular, if the system is halted without a proper shutdown, they guarantee consistency of the data and eliminate the need for a long and complex filesystem check during rebooting. The name derives from the fact that a special file called a journal is used to keep track of the data that has been written to the hard disk.

In the case of conventional filesystems, disk checks during rebooting after a power failure or other system crash can take many minutes, or even hours for large hard disk drives with capacities of hundreds of gigabytes. Moreover, if an inconsistency in the data is found, human intervention is sometimes necessary to answer complicated questions about how to fix certain filesystem problems. Such downtime can be very costly for the big systems used by large organizations.

In the case of a journaling filesystem, if the power supply to the computer is suddenly interrupted, a given set of updates will be in one of two states. Either it has been fully committed to the filesystem (i.e., written to the hard disk), in which case there is no problem and the filesystem can be used immediately, or it is marked as not yet fully committed, in which case the filesystem driver can read the journal and fix any inconsistencies that occurred. This is far quicker than a scan of the entire hard disk, and it guarantees that the structure of the filesystem is always self-consistent. With a journaling filesystem, a computer can usually be rebooted in just a few seconds after a system crash, and although some data might be lost, at least it will not take many minutes or hours to discover this fact.
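The commit-or-replay logic described above can be sketched as a toy write-ahead journal. This is a deliberate simplification for illustration only -- real filesystems journal block-level updates rather than key/value pairs, and the file name journal.log is invented for the example:

```python
import json
import os

JOURNAL = "journal.log"  # hypothetical journal file for this toy example

def commit(store, updates):
    # 1. Write the intended changes to the journal first, and force
    #    them to stable storage.
    with open(JOURNAL, "a") as j:
        j.write(json.dumps(updates) + "\n")
        j.flush()
        os.fsync(j.fileno())
    # 2. Only then apply them to the "main filesystem".
    store.update(updates)
    # 3. Mark the transaction complete by truncating the journal.
    open(JOURNAL, "w").close()

def recover(store):
    # On the next mount after a crash, replay any journaled but
    # unapplied transactions; replaying is idempotent.
    if os.path.exists(JOURNAL):
        for line in open(JOURNAL):
            if line.strip():
                store.update(json.loads(line))
        open(JOURNAL, "w").close()

store = {}
commit(store, {"fileA": "data1"})

# Simulate a crash after journaling but before applying an update:
with open(JOURNAL, "a") as j:
    j.write(json.dumps({"fileB": "data2"}) + "\n")

recover(store)
print(store)  # {'fileA': 'data1', 'fileB': 'data2'}
```

Because each journal entry is complete before recovery starts, replaying it restores consistency without scanning the whole store.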

Ext3 has been integrated into the Linux kernel since version 2.4.16 and has become the default filesystem on Red Hat and some other distributions. It is basically an extension of ext2 to which a journaling capability has been added, and it provides the same high degree of reliability because its underlying ext2 code is exhaustively field-proven. Also featured is the ability for ext2 partitions to be converted to ext3 and vice-versa without any need for backing up the data and repartitioning. If necessary, an ext3 partition can even be mounted by an older kernel that has no ext3 support; this is because it would be seen as just another normal ext2 partition and the journal would be ignored.

ReiserFS, developed by Hans Reiser and others, was actually the first journaling filesystem added to the Linux kernel. As was the case with ext2, it was designed from the ground up for use in Linux. However, unlike ext3, it was also designed from the ground up as a journaling filesystem rather than as an add-on to an existing filesystem, and thus it is widely considered to be the most advanced of the native Linux journaling filesystems. Features include high speed, excellent stability and the ability to pack small files into less disk space than is possible with many other filesystems.

A new version of ReiserFS, designated Reiser4, is a complete rewrite from version 3 and is said to result in major improvements in performance, including higher speeds, the ability to accommodate more CPUs, built-in encryption and ease of customization.

JFS was originally developed by IBM in the mid-1990s for its AIX Unix operating system, and it was later ported to the company's OS/2 operating system. IBM subsequently changed the licensing of the OS/2 implementation to open source, which led to its support on Linux. JFS is currently used primarily on IBM enterprise servers, and it is also a good choice for systems that multiboot Linux and OS/2.

XFS was developed in the mid-1990s by Silicon Graphics (SGI) for its 64-bit IRIX Unix servers. These servers were designed with advanced graphics processing in mind, and they feature the ability to accommodate huge file sizes. The company likewise converted XFS to open source, after which it was also adopted by Linux. Because it is a 64-bit filesystem, XFS features size limitations in the millions of terabytes (in contrast to the still generous 4TB limit of ext2).

Most Linux distributions that ship with 2.4.x and later kernels support ext2, ext3 and ReiserFS. Support for JFS has been added to the 2.4.20 and 2.5.6 kernels, and XFS was added to the 2.5.36 kernel. JFS and XFS support can be added to earlier kernels by downloading the appropriate patches from the respective websites and compiling as a module or into the kernel. Partitions can then be converted by backing up the data, creating the new filesystem and then restoring the data.[/QUOTE]

      
--------------------next---------------------
[QUOTE]Linux File System Benchmarks

Because no single file system is the best in all situations, determining which file system is the best for your application is not always easy.  However, as you will see for yourself, picking the right file system can offer performance gains in excess of 95%.

As a starting point, here are a few questions to ask yourself:
1. Do you want a journaling file system or not? (i.e., can you afford long fsck times?)
2. Is your application CPU limited, or I/O limited?

If your application is CPU limited, obviously you want to choose a file system that uses as little CPU as possible. On the other hand, if your application is I/O limited, you want to choose a file system that runs all the tests in the shortest amount of time, regardless of CPU usage. If you're not sure, or want the best bang for your buck, your decision is not so clear cut. Also keep in mind that some file systems have obvious strengths or weaknesses, offering much better performance in very specific applications (e.g., when working with many small files, or with many large files).[/QUOTE]

      
--------------------next---------------------
[QUOTE]
The structure of the Linux file system

The Linux file system treats everything as a file. This includes images, text files, programs, directories, partitions and hardware device drivers.

Each filesystem contains a control block (known in ext2 as the superblock), which holds information about that filesystem. The other blocks in the filesystem are inodes, which contain information about individual files, and data blocks, which contain the information stored in the individual files.

There is a substantial difference between the way the user sees the Linux filesystem and the way the kernel (the core of a Linux system) actually stores the files. To the user, the filesystem appears as a hierarchical arrangement of directories that contain files and other directories (i.e., subdirectories). Directories and files are identified by their names. This hierarchy starts from a single directory called root, which is represented by a "/" (forward slash).

(The meanings of root and "/" are often confusing to new users of Linux. This is because each has two distinct usages. The second meaning of root is a user who has administrative privileges on the computer, in contrast to ordinary users, who have only limited privileges in order to protect system security. The second use of "/" is as a separator between directories, or between a directory and a file, similar to the backslash used in MS-DOS.)

The Filesystem Hierarchy Standard (FHS) defines the main directories and their contents in Linux and other Unix-like operating systems. All files and directories appear under the root directory, even if they are stored on different physical devices (e.g., on different disks or on different computers). A few of the directories defined by the FHS are /bin (command binaries for all users), /boot (boot loader files such as the kernel), /home (users' home directories), /mnt (for mounting a CDROM or floppy disk), /root (home directory for the root user), /sbin (executables used only by the root user) and /usr (where most application programs get installed).

To the Linux kernel, however, the filesystem is flat. That is, it does not:

    * have a hierarchical structure
    * differentiate between directories, files or programs
    * identify files by names. Instead, the kernel uses inodes to represent each file.

An inode is actually an entry in a list of inodes referred to as the inode list. Each inode contains information about a file including

    * its inode number (a unique identification number)
    * the owner and group associated with the file
    * the file type (for example, whether it is a regular file or a directory)
    * the file's permission list
    * the times of the file's last access, last modification and last inode change (the last is sometimes mistaken for a creation time)
    * the size of the file
    * the disk address (i.e., the location on the disk where the file is physically stored).
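These fields correspond closely to what the kernel's stat interface reports for a file. A quick way to inspect them from Python (example.txt is just a throwaway file created for the demonstration):

```python
import os
import stat

# Create a throwaway file so the example is self-contained.
with open("example.txt", "w") as f:
    f.write("hello\n")

st = os.stat("example.txt")

print("inode number:", st.st_ino)
print("owner uid/gid:", st.st_uid, st.st_gid)
print("regular file?", stat.S_ISREG(st.st_mode))
print("permissions:", oct(stat.S_IMODE(st.st_mode)))
print("atime/mtime/ctime:", st.st_atime, st.st_mtime, st.st_ctime)
print("size in bytes:", st.st_size)
```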

The inode numbers for the contents of a directory can be seen by using the -i option with the familiar ls (i.e., list) command in a terminal window:

ls -i


The df command is used to show information about each of the filesystems that are currently mounted on (i.e., connected to) a system, including their allocated maximum size, the amount of disk space they are using, the percentage of their disk space in use and where they are mounted (i.e., the mountpoint). (Here filesystem refers to a mountable part of the overall directory hierarchy rather than to a filesystem type.)

df can be used by itself, but it is often more convenient to add the -m option to show sizes in megabytes rather than in the default kilobytes:

df -m

A column showing the type of each of these filesystems can be added to the filesystem table produced by the above command by using the --print-type option (or its single-letter form -T), i.e.:

df -m --print-type
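The figures that df prints come from the statvfs system call, which can also be queried directly. A minimal Python sketch for a single mountpoint, assuming a Unix-like system (the path "/" is just an example):

```python
import os

# Query the filesystem holding the root directory; any path on a
# mounted filesystem would do.
st = os.statvfs("/")

block_size = st.f_frsize
total_mb = st.f_blocks * block_size // 2**20
free_mb = st.f_bavail * block_size // 2**20
used_pct = 100 * (st.f_blocks - st.f_bfree) // st.f_blocks
print(f"total: {total_mb} MB, available: {free_mb} MB, {used_pct}% used")
```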

[/QUOTE]

      
--------------------next---------------------
[QUOTE]
Linux File Systems : Native

Every native Linux filesystem implements a basic set of common concepts that were derived from those originally developed for Unix. (Native means that the filesystems were either developed originally for Linux or were first developed for other operating systems and then rewritten so that they would have functions and performance on Linux comparable or superior to those of filesystems originally developed for Linux.)

Several Linux native filesystems are currently in widespread use, including ext2, ext3, ReiserFS, JFS and XFS. Additional native filesystems are in various stages of development.

These filesystems differ from the DOS/Windows filesystems in a number of ways, including

    * allowing important system folders to span multiple partitions and multiple hard drives
    * adding additional information about files, including ownership and permissions
    * establishing a number of standard folders for holding important components of the operating system.

Linux's first filesystem was minix, which was borrowed from the Minix operating system. This filesystem was adopted because it was an efficient and relatively bug-free piece of existing software that postponed the need to design a new filesystem from scratch.

However, minix was not well suited for use on Linux hard disks for several reasons, including its maximum partition size of only 64MB, its short filenames and its single timestamp. But minix can be useful for floppy disks and RAM disks because its low overhead can sometimes allow more files to be stored than is possible with other Linux filesystems.

The Extended File System, ext, was introduced in April, 1992. With a maximum partition size of 2GB and a maximum file name size of 255 characters, it removed the two biggest minix limitations. However, there still was no support for the separate access, inode modification and data modification timestamps. Also, its use of linked lists to keep track of free blocks and inodes caused the lists to become unsorted and the filesystem to become fragmented.

The Second Extended File System (ext2) was released in January 1993. It was a rewrite of ext which features

    * improved algorithms that greatly improved its speed
    * additional date stamps (such as date of last access, date of last inode modification and date of last data modification)
    * the ability to track the state of the filesystem. Ext2 maintains a special field in the superblock that indicates the status of the filesystem as either clean or dirty; a dirty filesystem will trigger a utility to scan the filesystem for errors.

Ext2 also features support for a maximum file size of 4TB (1 terabyte is 1024 gigabytes). Consequently, it has completely superseded ext, support for which has been removed from the Linux kernel.

Ext2 is the most portable of the native Linux filesystems because drivers and other tools exist that allow accessing ext2 data from a number of other operating systems. However, as useful as these tools are, most of them have limitations, such as being access utilities rather than true drivers, not working with the most recent versions of ext2, not being able to write to ext2 or posing a risk of causing filesystem corruption when writing to ext2.
[/QUOTE]

      
--------------------next---------------------
[QUOTE]Linux: Supported non-Linux Filesystems

Unlike most other operating systems, Linux supports a large number of foreign filesystems in addition to its native filesystems. This is possible because of the virtual file system layer, which was incorporated into Linux from its infancy and makes it easy to mount other filesystems. In addition to reading, foreign filesystem support also often includes writing, copying, erasing and other operations.

Among the most commonly used PC filesystems is FAT (File Allocation Table). This is the primary filesystem for MS-DOS and Microsoft Windows 95, 98 and ME, and it is also supported by Windows NT, 2000 and XP and most other operating systems. The first variant, FAT16, was Microsoft's standard filesystem until Windows 95, and the subsequent FAT32 is the standard for Windows 98 and Windows ME. Linux supports both reading from and writing to FAT16 and FAT32, and their main use on Linux is to share files with Microsoft Windows on dual-boot systems and through floppies.

FAT filesystems cannot store information about files such as ownership and permissions. Also, FAT16 partitions are limited to a maximum of 2GB. Although the theoretical maximum size for FAT32 partitions is 8TB, Windows 98's scandisk (disk checking utility) only supports 128GB, and Windows 2000 does not permit the creation of FAT32 disks larger than 32GB.
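The 2GB FAT16 ceiling follows from the on-disk format: cluster numbers are 16 bits wide, and the largest standard cluster size is 32KB. Checking the arithmetic (a handful of cluster values are reserved, so the real limit is marginally smaller):

```python
max_clusters = 2**16        # 16-bit cluster numbers
cluster_size = 32 * 1024    # 32 KB, the largest standard cluster size

limit = max_clusters * cluster_size
print(limit, "bytes =", limit // 2**30, "GB")  # 2147483648 bytes = 2 GB
```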

NTFS is Microsoft's replacement for FAT. A descendant of HPFS (the native filesystem for IBM's OS/2 operating system), NTFS's purpose was to remove the limitations of the FAT filesystem (such as poor stability) while adding new features not found in HPFS. Of the Windows operating systems, it can only be accessed by NT, 2000 and XP. Under Linux, NTFS is currently supported only in read-only mode and only on some distributions.

HFS (Hierarchical File System) is the native filesystem used on most Macintosh computers, and it is sometimes said to be "the Macintosh equivalent of FAT." However, Linux's support for HFS is not as complete as that for many other filesystems. As most Macintoshes include FAT support, it thus might be preferable in some situations to use this filesystem instead of HFS when exchanging data with Macintosh computers.

ISO 9660, standardized in 1988 and based on the High Sierra format drawn up by an industry committee, is the standard filesystem for CDROMs. Almost all computers with CDROM drives can read files written in ISO 9660 regardless of their operating system. [/QUOTE]

      
--------------------next---------------------
[QUOTE]
journaled file system
Last modified: Monday, January 29, 2001

A file system in which the hard disk maintains data integrity in the event of a system crash or if the system is otherwise halted abnormally. The journaled file system maintains a log, or journal, of what activity has taken place in the main data areas of the disk; if a crash occurs, the filesystem can be repaired because updates to the metadata in directories and bit maps have been written to a serial log. The journaled file system returns the data to its pre-crash configuration by replaying any logged updates that had not yet reached their final location on disk.[/QUOTE]





[QUOTE]Journaling file system
From Wikipedia, the free encyclopedia.

A journaling file system is a file system that logs changes to a journal (usually a circular log in a specially-allocated area) before actually writing them to the main file system.

Rationale

File systems tend to be very large data structures; updating them to reflect changes to files and directories usually requires many separate write operations. This makes it possible for an interruption (such as a power failure or system crash) between writes to leave the data structures in an invalid intermediate state.

For example, deleting a file on a Unix file system involves two steps:

   1. removing its directory entry
   2. marking the file's inode as free space in the free space map

If step 1 occurs just before a crash, there will be an orphaned inode and hence a storage leak. If, on the other hand, the steps are performed in the reverse order and a crash occurs between them, the not-yet-deleted inode will be marked free and possibly be overwritten by something else.
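A toy simulation of the hazard, modeling the directory as a name-to-inode map and the free space map as a table of flags (a deliberate simplification of the real on-disk structures):

```python
directory = {"notes.txt": 7}   # name -> inode number
free_map = {7: False}          # inode -> "is free?" flag

def unlink(name, crash_after_step=None):
    inode = directory.pop(name)   # step 1: remove the directory entry
    if crash_after_step == 1:
        return                    # power fails here
    free_map[inode] = True        # step 2: mark the inode as free

unlink("notes.txt", crash_after_step=1)

# After "rebooting": inode 7 is still marked in use, but no directory
# entry references it -- an orphaned inode, i.e. a storage leak.
orphaned = [i for i, free in free_map.items()
            if not free and i not in directory.values()]
print(orphaned)  # [7]
```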

One way to recover is to do a complete walk of the file system's structures when it is next mounted to detect and correct any inconsistencies. This can be very slow for large file systems, and is likely to become slower yet, given that the ratio of storage capacity to I/O bandwidth on modern computer systems is rising.

Another approach is for the file system to keep a journal of the changes it intends to make, ahead of time. Recovery then simply involves replaying changes from the journal until the file system is consistent again. In this sense, the changes are said to be atomic (or indivisible) in that they will either:

    * have succeeded originally
    * be replayed completely during recovery
    * not be replayed at all

Log-structured file systems are those in which the journal itself is the entire filesystem. As of 2005, none of the most popular general-purpose filesystems is log-structured, although log-structured file system concepts influenced the development of WAFL and Reiser4.

Databases use more rigorous versions of the same journaling techniques to ensure data integrity.

Metadata-only journaling

Journaling can have a severe impact on performance because it requires that all data be written twice. Metadata-only journaling is a compromise between reliability and performance that stores only changes to file metadata (which is usually relatively small and hence less of a drain on performance) in the journal. This still ensures that the file system can recover quickly on the next mount, but it cannot guarantee complete consistency, because unjournaled file data already committed to disk may be out of sync with the metadata.

For example, appending to a file on a Unix file system typically requires three steps:

   1. increasing the size of the file in its inode
   2. allocating space for the extension in the free space map
   3. actually writing the appended data to the newly-allocated space

In a metadata-only journal, it is not clear whether step 3 has happened, because it was not logged.
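A toy illustration of that gap: the size update is journaled and replayed, but the appended bytes themselves never reached the disk. This is a deliberate simplification -- real filesystems journal block-level metadata, and the stale region would contain arbitrary old bytes rather than the zeros used here as a stand-in:

```python
inode = {"size": 5}           # on-disk metadata
data = b"hello"               # on-disk file contents before the append

journal = [("size", 10)]      # step 1 was logged before the crash;
                              # steps 2-3 (writing b"world") were lost

for field, value in journal:  # replay the journal on the next mount
    inode[field] = value

# The filesystem is structurally consistent, but the tail of the file is
# whatever stale bytes occupy the allocated blocks (zeros as a stand-in).
file_view = data + b"\x00" * (inode["size"] - len(data))
print(inode["size"], file_view)
```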

To avoid this out-of-order write hazard, file data must be committed to storage before its associated metadata is written to the journal. This is generally tricky to implement because it requires coordination within the operating system kernel between the file system and the memory cache, which traditionally uses elevator sorting (or some similar discipline) to maximize write throughput.

Soft updates are a variation of this approach: they dispense with a journal but impose an order on all writes so that the file system either never becomes inconsistent in the first place or the only inconsistency that ever occurs is a storage leak. Recovery then simply becomes a matter of running a background walk of the file system to garbage-collect orphaned metadata.

See also

    * Be File System
    * WAFL file system
    * Comparison of file systems
[/QUOTE]

      
--------------------next---------------------
