Flash-based file systems and technologies - wangxingchao2010 - ChinaUnix blog

Raw flash vs. FTL devices

FTL stands for "Flash Translation Layer": software which emulates a block device on top of flash hardware. In the early days, the FTL ran on the host computer. For example, old PCMCIA flash devices were essentially raw flash devices, and the PCMCIA standard defined the media storage format for them, so the host computer had to run a software driver which implemented the PCMCIA FTL. Nowadays, however, the FTL is usually firmware, run by a controller built into the storage device. For example, if you look inside a USB flash drive, you'll find a NAND chip (or several of them) and a micro-controller which runs the FTL firmware. Some USB flash drives are known to have quite powerful ARM processors inside. Similarly, MMC, eMMC, SD, SSD, and other FTL devices have a built-in controller which runs FTL firmware.

All FTL devices have an interface which provides block I/O access. The interfaces differ and are defined by different specifications, e.g., MMC, eMMC, SD, USB mass storage, ATA, and so on, but all of them provide block-based access to the device. By block-based access we mean that the whole device is represented as a linear array of (usually 512-byte) blocks, each of which may be read or written.

Linux has an abstraction of a block device. Hard drives, for example, are block devices. Linux has many file systems and a block I/O subsystem (which includes elevators and so on) that were created to work with block devices, historically hard drives. So the idea is that the same software may be used with FTL devices. For example, you may use the FAT file system on your MMC card, or the ext3 file system on your SSD.

Although most flash on commodity hardware has an FTL, there are systems which have bare flash and do not use an FTL. Those are mostly various handheld devices and embedded systems. Raw flash devices are very different from block devices: they have a different work model, tighter constraints, and more issues than block devices. With FTL devices these constraints and issues are hidden, but with raw flash the software has to deal with them. Please refer to the table below for more details about the differences between block devices and raw flash.

UBIFS file system has been designed for raw flash. It doesn't work with block devices and it assumes the raw flash device model. In other words, it assumes the device has eraseblocks, which may be written to, read from, or erased. UBIFS takes care of writing all data out-of-place, doing garbage-collection and so on. UBIFS utilizes UBI, which is doing stuff like wear-leveling and bad eraseblock handling. All these things are not normally needed for block devices.
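To make the raw flash device model concrete, here is a minimal toy sketch of a device made of eraseblocks that can be read, written, and erased. It is not how UBI or MTD are implemented; the class name `RawFlash`, the tiny sizes, and the per-block erase counter are all assumptions made up for illustration. The one hardware fact it leans on is that erased flash reads back as all ones and that programming can only clear bits.

```python
class RawFlash:
    """Toy model of a raw flash device (hypothetical; sizes are illustrative)."""

    def __init__(self, num_blocks=4, block_size=8):
        self.block_size = block_size
        # An erased eraseblock reads back as all ones (0xFF bytes).
        self.blocks = [bytearray([0xFF] * block_size) for _ in range(num_blocks)]
        self.erase_counts = [0] * num_blocks

    def read(self, block, offset, length):
        return bytes(self.blocks[block][offset:offset + length])

    def write(self, block, offset, data):
        # Programming can only clear bits (1 -> 0); it never sets them.
        for i, byte in enumerate(data):
            self.blocks[block][offset + i] &= byte

    def erase(self, block):
        # Erase works on a whole eraseblock and is what wears it out.
        self.blocks[block] = bytearray([0xFF] * self.block_size)
        self.erase_counts[block] += 1


flash = RawFlash()
flash.write(0, 0, b"\x0f")
flash.write(0, 0, b"\xf0")     # second write without erase: the bits AND together
print(flash.read(0, 0, 1))     # b'\x00'
flash.erase(0)
print(flash.read(0, 0, 1))     # b'\xff'
```

The second write shows why data is written out-of-place: overwriting without an erase does not give you the new value, and the only way to get ones back is to erase the whole eraseblock.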

Very often people ask questions like "why would one need to use raw flash and why not just use eMMC, or something like this?". Well, there is no simple answer, and the following is what the UBIFS developers think. Please take into account the date of this writing (3 May 2009). The answer is given as a list of unstructured items, and the reader should structure it in a way which is appropriate for their system. Because mass storage systems mostly use NAND flash (modern FTL devices also have NAND flash arrays inside), we talk specifically about NAND flash. We would also like to emphasize that we do not give general recommendations; everything depends on system requirements.

Bare NAND chips are cheaper and simpler, which is very important for small systems. However, the industry seems to be pushing FTL devices forward, and the situation is not that simple and obvious anymore. Indeed, an FTL device is more complex than a raw NAND of similar size, because it has an additional controller inside, and so on. But since the industry tends to produce and sell a lot of FTL devices, the price is going down.
If you need flash storage on which you are going to use the FAT file system, then in most cases you should stick with an FTL device (eMMC, MMC, SD or whatever). Just make sure the FTL device does proper wear-leveling.
The other situation is when you are going to use your FTL device for system storage (e.g. for rootfs) and use a more robust file system like ext3. In this situation you should take into account various system requirements like tolerance to sudden power cuts. The following items are mostly related to system storage situations.
FTL devices are "black boxes". FTL algorithms are normally vendor secrets. But we know that NAND flash has issues like wear-leveling, bad blocks handling, read-disturb and so on. And it is important to get them right, especially in case of MLC NAND flash, which may have very short eraseblock life-time (e.g., only 1000 erase-cycles). But because FTL algorithms are closed, it is difficult to be sure whether a specific FTL device gets everything right or not.
If you start thinking about how FTL could be implemented, you realize that it must do things like garbage collection (sometimes referred to as "reclaim process"). And flash hardware pretty much requires most writes to be out-of-place. But how does FTL behave in case of sudden power-cuts? What if a power-cut happens while it is in the middle of doing garbage collection? Does the FTL device guarantee that the data which was on the flash media before the power cut happens will not disappear or become corrupted?
The power-cut tolerance may be tested, while it is quite difficult to test stuff like wear-leveling or read-disturb handling, because it may require too much time.
We have heard reports that some USB flash drives wear out very quickly, i.e., they start reporting I/O errors after a few weeks of intensive use. This means that their FTL does not do proper wear-leveling. This does not mean that all USB flash drives are bad, just that you should be careful.
We have heard reports that MMC, eMMC, and SD cards corrupt and lose data if power is cut during writing. Even data which was written long before may become corrupted or disappear. This means that they have a bad FTL which does not do things properly. But again, this does not have to be true for all MMCs/eMMCs and SDs, since there are many different vendors. Be careful, though.
In general, if you glance back into history, many FTL devices were mostly used with the FAT file system for storing things like photos and video. The FAT file system is not reliable by design, which suggests that FTL devices may also be not very reliable, simply because historically this was not really required. Indeed, it is not a big deal to lose a couple of photos. However, it is crucial to make sure that system libraries do not get corrupted because of power cuts.
A good FTL, especially if it deals with MLC NAND (which is used in modern mass storage devices), must be a rather complex piece of software. Implementing it in firmware might be a difficult task, and running it might require a powerful controller. Obviously, we may suspect that vendors resort to various kinds of tricks or compromises to keep their devices "good enough" and cheap. For example, it is known that some vendors optimize their FTL devices for FAT, and if you start using ext3 on top of them, you might face unexpected problems, or the device may turn out not to be as good as you would imagine. And with a closed FTL it is often difficult to verify this.
SSD drives are probably very different from eMMC, MMC/SD etc. We have not worked with SSD drives. They are expensive and they probably have powerful CPUs inside, which run complex firmware which is probably getting things right.
FTL devices are becoming more popular and better, although it is not easy to distinguish between good and bad FTL devices (of course vendors would assure you their device is perfect). Generally, there is nothing wrong with using an FTL device as long as you trust it, or have tested it, or it simply fits your system requirements.
In the case of raw flash we know exactly what we are doing. UBI/UBIFS handles all aspects of NAND flash like bad eraseblocks and wear-leveling. It guarantees power-cut tolerance. It is open and available, so you may always validate, test, and fix it. There are no lies about what it can and cannot do. By contrast, with FTL devices you do not have much visibility into what is going on inside, and vendors may lie about how good their FTL device is. If you find a bug in the firmware, vendors do not usually provide a fast and easy way to update it, and so on.
Theoretically, UBIFS may do a better job, because it knows much more about the files than an FTL does. For example, UBIFS knows about deleted files, while an FTL does not, so the FTL may do unneeded work trying to preserve the sectors belonging to deleted files. However, some FTL devices support "discard" requests and may benefit from file system hints about unused sectors. Nevertheless, in general, UBIFS should do a better job on bare NAND than a traditional FS on an FTL device with a similar NAND chip. On the other hand, FTL devices may include multiple NAND chips, highly parallelise things and provide fast I/O. SSDs are probably a good example.
Obviously, the advantage of FTL devices is that you use old and trusted software on top of them. But be careful, sometimes this may not be 100% true. The UBIFS authors once tested a good-brand eMMC with respect to power-cut tolerance, and some severe problems were found. It was also found that ext3 was not really usable with that eMMC either. What happened was that power cuts sometimes left some eMMC sectors unreadable: the read operation returned ECC errors. But for ext3, read errors are fatal; it is not designed to handle them. The fsck.ext3 tool also refused to repair a file system which had unreadable sectors.

So it is indeed difficult to give an answer. Just think about the pros and cons, take into account your system requirements, and decide. Nonetheless, raw flash is used, mostly in the embedded world, and this is why UBIFS has been developed.

UBIFS has already been merged into the kernel. Its official site is really good, and the documentation is written very carefully.

*** The Linux MTD, JFFS HOWTO ***

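Four technical directions:

1. raw flash + MTD: runs yaffs/jffs on top; its main drawback is MTD.

2. (raw flash + FTL): technologies such as MMC/eMMC/SSD. Vendors differ, quality is uneven, they are black boxes, and the flash handling is not as good as what software does.

3. raw flash + MTD + an FTL layer + ext3 (?), implemented in software.

Unfortunately it is a rather difficult task to create a good FTL layer, and nobody has yet managed to implement one for Linux.

4. MTD + UBI + UBIFS: the UBIFS approach. Claimed to be nearly perfect...

The main point: the many differences between flash and disk, and the trade-offs between their strengths and weaknesses, have to be reflected in the file system and the intermediate layers (wear leveling, eraseblocks, ...).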

What are the differences between flash devices and block drives?

The following table describes the differences between block devices and raw flashes. Note, SSD, MMC, eMMC, RS-MMC, SD, mini-SD, micro-SD, USB flash drive, CompactFlash, MemoryStick, MemoryStick Micro, and other FTL devices are block devices, not raw flash devices. Of course, hard drives are also block devices.

Block device	MTD device
Consists of sectors	Consists of eraseblocks
Sectors are small (512, 1024 bytes)	Eraseblocks are larger (typically 128KiB)
Maintains 2 main operations: read sector and write sector	Maintains 3 main operations: read from eraseblock, write to eraseblock, and erase eraseblock
Bad sectors are re-mapped and hidden by hardware (at least in modern LBA hard drives); in case of FTL devices it is the responsibility of the FTL to provide this	Bad eraseblocks are not hidden and should be dealt with in software
Sectors are devoid of the wear-out property (in FTL devices it is the responsibility of the FTL to provide this)	Eraseblocks wear out and become bad and unusable after about 10³ (for MLC NAND) - 10⁵ (NOR, SLC NAND) erase cycles

So, as one can see, flash (MTD) devices are somewhat more difficult to work with.

Can I mount ext2 over an MTD device?

Ext2, ext3, XFS, JFS, FAT and other "conventional" file systems work with block devices. They are designed this way. Flash devices are not block devices; they are very different beasts. Please read the related FAQ entries.

Please do not be confused by USB sticks, MMC, SD, CompactFlash and other popular removable devices. Although they are also called "flash", they are not MTD devices. They are out of the MTD subsystem's scope. Please read the related FAQ entry.

In order to use one of conventional file systems over an MTD device, you need a software layer which emulates a block device over the MTD device. These layers are often called Flash Translation Layers (FTLs).

There is an extremely simple FTL layer in the Linux MTD subsystem: mtdblock. It emulates block devices over MTD devices. There is also an mtdblock_ro module which emulates read-only block devices. When you load this module, it creates a block device for each MTD device in the system. The block devices are then accessible via /dev/mtdblockX device nodes.

But in many cases using mtdblock is a very bad idea, because of what it basically does: if you change any sector of your mtdblockX device, it reads the whole corresponding eraseblock into memory, erases the eraseblock, changes the sector in RAM, and writes the whole eraseblock back. This is very straightforward, but if you have a power failure while the eraseblock is being erased, you lose all the block device sectors in it. The flash will also likely wear out soon, because you will wear out a few eraseblocks, most probably the ones which contain the FAT/bitmap/inode table/etc.
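The read/erase/modify/write cycle described here can be sketched as a small simulation. This is not the mtdblock driver code; `NaiveBlockEmulation`, the 4-sectors-per-eraseblock geometry, and the erase counters are assumptions chosen only to make the wear pattern visible.

```python
SECTOR = 512
SECTORS_PER_ERASEBLOCK = 4      # illustrative; real eraseblocks are far larger


class NaiveBlockEmulation:
    """Hypothetical block emulation that rewrites a whole eraseblock per sector write."""

    def __init__(self, num_eraseblocks=4):
        size = SECTOR * SECTORS_PER_ERASEBLOCK
        self.eraseblocks = [bytearray([0xFF] * size) for _ in range(num_eraseblocks)]
        self.erase_counts = [0] * num_eraseblocks

    def write_sector(self, sector, data):
        if len(data) != SECTOR:
            raise ValueError("data must be exactly one sector long")
        eb, index = divmod(sector, SECTORS_PER_ERASEBLOCK)
        # 1. Read the whole eraseblock into RAM.
        ram = bytearray(self.eraseblocks[eb])
        # 2. Erase the eraseblock on flash. A power cut between here and the
        #    end of step 4 loses every sector stored in this eraseblock.
        self.eraseblocks[eb] = bytearray([0xFF] * len(ram))
        self.erase_counts[eb] += 1
        # 3. Change the sector in RAM.
        ram[index * SECTOR:(index + 1) * SECTOR] = data
        # 4. Write the whole eraseblock back.
        self.eraseblocks[eb] = ram

    def read_sector(self, sector):
        eb, index = divmod(sector, SECTORS_PER_ERASEBLOCK)
        return bytes(self.eraseblocks[eb][index * SECTOR:(index + 1) * SECTOR])


dev = NaiveBlockEmulation()
for i in range(1000):
    dev.write_sector(0, bytes([i % 256]) * SECTOR)   # a "hot" metadata sector
print(dev.erase_counts)    # [1000, 0, 0, 0]
```

One hot sector (think FAT or inode table) puts all 1000 erases on a single eraseblock while the others stay untouched, which is exactly the uneven wear the paragraph warns about.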

Unfortunately it is a rather difficult task to create a good FTL layer, and nobody has yet managed to implement one for Linux. But now that we have UBI, it is much easier to do it on top of UBI.

It makes sense to use mtdblock_ro for read-only file systems or read-only mounts. For example, one may use SquashFS, as it compresses data quite well. But think twice before using mtdblock in read-write mode. And don't try to use it on NAND flash, as it does not handle bad eraseblocks.

ext4 has made it into commercial use

I have just noticed that Red Hat added Ext4 support to RHEL-5 in kernel 2.6.18-110.el5. They also added a new package named e4fsprogs (a break from the e2fsprogs name that has been used for so long). Hopefully they will use a single package for utilities for Ext2/3/4 filesystems in RHEL-6 and not continue this package split. Using commands such as e4fsck and tune4fs is a minor inconvenience.

Converting a RHEL 5 or CentOS 5 system to Ext4 merely requires running the command “tune4fs -O flex_bg,uninit_bg /dev/WHATEVER” to enable Ext4 on the devices, editing /etc/fstab to change the filesystem type to ext4, running a command such as “mkinitrd -f /boot/initrd-2.6.18-164.9.1.el5xen.img 2.6.18-164.9.1.el5xen” to generate a new initrd with Ext4 support (which must be done after editing /etc/fstab), and then rebooting.

When the system is booted it will run fsck on the filesystems automatically – but not display progress reports which is rather disconcerting. The system will display “/ contains a file system with errors, check forced.” and apparently hang for a large amount of time. This is however slightly better than the situation on Debian/Unstable where – which would be unpleasant if you don’t have convenient console access. Hopefully this will be fixed before Squeeze is released.

I now have a couple of my CentOS 5 DomUs running with Ext4, it seems to work well.

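jffs2 is the file system, mtd/ftl is the intermediate layer, and the underlying hardware is raw flash, FTL flash, or a block disk. For these, the optimal combination is fixed, but the pieces can be swapped around...

For example, to run jffs2 on a hard disk, a block2mtd layer has to be added in between.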

There are four layers of software

JFFS2: filesystem driver
MTD: Memory Technology Devices driver
NAND: generic NAND driver
Hardware specific driver

The MTD driver just provides a mount point for JFFS2. The generic NAND driver provides all functions which are necessary to identify, read, write and erase NAND flash. The hardware-dependent functions are provided by the hardware driver; they mainly provide the hardware access information and functions for the generic NAND driver. The same applies to YAFFS.

Later it would be worth taking a close look at the implementations of ubi/ubifs/logfs.

http://www.ibm.com/developerworks/linux/library/l-flash-filesystems/

In addition to and as a result of the constraints explored in the previous section, managing flash devices presents several challenges. The three most important are garbage collection, managing bad blocks, and wear leveling.

Garbage collection is the process of reclaiming invalid blocks (those that contain some amount of invalid data). Reclamation involves moving the valid data to a new block, and then erasing the invalid block to make it available. This process is commonly done in the background or as needed, if the file system is low on available space.
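A rough sketch of that reclamation step, under made-up assumptions: blocks are modeled as lists of `(data, is_valid)` pages, the victim is simply the block with the most invalid pages, and `garbage_collect` is a hypothetical name. Real flash file systems choose victims with more elaborate policies.

```python
def garbage_collect(blocks, free_block):
    """Reclaim the block holding the most invalid pages.

    `blocks` maps block id -> list of (data, is_valid) pages;
    `free_block` is the id of an already erased (empty) block.
    Returns the id of the block that was reclaimed.
    """
    candidates = [b for b in blocks if b != free_block]
    victim = max(candidates,
                 key=lambda b: sum(1 for _, valid in blocks[b] if not valid))
    # Move the still-valid data to the new block...
    blocks[free_block] = [(data, True) for data, valid in blocks[victim] if valid]
    # ...then erase the victim so it becomes available again.
    blocks[victim] = []
    return victim


blocks = {
    0: [("a", True), ("b", False), ("c", False)],
    1: [("d", True), ("e", True), ("f", False)],
    2: [],                                   # erased, free block
}
reclaimed = garbage_collect(blocks, free_block=2)
print(reclaimed)      # 0
print(blocks[2])      # [('a', True)]
```

Block 0 is picked because two of its three pages are invalid, so reclaiming it frees the most space for the least copying.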

Over time, flash devices can develop bad blocks through use and can even ship from the manufacturer with blocks that are bad and cannot be used. You can detect the presence of bad blocks from a failed flash operation (such as an Erase) or an invalid Write operation (discovered through an invalid Error Correction Code, or ECC).

After bad blocks have been identified, they are marked within the flash itself in a bad block table. How this is done is device-dependent but can be implemented with a separate set of reserved blocks managed separately from normal data blocks. The process of handling bad blocks—whether they ship with the device or appear over time—is called bad block management. In some cases, this functionality is implemented in hardware by an internal microcontroller and is therefore transparent to the upper-level file system.

Recall that flash devices are consumable parts: You can perform a finite number of Erase cycles on each block before the block becomes bad (and must therefore be tagged by bad block management). To maximize the life of the flash, wear-leveling algorithms are provided. Wear leveling comes in two varieties: dynamic wear leveling and static wear leveling.

Dynamic wear leveling addresses the problem of a limited number of Erase cycles for a given block. Rather than randomly using blocks as they are available, dynamic wear-leveling algorithms attempt to evenly distribute the use of blocks so that each gets uniform use. Static wear-leveling algorithms address an even more interesting problem. In addition to a maximum number of Erase cycles, certain flash devices suffer from a maximum number of Read cycles between Erase cycles. This means that if data sits for too long in a block and is read too many times, the data can dissipate and result in data loss. Static wear-leveling algorithms address this by periodically moving stale data to new blocks.
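The dynamic variant can be illustrated with a very small sketch: whenever a block is needed, take the free block with the lowest erase count. The function name `pick_block` and the dictionary of counters are assumptions for illustration only, and the loop pretends every block is freed again right after use.

```python
def pick_block(free_blocks, erase_counts):
    """Return the free block that has been erased the fewest times."""
    return min(free_blocks, key=lambda b: erase_counts[b])


erase_counts = {0: 0, 1: 0, 2: 0, 3: 0}
for _ in range(100):
    block = pick_block(list(erase_counts), erase_counts)
    erase_counts[block] += 1      # block is used, later erased, and freed again
print(erase_counts)               # {0: 25, 1: 25, 2: 25, 3: 25}
```

Compared with always reusing the same block, the 100 erases end up spread evenly, 25 per block. Static wear leveling would additionally move long-lived data out of rarely erased blocks, which this sketch does not attempt.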

Next is the Flash Translation Layer (FTL), which provides for overall management of the flash device, including allocation of blocks from the underlying flash device as well as address translation, dynamic wear leveling, and garbage collection. In some flash devices, a portion of the FTL can be implemented in hardware.

Like most of open source, software continues to evolve, and new flash file systems are under development. An interesting alternative still in development is LogFS, which includes some very novel ideas. For example, LogFS maintains a tree structure on the flash device itself so that the mount times are similar to traditional file systems, such as ext2. It also uses a wandering tree for garbage collection (a form of B+tree). What makes LogFS particularly interesting, however, is that it is very scalable and can support large flash parts.

With the growing popularity of flash file systems, you'll see a considerable amount of research being applied toward them. LogFS is one example, but other options, such as UbiFS, are also growing. Flash file systems are interesting architecturally and will continue to be a source of innovation in the future.

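The progression goes like this:

jffs1: a circular log. Writes go to the front first; during garbage collection, used blocks are reclaimed and put back at the tail of the queue. A simple idea that achieves wear leveling.

jffs2: works in units of blocks, kept on three lists (free/clean/dirty), with separate allocation and reclaim modules. Garbage collection picks blocks from the dirty list (99% of the time) and returns them to the free list.

yaffs2: keeps the tree structure in RAM, for faster mount times.

ubifs & logfs: the newest technologies, still under development.

"Anatomy" is a great word: http://www.ibm.com/developerworks/views/linux/libraryview.jsp?end_no=100&lcl_sort_order=desc&type_by=Articles&sort_order=desc&show_all=false&sort_by=Relevance&search_by=anatomy+of&topic_by=All+topics+and+related+products&search_flag=true&show_abstract=true&S_TACT=105AGX01&S_CMP=LP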

GUESS:

work/life/?

?:newer

Out-of-place / in-place update:

Two ways of updating data on flash:

Out-of-place update: 1. read the containing eraseblock into RAM buffer A. 2. update some areas in A. 3. write A to a NEW eraseblock.

In-place update: 1. read the eraseblock into A. 2. update A. 3. erase that block on flash. 4. write A back to flash.
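The out-of-place steps can be sketched as follows. This is only a toy: `flash` is a plain list of eraseblocks, `None` stands for an erased block, `mapping` is a hypothetical logical-to-physical table, and erasing the old copy is done immediately instead of being deferred as a real system would.

```python
ERASED = None   # marker for an erased eraseblock in this toy model


def update_out_of_place(flash, mapping, logical, offset, new_bytes):
    """Update `logical` by writing a modified copy to a fresh eraseblock."""
    old_phys = mapping[logical]
    ram = bytearray(flash[old_phys])                   # 1. read into RAM buffer A
    ram[offset:offset + len(new_bytes)] = new_bytes    # 2. update some areas in A
    new_phys = flash.index(ERASED)                     # find an erased eraseblock
    flash[new_phys] = bytes(ram)                       # 3. write A to the NEW eraseblock
    mapping[logical] = new_phys                        # remap only after the write is done
    flash[old_phys] = ERASED                           # the old copy can now be erased
    return new_phys


flash = [b"hello world", ERASED, ERASED]
mapping = {0: 0}
update_out_of_place(flash, mapping, 0, 0, b"HELLO")
print(mapping, flash)    # {0: 1} [None, b'HELLO world', None]
```

The point of the ordering is that the old eraseblock stays intact until the new copy is fully written, so an interruption in the middle leaves the old data readable. The in-place variant erases first and has no such fallback.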

NOR and NAND flash differ in two important ways:
the connections of the individual memory cells are different
the interface provided for reading and writing the memory is different (NOR allows random-access for reading, NAND allows only page access)

The idea behind flash file systems:

The basic concept behind flash file systems is: When the flash store is to be updated, the file system will write a new copy of the changed data over to a fresh block, remap the file pointers, then erase the old block later when it has time.

In practice, flash file systems are only used for "Memory Technology Devices" ("MTD"), which are embedded flash memories that do not have a controller. Removable flash memory cards and USB flash drives have built-in controllers to perform wear-levelling and error correction so use of a specific flash file system does not add any benefit. These removable flash memory devices use the FAT file system to allow universal compatibility with computers, cameras, PDAs and other portable devices with memory card slots or ports.

An important optimization concerns overwritten bits: changing 1110 to 1100 does not require updating the whole block.
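A small check makes the rule behind this note explicit. It assumes the usual flash behavior that programming can only clear bits (1 to 0) and that only an erase sets them back to 1; the function name `can_program_without_erase` is made up for this sketch.

```python
def can_program_without_erase(old, new):
    """True if `new` can be reached from `old` by only clearing bits (1 -> 0)."""
    return (old & new) == new


print(can_program_without_erase(0b1110, 0b1100))   # True
print(can_program_without_erase(0b1100, 0b1110))   # False
```

Going from 1110 to 1100 only clears one bit, so it can be written directly. Going back from 1100 to 1110 would need a bit set to 1 again, which means erasing the eraseblock first.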
