Category: LINUX

2010-02-23 11:42:50

Summary

Solid State FLASH devices typically align their storage blocks on
4KByte boundaries, not the 512Byte boundaries of conventional disks.
To maximize performance, partitions need to be aligned on
4KByte boundaries. On SPARC Solaris systems there are two types of
label available, SMI (Sun Microsystems) and EFI (Extensible Firmware
Interface). Used with its defaults, the format tool ensures that any
partition set up using cylinder values will be correctly aligned.

With the default SMI label, the SPARC Solaris format tool, in common with
other format and partition tools, uses a notation of virtual sectors,
tracks and cylinders to describe a disk's geometry. The same notation is
used for Solid State FLASH devices as for conventional disk drives.

If EFI labels are used, the partition setup uses sector values. This
makes setting and verifying the beginning of a partition easy: the sector
size multiplied by the beginning sector needs to be a multiple of
4096 bytes. The EFI DEFAULT is NOT 4K aligned.

Contents

1. Problem
2. Format
2.1 SMI vs EFI
3. Example
__________
1. Problem

Solid State Disk Drives (SSDs) use NAND FLASH memory for storage. The storage array
is aligned on block boundaries that differ from those of conventional disks, which
use 512Byte sectors. Contemporary SSDs commonly use 4KByte alignment. Depending
on the drive firmware and cache storage on the drive, performance may be adversely
affected by excessive read/modify/write operations when transfers are not
aligned.

Partitioning tools still use the concept of cylinders, tracks and sectors.
This carries over to SSDs as well, except that the cylinders, tracks and sectors
are now virtual. Some tools maximise the number of sectors and tracks; for example,
in Linux a virtual cylinder is described as 63 sectors and 255 tracks. A 24GB SSD
will be presented as 2987 cylinders, 255 tracks and 63 sectors and a 32GB SSD
as 3890 cylinders, 255 tracks and 63 sectors. A partition set up on
an arbitrary number of cylinders has a high probability of not being 4K aligned,
with a subsequent drop in performance.

Bytes per cylinder = 512 bytes/sector * 63 sectors * 255 tracks = 8225280

which is 2008.125 4K blocks. This is clearly NOT 4K aligned.

Unlike Linux and other systems that use fdisk for partitioning, Solaris uses
format, which assigns different values to sectors per track and tracks
per cylinder. format assigns 128 sectors and 16 heads:

Bytes per cylinder = 512 bytes/sector * 128 sectors * 16 tracks = 1048576

which is 256 4K blocks. This means that a partition created on any
arbitrary cylinder boundary will be 4K block aligned.
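
The two bytes-per-cylinder calculations above can be reproduced with shell integer arithmetic. This is only a minimal sketch; the function names are illustrative and are not part of format or fdisk.

```shell
# Bytes in one virtual cylinder, assuming 512-byte sectors.
# Arguments: $1 = sectors per track, $2 = tracks (heads) per cylinder.
bytes_per_cylinder() {
    echo $(( 512 * $1 * $2 ))
}

# Bytes left over after dividing one cylinder into whole 4K blocks.
# A result of 0 means every cylinder boundary is 4K aligned.
cylinder_remainder_4k() {
    echo $(( $(bytes_per_cylinder "$1" "$2") % 4096 ))
}

echo "63 sectors x 255 tracks: remainder $(cylinder_remainder_4k 63 255)"
echo "128 sectors x 16 tracks: remainder $(cylinder_remainder_4k 128 16)"
```

A non-zero remainder (512 bytes for the 63 x 255 geometry) is what produces the fractional 4K block count.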

_________
2. Format

The SPARC Solaris tool for creating disk labels and partitioning disk
drives is format. If a drive is not labeled, the format command
prompts the user with a "Label it now?" message.

The format utility includes a verify command which displays the assignment
of sectors and heads. This allows the bytes per cylinder to be checked and
also shows the start in cylinders permitting verification of alignment.

2.1 SMI vs EFI

The default label created by the format tool is an SMI (Sun Microsystems)
label. This uses a (virtual) cylinder, sector, track notation. An
alternative label is the EFI (Extensible Firmware Interface) label. This
label uses a simpler sector value to define the beginning of a partition.
The sector size is shown by the tool. To set or verify that
a partition is on a 4K boundary, multiply the sector size
by the partition's starting sector; the result must be an integer multiple
of 4096 for the partition to be 4K aligned.
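
The rule above can be sketched as a small shell test. The function name is hypothetical, and the sample values 34 and 40 are taken from the EFI example in section 3.2.

```shell
# Succeeds (exit status 0) when first_sector * sector_size is a multiple of 4096.
# Arguments: $1 = first sector of the partition, $2 = bytes per sector.
is_4k_aligned() {
    [ $(( $1 * $2 % 4096 )) -eq 0 ]
}

for first in 34 40; do
    if is_4k_aligned "$first" 512; then
        echo "first sector $first: 4K aligned"
    else
        echo "first sector $first: NOT 4K aligned"
    fi
done
```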

Please note that there are restrictions on EFI labels; search the
Sun website for more information on EFI.

__________
3. Example

3.1 SMI Label

This example shows the output of the format verify command, demonstrating
the values of virtual sectors and tracks.

 Format> verify

 Primary label contents:

 Volume name = <        >
 ascii name  = 
 pcyl        = 23437
 ncyl        = 23435
 acyl        =    2
 nhead       =   16
 nsect       =  128
 Part      Tag    Flag     Cylinders         Size            Blocks
   0       root    wm       0                0         (0/0/0)            0
   1       swap    wu       0                0         (0/0/0)            0
   2     backup    wu       0 - 23434       22.89GB    (23435/0/0) 47994880
   3 unassigned    wm       0                0         (0/0/0)            0
   4 unassigned    wm       0                0         (0/0/0)            0
   5 unassigned    wm       0                0         (0/0/0)            0
   6        usr    wm       0 - 23434       22.89GB    (23435/0/0) 47994880
   7 unassigned    wm       0                0         (0/0/0)            0
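
As a cross-check on the table above, the Blocks column can be recomputed from the cylinder count and the nhead and nsect values. This is a sketch only; the function name is illustrative.

```shell
# Number of 512-byte blocks covered by a count of virtual cylinders.
# Arguments: $1 = cylinders, $2 = nhead, $3 = nsect.
cylinders_to_blocks() {
    echo $(( $1 * $2 * $3 ))
}

cylinders_to_blocks 23435 16 128    # prints 47994880, as shown for slices 2 and 6
```

Because nhead * nsect is 2048 here, a multiple of the 8 sectors in a 4K block, every cylinder boundary in this label falls on a 4K boundary.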

3.2 EFI Label

This example shows the creation of an EFI label. Note the use of the
-e switch on format to enable creation of an EFI label. Embedded comments
are enclosed in <... ...>

# format -e
Searching for disks...done


AVAILABLE DISK SELECTIONS:
      0. c0t0d0 
         /pci@400/pci@0/pci@1/scsi@0/sd@0,0
      1. c0t1d0 
         /pci@400/pci@0/pci@1/scsi@0/sd@1,0
      2. c0t2d0 
         /pci@400/pci@0/pci@1/scsi@0/sd@2,0
      3. c0t3d0 
         /pci@400/pci@0/pci@1/scsi@0/sd@3,0
Specify disk (enter its number): 1
selecting c0t1d0
[disk formatted]

<... command list removed for brevity ...>
<... this is a normal SMI label ...>

format> verify

Primary label contents:

Volume name = <        >
ascii name  = 
pcyl        = 14089
ncyl        = 14087
acyl        =    2
nhead       =   24
nsect       =  424
Part      Tag    Flag     Cylinders         Size            Blocks
 0 unassigned    wm       0                0         (0/0/0)             0
 1 unassigned    wm       0                0         (0/0/0)             0
 2     backup    wm       0 - 14086       68.35GB    (14087/0/0) 143349312
 3 unassigned    wm       0                0         (0/0/0)             0
 4 unassigned    wm       0                0         (0/0/0)             0
 5 unassigned    wm       0                0         (0/0/0)             0
 6 unassigned    wm       0                0         (0/0/0)             0
 7       home    wm       0 - 14086       68.35GB    (14087/0/0) 143349312

<... Now I create an EFI label ...>

format> label
[0] SMI Label
[1] EFI Label
Specify Label type[0]: 1
Warning: This disk has an SMI label. Changing to EFI label will erase all
current partitions.
Continue? y
format> verify

Volume name = <        >
ascii name  = 
bytes/sector    =  512
sectors = 143374737
accessible sectors = 143374704
Part      Tag    Flag     First Sector         Size         Last Sector
 0        usr    wm                34       68.36GB          143358320
 1 unassigned    wm                 0           0               0
 2 unassigned    wm                 0           0               0
 3 unassigned    wm                 0           0               0
 4 unassigned    wm                 0           0               0
 5 unassigned    wm                 0           0               0
 6 unassigned    wm                 0           0               0
 7 unassigned    wm                 0           0               0
 8   reserved    wm         143358321        8.00MB          143374704

 Note: In this example the partitions are NOT 4K aligned
       34 * 512 = 17408 bytes, and 17408 / 4096 = 4.25
       The first sector would need to start at 40 to be 4K aligned
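
The value 40 in the note is simply 34 rounded up to the next multiple of 8 sectors. A minimal sketch of that rounding follows; the function name is hypothetical, and it assumes the sector size divides 4096 evenly.

```shell
# Round a proposed first sector up to the next 4K boundary.
# Arguments: $1 = proposed first sector, $2 = bytes per sector.
next_aligned_sector() {
    per_block=$(( 4096 / $2 ))    # sectors per 4K block (8 for 512-byte sectors)
    echo $(( ( $1 + per_block - 1 ) / per_block * per_block ))
}

next_aligned_sector 34 512    # prints 40
```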

Linux Procedure

Summary

There are a number of tools available in Linux to partition
drives. Solid State Disk Drives have block alignments that are
typically on 4KByte boundaries, not the 512Byte boundaries
of conventional disks. Partitions on a Solid State Disk
Drive need to be correctly aligned for maximum performance.

The man and info pages should be consulted for additional information
on fdisk, parted, md and mdadm.

Contents

1. Problem
2. Partitioning Tools
3. Checking Partition Layout
4. File Systems and Managers
5. Examples
5.1 fdisk
5.2 parted
__________
1. Problem

Solid State Disk Drives (SSDs) use NAND FLASH memory for storage. The storage array
is aligned on block boundaries that differ from those of conventional disks, which
use 512Byte sectors. Contemporary SSDs commonly use 4KByte alignment. Depending
on the drive firmware and cache storage on the drive, performance may be adversely
affected by excessive read/modify/write operations when transfers are not
aligned.

Partitioning tools still use the concept of cylinders, tracks and sectors.
This carries over to SSDs as well, except that the cylinders, tracks and sectors
are now virtual. The typical representation for SSDs is:

  • 63 Sectors
  • 255 Tracks
  • Variable cylinders

For example, a 24GB SSD will be presented as 2987 cylinders, 255 tracks and 63
sectors, and a 32GB SSD as 3890 cylinders, 255 tracks and 63 sectors.

Unless special settings are used, the partitioning tools typically use cylinders,
or in the case of SSDs, virtual cylinders, to mark the start of partitions.
Using the cylinder defaults will result in misalignment, and the consequence is
inferior performance on an SSD.
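
To see how rarely a cylinder boundary lands on a 4K boundary with this geometry, the sketch below lists the aligned cylinders in a small range. It assumes 512-byte sectors and the 255-track, 63-sector layout described above; the function name is illustrative.

```shell
# Print the cylinder numbers in the range 0..$1 whose starting sector is a
# multiple of 8, i.e. 4K aligned, for 255 tracks and 63 sectors per track.
aligned_cylinders() {
    cyl=0
    out=""
    while [ "$cyl" -le "$1" ]; do
        if [ $(( cyl * 255 * 63 % 8 )) -eq 0 ]; then
            out="${out:+$out }$cyl"
        fi
        cyl=$(( cyl + 1 ))
    done
    echo "$out"
}

aligned_cylinders 24    # prints: 0 8 16 24
```

Since 255 * 63 = 16065 is odd, only every eighth cylinder starts on a 4K boundary.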

_____________________
2. Partitioning Tools

The Linux Operating System includes tools for partitioning, the most common are:

  • fdisk and derivatives
  • parted

2.1 fdisk

fdisk understands DOS type partitions and BSD or SUN type of disk labels. It is
an older tool and does not understand GUID Partition Tables.

fdisk determines the disk geometry automatically; this is not necessarily the
physical disk geometry, especially in the case of SSDs. fdisk includes consistency
checks to verify that each partition starts and ends on a cylinder boundary (the
exception being the first partition).

fdisk defaults to aligning partitions on cylinder boundaries. fdisk does display
the allocation units making it possible to calculate the alignment based on
the cylinder count.

fdisk can be used to partition drives, but it has limitations. One method on
Linux is to create a Sun label and then manipulate the resulting partition
boundaries to meet alignment needs.

2.2 parted

parted is a disk partitioning and resizing tool. It has support for multiple
file systems and may be used for reorganising disk usage.

Unlike fdisk, parted defaults to specifying the start and size of a partition in
megabytes. This makes parted much more useful for fine control of partition
boundaries.

For SSDs, parted is the recommended tool.

____________________________
3. Checking Partition Layout

Information on the partitions of a drive is available at:

    /sys/block/sd?/sd?n/start
    /sys/block/sd?/sd?n/size

where ? is the drive letter (for example, sda) and n is the partition
number (for example, sda1).

The values displayed are counts of 512-byte blocks.
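
Because the start value is a count of 512-byte blocks, a partition is 4K aligned when that value is a multiple of 8. The loop below is a minimal sketch of such a check over every sd partition present; the function name is hypothetical.

```shell
# Report whether the start offset stored in a sysfs "start" file is 4K aligned.
# Argument: $1 = path to a file holding the start offset in 512-byte blocks.
check_start_file() {
    start=$(cat "$1")
    if [ $(( start % 8 )) -eq 0 ]; then
        echo "aligned"
    else
        echo "NOT aligned"
    fi
}

# Check each partition of each sd disk; nothing is printed if none exist.
for f in /sys/block/sd*/sd*/start; do
    if [ -e "$f" ]; then
        echo "$f: $(check_start_file "$f")"
    fi
done
```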

3.1 fdisk

fdisk is of limited help here because it displays positions in cylinders, and
the start of the first partition is not held to fdisk's cylinder boundary.
fdisk shows the number of blocks in 1K units, so doubling that value
generally agrees with the value given by size.
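
The doubling can be sketched as follows, using the Blocks values from the fdisk example in section 5.1. A trailing "+" in the fdisk Blocks column marks an odd sector count, so one extra 512-byte sector is added in that case. The function name is illustrative.

```shell
# Convert an fdisk "Blocks" value (1K units) to 512-byte sectors.
# A trailing "+" means the partition has an odd number of sectors.
fdisk_blocks_to_sectors() {
    case $1 in
        *+) echo $(( ${1%+} * 2 + 1 )) ;;
        *)  echo $(( $1 * 2 )) ;;
    esac
}

fdisk_blocks_to_sectors 23993077+    # prints 47986155
fdisk_blocks_to_sectors 48195        # prints 96390
```

Both results match the size values read back from /sys/block in that example.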

3.2 parted

parted shows the start, end and size in byte-based units and, as such, is more
useful, but be careful of rounding. You may change the displayed units with
the unit command (B, kB, MB, GB, for example).
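
The rounding caveat can be illustrated with a short sketch. It assumes the compact display rounds to the nearest decimal kilobyte, which is consistent with the 1049kB shown for a start of 2048 sectors in section 5, but is an assumption rather than a statement of how parted is implemented. The function name is hypothetical.

```shell
# Convert a start offset in 512-byte sectors to decimal kB, rounded to nearest.
# Argument: $1 = start sector.
sector_to_kb_rounded() {
    echo $(( ( $1 * 512 + 500 ) / 1000 ))
}

sector_to_kb_rounded 2048    # prints 1049 (a 4K aligned start)
sector_to_kb_rounded 2049    # prints 1049 (not aligned, but displays the same)
```

Selecting exact units inside parted, for example unit s or unit B, avoids this ambiguity.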
____________________________
4. File Systems and Managers

4.1 Base File Systems

The common file systems under Linux are ext2, ext3, and reiserfs. The
file system creation tools allow specification of a block size. For example
to set up an ext3 file system that uses 4K block size use:

    mke2fs -j -b 4096 -i 4096 /dev/sdb

This will create an ext3 filesystem (the -j switch) with a block size
of 4K (-b 4096) and one inode for every 4K of disk space (-i 4096 sets
the bytes-per-inode ratio, not the inode size).

Note: I have achieved consistently high IOPS and I/O rates using vdbench
by creating an ext3 file system with 4K blocks on the raw disk.

4.2 Software RAID - md and mdadm

Software RAID can be constructed using the Linux Multiple Device (md) driver.
The md device supports RAID 1, 4, 5, 6 and 10. Using md adds a small
overhead, but in my testing with vdbench on a RAID 1 volume created from
two identical SSDs, the difference in write performance was not measurable
and there was an increase in read performance.

___________
5. Examples

5.1 fdisk

This example shows the use of fdisk and the data from /sys/block/sda/sda1
to validate the start address. A non-aligned partition 1, sda1, is created
and the vdbench output is provided to show the effects.

Note: The output of fdisk has been cut to only the relevant lines.
Comments are preceded by # and separated from the command text by indent.

  # Display partition layout with parted

   [root@wgs94-192]# parted /dev/sda
   GNU Parted 1.8.8
   Using /dev/sda
   Welcome to GNU Parted! Type 'help' to view a list of commands.
   (parted) p
   Model: ATA MARVELL SD88SA02 (scsi)
   Disk /dev/sda: 24.6GB
   Sector size (logical/physical): 512B/512B
   Partition Table: msdos

   Number  Start   End     Size    Type     File system  Flags
   1      1049kB  24.6GB  24.6GB  primary  ntfs
   (parted) q

  # Display the start and size information

    [root@wgs94-192]# cat /sys/block/sda/sda1/start
    2048
    [root@wgs94-192]# cat /sys/block/sda/sda1/size
    47994880

  # Start fdisk

    [root@wgs94-192]# fdisk /dev/sda

    The number of cylinders for this disk is set to 2987.
    There is nothing wrong with that, but this is larger than 1024,
    and could in certain setups cause problems with:
    1) software that runs at boot time (e.g., old versions of LILO)
    2) booting and partitioning software from other OSs
      (e.g., DOS FDISK, OS/2 FDISK)

    Command (m for help): p

    Disk /dev/sda: 24.5 GB, 24575868928 bytes
    255 heads, 63 sectors/track, 2987 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes
    Disk identifier: 0xb01f8a12

       Device Boot      Start         End      Blocks   Id  System
    /dev/sda1               1        2988    23997440    7  HPFS/NTFS

    Command (m for help): q

  # Note the discrepancy here. fdisk shows the start as cylinder 1,
  # but the partition does not start at 8225280 bytes, as shown above.
  # Let's modify the partition table and see the results.

    [root@wgs94-192]# fdisk /dev/sda



    Command (m for help): s
    Building a new sun disklabel. Changes will remain in memory only,
    until you decide to write them. After that, of course, the previous
    content won't be recoverable.

    Command (m for help): p

    Disk /dev/sda (Sun disk label): 255 heads, 63 sectors, 2987 cylinders
    Units = cylinders of 16065 * 512 bytes

       Device Flag    Start       End    Blocks   Id  System
    /dev/sda1             0      2981  23944882+  83  Linux native
    /dev/sda2  u       2981      2987     48195   82  Linux swap
    /dev/sda3             0      2987  23993077+   5  Whole disk

    Command (m for help): w
    The partition table has been altered!

    Calling ioctl() to re-read partition table.
    Syncing disks.

  # Now I recheck the start and size of the partitions

    [root@wgs94-192]# cat /sys/block/sda/sda1/start
    0
    [root@wgs94-192]# cat /sys/block/sda/sda1/size
    47889765
    [root@wgs94-192]# cat /sys/block/sda/sda2/start
    47889765
    [root@wgs94-192]# cat /sys/block/sda/sda2/size
    96390
    [root@wgs94-192]# cat /sys/block/sda/sda3/start
    0
    [root@wgs94-192]# cat /sys/block/sda/sda3/size
    47986155

  # fdisk displays size in 1K blocks, not 512-byte sectors.
  # 23993077 * 2 = 47986154, but notice the + sign: it marks an odd
  # sector count, which accounts for the extra sector (47986155).

  # Now modify the partition layout to start on cylinder 8

    [root@wgs94-192]# fdisk /dev/sda

    Command (m for help): p

    Disk /dev/sda (Sun disk label): 255 heads, 63 sectors, 2987 cylinders
    Units = cylinders of 16065 * 512 bytes

       Device Flag    Start       End    Blocks   Id  System
    /dev/sda1             0      2981  23944882+  83  Linux native
    /dev/sda2  u       2981      2987     48195   82  Linux swap
    /dev/sda3             0      2987  23993077+   5  Whole disk

    Command (m for help): d 2
    Partition number (1-8): 2

    Command (m for help): d 1
    Partition number (1-8): 1

    Command (m for help): n
    Partition number (1-8): 1
    First cylinder (0-2987): 8
    Last cylinder or +size or +sizeM or +sizeK (8-2987, default 2987):
    Using default value 2987

    Command (m for help): p

    Disk /dev/sda (Sun disk label): 255 heads, 63 sectors, 2987 cylinders
    Units = cylinders of 16065 * 512 bytes

       Device Flag    Start       End    Blocks   Id  System
    /dev/sda1             8      2987  23928817+  83  Linux native
    /dev/sda3             0      2987  23993077+   5  Whole disk

    Command (m for help): w
    The partition table has been altered!

    Calling ioctl() to re-read partition table.
    Syncing disks.

  # and I check the alignment

    [root@wgs94-192]# cat /sys/block/sda/sda1/start
    128520
    [root@wgs94-192]# cat /sys/block/sda/sda1/size
    47857635
    [root@wgs94-192]# cat /sys/block/sda/sda3/start
    0
    [root@wgs94-192]# cat /sys/block/sda/sda3/size
    47986155
    [root@wgs94-192]#

  # This has accurately placed the start of sda1 at 64260KB, which is a multiple of 4K

  # Now I will modify the partition to start at cylinder 1

    [root@wgs94-192]# fdisk /dev/sda

    Command (m for help): p

    Disk /dev/sda (Sun disk label): 255 heads, 63 sectors, 2987 cylinders
    Units = cylinders of 16065 * 512 bytes

       Device Flag    Start       End    Blocks   Id  System
    /dev/sda1             8      2987  23928817+  83  Linux native
    /dev/sda3             0      2987  23993077+   5  Whole disk

    Command (m for help): d
    Partition number (1-8): 1

    Command (m for help): n
    Partition number (1-8): 1
    First cylinder (0-2987): 1
    Last cylinder or +size or +sizeM or +sizeK (1-2987, default 2987):
    Using default value 2987

    Command (m for help): w
    The partition table has been altered!

    Calling ioctl() to re-read partition table.
    Syncing disks.

  # and I check the start location

    [root@wgs94-192]# cat /sys/block/sda/sda1/start
    16065
    [root@wgs94-192]# cat /sys/block/sda/sda1/size
    47970090

  # Note this is no longer 4K aligned. It starts at byte 8,225,280 which
  # is 4K block 2008.125.

5.2 parted

This example shows the use of parted and the data from /sys/block/sda/sda1
to validate the start address. This example creates an msdos label and shows
fine control over the starting address.

Note: The output of parted has been cut to only the relevant lines.
Comments are preceded by # and separated from the command text by indent.

  # I start parted and overwrite the existing label with a new label

    [root@wgs94-192]# parted /dev/sda
    GNU Parted 1.8.8
    Using /dev/sda
    Welcome to GNU Parted! Type 'help' to view a list of commands.
    (parted) mklabel
    Warning: The existing disk label on /dev/sda will be destroyed and all data
             on this disk will be lost. Do you want to continue?
    Yes/No? yes
    New disk label type?  [sun]? msdos
    (parted) p
    Model: ATA MARVELL SD88SA02 (scsi)
    Disk /dev/sda: 24.6GB
    Sector size (logical/physical): 512B/512B
    Partition Table: msdos

    Number  Start  End  Size  Type  File system  Flags

    (parted) mkpart
    Partition type?  primary/extended? primary
    File system type?  [ext2]? ext3
    Start? 4096B
    End? 24.6GB
    (parted) p
    Model: ATA MARVELL SD88SA02 (scsi)
    Disk /dev/sda: 24.6GB
    Sector size (logical/physical): 512B/512B
    Partition Table: msdos

    Number  Start  End     Size    Type     File system  Flags
     1      4096B  24.6GB  24.6GB  primary

    (parted) q
    Information: You may need to update /etc/fstab.

  # ... and validate that the partition is 4K aligned

    [root@wgs94-192]# cat /sys/block/sda/sda1/start 
    8
