Category: LINUX

2009-11-19 11:33:44

MFS Deployment

The preferred MFS deployment method is installation from the source.

The source package supports the standard ./configure && make && make install procedure. The significant configure options are:

  • --disable-mfsmaster - don't build managing server (useful for plain node installation)
  • --disable-mfschunkserver - don't build chunkserver
  • --disable-mfsmount - don't build mfsmount and mfstools (they are built by default if fuse development package is detected)
  • --enable-mfsmount - make sure to build mfsmount and mfstools (error is reported if fuse development package cannot be found)
  • --prefix=DIRECTORY - install to given prefix (default is /usr/local)
  • --sysconfdir=DIRECTORY - select configuration files directory (default is ${prefix}/etc)
  • --localstatedir=DIRECTORY - select top variable data directory (default is ${prefix}/var; MFS metadata are stored in mfs subdirectory, i.e. ${prefix}/var/mfs by default)
  • --with-default-user=USER - user to run daemons as if not set in configuration files (default is nobody)
  • --with-default-group=GROUP - group to run daemons as if not set in configuration files (default is nogroup)

For example, to install MFS using system FHS-compliant paths on Linux, use: ./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var/lib

make install respects the standard DESTDIR= variable, allowing the package to be installed in a temporary location (e.g. in order to create a binary package). Already existing configuration or metadata files will not be overwritten.

Managing server (master)

As the managing server (master) is a crucial element of MFS, it should be installed on a machine that guarantees high reliability and availability adequate for the whole system. It is advisable to use a server with a redundant power supply, ECC memory, and a RAID1/RAID5/RAID10 disk array. The managing server OS must be POSIX compliant (systems verified so far: Linux, FreeBSD, Mac OS X and OpenSolaris).

To install the managing process one needs to:

  • install the mfs-master package (make install after running configure without the --disable-mfsmaster option in case of installation from source files)
  • create the user that the service will run as (if such a user does not already exist)
  • make sure that the directory for the metadata files exists and is writable by this user (make install run as root sets this up for the user and paths configured by configure, provided the user already exists)
  • configure the service (the mfsmaster.cfg file), paying special attention to the TCP ports in use
  • add or create (depending both on the operating system and the distribution) a startup script for the mfsmaster process
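For reference, a minimal mfsmaster.cfg might look like the sketch below. The key names (MATOCS_LISTEN_PORT, MATOCU_LISTEN_PORT, BACK_LOGS) appear elsewhere in this guide, but the concrete values (user, data path, port numbers) are illustrative assumptions; check the sample configuration shipped with your MFS version:

```
# mfsmaster.cfg - illustrative sketch; values are assumptions, not defaults
WORKING_USER = mfs            # user to run the daemon as
WORKING_GROUP = mfs
DATA_PATH = /var/lib/mfs      # where metadata.mfs and the changelogs live
MATOCS_LISTEN_PORT = 9420     # port for chunkserver connections
MATOCU_LISTEN_PORT = 9421     # port for client (mfsmount) connections
BACK_LOGS = 50                # number of hourly changelogs to keep
```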

After the installation, the managing server is started by running mfsmaster. If the mfsmaster program is executed by root, it switches to the configured user; otherwise it runs as the user who executed it. In case of a server outage or an improper shutdown, the mfsmetarestore utility will restore the file system information.

Data servers (chunkservers)

Once the managing server is installed, the data servers (chunkservers) may be set up. These machines should have appropriate free disk space and a POSIX compliant OS (verified so far: Linux, FreeBSD, Mac OS X and OpenSolaris). A chunkserver stores data chunks/fragments as files on a regular file system (e.g. ext3, xfs, ufs). It is important to dedicate the file systems used by MFS exclusively to it - this is necessary to manage the free space properly, since MFS does not take into account that free space visible to it could be consumed by other data. If it is not possible to create a separate disk partition, a file system in a file can be used (see the following instructions).

Linux:

  • creating:
    # create a sparse 40 GB file, then make an ext3 file system on it
    dd if=/dev/zero of=file bs=100M seek=400 count=0
    mkfs -t ext3 -F file
  • mounting:
    mount -o loop file mount-point

FreeBSD:

  • creating and mounting:
    dd if=/dev/zero of=file bs=100m count=400
    mdconfig -a -t vnode -f file -u X
    newfs -m0 -O2 /dev/mdX
    mount /dev/mdX mount-point
  • mounting a previously created file system:
    mdconfig -a -t vnode -f file -u X
    mount /dev/mdX mount-point

Mac OS X:

Start "Disk Utility" from "/Applications/Utilities"
Select from menu "Images->New->Blank Image ..."

Note: on each chunkserver disk some space is reserved for growing chunks and is thus unavailable for the creation of new ones. Only disks with more than 256 MB of free space, and chunkservers reporting more than 1 GB of total free space, are used for new data. A minimal configuration should start from several gigabytes of storage.

To install the data server (chunkserver):

  • isolate the space intended for MFS as separate file systems, mounted at defined points (e.g. /mnt/hd1, /mnt/hd2, and so on)
  • install the mfs-chunkserver package (make install after running configure without the --disable-mfschunkserver option in case of installation from source files)
  • create the user that the service will run as (if such a user does not already exist)
  • give this user permission to write to all file systems dedicated to MFS
  • configure the service (the mfschunkserver.cfg file), paying special attention to the TCP ports in use (MASTER_PORT has to be the same as MATOCS_LISTEN_PORT in mfsmaster.cfg on the managing server)
  • enter the list of mount points of the file systems dedicated to MFS in the mfshdd.conf file
  • add or create (depending both on the operating system and the distribution) a startup script for the mfschunkserver process
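A sketch of the two chunkserver files might look as follows. MASTER_HOST and MASTER_PORT are the key names this guide refers to; the address, port and mount points shown are illustrative assumptions:

```
# mfschunkserver.cfg - illustrative sketch; values are assumptions
WORKING_USER = mfs
MASTER_HOST = 192.168.0.1     # address of the managing server (not a loopback address; see the note below)
MASTER_PORT = 9420            # must match MATOCS_LISTEN_PORT in mfsmaster.cfg

# mfshdd.conf - one dedicated mount point per line
/mnt/hd1
/mnt/hd2
```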

Note: it matters which local IP address mfschunkserver uses to connect to mfsmaster. This address is passed by mfsmaster to MFS clients (mfsmount) and to other chunkservers so that they can communicate with this chunkserver, so it must be remotely accessible. The master address (MASTER_HOST) must therefore be set to a value for which the chunkserver will use a suitable local address to connect - usually one belonging to the same network as all MFS clients and the other chunkservers. In general, a loopback address (localhost, 127.0.0.1) cannot be used as MASTER_HOST, as it would make the chunkserver inaccessible to any other host (such a configuration would only work on a single machine running mfsmaster, mfschunkserver and mfsmount together).

After installation the data server is started simply by running mfschunkserver. If the mfschunkserver program is executed by root, it switches to the configured user; otherwise it runs as the user who executed it.

Clients (mfsmount)

mfsmount requires FUSE to work. FUSE is available on several operating systems: Linux, FreeBSD, OpenSolaris and MacOS X:

  • In case of Linux, a kernel module with API version at least 7.8 is required (this can be checked with the dmesg command - after loading the kernel module there should be a line fuse init (API version 7.8)). It is available in the fuse package version 2.6.0 or later, or in Linux kernel 2.6.20 or later. Due to some minor bugs, a newer module is recommended (fuse 2.7.2 or Linux 2.6.24, although standalone fuse 2.7.x does not contain the getattr/write race condition fix)
  • In case of FreeBSD the module version 0.3.9 or later should be used
  • On MacOS X MacFUSE 10.5 was tested

Installing MFS client:

  • install mfs-client package (make install after running configure without --disable-mfsmount option in case of installation from source files)
  • create a directory where MFS will be mounted (e.g. /mnt/mfs)

MFS is mounted with the following command:

mfsmount [-h master] [-p port] [-l path] [-w mount-point]

where master is the host name of the managing server, port is the same as MATOCU_LISTEN_PORT in the mfsmaster.cfg file, path is the mounted MFS subdirectory (the default is /, meaning the whole file system is mounted), and mount-point is the previously created directory for MFS.

Using MFS:

Mounting the File System

After launching the managing server and the data servers (chunkservers) (one is required, but at least two are recommended), the file system can be mounted by starting the mfsmount process. The default mount point of MFS is the /mnt/mfs directory, but another path can be given with the -w path option, for example:

mfsmount -w /mnt/mfs-test

Furthermore, by starting the mfsmount process with the -m option one can mount the auxiliary file system MFSMETA (which may be useful to restore a file accidentally deleted from the MFS volume, or to free some space by removing a file before its quarantine time elapses). MFSMETA is mounted by default in the /mnt/mfsmeta directory, but another path can be given using the -w path option, for example:

mfsmount -m -w /mnt/mfsmeta-test

Basic operations

After mounting the file system one can perform all standard file operations (creating files, copying, deleting, renaming, etc.). MFS is a network file system, so operations may progress more slowly than on a local file system.

Free space on the MFS volume can be checked the same way as for local file systems, e.g. with the df command:

$ df -h | grep MFS
MFS 85T 80T 4.9T 95% /mnt/mfs
MFS 394G 244G 151G 62% /mnt/mfs-test
MFSMETA 51G 51G 0 100% /mnt/mfs-test-meta

What is important is that each file can be stored in more than one copy; in such cases it takes correspondingly more space than its nominal size. Additionally, files deleted within the quarantine time are kept in a "trash can", so they also take up space (their size also depends on the number of copies). Just like in other Unix file systems, if a file that is open in some other process is deleted, its data is stored at least until the file is closed.

The occupied space in the MFSMETA file system as presented by df is the space taken by the data in the "trash can", whereas the free space is the space taken by the data of files that are to be deleted without further storage in the "trash can", but which are still held open by some process.

Operations specific for MFS

The "goal" (i.e. the number of copies for a given file) can be verified by the mfsgetgoal command and changed with the mfssetgoal command:

$ mfsgetgoal /mnt/mfs-test/test1
/mnt/mfs-test/test1: 2
$ mfssetgoal 3 /mnt/mfs-test/test1
/mnt/mfs-test/test1: 3
$ mfsgetgoal /mnt/mfs-test/test1
/mnt/mfs-test/test1: 3

Similar operations can be performed on whole directory trees with the mfsrgetgoal and mfsrsetgoal commands:

$ mfsrgetgoal /mnt/mfs-test/test2
/mnt/mfs-test/test2:
files with goal 2 : 36 (36)
directories with goal 2 : 1 (1)
$ mfsrsetgoal 3 /mnt/mfs-test/test2
/mnt/mfs-test/test2:
inodes with goal changed: 37 (37)
inodes with goal not changed: 0 (0)
inodes with permission denied: 0 (0)
$ mfsrgetgoal /mnt/mfs-test/test2
/mnt/mfs-test/test2:
files with goal 3 : 36 (36)
directories with goal 3 : 1 (1)

The actual number of copies of a file can be verified with the mfscheckfile and mfsfileinfo commands:

$ mfscheckfile /mnt/mfs-test/test1
/mnt/mfs-test/test1:
3 copies: 1 chunks
$ mfsfileinfo /mnt/mfs-test/test1
/mnt/mfs-test/test1:
chunk 0: 00000000000520DF_00000001 / (id:336095 ver:1)
copy 1: 192.168.0.12:9622
copy 2: 192.168.0.52:9622
copy 3: 192.168.0.54:9622

Note: a zero length file contains no data, so despite the non-zero "goal" setting for such files, these commands will return an empty result.

In case of a change in the number of copies of an already existing file, the data will be multiplied (or an extra copy will be deleted) with a delay. It can be verified using the commands described above.

Setting the "goal" for a directory is inherited for the new files and directories created within it (it does not change the number of copies of already existing files).

The summary of the contents of a whole tree (an enhanced equivalent of du -s, with MFS-specific information) can be obtained with the mfsdirinfo command:

$ mfsdirinfo /mnt/mfs-test/test
/mnt/mfs-test/test:
inodes: 15 (15)
directories: 4 (4)
files: 8 (8)
good files: 7 (7)
under goal files: 0 (0)
missing files: 1 (1)
chunks: 6 (6)
good chunks: 5 (5)
under goal chunks: 0 (0)
missing chunks: 1 (1)
length: 264K (270604)
size: 606K (620544)
hdd usage: 1.1M (1170432)

The above summary displays the number of directories, files, and data fragments (chunks) used by the files, together with their condition (good; under goal - with fewer copies than the set "goal"; missing - with data lost due to failure of all machines storing a given chunk), as well as the disk space taken by the files in the directory (length - the sum of file lengths; size - with the block size taken into account; hdd usage - total disk space utilization counting all copies of chunks).
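The three size figures are related by simple arithmetic. The following sketch reproduces the human-readable values from the byte counts in the sample output above:

```shell
# Reproduce the human-readable figures from the mfsdirinfo byte counts above
length=270604   # sum of file lengths
size=620544     # lengths rounded up to the block size
hdd=1170432     # total disk usage, counting all copies of all chunks
echo "length: $((length / 1024))K"        # 264K
echo "size:   $((size / 1024))K"          # 606K
echo "hdd usage: $((hdd / 1048576)).$((hdd * 10 / 1048576 % 10))M"   # 1.1M
```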

The time of storing a deleted file can be verified by the mfsgettrashtime command and changed with mfssettrashtime:

$ mfsgettrashtime /mnt/mfs-test/test1
/mnt/mfs-test/test1: 604800
$ mfssettrashtime 0 /mnt/mfs-test/test1
/mnt/mfs-test/test1: 0
$ mfsgettrashtime /mnt/mfs-test/test1
/mnt/mfs-test/test1: 0

These tools also have their recursive equivalents mfsrgettrashtime and mfsrsettrashtime operating on whole directory trees:

$ mfsrgettrashtime /mnt/mfs-test/test2
/mnt/mfs-test/test2:
files with trashtime 604800 : 36 (36)
directories with trashtime 604800 : 1 (1)
$ mfsrsettrashtime 1209600 /mnt/mfs-test/test2
/mnt/mfs-test/test2:
inodes with trashtime changed: 37 (37)
inodes with trashtime not changed: 0 (0)
inodes with permission denied: 0 (0)
$ mfsrgettrashtime /mnt/mfs-test/test2
/mnt/mfs-test/test2:
files with trashtime 1209600 : 36 (36)
directories with trashtime 1209600 : 1 (1)

Time is given in seconds (useful values: 1 hour is 3600 seconds, 24 hours is 86400 seconds, 1 week is 604800 seconds). Just as with the number of copies, the storage time set for a directory is inherited by newly created files and directories. The value 0 means that a file will be deleted immediately upon removal and its recovery will not be possible.
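Since trash times are plain second counts, convenient values can be computed in the shell before calling mfssettrashtime; for example:

```shell
# Common trashtime values expressed in seconds
hour=$((60 * 60))     # 3600
day=$((24 * hour))    # 86400
week=$((7 * day))     # 604800
echo "$hour $day $week"
# then, on a mounted MFS (illustrative invocation): mfssettrashtime $week /mnt/mfs-test/test2
```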

Removed files may be accessed through a separately mounted MFSMETA file system. In particular it contains directories /trash (containing information about deleted files that are still being stored) and /trash/undel (designed for retrieving files). Only the administrator has access to MFSMETA (user with uid 0, usually root).

$ mfssettrashtime 3600 /mnt/mfs-test/test1
/mnt/mfs-test/test1: 3600
$ rm /mnt/mfs-test/test1
$ ls /mnt/mfs-test/test1
ls: /mnt/mfs-test/test1: No such file or directory
# ls -l /mnt/mfs-test-meta/trash/*test1
-rw-r--r-- 1 user users 1 2007-08-09 15:23 /mnt/mfs-test-meta/trash/00013BC7|test1

The name of the file that is still visible in the "trash" directory consists of an 8-digit hexadecimal i-node number and a path to the file relative to the mounting point with characters / replaced with the | character. If such a name exceeds the limits of the operating system (usually 255 characters), the initial part of the path is deleted.
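The name mapping can be illustrated with tr: the trash entry name is the hexadecimal i-node number, a | separator, and the relative path with every / swapped for |. The i-node number below is the one from the example above; the path is made up for illustration:

```shell
# Build the trash-entry name for a file at relative path test/dir/file1
inode=00013BC7                 # 8-digit hex i-node number (from the example above)
path="test/dir/file1"          # hypothetical path relative to the mount point
name="${inode}|$(printf '%s' "$path" | tr '/' '|')"
echo "$name"    # 00013BC7|test|dir|file1
```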

The full path of the file relative to the mount point can be read from, or changed by writing to, this special file:

# cat '/mnt/mfs-test-meta/trash/00013BC7|test1'
test1
# echo 'test/test2' > '/mnt/mfs-test-meta/trash/00013BC7|test1'
# cat '/mnt/mfs-test-meta/trash/00013BC7|test1'
test/test2

Moving this file to the trash/undel subdirectory restores the original file in the proper MFS file system - at the path set as described above, or at the original path (if it was not changed).

Note: if a new file with the same path already exists, restoring of the file will not succeed.

Deleting a file from the "trash can" releases the space it previously occupied (with a delay, as the data is deleted asynchronously). Once this happens, it is impossible to restore the file.

It is also possible to change the number of copies or the time of storing files in the "trash can" with mfssetgoal and mfssettrashtime tools (like for the files on the proper MFS).

Besides the trash and trash/undel directories, MFSMETA holds a third directory, reserved, with files that are slated for final removal but are still open. These files will be erased and their data deleted immediately after the last user closes them. Files in the reserved directory are named the same way as those in trash, but no further operations are possible on them.

Another characteristic feature of MFS is the possibility of taking a snapshot of a file with the mfssnapshot command:

$ mfssnapshot copy source-file

(In case of normal file duplication, the data can be changed by another process writing to the source file while the copy is made. mfssnapshot prepares a copy of the whole file in one operation. Furthermore, until either of the files is modified, the copy does not take up any additional space.)

After such an operation subsequent writes to the source file do not modify the copy (nor vice versa).

MFS maintenance

Starting MFS cluster

The safest way to start MFS (avoiding any read or write errors, inaccessible data or similar problems) is:

  • start mfsmaster process
  • start all mfschunkserver processes
  • when all chunkservers get connected to the MFS master, the filesystem can be mounted on any number of clients using mfsmount.

Stopping MFS cluster

To safely stop MFS:

  • unmount MFS on all clients (using the umount command or an equivalent)
  • stop the chunkserver processes with the mfschunkserver -s command
  • stop the master process with the mfsmaster -s command.

Maintenance of MFS chunkservers

Provided that there are no files with a goal lower than 2 and no under-goal files (which can be checked with the mfsrgetgoal and mfsdirinfo commands), it is possible to stop or restart a single chunkserver at any time. When you need to stop or restart another chunkserver afterwards, make sure that the previous one is connected again and that there are no under-goal chunks.

MFS metadata backups

There are two general parts of metadata:

  • main metadata file (metadata.mfs, named metadata.mfs.back when the mfsmaster is running), synchronized each hour
  • metadata changelogs (changelog.*.mfs), stored for last N hours (configured by BACK_LOGS setting)

The main metadata file needs regular backups with the frequency depending on how many hourly changelogs are stored.
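One simple way to meet this requirement is a cron entry that copies the file to backup storage. The schedule and destination below are only an illustration, and the source path assumes the default ${prefix}/var/mfs data directory:

```
# Illustrative crontab entry: snapshot the main metadata file every 6 hours
0 */6 * * *  cp /usr/local/var/mfs/metadata.mfs.back /backup/mfs/metadata.mfs.back.$(date +\%Y\%m\%d\%H)
```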

Metadata changelogs are automatically replicated to all chunkservers in real time (with changelog_csback.*.mfs names).

MFS master recovery

In case of an mfsmaster crash (due to e.g. a host or power failure), the last metadata changelog needs to be merged into the main metadata file. This can be done with the mfsmetarestore utility; the simplest way to use it is:

mfsmetarestore -a

If the master data are stored in a location other than the one specified during MFS compilation, the actual path needs to be given using the -d option, e.g.:

mfsmetarestore -a -d /storage/mfsmaster

MFS master recovery from a backup

In order to restore the master host from a backup:

  • install mfsmaster in the normal way
  • configure it using the same settings (e.g. by retrieving mfsmaster.cfg file from the backup)
  • retrieve metadata.mfs.back file from the backup, place it in mfsmaster data directory
  • copy last metadata changelogs from any chunkserver running just before master failure into mfsmaster data directory
  • merge the metadata changelogs using the mfsmetarestore command as described above - either by renaming the changelog_csback.*.mfs files to changelog.*.mfs and running mfsmetarestore -a, or by specifying the actual file names using the non-automatic mfsmetarestore syntax, e.g.
    mfsmetarestore -m metadata.mfs.back -o metadata.mfs changelog_csback.*.mfs.
  • start MFS.
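The renaming step can be scripted. The sketch below demonstrates the changelog_csback.*.mfs to changelog.*.mfs rename on scratch files in a temporary directory (the file names stand in for real replicated changelogs; on a real master you would run the loop in the mfsmaster data directory and then execute mfsmetarestore -a):

```shell
#!/bin/sh
# Demonstrate the rename on scratch files standing in for replicated changelogs
demo=$(mktemp -d)
cd "$demo"
touch changelog_csback.0.mfs changelog_csback.1.mfs
for f in changelog_csback.*.mfs; do
    # strip the "changelog_csback." prefix and re-prefix with "changelog."
    mv "$f" "changelog.${f#changelog_csback.}"
done
ls changelog.*.mfs    # changelog.0.mfs changelog.1.mfs
```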