MooseFS - kevinadmin - ChinaUnix Blog

MooseFS

2010-03-03 13:39:04

Reference Guide

REQUIREMENTS FOR THE MASTER SERVER, CHUNK SERVERS AND CLIENTS

Master

As the managing server (master) is a crucial element of MooseFS, it should be installed on a machine that offers high stability and accessibility adequate for the whole system. It is advisable to use a server with a redundant power supply, ECC memory, and a RAID1/RAID5/RAID10 disk array. The managing server OS has to be POSIX compliant (systems verified so far: Linux, FreeBSD, Mac OS X and OpenSolaris).

The most important factor in the master machine is RAM, as the full file system structure is cached in RAM for speed. The master server should have approximately 300 MiB of RAM allocated to handle 1 million files on chunkservers.
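As a rough sizing aid, the rule of thumb above can be turned into a small shell helper. This is only a sketch: the function name estimate_master_ram_mib is made up for this example, and it assumes that memory use scales linearly at about 300 MiB per million files, which the guide does not promise.

```shell
# Rough master RAM estimate, assuming ~300 MiB per 1 million files.
# Linear scaling is an assumption made for this sketch.
estimate_master_ram_mib() (
    files=$1
    # Integer arithmetic, rounded up to the next whole MiB.
    echo $(( (files * 300 + 999999) / 1000000 ))
)

estimate_master_ram_mib 25000000   # prints 7500
```

The result is a lower bound for planning purposes only; the operating system and other services need memory on top of it.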

The necessary HDD size depends both on the number of files and chunks used (main metadata file) and on the number of operations made on the files (metadata changelog); for example, 20 GiB is enough to store information for 25 million files and to keep changelogs for up to 50 hours.

Metalogger

MooseFS metalogger just gathers metadata backups from the MooseFS master server - so the hardware requirements are not higher than for the master server itself; it needs about the same disk space. Similarly to the master server - the OS has to be POSIX compliant (Linux, FreeBSD, Mac OS X, OpenSolaris, etc.).

If you would like the metalogger machine to take over as the master server when the master fails, it should have at least the same amount of RAM and HDD as the main master server.

Chunkservers

Chunkserver machines should have appropriate disk space (dedicated exclusively for MooseFS) and POSIX compliant OS (verified so far: Linux, FreeBSD, Mac OS X and OpenSolaris).

Minimal configuration should start from several gigabytes of storage space (only disks with more than 256 MB and chunkservers reporting more than 1 GB of total free space are accessible for new data).

Clients (mfsmount)

mfsmount requires FUSE to work; FUSE is available on several operating systems: Linux, FreeBSD, OpenSolaris and MacOS X, with the following notes:

In case of Linux a kernel module with API 7.8 or later is required (it can be checked with dmesg command - after loading kernel module there should be a line fuse init (API version 7.8)). It is available in fuse package 2.6.0 (or later) or in Linux kernel 2.6.20 (or later). Due to some minor bugs, the newer module is recommended (fuse 2.7.2 or Linux 2.6.24, although fuse 2.7.x standalone doesn't contain getattr/write race condition fix).
In case of FreeBSD the module version 0.3.9 or later should be used. You also have to use this patch in order to fix support of files bigger than 4GB.
MacFUSE has been tested on MacOSX 10.5 and 10.6.

MAKING AND INSTALLING

The preferred MooseFS deployment method is installation from the source.

Source package supports standard ./configure && make && make install procedure. Significant configure options are:

--disable-mfsmaster - don't build managing server (useful for plain node installation)
--disable-mfschunkserver - don't build chunkserver
--disable-mfsmount - don't build mfsmount and mfstools (they are built by default if fuse development package is detected)
--enable-mfsmount - make sure to build mfsmount and mfstools (error is reported if fuse development package cannot be found)
--prefix=DIRECTORY - install to given prefix (default is /usr/local)
--sysconfdir=DIRECTORY - select configuration files directory (default is ${prefix}/etc)
--localstatedir=DIRECTORY - select top variable data directory (default is ${prefix}/var; MFS metadata are stored in mfs subdirectory, i.e. ${prefix}/var/mfs by default)
--with-default-user=USER - user to run daemons as if not set in configuration files (default is nobody)
--with-default-group=GROUP - group to run daemons as if not set in configuration files (default is nogroup)

For example, to install MooseFS using system FHS-compliant paths on Linux, use: ./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var/lib

make install respects the standard DESTDIR= variable, which allows installing the package into a temporary location (e.g. in order to create a binary package). Already existing configuration or metadata files won't be overwritten.
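The configure example and the DESTDIR= mechanism can be combined into one staging step. The following is a minimal sketch: the function name build_and_stage_mfs and both directory arguments are placeholders invented for this example, and only the configure options listed above are used.

```shell
# Build MooseFS from an unpacked source tree and stage the result into
# a temporary directory instead of the live system.
# Both arguments are placeholders chosen by the caller.
build_and_stage_mfs() (
    src_dir=$1
    stage_dir=$2
    cd "$src_dir" || exit 1
    ./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var/lib &&
    make &&
    make install DESTDIR="$stage_dir"
)

# Example (not run here; the paths are hypothetical):
# build_and_stage_mfs ./mfs-source /tmp/mfs-stage
```

The staged tree under the second argument can then be packed into a binary package with the packaging tool of the distribution.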

Managing server (master)

As the managing server (master) is a crucial element of MooseFS, it should be installed on a machine that offers high reliability and accessibility adequate for the whole system. It is advisable to use a server with a redundant power supply, ECC memory, and a RAID1/RAID5/RAID10 disk array. The managing server OS has to be POSIX compliant (systems verified so far: Linux, FreeBSD, Mac OS X and OpenSolaris).

To install the managing process one needs to:

install mfs-master package (make install after running configure without --disable-mfsmaster option in case of installation from source files)
create a user with whose permissions the service is about to work (if such user doesn't already exist)
make sure that the directory for the metadata files exists and is writable by this user (make install run as root does this for the user and paths set up by configure, if the user existed before)
configure the service (file mfsmaster.cfg), paying special attention to TCP ports in use
add or create (depending both on the operating system and the distribution) a startup script that launches the mfsmaster process

After the installation the managing server is started by running mfsmaster. If the mfsmaster program is executed by the root, it switches to a configured user; otherwise it runs as a user who executed it. In case of a server outage or an improper shutdown the mfsmetarestore utility will restore the file system information.

Metalogger

Metalogger daemon is installed together with mfs-master. The minimal requirements are not bigger than master itself; it can be run on any machine (e.g. any chunkserver), but the best place is the backup machine for MooseFS master (so in case of primary master machine failure it's possible to run master process in place of metalogger - see appropriate for details).

To install the metalogger process one needs to:

install mfs-master package (make install after running configure without --disable-mfsmaster option in case of installation from source files)
create a user with whose permissions the service is about to work (if such user doesn't already exist)
make sure that the directory for the metadata files exists and is writable by this user (make install run as root does this for the user and paths set up by configure, if the user existed before)
configure the service (file mfsmetalogger.cfg), paying special attention to the TCP ports used (MASTER_PORT has to be the same as MATOML_LISTEN_PORT in mfsmaster.cfg on the managing server)
add or create (depending both on the operating system and the distribution) a startup script that launches the mfsmetalogger process

After the installation the metalogger is started by running mfsmetalogger. If the mfsmetalogger program is executed by root, it switches to the configured user; otherwise it runs as the user who executed it.

Chunkservers

When the managing server is installed, data servers (chunkservers) may be set up. These machines should have appropriate free space on disks and a POSIX compliant OS (verified so far: Linux, FreeBSD, Mac OS X and OpenSolaris). A chunkserver stores data chunks/fragments as files on a common file system (e.g. ext3, xfs, ufs). It is important to dedicate the file systems used by MooseFS exclusively to it, as this is necessary to manage the free space properly. MooseFS does not take into account that free space accessible to it could be taken by other data. If it's not possible to create a separate disk partition, filesystems in files can be used (have a look at the following instructions).

Linux:

creating:

dd if=/dev/zero of=file bs=100M seek=400 count=0

mkfs -t ext3 file

mounting:

mount -o loop file mount-point

FreeBSD:

creating and mounting:

dd if=/dev/zero of=file bs=100m count=400

mdconfig -a -t vnode -f file -u X

newfs -m0 -O2 /dev/mdX

mount /dev/mdX mount-point

mounting a previously created file system:

mdconfig -a -t vnode -f file -u X

mount /dev/mdX mount-point

Mac OS X:

Start "Disk Utility" from "/Applications/Utilities"
Select from menu "Images->New->Blank Image ..."

Note: on each chunkserver disk some space is reserved for growing chunks and thus inaccessible for creation of the new ones. Only disks with more than 256 MB and chunkservers reporting more than 1 GB of total free space are accessible for new data. Minimal configurations should start from several gigabytes of storage.
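The two thresholds in the note can be expressed as a simple check. This sketch rests on assumptions: both limits are read as free space, 1 GB is taken as 1024 MB, and the function name chunkserver_accepts_new_data is invented for illustration. It does not query MooseFS; the free space of each disk is passed in by hand.

```shell
# Decide whether a chunkserver would be offered new data, given the
# free space (in MB) of each of its disks as arguments.
# Assumption: "256 MB" and "1 GB" both mean free space, 1 GB = 1024 MB.
chunkserver_accepts_new_data() (
    total=0
    usable=0
    for free_mb in "$@"; do
        total=$((total + free_mb))
        # A disk counts only if it has more than 256 MB free.
        if [ "$free_mb" -gt 256 ]; then
            usable=$((usable + 1))
        fi
    done
    # At least one usable disk and more than 1 GB free in total.
    [ "$usable" -gt 0 ] && [ "$total" -gt 1024 ]
)

# Example: one nearly full disk and one with 2000 MB free.
chunkserver_accepts_new_data 200 2000 && echo "accepts new data"
```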

To install the data server (chunkserver):

isolate space intended for MooseFS as separate file systems, mounted at a defined point (e.g. /mnt/hd1, /mnt/hd2 etc.)
install mfs-chunkserver package (make install after running configure without --disable-mfschunkserver option in case of installation from source files)
create a user with whose permissions the service is about to work (if such user doesn't already exist)
give this user permissions to write to all filesystems dedicated to MooseFS
configure the service (mfschunkserver.cfg file), paying special attention to the TCP ports used (MASTER_PORT has to be the same as MATOCS_LISTEN_PORT in mfsmaster.cfg on the managing server)
enter the list of mount points of the file systems dedicated to MooseFS into the file mfshdd.conf
add or create (depending both on the operating system and the distribution) a startup script that launches the mfschunkserver process

Note: It matters which local IP address mfschunkserver uses to connect to mfsmaster. mfsmaster passes this address to MFS clients (mfsmount) and to other chunkservers so that they can communicate with the chunkserver, so it must be remotely accessible. The master address (MASTER_HOST) must therefore be one for which the chunkserver uses the proper local address to connect, usually one belonging to the same network as all MFS clients and other chunkservers. Generally a loopback address (localhost, 127.0.0.1) can't be used as MASTER_HOST, as it would make the chunkserver inaccessible to any other host (such a configuration would work only on a single machine running all of mfsmaster, mfschunkserver and mfsmount).
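A quick sanity check for the MASTER_HOST value can catch the loopback mistake before the chunkserver is started. This is a sketch only: the helper name is_loopback_host is made up, it looks at the literal string and does no DNS resolution, and the example address is hypothetical.

```shell
# Succeed if the given MASTER_HOST value is literally a loopback name
# or address. No name resolution is done, so a host name that resolves
# to 127.0.0.1 is not detected.
is_loopback_host() {
    case $1 in
        localhost|127.*|::1) return 0 ;;
        *) return 1 ;;
    esac
}

# Example with a hypothetical address:
if is_loopback_host 192.168.0.10; then
    echo "MASTER_HOST must not be a loopback address" >&2
fi
```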

After the installation the data server is started simply by running mfschunkserver. If the mfschunkserver program is executed by root, it switches to the configured user; otherwise it runs as the user who executed it.

Clients (mfsmount)

mfsmount requires FUSE to work. FUSE is available on several operating systems: Linux, FreeBSD, OpenSolaris and MacOS X, with the following notes:

In case of Linux a kernel module with API version at least 7.8 is required (it can be checked with dmesg command - after loading kernel module there should be a line fuse init (API version 7.8)). It is available in fuse package version 2.6.0 or later or Linux kernel 2.6.20 or later. Due to some minor bugs, the newer module is recommended (fuse 2.7.2 or Linux 2.6.24, although fuse 2.7.x standalone doesn't contain getattr/write race condition fix).
In case of FreeBSD the module version 0.3.9 or later should be used.
MacFUSE has been tested on MacOSX 10.5 and 10.6.

Installing MooseFS client:

install mfs-client package (make install after running configure without --disable-mfsmount option in case of installation from source files)
create a directory where MooseFS will be mounted (e.g. /mnt/mfs)

MooseFS is mounted with the following command:

mfsmount [-h master] [-p port] [-l path] [-w mount-point]

where master is the host name of the managing server, port is the same as given in MATOCU_LISTEN_PORT in file mfsmaster.cfg, path is mounted MooseFS subdirectory (default is /, which means mounting the whole file system), mount-point is the previously created directory for MooseFS.

USING MOOSEFS

Mounting the File System

After launching the managing server and data servers (chunkservers) (one is required but at least two are recommended) one can mount the file system by starting the mfsmount process. MooseFS is mounted with the following command:

mfsmount mountpoint [-d] [-f] [-s] [-m] [-n] [-p] [-H MASTER] [-P PORT] [-S PATH] [-o OPT[,OPT...]]

where MASTER is the host name of the managing server, PORT is the same as given in MATOCU_LISTEN_PORT in file mfsmaster.cfg, PATH is mounted MooseFS subdirectory (default is /, which means mounting the whole file system), mountpoint is the previously created directory for MooseFS.
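To see how the options fit together, the command line can be assembled by a small helper that only prints it. This is a sketch: the function name mfsmount_cmd is invented, it uses only the -H, -P and -S options described above, and the host name and port are taken from the df example further below.

```shell
# Print the mfsmount command line for the given settings without
# running it. The subdirectory defaults to / (the whole file system).
mfsmount_cmd() (
    mountpoint=$1
    master=$2
    port=$3
    subdir=${4:-/}
    echo "mfsmount $mountpoint -H $master -P $port -S $subdir"
)

mfsmount_cmd /mnt/mfs mfsmaster 9421
# prints: mfsmount /mnt/mfs -H mfsmaster -P 9421 -S /
```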

By starting the mfsmount process with the -m (or -o mfsmeta) option one can mount the auxiliary file system MFSMETA (which may be useful to restore a file accidentally deleted from the MooseFS volume or to free some space by removing a file before elapsing the quarantine time), for example:

mfsmount -m /mnt/mfsmeta

Basic operations

After mounting the file system one can perform all standard file operations (like creating files, copying, deleting, changing names, etc.). MooseFS is a network file system, so operations may be slower than on a local file system.

Free space on the MooseFS volume can be checked the same way as for local file systems, e.g. with the df command:

$ df -h | grep mfs

mfsmaster:9421 85T 80T 4.9T 95% /mnt/mfs

mfsmaster:9321 394G 244G 151G 62% /mnt/mfs-test

Note that each file can be stored in more than one copy, in which case it takes up proportionally more space than its actual size. Additionally, files deleted during the quarantine time are kept in a "trash can", so they also take up space (their size also depends on the number of copies). Just like in other Unix file systems, when a file that is still open in some other process is deleted, its data is kept at least until the file is closed.

You may also like to have a look at this FAQ entry: When doing df -h on a filesystem the results are different from what I would expect.

Operations specific for MooseFS

Setting the goal

The "goal" (i.e. the number of copies for a given file) can be verified by the mfsgetgoal command and changed with the mfssetgoal command:

$ mfsgetgoal /mnt/mfs-test/test1

/mnt/mfs-test/test1: 2

$ mfssetgoal 3 /mnt/mfs-test/test1

/mnt/mfs-test/test1: 3

$ mfsgetgoal /mnt/mfs-test/test1

/mnt/mfs-test/test1: 3

Similar operations can be done on the whole directory trees with the mfsgetgoal -r and mfssetgoal -r commands:

$ mfsgetgoal -r /mnt/mfs-test/test2

/mnt/mfs-test/test2:

files with goal 2 : 36

directories with goal 2 : 1

$ mfssetgoal -r 3 /mnt/mfs-test/test2

/mnt/mfs-test/test2:

inodes with goal changed: 37

inodes with goal not changed: 0

inodes with permission denied: 0

$ mfsgetgoal -r /mnt/mfs-test/test2

/mnt/mfs-test/test2:

files with goal 3 : 36

directories with goal 3 : 1

The actual number of copies of a file can be verified with the mfscheckfile and mfsfileinfo commands:

$ mfscheckfile /mnt/mfs-test/test1

/mnt/mfs-test/test1:

3 copies: 1 chunks

$ mfsfileinfo /mnt/mfs-test/test1

/mnt/mfs-test/test1:

chunk 0: 00000000000520DF_00000001 / (id:336095 ver:1)

copy 1: 192.168.0.12:9622

copy 2: 192.168.0.52:9622

copy 3: 192.168.0.54:9622

Note: a zero length file contains no data, so despite the non-zero "goal" setting for such file, these commands will return an empty result.

When the number of copies of an already existing file is changed, the data is replicated or deleted accordingly, with some delay. This can be verified using the commands described above.

The "goal" set for a directory is inherited by new files and directories created within it (it does not change the number of copies of already existing files).

The summary of the contents of the whole tree (an enhanced equivalent of du -s, with information specific for MooseFS) can be called up with the command mfsdirinfo:

$ mfsdirinfo /mnt/mfs-test/test/

/mnt/mfs-test/test/:

inodes: 15

directories: 4

files: 8

chunks: 6

length: 270604

size: 620544

realsize: 1170432

The above summary displays the number of the directories, files, data fragments (chunks) used by the files, as well as the size of the disk's space taken by files in the directory (length - the sum of file sizes, size - with block size taken into account, realsize - total disk space utilization considering all copies of chunks).
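Since realsize counts all copies while size counts the data once, their ratio gives a rough idea of how many copies are stored on average. The following sketch reads mfsdirinfo output on standard input; the function name average_copies is made up, and treating the ratio as an "average number of copies" is an interpretation, not something the tool reports itself.

```shell
# Read mfsdirinfo output on stdin and print realsize divided by size,
# rounded to two decimal places.
average_copies() {
    awk '$1 == "size:"     { size = $2 }
         $1 == "realsize:" { real = $2 }
         END { if (size > 0) printf "%.2f\n", real / size }'
}

# Using the figures from the listing above:
printf 'size: 620544\nrealsize: 1170432\n' | average_copies   # prints 1.89
```

On a live system the input would come from the tool itself, e.g. mfsdirinfo /mnt/mfs-test/test/ | average_copies.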

Setting quarantine time for trash bin

The quarantine time, i.e. how long a deleted file is kept in the "trash can", can be verified with the mfsgettrashtime command and changed with mfssettrashtime:

$ mfsgettrashtime /mnt/mfs-test/test1

/mnt/mfs-test/test1: 604800

$ mfssettrashtime 0 /mnt/mfs-test/test1

/mnt/mfs-test/test1: 0

$ mfsgettrashtime /mnt/mfs-test/test1

/mnt/mfs-test/test1: 0

These tools also have recursive option operating on whole directory trees:

$ mfsgettrashtime -r /mnt/mfs-test/test2

/mnt/mfs-test/test2:

files with trashtime 0 : 36

directories with trashtime 604800 : 1

$ mfssettrashtime -r 1209600 /mnt/mfs-test/test2

/mnt/mfs-test/test2:

inodes with trashtime changed: 37

inodes with trashtime not changed: 0

inodes with permission denied: 0

$ mfsgettrashtime -r /mnt/mfs-test/test2

/mnt/mfs-test/test2:

files with trashtime 1209600 : 36

directories with trashtime 1209600 : 1

Time is given in seconds (useful values: 1 hour is 3600 seconds, 24h - 86400 seconds, 1 week - 604800 seconds). Just as in the case of the number of copies, the storing time set for a directory is inherited for newly created files and directories. The number 0 means that a file after the removal will be deleted immediately and its recovery will not be possible.
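Because the tools expect plain seconds, a small conversion helper can reduce arithmetic mistakes. This is a sketch; the function name trashtime_seconds and the set of accepted units are choices made for this example.

```shell
# Convert "<count> <unit>" into seconds for use with mfssettrashtime.
trashtime_seconds() (
    count=$1
    unit=$2
    case $unit in
        hour|hours) factor=3600 ;;
        day|days)   factor=86400 ;;
        week|weeks) factor=604800 ;;
        *) echo "unknown unit: $unit" >&2; exit 1 ;;
    esac
    echo $((count * factor))
)

trashtime_seconds 2 weeks   # prints 1209600
```

The value can then be passed on, e.g. mfssettrashtime -r "$(trashtime_seconds 2 weeks)" /mnt/mfs-test/test2.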

Removed files may be accessed through a separately mounted MFSMETA file system. In particular it contains directories /trash (containing information about deleted files that are still being stored) and /trash/undel (designed for retrieving files). Only the administrator has access to MFSMETA (user with uid 0, usually root).

$ mfssettrashtime 3600 /mnt/mfs-test/test1

/mnt/mfs-test/test1: 3600

$ rm /mnt/mfs-test/test1

$ ls /mnt/mfs-test/test1

ls: /mnt/mfs-test/test1: No such file or directory

# ls -l /mnt/mfs-test-meta/trash/*test1

-rw-r--r-- 1 user users 1 2007-08-09 15:23 /mnt/mfs-test-meta/trash/00013BC7|test1

The name of the file that is still visible in the "trash" directory consists of an 8-digit hexadecimal i-node number and a path to the file relative to the mounting point with characters / replaced with the | character. If such a name exceeds the limits of the operating system (usually 255 characters), the initial part of the path is deleted.
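The naming scheme can be decoded with plain shell string operations. The sketch below assumes the name has not been truncated by the operating system limit mentioned above; the function name trash_entry_info is invented for illustration.

```shell
# Split a trash entry name such as '00013BC7|test1' into the decimal
# i-node number and the path relative to the mount point.
trash_entry_info() (
    name=$1
    hex=${name%%"|"*}        # text before the first '|'
    rest=${name#*"|"}        # text after the first '|'
    path=$(printf '%s' "$rest" | tr '|' '/')
    printf 'inode=%d path=%s\n' "0x$hex" "$path"
)

trash_entry_info '00013BC7|test1'   # prints: inode=80839 path=test1
```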

The full path of the file relative to the mounting point can be read by reading this special file, or changed by writing to it:

# cat '/mnt/mfs-test-meta/trash/00013BC7|test1'

test1

# echo 'test/test2' > '/mnt/mfs-test-meta/trash/00013BC7|test1'

# cat '/mnt/mfs-test-meta/trash/00013BC7|test1'

test/test2

Moving this file to the trash/undel subdirectory restores the original file in the regular MooseFS file system, either at the path set in the way described above or at the original path (if it was not changed).

Note: if a new file with the same path already exists, restoring of the file will not succeed.
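The restore step is an ordinary move, so it can be wrapped in a helper that first checks that the trash entry exists. This is a sketch: the function name mfs_undelete is made up, the first argument is assumed to be the MFSMETA mount point, and the helper does not check whether a file with the same path already exists on the regular file system.

```shell
# Restore a file from the trash by moving its entry into trash/undel.
# $1 is the MFSMETA mount point, $2 the entry name as shown in trash.
mfs_undelete() (
    meta_mount=$1
    entry=$2
    if [ ! -e "$meta_mount/trash/$entry" ]; then
        echo "no such trash entry: $entry" >&2
        exit 1
    fi
    mv -- "$meta_mount/trash/$entry" "$meta_mount/trash/undel/"
)

# Example (not run here), as root:
# mfs_undelete /mnt/mfs-test-meta '00013BC7|test1'
```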

Deleting the file from the "trash can" results in releasing space previously taken up by it (with a delay - the data is deleted asynchronously). In such cases it is impossible to restore the file.

It is also possible to change the number of copies or the time of storing files in the "trash can" with mfssetgoal and mfssettrashtime tools (like for the files on the proper MooseFS).

Besides the trash and trash/undel directories, MFSMETA holds a third directory, reserved, which contains files intended for final removal that are still open. These files will be erased and their data will be deleted immediately after the last user closes them. Files in the reserved directory are named the same way as those in trash, but no further operations are possible for these files.

Taking snapshots

Another characteristic feature of the MooseFS system is the possibility of taking a snapshot of the file or directory tree with the mfsmakesnapshot command:

$ mfsmakesnapshot source ... destination

(In case of normal file duplication, data of the file can be changed by another process writing to the source file. mfsmakesnapshot prepares a copy of the whole file (or files) in one operation. Furthermore, until modification of any of the files takes place, the copy does not take up any additional space.)

After such operation, subsequent writes to the source file do not modify the copy (nor vice versa).

Alternatively, file snapshots can be created using mfsappendchunks utility, which works like mfssnapshot known from MooseFS 1.5:

$ mfsappendchunks destination-file source-file ...

When multiple source files are given, their snapshots are added to the same destination file, padding each to chunk boundary (64MB).
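The effect of the padding on the destination layout can be estimated with integer arithmetic. This sketch assumes that "64MB" means 64 MiB (67108864 bytes); the variable CHUNK_BYTES and the function name padded_length are made up for this example.

```shell
# Chunk size assumed for this sketch: 64 MiB.
CHUNK_BYTES=67108864

# Round a byte count up to a whole number of chunks. This is the space
# one source file would occupy in the destination before the next
# source starts.
padded_length() {
    echo $(( ($1 + CHUNK_BYTES - 1) / CHUNK_BYTES * CHUNK_BYTES ))
}

padded_length 100000000   # prints 134217728 (two chunks)
```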

Additional attributes

Additional attributes of file or directory (noowner, noattrcache, noentrycache) can be checked, set or deleted using mfsgeteattr, mfsseteattr and mfsdeleattr utilities, which behave similarly to mfsgetgoal/mfssetgoal or mfsgettrashtime/mfssettrashtime. See mfstools manual page for details.

MOOSEFS MAINTENANCE

Starting MooseFS cluster

The safest way to start MooseFS (avoiding any read or write errors, inaccessible data or similar problems) is to run the following commands in this sequence:

start mfsmaster process
start all mfschunkserver processes
start mfsmetalogger processes (if configured)
when all chunkservers get connected to the MooseFS master, the filesystem can be mounted on any number of clients using mfsmount (you can check if all chunkservers are connected by checking master logs or CGI monitor).

Stopping MooseFS cluster

To safely stop MooseFS:

unmount MooseFS on all clients (using the umount command or an equivalent)
stop chunkserver processes with the mfschunkserver stop command
stop metalogger processes with the mfsmetalogger stop command
stop master process with the mfsmaster stop command.
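The start and stop sequences above can be kept at hand as a printed checklist. The sketch below only prints commands and runs nothing, since the steps are spread over several machines; the function name mfs_plan and the default mount point /mnt/mfs are choices made for this example.

```shell
# Print the commands for starting or stopping a MooseFS cluster in the
# recommended order. Nothing is executed.
mfs_plan() (
    action=$1
    mountpoint=${2:-/mnt/mfs}
    case $action in
        start)
            echo "mfsmaster"
            echo "mfschunkserver"
            echo "mfsmetalogger"
            echo "mfsmount $mountpoint"
            ;;
        stop)
            echo "umount $mountpoint"
            echo "mfschunkserver stop"
            echo "mfsmetalogger stop"
            echo "mfsmaster stop"
            ;;
        *)
            echo "usage: mfs_plan start|stop [mountpoint]" >&2
            exit 1
            ;;
    esac
)

mfs_plan stop /mnt/mfs
```

Before the mfsmount step of the start sequence one should still wait until all chunkservers are connected to the master.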

Maintenance of MooseFS chunkservers

Provided that there are no files with a goal lower than 2 and no under-goal files (which can be checked with the mfsgetgoal -r and mfsdirinfo commands), it is possible to stop or restart a single chunkserver at any time. When you need to stop or restart another chunkserver afterwards, be sure that the previous one is connected and there are no under-goal chunks.
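The first precondition can be checked by filtering the summary printed by mfsgetgoal -r. This is a sketch under the assumption that the output has the "files with goal N : COUNT" form shown earlier; the function name all_goals_at_least_two is invented, and the check covers only the goal setting, not under-goal chunks.

```shell
# Read `mfsgetgoal -r` output on stdin; succeed only if no line reports
# files or directories with a goal lower than 2.
all_goals_at_least_two() {
    awk 'BEGIN { low = 0 }
         $2 == "with" && $3 == "goal" && $4 < 2 { low = 1 }
         END { exit low }'
}

# Example with output in the format shown earlier:
printf 'files with goal 2 : 36\ndirectories with goal 2 : 1\n' |
    all_goals_at_least_two && echo "goal check passed"
```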

MooseFS metadata backups

There are two general parts of metadata:

main metadata file (metadata.mfs, named metadata.mfs.back when the mfsmaster is running), synchronized each hour
metadata changelogs (changelog.*.mfs), stored for last N hours (configured by BACK_LOGS setting)

The main metadata file needs regular backups with the frequency depending on how many hourly changelogs are stored. Metadata changelogs should be automatically replicated in real time. Since MooseFS 1.6.5, both tasks are done by mfsmetalogger daemon.
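On installations without a metalogger, or as an additional safeguard, the main metadata file can also be copied by hand or from cron. The following is a minimal sketch: the function name backup_mfs_metadata, both directory arguments and the timestamp suffix are assumptions made for this example, not part of MooseFS.

```shell
# Copy the master's metadata file into a backup directory under a
# timestamped name. $1 is the mfsmaster data directory, $2 the backup
# directory; both are placeholders chosen by the caller.
backup_mfs_metadata() (
    data_dir=$1
    backup_dir=$2
    src=$data_dir/metadata.mfs.back
    if [ ! -f "$src" ]; then
        echo "missing $src" >&2
        exit 1
    fi
    mkdir -p "$backup_dir" &&
    cp -p "$src" "$backup_dir/metadata.mfs.back.$(date +%Y%m%d%H%M%S)"
)

# Example (not run here; the backup path is hypothetical):
# backup_mfs_metadata /var/lib/mfs /backup/mfs
```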

MooseFS master recovery

In case of mfsmaster crash (due to e.g. host or power failure) last metadata changelog needs to be merged into the main metadata file. It can be done with the mfsmetarestore utility; the simplest way to use it is:

$ mfsmetarestore -a

If the master data are stored in a location other than the one specified during MooseFS compilation, the actual path needs to be specified using the -d option, e.g.:

$ mfsmetarestore -a -d /storage/mfsmaster

MooseFS master recovery from a backup

In order to restore the master host from a backup:

install mfsmaster in normal way
configure it using the same settings (e.g. by retrieving mfsmaster.cfg file from the backup)
retrieve metadata.mfs.back file from the backup or metalogger host, place it in mfsmaster data directory
copy last metadata changelogs from any metalogger running just before master failure into mfsmaster data directory
merge metadata changelogs using mfsmetarestore command as specified before - either using mfsmetarestore -a, or by specifying actual file names using non-automatic mfsmetarestore syntax, e.g.

$ mfsmetarestore -m metadata.mfs.back -o metadata.mfs changelog.*.mfs

Please also read a mini howto about preparing . In that document we present a solution using CARP and in which metalogger takes over functionality of the broken master server.
