2010-03-03 13:39:04
As the managing server (master) is a crucial element of MooseFS, it should be installed on a machine which guarantees high stability and access requirements which are adequate for the whole system. It is advisable to use a server with a redundant power supply, ECC memory, and disk array RAID1/RAID5/RAID10. The managing server OS has to be POSIX compliant (systems verified so far: Linux, FreeBSD, Mac OS X and OpenSolaris).
The most important factor in the master machine is RAM, as the full file system structure is cached in RAM for speed. The master server should have approximately 300 MiB of RAM allocated to handle 1 million files on chunkservers.
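As a rough sizing aid, the rule of thumb above can be turned into a small shell calculation. This is only a sketch: the function name estimate_master_ram_mib is made up for illustration, and it assumes that RAM use scales linearly at about 300 MiB per million files.

```shell
# Estimate master RAM in MiB from a file count, using the
# ~300 MiB per 1 million files rule of thumb.
# Linear scaling is an assumption, not a documented guarantee.
estimate_master_ram_mib() {
    files=$1
    echo $(( files * 300 / 1000000 ))
}

estimate_master_ram_mib 25000000   # 25 million files
```

For 25 million files this prints 7500, i.e. a little over 7 GiB of RAM for the master.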
The necessary size of HDD depends both on the number of files and chunks used (main metadata file) and on the number of operations made on the files (metadata changelog); for example the space of 20GiB is enough for storing information for 25 million files and for changelogs to be kept for up to 50 hours.
Metalogger
The MooseFS metalogger only gathers metadata backups from the MooseFS master server, so its hardware requirements are no higher than those of the master server itself; it needs about the same disk space. As with the master server, the OS has to be POSIX compliant (Linux, FreeBSD, Mac OS X, OpenSolaris, etc.).
If you would like to use the metalogger machine as a master server when the primary master fails, it should have at least the same amount of RAM and HDD space as the main master server.
Chunkservers
Chunkserver machines should have appropriate disk space (dedicated exclusively to MooseFS) and a POSIX compliant OS (verified so far: Linux, FreeBSD, Mac OS X and OpenSolaris).
Minimal configuration should start from several gigabytes of storage space (only disks with more than 256 MB and chunkservers reporting more than 1 GB of total free space are accessible for new data).
Clients (mfsmount)
mfsmount requires FUSE to work. FUSE is available on several operating systems: Linux, FreeBSD, OpenSolaris and Mac OS X.
The preferred MooseFS deployment method is installation from the source.
Source package supports standard ./configure && make && make install procedure. Significant configure options are:
--disable-mfsmaster - don't build managing server (useful for plain node installation)
--disable-mfschunkserver - don't build chunkserver
--disable-mfsmount - don't build mfsmount and mfstools (they are built by default if fuse development package is detected)
--enable-mfsmount - make sure to build mfsmount and mfstools (error is reported if fuse development package cannot be found)
--prefix=DIRECTORY - install to given prefix (default is /usr/local)
--sysconfdir=DIRECTORY - select configuration files directory (default is ${prefix}/etc)
--localstatedir=DIRECTORY - select top variable data directory (default is ${prefix}/var; MFS metadata are stored in mfs subdirectory, i.e. ${prefix}/var/mfs by default)
--with-default-user=USER - user to run daemons as if not set in configuration files (default is nobody)
--with-default-group=GROUP - group to run daemons as if not set in configuration files (default is nogroup)
For example, to install MooseFS using system FHS-compliant paths on Linux, use:
./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var/lib
make install respects the standard DESTDIR= variable, which allows installing the package into a temporary location (e.g. in order to create a binary package). Existing configuration or metadata files will not be overwritten.
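The build and staged install described above could be scripted roughly as follows. This is a sketch under assumptions: the staging directory and the run and build_and_stage helpers are invented for illustration, and by default the steps are only printed (DRY_RUN=1) so the script can be tried outside a MooseFS source tree.

```shell
# Hypothetical staging directory; any empty, writable path would do.
stage_dir="${TMPDIR:-/tmp}/mfs-stage"

# Print each step instead of executing it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Configure with FHS-style paths, build, then install under $stage_dir
# instead of the live file system.
build_and_stage() {
    run ./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var/lib &&
    run make &&
    run make install DESTDIR="$stage_dir"
}

build_and_stage
```

With DRY_RUN=0, run from the top of the unpacked source tree, the files end up under the staging directory (for example under its usr and etc subdirectories), ready to be packed into a binary package.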
As the managing server (master) is a crucial element of MooseFS, it should be installed on a machine which guarantees high reliability and access requirements which are adequate for the whole system. It is advisable to use a server with a redundant power supply, ECC memory, and disk array RAID1/RAID5/RAID10. The managing server OS has to be POSIX compliant (systems verified so far: Linux, FreeBSD, Mac OS X and OpenSolaris).
To install the managing process one needs to:
install the mfs-master package (make install after running configure without the --disable-mfsmaster option in case of installation from source files)
create a user with whose permissions the service is to run (if such a user doesn't already exist)
make sure that the directory for the metadata files exists and is writable by this user (make install run as root does this for the user and paths set up by configure, if the user existed before)
configure the service (file mfsmaster.cfg), paying special attention to the TCP ports in use
add or create (depending on both the operating system and the distribution) a script that starts the mfsmaster process
After the installation the managing server is started by running mfsmaster. If the mfsmaster program is executed by the root, it switches to a configured user; otherwise it runs as a user who executed it. In case of a server outage or an improper shutdown the mfsmetarestore utility will restore the file system information.
Metalogger
The metalogger daemon is installed together with mfs-master. Its minimal requirements are no higher than those of the master itself; it can be run on any machine (e.g. any chunkserver), but the best place is the backup machine for the MooseFS master (so in case of primary master machine failure it's possible to run the master process in place of the metalogger - see appropriate for details).
To install the metalogger process one needs to:
After the installation the metalogger is started by running mfsmetalogger. If the mfsmetalogger program is executed by root, it switches to a configured user; otherwise it runs as the user who executed it.
Chunkservers
When the managing server is installed, data servers (chunkservers) may be set up. These machines should have appropriate free space on disks and a POSIX compliant OS (verified so far: Linux, FreeBSD, Mac OS X and OpenSolaris). A chunkserver stores data chunks/fragments as files on a common file system (e.g. ext3, xfs, ufs). It is important to dedicate the file systems used by MooseFS exclusively to it; this is necessary to manage the free space properly, because MooseFS does not take into account that free space accessible to it could be taken by other data. If it's not possible to create a separate disk partition, file systems in files can be used (have a look at the following instructions).
Linux:
creating:
dd if=/dev/zero of=file bs=100M seek=400 count=0
mkfs -t ext3 file
mounting:
mount -o loop file mount-point
FreeBSD:
creating and mounting:
dd if=/dev/zero of=file bs=100m count=400
mdconfig -a -t vnode -f file -u X
newfs -m0 -O2 /dev/mdX
mount /dev/mdX mount-point
mounting a previously created file system:
mdconfig -a -t vnode -f file -u X
mount /dev/mdX mount-point
Mac OS X:
Start "Disk Utility" from "/Applications/Utilities"
Select from menu "Images->New->Blank Image ..."
Note: on each chunkserver disk some space is reserved for growing chunks and thus inaccessible for creation of the new ones. Only disks with more than 256 MB and chunkservers reporting more than 1 GB of total free space are accessible for new data. Minimal configurations should start from several gigabytes of storage.
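The two thresholds in the note can be expressed as a small check. This is a sketch only, not the logic MooseFS itself runs: the function name is hypothetical, 1 GB is taken as 1024 MB, and the total is assumed to be the sum of free space over all disks of the chunkserver.

```shell
# usage: chunkserver_accepts_new_data FREE_MB [FREE_MB ...]
# Succeeds if at least one disk has more than 256 MB free and the
# disks together report more than 1 GB (taken as 1024 MB) free.
chunkserver_accepts_new_data() {
    total=0
    usable=0
    for free_mb in "$@"; do
        total=$(( total + free_mb ))
        if [ "$free_mb" -gt 256 ]; then
            usable=$(( usable + 1 ))
        fi
    done
    [ "$usable" -gt 0 ] && [ "$total" -gt 1024 ]
}

if chunkserver_accepts_new_data 200 900; then
    echo "accepts new data"
else
    echo "not used for new data"
fi
```

In this example the second disk passes the per-disk limit and the total is 1100 MB, so the check succeeds; two disks with 300 MB free each would fail on the total.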
To install the data server (chunkserver):
isolate the space intended for MooseFS as separate file systems, mounted at defined points (e.g. /mnt/hd1, /mnt/hd2, etc.)
install the mfs-chunkserver package (make install after running configure without the --disable-mfschunkserver option in case of installation from source files)
create a user with whose permissions the service is to run (if such a user doesn't already exist)
give this user permission to write to all file systems dedicated to MooseFS
configure the service (mfschunkserver.cfg file), paying special attention to the TCP ports used (MASTER_PORT has to be the same as MATOCS_LISTEN_PORT in mfsmaster.cfg on the managing server)
enter the list of mount points of the file systems dedicated to MooseFS in the mfshdd.cfg file
add or create (depending on both the operating system and the distribution) a script that starts the mfschunkserver process
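The configuration steps above might produce files like the ones written by this sketch. All values are placeholders: the host name mfsmaster and port 9420 are assumptions that must match your own mfsmaster.cfg, the mount-point list file is assumed to be named mfshdd.cfg as in stock MooseFS 1.6 installations, and the files are written to a temporary directory rather than the real configuration directory.

```shell
# Write example files into a scratch directory for inspection;
# copy them to the real sysconfdir only after adjusting the values.
cfg_dir=$(mktemp -d)

cat > "$cfg_dir/mfschunkserver.cfg" <<'EOF'
# Address of the managing server; must not be a loopback address.
MASTER_HOST = mfsmaster
# Must equal MATOCS_LISTEN_PORT in mfsmaster.cfg on the managing server.
MASTER_PORT = 9420
EOF

# One mount point per line, each dedicated exclusively to MooseFS.
cat > "$cfg_dir/mfshdd.cfg" <<'EOF'
/mnt/hd1
/mnt/hd2
EOF

echo "example configuration written to $cfg_dir"
```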
Note: It's important which local IP address mfschunkserver uses to connect to mfsmaster. This address is passed by mfsmaster to MFS clients (mfsmount) and other chunkservers to communicate with the chunkserver, so it must be remotely accessible. Thus the master address (MASTER_HOST) must be set to one for which the chunkserver will use a proper local address to connect, usually one belonging to the same network as all MFS clients and other chunkservers. Generally a loopback address (localhost, 127.0.0.1) can't be used as MASTER_HOST, as it would make the chunkserver inaccessible to any other host (such a configuration would work only on a single machine running all of mfsmaster, mfschunkserver and mfsmount).
After installation data server is started simply by running mfschunkserver. If mfschunkserver program is executed by root, it switches to configured user; otherwise it runs as user who executed it.
Clients (mfsmount)
mfsmount requires FUSE to work. FUSE is available on several operating systems: Linux, FreeBSD, OpenSolaris and Mac OS X, with the following notes:
In case of Linux a kernel module with API version at least 7.8 is required (it can be checked with the dmesg command - after loading the kernel module there should be a line fuse init (API version 7.8)). It is available in fuse package version 2.6.0 or later, or Linux kernel 2.6.20 or later. Due to some minor bugs, a newer module is recommended (fuse 2.7.2 or Linux 2.6.24, although standalone fuse 2.7.x doesn't contain the getattr/write race condition fix).
In case of FreeBSD the module version 0.3.9 or later should be used.
On Mac OS X 10.5 and 10.6 MacFUSE was tested.
Installing MooseFS client:
install mfs-client package (make install after running configure without --disable-mfsmount option in case of installation from source files)
create a directory where MooseFS will be mounted (e.g. /mnt/mfs)
MooseFS is mounted with the following command:
mfsmount [-h master] [-p port] [-l path] [-w mount-point]
where master is the host name of the managing server, port is the same as given in MATOCU_LISTEN_PORT in file mfsmaster.cfg, path is mounted MooseFS subdirectory (default is /, which means mounting the whole file system), mount-point is the previously created directory for MooseFS.
USING MOOSEFS
Mounting the File System
After launching the managing server and the data servers (chunkservers; one is required, but at least two are recommended), one can mount the file system by starting the mfsmount process. MooseFS is mounted with the following command:
mfsmount mountpoint [-d] [-f] [-s] [-m] [-n] [-p] [-H MASTER] [-P PORT] [-S PATH] [-o OPT[,OPT...]]
where MASTER is the host name of the managing server, PORT is the same as given in MATOCU_LISTEN_PORT in file mfsmaster.cfg, PATH is mounted MooseFS subdirectory (default is /, which means mounting the whole file system), mountpoint is the previously created directory for MooseFS.
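As an illustration of how the options fit together, the helper below assembles an mfsmount command line from its parts and prints it. The helper name mfsmount_cmd is made up, and the host name, port and mount point are placeholder values; only the -H, -P and -S options from the synopsis above are used.

```shell
# usage: mfsmount_cmd MOUNTPOINT MASTER PORT [PATH]
# Prints the mfsmount invocation; PATH defaults to / (the whole file system).
mfsmount_cmd() {
    mountpoint=$1
    master=$2
    port=$3
    subdir=${4:-/}
    printf 'mfsmount %s -H %s -P %s -S %s\n' "$mountpoint" "$master" "$port" "$subdir"
}

mfsmount_cmd /mnt/mfs mfsmaster 9421
```

This prints mfsmount /mnt/mfs -H mfsmaster -P 9421 -S /, which can be reviewed and then run on a client where the mount point directory already exists.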
By starting the mfsmount process with the -m (or -o mfsmeta) option one can mount the auxiliary file system MFSMETA (which may be useful to restore a file accidentally deleted from the MooseFS volume or to free some space by removing a file before elapsing the quarantine time), for example:
mfsmount -m /mnt/mfsmeta
Basic operations
After mounting the file system one can perform all standard file operations (like creating files, copying, deleting, changing names, etc.). MooseFS is a networking file system, so operations progress may be slower than in a local system.
Free space on the MooseFS volume can be checked the same way as for local file systems, e.g. with the df command:
$ df -h | grep mfs
mfsmaster:9421 85T 80T 4.9T 95% /mnt/mfs
mfsmaster:9321 394G 244G 151G 62% /mnt/mfs-test
What is important is that each file can be stored in more than one copy. In such cases it takes up correspondingly more space than its nominal size. Additionally, files deleted during the quarantine time are kept in a "trash can", so they also take up space (their size also depends on the number of copies). Just like in other Unix file systems, in case of deleting a file opened by some other process, data is stored at least until the file is closed.
You may also like to have a look at this FAQ entry: When doing df -h on a filesystem the results are different from what I would expect.
Operations specific for MooseFS
Setting the goal
The "goal" (i.e. the number of copies for a given file) can be verified by the mfsgetgoal command and changed with the mfssetgoal command:
$ mfsgetgoal /mnt/mfs-test/test1
/mnt/mfs-test/test1: 2
$ mfssetgoal 3 /mnt/mfs-test/test1
/mnt/mfs-test/test1: 3
$ mfsgetgoal /mnt/mfs-test/test1
/mnt/mfs-test/test1: 3
Similar operations can be done on the whole directory trees with the mfsgetgoal -r and mfssetgoal -r commands:
$ mfsgetgoal -r /mnt/mfs-test/test2
/mnt/mfs-test/test2:
files with goal 2 : 36
directories with goal 2 : 1
$ mfssetgoal -r 3 /mnt/mfs-test/test2
/mnt/mfs-test/test2:
inodes with goal changed: 37
inodes with goal not changed: 0
inodes with permission denied: 0
$ mfsgetgoal -r /mnt/mfs-test/test2
/mnt/mfs-test/test2:
files with goal 3 : 36
directories with goal 3 : 1
The actual number of copies of a file can be verified with the mfscheckfile and mfsfileinfo commands:
$ mfscheckfile /mnt/mfs-test/test1
/mnt/mfs-test/test1:
3 copies: 1 chunks
$ mfsfileinfo /mnt/mfs-test/test1
/mnt/mfs-test/test1:
chunk 0: 00000000000520DF_00000001 / (id:336095 ver:1)
copy 1: 192.168.0.12:9622
copy 2: 192.168.0.52:9622
copy 3: 192.168.0.54:9622
Note: a zero length file contains no data, so despite the non-zero "goal" setting for such file, these commands will return an empty result.
When the number of copies of an already existing file is changed, the data will be replicated or deleted accordingly, with a delay. This can be verified using the commands described above.
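Because replication happens with a delay, it can be handy to extract the lowest copy count from the mfscheckfile output and compare it with the goal. The filter below is a sketch that assumes the output format shown in the example above (lines of the form "N copies: M chunks"); the function name min_copies is made up, and other MooseFS versions may print a different format.

```shell
# Read mfscheckfile output on stdin and print the smallest copy count
# found on any "N copies: M chunks" line; fail if no such line exists
# (e.g. for a zero-length file).
min_copies() {
    awk '
        $2 == "copies:" {
            n = $1 + 0
            if (!seen || n < min) min = n
            seen = 1
        }
        END {
            if (!seen) exit 1
            print min
        }
    '
}

# Sample input copied from the example above; on a real client one
# would run: mfscheckfile /mnt/mfs-test/test1 | min_copies
printf '%s\n' '/mnt/mfs-test/test1:' ' 3 copies: 1 chunks' | min_copies
```

Once the printed value equals the goal reported by mfsgetgoal, every chunk of the file has reached the requested number of copies.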
The "goal" set for a directory is inherited by new files and directories created within it (it does not change the number of copies of already existing files).
The summary of the contents of the whole tree (an enhanced equivalent of du -s, with information specific for MooseFS) can be called up with the command mfsdirinfo:
$ mfsdirinfo /mnt/mfs-test/test/
/mnt/mfs-test/test/:
inodes: 15
directories: 4
files: 8
chunks: 6
length: 270604
size: 620544
realsize: 1170432
The above summary displays the number of the directories, files, data fragments (chunks) used by the files, as well as the size of the disk's space taken by files in the directory (length - the sum of file sizes, size - with block size taken into account, realsize - total disk space utilization considering all copies of chunks).
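Dividing realsize by size gives a rough idea of how many copies the data in a tree has on average. The filter below is a sketch based on the field names in the summary above; the function name replication_ratio is made up, and the ratio is only an approximation of the effective goal.

```shell
# Read mfsdirinfo output on stdin and print realsize / size with
# two decimals; fail if no positive "size:" value was seen.
replication_ratio() {
    awk '
        $1 == "size:"     { size = $2 }
        $1 == "realsize:" { real = $2 }
        END {
            if (size > 0) printf "%.2f\n", real / size
            else exit 1
        }
    '
}

# Values copied from the summary above; on a real client one would
# run: mfsdirinfo /mnt/mfs-test/test/ | replication_ratio
printf '%s\n' 'length: 270604' 'size: 620544' 'realsize: 1170432' | replication_ratio
```

For the summary shown above this prints 1.89, which suggests that most, but not all, of the data in that tree is stored in two copies.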
Setting quarantine time for trash bin
The quarantine time for which a deleted file is kept in the "trash can" can be verified with the mfsgettrashtime command and changed with mfssettrashtime:
$ mfsgettrashtime /mnt/mfs-test/test1
/mnt/mfs-test/test1: 604800
$ mfssettrashtime 0 /mnt/mfs-test/test1
/mnt/mfs-test/test1: 0
$ mfsgettrashtime /mnt/mfs-test/test1
/mnt/mfs-test/test1: 0
These tools also have a recursive option that operates on whole directory trees:
$ mfsgettrashtime -r /mnt/mfs-test/test2
/mnt/mfs-test/test2:
files with trashtime 0 : 36
directories with trashtime 604800 : 1
$ mfssettrashtime -r 1209600 /mnt/mfs-test/test2
/mnt/mfs-test/test2:
inodes with trashtime changed: 37
inodes with trashtime not changed: 0
inodes with permission denied: 0
$ mfsgettrashtime -r /mnt/mfs-test/test2
/mnt/mfs-test/test2:
files with trashtime 1209600 : 36
directories with trashtime 1209600 : 1
Time is given in seconds (useful values: 1 hour is 3600 seconds, 24h - 86400 seconds, 1 week - 604800 seconds). Just as in the case of the number of copies, the storing time set for a directory is inherited for newly created files and directories. The number 0 means that a file after the removal will be deleted immediately and its recovery will not be possible.
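Since the tools expect seconds, a small helper can do the conversion from more readable units. This is a sketch; the function name trashtime_seconds and its unit letters (h, d, w) are invented for illustration and are not part of the MooseFS tools.

```shell
# usage: trashtime_seconds VALUE UNIT   (UNIT is h, d or w)
# Prints the equivalent number of seconds.
trashtime_seconds() {
    value=$1
    unit=$2
    case $unit in
        h) echo $(( value * 3600 )) ;;
        d) echo $(( value * 86400 )) ;;
        w) echo $(( value * 604800 )) ;;
        *) echo "unknown unit: $unit" >&2; return 1 ;;
    esac
}

trashtime_seconds 2 w
# On a mounted volume the result could be passed on directly, e.g.:
#   mfssettrashtime -r "$(trashtime_seconds 2 w)" /mnt/mfs-test/test2
```

Two weeks come out as 1209600 seconds, the value used in the recursive example above.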
Removed files may be accessed through a separately mounted MFSMETA file system. In particular it contains directories /trash (containing information about deleted files that are still being stored) and /trash/undel (designed for retrieving files). Only the administrator has access to MFSMETA (user with uid 0, usually root).
$ mfssettrashtime 3600 /mnt/mfs-test/test1
/mnt/mfs-test/test1: 3600
$ rm /mnt/mfs-test/test1
$ ls /mnt/mfs-test/test1
ls: /mnt/mfs-test/test1: No such file or directory
# ls -l /mnt/mfs-test-meta/trash/*test1
-rw-r--r-- 1 user users 1 2007-08-09 15:23 /mnt/mfs-test-meta/trash/00013BC7|test1
The name of the file that is still visible in the "trash" directory consists of an 8-digit hexadecimal i-node number and a path to the file relative to the mounting point with characters / replaced with the | character. If such a name exceeds the limits of the operating system (usually 255 characters), the initial part of the path is deleted.
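The naming rule can be reproduced with printf and tr, which helps when looking for a particular deleted file. This is a sketch: the function name trash_entry_name is made up, the i-node number is passed in decimal, and the shortening of names that exceed the operating system limit is not modelled.

```shell
# usage: trash_entry_name INODE PATH
# INODE is a decimal i-node number, PATH is relative to the mount point.
# Prints the 8-digit hexadecimal i-node, a |, and PATH with / replaced by |.
trash_entry_name() {
    inode=$1
    path=$2
    printf '%08X|%s\n' "$inode" "$(printf '%s' "$path" | tr '/' '|')"
}

trash_entry_name 80839 test1
trash_entry_name 80839 test/test2
```

I-node 80839 is 00013BC7 in hexadecimal, so the first call prints 00013BC7|test1, matching the listing above, and the second prints 00013BC7|test|test2.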
The full path of the file in relation to the mounting point can be read or saved by reading or saving this special file:
# cat '/mnt/mfs-test-meta/trash/00013BC7|test1'
test1
# echo 'test/test2' > '/mnt/mfs-test-meta/trash/00013BC7|test1'
# cat '/mnt/mfs-test-meta/trash/00013BC7|test1'
test/test2
Moving this file to the trash/undel subdirectory restores the original file in the proper MooseFS file system, at the path set in the way described above or at the original path (if it was not changed).
Note: if a new file with the same path already exists, restoring of the file will not succeed.
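The restore step is a plain mv inside the MFSMETA mount, so it can be wrapped in a one-line helper. This is a sketch: the function name mfs_undelete is made up, it has to be run as root on a real MFSMETA mount, and it does not check beforehand whether a file with the same path already exists.

```shell
# usage: mfs_undelete META_MOUNT ENTRY
# META_MOUNT is the MFSMETA mount point, ENTRY the file name as listed
# in its trash directory (quote it, because it contains | characters).
mfs_undelete() {
    meta_mount=$1
    entry=$2
    mv -- "$meta_mount/trash/$entry" "$meta_mount/trash/undel/"
}

# Example call on a real MFSMETA mount (not executed here):
#   mfs_undelete /mnt/mfs-test-meta '00013BC7|test1'
```

If the mv fails, the helper returns a non-zero status, which a calling script can use to report that the file was not restored.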
Deleting the file from the "trash can" results in releasing space previously taken up by it (with a delay - the data is deleted asynchronously). In such cases it is impossible to restore the file.
It is also possible to change the number of copies or the time of storing files in the "trash can" with mfssetgoal and mfssettrashtime tools (like for the files on the proper MooseFS).
Besides the trash and trash/undel directories, MFSMETA holds a third directory, reserved, with files intended for final removal but still open. These files will be erased and their data will be deleted immediately after the last user closes them. Files in the reserved directory are named the same way as those in trash, but no further operations are possible for these files.
Taking snapshots
Another characteristic feature of the MooseFS system is the possibility of taking a snapshot of the file or directory tree with the mfsmakesnapshot command:
$ mfsmakesnapshot source ... destination
(In case of normal file duplication, data of the file can be changed by another process writing to the source file. mfsmakesnapshot prepares a copy of the whole file (or files) in one operation. Furthermore, until modification of any of the files takes place, the copy does not take up any additional space.)
After such operation, subsequent writes to the source file do not modify the copy (nor vice versa).
Alternatively, file snapshots can be created using mfsappendchunks utility, which works like mfssnapshot known from MooseFS 1.5:
$ mfsappendchunks destination-file source-file ...
When multiple source files are given, their snapshots are added to the same destination file, padding each to chunk boundary (64MB).
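The effect of the padding on the destination size can be estimated with integer arithmetic. This is a sketch that assumes the "64MB" chunk boundary means 64 MiB (67108864 bytes); the names CHUNK_BYTES and padded_length are made up for illustration.

```shell
# Chunk size in bytes; assumes the 64MB boundary is 64 MiB.
CHUNK_BYTES=$(( 64 * 1024 * 1024 ))

# usage: padded_length BYTES
# Prints BYTES rounded up to the next multiple of the chunk size.
padded_length() {
    len=$1
    echo $(( (len + CHUNK_BYTES - 1) / CHUNK_BYTES * CHUNK_BYTES ))
}

padded_length 1           # a 1-byte source occupies one whole chunk slot
padded_length 67108865    # one byte over a chunk needs two slots
```

The two calls print 67108864 and 134217728, so even a very small source file moves the start of the next appended file to the next 64 MiB boundary.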
Additional attributes
Additional attributes of a file or directory (noowner, noattrcache, noentrycache) can be checked, set or deleted using the mfsgeteattr, mfsseteattr and mfsdeleattr utilities, which behave similarly to mfsgetgoal/mfssetgoal or mfsgettrashtime/mfssettrashtime. See the mfstools manual page for details.
MOOSEFS MAINTENANCE
Starting MooseFS cluster
The safest way to start MooseFS (avoiding any read or write errors, inaccessible data or similar problems) is to run the following commands in this sequence:
To safely stop MooseFS:
Provided that there are no files with a goal lower than 2 and no under-goal files (which can be checked with the mfsgetgoal -r and mfsdirinfo commands), it is possible to stop or restart a single chunkserver at any time. When you need to stop or restart another chunkserver afterwards, be sure that the previous one is connected and there are no under-goal chunks.
MooseFS metadata backups
There are two general parts of metadata: the main metadata file and the metadata changelogs.
The main metadata file needs regular backups with the frequency depending on how many hourly changelogs are stored. Metadata changelogs should be automatically replicated in real time. Since MooseFS 1.6.5, both tasks are done by mfsmetalogger daemon.
MooseFS master recovery
In case of an mfsmaster crash (due to e.g. host or power failure), the last metadata changelog needs to be merged into the main metadata file. This can be done with the mfsmetarestore utility; the simplest way to use it is:
$ mfsmetarestore -a
If the master data are stored in a location other than the one specified during MooseFS compilation, the actual path needs to be specified using the -d option, e.g.:
$ mfsmetarestore -a -d /storage/mfsmaster
MooseFS master recovery from a backup
In order to restore the master host from a backup:
$ mfsmetarestore -m metadata.mfs.back -o metadata.mfs changelog.*.mfs
Please also read a mini howto about preparing . In that document we present a solution using CARP and in which metalogger takes over functionality of the broken master server.