Category: LINUX
2009-11-19 11:33:44
MFS Deployment
The preferred MFS deployment method is installation from source.
The source package supports the standard ./configure && make && make install procedure. Significant configure options are:
For example, to install MFS using system FHS-compliant paths on Linux, use: ./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var/lib
make install respects the standard DESTDIR= variable, which allows the package to be installed in a temporary location (e.g. in order to create a binary package). Existing configuration and metadata files won't be overwritten.
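The staged install described above might be scripted roughly as follows. This is only a sketch: the function name mfs_staged_install and the default staging directory /tmp/mfs-stage are illustrative assumptions, while the configure options are the ones quoted above. Without the second argument it only prints the commands.

```shell
# Hypothetical helper: build MFS and install it under a staging root.
# Usage: mfs_staged_install [stage-dir] [run]
# Without "run" the commands are printed instead of executed (dry run).
mfs_staged_install() (
    stage=${1:-/tmp/mfs-stage}   # assumed staging directory (no spaces)
    mode=${2:-print}
    set -- \
        "./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var/lib" \
        "make" \
        "make install DESTDIR=$stage"
    for cmd in "$@"; do
        if [ "$mode" = run ]; then
            sh -c "$cmd" || exit 1   # stop at the first failing step
        else
            printf '%s\n' "$cmd"
        fi
    done
)
```

Calling mfs_staged_install /tmp/pkgroot run from the unpacked source directory would leave the installed tree under /tmp/pkgroot, ready to be packed into a binary package.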
Managing server (master)
As the managing server (master) is a crucial element of MFS, it should be installed on a machine that offers high reliability and meets the access requirements of the whole system. It is advisable to use a server with a redundant power supply, ECC memory, and a RAID1/RAID5/RAID10 disk array. The managing server OS has to be POSIX compliant (systems verified so far: Linux, FreeBSD, Mac OS X and OpenSolaris).
To install the managing process one needs to:
After installation the managing server is started by running mfsmaster. If the mfsmaster program is executed by root, it switches to a configured user; otherwise it runs as the user who executed it. After a server outage or an improper shutdown, the mfsmetarestore utility restores the file system information.
Data servers (chunkservers)
When the managing server is installed, data servers (chunkservers) may be set up. These machines should have adequate free disk space and a POSIX compliant OS (verified so far: Linux, FreeBSD, Mac OS X and OpenSolaris). A chunkserver stores data chunks/fragments as files on an ordinary file system (e.g. ext3, xfs, ufs). It is important to dedicate the file systems used by MFS exclusively to it - this is necessary to manage the free space properly. MFS does not take into account that free space available to it could be consumed by other data. If it's not possible to create a separate disk partition, file systems stored in files can be used (see the following instructions).
Linux:
dd if=/dev/zero of=file bs=100M seek=400 count=0
mkfs -t ext3 file
mount -o loop file mount-point
FreeBSD:
Creating and mounting:
dd if=/dev/zero of=file bs=100m count=400
mdconfig -a -t vnode -f file -u X
newfs -m0 -O2 /dev/mdX
mount /dev/mdX mount-point
Mounting again after a reboot:
mdconfig -a -t vnode -f file -u X
mount /dev/mdX mount-point
Mac OS X:
Start "Disk Utility" from "/Applications/Utilities"
Select from menu "Images->New->Blank Image ..."
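The Linux recipe above relies on dd producing a sparse file: with count=0 nothing is written, and seek only sets the file length (400 blocks of 100 MiB, i.e. 40000 MiB in the recipe). The sketch below repeats the same idiom at a much smaller scale so that it can be tried without root; the temporary directory and the file name chunk.img are illustrative only.

```shell
# Scaled-down version of the sparse-file idiom: 400 blocks of 1024 bytes.
workdir=$(mktemp -d)
img="$workdir/chunk.img"              # hypothetical file name

# count=0 copies no data; seek=400 sets the file length to 400 blocks.
dd if=/dev/zero of="$img" bs=1024 seek=400 count=0 2>/dev/null

apparent=$(( $(wc -c < "$img") ))     # logical (apparent) size in bytes
echo "apparent size: $apparent bytes" # prints: apparent size: 409600 bytes

rm -rf "$workdir"                     # clean up the demo file
```

The FreeBSD recipe uses count=400 instead, which actually writes the zeros and so allocates the full size up front.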
Note: on each chunkserver disk some space is reserved for growing chunks and is therefore unavailable for creating new ones. Only disks with more than 256 MB and chunkservers reporting more than 1 GB of total free space are accessible for new data. Minimal configurations should start from several gigabytes of storage.
To install the data server (chunkserver):
Note: It matters which local IP address mfschunkserver uses to connect to mfsmaster. mfsmaster passes this address to MFS clients (mfsmount) and to other chunkservers so that they can communicate with the chunkserver, so it must be remotely accessible. The master address (MASTER_HOST) must therefore be one that makes the chunkserver connect from a suitable local address - usually one belonging to the same network as all MFS clients and other chunkservers. In general a loopback address (localhost, 127.0.0.1) can't be used as MASTER_HOST, as it would make the chunkserver inaccessible to any other host (such a configuration works only on a single machine running all of mfsmaster, mfschunkserver and mfsmount).
After installation the data server is started simply by running mfschunkserver. If the mfschunkserver program is executed by root, it switches to a configured user; otherwise it runs as the user who executed it.
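A simple pre-flight check for the loopback pitfall could look like the sketch below. The function names are hypothetical and the test is purely textual (it does not resolve host names), so it only catches the obvious cases named in the note.

```shell
# Hypothetical check of a MASTER_HOST value before starting mfschunkserver.
# Returns 0 (true) when the value looks like a loopback name or address.
is_loopback_host() {
    case "$1" in
        localhost|localhost.*|127.*|::1) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints a verdict and returns non-zero for loopback values.
check_master_host() {
    if is_loopback_host "$1"; then
        echo "MASTER_HOST=$1: loopback, only usable on a single-machine setup"
        return 1
    fi
    echo "MASTER_HOST=$1: ok"
}

check_master_host 192.168.0.12    # prints: MASTER_HOST=192.168.0.12: ok
```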
Clients (mfsmount)
mfsmount requires FUSE to work. FUSE is available on several operating systems: Linux, FreeBSD, OpenSolaris and Mac OS X.
Installing MFS client:
MFS is mounted with the following command:
mfsmount [-h master] [-p port] [-l path] [-w mount-point]
where master is the host name of the managing server, port is the same as given in MATOCU_LISTEN_PORT in the file mfsmaster.cfg, path is the MFS subdirectory to mount (default is /, which means mounting the whole file system), and mount-point is a previously created directory for MFS.
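To show how the four options combine, here is a small sketch that assembles an mfsmount command line from whichever values are supplied. The wrapper mfsmount_cmd, the host name mfsmaster.example and the paths are made up for illustration; only the option letters come from the synopsis above.

```shell
# Hypothetical wrapper: prints an mfsmount command line, adding only the
# options whose values were supplied (pass "" to skip one).
# Usage: mfsmount_cmd master port subdir mount-point
mfsmount_cmd() (
    cmd=mfsmount
    if [ -n "$1" ]; then cmd="$cmd -h $1"; fi   # managing server host
    if [ -n "$2" ]; then cmd="$cmd -p $2"; fi   # MATOCU_LISTEN_PORT value
    if [ -n "$3" ]; then cmd="$cmd -l $3"; fi   # MFS subdirectory to mount
    if [ -n "$4" ]; then cmd="$cmd -w $4"; fi   # local mount point
    printf '%s\n' "$cmd"
)

mfsmount_cmd mfsmaster.example "" /projects /mnt/mfs-projects
# prints: mfsmount -h mfsmaster.example -l /projects -w /mnt/mfs-projects
```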
Using MFS:
Mounting the File System
After launching the managing server and the data servers (chunkservers; one is required, but at least two are recommended) one can mount the file system by starting the mfsmount process. The default mount point of MFS is the /mnt/mfs directory, but another path can be given with the -w path option, for example:
mfsmount -w /mnt/mfs-test
Furthermore, by starting the mfsmount process with the -m option one can mount the auxiliary file system MFSMETA (which may be useful to restore a file accidentally deleted from the MFS volume, or to free some space by removing a file before its quarantine time elapses). MFSMETA is mounted by default in the /mnt/mfsmeta directory, but another path can also be given with the -w path option, for example:
mfsmount -m -w /mnt/mfsmeta-test
Basic operations
After mounting the file system one can perform all standard file operations (creating files, copying, deleting, renaming, etc.). MFS is a network file system, so operations may be slower than on a local file system.
Free space on the MFS volume can be checked the same way as for local file systems, e.g. with the df command:
$ df -h | grep MFS
MFS 85T 80T 4.9T 95% /mnt/mfs
MFS 394G 244G 151G 62% /mnt/mfs-test
MFSMETA 51G 51G 0 100% /mnt/mfs-test-meta
Note that each file can be stored in more than one copy, in which case it takes up correspondingly more space than its nominal size. Additionally, files deleted within the quarantine time are kept in a "trash can", so they also take up space (their size also depends on the number of copies). Just like in other Unix file systems, when a file that is open in some other process is deleted, its data is kept at least until the file is closed.
Occupied space in the MFSMETA file system, as presented by df, is the space taken by the data in the "trash can", whereas free space is the space taken by data of files that should be deleted without further storing in the "trash can", but which are still open in some other processes.
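As a back-of-the-envelope illustration of the point about copies, raw disk usage can be estimated as file size times number of copies. The helper below is a hypothetical sketch that deliberately ignores block rounding and any per-chunk overhead, so real figures will be somewhat higher.

```shell
# Rough estimate: logical size in bytes multiplied by the number of copies.
# Block rounding and per-chunk overhead are ignored (assumption of this sketch).
raw_usage_bytes() {
    echo $(( $1 * $2 ))
}

raw_usage_bytes 1073741824 3    # 1 GiB file kept in 3 copies: prints 3221225472
```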
Operations specific for MFS
The "goal" (i.e. the number of copies for a given file) can be verified by the mfsgetgoal command and changed with the mfssetgoal command:
$ mfsgetgoal /mnt/mfs-test/test1
/mnt/mfs-test/test1: 2
$ mfssetgoal 3 /mnt/mfs-test/test1
/mnt/mfs-test/test1: 3
$ mfsgetgoal /mnt/mfs-test/test1
/mnt/mfs-test/test1: 3
Similar operations can be done on the whole directory trees with the mfsrgetgoal and mfsrsetgoal commands:
$ mfsrgetgoal /mnt/mfs-test/test2
/mnt/mfs-test/test2:
files with goal 2 : 36 (36)
directories with goal 2 : 1 (1)
$ mfsrsetgoal 3 /mnt/mfs-test/test2
/mnt/mfs-test/test2:
inodes with goal changed: 37 (37)
inodes with goal not changed: 0 (0)
inodes with permission denied: 0 (0)
$ mfsrgetgoal /mnt/mfs-test/test2
/mnt/mfs-test/test2:
files with goal 3 : 36 (36)
directories with goal 3 : 1 (1)
The actual number of copies of a file can be verified with the mfscheckfile and mfsfileinfo commands:
$ mfscheckfile /mnt/mfs-test/test1
/mnt/mfs-test/test1:
3 copies: 1 chunks
$ mfsfileinfo /mnt/mfs-test/test1
/mnt/mfs-test/test1:
chunk 0: 00000000000520DF_00000001 / (id:336095 ver:1)
copy 1: 192.168.0.12:9622
copy 2: 192.168.0.52:9622
copy 3: 192.168.0.54:9622
Note: a zero length file contains no data, so despite the non-zero "goal" setting for such files, these commands will return an empty result.
When the number of copies of an already existing file is changed, the data is replicated (or the extra copy is deleted) with a delay. Progress can be verified using the commands described above.
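Because replication is delayed, a script may want to check whether a file has reached its goal yet. The sketch below parses mfscheckfile output from standard input; it assumes the "N copies: M chunks" line format shown in the example above, and the function name chunks_below_goal is made up for illustration.

```shell
# Reads mfscheckfile output on stdin and prints how many chunks have fewer
# copies than the wanted goal (first argument).
# Assumes lines of the form "N copies: M chunks".
chunks_below_goal() {
    awk -v goal="$1" '
        $2 == "copies:" && $1 + 0 < goal + 0 { low += $3 }
        END { print low + 0 }
    '
}

# Usage on a mounted volume (path taken from the examples above):
#   mfscheckfile /mnt/mfs-test/test1 | chunks_below_goal 3
printf '%s\n' '/mnt/mfs-test/test1:' '2 copies: 1 chunks' | chunks_below_goal 3   # prints 1
```

A result of 0 means every chunk listed already has at least the requested number of copies.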
The "goal" set for a directory is inherited by new files and directories created within it (it does not change the number of copies of already existing files).
The summary of the contents of the whole tree (an enhanced equivalent of du -s, with information specific for MFS) can be called up with the command mfsdirinfo:
$ mfsdirinfo /mnt/mfs-test/test/
/mnt/mfs-test/test/:
inodes: 15 (15)
directories: 4 (4)
files: 8 (8)
good files: 7 (7)
under goal files: 0 (0)
missing files: 1 (1)
chunks: 6 (6)
good chunks: 5 (5)
under goal chunks: 0 (0)
missing chunks: 1 (1)
length: 264K (270604)
size: 606K (620544)
hdd usage: 1.1M (1170432)
The above summary displays the number of directories, files and data fragments (chunks) used by the files, together with their condition (good; under goal - with the number of copies lower than the set "goal"; missing - with data lost due to failure of all machines storing a given chunk), as well as the disk space taken by files in the directory (length - the sum of file sizes; size - with block size taken into account; hdd usage - total disk space utilization considering all copies of chunks).
The time of storing a deleted file can be verified by the mfsgettrashtime command and changed with mfssettrashtime:
$ mfsgettrashtime /mnt/mfs-test/test1
/mnt/mfs-test/test1: 604800
$ mfssettrashtime 0 /mnt/mfs-test/test1
/mnt/mfs-test/test1: 0
$ mfsgettrashtime /mnt/mfs-test/test1
/mnt/mfs-test/test1: 0
These tools also have their recursive equivalents mfsrgettrashtime and mfsrsettrashtime operating on whole directory trees:
$ mfsrgettrashtime /mnt/mfs-test/test2
/mnt/mfs-test/test2:
files with trashtime 604800 : 36 (36)
directories with trashtime 604800 : 1 (1)
$ mfsrsettrashtime 1209600 /mnt/mfs-test/test2
/mnt/mfs-test/test2:
inodes with trashtime changed: 37 (37)
inodes with trashtime not changed: 0 (0)
inodes with permission denied: 0 (0)
$ mfsrgettrashtime /mnt/mfs-test/test2
/mnt/mfs-test/test2:
files with trashtime 1209600 : 36 (36)
directories with trashtime 1209600 : 1 (1)
Time is given in seconds (useful values: 1 hour is 3600 seconds, 24 hours is 86400 seconds, 1 week is 604800 seconds). Just as with the number of copies, the storing time set for a directory is inherited by newly created files and directories. The value 0 means that a removed file is deleted immediately and cannot be recovered.
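Since the tools take plain seconds, a small conversion helper can make scripts easier to read. The function below is a hypothetical convenience, not part of MFS; the units it accepts are a choice of this sketch.

```shell
# Convert "<number> <unit>" to seconds for mfssettrashtime / mfsrsettrashtime.
trashtime_seconds() {
    case "$2" in
        hour|hours) echo $(( $1 * 3600 )) ;;
        day|days)   echo $(( $1 * 86400 )) ;;
        week|weeks) echo $(( $1 * 604800 )) ;;
        *) echo "unknown unit: $2" >&2; return 1 ;;
    esac
}

trashtime_seconds 2 weeks    # prints 1209600
# Usage on a mounted volume (path taken from the examples above):
#   mfsrsettrashtime "$(trashtime_seconds 2 weeks)" /mnt/mfs-test/test2
```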
Removed files may be accessed through a separately mounted MFSMETA file system. In particular it contains directories /trash (containing information about deleted files that are still being stored) and /trash/undel (designed for retrieving files). Only the administrator has access to MFSMETA (user with uid 0, usually root).
$ mfssettrashtime 3600 /mnt/mfs-test/test1
/mnt/mfs-test/test1: 3600
$ rm /mnt/mfs-test/test1
$ ls /mnt/mfs-test/test1
ls: /mnt/mfs-test/test1: No such file or directory
# ls -l /mnt/mfs-test-meta/trash/*test1
-rw-r--r-- 1 user users 1 2007-08-09 15:23 /mnt/mfs-test-meta/trash/00013BC7|test1
The name of the file that is still visible in the "trash" directory consists of an 8-digit hexadecimal i-node number and a path to the file relative to the mounting point with characters / replaced with the | character. If such a name exceeds the limits of the operating system (usually 255 characters), the initial part of the path is deleted.
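The naming rule can be expressed in a few lines of shell. This is a sketch of the rule as described above, not code taken from MFS: the function names are hypothetical and the 255-character truncation is not modelled.

```shell
# Build a trash entry name: 8-digit uppercase hex i-node number, "|", then
# the path relative to the mount point with "/" replaced by "|".
trash_entry_name() {
    printf '%08X|%s\n' "$1" "$(printf '%s' "$2" | tr '/' '|')"
}

# Recover the decimal i-node number from a trash entry name.
trash_entry_inode() {
    hex=${1%%|*}                 # part before the first "|"
    echo $(( 0x$hex ))
}

trash_entry_name 80839 test1           # prints 00013BC7|test1
trash_entry_inode '00013BC7|test1'     # prints 80839
```

Note that the | character must be quoted on the command line, as in the listings in this section.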
The full path of the file relative to the mount point can be read from this special file, and changed by writing to it:
# cat '/mnt/mfs-test-meta/trash/00013BC7|test1'
test1
# echo 'test/test2' > '/mnt/mfs-test-meta/trash/00013BC7|test1'
# cat '/mnt/mfs-test-meta/trash/00013BC7|test1'
test/test2
Moving this file to the trash/undel subdirectory restores the original file in the proper MFS file system - at the path set in the way described above, or at the original path if it was not changed.
Note: if a new file with the same path already exists, the restore will not succeed.
Deleting the file from the "trash can" results in releasing space previously taken up by it (with a delay - the data is deleted asynchronously). In such cases it is impossible to restore the file.
It is also possible to change the number of copies or the storing time of files in the "trash can" with the mfssetgoal and mfssettrashtime tools (as for files on the proper MFS).
Besides the trash and trash/undel directories, MFSMETA holds a third directory, reserved, containing files intended for final removal but still open. These files will be erased and their data deleted immediately after the last user closes them. Files in the reserved directory are named the same way as those in trash, but no further operations are possible on them.
Another characteristic feature of MFS is the possibility of taking a snapshot of a file with the mfssnapshot command:
$ mfssnapshot copy source-file
(In case of normal file duplication, data of the file can be changed by another process writing to the source file. mfssnapshot prepares a copy of the whole file in one operation. Furthermore, until modification of any of the files takes place, the copy does not take up any additional space.)
After such an operation subsequent writes to the source file do not modify the copy (nor vice versa).
MFS maintenance
Starting MFS cluster
The safest way to start MFS (avoiding any read or write errors, inaccessible data or similar problems) is:
Stopping MFS cluster
To safely stop MFS:
Maintenance of MFS chunkservers
Provided that there are no files with a goal lower than 2 and no under-goal files (which can be checked with the mfsrgetgoal and mfsdirinfo commands), it is possible to stop or restart a single chunkserver at any time. When you need to stop or restart another chunkserver afterwards, make sure that the previous one is connected again and that there are no under-goal chunks.
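Part of this precondition can be checked mechanically. The sketch below reads mfsdirinfo output on standard input and succeeds only if no under-goal or missing chunks are reported. It assumes the field names shown in the mfsdirinfo example earlier; the function name no_degraded_chunks is made up, and the check for files with a goal lower than 2 is not covered.

```shell
# Reads mfsdirinfo output on stdin; exit status 0 only when the
# "under goal chunks" and "missing chunks" counters are both zero.
# Note: empty input also yields 0, so feed it real mfsdirinfo output.
no_degraded_chunks() {
    awk -F': *' '
        { sub(/^[ \t]+/, "", $1) }                  # drop leading indentation
        $1 == "under goal chunks" || $1 == "missing chunks" { bad += $2 + 0 }
        END { exit (bad > 0) }
    '
}

# Usage on a mounted volume (path taken from the examples above):
#   mfsdirinfo /mnt/mfs-test | no_degraded_chunks && echo "safe to restart one chunkserver"
```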
MFS metadata backups
There are two general parts of metadata:
The main metadata file needs regular backups with the frequency depending on how many hourly changelogs are stored.
Metadata changelogs are automatically replicated to all chunkservers in real time (with changelog_csback.*.mfs names).
MFS master recovery
If mfsmaster crashes (e.g. due to a host or power failure), the last metadata changelog needs to be merged into the main metadata file. This can be done with the mfsmetarestore utility; the simplest way to use it is:
mfsmetarestore -a
If the master data is stored in a location other than the one specified during MFS compilation, the actual path needs to be given with the -d option, e.g.:
mfsmetarestore -a -d /storage/mfsmaster
MFS master recovery from a backup
In order to restore the master host from a backup: