A Lightweight Distributed File Store: FastDFS (fatsandwich, ChinaUnix blog)


Category: Database development

2011-01-11 20:11:02

FastDFS is a lightweight distributed file store written in C by a Chinese developer. It has only tracker and storage nodes and does not use a database.

Download address:

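The author's comparison with MogileFS

FastDFS borrowed some ideas from MogileFS in its design. FastDFS is a complete distributed file storage system whose files are read and written through a client API. It is fair to say that FastDFS offers every feature MogileFS has.

MogileFS website: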

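In addition, compared with MogileFS, FastDFS has the following characteristics and advantages:
1. FastDFS is fairly mature and can be used directly, without secondary development.
2. Compared with MogileFS, FastDFS drops the tracking database and has only two roles: tracker and storage. This architecture simplifies the system and also removes a performance bottleneck.
3. Adding a server of either role is easy. To add a tracker server, you only need to edit the storage and client configuration files (adding one tracker line). To add a storage server, you usually do not need to change any configuration file; the system automatically copies the files already in that volume to the new server.
4. FastDFS is more efficient than MogileFS, in the following respects:
1) As noted in point 2, FastDFS has no file index database, so its overall performance is higher.
2) In terms of implementation language, FastDFS is lower-level and more efficient. It is written in C, with fewer than 20,000 lines of code and no dependency on other open-source software or packages, so installation and deployment are very simple. MogileFS is written in Perl.
3) FastDFS communicates directly over sockets, which is more efficient than MogileFS's HTTP. FastDFS also transfers files with sendfile, a zero-copy mechanism that lowers system overhead and raises transfer efficiency.
5. FastDFS has detailed design and usage documentation, whereas MogileFS's documentation is relatively sparse.
6. FastDFS logs in great detail: every error that occurs while the system is running is written to the log file, which helps administrators locate problems.
7. FastDFS also stores and retrieves file attributes (metadata such as file size and image width and height), so applications do not need a database for this information.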
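
Point 4.3 above credits sendfile for the lower transfer overhead. As a rough illustration of that technique only (this is not FastDFS's own C code), the Python sketch below pushes a file into a connected socket with `socket.sendfile`, which uses the kernel's sendfile call where the platform supports it and otherwise falls back to ordinary `send`. The function name `send_file_zero_copy` is hypothetical.

```python
import socket


def send_file_zero_copy(sock: socket.socket, path: str) -> int:
    """Send the whole file at `path` over `sock` and return the bytes sent."""
    with open(path, "rb") as f:
        # socket.sendfile() delegates the copy to os.sendfile() when it is
        # available, so file data is not staged in a user-space buffer.
        return sock.sendfile(f)
```

The socket must be a connected, blocking stream socket; for large files the peer has to be reading concurrently, otherwise the call blocks once the socket buffer fills.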
----------------------------------------------------------------------------------------------------------------------
1. Download and unpack

wget

The build environment must be installed first, using the following commands:
wget
tar -zxvf FastDFS_v1.26.tar.gz

2. Build and install

cd FastDFS

./make.sh && ./make.sh install

3. Tracker configuration
1) Configuration file
vi /usr/local/FastDFS/conf/mytracker.conf
disabled=false
bind_addr=
port=22122
network_timeout=60
base_path=/home/fastdfs/
max_connections=256
store_lookup=0
store_group=group1
store_server=0
store_path=0
download_server=0
reserved_storage_space = 4GB
log_level=info
run_by_group=
run_by_user=
allow_hosts=*
sync_log_buff_interval=10

2) Startup script
vi /usr/local/FastDFS/start.sh
#!/bin/sh
/usr/local/bin/fdfs_trackerd /usr/local/config/tracker.conf
3) Restart
/usr/local/FastDFS/restart.sh /usr/local/bin/fdfs_trackerd /usr/local/config/tracker.conf

4. Storage configuration
1) Configuration file
vi /usr/local/config/storage.conf
disabled=false
group_name=group1
bind_addr=192.168.0.14
port=23000
network_timeout=60
heart_beat_interval=10
stat_report_interval=3
base_path=/home/fastdfs/
max_connections=256
sync_wait_msec=1
sync_interval=0
sync_start_time=00:00
sync_end_time=23:59
store_path_count=1
store_path0=/iscsi/dfs1/
subdir_count_per_path=256
tracker_server=192.168.0.55:22122
log_level=info
run_by_group=
run_by_user=
allow_hosts=192.168.0.[1-255]
file_distribute_path_mode=0
file_distribute_rotate_count=100
fsync_after_written_bytes=0
sync_log_buff_interval=10
sync_binlog_buff_interval=60
check_file_duplicate=0
key_namespace=FastDFS
keep_alive=0

2) Startup script
vi /usr/local/bin/start.sh
#!/bin/sh
/usr/local/bin/fdfs_storaged /usr/local/config/storage.conf

3) Restart command
/usr/local/bin/restart.sh /usr/local/bin/fdfs_storaged /usr/local/config/storage.conf

5. Run the test program
/usr/local/bin/fdfs_test
For example, to upload a test file:
/usr/local/bin/fdfs_test conf/storage.conf upload /etc/fstab

6. Show system status
/usr/local/bin/fdfs_monitor

Setup complete!

------------------------------Configuration notes------------------------------------------
First, tracker.conf

# is this config file disabled
# false for enabled
# true for disabled
disabled=false
# Whether this config file is disabled (it might read more naturally as "enabled"): false means the file takes effect, true means it does not.

# bind an address of this host
# empty for bind all addresses of this host
bind_addr=
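# Whether to bind an IP address.
# The value after bind_addr= is the IP address to bind (commonly used when a server has several IPs but only one should provide the service). If left empty, all addresses are bound (leaving it empty is usually fine). Experienced sysadmins will know this kind of option; many systems and applications have one.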

# the tracker server port
port=22122
# The port the service listens on.

# network timeout in seconds
network_timeout=60
# Network timeout of the tracker server, in seconds. If data cannot be sent or received within this time, the network operation fails.

# the base path to store data and log files
base_path=/home/yuqing/fastdfs
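# base_path: the base directory (the root directory must already exist; subdirectories are created automatically).
# Directory layout, for reference:
tracker server directory and file structure:
${base_path}
    |__data
    |     |__storage_groups.dat: storage group information
    |     |__storage_servers.dat: storage server list
    |__logs
          |__trackerd.log: tracker server log file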

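Records in the data files storage_groups.dat and storage_servers.dat are separated by newlines (\n), and fields are separated by ASCII commas (,).
The fields in storage_groups.dat are, in order:
1. group_name: group name
2. storage_port: storage server port

storage_servers.dat records information about each storage server. Its fields are, in order:
1. group_name: name of the group the server belongs to
2. ip_addr: IP address
3. status: status
4. sync_src_ip_addr: the source server that syncs existing data files to this storage server
5. sync_until_timestamp: cutoff time for syncing existing data files (UNIX timestamp)
6. stat.total_upload_count: number of file uploads
7. stat.success_upload_count: number of successful file uploads
8. stat.total_set_meta_count: number of metadata changes
9. stat.success_set_meta_count: number of successful metadata changes
10. stat.total_delete_count: number of file deletions
11. stat.success_delete_count: number of successful file deletions
12. stat.total_download_count: number of file downloads
13. stat.success_download_count: number of successful file downloads
14. stat.total_get_meta_count: number of metadata reads
15. stat.success_get_meta_count: number of successful metadata reads
16. stat.last_source_update: time of the most recent source update (an update that came from a client)
17. stat.last_sync_update: time of the most recent sync update (an update that came from another storage server's sync)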
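
The record layout above is simple enough to read with a few lines of code. The Python sketch below splits the text of storage_servers.dat into one dictionary per record, using the 17 field names listed above. It keeps every value as a string, since the on-disk types are not spelled out here; the function name `parse_storage_servers` and the sample values in the usage note are hypothetical.

```python
FIELDS = [
    "group_name", "ip_addr", "status", "sync_src_ip_addr",
    "sync_until_timestamp",
    "stat.total_upload_count", "stat.success_upload_count",
    "stat.total_set_meta_count", "stat.success_set_meta_count",
    "stat.total_delete_count", "stat.success_delete_count",
    "stat.total_download_count", "stat.success_download_count",
    "stat.total_get_meta_count", "stat.success_get_meta_count",
    "stat.last_source_update", "stat.last_sync_update",
]


def parse_storage_servers(text: str) -> list[dict[str, str]]:
    """Parse newline-separated records with comma-separated fields."""
    records = []
    for line in text.split("\n"):
        if not line.strip():
            continue  # skip blank lines, e.g. after a trailing newline
        values = line.split(",")
        if len(values) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
        records.append(dict(zip(FIELDS, values)))
    return records
```

For example, `parse_storage_servers(open("storage_servers.dat").read())` would return a list whose entries can be read as `record["ip_addr"]` or `record["stat.total_upload_count"]`.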

# max concurrent connections this server supported
# max_connections worker threads start when this service startup
max_connections=256
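# Maximum number of connections the server accepts. Because each connection is served by one thread, this is also the number of worker threads.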

# the method of selecting group to upload files
# 0: round robin
# 1: specify group
# 2: load balance, select the max free space group to upload file
store_lookup=2
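# How the group (volume) for an upload is chosen. 0: round robin; 1: a specified group; 2: load balance (choose the group (volume) with the most free space).
# If the application layer specifies a fixed group for the upload, this parameter is bypassed.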

# which group to upload file
# when store_lookup set to 1, must set store_group to the group name
store_group=group2
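# When the previous parameter is set to 1 (store_lookup=1, i.e. a specified group), this parameter must be set to the name of a group that exists in the system. With any other upload method, this parameter has no effect.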

# which storage server to upload file
# 0: round robin (default)
# 1: the first server order by ip address
# 2: the first server order by priority (the minimal)
store_server=0
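# Which storage server handles the upload. (Once a file is uploaded, that storage server becomes the file's source storage server and pushes the file to the other storage servers in the same group to synchronize them.)
# 0: round robin
# 1: sort by IP address and choose the first server (the one with the smallest IP address)
# 2: sort by priority (the upload priority is set on the storage server, in the parameter upload_priority)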
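
As a small illustration of the store_server rules above, the Python sketch below picks a server from a list. It is not the tracker's implementation: the `StorageServer` type and `pick_upload_server` function are hypothetical, and comparing IP addresses numerically (rather than as strings) is an assumption.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class StorageServer:
    ip_addr: str
    upload_priority: int = 10  # default value from storage.conf


def pick_upload_server(servers: list[StorageServer], store_server: int,
                       rr_counter: int = 0) -> StorageServer:
    """Choose the storage server that receives an upload."""
    if not servers:
        raise ValueError("no storage servers in the group")
    if store_server == 0:  # round robin, driven by a caller-kept counter
        return servers[rr_counter % len(servers)]
    if store_server == 1:  # smallest IP address
        return min(servers, key=lambda s: ipaddress.ip_address(s.ip_addr))
    if store_server == 2:  # smallest upload_priority value = highest priority
        return min(servers, key=lambda s: s.upload_priority)
    raise ValueError(f"unsupported store_server: {store_server}")
```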

# which path(means disk or mount point) of the storage server to upload file
# 0: round robin
# 2: load balance, select the max free space path to upload file
store_path=0
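# Which directory on the storage server receives the upload. A storage server can have several base paths for storing files (think of them as several disks).
# 0: round robin; files are stored in the directories in turn
# 2: choose the directory with the most free space (note: free disk space changes over time, so the chosen directory or disk may change as well)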

# which storage server to download file
# 0: round robin (default)
# 1: the source storage server which the current file uploaded to
download_server=0
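# Which storage server serves downloads.
# 0: round robin; any storage server that holds the file may serve it
# 1: the source storage server, i.e. the storage server the file was originally uploaded to (see store_server above for how the source is determined)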

# reserved storage space for system or other applications.
# if the free(available) space of any storage server in
# a group <= reserved_storage_space,
# no file can be uploaded to this group.
# bytes unit can be one of follows:
### G or g for gigabyte(GB)
### M or m for megabyte(MB)
### K or k for kilobyte(KB)
### no unit for byte(B)
reserved_storage_space = 4GB
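# Space reserved on the storage server for the system or other applications. (Note: if the servers in a group have disks of different sizes, the smallest one governs. As soon as one server in the group reaches this threshold, the limit applies to the whole group, because the servers in a group are replicas of one another.)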
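
The rule above (a group stops accepting uploads once any of its servers has free space at or below reserved_storage_space) can be sketched as follows. This is an illustration only: the function names are hypothetical, and treating K, M and G as powers of 1024 is an assumption.

```python
UNITS = {"G": 1024 ** 3, "M": 1024 ** 2, "K": 1024}  # assumed binary units


def parse_size(value: str) -> int:
    """Convert strings such as '4GB', '256k' or '1024' to a byte count."""
    text = value.strip().upper()
    if text.endswith("B"):
        text = text[:-1]  # '4GB' -> '4G', '100B' -> '100'
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def group_accepts_uploads(free_bytes_per_server: list[int], reserved: str) -> bool:
    """True only if every server in the group has more free space than reserved."""
    if not free_bytes_per_server:
        return False
    return min(free_bytes_per_server) > parse_size(reserved)
```

Because the check uses the minimum across the group, one nearly full server is enough to close the whole group to uploads, which matches the note about the smallest disk governing.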

#standard log level as syslog, case insensitive, value list:
### emerg for emergency
### alert
### crit for critical
### error
### warn for warning
### notice
### info
### debug
log_level=info
# Log level. (Where are the logs written? See the directory layout described above.)

#unix group name to run this program,
#not set (empty) means run by the group of current user
run_by_group=
# Unix group that FastDFS runs as (empty means the group of the current user, i.e. whoever starts the process).

#unix username to run this program,
#not set (empty) means run by current user
run_by_user=
# Unix user that FastDFS runs as (empty means the current user, i.e. whoever starts the process).

# allow_hosts can occur more than once, host can be hostname or ip address,
# "*" means match all ip addresses, can use range like this: 10.0.1.[1-15,20] or
# host[01-08,20-25].domain.com, for example:
# allow_hosts=10.0.1.[1-15,20]
# allow_hosts=host[01-08,20-25].domain.com
allow_hosts=*
# The IP range allowed to connect to this tracker server (applies to every kind of connection, including clients and storage servers).
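
To make the range syntax concrete, the Python sketch below expands a pattern such as 10.0.1.[1-15,20] into the individual hosts it stands for and checks a client against a list of allow_hosts values. It illustrates the notation only; the function names are hypothetical, hostname resolution is left out, and keeping the zero padding of ranges like [01-08] is an assumption about how such patterns are meant to be read.

```python
import re

_RANGE = re.compile(r"\[([^\]]+)\]")


def expand_hosts(pattern: str) -> list[str]:
    """Expand bracketed ranges, e.g. 'host[01-03].x' -> host01.x, host02.x, host03.x."""
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    hosts = []
    for part in match.group(1).split(","):
        low, _, high = part.strip().partition("-")
        high = high or low  # a single number such as '20'
        width = len(low) if low.startswith("0") else 0  # keep zero padding
        for number in range(int(low), int(high) + 1):
            for rest in expand_hosts(tail):  # handle any later brackets
                hosts.append(head + str(number).zfill(width) + rest)
    return hosts


def is_allowed(client: str, allow_hosts: list[str]) -> bool:
    """True if the client matches '*' or one of the expanded patterns."""
    return any(p == "*" or client in expand_hosts(p) for p in allow_hosts)
```

For instance, `is_allowed("192.168.0.7", ["192.168.0.[1-255]"])` is true, while an address outside that range is rejected unless another allow_hosts line covers it.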

# sync log buff to disk every interval seconds
# default value is 10 seconds
sync_log_buff_interval = 10
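# Interval at which log data is synced (flushed) to disk, in seconds.
# Note: the tracker server does not write its log to disk in real time; it writes to memory first.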

# check storage server alive interval
check_active_interval = 120
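# Interval for checking whether storage servers are alive, in seconds.
# Storage servers send heartbeats to the tracker server periodically. If the tracker server has not received a heartbeat from a storage server within one check_active_interval, it considers that storage server offline. This value must therefore be larger than the heartbeat interval configured on the storage servers; it is usually set to 2 or 3 times that interval.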

# thread stack size, should > 512KB
# default value is 1MB
thread_stack_size=1MB
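# Thread stack size. The FastDFS servers are multithreaded. Correction: the tracker server thread stack should not be smaller than 64KB, not 512KB.
# The larger the thread stack, the more system resources each thread uses. To start more threads (max_connections), you can lower this value accordingly.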

# auto adjust when the ip address of the storage server changed
# default value is true
storage_ip_changed_auto_adjust=true
# Controls whether the cluster adjusts automatically when a storage server's IP address changes. Note: the adjustment happens only when the storage server process restarts.

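# The HTTP settings follow. HTTP support is not enabled in the default build; to enable it, uncomment #WITH_HTTPD=1 (remove the #) and rebuild.
# To be honest, I do not know the HTTP feature well and have not seen documentation for it; I hope the forum moderator can fill this in. What follows is a literal reading.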
#HTTP settings
http.disabled=false # whether the HTTP service is disabled
http.server_port=8080 # HTTP service port

#use "#include" directive to include other HTTP settings
##include http.conf # to load the http.conf configuration file, remove the first #

That completes tracker.conf. Next is storage.conf.

# is this config file disabled
# false for enabled
# true for disabled
disabled=false
# Same as above.

# the name of the group this storage server belongs to
group_name=group1
# The group (volume) this storage server belongs to.

# bind an address of this host
# empty for bind all addresses of this host
bind_addr=
# Same as above.

# if bind an address of this host when connect to other servers
# (this storage server as a client)
# true for binding the address configed by above parameter: "bind_addr"
# false for binding any address of this host
client_bind=true
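# bind_addr normally applies to the server side. This parameter takes effect only when bind_addr is set.
# Whether this storage server binds bind_addr when it connects to other servers (such as the tracker server or other storage servers) as a client.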

# the storage server port
port=23000
# The storage server's service port.

# network timeout in seconds
network_timeout=60
# Network timeout of the storage server, in seconds. If data cannot be sent or received within this time, the network operation fails.

# heart beat interval in seconds
heart_beat_interval=30
# Heartbeat interval, in seconds (the storage server actively sends heartbeats to the tracker server).

# disk usage report interval in seconds
stat_report_interval=60
# Interval at which the storage server reports its free disk space to the tracker server, in seconds.

# the base path to store data and log files
base_path=/home/yuqing/fastdfs
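# base_path: the base directory; the root directory must already exist and subdirectories are created automatically. (Note: this is not where uploaded files are stored. It used to be, but that changed in a later version.)
# The directory layout is not reproduced here because the moderator has not posted an updated version on the forum; see the pinned forum post: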

# max concurrent connections server supported
# max_connections worker threads start when this service startup
max_connections=256
# Same as above.

# when no entry to sync, try read binlog again after X milliseconds
# 0 for try again immediately (not need to wait)
sync_wait_msec=200
# When syncing files, if no file to sync is read from the binlog, sleep N milliseconds and read again. 0 means do not sleep and retry immediately.

# after sync a file, usleep milliseconds
# 0 for sync successively (never call usleep)
sync_interval=0
# Interval between finishing the sync of one file and starting the next, in milliseconds. 0 means do not sleep and sync the next file immediately.

# sync start time of a day, time format: Hour:Minute
# Hour from 0 to 23, Minute from 0 to 59
sync_start_time=00:00

# sync end time of a day, time format: Hour:Minute
# Hour from 0 to 23, Minute from 0 to 59
sync_end_time=23:59
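# The two parameters above, explained together: the time window during which syncing is allowed (the whole day by default). It is typically set to avoid problems caused by syncing during peak hours; sysadmins will understand.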
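
A sync window like this boils down to a time-of-day comparison. The Python sketch below checks whether a given time falls between sync_start_time and sync_end_time, both in the Hour:Minute format shown above. The function names are hypothetical, and two behaviours are assumptions rather than documented facts: the end minute is treated as inclusive, and a window whose start is later than its end is read as wrapping past midnight.

```python
from datetime import time


def parse_hhmm(value: str) -> time:
    """Parse 'Hour:Minute' (Hour 0-23, Minute 0-59) into a time object."""
    hour, minute = value.split(":")
    return time(int(hour), int(minute))


def in_sync_window(now: time, sync_start_time: str, sync_end_time: str) -> bool:
    """True if `now` lies inside the configured sync window."""
    start, end = parse_hhmm(sync_start_time), parse_hhmm(sync_end_time)
    current = now.replace(second=0, microsecond=0)  # compare at minute precision
    if start <= end:
        return start <= current <= end
    # Assumed behaviour: e.g. 22:00-06:00 covers late evening and early morning.
    return current >= start or current <= end
```

With the defaults 00:00 and 23:59 every minute of the day is inside the window, so syncing is never paused.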

# path(disk or mount point) count, default value is 1
store_path_count=1
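# A storage server can store files under several paths (for example, several disks). This sets the number of base paths for file storage; usually only one directory is configured.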

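# store_path#, based 0, if store_path0 not exists, its value is base_path
# the paths must exist
store_path0=/home/yuqing/fastdfs
#store_path1=/home/yuqing/fastdfs2
# Configure each of the store_path_count paths in turn, with indexes starting at 0. Note the suffixes 0, 1, 2, ...: you need to configure index 0 through index store_path_count - 1.
# If store_path0 is not configured, it defaults to the same path as base_path.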

# subdir_count * subdir_count directories will be auto created under each
# store_path (disk), value can be 1 to 256, default value is 256
subdir_count_per_path=256
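# FastDFS stores files in a two-level directory tree. This sets the number of directories at each level (look at the storage directories to see how it works).
# If this parameter is N (for example, 256), the storage server automatically creates N * N subdirectories for files on its first run.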
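
To see what N * N means in practice, the Python sketch below lists the two-level directories for a given subdir_count_per_path. The layout used here (a `data` directory holding two-digit uppercase hexadecimal names such as 00/00 through FF/FF) is an assumption for illustration, as is the function name `subdir_paths`; check an actual store path to confirm the naming.

```python
import os


def subdir_paths(store_path: str, subdir_count_per_path: int) -> list[str]:
    """List the N * N two-level directories under one store path."""
    n = subdir_count_per_path
    if not 1 <= n <= 256:
        raise ValueError("subdir_count_per_path must be between 1 and 256")
    return [
        # Assumed naming: <store_path>/data/<first level>/<second level>
        os.path.join(store_path, "data", f"{i:02X}", f"{j:02X}")
        for i in range(n)
        for j in range(n)
    ]
```

With the default of 256 this yields 65,536 directories per store path, which is why the first start of a storage server can take a while.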

# tracker_server can occur more than once, and tracker_server format is
# "host:port", host can be hostname or ip address
tracker_server=10.62.164.84:22122
tracker_server=10.62.245.170:22122
# The list of tracker servers; the port is required. (A reminder: the storage server connects to the tracker servers actively.)
# With several tracker servers, write one per line.
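
Because keys such as tracker_server and allow_hosts may appear on several lines, a reader for these files has to collect lists rather than single values. The Python sketch below is a minimal reader along those lines, not FastDFS's own parser: the function name `parse_conf` is hypothetical, every line starting with # is treated as a comment (so #include directives are ignored), and spaces around = are stripped.

```python
def parse_conf(text: str) -> dict[str, list[str]]:
    """Read key=value lines; repeated keys accumulate in order."""
    settings: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment (including #include, unsupported here)
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        settings.setdefault(key.strip(), []).append(value.strip())
    return settings
```

Reading the storage.conf shown above with this function would give `settings["tracker_server"]` two entries, one per tracker, while single-valued keys such as port end up as one-element lists.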

#unix group name to run this program,
#not set (empty) means run by the group of current user
run_by_group=
# Same as above.

#unix username to run this program,
#not set (empty) means run by current user
run_by_user=
# Same as above. (Watch the permissions: if they do not match the web server's, errors may occur.)

# allow_hosts can occur more than once, host can be hostname or ip address,
# "*" means match all ip addresses, can use range like this: 10.0.1.[1-15,20] or
# host[01-08,20-25].domain.com, for example:
# allow_hosts=10.0.1.[1-15,20]
# allow_hosts=host[01-08,20-25].domain.com
allow_hosts=*
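# The list of IP addresses allowed to connect to this storage server (this does not cover connections to the built-in HTTP service).
# Several lines may be configured, and every line takes effect.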

# the mode of the files distributed to the data path
# 0: round robin(default)
# 1: random, distributed by hash code
file_distribute_path_mode=0
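# How files are spread across the directories under data.
# 0: round robin; after the configured number of files has been stored in one directory (set in file_distribute_rotate_count), the next directory is used.
# 1: random; files are spread according to the hash code of the file name.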

# valid when file_distribute_path_mode is set to 0 (round robin),
# when the written file count reaches this number, then rotate to next path
# default value is 100
file_distribute_rotate_count=100
# Takes effect when file_distribute_path_mode above is set to 0 (round robin).
# When the number of files stored in a directory reaches this value, subsequent uploads go to the next directory.
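
The round-robin mode can be pictured as a counter that moves to the next directory every file_distribute_rotate_count files. The Python sketch below models that behaviour; it is an illustration rather than the storage server's code. The class name is hypothetical, and the order in which the N * N directories are visited (second level first, wrapping back to the start at the end) is an assumption.

```python
class RoundRobinDistributor:
    """Rotate to the next subdirectory after rotate_count files."""

    def __init__(self, subdir_count: int, rotate_count: int = 100):
        self.subdir_count = subdir_count
        self.rotate_count = rotate_count
        self._index = 0    # position among the subdir_count ** 2 directories
        self._written = 0  # files placed in the current directory

    def next_subdir(self) -> tuple[int, int]:
        """Return (first-level, second-level) indexes for the next file."""
        if self._written >= self.rotate_count:
            self._index = (self._index + 1) % (self.subdir_count ** 2)
            self._written = 0
        self._written += 1
        return divmod(self._index, self.subdir_count)
```

A small rotate count spreads files quickly across many directories, while a large one keeps consecutive uploads together.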

# call fsync to disk when write big file
# 0: never call fsync
# other: call fsync when written bytes >= this bytes
# default value is 0 (never call fsync)
fsync_after_written_bytes=0
# When writing a large file, call the system function fsync after every N bytes written to force the content to disk. 0 means never call fsync.
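
The effect of fsync_after_written_bytes can be sketched as a write loop that calls fsync each time the configured number of bytes has been written. The Python function below illustrates the idea only; its name and chunk-based interface are hypothetical, and it does not reflect how the storage server structures its own I/O.

```python
import os
from typing import Iterable


def write_with_periodic_fsync(path: str, chunks: Iterable[bytes],
                              fsync_after_written_bytes: int = 0) -> int:
    """Write chunks to path, forcing data to disk every N bytes (0 = never)."""
    total = 0
    since_sync = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
            total += len(chunk)
            since_sync += len(chunk)
            if 0 < fsync_after_written_bytes <= since_sync:
                f.flush()             # push Python's buffer to the OS
                os.fsync(f.fileno())  # ask the OS to write it to disk
                since_sync = 0
    return total
```

Frequent fsync calls limit how much data a crash can lose at the price of slower writes, which is presumably why the default is 0.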

# sync log buff to disk every interval seconds
# default value is 10 seconds
sync_log_buff_interval=10
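# Interval at which log data is synced (flushed) to disk, in seconds.
# Note: the storage server does not write its log to disk in real time; it writes to memory first.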

# sync binlog buff / cache to disk every interval seconds
# this parameter is valid when write_to_binlog set to 1
# default value is 60 seconds
sync_binlog_buff_interval=60
# Interval at which the binlog (the log of update operations) is synced to disk, in seconds.

# thread stack size, should > 512KB
# default value is 1MB
thread_stack_size=1MB
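# Thread stack size. The FastDFS servers are multithreaded. Per the comment above, the storage server thread stack should not be smaller than 512KB.
# The larger the thread stack, the more system resources each thread uses. To start more threads (max_connections), you can lower this value accordingly.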

# the priority as a source server for uploading file.
# the lower this value, the higher its uploading priority.
# default value is 10
upload_priority=10
# The upload priority of this storage server as a source server; it may be negative. The smaller the value, the higher the priority. This corresponds to store_server=2 in tracker.conf.

# if check file duplicate, when set to true, use FastDHT to store file indexes
# 1 or yes: need check
# 0 or no: do not check
# default value is 0
check_file_duplicate=0
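# Whether to check if an uploaded file already exists. If it does, the file content is not stored again; a symbolic link is created instead to save disk space.
# This feature relies on FastDHT, so install FastDHT before enabling it.
# 1 or yes: check; 0 or no: do not check.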

# namespace for storing file indexes (key-value pairs)
# this item must be set when check_file_duplicate is true / on
key_namespace=FastDFS
# The namespace in FastDHT, used when the previous parameter is set to 1 or yes (true/on also work).

# set keep_alive to 1 to enable persistent connection with FastDHT servers
# default value is 0 (short connection)
keep_alive=0
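# How connections to the FastDHT servers are made (persistent or not). The default is 0 (short connections). Persistent connections are worth considering, depending on whether the FastDHT servers have enough connections available.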

# The following settings concern the FastDHT servers and require some knowledge of FastDHT; only a literal reading is given here.
# you can use "#include filename" (not include double quotes) directive to
# load FastDHT server list, when the filename is a relative path such as
# pure filename, the base path is the base path of current/this config file.
# must set FastDHT server list when check_file_duplicate is true / on
# please see INSTALL of FastDHT for detail
##include /home/yuqing/fastdht/conf/fdht_servers.conf
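# The FastDHT server configuration can be loaded with #include filename; once FastDHT is installed, the configuration will be clear.
# This too takes effect only when check_file_duplicate=1; otherwise it is ignored.
# fdht_servers.conf holds the list of FastDHT servers.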

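The HTTP configuration follows. On a larger system the built-in service may not keep up, and you can substitute your own web server; I like lighttpd, and nginx is also very good. I will not go into detail. These settings are self-explanatory; if anything is unclear, see above.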
#HTTP settings
http.disabled=false
http.server_port=8888
http.trunk_size=256KB
# http.trunk_size is the size of the buffer used to read file content (the amount read at a time), which is also the size of each chunk returned to the HTTP client.

#use "#include" directive to include other HTTP settings
##include http.conf
