inotify是什么?用它能干些什么?这个问题我们还是首先从内核的文档开始吧--
Documentation/filesystems/inotify.txt(说点题外话,内核文档虽然是没有任何格式的txt文档,给人的感觉却非常
好,而且作者总是以最精炼的语言清楚地描述了相关的内容): a powerful yet simple file change
notification system,通俗点说它是一个内核用于通知用户空间客户程序文件系统变化的系统,并且它是powerful yet
simple的。
inotify的用户接口原型主要有以下3个:
初始化:int inotify_init(void);
添加监视对象:int inotify_add_watch(int fd, const char *path, uint32_t mask);
删除监视对象:int inotify_rm_watch(int fd, uint32_t wd);
内核文档里对于inotify_rm_watch的用户接口原型描述似乎有点问题,第2个参数写成了mask,实际上它应该是inotify_add_watch返回的watch descriptor。
根据文档描述可以看出,inotify使用大概分为以下几个步骤:
1、int fd = inotify_init(); 初始化inotify实例。
2、
int wd = inotify_add_watch(fd, path, mask);
添加监视对象,这里的mask是一个或多个事件的位标记集合,具体的事件定义可参考(linux/inotify.h)或者
(sys/inotify.h),前者为linux内核头文件,后者为glibc提供的头文件。
3、size_t len = read(fd, buf, BUF_LEN); 读取事件数据,buf应是一个指向inotify_event结构数组的指针。不过要注意的是inotify_event的name成员长度是可变的,这个问题后面再解释。
4、已经存在的监视对象可通过int ret = inotify_rm_watch(fd, wd);来删除。
下面我们来看一个示例:
#include
#include
#include
#include
#include
static void
_inotify_event_handler(struct inotify_event *event)
{
printf("event->mask: 0x%08x\n", event->mask);
printf("event->name: %s\n", event->name);
}
int
main(int argc, char **argv)
{
if (argc != 2) {
printf("Usage: %s \n", argv[0]);
return -1;
}
unsigned char buf[1024] = {0};
struct inotify_event *event = {0};
int fd = inotify_init();
int wd = inotify_add_watch(fd, argv[1], IN_ALL_EVENTS);
for (;;) {
fd_set fds;
FD_ZERO(&fds);
FD_SET(fd, &fds);
if (select(fd + 1, &fds, NULL, NULL, NULL) > 0) {
int len, index = 0;
while (((len = read(fd, &buf, sizeof(buf))) < 0) && (errno == EINTR));
while (index < len) {
event = (struct inotify_event *)(buf + index);
_inotify_event_handler(event);
index += sizeof(struct inotify_event) + event->len;
}
}
}
inotify_rm_watch(fd, wd);
return 0;
}
由
以上代码可以看出inotify_init返回的file
descriptor是可以用select或者poll进行I/O复用的。由于inotify_event长度是可变的,因此在读取
inotify_event数组内容的时候需要动态计算下一个事件数据的偏移量(index += sizeof(struct
inotify_event) + event->len),len成员即name成员的长度。
在实际测试过程中,通过运行以上的
测试程序监视一个文件,还遇到过两个奇怪的现象:用vim编辑那个被监视的文件,修改并保存,触发的是IN_DELETE_SELF和
IN_MOVE_SELF事件而不是我们所期望的IN_MODIFY事件;再次修改并保存的时候不再有任何事件发生。希望能给看官一个教训,其实这是由于
vim的工作机制引起的,vim会先将源文件复制为另一个文件,然后在另一文件基础上编辑(一般后缀名为swp),保存的时候再将这个文件覆盖源文件,因
此会出现上述的第一个现象,第二个现象是因为原来的文件已经被后来的新文件代替,因此监视对象所监视的文件已经不存在了,所以自然不会产生任何事件。
另外,内核文档第四部分介绍了inotify的背景以及设计思路,不可不看:
Q: What is the design decision behind not tying the watch to the open fd of
the watched object?
A: Watches are associated with an open inotify device, not an open file.
This solves the primary problem with dnotify: keeping the file open pins
the file and thus, worse, pins the mount. Dnotify is therefore infeasible
for use on a desktop system with removable media as the media cannot be
unmounted. Watching a file should not require that it be open.
Q: What is the design decision behind using an-fd-per-instance as opposed to
an fd-per-watch?
A: An fd-per-watch quickly consumes more file descriptors than are allowed,
more fd's than are feasible to manage, and more fd's than are optimally
select()-able. Yes, root can bump the per-process fd limit and yes, users
can use epoll, but requiring both is a silly and extraneous requirement.
A watch consumes less memory than an open file, separating the number
spaces is thus sensible. The current design is what user-space developers
want: Users initialize inotify, once, and add n watches, requiring but one
fd and no twiddling with fd limits. Initializing an inotify instance two
thousand times is silly. If we can implement user-space's preferences
cleanly--and we can, the idr layer makes stuff like this trivial--then we
should.
There are other good arguments. With a single fd, there is a single
item to block on, which is mapped to a single queue of events. The single
fd returns all watch events and also any potential out-of-band data. If
every fd was a separate watch,
- There would be no way to get event ordering. Events on file foo and
file bar would pop poll() on both fd's, but there would be no way to tell
which happened first. A single queue trivially gives you ordering. Such
ordering is crucial to existing applications such as Beagle. Imagine
"mv a b ; mv b a" events without ordering.
- We'd have to maintain n fd's and n internal queues with state,
versus just one. It is a lot messier in the kernel. A single, linear
queue is the data structure that makes sense.
- User-space developers prefer the current API. The Beagle guys, for
example, love it. Trust me, I asked. It is not a surprise: Who'd want
to manage and block on 1000 fd's via select?
- No way to get out of band data.
- 1024 is still too low. ;-)
When you talk about designing a file change notification system that
scales to 1000s of directories, juggling 1000s of fd's just does not seem
the right interface. It is too heavy.
Additionally, it _is_ possible to more than one instance and
juggle more than one queue and thus more than one associated fd. There
need not be a one-fd-per-process mapping; it is one-fd-per-queue and a
process can easily want more than one queue.
Q: Why the system call approach?
A: The poor user-space interface is the second biggest problem with dnotify.
Signals are a terrible, terrible interface for file notification. Or for
anything, for that matter. The ideal solution, from all perspectives, is a
file descriptor-based one that allows basic file I/O and poll/select.
Obtaining the fd and managing the watches could have been done either via a
device file or a family of new system calls. We decided to implement a
family of system calls because that is the preferred approach for new kernel
interfaces. The only real difference was whether we wanted to use open(2)
and ioctl(2) or a couple of new system calls. System calls beat ioctls.