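How User Space Sends Packets to the Kernel

The previous section analyzed how packets are received and processed in user space. This section describes how user space sends data to the kernel. First, let us look at the message types that the kernel and user space can exchange.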

enum nfqnl_msg_types {

NFQNL_MSG_PACKET, /* packet from kernel to userspace */

NFQNL_MSG_VERDICT, /* verdict from userspace to kernel */

NFQNL_MSG_CONFIG, /* connect to a particular queue */

NFQNL_MSG_VERDICT_BATCH, /* batch verdict from userspace to kernel */

NFQNL_MSG_MAX

};

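NFQNL_MSG_PACKET: set by the kernel when it sends a packet to user space. Note that the message carries a packet.

NFQNL_MSG_VERDICT: set by user space when it sends a verdict to the kernel, usually the decision on how the packet should be handled.

NFQNL_MSG_CONFIG: set by user space when it sends a command or configuration information to the kernel.

Note: the message type is stored in the low 8 bits of the nlmsg_type field of struct nlmsghdr. The high 8 bits identify which netfilter netlink subsystem the message belongs to.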
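The split of nlmsg_type into two bytes can be illustrated with a small sketch. The helper names pack_type, subsys_of and msg_of and the local constants are assumptions made for this example; real code would use the definitions from the kernel headers.

```c
#include <stdint.h>

/* Hypothetical local constants that mirror the kernel header values. */
enum { SUBSYS_QUEUE = 3 };                                  /* NFNL_SUBSYS_QUEUE */
enum { MSG_PACKET = 0, MSG_VERDICT = 1, MSG_CONFIG = 2 };   /* nfqnl_msg_types */

/* Combine the subsystem id (high byte) and the message type (low byte). */
static uint16_t pack_type(uint8_t subsys, uint8_t msg)
{
    return (uint16_t)((subsys << 8) | msg);
}

/* Recover the subsystem id from the high byte. */
static uint8_t subsys_of(uint16_t nlmsg_type)
{
    return (uint8_t)(nlmsg_type >> 8);
}

/* Recover the message type from the low byte. */
static uint8_t msg_of(uint16_t nlmsg_type)
{
    return (uint8_t)(nlmsg_type & 0xff);
}
```

With these helpers, a verdict message for the queue subsystem gets the type value 0x0301.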

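Filling in the Headers

A Netfilter Netlink message has two headers (much like TCP/IP): the outermost one is the netlink header, followed by the netfilter header. Together they are referred to here as "the header".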

struct nlmsghdr {

__u32 nlmsg_len; /* Length of message including header */

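__u16 nlmsg_type; /* Message content, defined by the upper-layer protocol. nlh->nlmsg_type carries two pieces of information: the high 8 bits hold the Netlink Netfilter subsystem id, the low 8 bits hold the message type. */

__u16 nlmsg_flags; /* whether the message is a request, a multipart message, wants an ACK, etc. */

/* Additional flags; their values are the same as in libipq. */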

#define NLM_F_REQUEST 1 /* It is request message. */

#define NLM_F_MULTI 2 /* Multipart message, terminated by NLMSG_DONE */

#define NLM_F_ACK 4 /* Reply with ack, with zero or error code */

#define NLM_F_ECHO 8 /* Echo this request */

#define NLM_F_DUMP_INTR 16 /* Dump was inconsistent due to sequence change */

__u32 nlmsg_seq; /* Sequence number (ignored for now) */

__u32 nlmsg_pid; /* Sending process port ID, i.e. the netlink address of the sender */

};

struct nfgenmsg {

__u8 nfgen_family; /* AF_xxx; its role is analyzed in detail later */

__u8 version; /* nfnetlink version */

__be16 res_id; /* resource id: the queue_num value, the only important field in this structure */

};

The function that fills in the Netlink Netfilter header is nfnl_fill_hdr. Its implementation is shown below.

/**

* nfnl_fill_hdr - fill in netlink and nfnetlink header

* @nfnlh: nfnetlink handle

* @nlh: netlink message to be filled in

* @len: length of _payload_ bytes (not including nfgenmsg)

* @family: AF_INET / ...

* @res_id: resource id

* @msg_type: nfnetlink message type (without subsystem)

* @msg_flags: netlink message flags

* This function sets up the nfnetlink header appropriately. Note that the

* pointer to the netlink message passed must point to a memory region of

* at least the size of struct nlmsghdr + struct nfgenmsg.

*/

void nfnl_fill_hdr( struct nfnl_subsys_handle *ssh, struct nlmsghdr *nlh, unsigned int len,

u_int8_t family, u_int16_t res_id, u_int16_t msg_type,u_int16_t msg_flags)

{

struct nfgenmsg *nfg = (void *)nlh + sizeof(*nlh);

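/* Fill in nlmsghdr. */

nlh->nlmsg_len = NLMSG_LENGTH(len+sizeof(*nfg));

/* nlh->nlmsg_type carries two pieces of information: the high 8 bits hold the Netlink Netfilter subsystem id,

the low 8 bits hold the message type, a member of enum nfqnl_msg_types. */

nlh->nlmsg_type = (ssh->subsys_id<<8)|msg_type;

/* When binding the protocol family the caller passes NLM_F_REQUEST and NLM_F_ACK, so a reply from the kernel is indeed expected. */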

nlh->nlmsg_flags = msg_flags;

nlh->nlmsg_pid = 0; // set to 0 for messages sent to the kernel

if (ssh->nfnlh->flags & NFNL_F_SEQTRACK_ENABLED) {

nlh->nlmsg_seq = ++ssh->nfnlh->seq;

/* kernel uses sequence number zero for events */

if (!ssh->nfnlh->seq)

nlh->nlmsg_seq = ssh->nfnlh->seq = time(NULL);

} else {

/* unset sequence number, ignore it */

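nlh->nlmsg_seq = 0; // so nlmsg_seq can be ignored for now

}

nfg->nfgen_family = family; /* AF_UNSPEC is passed when binding the protocol family */

nfg->version = NFNETLINK_V0;

nfg->res_id = htons(res_id); /* This is the important field. When binding the protocol family 0 is passed, which is acceptable: the binding does not distinguish between queue numbers, so the kernel ignores whatever value is given. */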

}
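The header layout that nfnl_fill_hdr produces can be reproduced in a self-contained sketch. The structures hdr_nl and hdr_nfg and the function fill_headers are hypothetical stand-ins for struct nlmsghdr, struct nfgenmsg and the library function; sequence tracking is left out, and the caller is assumed to provide at least 20 bytes.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of struct nlmsghdr and struct nfgenmsg (hypothetical names). */
struct hdr_nl  { uint32_t len; uint16_t type; uint16_t flags; uint32_t seq; uint32_t pid; };
struct hdr_nfg { uint8_t family; uint8_t version; uint16_t res_id; };

#define ALIGN4(n) (((size_t)(n) + 3) & ~(size_t)3)

/* Write both headers at the start of buf and return the resulting nlmsg_len. */
static uint32_t fill_headers(unsigned char *buf, uint8_t subsys, uint8_t msg_type,
                             uint16_t flags, uint16_t queue_num, uint32_t payload_len)
{
    struct hdr_nl nl;
    struct hdr_nfg nfg;

    memset(&nl, 0, sizeof(nl));              /* seq and pid stay 0 */
    nl.len   = (uint32_t)(ALIGN4(sizeof(nl)) + sizeof(nfg) + payload_len);
    nl.type  = (uint16_t)((subsys << 8) | msg_type);
    nl.flags = flags;

    nfg.family  = 0;                         /* AF_UNSPEC */
    nfg.version = 0;                         /* NFNETLINK_V0 */
    nfg.res_id  = htons(queue_num);          /* network byte order */

    memcpy(buf, &nl, sizeof(nl));
    memcpy(buf + ALIGN4(sizeof(nl)), &nfg, sizeof(nfg));
    return nl.len;
}
```

With an empty payload the returned length is 16 + 4 = 20 bytes, which is the combined size of the two headers.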

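Filling in the Payload

The payload of a message exchanged between user space and the kernel consists mainly of a sequence of (nfattr + structure) pairs. Before filling it in, let us get to know struct nfattr and the structures that can follow it.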

struct nfattr {

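__u16 nfa_len; /* size of struct nfattr plus the structure that follows it */

/* When sending a configuration message, i.e. when the message type is NFQNL_MSG_CONFIG, nfa_type can take the following values: */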

enum nfqnl_attr_config {

NFQA_CFG_UNSPEC,

NFQA_CFG_CMD, /* followed by struct nfqnl_msg_config_cmd */

NFQA_CFG_PARAMS, /* followed by struct nfqnl_msg_config_params */

NFQA_CFG_QUEUE_MAXLEN, /* __u32, sets the length of the kernel-side queue */

__NFQA_CFG_MAX

};

/* For a verdict the message type becomes NFQNL_MSG_VERDICT, and nfa_type takes the following values: */

enum nfqnl_attr_type {

NFQA_UNSPEC,

NFQA_PACKET_HDR, /* carries the packet ID */

NFQA_VERDICT_HDR, /* nfqnl_msg_verdict_hdr */

NFQA_MARK, /* __u32 nfmark */

NFQA_TIMESTAMP, /* nfqnl_msg_packet_timestamp */

NFQA_IFINDEX_INDEV, /* __u32 ifindex */

NFQA_IFINDEX_OUTDEV, /* __u32 ifindex */

NFQA_IFINDEX_PHYSINDEV, /* __u32 ifindex */

NFQA_IFINDEX_PHYSOUTDEV, /* __u32 ifindex */

NFQA_HWADDR, /* nfqnl_msg_packet_hw */

NFQA_PAYLOAD, /* opaque data payload */

__NFQA_MAX

};

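/* Note: not all of these values can actually be used. For some of them the kernel has no code that handles the attribute, so sending it only wastes memory. We could, however, try to extend this functionality.

The structure that goes with each value is already defined. They are not covered one by one here and are analyzed when the code below uses them. */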

__u16 nfa_type;

};

struct nfqnl_msg_config_cmd {

__u8 command; /* a value of enum nfqnl_msg_config_cmds; used to bind the protocol family and the callback */

__u8 _pad;

__be16 pf; /* AF_xxx for PF_[UN]BIND: the protocol family concerned, usually AF_INET */

};

struct nfqnl_msg_config_params {

__be32 copy_range; /* how much of each packet the kernel should copy into a single netlink message */

__u8 copy_mode; /* enum nfqnl_config_mode */

} __attribute__ ((packed));

enum nfqnl_config_mode {

NFQNL_COPY_NONE, /* copy nothing */

NFQNL_COPY_META, /* metadata only, no packet payload */

NFQNL_COPY_PACKET, /* the whole packet */

};
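As a rough illustration of how a params structure of this shape is filled, the sketch below uses a local mirror named cfg_params and a helper named make_params. Both names are assumptions for this example and are not part of the library.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Local mirror of struct nfqnl_msg_config_params (hypothetical name). */
struct cfg_params {
    uint32_t copy_range;   /* stored in network byte order */
    uint8_t  copy_mode;    /* one of the COPY_* values below */
} __attribute__((packed));

/* Same numbering as enum nfqnl_config_mode. */
enum { COPY_NONE = 0, COPY_META = 1, COPY_PACKET = 2 };

/* Build the params payload for a given copy mode and byte range. */
static struct cfg_params make_params(uint8_t mode, uint32_t range)
{
    struct cfg_params p;

    p.copy_range = htonl(range);
    p.copy_mode  = mode;
    return p;
}
```

Because the structure is packed it occupies 5 bytes instead of 8, so the attribute that carries it is padded up to the next 4-byte boundary when it is appended to a message.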

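Next, the two kinds of message that use the two sets of nfa_type values are examined in turn.

User Space Sends an NFQNL_MSG_CONFIG Message to the Kernel

User code calls nfq_unbind_pf and nfq_bind_pf. Their job is to bind the nfq_handle to a specific protocol family, so that only packets of that family (usually AF_INET) are handed to the application. Let us analyze how this configuration is sent down to the kernel.

We start with the function __build_send_cfg_msg(struct nfq_handle *h, u_int8_t command, u_int16_t queuenum, u_int16_t pf).

The first and last parameters need no explanation. command takes one of the values of the enum below, and queuenum = 0 (keep in mind that this setting applies to every queue_num).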

enum nfqnl_msg_config_cmds {

NFQNL_CFG_CMD_NONE,

NFQNL_CFG_CMD_BIND,

NFQNL_CFG_CMD_UNBIND,

NFQNL_CFG_CMD_PF_BIND,

NFQNL_CFG_CMD_PF_UNBIND,

};

static int __build_send_cfg_msg(struct nfq_handle *h, u_int8_t command,

u_int16_t queuenum, u_int16_t pf)

{

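/* In Netlink Netfilter the header consists of two parts (struct nlmsghdr + struct nfgenmsg) whose combined length is NFNL_HEADER_LEN. The data part is (struct nfattr + struct nfqnl_msg_config_cmd). What the kernel sends to user space is handled as an array of type struct nfattr *[], i.e. one message can carry several nfattr + structure pairs. */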

union {

char buf[NFNL_HEADER_LEN

+NFA_LENGTH(sizeof(struct nfqnl_msg_config_cmd))];

struct nlmsghdr nmh;

} u;

/* Fill in the Netlink Netfilter header */

nfnl_fill_hdr(h->nfnlssh, &u.nmh, 0, AF_UNSPEC, queuenum,

NFQNL_MSG_CONFIG, NLM_F_REQUEST|NLM_F_ACK);

/* The data part is filled in below */

struct nfqnl_msg_config_cmd cmd;

cmd.command = command;

cmd.pf = htons(pf);

nfnl_addattr_l(&u.nmh, sizeof(u), NFQA_CFG_CMD, &cmd, sizeof(cmd)); // append the attribute after the header

return nfnl_query(h->nfnlh, &u.nmh);

}

/**

* nfnl_addattr_l - Add variable length attribute to nlmsghdr

* @n: netlink message header to which attribute is to be added. nlmsghdr

* @maxlen: maximum length of netlink message header (the length of the whole buffer)

* @type: type of new attribute (here NFQA_CFG_CMD)

* @data: content of new attribute (__build_send_cfg_msg passes a struct nfqnl_msg_config_cmd *)

* @alen: attribute length (__build_send_cfg_msg passes sizeof(cmd))

*/

int nfnl_addattr_l(struct nlmsghdr *n, int maxlen, int type, const void *data,

int alen)

{

int len = NFA_LENGTH(alen);

struct nfattr *nfa;

if ((NLMSG_ALIGN(n->nlmsg_len) + len) > maxlen) {

errno = ENOSPC;

return -1;

}

nfa = NLMSG_TAIL(n); // jump to the end of the nlmsg: (((void *) (nlh)) + NLMSG_ALIGN((nlh)->nlmsg_len))

nfa->nfa_type = type; // a value of enum nfqnl_attr_config

nfa->nfa_len = len; // size of struct nfattr plus the size of the structure that follows

memcpy(NFA_DATA(nfa), data, alen); // copy the structure into the buffer

n->nlmsg_len = (NLMSG_ALIGN(n->nlmsg_len) + NFA_ALIGN(len)); // update n->nlmsg_len

return 0;

}
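The append-with-alignment logic of nfnl_addattr_l can be sketched without the library macros. The function append_attr and the macros ATTR_ALIGN and ATTR_HDRLEN are hypothetical names for this example. Unlike the original, this version checks the padded length against the buffer and zeroes the padding bytes; both are choices of the sketch.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define ATTR_ALIGN(n) (((size_t)(n) + 3) & ~(size_t)3)
#define ATTR_HDRLEN   4u   /* two 16-bit fields: length and type */

/* Append one attribute at offset *used inside buf.
 * Returns 0 on success, or -1 with errno = ENOSPC when it does not fit. */
static int append_attr(unsigned char *buf, size_t cap, size_t *used,
                       uint16_t type, const void *data, uint16_t data_len)
{
    size_t start = ATTR_ALIGN(*used);
    uint16_t len = (uint16_t)(ATTR_HDRLEN + data_len);   /* unpadded length */

    if (start + ATTR_ALIGN(len) > cap) {
        errno = ENOSPC;
        return -1;
    }
    memset(buf + start, 0, ATTR_ALIGN(len));             /* zero the padding */
    memcpy(buf + start, &len, sizeof(len));              /* nfa_len */
    memcpy(buf + start + 2, &type, sizeof(type));        /* nfa_type */
    memcpy(buf + start + ATTR_HDRLEN, data, data_len);   /* attribute data */
    *used = start + ATTR_ALIGN(len);
    return 0;
}
```

For the config command case, a 20-byte header plus one 4-byte command structure gives 20 + 4 + 4 = 28 bytes, which matches the size of the union buffer declared in __build_send_cfg_msg.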

Next comes the part that sends the message.

/**

* nfnl_query - request/response communication challenge

* @h: nfnetlink handler

* @nlh: nfnetlink message to be sent

* This function sends a nfnetlink message to a certain subsystem and

* receives the response messages associated, such messages are passed to

* the callback registered via register_callback(). (It really does wait for the kernel's acknowledgement. Is there a callback?)

* On success, 0 is returned. On error, a negative is returned. If you

* do not want to listen to events anymore, then your callback must

* return NFNL_CB_STOP.

*/

int nfnl_query(struct nfnl_handle *h, struct nlmsghdr *nlh)

{

assert(h);

assert(nlh);

if (nfnl_send(h, nlh) == -1) // send first, via sendto

return -1;

return nfnl_catch(h); // nfnl_catch appears to be a wrapper around recvfrom

}

/**

* nfnl_catch - get responses from the nfnetlink system and process them

* @h: nfnetlink handler

* This function handles the data received from the nfnetlink system.

* For example, events generated by one of the subsystems.

* (This is indeed the interface that receives data from the kernel.)

* The message is passed to the callback registered via callback_register().

* On success, 0 is returned. On error, a -1 is returned. If you do not

* want to listen to events anymore, then your callback must return

* NFNL_CB_STOP.

*/

int nfnl_catch(struct nfnl_handle *h)

{

int ret;

assert(h);

while (1) {

unsigned char buf[h->rcv_buffer_size] // note the length of this buffer

__attribute__ ((aligned));

// when receiving, only netlink messages are accepted

ret=nfnl_recv(h, buf, sizeof(buf)); // calls recvfrom and runs some sanity checks on the received data

if (ret == -1) {

/* interrupted syscall must retry */

if (errno == EINTR)

continue;

break;

}

ret = nfnl_process(h, buf, ret); // process the data; one buffer may contain several nlmsgs

if (ret <= NFNL_CB_STOP)

break;

}

return ret;

}

/**

* nfnl_process - process data coming from a nfnetlink system

* @h : nfnetlink handler

* @buf: buffer that contains the netlink message

* @len: size of the data contained in the buffer (not the buffer size)

* This function processes all the nfnetlink messages contained inside a

* buffer. It performs the appropriate sanity checks and passes the message

* to a certain handler that is registered via register_callback().

* On success, NFNL_CB_STOP is returned if the data processing has finished.

* If a value NFNL_CB_CONTINUE is returned, then there is more data to

* process. On error, NFNL_CB_FAILURE is returned and errno is set to the

* appropriate value.

* In case that the callback returns NFNL_CB_FAILURE, errno may be set by

* the library client. If your callback decides not to process data anymore

* for any reason, then it must return NFNL_CB_STOP. Otherwise, if the

* callback continues the processing NFNL_CB_CONTINUE is returned.

*/

int nfnl_process(struct nfnl_handle *h, const unsigned char *buf, size_t len)

{

int ret = 0;

struct nlmsghdr *nlh = (struct nlmsghdr *)buf;

assert(h);

assert(buf);

assert(len > 0);

/* check for out of sequence message (ignored for now) */

if (nlh->nlmsg_seq && nlh->nlmsg_seq != h->seq) {

errno = EILSEQ;

return -1;

}

while (len >= NLMSG_SPACE(0) && NLMSG_OK(nlh, len)) {

ret = nfnl_step(h, nlh);

if (ret <= NFNL_CB_STOP)

break;

nlh = NLMSG_NEXT(nlh, len); // because the buffer may contain several nlmsgs

}

return ret;

}
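The loop in nfnl_process relies on NLMSG_OK and NLMSG_NEXT to step through a buffer that holds several messages. A simplified sketch of that walk, which only counts messages, is shown here. The name count_messages and the macros MSG_HDRLEN and MSG_ALIGN are assumptions for this example; the sketch relies only on the fact that nlmsg_len is the first 32-bit field of each message.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MSG_HDRLEN   16u   /* size of the netlink header */
#define MSG_ALIGN(n) (((size_t)(n) + 3) & ~(size_t)3)

/* Count the complete netlink messages stored back to back in buf[0..len). */
static int count_messages(const unsigned char *buf, size_t len)
{
    int count = 0;

    while (len >= MSG_HDRLEN) {
        uint32_t msg_len;
        size_t step;

        memcpy(&msg_len, buf, sizeof(msg_len));   /* nlmsg_len is the first field */
        if (msg_len < MSG_HDRLEN || msg_len > len)
            break;                                /* truncated or malformed */
        count++;

        step = MSG_ALIGN(msg_len);                /* next message starts aligned */
        if (step > len)
            step = len;                           /* last message may lack padding */
        buf += step;
        len -= step;
    }
    return count;
}
```

In the real function each message is handed to nfnl_step instead of being counted, and the loop stops early when the callback result is NFNL_CB_STOP or lower.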

static int nfnl_step(struct nfnl_handle *h, struct nlmsghdr *nlh)

{

struct nfnl_subsys_handle *ssh;

u_int8_t type = NFNL_MSG_TYPE(nlh->nlmsg_type); // low 8 bits

u_int8_t subsys_id = NFNL_SUBSYS_ID(nlh->nlmsg_type); // high 8 bits

/* Is this an error message */

if (nfnl_is_error(h, nlh)) {

/* This is an ACK */

if (errno == 0)

return 0;

/* This an error message */

return -1;

}

/* nfnetlink sanity checks: check for nfgenmsg size */

if (nlh->nlmsg_len < NLMSG_SPACE(sizeof(struct nfgenmsg))) {

errno = ENOSPC;

return -1;

}

if (subsys_id > NFNL_MAX_SUBSYS) {

errno = ENOENT;

return -1;

}

// so far the only subsystem is NF_QUEUE

ssh = &h->subsys[subsys_id]; // the high 8 bits select the subsystem

if (!ssh) {

errno = ENOENT;

return -1;

}

if (type >= ssh->cb_count) { // the low 8 bits select the callback that handles this message

errno = ENOENT;

return -1;

}

// note that NF_QUEUE registers a callback for the PACKET type only

if (ssh->cb[type].attr_count) {

int err;

struct nfattr *tb[ssh->cb[type].attr_count];

struct nfattr *attr = NFM_NFA(NLMSG_DATA(nlh));

int min_len = NLMSG_SPACE(sizeof(struct nfgenmsg));

int len = nlh->nlmsg_len - NLMSG_ALIGN(min_len);

err = nfnl_parse_attr(tb, ssh->cb[type].attr_count, attr, len);

if (err == -1)

return -1;

if (ssh->cb[type].call) {

/*

* On error, the callback returns NFNL_CB_FAILURE and

* errno must be explicitly set. On success,

* NFNL_CB_STOP is returned and we're done, otherwise

* NFNL_CB_CONTINUE means that we want to continue

* data processing.

*/

return ssh->cb[type].call(nlh, tb, ssh->cb[type].data);

}

}

/* no callback set, continue data processing */

return 1;

}
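The ACK check at the top of nfnl_step depends on the netlink convention that an NLMSG_ERROR message carrying error code 0 is an acknowledgement. A minimal sketch of that classification follows. The names classify_reply, reply_kind and TYPE_ERROR are hypothetical, and the offsets assume the standard 16-byte netlink header followed by a signed 32-bit error code.

```c
#include <stdint.h>
#include <string.h>

#define TYPE_ERROR 2   /* value of NLMSG_ERROR */

enum reply_kind { REPLY_OTHER, REPLY_ACK, REPLY_ERROR };

/* Classify one netlink message. For an error message, *err receives the
 * positive errno value (0 for an ACK); otherwise *err is left untouched. */
static enum reply_kind classify_reply(const unsigned char *msg, size_t len, int *err)
{
    uint16_t type;
    int32_t code;

    if (len < 16 + sizeof(code))
        return REPLY_OTHER;                    /* too short to hold an error code */
    memcpy(&type, msg + 4, sizeof(type));      /* nlmsg_type follows nlmsg_len */
    if (type != TYPE_ERROR)
        return REPLY_OTHER;
    memcpy(&code, msg + 16, sizeof(code));     /* the kernel stores -errno here */
    *err = -code;
    return code == 0 ? REPLY_ACK : REPLY_ERROR;
}
```

This is why a config message sent with NLM_F_ACK makes nfnl_query return 0 on success: the only reply is an error message whose code is 0.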

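Call nfnl_callback_register to register a dispatching function; the kernel's reply can then be analyzed.

User Space Sends an NFQNL_MSG_VERDICT Message to the Kernel

Note that a verdict message is usually built by modifying the message that was received. The call chain is nfq_set_verdict → __set_verdict → nfnl_build_nfa_iovec → nfnl_sendiov → nfnl_sendmsg. The function to look at is the following one.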

**************************************************************************************

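Parameters

struct nfq_q_handle *qh: the queue handle, which carries the callback that processes the message.

u_int32_t id: the packet ID. It is found in the nfattr array at index NFQA_PACKET_HDR - 1. User code normally obtains it by calling nfq_get_msg_packet_hdr.

u_int32_t verdict: the verdict for the packet. It takes the same values as the return value of a hook callback.

u_int32_t mark, int set_mark: the mark value and whether to send it. When set_mark is non-zero the mark is added as an NFQA_MARK attribute. nfq_set_verdict passes 0 for both when it calls this function.

data_len, const unsigned char *data: the length and start address of the new packet (the packet captured from the network interface).

enum nfqnl_msg_types type: the message type, here NFQNL_MSG_VERDICT.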

static int __set_verdict(struct nfq_q_handle *qh, u_int32_t id,

u_int32_t verdict, u_int32_t mark, int set_mark,

u_int32_t data_len, const unsigned char *data,

enum nfqnl_msg_types type)

{

***********************************************************************************

The netlink message is laid out as follows

struct nfqnl_msg_verdict_hdr vh;

union {

char buf[NFNL_HEADER_LEN

+NFA_LENGTH(sizeof(mark))

+NFA_LENGTH(sizeof(vh))];

struct nlmsghdr nmh;

} u;

struct iovec iov[3];

int nvecs;

/* This must be declared here (and not inside the data

* handling block) because the iovec points to this. */

struct nfattr data_attr;

memset(iov, 0, sizeof(iov));

*************************************************************************************

Fill in the netlink message header and the attributes

vh.verdict = htonl(verdict);

vh.id = htonl(id);

nfnl_fill_hdr(qh->h->nfnlssh, &u.nmh, 0, AF_UNSPEC, qh->id, /* qh->id is the queue_num */

type, NLM_F_REQUEST); /* flags = NLM_F_REQUEST */

/* add verdict header; its main purpose is to carry the packet ID */

nfnl_addattr_l(&u.nmh, sizeof(u), NFQA_VERDICT_HDR, &vh, sizeof(vh));

if (set_mark)

nfnl_addattr32(&u.nmh, sizeof(u), NFQA_MARK, mark);

*************************************************************************************

Assemble the modified packet

iov[0].iov_base = &u.nmh;

iov[0].iov_len = NLMSG_TAIL(&u.nmh) - (void *)&u.nmh; // this is the netlink message content

nvecs = 1;

if (data_len) {

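/*

* iov[0] holds the netlink message built above

* iov[1] holds the base and length of the struct nfattr

* iov[2] holds the buffer and length of the packet, which should start at the IP header

*/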

nfnl_build_nfa_iovec(&iov[1], &data_attr, NFQA_PAYLOAD,

data_len, (unsigned char *) data);

nvecs += 2;

/* Add the length of the appended data to the message

* header. The size of the attribute is given in the

* nfa_len field and is set in the nfnl_build_nfa_iovec()

* function. */

u.nmh.nlmsg_len += data_attr.nfa_len; // update the total message length

}

**************************************************************************************

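Send the message

The function that actually sends is sendmsg, whose prototype is ssize_t sendmsg(int sockfd, const struct msghdr *msg, int flags). The main task of nfnl_sendiov is therefore to construct the struct msghdr *msg. Since the most important parts have already been prepared above, nfnl_sendiov only needs to fill in things such as the destination address.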

return nfnl_sendiov(qh->h->nfnlh, iov, nvecs, 0);

}

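The resulting message format is therefore: netlink message header + NFQA_VERDICT_HDR (packet ID) + NFQA_PAYLOAD (packet contents), with an NFQA_MARK attribute in between when set_mark is non-zero.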
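The scatter-gather layout used by __set_verdict can be sketched independently of the library. The function build_verdict_iov and the structure attr_hdr are hypothetical names for this example. The sketch keeps the fixed part, the payload attribute header and the packet data in separate iovec entries, as the listing describes, and does not pad the payload length.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

struct attr_hdr { uint16_t len; uint16_t type; };   /* mirrors struct nfattr */

#define PAYLOAD_ATTR 10   /* value of NFQA_PAYLOAD */

/* Describe "fixed part + optional payload attribute" with up to three
 * iovec entries. Returns the number of entries used; *total receives
 * the number of bytes a single sendmsg call would transmit. */
static int build_verdict_iov(struct iovec iov[3], void *fixed, size_t fixed_len,
                             struct attr_hdr *ah, void *data, uint16_t data_len,
                             size_t *total)
{
    int n = 0;

    iov[n].iov_base = fixed;                /* headers + verdict attribute */
    iov[n].iov_len  = fixed_len;
    n++;
    *total = fixed_len;

    if (data_len) {
        ah->len  = (uint16_t)(sizeof(*ah) + data_len);
        ah->type = PAYLOAD_ATTR;
        iov[n].iov_base = ah;               /* attribute header only */
        iov[n].iov_len  = sizeof(*ah);
        n++;
        iov[n].iov_base = data;             /* packet bytes, not copied */
        iov[n].iov_len  = data_len;
        n++;
        *total += ah->len;
    }
    return n;
}
```

The benefit of this layout is that the possibly large packet buffer is never copied into the message buffer. The kernel gathers the three pieces when sendmsg is called.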

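Summary

nfq_unbind_pf, nfq_bind_pf and nfq_create_queue all work the same way internally: each calls __build_send_cfg_msg, and the data they send is a struct nfqnl_msg_config_cmd. In addition, nfq_create_queue appends a node to qh_list; that node is used to handle the traffic of its queue_num.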


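Comment from chaowq, 2019-08-20 15:25:57

Hello! Regarding the mark variable in NFQNL_MSG_VERDICT: it is the mark that iptables attaches to a packet. nfq_set_verdict always passes 0 because it is the verdict function that does not send a mark; nfq_set_verdict2 is the one that passes the mark. What I would like to ask is this: I am trying to work out whether the bypass option of an iptables rule, -j NFQUEUE --queue-bypass, shows up in the packet's mark once it is selected. I suspected it would, because when the bypass option is enabled Suricata only puts a bypass mark into the mark and returns the verdict with nfq_set_verdict2. What puzzles me is the documented meaning of --queue-bypass: by default, if no user-space program is listening on an NFQUEUE, all packets that would be queued are dropped; with this option the NFQUEUE rule is skipped instead and the packet moves on to the next rule. From that description, bypass seems to apply to the queue rather than to individual packets. So why would it be placed in the packet's mark?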
