The road ahead is long and far; I will search high and low! In pursuit of original work! dreamice.blog.chinaunix.net


dreamice


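1. Introduction

Netconsole is a feature new in the Linux 2.6 kernel. It sends the local machine's dmesg output over the network to another host, so that kernel panic messages from a machine can be monitored remotely. It is easy to use and gives developers a more convenient way to debug the kernel.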

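2. Configuration example

The 2.6 kernel supports netconsole natively. Everything below assumes a 2.6 kernel in which netconsole was selected as a module when the kernel was built. If the netconsole module was not built, the kernel has to be rebuilt (I have not yet found another solution).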

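2.1 Configuring the monitoring host

The monitoring host can be set up in two ways: with its own syslogd, or with the netcat tool. Syslogd listens on the fixed UDP port 514, whereas netcat can listen on any unused UDP port. I use netcat here. With netcat installed, run:

# netcat -l -p 30000 -u

Port 30000 receives the messages here. The choice is arbitrary; any port that does not conflict with another service will do.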
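For readers who prefer to see what the listener does rather than rely on netcat, here is a minimal user-space sketch in C. It is only an illustration under my own assumptions: the function names (open_udp_listener, recv_log_line, run_listener) are hypothetical and are not part of netconsole or netcat. It binds a UDP socket and prints every datagram it receives.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open a UDP socket bound to `port` on all local
 * addresses. Returns the socket descriptor, or -1 on failure. */
int open_udp_listener(unsigned short port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: receive one datagram and NUL-terminate it.
 * Returns the number of bytes received, or -1 on error. */
ssize_t recv_log_line(int fd, char *buf, size_t size)
{
    ssize_t n;

    if (size == 0)
        return -1;
    n = recvfrom(fd, buf, size - 1, 0, NULL, NULL);
    if (n >= 0)
        buf[n] = '\0';
    return n;
}

/* Print every datagram that arrives on `port` until an error occurs. */
int run_listener(unsigned short port)
{
    char buf[2048];
    int fd = open_udp_listener(port);

    if (fd < 0) {
        perror("open_udp_listener");
        return 1;
    }
    while (recv_log_line(fd, buf, sizeof(buf)) >= 0) {
        fputs(buf, stdout);
        fflush(stdout);
    }
    close(fd);
    return 0;
}
```

Calling run_listener(30000) from main() should behave roughly like the netcat command above. Because UDP is connectionless, the listener can be started before or after the monitored host begins sending.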

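2.2 Configuring the monitored host

Load the netconsole module on the host to be monitored. The module depends on several other modules, and the port, IP address and related settings must be supplied at load time, so the command is:

# modprobe netconsole netconsole=6666@10.14.0.225/eth7,30000@<remote-ip>/<remote-mac>

Let us go through this command.

modprobe loads a module together with the other modules it depends on.

The 6666 before the first @ is the local (source) port. The source code also defines defaults for the ports: in the definition of np shown in section 3.2.1, the default local port is 6665 and the default remote port is 6666.

10.14.0.225/eth7: the local machine's IP address and network interface name (usually visible with ifconfig -a).

30000@<remote-ip>/<remote-mac>: the three fields are the remote monitoring port, the remote IP address and the remote machine's MAC address, that is, the monitoring port 30000 chosen in section 2.1 together with that host's IP and MAC addresses.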

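2.3 Test example

That completes the configuration. Let us look at an example: run a simple hello world module on the monitored host and watch how its printk messages reach the monitoring host.

Source of the helloworld module: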

hello.c

/* hello.c */
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>

MODULE_LICENSE("Dual BSD/GPL");

/* Called when the module is loaded */
static int hello_init(void)
{
    printk(KERN_ALERT "Hello World\n");
    return 0;
}

/* Called when the module is removed */
static void hello_exit(void)
{
    printk(KERN_ALERT "Goodbye World\n");
}

module_init(hello_init);
module_exit(hello_exit);

Makefile:

obj-m := hello.o
KDIR := /lib/modules/$(shell uname -r)/build
PWD := $(shell pwd)

default:
	$(MAKE) -C $(KDIR) M=$(PWD) modules

clean:
	$(RM) *.o *.mod.c *.ko *.symvers

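Run:

# make
# insmod hello.ko

The terminal on the monitoring host shows:

netconsole: network logging started
Hello World

The first line arrived when the netconsole module was loaded earlier; the second arrived when the helloworld module was loaded.

Now run rmmod hello.

The monitoring terminal shows:

netconsole: network logging started
Hello World
Goodbye World

The test completes successfully. A kernel panic can be tested the same way, for example by accessing an invalid kernel address. Next, we analyze how netconsole is implemented in the kernel source.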

3. Netconsole kernel source analysis

3.1 Several important data structures

3.1.1 struct console

include/linux/console.h

/*
 * The interface for a console, or any other device that wants to capture
 * console messages (printer driver?)
 *
 * If a console driver is marked CON_BOOT then it will be auto-unregistered
 * when the first real console is registered. This is for early-printk drivers.
 */

#define CON_PRINTBUFFER (1)
#define CON_CONSDEV     (2) /* Last on the command line */
#define CON_ENABLED     (4)
#define CON_BOOT        (8)
#define CON_ANYTIME     (16) /* Safe to call when cpu is offline */

struct console
{
    char name[8];
    void (*write)(struct console *, const char *, unsigned);
    int (*read)(struct console *, char *, unsigned);
    struct tty_driver *(*device)(struct console *, int *);
    void (*unblank)(void);
    int (*setup)(struct console *, char *);
    short flags;
    short index;
    int cflag;
    void *data;
    struct console *next;
};

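As the comment above says, this structure is provided for devices that need to capture console messages. The dmesg output we capture is actually produced by printk, so we have to build such a structure and implement its message-capturing function, that is, the write function.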
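To make the role of the write hook concrete, here is a small user-space sketch. It is not kernel code: struct demo_console, demo_register_console, demo_emit and demo_capture_write are names I made up for illustration, and the struct is a cut-down imitation of struct console. It shows the general idea that each message is handed to the write function of every enabled console on a list.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_CON_ENABLED 4  /* same value as CON_ENABLED in the listing above */

/* Hypothetical, cut-down stand-in for struct console. */
struct demo_console {
    char name[8];
    void (*write)(struct demo_console *, const char *, unsigned);
    short flags;
    struct demo_console *next;
};

struct demo_console *demo_console_list;  /* head of the registered list */

/* Add a console to the front of the list. */
void demo_register_console(struct demo_console *con)
{
    con->next = demo_console_list;
    demo_console_list = con;
}

/* Hand one message to the write() hook of every enabled console. */
void demo_emit(const char *msg)
{
    struct demo_console *con;
    unsigned len = (unsigned)strlen(msg);

    for (con = demo_console_list; con; con = con->next)
        if ((con->flags & DEMO_CON_ENABLED) && con->write)
            con->write(con, msg, len);
}

/* Example hook: append each message to a fixed buffer, truncating
 * when the buffer is full. netconsole's hook sends UDP instead. */
char demo_captured[256];

void demo_capture_write(struct demo_console *con, const char *msg,
                        unsigned len)
{
    size_t used = strlen(demo_captured);
    size_t room = sizeof(demo_captured) - 1 - used;

    (void)con;
    if (len > room)
        len = (unsigned)room;
    memcpy(demo_captured + used, msg, len);
    demo_captured[used + len] = '\0';
}
```

Registering a demo_console whose write member is demo_capture_write and then calling demo_emit() leaves the message text in demo_captured. The kernel's real registration path does more than this (for example, replaying the existing log buffer when CON_PRINTBUFFER is set), which the sketch leaves out.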

3.1.2 struct netpoll

include/linux/netpoll.h

struct netpoll {
    struct net_device *dev; // points to the actual network card
    char dev_name[16], *name; // names
    void (*rx_hook)(struct netpoll *, int, char *, int); // receive hook; not used here
    void (*drop)(struct sk_buff *skb); // function that handles sending when the packet cannot go out directly
    u32 local_ip, remote_ip; // local and remote IP addresses
    u16 local_port, remote_port; // local and remote ports
    unsigned char local_mac[6], remote_mac[6]; // local and remote MAC addresses
};

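This is a key data structure in the netconsole implementation. The dev member points to the actual network card: since the dmesg output is monitored over the network, the data has to be sent through a real network card. In fact, this structure holds every key field needed to send a network packet (except the payload).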

3.1.3 struct net_device

include/linux/netdevice.h

struct net_device
{
    /*
     * This is the first field of the "visible" part of this structure
     * (i.e. as seen by users in the "Space.c" file). It is the name
     * the interface.
     */
    char name[IFNAMSIZ];
    /* device name hash chain */
    struct hlist_node name_hlist;
    ...
#ifdef CONFIG_NETPOLL
    struct netpoll_info *npinfo;
#endif
#ifdef CONFIG_NET_POLL_CONTROLLER
    void (*poll_controller)(struct net_device *dev);
#endif
    /* bridge stuff */
    struct net_bridge_port *br_port;
#ifdef CONFIG_NET_DIVERT
    /* this will get initialized at each interface type init routine */
    struct divert_blk *divert;
#endif /* CONFIG_NET_DIVERT */
    /* class/net/name entry */
    struct class_device class_dev;
    /* space for optional statistics and wireless sysfs groups */
    struct attribute_group *sysfs_groups[3];
};

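The net_device structure holds all the information about a network device; not all of its members are listed here. The members guarded by CONFIG_NETPOLL and CONFIG_NET_POLL_CONTROLLER, npinfo and poll_controller, are the ones related to netpoll. From this we can see that supporting netconsole requires building netpoll, and building netpoll requires the CONFIG_NETPOLL and CONFIG_NET_POLL_CONTROLLER options.

Within the structure, npinfo holds important netpoll-related information.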

3.1.4 struct netpoll_info

include/linux/netpoll.h

struct netpoll_info {
    spinlock_t poll_lock; // spinlock that prevents concurrent access
    int poll_owner; // owner (the CPU currently polling)
    int tries; // number of send attempts left if sending fails
    int rx_flags;
    spinlock_t rx_lock;
    struct netpoll *rx_np; /* netpoll that registered an rx_hook */
    struct sk_buff_head arp_tx; /* list of arp requests to reply to */
};

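The analysis of the implementing functions below shows in detail how this structure is used.

3.2 Implementation analysis

drivers/net/netconsole.c

static char config[256]; // buffer for the options given when the module is loaded
module_param_string(netconsole, config, 256, 0); // define the module parameter

// module description: the format of the parameter and its options at load time
MODULE_PARM_DESC(netconsole, " netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]\n");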

3.2.1 Netconsole initialization

As mentioned earlier, capturing console messages requires implementing a struct console. So let us first look at how a console structure is registered. (While reading the source I wrote many of my comments in English.)

/* netconsole module initialization */
static int init_netconsole(void)
{
    /* The config is the module parameter, which contains the capture options */
    // config is the module parameter above; the required settings were stored in it when the module was loaded
    if (strlen(config))
        option_setup(config); // parse the module parameter

    if (!configured) { // wrong parameters at load time: configuration failed
        printk("netconsole: not configured, aborting\n");
        return 0;
    }

    /* Setup the netpoll capture operations */
    // set up the netpoll configuration; np is a global struct netpoll that holds the netpoll settings
    if (netpoll_setup(&np))
        return -EINVAL;

    /* register netconsole that can get the printk info */
    register_console(&netconsole); // register netconsole
    printk(KERN_INFO "netconsole: network logging started\n");
    return 0;
}

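Let us first look at the definition of the netconsole structure to see what actually captures the console messages:

/* Initialize the console capture... */
static struct console netconsole = {
    .name = "netcon",
    .flags = CON_ENABLED | CON_PRINTBUFFER,
    .write = write_msg // this is the function that captures console messages
};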

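Now look at option_setup:

static int configured = 0;

static int option_setup(char *opt)
{
    /* parse the console capture options */
    configured = !netpoll_parse_options(&np, opt);
    return 1;
}

netpoll_parse_options() parses the module parameter, extracts the settings and stores them in the np structure. I do not analyze its implementation in detail here; interested readers can read the source code.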
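Since the parser itself is not shown, here is a user-space sketch of how a string in the [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr] format could be split into fields. This is not the kernel's netpoll_parse_options(); struct demo_opts, take_field and demo_parse_options are hypothetical names, IP addresses are kept as text rather than converted, and optional fields simply keep whatever defaults the caller put in the struct.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the parsed settings. */
struct demo_opts {
    unsigned local_port, remote_port;
    char local_ip[16], remote_ip[16];
    char dev_name[16];
    unsigned char remote_mac[6];
};

/* Copy the text from p up to the next `delim` into dst (size bytes).
 * Returns a pointer just past the delimiter, or NULL if the delimiter
 * is missing or the field does not fit. */
static const char *take_field(const char *p, char delim,
                              char *dst, size_t size)
{
    const char *end = strchr(p, delim);
    size_t len;

    if (!end)
        return NULL;
    len = (size_t)(end - p);
    if (len >= size)
        return NULL;
    memcpy(dst, p, len);
    dst[len] = '\0';
    return end + 1;
}

/* Returns 0 on success, -1 on a malformed string. Empty optional
 * fields leave the corresponding default in *o untouched. */
int demo_parse_options(struct demo_opts *o, const char *opt)
{
    char field[16];
    unsigned m[6];
    const char *p = opt;
    int i;

    if (!(p = take_field(p, '@', field, sizeof(field))))  /* [src-port]@ */
        return -1;
    if (field[0])
        o->local_port = (unsigned)strtoul(field, NULL, 10);

    if (!(p = take_field(p, '/', field, sizeof(field))))  /* [src-ip]/ */
        return -1;
    if (field[0])
        strcpy(o->local_ip, field);

    if (!(p = take_field(p, ',', field, sizeof(field))))  /* [dev], */
        return -1;
    if (field[0])
        strcpy(o->dev_name, field);

    if (!(p = take_field(p, '@', field, sizeof(field))))  /* [tgt-port]@ */
        return -1;
    if (field[0])
        o->remote_port = (unsigned)strtoul(field, NULL, 10);

    p = take_field(p, '/', field, sizeof(field));         /* <tgt-ip>/ */
    if (!p || !field[0])
        return -1;                       /* the target IP is mandatory */
    strcpy(o->remote_ip, field);

    if (*p) {                                             /* [tgt-macaddr] */
        if (sscanf(p, "%x:%x:%x:%x:%x:%x",
                   &m[0], &m[1], &m[2], &m[3], &m[4], &m[5]) != 6)
            return -1;
        for (i = 0; i < 6; i++)
            o->remote_mac[i] = (unsigned char)m[i];
    }
    return 0;
}
```

The idea is to initialize the struct with the defaults first (much as the np variable is statically initialized) and then let the parser overwrite only the fields that are present in the string.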

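Next we analyze the implementation of netpoll_setup(). Before that, look at the definition of the global variable np:

static struct netpoll np = {
    .name = "netconsole", // name
    .dev_name = "eth0", // default network interface name
    .local_port = 6665, // default local port
    .remote_port = 6666, // default remote monitoring port
    .remote_mac = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff}, // default MAC address of the monitoring host; in fact a broadcast address
    .drop = netpoll_queue, // function netpoll uses to queue console messages for sending
};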

int netpoll_setup(struct netpoll *np)
{
    struct net_device *ndev = NULL;
    struct in_device *in_dev;
    struct netpoll_info *npinfo;
    unsigned long flags;

    // if an interface name was given, check that such a network card exists in the system
    if (np->dev_name)
        /* find the network interface card */
        ndev = dev_get_by_name(np->dev_name);
    if (!ndev) {
        printk(KERN_ERR "%s: %s doesn't exist, aborting.\n",
               np->name, np->dev_name);
        return -1;
    }

    np->dev = ndev; // attach the network card to the np structure
    if (!ndev->npinfo) { // initialize the important npinfo member of net_device mentioned earlier
        npinfo = kmalloc(sizeof(*npinfo), GFP_KERNEL);
        if (!npinfo)
            goto release;

        npinfo->rx_flags = 0;
        npinfo->rx_np = NULL; // no hook function
        spin_lock_init(&npinfo->poll_lock); // initialize the spinlock
        npinfo->poll_owner = -1; // no owner yet
        npinfo->tries = MAX_RETRIES; // maximum number of retries if sending fails
        spin_lock_init(&npinfo->rx_lock);
        skb_queue_head_init(&npinfo->arp_tx); // ARP-related
    } else
        npinfo = ndev->npinfo;

    // check that the device provides poll_controller; without it netconsole is not
    // supported. poll_controller lets an skb be sent without interrupts enabled, and
    // it is not run from the interrupt handler.
    if (!ndev->poll_controller) {
        printk(KERN_ERR "%s: %s doesn't support polling, aborting.\n",
               np->name, np->dev_name);
        goto release;
    }

    if (!netif_running(ndev)) { // check whether the interface is up
        unsigned long atmost, atleast;

        printk(KERN_INFO "%s: device %s not up yet, forcing it\n",
               np->name, np->dev_name);

        rtnl_lock();
        if (dev_change_flags(ndev, ndev->flags | IFF_UP) < 0) {
            printk(KERN_ERR "%s: failed to open %s\n",
                   np->name, np->dev_name);
            rtnl_unlock();
            goto release;
        }
        rtnl_unlock();

        atleast = jiffies + HZ/10;
        atmost = jiffies + 4*HZ;
        while (!netif_carrier_ok(ndev)) {
            if (time_after(jiffies, atmost)) {
                printk(KERN_NOTICE
                       "%s: timeout waiting for carrier\n",
                       np->name);
                break;
            }
            cond_resched();
        }

        /* If carrier appears to come up instantly, we don't
         * trust it and pause so that we don't pump all our
         * queued console messages into the bitbucket.
         */
        if (time_before(jiffies, atleast)) {
            printk(KERN_NOTICE "%s: carrier detect appears"
                   " untrustworthy, waiting 4 seconds\n",
                   np->name);
            msleep(4000);
        }
    }

    /* initialize the local mac; that is why the options need not set it */
    // only the monitoring host's MAC address is given when the module is loaded,
    // so the local MAC address is initialized here
    if (is_zero_ether_addr(np->local_mac) && ndev->dev_addr)
        memcpy(np->local_mac, ndev->dev_addr, 6);

    if (!np->local_ip) { // if no local IP was given, look it up
        rcu_read_lock();
        in_dev = __in_dev_get_rcu(ndev);

        if (!in_dev || !in_dev->ifa_list) {
            rcu_read_unlock();
            printk(KERN_ERR "%s: no IP address for %s, aborting\n",
                   np->name, np->dev_name);
            goto release;
        }

        np->local_ip = ntohl(in_dev->ifa_list->ifa_local);
        rcu_read_unlock();
        printk(KERN_INFO "%s: local IP %d.%d.%d.%d\n",
               np->name, HIPQUAD(np->local_ip));
    }

    if (np->rx_hook) {
        spin_lock_irqsave(&npinfo->rx_lock, flags);
        npinfo->rx_flags |= NETPOLL_RX_ENABLED;
        npinfo->rx_np = np;
        spin_unlock_irqrestore(&npinfo->rx_lock, flags);
    }

    /* fill up the skb queue */
    refill_skbs();

    /* last thing to do is link it to the net device structure */
    ndev->npinfo = npinfo;

    /* avoid racing with NAPI reading npinfo */
    synchronize_rcu();

    return 0;

release:
    if (!ndev->npinfo)
        kfree(npinfo);
    np->dev = NULL;
    dev_put(ndev);
    return -1;
}

That completes initialization.
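The carrier wait in netpoll_setup() (the atleast and atmost variables) is easy to skim past, so here is a user-space sketch of just that logic. It rests on my own assumptions: wait_for_carrier and the fake_* helpers are hypothetical, the clock is a fake tick counter instead of jiffies, and plain comparisons replace time_after()/time_before(), so counter wraparound is ignored.

```c
#include <stdbool.h>

enum carrier_result { CARRIER_OK, CARRIER_TIMEOUT, CARRIER_TOO_FAST };

/* Hypothetical sketch of the wait: poll carrier_ok() until it reports
 * a link or until 4 seconds' worth of ticks have passed. A link that
 * appears in under a tenth of a second is reported as too fast to
 * trust (the kernel code then sleeps for 4 seconds). */
enum carrier_result wait_for_carrier(unsigned long (*now)(void),
                                     bool (*carrier_ok)(void),
                                     unsigned long hz)
{
    unsigned long start = now();
    unsigned long atleast = start + hz / 10;
    unsigned long atmost = start + 4 * hz;

    while (!carrier_ok()) {
        if (now() > atmost)
            return CARRIER_TIMEOUT;
    }
    if (now() < atleast)
        return CARRIER_TOO_FAST;
    return CARRIER_OK;
}

/* Fake clock and link for experimenting: the clock advances by one
 * tick each time it is read, and the link comes up at a chosen tick. */
unsigned long fake_ticks;
unsigned long fake_carrier_at;

unsigned long fake_now(void)
{
    return fake_ticks++;
}

bool fake_carrier_ok(void)
{
    return fake_ticks >= fake_carrier_at;
}
```

With hz set to 100, a link that comes up at tick 50 yields CARRIER_OK, one that is up immediately yields CARRIER_TOO_FAST, and one that never comes up yields CARRIER_TIMEOUT. Note that in the kernel listing a timeout only prints a notice and setup continues.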

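3.2.2 How it runs

The previous section began with the definition of the netconsole structure and the initialization of its important write member; that function is what captures console messages. Let us now analyze the implementation of write_msg():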

#define MAX_PRINT_CHUNK 1000

static void write_msg(struct console *con, const char *msg, unsigned int len)
{
    int frag, left;
    unsigned long flags;

    if (!np.dev)
        return;

    local_irq_save(flags);

    // send msg; a packet may not exceed MAX_PRINT_CHUNK bytes, so the message may go out in several pieces
    for (left = len; left; ) {
        /* each transmitted chunk must not be larger than MAX_PRINT_CHUNK */
        frag = min(left, MAX_PRINT_CHUNK);
        netpoll_send_udp(&np, msg, frag); /* send the msg */
        msg += frag;
        left -= frag;
    }

    local_irq_restore(flags);
}
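The splitting loop can be tried outside the kernel. The sketch below is a user-space illustration only: send_in_chunks, chunk_sender, struct chunk_stats and count_chunk are hypothetical names, and the callback stands in for netpoll_send_udp().

```c
#include <stddef.h>

#define MAX_PRINT_CHUNK 1000

/* Hypothetical callback type standing in for netpoll_send_udp(). */
typedef void (*chunk_sender)(const char *data, size_t len, void *ctx);

/* Send msg in pieces of at most MAX_PRINT_CHUNK bytes.
 * Returns the number of pieces handed to `send`. */
size_t send_in_chunks(const char *msg, size_t len,
                      chunk_sender send, void *ctx)
{
    size_t chunks = 0;

    while (len > 0) {
        size_t frag = len < MAX_PRINT_CHUNK ? len : MAX_PRINT_CHUNK;

        send(msg, frag, ctx);
        msg += frag;
        len -= frag;
        chunks++;
    }
    return chunks;
}

/* Example sender: records the total and the largest piece seen. */
struct chunk_stats {
    size_t total;
    size_t largest;
};

void count_chunk(const char *data, size_t len, void *ctx)
{
    struct chunk_stats *stats = ctx;

    (void)data;
    stats->total += len;
    if (len > stats->largest)
        stats->largest = len;
}
```

For a 2500-byte message this produces three pieces of 1000, 1000 and 500 bytes. Each piece travels as its own UDP datagram, so a receiver cannot assume that one datagram holds exactly one log line.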

As we can see, the actual sending is done by netpoll_send_udp():

void netpoll_send_udp(struct netpoll *np, const char *msg, int len)
{
    int total_len, eth_len, ip_len, udp_len;
    struct sk_buff *skb;
    struct udphdr *udph;
    struct iphdr *iph;
    struct ethhdr *eth;

    udp_len = len + sizeof(*udph);
    ip_len = eth_len = udp_len + sizeof(*iph);
    total_len = eth_len + ETH_HLEN + NET_IP_ALIGN;

    skb = find_skb(np, total_len, total_len - len);
    if (!skb)
        return;

    memcpy(skb->data, msg, len);
    skb->len += len;

    udph = (struct udphdr *) skb_push(skb, sizeof(*udph));
    udph->source = htons(np->local_port);
    udph->dest = htons(np->remote_port);
    udph->len = htons(udp_len);
    udph->check = 0;

    iph = (struct iphdr *)skb_push(skb, sizeof(*iph));

    /* iph->version = 4; iph->ihl = 5; */
    put_unaligned(0x45, (unsigned char *)iph);
    iph->tos = 0;
    put_unaligned(htons(ip_len), &(iph->tot_len));
    iph->id = 0;
    iph->frag_off = 0;
    iph->ttl = 64;
    iph->protocol = IPPROTO_UDP;
    iph->check = 0;
    put_unaligned(htonl(np->local_ip), &(iph->saddr));
    put_unaligned(htonl(np->remote_ip), &(iph->daddr));
    iph->check = ip_fast_csum((unsigned char *)iph, iph->ihl);

    eth = (struct ethhdr *) skb_push(skb, ETH_HLEN);

    eth->h_proto = htons(ETH_P_IP);
    memcpy(eth->h_source, np->local_mac, 6);
    memcpy(eth->h_dest, np->remote_mac, 6);

    skb->dev = np->dev;

    netpoll_send_skb(np, skb);
}

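This function does not look complicated: it mainly fills in the header fields and finally calls netpoll_send_skb() to send the data. Next, let us see how netpoll_send_skb() is implemented: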
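To see the header layout that netpoll_send_udp() produces, here is a user-space sketch that builds the same IPv4 and UDP headers in a plain byte buffer. It is my own illustration, not kernel code: build_udp_packet, ip_checksum, put_be16 and put_be32 are hypothetical names, the Ethernet header is left out, and the checksum is a straightforward one's-complement sum rather than the kernel's ip_fast_csum().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IP_HLEN  20  /* IPv4 header without options (ihl = 5) */
#define UDP_HLEN 8

/* One's-complement checksum over an even-length header. */
uint16_t ip_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

static void put_be32(uint8_t *p, uint32_t v)
{
    put_be16(p, (uint16_t)(v >> 16));
    put_be16(p + 2, (uint16_t)(v & 0xffff));
}

/* Write IP header + UDP header + payload into out. Addresses and
 * ports are in host byte order. Returns the total length, or 0 if
 * the packet does not fit. */
size_t build_udp_packet(uint8_t *out, size_t out_size,
                        uint32_t src_ip, uint16_t src_port,
                        uint32_t dst_ip, uint16_t dst_port,
                        const void *payload, size_t len)
{
    size_t udp_len = len + UDP_HLEN;
    size_t ip_len = udp_len + IP_HLEN;
    uint8_t *ip = out;
    uint8_t *udp = out + IP_HLEN;

    if (ip_len > out_size || ip_len > 0xffff)
        return 0;

    memset(out, 0, IP_HLEN + UDP_HLEN);
    ip[0] = 0x45;                        /* version 4, ihl 5 */
    put_be16(ip + 2, (uint16_t)ip_len);  /* total length */
    ip[8] = 64;                          /* ttl */
    ip[9] = 17;                          /* protocol: UDP */
    put_be32(ip + 12, src_ip);
    put_be32(ip + 16, dst_ip);
    put_be16(ip + 10, ip_checksum(ip, IP_HLEN));

    put_be16(udp, src_port);
    put_be16(udp + 2, dst_port);
    put_be16(udp + 4, (uint16_t)udp_len);
    /* UDP checksum stays 0 (not computed), as in the kernel listing */
    memcpy(udp + UDP_HLEN, payload, len);
    return ip_len;
}
```

A quick sanity check on the result: running ip_checksum() over the finished 20-byte IP header, checksum field included, should give 0. Receivers use the same property to validate the header.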

static void netpoll_send_skb(struct netpoll *np, struct sk_buff *skb)
{
    int status;
    struct netpoll_info *npinfo;

    // sanity checks
    if (!np || !np->dev || !netif_running(np->dev)) {
        __kfree_skb(skb);
        return;
    }

    npinfo = np->dev->npinfo;

    /* avoid recursion */
    // check whether the current CPU is already in the middle of polling or transmitting
    if (npinfo->poll_owner == smp_processor_id() ||
        np->dev->xmit_lock_owner == smp_processor_id()) {
        /* for our own cpu */
        // current-CPU case: check whether the netpoll structure has a drop function.
        // In the initialization in section 3.2.1, drop was set to netpoll_queue
        // (analyzed below).
        if (np->drop) /* hand the skb to np's netpoll_queue, which sends it from a work queue */
            np->drop(skb);
        else
            __kfree_skb(skb);
        return;
    }

    // otherwise (no recursion on this CPU): transmit directly
    do {
        /* number of attempts left if sending fails */
        npinfo->tries--;
        netif_tx_lock(np->dev);

        /*
         * network drivers do not expect to be called if the queue is
         * stopped.
         */
        status = NETDEV_TX_BUSY;
        if (!netif_queue_stopped(np->dev))
            /* send the packet directly */
            // call the driver's low-level skb transmit function
            status = np->dev->hard_start_xmit(skb, np->dev);

        netif_tx_unlock(np->dev);

        /* success */
        if (!status) {
            npinfo->tries = MAX_RETRIES; /* reset */
            return;
        }

        /* transmit busy */
        // the transmitter is busy: poll the device and try to send again
        netpoll_poll(np);
        udelay(50);
    } while (npinfo->tries > 0);
}
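The retry loop at the end of netpoll_send_skb() can be modelled in user space. The sketch below is an illustration under my own assumptions: struct demo_dev, demo_send, demo_busy_xmit and demo_poll are hypothetical names, DEMO_MAX_RETRIES is an arbitrary small value rather than the kernel's MAX_RETRIES, and locking and the delay between attempts are left out.

```c
#include <stdbool.h>

#define DEMO_MAX_RETRIES 5  /* arbitrary value for the sketch */

enum { DEMO_TX_OK = 0, DEMO_TX_BUSY = 1 };

/* Hypothetical device model. */
struct demo_dev {
    int tries;     /* attempts left, like npinfo->tries */
    int (*xmit)(struct demo_dev *dev, const void *pkt); /* like hard_start_xmit */
    void (*poll)(struct demo_dev *dev);                 /* like netpoll_poll */
    int busy_for;  /* polls needed before the device accepts a packet */
};

/* Try to transmit; on "busy", poll the device and try again until the
 * retry budget runs out. Returns true if the packet was sent. */
bool demo_send(struct demo_dev *dev, const void *pkt)
{
    do {
        dev->tries--;
        if (dev->xmit(dev, pkt) == DEMO_TX_OK) {
            dev->tries = DEMO_MAX_RETRIES;  /* reset on success */
            return true;
        }
        dev->poll(dev);  /* give the device a chance to free its queue */
    } while (dev->tries > 0);
    return false;
}

/* Example device: reports busy until it has been polled enough. */
int demo_busy_xmit(struct demo_dev *dev, const void *pkt)
{
    (void)pkt;
    return dev->busy_for > 0 ? DEMO_TX_BUSY : DEMO_TX_OK;
}

void demo_poll(struct demo_dev *dev)
{
    if (dev->busy_for > 0)
        dev->busy_for--;
}
```

As in the kernel listing, the retry budget is refilled only after a successful send. If the budget runs out, the counter stays at zero, and the next call still makes one attempt because the loop body always runs once.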

Finally, what remains is the most important function, netpoll_queue(). Its implementation is rather clever, so let us take it step by step.

void netpoll_queue(struct sk_buff *skb)
{
    unsigned long flags;

    if (queue_depth == MAX_QUEUE_DEPTH) {
        __kfree_skb(skb);
        return;
    }

    spin_lock_irqsave(&queue_lock, flags);
    if (!queue_head)
        queue_head = skb;
    else
        queue_tail->next = skb;
    queue_tail = skb;
    queue_depth++;
    spin_unlock_irqrestore(&queue_lock, flags);

    schedule_work(&send_queue);
}

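This code uses a work queue. Before going through the function, here is a quick refresher on work queues. A work queue is the same kind of facility as a tasklet, but the two differ fundamentally:

1. Work queue functions can sleep;

2. Work queue functions can run on multiple CPUs (by default they run on the processor they were submitted from);

3. Work queue functions need not execute atomically, and their execution can also be delayed;

Compared with tasklets, work queues offer weaker latency guarantees. A tasklet runs quickly, within a short time, and in atomic mode. For a more detailed description of work queues, see LDD3 (pp. 204-206).

Back to the program: let us look at some declarations and definitions that precede this function: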

static void queue_process(void *p)
{
    unsigned long flags;
    struct sk_buff *skb;

    while (queue_head) {
        spin_lock_irqsave(&queue_lock, flags);

        skb = queue_head;
        queue_head = skb->next;
        if (skb == queue_tail)
            queue_head = NULL;

        queue_depth--;

        spin_unlock_irqrestore(&queue_lock, flags);

        dev_queue_xmit(skb);
    }
}

static DECLARE_WORK(send_queue, queue_process, NULL);

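Here a work item (send_queue) is declared with queue_process as its handler function and no argument passed.

As we can see, queue_process() ultimately calls dev_queue_xmit() to send the packets.

Returning to netpoll_queue(): it only checks the depth of the queue, appends the skb, and finally calls schedule_work(&send_queue) to schedule the work item, so that queue_process() eventually runs and completes the sending.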
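The producer/consumer pair formed by netpoll_queue() and queue_process() boils down to a depth-limited FIFO. The user-space sketch below shows that structure only. All names (struct demo_pkt, struct demo_queue, demo_enqueue, demo_drain, demo_record) are hypothetical, DEMO_MAX_QUEUE_DEPTH is an arbitrary value, and the spinlock and the deferred execution through a work queue are deliberately omitted.

```c
#include <stddef.h>

#define DEMO_MAX_QUEUE_DEPTH 4  /* arbitrary limit for the sketch */

/* Hypothetical packet and queue types. */
struct demo_pkt {
    struct demo_pkt *next;
    int id;
};

struct demo_queue {
    struct demo_pkt *head, *tail;
    int depth;
};

/* Producer side, in the spirit of netpoll_queue(): append the packet
 * unless the queue is full. Returns 0 if queued, -1 if dropped. */
int demo_enqueue(struct demo_queue *q, struct demo_pkt *pkt)
{
    if (q->depth == DEMO_MAX_QUEUE_DEPTH)
        return -1;

    pkt->next = NULL;
    if (!q->head)
        q->head = pkt;
    else
        q->tail->next = pkt;
    q->tail = pkt;
    q->depth++;
    return 0;
}

/* Consumer side, in the spirit of queue_process(): remove packets in
 * FIFO order and pass each one to `send`. Returns how many were sent. */
int demo_drain(struct demo_queue *q,
               void (*send)(struct demo_pkt *pkt, void *ctx), void *ctx)
{
    int sent = 0;

    while (q->head) {
        struct demo_pkt *pkt = q->head;

        q->head = pkt->next;
        if (!q->head)
            q->tail = NULL;
        q->depth--;
        send(pkt, ctx);
        sent++;
    }
    return sent;
}

/* Example sender: remember the order in which packets left the queue. */
struct demo_log {
    int ids[8];
    int count;
};

void demo_record(struct demo_pkt *pkt, void *ctx)
{
    struct demo_log *log = ctx;

    if (log->count < 8)
        log->ids[log->count++] = pkt->id;
}
```

In the kernel code, enqueueing happens in the context that called printk, while draining happens later in the work queue's own context, where dev_queue_xmit() can be called safely. That split is the reason for the queue.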

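4. Summary

Netconsole offers a convenient way to monitor debugging output over the network, and it is very simple to configure. Its source code is also compact, and because it uses the work queue mechanism, it can work safely in interrupt context.

However, netconsole still has some limitations. As netconsole.txt in the kernel documentation says: "only IP networks, UDP packets and ethernet devices are supported". It works only on IP networks, uses unreliable UDP, and currently supports only Ethernet devices.

I look forward to further development of netconsole in newer kernel versions.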


blueskysee 2011-07-14 14:13:07

Why do the kernel boot messages I catch on the client differ so much from the actual dmesg output?
