Tracing TCP in the Kernel, Part 4: Tracing the Creation of a TCP (IPv4) Socket (Continued 2) - qinjiana0786 - ChinaUnix Blog

Tracing TCP in the Kernel, Part 4: Tracing the Creation of a TCP (IPv4) Socket (Continued 2)

Category: LINUX

2008-12-08 12:02:16

inet_init()-->inet_register_protosw()

void inet_register_protosw(struct inet_protosw *p)
{
    struct list_head *lh;
    struct inet_protosw *answer;
    int protocol = p->protocol;
    struct list_head *last_perm;

    spin_lock_bh(&inetsw_lock);

    if (p->type >= SOCK_MAX)/* wumingxiaozu */
        goto out_illegal;

    /* If we are trying to override a permanent protocol, bail. */
    answer = NULL;
    last_perm = &inetsw[p->type];
    list_for_each(lh, &inetsw[p->type]) {
        answer = list_entry(lh, struct inet_protosw, list);

        /* Check only the non-wild match. */
        if (INET_PROTOSW_PERMANENT & answer->flags) {
            if (protocol == answer->protocol)
                break;
            last_perm = lh;
        }

        answer = NULL;
    }
    if (answer)
        goto out_permanent;

    /* Add the new entry after the last permanent entry if any, so that
     * the new entry does not override a permanent entry when matched with
     * a wild-card protocol. But it is allowed to override any existing
     * non-permanent entry. This means that when we remove this entry, the
     * system automatically returns to the old behavior.
     */
    list_add_rcu(&p->list, last_perm);
out:
    spin_unlock_bh(&inetsw_lock);

    synchronize_net();

    return;

out_permanent:
    printk(KERN_ERR "Attempt to override permanent protocol %d.\n",
     protocol);
    goto out;

out_illegal:
    printk(KERN_ERR
     "Ignoring attempt to register invalid socket type %d.\n",
     p->type);
    goto out;
}
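The insertion rule in inet_register_protosw() can be tried out in user space. The following is only a minimal sketch: struct protosw, struct protosw_list, PROTOSW_PERMANENT and register_protosw() are hypothetical stand-ins, a plain singly linked list replaces the kernel's list_head, and locking and RCU are left out entirely.

```c
#include <assert.h>
#include <stddef.h>

#define PROTOSW_PERMANENT 0x01   /* stands in for INET_PROTOSW_PERMANENT */

/* Hypothetical, simplified stand-in for struct inet_protosw. */
struct protosw {
    int protocol;
    int flags;
    struct protosw *next;
};

/* One list per socket type; the dummy head plays the role of inetsw[type]. */
struct protosw_list {
    struct protosw head;
};

/*
 * Refuse to override a permanent entry that has the same protocol number.
 * Otherwise link the new entry right after the last permanent entry, so it
 * sits in front of every older non-permanent entry.
 * Returns 0 on success, -1 if the registration is rejected.
 */
int register_protosw(struct protosw_list *l, struct protosw *p)
{
    struct protosw *last_perm = &l->head;
    struct protosw *cur;

    assert(l != NULL && p != NULL);
    for (cur = l->head.next; cur != NULL; cur = cur->next) {
        if (cur->flags & PROTOSW_PERMANENT) {
            if (cur->protocol == p->protocol)
                return -1;      /* corresponds to the out_permanent path */
            last_perm = cur;
        }
    }
    p->next = last_perm->next;
    last_perm->next = p;
    return 0;
}
```

Registering a permanent entry for protocol 6 and then two non-permanent entries leaves the permanent one at the front, with the most recently registered non-permanent entry directly behind it. A later attempt to register protocol 6 again is rejected.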

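Clearly, the loop in inet_register_protosw() finds the right position in the list, and this is how the elements of our array are registered, one by one, into the inetsw array. Now back to inet_create(). The protocol argument it receives is the 0 that our application passed as the third argument of its socket call. The inetsw array, however, is indexed by sock->type, which is the SOCK_STREAM passed down from the application, and that selects the list whose first element is the inet_protosw entry for TCP. Once the list head for our socket type has been found in inetsw, its containing inet_protosw structure is obtained with answer = list_entry(p, struct inet_protosw, list);, so answer now refers to the TCP entry. The function then checks whether the protocol number of that entry equals the protocol argument. Our application passed 0, so they differ, and protocol is adjusted to the TCP protocol number IPPROTO_TCP, whose value is 6. Next the function checks answer to see whether the socket's interface to the IP protocols was actually found; if not, it tries to load it through request_module(). Coming from the application path we did find it, as described earlier. The function then performs the capability check. The capability value of our TCP entry is -1, so the function does not return early here. The two most important statements that follow are: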

sock->ops = answer->ops;
answer_prot = answer->prot;

These two statements first hook up the socket's protocol operation functions. To see how, we need to look at this entry:

    {
        .type = SOCK_STREAM,
        .protocol = IPPROTO_TCP,
        .prot = &tcp_prot,
        .ops = &inet_stream_ops,/* wumingxiaozu */
        .capability = -1,
        .no_check = 0,
        .flags = INET_PROTOSW_PERMANENT |
             INET_PROTOSW_ICSK,
    },

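Looking at how this element is set up, we can see that the socket's interface operations toward the IP protocols are set to inet_stream_ops (a table of operations, not a function), and answer_prot is set to the tcp_prot structure. tcp_prot is a struct proto. We will not list it here because it is large, but its role deserves emphasis: it is the structure the socket uses for the transport layer, whereas the network layer is represented by a different structure, struct inet_proto. Next, the function allocates a sock structure. (I am Wumingxiaozu, "the nameless nobody". Although I only started blogging in March, I have studied the kernel for many years. I write these posts to share knowledge with friends and to promote the copyleft spirit, so please credit the source when reposting.) The allocation is done by the statement sk = sk_alloc(net, PF_INET, GFP_KERNEL, answer_prot);. As we can see, it passes answer_prot, the transport-layer hook structure just described, to sk_alloc.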
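To make the lookup and the two hook assignments in inet_create() concrete, here is a small user-space sketch. It is not the kernel code: struct protosw, struct ops, struct prot, struct sock_stub and lookup_and_hook() are hypothetical simplifications, and the wildcard rules are written from the description in this post (a requested protocol of 0 takes on the protocol number of the matched entry).

```c
#include <assert.h>
#include <stddef.h>

#define PROTO_IP   0   /* wildcard, same value as IPPROTO_IP */
#define PROTO_TCP  6   /* same value as IPPROTO_TCP */

struct ops  { const char *name; };   /* stand-in for struct proto_ops */
struct prot { const char *name; };   /* stand-in for struct proto */

/* Hypothetical, simplified stand-in for struct inet_protosw. */
struct protosw {
    int protocol;
    const struct ops *ops;
    const struct prot *prot;
};

/* Collects what the real code stores in sock->ops, answer_prot and protocol. */
struct sock_stub {
    const struct ops *ops;
    const struct prot *prot;
    int protocol;
};

/*
 * Walk the entries registered for one socket type. An exact non-zero match
 * wins; a requested protocol of 0 adopts the entry's protocol number; an
 * entry whose own protocol is 0 matches any request.
 * Returns 0 and fills *out on success, -1 if nothing matches.
 */
int lookup_and_hook(const struct protosw *tbl, size_t n, int protocol,
                    struct sock_stub *out)
{
    const struct protosw *answer = NULL;
    size_t i;

    assert(tbl != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        const struct protosw *e = &tbl[i];

        if (protocol == e->protocol) {
            if (protocol != PROTO_IP) { answer = e; break; }
        } else {
            if (protocol == PROTO_IP) {
                protocol = e->protocol;     /* 0 becomes 6 for TCP */
                answer = e;
                break;
            }
            if (e->protocol == PROTO_IP) { answer = e; break; }
        }
    }
    if (answer == NULL)
        return -1;

    out->ops = answer->ops;      /* sock->ops = answer->ops; */
    out->prot = answer->prot;    /* answer_prot = answer->prot; */
    out->protocol = protocol;
    return 0;
}
```

With a table holding only the TCP entry, a request for protocol 0 succeeds and comes back with protocol 6 and both hooks filled in, while a request for protocol 17 finds nothing.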

sys_socketcall()-->sys_socket()-->sock_create()-->__sock_create()-->via pf->create()--> inet_create()-->sk_alloc()

struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
         struct proto *prot)
{
    struct sock *sk;

    sk = sk_prot_alloc(prot, priority | __GFP_ZERO, family);
    if (sk) {
        sk->sk_family = family;
        /*
         * See comment in struct sock definition to understand
         * why we need sk_prot_creator -acme,qinjiana0786@163.com
         */
        sk->sk_prot = sk->sk_prot_creator = prot;
        sock_lock_init(sk);
        sock_net_set(sk, get_net(net));
    }

    return sk;
}
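The shape of sk_alloc() is easy to imitate in user space. The sketch below rests on stated assumptions: struct proto_stub and struct sock_stub are hypothetical, calloc() plays the role of the __GFP_ZERO allocation, and only two members of the real struct proto are modelled (obj_size and the init hook). It shows a zeroed object sized by the protocol, the prot pointer stored in it, and a later call made through that pointer.

```c
#include <assert.h>
#include <stdlib.h>

struct sock_stub;

/* Stand-in for struct proto: an object size plus a single hook. */
struct proto_stub {
    const char *name;
    size_t obj_size;                      /* size of the protocol's own sock */
    int (*init)(struct sock_stub *sk);    /* protocol-specific setup hook */
};

/* Stand-in for struct sock, reduced to the fields sk_alloc() touches. */
struct sock_stub {
    int family;
    const struct proto_stub *prot;          /* like sk_prot */
    const struct proto_stub *prot_creator;  /* like sk_prot_creator */
    int initialized;
};

/* Allocate a zeroed object of the protocol's size and attach the hooks. */
struct sock_stub *sk_alloc_stub(int family, const struct proto_stub *prot)
{
    struct sock_stub *sk;

    assert(prot != NULL && prot->obj_size >= sizeof(*sk));
    sk = calloc(1, prot->obj_size);
    if (sk) {
        sk->family = family;
        sk->prot = sk->prot_creator = prot;
    }
    return sk;
}

/* Later code reaches the protocol only through the stored pointer. */
int sk_init_stub(struct sock_stub *sk)
{
    return sk->prot->init ? sk->prot->init(sk) : 0;
}

/* Example hook used for demonstration only. */
int demo_init(struct sock_stub *sk)
{
    sk->initialized = 1;
    return 0;
}
```

The caller never names the protocol again after allocation. Everything protocol-specific is reached through sk->prot, which is the point of storing the hook structure in the sock.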

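Note the statement sk->sk_prot = sk->sk_prot_creator = prot; in sk_alloc(): it attaches the transport-layer hook structure tcp_prot to sk_prot in the sock. We will see this hook structure being called shortly. sk_alloc() also calls sk_prot_alloc(). Before going into that function, let us look at the declaration of struct sock: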

struct sock {
    /*
     * Now struct inet_timewait_sock also uses sock_common, so please just
     * don't add nothing before this first member (__sk_common) --acme
     */
    struct sock_common __sk_common;
#define sk_family __sk_common.skc_family
#define sk_state __sk_common.skc_state
#define sk_reuse __sk_common.skc_reuse
#define sk_bound_dev_if __sk_common.skc_bound_dev_if
#define sk_node __sk_common.skc_node
#define sk_bind_node __sk_common.skc_bind_node
#define sk_refcnt __sk_common.skc_refcnt
#define sk_hash __sk_common.skc_hash
#define sk_prot __sk_common.skc_prot
#define sk_net __sk_common.skc_net/* wumingxiaozu */
    unsigned char sk_shutdown : 2,
                  sk_no_check : 2,
                  sk_userlocks : 4;
    unsigned char sk_protocol;
    unsigned short sk_type;
    int sk_rcvbuf;
    socket_lock_t sk_lock;
    /*
     * The backlog queue is special, it is always used with
     * the per-socket spinlock held and requires low latency
     * access. Therefore we special case it's implementation.
     */
    struct {
        struct sk_buff *head;
        struct sk_buff *tail;
    } sk_backlog;
    wait_queue_head_t *sk_sleep;
    struct dst_entry *sk_dst_cache;
    struct xfrm_policy *sk_policy[2];
    rwlock_t sk_dst_lock;
    atomic_t sk_rmem_alloc;
    atomic_t sk_wmem_alloc;
    atomic_t sk_omem_alloc;
    int sk_sndbuf;
    struct sk_buff_head sk_receive_queue;
    struct sk_buff_head sk_write_queue;
    struct sk_buff_head sk_async_wait_queue;
    int sk_wmem_queued;
    int sk_forward_alloc;
    gfp_t sk_allocation;
    int sk_route_caps;
    int sk_gso_type;
    unsigned int sk_gso_max_size;
    int sk_rcvlowat;
    unsigned long sk_flags;
    unsigned long sk_lingertime;
    struct sk_buff_head sk_error_queue;
    struct proto *sk_prot_creator;
    rwlock_t sk_callback_lock;
    int sk_err,
        sk_err_soft;
    atomic_t sk_drops;
    unsigned short sk_ack_backlog;
    unsigned short sk_max_ack_backlog;
    __u32 sk_priority;
    struct ucred sk_peercred;
    long sk_rcvtimeo;
    long sk_sndtimeo;
    struct sk_filter *sk_filter;
    void *sk_protinfo;
    struct timer_list sk_timer;
    ktime_t sk_stamp;
    struct socket *sk_socket;
    void *sk_user_data;
    struct page *sk_sndmsg_page;
    struct sk_buff *sk_send_head;
    __u32 sk_sndmsg_off;
    int sk_write_pending;
    void *sk_security;
    __u32 sk_mark;
    /* XXX 4 bytes hole on 64 bit */
    void (*sk_state_change)(struct sock *sk);
    void (*sk_data_ready)(struct sock *sk, int bytes);
    void (*sk_write_space)(struct sock *sk);
    void (*sk_error_report)(struct sock *sk);
    int (*sk_backlog_rcv)(struct sock *sk, struct sk_buff *skb);
    void (*sk_destruct)(struct sock *sk);
};
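The #define lines at the top of struct sock are worth a second look: they let code write sk->sk_family while the field actually lives in the shared header __sk_common. The sketch below reproduces the trick with hypothetical names (common_stub, sock_stub, tw_stub, family_of) and only two fields. It is an illustration of the idiom, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Shared header, in the role of struct sock_common. */
struct common_stub {
    unsigned short skc_family;
    unsigned char  skc_state;
};

/* Full socket object, in the role of struct sock. */
struct sock_stub {
    struct common_stub common;        /* must stay the first member */
#define sk_family common.skc_family   /* sk.sk_family expands to sk.common.skc_family */
#define sk_state  common.skc_state
    int sk_rcvbuf;
};

/* A lighter object that shares only the header, like inet_timewait_sock. */
struct tw_stub {
    struct common_stub common;        /* same header, same position */
    int tw_timeout;
};

/*
 * Because both structures begin with the same header, code that only needs
 * the common fields can accept either one through a pointer to the header.
 */
unsigned short family_of(const void *obj)
{
    assert(obj != NULL);
    return ((const struct common_stub *)obj)->skc_family;
}
```

This is why the comment in struct sock asks that nothing be added before __sk_common: the shared header is only useful if it sits at offset zero in every structure that embeds it.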

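struct sock is a very large data structure, and by now you can probably see why it is not merged into socket. socket follows the practice of the VFS inode. Many data structures in the VFS are split this way: the important fields stay in the structure that is closely tied to the file system, while fields that are less central but take up a lot of memory are moved out into a separate structure, and the two structures are then linked to each other. Some fields are also shared, so the shared part is factored out, much like extracting a common factor, and used jointly.

Here the part that is closely related to the application is placed in socket, and the common part is placed in sock. Below we will also see a dedicated structure for each specific protocol, for example inet_sock, the private structure for the AF_INET family used by TCP/IP, and unix_sock, the private structure for the AF_UNIX family. If this is still unclear, think of it as three levels: public, general, and private. The public part is the socket structure, the general part is the sock structure, and the private part is the protocol-specific structure, such as the inet_sock discussed in this series (we will come back to it later).

Conversely, if all of these fields were placed in socket, the result would be a monster that is inflexible and wastes a great deal of memory. Many fields are tied to one particular protocol. Placed in the general or public part, they would sit unused most of the time and yet still occupy memory in every allocation.

With this reason understood, the rest of the analysis becomes much more relaxed. Everything exists for a reason. When you meet a structure, the first step is not to dissect and explain it at once but to understand why it exists. Knowing why it came about lets you appreciate the details as you study the code, which is very satisfying, raises your interest in reading code, and lowers the pressure and difficulty. Giant data structures are then no longer frightening. In the posts that follow, you will often see me list a structure in the code, but unless a field must be pointed out or a diagram drawn, I explain it only as the analysis proceeds. Please remember this "understand on use" method.

Just as private fields placed in the public part waste memory, listing every structure's description and purpose up front and making readers spend time and effort memorizing rigid definitions is a waste too. It is better to borrow the idea behind Linux's copy-on-write: do not analyze what is not yet used, and analyze and understand it when it is used. Studying Linux is not only about improving your overall coding level and reading ability; more importantly, it improves how you learn. So let us stick with "understand on use" and stop wasting time memorizing "vocabulary". With enough use and exposure, it is absorbed naturally.

Still, to illustrate the relationship between public, general, and private, here is a rough sketch:

[Figure: sketch of the relationship between the socket, sock, and inet_sock structures]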
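The public, general, and private relationship can also be written down as code. What follows is a deliberately reduced sketch under stated assumptions: socket_stub, sock_stub, inet_sock_stub, inet_sk_stub() and attach() are hypothetical names, and each structure keeps only a field or two. The linkage it models is that socket points to sock, sock points back to socket, and the protocol-specific structure embeds sock as its first member, so a pointer to one is also a pointer to the other.

```c
#include <assert.h>
#include <stdlib.h>

struct sock_stub;

/* "Public" part: what the application-facing layer works with. */
struct socket_stub {
    int type;
    struct sock_stub *sk;            /* socket -> sock */
};

/* "General" part: shared by every protocol family. */
struct sock_stub {
    int family;
    struct socket_stub *sk_socket;   /* sock -> socket */
};

/* "Private" part: AF_INET-specific state. */
struct inet_sock_stub {
    struct sock_stub sk;             /* must be the first member */
    unsigned short sport;
    unsigned short dport;
};

/* A sock that was allocated as an inet_sock can simply be cast back. */
struct inet_sock_stub *inet_sk_stub(struct sock_stub *sk)
{
    return (struct inet_sock_stub *)sk;
}

/* Allocate the private structure and link all three levels together. */
int attach(struct socket_stub *sock, int family)
{
    struct inet_sock_stub *inet;

    assert(sock != NULL);
    inet = calloc(1, sizeof(*inet));
    if (!inet)
        return -1;
    inet->sk.family = family;
    inet->sk.sk_socket = sock;
    sock->sk = &inet->sk;
    return 0;
}
```

Starting from any of the three, the other two are reachable: sock->sk gives the general part, sk->sk_socket leads back to the public part, and the cast in inet_sk_stub() recovers the private part without any extra pointer.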

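The figure is enough to show how the socket, sock, and inet_sock structures contain one another: starting from any one of them, the related structures can be found. It is this flexible way of relating structures that TCP/IP also uses widely to implement the so-called "sliding window" protocol. We will go into the details when we reach the relevant code. I am Wumingxiaozu; this article is original, so please credit the source, http://qinjiana0786.cublog.cn, when reposting. This post has grown too long, so the analysis continues in the next one.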
