Category: LINUX
2008-03-29 20:40:55
lock_sock and release_sock do not hold a normal spinlock directly; instead they take ownership of the socket through sk->sk_lock.owner and do other housekeeping as well.
/* This is the per-socket lock. The spinlock provides a synchronization
 * between user contexts and software interrupt processing, whereas the
 * mini-semaphore synchronizes multiple users amongst themselves.
 */
struct sock_iocb;
typedef struct {
    spinlock_t          slock;
    struct sock_iocb    *owner;
    wait_queue_head_t   wq;
    /*
     * We express the mutex-alike socket_lock semantics
     * to the lock validator by explicitly managing
     * the slock as a lock variant (in addition to
     * the slock itself):
     */
#ifdef CONFIG_DEBUG_LOCK_ALLOC
    struct lockdep_map dep_map;
#endif
} socket_lock_t;
lock_sock takes the sk->sk_lock.slock spinlock, disables local bottom halves, and examines the owner field. If owner is set, it waits until owner is released; it then sets owner and releases sk->sk_lock.slock. This means that bh_lock_sock can still run even while the socket is "locked" by lock_sock, whereas another lock_sock cannot. [Reading the code of lock_sock and __lock_sock shows that lock_sock does take sk_lock.slock, but if owner is already set, __lock_sock drops sk_lock.slock again while it waits, so lock_sock does not block bh_lock_sock. In the code above, for example, calling bh_lock_sock_nested(sk) does not interfere with the later call to sock_owned_by_user(sk).]
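To make this concrete, here is a condensed sketch of lock_sock() and __lock_sock() as they appear in 2.6-era net/core/sock.c (reconstructed from memory, with the lockdep annotations dropped, so read it as illustrative rather than verbatim):

void lock_sock(struct sock *sk)
{
    might_sleep();
    spin_lock_bh(&sk->sk_lock.slock);      /* take slock, local BHs off */
    if (sk->sk_lock.owner)
        __lock_sock(sk);                   /* owner set: sleep until it clears */
    sk->sk_lock.owner = (void *)1;         /* mark the socket as user-owned */
    spin_unlock_bh(&sk->sk_lock.slock);    /* drop slock, BHs back on */
}

static void __lock_sock(struct sock *sk)
{
    DEFINE_WAIT(wait);

    for (;;) {
        prepare_to_wait_exclusive(&sk->sk_lock.wq, &wait,
                                  TASK_UNINTERRUPTIBLE);
        spin_unlock_bh(&sk->sk_lock.slock);    /* slock released while we sleep */
        schedule();
        spin_lock_bh(&sk->sk_lock.slock);
        if (!sock_owned_by_user(sk))           /* owner cleared by release_sock */
            break;
    }
    finish_wait(&sk->sk_lock.wq, &wait);
}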
release_sock takes sk_lock.slock, processes any packets that accumulated on the backlog queue, clears the owner field, wakes up the waiters on sk_lock.wq, then releases sk_lock.slock and re-enables bottom halves.
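A similarly condensed, reconstructed sketch of release_sock() (same caveats as above):

void release_sock(struct sock *sk)
{
    spin_lock_bh(&sk->sk_lock.slock);
    if (sk->sk_backlog.tail)
        __release_sock(sk);            /* feed backlogged skbs to sk_backlog_rcv() */
    sk->sk_lock.owner = NULL;          /* give up ownership */
    if (waitqueue_active(&sk->sk_lock.wq))
        wake_up(&sk->sk_lock.wq);      /* wake tasks sleeping in __lock_sock() */
    spin_unlock_bh(&sk->sk_lock.slock);
}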
bh_lock_sock and bh_unlock_sock simply take and release the sk->sk_lock.slock spinlock.
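The 2.6-era definitions in include/net/sock.h (again quoted from memory) show that these operate on the spinlock alone, and that sock_owned_by_user() just tests the owner field:

#define bh_lock_sock(__sk)        spin_lock(&((__sk)->sk_lock.slock))
#define bh_lock_sock_nested(__sk) \
        spin_lock_nested(&((__sk)->sk_lock.slock), SINGLE_DEPTH_NESTING)
#define bh_unlock_sock(__sk)      spin_unlock(&((__sk)->sk_lock.slock))

#define sock_owned_by_user(sk)    ((sk)->sk_lock.owner)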
Tracing into the tcp_prequeue() function:
/* Packet is added to VJ-style prequeue for processing in process
 * context, if a reader task is waiting. Apparently, this exciting
 * idea (VJ's mail "Re: query about TCP header on tcp-ip" of 07 Sep 93)
 * failed somewhere. Latency? Burstiness? Well, at least now we will
 * see, why it failed. 8)8) --ANK
 *
 * NOTE: is this not too big to inline?
 */
static inline int tcp_prequeue(struct sock *sk, struct sk_buff *skb)
{
    struct tcp_sock *tp = tcp_sk(sk);

    if (!sysctl_tcp_low_latency && tp->ucopy.task) {
        __skb_queue_tail(&tp->ucopy.prequeue, skb);
        tp->ucopy.memory += skb->truesize;
        if (tp->ucopy.memory > sk->sk_rcvbuf) {
            struct sk_buff *skb1;

            BUG_ON(sock_owned_by_user(sk));

            while ((skb1 = __skb_dequeue(&tp->ucopy.prequeue)) != NULL) {
                sk->sk_backlog_rcv(sk, skb1);
                NET_INC_STATS_BH(LINUX_MIB_TCPPREQUEUEDROPPED);
            }

            tp->ucopy.memory = 0;
        } else if (skb_queue_len(&tp->ucopy.prequeue) == 1) {
            wake_up_interruptible(sk->sk_sleep);
            if (!inet_csk_ack_scheduled(sk))
                inet_csk_reset_xmit_timer(sk, ICSK_TIME_DACK,
                                          (3 * TCP_RTO_MIN) / 4,
                                          TCP_RTO_MAX);
        }
        return 1;
    }
    return 0;
}
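For context on who calls this: tcp_prequeue() runs in softirq context from tcp_v4_rcv(), under bh_lock_sock. A condensed, reconstructed sketch of that call site (2.6 era, NET_DMA details omitted, so not verbatim):

    /* In tcp_v4_rcv(), after the socket lookup: */
    bh_lock_sock_nested(sk);
    ret = 0;
    if (!sock_owned_by_user(sk)) {
        /* No user task owns the socket: try the prequeue first,
         * otherwise do full receive processing right here. */
        if (!tcp_prequeue(sk, skb))
            ret = tcp_v4_do_rcv(sk, skb);
    } else
        /* A user task holds the socket lock: defer the skb to the
         * backlog; __release_sock() will process it later. */
        sk_add_backlog(sk, skb);
    bh_unlock_sock(sk);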
Note:
/* Data for direct copy to user */
struct {
    struct sk_buff_head prequeue;
    struct task_struct  *task;
    struct iovec        *iov;
    int                 memory;
    int                 len;
#ifdef CONFIG_NET_DMA
    /* members for async copy */
    struct dma_chan     *dma_chan;
    int                 wakeup;
    struct dma_pinned_list *pinned_list;
    dma_cookie_t        dma_cookie;
#endif
} ucopy;
Ucopy is part of the TCP options structure (struct tcp_sock), discussed in Chapter 8, Section 8.7.2. Once segments are put on the prequeue, they are processed in the application task's context rather than in the kernel context. This improves the efficiency of TCP by minimizing context switches between kernel and user. If tcp_prequeue returns zero, it means that there was no current user task associated with the socket (or tcp_low_latency was set), so tcp_v4_do_rcv is called to continue with normal "slow path" receive processing. tcp_prequeue is covered in more detail later in this chapter.

The field prequeue contains the list of socket buffers waiting for processing, and task is the user-level task that will receive the data. The iov field points to the user's receive data array, memory accumulates the truesize of the socket buffers on the prequeue (as the code above shows), and len is the number of bytes still to be copied to the user buffer.
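These fields are filled in by tcp_recvmsg() when a reader enters the receive path. The following reconstructed 2.6-era fragment (variable names as in the kernel source, quoted from memory) shows the relevant assignments:

    /* Inside tcp_recvmsg(): install the current task as the prequeue
     * consumer so that tcp_prequeue() will queue segments for it. */
    if (!sysctl_tcp_low_latency && tp->ucopy.task == user_recv) {
        if (!user_recv && !(flags & (MSG_TRUNC | MSG_PEEK))) {
            user_recv = current;
            tp->ucopy.task = user_recv;
            tp->ucopy.iov = msg->msg_iov;
        }
        tp->ucopy.len = len;    /* bytes the user still wants */
        /* ... */
    }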
tcp_recvmsg() drains the prequeue by calling tcp_prequeue_process():

static void tcp_prequeue_process(struct sock *sk)
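{
    /* Body sketched from 2.6-era net/ipv4/tcp.c (reconstructed, not
     * verbatim): each queued skb is fed to sk->sk_backlog_rcv, which
     * for TCP/IPv4 is tcp_v4_do_rcv(), with local bottom halves
     * disabled, and the memory counter is then reset. */
    struct sk_buff *skb;
    struct tcp_sock *tp = tcp_sk(sk);

    NET_INC_STATS_USER(LINUX_MIB_TCPPREQUEUED);

    local_bh_disable();
    while ((skb = __skb_dequeue(&tp->ucopy.prequeue)) != NULL)
        sk->sk_backlog_rcv(sk, skb);
    local_bh_enable();

    /* Clear memory counter. */
    tp->ucopy.memory = 0;
}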