Category: LINUX

2009-09-29 20:18:49

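The nfs command in U-Boot 1.1.6 does not work properly; it contains a small bug.
The symptom is that downloading a large file always fails. Comparing it with the tftp command finally brought the bug to light.
 
First, trace the NFS packet exchange: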
 
root@lzd-laptop:/home/lzd# tcpdump -r aaa 
reading from file aaa, link-type EN10MB (Ethernet)
15:16:44.290988 arp who-has lzd-laptop.local tell 10.107.4.146
15:16:45.100523 arp who-has 10.107.4.254 tell lzd-laptop.local
15:16:46.100539 arp who-has 10.107.4.254 tell lzd-laptop.local
15:16:47.100544 arp who-has 10.107.4.254 tell lzd-laptop.local
15:16:49.290554 arp who-has lzd-laptop.local tell 10.107.4.146
15:16:49.290609 arp reply lzd-laptop.local is-at 00:0b:db:98:a0:5d (oui Unknown)
15:16:49.290724 IP 10.107.4.146.1000 > lzd-laptop.local.sunrpc: UDP, length 56
15:16:49.291018 IP lzd-laptop.local.sunrpc > 10.107.4.146.1000: UDP, length 28
15:16:49.294441 IP 10.107.4.146.1000 > lzd-laptop.local.sunrpc: UDP, length 56
15:16:49.294657 IP lzd-laptop.local.sunrpc > 10.107.4.146.1000: UDP, length 28
15:16:49.298118 IP 10.107.4.146.1000 > lzd-laptop.local.54597: UDP, length 80
15:16:49.299982 IP lzd-laptop.local.54597 > 10.107.4.146.1000: UDP, length 60
15:16:49.303403 IP 10.107.4.146.4905 > lzd-laptop.local.nfs: 104 lookup [|nfs]
15:16:49.303528 IP lzd-laptop.local.nfs > 10.107.4.146.4905: reply ok 128 lookup [|nfs]
15:16:49.306989 IP 10.107.4.146.4906 > lzd-laptop.local.nfs: 104 read [|nfs]
15:16:49.307150 IP lzd-laptop.local.nfs > 10.107.4.146.4906: reply ok 1124 read
15:16:49.309664 IP 10.107.4.146.4907 > lzd-laptop.local.nfs: 104 read [|nfs]
15:16:49.309846 IP lzd-laptop.local.nfs > 10.107.4.146.4907: reply ok 1124 read
15:16:49.312293 IP 10.107.4.146.4908 > lzd-laptop.local.nfs: 104 read [|nfs]
15:16:49.312515 IP lzd-laptop.local.nfs > 10.107.4.146.4908: reply ok 1124 read
15:16:49.314964 IP 10.107.4.146.4909 > lzd-laptop.local.nfs: 104 read [|nfs]
15:16:49.315129 IP lzd-laptop.local.nfs > 10.107.4.146.4909: reply ok 1124 read
15:16:49.317634 IP 10.107.4.146.4910 > lzd-laptop.local.nfs: 104 read [|nfs]
15:16:49.317763 IP lzd-laptop.local.nfs > 10.107.4.146.4910: reply ok 1124 read
15:16:49.320265 IP 10.107.4.146.4911 > lzd-laptop.local.nfs: 104 read [|nfs]
15:16:49.320330 IP lzd-laptop.local.nfs > 10.107.4.146.4911: reply ok 1124 read
15:16:49.322834 IP 10.107.4.146.4912 > lzd-laptop.local.nfs: 104 read [|nfs]
15:16:49.323146 IP lzd-laptop.local.nfs > 10.107.4.146.4912: reply ok 1124 read
15:16:49.325599 IP 10.107.4.146.4913 > lzd-laptop.local.nfs: 104 read [|nfs]
15:16:49.325757 IP lzd-laptop.local.nfs > 10.107.4.146.4913: reply ok 1124 read
15:16:49.328209 IP 10.107.4.146.4914 > lzd-laptop.local.nfs: 104 read [|nfs]
15:16:49.328372 IP lzd-laptop.local.nfs > 10.107.4.146.4914: reply ok 1124 read
15:16:49.330837 IP 10.107.4.146.4915 > lzd-laptop.local.nfs: 104 read [|nfs]
15:16:49.331053 IP lzd-laptop.local.nfs > 10.107.4.146.4915: reply ok 1124 read
15:16:49.333506 IP 10.107.4.146.4916 > lzd-laptop.local.nfs: 104 read [|nfs]
15:16:49.333696 IP lzd-laptop.local.nfs > 10.107.4.146.4916: reply ok 1124 read
15:16:49.336198 IP 10.107.4.146.4917 > lzd-laptop.local.nfs: 104 read [|nfs]
15:16:49.336364 IP lzd-laptop.local.nfs > 10.107.4.146.4917: reply ok 1124 read
15:16:49.338810 IP 10.107.4.146.4918 > lzd-laptop.local.nfs: 104 read [|nfs]
15:16:49.338934 IP lzd-laptop.local.nfs > 10.107.4.146.4918: reply ok 1124 read
15:16:49.341381 IP 10.107.4.146.4919 > lzd-laptop.local.nfs: 104 read [|nfs]
15:16:49.341540 IP lzd-laptop.local.nfs > 10.107.4.146.4919: reply ok 1124 read
15:16:49.343983 IP 10.107.4.146.4920 > lzd-laptop.local.nfs: 104 read [|nfs]
15:16:49.344074 IP lzd-laptop.local.nfs > 10.107.4.146.4920: reply ok 1124 read
15:16:49.346526 IP 10.107.4.146.4921 > lzd-laptop.local.nfs: 104 read [|nfs]
15:16:49.346658 IP lzd-laptop.local.nfs > 10.107.4.146.4921: reply ok 1124 read
15:16:49.349168 IP 10.107.4.146.4922 > lzd-laptop.local.nfs: 104 read [|nfs]
15:16:49.349697 IP lzd-laptop.local.nfs > 10.107.4.146.4922: reply ok 1124 read
15:16:49.352145 IP 10.107.4.146.4923 > lzd-laptop.local.nfs: 104 read [|nfs]
15:16:49.352310 IP lzd-laptop.local.nfs > 10.107.4.146.4923: reply ok 1124 read
15:16:49.354758 IP 10.107.4.146.4924 > lzd-laptop.local.nfs: 104 read [|nfs]
15:16:49.354917 IP lzd-laptop.local.nfs > 10.107.4.146.4924: reply ok 1124 read
15:16:49.357364 IP 10.107.4.146.4925 > lzd-laptop.local.nfs: 104 read [|nfs]
15:16:49.357509 IP lzd-laptop.local.nfs > 10.107.4.146.4925: reply ok 1124 read
15:16:49.359962 IP 10.107.4.146.4926 > lzd-laptop.local.nfs: 104 read [|nfs]
15:16:49.360090 IP lzd-laptop.local.nfs > 10.107.4.146.4926: reply ok 1124 read
15:16:49.362596 IP 10.107.4.146.4927 > lzd-laptop.local.nfs: 104 read [|nfs]
15:16:49.362740 IP lzd-laptop.local.nfs > 10.107.4.146.4927: reply ok 1124 read
15:16:49.365238 IP 10.107.4.146.4928 > lzd-laptop.local.nfs: 104 read [|nfs]
15:16:49.365423 IP lzd-laptop.local.nfs > 10.107.4.146.4928: reply ok 1124 read   <-- my host did send the DM9000 a 1124-byte reply here
15:16:50.104534 arp who-has 10.107.4.254 tell lzd-laptop.local
15:16:51.104547 arp who-has 10.107.4.254 tell lzd-laptop.local
15:16:52.104530 arp who-has 10.107.4.254 tell lzd-laptop.local
15:16:54.288529 arp who-has 10.107.4.146 tell lzd-laptop.local
15:16:55.108525 arp who-has 10.107.4.254 tell lzd-laptop.local
15:16:55.288543 arp who-has 10.107.4.146 tell lzd-laptop.local
15:16:56.108530 arp who-has 10.107.4.254 tell lzd-laptop.local
15:16:56.288526 arp who-has 10.107.4.146 tell lzd-laptop.local
15:16:57.108530 arp who-has 10.107.4.254 tell lzd-laptop.local
root@lzd-laptop:/home/lzd# 
 
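I was confident that my laptop really had sent this packet, but after enabling DM9000 debugging I found that
once the DM9000 had sent its request, it never received this reply. So the DM9000 had definitely lost the packet.
At first I suspected that the blocks NFS transfers were too large, but the DM9000A datasheet rules that out: its
receive ring buffer is 13 KB, large enough to receive any of these packets. Comparing with the tftp command again gave the answer.
The tftp command loses packets too, but after waiting a while it sends the request again to ask for the packet
it just missed; in other words, it retransmits. The nfs command does not. The functions below show the difference.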
 
 
This is the tftp timeout handler:
static void
TftpTimeout (void)
{
    if (++TftpTimeoutCount > TIMEOUT_COUNT) {
        puts ("\nRetry count exceeded; starting again\n");
        NetStartAgain ();
    } else {
        puts ("T ");
        NetSetTimeout (TIMEOUT * CFG_HZ, TftpTimeout);  /* re-arm the timeout */
        TftpSend ();  /* send the request again */
    }
}
 
 
 
This is the nfs timeout handler:
static void
NfsTimeout (void)
{
    puts ("Timeout\n");
    NetState = NETLOOP_FAIL;  /* a single timeout with no reply kills the transfer */
    return;
}
 
 
With the following change it works.
#if 0
static void
NfsTimeout (void)
{
    puts ("Timeout\n");
    NetState = NETLOOP_FAIL;
    return;
}
#endif
 
static void
NfsTimeout (void)
{
    puts ("Timeout,try again");
    NfsSend ();  /* resend the pending request instead of failing */
}
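
The fix above resends once per timeout but, unlike TftpTimeout, neither re-arms the timer nor limits the number of retries. The sketch below is a host-side simulation of a bounded-retry handler modeled on TftpTimeout; it is not U-Boot code. The stub functions, the NFS_RETRY_COUNT value, and the assumption that the network core disarms a handler before calling it are all mine, for illustration only.

```c
#include <stdio.h>

#define NFS_RETRY_COUNT 3 /* hypothetical limit, not a U-Boot 1.1.6 constant */

typedef void (*timeout_fn)(void);

/* Stand-ins for the network core, so the sketch builds on a host. */
static timeout_fn armed_handler; /* currently armed timeout handler, or NULL */
static int requests_sent;        /* number of times the request was sent */
static int restarted;            /* 1 once the transfer gave up and restarted */
static int timeout_count;        /* consecutive timeouts seen so far */

static void set_timeout_stub(timeout_fn fn) { armed_handler = fn; }
static void nfs_send_stub(void)             { requests_sent++; }
static void start_again_stub(void)          { restarted = 1; }

/* Bounded-retry handler modeled on TftpTimeout. */
static void nfs_timeout_sketch(void)
{
    if (++timeout_count > NFS_RETRY_COUNT) {
        puts("Retry count exceeded; starting again");
        start_again_stub();
    } else {
        fputs("T ", stdout);
        set_timeout_stub(nfs_timeout_sketch); /* re-arm before resending */
        nfs_send_stub();                      /* retransmit the request */
    }
}

/* Assumed core behaviour: the handler is disarmed before it is called. */
static void fire_timeout(void)
{
    timeout_fn fn = armed_handler;
    armed_handler = NULL;
    if (fn)
        fn();
}

/* Send one request, lose `lost` replies in a row, return the total sends. */
static int simulate_lost_replies(int lost)
{
    int i;

    timeout_count = 0;
    requests_sent = 0;
    restarted = 0;
    set_timeout_stub(nfs_timeout_sketch);
    nfs_send_stub();
    for (i = 0; i < lost; i++)
        fire_timeout();
    return requests_sent;
}
```

In a real handler the timeout counter would presumably also be reset whenever a reply arrives, so that only consecutive losses count against the limit.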
 
Verification:
lzd> tftp 0x30000000 image
dm9000 i/o: 0x20000300, id: 0x90000a46 
MAC: 00:80:00:80:00:80
TFTP from server 10.107.4.145; our IP address is 10.107.4.146
Filename 'image'.
Load address: 0x30000000
Loading: T #################################################################
         ################T ########T #########################################
         #################################################################
         #################################################################
         ########T T ###################################
done
Bytes transferred = 1546996 (179af4 hex)
lzd> nfs 0x32000000 /home/lzd/nfs/image
dm9000 i/o: 0x20000300, id: 0x90000a46 
MAC: 00:80:00:80:00:80                                                                                               
File transfer via NFS from server 10.107.4.145; our IP address is 10.107.4.146
Filename '/home/lzd/nfs/image'.
Load address: 0x32000000
Loading: #################################################################
         ####################Timeout,try again#############################################
         #####################################Timeout,try again############################
         #################################################################
         ###########################################Timeout,try again*** ERROR: Cannot umount
lzd> cmp 0x30000000 0x32000000 0x179af0
word at 0x30179af4 (0x33f4fca8) != word at 0x32179af4 (0x00000000)
Total of 386749 words were the same
lzd> 
 
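The ERROR message at the end appeared because my host kept sending ARP requests to the gateway. Once I set the gateway IP to empty, the problem went away. The data itself was already intact: cmp compares 32-bit words, the image is 1546996 bytes = 386749 words, and the only mismatch is at offset 0x179af4, the first word past the end of the image.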
 
lzd>  nfs 0x32000000 /home/lzd/nfs/image
dm9000 i/o: 0x20000300, id: 0x90000a46
MAC: 00:80:00:80:00:80
File transfer via NFS from server 10.107.4.145; our IP address is 10.107.4.146
Filename '/home/lzd/nfs/image'.
Load address: 0x32000000
Loading: ############Timeout,try again##Timeout,try again#########################################Timeout,try again#######
         #################################################################
         #################################################################
         #################################################################
         ############Timeout,try again###Timeout,try again############################
done
Bytes transferred = 1546996 (179af4 hex)
lzd>



 
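Below is some background on NFS:
NFS Requests and Replies
Sun NFS (Network File System) requests and replies are printed as:
 
src.xid > dst.nfs: len op args
src.nfs > dst.xid: reply stat len op results
 
sushi.6709 > wrl.nfs: 112 readlink fh 21,24/10.73165
wrl.nfs > sushi.6709: reply ok 40 readlink "../var"
sushi.201b > wrl.nfs:
     144 lookup fh 9,74/4096.6878 "xcolors"
wrl.nfs > sushi.201b:
     reply ok 128 lookup fh 9,74/4134.3150
 
In the first line, host sushi sends a transaction with ID 6709 to wrl (note that the number after the source host is a transaction ID, not a port). The request is 112 bytes long, excluding the UDP and IP
headers. The operation is a readlink (read symbolic link) on file handle (fh) 21,24/10.731657119. (With luck, as in this case, the file handle can be interpreted as a
major,minor device number pair followed by the inode number and the generation number.) Wrl replies `ok' with the contents of the link.
In the third line, sushi asks wrl to look up the name `xcolors' in directory file 9,74/4096.6878. Note that the data printed depends on the operation type. The format is meant to be self-explanatory.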
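
As a small illustration of the file handle notation described above, the following sketch splits the printed text `major,minor/inode.generation` into its four numbers. The struct and function names are hypothetical and are not part of tcpdump; the sketch only handles the textual form shown in these examples.

```c
#include <stdio.h>

/* The four numbers tcpdump prints for a file handle it can interpret. */
struct fh_text {
    unsigned int  dev_major;
    unsigned int  dev_minor;
    unsigned long inode;
    unsigned long generation;
};

/*
 * Parse "major,minor/inode.generation".
 * Returns 0 on success, -1 if the text does not match that shape exactly.
 */
static int parse_fh_text(const char *text, struct fh_text *out)
{
    int consumed = 0;

    if (sscanf(text, "%u,%u/%lu.%lu%n",
               &out->dev_major, &out->dev_minor,
               &out->inode, &out->generation, &consumed) != 4)
        return -1;

    /* Reject trailing characters after the generation number. */
    return text[consumed] == '\0' ? 0 : -1;
}
```

For the handle `21,24/10.731657119` from the explanation above, this yields device 21,24, inode 10 and generation number 731657119.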
 
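With the -v (verbose) option, additional information is printed. For example:
sushi.1372a > wrl.nfs:
     148 read fh 21,11/12.195 8192 bytes @ 24576
wrl.nfs > sushi.1372a:
     reply ok 1472 read REG 100664 ids 417/0 sz 29388
 
(-v also prints the TTL, ID, and fragmentation fields of the IP header, which are omitted from this example.) In the first line, sushi asks wrl to read 8192 bytes from file 21,11/12.195, starting at byte offset 24576.
Wrl replies `ok'; the packet shown on the second line is the first fragment of the reply, and hence is only 1472 bytes long (the remaining data follows in later fragments, but since those fragments
carry no NFS or even UDP header, they may not be printed, depending on the filter expression used). The -v option also prints some of the file attributes (which are returned along with the file data):
the file type (``REG'', for a regular file), the access mode (in octal), the uid and gid, and the file size.
Giving -v a second time (-vv) prints even more detail.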
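
The 1472-byte figure for the first fragment can be reproduced with a little arithmetic. The sketch below assumes an Ethernet MTU of 1500 bytes, a 20-byte IPv4 header without options, and an 8-byte UDP header; the function names are made up for this example. Under the same assumptions, the 1124-byte replies in the capture at the top of this post fit into a single frame and are not fragmented.

```c
#define IPV4_HEADER_LEN 20 /* assumes no IP options */
#define UDP_HEADER_LEN   8

/* Largest IP payload per fragment; it must be a multiple of 8 bytes. */
static unsigned int fragment_payload(unsigned int mtu)
{
    return (mtu - IPV4_HEADER_LEN) & ~7u;
}

/* UDP payload bytes carried by the first fragment of a large datagram. */
static unsigned int first_fragment_udp_payload(unsigned int mtu)
{
    return fragment_payload(mtu) - UDP_HEADER_LEN;
}

/* Number of IP packets needed for a UDP payload of the given size. */
static unsigned int fragment_count(unsigned int mtu, unsigned int udp_payload)
{
    unsigned int datagram = udp_payload + UDP_HEADER_LEN;
    unsigned int per_frag = fragment_payload(mtu);

    if (datagram <= mtu - IPV4_HEADER_LEN)
        return 1; /* fits in one packet, no fragmentation */
    return (datagram + per_frag - 1) / per_frag; /* round up */
}
```

With an MTU of 1500, first_fragment_udp_payload() gives 1472, matching the length tcpdump printed for the first fragment of the reply.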
 
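Note that NFS requests are very large, and much of the detail will not be printed unless snaplen is increased. Try the `-s 192' option.
 
NFS reply packets do not explicitly identify the RPC operation. Instead, tcpdump keeps track of ``recent'' requests and matches replies to them by transaction ID. If a reply has no corresponding request, it
cannot be parsed.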
 
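-s
Capture snaplen bytes of data from each packet instead of the default of 68 (with SunOS's NIT, the minimum is 96). 68 bytes is adequate for IP, ICMP, TCP and UDP, but may truncate protocol information from name server and NFS packets (see below). Packets truncated because the snapshot was too small are marked in the output with ``[|proto]'', where proto is the name of the protocol level at which the truncation occurred. Note that taking larger snapshots both increases the time needed to process packets and reduces the amount of packet buffering, which may cause packets to be lost. Keep snaplen as small as possible while still capturing the protocol information you need.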
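
To get a feel for why 68 bytes is too little for NFS, here is a rough sketch of how much RPC/NFS data survives a given snaplen. It assumes a 14-byte Ethernet header, a 20-byte IPv4 header without options and an 8-byte UDP header; the function names are illustrative only.

```c
#define ETH_HEADER_LEN  14
#define IPV4_HEADER_LEN 20 /* assumes no IP options */
#define UDP_HEADER_LEN   8

/* Bytes of UDP payload (the RPC/NFS data) that survive a given snaplen. */
static unsigned int captured_udp_payload(unsigned int snaplen,
                                         unsigned int udp_payload)
{
    unsigned int headers = ETH_HEADER_LEN + IPV4_HEADER_LEN + UDP_HEADER_LEN;
    unsigned int room = snaplen > headers ? snaplen - headers : 0;

    return udp_payload < room ? udp_payload : room;
}

/* Nonzero if part of the UDP payload would be cut off. */
static int is_truncated(unsigned int snaplen, unsigned int udp_payload)
{
    return captured_udp_payload(snaplen, udp_payload) < udp_payload;
}
```

Under these assumptions, a 104-byte request like the ones in the capture at the top would keep only 26 bytes with a snaplen of 68, while `-s 192' would capture it completely.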
 