The gettimeofday time-jump problem - gadfly - ChinaUnix blog

The gettimeofday time-jump problem

Category: LINUX

2008-10-15 15:26:36

Attachment: tracksettimeofday.rar (3KB)

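We have a shared module that writes logs, and each entry naturally carries a timestamp. At a customer's site, every so often a single timestamp jumps 1 hour 13 or 14 minutes into the future, and the very next one is back to normal. We never saw this in our internal environment.

At first I suspected that ntpd was adjusting the clock at that moment and causing the inconsistency. But with ntpd stopped, the problem persisted.

Next I guessed that some process was setting the time and causing a momentary discrepancy, so I wrote an LKM to monitor settimeofday. The monitoring log showed nothing unusual either. The code is in the attachment.

Local test environment: Fedora Core 5, kernel 2.6.9
2.6.9-55.ELsmp #1 SMP
It has a single CPU, and the problem did not reproduce there.

The customer's machine is presumably multi-CPU; its kernel version is unknown.

Searching further for material on gettimeofday, I found a report of a similar symptom: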

Hi,

We've been seeing some strange behaviour on some of our applications recently. I've tracked this down to gettimeofday() returning spurious values occasionally. Specifically, gettimeofday() will suddenly, for a single call, return a value about 4398 seconds (~1 hour 13 minutes) in the future. The following call goes back to a normal value.

This seems to be occurring when the clock source goes slightly backwards for a single call. In kernel/time/timekeeping.c:__get_nsec_offset(), we have this:

    cycle_delta = (cycle_now - clock->cycle_last) & clock->mask;

So a small decrease in time here will (this is all unsigned arithmetic) give us a very large cycle_delta. cyc2ns() then multiplies this by some value, then right shifts by 22. The resulting value (in nanoseconds) is approximately 4398 seconds; this gets added on to the xtime value, giving us our jump into the future. The next call to gettimeofday() returns to normal as we don't have this huge nanosecond offset.

This system is a 2-socket core 2 quad machine (8 cpus), running 32 bit mode. It's a dell poweredge 1950. The kernel selects the TSC as the clock source, having determined that the tsc runs synchronously on this system. Switching the systems to use a different time source seems to make the problem go away (which is fine for us, but we'd like to get this fixed properly upstream).

We've also seen this behaviour with a synthetic test program (which just runs 4 threads all calling gettimeofday() in a loop as fast as possible and testing that it doesn't jump) on an older machine, a dell poweredge SC1425 with two p4 hyperthreaded xeons.

Can anyone advise on what's going wrong here? I can't find much in the way of documentation on whether the TSC is guaranteed to be monotonically increasing on intel systems. Should the code choose not to use the TSC? Or should the TSC reading code ensure that the returned values are monotonic?
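The synthetic test mentioned in that report can be approximated with a tight gettimeofday() loop. The following is only a minimal single-threaded sketch in C (the reporter's program used four threads); the function names `tv_to_us`, `jumped_forward` and `count_jumps` and the caller-chosen threshold are illustrative assumptions, not taken from the report or from the attachment.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Convert a struct timeval to microseconds since the epoch. */
int64_t tv_to_us(const struct timeval *tv)
{
    return (int64_t)tv->tv_sec * 1000000 + tv->tv_usec;
}

/* Return 1 if now_us is more than limit_us ahead of prev_us, else 0. */
int jumped_forward(int64_t prev_us, int64_t now_us, int64_t limit_us)
{
    return now_us - prev_us > limit_us;
}

/* Call gettimeofday() back to back `iterations` times and count how
 * often two consecutive readings differ by more than limit_us. */
long count_jumps(long iterations, int64_t limit_us)
{
    struct timeval tv;
    int64_t prev, now;
    long jumps = 0;

    gettimeofday(&tv, NULL);
    prev = tv_to_us(&tv);
    for (long i = 0; i < iterations; i++) {
        gettimeofday(&tv, NULL);
        now = tv_to_us(&tv);
        if (jumped_forward(prev, now, limit_us))
            jumps++;
        prev = now; /* the return to normal is a backward step, not counted */
    }
    return jumps;
}
```

Calling something like `count_jumps(100000000L, 1000000)` from a small `main` would count forward jumps of more than one second. On a single-CPU machine such as the local test environment the count would be expected to stay at zero.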

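In other words, on a multi-core system that uses the TSC as the time source, the same problem appears: the time jumps forward by roughly 1:14. The size of the jump follows from the arithmetic: a wrapped 64-bit value shifted right by 22 bits is just under 2^42, and 2^42 ns ≈ 4398 seconds.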
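The arithmetic can be checked with a small C sketch. It mirrors the shape of the quoted `cycle_delta` line and of cyc2ns(), but it is not the kernel code: the multiplier `SKETCH_MULT` is an assumed value (roughly 2^22 / 2.66, as for a 2.66 GHz TSC), and the function names and the sample `cycle_last` are made up for illustration.

```c
#include <stdint.h>

#define SKETCH_SHIFT 22        /* the right shift quoted in the report */
#define SKETCH_MULT  1576806u  /* assumed: about 2^22 / 2.66 (2.66 GHz TSC) */

/* Same shape as the quoted line: unsigned subtraction, then mask. */
uint64_t cycle_delta(uint64_t cycle_now, uint64_t cycle_last, uint64_t mask)
{
    return (cycle_now - cycle_last) & mask;
}

/* Same shape as cyc2ns(): multiply, then shift right.
 * The 64-bit product silently wraps modulo 2^64. */
uint64_t cyc2ns_sketch(uint64_t cycles)
{
    return (cycles * SKETCH_MULT) >> SKETCH_SHIFT;
}

/* Nanosecond offset produced when the TSC reads `behind` cycles
 * below the stored reference value. */
uint64_t bogus_offset_ns(uint64_t behind)
{
    uint64_t cycle_last = 1000000; /* arbitrary reference value */
    uint64_t delta = cycle_delta(cycle_last - behind, cycle_last, UINT64_MAX);

    return cyc2ns_sketch(delta);
}
```

With these assumed constants, a reading only 50 cycles behind the reference gives an offset just under 2^42 ns, about 4398 seconds, which matches the observed jump of 1 hour 13 minutes.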

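It has been seen on the following versions:

2.6.18-1.2257.fc5smp

2.6.23-rc3

2.6.20.4

Apparently someone upgraded to 2.6.21 and the problem went away.

Later, following up on this thread, someone explained the bug: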

Subject: x86: tsc prevent time going backwards
From: Thomas Gleixner <tglx@linutronix.de>
Date: Tue, 01 Apr 2008 19:45:18 +0200

We already catch most of the TSC problems by sanity checks, but there is a subtle bug which has been in the code for ever. This can cause time jumps in the range of hours.

This was reported in: http://lkml.org/lkml/2007/8/23/96 and http://lkml.org/lkml/2008/3/31/23

I was able to reproduce the problem with a gettimeofday loop test on a dual core and a quad core machine which both have synchronized TSCs. The TSCs seem not to be perfectly in sync though, but the kernel is not able to detect the slight delta in the bootup sync check. There exists an extremely small window where this delta can be observed with a real big time jump. So far I was only able to reproduce this with the vsyscall gettimeofday implementation, but in theory this might be observable with the syscall based version as well.

CPU 0 updates the clock source variables under xtime/vsyscall lock and CPU1, where the TSC is slightly behind CPU0, is reading the time right after the seqlock was unlocked.

The clocksource reference data was updated with the TSC from CPU0 and the value which is read from TSC on CPU1 is less than the reference data. This results in a huge delta value due to the unsigned subtraction of the TSC value and the reference value. This algorithm can not be changed due to the support of wrapping clock sources like pm timer.

The huge delta is converted to nanoseconds and added to xtime, which is then observable by the caller. The next gettimeofday call on CPU1 will show the correct time again as now the TSC has advanced above the reference value.

To prevent this TSC specific wreckage we need to compare the TSC value against the reference value and return the latter when it is larger than the actual TSC value.

I pondered to mark the TSC unstable when the readout is smaller than the reference value, but this would render an otherwise good and fast clocksource unusable without a real good reason.
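The fix described in that message, returning the reference value whenever the TSC reading is smaller, can be sketched in a few lines of C. This is only an illustration of the idea, not the actual patch; `clamp_to_reference` and `safe_cycle_delta` are hypothetical names.

```c
#include <stdint.h>

/* Return the later of the fresh TSC reading and the stored reference. */
uint64_t clamp_to_reference(uint64_t tsc_now, uint64_t cycle_last)
{
    return tsc_now >= cycle_last ? tsc_now : cycle_last;
}

/* Delta computed on the clamped reading: a TSC that is slightly
 * behind the reference yields 0 instead of a value near 2^64. */
uint64_t safe_cycle_delta(uint64_t tsc_now, uint64_t cycle_last)
{
    return clamp_to_reference(tsc_now, cycle_last) - cycle_last;
}
```

The clamp applies only to the TSC read path, so the generic unsigned subtraction that wrapping clock sources such as the pm timer depend on stays unchanged.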

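In other words, on multi-core or multi-CPU machines the TSC values of different CPUs are not perfectly in sync, and that is what causes the discrepancy in the time calculation.

I read through the long thread that follows, and it appears there is a patch that fixes the problem.

If upgrading the kernel is not an option, changing the time source through a kernel boot option is a possible workaround.

The kernel parameters documentation says: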

   clock=          [BUGS=IA-32,HW] gettimeofday timesource override.
                        Forces specified timesource (if available) to be used
                        when calculating gettimeofday(). If specified
                        timesource is not available, it defaults to PIT.
                        Format: { pit | tsc | cyclone | pmtmr }
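Before and after changing the boot option, it helps to confirm which time source the kernel actually picked. Newer 2.6 kernels expose the current clocksource through sysfs; the C sketch below reads that file. It assumes the sysfs path exists on the target machine (on an old kernel such as 2.6.9 it probably does not, in which case the function simply returns -1), and `read_first_line` and `current_clocksource` are illustrative names.

```c
#include <stdio.h>
#include <string.h>

/* sysfs file exposed by newer 2.6 kernels; assumed absent on older ones. */
#define CLOCKSOURCE_PATH \
    "/sys/devices/system/clocksource/clocksource0/current_clocksource"

/* Read the first line of `path` into `buf` and strip the newline.
 * Returns 0 on success, -1 if the file cannot be opened or read. */
int read_first_line(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Convenience wrapper: name of the clocksource currently in use. */
int current_clocksource(char *buf, size_t len)
{
    return read_first_line(CLOCKSOURCE_PATH, buf, len);
}
```

If `current_clocksource` succeeds and the buffer reads `tsc` on a multi-CPU machine, that machine is a candidate for the problem described above.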

The option can be set by editing /etc/grub.conf; see the following reference:

http://tonykorn97.itpub.net/post/6414/456362

title Fedora Core (2.6.9-1.667)
root (hd0,0)
kernel /vmlinuz-2.6.9-1.667 ro root=/dev/hda2 clock=pit

Adding this boot option disables the kernel's correction for lost ticks, so be sure to also install VMware Tools and turn on time synchronization. The latter prevents the guest clock from losing time over the long term due to lost ticks.

For additional information about working with boot loaders, see your Linux distribution's documentation.
