Attachment: tracksettimeofday.rar (3KB)
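We have a general-purpose module that writes logs, and each entry naturally includes a timestamp. At a customer site, every so often a single timestamp jumps 1 hour 13 minutes or 1 hour 14 minutes into the future, and the next one is normal again. This has never happened in our internal environment.
At first I suspected that ntpd was adjusting the clock at the same moment and causing the inconsistency. But with ntpd stopped, the problem persisted.
Next I guessed that some process was briefly setting the time, so I wrote an LKM to monitor settimeofday. The monitoring log showed nothing unusual. See the attachment for the code.
The local test environment is Fedora Core 5, kernel: 2.6.9
2.6.9-55.ELsmp #1 SMP
It has a single CPU, and the problem did not reproduce there.
The customer machine is presumably multi-CPU; its kernel version is unknown.
Searching further for material on gettimeofday, I found a report of a similar symptom: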
|
Hi,
We've been seeing some strange behaviour on some of our applications recently. I've tracked this down to gettimeofday() returning spurious values occasionally.
Specifically, gettimeofday() will suddenly, for a single call, return a value about 4398 seconds (~1 hour 13 minutes) in the future. The following call goes back to a normal value.
This seems to be occurring when the clock source goes slightly backwards for a single call. In kernel/time/timekeeping.c:__get_nsec_offset(), we have this: cycle_delta = (cycle_now - clock->cycle_last) & clock->mask;
So a small decrease in time here will (this is all unsigned arithmetic) give us a very large cycle_delta. cyc2ns() then multiplies this by some value, then right shifts by 22. The resulting value (in nanoseconds) is approximately 4398 seconds; this gets added on to the xtime value, giving us our jump into the future. The next call to gettimeofday() returns to normal as we don't have this huge nanosecond offset.
This system is a 2-socket core 2 quad machine (8 cpus), running 32 bit mode. It's a dell poweredge 1950. The kernel selects the TSC as the clock source, having determined that the tsc runs synchronously on this system. Switching the systems to use a different time source seems to make the problem go away (which is fine for us, but we'd like to get this fixed properly upstream).
We've also seen this behaviour with a synthetic test program (which just runs 4 threads all calling gettimeofday() in a loop as fast as possible and testing that it doesn't jump) on an older machine, a dell poweredge SC1425 with two p4 hyperthreaded xeons.
Can anyone advise on what's going wrong here? I can't find much in the way of documentation on whether the TSC is guaranteed to be monotonically increasing on intel systems. Should the code choose not to use the TSC? Or should the TSC reading code ensure that the returned values are monotonic?
|
In other words, on a multi-core system that uses the TSC as the time source, the same problem shows up: the time jumps forward by roughly 1:14, and 4398 seconds is about 2^42 ns.
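To see where the 4398 seconds come from, the arithmetic described in the quoted report can be replayed in a few lines of Python. This is only a sketch: the 64-bit mask, the multiplier (picked for a hypothetical 2 GHz TSC) and the cycle counts are my own assumptions; only the masked unsigned subtraction and the right shift by 22 come from the report.

```python
# Sketch of the offset calculation described in the quoted report.
# MASK64 and MULT are assumptions for illustration, not values read
# from a real kernel.
MASK64 = (1 << 64) - 1   # assumed: 64-bit clocksource mask, 64-bit arithmetic
SHIFT = 22               # shift mentioned in the report
MULT = 1 << 21           # assumed: 0.5 ns per cycle (2 GHz) scaled by 2^22


def cyc2ns(cycles, mult=MULT, shift=SHIFT):
    """Convert cycles to nanoseconds with wrapping 64-bit multiplication."""
    return ((cycles * mult) & MASK64) >> shift


def nsec_offset(cycle_now, cycle_last):
    """Mimic the unsigned subtraction: a negative difference wraps around."""
    cycle_delta = (cycle_now - cycle_last) & MASK64
    return cyc2ns(cycle_delta)


if __name__ == "__main__":
    cycle_last = 1_000_000
    # Normal case: the reading is 100 cycles ahead of the reference.
    print(nsec_offset(cycle_last + 100, cycle_last))  # 50 ns
    # Bad case: the reading is 100 cycles behind the reference.
    bad = nsec_offset(cycle_last - 100, cycle_last)
    print(bad, bad / 1e9)  # just under 2^42 ns, about 4398.05 s
```

Because the product is truncated to 64 bits before the shift by 22, the result can never exceed 2^42 ns, and a slightly negative delta lands just below that ceiling. That matches the size of the jump regardless of how far behind the reading actually was.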
The problem was reported on the following kernel versions:
2.6.18-1.2257.fc5smp
2.6.23-rc3
2.6.20.4
Apparently someone upgraded to 2.6.21 and the problem went away.
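The synthetic test mentioned in the report (several threads reading the time in a tight loop and checking that it does not jump) can be sketched roughly as follows. This is my own approximation, not the reporter's program: it uses Python's time.time() rather than calling gettimeofday() directly, Python threads do not run truly in parallel, and the 60-second threshold, thread count and sample count are arbitrary assumptions.

```python
import threading
import time

JUMP_THRESHOLD_S = 60.0  # assumed: anything larger counts as a jump


def find_jumps(samples, threshold=JUMP_THRESHOLD_S):
    """Return (index, delta) for consecutive samples that differ by more
    than the threshold in either direction."""
    jumps = []
    for i in range(1, len(samples)):
        delta = samples[i] - samples[i - 1]
        if abs(delta) > threshold:
            jumps.append((i, delta))
    return jumps


def sample_clock(count):
    """Read the wall clock `count` times as fast as possible."""
    return [time.time() for _ in range(count)]


def run_threads(num_threads=4, count=100_000):
    """Sample the clock in several threads; return the jumps per thread."""
    results = [None] * num_threads

    def worker(slot):
        results[slot] = find_jumps(sample_clock(count))

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    for slot, jumps in enumerate(run_threads()):
        print("thread", slot, "jumps:", jumps)
```

On an affected machine a jump would show up as a pair of entries: one large positive delta followed by a large negative one on the next sample. A C program calling gettimeofday() from real parallel threads is more likely to hit the narrow window, so an empty result from this sketch proves little.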
Later in the same discussion, someone explained the bug:
http://lkml.org/lkml/2008/4/2/621
|
Subject: x86: tsc prevent time going backwards
From: Thomas Gleixner <tglx@linutronix.de>
Date: Tue, 01 Apr 2008 19:45:18 +0200
We already catch most of the TSC problems by sanity checks, but there is a subtle bug which has been in the code for ever. This can cause time jumps in the range of hours.
This was reported in: http://lkml.org/lkml/2007/8/23/96 and http://lkml.org/lkml/2008/3/31/23
I was able to reproduce the problem with a gettimeofday loop test on a dual core and a quad core machine which both have sychronized TSCs. The TSCs seems not to be perfectly in sync though, but the kernel is not able to detect the slight delta in the bootup sync check. There exists an extremly small window where this delta can be observed with a real big time jump. So far I was only able to reproduce this with the vsyscall gettimeofday implementation, but in theory this might be observable with the syscall based version as well.
CPU 0 updates the clock source variables under xtime/vyscall lock and CPU1, where the TSC is slighty behind CPU0, is reading the time right after the seqlock was unlocked.
The clocksource reference data was updated with the TSC from CPU0 and the value which is read from TSC on CPU1 is less than the reference data. This results in a huge delta value due to the unsigned subtraction of the TSC value and the reference value. This algorithm can not be changed due to the support of wrapping clock sources like pm timer.
The huge delta is converted to nanoseconds and added to xtime, which is then observable by the caller. The next gettimeofday call on CPU1 will show the correct time again as now the TSC has advanced above the reference value.
To prevent this TSC specific wreckage we need to compare the TSC value against the reference value and return the latter when it is larger than the actual TSC value.
I pondered to mark the TSC unstable when the readout is smaller than the reference value, but this would render an otherwise good and fast clocksource unusable without a real good reason.
|
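That is, on dual-core or multi-CPU machines the TSC values of the individual CPUs are not perfectly consistent, and this leads to the discrepancy in the time calculation.
I read through the long thread that follows, and there appears to be a patch that fixes the problem.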
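The idea behind the fix, as I understand it from the quoted message, is to return the reference value whenever the TSC reading is below it. A minimal Python sketch of that idea follows; the function names, the 64-bit mask and the multiplier are my own assumptions for illustration, and this is not the actual kernel patch.

```python
# Sketch of the "never go below the reference" idea from the quoted message.
# MASK64 and MULT are illustrative assumptions (64-bit counter, 2 GHz TSC).
MASK64 = (1 << 64) - 1
SHIFT = 22
MULT = 1 << 21


def clamp_to_reference(tsc_now, cycle_last):
    """Return the reference value when the TSC reading is behind it."""
    return tsc_now if tsc_now >= cycle_last else cycle_last


def nsec_offset(cycle_now, cycle_last):
    """Unsigned delta converted to nanoseconds, as in the earlier report."""
    cycle_delta = (cycle_now - cycle_last) & MASK64
    return ((cycle_delta * MULT) & MASK64) >> SHIFT


if __name__ == "__main__":
    cycle_last = 1_000_000
    behind = cycle_last - 100  # reading from a CPU whose TSC lags slightly
    print(nsec_offset(behind, cycle_last))                                  # huge
    print(nsec_offset(clamp_to_reference(behind, cycle_last), cycle_last))  # 0
```

With the clamp, a lagging reading produces an offset of zero instead of a wrapped value, so the caller sees the time of the last update rather than a jump. The clamp is applied only to the TSC reading because, as the message notes, the generic subtraction has to keep working for clock sources that really do wrap.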
If upgrading the kernel is not an option, the problem can be worked around by changing the time source through a kernel boot option.
The kernel parameters documentation says:
http://www.cyberciti.biz/howto/question/static/linux-kernel-parameters.php
clock=  [BUGS=IA-32,HW] gettimeofday timesource override.
        Forces specified timesource (if available) to be used
        when calculating gettimeofday(). If specified
        timesource is not available, it defaults to PIT.
        Format: { pit | tsc | cyclone | pmtmr }
The option can be set by editing /etc/grub.conf; the reference document is:
http://tonykorn97.itpub.net/post/6414/456362
title Fedora Core (2.6.9-1.667)
    root (hd0,0)
    kernel /vmlinuz-2.6.9-1.667 ro root=/dev/hda2 clock=pit
Adding this boot option disables the kernel's correction for lost ticks, so be sure to also install VMware Tools and turn on time synchronization. The latter prevents the guest clock from losing time over the long term due to lost ticks.
For additional information about working with boot loaders, see your Linux distribution's documentation.
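If the change has to be made on several machines, the edit to the kernel line can be scripted. The following is a rough sketch only: set_clock_option is a hypothetical helper of my own, it works on the text of a grub.conf rather than on the file itself, and it assumes the legacy GRUB layout shown above, where the boot entry has a line starting with "kernel". Try it on a copy first.

```python
# Hypothetical helper: add or replace the clock= option on every
# "kernel" line of a legacy grub.conf passed in as text.
VALID_SOURCES = ("pit", "tsc", "cyclone", "pmtmr")  # from the quoted doc


def set_clock_option(conf_text, source="pit"):
    if source not in VALID_SOURCES:
        raise ValueError("unknown time source: %r" % source)
    out = []
    for line in conf_text.splitlines():
        stripped = line.lstrip()
        if stripped.split(None, 1)[:1] == ["kernel"]:
            indent = line[:len(line) - len(stripped)]
            # Drop any existing clock= token, then append the new one.
            tokens = [t for t in stripped.split()
                      if not t.startswith("clock=")]
            tokens.append("clock=" + source)
            line = indent + " ".join(tokens)
        out.append(line)
    result = "\n".join(out)
    if conf_text.endswith("\n"):
        result += "\n"
    return result


if __name__ == "__main__":
    sample = (
        "title Fedora Core (2.6.9-1.667)\n"
        "    root (hd0,0)\n"
        "    kernel /vmlinuz-2.6.9-1.667 ro root=/dev/hda2\n"
    )
    print(set_clock_option(sample, "pit"))
```

After a reboot, /proc/cmdline shows the options the running kernel was actually booted with, which is a quick way to confirm that clock=pit took effect.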