Learn to Apply -- gadfly's blog

http://blog.chinaunix.net/space.php?uid=9354

   
Category: Kernel


File: tracksettimeofday.rar
Size: 3KB
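We have a general-purpose logging module whose records naturally include a timestamp. At a customer site, every so often one retrieved timestamp jumps 1 hour 13 minutes or 1 hour 14 minutes into the future, and the next one is normal again. We never saw this in our internal environment.
At first I suspected that ntpd was adjusting the clock at that moment and causing the inconsistency, but the problem persisted with ntpd stopped.
Next I guessed that some process was setting the time for an instant, so I wrote an LKM to monitor settimeofday. Its log showed nothing unusual either. The code is in the attachment.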
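The local test environment is Fedora Core 5, kernel 2.6.9:
2.6.9-55.ELsmp #1 SMP
It has a single CPU, and the problem did not reproduce there.
The customer machine is presumably multi-CPU; its kernel version is unknown.
Searching further for material on gettimeofday turned up a report of a similar symptom: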
 
 

Hi,

We've been seeing some strange behaviour on some of our applications
recently. I've tracked this down to gettimeofday() returning spurious
values occasionally.

Specifically, gettimeofday() will suddenly, for a single call, return
a value about 4398 seconds (~1 hour 13 minutes) in the future. The
following call goes back to a normal value.

This seems to be occurring when the clock source goes slightly
backwards for a single call. In
kernel/time/timekeeping.c:__get_nsec_offset(), we have this:
 cycle_delta = (cycle_now - clock->cycle_last) & clock->mask;

So a small decrease in time here will (this is all unsigned
arithmetic) give us a very large cycle_delta. cyc2ns() then multiplies
this by some value, then right shifts by 22. The resulting value (in
nanoseconds) is approximately 4398 seconds; this gets added on to the
xtime value, giving us our jump into the future. The next call to
gettimeofday() returns to normal as we don't have this huge nanosecond
offset.

This system is a 2-socket core 2 quad machine (8 cpus), running 32 bit
mode. It's a dell poweredge 1950. The kernel selects the TSC as the
clock source, having determined that the tsc runs synchronously on
this system. Switching the systems to use a different time source
seems to make the problem go away (which is fine for us, but we'd like
to get this fixed properly upstream).

We've also seen this behaviour with a synthetic test program (which
just runs 4 threads all calling gettimeofday() in a loop as fast as
possible and testing that it doesn't jump) on an older machine, a dell
poweredge SC1425 with two p4 hyperthreaded xeons.

Can anyone advise on what's going wrong here? I can't find much in the
way of documentation on whether the TSC is guaranteed to be
monotonically increasing on intel systems. Should the code choose not
to use the TSC? Or should the TSC reading code ensure that the
returned values are monotonic?

 

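In other words, on a multi-core system that uses the TSC as its time source, the same problem shows up: the time jumps forward by roughly 1 hour 14 minutes. The figure is no coincidence: 4398 seconds is about 2^42 ns, which is the largest value a 64-bit product can have after being shifted right by 22 bits.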

It was reported on the following kernel versions:

2.6.18-1.2257.fc5smp
2.6.23-rc3
2.6.20.4
 
Apparently someone upgraded to 2.6.21 and the problem went away.
 
A later post following up on this report explains the bug:
 
 http://lkml.org/lkml/2008/4/2/621

Subject: x86: tsc prevent time going backwards
From: Thomas Gleixner <tglx@linutronix.de>
Date: Tue, 01 Apr 2008 19:45:18 +0200

We already catch most of the TSC problems by sanity checks, but there
is a subtle bug which has been in the code for ever. This can cause
time jumps in the range of hours.

This was reported in:
     http://lkml.org/lkml/2007/8/23/96
and
     http://lkml.org/lkml/2008/3/31/23

I was able to reproduce the problem with a gettimeofday loop test on a
dual core and a quad core machine which both have sychronized
TSCs. The TSCs seems not to be perfectly in sync though, but the
kernel is not able to detect the slight delta in the bootup sync
check. There exists an extremly small window where this delta can be
observed with a real big time jump. So far I was only able to
reproduce this with the vsyscall gettimeofday implementation, but in
theory this might be observable with the syscall based version as
well.

CPU 0 updates the clock source variables under xtime/vyscall lock and
CPU1, where the TSC is slighty behind CPU0, is reading the time right
after the seqlock was unlocked.

The clocksource reference data was updated with the TSC from CPU0 and
the value which is read from TSC on CPU1 is less than the reference
data. This results in a huge delta value due to the unsigned
subtraction of the TSC value and the reference value. This algorithm
can not be changed due to the support of wrapping clock sources like
pm timer.

The huge delta is converted to nanoseconds and added to xtime, which
is then observable by the caller. The next gettimeofday call on CPU1
will show the correct time again as now the TSC has advanced above the
reference value.

To prevent this TSC specific wreckage we need to compare the TSC value
against the reference value and return the latter when it is larger
than the actual TSC value.

I pondered to mark the TSC unstable when the readout is smaller than
the reference value, but this would render an otherwise good and fast
clocksource unusable without a real good reason.

In other words, on dual-core or multi-CPU machines the TSC values of different CPUs are not exactly in sync, and that skew is what corrupts the time calculation.

The thread goes on at length after this, and it looks like there is a patch that fixes the problem.

If upgrading the kernel is not an option, a workaround is to change the time source through a kernel boot option.

The kernel parameters documentation says:
http://www.cyberciti.biz/howto/question/static/linux-kernel-parameters.php

   clock=          [BUGS=IA-32,HW] gettimeofday timesource override.
                        Forces specified timesource (if avaliable) to be used
                        when calculating gettimeofday(). If specicified
                        timesource is not avalible, it defaults to PIT.
                        Format: { pit | tsc | cyclone | pmtmr }


The option can be set by editing /etc/grub.conf. Reference:

http://tonykorn97.itpub.net/post/6414/456362

title Fedora Core (2.6.9-1.667)
root (hd0,0)
kernel /vmlinuz-2.6.9-1.667 ro root=/dev/hda2 clock=pit

Adding this boot option disables the kernel's correction for lost ticks, so be sure to also install VMware Tools and turn on time synchronization. The latter prevents the guest clock from losing time over the long term due to lost ticks.

For additional information about working with boot loaders, see your Linux distribution's documentation.

 

 
