Scientific Linux Forum.org




> leap second danger on 30 June 2012, server hangs possible on older kernels
helikaon
 Posted: Jun 26 2012, 08:58 AM

Hi guys,
I just got this info from RH today, so I am sharing it here as well:

------------------------------------------------------------------------

**Introduction**

The UTC time standard, which is widely used for international
timekeeping on computer systems, uses the international standard
definition of the second, based on atomic clocks. However, the duration
of one mean solar day is slightly longer than 86,400 seconds (a UTC
day). The purpose of a leap second is to compensate for this drift by
scheduling days with 86,401 or 86,399 international standard seconds.
Because the Earth's rotation speed varies in response to natural events,
UTC leap seconds are irregularly spaced and unpredictable. The last leap
second occurred at 23:59:60 UTC on 31 December 2008. Leap seconds are
defined in UTC, so they are independent of time zone and occur at the
same moment around the world, regardless of local time.
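
To make the "86,401-second day" concrete, here is a minimal Python sketch. It only illustrates that POSIX-style timestamps assume 86,400 seconds per day and so have no slot for 23:59:60; the variable and function names are made up for this example and are not part of the Red Hat text.

```python
import calendar
from datetime import datetime, timezone

# POSIX time counts every day as exactly 86,400 seconds, so the second
# before and the second after the leap second differ by 1 on paper.
before = calendar.timegm((2012, 6, 30, 23, 59, 59))
after = calendar.timegm((2012, 7, 1, 0, 0, 0))

posix_delta = after - before            # what the timestamp arithmetic says
elapsed_si_seconds = posix_delta + 1    # plus the inserted second 23:59:60


def can_represent_leap_second():
    """Return True if datetime accepts second=60 (it rejects it)."""
    try:
        datetime(2012, 6, 30, 23, 59, 60, tzinfo=timezone.utc)
        return True
    except ValueError:
        return False
```

Because the timestamp has no value for 23:59:60, the system has to do something special at that moment, such as repeating 23:59:59 as described in the timeline further down.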

Some RHEL customers have become concerned about a Red Hat bug
(https://bugzilla.redhat.com/show_bug.cgi?id=479765) related to the
upcoming leap second event that is scheduled to occur at 23:59:60 UTC on
30 June 2012. While an erratum has been released for this (see the Red
Hat Security Advisory at http://rhn.redhat.com/errata/RHSA-2009-1243.html),
some customers are still running older kernel versions and remain
vulnerable to this bug. The erratum resolves an issue where the system
can potentially hang at the moment of the leap second due to a deadlock
triggered by klogd while the leap second is processed.

**Foundational Knowledge**

By default, modern kernels can make new process scheduling decisions
every 1/1000th of a second (1 ms). This 1 ms interval is called a
kernel "tick".

**Timeline of Triggering Events**

1. The system inherits a leap second flag from a newer version of the
tzdata package or from an upstream NTP server with knowledge of the
upcoming leap second. (This occurs on the day of the leap second event
and cannot be unset.)

2. At 23:59:59 UTC on the leap second day, the kernel sees the leap
second flag and causes 23:59:59 UTC to occur twice.

3. In order to process the leap second event a lock is acquired to
access the current time.

4. While processing the leap second the kernel issues a printk to notify
the user that the leap second has occurred.

5. The printk triggers klogd to wake up so that it can process the new
kernel message.

6. klogd attempts to acquire a lock to access the current kernel time.

If step 3 happens on the same core and during the same tick as step 6
then a deadlock occurs (on xtime_lock).
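
The pattern in steps 3 to 6 is a self-deadlock on a non-reentrant lock. The following is only a user-space analogy in Python, not the kernel code: `threading.Lock` stands in for `xtime_lock`, the function names are hypothetical, and a timeout is used so the example reports the problem instead of actually hanging.

```python
import threading

# Stand-in for the kernel's xtime_lock. threading.Lock is not reentrant:
# a second acquire from the holder blocks, just like the real bug.
xtime_lock = threading.Lock()


def read_time_like_klogd(timeout=0.1):
    """Step 6: try to take the time lock; return whether we got it."""
    acquired = xtime_lock.acquire(timeout=timeout)
    if acquired:
        xtime_lock.release()
    return acquired


def process_leap_second():
    """Steps 3 to 6 in one context: hold the lock, then wake a reader."""
    with xtime_lock:                    # step 3: lock taken for the update
        # steps 4 and 5: the printk wakes the logger in the same context
        return read_time_like_klogd()   # step 6: lock is already held


got_lock_while_held = process_leap_second()   # False: would hang without timeout
got_lock_when_free = read_time_like_klogd()   # True: normal case
```

In the kernel there is no timeout, so the equivalent of the inner acquire never returns and the machine hangs.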

**Estimate of Likelihood of Occurrence**

It's exceptionally unlikely that the triggering events would happen as
required to cause a hang. Red Hat Engineering's experience is that it is
extremely difficult to trigger this issue during reproduction attempts,
even when those reproduction attempts included artificially introducing
high printk loads to attempt to trigger the hang.

**Workarounds**

Updating to kernel-2.6.9-89.EL (RHEL4), kernel-2.6.18-164.el5 (RHEL5),
or any later RHEL kernel is the most reliable way to avoid any impact
from this bug. The bug has been patched in these kernel versions.
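
For anyone auditing a batch of machines, here is a rough Python sketch of the version check. It is a simplification and not real RPM version comparison: it only knows the two patched releases named above, and the names `PATCHED`, `release_tuple` and `is_patched` are made up for this example.

```python
# Patched releases named in the advisory, keyed by base kernel version.
PATCHED = {"2.6.9": (89,), "2.6.18": (164,)}


def release_tuple(kernel):
    """Split e.g. '2.6.18-194.3.1.el5' into ('2.6.18', (194, 3, 1))."""
    base, _, release = kernel.partition("-")
    numbers = []
    for part in release.split("."):
        if not part.isdigit():
            break                       # stop at 'el5', 'EL', 'ELsmp', ...
        numbers.append(int(part))
    return base, tuple(numbers)


def is_patched(kernel):
    """True/False for RHEL4/RHEL5 kernels, None if the base is unknown."""
    base, release = release_tuple(kernel)
    if base not in PATCHED:
        return None                     # not covered by this sketch
    return release >= PATCHED[base]
```

The input would typically be the output of `uname -r` (or `platform.release()` in Python), for example `is_patched("2.6.18-128.el5")`.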

If your environment includes systems with a kernel version lower than
those patched kernels and you remain concerned even with the low
probability of encountering this issue, there are several workarounds
available to further mitigate the risk of encountering this bug.

1. Manually adjust the system time so that 2012-06-30T23:59:59 UTC
never occurs.

2. Disable NTP clients on the affected system at least a full day ahead
of the leap second so that the leap second flag is never inherited.
Then, re-enable NTP on those systems after the leap second has occurred.
It's important to ensure that the tzdata package installed on the system
has not been updated to include the 2012-06-30 leap second, as the
system can inherit the leap second flag from the tzdata file as well,
even if NTP is disabled.
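
As background for workaround 2, the leap second flag that an NTP server announces is the two-bit Leap Indicator in the first byte of every NTP packet (per the NTP specification). Below is a small Python sketch that decodes it from raw packet bytes. It does no network I/O, and `LEAP_MEANINGS` and `leap_indicator` are names invented for this example.

```python
# Leap Indicator values as defined for the NTP packet header.
LEAP_MEANINGS = {
    0: "no warning",
    1: "last minute of the day has 61 seconds",
    2: "last minute of the day has 59 seconds",
    3: "unknown (clock unsynchronized)",
}


def leap_indicator(packet: bytes) -> int:
    """Return the 2-bit Leap Indicator from the first byte of a packet."""
    if not packet:
        raise ValueError("empty NTP packet")
    # First byte layout: LI (2 bits) | version (3 bits) | mode (3 bits)
    return packet[0] >> 6


# 0x64 = 0b01_100_100: LI=1, version 4, mode 4 (server reply)
example = LEAP_MEANINGS[leap_indicator(bytes([0x64]))]
```

A value of 1 from the upstream server on 30 June is the announcement that a client with NTP still enabled would pass on to the kernel.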

**Conclusion**

While Red Hat regrets that a 100% reliable solution for preventing this
issue is not feasible given the current timeline to the leap second
event and some regressions in newer kernels (that are not vulnerable to
the leap second issue), we believe this risk to be small, and one whose
impact can be further minimized by implementing one of the identified
workarounds on the most critical systems that remain vulnerable.

------------------------------------------------------------------------


toracat
 Posted: Jun 26 2012, 12:50 PM

Thanks for posting this. A greatly condensed version of the leap second warning is on my blog. :)

http://blog.toracat.org/2012/06/leap-seconds-who-cares/


--------------------
ELRepo: repository specializing in hardware support for EL
redman
 Posted: Jun 27 2012, 10:34 AM

Thanks for posting.
Fortunately, my (RHEL) server running NTP is up to date. :)


--------------------
Desktop: ASUS P5QPL-AM, Intel Dual-Core E6500, 4GB DDR2, ASUS GeForce GT 430 1GB, SL6.6 x86_64
Build server: HP Proliant ML350 G5, 1x Intel Xeon Quad-Core E5410, 9GB ECC DDR2 FB-DIMM, ASUS GeForce GT 730 1GB, SL6.6 x86_64