Scientific Linux Forum.org > System administration > Major infrastructure crash after upgrading to rpcbind-0.2.0-13.el6_9.x86_64


Posted by: renaud May 24 2017, 11:52 AM
Hi Everybody,

We mirror the SL6 distribution repository nightly on a local server, and our machines (servers and workstations) update through yum from this mirror.

Last night (2017-05-24), around 4:00 AM, the rpcbind package on our SL6 NIS servers (master and slaves) and on our NIS/NFS clients was upgraded from rpcbind-0.2.0-13.el6.x86_64 to rpcbind-0.2.0-13.el6_9.x86_64. The NFS servers are NetApps.

When I connected this morning, every rpcbind process was dead (on about 150 machines) on the same VLAN as the NIS master. There was no issue on the other VLANs (I have a slave on every VLAN), but there are fewer machines on those VLANs.

I tried restarting the rpcbind, autofs and ypbind services, but rpcbind crashes again after some time (between 5 and 60 minutes).

I had to downgrade the package on all the machines on the VLAN of my NIS master in order to stabilize the situation.
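For anyone who needs to do the same rollback, here is a minimal sketch of what it could look like on one host. It assumes the older build (rpcbind-0.2.0-13.el6) is still available in your mirror; the `run` and `rollback_rpcbind` helpers and the `RUN` variable are made up for this example. By default it only prints the commands so they can be reviewed first.

```shell
#!/bin/bash
# Sketch only: roll rpcbind back to the build installed before the update,
# then restart the services that depend on it.
# Assumption: the older package is still present in the yum repository.

OLD_BUILD="rpcbind-0.2.0-13.el6"   # build named in the post above

# Hypothetical helper: print the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Hypothetical helper: downgrade, then restart rpcbind and its consumers.
rollback_rpcbind() {
    run yum -y downgrade "$OLD_BUILD"
    run service rpcbind restart
    run service ypbind restart
    run service autofs restart
}

rollback_rpcbind
```

Run it with `RUN=1` to actually execute the commands. To stop the nightly update from pulling the new build again, one option is an `exclude=rpcbind*` line in /etc/yum.conf until a fixed package is released.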

I guess the package rpcbind-0.2.0-13.el6_9.x86_64 is a "bad patch".

Are there other people who had the same bad experience with this patch? If so, how did you manage this issue?

Thanks for your reply.

Best Regards, Renaud.

Posted by: renaud May 24 2017, 12:36 PM
Bug referenced at RedHat side:

https://bugzilla.redhat.com/show_bug.cgi?id=1454876

Posted by: burakkucat May 24 2017, 12:51 PM
QUOTE (renaud @ May 24 2017, 12:36 PM)
Bug referenced at RedHat side:

https://bugzilla.redhat.com/show_bug.cgi?id=1454876


Yes, that's the one. I was just about to post the link when I saw that you had already discovered the upstream bug report.

Posted by: renaud May 24 2017, 01:54 PM
Hello,

Thanks for your answer. I wonder why the same Bugzilla entry has not been opened for RHEL 6.9, as mentioned in comment 6 of the RHEL 7.3 one: https://bugzilla.redhat.com/show_bug.cgi?id=1454876#c6

Regards, Renaud.

Posted by: burakkucat May 24 2017, 10:40 PM
There is a Red Hat Knowledge Base entry [1], which discloses bug tracker entries for both RHEL 7.3 [2] and RHEL 6.9 [3] --

QUOTE

rpcbind crashes after update of CVE-2017-8779 when using ypbind

Solution In Progress - Updated 30 minutes ago - English

Environment

    Red Hat Enterprise Linux 6
        seen on rpcbind-0.2.0-13.el6_9.x86_64
    Red Hat Enterprise Linux 7
        seen on rpcbind-0.2.0-38.el7_3.x86_64
    ypbind

Issue

After install of rpcbind-0.2.0-13.el6_9.x86_64 (distributed as security update of RHSA-2017-1262-1) stops after some rpcinfo -p execution by linked glibc memory check.

Resolution

Red Hat Enterprise Linux 7

    A solution to this problem is tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1454876

Red Hat Enterprise Linux 6

    A solution to this problem is tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1455142

Root Cause

    Under investigation

    Product(s) Red Hat Enterprise Linux

    Category Troubleshoot

    Tags nfs3

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.


[1] https://access.redhat.com/solutions/3053461
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1454876
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1455142
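
The Issue text quoted above says rpcbind stops after some `rpcinfo -p` executions. A rough sketch for checking that on a test machine could look like this. The `probe_until_dead` function and the `PROBE` and `ALIVE` variables are hypothetical names for this example, and the defaults assume `rpcinfo` and `pgrep` are installed. It is not taken from the Red Hat article.

```shell
#!/bin/bash
# Sketch only: call a probe command repeatedly and report after how many
# calls the daemon is no longer running.
# PROBE and ALIVE can be overridden, e.g. to dry-run the loop.
PROBE="${PROBE:-rpcinfo -p}"          # command that queries rpcbind
ALIVE="${ALIVE:-pgrep -x rpcbind}"    # succeeds while rpcbind is running

# Hypothetical helper: $1 is the maximum number of probes to send.
probe_until_dead() {
    max="$1"
    i=0
    while [ "$i" -lt "$max" ]; do
        # Stop as soon as the liveness check fails.
        if ! $ALIVE >/dev/null 2>&1; then
            echo "rpcbind gone after $i probes"
            return 1
        fi
        $PROBE >/dev/null 2>&1
        i=$((i + 1))
    done
    echo "rpcbind still running after $max probes"
}

# Example: probe_until_dead 1000
```

The number of probes needed, if any, is not stated in the article, so the limit passed to the function is a guess to be adjusted.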

Posted by: renaud May 30 2017, 08:24 AM
Hi All,

Wonderful contribution from Red Hat this morning in https://access.redhat.com/solutions/3053461
"As a workaround, downgrade the package to the earlier version"

I keep wondering why the bad security patch is still present in the Red Hat and SL repositories.

Best regards, Renaud.

Posted by: renaud May 30 2017, 09:21 AM
Hello,

I noticed that on machines with the fastbug glibc installed, I cannot reproduce the issue, so I added this comment on https://access.redhat.com/solutions/3053461:

Hello,

I had to update glibc from the fastbug repository because I had a problem running Ansys/hfss with the standard glibc provided with RHEL 6.9 (I did not have the problem with RHEL 6.8). I only updated glibc on my cluster compute nodes. I noticed that on machines with the fastbug glibc, I am not able to reproduce the issue.

My compute node:

[root@aar057 ~]# rpm -q glibc.x86_64 rpcbind.x86_64
glibc-2.12-1.209.el6_9.1.x86_64
rpcbind-0.2.0-13.el6_9.x86_64
[root@aar057 ~]# rpcbind -d -w
...
polling for read on fd < 5 6 7 8 9 10 11 >
polling for read on fd < 5 6 7 8 9 10 11 >
polling for read on fd < 5 6 7 8 9 10 11 >
polling for read on fd < 5 6 7 8 9 10 11 >

My workstation:

[root@kalahari ~]# rpm -q glibc.x86_64 rpcbind.x86_64
glibc-2.12-1.209.el6.x86_64
rpcbind-0.2.0-13.el6_9.x86_64
[root@kalahari ~]# rpcbind -d -w
...
7f665608f000-7f6656090000 rw-p 0000c000 fd:00 151559 /sbin/rpcbind
7f6656090000-7f6656091000 rw-p 00000000 00:00 0
7f66576be000-7f66576df000 rw-p 00000000 00:00 0 [heap]
7ffe0b35d000-7ffe0b386000 rw-p 00000000 00:00 0 [stack]
7ffe0b3c3000-7ffe0b3c4000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Aborted

Another difference between the two machines is that nscd is active on the compute node but not on the workstation:

[root@aar057 ~]# service nscd status
nscd (pid 10037) is running...

[root@kalahari ~]# service nscd status
nscd: unrecognized service

Hope this contributes to solving this issue.

Best regards, Renaud.
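
To find other hosts with the same package pair as the workstation that aborted above, something like the following sketch could be used. The `classify_host` function and its labels are made up for this example. It only compares version strings from the two machines shown in this post. It does not prove that glibc is the cause, since nscd also differs between them.

```shell
#!/bin/bash
# Sketch only: classify a host from the output of
#   rpm -q glibc.x86_64 rpcbind.x86_64
# (one package per line), using the two builds shown in the post above.

# Hypothetical helper: $1 is the rpm -q output for glibc and rpcbind.
classify_host() {
    glibc=$(printf '%s\n' "$1" | grep '^glibc-')
    rpcbind=$(printf '%s\n' "$1" | grep '^rpcbind-')
    if [ "$rpcbind" != "rpcbind-0.2.0-13.el6_9.x86_64" ]; then
        echo "rpcbind-not-updated"
    elif [ "$glibc" = "glibc-2.12-1.209.el6.x86_64" ]; then
        # Same pair as the workstation where rpcbind -d -w aborted.
        echo "matches-crashing-workstation"
    else
        echo "updated-rpcbind-other-glibc"
    fi
}

# Example on a real host:
#   classify_host "$(rpm -q glibc.x86_64 rpcbind.x86_64)"
```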
