Scientific Linux Forum.org




> kernel panic not syncing stack-protector
Hellboy
 Posted: Jan 2 2012, 04:37 PM
SLF Rookie
*

Group: Members
Posts: 23
Member No.: 329
Joined: 22-June 11









On my home server/NAS, based on an HP MicroServer, I have SL 6.1 with the latest security updates installed. I have NFS, Samba, SABnzbd, and OpenVPN up and running, but lately I have been getting kernel errors:

Kernel panic - not syncing: stack-protector: Kernel stack is corrupted

I have verified that there are no hardware problems.

Does anyone have an idea?
Jcink
 Posted: Jan 2 2012, 05:03 PM
SLF IRC Team
****

Group: Members
Posts: 212
Member No.: 15
Joined: 10-April 11









This is a tough one. I searched around for this problem and could only find this happening to people who had bad wireless drivers, and at some point in 2009 there was a kernel bug, but it was fixed.

What kernel are you running right now? Post it here by showing the output of:

CODE
uname -a


Also, how often does this happen? A few times daily, or at random? If you were fine before and you still have all of the old kernels installed, I'd go back to one of them and see if this still happens.
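For the rollback idea, it helps to see which kernel is running and which older ones are still installed. A minimal sketch, assuming an RPM-based system such as SL 6; the function names are made up for illustration:

```shell
# Print the release string of the running kernel (the same value uname -a shows).
running_kernel() {
    uname -r
}

# List the installed kernel packages, if rpm is available on this machine.
installed_kernels() {
    if command -v rpm >/dev/null 2>&1; then
        rpm -q kernel
    else
        echo "rpm not found; is this an RPM-based system?" >&2
        return 1
    fi
}

running_kernel
installed_kernels || true
```

Any installed kernel that is not the running one can then be picked from the boot loader menu at the next reboot.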
Hellboy
 Posted: Jan 2 2012, 08:36 PM

I think I found the problem. I added some disks, created a RAID1, and added some parameters to /etc/sysctl.conf:

dev.raid.speed_limit_min = 50000
dev.raid.speed_limit_max = 200000

I think I overdid it a little. When the server has to do a lot of I/O, it gets kernel crashes.
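To compare what is configured with what the kernel is actually using, something like the sketch below could help. The function names are hypothetical. As far as I know the stock kernel defaults are 1000 (min) and 200000 (max) KB/s, so of the two lines above only the minimum differs from the default.

```shell
# Print every dev.raid.* assignment in a sysctl-style config file.
# Commented-out lines are skipped. Default file: /etc/sysctl.conf.
raid_limits_in_conf() {
    conf="${1:-/etc/sysctl.conf}"
    grep -E '^[[:space:]]*dev\.raid\.' "$conf"
}

# Print the limits the running kernel is using (KB/s), if the md driver is loaded.
raid_limits_running() {
    for f in /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max; do
        [ -r "$f" ] && echo "$f = $(cat "$f")"
    done
    return 0
}

raid_limits_running
```

Running `raid_limits_in_conf` with no argument reads /etc/sysctl.conf; it returns non-zero when no dev.raid entry is present.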
Hellboy
 Posted: Jan 4 2012, 01:17 PM

It happened again. I uploaded 2 NZB files, SABnzbd became active, and the system crashed again. I never had this problem with SABnzbd before.

What could be the problem?
Jcink
 Posted: Jan 4 2012, 02:29 PM

I don't think it's related to SABnzbd; it looks more like an I/O issue somewhere, as you pointed out.

What kind of disks are they?

Not that I mean to sound like I'm doubting you or anything, but also - how did you verify that there was nothing wrong with the hard disks? I'm guessing you checked SMART data but unless you ran the manufacturer's tools as well, you can't be sure.

Also, is the BIOS up to date?
Hellboy
 Posted: Jan 4 2012, 02:49 PM

I have 4 disks in the server: 2 x 300 GB in RAID1 and 2 x 1 TB in RAID1, all software RAID, because the hardware RAID is a fake one.

On the 2 x 300 GB pair I have a 512 MB partition for /boot and the rest of the disk is LVM; the 2 x 1 TB RAID1 is all LVM.

I have one big volume group.
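A layout like this can be confirmed from /proc/mdstat. The sketch below is only an illustration with a made-up function name; it assumes the usual array line format ("md0 : active raid1 sdb1[1] sda1[0]") and reports the level correctly only for active arrays.

```shell
# Summarise md arrays from a /proc/mdstat-format file: name, state, level.
# For inactive arrays the third column is a member device, not a level.
md_arrays() {
    mdstat="${1:-/proc/mdstat}"
    awk '$1 ~ /^md/ && $2 == ":" { print $1, $3, $4 }' "$mdstat"
}

# Only query the live file when software RAID is present on this machine.
if [ -r /proc/mdstat ]; then
    md_arrays
fi
```

The LVM side (physical volumes and the single volume group) can be listed with `pvs` and `vgs`.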

I didn't have smartmontools installed; I forgot.

I will install them and have a look.
Hellboy
 Posted: Jan 4 2012, 08:59 PM

smartmontools is installed now, and it reports no errors.
I also did a test to see if I have faulty memory.

I checked the amount of memory, which is 4 GB, so I wrote a file of about that size with dd:

dd if=/dev/urandom bs=4021868 of=/data/memtest count=1050

md5sum /data/memtest; md5sum /data/memtest; md5sum /data/memtest

All the checksums are equal, so it can't be a memory error.
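The repeated md5sum comparison can be wrapped in a small helper, sketched below with a hypothetical function name. The dd command above writes about 4.2 GB (4021868 x 1050 bytes). Matching digests are only a rough indication, since the check also depends on the disks and the page cache; a dedicated tester such as memtest86 is the more thorough option.

```shell
# Hash a file N times (default 3) and succeed only if every digest is identical.
checksums_agree() {
    file="$1"
    runs="${2:-3}"
    first=""
    i=0
    while [ "$i" -lt "$runs" ]; do
        sum=$(md5sum "$file" | awk '{ print $1 }')
        if [ -z "$first" ]; then
            first="$sum"
        elif [ "$sum" != "$first" ]; then
            echo "mismatch on run $((i + 1)): $sum != $first" >&2
            return 1
        fi
        i=$((i + 1))
    done
    echo "$runs runs agree: $first"
}
```

For the file from this post the call would be `checksums_agree /data/memtest 3`.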

I uploaded 4 NZBs and ran Iometer, but the server didn't crash.

I will see if I can upgrade the BIOS.
Hellboy
 Posted: Jan 5 2012, 06:06 PM

I also removed vm.swappiness = 0 from /etc/sysctl.conf and reloaded with sysctl -p /etc/sysctl.conf. I did some tests and rebooted the server.

I started OpenVPN and uploaded 4 NZBs, and the server is fine.

Still a strange problem. I will update the BIOS and other firmware this weekend.
helikaon
 Posted: Jan 6 2012, 11:06 AM
SLF Moderator
******

Group: Moderators
Posts: 516
Member No.: 4
Joined: 8-April 11









Hi,
this is definitely a hard problem to track down, since we don't know much about your system.

1. As I gather, your problem started after you added the new 2 x 1 TB disks to software RAID1, right? So you have some /dev/mdX device with LVM on top of it.

2. Do you have a 64-bit OS?

3. How do you upload the files to your server? I'm not familiar with SABnzbd, so: you run OpenVPN and then connect to SABnzbd with some SABnzbd client?

4. Keep tail -f /var/log/messages running all the time. We could catch something useful there that is lost when the server panics and reboots.

5. Try a different kernel?
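For point 4, the last messages before a panic often never reach the disk, so it may help to follow the log from a second machine over SSH and to search whatever did get written after the reboot. A rough sketch; the function name and the keyword list are my own choice, not anything official:

```shell
# Follow the log live from another machine so the last lines survive there:
#   ssh root@nas 'tail -f /var/log/messages'      # "nas" is a placeholder host name

# Print log lines that mention typical kernel failure keywords.
# Default file is the syslog path mentioned in point 4.
kernel_trouble_lines() {
    log="${1:-/var/log/messages}"
    grep -Ei 'kernel panic|stack-protector|oops|call trace' "$log"
}
```

The function returns non-zero when nothing matches, which is the common case if the panic output only went to the console.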

cheers,


Hellboy
 Posted: Jan 6 2012, 01:24 PM

The hardware is an HP MicroServer N36L.
OS = SL 6.1
Kernel = the latest that comes with SL 6.1
SABnzbd is a Python program with a web interface (the engine is CherryPy) from which you can upload NZB files.
I tried 2 older kernel versions, but got the same result.
I looked at all the logs; nothing useful.
I did memory tests (memtest86).

Without my tuning parameters the server is running stable, and I have done a lot of load tests (Iometer, SABnzbd, iperf).

So I will leave it as it is for now.
helikaon
 Posted: Jan 6 2012, 01:36 PM

Hi,
if the issue is hampering the usability of the server, you could still try non-SL kernels. In our forum's kernel section, 'torracat' posted a link to precompiled non-standard kernels for RHEL/CentOS/SL (vanilla kernels packaged as RPMs), so it should be safe to try them out...

Otherwise, the next step is probably to configure a kernel crash dump and ask on the SL devel lists for help...
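A crash dump via kdump needs memory reserved with the crashkernel= boot parameter. The sketch below, with a hypothetical function name, only checks whether that parameter is present. As far as I know, on SL 6 the service itself comes from the kexec-tools package and is enabled with `chkconfig kdump on` followed by `service kdump start`; treat those steps as an assumption to verify against the distribution documentation.

```shell
# Print the crashkernel= boot parameter from a kernel command line file.
# Returns non-zero if no memory is reserved for a crash kernel.
crashkernel_param() {
    cmdline="${1:-/proc/cmdline}"
    tr ' ' '\n' < "$cmdline" | grep '^crashkernel='
}
```

Calling `crashkernel_param` without arguments inspects the running system; an empty result means the parameter still has to be added to the kernel line in the boot loader configuration, followed by a reboot.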

cheers,


Jcink
 Posted: Jan 7 2012, 09:05 AM

It's a tough one to track down period, unfortunately.

It seems as though you've already done a lot of the things I would have tried if I were stuck in your situation: checking the memory multiple times, looking over SMART data, going back to old kernels, verifying that the other hardware is in check... The only thing left that I can think of is the BIOS update. Did you try that yet? If it does happen again with your tuning settings removed, I'd say give it a shot and update it. If not, this will definitely have to be taken to the SL mailing list, unless we can get a crash dump analyzer in here...

Hellboy
 Posted: Jan 7 2012, 10:39 AM

I will try to update the BIOS ASAP and will post my findings then. I see there were more issues; some folks with Ubuntu also had kernel panics. I also see that there were some problems with the firmware for the built-in network card.

The server has now been up and running for 2 days under heavy load, and it is stable.

Hellboy
 Posted: Jan 8 2012, 03:10 PM

Today I started up my lab Spacewalk server (the open source edition of Red Hat Satellite Server). It also has a DHCP server on board. Immediately my NAS got a kernel panic.

My NAS is also running OpenVPN in bridged mode, which means my NIC is in promiscuous mode.
Could that be a problem?
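Whether the NIC really is in promiscuous mode can be checked from sysfs, where the interface flags word is exposed as a hex number and IFF_PROMISC is bit 0x100. A minimal sketch; the function names are illustrative and the interface name is whatever the bridge member is called on the machine (eth0 below is an assumption):

```shell
# Succeed if the IFF_PROMISC bit (0x100) is set in the given flags value.
flags_promisc() {
    [ $(( $1 & 0x100 )) -ne 0 ]
}

# Check a live interface via sysfs; the caller supplies the interface name.
iface_promisc() {
    flags_promisc "$(cat "/sys/class/net/$1/flags")"
}
```

For example: `iface_promisc eth0 && echo "eth0 is in promiscuous mode"`.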
Hellboy
 Posted: Jan 14 2012, 08:51 AM

I updated the BIOS and the firmware of the onboard network card. Everything has been up and running for a week now without problems. I even added services: I am now hosting a bunch of websites on the server.
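To record which versions are in place after such an update, the NIC firmware can be read from `ethtool -i` and the BIOS version from `dmidecode`. The helper below is a small sketch with a made-up name; it only parses the `firmware-version:` line of the ethtool output, and eth0 is an assumed interface name.

```shell
# Extract the firmware-version field from `ethtool -i` output on stdin.
nic_firmware() {
    awk -F': ' '$1 == "firmware-version" { print $2 }'
}

# Typical use (both commands usually need root):
#   ethtool -i eth0 | nic_firmware
#   dmidecode -s bios-version
```

Keeping these two values alongside the date makes it easier to tell later whether a new crash predates or follows a firmware change.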
helikaon
 Posted: Jan 14 2012, 06:08 PM

QUOTE (Hellboy @ Jan 14 2012, 08:51 AM)
I updated the BIOS and the firmware of the onboard network card. Everything has been up and running for a week now without problems. I even added services: I am now hosting a bunch of websites on the server.


Yay,
congrats on that :)
So it was firmware in the end... updating it is something to do if things don't work, and not to do if they do :)

cheers,

