
| PSchiffer |
Posted: Aug 14 2012, 09:24 AM
|
|
|
SLF Newbie Group: Members Posts: 13 Member No.: 1797 Joined: 14-August 12 |
Hi!
Coming back from my holidays I found a problem with my workstation. I am running SL, which I updated to 6.3 today when it kept running for just long enough. The principal error seems to be a kernel I/O error (kernel: journal commit I/O error), which leads the system to mount the /root filesystem read-only and later on to lock up (i.e. no access over ssh or directly). My system is installed on an SSD which has a small /boot partition and a larger one which are in an LVM setup. The larger partition is further divided into /root, /home and /swap. In addition I've got three 2TB HDDs, two of which are combined in a hardware RAID as /tmp, and the last one is partitioned into two 1TB chunks as my /data and /work. The system has 48 GB of RAM and 24 virtual processors. Everything has been running smoothly for half a year. I can't really remember having changed anything substantial before going on vacation besides mounting a remote Samba share (on an Ubuntu system). I got the startup message that the maximum number of mounts has been reached for my sda HDD. This, however, does not seem to be the main problem, as setting <fsckorder> to 2 for the /data and /work partitions removed the error. Also, as said above, the read-only state is for /root as far as I can see. Quite at a loss here and glad for any advice, also happy to give more log info (although I couldn't see anything myself e.g. in dmesg). Thanks Phil |
|
| PSchiffer |
Posted: Aug 14 2012, 09:31 AM
|
|
|
SLF Newbie Group: Members Posts: 13 Member No.: 1797 Joined: 14-August 12 |
Update: looking at my boot log, what I get is a lot of
udevd[1022]: worker [1103] unexpectedly returned with status 0x0100 ^M
udevd[1022]: worker [1103] failed while handling '/devices/LNXSYSTM:00/LNXTHERM:00' ^M
Wait timeout. Will continue in the background.
udevd[1022]: worker [1026] unexpectedly returned with status 0x0100 ^M
udevd[1022]: worker [1026] failed while handling '/devices/LNXSYSTM:00/LNXPWRBN:00/input/input1' ^M
udevd[1022]: worker [1048] unexpectedly returned with status 0x0100 ^M
udevd[1022]: worker [1048] failed while handling '/devices/pci0000:3e/0000:3e:02.0'
and later on
udevd[1022]: worker [1194] unexpectedly returned with status 0x0100 ^M
udevd[1022]: worker [1194] failed while handling '/devices/virtual/cpuid/cpu18' ^M
udevd[1022]: worker [1030] unexpectedly returned with status 0x0100 ^M
udevd[1022]: worker [1030] failed while handling '/devices/virtual/cpuid/cpu19' ^M
udevd[1022]: worker [1195] unexpectedly returned with status 0x0100
I am wondering if the ^M indicates something, as other log files (e.g. dmesg) don't contain any weird characters. |
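A note on the ^M: that is how many pagers and editors display a carriage return (CR, byte 0x0D), so on its own it usually just means the lines end in CR+LF and is not an error marker. The following is a minimal Python sketch, assuming the boot log is available as plain text with the carriage returns either shown literally as ^M (as pasted above) or present as real `\r` bytes. It strips them and lists the devices the udevd workers failed on. The function name `failed_devices` is made up for this illustration.

```python
import re

# Matches lines such as:
#   udevd[1022]: worker [1103] failed while handling '/devices/...'
FAILED_RE = re.compile(
    r"udevd\[(\d+)\]: worker \[(\d+)\] failed while handling '([^']+)'"
)


def failed_devices(log_text):
    """Return (worker_pid, device_path) pairs for every udevd failure line."""
    # Drop real carriage returns and the literal "^M" that pagers print for them.
    cleaned = log_text.replace("\r", "").replace("^M", "")
    return [(int(m.group(2)), m.group(3)) for m in FAILED_RE.finditer(cleaned)]


# Three lines in the style of the boot log quoted in this thread.
sample = (
    "udevd[1022]: worker [1103] unexpectedly returned with status 0x0100 ^M\n"
    "udevd[1022]: worker [1103] failed while handling '/devices/LNXSYSTM:00/LNXTHERM:00' ^M\n"
    "udevd[1022]: worker [1194] failed while handling '/devices/virtual/cpuid/cpu18'\r\n"
)

for pid, device in failed_devices(sample):
    print(pid, device)
```

Listing the devices side by side makes it easier to see whether the failures cluster on one piece of hardware or are spread over unrelated devices.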
|
| PSchiffer |
Posted: Aug 14 2012, 09:39 AM
|
|
|
SLF Newbie Group: Members Posts: 13 Member No.: 1797 Joined: 14-August 12 |
Second update (sorry for the fragmentation):
I see that my lvm2-monitor(ing) service is dead. Might that be causing the trouble, given that /root (and /boot and /home) are on a logical volume? |
|
| tux99 |
Posted: Aug 14 2012, 09:55 AM
|
|
|
SLF Guru Group: Members Posts: 1120 Member No.: 224 Joined: 28-May 11 |
My guess is your SSD is dying and has defective sectors (actually memory cells).
Do a check of the SSD with smartctl (but boot from a CD or USB stick). -------------------- My personal SL6 repository, specialized in audio/video software: http://pkgrepo.linuxtech.net/el6/
|
|
| PSchiffer |
Posted: Aug 14 2012, 09:59 AM
|
|||
|
SLF Newbie Group: Members Posts: 13 Member No.: 1797 Joined: 14-August 12 |
Hmm, yes, I think that might be possible. However, it's only about 8 months old, it has not had a lot of data written to it, and disk util said it's OK. Is there another way to check? |
|||
| tux99 |
Posted: Aug 14 2012, 10:03 AM
|
|||||
|
SLF Guru Group: Members Posts: 1120 Member No.: 224 Joined: 28-May 11 |
What brand and model is it? Some SSDs are very flakey and have high failure rates.
Post the output of: smartctl --all /dev/sdX (where X is the device letter of your ssd)
|
|||||
| PSchiffer |
|
|||||||
|
SLF Newbie Group: Members Posts: 13 Member No.: 1797 Joined: 14-August 12 |
I know, but given the price of this workstation I would have guessed that the vendor put in something sensible. It seems to be a Micron C400 SSD 128GB (I need to open the computer to check the label, as my invoice only says Highspeed SATA III SSD...). Anyway, the output of smartctl --all /dev/sdb is:
smartctl 5.39.1 2010-01-28 r3054 [x86_64-redhat-linux-gnu] (local build)
Copyright © 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Device Model: C400-MTFDDAC128MAM
Serial Number: 0000000011460320397F
Firmware Version: 0009
User Capacity: 128.035.676.160 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 6
Local Time is: Tue Aug 14 12:14:14 2012 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x80) Offline data collection activity was never started. Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: ( 595) seconds.
Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 9) minutes.
Conveyance self-test routine recommended polling time: ( 3) minutes.
SCT capabilities: (0x003d) SCT Status supported. SCT Feature Control supported. SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 100 050 Pre-fail Always - 0
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
9 Power_On_Hours 0x0032 100 100 001 Old_age Always - 5225
12 Power_Cycle_Count 0x0032 100 100 001 Old_age Always - 46
170 Unknown_Attribute 0x0033 100 100 010 Pre-fail Always - 0
171 Unknown_Attribute 0x0032 100 100 001 Old_age Always - 0
172 Unknown_Attribute 0x0032 100 100 001 Old_age Always - 0
173 Unknown_Attribute 0x0033 100 100 010 Pre-fail Always - 1
174 Unknown_Attribute 0x0032 100 100 001 Old_age Always - 0
181 Program_Fail_Cnt_Total 0x0022 100 100 001 Old_age Always - 94489346069
183 Runtime_Bad_Block 0x0032 100 100 001 Old_age Always - 0
184 End-to-End_Error 0x0033 100 100 050 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 001 Old_age Always - 0
188 Command_Timeout 0x0032 100 100 001 Old_age Always - 0
189 High_Fly_Writes 0x000e 100 100 001 Old_age Always - 84
194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 0
195 Hardware_ECC_Recovered 0x003a 100 100 001 Old_age Always - 0
196 Reallocated_Event_Count 0x0032 100 100 001 Old_age Always - 0
197 Current_Pending_Sector 0x0032 100 100 001 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 100 001 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 100 100 001 Old_age Always - 0
202 Data_Address_Mark_Errs 0x0018 100 100 001 Old_age Offline - 0
206 Flying_Height 0x000e 100 100 001 Old_age Always - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay. |
|||||||
| tux99 |
Posted: Aug 14 2012, 11:30 AM
|
|||||
|
SLF Guru Group: Members Posts: 1120 Member No.: 224 Joined: 28-May 11 |
The Crucial/Micron SSDs are actually among the better ones (but of course even the best ones can fail).
There isn't much useful info in that output, Program_Fail_Cnt_Total is the only attribute that looks suspicious. I found this with regards to it:
Unfortunately the smartctl version included by default in SL6 is rather old and doesn't recognize many SSD specific attributes. You could get a newer version in my linuxtech-backports repo and try with that again: http://pkgrepo.linuxtech.net/el6/backports/x86_64/smartmontools-5.41-1.el6.x86_64.rpm For example the output on my Transcend SSD looks like this (the two lines with SSD specific wear and failure info are in bold):
Also you could try to run an extended disk self-test (but you should do that while booted from a CD or USB drive with all filesystems on the SSD unmounted): smartctl --test=long /dev/sdb
|
|||||
| tux99 |
Posted: Aug 14 2012, 11:44 AM
|
|
|
SLF Guru Group: Members Posts: 1120 Member No.: 224 Joined: 28-May 11 |
Page 4 and 5 of this PDF provide better explanations of the smart attributes for your SSD:
http://www.micron.com/~/media/Documents/Products/Technical%20Note/Solid%20State%20Storage/5611tnfd03.ashx
|
|
| PSchiffer |
Posted: Aug 14 2012, 12:22 PM
|
|||
|
SLF Newbie Group: Members Posts: 13 Member No.: 1797 Joined: 14-August 12 |
Hmm, I don't see a difference between the two tests (only pasting the sdiff output below); actually it's just the sector size and LU WWN Id that are reported in addition. I will go through the PDF (many thanks!) and run the long test booting from a live DVD. Will be back with the output after that.
Short self-test routine Short self-test routine
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-2.6.32-279.1.1.e | smartctl 5.39.1 2010-01-28 r3054 [x86_64-redhat-linux-gnu] (l
Copyright © 2002-11 by Bruce Allen, http://smartmontools.so | Copyright © 2002-10 by Bruce Allen, http://smartmontools.so
LU WWN Device Id: 5 00a075 10320397f <
User Capacity: 128.035.676.160 bytes [128 GB] | User Capacity: 128.035.676.160 bytes
Sector Size: 512 bytes logical/physical <
Local Time is: Tue Aug 14 14:06:20 2012 CEST | Local Time is: Tue Aug 14 12:14:14 2012 CEST
data collection: ( 595) seconds. | data collection: ( 595) seconds.
SCT Error Recovery Co <
9 Power_On_Hours 0x0032 100 100 001 Old_a | 9 Power_On_Hours 0x0032 100 100 001 Old_a
12 Power_Cycle_Count 0x0032 100 100 001 Old_a | 12 Power_Cycle_Count 0x0032 100 100 001 Old_a |
|||
| tux99 |
Posted: Aug 14 2012, 12:35 PM
|
|
|
SLF Guru Group: Members Posts: 1120 Member No.: 224 Joined: 28-May 11 |
It could well be that smartctl 5.41 is still too old for your SSD and doesn't know about the correct smart attributes yet.
As you can see in the PDF, several attributes have a different name compared to the smartctl output and the data is probably not shown correctly either.
|
|
| tux99 |
Posted: Aug 14 2012, 12:53 PM
|
|
|
SLF Guru Group: Members Posts: 1120 Member No.: 224 Joined: 28-May 11 |
Ok, I have quickly rebuilt the latest version (5.43) of the smartmontools package, you can find it here:
http://pkgrepo.linuxtech.net/el6/backports/x86_64/smartmontools-5.43-2.el6.x86_64.rpm
|
|
| PSchiffer |
Posted: Aug 14 2012, 01:40 PM
|
|||
|
SLF Newbie Group: Members Posts: 13 Member No.: 1797 Joined: 14-August 12 |
Yes, you were right. 5.43 actually seems to work. At least it reports Model Family: Crucial/Micron RealSSD C300/C400/m4, and the formerly unknown attributes are now in accordance with the PDF (though there is one, 184 End-to-End_Error, which is not in the document). Is it safe to assume that the Pre-fail attributes indicate an imminent failure then? I guess it's time to call the vendor and ask for a replacement (?), hopefully under warranty. (Long test still not run.)
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 100 050 Pre-fail Always - 0
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
9 Power_On_Hours 0x0032 100 100 001 Old_age Always - 5227
12 Power_Cycle_Count 0x0032 100 100 001 Old_age Always - 48
170 Grown_Failing_Block_Ct 0x0033 100 100 010 Pre-fail Always - 0
171 Program_Fail_Count 0x0032 100 100 001 Old_age Always - 0
172 Erase_Fail_Count 0x0032 100 100 001 Old_age Always - 0
173 Wear_Levelling_Count 0x0033 100 100 010 Pre-fail Always - 1
174 Unexpect_Power_Loss_Ct 0x0032 100 100 001 Old_age Always - 0
181 Non4k_Aligned_Access 0x0022 100 100 001 Old_age Always - 22 1 21
183 SATA_Iface_Downshift 0x0032 100 100 001 Old_age Always - 0
184 End-to-End_Error 0x0033 100 100 050 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 001 Old_age Always - 0
188 Command_Timeout 0x0032 100 100 001 Old_age Always - 0
189 Factory_Bad_Block_Ct 0x000e 100 100 001 Old_age Always - 84
194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 0
195 Hardware_ECC_Recovered 0x003a 100 100 001 Old_age Always - 0
196 Reallocated_Event_Count 0x0032 100 100 001 Old_age Always - 0
197 Current_Pending_Sector 0x0032 100 100 001 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 100 001 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 100 100 001 Old_age Always - 0
202 Perc_Rated_Life_Used 0x0018 100 100 001 Old_age Offline - 0
206 Write_Error_Rate 0x000e 100 100 001 Old_age Always - 0 |
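One detail when comparing the two outputs: the odd-looking raw value 94489346069 that smartctl 5.39.1 printed for attribute 181 appears to be the same 48-bit number that 5.43 prints as three 16-bit words, 22 1 21. Here is a minimal Python sketch of that arithmetic. The helper name `raw48_to_words` is made up for this illustration and is not part of smartmontools, and the word order (most significant first) is an assumption that happens to match the output in this thread.

```python
def raw48_to_words(raw):
    """Split a 48-bit SMART raw value into three 16-bit words, most significant first."""
    if not 0 <= raw < 1 << 48:
        raise ValueError("raw value does not fit in 48 bits")
    return ((raw >> 32) & 0xFFFF, (raw >> 16) & 0xFFFF, raw & 0xFFFF)


# Raw value reported for attribute 181 by smartctl 5.39.1 in this thread.
print(raw48_to_words(94489346069))  # (22, 1, 21)
```

If that reading is right, the value that looked like a huge failure count under the name Program_Fail_Cnt_Total is more likely a differently formatted view of Non4k_Aligned_Access than roughly 94 billion program failures.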
|||
| tux99 |
Posted: Aug 14 2012, 02:05 PM
|
|
|
SLF Guru Group: Members Posts: 1120 Member No.: 224 Joined: 28-May 11 |
Pre-fail means something is deteriorating, not necessarily imminent.
As far as I understand it the only attribute that is non-zero and that according to that PDF is relevant with regards to a warranty claim is: 173 Wear_Levelling_Count 0x0033 100 100 010 Pre-fail Always - 1 But I'm not sure how to interpret that value, for example the Wear_Levelling_Count on my SSD is currently 1155/1369 (avg/max). TBH judging purely by the smart output I would think the drive is still OK, but the fact that you are having problems with the kernel remounting the filesystem to read-only (which usually happens when the kernel has detected an uncorrectable error on the device) seems to indicate that the drive has a problem. Have you tried a full forced fsck yet, just to see if there are filesystem errors already? If not try it after the extended smart test has completed, do it still while booted with the live CD. I don't know what else to suggest, I guess contacting Crucial/Micron support could be a good idea but that usually takes time so it's not a quick solution.
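To make the Pre-fail point concrete: in the smartctl attribute table, Pre-fail and Old_age are only the attribute's type. An attribute counts as failed when its normalized VALUE has dropped to or below THRESH, or when WHEN_FAILED shows something other than "-". Below is a minimal Python sketch of that check, assuming the ten-column table layout shown in this thread. The function names `parse_attributes` and `failing` are made up for this illustration.

```python
def parse_attributes(text):
    """Parse rows of a smartctl attribute table into dictionaries."""
    rows = []
    for line in text.splitlines():
        # At most 10 fields: the last one (RAW_VALUE) may itself contain spaces.
        parts = line.split(None, 9)
        if len(parts) < 10 or not parts[0].isdigit():
            continue  # header line or unrelated text
        rows.append({
            "id": int(parts[0]),
            "name": parts[1],
            "value": int(parts[3]),
            "worst": int(parts[4]),
            "thresh": int(parts[5]),
            "type": parts[6],
            "when_failed": parts[8],
            "raw": parts[9].strip(),
        })
    return rows


def failing(rows):
    """Attributes at or below their threshold, or marked as having failed."""
    return [r for r in rows
            if r["value"] <= r["thresh"] or r["when_failed"] != "-"]


# A few rows copied from the smartctl 5.43 output posted in this thread.
sample = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
173 Wear_Levelling_Count 0x0033 100 100 010 Pre-fail Always - 1
194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 0
181 Non4k_Aligned_Access 0x0022 100 100 001 Old_age Always - 22 1 21
"""

rows = parse_attributes(sample)
print([r["name"] for r in rows if r["type"] == "Pre-fail"])
print([r["name"] for r in failing(rows)])
```

On these rows every normalized value is still 100, far above its threshold, which fits the reading that the SMART data alone does not show a failing drive.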
|
|
| PSchiffer |
Posted: Aug 14 2012, 02:43 PM
|
|
|
SLF Newbie Group: Members Posts: 13 Member No.: 1797 Joined: 14-August 12 |
Hi again. Many thanks for all the help!
I need to read a student's thesis now and will thus do the remaining tests (including fsck) overnight (I guess from a GParted live CD). So I will be back with details tomorrow if something comes up from there. It's still strange to me that I can't pinpoint the exact time or the action at which the system switches to read-only. An hour ago I even managed to run two programs (which use quite a lot of memory and processors) without problems, but their data were on other HDDs. Just now I copied all the data from /home (on the SSD) to /tmp, but that was a read access of course. Maybe I will get in contact with my computer vendor anyway (I have next-day support in my purchase contract). Just to double-check: you think the errors like udevd[1022]: worker [1103] unexpectedly returned with status 0x0100 that I posted are not connected to the main problem? Hey now! I just booted into runlevel 3 (just to see actually) and it is now printing EXT4-fs related errors on device dm-0 (which must be the logical volume). Also there is something with orphaned inodes, but doing a df -i on all volumes does not show anything suspicious (unlikely I know, but I once managed to use all inodes on a disk on another system). Guess more tomorrow, but thanks again! |
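One way to narrow down when the switch to read-only happens is to look at the mount options the kernel reports in /proc/mounts, where each line has the form "device mountpoint fstype options dump pass". Here is a minimal Python sketch, assuming that text format. The function name `read_only_mounts` and the sample lines are made up for illustration (only the volume group name is borrowed from this thread).

```python
def read_only_mounts(mounts_text):
    """Return the mount points whose option list contains 'ro'."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # not a well-formed mounts line
        mount_point, options = fields[1], fields[3]
        if "ro" in options.split(","):
            result.append(mount_point)
    return result


# Illustrative sample only; these lines are NOT taken from the poster's system.
sample = (
    "/dev/mapper/vg_rechenknecht-lv_root / ext4 ro,relatime 0 0\n"
    "/dev/mapper/vg_rechenknecht-lv_home /home ext4 rw,relatime 0 0\n"
    "/dev/sda1 /boot ext4 rw 0 0\n"
)

print(read_only_mounts(sample))  # ['/']

# On a live system the same function could be fed the real file:
#   with open("/proc/mounts") as f:
#       print(read_only_mounts(f.read()))
```

Running such a check periodically from another machine, or logging it to a disk other than the SSD, would give a rough timestamp for the remount that can be matched against dmesg.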
|
| tux99 |
Posted: Aug 14 2012, 02:55 PM
|
|||||
|
SLF Guru Group: Members Posts: 1120 Member No.: 224 Joined: 28-May 11 |
I think these errors are just a consequence of the main problem (when the filesystem switches to read only udevd will have trouble writing to files).
That sounds like the filesystem is corrupted, which could be purely a filesystem problem but more likely is a consequence of the SSD i/o errors.
|
|||||
| PSchiffer |
Posted: Aug 14 2012, 06:56 PM
|
|
|
SLF Newbie Group: Members Posts: 13 Member No.: 1797 Joined: 14-August 12 |
[QUOTE=PSchiffer,Aug 14 2012, 04:43 PM]Just to double-check; you think the udevd[1022]: worker [1103] unexpectedly returned with status 0x0100 alike errors I posted are not connected to the main problem?[/QUOTE]
[QUOTE=tux99,Aug 14 2012, 02:55 PM]I think these errors are just a consequence of the main problem (when the filesystem switches to read only udevd will have trouble writing to files).[/QUOTE]
Hmm, but the udevd errors already appear during boot, so before the file system goes read-only.
[QUOTE=PSchiffer,Aug 14 2012, 04:43 PM]Hey now! I just booted into runlevel 3 (just to see actually) and it is now printing EXT4-fs related errors on device dm-0 (which must be the logical volume). Also there is something with orphaned inodes, but doing a df -i on all volumes does not show anything suspicious (unlikely I know, but I once managed to use all inodes on a disc on another system).[/QUOTE]
[QUOTE=tux99,Aug 14 2012, 02:55 PM]That sounds like the filesystem is corrupted, which could be purely a filesystem problem but more likely is a consequence of the SSD i/o errors.[/QUOTE]
fsck says all volumes in the LVM are clean (the problem actually seems to be with /root). I am wondering if it would make sense to reformat and copy the system back (or do a fresh install), just to make sure it's really not a software issue. |
|
| redman |
Posted: Aug 14 2012, 08:15 PM
|
|
SLF Admin Group: Admins Posts: 1667 Member No.: 2 Joined: 8-April 11 |
PSchiffer, please correct the above message and make sure your quotations are correct.
-------------------- What is SL? - Forum Rules - Info on 3rd Party Repos - How to post images - How to post large text / config files
Desktop: Asus P5QPL-AM, Intel Dual-Core E6500, 4GB DDR2, Asus GeForce GT 430 1GB, SL6.4 x86_64 Test box: Intel S5000PSL, 2x Intel Xeon E5310, 8GB ECC DDR2 FB-Dimm, Asus GeForce GT 220 1GB, SL6.4 x86_64 |
|
| tux99 |
Posted: Aug 14 2012, 09:51 PM
|
|||||
|
SLF Guru Group: Members Posts: 1120 Member No.: 224 Joined: 28-May 11 |
Ok then they aren't caused by the fs going ro, but I still think they are secondary issues caused by the main problem, udev is complaining about several devices unrelated to each other so it's very unlikely those devices all have a problem, more likely udev is malfunctioning for some reason (i.e. this is just an effect of the underlying cause). That said it might be worth doing a 24 hour memtest86 check of your workstation, because in case the culprit isn't the SSD then it could be the RAM (or even the motherboard or the cpu or the PSU, but those are IMHO less likely).
Your reply isn't clear, it would have been more useful if you had posted the output of the fsck (including the command you ran). Did you do a forced fsck "e2fsck -f" for every filesystem on the SSD? Are you saying /root had errors and got repaired but the other filesystems didn't have errors?
|
|||||
| PSchiffer |
Posted: Aug 15 2012, 10:48 AM
|
|||||
|
SLF Newbie Group: Members Posts: 13 Member No.: 1797 Joined: 14-August 12 |
Okay, I will be looking at that next. Thanks.
Sorry about that. I am posting the output of e2fsck -f below. Please note that under the live environment dm-0, which is mentioned in the error reports, becomes dm-2. What you see is e2fsck -f for sda1, which is /boot and not in the LV, and then the output for /root and /home in the LV, each run twice (omitting /swap).
/dev/sda1: recovering journal
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/sda1: 53/128016 files (7.5% non-contiguous), 108862/512000 blocks
/dev/vg_rechenknecht/lv_root: recovering journal
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/vg_rechenknecht/lv_root: ***** FILE SYSTEM WAS MODIFIED *****
/dev/vg_rechenknecht/lv_root: 189472/3276800 files (0.2% non-contiguous), 3936475/13107200 blocks
e2fsck -f /dev/dm-2
e2fsck 1.41.12 (17-May-2010)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/dm-2: 189472/3276800 files (0.2% non-contiguous), 3936475/13107200 blocks
e2fsck -f /dev/vg_rechenknecht/lv_home
e2fsck 1.41.12 (17-May-2010)
/dev/vg_rechenknecht/lv_home: recovering journal
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/vg_rechenknecht/lv_home: 116900/1277952 files (0.2% non-contiguous), 3238986/5107712 blocks
e2fsck -f /dev/dm-4
e2fsck 1.41.12 (17-May-2010)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/dm-4: 116900/1277952 files (0.2% non-contiguous), 3238986/5107712 blocks
I am not really sure what to make of that... |
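When several filesystems are checked in one go, it can help to pull out just the lines that say what e2fsck actually did. This is a minimal Python sketch, assuming output in the style pasted above, that reports which devices had their journal replayed and which ones carry the "FILE SYSTEM WAS MODIFIED" banner. The function name `summarize_fsck` is made up for this illustration.

```python
import re

# "<device>: ***** FILE SYSTEM WAS MODIFIED *****"
MODIFIED_RE = re.compile(
    r"^(\S+): \*{5} FILE SYSTEM WAS MODIFIED \*{5}\s*$", re.MULTILINE
)
# "<device>: recovering journal"
JOURNAL_RE = re.compile(r"^(\S+): recovering journal\s*$", re.MULTILINE)


def summarize_fsck(output):
    """Return the devices e2fsck modified and those whose journal it replayed."""
    return {
        "modified": sorted(set(MODIFIED_RE.findall(output))),
        "journal_recovered": sorted(set(JOURNAL_RE.findall(output))),
    }


# Excerpt of the e2fsck output posted in this thread.
sample = """\
/dev/sda1: recovering journal
/dev/sda1: 53/128016 files (7.5% non-contiguous), 108862/512000 blocks
/dev/vg_rechenknecht/lv_root: recovering journal
/dev/vg_rechenknecht/lv_root: ***** FILE SYSTEM WAS MODIFIED *****
/dev/vg_rechenknecht/lv_root: 189472/3276800 files (0.2% non-contiguous), 3936475/13107200 blocks
/dev/vg_rechenknecht/lv_home: recovering journal
/dev/vg_rechenknecht/lv_home: 116900/1277952 files (0.2% non-contiguous), 3238986/5107712 blocks
"""

print(summarize_fsck(sample))
```

In this excerpt all three filesystems had a journal to replay, but only lv_root was reported as modified.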
|||||
| tux99 |
Posted: Aug 15 2012, 11:21 AM
|
|
|
SLF Guru Group: Members Posts: 1120 Member No.: 224 Joined: 28-May 11 |
Well on the /root fs the "***** FILE SYSTEM WAS MODIFIED *****" indicates that fsck found filesystem problems of some kind and had to correct it.
Of course filesystem problems don't necessarily mean disk problems (just like disk problems don't always necessarily cause filesystem problems as fsck only checks fs metadata not integrity of the actual data). Filesystem problems could have been caused by a simple unclean shutdown. Try booting the system normally from the SSD again now that you have done the fsck and see if you get again errors of any kind (udevd errors, ext4 errors, i/o errors, fs switching read-only or anything else unusual). Also check with smartctl if any of the attribute values have changed.
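Checking whether attribute values have changed is easier when two smartctl runs are compared by attribute ID, since the names differ between smartctl versions (as seen earlier in this thread). Below is a minimal Python sketch, assuming the ten-column attribute table layout. The function names `raw_values` and `changed_raw_values` are made up for this illustration.

```python
def raw_values(text):
    """Map attribute ID to (name, raw value) from a smartctl attribute table."""
    table = {}
    for line in text.splitlines():
        # At most 10 fields: RAW_VALUE is last and may contain spaces.
        parts = line.split(None, 9)
        if len(parts) == 10 and parts[0].isdigit():
            table[int(parts[0])] = (parts[1], parts[9].strip())
    return table


def changed_raw_values(before, after):
    """Return {id: (old_raw, new_raw)} for attributes present in both runs."""
    old, new = raw_values(before), raw_values(after)
    return {
        attr_id: (old[attr_id][1], new[attr_id][1])
        for attr_id in sorted(old.keys() & new.keys())
        if old[attr_id][1] != new[attr_id][1]
    }


# Rows copied from the two outputs posted in this thread (5.39.1, then 5.43).
before = """\
  9 Power_On_Hours 0x0032 100 100 001 Old_age Always - 5225
 12 Power_Cycle_Count 0x0032 100 100 001 Old_age Always - 46
173 Unknown_Attribute 0x0033 100 100 010 Pre-fail Always - 1
"""
after = """\
  9 Power_On_Hours 0x0032 100 100 001 Old_age Always - 5227
 12 Power_Cycle_Count 0x0032 100 100 001 Old_age Always - 48
173 Wear_Levelling_Count 0x0033 100 100 010 Pre-fail Always - 1
"""

print(changed_raw_values(before, after))
```

Comparing runs made with the same smartctl version is the safer choice, because different versions may format the same raw value differently.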
|
|
| PSchiffer |
Posted: Aug 16 2012, 07:13 AM
|
|||||
|
SLF Newbie Group: Members Posts: 13 Member No.: 1797 Joined: 14-August 12 |
I think I can put this issue to SOLVED:
To my own embarrassment (as I think I should have checked this much earlier) it comes down to a firmware update needed on the SSD.
As the error was still there, and after running a couple of hours of memtest without anything coming up, I finally called the workstation vendor and, well, they just said: "Oh, yes, there is a firmware update for your SSD that addresses exactly this issue". I am quoting the Micron document below, just for how ridiculous it is. Since the update the computer has been running sweetly for more than 12h, even when I put substantial load on the system. Many thanks again for all the advice and help, tux99. I guess I learned a lot.
|
|||||
| redman |
Posted: Aug 16 2012, 09:34 AM
|
|
SLF Admin Group: Admins Posts: 1667 Member No.: 2 Joined: 8-April 11 |
Thanks for the feedback.
As far as the firmware goes, you wouldn't be the first one to forget that. |
|
| tux99 |
Posted: Aug 16 2012, 10:11 AM
|
|
|
SLF Guru Group: Members Posts: 1120 Member No.: 224 Joined: 28-May 11 |
I'm glad you solved it and I agree with you it's ridiculous that these days you need to worry even about firmware updates for SSDs...
(this never used to be the case for hard disks).
|
|