Disk weirdness on pre production fuller
From PrgmrWiki
Latest revision as of 00:29, 13 January 2012
So, a disk mysteriously dropped out of fuller's RAID.[1] If I'm counting right, it was the disk on the add-on card. I go to test the disk, and its SMART data is spotless; it passes the SMART self-tests too. I start running badblocks, and I start getting pcieport errors for the Ethernet card, of all things.[2]
So, I replaced the add-in card, and now I see:
[root@fuller ~]# smartctl -a /dev/sdg
smartctl 5.39.1 program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
2010-01-28 r3054 [x86_64-unknown-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
Device: /6:0:0:0 Version:
scsiModePageOffset: response length too short, resp_len=47 offset=50 bd_len=46
>> Terminate command early due to bad response to IEC mode page
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
[1]
ata8.00: exception Emask 0x52 SAct 0x0 SErr 0xffffffff action 0xe frozen
ata8: SError: { RecovData RecovComm UnrecovData Persist Proto HostInt PHYRdyChg PHYInt CommWake 10B8B Dispar BadCRC Handshk LinkSeq TrStaTrns UnrecFIS DevExch }
ata8.00: failed command: FLUSH CACHE EXT
ata8.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
         res 40/00:04:e0:4a:1d/00:00:01:00:00/40 Emask 0x56 (ATA bus error)
ata8.00: status: { DRDY }
ata8: hard resetting link
ata8: failed to resume link (SControl FFFFFFFF)
ata8: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)
ata8: hard resetting link
ata8: failed to resume link (SControl FFFFFFFF)
ata8: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)
ata8: limiting SATA link speed to 3.0 Gbps
ata8: hard resetting link
ata8: failed to resume link (SControl FFFFFFFF)
ata8: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)
ata8.00: disabled
sd 7:0:0:0: rejecting I/O to offline device
md: super_written gets error=-5, uptodate=0
md/raid1:md0: Disk failure on sdg1, disabling device.
md/raid1:md0: Operation continuing on 6 devices.
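The SErr value in that first line can be decoded by hand. Here is a rough Python sketch, assuming the bit positions of the SATA SError register as I understand them (bits 0-1 and 8-11 for the error field, 16-26 for diagnostics); the short names are just the ones the kernel printed above. The function names `decode_serror` and `looks_like_dead_link` are made up for this example. Reading all ones from SErr, SStatus and SControl alike is often what you get when the controller has stopped answering on the bus at all, rather than every error really happening at once; that is my interpretation, not something the log states.

```python
# Bit positions are an assumption based on the SATA SError register layout;
# the names are the short labels seen in the kernel log in footnote [1].
SERROR_BITS = [
    (0, "RecovData"),
    (1, "RecovComm"),
    (8, "UnrecovData"),
    (9, "Persist"),
    (10, "Proto"),
    (11, "HostInt"),
    (16, "PHYRdyChg"),
    (17, "PHYInt"),
    (18, "CommWake"),
    (19, "10B8B"),
    (20, "Dispar"),
    (21, "BadCRC"),
    (22, "Handshk"),
    (23, "LinkSeq"),
    (24, "TrStaTrns"),
    (25, "UnrecFIS"),
    (26, "DevExch"),
]


def decode_serror(value):
    """Return the names of the SError flags set in a 32-bit register value."""
    return [name for bit, name in SERROR_BITS if value & (1 << bit)]


def looks_like_dead_link(value):
    """True when the register reads back as all ones (0xFFFFFFFF)."""
    return (value & 0xFFFFFFFF) == 0xFFFFFFFF


if __name__ == "__main__":
    serr = 0xFFFFFFFF  # SErr from the "ata8.00: exception" line
    print(" ".join(decode_serror(serr)))
    print("all ones:", looks_like_dead_link(serr))
```

With `0xffffffff` every known flag comes back set, which matches the full list of names in the `SError: { ... }` line.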
[2]
[root@fuller ~]# badblocks -w /dev/sdg
pcieport 0000:00:05.0: AER: Uncorrected (Non-Fatal) error received: id=0000
pcieport 0000:00:05.0: AER: Uncorrected (Non-Fatal) error received: id=0000
pcieport 0000:00:05.0: AER: Uncorrected (Non-Fatal) error received: id=0000
pcieport 0000:00:05.0: AER: Uncorrected (Non-Fatal) error received: id=0000
pcieport 0000:00:05.0: AER: Uncorrected (Non-Fatal) error received: id=0000
pcieport 0000:00:05.0: AER: Uncorrected (Non-Fatal) error received: id=0000
pcieport 0000:00:05.0: AER: Uncorrected (Non-Fatal) error received: id=0000
pcieport 0000:00:05.0: AER: Uncorrected (Non-Fatal) error received: id=0000

From lspci:

05:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)

Actually, I'm not at all sure those two numbers are the same device. Pretty sure they are not: PCI addresses read bus:device.function, so 05:00.0 is device 0 on bus 5, while the AER messages come from 00:05.0, device 5 on bus 0.

[root@fuller ~]# lspci |grep "05\.0 "
00:05.0 PCI bridge: Intel Corporation 5520/X58 I/O Hub PCI Express Root Port 5 (rev 22)

So my error is on root port 5. Yeah. I'm going to blame the add-on SATA card. So, just remove it? Or replace it? That is the question.
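Mixing up 05:00.0 and 00:05.0 is easy to do by eye, so here is a small Python sketch that compares PCI addresses field by field. It assumes the usual `[domain:]bus:device.function` hex notation, and that lspci leaves the domain off when it is 0000; `parse_bdf` and `same_device` are hypothetical helper names for this example, not part of any tool.

```python
import re

# [domain:]bus:device.function, all hex. The domain is optional and is
# assumed to be 0000 when lspci omits it.
BDF_RE = re.compile(
    r"^(?:(?P<domain>[0-9a-fA-F]{4}):)?"
    r"(?P<bus>[0-9a-fA-F]{2}):"
    r"(?P<dev>[0-9a-fA-F]{2})\."
    r"(?P<fn>[0-7])$"
)


def parse_bdf(addr):
    """Split a PCI address into (domain, bus, device, function) integers."""
    m = BDF_RE.match(addr.strip())
    if m is None:
        raise ValueError(f"not a PCI address: {addr!r}")
    return (
        int(m.group("domain") or "0", 16),
        int(m.group("bus"), 16),
        int(m.group("dev"), 16),
        int(m.group("fn"), 16),
    )


def same_device(a, b):
    """True when both addresses name the same domain, bus, device and function."""
    return parse_bdf(a) == parse_bdf(b)


if __name__ == "__main__":
    aer = "0000:00:05.0"   # from the pcieport AER messages
    nic = "05:00.0"        # the Ethernet controller from lspci
    port = "00:05.0"       # the root port from lspci
    print("AER source vs NIC:      ", same_device(aer, nic))
    print("AER source vs root port:", same_device(aer, port))
```

The first comparison is false and the second is true, which agrees with the lspci output: the AER source is the root port, not the Ethernet card.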
www.redhat.com/promo/summit/.../fal_prarit_rhsummit2010.pdf