Re: zpool scrub errors on 3ware 9550SXU
- From: Ian J Hart <ianjhart@xxxxxxxxxxxx>
- Date: Wed, 08 Jul 2009 12:24:17 +0100
Quoting Kip Macy <kmacy@xxxxxxxxxxx>:
Did you answer my question of whether or not this can be reproduced on 7-STABLE?
Yes I did, but the threading is a little broken. Sorry, that's my fault.
To reiterate: with 7-STABLE circa Jun 25th, scrubs complete okay on the exact same hardware and v6 zpool that fail under 8.0-BETA1.
I'm scrubbing under 7 every time a run under 8 fails.
A reminder of the setup.
3ware 9550SXU-16
16x 1.5TB Seagate. These drives throw bad sectors!
Two 8-disk raidz2 vdevs combined into one pool. 21.8TB.
Test file system with compression on and copies=2.
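For what it's worth, the 21.8TB figure lines up with the raw size of all 16 drives expressed in binary units, assuming "1.5TB" on the label means 1.5 x 10^12 bytes. A quick sketch of the arithmetic (the names are only illustrative):

```python
# Assumption: a "1.5TB" drive holds 1.5 * 10**12 bytes (decimal terabytes).
DRIVE_BYTES = 1_500_000_000_000
DRIVES = 16
TIB = 2**40  # bytes per binary terabyte (TiB)


def to_tib(n_bytes):
    """Convert a byte count to TiB."""
    return n_bytes / TIB


raw_tib = to_tib(DRIVES * DRIVE_BYTES)
print(f"raw pool size: {raw_tib:.1f} TiB")  # prints: raw pool size: 21.8 TiB
```

If that reading is right, 21.8TB is the size before raidz2 parity, copies=2 and compression are taken into account, not the usable space.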
I don't think this is a zfs error as such, it looks like the card gives up, which then spawns a whole series of bogus checksum errors (but what do I know).
It's odd that it seems to take 40m+ to fail. Offsets are always large.
How can I test for/eliminate any LBA error?
What else might cause the card to fail (after 40m)?
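On the LBA question, one rough check is to convert each failing byte offset to a sector number and see whether the failures all sit beyond a common addressing limit. This is only a sketch: it assumes 512-byte sectors, the two limits listed are just the usual suspects, and the function name is made up for illustration.

```python
SECTOR_BYTES = 512  # assumption: 512-byte logical sectors

# Addressing limits, in sectors, that a wraparound bug would typically involve.
LIMITS = {
    "LBA28": 2**28,       # 128 GiB with 512-byte sectors
    "32-bit LBA": 2**32,  # 2 TiB with 512-byte sectors
}


def crossed_limits(byte_offset, sector_bytes=SECTOR_BYTES):
    """Return the names of the limits that this byte offset lies at or beyond."""
    lba = byte_offset // sector_bytes
    return [name for name, limit in LIMITS.items() if lba >= limit]


# Hypothetical offsets, not taken from the actual error output.
for offset in (100 * 2**30, 1_500_000_000_000, 2**41):
    print(hex(offset), crossed_limits(offset))
```

If every failing offset lies beyond one limit and none lie below it, that would point towards an addressing problem rather than the card simply timing out. Offsets scattered on both sides would argue against it.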
BTW I have to put this into production soon, so I can start testing all the other stuff which might not work (e.g. Samba).
Thanks for your help.
-Kip
On Tue, Jul 7, 2009 at 1:03 PM, Ian J Hart <ianjhart@xxxxxxxxxxxx> wrote:
Quoting ianjhart@xxxxxxxxxxxx:
Quoting ianjhart@xxxxxxxxxxxx:
Quoting Kip Macy <kmacy@xxxxxxxxxxx>:
As usual scrubs cleanly on 7.2. Started throwing errors within a few
minutes under 8. Then it panicked, possibly due to scrub -s.
It's sat at the DB prompt if there's anything I can do. I'll need
idiot's guide level instruction. I have a screen dump if someone wants to step
up. Off list?
Highlight seems to be...
Memory modified after free 0xffffff0004da0c00(248) val=3000000 @
0xffffff0004dc00
Panic: most recently used by none
Can you test with recent 7-STABLE? That would tell me whether or not
you're hitting a general HEAD issue or a problem with the v13 import.
It's doing a scrub under 7.2 following another failed test. I'll pull it
up to stable after that.
I have more data and will post it once I've done a couple of jobs.
Thanks,
Kip
Here's that extra data.
Updated 3ware/AMCC card firmware.
Enable onboard SATA and fit a 300GB SATA disk. Remove the floppy and fit a
second 300GB SATA disk.
Remove the two 500GB disks and replace with 1.5TB units. I can now create
two 8-disk raidz2 vdevs, giving the same 12 disks' worth of storage I had
with one 14-disk raidz2.
Reinstall the two O/S on the 300GB drives.
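The "same 12 disks" claim follows from each raidz2 vdev giving up two disks to parity. A minimal sketch of that count, ignoring metadata overhead and padding (the function name is made up for illustration):

```python
def raidz_data_disks(vdev_widths, parity=2):
    """Disks' worth of data capacity: each raidz vdev gives up `parity` disks."""
    return sum(width - parity for width in vdev_widths)


old_layout = [14]    # one 14-disk raidz2
new_layout = [8, 8]  # two 8-disk raidz2 vdevs

print(raidz_data_disks(old_layout), raidz_data_disks(new_layout))  # prints: 12 12
```

The new layout uses 16 drives for the same data capacity, so it spends four drives on parity rather than two.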
<slight tangent>
May be of use to someone, so bear with me.
Reset to BIOS defaults. Some issues! Disabling sound helps.
Now suspect motherboard BIOS may be part of the problem. Removed both
cards and tested each version in turn.
ref: http://www.tyan.com.tw/support_download_bios.aspx?model=S.S2895
Started with 1.04, ended up with 1.04. Later versions detect the internal
SATA disks as 150 not 300. Most versions lock the keyboard (KVM) when legacy
USB is enabled. That's a PITA when you've just taken the floppy disk out. No
internal SATA disk settings. Be nice to check the geometry as 7 and 8
sysinstall seem to be behaving differently.
With the cards back in.
Add an ATA disk and CDROM while testing. Easyboot order is SATA0 ATA0
SATA1. Fdisk the so far blank ATA disk :)
On board audio clashes with something. BIOS 1.03 and later supports 16
SCSI boot devices. I disabled booting from the RAID card to allow the
onboard SATA drives to boot.
Out of space for option ROM error has gone.
AFAIK CPUs are late enough to support DDR400. Check anyway. Clock down to
333MHz. Still fails.
</slight tangent>
There's one last thing: this BIOS (1.04) does not supply the fix for AMD
erratum 169, "Northbridge System Request Queue may stall". Later BIOSes
incorrectly detect the onboard SATA disks.
ref:
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25759.pdf
We don't seem to have /dev/msr. Could I fix this using (the shiny new)
cpucontrol?
Thanks
----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.
_______________________________________________
freebsd-current@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@xxxxxxxxxxx"
FWIW this is still reproducible with 8.0-BETA1.
--
ian j hart
--
When bad men combine, the good must associate; else they will fall one
by one, an unpitied sacrifice in a contemptible struggle.
Edmund Burke