Can anyone shed some light on a system crash?

From: Shea Martin (smartin_at_arcis.com)
Date: 04/04/05


Date: Mon, 04 Apr 2005 15:20:59 -0600

Sunfire 4800/Solaris 8

All was fine until about 2 weeks ago, when the system crashed. Before the
crash, no unusual errors were being logged, though we don't analyze the
messages file on a daily basis (at least not on a stable system). Since then:

We have crashed 3 more times.

FS /home/CF7 is RAID-0 JBOD. There are messages in the logs complaining about
bad reads and writes from some of the disks in CF7. No known data corruption.
VTS does not seem to find anything wrong with CF7, though it cannot test a
metadevice fs. CF7 is UFS with journaling. CF7 is running on LVD SCSI.
In the snippet, you can see one of the parity errors at 15:22 (at the
beginning).

/home/CF10 is a slice on a Nexsan ATABeast (IDE disk vault), connected via
dual FCAL. We have had some repeatable data corruption problems (writing
binary data to disk, then reading it back, does not yield identical results).
Removing one of the FCAL controllers on the ATABeast appeared to fix the
corruption problem. We had hoped this fix would also bring back the
stability of the 4800.
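
For anyone wanting to repeat the write-then-read-back check described above,
it could be scripted roughly as follows. This is only a sketch under
assumptions: the function name write_and_verify and its parameters are made
up for illustration, and it is not the tool that was used on the 4800. Note
also that a read-back can be served from the page cache instead of the disk,
so a pass here does not prove the storage path is clean.

```python
import hashlib
import os
import tempfile


def write_and_verify(directory, size_bytes=1 << 20, chunk=1 << 16):
    """Write random bytes to a temp file in `directory`, read them back,
    and return True if the SHA-256 of both streams matches.

    Hypothetical helper for illustration only.
    """
    written = hashlib.sha256()
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            remaining = size_bytes
            while remaining > 0:
                block = os.urandom(min(chunk, remaining))
                written.update(block)
                f.write(block)
                remaining -= len(block)
            # Push the data out of user space and ask the OS to commit it.
            f.flush()
            os.fsync(f.fileno())

        read_back = hashlib.sha256()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(chunk), b""):
                read_back.update(block)

        return written.hexdigest() == read_back.hexdigest()
    finally:
        os.remove(path)
```

Pointing the directory argument at the file system under test (for example
/home/CF10) and running it in a loop would show how often a mismatch turns up.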

The problem is that the logs show messages that I haven't seen before, with
other hardware issues on the 4800. My hope in posting is that someone may
have seen similar messages in their log files, and thus knows what it took to
fix the issue. It may be a case where we have to send a core file off to Sun...

Anyway, here is the log snippet.

[/var/adm/messages.0]====================================================
Apr 1 15:22:28 crowfoot scsi: [ID 107833 kern.warning] WARNING:
/ssm@0,0/pci@18,600000/pci@1/scsi@5 (qus3):
Apr 1 15:22:28 crowfoot Target synch. rate reduced. tgt 5 lun 0
Apr 1 15:22:28 crowfoot scsi: [ID 107833 kern.warning] WARNING:
/ssm@0,0/pci@18,600000/pci@1/scsi@5 (qus3):
Apr 1 15:22:28 crowfoot Parity Error
Apr 1 15:22:28 crowfoot scsi: [ID 107833 kern.warning] WARNING:
/ssm@0,0/pci@18,600000/pci@1/scsi@5/sd@5,0 (sd185):
Apr 1 15:22:28 crowfoot Error for Command: read(10) Error
Level: Retryable
Apr 1 15:22:28 crowfoot scsi: [ID 107833 kern.notice] Requested Block:
9403536 Error Block: 9403536
Apr 1 15:22:28 crowfoot scsi: [ID 107833 kern.notice] Vendor: SEAGATE
                        Serial Number: 3CE0FGG8
Apr 1 15:22:28 crowfoot scsi: [ID 107833 kern.notice] Sense Key: Aborted
Command
Apr 1 15:22:28 crowfoot scsi: [ID 107833 kern.notice] ASC: 0x48 (initiator
detected error message received), ASCQ: 0x0, FRU: 0x2
Apr 1 16:30:41 crowfoot unix: [ID 836849 kern.notice]
Apr 1 16:30:41 crowfoot ^Mpanic[cpu17]/thread=3002e806020:
Apr 1 16:30:41 crowfoot unix: [ID 198239 kern.notice] free: freeing free
block, dev:0x7600000136, block:17016, ino:23347, fs:/export/home/CF10
Apr 1 16:30:41 crowfoot unix: [ID 100000 kern.notice]
Apr 1 16:30:41 crowfoot genunix: [ID 723222 kern.notice] 000002a100c76cf0
ufs:real_panic_v+70 (0, 10503fe0, 2a100c76f90, 0, cc, 29860)
Apr 1 16:30:41 crowfoot genunix: [ID 179002 kern.notice] %l0-3:
0000000078337f04 00000300074d4d88 000003000461b538 0000000000000000
Apr 1 16:30:41 crowfoot %l4-7: 000003000456d620 000003000456d620
0000000000000000 0000000000000000
Apr 1 16:30:41 crowfoot genunix: [ID 723222 kern.notice] 000002a100c76da0
ufs:ufs_fault_v+c8 (300061d6448, 0, 2a100c76f90, 2a100c772f8, 5b, 1)
Apr 1 16:30:41 crowfoot genunix: [ID 179002 kern.notice] %l0-3:
00000300061d6398 0000000010503fe0 0000007600000136 0000000000001000
Apr 1 16:30:41 crowfoot %l4-7: 0000000000001c00 0000030000a2c5d0
0000030018e20380 00000000084f0a30
Apr 1 16:30:41 crowfoot genunix: [ID 723222 kern.notice] 000002a100c76e50
ufs:ufs_fault+1c (2a100c772f8, 10503fe0, 7600000136, 4278, 5b33, 300062da0d4)
Apr 1 16:30:41 crowfoot genunix: [ID 179002 kern.notice] %l0-3:
00000000000345d0 00000000104f0000 00000300045fdc38 000000001000a408
Apr 1 16:30:41 crowfoot %l4-7: 0000000000002000 0000030000a29818
0000000000000000 000002a100c76e60
Apr 1 16:30:41 crowfoot genunix: [ID 723222 kern.notice] 000002a100c76f00
ufs:free+2ec (84f, 3000db900a8, 2000, 58a, 8, 300061d6398)
Apr 1 16:30:41 crowfoot genunix: [ID 179002 kern.notice] %l0-3:
000003000db903bc 0000030018e20380 00000300061d6398 000002a100c77268
Apr 1 16:30:41 crowfoot %l4-7: 0000000000001dd4 00000300062da000
000003000db90000 0000000000004278
Apr 1 16:30:41 crowfoot genunix: [ID 723222 kern.notice] 000002a100c77010
ufs:indirtrunc+280 (427c278, ffffffffffffffff, 10, c, ffffffffffffffff,
3000b728000)
Apr 1 16:30:41 crowfoot genunix: [ID 179002 kern.notice] %l0-3:
0000000000000008 0000000000000000 000002a100c77268 00000300062da000
Apr 1 16:30:41 crowfoot %l4-7: ffffffffffffffff 0000000000000001
0000000000007900 0000000000000070
Apr 1 16:30:41 crowfoot genunix: [ID 723222 kern.notice] 000002a100c770e0
ufs:indirtrunc+240 (42784e8, 0, 10, c, 8a, 30013734000)
Apr 1 16:30:41 crowfoot genunix: [ID 179002 kern.notice] %l0-3:
0000000000000008 0000000000000001 000002a100c77268 00000300062da000
Apr 1 16:30:41 crowfoot %l4-7: 0000000000045375 0000000000000800
000000000005d6d0 0000000000000095
Apr 1 16:30:41 crowfoot genunix: [ID 723222 kern.notice] 000002a100c771b0
ufs:ufs_itrunc+628 (4b0188, 2a100c77440, 10, 4, 45b81, 0)
Apr 1 16:30:41 crowfoot genunix: [ID 179002 kern.notice] %l0-3:
000002a100c77268 00000300062da000 000000008b704000 0000030022ba1a40
Apr 1 16:30:41 crowfoot %l4-7: 000003002f587578 0000000000000000
0000000000000001 0000000000000008
Apr 1 16:30:41 crowfoot genunix: [ID 723222 kern.notice] 000002a100c774a0
ufs:ufs_trans_itrunc+180 (0, ffbf, 40, 300061d63e0, 0, 30022ba1b80)
Apr 1 16:30:41 crowfoot genunix: [ID 179002 kern.notice] %l0-3:
00000000783109a8 00000300062da000 0000000000000000 00000300061d6398
Apr 1 16:30:41 crowfoot %l4-7: 0000000000000000 000003002f587578
000002a100c77890 0000030022ba1a40
Apr 1 16:30:41 crowfoot genunix: [ID 723222 kern.notice] 000002a100c77560
ufs:ufs_create+5c4 (3002f587578, 2a100c77990, 2a100c77998, 0, 2700, 1)
Apr 1 16:30:41 crowfoot genunix: [ID 179002 kern.notice] %l0-3:
0000030022ba1b78 0000030029c77290 0000000000000000 00000300061d6398
Apr 1 16:30:41 crowfoot %l4-7: 0000030022ba1a40 0000000000000080
0000000000002102 0000030022ba1ad0
Apr 1 16:30:41 crowfoot genunix: [ID 723222 kern.notice] 000002a100c77640
lofs:lo_create+38 (3001b7c9ab8, 30029c77290, 2a100c77998, 0, 80, 2a100c77990)
Apr 1 16:30:41 crowfoot genunix: [ID 179002 kern.notice] %l0-3:
0000000010179444 000003001dd1f528 000002a100c77890 0000000000ff0000
Apr 1 16:30:41 crowfoot %l4-7: 000000000000ff00 0000000081010000
00000000fee56550 0000000000000000
Apr 1 16:30:41 crowfoot genunix: [ID 723222 kern.notice] 000002a100c77700
genunix:vn_create+438 (4a67748, 0, 104f0a00, 0, 0, 0)
Apr 1 16:30:41 crowfoot genunix: [ID 179002 kern.notice] %l0-3:
000000007845374c 0000000004a67748 0000000000000000 000002a100c77998
Apr 1 16:30:41 crowfoot %l4-7: 0000000000000000 000002a100c77990
0000030022ba1ad0 0000000000000080
Apr 1 16:30:41 crowfoot genunix: [ID 723222 kern.notice] 000002a100c778b0
genunix:vn_open+d4 (2102, 202, 1b6, 100, 80, 200)
Apr 1 16:30:41 crowfoot genunix: [ID 179002 kern.notice] %l0-3:
000003001dd1fec8 0000000004a67748 0000000000000000 0000000000002302
Apr 1 16:30:41 crowfoot %l4-7: 0000000000000000 0000000000000000
0000000000000000 000000007fffffff
Apr 1 16:30:41 crowfoot genunix: [ID 723222 kern.notice] 000002a100c77a20
genunix:copen+94 (4a67748, 2302, 1b6, 2302, 0, fed400f9)
Apr 1 16:30:41 crowfoot genunix: [ID 179002 kern.notice] %l0-3:
00000000000000e1 000003001dd1f528 000003002e804428 0000000000000000
Apr 1 16:30:41 crowfoot %l4-7: 0000000000000001 000003001dd1f528
0000000000000000 0000000000000000
Apr 1 16:30:41 crowfoot unix: [ID 100000 kern.notice]
Apr 1 16:30:41 crowfoot genunix: [ID 672855 kern.notice] syncing file
systems...
Apr 1 16:31:11 crowfoot unix: [ID 836849 kern.notice]
Apr 1 16:31:11 crowfoot ^Mpanic[cpu17]/thread=3002e806020:
Apr 1 16:31:11 crowfoot unix: [ID 715357 kern.notice] panic sync timeout
Apr 1 16:31:11 crowfoot unix: [ID 100000 kern.notice]
Apr 1 16:31:11 crowfoot genunix: [ID 353387 kern.notice] dumping to
/dev/md/dsk/d1, offset 65536
Apr 1 16:33:17 crowfoot genunix: [ID 409368 kern.notice] ^M100% done:
160114 pages dumped, compression ratio 2.51,
Apr 1 16:33:17 crowfoot genunix: [ID 851671 kern.notice] dump succeeded
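
Since the messages file is not read daily, a quick tally of what is in it may
help spot SCSI warnings before the next panic. The following is a rough
sketch, assuming the unwrapped syslog layout seen above (timestamp, host,
module, "[ID n kern.level]"). The summarize function and the two regular
expressions are illustrative names, not part of any Solaris tool.

```python
import re
from collections import Counter

# Leading timestamp, e.g. "Apr 1 15:22:28" (one or two spaces after the month).
TS = re.compile(r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)")
# Tagged kernel message: "<timestamp> <host> <module>: [ID <n> kern.<level>]".
TAGGED = re.compile(
    r"^\w{3}\s+\d+\s+\d\d:\d\d:\d\d \S+ (\w+): \[ID \d+ kern\.(\w+)\]"
)
# Panic banner, e.g. "panic[cpu17]/thread=3002e806020:".
PANIC = re.compile(r"panic\[cpu(\d+)\]/thread=[0-9a-f]+")


def summarize(lines):
    """Count (module, level) pairs and collect (timestamp, cpu) per panic."""
    levels = Counter()
    panics = []
    for line in lines:
        tagged = TAGGED.match(line)
        if tagged:
            levels[(tagged.group(1), tagged.group(2))] += 1
        panic = PANIC.search(line)
        ts = TS.match(line)
        if panic and ts:
            panics.append((ts.group(1), int(panic.group(1))))
    return levels, panics
```

Feeding it the lines of /var/adm/messages.0 would give a count of entries such
as ("scsi", "warning") alongside the time and CPU of each panic banner.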

~S