Re: [mfi] command timeouts
- From: Scott Long <scottl@xxxxxxxxxxxxxxxxx>
- Date: Mon, 19 Feb 2007 16:31:11 -0700
Bjoern A. Zeeb wrote:
On Mon, 19 Feb 2007, Bjoern A. Zeeb wrote:
Hi,
I am testing mfi on a Dell 2950 with 6 physical drives (PD) and
2 logical drives (LD): the 1st LD is RAID1, the 2nd LD is RAID5,
and there is 1 hot spare (HTSP).
(The somewhat sucky) megacli "works".
Most commands that gather information work fine, as does pulling out
disks hard. But setting a disk offline, or running some other commands,
hangs 'something', which might be the controller.
For example:
foo# megacli -PDOffline -PhysDrv'[1:3]' -a0
EnclId-1 SlotId-3 state changed to OffLine.
foo# foo# ls -l
<hangs forever>
It's not only this process that hangs but all disk I/O related processes.
On the serial console I get:
...
mfi0: COMMAND 0xffffffff80c3c040 TIMEOUT AFTER 732 SECONDS
mfi0: COMMAND 0xffffffff80c3b8d0 TIMEOUT AFTER 732 SECONDS
mfi0: COMMAND 0xffffffff80c3cb68 TIMEOUT AFTER 732 SECONDS
mfi0: COMMAND 0xffffffff80c3bd98 TIMEOUT AFTER 732 SECONDS
mfi0: COMMAND 0xffffffff80c3bc88 TIMEOUT AFTER 732 SECONDS
mfi0: COMMAND 0xffffffff80c3cbf0 TIMEOUT AFTER 732 SECONDS
mfi0: COMMAND 0xffffffff80c3cc78 TIMEOUT AFTER 732 SECONDS
mfi0: COMMAND 0xffffffff80c3cf20 TIMEOUT AFTER 732 SECONDS
mfi0: COMMAND 0xffffffff80c3cd88 TIMEOUT AFTER 732 SECONDS
mfi0: COMMAND 0xffffffff80c3cfa8 TIMEOUT AFTER 732 SECONDS
mfi0: COMMAND 0xffffffff80c3d828 TIMEOUT AFTER 684 SECONDS
mfi0: COMMAND 0xffffffff80c3db58 TIMEOUT AFTER 679 SECONDS
mfi0: COMMAND 0xffffffff80c3de88 TIMEOUT AFTER 44 SECONDS
mfi0: COMMAND 0xffffffff80c3c728 TIMEOUT AFTER 763 SECONDS
mfi0: COMMAND 0xffffffff80c3c040 TIMEOUT AFTER 763 SECONDS
mfi0: COMMAND 0xffffffff80c3b8d0 TIMEOUT AFTER 763 SECONDS
mfi0: COMMAND 0xffffffff80c3cb68 TIMEOUT AFTER 763 SECONDS
mfi0: COMMAND 0xffffffff80c3bd98 TIMEOUT AFTER 763 SECONDS
mfi0: COMMAND 0xffffffff80c3bc88 TIMEOUT AFTER 763 SECONDS
mfi0: COMMAND 0xffffffff80c3cbf0 TIMEOUT AFTER 763 SECONDS
mfi0: COMMAND 0xffffffff80c3cc78 TIMEOUT AFTER 763 SECONDS
mfi0: COMMAND 0xffffffff80c3cf20 TIMEOUT AFTER 763 SECONDS
mfi0: COMMAND 0xffffffff80c3cd88 TIMEOUT AFTER 763 SECONDS
mfi0: COMMAND 0xffffffff80c3cfa8 TIMEOUT AFTER 763 SECONDS
mfi0: COMMAND 0xffffffff80c3d828 TIMEOUT AFTER 715 SECONDS
mfi0: COMMAND 0xffffffff80c3db58 TIMEOUT AFTER 710 SECONDS
mfi0: COMMAND 0xffffffff80c3de88 TIMEOUT AFTER 75 SECONDS
mfi0: COMMAND 0xffffffff80c3c728 TIMEOUT AFTER 793 SECONDS
mfi0: COMMAND 0xffffffff80c3c040 TIMEOUT AFTER 794 SECONDS
mfi0: COMMAND 0xffffffff80c3b8d0 TIMEOUT AFTER 794 SECONDS
mfi0: COMMAND 0xffffffff80c3cb68 TIMEOUT AFTER 794 SECONDS
mfi0: COMMAND 0xffffffff80c3bd98 TIMEOUT AFTER 794 SECONDS
mfi0: COMMAND 0xffffffff80c3bc88 TIMEOUT AFTER 794 SECONDS
mfi0: COMMAND 0xffffffff80c3cbf0 TIMEOUT AFTER 794 SECONDS
mfi0: COMMAND 0xffffffff80c3cc78 TIMEOUT AFTER 794 SECONDS
mfi0: COMMAND 0xffffffff80c3cf20 TIMEOUT AFTER 794 SECONDS
mfi0: COMMAND 0xffffffff80c3cd88 TIMEOUT AFTER 794 SECONDS
mfi0: COMMAND 0xffffffff80c3cfa8 TIMEOUT AFTER 794 SECONDS
mfi0: COMMAND 0xffffffff80c3d828 TIMEOUT AFTER 746 SECONDS
mfi0: COMMAND 0xffffffff80c3db58 TIMEOUT AFTER 741 SECONDS
mfi0: COMMAND 0xffffffff80c3de88 TIMEOUT AFTER 106 SECONDS
mfi0: COMMAND 0xffffffff80c3c728 TIMEOUT AFTER 824 SECONDS
...
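Reading the log above by eye, the same command addresses come back on every pass and their ages grow by roughly 30 seconds each time, which suggests a fixed set of stuck commands being re-reported rather than new commands timing out. A minimal sketch to check that mechanically, assuming the console lines keep exactly the format shown above; the function names and the short sample are illustrative and not part of the driver or of any tool:

```python
import re
from collections import defaultdict

# Matches lines such as:
#   mfi0: COMMAND 0xffffffff80c3c040 TIMEOUT AFTER 732 SECONDS
# The format is assumed from the console output quoted in this thread.
TIMEOUT_RE = re.compile(
    r"^\w+: COMMAND (0x[0-9a-f]+) TIMEOUT AFTER (\d+) SECONDS$"
)


def group_timeouts(lines):
    """Map each command address to the list of ages (seconds) reported for it."""
    ages = defaultdict(list)
    for line in lines:
        match = TIMEOUT_RE.match(line.strip())
        if match:
            ages[match.group(1)].append(int(match.group(2)))
    return dict(ages)


def intervals(ages):
    """For each command, the differences between consecutive reported ages."""
    return {
        addr: [later - earlier for earlier, later in zip(seen, seen[1:])]
        for addr, seen in ages.items()
    }


# A few lines copied from the log above, for two of the commands.
SAMPLE = """\
mfi0: COMMAND 0xffffffff80c3c040 TIMEOUT AFTER 732 SECONDS
mfi0: COMMAND 0xffffffff80c3de88 TIMEOUT AFTER 44 SECONDS
mfi0: COMMAND 0xffffffff80c3c040 TIMEOUT AFTER 763 SECONDS
mfi0: COMMAND 0xffffffff80c3de88 TIMEOUT AFTER 75 SECONDS
mfi0: COMMAND 0xffffffff80c3c040 TIMEOUT AFTER 794 SECONDS
mfi0: COMMAND 0xffffffff80c3de88 TIMEOUT AFTER 106 SECONDS
""".splitlines()

if __name__ == "__main__":
    grouped = group_timeouts(SAMPLE)
    for addr, deltas in intervals(grouped).items():
        print(addr, grouped[addr], deltas)
```

Feeding the full serial console capture through `group_timeouts` instead of `SAMPLE` would show whether any command ever drops out of the list between passes.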
I can still break to ddb. Without disk I/O, the only
possible thing I can really do is type reset.
I'll build a debugging kernel so I can do show alllocks, etc
but if someone with more experience with this driver/hw could
contact me I can run further tests.
This time with the debugging kernel:
foo# megacli -PDOffline -PhysDrv'[1:3]' -a0
EnclId-1 SlotId-3 state changed to OffLine.
foo# foo# foo# foo#
I was able to hit <enter> multiple times ("uh, it still lives"),
but then ...
command 0xffffffff80c40000 not in queue, flags = 0x20, bit = 0x80
panic: command not in queue
cpuid = 2
Uptime: 1m17s
Physical memory: 4084 MB
Dumping 199 MB: 184 168 152 136 120 104 88 72 56 40 24 8
Dump complete
telnet> send brk
KDB: enter: Line break on console
[thread pid 15 tid 100009 ]
Stopped at kdb_enter+0x2f: nop
db> where
Tracing pid 15 tid 100009 td 0xffffff012f5c4000
kdb_enter() at kdb_enter+0x2f
siointr1() at siointr1+0x400
siointr() at siointr+0x2e
intr_execute_handlers() at intr_execute_handlers+0x124
Xapic_isr1() at Xapic_isr1+0x7f
--- interrupt, rip = 0xffffffff803c9787, rsp = 0xffffffffac06eb30, rbp = 0xffffffffac06eb60 ---
_mtx_lock_sleep() at _mtx_lock_sleep+0x137
_mtx_lock_flags() at _mtx_lock_flags+0xe1
mfi_timeout() at mfi_timeout+0x32
softclock() at softclock+0x1c8
ithread_loop() at ithread_loop+0xfe
fork_exit() at fork_exit+0xaa
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffffffac06ed40, rbp = 0 ---
db> show alllocks
Process 24 (irq78: mfi0) thread 0xffffff012f5c5000 (100020)
exclusive sleep mutex MFI I/O lock r = 0 (0xffffff012f5cc630) locked @ /u1/src/HEAD/sys/dev/mfi/mfi.c:775
After the reboot it does not look as if the command was
executed, as the disk still seems to be online (at least
it was the last time).
megacli is known to be fragile. Don't Do That (tm). As for the panic,
it's probably a side effect of megacli putting the card and the driver into a chaotic state.
Scott
_______________________________________________
freebsd-current@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@xxxxxxxxxxx"