Re: [mfi] command timeouts



On Mon, 19 Feb 2007, Bjoern A. Zeeb wrote:

Hi,

I am testing mfi on a Dell 2950 with 6 PD, 2LD (1st LD=RAID1,
2nd LD=RAID5, 1HTSP).
(The somewhat sucky) megacli "works".

Most commands to gather information work fine, as does pulling out
disks hard. Setting a disk offline or running certain other commands,
however, hangs 'something', which might be the controller.

For example:

foo# megacli -PDOffline -PhysDrv'[1:3]' -a0

EnclId-1 SlotId-3 state changed to OffLine.
foo# foo# ls -l
<hangs forever>

It's not only this process but all disk IO related processes.


On the serial console I get:

...
mfi0: COMMAND 0xffffffff80c3c040 TIMEOUT AFTER 732 SECONDS
mfi0: COMMAND 0xffffffff80c3b8d0 TIMEOUT AFTER 732 SECONDS
mfi0: COMMAND 0xffffffff80c3cb68 TIMEOUT AFTER 732 SECONDS
mfi0: COMMAND 0xffffffff80c3bd98 TIMEOUT AFTER 732 SECONDS
mfi0: COMMAND 0xffffffff80c3bc88 TIMEOUT AFTER 732 SECONDS
mfi0: COMMAND 0xffffffff80c3cbf0 TIMEOUT AFTER 732 SECONDS
mfi0: COMMAND 0xffffffff80c3cc78 TIMEOUT AFTER 732 SECONDS
mfi0: COMMAND 0xffffffff80c3cf20 TIMEOUT AFTER 732 SECONDS
mfi0: COMMAND 0xffffffff80c3cd88 TIMEOUT AFTER 732 SECONDS
mfi0: COMMAND 0xffffffff80c3cfa8 TIMEOUT AFTER 732 SECONDS
mfi0: COMMAND 0xffffffff80c3d828 TIMEOUT AFTER 684 SECONDS
mfi0: COMMAND 0xffffffff80c3db58 TIMEOUT AFTER 679 SECONDS
mfi0: COMMAND 0xffffffff80c3de88 TIMEOUT AFTER 44 SECONDS
mfi0: COMMAND 0xffffffff80c3c728 TIMEOUT AFTER 763 SECONDS
mfi0: COMMAND 0xffffffff80c3c040 TIMEOUT AFTER 763 SECONDS
mfi0: COMMAND 0xffffffff80c3b8d0 TIMEOUT AFTER 763 SECONDS
mfi0: COMMAND 0xffffffff80c3cb68 TIMEOUT AFTER 763 SECONDS
mfi0: COMMAND 0xffffffff80c3bd98 TIMEOUT AFTER 763 SECONDS
mfi0: COMMAND 0xffffffff80c3bc88 TIMEOUT AFTER 763 SECONDS
mfi0: COMMAND 0xffffffff80c3cbf0 TIMEOUT AFTER 763 SECONDS
mfi0: COMMAND 0xffffffff80c3cc78 TIMEOUT AFTER 763 SECONDS
mfi0: COMMAND 0xffffffff80c3cf20 TIMEOUT AFTER 763 SECONDS
mfi0: COMMAND 0xffffffff80c3cd88 TIMEOUT AFTER 763 SECONDS
mfi0: COMMAND 0xffffffff80c3cfa8 TIMEOUT AFTER 763 SECONDS
mfi0: COMMAND 0xffffffff80c3d828 TIMEOUT AFTER 715 SECONDS
mfi0: COMMAND 0xffffffff80c3db58 TIMEOUT AFTER 710 SECONDS
mfi0: COMMAND 0xffffffff80c3de88 TIMEOUT AFTER 75 SECONDS
mfi0: COMMAND 0xffffffff80c3c728 TIMEOUT AFTER 793 SECONDS
mfi0: COMMAND 0xffffffff80c3c040 TIMEOUT AFTER 794 SECONDS
mfi0: COMMAND 0xffffffff80c3b8d0 TIMEOUT AFTER 794 SECONDS
mfi0: COMMAND 0xffffffff80c3cb68 TIMEOUT AFTER 794 SECONDS
mfi0: COMMAND 0xffffffff80c3bd98 TIMEOUT AFTER 794 SECONDS
mfi0: COMMAND 0xffffffff80c3bc88 TIMEOUT AFTER 794 SECONDS
mfi0: COMMAND 0xffffffff80c3cbf0 TIMEOUT AFTER 794 SECONDS
mfi0: COMMAND 0xffffffff80c3cc78 TIMEOUT AFTER 794 SECONDS
mfi0: COMMAND 0xffffffff80c3cf20 TIMEOUT AFTER 794 SECONDS
mfi0: COMMAND 0xffffffff80c3cd88 TIMEOUT AFTER 794 SECONDS
mfi0: COMMAND 0xffffffff80c3cfa8 TIMEOUT AFTER 794 SECONDS
mfi0: COMMAND 0xffffffff80c3d828 TIMEOUT AFTER 746 SECONDS
mfi0: COMMAND 0xffffffff80c3db58 TIMEOUT AFTER 741 SECONDS
mfi0: COMMAND 0xffffffff80c3de88 TIMEOUT AFTER 106 SECONDS
mfi0: COMMAND 0xffffffff80c3c728 TIMEOUT AFTER 824 SECONDS
...
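
One way to read the log above is to group the lines by command address
and look at how the reported age changes between rounds. The following
is a minimal sketch in Python, not part of the driver or of megacli; the
function names (group_timeouts, age_deltas) and the SAMPLE excerpt are
made up for illustration, and the regular expression assumes the exact
console line format shown above.

```python
import re
from collections import defaultdict

# Assumed line format, taken from the console output quoted above:
#   mfi0: COMMAND 0xffffffff80c3c040 TIMEOUT AFTER 732 SECONDS
TIMEOUT_RE = re.compile(
    r"^(\w+): COMMAND (0x[0-9a-f]+) TIMEOUT AFTER (\d+) SECONDS$"
)


def group_timeouts(log_text):
    """Map each command address to its reported ages, in log order."""
    ages = defaultdict(list)
    for line in log_text.splitlines():
        match = TIMEOUT_RE.match(line.strip())
        if match:
            ages[match.group(2)].append(int(match.group(3)))
    return dict(ages)


def age_deltas(ages):
    """For each command, the difference between consecutive reported ages."""
    return {
        cmd: [later - earlier for earlier, later in zip(seen, seen[1:])]
        for cmd, seen in ages.items()
    }


# Hypothetical excerpt: two of the command addresses from the log above.
SAMPLE = """\
mfi0: COMMAND 0xffffffff80c3c040 TIMEOUT AFTER 732 SECONDS
mfi0: COMMAND 0xffffffff80c3de88 TIMEOUT AFTER 44 SECONDS
mfi0: COMMAND 0xffffffff80c3c040 TIMEOUT AFTER 763 SECONDS
mfi0: COMMAND 0xffffffff80c3de88 TIMEOUT AFTER 75 SECONDS
mfi0: COMMAND 0xffffffff80c3c040 TIMEOUT AFTER 794 SECONDS
mfi0: COMMAND 0xffffffff80c3de88 TIMEOUT AFTER 106 SECONDS
"""

if __name__ == "__main__":
    grouped = group_timeouts(SAMPLE)
    for cmd, deltas in age_deltas(grouped).items():
        print(cmd, grouped[cmd], deltas)
```

On the excerpt, the same addresses come back in every round and each
age grows by 31 seconds, which would be consistent with the same
commands being reported again and again rather than new ones failing.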


I can still break to ddb. Without disk I/O, the only
possible thing I can really do is type reset.

I'll build a debugging kernel so I can do show alllocks, etc.,
but if someone with more experience with this driver/hw could
contact me, I can run further tests.

This time with the debugging kernel:

foo# megacli -PDOffline -PhysDrv'[1:3]' -a0

EnclId-1 SlotId-3 state changed to OffLine.
foo# foo# foo# foo#


I was able to hit <enter> multiple times ("uh, it still lives"),
but then ...

command 0xffffffff80c40000 not in queue, flags = 0x20, bit = 0x80
panic: command not in queue
cpuid = 2
Uptime: 1m17s
Physical memory: 4084 MB
Dumping 199 MB: 184 168 152 136 120 104 88 72 56 40 24 8
Dump complete

telnet> send brk
KDB: enter: Line break on console
[thread pid 15 tid 100009 ]
Stopped at kdb_enter+0x2f: nop
db> where
Tracing pid 15 tid 100009 td 0xffffff012f5c4000
kdb_enter() at kdb_enter+0x2f
siointr1() at siointr1+0x400
siointr() at siointr+0x2e
intr_execute_handlers() at intr_execute_handlers+0x124
Xapic_isr1() at Xapic_isr1+0x7f
--- interrupt, rip = 0xffffffff803c9787, rsp = 0xffffffffac06eb30, rbp = 0xffffffffac06eb60 ---
_mtx_lock_sleep() at _mtx_lock_sleep+0x137
_mtx_lock_flags() at _mtx_lock_flags+0xe1
mfi_timeout() at mfi_timeout+0x32
softclock() at softclock+0x1c8
ithread_loop() at ithread_loop+0xfe
fork_exit() at fork_exit+0xaa
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffffffac06ed40, rbp = 0 ---
db> show alllocks
Process 24 (irq78: mfi0) thread 0xffffff012f5c5000 (100020)
exclusive sleep mutex MFI I/O lock r = 0 (0xffffff012f5cc630) locked @ /u1/src/HEAD/sys/dev/mfi/mfi.c:775


After the reboot the command does not appear to have been
executed: the disk still seems to be online (at least
it was the last time).

--
Bjoern A. Zeeb bzeeb at Zabbadoz dot NeT


