mfi(4) lockups and the adapter event log



Hello,
We saw an odd crash (more of a lockup) on one of our Dells with a PERC5/i RAID controller (running 6.3-RELEASE-p10):

mfi0: COMMAND 0xffffffff89dab0e8 TIMEOUT AFTER 31 SECONDS
(repeated many times with different commands)
mfi0: 3325 (310696326s/0x0020/4) - Type 18: Fatal firmware error: Line 1091 in ../../raid/verdeMain.c

After some troubleshooting, Dell determined that there was no problem. An OpenManage live CD was used to run their diagnostic utilities, which turned up nothing.

However, after rebooting the system and checking dmesg(8), we could see that the adapter was doing some weird things during initialization. It seemed to go through the whole initialization cycle (Firmware initialization started....) up to five times. This does not jibe with our other mfi/PERC5 systems. In addition to the multiple init cycles, it was also printing weeks' worth of messages from the adapter's event log (mostly battery relearns). Checking the event log with linux-megacli showed TONS of messages. Trying to clear them using linux-megacli seemed to cause a similar lockup, with many command timeouts but no fatal firmware error.
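
For comparing one box against the others, something like the rough Python sketch below could tally the mfi lines in saved dmesg output. It is only a sketch: the two regexes are inferred from the two sample lines quoted above, not from the driver source, and the function name summarize is made up for illustration. Lines in any other format are simply ignored.

```python
import re
from collections import Counter

# Both patterns are guesses based only on the two sample lines in this post.
TIMEOUT_RE = re.compile(
    r"^(mfi\d+): COMMAND (0x[0-9a-fA-F]+) TIMEOUT AFTER (\d+) SECONDS"
)
# Event-log lines: "<dev>: <seq> (<stamp/locale/class>) - <description>"
EVENT_RE = re.compile(r"^(mfi\d+): (\d+) \(([^)]*)\) - (.*)$")


def summarize(lines):
    """Count command timeouts and event-log lines per mfi device.

    Returns (timeouts, events, fatal) where the first two are Counters
    keyed by device name and fatal is a list of matching raw lines.
    """
    timeouts = Counter()
    events = Counter()
    fatal = []
    for raw in lines:
        line = raw.strip()
        m = TIMEOUT_RE.match(line)
        if m:
            timeouts[m.group(1)] += 1
            continue
        m = EVENT_RE.match(line)
        if m:
            events[m.group(1)] += 1
            if "Fatal firmware error" in m.group(4):
                fatal.append(line)
    return timeouts, events, fatal


# The two lines quoted earlier in this post, plus one unrelated line.
sample = [
    "mfi0: COMMAND 0xffffffff89dab0e8 TIMEOUT AFTER 31 SECONDS",
    "mfi0: 3325 (310696326s/0x0020/4) - Type 18: Fatal firmware error: "
    "Line 1091 in ../../raid/verdeMain.c",
    "em0: link state changed to UP",
]
timeouts, events, fatal = summarize(sample)
print("timeouts:", dict(timeouts))
print("event-log lines:", dict(events))
print("fatal firmware errors:", len(fatal))
```

Feeding it the dmesg output saved from a healthy mfi host and from the affected one would give a quick count of how many event-log lines each adapter replays at boot.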

We ended up clearing the event log using a Linux live CD and the native Linux MegaCLI binaries. The system hasn't locked up again since, but we're not sure what caused it in the first place.

Has anyone else seen any oddities with the PERC series of controllers and the adapter's event log? Does it just fill up after a while and start to cause trouble? I seem to remember a similar post recommending clearing the event log every so often.

Thanks,
Steve



