Re: Hard(?) lock when reassociating ath with wpa_supplicant on RELENG_7
- From: "Alexandre \"Sunny\" Kovalenko" <alex.kovalenko@xxxxxxxxxxx>
- Date: Tue, 26 Aug 2008 21:10:40 -0400
On Sat, 2008-07-12 at 21:26 -0700, Sam Leffler wrote:
> Alexandre "Sunny" Kovalenko wrote:
>> On Sat, 2008-07-12 at 09:57 -0700, Sam Leffler wrote:
>>> Alexandre "Sunny" Kovalenko wrote:
>>>> On Fri, 2008-07-11 at 20:29 -0700, Sam Leffler wrote:
>>>>> Alexandre "Sunny" Kovalenko wrote:
>>>>>> On Fri, 2008-05-16 at 12:23 -0400, Sam Leffler wrote:
>>>>>>> Alexandre "Sunny" Kovalenko wrote:
>>>>>>>> On Mon, 2008-05-12 at 19:33 -0700, Sam Leffler wrote:
>>>>>>>>> Alexandre "Sunny" Kovalenko wrote:
>>>>>>>>>> I seem to be able to lock my machine by going into wpa_cli
>>>>>>>>>> and asking it to 'reassoc'. The reason for question mark
>>>>>>>>>> after "hard" is that debug information (caused by wlandebug
>>>>>>>>>> and athdebug) is being printed on the console. The only way
>>>>>>>>>> to get machine's attention is to hold power button for 8
>>>>>>>>>> seconds.
>>>>>>>>>
>>>>>>>>> So this is just livelock due to console debug msgs.
>>>>>>>>
>>>>>>>> I am not sure, I have parsed this well enough, so I will try to
>>>>>>>> clarify: machine becomes unresponsive *without* any debugging
>>>>>>>> turned on, to an extent that pressing the power button twice
>>>>>>>> *does not* generate ACPI console message (something to the tune
>>>>>>>> of "going into S5 already -- gimme a break"). If I turn ath
>>>>>>>> debugging on, I do see those messages, and only them, scrolling
>>>>>>>> on the console.
>>>>>>>
>>>>>>> Guess I misunderstood you.
>>>>>>
>>>>>> I have finally got enough time and equipment to investigate this
>>>>>> further. Here are some conclusions:
>>>>>>
>>>>>> -- at this point (RELENG_7 as of July 9th around 15:30 EST) it is
>>>>>> indeed a livelock.
>>>>>>
>>>>>> -- all system does, is executing ath_intr (if_ath.c) in the tight
>>>>>> loop with the same status -- 0x1000 AKA HAL_INT_MIB. In order to
>>>>>> eliminate possibility that I have caused livelock with the debug
>>>>>> messages, I have put conditional panic() into ath_intr, as soon as
>>>>>> sc->sc_stats.ast_mib reaches 10,000. Without any kind of the debug
>>>>>> messages, it will be triggered within 40-60 seconds after starting
>>>>>> of wpa_supplicant.
>>>>>>
>>>>>> -- I suspect that comment below, might not hold true on my
>>>>>> equipment
>>>>>>
>>>>>>     if (status & HAL_INT_MIB) {
>>>>>>         sc->sc_stats.ast_mib++;
>>>>>>         /*
>>>>>>          * Disable interrupts until we service the MIB
>>>>>>          * interrupt; otherwise it will continue to fire.
>>>>>>          */
>>>>>>         ath_hal_intrset(ah, 0);
>>>>>>         /*
>>>>>>          * Let the hal handle the event. We assume it will <============
>>>>>>          * clear whatever condition caused the interrupt. <============
>>>>>>          */
>>>>>>         ath_hal_mibevent(ah, &sc->sc_halstats);
>>>>>>         ath_hal_intrset(ah, sc->sc_imask);
>>>>>>     }
>>>>>>
>>>>>> My hardware is:
>>>>>>
>>>>>> ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
>>>>>> ath0: <Atheros 5212> mem 0xedf00000-0xedf0ffff irq 17 at device 0.0 on pci3
>>>>>> ath0: [ITHREAD]
>>>>>> ath0: using obsoleted if_watchdog interface
>>>>>> ath0: Ethernet address: 00:16:cf:26:2f:3f
>>>>>> ath0: mac 10.3 phy 6.1 radio 10.2
>>>>>>
>>>>>> My wpa_supplicant.conf is:
>>>>>>
>>>>>> ctrl_interface=/var/run/wpa_supplicant
>>>>>> ctrl_interface_group=wheel
>>>>>> eapol_version=2
>>>>>> network={
>>>>>>     ssid="XXXXXXXXXXX"
>>>>>>     scan_ssid=1
>>>>>>     priority=1
>>>>>>     proto=WPA
>>>>>>     pairwise=TKIP
>>>>>>     group=TKIP
>>>>>>     key_mgmt=WPA-PSK
>>>>>>     psk="xxxxxxxxxxxxxxxxxxxxxx"
>>>>>> }
>>>>>>
>>>>>> Access point is Netgear WNDR3300-1B with 11N and 11G SSID set up
>>>>>> to different values. Only 11G SSID is configured in
>>>>>> wpa_supplicant.conf. In the test setup, AP is within 10' (3m) of
>>>>>> the laptop.
>>>>>>
>>>>>> AP is successfully used by handful of Windows clients (including
>>>>>> this same laptop) and iBook G4.
>>>>>>
>>>>>> Neither wpa_supplicant with '-d -d' nor wlandebug 0xFFFFFFFF show
>>>>>> anything but normal scan.
>>>>>>
>>>>>> athdebug 0xFFFFFFFF loops with ath_intr: status 0x1000 until I
>>>>>> power down my laptop.
>>>>>>
>>>>>> I would appreciate any suggestion on what I can investigate
>>>>>> further -- at this point I have comfortable console setup and
>>>>>> should be able to field requests for further information much
>>>>>> better.
>>>>>
>>>>> Are you running powerd?
>>>>
>>>> I do. And I just tried disabling it, and I could not reproduce the
>>>> problem any more. Is there any way to reconcile if_ath with powerd?
>>>
>>> Don't know. There appear to be two issues. When the MIB interrupts
>>> arrive the kernel may service them w/ the cpu at a reduced clock
>>> frequency. Since powerd is currently the only mechanism for
>>> increasing the frequency and it runs in user space it can take a
>>> while to bump the clock rate leading to livelock (because the logic
>>> to reduce the _cause_ of the MIB interrupt takes a long time to run).
>>> John Baldwin suggested raising the clock frequency when handling
>>> interrupts in the kernel but nothing has been done to make that
>>> happen.
>>>
>>> Separately there is a question as to why the MIB interrupts are
>>> happening at all. This is possibly due to misprogramming of the
>>> baseband h/w in the ath card. Unfortunately I've been trying to get
>>> Atheros to help understand/resolve this question for a very long time
>>> (as their code also exhibits this behaviour) but they've been
>>> unresponsive. I have some experimental code to address this in new
>>> hal versions (such as 0.10.5.6 available in
>>> http://www.freebsd.org/~sam) but apparently it does not entirely fix
>>> the problem.
>>
>> Would it be of any value to you, if I build the new hal and see what
>> happen? I can live without powerd as the workaround, but I'd rather
>> help if I can.
>
> Testing the latest hal is always useful but I've not MFC'd many ath
> changes to RELENG_7 and some people have reported problems on RELENG_7
> w/ the new hal that do not occur on HEAD.

I have finally carved out enough quality time to see whether the new HAL
makes any difference WRT the issue, and I am happy to report that, with
HAL version 0.10.5.6, I could no longer reproduce it. I am seeing a whole
lot of messages

    update_stats: bogus ndx0 -1, max 12, mode 3

and

    bogus rix 255, max 12, mode 3

scrolling by, but I don't know whether these are related to the problem
I saw with the previous version of the HAL, or are a downside of using
the new HAL with RELENG_7.

Reverting the HAL brings the lockups back pretty reliably.

At this point I am very happy with my machine, however, if there is any
additional information that I can provide, or any patches to test, I
will be more than happy to oblige.
Thank you very much for doing this work.
--
Alexandre "Sunny" Kovalenko (Олександр Коваленко)
_______________________________________________
freebsd-stable@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe@xxxxxxxxxxx"