Re: powerd broken



Dominic Fandrey wrote:
Dominic Fandrey wrote:
Alexander Motin wrote:
Dominic Fandrey wrote:
Alexander Motin wrote:
It means that one of your CPUs spent most of it's time in interrupt
processing and so far from idle. What does `top -P` shows you? Where
have you seen that ~6% CPU load?
That is the load shown by the e17 CPU module. It's display has always
been in sync with top in the past, no longer though, it appears.

# top -PIS
last pid: 68235; load averages: 0.09, 0.16,
0.17
up 0+05:17:29 13:05:10
137 processes: 4 running, 117 sleeping, 16 waiting
CPU 0: 1.1% user, 0.0% nice, 1.1% system, 61.4% interrupt, 36.3% idle
CPU 1: 9.0% user, 0.0% nice, 3.4% system, 0.0% interrupt, 87.6% idle
Mem: 419M Active, 415M Inact, 416M Wired, 3752K Cache, 183M Buf, 716M
Free
Swap: 4096M Total, 4096M Free

PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU
COMMAND
11 root 1 171 ki31 0K 16K RUN 1 286:42 93.46%
idle: cpu1
23 root 1 -80 - 0K 16K RUN 0 126:54 59.96%
irq16: hdac0 uhci+
12 root 1 171 ki31 0K 16K RUN 0 179:27 40.09%
idle: cpu0
1318 root 1 46 0 439M 312M select 1 12:55 1.76% Xorg
4361 musicpd 4 44 0 91164K 14412K ucond 1 1:57 0.78% mpd

Some things strike me as odd. The difference between the load reported
by powerd and top is still very significant and of course the high
interrupt load.
powerd now reports/uses summary load of all CPUs (it can be bigger then
100%), while top shows average.

I've got a mouse with a 1khz report rate (the only connected USB
device), but unplugging it doesn't change the load. Neither does
stopping moused (I'm running the system without HAL). There also
is a fingerprint reader, but it is only detected by ugen.
I would start from identifying all devices sharing that IRQ and trying
to disable them (or unload their drivers) one by one. `systat -vm 1`
will show you how much interrupts actually happens there per second.

On most of modern systems you can make hdac0 to not share that IRQ by
enabling MSI there with hint.hdac.0.msi=1 in loader.conf.

I already did unplug all devices to no avail. Afterwards I tried
to 'kldunload -f' all usb devices, but most refused. However during
this I recognized that /boot/modules holds an old u3g.ko, back from
the time when it was not in stable. Apparently modules in
/boot/modules have preference over those in /boot/kernels. I deleted
the old u3g.ko and rebooted (because I couldn't get the system to
unload it). Now everything is fine, the interrupt load is gone.

I'll monitor this and come back here if it turns out this has just
been coincidence.

Thank you for all that help.

Regards

This has turned out to be an early call. I just happened to start
skype_devel and the whole thing started over, so I think it's
pretty certain, now, that this is a hdac issue.
However after a reboot I tried to reproduce this and now skype
and mpd are both running without causing an irq race.

If the irq race reappears I will use the device hint you suggested
to rule out that this is an interrupt sharing issue.

Even after setting the device hint the problem has reoccurred.
It simply starts after a while, I cannot make out any cause. As it
turned out uhci0 is the violent interrupt with an interrupt rate
between 130k and 190k. This is so incredible, I wonder why the interrupt
throttling doesn't kick in.

7 users Load 0.16 0.27 0.28 29 Mar 17:02

Mem:KB REAL VIRTUAL VN PAGER SWAP PAGER
Tot Share Tot Share Free in out in out
Act 805472 76800 1252404 123696 615176 count
All 1062992 97920 14120720 152840 pages
Proc: Interrupts
r p d s w Csw Trp Sys Int Sof Flt cow 182k total
6 84 360k 2770 14k 178k 224 26 12 zfod atkbd0 1
ozfod acpi0 irq9
1.1%Sys 25.9%Intr 0.4%User 0.0%Nice 72.6%Idle %ozfod psm0 irq12
| | | | | | | | | | | daefr ata0 irq14
=+++++++++++++ 15 prcfr 177k uhci0 drm0
9 dtbuf 117 totfr 23 wpi0 uhci1
Namei Name-cache Dir-cache 100000 desvn react ehci0 uhci
Calls hits % hits % 7632 numvn pdwak uhci2 ehci
38 38 100 3663 frevn pdpgs 457 uhci3 21
intrn 2008 cpu0: time
Disks ad4 cd0 sg0 pass0 473048 wire 188 hdac0 256
KB/t 0.00 0.00 0.00 0.00 653788 act bge0 257
tps 0 0 0 0 274796 inact 2008 cpu1: time
MB/s 0.00 0.00 0.00 0.00 20192 cache
%busy 0 0 0 0 594984 free



I can rule out drm0 as the cause, because uhci0 is the only common
presence in all occurrences of this problem.
_______________________________________________
freebsd-stable@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe@xxxxxxxxxxx"



Relevant Pages

  • Re: powerd broken
    ... processing and so far from idle. ... That is the load shown by the e17 CPU module. ... the interrupt load is gone. ... If the irq race reappears I will use the device hint you suggested ...
    (freebsd-stable)
  • Re: Parallel port interrupt revisited
    ... you would create an exe to load. ... beitman AT applieddata DOT net ... compiler output does not even try to compile the parallel port driver. ... -It is better if you first set up all your interrupt stuff, ...
    (microsoft.public.windowsce.platbuilder)
  • Bogus high interrupt load?
    ... triggers an interrupt every 5ms with a free running clock, ... During the debugging of the DSP driver a strange load peak was detected with ... Normally the total load figure was about 1%-2%, ... There is obviously something happening to the load reporting facility of ...
    (Linux-Kernel)
  • Re: AMD64 much slower than i386 on FreeBSD 5.4-pre
    ... > performance testing. ... They tune little pieces here ... > that means you get an interrupt for every ... load rose with an 64bit system. ...
    (freebsd-questions)
  • Re: Hard Hang with __alloc_pages: 0-order allocation failed (gfp=0x20/1) - Not out of memory
    ... has changed since load ... in an interrupt ... doing a zone rebalance ... send the line "unsubscribe linux-kernel" in ...
    (Linux-Kernel)