Re: x86, shared IRQ. how to track down the culprit?

In article <f1d05j$g03$1@xxxxxxxxxxxxxxxxxxxx>,
gl@xxxxxxxxxxxxxxxxxxxxxxxx (Jay G. Scott) writes:

here's my /var/adm/messages file, with all the warts.
below i've marked the message:
IRQ20 is being shared by drivers with different interrupt levels.

The rest of this posting is a detailed description of this issue,
but you can safely ignore that notice in every case I've ever seen.

i tried google, and somewhere it said i should look in /etc/driver_aliases.

That was wrong advice.

i've looked in there and i don't see how i can track down the problem.
i've been trying to look up 20, that's coming up empty.
ramdisk isn't in driver_aliases, and i half expected that.

(ahh, i now notice i have problem w/ my dns and ntp configs. i think those
are fixed now. i know what to look for there, but on this IRQ i'm stumped.)

Apr 30 23:02:00 smoking unix: [ID 954099] NOTICE: IRQ20 is being shared by drivers with different interrupt levels.
Apr 30 23:02:00 smoking This may result in reduced system performance.

### the message about IRQ is just two lines above.

Apr 30 23:02:00 smoking mac: [ID 543131] NOTICE: nge0/0 registered

The nVidia gigabit ethernet interface is one of the drivers.
The other will be one of the ones already listed above, but you can't
tell which from this output.

Look at the output of prtconf -v and you will see the devices in your
device tree listed. In the properties, you will find:

Interrupt Specifications:
Interrupt Priority=0x6 (ipl 6), vector=0x5 (5)

The vector value is the IRQ assigned, and you will probably find two entries
with vector=0x14 (20). (Unfortunately, I just tried this on Solaris 11, and
most of the PCI entries are missing this property, which I suspect is a bug.)
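
If the prtconf -v output is long, the hunt for duplicated vectors can be
scripted. What follows is only a rough Python sketch. It assumes the
interrupt lines look exactly like the sample above, and that each device
node is introduced by a line containing "instance #", which may not hold
on every release. The function name shared_vectors is made up for
illustration.

```python
import re
from collections import defaultdict

# Matches lines such as: Interrupt Priority=0x6 (ipl 6), vector=0x5 (5)
INTR_RE = re.compile(
    r"Interrupt Priority=0x[0-9a-fA-F]+ \(ipl (\d+)\), "
    r"vector=0x[0-9a-fA-F]+ \((\d+)\)"
)


def shared_vectors(prtconf_text):
    """Return {vector: [(node, ipl), ...]} for vectors used by 2+ nodes."""
    by_vector = defaultdict(list)
    node = "(unknown node)"
    for line in prtconf_text.splitlines():
        stripped = line.strip()
        # Assumption: a device node line contains "instance #".
        if "instance #" in stripped:
            node = stripped
        match = INTR_RE.search(stripped)
        if match:
            ipl, vector = int(match.group(1)), int(match.group(2))
            by_vector[vector].append((node, ipl))
    # Keep only the vectors that more than one entry refers to.
    return {v: devs for v, devs in by_vector.items() if len(devs) > 1}
```

Save the prtconf -v output to a file and pass its contents to
shared_vectors(); any key in the result is an IRQ with more than one
device on it, listed with the IPL each one asked for.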

This warning can normally be ignored, and is pretty inevitable on x86
systems. It means the BIOS has assigned two PCI devices the same IRQ
(which is perfectly permissible). Solaris would prefer to run the
drivers for those two devices at different IPLs (Interrupt Priority
Levels), but it isn't possible to do that when they share an IRQ.
What Solaris has to do is to run the two devices at the highest IPL
either of them requires. This means the interrupt handler of one of
the drivers will be running at a higher priority than the designer
intended. This is normally harmless, but in theory it could end up
running at a higher priority than the interrupt handler of some third
device, locking that device out from processing its interrupts for
rather too long.
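
The rule just described is simple enough to write down. This is only a
sketch of the idea, not how the kernel implements it; the function name
and the driver names and IPLs in the example are invented.

```python
def effective_ipl(driver_ipls):
    """For one shared IRQ, given {driver: preferred IPL}, return the IPL
    every handler ends up running at, plus the drivers that are raised
    above the level they asked for."""
    shared = max(driver_ipls.values())
    raised = sorted(name for name, ipl in driver_ipls.items() if ipl < shared)
    return shared, raised


# Hypothetical example: two drivers sharing one IRQ.
print(effective_ipl({"drv_a": 6, "drv_b": 5}))   # prints (6, ['drv_b'])
```

Here drv_b is the driver whose handler runs at a higher priority than
its designer intended.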

If you want to try and get rid of it (which really isn't necessary),
you need to make sure you aren't allocating IRQs to devices you are
never going to use, and get the BIOS to redo the PCI IRQ allocation.
First, remove any unused PCI cards. Next, you need to clear the
ESCD data which is where the BIOS remembers what IRQ it assigned
to each PCI device so it can keep them the same (no modern OS
requires PCI IRQs to remain the same between reboots anymore).
This is most commonly done by clearing the NVRAM, although this
goes by different names in different BIOSes. Finally, you should
go into the BIOS (immediately after clearing the ESCD data) and
disable all the built-in devices you never use, such as printer
ports, serial ports, etc. This frees up their IRQs so they can
be used by PCI devices, and reduces the chance of the BIOS forcing
PCI devices to share IRQs.

There is a worse form of this error which prevents one of the
devices working. This is where the two devices have IPLs which
are on different sides of IPL 10 (LOCK_LEVEL). Interrupt handlers
above and below LOCK_LEVEL work differently, and Solaris cannot
run the lower one at a level above 10. Currently this isn't much
of a problem because almost no PCI devices run at IPL > 10.
I've been playing with PCI serial cards which require IPL > 10
to avoid losing characters, and then it does become a problem as
there's a good chance it will be sharing an IRQ with a PCI card
which requires an IPL < 10, and one of the cards won't work.
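
To tell the harmless case from the broken one for a given shared IRQ,
the check amounts to the following sketch. It assumes "above" means
strictly greater than LOCK_LEVEL, following the wording "cannot run the
lower one at a level above 10"; the function name and the IPL values in
the example are hypothetical.

```python
LOCK_LEVEL = 10  # value given in the text above


def straddles_lock_level(ipls):
    """True if the IPLs sharing one IRQ fall on both sides of LOCK_LEVEL.

    Assumption: an IPL counts as "above" only if it is strictly greater
    than LOCK_LEVEL.
    """
    return min(ipls) <= LOCK_LEVEL < max(ipls)


# Hypothetical IPL values for illustration.
print(straddles_lock_level([5, 6]))    # False: just the harmless notice
print(straddles_lock_level([6, 12]))   # True: one device will not work
```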

Andrew Gabriel
[email address is not usable -- followup in the newsgroup]