Re: stray irq13 at runtime

From: Bruce Evans (bde_at_zeta.org.au)
Date: 05/30/04

  • Next message: Sergey Lyubka: "Re: crash when bpf is used heavily"
    Date: Mon, 31 May 2004 00:31:29 +1000 (EST)
    To: Kris Kennaway <kris@obsecurity.org>
    
    

    On Sun, 30 May 2004, Bruce Evans wrote:

    > On Sat, 29 May 2004, Kris Kennaway wrote:
    >
    > > Since updating the i386 package machines the other day, they've all
    > > experienced the following:
    > >
    > > May 29 21:24:53 <user.err> gohan28 kernel: stray irq13
    > >
    > > irq13: npx0 2 0
    > > stray irq13 1 0
    > >
    > > This is not appearing during boot - those machines have been up for
    > > hours before the interrupt occurs.

    > ...
    > I haven't figured out why the APIC case normally delivers both a normal
    > (fast) interrupt and stray interrupt when we don't wait for the one
    > interrupt that actually occurs. One is counted as stray because it
    > occurs after the bus_teardown_intr(), but both of them seem to occur
    > after that. So there seems to be a race or double counting somewhere.

    I have now figured this out. There is double counting. Interrupts
    are supposed to be counted per-device (more precisely, per group of
    devices sharing an interrupt at a given time), with interrupts that
    have no handler in effect being counted as for the special "stray"
    device and counts being maintained until reboot for all previous
    combinations of devices. This has been broken. Interrupts are now
    counted per-vector and reported as being for the last group of devices
    using the interrupt (so history is lost if the combination is changed),
    and then if there are no devices already using the interrupt, interrupts
    are counted again as "stray". In this case and some others, the stray
    interrupts really did come from the last group of devices causing the
    interrupt, but they shouldn't be counted twice.

    I can duplicate your counts of 2 and 1 and explain them as follows:
    - configure without "device apic" so that the other bug suite doesn't
      complicate things. This gives initial counts of 1 for npx0 and
      stray irq13.
    - run any program that causes an unmasked NPX exception. This also
      causes an unmasked irq13 (because the recent optimization for edge
      triggering leaves irq13 enabled even when its handler has been torn
      down). The irq13 is double-counted as for npx0 and stray irq13.
      Further unmasked NPX exceptions don't cause further irq13 because
      the first one was not properly handled. The npx0 busy latch remains
      set, so further irq13's are masked by that although not by the PIC.

    Further irq13s for unmasked NPX exceptions don't happen for the APIC
    case, although one wants to happen according to the PIC's IRR.

    Summary:
    - this bug really was harmless
    - statistics for interrupt handling are more broken than I thought.

    Bruce
    _______________________________________________
    freebsd-current@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-current
    To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"

