Re: dc0: watchdog timeout and nve0: device timeout



In message: <20060131112447.GA1173@xxxxxxxxxxxxxxxxxxxxxxxx>
Peter Pentchev <roam@xxxxxxxxxxx> writes:
: On Tue, 31 Jan 2006 11:30:02 +0300, Gleb Smirnoff wrote:
: > On Tue, Jan 31, 2006 at 03:08:03AM -0500, Anish Mistry wrote:
: > A> After updating to STABLE today I'm getting the following message with
: > A> my dc and nve NICs every few seconds. UP, AMD64. A kernel from last
: > A> Thursday was fine.
: > A>
: > A> dc0: watchdog timeout
: > A> nve0: device timeout (4)
: >
: > Can you try to back out the code in sys/dev/pci to Thursday? If this
: > doesn't help, you probably need to do a binary search in this small
: > timeframe.
:
: I think I found the problem - the merge was not quite correct, and
: the PCI interrupt rerouting was disabled for some reason.
:
: Warner, is there a reason for hiding the "Try to re-route interrupts"
: code behind an apparently "ifdef 0" case? Well, okay, most probably
: there is a reason, since you've done it, but... it breaks my re0 card
: and it also seems to break Anish's hardware :)

I'm pretty sure that's the problem. I thought I'd specifically
checked to make sure that I didn't merge this :-(

: BTW, the commit message was not quite correct - rev. 1.302 was not
: really merged, it's included in my patch here. Also, rev. 1.305 of
: pci.c seems to have more than just adding the PCI_FIND_EXTCAP method -
: there are a couple of offset fixes that I also included in the patch
: while trying to come as close to the -CURRENT code as possible; could
: you check if they actually apply to -STABLE?

They do.
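
For anyone following along, the offset fixes only replace magic numbers in the capability-list walk with named constants. Below is a minimal sketch of that walk over a plain byte array instead of real config space. `find_cap` and the `cfg` snapshot are hypothetical names for illustration, not code from pci.c. The 0x14, 0 and 1 values come from the literals the patch replaces; 0x34 for PCIR_CAP_PTR is assumed from the PCI specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Names as used in the patch; values taken from the literals they replace. */
#define PCIR_CAP_PTR    0x34    /* assumed: capability pointer, header types 0 and 1 */
#define PCIR_CAP_PTR_2  0x14    /* header type 2; the patch replaces a literal 0x14 */
#define PCICAP_ID       0x0     /* capability ID byte; was REG(ptr, 1) */
#define PCICAP_NEXTPTR  0x1     /* next-pointer byte; was REG(ptr + 1, 1) */

/*
 * Hypothetical helper: look for capability `wanted` in a 256-byte
 * snapshot of config space. Returns the capability offset, or 0 if
 * the capability is absent or the header type has no capability list.
 */
static uint8_t
find_cap(const uint8_t cfg[256], int hdrtype, uint8_t wanted)
{
	uint8_t ptrptr, ptr;
	int guard;

	assert(cfg != NULL);

	switch (hdrtype) {
	case 0:
	case 1:
		ptrptr = PCIR_CAP_PTR;
		break;
	case 2:
		ptrptr = PCIR_CAP_PTR_2;
		break;
	default:
		return (0);	/* no capability list for this header type */
	}

	/* Mask the two reserved low bits; bound the walk against loops. */
	ptr = (uint8_t)(cfg[ptrptr] & 0xfc);
	for (guard = 0; ptr != 0 && guard < 48; guard++) {
		if (cfg[ptr + PCICAP_ID] == wanted)
			return (ptr);
		ptr = (uint8_t)(cfg[ptr + PCICAP_NEXTPTR] & 0xfc);
	}
	return (0);
}
```

The behavior is the same as with the bare literals; the named offsets just make it clearer which byte of each capability entry is being read.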

: Anyway, here's a patch that fixes it for me, although most probably
: the __PCI_REROUTE_INTERRUPT chunk should be sufficient. Warner, if
: you want more details, I could help with debugging this - on my
: system, the re0 card definitely needs this rerouting. I've posted
: some verbose boot output with explanations at
: http://people.FreeBSD.org/~roam/pcirouting/
: The patch itself is also there in case it gets munged by the mail
: servers along the way.
:
: Index: src/sys/dev/pci/pci.c
: ===================================================================
: RCS file: /home/ncvs/src/sys/dev/pci/pci.c,v
: retrieving revision 1.292.2.6
: diff -u -r1.292.2.6 pci.c
: --- src/sys/dev/pci/pci.c 30 Jan 2006 18:42:10 -0000 1.292.2.6
: +++ src/sys/dev/pci/pci.c 31 Jan 2006 10:57:32 -0000
: @@ -428,7 +428,7 @@
: ptrptr = PCIR_CAP_PTR;
: break;
: case 2:
: - ptrptr = 0x14;
: + ptrptr = PCIR_CAP_PTR_2;
: break;
: default:
: return; /* no extended capabilities support */
: @@ -447,10 +447,10 @@
: }
: /* Find the next entry */
: ptr = nextptr;
: - nextptr = REG(ptr + 1, 1);
: + nextptr = REG(ptr + PCICAP_NEXTPTR, 1);
:
: /* Process this entry */
: - switch (REG(ptr, 1)) {
: + switch (REG(ptr + PCICAP_ID, 1)) {
: case PCIY_PMG: /* PCI power management */
: if (cfg->pp.pp_cap == 0) {
: cfg->pp.pp_cap = REG(ptr + PCIR_POWER_CAP, 2);
: @@ -1040,7 +1040,8 @@
: }
:
: if (cfg->intpin > 0 && PCI_INTERRUPT_VALID(cfg->intline)) {
: -#ifdef __PCI_REROUTE_INTERRUPT
: +#if defined(__ia64__) || defined(__i386__) || defined(__amd64__) || \
: + defined(__arm__) || defined(__alpha__)
: /*
: * Try to re-route interrupts. Sometimes the BIOS or
: * firmware may leave bogus values in these registers.
:
: Hope this helps!

I'm pretty sure that the REROUTE thing is the only problem. That
shouldn't have been committed, and I thought I'd checked it
specifically before the commit, but I just checked what I committed
and it slipped by. This fits with the symptoms that I saw on my server
last night (the only difference between a stable boot and an older
stable boot was the IRQs).

The last part of this patch seems to fix things for me.
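
To spell out why that one hunk matters: code guarded by a macro that nothing defines is compiled out entirely, which is why it behaves like "ifdef 0". A small sketch, assuming (as the thread suggests) that nothing on the branch defines __PCI_REROUTE_INTERRUPT. The two function names are hypothetical and exist only to show which guard keeps the block.

```c
#include <stdbool.h>

/* Guard as merged: the block survives only if the macro is defined. */
static bool
reroute_enabled_as_merged(void)
{
#ifdef __PCI_REROUTE_INTERRUPT
	return (true);
#else
	return (false);	/* macro undefined: rerouting code is compiled out */
#endif
}

/* Guard from the patch: keyed on the target architecture instead. */
static bool
reroute_enabled_as_patched(void)
{
#if defined(__ia64__) || defined(__i386__) || defined(__amd64__) || \
    defined(__arm__) || defined(__alpha__)
	return (true);
#else
	return (false);
#endif
}
```

With the merged guard the function returns false on every architecture unless something defines the macro; with the patched guard the result depends only on the architecture the kernel is built for.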

Warner
_______________________________________________
freebsd-stable@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe@xxxxxxxxxxx"


