Re: dc0: watchdog timeout and nve0: device timeout



In message: <20060131112447.GA1173@xxxxxxxxxxxxxxxxxxxxxxxx>
Peter Pentchev <roam@xxxxxxxxxxx> writes:
: On Tue, 31 Jan 2006 11:30:02 +0300, Gleb Smirnoff wrote:
: > On Tue, Jan 31, 2006 at 03:08:03AM -0500, Anish Mistry wrote:
: > A> After updating to STABLE today I'm getting the following message with
: > A> my dc and nve NICs every few seconds. UP, AMD64. A kernel from last
: > A> Thursday was fine.
: > A>
: > A> dc0: watchdog timeout
: > A> nve0: device timeout (4)
: >
: > Can you try to back out the code in sys/dev/pci to Thursday? If this
: > doesn't help, you probably need to do a binary search in this small
: > timeframe.
:
: I think I found the problem - the merge was not quite correct, and
: the PCI interrupt rerouting was disabled for some reason.
:
: Warner, is there a reason for hiding the "Try to re-route interrupts"
: code behind an apparently "ifdef 0" case? Well, okay, most probably
: there is a reason, since you've done it, but... it breaks my re0 card
: and it also seems to break Anish's hardware :)

I'm pretty sure that's the problem. I thought I'd specifically
checked to make sure that I didn't merge this :-(

: BTW, the commit message was not quite correct - rev. 1.302 was not
: really merged; it's included in my patch here. Also, rev. 1.305 of
: pci.c seems to have more than just adding the PCI_FIND_EXTCAP method -
: there are a couple of offset fixes that I also included in the patch
: while trying to come as close to the -CURRENT code as possible; could
: you check if they actually apply to -STABLE?

They do.

: Anyway, here's a patch that fixes it for me, although most probably
: the __PCI_REROUTE_INTERRUPT chunk should be sufficient. Warner, if
: you want more details, I could help with debugging this - on my
: system, the re0 card definitely needs this rerouting. I've posted
: some verbose boot output with explanations at
: http://people.FreeBSD.org/~roam/pcirouting/
: The patch itself is also there in case it gets munged by the mail
: swervers along the way.
:
: Index: src/sys/dev/pci/pci.c
: ===================================================================
: RCS file: /home/ncvs/src/sys/dev/pci/pci.c,v
: retrieving revision 1.292.2.6
: diff -u -r1.292.2.6 pci.c
: --- src/sys/dev/pci/pci.c 30 Jan 2006 18:42:10 -0000 1.292.2.6
: +++ src/sys/dev/pci/pci.c 31 Jan 2006 10:57:32 -0000
: @@ -428,7 +428,7 @@
: ptrptr = PCIR_CAP_PTR;
: break;
: case 2:
: - ptrptr = 0x14;
: + ptrptr = PCIR_CAP_PTR_2;
: break;
: default:
: return; /* no extended capabilities support */
: @@ -447,10 +447,10 @@
: }
: /* Find the next entry */
: ptr = nextptr;
: - nextptr = REG(ptr + 1, 1);
: + nextptr = REG(ptr + PCICAP_NEXTPTR, 1);
:
: /* Process this entry */
: - switch (REG(ptr, 1)) {
: + switch (REG(ptr + PCICAP_ID, 1)) {
: case PCIY_PMG: /* PCI power management */
: if (cfg->pp.pp_cap == 0) {
: cfg->pp.pp_cap = REG(ptr + PCIR_POWER_CAP, 2);
: @@ -1040,7 +1040,8 @@
: }
:
: if (cfg->intpin > 0 && PCI_INTERRUPT_VALID(cfg->intline)) {
: -#ifdef __PCI_REROUTE_INTERRUPT
: +#if defined(__ia64__) || defined(__i386__) || defined(__amd64__) || \
: + defined(__arm__) || defined(__alpha__)
: /*
: * Try to re-route interrupts. Sometimes the BIOS or
: * firmware may leave bogus values in these registers.
:
: Hope this helps!

I'm pretty sure that the REROUTE change is the only problem. That
shouldn't have been committed, and I thought I'd checked it
specifically before the commit, but I just checked what I committed
and it slipped by. This fits with the symptoms that I saw on my server
last night (the only difference between a current stable boot and an
older stable boot was the IRQs).

The last part of this patch seems to fix things for me.

Warner
_______________________________________________
freebsd-stable@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe@xxxxxxxxxxx"


