Re: dc0: watchdog timeout and nve0: device timeout



On Tue, 31 Jan 2006 11:30:02 +0300, Gleb Smirnoff wrote:
> On Tue, Jan 31, 2006 at 03:08:03AM -0500, Anish Mistry wrote:
> A> After updating to STABLE today I'm getting the following message with
> A> my dc and nve NICs every few seconds. UP, AMD64. A kernel from last
> A> Thursday was fine.
> A>
> A> dc0: watchdog timeout
> A> nve0: device timeout (4)
>
> Can you try to back out the code in sys/dev/pci to Thursday's state?
> If this doesn't help, you probably need to do a binary search in this
> small timeframe.

I think I found the problem - the merge was not quite correct, and
the PCI interrupt rerouting was disabled for some reason.

Warner, is there a reason for hiding the "Try to re-route interrupts"
code behind an apparently "ifdef 0" case? Well, okay, most probably
there is a reason, since you've done it, but... it breaks my re0 card
and it also seems to break Anish's hardware :)
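
For illustration, here is a minimal userland sketch of why such a guard
behaves like "#if 0". It assumes that nothing in the build ever defines
__PCI_REROUTE_INTERRUPT; the two function names are hypothetical and
exist only for this example. The second guard is the architecture list
used in the patch.

```c
/*
 * Old guard: the rerouting code is compiled in only if some header or
 * compiler flag defines __PCI_REROUTE_INTERRUPT. Assumption for this
 * sketch: nothing does, so the block is silently dropped.
 */
static int
reroute_compiled_in_old(void)
{
#ifdef __PCI_REROUTE_INTERRUPT
	return (1);
#else
	return (0);
#endif
}

/*
 * Guard from the patch: keyed on macros that the compiler itself
 * predefines for the target architecture.
 */
static int
reroute_compiled_in_new(void)
{
#if defined(__ia64__) || defined(__i386__) || defined(__amd64__) || \
    defined(__arm__) || defined(__alpha__)
	return (1);
#else
	return (0);
#endif
}
```

Under that assumption the first function always returns 0, while the
second returns 1 on any of the listed architectures (for example, when
compiling for amd64) and 0 elsewhere.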

BTW, the commit message was not quite correct - rev. 1.302 was not
really merged, it's included in my patch here. Also, rev. 1.305 of
pci.c seems to have more than just adding the PCI_FIND_EXTCAP method -
there are a couple of offset fixes that I also included in the patch
while trying to come as close to the -CURRENT code as possible; could
you check if they actually apply to -STABLE?
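
For readers following along, here is a minimal sketch of the capability
list walk that those offset hunks touch, run against an in-memory copy
of config space instead of real hardware. The constant values are given
as they appear in sys/dev/pci/pcireg.h to the best of my knowledge and
should be checked against the tree; cfg_read8() and find_cap() are
hypothetical names standing in for the REG() macro and the loop in
pci.c. The sketch assumes header types 0 and 1 share PCIR_CAP_PTR and
skips the status-register check that the real code performs first.

```c
#include <stdint.h>

/* Assumed values; verify against sys/dev/pci/pcireg.h. */
#define PCIR_CAP_PTR	0x34	/* capability pointer, header types 0/1 */
#define PCIR_CAP_PTR_2	0x14	/* capability pointer, header type 2 */
#define PCICAP_ID	0x0	/* offset of the ID byte in an entry */
#define PCICAP_NEXTPTR	0x1	/* offset of the next-pointer byte */
#define PCIY_PMG	0x01	/* PCI power management capability */

/* Hypothetical stand-in for REG(off, 1): read one config-space byte. */
static uint8_t
cfg_read8(const uint8_t *cfg, unsigned off)
{
	return (cfg[off & 0xff]);
}

/* Return the offset of capability capid, or 0 if it is not present. */
static unsigned
find_cap(const uint8_t *cfg, int hdrtype, uint8_t capid)
{
	unsigned ptrptr, ptr;
	int guard;

	switch (hdrtype) {
	case 0:
	case 1:
		ptrptr = PCIR_CAP_PTR;
		break;
	case 2:
		ptrptr = PCIR_CAP_PTR_2;
		break;
	default:
		return (0);	/* no capability list for this header */
	}

	ptr = cfg_read8(cfg, ptrptr);
	/* Bound the walk so a corrupt, looping list cannot hang us. */
	for (guard = 0; ptr != 0 && guard < 48; guard++) {
		if (cfg_read8(cfg, ptr + PCICAP_ID) == capid)
			return (ptr);
		ptr = cfg_read8(cfg, ptr + PCICAP_NEXTPTR);
	}
	return (0);
}
```

If the constants really have the values assumed above, replacing 0x14,
ptr + 1 and ptr with the symbolic names does not change behaviour; it
only makes the walk easier to read.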

Anyway, here's a patch that fixes it for me, although most probably
the __PCI_REROUTE_INTERRUPT chunk should be sufficient. Warner, if
you want more details, I could help with debugging this - on my
system, the re0 card definitely needs this rerouting. I've posted
some verbose boot output with explanations at
http://people.FreeBSD.org/~roam/pcirouting/
The patch itself is also there in case it gets munged by the mail
servers along the way.

Index: src/sys/dev/pci/pci.c
===================================================================
RCS file: /home/ncvs/src/sys/dev/pci/pci.c,v
retrieving revision 1.292.2.6
diff -u -r1.292.2.6 pci.c
--- src/sys/dev/pci/pci.c 30 Jan 2006 18:42:10 -0000 1.292.2.6
+++ src/sys/dev/pci/pci.c 31 Jan 2006 10:57:32 -0000
@@ -428,7 +428,7 @@
 		ptrptr = PCIR_CAP_PTR;
 		break;
 	case 2:
-		ptrptr = 0x14;
+		ptrptr = PCIR_CAP_PTR_2;
 		break;
 	default:
 		return;		/* no extended capabilities support */
@@ -447,10 +447,10 @@
 		}
 		/* Find the next entry */
 		ptr = nextptr;
-		nextptr = REG(ptr + 1, 1);
+		nextptr = REG(ptr + PCICAP_NEXTPTR, 1);
 
 		/* Process this entry */
-		switch (REG(ptr, 1)) {
+		switch (REG(ptr + PCICAP_ID, 1)) {
 		case PCIY_PMG:		/* PCI power management */
 			if (cfg->pp.pp_cap == 0) {
 				cfg->pp.pp_cap = REG(ptr + PCIR_POWER_CAP, 2);
@@ -1040,7 +1040,8 @@
 	}
 
 	if (cfg->intpin > 0 && PCI_INTERRUPT_VALID(cfg->intline)) {
-#ifdef __PCI_REROUTE_INTERRUPT
+#if defined(__ia64__) || defined(__i386__) || defined(__amd64__) || \
+    defined(__arm__) || defined(__alpha__)
 		/*
 		 * Try to re-route interrupts. Sometimes the BIOS or
 		 * firmware may leave bogus values in these registers.

Hope this helps!

G'luck,
Peter

--
Peter Pentchev roam@xxxxxxxxxxx roam@xxxxxxxx roam@xxxxxxxxxxx
PGP key: http://people.FreeBSD.org/~roam/roam.key.asc
Key fingerprint FDBA FD79 C26F 3C51 C95E DF9E ED18 B68D 1619 4553
"yields falsehood, when appended to its quotation." yields falsehood, when appended to its quotation.
