Re: nve timeout (and down) regression?



> This happens w/o any "real" activity on that interface (which goes into
> an Allied Telesyn switch):
> .......
> Mar 24 19:39:54 worf kernel: nve0: device timeout (1)
> Mar 24 19:39:54 worf kernel: nve0: link state changed to DOWN
> Mar 24 19:39:55 worf kernel: nve0: link state changed to UP
> Mar 24 19:40:14 worf kernel: nve0: device timeout (1)

The problem is the watchdog timeout itself. I've attached an email that
I sent a few months ago which describes the problem, along with a simple
patch which disables the watchdog timer.

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.

Date: Wed, 4 Jan 2006 16:21:03 -0800
Subject: Re: nve(4) patch - please test!

> Since I sent the mail below I had to discover that the new driver
> has a problem when no cable is plugged in, at least on my Asus board.
>
> It doesn't only run into timeouts; during some of these timeouts the
> machine, or at least the keyboard, hangs for about a minute.
>
> Is there anything I can do to help debug this?

I ran into this problem recently as well and spent some time diagnosing
it. It's not that the cable isn't plugged in - rather it happens whenever
the traffic levels are low.
The problem is that the nvidia-supplied portion of the driver is deferring
the releasing of the completed transmit buffers and this occasionally
results in if_timer expiring, causing the driver watchdog routine to be
called ("device timeout"). The watchdog routine resets the card and the
nvidia-supplied code sits in a high-priority loop waiting for the card
to reset. This can take many seconds and your system will be hung until
it completes.
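
To make the timer behaviour concrete, here is a small userland model of
the countdown. It is only a sketch, not driver code: struct fake_ifnet,
slow_tick() and simulate() are hypothetical names, and the model assumes
the stack ticks once per second, decrements a nonzero if_timer, and calls
the watchdog when the count reaches zero.

```c
/* Hypothetical stand-in for the two things that matter in struct ifnet. */
struct fake_ifnet {
    int if_timer;        /* seconds until the watchdog fires; 0 = disarmed */
    int watchdog_fired;  /* number of "device timeout" events seen */
};

/*
 * Models the once-per-second tick (an assumption of this sketch): a
 * nonzero timer is decremented, and the watchdog runs when it hits zero.
 */
void slow_tick(struct fake_ifnet *ifp)
{
    if (ifp->if_timer == 0 || --ifp->if_timer > 0)
        return;
    ifp->watchdog_fired++;
}

/*
 * Simulate one transmit whose completion is only reported after
 * completion_delay seconds. arm_value is what the start routine writes
 * into if_timer. Returns how many times the watchdog fired.
 */
int simulate(int arm_value, int completion_delay)
{
    struct fake_ifnet ifp = { 0, 0 };
    int sec;

    ifp.if_timer = arm_value;       /* "Set watchdog timer." */
    for (sec = 1; sec <= completion_delay; sec++)
        slow_tick(&ifp);
    ifp.if_timer = 0;               /* completion finally disarms it */
    return ifp.watchdog_fired;
}
```

In this model simulate(8, 3) returns 0, while simulate(8, 20) returns 1:
a completion deferred past eight seconds is enough to trip the watchdog
even though nothing is actually wrong with the card. simulate(0, 20)
returns 0, because a timer that is never armed never fires.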
I have a work-around patch for the problem that I've attached to this
email. It simply disables the watchdog. A real fix would involve accounting
for the outstanding transmit buffers differently (or perhaps not at all -
e.g. always attempt to call the nvidia-supplied code and if a queue-full
error occurs, then wait for an interrupt before trying to queue more
transmit packets).
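
The "queue-full, then wait for an interrupt" idea could look roughly like
the following. This is a userland sketch under stated assumptions, not a
patch: struct tx_model, hw_enqueue(), tx_start(), tx_complete_intr() and
the queue depth of 4 are all made up for illustration, and hw_enqueue()
only stands in for whatever the nvidia-supplied transmit call really is.

```c
#include <stdbool.h>

#define HW_QUEUE_SLOTS 4   /* hypothetical hardware queue depth */

struct tx_model {
    int  hw_in_flight;     /* packets currently held by the vendor code */
    int  pending;          /* packets waiting in the software send queue */
    bool blocked;          /* set on queue-full, cleared by the interrupt */
};

/* Hypothetical stand-in for the vendor transmit call; false means full. */
bool hw_enqueue(struct tx_model *m)
{
    if (m->hw_in_flight >= HW_QUEUE_SLOTS)
        return false;
    m->hw_in_flight++;
    return true;
}

/*
 * Start routine: always try to hand packets over, and stop as soon as
 * the vendor code reports queue-full. No driver-side buffer accounting
 * and no watchdog timer are involved.
 */
void tx_start(struct tx_model *m)
{
    if (m->blocked)
        return;
    while (m->pending > 0) {
        if (!hw_enqueue(m)) {
            m->blocked = true;   /* wait for an interrupt before retrying */
            return;
        }
        m->pending--;
    }
}

/* Transmit-complete interrupt: done buffers were released by the card. */
void tx_complete_intr(struct tx_model *m, int done)
{
    if (done > m->hw_in_flight)
        done = m->hw_in_flight;
    m->hw_in_flight -= done;
    m->blocked = false;
    tx_start(m);                 /* resume queueing where we left off */
}
```

Starting from six pending packets, tx_start() queues four and then
blocks; a later tx_complete_intr() reporting three completions unblocks
the model and queues the remaining two. How late the completions arrive
no longer matters, which is the point of the approach.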

-DG


Index: if_nve.c
===================================================================
RCS file: /home/ncvs/src/sys/dev/nve/if_nve.c,v
retrieving revision 1.7.2.8
diff -c -r1.7.2.8 if_nve.c
*** if_nve.c 25 Dec 2005 21:57:03 -0000 1.7.2.8
--- if_nve.c 5 Jan 2006 00:12:45 -0000
***************
*** 943,949 ****
return;
}
/* Set watchdog timer. */
! ifp->if_timer = 8;

/* Copy packet to BPF tap */
BPF_MTAP(ifp, m0);
--- 943,949 ----
return;
}
/* Set watchdog timer. */
! ifp->if_timer = 0;

/* Copy packet to BPF tap */
BPF_MTAP(ifp, m0);
_______________________________________________
freebsd-stable@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe@xxxxxxxxxxx"


