Re: nve timeout (and down) regression?
- From: "David G. Lawrence" <dg@xxxxxxxxxxxxxx>
- Date: Sat, 25 Mar 2006 02:54:38 -0800
This happens w/o any "real" activity on that interface (which goes into
an Allied Telesyn switch):
.......
Mar 24 19:39:54 worf kernel: nve0: device timeout (1)
Mar 24 19:39:54 worf kernel: nve0: link state changed to DOWN
Mar 24 19:39:55 worf kernel: nve0: link state changed to UP
Mar 24 19:40:14 worf kernel: nve0: device timeout (1)
The problem is the watchdog timeout itself. I've attached an email that
I sent a few months ago which describes the problem, along with a simple
patch which disables the watchdog timer.
-DG
David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.
Date: Wed, 4 Jan 2006 16:21:03 -0800
Subject: Re: nve(4) patch - please test!
Since I sent the mail below I had to discover that the new driver
has a problem when no cable is plugged in, at least on my Asus board.
It doesn't only run into timeouts; during some of these timeouts the
machine, or at least the keyboard, hangs for about a minute.
Is there anything I can do to help debug this?
I ran into this problem recently as well and spent some time diagnosing
it. It's not that the cable isn't plugged in - rather it happens whenever
the traffic levels are low.
The problem is that the nvidia-supplied portion of the driver is deferring
the release of the completed transmit buffers and this occasionally
results in if_timer expiring, causing the driver watchdog routine to be
called ("device timeout"). The watchdog routine resets the card and the
nvidia-supplied code sits in a high-priority loop waiting for the card
to reset. This can take many seconds and your system will be hung until
it completes.
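
To make the failure mode concrete, here is a rough user-space model of the
timer logic described above. It is only a sketch, not the kernel code: the
struct and the model_* names are made up for illustration, and it assumes the
usual behaviour where if_timer is decremented once per second, the watchdog
fires when it counts down to zero, and the completion path disarms it.

```c
/* Simplified user-space model of the if_timer watchdog; NOT the kernel code.
 * All names here are hypothetical and exist only for this illustration. */
struct model_if {
    int if_timer;        /* seconds until the watchdog fires; 0 = disarmed */
    int tx_pending;      /* transmit buffers handed to the hardware layer */
    int watchdog_fired;  /* number of "device timeout" events seen */
};

/* Transmit path: queue a packet and arm the watchdog (the driver uses 8). */
static void model_start(struct model_if *ifp, int timeout)
{
    ifp->tx_pending++;
    ifp->if_timer = timeout;
}

/* Completion path (assumed): buffers are released, so disarm the watchdog. */
static void model_tx_done(struct model_if *ifp)
{
    ifp->tx_pending = 0;
    ifp->if_timer = 0;
}

/* Once-per-second tick: fire the watchdog when the timer counts down to 0. */
static void model_slowtimo(struct model_if *ifp)
{
    if (ifp->if_timer == 0 || --ifp->if_timer != 0)
        return;
    ifp->watchdog_fired++;
}

/* Send one packet, let `delay` seconds pass before the completion is
 * reported, and return how many times the watchdog fired. */
static int model_run(int timeout, int delay)
{
    struct model_if ifp = {0, 0, 0};

    model_start(&ifp, timeout);
    for (int s = 0; s < delay; s++)
        model_slowtimo(&ifp);
    model_tx_done(&ifp);
    return ifp.watchdog_fired;
}
```

With a timeout of 8, a completion that is reported after 3 seconds causes no
watchdog call, while one deferred for 10 seconds causes one. With a timeout
of 0 the timer is never armed, which is why setting if_timer to 0 suppresses
the "device timeout" entirely.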
I have a work-around patch for the problem that I've attached to this
email. It simply disables the watchdog. A real fix would involve accounting
for the outstanding transmit buffers differently (or perhaps not at all -
e.g. always attempt to call the nvidia-supplied code and if a queue-full
error occurs, then wait for an interrupt before trying to queue more
transmit packets).
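
A rough sketch of that second idea, again as a user-space model rather than
driver code. Everything here is hypothetical: hw_write() merely stands in for
the nvidia-supplied write call, HW_TX_SLOTS is an invented queue depth, and
the oactive field plays the role of an "output active" flag. The point is
only that the driver keeps no count of outstanding buffers and no timer; it
stops when the hardware layer reports a full queue and resumes from the
transmit interrupt.

```c
#define HW_TX_SLOTS 4   /* hypothetical hardware queue depth */

enum hw_status { HW_OK, HW_QUEUE_FULL };

struct model_drv {
    int hw_used;      /* slots currently owned by the hardware layer */
    int sw_backlog;   /* packets waiting in the software send queue */
    int oactive;      /* set while waiting for a transmit interrupt */
};

/* Stand-in for the vendor write call: it reports a full queue itself, so
 * the driver does not need to pre-count outstanding buffers. */
static enum hw_status hw_write(struct model_drv *d)
{
    if (d->hw_used == HW_TX_SLOTS)
        return HW_QUEUE_FULL;
    d->hw_used++;
    return HW_OK;
}

/* Start routine: keep handing packets over until the hardware refuses one. */
static void drv_start(struct model_drv *d)
{
    if (d->oactive)
        return;
    while (d->sw_backlog > 0) {
        if (hw_write(d) == HW_QUEUE_FULL) {
            d->oactive = 1;   /* stop until the interrupt clears this */
            return;
        }
        d->sw_backlog--;
    }
}

/* Transmit-complete interrupt: free up to `done` slots and resume queuing. */
static void drv_tx_intr(struct model_drv *d, int done)
{
    d->hw_used -= (done < d->hw_used) ? done : d->hw_used;
    d->oactive = 0;
    drv_start(d);
}

/* Queue `packets`, deliver one interrupt completing `done` buffers, and
 * return how many packets are still waiting in the software queue. */
static int drv_demo(int packets, int done)
{
    struct model_drv d = {0, packets, 0};

    drv_start(&d);
    drv_tx_intr(&d, done);
    return d.sw_backlog;
}
```

In this model a late completion only delays the remaining packets; nothing
ever resets the card because there is no timer to expire.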
-DG
Index: if_nve.c
===================================================================
RCS file: /home/ncvs/src/sys/dev/nve/if_nve.c,v
retrieving revision 1.7.2.8
diff -c -r1.7.2.8 if_nve.c
*** if_nve.c 25 Dec 2005 21:57:03 -0000 1.7.2.8
--- if_nve.c 5 Jan 2006 00:12:45 -0000
***************
*** 943,949 ****
return;
}
/* Set watchdog timer. */
! ifp->if_timer = 8;
/* Copy packet to BPF tap */
BPF_MTAP(ifp, m0);
--- 943,949 ----
return;
}
/* Set watchdog timer. */
! ifp->if_timer = 0;
/* Copy packet to BPF tap */
BPF_MTAP(ifp, m0);