Re: suspect bug in vge(4)
- From: Pyun YongHyeon <pyunyh@xxxxxxxxx>
- Date: Wed, 10 Jun 2009 11:49:59 +0900
On Tue, Jun 09, 2009 at 02:12:09AM +0200, Thomas Lotterer wrote:
I need advice hunting down a network problem which I suspect to be
a bug in the vge(4) driver. After spending a lot of time on
investigation, I'm out of ideas.
My recently built new home server running FreeBSD 8.0-CURRENT as of
2009-06-07 on a VIA ARTiGO A2000 [1] exhibits network problems when
sending more than a couple of dozen kilobytes of TCP traffic.
The server application is "Dovecot" [2] Secure IMAP server.
The client application is "Thunderbird" [3] running on WindowsXP.
The high-level view of the problem is that the client seems to stall
downloading messages or even a complex structure of IMAP folder names.
When using STARTTLS the client often prints the infamous generic and
misleading error "Thunderbird received a message with incorrect Message
Authentication Code. If the error occurs frequently, contact the website
administrator". The origin of this message is the SSL library that ships
with Thunderbird. The same library is used for Firefox where the hint
might actually make sense when the user is attempting to access a broken
HTTPS server. After lots of debugging I found out that the same error is
printed not only for TLS/SSL issues but also for broken TCP streams,
be it wrong TCP checksums or a server process dumping core.
So I tried IMAP without TLS just to see the same issue with the
misleading SSL error replaced by an application hang. I ran truss(1)
against Dovecot, placed Thunderbird in debug mode [4] and found out that
during a stall condition the server did write(2) all the data to the TCP
socket but some data did not arrive at the client.
The low-level view of the problem is that Wireshark on the client side
sooner or later - not for the first few dozen packets - sees a packet
with an incorrect TCP checksum. Usually the next packet is from the
server again, continuing the stream. What follows is the expected but
fruitless attempt by the client to recover by sending duplicate ACKs for
the last good packet, while the server keeps retransmitting TCP packets
with bad checksums.
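For anyone who wants to double-check Wireshark's verdict on a captured
segment, here is a minimal sketch of the standard TCP checksum (RFC 1071
one's-complement sum over the IPv4 pseudo-header plus the TCP segment).
The function names are made up for illustration and the addresses in the
usage note are documentation placeholders, not values from the capture.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """Sum 16-bit big-endian words with end-around carry (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # odd-length data is padded with one zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src_ip: str, dst_ip: str, tcp_length: int) -> bytes:
    """IPv4 pseudo-header: source, destination, zero, protocol, TCP length."""
    return struct.pack(
        "!4s4sBBH",
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
        0,
        socket.IPPROTO_TCP,
        tcp_length,
    )


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum to store in the header; the checksum field (bytes 16-17
    of `segment`) must be zero when calling this."""
    data = pseudo_header(src_ip, dst_ip, len(segment)) + segment
    return ~ones_complement_sum(data) & 0xFFFF


def tcp_checksum_ok(src_ip: str, dst_ip: str, segment: bytes) -> bool:
    """Verify a received segment with its checksum field left in place."""
    data = pseudo_header(src_ip, dst_ip, len(segment)) + segment
    return ones_complement_sum(data) == 0xFFFF
```

Usage: export the TCP header plus payload of a suspicious packet as raw
bytes and call tcp_checksum_ok("192.0.2.1", "192.0.2.2", segment) with the
real source and destination addresses. Note that a capture taken on the
sending host with TX checksum offload enabled will show bad checksums
even when the wire data is fine, so this check is only meaningful on the
receiving side.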
To me it sounds like a broken implementation of hardware-generated
checksums. Setting the "-tso" "-lro" "-txcsum" "-rxcsum" options and
enabling the "polling" option on the server-side network interface
did not help. So either something deeper is broken or maybe just the
ability to disable these features needs fixing. Btw, the client uses the
"VMware Accelerated AMD PCNet Adapter" driver with "TCP/IP Offload=off"
and "TsoEnable=0".
Sorry to bother you with more details but here's why I believe it's a
hardware/driver issue. Before I purchased the hardware I tried a dry
run. Installed FreeBSD 7.1-RELEASE as VM guest, then upgraded to FreeBSD
8.0-CURRENT using FreeBSD Administration Toolkit [5]. Built OS and apps
from source, loaded my data - worked! Used the same client that has
problems with the real hardware today. Then used that VM as build host
to create the NanoBSD [6] Flash image for the ARTiGO. Both use exactly
the same sources. The VM works, the metal is broken. One of the few
differences is the NIC and its driver. As a workaround I copied the VM
to a usual PC equipped with a fxp(4) NIC - worked! So it really looks
like an OS/HW compatibility issue on the ARTiGO.
In case you are considering a hardware defect please note that before I
loaded the OS, apps and my data to this new hardware I thoroughly tested
what I could. One week filling the disks to the max using repetitive
copies of a file created from /dev/random and, after manually breaking
and rebuilding ZFS mirror, checking data integrity using message
digests. No problems with disks, albeit poor SATA performance, but
that's another story. One day running memtest86 [7]. No problems with
memory. One hour NIC test copying /dev/zero to /dev/null over the wire
using "scp -o compression=no". No hangs or hiccups here.
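The digest-based integrity check described above can be sketched roughly
as follows. This is only an illustration: the post does not say which
digest or tool was used, so SHA-256 and the helper names here are
assumptions.

```python
import hashlib


def file_digest(path: str, algo: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large files need little memory."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def mismatched_copies(reference: str, copies: list) -> list:
    """Return the paths in `copies` whose digest differs from `reference`."""
    expected = file_digest(reference)
    return [path for path in copies if file_digest(path) != expected]
```

An empty list from mismatched_copies() means every copy still matches the
original random file.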
Hope you can help me.
I already know there are possible edge cases in vge(4), but your
issue looks quite different from anything reported so far. Unfortunately
the vge(4) hardware I had was broken, so I couldn't finish overhauling
vge(4). The code at the following URLs is the latest WIP version,
but I don't know whether it fixes the issue as it hasn't been tested at
all on real hardware.
http://people.freebsd.org/~yongari/vge/if_vge.c
http://people.freebsd.org/~yongari/vge/if_vgereg.h
http://people.freebsd.org/~yongari/vge/if_vgevar.h