Re: read() returns ETIMEDOUT on steady TCP connection



Andre Oppermann wrote:
Mark Hills wrote:
On Mon, 21 Apr 2008, Andre Oppermann wrote:

Mark Hills wrote:
On Sun, 20 Apr 2008, Peter Jeremy wrote:

I can't explain the problem but it definitely looks like a resource
starvation issue within the kernel.

I've traced the source of the ETIMEDOUT within the kernel to tcp_timer_rexmt() in tcp_timer.c:

    if (++tp->t_rxtshift > TCP_MAXRXTSHIFT) {
        tp->t_rxtshift = TCP_MAXRXTSHIFT;
        tcpstat.tcps_timeoutdrop++;
        tp = tcp_drop(tp, tp->t_softerror ?
            tp->t_softerror : ETIMEDOUT);
        goto out;
    }
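
For readers following along, the check quoted above can be exercised in user space. This is only a minimal sketch: struct fake_tcpcb, rexmt_expired() and expiries_until_drop() are hypothetical names invented for illustration, not kernel symbols, and the value 12 for TCP_MAXRXTSHIFT is an assumption that should be checked against tcp_timer.h on the system in question.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumed value; verify against tcp_timer.h on the target system. */
#define TCP_MAXRXTSHIFT 12

/* Hypothetical stand-in for the two tcpcb fields the quoted check reads. */
struct fake_tcpcb {
    int t_rxtshift;   /* consecutive retransmit timer expiries so far */
    int t_softerror;  /* last soft error recorded, 0 if none */
};

/*
 * Mirrors the quoted check: returns 0 while the connection survives,
 * otherwise the errno the connection would be dropped with.
 */
int rexmt_expired(struct fake_tcpcb *tp)
{
    assert(tp != NULL);
    if (++tp->t_rxtshift > TCP_MAXRXTSHIFT) {
        tp->t_rxtshift = TCP_MAXRXTSHIFT;
        return tp->t_softerror ? tp->t_softerror : ETIMEDOUT;
    }
    return 0;
}

/* Counts how many timer expiries it takes before the drop happens. */
int expiries_until_drop(struct fake_tcpcb *tp, int *err)
{
    int n = 0;

    do {
        n++;
        *err = rexmt_expired(tp);
    } while (*err == 0);
    return n;
}
```

Under these assumptions the drop happens on the expiry after t_rxtshift has already reached TCP_MAXRXTSHIFT, and ETIMEDOUT is only reported when no soft error was recorded on the connection beforehand.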

Yes, this is related to either a lack of mbufs to create a segment
or a problem in sending it. That may be a full interface queue, a
bandwidth manager (dummynet) or some firewall internally rejecting
the segment (ipfw, pf). Do you run any firewall in stateful mode?

There's no firewall running.

I'm new to FreeBSD, but does this imply that it's reaching a limit on the number of retransmits when sending ACKs on the TCP connection receiving the inbound data? I checked this using tcpdump on the server and could see no retransmissions.

When you have internal problems the segment never makes it to the
wire and thus you won't see it in tcpdump.

Please report the output of 'netstat -s -p tcp' and 'netstat -m'.

Posted below. You can see it in there: "131 connections dropped by rexmit timeout"

As a test, I ran a simulation with the necessary changes to increase TCP_MAXRXTSHIFT (including adding appropriate entries to tcp_syn_backoff[] and tcp_backoff[]). This appeared to reduce the frequency of the problem, but not to a usable level.
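
To illustrate why the backoff tables have to grow together with TCP_MAXRXTSHIFT, here is a rough user-space sketch. The multiplier values, the name backoff[] and the helper ticks_until_drop() are assumptions made for illustration; they are not copied from any particular FreeBSD release, and the real timer also depends on the measured RTT and on the kernel's own minimum and maximum clamps.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>

/* Assumed value; verify against tcp_timer.h on the target system. */
#define TCP_MAXRXTSHIFT 12

/* Illustrative multipliers, one entry per value of t_rxtshift (0..max). */
static const int backoff[] = {
    1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 512, 512, 512
};

#define BACKOFF_ENTRIES (sizeof(backoff) / sizeof(backoff[0]))

/* Fails the build if the shift limit is raised without extending the table. */
static_assert(BACKOFF_ENTRIES == TCP_MAXRXTSHIFT + 1,
    "backoff[] needs exactly TCP_MAXRXTSHIFT + 1 entries");

/*
 * Sums the retransmit intervals from the first timer to the drop,
 * for a base interval in ticks and a per-interval cap in ticks.
 */
long ticks_until_drop(long base, long cap)
{
    long total = 0;

    for (size_t shift = 0; shift < BACKOFF_ENTRIES; shift++) {
        long interval = base * backoff[shift];

        if (interval > cap)
            interval = cap;
        total += interval;
    }
    return total;
}
```

The table is declared without an explicit size on purpose: with a fixed size of TCP_MAXRXTSHIFT + 1, a too-short initializer would be silently zero-filled, whereas here the static assertion catches the mismatch at compile time.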

Possible causes are timers that fire too early, resource starvation
(you are doing a lot of traffic), or of course some bug in the code.

As I said in my original email, the data transfer doesn't stop or splutter, it's simply cut mid-flow. Sounds like something happening prematurely.

Thanks for the help,

The output doesn't show any obvious problems. I have to write some
debug code to run on your system. I'll do that later today if time
permits. Otherwise tomorrow.

http://people.freebsd.org/~andre/tcp_output-error-log.diff

Please apply this patch and enable the sysctl net.inet.tcp.log_debug=1
and report any output. You will likely get some (normal) noise from syncache.
What we are looking for is reports from tcp_output.

--
Andre
_______________________________________________
freebsd-net@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscribe@xxxxxxxxxxx"


