Re: read() returns ETIMEDOUT on steady TCP connection



Mark Hills wrote:
On Sun, 20 Apr 2008, Peter Jeremy wrote:

Can you give some more detail about your hardware (speed, CPU,
available RAM, UP or SMP) and the application (roughly what does the
core of the code look like and is it single-threaded/multi-threaded
and/or multi-process).

The current test is a Dell 2650, 2 GB RAM, Quad Xeon with onboard bge.

The application is single threaded, non-blocking multiplexed I/O based on poll(). It's relatively simple at its core -- read() from an inbound connection and write() to outbound sockets.
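
For readers following along, one pass of a relay loop like the one described might look roughly like the sketch below. This is not the poster's code: relay_once(), its parameters and the 4096-byte buffer are hypothetical, and a real application would queue unsent data and poll for POLLOUT instead of skipping a busy socket.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/*
 * One pass of a poll()-driven relay (hypothetical helper): wait for data
 * on in_fd, read() it once, and write() it to every descriptor in out_fds.
 * Returns the number of bytes read, 0 on EOF, or -1 with errno set.
 * A timeout with nothing to read is reported as -1 with errno == EAGAIN.
 */
static ssize_t relay_once(int in_fd, const int *out_fds, size_t n_out,
                          int timeout_ms)
{
    struct pollfd pfd = { .fd = in_fd, .events = POLLIN };
    char buf[4096];
    size_t i;

    assert(out_fds != NULL || n_out == 0);

    int ready = poll(&pfd, 1, timeout_ms);
    if (ready < 0)
        return -1;              /* poll() itself failed */
    if (ready == 0) {
        errno = EAGAIN;         /* nothing arrived within the timeout */
        return -1;
    }

    ssize_t n = read(in_fd, buf, sizeof buf);
    if (n <= 0)
        return n;               /* 0 = EOF, -1 = error such as ETIMEDOUT */

    for (i = 0; i < n_out; i++) {
        ssize_t w = write(out_fds[i], buf, (size_t)n);
        /* A full send buffer is not an error on a non-blocking socket;
         * this sketch simply skips that output for this pass. */
        if (w < 0 && errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;
    }
    return n;
}
```

In this shape, an ETIMEDOUT reported for the inbound connection would surface as relay_once() returning -1 from the read() call.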

As the number of outbound connections increases, the 'output drops'
count increases to around 10% of the total packets sent and maintains that ratio.
There are no problems with network capacity.

'output drops' (ips_odropped) means that the kernel is unable to
buffer the write (no mbufs or send queue full). Userland should see
ENOBUFS unless the error was triggered by a fragmentation request.

The app definitely isn't seeing ENOBUFS; this would be treated as a fatal condition and reported.

A TCP application will never see ENOBUFS. TCP tries to reliably deliver
all data even on temporary memory shortages that prevent it from sending
a segment right now. Only after all those retries have failed will it
report ETIMEDOUT and abort the connection.
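
From the application's side, the distinction above could be expressed roughly as in the sketch below. The enum and classify_io_errno() are hypothetical names, and the grouping of error codes is an assumption about a reasonable policy, not a description of the poster's application (other than that it treats ENOBUFS as fatal).

```c
#include <assert.h>
#include <errno.h>

/* What the caller should do after read() or write() returned -1. */
enum io_action { IO_RETRY, IO_CONN_DROPPED, IO_FATAL };

/* Map an errno value to an action (hypothetical policy). */
static enum io_action classify_io_errno(int err)
{
    assert(err > 0);            /* caller must pass a real errno value */

    switch (err) {
    case EAGAIN:
#if defined(EWOULDBLOCK) && EWOULDBLOCK != EAGAIN
    case EWOULDBLOCK:
#endif
    case EINTR:
        return IO_RETRY;        /* transient: go back to poll() */
    case ETIMEDOUT:             /* TCP gave up retransmitting */
    case ECONNRESET:
    case EPIPE:
        return IO_CONN_DROPPED; /* close this connection, keep serving */
    default:
        return IO_FATAL;        /* includes ENOBUFS, fatal in the app above */
    }
}
```

With a policy like this, the ETIMEDOUT discussed in this thread would close only the affected connection, while an unexpected ENOBUFS would still be reported as fatal.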

I can't explain the problem but it definitely looks like a resource
starvation issue within the kernel.

I've traced the source of the ETIMEDOUT within the kernel to tcp_timer_rexmt() in tcp_timer.c:

	if (++tp->t_rxtshift > TCP_MAXRXTSHIFT) {
		tp->t_rxtshift = TCP_MAXRXTSHIFT;
		tcpstat.tcps_timeoutdrop++;
		tp = tcp_drop(tp, tp->t_softerror ?
		    tp->t_softerror : ETIMEDOUT);
		goto out;
	}
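
To make the counting in that branch concrete, here is a small userland model of it. It is only a sketch: struct conn_model, rexmt_timer_fires() and expiries_until_drop() are hypothetical names, and TCP_MAXRXTSHIFT is assumed to be 12 as in the stock tcp_timer.h.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TCP_MAXRXTSHIFT 12      /* assumed stock value from tcp_timer.h */

/* Only the two tcpcb fields the quoted branch looks at. */
struct conn_model {
    int t_rxtshift;             /* retransmit timer expiries so far */
    int t_softerror;            /* pending soft error, 0 if none */
};

/*
 * Model of one retransmit timer expiry. Returns 0 if the segment would be
 * retransmitted, or the errno that would be handed to tcp_drop().
 */
static int rexmt_timer_fires(struct conn_model *tp)
{
    if (++tp->t_rxtshift > TCP_MAXRXTSHIFT) {
        tp->t_rxtshift = TCP_MAXRXTSHIFT;
        return tp->t_softerror ? tp->t_softerror : ETIMEDOUT;
    }
    return 0;
}

/* Count expiries, with no ACK in between, until the connection is dropped. */
static int expiries_until_drop(int softerror, int *err_out)
{
    struct conn_model tp = { 0, softerror };
    int n = 0, err;

    assert(err_out != NULL);
    do {
        n++;
        err = rexmt_timer_fires(&tp);
    } while (err == 0);
    *err_out = err;
    return n;
}
```

Under these assumptions the model retransmits on the first 12 expiries and drops on the 13th, reporting ETIMEDOUT unless a soft error such as EHOSTUNREACH was recorded first.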

Yes, this is related to either a lack of mbufs to create a segment
or a problem in sending it. That may be a full interface queue, a
bandwidth manager (dummynet) or some firewall internally rejecting
the segment (ipfw, pf). Do you run any firewall in stateful mode?

I'm new to FreeBSD, but it seems to imply that it's reaching a limit on the number of retransmits when sending ACKs on the TCP connection receiving the inbound data? But I checked this using tcpdump on the server and could see no retransmissions.

When you have internal problems the segment never makes it to the
wire and thus you won't see it in tcpdump.

Please report the output of 'netstat -s -p tcp' and 'netstat -m'.

As a test, I ran a simulation with the necessary changes to increase TCP_MAXRXTSHIFT (including adding appropriate entries to tcp_syn_backoff[] and tcp_backoff[]) and it appeared I was able to reduce the frequency of the problem occurring, but not to a usable level.

Possible causes are timers that fire too early, resource starvation
(you are doing a lot of traffic), or of course some bug in the code.

With ACKs in mind, I took the test back to a stock kernel and configuration, and went ahead with disabling SACK on the server and on the client which supplies the data (FreeBSD 6.1, not 7). This greatly reduced the 'duplicate acks' metric, but didn't fix the problem. The next step was to switch off delayed_ack as well, and I didn't see the problem for some hours on the test system at 850 Mbit/s output. But that hasn't eliminated it, as it happened again.

Perhaps someone with a greater knowledge can help to join the dots of all these symptoms?

--
Andre

_______________________________________________
freebsd-net@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscribe@xxxxxxxxxxx"


