odd TCP rtt/retransmit timeout issue...



I was bringing up another interface that I had just added to /etc/rc.conf
and ran /etc/rc.d/netif start to initialize it... but then my connection
never came back. The shell was still alive: I could type commands like
sleep 5, and w from another session would show sleep 5 running in the
hung session. Even filling the send-q with 32k of data didn't get the
HEAD box to send any data to the client...

With the help of silby, I found that the t_rxtcur value in the tcpcb was
getting a very large value: the session that hung had a retransmit
timeout of 19 days... This led us to find that the TCPT_RANGESET macro
was letting very large tvmin values override the saner tvmax values due
to an extra else. I have fixed that, so we shouldn't see any more
multi-day timeouts, but we apparently still have a problem where the
calculated rtt value is wildly incorrect...
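A minimal sketch of the clamping problem, assuming TCPT_RANGESET has the shape "add slop, clamp up to tvmin, clamp down to tvmax" (reconstructed from the description above, not copied from the FreeBSD tree). The slop value of 200 and the helper names rangeset_with_else/rangeset_no_else are hypothetical. With the else, a tvmin that is itself huge (for example one derived from a bogus rtt) is returned unchanged; without it, tvmax always gets the last word.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the kernel's tcp_rexmit_slop tunable, in
 * ticks. The value 200 is an assumption for illustration only.
 */
static long tcp_rexmit_slop = 200;

/* Variant with the extra "else": when the tvmin clamp fires, the tvmax
 * check is skipped, so an oversized tvmin is never capped. */
#define RANGESET_WITH_ELSE(tv, value, tvmin, tvmax) do {    \
	(tv) = (value) + tcp_rexmit_slop;                       \
	if ((unsigned long)(tv) < (unsigned long)(tvmin))       \
		(tv) = (tvmin);                                     \
	else if ((unsigned long)(tv) > (unsigned long)(tvmax))  \
		(tv) = (tvmax);                                     \
} while (0)

/* Variant without the "else": both checks always run, so the tvmax
 * clamp is applied last and bounds the result. */
#define RANGESET_NO_ELSE(tv, value, tvmin, tvmax) do {      \
	(tv) = (value) + tcp_rexmit_slop;                       \
	if ((unsigned long)(tv) < (unsigned long)(tvmin))       \
		(tv) = (tvmin);                                     \
	if ((unsigned long)(tv) > (unsigned long)(tvmax))       \
		(tv) = (tvmax);                                     \
} while (0)

/* Function wrappers (hypothetical) so the two variants are easy to compare. */
long rangeset_with_else(long value, long tvmin, long tvmax)
{
	long tv;

	assert(value >= 0 && tvmin >= 0 && tvmax >= 0); /* all in ticks */
	RANGESET_WITH_ELSE(tv, value, tvmin, tvmax);
	return tv;
}

long rangeset_no_else(long value, long tvmin, long tvmax)
{
	long tv;

	assert(value >= 0 && tvmin >= 0 && tvmax >= 0); /* all in ticks */
	RANGESET_NO_ELSE(tv, value, tvmin, tvmax);
	return tv;
}
```

For example, with value 1000, tvmin 1662654093 and tvmax 64000, rangeset_with_else() returns 1662654093 while rangeset_no_else() returns 64000; for sane inputs such as tvmin 300, both return 1200.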

It appears that each connection gets a different "random" rtt value...
Here is t_rxtcur from a few connections to my machine:
(kgdb) print ((struct tcpcb *)0xc3a34af8)->t_rxtcur
$3 = 64000
(kgdb) print ((struct tcpcb *)0xc3a3457c)->t_rxtcur
$6 = 1662654093
(kgdb) print ((struct tcpcb *)0xc3a343a8)->t_rxtcur
$12 = 1358
(kgdb) print ((struct tcpcb *)0xc3a9e1d4)->t_rxtcur
$17 = 203
(kgdb) print ((struct tcpcb *)0xc3a9e000)->t_rxtcur
$19 = 284155863

Most connections are stable around the "picked" value, though I have
seen some connections oscillate between 64000 and a really large value...
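To put the raw t_rxtcur numbers in perspective, here is a small helper that converts ticks to seconds and days. It assumes the kernel clock runs at hz = 1000, which is an assumption about this machine rather than something read from kgdb; the function names are hypothetical.

```c
#include <assert.h>

/*
 * Convert a t_rxtcur value (kernel ticks) to seconds, assuming the
 * kernel clock runs at `hz` ticks per second.
 */
double ticks_to_seconds(long ticks, int hz)
{
	assert(hz > 0);
	return (double)ticks / hz;
}

/* The same value expressed in days (86400 seconds per day). */
double ticks_to_days(long ticks, int hz)
{
	return ticks_to_seconds(ticks, hz) / 86400.0;
}
```

At hz = 1000, 64000 ticks is 64 seconds (which looks like the tvmax cap), 1662654093 ticks is about 19.2 days, consistent with the 19-day timeout on the hung session, and 284155863 ticks is about 3.3 days.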

I was trying to track this down, and a kernel as of 9/17 exhibits the
problem, but I managed to trace it back to a RELENG_6 commit (which
obviously would affect HEAD too) once I realized that each connection
got a different value, and that in my older tests I had simply been
lucky not to hit a bad timeout...

To obtain these values, I ran kgdb kernel /dev/mem and used the value
from the first column of netstat -Aanfinet as the tcpcb pointer above.
(Why is that column named Socket, when it holds the control block
struct and not the socket struct?)

Anyone want to track down why we are getting such large values in
there? I'll try to backtrack further to see how old this issue is...

--
John-Mark Gurney Voice: +1 415 225 5579

"All that I will do, has been done, All that I have, has not."
_______________________________________________
freebsd-net@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscribe@xxxxxxxxxxx"