sockets, closing and TIME_WAIT



For my thesis I've written a server that has to handle a lot of
clients at the same time (the precise purpose of the service doesn't
matter here).

Under heavy load the server can't keep up, because the sockets aren't
actually closed or TIME_WAIT takes a really long time.

For example: my server should be able to handle 10 clients connecting
each second (querying for information).


/* ***************** loop listens for clients ***************** */
while( listening ) {

    /*
     * server waits indefinitely
     */
    rset = rset_all;
    select(maxfdpl, &rset, NULL, NULL, NULL);

    /*
     * new client?
     */
    pos = -1;
    if( FD_ISSET(listen_sd, &rset) ){
        pos = getFreePosition(connections);

        // extra IF here to avoid unnecessary calls to the getFreePosition() function
        if( pos != -1 ) {
            len = sizeof(client);
            recv_sd = accept(listen_sd, reinterpret_cast<struct sockaddr*>(&client), &len);

            if( recv_sd != -1 ) {
                //do something
            } else {
                throw ASICexception(msg);
            }

        } else {
            cout << "Max number of clients reached.\nWaiting until clients finish up";

            // "patch" included for cleaning up the backlog
            len = sizeof(client);   // accept() overwrites len, so reset it before every call
            recv_sd = accept(listen_sd, reinterpret_cast<struct sockaddr*>(&client), &len);
            if( recv_sd != -1 ) {
                close_socket(recv_sd);
            }
        }
    }
}
/* ***************** end of loop ***************** */





My socket is closed this way:

int close_socket(int sd) {
    fcntl(sd, F_SETFL, O_NONBLOCK);
    shutdown(sd, SHUT_RDWR);
    char tmp[100];
    recv(sd, tmp, 100, 0);

    return close(sd);
}
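
For comparison, a commonly described alternative to the close above is to shut down only the write side, read until recv() returns 0, and only then call close(). The following is a minimal sketch of that idea, not a tested fix for the problem in this post. The name close_socket_graceful is hypothetical, and the sketch assumes a blocking socket.

```cpp
#include <cassert>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper (not part of the original server):
// half-close, drain whatever the peer still sends, then close.
// Returns the result of close(), like close_socket() above.
int close_socket_graceful(int sd) {
    assert(sd >= 0);

    // Send FIN, but keep the read side open so pending data can still arrive.
    shutdown(sd, SHUT_WR);

    // Read until the peer closes (recv() == 0) or an error occurs (-1).
    // This blocks for as long as the peer keeps the connection open; a real
    // server would bound the wait, e.g. with SO_RCVTIMEO or select().
    char tmp[100];
    while (recv(sd, tmp, sizeof(tmp), 0) > 0) {
        // discard the data
    }

    return close(sd);
}
```

Closing a socket that still has unread data in its receive buffer typically makes TCP send a reset instead of a normal FIN, which is why the sketch drains before close().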



After accept() I configure the socket:

int no = 0;
res = setsockopt(sd, SOL_SOCKET, SO_KEEPALIVE, &no, sizeof(no));
if( res < 0 )
    dbg->write("Unable to set KEEPALIVE socket option", __FUNCTION__, loglevel);

int yes = 1;
setsockopt(sd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));
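
As far as I know, SO_REUSEADDR is consulted when bind() is called, so it is normally set on the listening socket before bind() rather than on sockets returned by accept(). A minimal sketch of that ordering follows. The helper name make_listener, the loopback address, and the port and backlog parameters are assumptions for illustration only, not taken from the server above.

```cpp
#include <cassert>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: create a listening TCP socket with SO_REUSEADDR
// set before bind(). Returns the descriptor, or -1 on failure.
int make_listener(unsigned short port, int backlog) {
    assert(backlog > 0);

    int sd = socket(AF_INET, SOCK_STREAM, 0);
    if (sd == -1)
        return -1;

    // Must happen before bind() to have any effect on the bind itself.
    int yes = 1;
    if (setsockopt(sd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes)) == -1) {
        close(sd);
        return -1;
    }

    sockaddr_in addr = sockaddr_in();
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  // assumption: loopback only
    addr.sin_port = htons(port);                    // 0 lets the kernel pick a port

    if (bind(sd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == -1 ||
        listen(sd, backlog) == -1) {
        close(sd);
        return -1;
    }
    return sd;
}
```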





I used to set the SO_LINGER option without a delay, but the Unix socket
FAQ (ftp://rtfm.mit.edu/pub/usenet/news.answers/unix-faq/socket) says
you should let TCP go through TIME_WAIT so the connection closes
correctly.
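
For reference, this is roughly what setting SO_LINGER without a delay looks like. It is a sketch only: the helper name set_abortive_close is hypothetical, and the code I used earlier is not shown in this post. With l_onoff = 1 and l_linger = 0, close() on a TCP socket discards unsent data and resets the connection, so the socket does not enter TIME_WAIT.

```cpp
#include <cassert>
#include <sys/socket.h>

// Hypothetical helper: enable SO_LINGER with a zero timeout.
// Returns 0 on success and -1 on error, like setsockopt().
int set_abortive_close(int sd) {
    assert(sd >= 0);

    struct linger lg;
    lg.l_onoff = 1;   // linger enabled
    lg.l_linger = 0;  // zero timeout: close() aborts the connection

    return setsockopt(sd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```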

Now, if you look at the while loop, you'll find this:

    pos = getFreePosition(connections);

This returns a free position in the array of connections. A position is
considered free if and only if its socket is no longer in use.

I test that with:

bool socket_exists(int sd) {
    int res = fcntl(sd, F_GETFL, 0);

    return (res != -1);
}
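
One property of this test is worth spelling out: fcntl(sd, F_GETFL) fails only when sd is not an open descriptor in this process. A socket whose client has already disconnected therefore still counts as "in use" until the server itself calls close() on it. The sketch below demonstrates this with a local socket pair. The function exists_after_peer_close is hypothetical, and socket_exists() is repeated only to keep the example self-contained.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

// Same test as in the post, repeated so the sketch compiles on its own.
bool socket_exists(int sd) {
    return fcntl(sd, F_GETFL, 0) != -1;
}

// Hypothetical demo: does our end still "exist" after the peer has
// gone away, but before we call close() ourselves?
bool exists_after_peer_close() {
    int sv[2];
    int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
    assert(rc == 0);
    (void)rc;

    close(sv[1]);                              // the peer disconnects
    bool still_there = socket_exists(sv[0]);   // our descriptor is still open

    close(sv[0]);                              // only now is it released
    return still_there;
}
```

If that reading is right, a position in the connections array can only become free once close() has actually been called on its descriptor.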



I've already included a "backlog patch" that closes the client
connection immediately when we can't serve it right away (because there
aren't any free positions).


The server can run for about an hour without any real problems, but
after that clients can't connect anymore because the getFreePosition()
function can't find any inactive or unused sockets.

My netstat output looks like this:

$ netstat -s -tcp
Tcp:
51165 active connections openings
39897 passive connection openings
853 failed connection attempts
23182 connection resets received
6 connections established
638389 segments received
657288 segments send out
9046 segments retransmited
4 bad segments received.
9490 resets sent
TcpExt:
723 resets received for embryonic SYN_RECV sockets
31269 TCP sockets finished time wait in fast timer
3 time wait sockets recycled by time stamp
142 packets rejects in established connections because of timestamp
15440 delayed acks sent
138 delayed acks further delayed because of locked socket
Quick ack mode was activated 1090 times
3515 times the listen queue of a socket overflowed
3515 SYNs to LISTEN sockets ignored
321 packets directly queued to recvmsg prequeue.
29383 of bytes directly received from backlog
15884 of bytes directly received from prequeue
130013 packet headers predicted
357 packets header predicted and directly queued to user
90550 acknowledgments not containing data received
104433 predicted acknowledgments
1616 congestion windows recovered after partial ack
0 TCP data loss events
71 timeouts after SACK recovery
1 timeouts in loss state
3988 other TCP timeouts
1152 DSACKs sent for old packets
29 DSACKs received
13733 connections reset due to unexpected data
93 connections reset due to early user close
136 connections aborted due to timeout



Does anybody have any idea what I could be doing wrong?
I've searched a lot for this problem and tried varying the socket
options. None of them seems to resolve the problem completely...
