Re: REMINDER: Re: HEADS UP: network stack and socket hackery over the next few weeks (fwd)
- From: Robert Watson <rwatson@xxxxxxxxxxx>
- Date: Wed, 29 Mar 2006 20:52:46 +0000 (GMT)
Reminder that the following is going on on current@. Replies to there, please.
Robert N M Watson
---------- Forwarded message ----------
Date: Wed, 29 Mar 2006 12:05:51 +0000 (GMT)
From: Robert Watson <rwatson@xxxxxxxxxxx>
To: current@xxxxxxxxxxx
Cc: Randall Stewart <rrs@xxxxxxxxx>
Subject: Re: REMINDER: Re: HEADS UP: network stack and socket hackery over the
next few weeks
On Wed, 29 Mar 2006, Robert Watson wrote:
As a reminder, April 1 is now three days away. On April 1, I will be committing an extensive set of socket and netinet changes which will likely render the network stack broken. I say this with some confidence because I have tested the changes fairly extensively, as have a number of other developers, and they appear to mostly work. Therefore, they will be broken :-). I will be posting updated versions of these patches shortly, but unless we run into show-stopping instability with them, rather than nits, I will commit them (in their updated form) on April 1 shortly after the netatm build is disabled.
I will post another HEADS UP as the changes go into the tree, and will be monitoring things closely to try to get any bugs that turn up fixed as quickly as possible. As an FYI, I will be travelling from April 6 to April 21, but will be online frequently, and will be working for several days in the Bay Area during the trip. Please report bugs relating to this work to current@.
An updated version of the patch is now available for download at:
http://www.watson.org/~robert/freebsd/netperf/20060329-rwatson_sockref.diff
Earlier versions of the patch may be found in the same directory in similarly named files. The working branch maintaining these changes may be found in Perforce at:
//depot/user/rwatson/sockref/...
As a high level recap, the following classes of changes appear in this patch:
- The socket code no longer relies on reading so_pcb as a hint regarding
  protocol behavior and shutdown. This eliminates a number of races, and
  means that only the protocol is responsible for reading and maintaining
  the field, and can synchronize it as desired.
- All protocols have been converted to maintain the invariant that so_pcb is
  non-NULL and points to a valid PCB at all times while the socket is valid.
  Depending on the protocol, this change either removed a number of crashes
  and races, or eliminated heavy-weight locking to maintain the validity of
  so_pcb during use by the socket layer.
- In some cases, this required significant rewriting of state management,
  specifically for IPX/SPX and TCP/IP. SPX and TCP now maintain DROPPED
  flags on their inpcbs to reflect the state previously identified through a
  NULL so_pcb pointer.
- Protocols can now explicitly request that a socket not be freed on last
  consumer reference, using the SS_PROTOREF flag, so that they can continue
  to access the socket buffer until it is no longer required; e.g., TCP
  after socket close() but before the final ACKs from the remote endpoint
  for sent data. sotryfree() is eliminated. TCP has gained an inpcb flag to
  reflect this condition.
- Improved documentation of kernel socket API calls, which will be followed
with man pages once things are hammered out a bit more.
- fgetsock() and fputsock() are deprecated, with long-term plans to eliminate
the use of soref() and sorele() for consumer use. Consumers now receive a
reference to a socket using socreate(), and release it using soclose(), in
order to avoid use of sockets after close. Consumer reference counts, such
as file descriptor reference counts, should be used in preference, as this
offers cleaner behavior at the socket layer, and also avoids additional
  mutex operations. Some consumers still remain, but have been annotated.
- pru_abort and pru_detach are no longer allowed to fail. Garbage collection
  of the socket after these, assuming SS_PROTOREF isn't set, is unconditional,
  and not a property of the error value returned.
- Protocols now only call sofree() if they have claimed SS_PROTOREF. They
  no longer attempt to spontaneously free sockets in numerous situations in
  the hope of not leaking them, since socket teardown is now well-defined.
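The teardown rules above (consumer close, SS_PROTOREF, and protocols calling sofree() only after claiming SS_PROTOREF) can be illustrated with a small userland model. This is a sketch under stated assumptions, not the kernel code: struct model_socket, the MODEL_SS_* flag values, and the model_* functions are hypothetical names invented for illustration, and MODEL_SS_NOFDREF is a simplified stand-in for "the last consumer reference has been released".

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical flag values for this userland model only; they are not
 * claimed to match the kernel's definitions.
 */
#define MODEL_SS_NOFDREF  0x0001  /* last consumer (e.g. fd) reference released */
#define MODEL_SS_PROTOREF 0x0002  /* protocol asked to keep the socket alive */

/* Hypothetical stand-in for struct socket; "freed" replaces real memory release. */
struct model_socket {
	int  so_state;
	bool freed;
};

/*
 * Free the socket only when no consumer reference remains and the
 * protocol has not claimed it with the PROTOREF flag.
 */
static void
model_sofree(struct model_socket *so)
{
	assert(!so->freed);	/* freeing twice would be a bug */
	if ((so->so_state & MODEL_SS_NOFDREF) == 0 ||
	    (so->so_state & MODEL_SS_PROTOREF) != 0)
		return;
	so->freed = true;
}

/*
 * Consumer close.  In the kernel the protocol's pru_detach would run here;
 * since it may no longer fail, this model has no error path and teardown
 * does not depend on any returned error value.
 */
static void
model_soclose(struct model_socket *so)
{
	so->so_state |= MODEL_SS_NOFDREF;
	model_sofree(so);
}

/* Protocol claims the socket, e.g. TCP with unacknowledged data at close(). */
static void
model_protocol_claim(struct model_socket *so)
{
	so->so_state |= MODEL_SS_PROTOREF;
}

/*
 * Protocol drops its claim, e.g. once the final ACKs arrive.  Only a
 * protocol that set the PROTOREF flag may call sofree() on this path.
 */
static void
model_protocol_release(struct model_socket *so)
{
	assert((so->so_state & MODEL_SS_PROTOREF) != 0);
	so->so_state &= ~MODEL_SS_PROTOREF;
	model_sofree(so);
}
```

For example, a TCP-like protocol would call model_protocol_claim() when close() arrives with data still unacknowledged; model_soclose() then leaves the socket in place, and model_protocol_release() frees it once the final ACKs arrive. Keeping the free decision in one function reflects the point above that garbage collection is no longer a property of the error value returned by pru_detach or pru_abort.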
The following protocols are updated, tested, and believed to work in the new world order:
uipc_usrreq
net (raw, routing)
netinet
netinet6
netipx
netatalk
The following protocols are updated for the new world order, but not tested:
netnatm
ng_socket
netipsec
netinet6/ipsec
netkey
The following protocols are not updated for the new world order, but the maintainer is aware of these changes and plans to update the protocol in the immediate future:
ng_btsocket
The following protocols are not updated for the new world order, and do not have a maintainer:
netatm
I will commit the changes to make netatm compile, but am pretty sure there will be socket reference problems. Please see posts on arch@ on this topic for more information.
As with all significant kernel changes, these changes likely include significant bugs, which you, the -current user, will have the opportunity to help me find. I will attempt to respond as quickly as I can, although debugging complex network stack issues can, of course, be tricky and take some time. Hopefully these changes will, in the long term, improve both the stability and performance of the FreeBSD stack, by sanitizing and sanifying otherwise obscure and often broken behavior, and eliminating several subtle types of race conditions that may have been responsible for occasional network instability reported in RELENG_5 and RELENG_6 (and in some cases, RELENG_4). I do expect the ride to be bumpy initially, though.
Robert N M Watson
_______________________________________________
freebsd-current@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@xxxxxxxxxxx"
_______________________________________________
freebsd-net@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscribe@xxxxxxxxxxx"