Re: Developing a server receiving multiple sockets

From: Nick Landsberg (SPAMhukolauTRAP_at_SPAMworldnetTRAP.att.net)
Date: 09/10/04


Date: Fri, 10 Sep 2004 13:29:12 GMT

Jo wrote:

> sjdevnull@yahoo.com (G. S. Hayes) wrote in message news:<96c2e938.0409090853.344f273d@posting.google.com>...
>
>>JoJoTwilligo@hotmail.com (Jo) wrote in message news:<72ce7f0c.0409081147.66f33886@posting.google.com>...
>>
>>>>If you know that you'll need to do large amounts of processing
>>>>per connection then that will be a different design than if you have a
>>>>mainly IO centric model.
>>>
>>> I don't think any scalability is necessary. I've been asked to
>>>write the thing such that it will be obsolete before the volume gets
>>>to be more than it will handle, so I think we are dealing with the
>>>lower end of a few 1000.
>>
>>The IO vs. computational question is pretty important. Also, are you
>>looking at long-lived connections doing many transactions or are they
>
>
> How long is long? Hours? I doubt it would get to that under
> acceptable circumstances.
>
>
>>one-transaction-per-connection situations? Do you have a lot of
>
>
> What's a transaction? Like a single stream or file? I'd say there
> are several per connection, but not like 100s and 100s.
>
>
>>shared state between connections? Any details you can give about what
>
>
> I'm not sure what you mean by states, but there is no relationship
> between any 2 connections, except that they are vying for the
> attention of the same server.
>
>
>>your server does make it easier to make a good recommendation.
>
>

The number of simultaneous "connections" needed is given
by:

N = (Arrival Rate) * (Service Time)

Where:
(arrival rate) = how often the clients will be
establishing a connection to the system, and:
(service time) = how long they will be connected
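
As a quick sanity check, the formula (known in queueing
theory as Little's law) can be sketched in a few lines of
Python. The function names and the sample figures are made
up for illustration; the only real requirement is that both
inputs use the same time unit.

```python
def simultaneous_connections(arrivals_per_second, service_time_seconds):
    """N = (arrival rate) * (service time), both measured in seconds."""
    return arrivals_per_second * service_time_seconds


def per_minute_to_per_second(arrivals_per_minute):
    """Convert an arrival rate quoted per minute to a per-second rate."""
    return arrivals_per_minute / 60.0


# Hypothetical figures: 120 new connections per minute, each lasting 30 s.
rate = per_minute_to_per_second(120)
print(simultaneous_connections(rate, 30))  # 60.0
```

Note that this gives an average; bursts in the arrival rate
will push the instantaneous number higher.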

The smaller the client population, the more
variation there is likely to be in the instantaneous
arrival rate. So this has to be a SWAG on your part.
One extreme case I encountered was a group of 500 unionized
clerks who all logged in within a minute of 8 AM.
Usually, it's much more spread out than that.

The "service time" computation involves less guesswork.

Using your description below, I would guess that
items 1, 2 and 5 are "in the noise" in comparison
to the total connection time. Items 3 and 4
could be significant.

Assuming that a "row" for item 3 is about 2,000
bytes and that you will be transmitting 100 rows,
that's 200KB worth of traffic.
A quick experiment with a 100 x 200 TIFF image showed it to
be on the order of 60 KB (YMMV, GIFs and JPEGs
usually take up less space). Assuming you are
transmitting on average 10 of these, that's 600KB
worth of traffic, or about 800KB per connection in total.

At broadband speeds, this is rather small. Even a
10baseT LAN (10 Mbits per second) would nominally do that
in less than a second. Say 2 to be safe. Add another
5-10 seconds if there is user interaction up-front
to validate passwords and such, round up to 15 seconds
service time. (Because I'm a professional paranoid :)
Even for the extreme case mentioned above (500 arrivals
per minute), the total number of connections you
would have to have up is about 125.
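
The arithmetic behind that estimate, as a minimal Python
sketch. The row size, row count and image count are the
guesses stated above, not measured values, and the link
rate is the nominal figure with no allowance for protocol
overhead. All constant names are just for illustration.

```python
ROW_BYTES = 2_000                  # assumed size of one table row (item 3)
ROWS = 100                         # assumed number of rows sent
IMAGE_BYTES = 60_000               # rough size of one small TIFF (item 4)
IMAGES = 10                        # assumed average number of images
LINK_BITS_PER_SECOND = 10_000_000  # nominal 10baseT rate

payload_bytes = ROW_BYTES * ROWS + IMAGE_BYTES * IMAGES
transfer_seconds = payload_bytes * 8 / LINK_BITS_PER_SECOND

SERVICE_TIME_SECONDS = 15          # padded estimate, not the raw transfer time
ARRIVALS_PER_MINUTE = 500          # the extreme login burst

# N = (arrival rate) * (service time), with the rate converted to per-second
connections = ARRIVALS_PER_MINUTE * SERVICE_TIME_SECONDS / 60

print(f"payload: {payload_bytes} bytes")
print(f"raw transfer time: {transfer_seconds:.2f} s")
print(f"simultaneous connections: {connections:.0f}")
```

The raw transfer time comes out well under a second, so
nearly all of the 15 seconds is padding and user
interaction.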

On the other hand, if your clients were on a dialup,
the equation changes dramatically. 800 KB takes
around 3-4 MINUTES to transmit at dial-up speeds.
Say 210 seconds (3.5 minutes). That's about
15 times as many simultaneous connections to support.
This may or may not be a problem based on the clients'
work habits. It may or may not be a problem to deny
access to clients past a certain number of simultaneous
users; that depends on your requirements.

From what you have said below, you are already considering
budgeting for the number of simultaneous connections
so I hope that the example exercise above will
give you some pointers on how to refine that budget.
(If I've been preaching to the choir and stating the
obvious, forgive me.)
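
For the dial-up case, the same exercise can be sketched in
Python. The modem rates below are assumed nominal figures
picked for illustration, and admit() is a hypothetical
helper showing one way to refuse clients past a budgeted
limit; none of this comes from your actual requirements.

```python
PAYLOAD_BYTES = 800_000            # table plus images, per connection


def transfer_seconds(payload_bytes, bits_per_second):
    """Raw transfer time, ignoring protocol overhead and compression."""
    return payload_bytes * 8 / bits_per_second


# Assumed modem rates in bits per second (illustrative picks)
for bps in (28_800, 33_600, 56_000):
    print(f"{bps} bit/s: {transfer_seconds(PAYLOAD_BYTES, bps):.0f} s")

BROADBAND_SERVICE_SECONDS = 15
DIALUP_SERVICE_SECONDS = 210
ARRIVALS_PER_MINUTE = 500

ratio = DIALUP_SERVICE_SECONDS / BROADBAND_SERVICE_SECONDS
dialup_connections = ARRIVALS_PER_MINUTE * DIALUP_SERVICE_SECONDS / 60
print(f"ratio: {ratio:.0f}x, connections: {dialup_connections:.0f}")


def admit(active_connections, limit):
    """Hypothetical admission check: refuse clients once the limit is hit."""
    return active_connections < limit
```

The connection count only applies if the burst rate were
sustained for the whole 210 seconds. With a fixed
population of 500 clients, the ceiling is 500.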

As far as using threading vs. fork()/exec(), go with
what you are most comfortable with. From your description
it appears that what this application does is mostly
read from disk and write to the network, so CPU usage
should not be a major consideration, IMO. It seems
to be mostly I/O bound, even with the pseudo-encryption.
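
To illustrate the I/O-bound point, here is a minimal Python
sketch of the select()-style readiness loop mentioned in
the quoted text below. It is an assumption-laden toy, not a
recommendation: the handler just echoes data, the helper
names are invented, and a connected socket pair stands in
for a real listening socket so that it runs without opening
a port. A real server would also accept() new clients and
use nonblocking sockets.

```python
import selectors
import socket


def make_echo_handler(sel):
    """Return a callback that echoes whatever a ready socket sends."""
    def handle(conn):
        data = conn.recv(4096)
        if data:
            conn.sendall(data)     # fine for small replies in a sketch
        else:                      # empty read: the peer closed the connection
            sel.unregister(conn)
            conn.close()
    return handle


def serve_once(sel, timeout=1.0):
    """One pass of the loop: service every socket that is readable."""
    events = sel.select(timeout)
    for key, _mask in events:
        key.data(key.fileobj)      # key.data holds the registered callback
    return len(events)


sel = selectors.SelectSelector()   # select()-based, level-triggered
client, server_side = socket.socketpair()
sel.register(server_side, selectors.EVENT_READ, make_echo_handler(sel))

client.sendall(b"account info")
serve_once(sel)
print(client.recv(4096))
```

One thread can service many sockets this way because it
only touches those that are ready, which suits a workload
that spends most of its time waiting on disk and network.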

NPL

> It's easy enough to describe. A typical connection starts with:
> 1. the client sending account information
> 2. account acknowledgement is established
> 3. a table of about 100 columns and I don't know how many rows is
> sent.
> 4. After that a series of image files is sent. I don't know how many
> files, but the size of the images are something like 200x100 at the
> most.
> 5. Finally, some small wrap up, and we're done.
> All of this is done through a 32-bit encryption scheme which is really
> there just to impress the user, not to provide any serious security to
> data no one really cares about.
> The files received are stored in a MySQL database. From what I can
> see, there doesn't look like any appreciable computational constraint;
> the encryption/decryption should only take a few cycles per long word.
> The constraint I'm concerned with is based on the number of
> simultaneous connections. From what I can tell, if I budget for the
> better part of 1000 connections, I'll have it well taken care of.
> Looking at the C10K problem at http://www.kegel.com/c10k.html
> provided by James, the scheme I like best is "Serve many clients with
> each thread, and use nonblocking I/O and level-triggered readiness
> notification" with select() and thread pooling. The only bad thing it
> says about select() is that it is limited to FD_SETSIZE file
> descriptors. Testing this old RH7.3 system, its FD_SETSIZE is 1024,
> which is well above what I need. Furthermore, I propose that the
> server's admin has the option of choosing how many threads will be in
> use. That way he can take care of his own performance problems.
> On the other hand, I could have 10 processes going, each capable of
> accepting a 70 descriptor capacity, and leave it at that. What do you
> think?
>
>
>>> However, I've been seeing a lot of talk about fork(). Doesn't that
>>>create a new process? Even if I have 50 connections going at one time,
>>>that'll be a heck of a lot more processes than I've ever seen doing a
>>>ps aux.
>>
>>Apache by default uses a process per connection (it keeps a pre-forked
>>process pool)--if you have 50 simultaneous connections it will have 50
>>processes to serve them. In version 2.0 there's an option to use
>>threads but there are good reasons not to do that in some cases.
>>
>>The pre-forked one process per connection model has some pretty large
>>robustness and ease of development advantages (particularly in making
>>it easier for people writing their own in-process modules by having
>>good memory protection). It certainly isn't the best-performing model
>>but it is sufficient for some very high-traffic web sites.
>>
>>It might not be appropriate for your task; without more details it's
>>tough to say.

-- 
"It is impossible to make anything foolproof
because fools are so ingenious"
  - A. Bloch

