Re: TCP/IP seems to fall over.
From: Bela Lubkin (belal_at_sco.com)
Date: 04/26/03
Date: 26 Apr 2003 08:32:40 -0000
Demian Alexander Phillips wrote:
> The machine is an IBM Xseries (250 I think) server and the on-board NIC
> in use is based on the AMD PCNet PCI chipset.
> All netstat output looks normal. Nothing looked out of place or too
> high. There are lots of UDP datagrams with dest unreach, but according
> to information I can find that's normal. I will try to get in before
> they reset TCP/IP or reboot the next time it happens.
> The changes I made were to take the TCP connections value to 1024 and
> Pseudo TTYs up to 512; this is a 60 user system. That's
> what the TA recommends. The TA is about running out of streams memory,
> something I have seen before, and sadly this isn't it.
> When I mentioned the outside world there, I meant that communications
> via TCP/IP no longer completely stop and users do not all lose their
> sessions. Now it continues to operate, but at such a craptastic
> performance level that it might as well be dead.
and originally in
http://www.google.com/groups?selm=50a7f455.0302111507.62f558c1@posting.google.com,
> I have a SCO 5.0.6A box (patched up afaik) that seems to, with some
> regularity, have its TCP/IP die.
> ps, sar, netstat, top, hog, etc... all show a normal idle system.
> Users can not log in.
> Once I run "tcp stop" and "tcp start" suddenly everything works again
> just fine and there are no problems till the next time.
> The only thing I can find in syslog is an entry about the same time as
> TCP dies saying:
> telnetd[24318]: ttloop: peer died: Unknown error
There is probably something that someone would see in the output of
these various utilities, if you showed it to us. You aren't seeing it,
but that only means that it looks normal _to you_. Since something is
obviously not normal, you need other eyes on that data.
You've got a system that has at least two, maybe three distinct states.
There's "normal", "slow", and maybe "incommunicado". Capture a set of
statistics in the normal state, then in the slow state. Don't try to
look at the slow-state stats without having normal-state examples at
hand. If you still can't see it, post both sets.
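One way to make the normal-versus-slow comparison concrete is sketched below in Bourne shell. It assumes each state's command output has already been saved as `.txt` files with matching names in its own directory; the function name `compare_states` and the directory layout are hypothetical, not anything from this thread.

```shell
# compare_states NORMAL_DIR SLOW_DIR
# For every *.txt file saved in the normal-state directory, show how the
# same-named file from the slow-state directory differs from it.
# (Hypothetical helper; the directory layout is an assumption.)
compare_states() {
    normal=$1
    slow=$2
    for f in "$normal"/*.txt; do
        name=`basename "$f"`
        if [ -f "$slow/$name" ]; then
            echo "=== $name ==="
            # diff exits non-zero when the files differ; that is expected here
            diff "$f" "$slow/$name"
        else
            echo "=== $name: missing from $slow ==="
        fi
    done
    return 0
}
```

Called as `compare_states ./normal ./slow`, this prints one section per saved command. Counters will naturally differ between any two runs, so the diff is only a starting point for knowing where to look.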
Commands whose output might be interesting/relevant (_not_ by any means
comprehensive):
  netstat -in   (run twice with a fixed delay between, say 10s, to show
                 flow rates)
  ndstat -l     (ditto; but too much to post)
  netstat -rn
  netstat -ma
  ifconfig -a
  `sar` flags ubcdrw, done as:
    sar -u 1 10
    sar -b 1 10
    etc
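The command list above could be wrapped in a small capture script so that the same set is collected in each state. This is only a sketch: the function names `run` and `capture_state`, the `STATS_DIR`, `DELAY` and `SAR_ARGS` variables, and the output file names are all made up for illustration. Only the commands themselves come from the list above, and any command missing on a given system just leaves its error message in the corresponding file.

```shell
# run NAME CMD...: save CMD's stdout and stderr to $OUT/NAME.txt,
# preceded by a comment line recording the command that was run.
run() {
    name=$1
    shift
    { echo "# $*"; "$@"; } > "$OUT/$name.txt" 2>&1
}

# capture_state LABEL: collect the suggested diagnostics into
# $STATS_DIR/LABEL-<timestamp>/ and print that directory's name.
# STATS_DIR, DELAY and SAR_ARGS are hypothetical tuning knobs.
capture_state() {
    label=${1:-normal}
    OUT=${STATS_DIR:-./netstats}/$label-`date +%Y%m%d-%H%M%S`
    mkdir -p "$OUT" || return 1

    # Two samples a fixed delay apart, so flow rates can be derived
    run netstat-in-1 netstat -in
    run ndstat-l-1 ndstat -l
    sleep "${DELAY:-10}"
    run netstat-in-2 netstat -in
    run ndstat-l-2 ndstat -l

    run netstat-rn netstat -rn
    run netstat-ma netstat -ma
    run ifconfig-a ifconfig -a

    # One sar report per flag: sar -u 1 10, sar -b 1 10, and so on
    for flag in u b c d r w; do
        run sar-$flag sar -$flag ${SAR_ARGS:-1 10}
    done

    echo "$OUT"
}
```

Running `capture_state normal` while the system is healthy and `capture_state slow` during an episode would leave two directories with identically named files, which keeps the two sets easy to put side by side.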
> Users can not log in.
How are they trying to log in? What are the symptoms? Give exact error
messages...
>Bela<