Re: Network not responding on idle SCO 5.0.5 system.
- From: "Steve M. Fabac, Jr." <smfabac@xxxxxxx>
- Date: Wed, 06 Aug 2008 10:12:19 -0500
Pat Welch wrote:
Steve M. Fabac, Jr. wrote:
I've a client with two SCO 5.0.5 boxes; one is the live
server and the second is a hot spare. The servers have
separate SCO 5.0.5 Enterprise licenses and 25-user
licenses add-on.
Both servers were rebooted due to inability to access the
live server or the backup server via telnet on 8/1 (while
I was on vacation). On 8/4 I was informed that the problem
occurred again but as I was in the car between locations,
I was unable to assist with diagnosing the problem.
I advised the client to reboot the live system and leave the
backup server alone until I returned to my office to attempt to
connect to the backup server remotely.
Both servers are 5.0.5 and fully patched with the latest
patchck version:
Gathering patch information... Please wait...
INSTALLED currently on failover.XXXX.com
--------------------------------------------------------------------
oss471e oss471e - OpenServer Supplement oss471e
oss497c Core OS supplement for 5.0.5
oss600a Year 2000 Supplement for 5.0.5
oss640a Bind supplement for 5.0.5
oss642a Cron supplement for 5.0.5
oss646c Processor supplement for 5.0.5
oss663a oss663a - OpenServer Supplement oss663a
rs505a Release Supplement for OSR5.0.5
system is up-to-date as of July 7, 2008
When I returned to the office, I tried to log in via ssh from the
live system to the backup system after first logging in via SSH from
my office to the live system. Ssh exited with the message
"connection refused."
Rcmd failover ps -ef also failed. (error message not saved)
<snip>
Have you looked in syslog for evidence of hacking attempts?
Yes, none in evidence since change to sshd.conf in April
I've seen OS5 systems lose spooling and net functions after heavy automated login attempts from the internet, usually from our 'friends' working from Korean ISPs (at least on the West coast).
I assume you have mapped SSH to some other port than 22?
Aug 1 18:20:47 failover sshd[710]: Server listening on 0.0.0.0 port 29392.
I don't see anything that pops out as a problem (other than, of course, prngd and sshd just not running).
Further information: The live system went incommunicado after my post, and
I was able to have the client restart prngd and sshd so that I could log in
to the system before having them reboot. The live system had the same type
of netstat -a listing: apart from sshd, only scohelp and something on
UDP port 488 were listening:
Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address Foreign Address (state)
tcp 0 48 vet.29392 adsl-65-64-102-9.4957 ESTABLISHED
tcp 0 0 *.29392 *.* LISTEN
tcp 0 0 *.scohelp *.* LISTEN
udp 0 0 *.488 *.*
Active UNIX domain sockets
Address Type Recv-Q Send-Q Conn Addr
fcfa6b50 stream 0 0 0 /usr/local/var/prngd/prngd-pool
fcfa6ac0 stream 0 0 fcfa6a30
Executing /etc/tcp start turned on all the usual LISTEN services, and
the users were once again able to log in to the system without rebooting:
Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address Foreign Address (state)
tcp 0 0 *.printer *.* LISTEN
tcp 0 0 *.smux *.* LISTEN
tcp 0 0 *.imap *.* LISTEN
tcp 0 0 *.pop3 *.* LISTEN
tcp 0 0 *.time *.* LISTEN
tcp 0 0 *.daytime *.* LISTEN
tcp 0 0 *.chargen *.* LISTEN
tcp 0 0 *.discard *.* LISTEN
tcp 0 0 *.echo *.* LISTEN
tcp 0 0 *.tcpmux *.* LISTEN
tcp 0 0 *.shell *.* LISTEN
tcp 0 0 *.telnet *.* LISTEN
tcp 0 0 *.ftp *.* LISTEN
tcp 0 48 vet.29392 adsl-65-64-102-9.4957 ESTABLISHED
tcp 0 0 *.29392 *.* LISTEN
tcp 0 0 *.scohelp *.* LISTEN
udp 0 0 localhost.ntp *.*
udp 0 0 vet.ntp *.*
udp 0 0 vetreal.ntp *.*
udp 0 0 *.ntp *.*
udp 0 0 *.2086 *.*
udp 0 0 *.snmp *.*
udp 0 0 *.time *.*
udp 0 0 *.daytime *.*
udp 0 0 *.chargen *.*
udp 0 0 *.discard *.*
udp 0 0 *.echo *.*
udp 0 0 *.ntalk *.*
udp 0 0 *.biff *.*
udp 0 0 *.tftp *.*
udp 0 0 *.route *.*
udp 0 0 *.488 *.*
Active UNIX domain sockets
Address Type Recv-Q Send-Q Conn Addr
fcfa8920 stream 0 0 0 /dev/printer
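The two netstat -a listings above can also be compared mechanically rather than by eye. The following is only a rough sketch, assuming the listings have been saved as text and copied to a machine that has Python; the function names listening_services and restored_services are made up for illustration and are not part of any SCO tool.

```python
def listening_services(netstat_text):
    """Return (proto, service) pairs for TCP listeners and UDP sockets."""
    services = set()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Socket rows look like: proto recv-q send-q local foreign [state]
        if len(fields) < 5 or fields[0] not in ("tcp", "udp"):
            continue  # headers and UNIX domain socket rows
        proto, local = fields[0], fields[3]
        state = fields[5] if len(fields) > 5 else ""
        if proto == "tcp" and state != "LISTEN":
            continue  # skip established connections
        # The local address is host.service; the service follows the last dot.
        services.add((proto, local.rsplit(".", 1)[-1]))
    return services


def restored_services(before_text, after_text):
    """Services present in the second listing but absent from the first."""
    return sorted(listening_services(after_text) - listening_services(before_text))
```

Fed the listing taken before /etc/tcp start and the one taken after, restored_services returns the services that had gone missing, such as telnet and ftp.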
However, I soon got a call when they were unable to print.
I logged back in and had to run /usr/lib/lpsched to restart the print
spooler.
For grins, I enabled process accounting, and when I tried to start accounting
I was informed that cron was not running, so I had to restart cron as well:
crontab: cron may not be running - call your system administrator: No such device or address (error 6)
Accounting is now enabled for use
To start accounting, run: /usr/lib/acct/startup
# ps -ef | grep crontab
# ps -ef | grep cron
# # /etc/rc2.d/P75cron start
# ! *** cron started *** pid = 3155 Tue Aug 5 13:37:39 2008
I also noted that syslogd was not running, and when I investigated, found that
syslogd had not started when the system was rebooted on 8/1 and 8/4. After I
manually started syslogd, /usr/adm/syslog was updated with the boot-up information
generated on 8/1, but logged as occurring at the time I manually executed
/etc/syslogd:
# tail -f /usr/adm/syslog
Aug 1 16:43:02 treal ftpd[3298]: #2 open of pid file failed: No such file or directory
Aug 1 16:44:37 treal ftpd[3389]: #2 open of pid file failed: No such file or directory
Aug 1 16:44:38 treal ftpd[3390]: #2 open of pid file failed: No such file or directory
Aug 1 16:46:25 treal ftpd[3456]: #2 open of pid file failed: No such file or directory
Aug 1 16:46:25 treal ftpd[3457]: #2 open of pid file failed: No such file or directory
Aug 1 16:51:25 treal lockd[440]: term_nlm():
Aug 1 16:51:25 treal lockd[440]: nlm lock server died! exiting.
Aug 1 16:56:07 treal TLW param1=-1
Fri Aug 1 16:56:07 CDT 2008 reboot initated
Mon Aug 4 18:19:18 CDT 2008 shutdown initiated
Wow! syslogd did not restart on the Friday or Monday reboot!!
Strange, the "Mon Aug 4 18:19:18" entry above had to be written to
/usr/adm/syslog in real time as it existed even without syslogd running
since the reboot on 8/1.
After I manually restarted /etc/syslogd:
Aug 1 16:51:25 treal lockd[440]: term_nlm():
Aug 1 16:51:25 treal lockd[440]: nlm lock server died! exiting.
Aug 1 16:56:07 treal TLW param1=-1
Fri Aug 1 16:56:07 CDT 2008 reboot initated
Mon Aug 4 18:19:18 CDT 2008 shutdown initiated
Aug 5 14:26:25 treal syslogd: restart
Aug 5 14:26:25 treal SCO OpenServer(TM) Release 5
Aug 5 14:26:25 treal
Aug 5 14:26:25 treal (C) 1976-1998 The Santa Cruz Operation, Inc.
Aug 5 14:26:25 treal (C) 1980-1994 Microsoft Corporation
Aug 5 14:26:25 treal All rights reserved.
Aug 5 14:26:25 treal
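Since cron, syslogd, lpsched, prngd and sshd were each found stopped at some point above, a saved ps -ef listing could be checked for all of them in one pass. This is a minimal sketch, not a tested SCO utility: the EXPECTED list and the function name missing_daemons are assumptions for illustration, and the matching is deliberately loose (any field whose basename equals a daemon name counts as running).

```python
import os

# Daemons this thread found stopped at one time or another (assumed list).
EXPECTED = ("cron", "syslogd", "lpsched", "sshd", "prngd")


def missing_daemons(ps_text, expected=EXPECTED):
    """Return the expected daemons that do not appear in ps -ef output."""
    running = set()
    for line in ps_text.splitlines():
        # Compare the basename of every field, so /etc/cron matches "cron".
        names = {os.path.basename(tok) for tok in line.split()}
        if "grep" in names:
            continue  # ignore the grep from "ps -ef | grep cron" itself
        running.update(names.intersection(expected))
    return [d for d in expected if d not in running]
```

An empty result means every listed daemon showed up in the listing; it says nothing about whether the daemon is actually answering requests.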
Possibly NICs going flaky and retransmitting like crazy. Ditto a bad port on your router causing flooding of the network.
We've been fighting a stream leak since March 2008 as shown in the
data I log every five minutes from cron:
Tue Mar 25 16:10:03 CDT 2008
Tue Mar 25 16:10:17 CDT 2008 streams memory in use: 1700.73KB
Tue Mar 25 16:15:00 CDT 2008 streams memory in use: 1706.34KB
Tue Mar 25 16:20:00 CDT 2008 streams memory in use: 1708.18KB
Tue Mar 25 16:25:00 CDT 2008 streams memory in use: 1717.02KB
Tue Mar 25 16:30:00 CDT 2008 streams memory in use: 1721.02KB
....
Wed Mar 26 02:55:00 CDT 2008 streams memory in use: 2119.16KB
Wed Mar 26 03:00:00 CDT 2008 streams memory in use: 2121.18KB
Wed Mar 26 03:05:01 CDT 2008 streams memory in use: 4016.57KB
Wed Mar 26 03:10:00 CDT 2008 streams memory in use: 4016.57KB
Cpio backup to failover machine kicks off at 03:00
End of day:
Tue Mar 25 23:55:00 CDT 2008 streams memory in use: 1443.95KB
Wed Mar 26 23:55:00 CDT 2008 streams memory in use: 1450.17KB
Thu Mar 27 23:55:00 CDT 2008 streams memory in use: 1445.48KB
Fri Mar 28 23:55:00 CDT 2008 streams memory in use: 4533.59KB
Sat Mar 29 23:55:00 CDT 2008 streams memory in use: 2013.02KB
Sun Mar 30 23:55:00 CDT 2008 streams memory in use: 4869.45KB
Mon Mar 31 23:55:00 CDT 2008 streams memory in use: 2027.55KB
Tue Apr 1 23:55:00 CDT 2008 streams memory in use: 5061.66KB
Wed Apr 2 23:55:00 CDT 2008 streams memory in use: 2181.63KB
Thu Apr 3 23:05:00 CDT 2008 streams memory in use: 5895.07KB
Fri Apr 4 23:55:00 CDT 2008 streams memory in use: 6900.07KB
Sat Apr 5 23:55:00 CDT 2008 streams memory in use: 7244.71KB
Sun Apr 6 23:55:00 CDT 2008 streams memory in use: 7425.77KB
Mon Apr 7 23:55:00 CDT 2008 streams memory in use: 8831.78KB
Tue Apr 8 23:50:00 CDT 2008 streams memory in use: 9934.26KB
Wed Apr 9 23:55:01 CDT 2008 streams memory in use: 11022.41KB
Thu Apr 10 23:55:00 CDT 2008 streams memory in use: 13063.88KB
Fri Apr 11 23:55:00 CDT 2008 streams memory in use: 14749.66KB
Sat Apr 12 23:55:00 CDT 2008 streams memory in use: 15181.69KB
Sun Apr 13 23:55:00 CDT 2008 streams memory in use: 15583.30KB
Mon Apr 14 23:55:00 CDT 2008 streams memory in use: 16522.23KB
Tue Apr 15 23:55:00 CDT 2008 streams memory in use: 17334.53KB
Wed Apr 16 23:55:00 CDT 2008 streams memory in use: 18487.81KB
Thu Apr 17 16:10:00 CDT 2008 streams memory in use: 19391.11KB
System rebooted
Thu Apr 17 16:20:00 CDT 2008 streams memory in use: 1380.50KB
....
Thu Jul 24 23:55:00 CDT 2008 streams memory in use: 9092.21KB
Fri Jul 25 23:55:00 CDT 2008 streams memory in use: 9964.78KB
Sat Jul 26 15:45:00 CDT 2008 streams memory in use: 10975.49KB
System rebooted
Sat Jul 26 15:50:00 CDT 2008 streams memory in use: 1406.35KB
Sat Jul 26 23:55:01 CDT 2008 streams memory in use: 1474.53KB
Sun Jul 27 23:55:00 CDT 2008 streams memory in use: 4086.12KB
Mon Jul 28 23:55:00 CDT 2008 streams memory in use: 5621.95KB
Tue Jul 29 23:55:00 CDT 2008 streams memory in use: 6879.91KB
Wed Jul 30 23:55:00 CDT 2008 streams memory in use: 8781.20KB
Thu Jul 31 23:55:00 CDT 2008 streams memory in use: 9941.51KB
Fri Aug 1 23:05:00 CDT 2008 streams memory in use: 4020.41KB
Fri Aug 1 16:00:00 CDT 2008 streams memory in use: 13261.82KB
System rebooted
Fri Aug 1 16:15:00 CDT 2008 streams memory in use: 1388.23KB
Sat Aug 2 23:55:00 CDT 2008 streams memory in use: 4325.13KB
Sun Aug 3 23:55:00 CDT 2008 streams memory in use: 4785.69KB
Mon Aug 4 23:55:00 CDT 2008 streams memory in use: 4092.80KB
Tue Aug 5 04:30:00 CDT 2008 streams memory in use: 4097.25KB
System went down hard at 18:00 on 8/5. Both disks of RAID1
dead. Replaced with single remaining spare disk and restored
nightly backup from 03:15 8/5.
Tue Aug 5 22:25:01 CDT 2008 streams memory in use: 1376.86KB
system back up at 22:?? 8/5.
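The five-minute log above lends itself to a quick growth-rate calculation. Below is a minimal sketch, assuming the log has been copied to a machine with Python and that every sample line follows the format shown above; parse_samples and growth_kb_per_day are illustrative names, not existing tools. Commentary lines such as "System rebooted" and "...." are simply skipped.

```python
import re
from datetime import datetime

# Matches lines such as:
#   Tue Mar 25 16:15:00 CDT 2008 streams memory in use: 1706.34KB
LINE_RE = re.compile(
    r"^\w{3}\s+(\w{3})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})\s+\w+\s+(\d{4})\s+"
    r"streams memory in use:\s+([\d.]+)KB$"
)


def parse_samples(text):
    """Return (timestamp, kilobytes) pairs for every matching log line."""
    samples = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # commentary, "....", or a bare date line
        month, day, clock, year, kb = m.groups()
        # The time zone name is ignored; all samples share one zone here.
        stamp = datetime.strptime(
            f"{month} {day} {year} {clock}", "%b %d %Y %H:%M:%S"
        )
        samples.append((stamp, float(kb)))
    return samples


def growth_kb_per_day(samples):
    """Average growth between the first and last sample, in KB per day."""
    (t0, kb0), (t1, kb1) = samples[0], samples[-1]
    days = (t1 - t0).total_seconds() / 86400.0
    return (kb1 - kb0) / days
```

Because usage also drops sharply on some nights without a reboot (for example Mar 28 to Mar 29), the rate is best computed over one segment between reboots at a time rather than over the whole file.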
Squirrels/rats finding cable runs in the ceiling edible?
Some neatness freak of doubtful intelligence straightening up cables by using a powered stapler to tack cables to plywood?
The above actually happened to a client of a company I consult for - I was sent to Chicago from the West coast to troubleshoot an intermittent problem with garbage shooting across the screen (dumb terminals, back in the day).
Some of the staples actually penetrated the wires, and when heavy trucks went by, the plywood flexed enough to make/break contact.
Expensive lesson for the client, needless to say.
Well, not only did the two SCO 5.0.5 boxes lose both RAID1 disks; one
Dell box running MS SBS also lost both disks of its RAID1,
and an IBM box lost 2 of 5 disks. The Windows support technician decided
that the RAID controller in the IBM was also bad, because as he worked to
restore the IBM box, more disks went off-line.
I got the live SCO 5.0.5 box back up by restoring the nightly backup and the
14:15 differential backup from the Buffalo NAS server where Backup Edge is
writing its backup files. With only one remaining unused 146G disk on the shelf,
I did not restore the failover machine, so the live server is running on
one disk without a RAID1 mirror.
The customer has problems with building power to the server room, and the
five APC UPSes (2 new units as of 6/26 on the SCO 5.0.5 servers) did not
prevent the problem. Damn, Damn, Damn.
--
Steve Fabac
S.M. Fabac & Associates
816/765-1670