Re: Network not responding on idle SCO 5.0.5 system.



Pat Welch wrote:
Steve M. Fabac, Jr. wrote:
I've a client with two SCO 5.0.5 boxes, one is the live
server and the second is a hot spare. The servers have
separate SCO 5.0.5 Enterprise licenses and 25-user
licenses add-on.

Both servers were rebooted due to inability to access the
live server or the backup server via telnet on 8/1 (while
I was on vacation). On 8/4 I was informed that the problem
occurred again but as I was in the car between locations,
I was unable to assist with diagnosing the problem.

I advised the client to reboot the live system and leave the
backup server alone until I returned to my office to attempt to
connect to the backup server remotely.

Both servers are 5.0.5 and fully patched with the latest
patchck version:

Gathering patch information... Please wait...
INSTALLED currently on failover.XXXX.com
--------------------------------------------------------------------
oss471e oss471e - OpenServer Supplement oss471e
oss497c Core OS supplement for 5.0.5
oss600a Year 2000 Supplement for 5.0.5
oss640a Bind supplement for 5.0.5
oss642a Cron supplement for 5.0.5
oss646c Processor supplement for 5.0.5
oss663a oss663a - OpenServer Supplement oss663a
rs505a Release Supplement for OSR5.0.5
system is up-to-date as of July 7, 2008

When I returned to the office, I tried to log in via ssh from the
live system to the backup system after first logging in via SSH from
my office to the live system. Ssh exited with the message
"connection refused."

Rcmd failover ps -ef also failed. (error message not saved)
<snip>

Have you looked in syslog for evidence of hacking attempts?

Yes, none in evidence since change to sshd.conf in April



I've seen OS5 systems lose spooling and net functions after heavy automated login attempts from the internet - usually from our 'friends' working from Korean ISPs (at least on the West coast).

I assume you have mapped SSH to some other port than 22?

Aug 1 18:20:47 failover sshd[710]: Server listening on 0.0.0.0 port 29392.


I don't see anything that pops out as a problem (other than, of course, prngd and sshd just not running).

Further information: The live system went incommunicado after my post, and
I was able to have the client restart prngd and sshd so I could log in to the
system before having them reboot. The live system had the same type
of netstat -a listing, showing only sshd listening on TCP, plus
scohelp and something on UDP port 488:

Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address Foreign Address (state)
tcp 0 48 vet.29392 adsl-65-64-102-9.4957 ESTABLISHED
tcp 0 0 *.29392 *.* LISTEN
tcp 0 0 *.scohelp *.* LISTEN
udp 0 0 *.488 *.*
Active UNIX domain sockets
Address Type Recv-Q Send-Q Conn Addr
fcfa6b50 stream 0 0 0 /usr/local/var/prngd/prngd-pool
fcfa6ac0 stream 0 0 fcfa6a30

Executing /etc/tcp start turned on all the usual LISTEN services, and
the users were once again able to log in to the system without a reboot:

Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address Foreign Address (state)
tcp 0 0 *.printer *.* LISTEN
tcp 0 0 *.smux *.* LISTEN
tcp 0 0 *.imap *.* LISTEN
tcp 0 0 *.pop3 *.* LISTEN
tcp 0 0 *.time *.* LISTEN
tcp 0 0 *.daytime *.* LISTEN
tcp 0 0 *.chargen *.* LISTEN
tcp 0 0 *.discard *.* LISTEN
tcp 0 0 *.echo *.* LISTEN
tcp 0 0 *.tcpmux *.* LISTEN
tcp 0 0 *.shell *.* LISTEN
tcp 0 0 *.telnet *.* LISTEN
tcp 0 0 *.ftp *.* LISTEN
tcp 0 48 vet.29392 adsl-65-64-102-9.4957 ESTABLISHED
tcp 0 0 *.29392 *.* LISTEN
tcp 0 0 *.scohelp *.* LISTEN
udp 0 0 localhost.ntp *.*
udp 0 0 vet.ntp *.*
udp 0 0 vetreal.ntp *.*
udp 0 0 *.ntp *.*
udp 0 0 *.2086 *.*
udp 0 0 *.snmp *.*
udp 0 0 *.time *.*
udp 0 0 *.daytime *.*
udp 0 0 *.chargen *.*
udp 0 0 *.discard *.*
udp 0 0 *.echo *.*
udp 0 0 *.ntalk *.*
udp 0 0 *.biff *.*
udp 0 0 *.tftp *.*
udp 0 0 *.route *.*
udp 0 0 *.488 *.*
Active UNIX domain sockets
Address Type Recv-Q Send-Q Conn Addr
fcfa8920 stream 0 0 0 /dev/printer
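
To see exactly which services /etc/tcp start brought back, the two netstat -a listings can be compared mechanically. The following is only a minimal sketch in Python, meant to be run on any workstation against saved copies of the output; the function name listening_services and the abbreviated AFTER sample are my own for illustration. It keeps only wildcard-bound (*.service) sockets and assumes the column layout shown in the listings above.

```python
def listening_services(netstat_text):
    """Return the set of (proto, service) pairs bound to the wildcard address.

    TCP sockets are counted only when in LISTEN state; UDP sockets have no
    state column, so any wildcard-bound UDP socket is counted.
    """
    services = set()
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] not in ("tcp", "udp"):
            continue
        proto, local = fields[0], fields[3]
        state = fields[5] if len(fields) > 5 else ""
        if proto == "tcp" and state != "LISTEN":
            continue
        if not local.startswith("*."):
            continue
        services.add((proto, local[2:]))
    return services


# Listing taken before /etc/tcp start (copied from the post).
BEFORE = """\
tcp 0 48 vet.29392 adsl-65-64-102-9.4957 ESTABLISHED
tcp 0 0 *.29392 *.* LISTEN
tcp 0 0 *.scohelp *.* LISTEN
udp 0 0 *.488 *.*
"""

# Abbreviated subset of the listing taken after /etc/tcp start.
AFTER = """\
tcp 0 0 *.printer *.* LISTEN
tcp 0 0 *.telnet *.* LISTEN
tcp 0 0 *.ftp *.* LISTEN
tcp 0 48 vet.29392 adsl-65-64-102-9.4957 ESTABLISHED
tcp 0 0 *.29392 *.* LISTEN
tcp 0 0 *.scohelp *.* LISTEN
udp 0 0 *.snmp *.*
udp 0 0 *.488 *.*
"""

before = listening_services(BEFORE)
after = listening_services(AFTER)
started = after - before  # services that appeared after the restart

for proto, service in sorted(started):
    print(proto, service)
```

Anything in after - before was not being served while the system was "incommunicado"; an empty before - after confirms nothing was lost by the restart.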

However, I soon got a call when they were unable to print.

I logged back in and had to run /usr/lib/lpsched to restart the print
spooler.

For grins, I enabled process accounting, and when I tried to start accounting
I was informed that cron was not running, so I had to restart cron as well:
crontab: cron may not be running - call your system administrator: No such device or address (error 6)
Accounting is now enabled for use
To start accounting, run: /usr/lib/acct/startup
# ps -ef | grep crontab
# ps -ef | grep cron
# # /etc/rc2.d/P75cron start
# ! *** cron started *** pid = 3155 Tue Aug 5 13:37:39 2008


I also noticed that syslogd was not running, and when I investigated, found that
syslogd had not started when the system was rebooted on 8/1 and 8/4. After I
manually started syslogd, /usr/adm/syslog was updated with the boot-up information
generated on 8/1, but logged as occurring at the time I manually executed
/etc/syslogd:

# tail -f /usr/adm/syslog
Aug 1 16:43:02 treal ftpd[3298]: #2 open of pid file failed: No such file or directory
Aug 1 16:44:37 treal ftpd[3389]: #2 open of pid file failed: No such file or directory
Aug 1 16:44:38 treal ftpd[3390]: #2 open of pid file failed: No such file or directory
Aug 1 16:46:25 treal ftpd[3456]: #2 open of pid file failed: No such file or directory
Aug 1 16:46:25 treal ftpd[3457]: #2 open of pid file failed: No such file or directory
Aug 1 16:51:25 treal lockd[440]: term_nlm():
Aug 1 16:51:25 treal lockd[440]: nlm lock server died! exiting.
Aug 1 16:56:07 treal TLW param1=-1
Fri Aug 1 16:56:07 CDT 2008 reboot initated
Mon Aug 4 18:19:18 CDT 2008 shutdown initiated

Wow! Neither the Friday nor the Monday reboot restarted syslogd!!

Strange: the "Mon Aug 4 18:19:18" entry above must have been written to
/usr/adm/syslog in real time, since it was already in the file even though
syslogd had not been running since the reboot on 8/1.

After I manually restarted /etc/syslogd:

Aug 1 16:51:25 treal lockd[440]: term_nlm():
Aug 1 16:51:25 treal lockd[440]: nlm lock server died! exiting.
Aug 1 16:56:07 treal TLW param1=-1
Fri Aug 1 16:56:07 CDT 2008 reboot initated
Mon Aug 4 18:19:18 CDT 2008 shutdown initiated
Aug 5 14:26:25 treal syslogd: restart
Aug 5 14:26:25 treal SCO OpenServer(TM) Release 5
Aug 5 14:26:25 treal
Aug 5 14:26:25 treal (C) 1976-1998 The Santa Cruz Operation, Inc.
Aug 5 14:26:25 treal (C) 1980-1994 Microsoft Corporation
Aug 5 14:26:25 treal All rights reserved.
Aug 5 14:26:25 treal
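
Since prngd, sshd, lpsched, cron and syslogd were each found dead one at a time, a single check over saved ps -ef output would have listed them all at once. The following is a rough sketch in Python, not something that was run on these servers; the function name missing_daemons, the sample ps output and the paths in it are all made up for illustration. It simply looks for each daemon name as the basename of any word on a line, and skips lines belonging to grep itself.

```python
import os

# Daemons that were found not running at various points in this thread.
EXPECTED = ("prngd", "sshd", "lpsched", "cron", "syslogd")


def missing_daemons(ps_text, expected=EXPECTED):
    """Return the expected daemon names that do not appear in ps output."""
    running = set()
    for line in ps_text.splitlines():
        tokens = line.split()
        if "grep" in tokens:
            continue  # a 'grep cron' line would otherwise look like cron
        for token in tokens:
            running.add(os.path.basename(token))
    return [name for name in expected if name not in running]


# Hypothetical ps -ef output: PIDs, times and paths are illustrative only.
SAMPLE_PS = """\
     UID   PID  PPID  C    STIME   TTY        TIME CMD
    root     1     0  0   Aug-01     ?    00:00:05 /etc/init
    root  3155     1  0 13:37:39     ?    00:00:00 /etc/cron
    root  4021     1  0 14:02:10     ?    00:00:00 /usr/local/sbin/sshd
    root  4100  3990  0 14:30:00 ttyp0    00:00:00 grep cron
"""

for name in missing_daemons(SAMPLE_PS):
    print("not running:", name)
```

Matching on basenames is deliberately crude; it would be fooled by a user or file that happens to share a daemon's name, so treat the result as a prompt to look closer rather than as proof.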



Possibly NICs going flaky and re-transmitting like crazy. Ditto a bad port on your router flooding the network.

We've been fighting a stream leak since March 2008 as shown in the
data I log every five minutes from cron:

Tue Mar 25 16:10:03 CDT 2008
Tue Mar 25 16:10:17 CDT 2008 streams memory in use: 1700.73KB
Tue Mar 25 16:15:00 CDT 2008 streams memory in use: 1706.34KB
Tue Mar 25 16:20:00 CDT 2008 streams memory in use: 1708.18KB
Tue Mar 25 16:25:00 CDT 2008 streams memory in use: 1717.02KB
Tue Mar 25 16:30:00 CDT 2008 streams memory in use: 1721.02KB
....
Wed Mar 26 02:55:00 CDT 2008 streams memory in use: 2119.16KB
Wed Mar 26 03:00:00 CDT 2008 streams memory in use: 2121.18KB
Wed Mar 26 03:05:01 CDT 2008 streams memory in use: 4016.57KB
Wed Mar 26 03:10:00 CDT 2008 streams memory in use: 4016.57KB

Cpio backup to failover machine kicks off at 03:00
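
The step at 03:05 is easy to miss by eye in a long log. As a hedged illustration only, this Python sketch scans consecutive samples and reports the largest single increase; the names parse and largest_jump are mine, and the line format is assumed to be exactly the one shown in the excerpt above.

```python
import re

# Matches e.g. "Wed Mar 26 03:05:01 CDT 2008 streams memory in use: 4016.57KB"
SAMPLE = re.compile(r"^(.*\d{4}) streams memory in use: ([\d.]+)KB$")


def parse(log_text):
    """Return a list of (timestamp string, kilobytes) from the cron log."""
    samples = []
    for line in log_text.splitlines():
        match = SAMPLE.match(line.strip())
        if match:
            samples.append((match.group(1), float(match.group(2))))
    return samples


def largest_jump(samples):
    """Return (before, after, delta_kb) for the biggest rise between neighbours."""
    best = None
    for (t0, kb0), (t1, kb1) in zip(samples, samples[1:]):
        delta = kb1 - kb0
        if best is None or delta > best[2]:
            best = (t0, t1, delta)
    return best


# Four consecutive samples copied from the log excerpt above.
LOG = """\
Wed Mar 26 02:55:00 CDT 2008 streams memory in use: 2119.16KB
Wed Mar 26 03:00:00 CDT 2008 streams memory in use: 2121.18KB
Wed Mar 26 03:05:01 CDT 2008 streams memory in use: 4016.57KB
Wed Mar 26 03:10:00 CDT 2008 streams memory in use: 4016.57KB
"""

before, after, delta = largest_jump(parse(LOG))
print("largest jump: %.2fKB between %s and %s" % (delta, before, after))
```

On these four samples the largest rise falls in the interval starting at 03:00:00, which is when the cpio backup starts.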

End of day:
Tue Mar 25 23:55:00 CDT 2008 streams memory in use: 1443.95KB
Wed Mar 26 23:55:00 CDT 2008 streams memory in use: 1450.17KB
Thu Mar 27 23:55:00 CDT 2008 streams memory in use: 1445.48KB
Fri Mar 28 23:55:00 CDT 2008 streams memory in use: 4533.59KB
Sat Mar 29 23:55:00 CDT 2008 streams memory in use: 2013.02KB
Sun Mar 30 23:55:00 CDT 2008 streams memory in use: 4869.45KB
Mon Mar 31 23:55:00 CDT 2008 streams memory in use: 2027.55KB
Tue Apr 1 23:55:00 CDT 2008 streams memory in use: 5061.66KB
Wed Apr 2 23:55:00 CDT 2008 streams memory in use: 2181.63KB
Thu Apr 3 23:05:00 CDT 2008 streams memory in use: 5895.07KB
Fri Apr 4 23:55:00 CDT 2008 streams memory in use: 6900.07KB
Sat Apr 5 23:55:00 CDT 2008 streams memory in use: 7244.71KB
Sun Apr 6 23:55:00 CDT 2008 streams memory in use: 7425.77KB
Mon Apr 7 23:55:00 CDT 2008 streams memory in use: 8831.78KB
Tue Apr 8 23:50:00 CDT 2008 streams memory in use: 9934.26KB
Wed Apr 9 23:55:01 CDT 2008 streams memory in use: 11022.41KB
Thu Apr 10 23:55:00 CDT 2008 streams memory in use: 13063.88KB
Fri Apr 11 23:55:00 CDT 2008 streams memory in use: 14749.66KB
Sat Apr 12 23:55:00 CDT 2008 streams memory in use: 15181.69KB
Sun Apr 13 23:55:00 CDT 2008 streams memory in use: 15583.30KB
Mon Apr 14 23:55:00 CDT 2008 streams memory in use: 16522.23KB
Tue Apr 15 23:55:00 CDT 2008 streams memory in use: 17334.53KB
Wed Apr 16 23:55:00 CDT 2008 streams memory in use: 18487.81KB
Thu Apr 17 16:10:00 CDT 2008 streams memory in use: 19391.11KB
System rebooted
Thu Apr 17 16:20:00 CDT 2008 streams memory in use: 1380.50KB
....
Thu Jul 24 23:55:00 CDT 2008 streams memory in use: 9092.21KB
Fri Jul 25 23:55:00 CDT 2008 streams memory in use: 9964.78KB
Sat Jul 26 15:45:00 CDT 2008 streams memory in use: 10975.49KB
System rebooted
Sat Jul 26 15:50:00 CDT 2008 streams memory in use: 1406.35KB
Sat Jul 26 23:55:01 CDT 2008 streams memory in use: 1474.53KB
Sun Jul 27 23:55:00 CDT 2008 streams memory in use: 4086.12KB
Mon Jul 28 23:55:00 CDT 2008 streams memory in use: 5621.95KB
Tue Jul 29 23:55:00 CDT 2008 streams memory in use: 6879.91KB
Wed Jul 30 23:55:00 CDT 2008 streams memory in use: 8781.20KB
Thu Jul 31 23:55:00 CDT 2008 streams memory in use: 9941.51KB
Fri Aug 1 23:05:00 CDT 2008 streams memory in use: 4020.41KB
Fri Aug 1 16:00:00 CDT 2008 streams memory in use: 13261.82KB
System rebooted
Fri Aug 1 16:15:00 CDT 2008 streams memory in use: 1388.23KB
Sat Aug 2 23:55:00 CDT 2008 streams memory in use: 4325.13KB
Sun Aug 3 23:55:00 CDT 2008 streams memory in use: 4785.69KB
Mon Aug 4 23:55:00 CDT 2008 streams memory in use: 4092.80KB
Tue Aug 5 04:30:00 CDT 2008 streams memory in use: 4097.25KB
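
To put a number on the leak, the growth between reboots can be turned into a KB-per-day rate. This is a minimal Python sketch under a few assumptions of mine: the log lines have exactly the shape shown above, "System rebooted" lines separate the runs, and the time zone field can be ignored because it is the same within a run. The names parse_sample and growth_per_day are illustrative.

```python
from datetime import datetime

MONTHS = {name: number + 1 for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split())}


def parse_sample(line):
    """Return (datetime, kilobytes) for a sample line, or None for other lines."""
    parts = line.split()
    # Expected: Sat Jul 26 15:50:00 CDT 2008 streams memory in use: 1406.35KB
    if len(parts) != 11 or parts[6:10] != ["streams", "memory", "in", "use:"]:
        return None
    hour, minute, second = (int(x) for x in parts[3].split(":"))
    when = datetime(int(parts[5]), MONTHS[parts[1]], int(parts[2]),
                    hour, minute, second)
    return when, float(parts[10][:-2])  # drop the trailing "KB"


def growth_per_day(log_text):
    """Return (start, end, KB/day) for each run between 'System rebooted' lines."""
    rates, segment = [], []
    # A trailing marker flushes the final run.
    for line in log_text.splitlines() + ["System rebooted"]:
        if line.strip() == "System rebooted":
            if len(segment) >= 2:
                (t0, kb0), (t1, kb1) = segment[0], segment[-1]
                days = (t1 - t0).total_seconds() / 86400
                if days > 0:
                    rates.append((t0, t1, (kb1 - kb0) / days))
            segment = []
            continue
        sample = parse_sample(line)
        if sample:
            segment.append(sample)
    return rates


# Lines copied from the log above, around the Jul 26 reboot.
LOG = """\
Sat Jul 26 15:45:00 CDT 2008 streams memory in use: 10975.49KB
System rebooted
Sat Jul 26 15:50:00 CDT 2008 streams memory in use: 1406.35KB
Sat Jul 26 23:55:01 CDT 2008 streams memory in use: 1474.53KB
Sun Jul 27 23:55:00 CDT 2008 streams memory in use: 4086.12KB
Mon Jul 28 23:55:00 CDT 2008 streams memory in use: 5621.95KB
Tue Jul 29 23:55:00 CDT 2008 streams memory in use: 6879.91KB
Wed Jul 30 23:55:00 CDT 2008 streams memory in use: 8781.20KB
Thu Jul 31 23:55:00 CDT 2008 streams memory in use: 9941.51KB
"""

for start, end, rate in growth_per_day(LOG):
    print("%s -> %s: %.0f KB/day" % (start, end, rate))
```

The rate uses only the first and last sample of each run, so the nightly swings from the 03:00 backup are averaged out; a run with a single sample (such as the one line before the reboot here) is skipped.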

System went down hard at 18:00 on 8/5. Both disks of RAID1
dead. Replaced with single remaining spare disk and restored
nightly backup from 03:15 8/5.

Tue Aug 5 22:25:01 CDT 2008 streams memory in use: 1376.86KB

system back up at 22:?? 8/5.


Squirrels/rats finding cable runs in the ceiling edible?

Some neatness freak of doubtful intelligence straightening up cables by using a powered stapler to tack cables to plywood?

The above actually happened to a client of a company I consult for - I was sent to Chicago from the West coast to troubleshoot an intermittent problem with garbage shooting across the screen (dumb terminals, back in the day).

Some of the staples actually penetrated the wires, and when heavy trucks went by the plywood flexed enough to make/break contact.

Expensive lesson for the client needless to say.



Well, not only did the two SCO 5.0.5 boxes lose both RAID1 disks, but
one Dell box running MS SBS also lost both disks of its RAID1,
and an IBM box lost 2 of 5 disks. The Windows support technician decided
that the RAID controller in the IBM was also bad, because as he worked to restore
the IBM box, more disks went off-line.

I got the live SCO 5.0.5 box back up by restoring the nightly backup and the
14:15 differential backup from the Buffalo NAS server where Backup Edge
writes its backup files. With only one unused 146G disk left on the shelf,
I did not restore the failover machine, so the live server is running on
one disk without a RAID1 mirror.

The customer has problems with building power to the server room and the
5 APC UPS (2 new units as of 6/26 on the SCO 5.0.5 servers) did not
prevent the problem. Damn, Damn, Damn.

--
Steve Fabac
S.M. Fabac & Associates
816/765-1670