Truly bizarre NIC (?) problem



This one has me bemused, not to mention stumped. Fortunately I've
circumvented it for now, so this is mainly to satisfy my curiosity.

I'll skip past some of the investigation I've done as it's long and tedious,
and cut to the chase.


I have a file - an Oracle installation file: some sort of library file, I
think - which I cannot copy over the network.

I've tried using rdist, ftp and rcp and all of them fail with this
individual file. But only when I copy it to a particular adapter.

So...
rcp libfontmanager.a server1:<somewhere> fails
rcp libfontmanager.a server2:<somewhere> works
rcp libfontmanager.a server1a:<somewhere> works

I've tried copying it from a couple of different servers and it always
produces the same results, so I'm fairly sure the problem lies at the
"server1" end.

"server1" and "server2" are identically configured F80s. At the moment
they're running AIX 5.2 ML7. The problem also exhibited itself before the
AIX migration, when server1 was running 4.3.3 ML11. I upgraded the firmware
to the latest level before the AIX migration.
"server1a" is the second 10/100 Ethernet adapter (ent1) on server1. It's
configured slightly differently. The ent0 NICs on both servers go through a
100Mb switch and are configured as 100 Full Duplex. The ent1 adapters on the
two servers are connected together directly and run at 100 Half Duplex.

server1, ent0:
ent0 P1-I3/E1 IBM 10/100 Mbps Ethernet PCI Adapter (23100020)

Serial Number...............28005403
FRU Number..................091H0397
Part Number.................091H0397
Network Address.............006094DC8016
Displayable Message.........PCI Ethernet Adapter (23100020)
Device Specific.(YL)........P1-I3/E1


PLATFORM SPECIFIC

Name: ethernet
Model: AMD,am79C971
Node: ethernet@1
Device Type: network
Physical Location: P1-I3/E1


server1a, ent1:
ent1 P1/E1 IBM 10/100 Mbps Ethernet PCI Adapter (23100020)

Network Address.............0006298418D2
Displayable Message.........PCI Ethernet Adapter (23100020)
Device Specific.(YL)........P1/E1


PLATFORM SPECIFIC

Name: ethernet
Model: AMD,am79C971
Node: ethernet@1
Device Type: network
Physical Location: P1/E1


server2, ent0:
ent0 P1-I3/E1 IBM 10/100 Mbps Ethernet PCI Adapter (23100020)

Serial Number...............10013661
FRU Number..................091H0397
Part Number.................091H0397
Network Address.............000629DC0C81
Displayable Message.........PCI Ethernet Adapter (23100020)
Device Specific.(YL)........P1-I3/E1


PLATFORM SPECIFIC

Name: ethernet
Model: AMD,am79C971
Node: ethernet@1
Device Type: network
Physical Location: P1-I3/E1


server2a, ent1:
ent1 P1/E1 IBM 10/100 Mbps Ethernet PCI Adapter (23100020)

Network Address.............0006298418AD
Displayable Message.........PCI Ethernet Adapter (23100020)
Device Specific.(YL)........P1/E1


PLATFORM SPECIFIC

Name: ethernet
Model: AMD,am79C971
Node: ethernet@1
Device Type: network
Physical Location: P1/E1

If I compress the file, I can copy it OK. Otherwise, it gets to 655360 out
of 1303443 bytes and stops, the rcp (or whatever) eventually timing out.
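
One way to narrow this down would be to cut a small window out of the file
around the point where the copy stalls and try to transfer just that piece.
The sketch below is only an illustration: the function names, the output
file and the 128 KB window size are assumptions, not anything taken from the
failing system. Note that 655360 is exactly 10 x 65536, so the reported count
may reflect buffering in the copy program rather than the exact failing byte,
which is why the window is deliberately generous.

```python
def window_start(stall_offset, window):
    """Return a start offset that centres the window on the stall point."""
    return max(0, stall_offset - window // 2)


def extract_window(src_path, dst_path, offset, length):
    """Copy up to `length` bytes starting at `offset` from src to dst.

    Returns the number of bytes actually written, which is smaller than
    `length` when the window runs past the end of the source file.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        src.seek(offset)
        data = src.read(length)
        dst.write(data)
    return len(data)


# Byte count at which the copy stopped, as reported above.
STALL_OFFSET = 655360
# Assumed window size; adjust as needed.
WINDOW = 128 * 1024
```

For example, extract_window("libfontmanager.a", "suspect.bin",
window_start(STALL_OFFSET, WINDOW), WINDOW) would produce a 128 KB file that
could then be copied to server1 on its own. If that small file also stalls,
the window can be halved repeatedly to close in on the offending bytes.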

I've also had what I think is the same problem if I simply NFS mount the
files from the original remote server. When the Oracle installer is used,
it works for a while but then we get NFS errors and I'm no longer able to
access the filesystem, until I unmount and remount it. Again, this problem
did NOT occur on server2, which was upgraded the week before.

Everything else is working fine. I've copied a load of other similar Oracle
installation files to server1 without difficulty. Everything else - NFS,
telnet, rsh and a bunch of other network stuff - is OK. It's just this one
file.

So it looks to me like there is some strange combination of bits in the file
which is interacting with the NIC somehow and causing the copy to fail.
We've managed to circumvent the problem by copying in a roundabout way to
the second NIC, but I would like to understand what's going on and resolve
it properly.
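
If the bit-pattern theory is right, it might help to look at what the file
actually contains block by block, since the fact that the compressed copy
goes through is consistent with the raw byte pattern mattering. The sketch
below is a rough illustration only: it reports, for each block, how many
distinct byte values it holds and the longest run of a single repeated byte,
so that unusual stretches (long runs of one value, for instance) stand out.
The 1460-byte block size is an assumption, chosen as a typical TCP payload on
a 1500-byte MTU; real segment boundaries will not necessarily line up with
it, and the function names are made up for this example.

```python
def longest_run(block):
    """Return (byte_value, run_length) for the longest run of one byte.

    Returns (None, 0) for an empty block.
    """
    best_byte, best_len = None, 0
    cur_byte, cur_len = None, 0
    for b in block:
        if b == cur_byte:
            cur_len += 1
        else:
            cur_byte, cur_len = b, 1
        if cur_len > best_len:
            best_byte, best_len = cur_byte, cur_len
    return best_byte, best_len


def summarize_blocks(path, block_size=1460):
    """Yield (offset, distinct_bytes, run_byte, run_length) per block."""
    with open(path, "rb") as f:
        offset = 0
        while True:
            block = f.read(block_size)
            if not block:
                break
            run_byte, run_len = longest_run(block)
            yield offset, len(set(block)), run_byte, run_len
            offset += len(block)
```

Running summarize_blocks("libfontmanager.a") and comparing the blocks near
the stall point with those from a file that copies cleanly would at least
show whether anything about the data looks out of the ordinary there.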


Has anybody seen anything like this, or can suggest ways to investigate
further and perhaps resolve it? Anything involving an outage to the server
is out, as it hosts a critical factory system which really does run 24x7.

--
Simon Green
Altria ITSC Europe s.a.r.l.

AIX-L Archive at https://lists.princeton.edu/listserv/aix-l.html

New to AIX? http://publib-b.boulder.ibm.com/redbooks.nsf/portals/UNIX

N.B. Unsolicited email from vendors will not be appreciated.
Please post all follow-ups to the list.
