Re: Truly bizarre NIC (?) problem



We have seen this style of problem many, many times. It has always been a
mismatch in duplex - we are set to 100/full and the switch is at 100/half. The
symptom we see is that small files, ping, and interactive traffic work fine, but
any large file hangs after several packets. To test, try sending a 1 to 2 GB
file and see if it goes.
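
A minimal sketch of preparing such a test file, assuming Python is available on
the sending host (an assumption; nothing in this thread says it is). It writes
random, and therefore incompressible, data and returns a SHA-256 digest so the
copy can be verified on the receiving side. The names make_test_file and
file_digest, and the 1 MiB chunk size, are illustrative only.

```python
import hashlib
import os


def make_test_file(path, size_bytes, chunk_size=1024 * 1024):
    """Write size_bytes of random data to path and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    remaining = size_bytes
    with open(path, "wb") as out:
        while remaining > 0:
            # Random data keeps the payload incompressible.
            block = os.urandom(min(chunk_size, remaining))
            out.write(block)
            digest.update(block)
            remaining -= len(block)
    return digest.hexdigest()


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of an existing file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as src:
        for block in iter(lambda: src.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

For example, make_test_file("/tmp/duplex_test.bin", 1024 ** 3) creates a 1 GB
file; after copying it, file_digest on the destination copy should return the
same digest. If the transfer hangs partway, that points toward the link rather
than the contents of any one file.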

Check with your network people to see what speed the port is configured to on
the switch. For 10/100, it should be forced to 100/Full. Many times we find it
at Autonegotiate on either our side or the network switch side.

Mark Hunter
Anheuser-Busch Cos.
MIS Consultant, ES&SO Server Planning and Integration
*Office: (314) 632-6663
*Fax: (314) 632-6901
*Pager: (314) 841-4026
*Email: Mark.Hunter@xxxxxxxxxxxxxxxxxx

The information transmitted (including attachments) is covered by the Electronic
Communications Privacy Act, 18 U.S.C. 2510-2521, is intended only for the
person(s) or entity/entities to which it is addressed and may contain
confidential and/or privileged material. Any review, retransmission,
dissemination or other use of, or taking of any action in reliance upon, this
information by persons or entities other than the intended recipient(s) is
prohibited. If you received this in error, please contact the sender and delete
the material from any computer.




-----Original Message-----
From: IBM AIX Discussion List [mailto:aix-l@xxxxxxxxxxxxx] On Behalf Of Green,
Simon
Sent: Tuesday, May 23, 2006 11:08 AM
To: aix-l@xxxxxxxxxxxxx
Subject: Truly bizarre NIC (?) problem

This one has me bemused, not to mention stumped. Fortunately I've circumvented
it for now, so this is mainly to satisfy my curiosity.

I'll skip past some of the investigation I've done as it's long and tedious, and
cut to the chase.


I have a file - an Oracle installation file: some sort of library file, I think
- which I cannot copy over the network.

I've tried using rdist, ftp and rcp and all of them fail with this individual
file. But only when I copy it to a particular adapter.

So...
rcp libfontmanager.a server1:<somewhere> fails
rcp libfontmanager.a server2:<somewhere> works
rcp libfontmanager.a server1a:<somewhere> works

I've tried copying it from a couple of different servers and it always produces
the same results, so I'm fairly sure the problem lies at the "server1" end.

"server1" and "server2" are identically configured F80s. At the moment they're
running AIX 5.2 ML7. The problem also exhibited itself before the AIX
migration, when server1 was running 4.3.3 ML11. I upgraded the firmware to the
latest level before the AIX migration.
"server1a" is a second 10/100 ethernet adapter. It's configured slightly
differently. Both ent0 NICs go through a 100Mb switch and are configured as 100
Full Duplex. The ent1 adapters on the two servers are connected together
directly and run at 100 Half Duplex.

server1, ent0:
ent0 P1-I3/E1 IBM 10/100 Mbps Ethernet PCI Adapter (23100020)

Serial Number...............28005403
FRU Number..................091H0397
Part Number.................091H0397
Network Address.............006094DC8016
Displayable Message.........PCI Ethernet Adapter (23100020)
Device Specific.(YL)........P1-I3/E1


PLATFORM SPECIFIC

Name: ethernet
Model: AMD,am79C971
Node: ethernet@1
Device Type: network
Physical Location: P1-I3/E1


server1a, ent1:
ent1 P1/E1 IBM 10/100 Mbps Ethernet PCI Adapter (23100020)

Network Address.............0006298418D2
Displayable Message.........PCI Ethernet Adapter (23100020)
Device Specific.(YL)........P1/E1


PLATFORM SPECIFIC

Name: ethernet
Model: AMD,am79C971
Node: ethernet@1
Device Type: network
Physical Location: P1/E1


server2, ent0:
ent0 P1-I3/E1 IBM 10/100 Mbps Ethernet PCI Adapter (23100020)

Serial Number...............10013661
FRU Number..................091H0397
Part Number.................091H0397
Network Address.............000629DC0C81
Displayable Message.........PCI Ethernet Adapter (23100020)
Device Specific.(YL)........P1-I3/E1


PLATFORM SPECIFIC

Name: ethernet
Model: AMD,am79C971
Node: ethernet@1
Device Type: network
Physical Location: P1-I3/E1


server2a, ent1:
ent1 P1/E1 IBM 10/100 Mbps Ethernet PCI Adapter (23100020)

Network Address.............0006298418AD
Displayable Message.........PCI Ethernet Adapter (23100020)
Device Specific.(YL)........P1/E1


PLATFORM SPECIFIC

Name: ethernet
Model: AMD,am79C971
Node: ethernet@1
Device Type: network
Physical Location: P1/E1

If I compress the file, I can copy it OK. Otherwise, it gets to 655360 out of
1303443 bytes and stops, the rcp (or whatever) eventually timing out.
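
Note that 655360 is exactly 640 KiB (10 x 65536 bytes), so the reported count is
a round number and may not pinpoint the exact byte at which the transfer
stalled. A minimal sketch, assuming Python is available (an assumption, not
something stated in the post), that checks this arithmetic and dumps the bytes
around the stall offset for inspection. The function names and the window sizes
are illustrative.

```python
STALL_OFFSET = 655360   # bytes transferred before the copy stalled (from the post)
FILE_SIZE = 1303443     # total size of libfontmanager.a (from the post)


def describe_offset(offset, total):
    """Return (whole 64 KiB blocks, leftover bytes, percent of file) for an offset."""
    blocks, leftover = divmod(offset, 64 * 1024)
    percent = 100.0 * offset / total
    return blocks, leftover, percent


def read_window(path, offset, before=64, after=1500):
    """Return (start, data) covering offset-before up to offset+after, clamped at 0."""
    start = max(0, offset - before)
    with open(path, "rb") as src:
        src.seek(start)
        return start, src.read(offset + after - start)


def hexdump(data, base=0, width=16):
    """Format bytes as offset-prefixed hex lines for eyeballing bit patterns."""
    lines = []
    for i in range(0, len(data), width):
        row = data[i:i + width]
        lines.append("%08x  %s" % (base + i, " ".join("%02x" % b for b in row)))
    return "\n".join(lines)
```

Usage would look like: start, data = read_window("libfontmanager.a",
STALL_OFFSET) followed by print(hexdump(data, base=start)). The default window
of roughly one Ethernet frame after the offset is a guess at where to look, not
a known location of the problem.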

I've also had what I think is the same problem if I simply NFS mount the files
from the original remote server. When the Oracle installer is used, it works
for a while but then we get NFS errors and I'm no longer able to access the
filesystem, until I unmount and remount it. Again, this problem did NOT occur
on server2, which was upgraded the week before.

Everything else is working fine. I've copied a load of other similar Oracle
installation files to server1 without difficulty. Everything else - NFS,
telnet, rsh, and a bunch of other network stuff is OK. It's just this one file.

So it looks to me like there is some strange combination of bits in the file
which is interacting with the NIC somehow and causing the copy to fail.
We've managed to circumvent the problem by copying in a roundabout way to the
second NIC, but I would like to understand what's going on and resolve it
properly.
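
One way to test the "strange combination of bits" idea without an outage is to
split the file into pieces, copy each piece to server1 separately, and note
which pieces fail. A minimal sketch, assuming Python is available on the source
host (an assumption); split_file, the part naming, and the 64 KiB chunk size are
illustrative choices.

```python
import os


def split_file(path, out_dir, chunk_size=64 * 1024):
    """Write consecutive chunk_size pieces of path into out_dir; return their paths."""
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.basename(path)
    pieces = []
    with open(path, "rb") as src:
        index = 0
        while True:
            block = src.read(chunk_size)
            if not block:
                break
            piece = os.path.join(out_dir, "%s.part%04d" % (base, index))
            with open(piece, "wb") as out:
                out.write(block)
            pieces.append(piece)
            index += 1
    return pieces
```

Each piece can then be copied with rcp as before. If one piece fails
consistently, splitting that piece again with a smaller chunk_size narrows the
search; if every piece copies cleanly, the failure probably depends on the
transfer as a whole rather than on any particular byte sequence.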


Has anybody seen anything like this, or can suggest ways to investigate further
and perhaps resolve it? Anything involving an outage to the server is out, as
it hosts a critical factory system which really does run 24x7.

--
Simon Green
Altria ITSC Europe s.a.r.l.

AIX-L Archive at https://lists.princeton.edu/listserv/aix-l.html

New to AIX? http://publib-b.boulder.ibm.com/redbooks.nsf/portals/UNIX

N.B. Unsolicited email from vendors will not be appreciated.
Please post all follow-ups to the list.



