weird network problem with B100s/B1600 chassis

From: Adam Levin (levins_at_westnet.com)
Date: 04/25/05

  • Next message: Benoît Audet: "Increase /tmp and /var/run filesystems"
    Date: Mon, 25 Apr 2005 11:34:50 -0400 (EDT)
    To: Sun Managers Mailing List <sunmanagers@sunmanagers.org>
    
    

    Hey all, we've got a weird problem, and I'm not sure how to proceed
    because these systems are out of warranty at this point.

    We've got a Sun Blade B1600 chassis full of servers. We're using both
    internal switches. SWT0 is on one VLAN, SWT1 is on another, both going to
    our core Cisco switches.

    The chassis is full of B100s servers running as web servers (running
    Apache on Solaris 8 04/01 patched as of a few months ago).

    Recently, two of the blades failed, and we bought two new ones. The
    failure mode was that the second interface, ce1, stopped seeing anything
    on the rest of the network.

    We replaced them, jumpstarted them, and we're still having a problem
    seeing anything on the network:

    [11:31:35]root@http-b01.prod:/root$ ping -s 10.20.50.255
    PING 10.20.50.255: 56 data bytes
    64 bytes from http-b01.san (10.20.50.135): icmp_seq=0. time=1. ms
    64 bytes from http-b01.san (10.20.50.135): icmp_seq=0. time=5. ms
    64 bytes from http-b01.san (10.20.50.135): icmp_seq=1. time=0. ms
    64 bytes from http-b01.san (10.20.50.135): icmp_seq=1. time=0. ms
    64 bytes from http-b01.san (10.20.50.135): icmp_seq=2. time=0. ms
    64 bytes from http-b01.san (10.20.50.135): icmp_seq=2. time=0. ms
    64 bytes from http-b01.san (10.20.50.135): icmp_seq=3. time=0. ms
    64 bytes from http-b01.san (10.20.50.135): icmp_seq=3. time=0. ms
    ^C
    ----10.20.50.255 PING Statistics----
    4 packets transmitted, 8 packets received, 2.00 times amplification
    round-trip (ms) min/avg/max = 0/0/5

    We should be seeing a huge number of servers responding, which we do on
    all other blades in the chassis.
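
    To make the comparison with a healthy blade concrete, here is a minimal
    Python sketch (not something we ran on these systems) that counts the
    distinct hosts answering a broadcast ping. It takes the saved text of a
    `ping -s` run. The function name and the regular expression are
    illustrative assumptions; the pattern is based only on the reply lines
    shown above, and it guesses that unresolved hosts would print a bare
    address with no parentheses.

```python
import re

# Matches reply lines such as:
#   64 bytes from http-b01.san (10.20.50.135): icmp_seq=0. time=1. ms
# Assumption: a host that does not resolve appears as a bare IP address.
REPLY = re.compile(
    r"bytes from (?:\S+ \()?(\d+\.\d+\.\d+\.\d+)\)?: icmp_seq="
)


def count_responders(ping_output):
    """Return {ip: number_of_replies} for every reply line in the text."""
    counts = {}
    for line in ping_output.splitlines():
        match = REPLY.search(line)
        if match:
            ip = match.group(1)
            counts[ip] = counts.get(ip, 0) + 1
    return counts


# Three reply lines copied from the output above.
sample = """\
PING 10.20.50.255: 56 data bytes
64 bytes from http-b01.san (10.20.50.135): icmp_seq=0. time=1. ms
64 bytes from http-b01.san (10.20.50.135): icmp_seq=0. time=5. ms
64 bytes from http-b01.san (10.20.50.135): icmp_seq=1. time=0. ms
"""

responders = count_responders(sample)
print(len(responders), "distinct responder(s):", sorted(responders))
```

    On the bad blades this reports a single responder, the blade's own
    address; on a good blade the same count should be much higher.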

    All other blades are functioning normally. The ce0 interfaces on the two
    bad blades are functioning normally.

    I've tried logging in to the switch module. I tried to ping 10.20.50.135,
    but it failed. I also tried to ping 10.20.50.134, a known good server,
    but that also failed, even though the good server is, well, good.

    Has anybody seen this before? I'm not sure what else to do, since it
    *appears* that the switch module and chassis are OK, and this exact same
    problem is happening on two blades, both of which were replaced with new
    ones last week.

    The switch configuration has not changed -- I've confirmed that by
    comparing the running config with the boot config.
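
    For anyone who wants to repeat that check on saved copies of the two
    configs, a rough Python sketch using the standard difflib module follows.
    It assumes both configs have already been dumped to text; the function
    name and the sample config lines are hypothetical placeholders, not real
    switch syntax.

```python
import difflib


def config_diff(boot_text, running_text):
    """Return unified-diff lines; an empty list means the configs match."""
    return list(
        difflib.unified_diff(
            boot_text.splitlines(),
            running_text.splitlines(),
            fromfile="boot",
            tofile="running",
            lineterm="",
        )
    )


# Hypothetical config text, for illustration only.
boot = "vlan 1\nport 1 enable\n"
running_same = "vlan 1\nport 1 enable\n"
running_changed = "vlan 1\nport 1 enable\nport 2 enable\n"

print(config_diff(boot, running_same))  # empty list: no differences
for line in config_diff(boot, running_changed):
    print(line)
```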

    -Adam
    _______________________________________________
    sunmanagers mailing list
    sunmanagers@sunmanagers.org
    http://www.sunmanagers.org/mailman/listinfo/sunmanagers

