ec1 statistics overflow error

From: Clarence Din (din_at_sas.upenn.edu)
Date: 03/07/05

  • Next message: axsdeny_at_gmail.com: "SGI Monitor question"
    Date: Mon, 07 Mar 2005 11:12:54 -0500
    
    

    We have a problem here involving an "ec1 statistics overflow" error that
    causes our SGI O2s to crash. Here's the situation...

    After our department upgraded our switches (different manufacturer, same
    configuration at 10 Mbps, half-duplex), we began seeing "ec1 statistics
    overflow" messages in the system logs several times per day on our dual
    Ethernet card O2s. In these O2s, ec1 (the PCI card Ethernet interface)
    is configured for the outside world while ec0 (the internal Ethernet
    interface) is attached to an NMR device (a huge magnet used in chemical
    computations).

    SGI (and Bruker, the company that makes the NMR device) suggested
    increasing ecf_max_rxds in /var/sysgen/master.d/if_ecf from 40 to 100.
    This didn't help; the machine still crashed, so we asked them how high
    we could safely set it. The answer was 170, so we changed it to that.
    We still saw the ec1 statistics overflow errors, although less often
    than before. This SGI and the NMR device sit on a busy part of the
    network, but as far as we can tell the amount of traffic has not
    increased dramatically (we have no concrete measurements to support
    this).

    The large number of ec1 statistics overflow errors eventually caused our
    O2 to crash with a CPU kernel fault. Increasing ecf_max_rxds prevented
    the crashes, but a new problem appeared: the network interface (ec1)
    would drop, not immediately but after a day or so. As a workaround I
    wrote a script that runs "ifconfig ec1 up" every 10 minutes. SGI
    addressed the following questions of mine.

    1. Will running ifconfig ec1 up every 10 minutes slow down traffic
    considerably, or is the effect negligible? SGI says it should be
    negligible. Is ifconfig smart enough to leave an interface alone when it
    is already up, rather than toggling it off and back on? (How does
    ifconfig work?) SGI hasn't gotten back to me yet on this.

    2. Will doing ifconfig ec1 up affect ec0 in any way? SGI says no.
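
    One way to sidestep the open question in item 1 would be to look at the
    interface flags first and run "ifconfig ec1 up" only when UP is missing.
    Below is a minimal Python sketch of that idea. It assumes ifconfig prints
    a BSD-style "flags=<hex><UP,...>" line, which has not been verified
    against this IRIX release, and the function names interface_is_up and
    ensure_up are hypothetical, not part of any existing script.

```python
import re
import subprocess


def interface_is_up(ifconfig_output):
    """Return True if the flags list in the ifconfig output contains UP.

    Assumption: the output has a BSD-style line such as
    'ec1: flags=...<UP,BROADCAST,...>'. If no such line is found,
    the interface is treated as not up.
    """
    match = re.search(r"flags=[0-9a-fA-F]+<([^>]*)>", ifconfig_output)
    if match is None:
        return False
    return "UP" in match.group(1).split(",")


def ensure_up(interface="ec1", run=subprocess.run):
    """Bring the interface up only if it is not already flagged UP.

    Returns True if 'ifconfig <interface> up' was issued, False otherwise.
    The 'run' parameter exists so the command runner can be replaced
    when trying the logic out without touching a real interface.
    """
    status = run(["ifconfig", interface], capture_output=True, text=True)
    if interface_is_up(status.stdout):
        return False
    run(["ifconfig", interface, "up"], check=True)
    return True
```

    Calling ensure_up("ec1") from the 10-minute job would then leave a
    healthy interface untouched. One caveat: UP is an administrative flag,
    so if ec1 stops passing traffic while still flagged UP, this check would
    skip the restart and the unconditional "ifconfig ec1 up" may still be
    needed.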

    We checked that the port configuration of the old and new switches is
    indeed the same. We've tried connecting the O2 to a port on the old
    switch and then in turn to a port on the new switch. Same problem.

    If you have any further thoughts on this, please reply.

    Clarence

