Re: Tracking down em problem

From: Eric Anderson (anderson_at_centtech.com)
Date: 11/02/05

    Date: Wed, 02 Nov 2005 09:51:24 -0600
    To: Sven Willenberger <sven@dmv.com>
    
    

    Sven Willenberger wrote:
    > FreeBSD 6.0-RC1 (Wed Oct 26 13:31:21 EDT 2005)
    >
    > I seem to have an issue with losing connections to an em interface
    > during periods of heavy I/O load. There are several variables here, so I
    > am hoping for some guidelines to help troubleshoot this.
    >
    > I have a postgresql server (8.0.4) set up on an i386 system. The data
    > directory is on its own partition (which is actually a gstripe/gmirror
    > setup -- see the footnote after my problem description).
    >
    > I have enabled a replication system from another server. When I started
    > replication there was a large amount of data that had to be fed to this
    > server via the em0 interface. During this process, while ssh'ed to the
    > box, my connection would just hang for a few moments, then it would
    > recover. However, if I cd to the data directory (stripe/mirror) and
    > start ls -alrt several times, the connection actually gets broken; not
    > only my ssh connection but the replication connection from the master
    > server is broken.
    >
    > I have tried to set debug.mpsafenet=0 in /boot/loader.conf to no avail
    > -- the same issue happens. Preemption is enabled in the kernel, as is
    > sched_4bsd. I don't really know how to proceed at this point to try and
    > troubleshoot this issue: as it stands now, it is most definitely a show
    > stopper for the purposes of this server.

    I've seen something similar on recent 5.4-STABLE, also using emX
    devices. I have three Dell 1850s showing exactly the same issue, and a
    few 1850s that are not. The ones that are not are running 5.4-RELEASE;
    the ones that are run 5.4-STABLE. In dmesg, I see a warning like this:

    Nov 1 19:56:06 hal kernel: em1: Link is up 1000 Mbps Full Duplex

    I don't see a 'Link is down' message, just 'Link is up'. One machine on
    which I've seen this repeatedly is running a build from about August 16th.
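
For anyone who wants to check their own logs for the same pattern, here is a minimal Python sketch that counts 'Link is up' messages per interface that were not preceded by a 'Link is down'. The regular expression is modelled on the single log line quoted above; the exact wording of the 'Link is down' message is an assumption, and the function name `unexplained_link_ups` is made up for illustration, so adjust both to match what your driver actually logs.

```python
import re
from collections import Counter

# Assumed message format, modelled on the one log line quoted in this thread.
# The "Link is down" wording is a guess; change it to match your driver's output.
LINK_EVENT = re.compile(r"kernel: (em\d+): Link is (up|down)\b", re.IGNORECASE)


def unexplained_link_ups(lines):
    """Count, per interface, 'Link is up' events with no 'Link is down' just before."""
    last_state = {}      # interface name -> most recent state seen ("up" or "down")
    counts = Counter()   # interface name -> number of unexplained "up" events
    for line in lines:
        match = LINK_EVENT.search(line)
        if not match:
            continue
        iface, state = match.group(1), match.group(2).lower()
        # An "up" is unexplained when the previous event for this interface
        # was not a "down" (this includes the very first event seen).
        if state == "up" and last_state.get(iface) != "down":
            counts[iface] += 1
        last_state[iface] = state
    return dict(counts)
```

The function takes any iterable of log lines, for example an open file handle on the system log. Note that the first 'Link is up' logged for an interface (normally the one at boot) is counted too, so a count of one per interface is expected on a healthy machine; higher counts match the behaviour described here.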

    I'm using SCHED_4BSD, SMP, and most of the other GENERIC settings.

    If anyone wants more details, let me know. I have a spare Dell 1850 I
    can play with.

    Eric

    -- 
    ------------------------------------------------------------------------
    Eric Anderson        Sr. Systems Administrator        Centaur Technology
    Anything that works is better than anything that doesn't.
    ------------------------------------------------------------------------
    _______________________________________________
    freebsd-current@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-current
    To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
    
