Re: nfs-server silent data corruption




Hello,

Mike Tancsa <mike@xxxxxxxxxx> writes:

At 10:52 AM 4/21/2008, Arno J. Klaassen wrote:

Device is :

nfe0@pci0:0:10:0: class=0x068000 card=0x289510f1
chip=0x005710de rev=0xa3 hdr=0x00
vendor = 'Nvidia Corp'
device = 'nForce4 Ultra NVidia Network Bus Enumerator'
class = bridge
cap 01[44] = powerspec 2 supports D0 D1 D2 D3 current D0

(this is with the default BIOS setting " LAN Bridge Enabled", disabling
that setting makes pciconf say "class = network" but does not influence
my problem)

I will restart my tests now by populating all 4G to only CPU1 and
say whether that matters.

Hi,
How long does it take for the problem to show up ?


Less than an hour in general (running the same client script
simultaneously on a 100Mbps linux box and a 1Gbps bsd6-x86 box)

I have what appears to be a very similar Tyan board (I have a Socket 939
X2 CPU) with the same NIC, but this one is running RELENG_7 from April
17th. There have been a few fixes for the nfe driver since 7.0

I am running the small script below on an NFS client (em NIC) against
the server (nfe). Mount options on the client: 192.168.245.1:/backup
/backup nfs rw,-r=32768,-w=32768,tcp,noauto

#!/bin/sh
i=0
while true
do
    i=`expr $i + 1`
    # write 80MB of random data locally, then copy it to the NFS mount
    dd if=/dev/urandom of=/tmp/junk.txt bs=1024 count=81920 > /dev/null 2>&1
    cp -p /tmp/junk.txt /backup/
    orig=`md5 -q /tmp/junk.txt`
    # remount so the copy is read back from the server, not the client cache
    umount /backup
    sleep 2
    mount /backup
    copy=`md5 -q /backup/junk.txt`
    echo "$orig and $copy on $i"
    if [ "$orig" != "$copy" ]; then
        echo "\a copy not ok on $i"
        exit 255
    fi
done


quite the same as what I do (apart from the umount/sleep/mount, and I
use the same partition for write and copy) :

SIZE=$1

COUNTER=${2:-20}

until [ "$COUNTER" -lt 1 ]; do
    echo "**** Still $COUNTER iterations to go *** "
    echo
    echo -n Creating random file of $SIZE MBytes ...
    dd if=/dev/random of=BIG bs=1048576 count=${SIZE} > /dev/null 2>&1
    echo Done
    echo -n Calculating md5 checksum ...
    CS1=`md5 -q BIG`
    echo Done
    echo -n Copying file ...
    cp -fp BIG BIG2
    echo Done
    echo -n Calculating md5 checksum ...
    CS2=`md5 -q BIG2`
    echo Done
    if [ "${CS1}" != "${CS2}" ]; then
        echo CHECKSUM MISMATCH
        exit 1
    else
        echo
    fi
    COUNTER=$((COUNTER - 1))
done


for info, I test with args '38 999' (38M, try 999 times) on linux
(slightly adapted script BTW) and '138 999' on bsd. The best 'score' I
got was 'still 871 iterations to go'
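
Neither script above records where the two copies differ once a mismatch is found. A minimal sketch of how that could be captured, assuming a POSIX cmp(1) and awk(1); the function name report_diff is hypothetical and is not part of either script in this thread:

```sh
#!/bin/sh
# Hypothetical helper: summarise how two files differ after a checksum mismatch.
# Assumes POSIX cmp(1): "cmp -l" prints one line per differing byte
# (1-based decimal offset, then the two byte values in octal).
report_diff() {
    a=$1
    b=$2
    # cmp -s is silent and only sets the exit status
    if cmp -s "$a" "$b"; then
        echo "identical"
        return 0
    fi
    # Count the differing bytes and remember the first and last offsets.
    cmp -l "$a" "$b" 2>/dev/null | awk '
        NR == 1 { first = $1 }
        { last = $1 }
        END {
            if (NR > 0)
                print NR " bytes differ, offsets " first ".." last
            else
                print "files differ in length only"
        }'
    return 1
}
```

In the mismatch branch of either loop, a call such as `report_diff BIG BIG2` before the exit would keep a one-line summary of the damage, which may help tell a single flipped byte from a whole corrupted block.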

On the server, I have

nfe0@pci0:0:10:0: class=0x068000 card=0x286510f1 chip=0x005710de
rev=0xa3 hdr=0x00
vendor = 'Nvidia Corp'
device = 'nForce4 Ultra NVidia Network Bus Enumerator'
class = bridge
cap 01[44] = powerspec 2 supports D0 D1 D2 D3 current D0


idem

# ifconfig nfe0
nfe0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=10b<RXCSUM,TXCSUM,VLAN_MTU,TSO4>
ether 00:e0:81:58:91:6a
inet 192.168.245.1 netmask 0xffffff00 broadcast 192.168.245.255
media: Ethernet autoselect (1000baseTX <full-duplex,flag0,flag1>)
status: active

idem
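
Since both machines are being compared by eye here, the interface capability flags on the options= line (RXCSUM, TXCSUM, VLAN_MTU, TSO4 in the output above) could also be extracted and diffed mechanically. A small sketch, assuming the ifconfig output format shown above; the function name list_if_options is hypothetical:

```sh
#!/bin/sh
# Hypothetical helper: read ifconfig output on stdin and print the
# capability flags from its "options=" line, one flag per line.
list_if_options() {
    sed -n 's/^[[:space:]]*options=[0-9a-fA-F]*<\(.*\)>.*$/\1/p' | tr ',' '\n'
}

# usage (on each host, then compare the two results with diff):
#   ifconfig nfe0 | list_if_options
```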

How long does it take for the problem to come up ?

as said : approximately half an hour; never more than 4 hours


Best, Arno
_______________________________________________
freebsd-stable@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe@xxxxxxxxxxx"


