Re: nfs-server silent data corruption




Hello,

Mike Tancsa <mike@xxxxxxxxxx> writes:

At 05:57 PM 4/21/2008, Arno J. Klaassen wrote:
Hi,
How long does it take for the problem to show up?


Less than an hour in general (running the same client script
simultaneously on a 100Mbps Linux box and a 1Gbps bds6-x86)

I am running my NIC at gig speeds only... I recompiled the kernel
this morning to include cpufreq, and made sure Cool'n'Quiet
was enabled in the BIOS.



For info, I test with args '38 999' (38M, try 999 times) on Linux
(with a slightly adapted script, BTW) and '138 999' on BSD. The best
'score' I got was 'still 871 iterations to go'.
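The client script itself is not included in the thread. As a rough illustration only, here is a minimal Python sketch of a loop of that shape, assuming the script writes an N-MB file into an NFS-mounted directory, verifies it by checksum, and counts down the remaining tries. The names nfs_copy_test and payload.bin are hypothetical. Note that reading the copy back immediately may be served from the client's cache rather than from the server, so the real script may well have done this differently.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path):
    """Hex SHA-256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def nfs_copy_test(nfs_dir, size_mb, tries):
    """Hypothetical stand-in for the '<size_mb> <tries>' client script.

    Writes size_mb MiB of random data to a local temp file, then
    repeatedly copies it into nfs_dir and compares checksums.
    Returns the number of tries still to go when a mismatch is seen,
    or None if every copy matched.
    """
    with tempfile.TemporaryDirectory() as local_dir:
        src = os.path.join(local_dir, "payload.bin")
        with open(src, "wb") as f:
            f.write(os.urandom(size_mb * 1024 * 1024))
        expected = sha256_of(src)
        dst = os.path.join(nfs_dir, "payload.bin")
        for done in range(1, tries + 1):
            shutil.copyfile(src, dst)
            if sha256_of(dst) != expected:
                # Leave the bad copy in place for inspection.
                print(f"mismatch: still {tries - done} iterations to go")
                return tries - done
            os.remove(dst)
    return None
```

For example, nfs_copy_test("/mnt/nfs", 138, 999) would mirror the '138 999' run, with /mnt/nfs a placeholder mount point.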


So far I have done 150 loops with an 80MB file and 200 loops with a
160MB file, with no issues. My nfe NIC does not support MSI and has its
own interrupt.

# vmstat -i
interrupt            total    rate
irq1: atkbd0             5       0
irq4: sio0            3049       1
irq16: twe0         327046     164
irq19: bge0         385147     194
irq21: atapci1      976355     492
irq23: nfe0       11876726    5986
cpu0: timer        3966420    1999
cpu1: timer        3964392    1998


# vmstat -i
interrupt            total    rate
irq1: atkbd0             4       0
irq14: ata0             69       0
irq20: nfe0       11650955    5283
irq24: atapci1          94       0
irq28: atapci2         178       0
irq29: ahd0         355704     161
cpu0: timer        4409020    1999
cpu1: timer        4391646    1991
cpu2: timer        4391643    1991
cpu3: timer        4391641    1991
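The rate column of vmstat -i is an average since boot, not a current rate: as we understand vmstat(8), it is the total divided by the uptime in whole seconds. A small Python sketch (parse_vmstat_i, implied_uptime and consistent_uptime are hypothetical helpers) checks that the nfe0 and timer lines of the first listing imply the same uptime, which supports reading the nfe0 figure as a since-boot average.

```python
def parse_vmstat_i(text):
    """Map 'irqN: dev' names to (total, rate) from vmstat -i output."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip the header ("interrupt total rate") and blank lines.
        if len(parts) < 3 or not (parts[-1].isdigit() and parts[-2].isdigit()):
            continue
        rows[" ".join(parts[:-2])] = (int(parts[-2]), int(parts[-1]))
    return rows


def implied_uptime(total, rate):
    """Uptime bounds (s), assuming rate = total // uptime_in_seconds."""
    low = total / (rate + 1)
    high = total / rate if rate else float("inf")
    return low, high


def consistent_uptime(rows, names):
    """Intersect the uptime bounds implied by several lines."""
    bounds = [implied_uptime(*rows[n]) for n in names]
    return max(b[0] for b in bounds), min(b[1] for b in bounds)


FIRST_BOX = """\
interrupt total rate
irq23: nfe0 11876726 5986
cpu0: timer 3966420 1999
"""

rows = parse_vmstat_i(FIRST_BOX)
print(consistent_uptime(rows, ["irq23: nfe0", "cpu0: timer"]))
# both bounds are close to 1984 s, i.e. about 33 minutes of uptime
```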

I have powerd started up with
powerd_enable="YES"
powerd_flags="-a adaptive -b adaptive -n adaptive"


Slightly different: I mostly use "-b adaptive -i 90 -n adaptive -r 80",
but the problem shows up without flags as well.
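To make the -i/-r thresholds concrete, here is a simplified Python model of an adaptive governor. It assumes, based on our reading of the FreeBSD 7-era powerd(8) manual, that both values are CPU idle percentages: above the -i mark the frequency steps down, below the -r mark it steps up. It is not powerd's actual code (the real daemon may move more than one level at a time), and next_freq is a hypothetical name. The levels are the dev.cpu.0.freq_levels frequencies from the four-CPU box later in this thread.

```python
# Frequency levels (MHz), highest first, from dev.cpu.0.freq_levels
# on the four-CPU box in this thread.
LEVELS = [2587, 2388, 2189, 1990, 1791, 995]


def next_freq(cur, idle_pct, idle_mark=90, running_mark=80, levels=LEVELS):
    """One step of a simplified adaptive governor (not powerd's code).

    idle_mark/running_mark stand in for powerd's -i/-r values, read as
    idle percentages: above idle_mark step one level down, below
    running_mark step one level up, otherwise keep the current level.
    """
    i = levels.index(cur)
    if idle_pct > idle_mark and i < len(levels) - 1:
        return levels[i + 1]  # mostly idle: one step slower
    if idle_pct < running_mark and i > 0:
        return levels[i - 1]  # busy: one step faster
    return cur
```

With these thresholds a workload that alternates idle and busy phases keeps the frequency moving: starting at 2587 MHz, idle readings of 95, 95, 50, 50, 85 give 2388, 2189, 2388, 2587, 2587.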


With the "sleep" in my test script, powerd does seem to be changing
frequencies during the idle periods as well.

I most often provoke slight swapping to "randomize" frequency changes,
and run burnK7 or similar to push the frequency up and down by hand.

# sysctl dev. | grep -i fre
dev.cpu.0.freq: 1800
dev.cpu.0.freq_levels: 2200/110000 2000/105600 1800/89100 1000/49000
dev.powernow.0.freq_settings: 2200/110000 2000/105600 1800/89100 1000/49000
dev.powernow.1.freq_settings: 2200/110000 2000/105600 1800/89100 1000/49000
dev.cpufreq.0.%driver: cpufreq
dev.cpufreq.0.%parent: cpu0
dev.cpufreq.1.%driver: cpufreq
dev.cpufreq.1.%parent: cpu1

Funny, when I do that:

# sysctl dev. | grep -i fre
dev.cpu.0.freq: 995
dev.cpu.0.freq_levels: 2587/95000 2388/90300 2189/76200 1990/63800 1791/53200 995/36100
dev.powernow.0.freq_settings: 2587/95000 2388/90300 2189/76200 1990/63800 1791/53200 995/36100
dev.powernow.1.freq_settings: 2587/95000 2388/90300 2189/76200 1990/63800 1791/53200 995/36100
dev.powernow.2.freq_settings: 2587/95000 2388/90300 2189/76200 1990/63800 1791/53200 995/36100
dev.powernow.3.freq_settings: 6747/95000 6228/90300 5709/76200 5190/63800 4671/53200 2595/36100
dev.cpufreq.0.%driver: cpufreq
dev.cpufreq.0.%parent: cpu0
dev.cpufreq.1.%driver: cpufreq
dev.cpufreq.1.%parent: cpu1
dev.cpufreq.2.%driver: cpufreq
dev.cpufreq.2.%parent: cpu2
dev.cpufreq.3.%driver: cpufreq
dev.cpufreq.3.%parent: cpu3

Especially dev.powernow.3.freq_settings looks weird...
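The oddity can be quantified directly from the two sysctl lines above. This short Python sketch (parse_settings is a hypothetical helper) compares dev.powernow.2 and dev.powernow.3 level by level: every frequency on CPU 3 comes out at about 2.608 times the CPU 2 value, while the second field of each pair is identical, so a single scale factor seems to have been applied to the frequencies only.

```python
def parse_settings(value):
    """Split 'freq/x freq/x ...' into a list of (freq, x) integer pairs."""
    return [tuple(int(n) for n in pair.split("/")) for pair in value.split()]


cpu2 = parse_settings(
    "2587/95000 2388/90300 2189/76200 1990/63800 1791/53200 995/36100")
cpu3 = parse_settings(
    "6747/95000 6228/90300 5709/76200 5190/63800 4671/53200 2595/36100")

for (f2, x2), (f3, x3) in zip(cpu2, cpu3):
    # Frequency ratio CPU3/CPU2, and whether the second field matches.
    print(f"{f2:5d} -> {f3:5d}  ratio {f3 / f2:.3f}  second field equal: {x2 == x3}")
```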

That said, I once more dug up the old acpi_ppc.c and slightly
adapted it for FreeBSD 7 (basically some name changes, and using
read_cpu_time() instead of cp_time), and with it the problem disappears...
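For context on that adaptation: as we understand it, read_cpu_time() fills an array of CPUSTATES tick counters (user, nice, sys, intr, idle, in that order per <sys/resource.h>), the same counters userland sees via sysctl kern.cp_time. A load-driven governor typically starts from the idle share between two such snapshots. The Python sketch below shows only that arithmetic, not acpi_ppc.c's algorithm; idle_percent is a hypothetical name and the counter values in the usage line are made up.

```python
# Index of the idle counter in FreeBSD's CPUSTATES array
# (CP_USER, CP_NICE, CP_SYS, CP_INTR, CP_IDLE).
CP_IDLE = 4


def idle_percent(before, after):
    """Idle share (0-100) of the ticks elapsed between two snapshots."""
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    if total == 0:
        return 100.0  # no ticks elapsed; treat the interval as idle
    return 100.0 * deltas[CP_IDLE] / total


# Illustrative counters only: 100 ticks elapsed, 60 of them idle.
print(idle_percent([100, 0, 50, 10, 840], [130, 0, 60, 10, 900]))
```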

The acpi_ppc algorithm makes it somewhat harder to push frequencies up,
though I doubt that matters.

I also tried with hint.acpi_throttle.0.disabled="1" in loader.conf,
with no luck (using powerd).

I'm out of the office tomorrow but will try to find time tomorrow evening
to test with another NIC.

Best, Arno
_______________________________________________
freebsd-stable@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe@xxxxxxxxxxx"
