Re: NFS write() calls lead to read() calls?



Greetings,

On Wed, Mar 28, 2007 at 11:38:44AM +0200, Ulrich Spoerlein wrote:

I observe a strange effect when using the following setup: Three
FreeBSD 6.2[1] machines on Gigabit Ethernet using em(4) interfaces.

HostC is the NFS server, HostB has /net/share mounted from HostC. I
will use HostA and HostB to demonstrate the issue. Picture this:

hostA # scp 500MB hostB:/net/share/

Iff the file "500MB" does not yet exist on the NFS share, I can see X
MB/s going out of HostA, X MB/s coming in on HostB, X MB/s going out
on HostB again and finally X MB/s coming in on HostC.

If I run the scp again, I can see X MB/s going out from HostA, 2*X
MB/s coming in on HostB and X MB/s out plus X MB/s in on HostC. What's
happening is that HostB issues one NFS READ call for every WRITE
call. The traffic flows like this:

    ----->     ----->
  A         B         C
               <-----

If I rm(1) the file on the NFS share, then the first scp(1) will not
show this behaviour. It is only when overwriting files that this
happens.

The real weirdness comes into play when I simply cp(1) from HostB
itself, like this:

hostB # cp 500MB /net/share/

I can do this over and over again and _never_ get any noteworthy
amount of NFS READ calls, only WRITE. The network traffic is also as
you would expect.

Then I tested using ssh(1) instead of scp(1), like this:

hostA # cat 500MB | ssh hostB "cat >/net/share/500MB"

This works, too. Probably because sh(1) truncates the file first?

So, can someone please explain to me what is happening and if/how it
can be avoided?

My first guess is that scp and Samba use too small an I/O block
size. Forget NFS and simply imagine that an application issues
writes in 128-byte blocks while the disc block size is 512 bytes.
If the OS is simple, like MS-DOS :-), then it will read the whole
disc block and replace just 128 bytes in it on every write the
application makes. If the OS is a bit more sophisticated, say
FreeBSD ;-), it will use a buffer cache to alleviate the disc churn.
However, it will still have to read the disc block once, on the first
small write to it, because it has no way to know that the application
is going to overwrite the whole of the disc block in a moment. So
each disc block is read once and written once, and the read is forced
purely by the poor choice of the write block size.

Of course, my scenario implies that the file already contains data
and the writes go over that data, not beyond the end of file.
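
To make the scenario above concrete, here is a toy model in Python.
It is only a sketch of the reasoning, not of the FreeBSD buffer cache
or the NFS client: the BlockCache class, the copy_over() helper and
the sizes used are all made up for illustration. It counts how many
blocks have to be fetched before they can be modified, assuming a
block is fetched only when it holds old data and the write does not
cover all of that data.

```python
class BlockCache:
    """Toy write-back cache over fixed-size blocks (hypothetical model)."""

    def __init__(self, existing_size, block_size):
        self.block_size = block_size
        self.file_size = existing_size  # bytes of valid data already stored
        self.cached = set()             # block numbers held in the cache
        self.reads = 0                  # blocks fetched from the backing store

    def write(self, offset, length):
        """Apply one write, fetching a block first only when it is needed."""
        if length <= 0:
            return
        end = offset + length
        first = offset // self.block_size
        last = (end - 1) // self.block_size
        for blk in range(first, last + 1):
            blk_start = blk * self.block_size
            # End of the old data inside this block (the file may end early).
            valid_end = min(blk_start + self.block_size, self.file_size)
            has_old_data = blk_start < self.file_size
            covers_old_data = offset <= blk_start and end >= valid_end
            if blk not in self.cached and has_old_data and not covers_old_data:
                self.reads += 1         # read-before-write for a partial update
            self.cached.add(blk)
        self.file_size = max(self.file_size, end)


def copy_over(file_size, block_size, write_size, existing_size):
    """Write file_size bytes sequentially in write_size chunks; return block reads."""
    cache = BlockCache(existing_size, block_size)
    for offset in range(0, file_size, write_size):
        cache.write(offset, min(write_size, file_size - offset))
    return cache.reads


if __name__ == "__main__":
    # 128-byte writes over existing data: every 512-byte block is read once.
    print(copy_over(4096, 512, 128, existing_size=4096))  # 8
    # Block-sized writes over existing data: nothing has to be read.
    print(copy_over(4096, 512, 512, existing_size=4096))  # 0
    # 128-byte writes into an empty (removed or truncated) file: no reads.
    print(copy_over(4096, 512, 128, existing_size=0))     # 0
```

In this model the reads disappear either when the write size matches
the block size or when the file holds no old data, which is at least
consistent with the rm(1) and truncation observations quoted above.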

Something similar (but maybe a bit more complex) should be going
on in your case.

--
Yar
_______________________________________________
freebsd-current@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@xxxxxxxxxxx"


