Re: How to determine write failure?



Alex Fraser wrote:
"Thomas Maier-Komor" <maierkom@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote
in message news:dsfob2$2om$1@xxxxxxxxxxxxxxxxxxxxxxxx
sebasttj@xxxxxxxxx wrote:
If it matters, I'm targeting Linux 2.6 with klibc, but I performed all
tests with glibc on Debian as well with identical results. I'm
interested in all solutions, including those requiring modifications to
the kernel or C runtime (though I realize this may not be the best NG
for that kind of discussion).
Depending on the level of integrity you want to achieve, you might be
using the wrong system. Linux is pretty aggressive when it comes to
disk caches. The upside is pretty good results in filesystem
performance benchmarks. The downside is what you are learning right now.

I think the aggressiveness of caching (not that I can see great scope in it)
should be entirely irrelevant to the OP's problem. Shouldn't it?


It is relevant, because a read() could be satisfied from a write() that never hit
the device. And a write() could be dropped in favor of a write() that occurs
after it and targets the same block on the device, which is something
the OP does not want (if I correctly understand his specific requirement
that he posted later).

Look here concerning fsync:

POSIX says:
The fsync() function shall request that all data for the open file
descriptor named by fildes is to be transferred to the storage device
associated with the file described by fildes. The nature of the transfer
is implementation-defined. The fsync() function shall not return until
the system has completed that action or until an error is detected.

Linux says:
fsync copies all in-core parts of a file to disk, and waits until the
device reports that all parts are on stable storage. It also updates
metadata stat information.

This seems to match the behaviour specified by POSIX, although it doesn't
mention the error case or updating of filing system metadata (i.e. anything it
needs to find the file's data). If it does this, it should be able to
report any write error (perhaps propagated all the way from the device
itself).
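
To make the point about error reporting concrete, here is a minimal sketch in C. It assumes a POSIX system; the helper names write_all and write_durably are made up for illustration and do not come from any library. The idea is only that a successful write() says the data reached the page cache, so the return value of fsync() has to be checked as well.

```c
#include <errno.h>
#include <unistd.h>

/* Write the whole buffer, retrying after short writes and EINTR.
 * Returns 0 on success, -1 on error with errno set by write(). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any data was written */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: write the data, then ask the kernel to push it to
 * the device. An error from either step is reported to the caller. */
int write_durably(int fd, const void *buf, size_t len)
{
    if (write_all(fd, buf, len) != 0)
        return -1;              /* write() itself failed */
    if (fsync(fd) != 0)
        return -1;              /* error detected while flushing, e.g. EIO */
    return 0;
}
```

A caller would treat a -1 from write_durably() as "the data may not be on the device" and inspect errno. Whether a given device error actually propagates up to fsync() depends on the kernel and filesystem.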


It does mention metadata implicitly. POSIX does not say how a filesystem
is to be implemented, so it won't refer to a file's metadata. The reason
is that a file descriptor could refer to something that is not
associated with a file on a filesystem (e.g. a network socket). But it
says: "all data [...] associated with the file described by fildes".
This includes all metadata (inode metadata, file-entry, directory inode,
etc.).

It does not necessarily ensure that the
entry in the directory containing the file has also reached disk. For
that an explicit fsync on the file descriptor of the directory is also
needed.

I read this part as supplementary information (a warning, basically). I
don't think the POSIX description is intended to mean that directories
referencing the file must also be on stable storage when fsync() returns
successfully.


Maybe I am wrong. Maybe you can point out why I could be wrong.

So even if you open your file with O_SYNC, you will have to do an fsync
for the directory containing your file, and the directory containing
this directory and so on.

Only the directory you create the file in, unless you created that directory
(if so, the directory containing the directory you created too, and so on).


Yes, or someone else did so. I think how you resolve this issue depends
on how much data integrity you need.

BTW: Linux's fdatasync is not POSIX-conformant, as it does something totally
different from what POSIX requests.

How so?


Do I misunderstand the wording of POSIX?

POSIX (http://www.opengroup.org/onlinepubs/009695399/):
The fdatasync() function shall force all currently queued I/O
operations associated with the file indicated by file descriptor fildes
to the synchronized I/O completion state.

The functionality shall be equivalent to fsync() with the symbol
_POSIX_SYNCHRONIZED_IO defined, with the exception that all I/O
operations shall be completed as defined for synchronized I/O data
integrity completion.

Linux:
fdatasync flushes all data buffers of a file to disk (before the system
call returns). It resembles fsync but is not required to update the
metadata such as access time.
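
For comparison, a small C sketch of where fdatasync() is typically used: overwriting bytes in place, so that the file size does not change and only the data itself has to reach the device. It assumes a system that provides fdatasync(), such as Linux; the helper name overwrite_and_datasync is made up for illustration. As I read the POSIX definition of synchronized I/O data integrity completion, metadata needed to retrieve the data must still be transferred, while things like timestamps may be skipped.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: overwrite len bytes at offset off, then flush the
 * file's data with fdatasync(). Returns 0 on success, -1 on error. */
int overwrite_and_datasync(int fd, off_t off, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* retry an interrupted call */
            return -1;
        }
        p += n;
        off += n;
        len -= (size_t)n;
    }
    /* Like fsync(), fdatasync() returns -1 and sets errno on failure. */
    return fdatasync(fd);
}
```

Swapping fdatasync() for fsync() in the last line would also request the flush of metadata such as the modification time.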

Tom