Re: Modifying file access time upon exec...

From: Bruce Evans (bde_at_zeta.org.au)
Date: 05/28/05

  • Next message: Bruce Evans: "Re: Re: Modifying file access time upon exec..."
    Date: Sat, 28 May 2005 14:12:16 +1000 (EST)
    To: Ken Smith <kensmith@cse.Buffalo.EDU>
    
    

    On Fri, 27 May 2005, Ken Smith wrote:

    > On Fri, 2005-05-27 at 11:17 +0200, Marc Olzheim wrote:
    >> On Thu, May 26, 2005 at 04:24:25PM -0400, Ken Smith wrote:
    >>> Any thoughts before I commit it? The patch itself is pretty small. But
    >>> given the sections of code it's mucking with combined with it adding a
    >>> little 'nit' filesystem implementers should be aware of I wanted to run
    >>> it by as many clueful eyes as possible before doing the final commit.
    >>
    >> Has this been run through some kind of real world performance test ? I
    >> can imagine for instance /bin/sh's vnode is being updated a lot... Would
    >> it be eligible to a becoming a mount option ?
    >
    > Bruce did some benchmarking and this approach seemed to be the minimal
    > hit on performance of the options we have. The other things that got
    > tested were things like "fake reads". The whole issue started when the
    > exec mechanisms were shifted away from doing file reads in favor of a
    > more mmap based mechanism for starting the executables.

    The impact is so small that it is hard to see in real world tests. Hence
    the microbenchmarks, which magnify its effect to 10% or so.

    > From his tests the hit seemed minimal. The noatime mount option seems
    > to be the most appropriate thing to use for turning it off, and in that
    > case the only cost involved with this addition is the check in exec to
    > see if the file is coming from a filesystem that's either noatime or
    > readonly.
    >
    >> I don't see any real problems with it, but perhaps people running
    >> executables over NFS filesystems that cannot be mounted with noatime
    >> might have an issue, like netbooting diskless machines...

    [In a reply, you clarified this to say that another flag might be needed
    to disable this new pess^Wfeature since -noatime might not be available
    for all file systems.]

    Well, if -noatime is not available then you may have already lost
    significantly except on write-mostly or exec-mostly file systems. The
    new behaviour only loses significantly in the exec-mostly case, and
    then only when execs mostly don't cause reads as a side effect.

    For nfs, the -noatime option and atime timestamps generally are horribly
    broken. This brokenness significantly limits the overheads from the
    change unless we add to the patch to make atime timestamps on exec
    actually work for nfs without changing nfs's basic mishandling of atimes.

    An early version of the patch did make atime timestamps sort of work
    for nfs. It did this by setting the atime in vattr (where the current
    code intentionally leaves the atime as VNOVAL so that the VOP_SETATTR()
    call has no effect for file systems that haven't been changed to
    understand VA_EXECVE_ATIME). This made VOP_SETATTR() set the atime
    in the same way as for utimes(2), except there was the VA_EXECVE_ATIME
    flag to modify the behaviour. A modification is needed to bypass
    permissions checking. I only implemented the modification for ffs.
    Thus for nfs, the change had much the same overhead as utimes(2) after
    every exec and permissions stuff was broken. For ffs, atimes are
    cached and are written by delayed writes so utimes(2) has a relatively
    low overhead, but for nfs the timestamps written by utimes(2) are
    considered much more precious than most other timestamps -- they are
    synced immediately, and this involves a slow nfs transaction and a
    synchronous write on the server (modulo sync/normal/async mounts and
    bugs in these), so everything is slowed down; OTOH, other timestamps
    in nfs are mostly handled more efficiently by not doing them right.
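
    The two variants described above can be sketched in user space as
    follows. This is only an illustration under stated assumptions, not the
    patch itself: every sk_* name, the struct layouts and the flag values are
    hypothetical stand-ins for the kernel's vattr, VNOVAL, VA_EXECVE_ATIME
    and mount flags. In this sketch permission checks are simply not modeled.

```c
/* User-space sketch only; all sk_* names and values are hypothetical. */
#define SK_VNOVAL           (-1L)  /* "no value": leave the attribute alone */
#define SK_VA_EXECVE_ATIME  0x01   /* hint: atime is being marked for an exec */
#define SK_MNT_RDONLY       0x01
#define SK_MNT_NOATIME      0x02

struct sk_vattr {
    long va_atime_sec;   /* SK_VNOVAL unless an explicit time is requested */
    int  va_vaflags;
};

struct sk_file {
    int  mnt_flags;              /* flags of the mount the file lives on */
    int  understands_exec_flag;  /* 1 for a file system changed to honour the hint */
    long atime_sec;
    int  atime_dirty;            /* cached update, written later by a delayed write */
};

/* Stand-in for a file system's setattr method. */
static void sk_setattr(struct sk_file *f, const struct sk_vattr *va, long now)
{
    if (va->va_atime_sec != SK_VNOVAL) {
        /* Explicit time, as for utimes(2): every file system acts on it. */
        f->atime_sec = va->va_atime_sec;
        f->atime_dirty = 1;
        return;
    }
    if ((va->va_vaflags & SK_VA_EXECVE_ATIME) && f->understands_exec_flag) {
        /* Only file systems that know the hint mark the atime. */
        f->atime_sec = now;
        f->atime_dirty = 1;
    }
    /* Otherwise the call has no effect. */
}

/* Stand-in for the exec path: skip noatime and read-only mounts, then pass
 * only the hint and leave the atime itself unset. */
static void sk_exec_mark_atime(struct sk_file *f, long now)
{
    struct sk_vattr va;

    if (f->mnt_flags & (SK_MNT_RDONLY | SK_MNT_NOATIME))
        return;
    va.va_atime_sec = SK_VNOVAL;
    va.va_vaflags = SK_VA_EXECVE_ATIME;
    sk_setattr(f, &va, now);
}
```

    With the atime left unset, a file system that has not been changed
    ignores the call entirely. Filling in va_atime_sec instead corresponds
    to the early version of the patch, where every file system takes the
    utimes(2)-like path.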

    More on the broken -noatime mount option and atime timestamps in nfs:
    - Mounting with -noatime on the client has no effect. It is a general
       bug in the mount utilities that some flags which don't apply to the
       particular file system are silently ignored. -noatime is one of the
       generic flags which could in theory work for all file systems, so it
       is passed to all sub-mounts and is then confusing for the file systems
       that don't support it.
    - Mounting with -noatime on the server has an effect. It stops normal
       atime timestamps for reads (only). This is usually what is wanted, but
       strictly it breaks clients mounted without -noatime.
    - Reads on the client are mostly cached, and nfs apparently isn't aware
       that _all_ reads should set atime (unless the client is mounted with
       -noatime), so it doesn't tell the server anything and most reads don't
       change the atime on the server. It would be too expensive to tell the
       server about all atime changes, so a cache on the client is needed.
       A simple local cache would only work if nothing else looks at the
       timestamps. The cache must somehow be flushed to the server when
       necessary. Syncing every second might work OK. The thing to avoid is
       thousands of transactions every second -- a modern system can easily do
       thousands of reads and/or execs per second provided they are mostly from
       a local cache.
    - Execs on the client involve reads on the server unless the file is cached,
       since although exec() uses mmap() and not read(), uncached files can only
       be read using read() on the server. Thus for nfs, nothing needs to be
       changed for atimes to be set for exec() in the same (wrong) way that they
       are set for read().
    - For utimes(2) and some other metadata changes on the client, the client
       normally wants to force a synchronous change of the metadata on the
       server. The client has sufficient control of the details in nfs >=3.
       However, FreeBSD doesn't implement metadata-only sync, so FreeBSD
       servers have to fake it by syncing everything for the file. This adds
       to overheads and defeats the server's policy of not caring much about
       timestamps except for their efficiency.
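
    The client-side cache with a periodic flush suggested in the list above
    might look roughly like the sketch below. It is a user-space model under
    assumptions, not nfs code: the sk_* names are hypothetical, time is in
    whole seconds, and the rpcs counter stands in for a real transaction
    with the server.

```c
#include <stdbool.h>

/* User-space sketch only; all sk_* names are hypothetical. */
struct sk_atime_cache {
    long          pending_atime;  /* newest access time not yet sent */
    bool          dirty;          /* true if pending_atime must be sent */
    long          last_flush;     /* time of the last flush, in seconds */
    unsigned long rpcs;           /* simulated server transactions so far */
};

static void sk_cache_init(struct sk_atime_cache *c)
{
    c->pending_atime = 0;
    c->dirty = false;
    c->last_flush = 0;
    c->rpcs = 0;
}

/* Called for every read or exec served from the local cache: no traffic. */
static void sk_note_access(struct sk_atime_cache *c, long now)
{
    c->pending_atime = now;
    c->dirty = true;
}

/* Called periodically: sends at most one update per interval. */
static bool sk_maybe_flush(struct sk_atime_cache *c, long now, long interval)
{
    if (!c->dirty || now - c->last_flush < interval)
        return false;
    c->rpcs++;              /* one transaction covers all coalesced accesses */
    c->dirty = false;
    c->last_flush = now;
    return true;
}

/* Called when something else is about to look at the timestamp. */
static void sk_force_flush(struct sk_atime_cache *c, long now)
{
    if (!c->dirty)
        return;
    c->rpcs++;
    c->dirty = false;
    c->last_flush = now;
}
```

    Thousands of calls to sk_note_access() within one second then cost a
    single transaction at the next flush instead of thousands.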

    Bruce
    _______________________________________________
    freebsd-arch@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-arch
    To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"

