Re: Performance problem (I/O)

From: Bob Harris (harris_at_zk3.dec.com)
Date: 03/20/04


Date: Sat, 20 Mar 2004 00:01:40 GMT

In article <c3fijl$os9$1@jeeves.eng.abbnm.com>,
 peter@abbnm.com (Peter da Silva) wrote:

> In article <harris-F51A96.21182918032004@cacnews.cac.cpqcorp.net>,
> Bob Harris <harris@zk3.dec.com> wrote:
> > When you restored the file system using tar, you placed all the files in
> > the exact order that you would then back them up. You positioned the
> > metadata for each file next to the file that would precede it. As a
> > result you maximized your backup performance.
>
> AdvFS doesn't use an analog of the UFS cylinder group technique to keep
> metadata near the file data?

No.

> That would explain some of the performance problems we see in an app that
> writes a lot of small files.

Not having cylinder groups may not have anything to do with small file
write performance. This could be due to the way the last 8K allocation
of a small file is managed. For space efficiency, files under 150K tend
to have the last 8K of the file stored in a frag of from 1K to 7K in
length. But while the file is being written, a full 8K is allocated.
When the file is closed, the size of the file is checked and if a 10%
saving in space can be obtained by turning the last 8K into a frag, then
a frag is allocated, the end of the file is copied to the frag, and then
the original 8K is deallocated.

All of this results in additional disk I/O for small files when they are
closed.

In the newer versions of Tru64 UNIX, there is a chfsets option to disable
this and make all files created from that point forward allocate
storage in multiples of 8K. For older versions there is a global
variable that can be patched in the kernel to disable frag'ing of a file.

In the 8 million file case, if the small files are evenly distributed
between 1K and any size that is 8K or greater, then turning off file
frag'ing would increase the storage usage for that file system by about
32 gigabytes. At one time I would have choked on such a number.
Today, I have more storage than that on my laptop. I will not attempt
to place a value on this to you or your company, as laptop storage is
not the same as SCSI, RAID, or SAN storage, which tends to come in
smaller sizes and cost more. But it is still a much lower cost than when
I was the system manager for a VAX-11/780 :-) Times change.
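
For anyone who wants to redo the arithmetic behind that estimate, here is a
rough sketch in Python. The function names and both distribution models are
assumptions made for illustration and are not taken from AdvFS itself. The
first model treats the file tail as landing with equal likelihood on a 1K
through 7K frag or on a full 8K page; the second simply assumes that half a
page per file is lost on average, which is the figure that lands near
32 gigabytes.

```python
KIB = 1024
PAGE_KIB = 8             # allocation unit described in the post
NUM_FILES = 8_000_000    # "the 8 million file case"


def extra_bytes_granular(num_files=NUM_FILES):
    """Extra space with frags disabled, assuming the tail is equally likely
    to need 1K, 2K, ..., 7K (a frag) or a full 8K page (no frag)."""
    # Space lost per file when the tail is padded out to a full 8K page.
    wasted_kib = [PAGE_KIB - tail for tail in range(1, PAGE_KIB + 1)]
    # Integer math: total = files * average waste, average = sum / count.
    return num_files * sum(wasted_kib) * KIB // len(wasted_kib)


def extra_bytes_continuous(num_files=NUM_FILES):
    """Extra space with frags disabled, assuming half a page (4K) is lost
    per file on average."""
    return num_files * (PAGE_KIB // 2) * KIB


if __name__ == "__main__":
    for name, fn in (("granular", extra_bytes_granular),
                     ("continuous", extra_bytes_continuous)):
        print(f"{name}: {fn() / 1e9:.1f} GB")
```

Under these assumptions the two models give roughly 28.7 GB and 32.8 GB, so
either way the cost of turning off frag'ing is on the order of 30 gigabytes
for 8 million small files.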

                                        Bob Harris


