Re: Performance problem (I/O)

From: Bob Harris (harris_at_zk3.dec.com)
Date: 03/16/04


Date: Tue, 16 Mar 2004 01:44:11 GMT

In article <ff81d3b0.0403150527.2b6004aa@posting.google.com>,
 paltskallen@hotmail.com (Fredrik) wrote:

> Hello!
>
> We have a advfs filesystem with ~ 8 million small files and it is
> still growing.
> This all is in a SAN, HSG80 with striped disks.
>
> The weekly networker fullbackup takes about 20 hours and it seems that
> it is the os that is waiting for io.
> It is also lsm mirrored between two different SAN's
> The tar command takes about the same time so it seems that it not a
> networker problem.
>
> output from collect:
> # DISK Statistics
> #DSK NAME   B/T/L  R/S  RKB/S  W/S  WKB/S  AVS   AVW    ACTQ  WTQ   %BSY
> 0    dsk19  -      136  1091   176  1467   5.27  30.76  1.65  3.87  76.69
> 1    dsk7   -      136  1097   174  1457   4.94  25.20  1.54  3.14  79.09
>
> Anybody has any tips, clues how to speed it up or what the problem is?

That seems about right. 8 million files means most likely a minimum of
about 8 million seeks, with each seek averaging about 8 milliseconds.
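
As a rough check on that figure, here is a back-of-the-envelope sketch
in Python. It assumes exactly one seek per file and a flat 8 ms per
seek, which are the round numbers above rather than measurements. The
function name is made up for illustration.

```python
def backup_hours(num_files, seeks_per_file=1.0, seek_ms=8.0):
    """Estimate wall-clock hours if the backup is purely seek-bound."""
    total_ms = num_files * seeks_per_file * seek_ms
    return total_ms / 1000.0 / 3600.0


if __name__ == "__main__":
    # 8 million files, one 8 ms seek each (assumed round numbers)
    print(f"{backup_hours(8_000_000):.1f} hours")
```

With those assumptions the estimate comes out a little under 18 hours,
which is in the same ballpark as the 20 hours reported.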

Actually, you are possibly getting faster than 8 milliseconds per
seek, but figure that there are more than 8 million seeks. There are
seeks to look up the filename (most likely a lot of this is cached),
and a seek to get the metadata for the file being opened (but in some
cases the metadata page from a previous file open is cached). Then
there is a seek from the metadata to the user data when you read the
small file. If the small file was fragmented, it might be in more than
one 8K page, or an 8K page and a less than 8K frag in the frag file
(you might get some caching if a frag shared the same page and was
still in the cache). There is a seek for each non-contiguous extent.
And because you read the file, the file access time was changed, so
there is a seek back to the metadata to update the time stamp. And
because there is a metadata modification, the metadata change is
transaction logged, so there will be a seek to the log (this is a lazy
update, so multiple updates might be consolidated into a single write
to disk).
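
To get a feel for how seek time and seek count trade off, the sketch
below works backwards from the reported 20 hours and 8 million files.
It assumes the backup is entirely seek-bound, and the seek times tried
are illustrative guesses, not measurements from this system.

```python
def implied_seeks_per_file(elapsed_hours, num_files, seek_ms):
    """Seeks per file needed to fill elapsed_hours at a given seek time."""
    elapsed_ms = elapsed_hours * 3600.0 * 1000.0
    return elapsed_ms / seek_ms / num_files


if __name__ == "__main__":
    # 20 hours and 8 million files come from the original post;
    # the candidate seek times are assumptions.
    for seek_ms in (8.0, 6.0, 4.0):
        n = implied_seeks_per_file(20, 8_000_000, seek_ms)
        print(f"{seek_ms:.0f} ms per seek -> about {n:.2f} seeks per file")
```

So if the disks really average closer to 4 ms, the same 20 hours would
correspond to a bit over two seeks per file, which fits the list of
extra seeks above.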

If you don't care about access times, you could use the noatimes mount
option.

Place the log file on its own volume (switchlog). Create a volume a bit
larger than the log file, and then addvol and switchlog.

If your files are typically greater than 8K, but less than 100K to 150K,
and are not typically an even multiple of 8K, then if you are not
concerned about the extra disk space, you could disable the frag file so
that the minimum file size is 8K and grows in multiples of 8K (man
chfsets on more recent versions of AdvFS, otherwise it is a dbx
patch-the-kernel method; it only affects new files or files that are
extended, and existing files remain frag'ed). NOTE: This can increase the
amount of storage a lot. For example, if you have mostly 1K files you
would need roughly 8 times the storage after disabling frags (8K
instead of 1K per file). But if your files are more like 100K, then
maybe only 10% more space.
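
The space cost can be estimated with a simplified model. The sketch
below assumes that with frags a file occupies its size rounded up to
1K, and without frags its size rounded up to a whole 8K page. The 1K
frag granularity and the function names are assumptions for
illustration, and metadata overhead is ignored.

```python
PAGE = 8 * 1024    # 8K page size, as discussed above
FRAG_UNIT = 1024   # assumed frag granularity (1K), for this sketch only


def round_up(size, unit):
    """Round size up to the next multiple of unit."""
    return -(-size // unit) * unit


def growth_without_frags(file_size):
    """Ratio of space used without frags to space used with frags."""
    with_frags = round_up(file_size, FRAG_UNIT)
    without_frags = round_up(file_size, PAGE)
    return without_frags / with_frags


if __name__ == "__main__":
    for kb in (1, 5, 20, 100):
        ratio = growth_without_frags(kb * 1024)
        print(f"{kb}K file: {ratio:.2f}x the space without frags")
```

Under this model a 1K file takes a full 8K page once frags are
disabled, while a 100K file only rounds up to 104K, so the penalty
shrinks quickly as the typical file size grows.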

I'm sure someone else might have ideas.

                                        Bob Harris


