Re: Bad performance when accessing a lot of small files



* Alexandre Biancalana <biancalana@xxxxxxxxx> [071219 11:35] wrote:
Hi List,

I have a backup server running FreeBSD 7-BETA3. The CPU is an
Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz, with 3GB RAM, 10x 500GB
SATA, and an Areca 1231-ML. The filesystem used to back up my other
servers locally is built on top of the ARC-1231: a 4TB (32k stripe)
zfs filesystem with gzip compression.

This machine receives backups from ~30 servers (of all kinds and
sizes: databases, fileservers, image servers, webservers, etc.) all
night, writes the last day to LTO-3 tapes, and keeps some older
days on disk.

The behavior that I'm observing, and that I want your help with, is
that when the system is accessing a directory with many small files
(directories with ~1 million files of ~30kb each), the performance is
very poor.

There is a lot of very good tuning advice in this thread, however
one thing to note is that having ~1 million files in a directory
is not a very good thing to do on just about any filesystem.

One trick that a lot of people use is hashing the directories
themselves: you use some kind of computation on the file name to
break this huge dir into multiple smaller dirs.

If you can figure out a hashing algorithm, that may help you.

For instance, if you tell sendmail to use "/var/spool/mq*"
for its mail spool and you happen to have 256 directories
under "/var/spool/" named "mq000" through "mq255", it will
randomly pick a directory to dump a file in.

This makes the performance a lot better.

For one million files you can probably do a two level hash,
you just have to figure out a good hashing algorithm.
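
As a rough illustration of the two level idea, here is a minimal
sketch in Python. It assumes the file name is the only key you have
and that a general purpose digest (SHA-256 here) spreads names evenly
enough; the function name hashed_path and the "/backup" root are made
up for the example, and none of this has been measured on zfs. The
first two bytes of the digest become two directory levels, giving
256 * 256 = 65536 leaf dirs, so one million files come out to roughly
15 per dir.

```python
import hashlib
import os


def hashed_path(root, filename, levels=2):
    """Map filename to root/xx/yy/filename using a hash of the name.

    Each level is one byte of the digest written as two hex digits,
    so every level fans out into at most 256 subdirectories.
    (Hypothetical helper, not part of any existing tool.)
    """
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    parts = [digest[2 * i:2 * i + 2] for i in range(levels)]
    return os.path.join(root, *parts, filename)


if __name__ == "__main__":
    # The same name always lands in the same place, so lookups need
    # no index: recompute the hash and open the file directly.
    print(hashed_path("/backup", "img_000123.jpg"))
```

Unlike the sendmail case, which picks a dir at random, the path here
is computed from the name, so you can find a file again without
scanning. If the names already contain something evenly distributed
(a numeric id, say), slicing digits out of that would work as well
and keeps the layout readable.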

If you can describe the data, I may be able to help
you come up with a hashing algorithm for it.

-Alfred
_______________________________________________
freebsd-performance@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-performance
To unsubscribe, send any mail to "freebsd-performance-unsubscribe@xxxxxxxxxxx"


