Re: cpio performance vs tar - bit of a mystery



In article <1137150269.285835.304300@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>,
<ghee2ghee@xxxxxxxxxxx> wrote:
>Hello all,

>I need to backup a lot of data from several drives to several different
>drives on an OSR507 system - and I need to do it quickly. I'm
>surprised so far by what I've seen (using multiple 2GB "chunks" from
>each drive):

>find, cpio -o, gzip: 13mins per chunk
>find, cpio -o, no gzip: 11 mins per chunk
>find, cpio -p, no gzip: 8 mins per chunk
>tar, gzip: 7 mins per chunk
>tar, no gzip: 6 mins per chunk

>First of all, I'm surprised that tar appears to be able to handle data
>faster than cpio - I thought they would use similar system calls and
>would be lightweight programs. Secondly, I'm surprised that cpio -p is
>faster than cpio -o - it's got the same amount of data to shift, and
>all the new file/directory traversal stuff to do.

If you read the man pages carefully you'll see that -o is what is
called copy-out mode, while -p is pass-through mode, which combines
copy-in and copy-out without using any archive and just copies files
from one directory to another. So a lot of overhead is eliminated
using the pass-through mode.

I used the pass-through mode often when I needed to move
file hierarchies that were on the same filesystem.

I'd use -pdlmv - and note that -l can only be used with -p.

The -l option creates hard links instead of copying, so the new
directories are the only things created and the file data stays
where it is - that's why it only works within a single filesystem,
not across them. It is blazingly fast as you only create directories
and links - even on slow systems it's about 20 times faster.

Then you go and remove the original directories.
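
For anyone who wants to see what the link trick amounts to, here is
a rough Python sketch of the same idea. It is not cpio and does not
try to reproduce -m or -v; the name link_tree is made up for this
example, and it assumes source and destination are on the same
filesystem and that the destination is not inside the source tree.

```python
import os
import tempfile


def link_tree(src, dst):
    """Rebuild src's directory tree under dst, hard-linking every file.

    Hypothetical helper for illustration only. No file data is copied:
    new directories are created and each file gets a second name.
    """
    for dirpath, _dirnames, filenames in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        target_dir = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            # os.link fails if the two paths are on different
            # filesystems, which is why this is same-filesystem only.
            os.link(os.path.join(dirpath, name),
                    os.path.join(target_dir, name))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        old = os.path.join(base, "old")
        new = os.path.join(base, "new")
        os.makedirs(os.path.join(old, "sub"))
        with open(os.path.join(old, "sub", "data.txt"), "w") as f:
            f.write("payload\n")
        link_tree(old, new)
        # Both names now refer to the same file on disk.
        print(os.path.samefile(os.path.join(old, "sub", "data.txt"),
                               os.path.join(new, "sub", "data.txt")))
```

Removing the old tree afterwards only drops the old names; the data
stays reachable through the new ones.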

This only works on systems that support hard links - and I
understand from reading what others have said [I have not verified
this directly] that Solaris 10 does not support hard links.

You also noticed that gzipping things takes longer. Gzipping is
great when you move things across wires - but on directly connected
drives you are adding a step in the middle that does not need to be
there: you compress, copy, and later uncompress, and you actually
lose performance, as you have noted.
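
If you want to check the compression cost on your own hardware, a
small Python sketch along these lines would do. The function names
are invented for the example, and the test data is random bytes,
which is close to a worst case for gzip, so treat any timings as
illustrative and not as a prediction for your data.

```python
import gzip
import os
import shutil
import tempfile
import time


def copy_plain(src, dst):
    """Copy src to dst byte for byte, no compression."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)


def copy_gzip(src, dst):
    """Copy src to dst through gzip, adding CPU work to the copy."""
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)


def timed(func, *args):
    """Return the wall-clock seconds taken by func(*args)."""
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "chunk.bin")
        with open(src, "wb") as f:
            f.write(os.urandom(16 * 1024 * 1024))  # 16 MB stand-in chunk
        plain = timed(copy_plain, src, os.path.join(tmp, "plain.out"))
        zipped = timed(copy_gzip, src, os.path.join(tmp, "chunk.gz"))
        print("plain copy: %.3f s" % plain)
        print("gzip copy:  %.3f s" % zipped)
```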

>(via skunkware)
>NB2 - the machine is a dual Xeon 2.8GHz system with bags of memory and
>6 10K 36GB SCSI drives on a non-caching SCSI controller (OS on another
>controller+drive)

Just a comment on the 10K drives. While rotational speed does come
into play, the head design has a lot to do with performance. When
you want to compare how drives perform, forget the rotational speed
and go directly to the manufacturer's technical spec pages and look
at the data transfer speed from platter to/from buffer. That is a
limiting factor.

In the past, when 7200 rpm drives were the fast ones, I measured
instances where 5400 rpm drives outperformed 7200 rpm drives because
there was far more data per track - and thus per revolution -
because of better heads on the slower-rotating drives.
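
To put rough numbers on the per-track point: the platter-to-buffer
rate is about (revolutions per second) x (bytes per track). The
Python sketch below works that out. The sector counts are invented
for illustration and are not taken from any real drive's spec sheet;
512-byte sectors are also an assumption.

```python
def media_rate_mb_s(rpm, sectors_per_track, bytes_per_sector=512):
    """Approximate platter-to-buffer rate in MB/s (1 MB = 10**6 bytes).

    One full track passes under the head per revolution, so the rate
    is revolutions per second times bytes per track.
    """
    revs_per_sec = rpm / 60.0
    return revs_per_sec * sectors_per_track * bytes_per_sector / 1e6


if __name__ == "__main__":
    # Hypothetical drives: the slower spindle has denser tracks.
    slow_dense = media_rate_mb_s(5400, 600)
    fast_sparse = media_rate_mb_s(7200, 400)
    print("5400 rpm, 600 sectors/track: %.1f MB/s" % slow_dense)
    print("7200 rpm, 400 sectors/track: %.1f MB/s" % fast_sparse)
```

With those made-up figures the 5400 rpm drive comes out ahead, at
roughly 27.6 MB/s against 24.6 MB/s.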

It's like comparing car engines based only on displacement - a
larger engine is not necessarily more powerful.

Bill


--
Bill Vermillion - bv @ wjv . com