Very fast parallel bzip2.
From: Dave (INVALID.See-signature-for-how-to-determine_at_southminister-branch-line.org.uk)
Date: Thu, 24 Nov 2005 06:59:26 +0000
I came across this tonight, which anyone with a multi-processor Sun (or
anything else for that matter) might like.
It's a parallel implementation of bzip2 - it reads and writes the bzip2
file format, but uses multiple processors. The speedup (compared to bzip2)
for me during compression was 8.63x with only 4 CPUs. The output file is
about 0.08% larger than that achieved by bzip2.
During decompression the gains were much less impressive (1.46x faster).
These are based on one large file - obviously testing with multiple files
would be sensible.
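For anyone curious how a parallel bzip2 can still produce a file that plain bzip2 reads, here is a minimal sketch of the general idea in Python. It is not pbzip2's actual code: the function name parallel_bzip2 and the parameters chunk_size and workers are made up for illustration. The input is cut into chunks, each chunk is compressed as its own bzip2 stream on a worker thread, and the streams are concatenated. Each stream carries its own header and trailer, which is one plausible reason the output comes out slightly larger.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor


def parallel_bzip2(data: bytes, chunk_size: int = 900_000, workers: int = 4) -> bytes:
    """Compress data as a series of independent, concatenated bzip2 streams.

    Hypothetical helper for illustration only; chunk_size and workers are
    assumed values, not settings taken from pbzip2.
    """
    # Split the input into fixed-size chunks (one empty chunk for empty input,
    # so the result is still a valid bzip2 stream).
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so the streams can simply
        # be joined back together afterwards.
        streams = list(pool.map(bz2.compress, chunks))

    return b"".join(streams)


if __name__ == "__main__":
    sample = bytes(range(256)) * 20_000          # roughly 5 MB of test data
    packed = parallel_bzip2(sample)
    # bz2.decompress() accepts concatenated streams (Python 3.3 and later).
    assert bz2.decompress(packed) == sample
    print(len(sample), "->", len(packed), "bytes")
```

Threads are used here only to keep the sketch short; a process pool would be another option.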
I have not played with compiler optimisations, but pbzip2 was built with
Studio 11, which is probably the best compiler I have. The standard
bzip2 may have been built with gcc or an older Sun compiler (can't
File / Hardware details
File size = 1,067,892,736 bytes (1018 MB)
Computer = Sun Ultra 80, 4 x 450 MHz, 4GB RAM
OS = Solaris 9.
Results for compression
Input file = 1,067,892,736 bytes.
bzip2:  output file = 620,758,972 bytes (58.13% of original)  t = 1984 s
pbzip2: output file = 621,236,162 bytes (58.17% of original)  t =  230 s
So the file is about 0.08% bigger with pbzip2, but it is 8.6x faster at
compressing the data with 4 CPUs.
Results for decompression
bzip2:  t = 795 s
pbzip2: t = 543 s
Hence pbzip2 is only 1.46x faster at decompression with 4 CPUs.
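The ratios quoted above can be rechecked from the raw figures with a few lines of Python. This is just the arithmetic; the variable names are mine, and the numbers are the ones listed in the results.

```python
# Figures taken from the results above.
original_bytes = 1_067_892_736
bzip2_bytes, bzip2_compress_s, bzip2_decompress_s = 620_758_972, 1984, 795
pbzip2_bytes, pbzip2_compress_s, pbzip2_decompress_s = 621_236_162, 230, 543

# Speedup = time taken by bzip2 / time taken by pbzip2.
compress_speedup = bzip2_compress_s / pbzip2_compress_s
decompress_speedup = bzip2_decompress_s / pbzip2_decompress_s

# Size penalty of pbzip2, expressed two ways.
extra_bytes = pbzip2_bytes - bzip2_bytes
extra_vs_bzip2_pct = extra_bytes / bzip2_bytes * 100        # relative to bzip2 output
extra_vs_original_pct = extra_bytes / original_bytes * 100  # relative to the input file

print(f"compression speedup:   {compress_speedup:.2f}x")
print(f"decompression speedup: {decompress_speedup:.2f}x")
print(f"extra size vs bzip2 output: {extra_vs_bzip2_pct:.3f}%")
print(f"extra size vs original:     {extra_vs_original_pct:.3f}%")
```

A speedup above 4 on 4 CPUs suggests that not all of the gain is due to parallelism; the compiler difference mentioned earlier may be part of it.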
The only hassle is that if the file was compressed with bzip2, you need
as much RAM as the file size to decompress it. If it was compressed with
pbzip2, the RAM requirements are very modest by today's standards.
--
Dave K
http://www.southminster-branch-line.org.uk/
Please note my email address changes periodically to avoid spam. It is
always of the form: month-year@domain. Hitting reply will work for a
couple of months only. Later set it manually. The month is always written
in 3 letters (e.g. Jan, not January etc.)