Speed Up Compression via Parallel BZIP2 (PBZIP2)
Last updated: February 02, 2023
By pure chance one morning, I came across a post that mentioned PBZIP2. Having never heard of it, of course I had to look it up. Crikey. File this one under “Why Didn’t Someone Tell Me About This Earlier?!”
“Wait a minute,” I said aloud to nobody in particular. “BZIP2 doesn’t support symmetric multi-processing? And there’s an alternate implementation that does take advantage of multiple CPUs?”
“Whiskey. Tango. Foxtrot.”
And after a few tests, I’ll be tarred and feathered if it ain’t true: the speed improvement scaled, as promised, roughly linearly with the number of cores.
Installation
To install it via Homebrew on macOS:
brew install pbzip2
To install it on Ubuntu or Debian:
sudo apt install pbzip2
The pbzip2 binary should now be available. Refer to the manpage for the gory details.
Testing
Using a 91 MB tar archive as my test file, I ran the following commands on a quad-core 2.93 GHz i7 running Mac OS X 10.7 (Lion) to see whether there was indeed any improvement in compression speed:
time bzip2 -k testfile.tar
time pbzip2 -k testfile.tar
The results: 18.7 seconds for bzip2, and… wait for it… 3.5 seconds for pbzip2. That represents an 81% reduction in compression time, or a roughly five-fold (about 5.3x) increase in speed, in this particular test.
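Those two figures fall straight out of the timings. As a quick sanity check, here is a minimal sketch that derives them; the function name compression_gain is made up for illustration, and it assumes awk is available.

```shell
# Hypothetical helper: compare two wall-clock timings (in seconds).
# Usage: compression_gain <baseline_seconds> <new_seconds>
compression_gain() {
  awk -v old="$1" -v new="$2" 'BEGIN {
    reduction = (old - new) / old * 100   # percent less time
    speedup   = old / new                 # how many times faster
    printf "%.1f%% less time, %.1fx faster\n", reduction, speedup
  }'
}

compression_gain 18.7 3.5
```

With the timings from this test, it prints "81.3% less time, 5.3x faster".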
While decompression speed increases weren’t nearly as dramatic, pbzip2 decompression still appears to be faster than stock bzip2.
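To put numbers on the decompression side, something like the sketch below could be used. It is not the exact test run for this post: the function name time_decompress is hypothetical, it assumes a testfile.tar.bz2 left over from the compression test, and it only resolves whole seconds (Bash's time keyword gives finer detail).

```shell
# Hypothetical helper: report how many whole seconds a bzip2-compatible
# tool needs to decompress a file. Output goes to /dev/null so that disk
# writes do not skew the comparison.
# Usage: time_decompress <tool> <file.bz2>
time_decompress() {
  start=$(date +%s)
  "$1" -dc "$2" > /dev/null || return 1
  end=$(date +%s)
  echo "$1: $((end - start))s"
}

# Example (assumes testfile.tar.bz2 from the compression test exists):
#   time_decompress bzip2  testfile.tar.bz2
#   time_decompress pbzip2 testfile.tar.bz2
```

One caveat worth checking against the pbzip2 documentation: as far as I understand it, pbzip2 can only decompress in parallel when the file was itself compressed by pbzip2, so a file produced by stock bzip2 may show little or no difference.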
New Aliases
I don’t want to have to remember to specifically use the pbzip2 command, so I decided to add some aliases. First, let’s detect whether pbzip2 is installed and available:
# Check to see if pbzip2 is already on path; if so, set BZIP_BIN appropriately
type -P pbzip2 &>/dev/null && export BZIP_BIN="pbzip2"
# Otherwise, default to standard bzip2 binary
if [ -z "$BZIP_BIN" ]; then
  export BZIP_BIN="bzip2"
fi
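The snippet above leans on Bash-specific features (type -P and &>). If the same dotfiles are shared with other shells, a more portable sketch could use command -v instead. The function name first_available is hypothetical, not something that ships with pbzip2 or bzip2.

```shell
# Hypothetical helper: print the first command from the argument list
# that exists on this system; return non-zero if none is found.
first_available() {
  for candidate in "$@"; do
    if command -v "$candidate" >/dev/null 2>&1; then
      echo "$candidate"
      return 0
    fi
  done
  return 1
}

# Prefer pbzip2, fall back to bzip2; if neither is found, keep the
# plain "bzip2" name so later commands fail with an obvious error.
BZIP_BIN=$(first_available pbzip2 bzip2) || BZIP_BIN="bzip2"
export BZIP_BIN
```

Passing the candidates as arguments keeps the preference order in one place, so adding another bzip2-compatible tool later is a one-word change.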
Using the above logic, I set bz as an alias to pbzip2 if available, and if not, to bzip2:
alias bz="$BZIP_BIN"
I usually compress directories more often than individual files, so I added some commands to quickly compress directories and expand bzipped tarballs:
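# Compress a directory into a bzipped tarball named <directory>.tbz
# (a trailing slash on the argument is stripped from the archive name)
tarb() {
  tar -cf "${1%/}.tbz" --use-compress-program="$BZIP_BIN" "$1"
}
# Expand a bzipped tarball, skipping macOS "._*" AppleDouble files
untarbzip() {
  "$BZIP_BIN" -dc "$1" | tar -xf - --exclude="._*"
}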
alias buntar=untarbzip
Usage:
bz myfile
tarb mydirectory
buntar mytarball.tbz
Got a better method?
Have you had any experience with parallelized bzip2 compression? Find me on Twitter and let me know.