2007-05-12

Choosing the best disk image format on Mac OS

Apple's official recommendation for software distribution is the internet-enabled disk image.

If you choose to ignore Apple's advice, Mac OS being a Unix, there's always .tar.gz or .tar.bz2. Using that would make Unix types feel at home, but alienate Mac users. I spend more time with other Unixes than with Mac OS, but even I look at a .tar.gz and think "oh, so they don't really care about Mac OS then". The ubiquity of .dmg on Mac OS is such that not using it gives an odd impression...

The possible exception would be using a .zip file instead. The beauty of .zip files is that pretty much anyone who'd try to install software on any machine has come across a .zip file before. If you've been using Mac OS X for long enough, you might think .dmg files are natural, but they're not if you're a new/superficial user. Other advantages of .zip over .dmg are that it compresses well, compresses very fast, and – more importantly to the end-user – decompresses very fast.

But I'm not here to talk about compression/decompression speed. Today I'm only interested in size, mainly because of its effect on download speed.

I'll look at .zip, but I'm going to ignore .tar.gz because it's too unfriendly and looks unfriendly, and .tar.bz2 because it's even worse. This is Mac OS, not Gentoo. Different crowd.

If you check the hdiutil(1) man page, you'll see that leaves quite a few flavors to choose from, all with names that sound like some government's secret police:

UDRW - UDIF read/write image
UDRO - UDIF read-only image
UDCO - UDIF ADC-compressed image
UDZO - UDIF zlib-compressed image
UDBZ - UDIF bzip2-compressed image (OS X 10.4+ only)
UFBI - UDIF entire image with MD5 checksum
UDRo - UDIF read-only (obsolete format)
UDCo - UDIF compressed (obsolete format)
UDTO - DVD/CD-R master for export
UDxx - UDIF stub image
UDSP - SPARSE (grows with content)
RdWr - NDIF read/write image (deprecated)
Rdxx - NDIF read-only image (Disk Copy 6.3.3 format)
ROCo - NDIF compressed image (deprecated)
Rken - NDIF compressed (obsolete format)
DC42 - Disk Copy 4.2 image

Additionally, you can choose a variety of file systems: HFS+, HFS+J, HFSX, HFS, MS-DOS, or UFS. There are also other variations, such as setting the zlib compression level for UDZO.

My choices were: HFS+ and UFS file systems on UDBZ, UDCO, UDRO, and both UDZO and UDZO with zlib compression level 9. I also tested creating .zip files with "zip -6" (the default) and "zip -9" ("best compression").
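For reference, the variants above map onto command lines roughly like the ones built below. This is only a sketch, not the script that produced the measurements: the helper names hdiutil_create_args and zip_args and the file names are made up for illustration, and the code just builds argument lists without running anything. The flags themselves (-srcfolder, -format, -fs, and -imagekey zlib-level=N for hdiutil; -r and -6/-9 for zip) are standard ones.

```python
def hdiutil_create_args(src_dir, out_dmg, image_format, fs="HFS+", zlib_level=None):
    """Build an argv list for `hdiutil create` (hypothetical helper).

    image_format is one of the UDIF codes such as "UDBZ", "UDCO",
    "UDRO", or "UDZO"; fs is a file system name such as "HFS+" or "UFS".
    """
    args = ["hdiutil", "create",
            "-srcfolder", src_dir,
            "-format", image_format,
            "-fs", fs]
    if zlib_level is not None:
        # Only meaningful for zlib-compressed (UDZO) images.
        args += ["-imagekey", "zlib-level=%d" % zlib_level]
    args.append(out_dmg)
    return args


def zip_args(src_dir, out_zip, level=6):
    """Build an argv list for zip(1); level 6 is zip's default."""
    return ["zip", "-r", "-%d" % level, out_zip, src_dir]


if __name__ == "__main__":
    # Hypothetical file names, printed rather than executed.
    print(" ".join(hdiutil_create_args("Foo.app", "Foo.dmg", "UDZO", zlib_level=9)))
    print(" ".join(hdiutil_create_args("Foo.app", "Foo-ufs.dmg", "UDBZ", fs="UFS")))
    print(" ".join(zip_args("Foo.app", "Foo.zip", level=9)))
```

On a Mac, the returned lists could be handed to subprocess.run to actually create the archives.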

I compressed all three main software.jessies.org applications (Evergreen, SCM, and Terminator), plus Adium as an example of a much larger application.

Here's a table of the resulting sizes, all measured in MiB:

Format        Adium  Evergreen   SCM  Terminator
zip            29.0        1.7   1.1         1.6
zip -9         29.0        1.7   1.1         1.6
UDBZ HFSX         -        1.6   1.1         1.5
UDBZ HFS+      12.0        1.6   1.1         1.5
UDBZ UFS       21.0        2.0   1.9         2.1
UDCO HFS+      16.0        2.0   1.3         1.8
UDCO UFS       23.0        2.4   2.1         2.7
UDRO HFS+      35.0        7.0   2.9         4.3
UDRO UFS       32.0        7.3   3.1         4.6
UDZO-9 HFS+    13.0        1.7   1.1         1.6
UDZO HFS+      14.0        2.0   1.2         1.7
UDZO UFS       22.0        2.3   2.0         2.7

Several things strike me as interesting here. If you want a small disk image, avoid UFS. If you only care about 10.4 and later, you want a bzip2-compressed UDBZ. If you want to support 10.3 or earlier (if only to present an error at run time), UDZO with zlib level 9 comes very close. I haven't collected full results for HFSX, but if you want a case-sensitive file system (which would otherwise be a reason to use UFS), UDBZ HFSX disk images seem to be no larger for my projects than UDBZ HFS+ disk images.
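If you want a feel for how the underlying compressors compare without building disk images at all, something like the following works anywhere Python does. It is a rough sketch under loud assumptions: it uses Python's zlib and bz2 modules rather than hdiutil, the function name compressed_sizes is made up, and the sample data is synthetic, so the numbers it prints say nothing about real application bundles. It only illustrates the zlib level 6 versus level 9 versus bzip2 comparison that UDZO, UDZO-9, and UDBZ correspond to.

```python
import bz2
import zlib


def compressed_sizes(data):
    """Return the compressed size in bytes of `data` for each setting."""
    return {
        "zlib -6": len(zlib.compress(data, 6)),
        "zlib -9": len(zlib.compress(data, 9)),
        "bzip2": len(bz2.compress(data, 9)),
    }


if __name__ == "__main__":
    # Synthetic, highly repetitive sample; real bundles behave differently.
    sample = b"".join(b"line %d of some repetitive sample text\n" % i
                      for i in range(5000))
    print("original: %d bytes" % len(sample))
    for name, size in sorted(compressed_sizes(sample).items()):
        print("%-8s %d bytes" % (name, size))
```

To try it on something more realistic, read a tar of an application bundle into memory and pass the bytes to compressed_sizes.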

Particularly interesting to me is that .zip is very competitive for my projects, but over twice the size of a UDBZ HFS+ .dmg for Adium. I haven't investigated why that is. (It doesn't seem to be true of .zip files of other Mac apps. The developer of Coda speaks highly of .zip, though he doesn't explicitly say that he's comparing to a compressed .dmg when he says .zip files are smaller.)
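To put numbers on that, here is the arithmetic using the zip and UDBZ HFS+ rows from the table. The dictionary layout and the function name ratio are just for illustration; the sizes are copied from the table above.

```python
# Sizes in MiB, copied from the "zip" and "UDBZ HFS+" rows of the table.
SIZES_MIB = {
    "zip": {"Adium": 29.0, "Evergreen": 1.7, "SCM": 1.1, "Terminator": 1.6},
    "UDBZ HFS+": {"Adium": 12.0, "Evergreen": 1.6, "SCM": 1.1, "Terminator": 1.5},
}


def ratio(app, fmt_a="zip", fmt_b="UDBZ HFS+"):
    """How many times larger fmt_a is than fmt_b for the given app."""
    return SIZES_MIB[fmt_a][app] / SIZES_MIB[fmt_b][app]


if __name__ == "__main__":
    for app in ("Adium", "Evergreen", "SCM", "Terminator"):
        print("%-10s zip is %.2fx the size of UDBZ HFS+" % (app, ratio(app)))
```

For Adium the ratio is 29.0/12.0, a little over 2.4, while for the three jessies.org applications it stays below 1.1.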

Not shown here is the time it takes to compress (which I don't think is particularly important) or the time it takes to decompress (which I do think is important). zip(1) is a clear winner in that department, even at compression level 9. I also haven't investigated the best way to get rid of the wasted space that all of the .dmg files have (as reported by "hdiutil imageinfo"). The impressive level of compression combined with impressive speed combined with cross-platform familiarity makes .zip a very tempting choice, though the Adium results show it's not for everyone.

Finally, a little gotcha that got me while compiling these results. If you're used to using "ls -h" on Linux, beware that on Mac OS 10.4, it seems to only give integer results, and it uses round-to-zero behavior. So what Linux would report as "1.9M" is "1M" on Mac OS. If you want sensible behavior on Mac OS, try "du -h" instead.
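The difference is easy to reproduce with a couple of lines of arithmetic. This is not how either ls is implemented; it is a minimal sketch, with made-up function names, that only handles sizes in the MiB range and shows how truncating to an integer hides most of a 1.9 MiB file.

```python
MIB = 1024 * 1024


def truncated_mib(size_bytes):
    """Integer MiB, rounded toward zero (the Mac OS 10.4 behavior described above)."""
    return "%dM" % (size_bytes // MIB)


def one_decimal_mib(size_bytes):
    """MiB with one decimal place, closer to what the Linux output looks like."""
    return "%.1fM" % (size_bytes / MIB)


if __name__ == "__main__":
    size = int(1.9 * MIB)  # a file of roughly 1.9 MiB
    print(truncated_mib(size), "versus", one_decimal_mib(size))
```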

Update: it turns out there's another gotcha. Some Safari users, some of the time, have trouble with bzip2-compressed disk images. It's explained well in unsanity.org's "My DMG is Bwoken After Download!", which also gives the solution: add "AddType application/x-apple-diskimage dmg" to Apache's .htaccess file. (You can optionally add a dot before "dmg"; Apache doesn't mind either way.) This kind of flakiness is perhaps another reason to seriously consider switching to .zip, even if it's not the "done thing".