[fadden-logo]

Hacking Data Compression

This is a twelve-part course on lossless data compression.  It was originally presented in the "A2Pro" forum of the GEnie online service, starting back in October 1992 (almost exactly 10 years ago to the day that I type this).

Each part of the course consists of a lesson and some source code.  The sources were presented in C and 65816 assembly language, the latter intended for use on an Apple IIgs computer.  The C code was actually developed under SunOS, but was usable with either of the 16-bit compilers then available for the IIgs.

Some of the commentary is out of date.  For example, this was written before gzip and the "deflate" algorithm existed, at a time when "freeze" was being explored as a replacement for the UNIX "compress" command.  You can see the precursors to these in lesson 11.  On the flip side, we're still using Huffman encoding (e.g. it's used in bzip2), and GIF LZW is still widely used.

I've made the lesson text available here.  For the full original lesson files, complete with source code and other goodies, you can download the full archive in ShrinkIt format.  You will need a utility like NuLib2 or CiderPress to unpack it.

Lesson 1: Introduction
Course overview and introduction to compression concepts.
Lesson 2: Basics
Discussion of RLE and differential coding.
Lesson 3: Somewhat Clever Ad-Hoc Schemes
Simple character encodings, MTF, digrams; intro to higher-order models.
Lesson 4: Shannon, Fano, and Huffman
Explanation of probability encoding.
Lesson 5: Adaptive Encoding
Adaptive vs. static methods, Splay trees, adaptive Huffman.
Lesson 6: Arithmetic Coding
How arithmetic encoding works; includes the CACM sources and variations.
Lesson 7: Ziv and Lempel
Discussion of LZ77 and LZ78.
Lesson 8: LZW
Detailed explanation of the LZW algorithm.
Lesson 9: NuFX and ShrinkIt
Discussion about the compression used in ShrinkIt.
Lesson 10: LZSS
Detailed explanation of the LZSS algorithm.
Lesson 11: LZH, LZARI, and LZB
How to improve LZSS with Huffman, arithmetic, and unary coding.
Lesson 12: HardPressed(tm)
How to write compression modules for HardPressed.

I also have the original syllabus, which more or less matches the above.

Historical note: this was written during the same period that HardPressed was under development, and while I was working full-time.  This might help to explain why a 12-week course was presented over a period of nine months.

For anyone intrigued by the "stackable" compression modules described in lesson 12, I have made the complete source code to HardPressed available on this site.