It’s been two weeks since the 1MB Sorting Problem was originally featured on Reddit, and in case you don’t think this artificial problem has been thoroughly stomped into the ground yet, here’s a continuation of last post’s explanation of the working C++ program which solves it.

In that post, I gave a high-level outline of the approach, and showed an encoding scheme – which I’ve since learned is Golomb coding – which comes close to meeting the memory requirements, but doesn’t quite fit. Arithmetic coding, on the other hand, does fit. It’s interesting, because this problem, as it was phrased, almost seems designed to force you into arithmetic coding (though Nick Cleaton’s solution manages to avoid it).

I had read about arithmetic coding before, but I never had any reason to sit down to implement it. It always struck me as kind of mystical: How the heck you encode information in a *fraction* of a bit, anyway? The 1MB Sorting Problem turned out to be a great excuse to learn how arithmetic coding works.

## How Many Sorted Sequences Even Exist?

It’s important to note that the whole reason why we are able to represent a **sorted** sequence of one million 8-digit numbers in less than 1 MB of memory is because mathematically, there simply aren’t that many different sorted sequences which exist.