Here's Some Working Code to Sort One Million 8-Digit Numbers in 1MB of RAM

Earlier this week, this Stack Overflow question was featured on Reddit under /r/programming. Apparently, it was once a Google interview question – or at least, a variation of one:

You have a computer with 1M of RAM and no other local storage. You must use it to accept 1 million 8-digit decimal numbers over a TCP connection, sort them, and then send the sorted list out over another TCP connection. The list of numbers may contain duplicates, which you may not discard. Your code will be placed in ROM, so you need not subtract the size of your code from the 1M. You have been given code to drive the Ethernet port and handle TCP/IP connections, and it requires 2k for its state data, including a 1k buffer via which your code will read and write data.

The challenge here, of course, is that you can’t fit all the data in memory as raw integers. There are 100 million possible 8-digit decimal numbers, so even if you pack each number into 27 bits (227 = ~134 million), that would take 3375000 bytes of storage. The hypothetical machine available to you has only 1048576 bytes of storage. At first glance, it seems to defy the laws of mathematics.

The question received enough attention that a lot of people proposed solutions on Stack Overflow and in the Reddit discussion, but so far, nothing that runs. There’s some code for a solution which doesn’t work, many proposals without code, and some hilarious out-of-the-box answers. A lot of answers come close, but won’t work on every possible input.

In thinking about this problem, it occurred to me that I’ve already written a blog post containing a clue which leads to the ideal solution. I won’t give away which post, but after that, I couldn’t resist cobbling together a working implementation in C++.

The source code listing is just 339 lines with very few comments. I thought I’d post it here to see if anybody can figure out which algorithm was used:

Proof that the memory constraints are satisified:

typedef unsigned int u32;

namespace WorkArea
{
    static const u32 circularSize = 253250;
    u32 circular[circularSize] = { 0 };         // consumes 1013000 bytes

    static const u32 stageSize = 8000;
    u32 stage[stageSize];                       // consumes 32000 bytes

    ...

Together, these two arrays take 1045000 bytes of storage. Tack on the hypothetical 2KB overhead for the TCP connection, and you’re left with a healthy margin of 1048576 - 1045000 - 2×1024 = 1528 bytes for remaining variables and stack space.

It runs in about 23 seconds on my Xeon W3520. You can verify that the program works using the following Python script, assuming a program name of sort1mb.exe. See if you can find any inputs which break it!

from subprocess import *
import random

sequence = [random.randint(0, 99999999) for i in xrange(1000000)]

sorter = Popen('sort1mb.exe', stdin=PIPE, stdout=PIPE)
for value in sequence:
    sorter.stdin.write('%08d\n' % value)
sorter.stdin.close()

result = [int(line) for line in sorter.stdout]
print('OK!' if result == sorted(sequence) else 'Error!')

I submitted this answer to the Stack Overflow question, but it’s pretty late to the party. Update: You’ll find a detailed explanation in my next post, 1MB Sorting Explained.

Nick Cleaton originally proposed the currently accepted solution. His approach is different, and not in any runnable form, but the idea also looks valid.