Preshing on ProgrammingPreshing on Programming

The Windows Heap Is Slow When Launched from the Debugger

For my Hash Table Performance Tests post, I wrote a Windows application which runs a word-counting algorithm using various types of containers. It was originally built using default Release settings in Visual C++ 2008. At first, it ran very slowly, but after some tuning, it ran much faster. You can see the difference in performance in my previous post, Visual C++ Performance Pitfalls.

The first thing I noticed was that the program ran more slowly when launched from the debugger versus running it from the command line:

Both timings were taken using the exact same executable. No settings were changed; even the input data was the same. The only difference was the debugger. This is not normal – the presence of a debugger shouldn’t impact performance to this extent, unless the process is directly interacting with the debugger in some way. That was not the case for my test suite – or so I thought!

Visual C++ Performance Pitfalls

When you develop a C++ application for Windows, it’s pretty convenient to open Visual Studio, click your way though the New Project wizard, and start coding. At first, you’ll build & test using the Debug configuration. Once your program becomes stable, and you’re ready to watch it run at top speed, you’ll switch to Release.

It’s easy to think that’s all you need to do. After all, when you look at default settings for Release builds, you’ll see all sorts of good stuff like “Maximize Speed” and “Whole Program Optimization”. That should take care of things, right?

Not exactly! It turns out that there are several pitfalls which can slow down the performance of your “optimized” C++ application. In some cases, significantly.

Hash Table Performance Tests

Hash tables, like trees, are part of a family called associative maps, or associative arrays. They map a collection of keys to a collection of values. Associative maps are themselves part of a broader family of data structures known as containers.

Many high-level programming languages have their own set of containers built-in, and it’s not usually effective to write your own. C and C++, being native languages, are different: You can write your own containers, and it’s often very effective to do so. The Standard C++ Library provides a few, but it’s common for companies to have their own library of C++ containers, written in-house. You can also find source code for different containers floating around the web, though usually with licensing restrictions. I thought it would be interesting to compare the performance of few such containers, with an emphasis on hash tables, but with a couple of trees thrown in the mix as well.

Word Counting Benchmark

My test program is a simple word-counting application. It reads a text file and creates a container, mapping each word to its number of occurrences. I fed this program a one million word document in which there are about 58,000 unique words, each occurring a different number of times. This results in a lot more lookups than inserts, which is a typical pattern in applications which use maps heavily, such as the gcc compiler, the Python runtime, and in many components of game development.

You can download the source code here.

The tests were compiled in Visual Studio 2008, and ran on Windows using a Core 2 Duo E6600. I used appropriate optimization settings and disabled unnecessary features, like exception handling and iterator debugging. The entire document is preloaded in memory to avoid I/O overhead, and Doug Lea’s malloc is used for memory management. There are sure to be additional optimization opportunities in each case, but I didn’t go overboard.

Hash Collision Probabilities

A hash function takes an item of a given type and generates an integer hash value within a given range. The input items can be anything: strings, compiled shader programs, files, even directories. The same input always generates the same hash value, and a good hash function tends to generate different hash values when given different inputs.

A hash function has no awareness of “other” items in the set of inputs. It just performs some arithmetic and/or bit-magic operations on the input item passed to it. Therefore, there’s always a chance that two different inputs will generate the same hash value.

Take the well-known hash function CRC32, for example. If you feed this function the two strings “plumless” and “buckeroo”, it generates the same value. This is known as a hash collision.

What is the probability of a hash collision? This question is just a general form of the birthday problem from mathematics. The answer is not always intuitive, so it’s difficult to guess correctly. Let’s derive the math and try to get a better feel for those probabilities.