A C++ Profiling Module for Multithreaded APIs
In my post about lock contention, I gave some statistics for the memory allocator in a multithreaded game engine: 15000 calls per second coming from 3 threads, taking around 2% CPU. To collect those statistics, I wrote a small profiling module, which I’ll share here.
A profiling module is different from conventional profilers like xperf or VTune in that no third-party tools are required. You just drop the module into any C++ application, and the process collects and reports performance data by itself.
This particular profiling module is meant to act on one or more target modules in the application. A target module can be anything which exposes a well-defined API, such as a memory allocator. To make it work, you must insert a macro named API_PROFILER into every public function exposed by that API. Below, I’ve added it to dlmalloc, one of the functions in the Doug Lea Malloc API. The same macro should be added to dlrealloc, dlfree, and other public functions as well.
DEFINE_API_PROFILER(dlmalloc);
void* dlmalloc(size_t bytes)
{
API_PROFILER(dlmalloc);
#if USE_LOCKS
ensure_initialization();
#endif
if (!PREACTION(gm))
{
void* mem;
size_t nb;
if (bytes < = MAX_SMALL_REQUEST)
{
...
Always Use a Lightweight Mutex
In multithreaded programming, we often speak of locks (also known as mutexes). But a lock is only a concept. To actually use that concept, you need an implementation. As it turns out, there are many ways to implement a lock, and those implementations vary wildly in performance.
The Windows SDK provides two lock implementations for C/C++: the Mutex and the Critical Section. (As Ned Batchelder points out, Critical Section is probably not the best name to give to the lock itself, but we’ll forgive that here.)
The Windows Critical Section is what we call a lightweight mutex. It’s optimized for the case when there are no other threads competing for the lock. To demonstrate using a simple example, here’s a single thread which locks and unlocks a Windows Mutex exactly one million times.
HANDLE mutex = CreateMutex(NULL, FALSE, NULL);
for (int i = 0; i < 1000000; i++)
{
WaitForSingleObject(mutex, INFINITE);
ReleaseMutex(mutex);
}
CloseHandle(mutex);
Locks Aren’t Slow; Lock Contention Is
Locks (also known as mutexes) have a history of being misjudged. Back in 1986, in a Usenet discussion on multithreading, Matthew Dillon wrote, “Most people have the misconception that locks are slow.” 25 years later, this misconception still seems to pop up once in a while.
It’s true that locking is slow on some platforms, or when the lock is highly contended. And when you’re developing a multithreaded application, it’s very common to find a huge performance bottleneck caused by a single lock. But that doesn’t mean all locks are slow. As I’ll show in this post, sometimes a locking strategy achieves excellent performance.
Perhaps the most easily-overlooked source of this misconception: Not all programmers may be aware of the difference between a lightweight mutex and a “kernel mutex”. I’ll talk about that in my next post, Always Use a Lightweight Mutex. For now, let’s just say that if you’re programming in C/C++ on Windows, the Critical Section object is the one you want.
Other times, the conclusion that locks are slow is supported by a benchmark. For example, this post measures the performance of a lock under heavy conditions: each thread must hold the lock to do any work (high contention), and the lock is held for an extremely short interval of time (high frequency). It’s a good read, but in a real application, you generally want to avoid using locks in that way. To put things in context, I’ve devised a benchmark which includes both best-case and worst-case usage scenarios for locks.
How to Generate Random Timings for a Poisson Process
What’s a Poisson process, and how is it useful?
Any time you have events which occur individually at random moments, but which tend to occur at an average rate when viewed as a group, you have a Poisson process.
For example, the USGS estimates that each year, there are approximately 13000 earthquakes of magnitude 4+ around the world. Those earthquakes are scattered randomly throughout the year, but there are more or less 13000 per year. That’s one example of a Poisson process. The Wikipedia page lists several others.
In statistics, there are a bunch of functions and equations to help model a Poisson process. I’ll present one of those functions in this post, and demonstrate its use in writing a simulation.
The Exponential Distribution
If 13000 such earthquakes happen every year, it means that, on average, one earthquake happens every 40 minutes. So, let’s define a variable λ = and call it the rate parameter. The rate parameter λ is a measure of frequency: the average rate of events (in this case, earthquakes) per unit of time (in this case, minutes).
High-Resolution Mandelbrot in Obfuscated Python
Here’s a followup to last month’s post about Penrose Tiling in Obfuscated Python.
The Mandelbrot set is a traditional favorite among authors of obfuscated code. You can find obfuscated code in C, Perl, Haskell, Python and many other languages. Nearly all examples render the Mandelbrot set as ASCII art.
The following Python script, on the other hand, begins as ASCII art:
_ = (
255,
lambda
V ,B,c
:c and Y(V*V+B,B, c
-1)if(abs(V)<6)else
( 2+c-4*abs(V)**-0.4)/i
) ;v, x=1500,1000;C=range(v*x
);import struct;P=struct.pack;M,\
j ='<QIIHHHH',open('M.bmp','wb').write
for X in j('BM'+P(M,v*x*3+26,26,12,v,x,1,24))or C:
i ,Y=_;j(P('BBB',*(lambda T:(T*80+T**9
*i-950*T **99,T*70-880*T**18+701*
T **9 ,T*i**(1-T**45*2)))(sum(
[ Y(0,(A%3/3.+X%v+(X/v+
A/3/3.-x/2)/1j)*2.5
/x -2.7,i)**2 for \
A in C
[:9]])
/9)
) )
Timing Your Code Using Python’s “with” Statement
It’s common to want to time a piece of code. In Python, the with statement provides a convenient way to do so.
If you’ve followed my previous post, The Python with Statement by Example, you should have no problem following along here. All you need is a class which implements the __enter__ and __exit__ methods:
import time
class Timer:
def __enter__(self):
self.start = time.clock()
return self
def __exit__(self, *args):
self.end = time.clock()
self.interval = self.end - self.start
Of course, this is not a revolutionary idea. We’re just subtracting a couple of time.clock calls. If you google around, you’ll find several people suggesting to use the with statement in the same way. I’ve only tweaked the implementation details.
The Python “with” Statement by Example
Python’s with statement was first introduced five years ago, in Python 2.5. It’s handy when you have two related operations which you’d like to execute as a pair, with a block of code in between. The classic example is opening a file, manipulating the file, then closing it:
with open('output.txt', 'w') as f:
f.write('Hi there!')
The above with statement will automatically close the file after the nested block of code. (Continue reading to see exactly how the close occurs.) The advantage of using a with statement is that it is guaranteed to close the file no matter how the nested block exits. If an exception occurs before the end of the block, it will close the file before the exception is caught by an outer exception handler. If the nested block were to contain a return statement, or a continue or break statement, the with statement would automatically close the file in those cases, too.
Penrose Tiling Explained
Last week, I posted some obfuscated Python which generates Penrose tiling. Today, I’ll explain the basic algorithm behind that Python script, and share the non-obfuscated version.
The algorithm manipulates a list of red and blue isosceles triangles. Each red triangle has a 36° angle at its apex, while each blue triangle has a 108° angle.

In Python, we can represent such triangles as tuples of the form (color, A, B, C). For the first element, color, a value of 0 indicates a red triangle, while 1 indicates blue. The rest of the tuple gives the co-ordinates of the A, B and C vertices, expressed as complex numbers. Complex numbers work well here since they can represent any point on the 2D plane — the real component gives the x co-ordinate, while the imaginary component gives the y co-ordinate.
Penrose Tiling in Obfuscated Python
Who says you can’t write obfuscated Python?
Here’s a Python script which renders some Penrose tiling. Yes, this is valid Python code:
_ =\
"""if!
1:"e,V=100
0,(0j-1)**-.2;
v,S=.5/ V.real,
[(0,0,4 *e,4*e*
V)];w=1 -v"def!
E(T,A, B,C):P
,Q,R=B*w+ A*v,B*w+C
*v,A*w+B*v;retur n[(1,Q,C,A),(1,P
,Q,B),(0,Q,P,A)]*T+[(0,C ,R,B),(1,R,C,A)]*(1-T)"f
or!i!in!_[:11]:S =sum([E (*x)for !x!in!S],[])"imp
ort!cair o!as!O; s=O.Ima geSurfac
e(1,e,e) ;c=O.Con text(s); M,L,G=c.
move_to ,c.line_to,c.s et_sour
ce_rgb a"def!z(f,a) :f(-a.
imag,a. real-e-e)"for!T,A,B,C!in[i !for!i!
in!S!if!i[""";exec(reduce(lambda x,i:x.replace(chr
(i),"\n "[34-i:]), range( 35),_+"""0]]:z(M,A
);z(L,B);z (L,C); c.close_pa
th()"G (.4,.3 ,1);c.
paint( );G(.7 ,.7,1)
;c.fil l()"fo r!i!in
!range (9):"! g=1-i/
8;d=i/ 4*g;G(d,d,d, 1-g*.8
)"!def !y(f,a):z(f,a+(1+2j)*( 1j**(i
/2.))*g)"!for!T,A,B,C!in!S:y(M,C);y(L,A);y(M
,A);y(L,B)"!c.st roke()"s.write_t
o_png('pen rose.png')
""" ))
xkcd Password Generator
The button below will generate a random phrase consisting of four common words. According to yesterday’s xkcd strip, such phrases are hard to guess (even by brute force), but easy to remember, making them interesting password choices.
Visit http://passphra.se/
correct horse battery staple
It’s a novel idea, but xkcd stops short of actually recommending such passwords, and so will I. Use at your own peril! I’m not responsible for anything that happens as a result of your password choice. (But if you’re just signing up for a kitten video forum, you’re probably safe.)
In case you missed the strip, here it is: