I was recently debugging some threading issues in MPICH2 and tried using ltrace to tell me the return code from pthread_mutex_lock and a few related functions. Tools like strace, ltrace, and valgrind are essential for debugging MPI programs because you can’t always strap a debugger on them, and even when you can it tends to be a heavyweight procedure.
Unfortunately, when I did use ltrace I got a bunch of output that looked like this:
% mpiexec -n 1 ltrace ./examples/cpi |& grep pthread
pthread_mutex_init(0x810543c, 0, 0xbfe8d6c8, 0x80c44a9, 0x80eb4ef) = 0
pthread_mutex_init(0x810540c, 0, 0xbfe8d6c8, 0x80c44a9, 0x80eb4ef) = 0
pthread_mutex_init(0x8105424, 0, 0xbfe8d6c8, 0x80c44a9, 0x80eb4ef) = 0
pthread_self(0x8105424, 0, 0xbfe8d6c8, 0x80c44a9, 0x80eb4ef) = 0xb7da1ae0
pthread_mutex_init(0x81052a0, 0, 0, 0, 0xbfe8d6f4) = 0
pthread_mutex_lock(0x810540c, 0x80eb50d, 128, 0x80c44a9, 0x80eb4ef) = 0
pthread_mutex_unlock(0x810540c, 0xbfe8d73c, 0, 0xbfe8d6b4, 0xbfe8d6bc) = 0
pthread_mutex_destroy(0x810540c, 0, 0xbfe8d708, 0x80597bb, 0xb7eedff4) = 0
Of course, that doesn’t make any sense. Everyone knows that pthread_mutex_lock only takes one argument, and pthread_self doesn’t take any arguments. ltrace evidently has no prototype for these functions, so it prints the same five argument slots for every call. After a few minutes of studying /usr/include/pthread.h and /etc/ltrace.conf, I whipped up this output instead:
% mpiexec -n 1 ltrace ./examples/cpi |& grep pthread
pthread_mutex_init(0x810543c, NULL) = 0
pthread_mutex_init(0x810540c, NULL) = 0
pthread_mutex_init(0x8105424, NULL) = 0
pthread_self() = 3084118752
pthread_mutex_init(0x81052a0, NULL) = 0
pthread_mutex_lock(0x810540c) = 0
pthread_mutex_unlock(0x810540c) = 0
pthread_mutex_destroy(0x810540c) = 0
The “secret” sauce goes in ~/.ltrace.conf:
; pthread.h
; - misc
ulong pthread_self(void);
int pthread_equal(ulong, ulong);
int pthread_create(addr, addr, addr, addr);
int pthread_join(ulong, addr);
void pthread_exit(addr);
; - mutex functions
int pthread_mutex_init(addr, addr);
int pthread_mutex_lock(addr);
int pthread_mutex_trylock(addr);
int pthread_mutex_unlock(addr);
int pthread_mutex_destroy(addr);
; - condition variable functions
int pthread_cond_init(addr, addr);
int pthread_cond_destroy(addr);
int pthread_cond_signal(addr);
int pthread_cond_broadcast(addr);
int pthread_cond_wait(addr, addr);
The system that I was on was a 32-bit Linux box running Ubuntu Hardy. For some reason, the --library option didn’t work (I still got output from all libraries), which is why I have the grep bit above. If you are sure you only care about a particular function, the -e pthread_mutex_lock approach seemed to work OK. You may have to adjust this configuration snippet a bit in order to match your platform. In particular, the ulong types might change on 64-bit (I haven’t checked though).
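Since the original goal was to see return codes, the grep step can be swapped for a small filter that only reports pthread calls with a nonzero result. The following is a minimal Python sketch, not something I used at the time. It assumes the one-call-per-line format shown above with no PID prefixes, and the names parse_call and failed_calls are made up for illustration.

```python
import re
import sys

# Matches lines such as "pthread_mutex_lock(0x810540c) = 0".
# Assumption: one complete call per line, no "[pid N]" prefix.
CALL_RE = re.compile(
    r"^(?P<name>pthread_\w+)\((?P<args>.*)\)\s+=\s+(?P<ret>\S+)\s*$"
)

# These functions return a value that is not an error code.
NOT_ERROR_CODES = {"pthread_self", "pthread_equal"}


def parse_call(line):
    """Return (name, args, ret) for a pthread call line, or None."""
    m = CALL_RE.match(line.strip())
    if m is None:
        return None
    return m.group("name"), m.group("args"), m.group("ret")


def failed_calls(lines):
    """Yield (name, args, ret) for calls that returned a nonzero code."""
    for line in lines:
        call = parse_call(line)
        if call is None:
            continue
        name, _args, ret = call
        if name in NOT_ERROR_CODES:
            continue
        try:
            code = int(ret, 0)  # base 0 accepts both decimal and 0x... hex
        except ValueError:
            continue  # e.g. a non-numeric return marker; skip it
        if code != 0:
            yield call


if __name__ == "__main__":
    for name, args, ret in failed_calls(sys.stdin):
        print(f"{name}({args}) returned {ret}")
```

Saved under a hypothetical name such as pthread_failures.py, it would slot in where grep was: mpiexec -n 1 ltrace ./examples/cpi |& python3 pthread_failures.py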
I’m not sure why pthreads wasn’t supported out of the box by ltrace. In fact, given the limited range of supported types for ltrace, I’m not sure why this can’t be almost completely automated. I think that with a few hours of pycparser and a little Python scripting you could generate 90% of the ltrace configuration.
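To give a rough idea of what that automation might look like, here is a minimal sketch that turns already-cleaned-up C prototypes into ltrace.conf lines. It deliberately uses plain string handling rather than pycparser, so it only copes with simple one-line prototypes. Real headers carry macros and attributes that would need preprocessing first, which is where a proper parser would earn its keep. The type table and all function names are illustrative assumptions, and mapping pthread_t to ulong assumes glibc on Linux.

```python
import re

# Illustrative mapping from C scalar types to ltrace.conf types.
SCALAR_TYPES = {
    "void": "void",
    "int": "int",
    "unsigned int": "uint",
    "long": "long",
    "unsigned long": "ulong",
    "pthread_t": "ulong",  # assumption: pthread_t is an unsigned long (glibc)
}
QUALIFIERS = {"const", "volatile", "restrict", "__restrict", "extern"}


def to_ltrace_type(decl, is_param=False):
    """Map one C return type or parameter declaration to an ltrace type."""
    if "*" in decl or "[" in decl:
        return "addr"  # every pointer or array is shown as a raw address
    tokens = [t for t in decl.split() if t not in QUALIFIERS]
    key = " ".join(tokens)
    if key in SCALAR_TYPES:
        return SCALAR_TYPES[key]
    if is_param and len(tokens) > 1:
        key = " ".join(tokens[:-1])  # retry without the parameter name
        if key in SCALAR_TYPES:
            return SCALAR_TYPES[key]
    raise ValueError(f"no ltrace type known for {decl!r}")


def split_params(params):
    """Split a parameter list on commas that are not inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in params:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts


def to_config_line(proto):
    """Convert 'int f(type a, type *b);' into an ltrace.conf line."""
    proto = proto.strip().rstrip(";").strip()
    head = proto[:proto.index("(")].strip()
    params = proto[proto.index("(") + 1:proto.rindex(")")]
    m = re.match(r"^(.*?)(\w+)$", head, re.S)
    if m is None:
        raise ValueError(f"cannot find a function name in {proto!r}")
    ret = to_ltrace_type(m.group(1))
    args = [to_ltrace_type(p, is_param=True) for p in split_params(params)]
    return f"{ret} {m.group(2)}({', '.join(args or ['void'])});"


if __name__ == "__main__":
    for proto in (
        "pthread_t pthread_self(void);",
        "int pthread_mutex_lock(pthread_mutex_t *mutex);",
    ):
        print(to_config_line(proto))
```

Treating every pointer as addr keeps the mapping trivial, at the cost of never showing strings or struct contents. Anything the table does not recognize raises an error instead of being guessed, so the leftover 10% can be handled by hand.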