-fstack-protector + valgrind ==> stack array overflow debugging?


stack array overflow background

I’ve recently been playing with valgrind to help find/debug stack overflow bugs in MPICH2. That is, places where someone made a programming mistake and scribbled past the end of an array that lives on the stack, such as in this MPICH2 ticket (and associated fix). I’m not really concerned about finding malicious buffer-overflow attacks in this post, more just the general program bugs from stack array overflows.

Of course, valgrind won’t normally detect these sorts of errors, because whenever the stack pointer is moved down (that is, the stack grows) everything between the old SP position and the new SP position is marked as addressable. This is entirely reasonable, since this memory is actually addressable and it means that valgrind works on code that doesn’t perfectly follow a particular calling convention, such as that generated by an experimental/broken compiler or from hand-written assembly.

In case you don’t feel like trying to grok the changes I referenced above, here’s a simple program that has a similar problem that should illustrate what’s going on.

 1 static int foo(void)
 2 {
 3     int a = 1;
 4     char b[32];
 5
 6     printf("before memset\n");
 7     memset(b, 0, 64); /* stomps the stack */
 8     printf("after memset\n");
 9
 10     return 0;
 11 }
 12
 13 int main(int argc, char **argv)
 14 {
 15     int x;
 16
 17     printf("before foo()\n");
 18     x = foo();
 19     printf("after foo()\n");
 20     return 0;
 21 }
 22

Line 7 is the line that causes the badness here. Most of you at home know why immediately, but I’ll write it out just in case so that the main topic of this post is clear to everyone. When main() calls foo(), the stack ends up looking like this (as of line 5). Earlier lines in the diagram are higher addresses (the stack grows down on most architectures and operating systems).

- last address in caller's stack frame
- any arguments in reverse order (none in the case of foo)
- return address                       <-- frame pointer register (EBP) points here
- old frame pointer value
- storage for "int a" (4 bytes)
- storage for "char b[32]" (32 bytes)  <-- ESP points here

This is an x86 cdecl calling-convention view of the world, but besides possibly the frame pointer and the int size, things will basically look this way on most systems. The important thing to observe here is that the storage for b actually has a lower address than both the storage for a and the storage for the return address for the function. The problem with our sample code is that the memset operation on line 7 sets all of b to zero (expected), but keeps on going and clobbers the values for a, old frame pointer, the return address, and probably part of the caller’s stack frame.

When the RET instruction at line 10 is executed, the processor will pop the return address off of the stack and jump there. In our case, that value isn’t correct anymore, it’s 0 instead and will result in a general protection fault. And of course, if we had logic in foo() that depended on the value of a making sense then we might have had some other very strange behavior manifest instead. And the worst part is that valgrind won’t catch this for you. You’ll get the crash just the same as if you had run int outside of valgrind. In a large-ish code base this can be tricky to track down sometimes, and having a tool catch the problem for you would be really handy.

enter the MPICH2 Stack Guard macros

So, I’ve implemented some macros that can help catch this:

#define MPIU_SG_GUARD_VALUE_ (0xdeadbeef)
#define MPIU_SG_FIRST_DECL int MPIU_SG_upper_guard = MPIU_SG_GUARD_VALUE_
#define MPIU_SG_LAST_DECL \
    int MPIU_SG_upper_guard_blk_handle; \
    do { \
        MPIU_SG_upper_guard_blk_handle = VALGRIND_CREATE_BLOCK(&MPIU_SG_upper_guard, sizeof(MPIU_SG_upper_guard), "stack guard variable"); \
        VALGRIND_MAKE_MEM_NOACCESS(&MPIU_SG_upper_guard, sizeof(MPIU_SG_upper_guard)); \
    } while (0)
#define MPIU_SG_AT_RETURN() \
    do { \
        VALGRIND_MAKE_MEM_DEFINED(&MPIU_SG_upper_guard, sizeof(MPIU_SG_upper_guard)); \
        VALGRIND_DISCARD(MPIU_SG_upper_guard_blk_handle); \
        MPIU_Assert(MPIU_SG_upper_guard == MPIU_SG_GUARD_VALUE_); \
    } while (0)

And here’s an updated version of our foo() function:

  1 static int foo(void)
  2 {
  3     MPIU_SG_FIRST_DECL;
  4     int a = 1;
  5     char b[32];
  6     MPIU_SG_LAST_DECL;
  7
  8     printf("before memset\n");
  9     memset(b, 0, 64); /* stomps the stack */
 10     printf("after memset\n");
 11
 12     MPIU_SG_AT_RETURN();
 13     return 0;
 14 }

If someone overflows the b array by a sufficient amount, they will end up scribbling over the guard variable. The various VALGRIND_ macros tell valgrind that the guard variable shouldn’t be touched during the main body of the function. Furthermore, we check the value via the MPIU_Assert to see if it has been scribbled over by some bad code in the function so that we get failures even when we aren’t running under valgrind. So running this updated code under valgrind will give you a warning message indicating the line where the bad access occurred, rather than having to try to work backwards from the GPF that you get at return time or from any wonky behavior that was predicated on a corrupted value of a:

% valgrind -q ./foo
before foo()
before memset
==49581== Invalid write of size 4
==49581==    at 0x1E2B: foo (foo.c:36)
==49581==    by 0x1F68: main (foo.c:48)
==49581==  Address 0xbfffedd4 is 0 bytes inside a stack guard variable of size 4 client-defined
==49581==    at 0x1DB2: foo (foo.c:33)
==49581==    by 0x1F68: main (foo.c:48)
after memset
Assertion failed: (MPIU_SG_upper_guard == (0xdeadbeef)), function foo, file foo.c, line 39.

This technique is a little too ugly and cumbersome for blanket use in the code base. However if you are poking around with a debugger or printf() and you see memory corruption that valgrind can’t seem to find, you might be able to insert these macros into any suspicious functions and find your bug rather quickly. It also will only flag an error when the array access overflows far enough to clobber the guard. In our example valgrind won’t flag an overflow of 4 bytes or less because it will corrupt a but not the guard variable. Intermediate guards can help with this if you have strong suspicions about a particular function, but it takes a slightly unwieldy tool and makes it even more cumbersome.

I can’t possibly be the first person to do this, but I haven’t seen it anywhere else yet to cite as another example of this. Please add links in the comments if you know of of something/someone else that does this.

stack smashing protection

Various techniques have been developed over the years to protect against malicious buffer-overflow attacks, which are very similar to this type of bug. One of them is the Stack Smashing Protector for GCC (the -fstack-protector option). This technique basically involves a modification to the compiler to insert a guard variable similar to the one above on any function that allocates an array on the stack. Then prior to any return from the function the guard value is checked to ensure that it hasn’t been modified. This prevents malicious code from overwriting the return address and hijacking execution on function return, which is the standard approach for a buffer-overflow attack.

All you have to do to get this protection from a modern version of GCC is to compile with -fstack-protector. I believe that there is a slight performance penalty associated with enabling this feature, but it’s probably a good trade-off for network-facing software.

valgrind and -fstack-protector: two great flavors that could taste better together

Of course, everything earlier was just setup for the main point of this post. In my mind these two techniques are begging to be combined in some way. I’m not familiar enough yet with the details of the -fstack-protector implementation to know how technically feasible/difficult this is. I know that the exact mechanics of the guard depend heavily on OS support and the architecture’s capabilities. For example, the availability of thread-local-storage (TLS) changes the implementation. On top of that, I’m not familiar enough with any of the debugging symbol information (like DWARF) to know how suitable it is for annotating information about the guard variable. Maybe this would be a good opportunity to play with LLVM a bit more, since I have used that a little in the past.

Anyway, if someone (maybe me) could work out the technical issues, this could be a really handy tool for debugging in some situations. If you suspected stack corruption issues at all, you would just need to recompile with -fstack-protector (if not already so-compiled) and maybe the currently-non-existent -fstack-protector-annotate-for-valgrind and then run valgrind to help locate your bug.

Updated (2009-06-16):

There’s a relevant GCC feature that I just learned about called _FORTIFY_SOURCE. I haven’t had the time yet to check whether or not this would have caught our actual MPICH2 bug (I don’t think it would have, unfortunately), but it definitely catches the error in the sample code above. As written, the it catches at compile time even, which is far better than waiting until you experience a bug to run valgrind on it, although the sample code is admittedly trivial.