Friday, April 3, 2009

Detect Stack Corruption

A stack corruption bug can be difficult to fix when we can't find the steps to reproduce it, and its cause is often not obvious. The best approach is to make the culprit reveal itself as soon as possible, ideally before the corrupted stack is used and the program crashes somewhere far away. In this post, I'll introduce a small tool that helps discover stack corruption.

Rationale:
The figure below shows the structure of a stack frame. It's important to know that the stack grows downwards: the callee's frame is at a lower address than the caller's frame, and the callee's local variables are at lower addresses than the saved %ebp and the return address. So if our code carelessly writes past the end of a local variable (for example, a local array), it can overwrite the saved %ebp and the return address. The application may keep running until the callee returns, or even longer, and only then crash.
The following sample code demonstrates this:


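```c
#include <string.h>

int foo(int n)
{
    char var[4];
    /* Deliberate bug: copies 14 bytes ("corrupt me!!!" plus the NUL)
       into a 4-byte buffer, which can reach the saved %ebp and the
       return address above it. */
    strcpy(var, "corrupt me!!!");
    int a = n, b = 1;
    a = a + b;
    return 0;
}

int bar(void)
{
    return foo(0);
}
```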


It's not hard to see that the saved %ebp must stay unchanged while the callee executes, because it is used on return to restore the caller's %ebp.
So we can:
  1. Record the saved %ebp value at the beginning of the callee;
  2. Read the saved %ebp value again just before the callee returns;
  3. Compare the two values; if they differ, the stack has been corrupted.
Implementation:
The saved %ebp is the value stored in memory at the address held in the %ebp register. The macros here read it with a little inline assembly, which usually takes no more than one instruction. Here it is for GCC on 32-bit x86 (saved_ebp is just a variable that receives the value):
```c
unsigned int saved_ebp;
__asm__ __volatile__("mov (%%ebp),%0" : "=r" (saved_ebp));
```

Armed with this, we can write the following macros to help detect stack corruption:


```c
#ifndef _h_DBGHELPER
#define _h_DBGHELPER

#include <assert.h>
#include <stdio.h>  /* fprintf() and stderr are used by the checks */

/* Comment out this line to compile the checks away. */
#define STACKCHECK
#ifdef STACKCHECK // stack check enabled

#define STACK_CHECK_RAND 0xCD000000
#define STACK_CHECK_MASK 0x00FFFFFF

/* The internal logic of checking the stack state.
   "%.4s" prints at most the 4 bytes of the variable, so fprintf never
   reads past it looking for a terminating NUL. */
#define STACK_CHECK_END_INTERNAL() u_STACK_CHECK_EBP_VALUE_RETURN = ((u_STACK_CHECK_EBP_VALUE_RETURN & STACK_CHECK_MASK)\
    | STACK_CHECK_RAND);\
    if((u_STACK_CHECK_EBP_VALUE_ENTER & ~STACK_CHECK_MASK) != STACK_CHECK_RAND)\
    {\
        fprintf(stderr, \
            "Corrupted u_STACK_CHECK_EBP_VALUE_ENTER!! It's %x\n", u_STACK_CHECK_EBP_VALUE_ENTER);\
        assert((u_STACK_CHECK_EBP_VALUE_ENTER & ~STACK_CHECK_MASK) == STACK_CHECK_RAND);\
    }\
    if((u_STACK_CHECK_EBP_VALUE_RETURN & ~STACK_CHECK_MASK) != STACK_CHECK_RAND)\
    {\
        fprintf(stderr, \
            "Corrupted u_STACK_CHECK_EBP_VALUE_RETURN!! It's %x\n", u_STACK_CHECK_EBP_VALUE_RETURN);\
        assert((u_STACK_CHECK_EBP_VALUE_RETURN & ~STACK_CHECK_MASK) == STACK_CHECK_RAND);\
    }\
    if(u_STACK_CHECK_EBP_VALUE_ENTER != u_STACK_CHECK_EBP_VALUE_RETURN)\
    {\
        fprintf(stderr, "Stack overflow!!!\nThe EBP should be %x, but it's %x( %.4s )\n\n",\
            u_STACK_CHECK_EBP_VALUE_ENTER, u_STACK_CHECK_EBP_VALUE_RETURN, \
            (char*)&u_STACK_CHECK_EBP_VALUE_RETURN);\
        assert(u_STACK_CHECK_EBP_VALUE_RETURN == u_STACK_CHECK_EBP_VALUE_ENTER);\
    }
// end

#ifndef ARM_9260EK // x86
/* 32-bit x86 only: reads the saved %ebp that %ebp points to.
   volatile and the "memory" clobber stop the compiler from merging the
   two reads or reordering them with the function body's memory writes. */
#define STACK_CHECK_BEGIN() unsigned int u_STACK_CHECK_EBP_VALUE_ENTER = 0; \
    __asm__ __volatile__("mov (%%ebp),%0"\
        : "=r" (u_STACK_CHECK_EBP_VALUE_ENTER) : : "memory");\
    u_STACK_CHECK_EBP_VALUE_ENTER = (u_STACK_CHECK_EBP_VALUE_ENTER & STACK_CHECK_MASK) | STACK_CHECK_RAND

#define STACK_CHECK_END() do{unsigned int u_STACK_CHECK_EBP_VALUE_RETURN = 0;\
    __asm__ __volatile__("mov (%%ebp),%0"\
        : "=r" (u_STACK_CHECK_EBP_VALUE_RETURN) : : "memory");\
    STACK_CHECK_END_INTERNAL();}while(0)

#else // arm
/* Note: "str fp" stores the fp register itself, not the saved frame
   pointer in memory that the x86 version reads. */
#define STACK_CHECK_BEGIN() unsigned int u_STACK_CHECK_EBP_VALUE_ENTER = 0; \
    __asm__ __volatile__("str fp, %0 \n" \
        : "=m" (u_STACK_CHECK_EBP_VALUE_ENTER) : : "memory"); \
    u_STACK_CHECK_EBP_VALUE_ENTER = (u_STACK_CHECK_EBP_VALUE_ENTER & STACK_CHECK_MASK) | STACK_CHECK_RAND

#define STACK_CHECK_END() do{unsigned int u_STACK_CHECK_EBP_VALUE_RETURN = 0;\
    __asm__ __volatile__("str fp, %0 \n" \
        : "=m" (u_STACK_CHECK_EBP_VALUE_RETURN) : : "memory");\
    STACK_CHECK_END_INTERNAL();}while(0)

#endif

#else // STACK Check disabled

#define STACK_CHECK_BEGIN() do{}while(0)
#define STACK_CHECK_END() do{}while(0)

#endif

#endif // _h_DBGHELPER
```


The macros follow the steps described above. One thing to note is that the variables holding the saved %ebp value are themselves defined on the stack, in the current frame, so they can be corrupted too. There are several ways to avoid this. First, we can declare them static so that they live in the global data region rather than on the stack, but then the macros can't be used in a multithreaded environment or in recursive functions. Second, we can allocate them on the heap. Third, we can guard them with a predefined magic value and check that it has not been overwritten.
The third option is the one used here: STACK_CHECK_BEGIN() replaces the top byte of the stored value with 0xCD (STACK_CHECK_RAND), and STACK_CHECK_END() reports corruption if that byte has changed.

Usage:
We can update the previous code to take advantage of these macros as follows:

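```c
#include <string.h>
/* The stack-check header shown above must be #included here as well. */

int foo(int n)
{
    STACK_CHECK_BEGIN();
    char var[4];
    strcpy(var, "corrupt me!!!");   /* deliberate 14-byte write into a 4-byte buffer */
    int a = n, b = 1;
    a = a + b;
    STACK_CHECK_END();
    return 0;
}

int bar(void)
{
    return foo(0);
}
```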


The application will now fail an assertion reporting the stack corruption just before foo() returns, instead of crashing later in a less obvious place.


Microsoft's C++ compiler (with its /GS option) and GCC (with -fstack-protector) already provide built-in stack checking. Still, I think these macros are convenient, and writing them greatly consolidated my understanding of the stack's structure.

References:

http://blogs.msdn.com/vcblog/archive/2009/03/19/gs.aspx

3 comments:

Sudhaker said...

Thanks man, this was very informative...

Anonymous said...

The program crashes (dumps core) at this line:

asm("mov (%%ebp),%0" : "=r" (u_STACK_CHECK_EBP_VALUE_ENTER));

Can anyone please help?

sovin said...

Hi,

I know it's a very old blog post, but just for my understanding: I compiled and ran the above piece of code on an ARM machine running Android N to see if I would get the same kind of crash (stack corruption).
To my surprise, it did NOT crash and ran successfully. However, it crashed on my Linux machine. I also tried writing a longer string into the buffer var, but it still did not crash on the aforementioned ARM machine.

Details of the ARM machine (where the crash is NOT seen):
1. MediaTek 5890 chip running the latest Android N.

My local Ubuntu machine (where the crash is seen):
Linux inblrlx039 3.2.0-29-generic #46-Ubuntu SMP Fri Jul 27 17:03:23 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Can someone please explain this behavior?
Thank you in advance.