Lec. 04: Stack Frame Management
1 Review
We continue in this lesson to understand the resulting binary for the hello world program:
#include <stdio.h>

int main(int argc, char *argv[]){
  char hello[15]="Hello, World!\n";
  char * p;

  /* print one character at a time until the terminating NUL */
  for(p = hello; *p; p++){
    putchar(*p);
  }

  return 0;
}
So far, we've identified that the result of compilation is an executable in ELF format. We can analyze the ELF file with readelf, and look at the x86 code with objdump or gdb. The instructions of the main function are as follows:
(gdb) ds main
Dump of assembler code for function main:
   0x0804841d <+0>:     push   ebp
   0x0804841e <+1>:     mov    ebp,esp
   0x08048420 <+3>:     and    esp,0xfffffff0
   0x08048423 <+6>:     sub    esp,0x30
   0x08048426 <+9>:     mov    DWORD PTR [esp+0x1d],0x6c6c6548
   0x0804842e <+17>:    mov    DWORD PTR [esp+0x21],0x57202c6f
   0x08048436 <+25>:    mov    DWORD PTR [esp+0x25],0x646c726f
   0x0804843e <+33>:    mov    WORD PTR [esp+0x29],0xa21
   0x08048445 <+40>:    mov    BYTE PTR [esp+0x2b],0x0
   0x0804844a <+45>:    lea    eax,[esp+0x1d]
   0x0804844e <+49>:    mov    DWORD PTR [esp+0x2c],eax
   0x08048452 <+53>:    jmp    0x804846b <main+78>
   0x08048454 <+55>:    mov    eax,DWORD PTR [esp+0x2c]
   0x08048458 <+59>:    movzx  eax,BYTE PTR [eax]
   0x0804845b <+62>:    movsx  eax,al
   0x0804845e <+65>:    mov    DWORD PTR [esp],eax
   0x08048461 <+68>:    call   0x8048310 <putchar@plt>
   0x08048466 <+73>:    add    DWORD PTR [esp+0x2c],0x1
   0x0804846b <+78>:    mov    eax,DWORD PTR [esp+0x2c]
   0x0804846f <+82>:    movzx  eax,BYTE PTR [eax]
   0x08048472 <+85>:    test   al,al
   0x08048474 <+87>:    jne    0x8048454 <main+55>
   0x08048476 <+89>:    mov    eax,0x0
   0x0804847b <+94>:    leave
   0x0804847c <+95>:    ret
End of assembler dump.
Looking at an individual instruction, we see that it typically has the form:
operation <dst>, <src>
However, x86 being a CISC architecture, this can vary. Additionally, recall that instructions can manipulate memory and the processor registers, which store program state. Two registers in particular are used to manage the stack frame representing the current function:
ebp: the base pointer, top of the frame
esp: the stack pointer, bottom of the frame
We will use other registers along the way to manage intermediate values, particularly the eax register, which is also used to hold the return value.
In the rest of this lesson, we will reverse engineer the instructions of the main function.
2 Stack Frames
The stack frame is an encapsulation of the local memory state for a function call. It holds the local variables of the function as well as the instruction address to jump to once this function returns.
A stack frame is managed by two registers: ebp and esp. The ebp register, or base pointer, is at the top of the stack frame (higher addresses) and the esp register, or stack pointer, is at the bottom of the stack frame (lower addresses). To make this even more confusing, we sometimes refer to esp as the "top of the stack," but it is really the bottom since it is at a lower address.
When we reference memory within a function, we are almost always describing the address of that memory as a positive or negative offset from esp or ebp.
For example, a typical stack frame looks like so:
              <- 4 bytes ->
             .-------------.
             |     ...     |   higher addresses
  ebp+0x8 -> |  func args  |
  ebp+0x4 -> | return addr |
      ebp -> |  saved ebp  |
  ebp-0x4 -> |             |
             :             :
             '             '
               local vars
             .             .
             :             :
  esp+0x4 -> |             |
      esp -> |             |   lower addresses
             '-------------'
The base pointer always references the saved base pointer of the calling function (more on that soon), and above that (at higher addresses) are the return address and the arguments to the function.
3 Stack Machines
We often describe x86 processors as stack machines because the execution model is built around a stack. As functions are called, their information is pushed onto the stack, and as functions return, it is popped off the stack.
For example, in this code:
int foo(){
  bar();
  baz();
  return 0;  /* foo is declared int, so it must return a value */
}

int main(){
  return foo();
}
In the stack machine model, the first function called, namely main, is pushed on the stack. Since main calls foo, we push foo onto the stack next. Now we have something that looks like this:
.------.
| main |
| foo  |
Since foo calls bar before it can return, bar is pushed on the stack.
.------.
| main |
| foo  |
| bar  |
Once bar returns, foo still can't return since baz needs to be called and is pushed on the stack.
.------.
| main |
| foo  |
| baz  |
At this point, it is important to note that the entire state of the functions yet to complete, main and foo, has not been forgotten. They are still there. Once baz returns and is popped, foo can return and be popped, and finally, main returns and is popped.
This stack-based model of execution is directly applied in the function stack in memory. Each function call is described by a stack frame, and each stack frame holds a link back to the point in the caller where execution resumes (the return address) and to the caller's frame (the saved base pointer). As functions are called and return, their stack frames are pushed onto and popped off the stack accordingly, but this doesn't happen automatically. Explicit instructions are needed to do that, and that is what we'll look at next.
4 Allocating a new stack frame
The first set of instructions in our main function is for managing the stack frame for the current function. In our code, this looks like this:
   0x0804841d <+0>:     push   ebp
   0x0804841e <+1>:     mov    ebp,esp
   0x08048420 <+3>:     and    esp,0xfffffff0
   0x08048423 <+6>:     sub    esp,0x30
Let's analyze these four instructions. The push instruction pushes a value onto the stack, in this case the previous base pointer, i.e., the saved base pointer. Next, the base pointer is set to the stack pointer (mov), and then the stack pointer is aligned down to a 16-byte boundary by clearing its low 4 bits (and). Finally, subtracting from the stack pointer (sub) allocates the rest of the stack frame, which is 0x30 bytes, or 48 bytes in decimal (don't forget about hex).
These instructions are needed so that the previous stack frame, the calling function's stack frame, can be reconstructed. To understand this, recall the layout of a function's stack frame. The ebp register references the saved value of the calling function's base pointer. Once the function returns, memory is deallocated by setting the stack pointer to the current base pointer and then popping the saved base pointer back into ebp (together, these make up the leave instruction). The ret instruction then pops the return address and sets the instruction pointer to it.
Visually the frame construction looks like this:
      (0) calling               (1) return address        (2) push ebp
          function's                pushed onto
          stack frame               stack

      .-------------.           .-------------.           .-------------.
      |     ...     |           |     ...     |           |     ...     |
ebp-> |             |     ebp-> |             |     ebp-> |             |
      :             :           :             :           :             :
          calling                   calling                   calling
      :    stack    :           :    stack    :           :    stack    :
      |    frame    |           |    frame    |           |    frame    |
      |             |           |             |           |             |
esp-> |  func args  |           |  func args  |           |  func args  |
      '-------------'     esp-> | return addr |           | return addr |
                                '-------------'     esp-> |  saved ebp  |
                                                          '-------------'

          (3) mov ebp,esp             (4) and esp,0xfffffff0      (5) sub esp,0x30
                                          16-byte align

          .-------------.             .-------------.             .-------------.
          |     ...     |             |     ...     |             |     ...     |
          |             |             |             |             |             |
          :             :             :             :             :             :
              calling                     calling                     calling
          :    stack    :             :    stack    :             :    stack    :
          |    frame    |             |    frame    |             |    frame    |
          |             |             |             |             |             |
          |  func args  |             |  func args  |             |  func args  |
          | return addr |             | return addr |             | return addr |
esp,ebp-> |  saved ebp  |       ebp-> |  saved ebp  |       ebp-> |  saved ebp  |
          '-------------'       esp-> '-------------'             |             |
                                                                  :             :
                                                                        new
                                                                  :    stack    :
                                                            esp-> |    frame    |
                                                                  '-------------'
5 De-allocating a stack frame
Now that the new stack frame has been constructed, we can jump to the end and look at what happens once the function returns. How is the stack frame deallocated and popped off the stack?
Two instructions manage that process.
   0x0804847b <+94>:    leave
   0x0804847c <+95>:    ret
First, the leave instruction does two things:
mov esp,ebp: set the stack pointer to the base pointer, essentially deallocating the local variables in the function
pop ebp: after the move, the top of the stack is the saved base pointer, so by popping the value at the top of the stack into ebp, we are resetting the base pointer to the saved one.
Visually this would look like this:
                                      leave                       leave
                                      1. mov esp,ebp              2. pop ebp

          .-------------.             .-------------.             .-------------.
          |     ...     |             |     ...     |       ebp-> |     ...     |
          |             |             |             |             |             |
          :             :             :             :             :             :
              calling                     calling                     calling
          :    stack    :             :    stack    :             :    stack    :
          |    frame    |             |    frame    |             |    frame    |
          |             |             |             |             |             |
          |  func args  |             |  func args  |             |  func args  |
          | return addr |             | return addr |       esp-> | return addr |
    ebp-> |  saved ebp  |   ebp,esp-> |  saved ebp  |             '-------------'
          |             |             '-------------'
          :             :
                new
          :    stack    :
    esp-> |    frame    |
          '-------------'
At this point the stack is almost back to its state prior to the function call. The last thing to do is to pop off the return address and set the instruction pointer (i.e., the current execution point) to it. Conceptually, we can think of ret as doing this in two steps:
pop eip: pop the return address into the instruction pointer
jmp eip: move execution to the instruction pointer
In reality, a single ret instruction does both at once, but it's good to think of them as separate steps.
      ret                       ret
      1. pop eip                2. jmp eip

      .-------------.           .-------------.
ebp-> |     ...     |     ebp-> |     ...     |     Execution returns to the
      :             :           :             :     calling function after the
          calling                   calling         call, with the stack frame reset
      :    stack    :           :    stack    :
      |    frame    |           |    frame    |
      |             |           |             |
      |  func args  |     esp-> |  func args  |
esp-> | return addr |           '-------------'
      '-------------'