Lec. 05: Memory References, Jumps/Loops, and Function Calls
1 Review
We've been working on understanding this bit of x86 code:
(gdb) ds main
Dump of assembler code for function main:
   0x0804841d <+0>:     push   ebp
   0x0804841e <+1>:     mov    ebp,esp
   0x08048420 <+3>:     and    esp,0xfffffff0
   0x08048423 <+6>:     sub    esp,0x30
   0x08048426 <+9>:     mov    DWORD PTR [esp+0x1d],0x6c6c6548
   0x0804842e <+17>:    mov    DWORD PTR [esp+0x21],0x57202c6f
   0x08048436 <+25>:    mov    DWORD PTR [esp+0x25],0x646c726f
   0x0804843e <+33>:    mov    WORD PTR [esp+0x29],0xa21
   0x08048445 <+40>:    mov    BYTE PTR [esp+0x2b],0x0
   0x0804844a <+45>:    lea    eax,[esp+0x1d]
   0x0804844e <+49>:    mov    DWORD PTR [esp+0x2c],eax
   0x08048452 <+53>:    jmp    0x804846b <main+78>
   0x08048454 <+55>:    mov    eax,DWORD PTR [esp+0x2c]
   0x08048458 <+59>:    movzx  eax,BYTE PTR [eax]
   0x0804845b <+62>:    movsx  eax,al
   0x0804845e <+65>:    mov    DWORD PTR [esp],eax
   0x08048461 <+68>:    call   0x8048310 <putchar@plt>
   0x08048466 <+73>:    add    DWORD PTR [esp+0x2c],0x1
   0x0804846b <+78>:    mov    eax,DWORD PTR [esp+0x2c]
   0x0804846f <+82>:    movzx  eax,BYTE PTR [eax]
   0x08048472 <+85>:    test   al,al
   0x08048474 <+87>:    jne    0x8048454 <main+55>
   0x08048476 <+89>:    mov    eax,0x0
   0x0804847b <+94>:    leave
   0x0804847c <+95>:    ret
End of assembler dump.
This is the disassembly of the following C program:
#include <stdio.h>

int main(int argc, char *argv[]){

  char hello[15]="Hello, World!\n";
  char * p;

  for(p = hello; *p; p++){
    putchar(*p);
  }

  return 0;
}
The last lesson focused on just the stack frame management associated with these operations:
   0x0804841d <+0>:     push   ebp
   0x0804841e <+1>:     mov    ebp,esp
   0x08048420 <+3>:     and    esp,0xfffffff0
   0x08048423 <+6>:     sub    esp,0x30
   ...
   0x0804847b <+94>:    leave
   0x0804847c <+95>:    ret
The first set of operations sets up the new stack frame for the function, and the last set of operations deallocates that frame and restores the previous stack frame.
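As a quick refresher on the arithmetic, here is a minimal sketch in C of what the four set-up instructions do to esp and ebp. The struct and function names are made up for illustration, and the sketch only models the pointer arithmetic; it does not store the saved ebp anywhere.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the two registers the set-up code touches. */
struct frame_regs {
    uint32_t esp;
    uint32_t ebp;
};

/* Apply the four set-up instructions to the register values on entry. */
struct frame_regs prologue(struct frame_regs r)
{
    r.esp -= 4;              /* push ebp: room for the saved ebp      */
    r.ebp = r.esp;           /* mov ebp,esp                           */
    r.esp &= 0xfffffff0u;    /* and esp,0xfffffff0: round down to 16  */
    r.esp -= 0x30;           /* sub esp,0x30: reserve 48 bytes        */
    assert(r.esp % 16 == 0); /* the new frame is 16-byte aligned      */
    return r;
}
```

For example, with an assumed entry value of esp = 0xffffd0fc, the function leaves ebp = 0xffffd0f8 and esp = 0xffffd0c0.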
Today, we'll focus on understanding the body of the function: everything between the construction and destruction of the stack frame. This includes managing memory on the stack as well as control flow.
2 Referencing, De-Referencing, and Setting Memory
The next set of instructions we will look at initializes memory on the stack. Let's switch back to the C code to see this in C first before we look at it in assembly.
char hello[15]="Hello, World!\n";
The string "Hello, World!\n" is stored on the stack in a 15-byte character array. In assembly, it looks like this:
   0x08048426 <+9>:     mov    DWORD PTR [esp+0x1d],0x6c6c6548
   0x0804842e <+17>:    mov    DWORD PTR [esp+0x21],0x57202c6f
   0x08048436 <+25>:    mov    DWORD PTR [esp+0x25],0x646c726f
   0x0804843e <+33>:    mov    WORD PTR [esp+0x29],0xa21
   0x08048445 <+40>:    mov    BYTE PTR [esp+0x2b],0x0
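If you squint at the source operands of these instructions, you'll recognize that they are ASCII. If you don't believe me, check out the ASCII table. x86 is little-endian, so the lowest byte of each value lands at the lowest address: 0x6c6c6548 is stored as the bytes 0x48 0x65 0x6c 0x6c, which spell "Hell". The DWORD PTR, WORD PTR, and BYTE PTR prefixes are dereference commands that say how many bytes to access at the address in brackets:
- BYTE PTR [addr] : byte pointer : de-reference one byte at the address
- WORD PTR [addr] : word pointer : de-reference the two bytes at the address
- DWORD PTR [addr] : double-word pointer : de-reference the four bytes at the address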
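To check the ASCII claim without the table, the following sketch splits a 32-bit immediate into the four bytes a little-endian store would write, lowest address first. The function name dword_to_bytes is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write the bytes of a 32-bit immediate in the order a little-endian
   store places them in memory: lowest byte at the lowest address. */
void dword_to_bytes(uint32_t imm, char out[4])
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++) {
        out[i] = (char)((imm >> (8 * i)) & 0xff);
    }
}
```

Passing 0x6c6c6548 yields 'H', 'e', 'l', 'l', and 0x57202c6f yields 'o', ',', ' ', 'W', which matches the first eight characters of the string.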
Another way to look at these instructions in C would be like this (don't program like this, though):
char hello[15];
//                                     l l e H
* ((int *) (hello))       = 0x6c6c6548; // set hello[0]->hello[3]
//                                     W   , o
* ((int *) (hello + 4))   = 0x57202c6f; // set hello[4]->hello[7]
//                                     d l r o
* ((int *) (hello + 8))   = 0x646c726f; // set hello[8]->hello[11]
//                                        \n !
* ((short *) (hello + 12)) = 0x0a21;    // set hello[12]->hello[13]
//                                          \0
* ((char *) (hello+14))   = 0x00;       // set hello[14]
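If you want to try those five stores yourself, a safer variant uses memcpy instead of casting the char pointer, which avoids misaligned and type-punned accesses. This is only a sketch: the function name fill_hello is made up, and it assumes a little-endian host, as x86 is.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fill a 15-byte buffer with the same five stores as the disassembly.
   Assumption: the host is little-endian, so each value's lowest byte
   is copied to the lowest address. */
void fill_hello(char hello[15])
{
    const uint32_t d0 = 0x6c6c6548u;  /* "Hell" */
    const uint32_t d1 = 0x57202c6fu;  /* "o, W" */
    const uint32_t d2 = 0x646c726fu;  /* "orld" */
    const uint16_t w  = 0x0a21u;      /* "!\n"  */
    const uint8_t  b  = 0x00u;        /* terminating NUL */

    assert(hello != NULL);
    memcpy(hello + 0,  &d0, sizeof d0);  /* hello[0]..hello[3]   */
    memcpy(hello + 4,  &d1, sizeof d1);  /* hello[4]..hello[7]   */
    memcpy(hello + 8,  &d2, sizeof d2);  /* hello[8]..hello[11]  */
    memcpy(hello + 12, &w,  sizeof w);   /* hello[12]..hello[13] */
    memcpy(hello + 14, &b,  sizeof b);   /* hello[14]            */
}
```

After the call, the buffer compares equal to "Hello, World!\n" with strcmp.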
The next two instructions are a bit different:
   0x0804844a <+45>:    lea    eax,[esp+0x1d]
   0x0804844e <+49>:    mov    DWORD PTR [esp+0x2c],eax
lea stands for load effective address. It is a shortcut for doing a bit of math: it calculates an address from a base and an offset and stores the result in a register, without reading or writing memory. If we look at what's next in the C program, we see that it is setting up the for loop.
for(p = hello; *p; p++){
The first part of the for loop initializes the pointer p to reference the start of the string hello. From the previous code, the start of hello is at address esp+0x1d, and we want to store that address as the value of p. This is a two-step process:
- The actual address must be computed by adding an offset to esp, and then stored. lea eax,[esp+0x1d] calculates the address and stores it in eax.
- The value in eax must be stored in the memory reserved for p, which is at address esp+0x2c. The mov instruction accomplishes that.
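The two steps can be modeled on a fake stack frame. In this sketch the frame is a plain byte array, esp is a made-up 32-bit address for its lowest byte, and init_p is a hypothetical name. The offsets come straight from the disassembly, and the asserts record how the 0x30 bytes are laid out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum {
    FRAME_SIZE = 0x30,  /* from sub esp,0x30        */
    HELLO_OFF  = 0x1d,  /* hello lives at esp+0x1d  */
    P_OFF      = 0x2c   /* p lives at esp+0x2c      */
};

/* Model lea + mov on a fake frame. Returns the value stored in p. */
uint32_t init_p(uint8_t frame[FRAME_SIZE], uint32_t esp)
{
    uint32_t eax;

    /* hello (15 bytes) ends exactly where p (4 bytes) begins, and p
       ends exactly at the end of the 0x30 bytes that were reserved. */
    assert(HELLO_OFF + 15 == P_OFF);
    assert(P_OFF + 4 == FRAME_SIZE);

    eax = esp + HELLO_OFF;                    /* lea eax,[esp+0x1d]: arithmetic only */
    memcpy(frame + P_OFF, &eax, sizeof eax);  /* mov DWORD PTR [esp+0x2c],eax        */
    return eax;
}
```

With an assumed esp of 0xffffd0c0, p ends up holding 0xffffd0dd, the address of hello[0].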
At this point, everything is set up. For reference, remember that p is stored at address esp+0x2c.
3 Loops, Jumps, and Condition Testing
Now, we've reached the meat of the program: the inner loop. We can follow the execution at this point by following the jumps.
   0x08048452 <+53>:    jmp    0x804846b <main+78>         # -----------.
   0x08048454 <+55>:    mov    eax,DWORD PTR [esp+0x2c]    # <-------.  |
   0x08048458 <+59>:    movzx  eax,BYTE PTR [eax]          #         |  |
   0x0804845b <+62>:    movsx  eax,al                      #         |  |
   0x0804845e <+65>:    mov    DWORD PTR [esp],eax         #         |  |  //loop body
   0x08048461 <+68>:    call   0x8048310 <putchar@plt>     #         |  |
   0x08048466 <+73>:    add    DWORD PTR [esp+0x2c],0x1    #         |  |
   0x0804846b <+78>:    mov    eax,DWORD PTR [esp+0x2c]    # <-------+--'
   0x0804846f <+82>:    movzx  eax,BYTE PTR [eax]          #         |     //exit condition
   0x08048472 <+85>:    test   al,al                       #         |
   0x08048474 <+87>:    jne    0x8048454 <main+55>         # --------'
A jmp instruction changes the instruction pointer to the specified destination. It is unconditional: an explicit hard jump. Following that jump in the code, we land on the exit condition, which ends with the following three instructions:
   0x0804846f <+82>:    movzx  eax,BYTE PTR [eax]
   0x08048472 <+85>:    test   al,al
   0x08048474 <+87>:    jne    0x8048454 <main+55>
It is easiest to start with the movzx instruction. Recall that at this point in the code, eax holds the same value as p. You can see this in the previous instruction, mov eax,DWORD PTR [esp+0x2c], where esp+0x2c is the memory address of p.
The movzx instruction dereferences the address stored in eax, which is whatever p references: it reads one byte at that address, writes it to the lower 8 bits of eax, and zeroes the upper 24 bits. This is essentially the *p operation, which yields some character in hello. What we want to test is whether p references the NUL byte at the end of hello.
That test happens in test al,al. The test instruction computes the bitwise AND of its two operands, discards the result, and records properties of that result (zero, negative, parity) in a set of bit flags. Here both operands are the al register, which is the lower 8 bits of eax, where we stored the dereference of p. Since al AND al is just al, the flags describe al itself. The one we care about is ZF, the zero flag. If al is zero then ZF is set to 1, which is the case when p references the end of the hello string.
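A minimal sketch of the flag computation, for the 8-bit case only, might look like this. The struct and function names are made up for illustration, and only the zero and sign flags are modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the two flags of interest. */
struct flags {
    bool zf;  /* zero flag */
    bool sf;  /* sign flag */
};

/* Model `test a,b` on 8-bit operands: AND the operands, throw the
   result away, and keep only the flags that describe it. */
struct flags test8(uint8_t a, uint8_t b)
{
    uint8_t result = a & b;
    struct flags f;
    f.zf = (result == 0);         /* ZF = 1 when the result is zero */
    f.sf = (result & 0x80) != 0;  /* SF = top bit of the result     */
    return f;
}

/* Usage check: 'H' leaves ZF clear, the NUL terminator sets it. */
void demo_test8(void)
{
    assert(!test8('H', 'H').zf);
    assert(test8(0, 0).zf);
}
```

Calling test8 with the same value twice mirrors test al,al: ZF comes back set only for a zero byte.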
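The jne instruction jumps when ZF is 0, that is, when the tested value is not equal to zero. If al is zero, the jump is not taken and execution falls through to the code after the loop; otherwise execution jumps to the target address, main+55, and the loop continues.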
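The jump structure can be written back into C using goto, which makes the compiler's layout easier to see: jump to the test first, and put the body above it. This is a sketch rather than what the compiler was given. The function name walk_like_asm is made up, and each character is copied into an output buffer as a stand-in for putchar so the result can be inspected.

```c
#include <assert.h>
#include <stddef.h>

/* Walk a NUL-terminated string with the same jump structure as the
   disassembly. `out` must have room for the string plus its NUL.
   Returns the number of characters visited. */
size_t walk_like_asm(const char *s, char *out)
{
    const char *p = s;      /* lea / mov: p = hello                   */
    size_t n = 0;

    assert(s != NULL && out != NULL);

    goto test;              /* jmp main+78: skip the body first time  */
body:
    out[n++] = *p;          /* main+55..main+68: stands in for putchar(*p) */
    p++;                    /* add DWORD PTR [esp+0x2c],0x1           */
test:
    if (*p != 0)            /* movzx / test al,al / jne main+55       */
        goto body;

    out[n] = '\0';
    return n;
}
```

Because the test sits at the bottom, an empty string never enters the body, which is exactly the behavior of the original for loop.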
4 Function Calls
If we investigate the loop body, we find the following instructions:
   0x08048454 <+55>:    mov    eax,DWORD PTR [esp+0x2c]
   0x08048458 <+59>:    movzx  eax,BYTE PTR [eax]
   0x0804845b <+62>:    movsx  eax,al
   0x0804845e <+65>:    mov    DWORD PTR [esp],eax
   0x08048461 <+68>:    call   0x8048310 <putchar@plt>
The first set of instructions, much like the test before, dereferences the pointer p.
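- Load the value of p, a memory address, into eax.
- Read the byte referenced by p into the lower 8 bits of eax, zeroing the remaining bits (movzx).
- Sign-extend al into all of eax (movsx), converting the char to the int that putchar expects. For ASCII characters, whose top bit is 0, the upper bits stay zero.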
At this point, eax stores a value like 0x00000048 (i.e., 'H'), where the lowest byte is the character of interest and the remaining bytes are 0.
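The difference between the two extensions only shows up for bytes with the top bit set. Here is a small sketch of both, with made-up function names: movzx fills the upper 24 bits with zeros, while movsx copies bit 7 of al into them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* movzx eax,BYTE PTR [eax]: read one byte, upper 24 bits become 0. */
uint32_t movzx_byte(const uint8_t *addr)
{
    assert(addr != NULL);
    return (uint32_t)*addr;
}

/* movsx eax,al: copy bit 7 of al into the upper 24 bits of eax. */
uint32_t movsx_al(uint32_t eax)
{
    uint32_t al = eax & 0xffu;
    return (al & 0x80u) ? (al | 0xffffff00u) : al;
}
```

For 'H' (0x48) both steps leave 0x00000048 in eax. For a byte such as 0xe9, movzx gives 0x000000e9 and the following movsx turns it into 0xffffffe9.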
This value is then written to the top of the stack, as referenced by esp, because we are about to make a function call. The arguments to functions are pushed onto the stack before a call. In this case, we allocated that stack space ahead of time, so we don't need to push; the argument is already in the right place, at the top of the stack.
The next operation is a call, which executes the function putchar, a name conveniently supplied by gdb. Once that function completes, execution continues at the point right after the call, which is the add instruction.
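Execution can resume right after the call because call pushes the address of the next instruction onto the stack before jumping, and ret pops it back into the instruction pointer. The toy model below sketches that mechanism. The struct, its field names, and the tiny word-indexed stack are all made up for illustration; the addresses are the ones from the disassembly.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_WORDS = 16 };

/* Hypothetical, minimal machine state: an instruction pointer, an
   index standing in for esp, and a small stack of 32-bit words. */
struct cpu {
    uint32_t eip;
    int      sp;                  /* index of the top-of-stack word */
    uint32_t stack[STACK_WORDS];
};

/* call target: push the address of the next instruction, then jump. */
void do_call(struct cpu *c, uint32_t next_insn, uint32_t target)
{
    assert(c->sp > 0);
    c->sp--;                      /* the stack grows toward lower addresses */
    c->stack[c->sp] = next_insn;
    c->eip = target;
}

/* ret: pop the saved address back into the instruction pointer. */
void do_ret(struct cpu *c)
{
    assert(c->sp < STACK_WORDS);
    c->eip = c->stack[c->sp];
    c->sp++;
}
```

With the argument 'H' on top of the stack, do_call(&c, 0x08048466, 0x08048310) leaves the return address above it, and do_ret brings eip back to 0x08048466 with the argument on top again.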
0x08048466 <+73>: add DWORD PTR [esp+0x2c],0x1
Looking closely at this instruction, you see that it increments the pointer p, and the instructions that follow test whether p now references a zero byte. And the loop goes on … as the world turns.