Lec. 05: Memory References, Jumps/Loops, and Function Calls

1. Review
2. Referencing, De-Referencing, and Setting Memory
3. Loops, Jumps, and Condition Testing
4. Function Calls

1 Review

We've been working on understanding this bit of x86 code:

(gdb) ds main
Dump of assembler code for function main:
   0x0804841d <+0>:	push   ebp
   0x0804841e <+1>:	mov    ebp,esp
   0x08048420 <+3>:	and    esp,0xfffffff0
   0x08048423 <+6>:	sub    esp,0x30
   0x08048426 <+9>:	mov    DWORD PTR [esp+0x1d],0x6c6c6548
   0x0804842e <+17>:	mov    DWORD PTR [esp+0x21],0x57202c6f
   0x08048436 <+25>:	mov    DWORD PTR [esp+0x25],0x646c726f
   0x0804843e <+33>:	mov    WORD PTR [esp+0x29],0xa21
   0x08048445 <+40>:	mov    BYTE PTR [esp+0x2b],0x0
   0x0804844a <+45>:	lea    eax,[esp+0x1d]
   0x0804844e <+49>:	mov    DWORD PTR [esp+0x2c],eax
   0x08048452 <+53>:	jmp    0x804846b <main+78>
   0x08048454 <+55>:	mov    eax,DWORD PTR [esp+0x2c]
   0x08048458 <+59>:	movzx  eax,BYTE PTR [eax]
   0x0804845b <+62>:	movsx  eax,al
   0x0804845e <+65>:	mov    DWORD PTR [esp],eax
   0x08048461 <+68>:	call   0x8048310 <putchar@plt>
   0x08048466 <+73>:	add    DWORD PTR [esp+0x2c],0x1
   0x0804846b <+78>:	mov    eax,DWORD PTR [esp+0x2c]
   0x0804846f <+82>:	movzx  eax,BYTE PTR [eax]
   0x08048472 <+85>:	test   al,al
   0x08048474 <+87>:	jne    0x8048454 <main+55>
   0x08048476 <+89>:	mov    eax,0x0
   0x0804847b <+94>:	leave  
   0x0804847c <+95>:	ret    
End of assembler dump.

This is the disassembly of the following C program:

#include <stdio.h>

int main(int argc, char *argv[]){

    char hello[15]="Hello, World!\n";
    char * p;

    for(p = hello; *p; p++){

        putchar(*p);         

    }

    return 0;
}

The last lesson focused on just the stack frame management associated with these operations:

0x0804841d <+0>:	push   ebp
0x0804841e <+1>:	mov    ebp,esp
0x08048420 <+3>:	and    esp,0xfffffff0
0x08048423 <+6>:	sub    esp,0x30

...
0x0804847b <+94>:	leave  
0x0804847c <+95>:	ret

The first set of operations sets up the new stack frame for the function, and the last set of operations deallocates that frame and restores the previous stack frame.

Today, we'll focus on understanding the body of the function: everything in between the construction and destruction of the stack frame. This includes managing memory on the stack and control flow.

2 Referencing, De-Referencing, and Setting Memory

The next set of instructions we will look at initializes memory on the stack. Let's switch back to the C code to see this in C first, before we look at it in assembly.

char hello[15]="Hello, World!\n";

The string "Hello, World!\n" is stored on the stack in a 15-byte character array: 14 characters plus the terminating NUL byte. In assembly, this looks like this:

0x08048426 <+9>:	mov    DWORD PTR [esp+0x1d],0x6c6c6548
0x0804842e <+17>:	mov    DWORD PTR [esp+0x21],0x57202c6f
0x08048436 <+25>:	mov    DWORD PTR [esp+0x25],0x646c726f
0x0804843e <+33>:	mov    WORD PTR [esp+0x29],0xa21
0x08048445 <+40>:	mov    BYTE PTR [esp+0x2b],0x0

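If you squint at the <src> operands, you'll recognize that this is ASCII: 0x48 is 'H', 0x65 is 'e', 0x6c is 'l', and so on (if you don't believe me, check out an ASCII table). The characters appear in reverse order within each constant because x86 is little-endian: the least significant byte is stored at the lowest address, so 0x6c6c6548 is written to memory as the bytes 48 65 6c 6c, which spell "Hell". The DWORD PTR, WORD PTR, and BYTE PTR prefixes are dereference operators that also specify the size of the access: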

BYTE PTR[addr] : byte-pointer : de-reference one byte at the address
WORD PTR[addr] : word-pointer : de-reference the two bytes at the address
DWORD PTR[addr] : double word-pointer : de-reference the four bytes at the address

Another way to look at these instructions in C would be like this (don't program like this, though):

char hello[15];
//                      l l e H  
* ((int *) (hello)) = 0x6c6c6548;      // set hello[0]->hello[3]
//                          W   , o
* ((int *) (hello + 4)) = 0x57202c6f; // set hello[4]->hello[7]
//                          d l r o      
* ((int *) (hello + 8)) = 0x646c726f; // set hello[8]->hello[11]
//                             \n !
* ((short *) (hello + 12)) = 0x0a21;  // set hello[12]->hello[13]
//                         \0
* ((char *) (hello+14)) = 0x00;  // set hello[14]

The next two instructions are a bit different:

0x0804844a <+45>:	lea    eax,[esp+0x1d]
0x0804844e <+49>:	mov    DWORD PTR [esp+0x2c],eax

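lea stands for load effective address. It is a shortcut for doing a bit of math: it calculates the address described by the expression in brackets and stores that address in the destination register, without reading memory at that address. If we look at what's next in the C program, we see that it is setting up the for loop.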

for(p = hello; *p; p++){

The first part of the for loop initializes the pointer p to reference the start of the string hello. From the previous code, the start of hello is at address esp+0x1d, and we want to store that address as the value of p. This is a two-step process:

1. The actual address must be computed by adding 0x1d to esp. lea eax,[esp+0x1d] calculates the address and stores it in eax.
2. The value in eax must be stored in the memory reserved for p, which is at address esp+0x2c. The mov instruction, mov DWORD PTR [esp+0x2c],eax, accomplishes that.

At this point, everything is set up. For reference, remember that p is stored at esp+0x2c.

3 Loops, Jumps, and Condition Testing

Now, we've reached the meat of the program: the inner loop. We can follow the execution at this point by following the jumps.

0x08048452 <+53>: jmp    0x804846b <main+78>      # -----------.
0x08048454 <+55>: mov    eax,DWORD PTR [esp+0x2c] # <-------.  |
0x08048458 <+59>: movzx  eax,BYTE PTR [eax]       #         |  |
0x0804845b <+62>: movsx  eax,al                   #         |  |
0x0804845e <+65>: mov    DWORD PTR [esp],eax      #         |  |  //loop body
0x08048461 <+68>: call   0x8048310 <putchar@plt>  #         |  |
0x08048466 <+73>: add    DWORD PTR [esp+0x2c],0x1 #         |  |
0x0804846b <+78>: mov    eax,DWORD PTR [esp+0x2c] # <-------+--'
0x0804846f <+82>: movzx  eax,BYTE PTR [eax]       #         |    //exit condition
0x08048472 <+85>: test   al,al                    #         |
0x08048474 <+87>: jne    0x8048454 <main+55>      #  -------'

A jmp instruction sets the instruction pointer to the specified destination. It is unconditional: an explicit, hard jump. Following that jump lands us at <main+78>, which loads p into eax with mov eax,DWORD PTR [esp+0x2c], followed by these three instructions:

0x0804846f <+82>: movzx  eax,BYTE PTR [eax]       
0x08048472 <+85>: test   al,al                    
0x08048474 <+87>: jne    0x8048454 <main+55>

It's easiest to start with the movzx instruction. At this point in the code, eax holds the same value as p. You can see this in the preceding instruction, mov eax,DWORD PTR [esp+0x2c], where esp+0x2c is the memory address of p.

The movzx (move with zero-extend) instruction dereferences the address stored in eax, which is whatever p references: it reads one byte at that address, writes it to the lower 8 bits of eax, and sets the upper 24 bits of eax to zero. This is essentially the *p operation, which yields some character in hello, and what we want to test is whether p references the NUL byte at the end of hello.

That test occurs at test al,al. The test instruction computes the bitwise AND of its two operands, discards the result, and sets status flags based on it. Here both operands are al, the lower 8 bits of eax, where we stored the dereferenced value of p, so the AND result is just al itself. The properties of the result (whether it is zero, whether its top bit is set, and so on) are recorded in a set of bit flags. The one we care about is ZF, the zero flag. If al is zero, ZF is set to 1, which is the case when p references the end of the hello string.

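The jne instruction (jump if not equal, also written jnz) jumps when ZF is 0. After test al,al, that means it jumps when al is not zero. If al is zero, there is no jump and execution falls through to the instruction after jne, ending the loop; otherwise execution jumps back to <main+55> and the loop continues.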

4 Function Calls

If we investigate the loop body, we find the following instructions:

0x08048454 <+55>: mov    eax,DWORD PTR [esp+0x2c] 
0x08048458 <+59>: movzx  eax,BYTE PTR [eax]       
0x0804845b <+62>: movsx  eax,al                   
0x0804845e <+65>: mov    DWORD PTR [esp],eax      
0x08048461 <+68>: call   0x8048310 <putchar@plt>

The first three instructions, much like the ones before the test, dereference the pointer p:

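load the value of p, a memory address, into eax (mov)
read the byte referenced by p into the lower 8 bits of eax, zeroing the upper 24 bits (movzx)
sign-extend al into all of eax (movsx), converting the char to the int that putchar expects; for ASCII characters the upper bits stay zero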

At this point, eax holds a value like 0x00000048 (e.g., 'H'), where the lowest byte is the character of interest and the remaining bytes are 0.

This value is then written to the top of the stack, as referenced by esp, because we are about to make a function call. In 32-bit x86 code like this, arguments are passed on the stack, traditionally pushed just before the call. In this case, the stack space was allocated ahead of time (by sub esp,0x30 in the prologue), so no push is needed, but the argument still ends up in the right place: at the top of the stack.

The next instruction is a call, which executes the function putchar (gdb conveniently labels the target putchar@plt). call pushes the return address, the address of the instruction right after the call, and then jumps to the function. Once putchar returns, execution continues at that address, which is the add instruction:

0x08048466 <+73>: add    DWORD PTR [esp+0x2c],0x1

Looking closely at this instruction, you can see that it increments the pointer p (by 1, since a char is one byte), and the instructions that follow test whether p now references a zero byte. And the loop goes on … as the world turns.
