Lec. 03: Disassembling a Program
1 A binary program … what is it?
Last week we looked a lot at writing programs in C, compiling them into binaries, and then running them. This week, we peel back the covers further and look right at the binary files themselves. We will examine what exactly a binary is, how it is formatted, and how we parse or disassemble its contents.
2 objdump and readelf basics
For this entire class, we will pick apart a simple helloworld program:
#include <stdio.h>

int main(int argc, char *argv[]){
  char hello[15]="Hello, World!\n";
  char * p;

  for(p = hello; *p; p++){
    putchar(*p);
  }

  return 0;
}
Let's compile the program to create a binary:
user@si485H-base:demo$ gcc helloworld.c -o helloworld
Now if we use the file command, we can see what kind of file the binary is.
user@si485H-base:demo$ file helloworld
helloworld: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.24, BuildID[sha1]=2b27688b97f10f626f1ff62c232d7a2298d6afa1, not stripped
We see that it is actually an ELF file, which stands for Executable and Linkable Format. We will work exclusively with ELF binaries.
2.1 ELF Files and ELF Headers
All ELF files have a header describing the different sections and general information. We can read the header information for our helloworld program using the readelf command:
user@si485H-base:demo$ readelf -h helloworld
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x8048370
  Start of program headers:          52 (bytes into file)
  Start of section headers:          4472 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         9
  Size of section headers:           40 (bytes)
  Number of section headers:         30
  Section header string table index: 27
Most of this information isn't too useful, but let me point out some key things.
- There is a magic number! The first four bytes (7f 45 4c 46, that is, 0x7f followed by "ELF") say, hey, this is an ELF file; the bytes after them encode the class, byte order, and version.
- The class is ELF32, so it's 32-bit.
- The machine is Intel 80386, or x86, as expected.
- The entry point for the file is address 0x8048370, which is the first instruction of the _start function, the code that eventually calls main.
Everything else is not super useful for our purposes. Another nice thing we can do with readelf is look at all the sections, which are regions of the binary used for different purposes.
user@si485H-base:demo$ readelf -S helloworld
There are 30 section headers, starting at offset 0x1178:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .interp           PROGBITS        08048154 000154 000013 00   A  0   0  1
  [ 2] .note.ABI-tag     NOTE            08048168 000168 000020 00   A  0   0  4
  [ 3] .note.gnu.build-i NOTE            08048188 000188 000024 00   A  0   0  4
  [ 4] .gnu.hash         GNU_HASH        080481ac 0001ac 000020 04   A  5   0  4
  [ 5] .dynsym           DYNSYM          080481cc 0001cc 000060 10   A  6   1  4
  [ 6] .dynstr           STRTAB          0804822c 00022c 000068 00   A  0   0  1
  [ 7] .gnu.version      VERSYM          08048294 000294 00000c 02   A  5   0  2
  [ 8] .gnu.version_r    VERNEED         080482a0 0002a0 000030 00   A  6   1  4
  [ 9] .rel.dyn          REL             080482d0 0002d0 000008 08   A  5   0  4
  [10] .rel.plt          REL             080482d8 0002d8 000020 08   A  5  12  4
  [11] .init             PROGBITS        080482f8 0002f8 000023 00  AX  0   0  4
  [12] .plt              PROGBITS        08048320 000320 000050 04  AX  0   0 16
  [13] .text             PROGBITS        08048370 000370 0001f2 00  AX  0   0 16
  [14] .fini             PROGBITS        08048564 000564 000014 00  AX  0   0  4
  [15] .rodata           PROGBITS        08048578 000578 000008 00   A  0   0  4
  [16] .eh_frame_hdr     PROGBITS        08048580 000580 00002c 00   A  0   0  4
  [17] .eh_frame         PROGBITS        080485ac 0005ac 0000b0 00   A  0   0  4
  [18] .init_array       INIT_ARRAY      08049f08 000f08 000004 00  WA  0   0  4
  [19] .fini_array       FINI_ARRAY      08049f0c 000f0c 000004 00  WA  0   0  4
  [20] .jcr              PROGBITS        08049f10 000f10 000004 00  WA  0   0  4
  [21] .dynamic          DYNAMIC         08049f14 000f14 0000e8 08  WA  6   0  4
  [22] .got              PROGBITS        08049ffc 000ffc 000004 04  WA  0   0  4
  [23] .got.plt          PROGBITS        0804a000 001000 00001c 04  WA  0   0  4
  [24] .data             PROGBITS        0804a01c 00101c 000008 00  WA  0   0  4
  [25] .bss              NOBITS          0804a024 001024 000004 00  WA  0   0  1
  [26] .comment          PROGBITS        00000000 001024 00004d 01  MS  0   0  1
  [27] .shstrtab         STRTAB          00000000 001071 000106 00      0   0  1
  [28] .symtab           SYMTAB          00000000 001628 000440 10     29  45  4
  [29] .strtab           STRTAB          00000000 001a68 000274 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)
Again, a bunch of this information isn't too useful for us right now, but might be later. Some key things to look at:
- The .bss section is listed; this is the same bss as in the program memory layout.
- There is also a .data section, same as in the program memory layout.
- Finally, there is a .text section, same as before. Notice that it is at address 0x08048370, the same address the header gives as the entry point, where the program's instructions start.
2.2 Getting at the assembly with objdump
Now that we have some idea of how the file is formatted, it would be nice to get down into the details of the machine instructions themselves. For this, we'll use objdump, or "object dump". We can simply call it on the binary executable like so:
user@si485H-base:demo$ objdump -d helloworld

helloworld:     file format elf32-i386


Disassembly of section .init:

080482f8 <_init>:
 80482f8:       53                      push   %ebx
 80482f9:       83 ec 08                sub    $0x8,%esp
 80482fc:       e8 9f 00 00 00          call   80483a0 <__x86.get_pc_thunk.bx>
 8048301:       81 c3 ff 1c 00 00       add    $0x1cff,%ebx
 8048307:       8b 83 fc ff ff ff       mov    -0x4(%ebx),%eax
 804830d:       85 c0                   test   %eax,%eax
 804830f:       74 05                   je     8048316 <_init+0x1e>
 8048311:       e8 2a 00 00 00          call   8048340 <__gmon_start__@plt>
 8048316:       83 c4 08                add    $0x8,%esp
 8048319:       5b                      pop    %ebx
 804831a:       c3                      ret

Disassembly of section .plt:

08048320 <__stack_chk_fail@plt-0x10>:
 8048320:       ff 35 04 a0 04 08       pushl  0x804a004
 8048326:       ff 25 08 a0 04 08       jmp    *0x804a008
 804832c:       00 00                   add    %al,(%eax)
        ...
It's going to dump a lot of stuff, but if we look more carefully further down, we'll see one header that looks familiar, main:
0804841d <main>:
 804841d:       55                      push   %ebp
 804841e:       89 e5                   mov    %esp,%ebp
 8048420:       83 e4 f0                and    $0xfffffff0,%esp
 8048423:       83 ec 30                sub    $0x30,%esp
 8048426:       c7 44 24 1d 48 65 6c    movl   $0x6c6c6548,0x1d(%esp)
 804842d:       6c
 804842e:       c7 44 24 21 6f 2c 20    movl   $0x57202c6f,0x21(%esp)
 8048435:       57
 8048436:       c7 44 24 25 6f 72 6c    movl   $0x646c726f,0x25(%esp)
 804843d:       64
 804843e:       66 c7 44 24 29 21 0a    movw   $0xa21,0x29(%esp)
 8048445:       c6 44 24 2b 00          movb   $0x0,0x2b(%esp)
 804844a:       8d 44 24 1d             lea    0x1d(%esp),%eax
 804844e:       89 44 24 2c             mov    %eax,0x2c(%esp)
 8048452:       eb 17                   jmp    804846b <main+0x4e>
 8048454:       8b 44 24 2c             mov    0x2c(%esp),%eax
 8048458:       0f b6 00                movzbl (%eax),%eax
 804845b:       0f be c0                movsbl %al,%eax
 804845e:       89 04 24                mov    %eax,(%esp)
 8048461:       e8 aa fe ff ff          call   8048310 <putchar@plt>
 8048466:       83 44 24 2c 01          addl   $0x1,0x2c(%esp)
 804846b:       8b 44 24 2c             mov    0x2c(%esp),%eax
 804846f:       0f b6 00                movzbl (%eax),%eax
 8048472:       84 c0                   test   %al,%al
 8048474:       75 de                   jne    8048454 <main+0x37>
 8048476:       b8 00 00 00 00          mov    $0x0,%eax
 804847b:       c9                      leave
 804847c:       c3                      ret
 804847d:       66 90                   xchg   %ax,%ax
 804847f:       90                      nop
This is the assembly for the main function. Reading each line from left to right: first is the address the instruction is loaded at, then the actual bytes of the instruction, and finally the instruction's name and operands.
The first thing you might notice about the instructions themselves is that they are really, really hard to read. That's because this is AT&T syntax, which, simply, sucks! We will use an alternative format called Intel syntax, which is much, much nicer. For that, we need to pass an argument to objdump:
user@si485H-base:demo$ objdump -M intel -d helloworld

helloworld:     file format elf32-i386


Disassembly of section .init:

080482f8 <_init>:
 80482f8:       53                      push   ebx
 80482f9:       83 ec 08                sub    esp,0x8

(... snip ...)

0804841d <main>:
 804841d:       55                      push   ebp
 804841e:       89 e5                   mov    ebp,esp
 8048420:       83 e4 f0                and    esp,0xfffffff0
 8048423:       83 ec 30                sub    esp,0x30
 8048426:       c7 44 24 1d 48 65 6c    mov    DWORD PTR [esp+0x1d],0x6c6c6548
 804842d:       6c
 804842e:       c7 44 24 21 6f 2c 20    mov    DWORD PTR [esp+0x21],0x57202c6f
 8048435:       57
 8048436:       c7 44 24 25 6f 72 6c    mov    DWORD PTR [esp+0x25],0x646c726f
 804843d:       64
 804843e:       66 c7 44 24 29 21 0a    mov    WORD PTR [esp+0x29],0xa21
 8048445:       c6 44 24 2b 00          mov    BYTE PTR [esp+0x2b],0x0
 804844a:       8d 44 24 1d             lea    eax,[esp+0x1d]
 804844e:       89 44 24 2c             mov    DWORD PTR [esp+0x2c],eax
 8048452:       eb 17                   jmp    804846b <main+0x4e>
 8048454:       8b 44 24 2c             mov    eax,DWORD PTR [esp+0x2c]
 8048458:       0f b6 00                movzx  eax,BYTE PTR [eax]
 804845b:       0f be c0                movsx  eax,al
 804845e:       89 04 24                mov    DWORD PTR [esp],eax
 8048461:       e8 aa fe ff ff          call   8048310 <putchar@plt>
 8048466:       83 44 24 2c 01          add    DWORD PTR [esp+0x2c],0x1
 804846b:       8b 44 24 2c             mov    eax,DWORD PTR [esp+0x2c]
 804846f:       0f b6 00                movzx  eax,BYTE PTR [eax]
 8048472:       84 c0                   test   al,al
 8048474:       75 de                   jne    8048454 <main+0x37>
 8048476:       b8 00 00 00 00          mov    eax,0x0
 804847b:       c9                      leave
 804847c:       c3                      ret
 804847d:       66 90                   xchg   ax,ax
 804847f:       90                      nop

(... snip ...)
2.3 Disassembling with gdb
Another way to get the disassembled code is using gdb, the GNU debugger, which also does a ton of other tasks that we will look at later. To start, run the program under the debugger:
user@si485H-base:demo$ gdb helloworld
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from helloworld...(no debugging symbols found)...done.
(gdb)
This will print out a disclaimer and leave you in a gdb terminal. Now, you can type disassemble main to disassemble the main function. If you already set up the alias as suggested in the resource page, you can shorten that to ds for disassemble:
(gdb) ds main
Dump of assembler code for function main:
   0x0804841d <+0>:     push   ebp
   0x0804841e <+1>:     mov    ebp,esp
   0x08048420 <+3>:     and    esp,0xfffffff0
   0x08048423 <+6>:     sub    esp,0x30
   0x08048426 <+9>:     mov    DWORD PTR [esp+0x1d],0x6c6c6548
   0x0804842e <+17>:    mov    DWORD PTR [esp+0x21],0x57202c6f
   0x08048436 <+25>:    mov    DWORD PTR [esp+0x25],0x646c726f
   0x0804843e <+33>:    mov    WORD PTR [esp+0x29],0xa21
   0x08048445 <+40>:    mov    BYTE PTR [esp+0x2b],0x0
   0x0804844a <+45>:    lea    eax,[esp+0x1d]
   0x0804844e <+49>:    mov    DWORD PTR [esp+0x2c],eax
   0x08048452 <+53>:    jmp    0x804846b <main+78>
   0x08048454 <+55>:    mov    eax,DWORD PTR [esp+0x2c]
   0x08048458 <+59>:    movzx  eax,BYTE PTR [eax]
   0x0804845b <+62>:    movsx  eax,al
   0x0804845e <+65>:    mov    DWORD PTR [esp],eax
   0x08048461 <+68>:    call   0x8048310 <putchar@plt>
   0x08048466 <+73>:    add    DWORD PTR [esp+0x2c],0x1
   0x0804846b <+78>:    mov    eax,DWORD PTR [esp+0x2c]
   0x0804846f <+82>:    movzx  eax,BYTE PTR [eax]
   0x08048472 <+85>:    test   al,al
   0x08048474 <+87>:    jne    0x8048454 <main+55>
   0x08048476 <+89>:    mov    eax,0x0
   0x0804847b <+94>:    leave
   0x0804847c <+95>:    ret
End of assembler dump.
If your output is in AT&T syntax, issue the following command to have gdb output Intel syntax:
(gdb) set disassembly-flavor intel
I'll mostly work with gdb's disassembled output because it's more nicely formatted, IMHO. Our next task is understanding what the hell is going on?!?!
3 x86 the Processor Register State
3.1 x86 Instruction Set
Let's start with a simple item: what is x86? It is an assembly instruction set, a programming language. We can represent x86 in terms of its bytes (as seen in the objdump output) or in a human-readable form (as seen in the Intel or AT&T syntax).
You may have worked with an instruction set previously, such as MIPS. MIPS is a RISC instruction set, or Reduced Instruction Set Computing, which has the advantage that every instruction is the same size, 32 bits.
x86 is a CISC instruction set, or Complex Instruction Set Computing, and it has the property that instruction sizes are not constant. They can vary from a single byte up to 15 bytes, depending on the instruction; in the listing above, push ebp is one byte while the mov DWORD PTR instructions are eight. You may wonder, why in the world would anything be designed this way? The answer is market inertia and backwards compatibility. As Intel chips dominated the market, more and more binaries were x86.
Today, another instruction set has become very relevant: ARM, originally the Acorn RISC Machine. As the name indicates, it is a RISC instruction set and thus is bringing back a bit of sanity to the instruction set world. ARM is also the architecture of choice on many mobile devices, so it will be relevant for quite some time.
However, we will not be working with ARM in this class, just x86, and we will only be using a very small set of the x86 instructions. You can read more about x86 in the extensive online resources, and when we encounter an unfamiliar instruction, we will look it up.
3.2 Anatomy of an Instruction
An instruction, in the human readable format, has the following format:
operation <dst>, <src>
The operation name is the kind of operation that will be performed. For example, it could be an add or mov or and. The <dst> is where the result will be stored, which is typically a register. The <src> is where the data to operate on is read from; the operation may also read the data referenced by <dst>. The <src> is optional, and depends on the instruction.
If we take a few operations from our sample code:
   0x0804841d <+0>:     push   ebp
   0x0804841e <+1>:     mov    ebp,esp
   0x08048420 <+3>:     and    esp,0xfffffff0
The first command, push, takes one argument and places that argument on the stack, adjusting the stack pointer. In this case, it pushes the value of the base pointer stored in the register ebp onto the stack. The second command, mov, takes two arguments and moves a value from one location to another, much like assignment. Here it takes the value in the stack pointer esp and saves it in the base pointer ebp. Finally, the last command is a bitwise and operation taking two arguments. It performs a bitwise and on the <dst> with the <src> and stores the result in <dst>. In this context, the and command clears the lowest 4 bits of the stack pointer, rounding it down to a 16-byte boundary. Compilers keep the stack aligned this way because some instructions (SSE, for example) expect aligned data, so you'll see this sequence a lot in assembly.
We will take a closer look at these instructions again in a second, but before we do that, we need to understand these registers and what they are used for in more detail.
3.3 Processor Registers
Registers are special storage spaces on the processor that store the state of the program. Some registers are general purpose and hold intermediate values, while others keep track of the execution state, e.g., what the next instruction is.
Here are the standard registers you will encounter. There are some others, but we'll explain them when we come across them:
- esp: 32-bit register to store the stack pointer
- ebp: 32-bit register to store the base pointer
- eax: 32-bit general purpose register, sometimes called the "accumulator"
- ecx: 32-bit general purpose register
- ebx: 32-bit general purpose register
- edx: 32-bit general purpose register
- esi: 32-bit general purpose register mostly used for loading and storing
- edi: 32-bit general purpose register mostly used for loading and storing
Each of the general purpose registers can be referenced either by its full 32-bit value or by some subset of that, such as the lowest 16 bits or one of the two lowest bytes. For example, eax refers to the full 32-bit register, ax refers to the lowest 16 bits of eax, al is the lowest 8 bits, and ah is the 8 bits just above al. Depending on the kind of data the register is storing, we may reference different parts.
3.4 The Base Pointer and Stack Pointer
Two registers will be referenced more than any other: the base pointer and the stack pointer. These registers maintain the memory reference state for the current execution, with reference to the current function frame. A function frame is the portion of memory on the stack that stores the information for the current function's execution, including local data and return addresses. The base pointer and stack pointer define the top and bottom of the function frame.
The structure of a function frame looks like this:

             <- 4 bytes ->
            .-------------.
            |     ...     |  higher address
  ebp+0x8 ->|  func args  |
  ebp+0x4 ->| return addr |
      ebp ->|  saved ebp  |
  ebp-0x4 ->|             |
            :             :
            '             '
              local args
            .             .
            :             :
  esp+0x4 ->|             |
      esp ->|             |  lower addresses
            '-------------'
Moving from higher addresses to lower addresses, the top of the frame stores the function arguments. These are typically referenced as positive offsets from the ebp register. For example, the first argument is at ebp+0x8, with later arguments moving upwards from there.
The second item in the function frame is the return address at ebp+0x4. The value stored in this memory is the address of the instruction to execute after this function returns, that is, the instruction right after the call to this function. We will spend a LOT OF TIME talking about this later.
Finally, there is the saved ebp, which is the base pointer of the calling function. We need to save this value so that the calling function's stack frame can be restored once this function completes.
The stack pointer references the bottom of the stack, the lowest address allocated. Addresses past this point are considered un-allocated. However, it's pretty easy to allocate more space: we just subtract from the stack pointer.
3.5 Managing the Stack Frame and the Stack
Now that we have some context for the registers, let's take a look at the first set of instructions in our code:
   0x0804841d <+0>:     push   ebp
   0x0804841e <+1>:     mov    ebp,esp
   0x08048420 <+3>:     and    esp,0xfffffff0
   0x08048423 <+6>:     sub    esp,0x30
Let's analyze these four instructions. The push instruction pushes a value onto the stack, in this case the previous base pointer, i.e., the saved base pointer. Next, the base pointer is set to the stack pointer (mov), and then the stack pointer is aligned to a 16-byte boundary by clearing its lowest 4 bits (and). Finally, subtracting from the stack pointer allocates the rest of the stack frame, which is 0x30 bytes long, or 48 bytes (don't forget about hex).
3.6 Referencing, De-Referencing, and Setting Memory
The next set of instructions initializes memory on the stack. Let's switch back to the C code to see this in C first before we look at it in assembly.
char hello[15]="Hello, World!\n";
The string "Hello, World!\n" is stored on the stack in a 15-byte character array. In assembly, this looks like this.
   0x08048426 <+9>:     mov    DWORD PTR [esp+0x1d],0x6c6c6548
   0x0804842e <+17>:    mov    DWORD PTR [esp+0x21],0x57202c6f
   0x08048436 <+25>:    mov    DWORD PTR [esp+0x25],0x646c726f
   0x0804843e <+33>:    mov    WORD PTR [esp+0x29],0xa21
   0x08048445 <+40>:    mov    BYTE PTR [esp+0x2b],0x0
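If you squint at the <src> of these instructions, you'll recognize that this is ASCII. If you don't believe me, check out the ASCII table. Because x86 is little endian, the bytes of each value are stored lowest byte first, so 0x6c6c6548 lands in memory as 48 65 6c 6c, or "Hell". The DWORD PTR, WORD PTR, and BYTE PTR prefixes are dereference operations; they say how many bytes to access at the address in brackets.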
- BYTE PTR [addr] : byte-pointer : de-reference one byte at the address
- WORD PTR [addr] : word-pointer : de-reference the two bytes at the address
- DWORD PTR [addr] : double word-pointer : de-reference the four bytes at the address
Another way to look at these instructions in C would be like this (don't program like this, though):
char hello[15];
//                            l l e H
* ((int *) (hello))      = 0x6c6c6548;  // set hello[0]->hello[3]
//                            W   , o
* ((int *) (hello + 4))  = 0x57202c6f;  // set hello[4]->hello[7]
//                            d l r o
* ((int *) (hello + 8))  = 0x646c726f;  // set hello[8]->hello[11]
//                              \n !
* ((short *) (hello + 12)) = 0x0a21;    // set hello[12]->hello[13]
//                              \0
* ((char *) (hello+14))  = 0x00;        // set hello[14]
The next two instructions are a bit different:
   0x0804844a <+45>:    lea    eax,[esp+0x1d]
   0x0804844e <+49>:    mov    DWORD PTR [esp+0x2c],eax
lea stands for load effective address and is a shortcut to do a bit of math: it calculates a pointer offset and stores the resulting address, without touching the memory at that address. If we look at what's next in the C program, we see that it is setting up the for-loop.
for(p = hello; *p; p++){
The first part of the for loop initializes the pointer p to reference the start of the string hello. From the previous code, the start of the string hello is at address offset esp+0x1d, and we want to store that address as the value of p. This is a two-step process:
- The actual address must be computed using addition from esp and stored. lea eax,[esp+0x1d] will calculate the address and store it in eax.
- The value in eax must be stored in the memory reserved for p, which is at address esp+0x2c; the mov command accomplishes that.
At this point, everything is set up. And for reference, remember that p is stored at esp+0x2c.
3.7 Loops, Jumps, and Condition Testing
Now, we've reached the meat of the program: the inner loop. We can follow the execution at this point by following the jumps.
   0x08048452 <+53>:    jmp    0x804846b <main+78>         # -----------.
   0x08048454 <+55>:    mov    eax,DWORD PTR [esp+0x2c]    # <-------.  |
   0x08048458 <+59>:    movzx  eax,BYTE PTR [eax]          #         |  |
   0x0804845b <+62>:    movsx  eax,al                      #         |  |
   0x0804845e <+65>:    mov    DWORD PTR [esp],eax         #         |  |  //loop body
   0x08048461 <+68>:    call   0x8048310 <putchar@plt>     #         |  |
   0x08048466 <+73>:    add    DWORD PTR [esp+0x2c],0x1    #         |  |
   0x0804846b <+78>:    mov    eax,DWORD PTR [esp+0x2c]    # <-------+--'
   0x0804846f <+82>:    movzx  eax,BYTE PTR [eax]          #         |     //exit condition
   0x08048472 <+85>:    test   al,al                       #         |
   0x08048474 <+87>:    jne    0x8048454 <main+55>         # --------'
A jmp instruction changes the instruction pointer to the destination specified. It is not conditional; it is an explicit, hard jump. Following that jump in the code, just past the mov it lands on, we find the following three instructions:
   0x0804846f <+82>:    movzx  eax,BYTE PTR [eax]
   0x08048472 <+85>:    test   al,al
   0x08048474 <+87>:    jne    0x8048454 <main+55>
It's easiest to start with the movzx instruction. Recall that at this point in the code, eax has the same value as p. You can see that to be the case in the previous instruction, mov eax,DWORD PTR [esp+0x2c], where esp+0x2c is the memory address for p.
The movzx instruction will dereference the address stored in eax, which is whatever p references, read one byte at that address, and write it to the lower 8 bits of eax, zeroing the upper bits. This is essentially the *p operation, which yields some character in hello, and so what we want to test is whether p references the NUL terminator at the end of hello.
That test occurs in test al,al. The test instruction computes the bitwise and of its two operands, throws the result away, and records facts about it (is it zero, is it negative, and so on) in a set of bit flags. Here both operands are the al register, the lower 8 bits of eax, where we stored the dereference of p, and anding a value with itself gives back that value. The flag we care about is ZF, the zero flag. If al is zero, then ZF is set to 1, which is the case when p references the end of the hello string.
The jne command says to jump when not equal, which really means jump when ZF is 0. If al is zero, ZF is 1 and we do not jump, so execution falls through and the loop ends; otherwise we jump back to the given address and continue the loop.
3.8 Function Calls
If we investigate the loop body, we find the following instructions:
   0x08048454 <+55>:    mov    eax,DWORD PTR [esp+0x2c]
   0x08048458 <+59>:    movzx  eax,BYTE PTR [eax]
   0x0804845b <+62>:    movsx  eax,al
   0x0804845e <+65>:    mov    DWORD PTR [esp],eax
   0x08048461 <+68>:    call   0x8048310 <putchar@plt>
The first set of instructions, much like the test before, dereferences the pointer p.
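- Load the value of p, a memory address, into eax.
- Read the byte referenced by p into the lower 8 bits of eax, zeroing the remaining bits (movzx).
- Sign-extend al back into the full eax (movsx), since char is signed here and putchar takes an int. For ASCII characters the upper bits stay 0.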
At this point, eax stores a value like 0x00000048 (i.e., 'H'), where the lowest byte is the character of interest and the remaining bytes are 0.
This value is then written to the top of the stack, as referenced by esp, because we are about to make a function call. The arguments to functions are pushed onto the stack before a call. In this case, we allocated that stack space ahead of time, so we don't need to push; the argument is already in the right place, at the top of the stack.
The next operation is a call, which will execute the function putchar, conveniently named for us by gdb. Once that function completes, execution continues at the point right after the call, which is the add instruction.
0x08048466 <+73>: add DWORD PTR [esp+0x2c],0x1
Looking closely at this instruction, you see that it increments the pointer p, and the instructions that follow test whether p now references zero. And the loop goes on … as the world turns.