SI485H: Stack Based Binary Exploits and Defenses (F15)

Lec. 01: C Review

1 Hello World

Let's start at the beginning: Hello World!

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char * argv[]){

  printf("Hello World!\n");

  return 0;
}

This program prints "Hello World!" by making a call to the library function printf(), the formatted print function. Additionally, note that main() in C programs takes two arguments:

  • int argc : the number of command line arguments (always at least 1)
  • char * argv[] : a NULL terminated array of strings for the command line arguments.

We'll come back to main()'s function arguments later when we discuss arrays and strings. The last item to note from the main() function is that it has a return value, namely 0. This return value is also the exit value for the program. It is customary for programs that succeed to return 0, while those that fail return some value other than 0, typically 1 or 2 depending on the error.

Finally, note the two includes: stdio.h and stdlib.h. These are header files for portions of the C standard library (often referred to as libc). stdio.h covers C standard input and output, and stdlib.h covers general-purpose standard library functions.

As we will see below, libc is linked into all C programs by default, but the headers describe which portions of the library you will use. The header files contain the function declarations, for example for printf(), so the compiler can check that each function call type checks. More on the compilation process next.

1.1 Simple Compilation Process

We will use gcc (the GNU C compiler) exclusively to compile C programs. The most straightforward way to use gcc is to call it with the program source file as the argument.

user@si485H-base:demo$ gcc helloworld.c 
user@si485H-base:demo$ ls
a.out  helloworld.c
user@si485H-base:demo$ ./a.out 
Hello World!

This produces an output binary file called a.out that we can execute to get the "Hello World!" message. If we want to compile the program to a specific file name, say helloworld, then we use the -o option to specify the name of the output file.

user@si485H-base:demo$ gcc -o helloworld helloworld.c 
user@si485H-base:demo$ ./helloworld 
Hello World!

1.2 Multi Step Compilation Process

This is actually obscuring a large portion of the compilation process, which really involves multiple steps. A source program goes through two stages before becoming a binary executable.

First, the source code must be compiled into object code, an intermediate representation of the source file. This is called compilation because there is a literal translation from one representation of the code to another. The object file contains the compiled source as machine-level instructions (e.g., x86 machine code), but it is not executable yet: it must still be linked with other code (e.g., code from the C standard library) so that it can actually execute on the specific target machine.

To see how this works, let's look at a multi-file hello world program. In one file (below) we have the main() function, which calls two other functions, hello() and world(), but only the code for hello() is provided in this source file. Both functions have declarations, that is, the types of their arguments and return values are known, but only hello() has a definition (a body).

#include <stdio.h>
#include <stdlib.h>

void world(void);
void hello(void);

void hello(){
  printf("Hello ");
}

int main(int argc, char *argv[]){
  hello();
  world();
}

If we try to compile this program, we get an error.

user@si485H-base:demo$ gcc -o hello hello.c
/tmp/ccC4VYbK.o: In function `main':
hello.c:(.text+0x20): undefined reference to `world'
collect2: error: ld returned 1 exit status

Looking closely at the error, we see that it is not the compiler proper that is reporting the error, but rather ld. That is because the program actually compiled but did not link. ld, the GNU linker, was not able to find the definition (the code) for world(), so it could not resolve the reference and thus no executable was produced.

You can see that this program does actually compile by using the -c flag with gcc, which says to compile the source to an object file and stop:

user@si485H-base:demo$ gcc -c -o hello.o hello.c

This succeeds, and now we have an object file for hello, but we need to provide more compiled code to complete the linking process. Specifically, we need to provide code that fills in the world() function.

#include <stdio.h>
#include <stdlib.h>

void world( ){

  printf("World!\n");
}

Once we have that, we can compile world.c into world.o and link the two .o files into a single executable.

user@si485H-base:demo$ gcc -c -o world.o world.c
user@si485H-base:demo$ gcc -o hello hello.o world.o
user@si485H-base:demo$ ./hello 
Hello World!

However, as you will see many times in this class, there is still more going on beneath the surface: more code is being used in the linking process. We can use ld directly to do the final linking and expose all those parts.

ld -o hello hello.o world.o --dynamic-linker /lib/ld-linux.so.2 /usr/lib/i386-linux-gnu/crt1.o /usr/lib/i386-linux-gnu/crti.o -lc /usr/lib/i386-linux-gnu/crtn.o

The link step actually requires three other object files, crt1.o, crti.o, and crtn.o, as well as the dynamic linker ld-linux.so.2, to produce a working executable. These object files provide important startup and ending code blocks and other functions that will become relevant when we start to reverse engineer some software.

2 Library Functions vs. System Calls

If you look more closely at the ld command line above, you will also see the flag -lc, which says to link against libc. The C standard library provides a lot of functionality for the programmer, but its primary task is to provide an interface by which the programmer can easily access the underlying operating system interface.

Recall that a system call is a mechanism for the programmer to gain access to an operating system feature. The operating system provides a few key features relevant to this class:

  • Input/Output : reading and writing from devices, such as the terminal, network, and other peripherals
  • Memory Management : maintaining the memory for programs and ensuring that programs do not access invalid memory; this includes maintaining the virtual memory layout for a program
  • Program Execution: loading and unloading programs and executing them on the CPU

2.1 Tracing Function and System Calls

A library function, on the other hand, provides a more user-friendly interface to the system calls. To see this dynamic, we can trace a program's execution and see where it calls library functions versus system calls. There are two tracers we will use heavily in this class:

  • ltrace : traces library function calls
  • strace : traces system calls

And we can look at the output of these traces to get a sense of how programs execute.

user@si485H-base:demo$ ltrace ./hello > /dev/null 
__libc_start_main(0x8048304, 1, 0xbffa7744, 0x8048360 <unfinished ...>
printf("Hello ")                                                                  = 6
puts("World!")                                                                    = 7
+++ exited (status 7) +++

In the ltrace above, we see that the hello program from the previous section makes two library calls: printf() and puts(). (The source calls printf("World!\n"), but gcc replaces a printf() of a constant string ending in a newline with the equivalent puts().) If you look at the manual pages, both puts() and printf() take a string and write it to stdout. However, we know that the actual mechanism for writing to stdout is the write() system call, and we can see this using strace.

user@si485H-base:demo$ strace ./hello > /dev/null 
execve("./hello", ["./hello"], [/* 20 vars */]) = 0
brk(0)                                  = 0x87b7000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7760000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=70286, ...}) = 0
mmap2(NULL, 70286, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb774e000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/i386-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\340\233\1\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1754876, ...}) = 0
mmap2(NULL, 1759868, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb75a0000
mmap2(0xb7748000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1a8000) = 0xb7748000
mmap2(0xb774b000, 10876, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb774b000
close(3)                                = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb759f000
set_thread_area({entry_number:-1 -> 6, base_addr:0xb759f940, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
mprotect(0xb7748000, 8192, PROT_READ)   = 0
mprotect(0xb7783000, 4096, PROT_READ)   = 0
munmap(0xb774e000, 70286)               = 0
fstat64(1, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0xbf8187d8) = -1 ENOTTY (Inappropriate ioctl for device)
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb775f000
write(1, "Hello World!\n", 13)          = 13
exit_group(7)                           = ?
+++ exited with 7 +++

In fact, a lot of system calls are involved here. Starting at the top, we have the execve() system call that executes the program, but after that, there is a lot of loading and mapping of libraries into memory. Finally, on the third line from the bottom, we see the write() system call to stdout (file descriptor 1).

2.2 Hello System Call

We can, of course, write a hello-world program without any library functions. But, we'll need some helper functions, like writing our own string length function.

#include <unistd.h>


int mystrlen(char * str){

  int i;
  for(i=0; str[i]; i++);

  return i;

}


int main(int argc, char *argv[]){

  char str[] = "Hello World!\n";

  write(1,str,mystrlen(str));

}

Compiling and executing this program and analyzing the ltrace, we still see a call to write() but no calls to puts() or printf().

user@si485H-base:demo$ ltrace ./hellosystem > /dev/null 
__libc_start_main(0x8048494, 1, 0xbf8c5034, 0x8048510 <unfinished ...>
write(1, "Hello World!\n", 13)                                                                                                    = 13
+++ exited (status 13) +++

The reason that write() still appears is that this write() is not the system call itself, but a library wrapper around it … but that is a story for another day. What's more interesting is the strace: if you look closely, you will see that it follows the same pattern as the other version of the program and ends with the same write() system call.

user@si485H-base:demo$ strace ./hellosystem > /dev/null 
execve("./hellosystem", ["./hellosystem"], [/* 20 vars */]) = 0
brk(0)                                  = 0x8c9b000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb77c3000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=70286, ...}) = 0
mmap2(NULL, 70286, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb77b1000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/i386-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\340\233\1\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1754876, ...}) = 0
mmap2(NULL, 1759868, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7603000
mmap2(0xb77ab000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1a8000) = 0xb77ab000
mmap2(0xb77ae000, 10876, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb77ae000
close(3)                                = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7602000
set_thread_area({entry_number:-1 -> 6, base_addr:0xb7602940, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
mprotect(0xb77ab000, 8192, PROT_READ)   = 0
mprotect(0x8049000, 4096, PROT_READ)    = 0
mprotect(0xb77e6000, 4096, PROT_READ)   = 0
munmap(0xb77b1000, 70286)               = 0
write(1, "Hello World!\n", 13)          = 13
exit_group(13)                          = ?
+++ exited with 13 +++