Lec. 01: C Review
1 Hello World
Let's start at the beginning: Hello World!
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char * argv[]){
      printf("Hello World!\n");
      return 0;
    }
This program prints "Hello World!" by making a call to the library
function printf(), the format print function. Additionally, note
that main() in C programs takes two arguments:
- int argc : the number of command line arguments (always at least 1)
- char * argv[] : a NULL-terminated array of strings for the command line arguments
We'll come back to main()'s function arguments later when we discuss
arrays and strings. The last item to note from the main() function
is that it has a return value, namely 0. This return value is also the
exit value for the program. It is customary for programs that succeed
to return 0, while those that fail return some value other than 0,
typically 1 or 2 depending on the error.
Finally, note the two includes: stdio.h and stdlib.h. These are the
header files for portions of the C standard library (often referred to
as libc). stdio.h covers C standard input and output, and stdlib.h
covers general-purpose standard library functions.
As we will see below, libc is linked into all C programs by default,
but the headers describe which portions of the library you
will use. The header files contain the function declarations, for
example for printf(), so the compiler can verify that each function
call type checks. More on the compilation process next.
1.1 Simple Compilation Process
We will use gcc (the GNU C compiler) exclusively to compile
C programs. The most straightforward way to use gcc is to just
call it with the program source file as the argument.
    user@si485H-base:demo$ gcc helloworld.c
    user@si485H-base:demo$ ls
    a.out  helloworld.c
    user@si485H-base:demo$ ./a.out
    Hello World!
This produces an output binary file called a.out that we can execute
to get the "Hello World!" message. If we want to compile the program
to a specific file name, say helloworld, then we use the -o
option to specify the name of the output file.
    user@si485H-base:demo$ gcc -o helloworld helloworld.c
    user@si485H-base:demo$ ./helloworld
    Hello World!
1.2 Multi Step Compilation Process
The single gcc command above actually obscures a large portion of the compilation process, which really involves multiple steps. A source program goes through two stages before becoming a binary executable.
First, the source code must be compiled into object code, which is an intermediate representation of the source file. This is called compilation because there is a literal translation from one code representation to another. The object file contains the compiled source as machine-level instructions (e.g., x86), but it is not executable yet. Second, the object code must be linked with other code (e.g., code from the C standard library) so that it can actually execute on the specific target machine.
To see how this works, let's look at a multi-file hello world program. In
one file (below) we have the main() function, which calls two other
functions, hello() and world(), but only hello() is defined in
the program source. Both functions have declarations, that is, the
types of their inputs and outputs are known, but the code is provided
for only one of them.
    #include <stdio.h>
    #include <stdlib.h>

    void world(void);
    void hello(void);

    void hello(){
      printf("Hello ");
    }

    int main(int argc, char *argv[]){
      hello();
      world();
    }
If we try to compile this program, we get an error.
    user@si485H-base:demo$ gcc -o hello hello.c
    /tmp/ccC4VYbK.o: In function `main':
    hello.c:(.text+0x20): undefined reference to `world'
    collect2: error: ld returned 1 exit status
Looking closely at the error, we see that it is actually not gcc
that is printing an error, but rather ld. That is because the
program actually compiled but did not link properly. ld, the
GNU linker, was not able to find the reference (or code) for world(),
so it failed to link the code into an executable and no output file
was produced.
You can see that this program does actually compile by using the
-c flag with gcc, which says to compile the source to an object file
and stop before linking:
user@si485H-base:demo$ gcc -c -o hello.o hello.c
This succeeds, and now we have an object file for hello. We need
to provide more compiled code to complete the linking
process. Specifically, we need to provide code that fills in the
world() function.
    #include <stdio.h>
    #include <stdlib.h>

    void world( ){
      printf("World!\n");
    }
Once we have that, we can compile world.c into world.o and
link the two .o files into a single executable.
    user@si485H-base:demo$ gcc -c -o world.o world.c
    user@si485H-base:demo$ gcc -o hello hello.o world.o
    user@si485H-base:demo$ ./hello
    Hello World!
However, as you will see many times in this class, there is still more
going on beneath the surface: more code is being used in the linking
process. We can use ld directly to do the final linking and expose
all those parts.
ld -o hello hello.o world.o --dynamic-linker /lib/ld-linux.so.2 /usr/lib/i386-linux-gnu/crt1.o /usr/lib/i386-linux-gnu/crti.o -lc /usr/lib/i386-linux-gnu/crtn.o
The link actually requires three other object files, crt1.o,
crti.o, and crtn.o, as well as the dynamic linker
ld-linux.so.2, to really build the executable. These object files
provide important startup and teardown code blocks and other functions
that will become relevant when we start to reverse engineer some
software.
2 Library Functions vs. System Calls
If you look more closely at the ld command line above, you will also
see the flag -lc, which says to link against libc. The C standard
library provides a lot of functionality for the programmer, but its
primary task is to provide an interface by which the programmer can
easily access the underlying operating system interface.
Recall that a system call is a mechanism for the programmer to gain access to an operating system feature. The operating system provides a few key features relevant to this class:
- Input/Output : reading and writing from devices, such as the terminal, network, and other peripherals
- Memory Management : maintaining the memory for programs and ensuring that programs do not access memory that is invalid; this includes maintaining the virtual memory layout for a program
- Program Execution: loading and unloading programs and executing them on the CPU
2.1 Tracing Function and System Calls
A library function, on the other hand, provides a more user-friendly interface to the system calls. To see this dynamic, we can trace a program's execution and see where library functions are called versus system calls. There are two tracers we will use heavily in this class:
- ltrace : traces library function calls
- strace : traces system calls
And we can look at the output of these traces to get a sense of how programs execute.
    user@si485H-base:demo$ ltrace ./hello > /dev/null
    __libc_start_main(0x8048304, 1, 0xbffa7744, 0x8048360 <unfinished ...>
    printf("Hello ") = 6
    puts("World!") = 7
    +++ exited (status 7) +++
In the ltrace above, we see that for the hello program from the
previous section, there are two library calls: printf() and
puts(). (The puts() call appears because gcc commonly replaces a
printf() of a constant string ending in a newline with the equivalent
puts().) If you look at the manual pages, both puts() and printf()
take a string and write it to stdout. However, we know
that the actual method for writing to stdout is the write()
system call, and we can see this using strace.
    user@si485H-base:demo$ strace ./hello > /dev/null
    execve("./hello", ["./hello"], [/* 20 vars */]) = 0
    brk(0) = 0x87b7000
    access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
    mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7760000
    access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
    open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
    fstat64(3, {st_mode=S_IFREG|0644, st_size=70286, ...}) = 0
    mmap2(NULL, 70286, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb774e000
    close(3) = 0
    access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
    open("/lib/i386-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
    read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\340\233\1\0004\0\0\0"..., 512) = 512
    fstat64(3, {st_mode=S_IFREG|0755, st_size=1754876, ...}) = 0
    mmap2(NULL, 1759868, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb75a0000
    mmap2(0xb7748000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1a8000) = 0xb7748000
    mmap2(0xb774b000, 10876, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb774b000
    close(3) = 0
    mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb759f000
    set_thread_area({entry_number:-1 -> 6, base_addr:0xb759f940, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
    mprotect(0xb7748000, 8192, PROT_READ) = 0
    mprotect(0xb7783000, 4096, PROT_READ) = 0
    munmap(0xb774e000, 70286) = 0
    fstat64(1, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
    ioctl(1, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0xbf8187d8) = -1 ENOTTY (Inappropriate ioctl for device)
    mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb775f000
    write(1, "Hello World!\n", 13) = 13
    exit_group(7) = ?
    +++ exited with 7 +++
In fact, there are a lot of system calls involved
here. Starting at the top, we have the execve() system call that
executes the program, but after that, there is a lot of loading and
reading libraries into memory. Finally, just before the program exits,
we see the write() system call to stdout (file descriptor 1).
2.2 Hello System Call
We can, of course, write a hello-world program without any stdio library functions by calling write() directly. We'll need a helper function, though: our own string length function.
    #include <unistd.h>

    int mystrlen(char * str){
      int i;
      for(i=0; str[i]; i++);
      return i;
    }

    int main(int argc, char *argv[]){
      char str[] = "Hello World!\n";
      write(1,str,mystrlen(str));
    }
Compiling and executing this program and analyzing the ltrace, we
still see a call to write() but no calls to puts() or printf().
    user@si485H-base:demo$ ltrace ./hellosystem > /dev/null
    __libc_start_main(0x8048494, 1, 0xbf8c5034, 0x8048510 <unfinished ...>
    write(1, "Hello World!\n", 13) = 13
    +++ exited (status 13) +++
The reason that write() still appears is that this write() is not
the real system call write(), but a library wrapper around it.
That is a story for another day. What's more interesting is
the strace, which, if you observe closely, is nearly the same
as for the other version of the program and ends with the same
write() call.
    user@si485H-base:demo$ strace ./hellosystem > /dev/null
    execve("./hellosystem", ["./hellosystem"], [/* 20 vars */]) = 0
    brk(0) = 0x8c9b000
    access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
    mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb77c3000
    access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
    open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
    fstat64(3, {st_mode=S_IFREG|0644, st_size=70286, ...}) = 0
    mmap2(NULL, 70286, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb77b1000
    close(3) = 0
    access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
    open("/lib/i386-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
    read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\340\233\1\0004\0\0\0"..., 512) = 512
    fstat64(3, {st_mode=S_IFREG|0755, st_size=1754876, ...}) = 0
    mmap2(NULL, 1759868, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7603000
    mmap2(0xb77ab000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1a8000) = 0xb77ab000
    mmap2(0xb77ae000, 10876, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb77ae000
    close(3) = 0
    mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7602000
    set_thread_area({entry_number:-1 -> 6, base_addr:0xb7602940, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
    mprotect(0xb77ab000, 8192, PROT_READ) = 0
    mprotect(0x8049000, 4096, PROT_READ) = 0
    mprotect(0xb77e6000, 4096, PROT_READ) = 0
    munmap(0xb77b1000, 70286) = 0
    write(1, "Hello World!\n", 13) = 13
    exit_group(13) = ?
    +++ exited with 13 +++