Basic Binary Auditing
Posted on Tue 01 July 2014 in Reverse Engineering
Before I go into some of the protections that are commonly in place, I thought it would be best to show how to detect these 2 basic vulnerabilities using reverse engineering (as opposed to randomly fuzzing inputs as we did in parts 1, 2 and 3).
Reverse engineering (reversing) is an extremely powerful tool in the hacker's arsenal, and when there is no source code for the application you are targeting, nothing beats it.
Assembly is the language of reversing and a debugger is the most important tool.
Assembly is essentially the language of the processor. The "machine code" that people think of the computer as dealing with (whether viewed as binary or hex) is just a different representation of assembly language, so this is the lowest-level programming language available to anyone outside of processor firmware development.
A debugger is an application that lets you view an application's virtual memory as the application itself sees it, and change values in memory or in CPU registers at run time.
Another important feature of a debugger is the ability to set breakpoints, which force the application to stop at a specific instruction so you can inspect values or step through the application one instruction at a time.
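Here is a small C sketch that makes the point about machine code concrete. It maps a few raw 32-bit x86 byte sequences back to their mnemonics, using four hand-picked encodings: push ebp is 0x55, mov ebp, esp is 0x89 0xe5, leave is 0xc9 and ret is 0xc3. The names `encoding`, `known` and `decode_one` are made up for illustration. A real disassembler does much more than exact byte matching.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One hand-written x86 (32-bit) instruction: its bytes and its mnemonic.
 * Both fields describe the same instruction in two representations. */
struct encoding {
    uint8_t bytes[2];     /* raw instruction bytes */
    size_t len;           /* how many of those bytes are used */
    const char *mnemonic; /* assembly text for the same instruction */
};

static const struct encoding known[] = {
    { {0x55},       1, "push ebp" },
    { {0x89, 0xe5}, 2, "mov ebp, esp" },
    { {0xc9},       1, "leave" },
    { {0xc3},       1, "ret" },
};

/* Return the mnemonic for the instruction at code[0..n), or NULL if it is
 * not one of the few listed above. *used receives the instruction length. */
const char *decode_one(const uint8_t *code, size_t n, size_t *used)
{
    assert(code != NULL && used != NULL);
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++) {
        const struct encoding *e = &known[i];
        if (e->len <= n && memcmp(code, e->bytes, e->len) == 0) {
            *used = e->len;
            return e->mnemonic;
        }
    }
    return NULL;
}
```

Feeding it the bytes 0x55 0x89 0xe5 decodes to push ebp followed by mov ebp, esp, which is the start of the function prologue discussed below.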
The App
We will use the same basic application we used in parts 1 and 2:
This time we will not exploit this application (we've done that already). Instead, we'll just use the debugger to figure out that these vulnerabilities exist.
Setting Up The Environment
This is the same as in parts 1 and 2, so please refer to the Setting Up The Environment section of either of those.
Looking For The Juicy Bits
First we'll test the application as usual:
Nothing unusual there, but we now know that the application takes 1 argument. If we open it in gdb we can take a closer look:
Here we can tell that the application was most likely written in C, because it uses __libc_start_main (lines 25 and 26), the C library routine that calls main. This means we have a main function, which is where the application's own code starts (shown on line 38).
There are a couple of other functions of interest here, but let's leave them for a bit and look at the main function:
The first 4 instructions (lines 3, 4, 5 and 6) are the function prologue, which sets up the stack frame.
The last 2 instructions (lines 39 and 40) are the function epilogue, where the leave instruction performs the inverse of what the prologue did.
Looking at the prologue and epilogue, we can see that the calling convention is probably cdecl.
I won't go into calling conventions much here because they aren't terribly relevant, although it's important to know what they are and how they differ. Basically, a calling convention defines how a function is called: how its arguments are passed, who cleans them up afterwards and where the return value ends up.
Back on topic: when first looking for a vulnerability, we should check for commonly used functions that are known to be easy to misuse. The main ones are the printf family of functions and the string copying/moving functions.
Looking back at our list of functions, a couple of interesting ones are used, namely printf and strncpy. Of those 2, only printf is called in the main function, so let's examine those calls a little closer.
The first call, on line 10, has its argument set up on line 9:
The first instruction moves the value 0x80487f0 (an address) into the memory that the ESP register points to. These 2 lines correspond to line 17 of our source code above.
ESP points to the top of the stack, and in the cdecl calling convention a function's arguments are placed on the stack in reverse order before the call itself, so the first argument ends up at the top. As this call has only 1 argument, only 1 value is put on the stack.
To be honest, this call doesn't look like it's going to be interesting, as the argument is a hard-coded address pointing into the text segment of memory, which isn't writable, but we can check the value there just to make sure:
So it looks to be part of an error message. The next call to printf looks more interesting, but first we need to understand how a stack frame is arranged in an application like this.
Stack Frames
Below is the top of an example stack frame which is getting ready for a function call:
Here we are unable to see the base pointer (EBP) but we can see the stack pointer (ESP) which always points to the top of the stack.
Putting arguments on the stack can be done in a number of ways. Firstly, it can be done using the push instruction as follows:
```
push eax
push 0x80487f0
push DWORD PTR [ebp+0xc]
```
Here the value in the EAX register is pushed onto the stack as the third argument (or "ARG 3" in our diagram), then the static value 0x80487f0 as the second argument, and finally the value at EBP+c (EBP+12, which is usually the second argument to the current function) as the first argument.
The push instruction automatically adjusts the value of ESP, but the same thing can also be done manually:
This set of instructions is functionally equivalent to the previous one. Either set is followed by a call instruction, after which our stack looks like this:
The call instruction automatically pushes the address of the next instruction onto the stack, so that when the called function returns, the application knows where to resume execution.
Inside the function we have just called, execution starts with that function's prologue. First there is a push ebp instruction, which does this to the stack:
After that it executes mov ebp, esp:
Lastly, any space needed for local variables is subtracted from ESP (sub esp, 0x8), so our stack ends up like this:
EBP always points to the base of the current function's stack frame and ESP to the top of the stack, so if we call another function from inside the current one, the same process happens again.
The function's epilogue does the opposite. In the application we are debugging it is just a leave instruction followed by ret. The leave instruction automates the cleanup of the stack frame.
In our example stack, the leave instruction would be equivalent to:

```
mov esp, ebp
pop ebp
```
This would bring our stack frame back to this:
Then the final ret instruction removes the RET ADDR from the stack, setting everything back to how it was just before the call instruction. Essentially, ret does pop eip.
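Here is a toy C model of a 32-bit stack that ties the walkthrough together. It is a simulation under simplifying assumptions: 4-byte slots, a fixed 256-byte stack region, and hypothetical names such as `sim_push` and `frame_word`. It is not how a real CPU or debugger works internally. It pushes two arguments in cdecl order, performs a call and the prologue, checks the EBP-relative layout described above (saved EBP at EBP, return address at EBP+4, first argument at EBP+8, second at EBP+0xc), then runs leave and ret.

```c
#include <assert.h>
#include <stdint.h>

/* Toy 32-bit stack: addresses grow downwards, ESP/EBP hold byte addresses. */
#define STACK_TOP  0x1000u
#define SLOTS      64u
#define STACK_BASE (STACK_TOP - 4u * SLOTS)

struct cpu {
    uint32_t mem[SLOTS];     /* mem[i] is the word at STACK_BASE + 4*i */
    uint32_t esp, ebp, eip;
};

static uint32_t *slot(struct cpu *c, uint32_t addr)
{
    assert(addr >= STACK_BASE && addr < STACK_TOP && addr % 4 == 0);
    return &c->mem[(addr - STACK_BASE) / 4];
}

/* push: move ESP down one word, then store the value there. */
void sim_push(struct cpu *c, uint32_t v) { c->esp -= 4; *slot(c, c->esp) = v; }

/* pop: read the word at ESP, then move ESP up one word. */
uint32_t sim_pop(struct cpu *c)
{
    uint32_t v = *slot(c, c->esp);
    c->esp += 4;
    return v;
}

/* call: push the address of the next instruction, then jump. */
void sim_call(struct cpu *c, uint32_t target, uint32_t next_insn)
{
    sim_push(c, next_insn);
    c->eip = target;
}

/* Prologue: push ebp; mov ebp, esp; sub esp, locals */
void sim_prologue(struct cpu *c, uint32_t locals)
{
    sim_push(c, c->ebp);
    c->ebp = c->esp;
    c->esp -= locals;
}

/* leave == mov esp, ebp; pop ebp */
void sim_leave(struct cpu *c) { c->esp = c->ebp; c->ebp = sim_pop(c); }

/* ret == pop eip */
void sim_ret(struct cpu *c) { c->eip = sim_pop(c); }

/* Read the word at EBP + off, e.g. off = 8 for the first argument. */
uint32_t frame_word(struct cpu *c, int32_t off)
{
    return *slot(c, c->ebp + (uint32_t)off);
}
```

Leave is modeled as `mov esp, ebp` followed by `pop ebp`, and ret as `pop eip`. As a result, ESP ends up exactly where it was just before the call, and the two arguments are still on the stack for the caller to remove.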
Juicy Bits Continued
Now that we understand how the stack works, we can have a look at that second call to printf. The first argument to printf is always the format string, so when looking for a format string vulnerability we are trying to figure out whether we can control the first argument.
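A small C sketch shows the difference between a controllable format string and a fixed one. The helper names `render_unsafe` and `render_safe` are hypothetical, and `snprintf` stands in for `printf` so that the output can be inspected. The sketch uses only the harmless `%%` conversion, which is enough to show that text passed as the format is interpreted rather than copied.

```c
#include <assert.h>
#include <stdio.h>

/* Vulnerable pattern, like printf(argv[0]): the user text IS the format,
 * so any conversion specifiers in it are interpreted. */
int render_unsafe(char *out, size_t n, const char *user)
{
    assert(out != NULL && user != NULL);
    return snprintf(out, n, user);
}

/* Fixed pattern: a constant format string; the user text is only data. */
int render_safe(char *out, size_t n, const char *user)
{
    assert(out != NULL && user != NULL);
    return snprintf(out, n, "%s", user);
}
```

Given the input `AB%%CD`, the unsafe version writes `AB%CD` because the `%%` was treated as a conversion, while the safe version copies the text unchanged. gcc can warn about the first pattern with `-Wformat-security`.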
The relevant lines that set up and call printf are:
These 4 lines of code are actually line 18 in the source of the application. Line 1 moves the second argument (at ebp+0xc) into EAX. With this kind of stack frame, the second argument is always at +0xc (12): EBP points to the old EBP, EBP+4 holds the return address and EBP+8 holds the first argument.
In C, the second argument to the main function (argv) is an array of pointers to the actual command-line arguments.
Because this argument is an array of pointers, line 2 follows it and loads the first pointer in the array into EAX (this normally points to the path of the application itself).
This pointer is moved to the address pointed to by ESP (the top of the stack), and finally printf is called. This shows that printf receives only 1 argument, the application path, which it uses as its format string.
We can check this using gdb, but first note the conditional statement that determines whether this code gets executed:
This is the if statement on line 16 of the source code.
Line 1 compares the first argument (at ebp+0x8) with 1, and line 2 jumps to 0x804864c if it is greater than 1. As you can see, the condition in the assembly is the opposite of the one in the source code. This is often the case, because the compiler emits a jump over the body of the if for when the condition is false.
In C, the first argument to the main function (argc) is the number of arguments given to the application on the command line. So to enter the section of code we want to analyse, we just need to run the application with no extra arguments, so that argc is 1 (the name of the application counts as the first argument, so there is always at least 1).
Integer Overflow
The jg instruction means that the numbers being compared are signed (it would be ja if they were unsigned), and because no bounds checking is done on the value at ebp+0x8, it is vulnerable to an integer overflow.
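The effect of choosing jg over ja can be reproduced in C. This sketch, which uses hypothetical function names, reinterprets the same 32-bit value as signed or unsigned before comparing it with 1. That is the distinction the two conditional jumps make after the `cmp`. A value with the top bit set, such as 0x80000000, counts as greater than 1 for ja but is a large negative number for jg.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the 32 bits as a signed value. int32_t is guaranteed to be
 * two's complement, so this matches what the CPU sees. */
static int32_t as_signed(uint32_t bits)
{
    int32_t s;
    memcpy(&s, &bits, sizeof s);
    return s;
}

/* Would "cmp value, 1; jg target" take the jump? (signed comparison) */
int taken_jg(uint32_t bits) { return as_signed(bits) > 1; }

/* Would "cmp value, 1; ja target" take the jump? (unsigned comparison) */
int taken_ja(uint32_t bits) { return bits > 1u; }
```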
I wanted to demonstrate this as soon as I realised, but because argc is a 32-bit int, I would need to pass at least 2147483647 arguments on top of the application name to push it past its maximum value. I couldn't do this on my test machine because there just isn't enough RAM.
So in the name of science, I rewrote the application so that the argc argument (the number of arguments passed to the main function) is a char instead. Here is my new application:
Here is the quick demonstration:
What is happening here is that the argument argc is being interpreted as a signed char, and the maximum value for this type of variable is 127:
As the application name is the first argument, we can pass another 126 arguments before the variable reaches its maximum. One more and it overflows to -128, which is obviously smaller than 2.
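The wrap-around in the modified program can be modeled with plain integer arithmetic. In this sketch, `as_signed_char` (a hypothetical helper) computes the two's-complement value that a count would have when stored in a signed char. It does this without relying on C's implementation-defined conversion, although gcc on x86 produces the same values. `reaches_usage_branch` mirrors the `argc > 1` test from the disassembly: the code we want to reach runs when that test fails.

```c
#include <assert.h>
#include <limits.h>

/* Value a count (0..255) takes when stored in an 8-bit signed char,
 * computed explicitly as two's complement wrap-around. */
int as_signed_char(int count)
{
    assert(count >= 0 && count <= UCHAR_MAX);
    return count > SCHAR_MAX ? count - (UCHAR_MAX + 1) : count;
}

/* The jg in main skips our target code when argc > 1, so the target
 * code runs whenever the stored value is NOT greater than 1. */
int reaches_usage_branch(int stored_argc) { return !(stored_argc > 1); }
```

With the application name plus 126 arguments (a count of 127), the check still skips the code. With 127 arguments (a count of 128), the stored value is -128 and the code runs.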
Back To The Juicy Bits
So now we know how to get to the code we want to analyse, which is:
Let's set a breakpoint on line 1 here (or 0x08048627) and run the application without any arguments.
This shows that our assumptions were correct and that there is likely a format string vulnerability here, which we can exploit by changing the name of the application (or by creating a symlink, as in part 2).
We also have a very similar set of printf calls towards the end of the application:
We are interested in the second printf here, but to figure out how to reach it we need to look at the memory at 0x8048804, which is printed just before it.
So we reach this section of code when we give a wrong password. The call to printf in question is the same as the previous one, except that 4 is added to EAX before the pointer is followed. This suggests that the second element of argv (the argument given on the command line) is being printed, and the preceding printf supports this theory, but let's check.
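The "add 4, then follow the pointer" step is simply argv[1] written out by hand. This small C sketch performs the same byte-offset arithmetic (the function name `second_arg` is hypothetical). It uses `sizeof(char *)`, which is 4 on the 32-bit target being debugged and 8 on a typical 64-bit build.

```c
#include <assert.h>
#include <stddef.h>

/* Equivalent of: EAX = argv; EAX += sizeof(pointer); EAX = *EAX.
 * The result is the same pointer as argv[1]. */
char *second_arg(char **argv)
{
    assert(argv != NULL);
    char *base = (char *)argv;                        /* address of argv[0] */
    char **slot = (char **)(base + sizeof(char *));  /* step one pointer */
    return *slot;                                     /* follow the pointer */
}
```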
Let's set a breakpoint and examine the memory again:
This is the second format string vulnerability.
Buffer Overflow
So far we have found an integer overflow and 2 format string vulnerabilities.
Next we should look at the checkpass function, which is called on line 23 of the disassembly above. Here are the instructions related to the call to checkpass:
We've already seen exactly the same set of instructions for the second call to printf, so this function takes 1 argument: the second element of argv, i.e. the argument given to the application.
Here is the disassembly of checkpass:
In the prologue, 0x228 bytes (or 552 bytes) are reserved for local variables and function call arguments.
The interesting call here is the one to strncpy, but we need to examine the call to strlen first, because its output appears to be used as the third argument to strncpy.
The call to strlen:
It's clear that the first argument to checkpass is being passed to strlen. Return values are normally passed back in the EAX register.
Here is the call to strncpy:
You can see that 1 is added to the return value of strlen, and the result is put on the stack as the third argument to strncpy.
The pointer to the function argument is then put on the stack as the second argument (on lines 3 and 4).
Lastly, the address of the local variable is put on the stack as the first argument (on lines 5 and 6).
Here we can see that the local variable starts 0x20c bytes (524 bytes) below EBP. Adding the 4 bytes of the old EBP saved during the prologue, an overflow here needs to write 528 bytes before it reaches the saved return address. The next 4 bytes overwrite that address, which is the value ret pops into EIP.
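The offset arithmetic can be written down as two tiny C helpers. The names are hypothetical, and the helpers assume the 32-bit frame layout from earlier, where the saved EBP sits at EBP and the return address at EBP+4.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes of input needed to reach the saved return address when a local
 * buffer starts buf_off bytes below EBP: the gap up to EBP plus the
 * 4-byte saved EBP (32-bit x86, "push ebp; mov ebp, esp" frame). */
size_t bytes_to_saved_ret(size_t buf_off)
{
    assert(buf_off > 0);
    return buf_off + 4;
}

/* Bytes needed to overwrite the whole 4-byte saved return address. */
size_t bytes_to_cover_ret(size_t buf_off)
{
    return bytes_to_saved_ret(buf_off) + 4;
}
```

For checkpass, `bytes_to_saved_ret(0x20c)` gives 528 and `bytes_to_cover_ret(0x20c)` gives 532.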
Looking at the prototype for strncpy (using man strncpy), we can see that the first argument is the destination, the second the source and the third the maximum number of characters to copy:
```
char *strncpy(char *dest, const char *src, size_t n);
```
Knowing all of this, it's easy to see that there is in fact a buffer overflow here: the developer has bounded the copy by the length of the input string rather than by the size of the destination buffer. We can even see how many bytes we have until we overwrite EIP.
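For contrast, here is a sketch of a bounded copy in C. The buggy call in checkpass is shown only as a comment. `local_buf` and `input` are placeholder names, because the original variable names are not visible in the disassembly. `copy_bounded` is a hypothetical helper that limits the copy by the destination size and always NUL-terminates, instead of trusting the length of the input.

```c
#include <assert.h>
#include <string.h>

/* Pattern reconstructed from the disassembly (placeholder names):
 *     strncpy(local_buf, input, strlen(input) + 1);
 * The bound comes from the source string, so a long input writes past
 * the end of local_buf.
 *
 * Bounded alternative: copy at most dst_size - 1 bytes and terminate.
 * Returns 1 if the input had to be truncated, 0 otherwise. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    size_t len = strlen(src);
    size_t n = len < dst_size - 1 ? len : dst_size - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return len > n;
}
```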
Conclusion
While it's technically possible to just fuzz all of the application's inputs, the more complex the application gets, the less feasible this becomes.
This is also true of reverse engineering every section of an application, so it's important that you know how to focus on the important parts.
Ultimately, reverse engineering is much more powerful than fuzzing, but both should be used in combination to increase efficiency.
Happy Hacking :-)