Basic Binary Auditing

Posted on Tue 01 July 2014 in Reverse Engineering

Before I go into some of the protections that are commonly in place, I thought it would be best to show how to detect these 2 basic vulnerabilities using reverse engineering (as opposed to randomly fuzzing inputs as we did in parts 1, 2 and 3).

Reverse engineering (reversing) is an extremely powerful tool in the hacker's arsenal, and when you have no source code for the application you are targeting, nothing beats it.

Assembly is the language of reversing and a debugger is the most important tool.

Assembly is essentially the language of the processor. The "machine code" that people think of the computer executing (whether viewed as binary or hex) is just a different representation of assembly language, so assembly is the lowest-level programming language available to anyone outside of processor firmware development.

A debugger is an application that lets you view another application's virtual memory as that application sees it, and change values in memory or in CPU registers at run time.

Another important feature of a debugger is the ability to set breakpoints. A breakpoint forces the application to stop at a specific point, so you can inspect values or step through the application instruction by instruction.

The App

We will use the same basic application we used in parts 1 and 2:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define PASS "topsecretpassword"

#define SFILE "secret.txt"

int checkpass(char *p);
void printfile();

int main(int argc, char **argv)
{
    int r;

    if (argc < 2) {
        printf("Usage: ");
        printf(argv[0]);
        printf(" <password>\n");
        exit(1);
    }
    r = checkpass(argv[1]);
    if (r != 0) {
        printf("Wrong password: ");
        printf(argv[1]);
        printf("\n");
        exit(1);
    }
    printfile();
}

int checkpass(char *a)
{
    char p[512];
    int r;
    strncpy(p, a, strlen(a)+1);
    r = strcmp(p, PASS);
    return r;
}

void printfile()
{
    FILE *f;
    int c;
    f = fopen(SFILE, "r");
    if (f) {
        while ((c = getc(f)) != EOF)
            putchar(c);
        fclose(f);
    } else {
        printf("Error opening file: " SFILE "\n");
        exit(1);
    }
}

This time we will not exploit this application (we've done that already). Instead, we'll just use the debugger to figure out that these vulnerabilities exist.

Setting Up The Environment

This is the same as in parts 1 and 2, so please refer to the Setting Up The Environment section of either of those.

Looking For The Juicy Bits

First we'll test the application as usual:

testuser@dev:~$ ./app
Usage: ./app <password>
testuser@dev:~$ ./app test
Wrong password: test
testuser@dev:~$ echo $?
1

Nothing unusual there but we now know that the application takes 1 argument. If we open this using gdb we can have a closer look at it:

testuser@dev:~$ gdb -q ./app
Reading symbols from /home/testuser/app...(no debugging symbols found)...done.
(gdb) set disassembly-flavor intel
(gdb) info functions
All defined functions:

Non-debugging symbols:
0x0804842e  _init
0x08048460  strcmp
0x08048460  strcmp@plt
0x08048470  printf
0x08048470  printf@plt
0x08048480  fclose
0x08048480  fclose@plt
0x08048490  _IO_getc
0x08048490  _IO_getc@plt
0x080484a0  puts
0x080484a0  puts@plt
0x080484b0  __gmon_start__
0x080484b0  __gmon_start__@plt
0x080484c0  exit
0x080484c0  exit@plt
0x080484d0  strlen
0x080484d0  strlen@plt
0x080484e0  __libc_start_main
0x080484e0  __libc_start_main@plt
0x080484f0  fopen
0x080484f0  fopen@plt
0x08048500  putchar
0x08048500  putchar@plt
0x08048510  strncpy
0x08048510  strncpy@plt
0x08048520  _start
0x08048550  deregister_tm_clones
0x08048580  register_tm_clones
0x080485c0  __do_global_dtors_aux
0x080485e0  frame_dummy
0x0804860c  main
0x080486a2  checkpass
0x080486f0  printfile
0x08048760  __libc_csu_fini
0x08048770  __libc_csu_init
0x080487ca  __i686.get_pc_thunk.bx
0x080487d0  _fini

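Here we can tell that the application was most likely written in C because it imports __libc_start_main (the __libc_start_main and __libc_start_main@plt entries). This is the glibc routine that sets up the process and then calls main. So we have a main function, listed at 0x0804860c, which is where the application's own code starts.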

There are a couple of other functions of interest here but let's leave them for a bit and look at the main function:

(gdb) disassemble main
Dump of assembler code for function main:
   0x0804860c <+0>: push   ebp
   0x0804860d <+1>: mov    ebp,esp
   0x0804860f <+3>: and    esp,0xfffffff0
   0x08048612 <+6>: sub    esp,0x20
   0x08048615 <+9>: cmp    DWORD PTR [ebp+0x8],0x1
   0x08048619 <+13>:    jg     0x804864c <main+64>
   0x0804861b <+15>:    mov    DWORD PTR [esp],0x80487f0
   0x08048622 <+22>:    call   0x8048470 <printf@plt>
   0x08048627 <+27>:    mov    eax,DWORD PTR [ebp+0xc]
   0x0804862a <+30>:    mov    eax,DWORD PTR [eax]
   0x0804862c <+32>:    mov    DWORD PTR [esp],eax
   0x0804862f <+35>:    call   0x8048470 <printf@plt>
   0x08048634 <+40>:    mov    DWORD PTR [esp],0x80487f8
   0x0804863b <+47>:    call   0x80484a0 <puts@plt>
   0x08048640 <+52>:    mov    DWORD PTR [esp],0x1
   0x08048647 <+59>:    call   0x80484c0 <exit@plt>
   0x0804864c <+64>:    mov    eax,DWORD PTR [ebp+0xc]
   0x0804864f <+67>:    add    eax,0x4
   0x08048652 <+70>:    mov    eax,DWORD PTR [eax]
   0x08048654 <+72>:    mov    DWORD PTR [esp],eax
   0x08048657 <+75>:    call   0x80486a2 <checkpass>
   0x0804865c <+80>:    mov    DWORD PTR [esp+0x1c],eax
   0x08048660 <+84>:    cmp    DWORD PTR [esp+0x1c],0x0
   0x08048665 <+89>:    je     0x804869b <main+143>
   0x08048667 <+91>:    mov    DWORD PTR [esp],0x8048804
   0x0804866e <+98>:    call   0x8048470 <printf@plt>
   0x08048673 <+103>:   mov    eax,DWORD PTR [ebp+0xc]
   0x08048676 <+106>:   add    eax,0x4
   0x08048679 <+109>:   mov    eax,DWORD PTR [eax]
   0x0804867b <+111>:   mov    DWORD PTR [esp],eax
   0x0804867e <+114>:   call   0x8048470 <printf@plt>
   0x08048683 <+119>:   mov    DWORD PTR [esp],0xa
   0x0804868a <+126>:   call   0x8048500 <putchar@plt>
   0x0804868f <+131>:   mov    DWORD PTR [esp],0x1
   0x08048696 <+138>:   call   0x80484c0 <exit@plt>
   0x0804869b <+143>:   call   0x80486f0 <printfile>
   0x080486a0 <+148>:   leave  
   0x080486a1 <+149>:   ret    
End of assembler dump.

The first 4 instructions (push ebp, mov ebp,esp, and esp,0xfffffff0 and sub esp,0x20) are the function prologue, where the stack frame is set up. The and instruction aligns ESP to a 16-byte boundary.

The last 2 instructions (leave and ret) are the function epilogue. The leave instruction performs the inverse of what the prologue did, and ret returns to the caller.

Looking at the prologue and epilogue we can see that the calling convention is probably cdecl.

I will not go into calling conventions much here because they aren't terribly relevant, although it's important to know what they are and how they differ. In short, a calling convention defines how a function is called: how arguments are passed, where the return value goes and who cleans up the stack afterwards.

Back on topic: when looking for a vulnerability, we should first check for known vulnerable functions that developers commonly use. The main ones are the printf family of functions and the string copying/moving functions.

Looking back at our list of functions, a couple of interesting ones are being used, mainly printf and strncpy. Of those two, only printf is called from main, so let's examine those calls a little closer.

The first call to printf (at main+22) is given its argument by the instruction just before it (at main+15):

   0x0804861b <+15>:    mov    DWORD PTR [esp],0x80487f0
   0x08048622 <+22>:    call   0x8048470 <printf@plt>

The first instruction stores the address 0x80487f0 at the memory location pointed to by the ESP register. These 2 instructions correspond to printf("Usage: ") in the source code above.

The ESP register points to the top of the stack. In the cdecl calling convention, a function's arguments are placed on the stack in reverse order before the call. As this call has only 1 argument, only 1 value is put on the stack.

To be honest, this call doesn't look like it's going to be of interest. The argument is a static address that points into the text segment of memory, which isn't writable. Still, we can check its value just to make sure:

(gdb) x/s 0x80487f0
0x80487f0:   "Usage: "

So it looks to be part of an error message. The next call to printf looks more interesting but first we need to understand how a stack frame is arranged in an application like this.

Stack Frames

Consider the top of an example stack frame that is getting ready for a function call. From here we can't see the base pointer (EBP), but we can see the stack pointer (ESP), which always points to the top of the stack. On x86 the stack grows downwards, so the top of the stack is the lowest address in use.

Putting arguments on the stack can be done in a number of ways. Firstly it can be done using the push instruction as follows:

push eax
push 0x80487f0
push DWORD PTR [ebp+0xc]

Here the value in the EAX register is pushed onto the stack as the third argument (ARG 3). Then the static value 0x80487f0 is pushed as the second argument (ARG 2). Finally, the value at [ebp+0xc] (EBP+12, which is usually the second argument to the current function) is pushed as the first argument (ARG 1).

The push instruction adjusts ESP automatically, but the same thing can also be done manually:

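sub esp, 0xc
mov DWORD PTR [esp+8], eax
mov DWORD PTR [esp+4], 0x80487f0
mov edx, DWORD PTR [ebp+0xc]
mov DWORD PTR [esp], edx

This set of instructions has the same effect on the stack as the push version. x86 has no memory-to-memory mov, so the last argument goes through the scratch register EDX. These instructions are followed by a call instruction. After the call, the return address (RET ADDR) is on top of the stack, directly on top of ARG 1.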

The call instruction automatically pushes the memory address of the next instruction onto the stack. This is done so that when the function returns, the application knows where to continue executing instructions.

Inside the function that we have just called, we start executing that function's prologue. First there is a push ebp instruction, which saves the caller's base pointer (OLD EBP) on top of the return address.

After that it executes mov ebp, esp. EBP now points at the saved OLD EBP, with the return address at [ebp+4] and ARG 1 at [ebp+8].

Lastly, any space needed for local variables is subtracted from ESP (sub esp, 0x8 in this example). This leaves 8 bytes of local storage between ESP and EBP.

EBP always points to the base of the current function's stack frame and ESP to the top of the stack. If we call another function from inside the current one, the same process happens again.

The function's epilogue does the opposite. In the application we are debugging, it is just the leave instruction, which automates the cleanup of the stack frame.

In general, leave is equivalent to mov esp, ebp followed by pop ebp. In our example stack, where 0x8 bytes were reserved for local variables, that is the same as:

add esp, 0x8
pop ebp

This leaves the return address on top of the stack again, with EBP restored to the caller's value.

Then the final ret instruction pops the RET ADDR off the stack into EIP (ret essentially does pop eip). The stack is now back the way it was just before the call instruction. Under cdecl, the caller is then responsible for removing the arguments it put on the stack.
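The walk-through above can be checked with a toy model. The sketch below is a simplified simulation, not how a real CPU works. It treats the stack as an array of 4-byte slots that grows towards lower indices and doesn't model the jump performed by call. The helper names (sim_push, sim_call, sim_prologue, sim_leave, sim_ret) are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a 32-bit x86 stack: 64 four-byte slots. As on x86, the
 * stack grows towards lower addresses (lower indices here). */
#define STACK_WORDS 64

struct cpu {
    uint32_t stack[STACK_WORDS];
    int esp; /* index of the slot at the top of the stack */
    int ebp; /* index of the current frame's saved-EBP slot */
};

/* push: move ESP down one slot, then store the value there. */
static void sim_push(struct cpu *c, uint32_t value)
{
    assert(c->esp > 0);
    c->stack[--c->esp] = value;
}

/* pop: read the value at ESP, then move ESP up one slot. */
static uint32_t sim_pop(struct cpu *c)
{
    assert(c->esp < STACK_WORDS);
    return c->stack[c->esp++];
}

/* call: push the return address (the jump itself is not modelled). */
static void sim_call(struct cpu *c, uint32_t ret_addr)
{
    sim_push(c, ret_addr);
}

/* push ebp; mov ebp, esp; sub esp, 4 * local_words */
static void sim_prologue(struct cpu *c, int local_words)
{
    sim_push(c, (uint32_t)c->ebp);
    c->ebp = c->esp;
    assert(local_words >= 0 && c->esp >= local_words);
    c->esp -= local_words;
}

/* leave is equivalent to: mov esp, ebp; pop ebp */
static void sim_leave(struct cpu *c)
{
    c->esp = c->ebp;
    c->ebp = (int)sim_pop(c);
}

/* ret is essentially pop eip: returns where execution would continue. */
static uint32_t sim_ret(struct cpu *c)
{
    return sim_pop(c);
}
```

For example, push three arguments, call, run the prologue with two local slots (sub esp, 0x8), then leave and ret. ESP ends up exactly where it was just before the call. While inside the frame, slot ebp+1 plays the role of [ebp+4] (the return address) and slot ebp+2 the role of [ebp+8] (ARG 1).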

Juicy Bits Continued

Now that we understand how the stack works we can have a look at that second call to printf. The first argument to printf is always the format string so when looking for a format string vulnerability we are trying to figure out if we can control the first argument.
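To make the distinction concrete, here is a minimal sketch of the safe pattern, assuming we want the same "Wrong password: " message. It uses snprintf so the result can be inspected, and the helper name format_wrong_password is hypothetical. The format string is a constant, and the user-controlled text is passed as an ordinary argument. As a result, any %x or %n in the input is copied literally instead of being interpreted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: the format string is a constant literal and the
 * user-controlled text is only ever consumed by the %s conversion, so
 * conversion specifiers inside `user` are never parsed. */
static int format_wrong_password(char *out, size_t size, const char *user)
{
    assert(out != NULL && user != NULL);
    return snprintf(out, size, "Wrong password: %s\n", user);
}
```

By contrast, printf(argv[1]) in the original application hands the user's text to printf as the format string itself.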

The relevant lines that setup and call printf are:

   0x08048627 <+27>:    mov    eax,DWORD PTR [ebp+0xc]
   0x0804862a <+30>:    mov    eax,DWORD PTR [eax]
   0x0804862c <+32>:    mov    DWORD PTR [esp],eax
   0x0804862f <+35>:    call   0x8048470 <printf@plt>

These 4 instructions correspond to printf(argv[0]) in the source of the application. The first one moves the second argument of main, at [ebp+0xc], into EAX. The second argument is at +0xc (+12) because EBP points to the old EBP, +4 points to the return address and +8 points to the first argument.
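These offsets follow a simple pattern once the prologue has run. Here is a small sketch, assuming 32-bit cdecl with every argument occupying one 4-byte stack slot, which holds for the int and pointer arguments in this application. cdecl_arg_offset is a hypothetical name.

```c
#include <assert.h>

/* Frame layout after push ebp; mov ebp, esp on 32-bit x86:
 *   [ebp+0x0]  saved (old) EBP
 *   [ebp+0x4]  return address pushed by call
 *   [ebp+0x8]  first argument, [ebp+0xc] second argument, ...
 * index is 0-based: 0 is the first argument.
 * Assumes each argument takes exactly one 4-byte slot. */
static int cdecl_arg_offset(int index)
{
    assert(index >= 0);
    return 0x8 + 4 * index;
}
```

So argc is found at [ebp+0x8] and argv at [ebp+0xc], exactly as in the disassembly of main.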

In C, the second argument to the main function (argv) is an array of pointers to the actual application arguments.

Because argv is an array of pointers, the second instruction loads the first pointer in this array, argv[0], into EAX (this normally points to the path of the application itself).

This pointer is then stored at the address pointed to by ESP (the top of the stack), and printf is called. This shows that printf is given only 1 argument, its format string, and that argument is the application path.

We can check this using gdb but first there was a conditional statement which determined if this code got executed:

   0x08048615 <+9>: cmp    DWORD PTR [ebp+0x8],0x1
   0x08048619 <+13>:    jg     0x804864c <main+64>

This is the if (argc < 2) statement in the source code.

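The first instruction compares the first argument, [ebp+0x8], with 1 and jumps to 0x804864c if it is greater than 1. As you can see, the assembly condition is the opposite of the one in the source code. This is often the case: the compiler emits a jump that skips over the body of the if statement when the condition is false, so it tests the negated condition (argc > 1).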

In C, the first argument to the main function (argc) is the number of arguments given to the application on the command line. To enter the section of code we want to analyse, argc needs to be 1. That means running the application without any arguments of our own: the name of the application counts as the first argument, so there is always at least 1.
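The inverted branch can be written back out in C. The following is a hypothetical rendering of the compiled control flow, not code from the original application, and reaches_usage_block is a made-up name.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical C rendering of the compiled branch at the start of main:
 *   cmp DWORD PTR [ebp+0x8],0x1
 *   jg  0x804864c <main+64>      ; skip the usage block
 * The source says "if (argc < 2) { usage }", but the compiled code says
 * "if (argc > 1) jump past the usage block", the negated condition. */
static bool reaches_usage_block(int argc)
{
    if (argc > 1)
        goto after_usage;  /* jg main+64 */
    return true;           /* falls through into printf("Usage: ") ... */
after_usage:
    return false;          /* main+64: continues to checkpass(argv[1]) */
}
```

Running the application with no arguments of our own (argc of 1) lands in the usage block. Any real argument (argc of 2 or more) jumps past it.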

Integer Overflow

The jg instruction means that the numbers being compared are signed (it would be ja if they were unsigned). Because there is no bounds checking on [ebp+0x8], the comparison is vulnerable to an integer overflow.
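The only difference between jg and ja is how the same 32 bits are interpreted. Here is a minimal sketch, with hypothetical helper names, that compares one bit pattern both ways.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* jg ("jump if greater"): signed comparison of the two 32-bit values. */
static bool jg_taken(uint32_t a_bits, uint32_t b_bits)
{
    int32_t a, b;
    /* int32_t is guaranteed to be two's complement, so copying the bits
     * gives exactly the value the CPU's signed comparison sees. */
    memcpy(&a, &a_bits, sizeof a);
    memcpy(&b, &b_bits, sizeof b);
    return a > b;
}

/* ja ("jump if above"): unsigned comparison of the same 32 bits. */
static bool ja_taken(uint32_t a_bits, uint32_t b_bits)
{
    return a_bits > b_bits;
}
```

An argc of 0x80000000 (2147483648) is a huge positive number to ja, but to jg it is the most negative 32-bit value. So the jg past the usage block would not be taken.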

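I wanted to demonstrate this as soon as I realised, but because argc is a 32-bit int I would need to pass at least 2147483647 arguments. I couldn't do this on my test machine because there just isn't enough RAM. The argument pointers alone would take about 8 GB, which is more than a 32-bit process can even address.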

So in the name of science, I rewrote the application so that argc (the number of arguments passed to the main function) is a char instead. Here is my new application:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define PASS "topsecretpassword"

#define SFILE "secret.txt"

int checkpass(char *p);
void printfile();

int main(char argc, char **argv)
{
    int r;

    if (argc < 2) {
        printf("Usage: ");
        printf(argv[0]);
        printf(" <password>\n");
        exit(1);
    }
    r = checkpass(argv[1]);
    if (r != 0) {
        printf("Wrong password: ");
        printf(argv[1]);
        printf("\n");
        exit(1);
    }
    printfile();
}

int checkpass(char *a)
{
    char p[512];
    int r;
    strncpy(p, a, strlen(a)+1);
    r = strcmp(p, PASS);
    return r;
}

void printfile()
{
    FILE *f;
    int c;
    f = fopen(SFILE, "r");
    if (f) {
        while ((c = getc(f)) != EOF)
            putchar(c);
        fclose(f);
    } else {
        printf("Error opening file: " SFILE "\n");
        exit(1);
    }
}

Here is the quick demonstration:

root@dev:/home/testuser# gcc -z execstack -fno-stack-protector -o app-intof app-intof.c 
root@dev:/home/testuser# ./app-intof $(python -c 'print "A "*126')
Wrong password: A
root@dev:/home/testuser# ./app-intof $(python -c 'print "A "*127')
Usage: ./app-intof <password>

What is happening here is that argc is being interpreted as a signed char (char is signed by default with gcc on x86), and the maximum value for this type is 127:

root@dev:/home/testuser# grep SCHAR_MAX /usr/include/limits.h 
#  define SCHAR_MAX 127
#   define CHAR_MAX SCHAR_MAX

As the application name is the first argument, we can pass another 126 arguments (argc of 127). With 127 extra arguments argc would be 128, which doesn't fit in a signed char. It overflows and becomes -128, which is obviously smaller than 2.
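Here is a small model of what the modified application sees. It assumes little-endian x86, where the caller still passes argc as a 32-bit int and the char parameter only uses its lowest byte; exactly how gcc loads that byte is not shown here. argc_as_signed_char is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: with `int main(char argc, ...)` on x86, only the low byte
 * of the 32-bit argc stored by the caller ends up in the char parameter,
 * and that char is signed. */
static int8_t argc_as_signed_char(uint32_t real_argc)
{
    uint8_t low = (uint8_t)(real_argc & 0xffu);  /* keep the lowest byte */
    int8_t c;
    memcpy(&c, &low, sizeof c);  /* reinterpret it as two's complement */
    return c;
}
```

This matches the demonstration. With 126 extra arguments, argc is 127 and reaches the password check. With 127 extra arguments, argc wraps to -128 and falls into the usage branch.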

Back To The Juicy Bits

So now we know how to get to the code we want to analyse, which is:

   0x08048627 <+27>:    mov    eax,DWORD PTR [ebp+0xc]
   0x0804862a <+30>:    mov    eax,DWORD PTR [eax]
   0x0804862c <+32>:    mov    DWORD PTR [esp],eax
   0x0804862f <+35>:    call   0x8048470 <printf@plt>

Let's set a breakpoint on the first instruction here (0x08048627) and run the application without any arguments.

(gdb) break *0x08048627
Breakpoint 1 at 0x8048627
(gdb) r
Starting program: /home/testuser/app 

Breakpoint 1, 0x08048627 in main ()
(gdb) disassemble $eip,+10
Dump of assembler code from 0x8048627 to 0x8048631:
=> 0x08048627 <main+27>:    mov    eax,DWORD PTR [ebp+0xc]
   0x0804862a <main+30>:    mov    eax,DWORD PTR [eax]
   0x0804862c <main+32>:    mov    DWORD PTR [esp],eax
   0x0804862f <main+35>:    call   0x8048470 <printf@plt>
End of assembler dump.
(gdb) x/xw $ebp+0xc
0xbfc674f4: 0xbfc67594
(gdb) x/xw 0xbfc67594
0xbfc67594: 0xbfc6795f
(gdb) x/s 0xbfc6795f
0xbfc6795f:  "/home/testuser/app"

This shows that our assumptions were correct and that there is likely a format string vulnerability here. We can exploit it by changing the name of the application (or by creating a symlink, as in part 2).

We also have a very similar set of printf calls towards the end of the application:

   0x08048667 <+91>:    mov    DWORD PTR [esp],0x8048804
   0x0804866e <+98>:    call   0x8048470 <printf@plt>
   0x08048673 <+103>:   mov    eax,DWORD PTR [ebp+0xc]
   0x08048676 <+106>:   add    eax,0x4
   0x08048679 <+109>:   mov    eax,DWORD PTR [eax]
   0x0804867b <+111>:   mov    DWORD PTR [esp],eax
   0x0804867e <+114>:   call   0x8048470 <printf@plt>

We are interested in the second printf here, but to figure out how to reach it we need to look at the string at 0x8048804, which is printed just before.

(gdb) x/s 0x8048804
0x8048804:   "Wrong password: "

So we reach this section of code when we give a wrong password. The call to printf in question is set up the same way as the previous one, except that 4 is added to EAX before the pointer is followed. Each pointer in argv is 4 bytes on 32-bit x86, so this selects argv[1], the second argument. The preceding "Wrong password: " message also supports this theory, but let's check.
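The add eax,0x4 step is just pointer arithmetic done in bytes. Here is a small sketch that mirrors the three instructions. The name arg_via_byte_offset is hypothetical, and using sizeof(char *) instead of a hard-coded 4 is a choice so that the sketch also works on 64-bit hosts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mirrors: mov eax,[ebp+0xc] ; add eax,0x4 ; mov eax,[eax]
 * Start at argv, step forward index * sizeof(char *) BYTES, then read
 * the pointer stored there. On 32-bit x86 sizeof(char *) is 4, which is
 * where the 0x4 in the disassembly comes from. */
static const char *arg_via_byte_offset(char **argv, size_t index)
{
    const unsigned char *base = (const unsigned char *)argv;
    const char *result;
    assert(argv != NULL);
    /* copy the pointer out of memory, as mov eax,[eax] does */
    memcpy(&result, base + index * sizeof(char *), sizeof result);
    return result;
}
```

With index 1 this returns the same pointer as argv[1], which is what the breakpoint below confirms in the real binary.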

Let's set a breakpoint and examine the memory again:

(gdb) info breakpoints
Num     Type           Disp Enb Address    What
1       breakpoint     keep y   0x08048627 <main+27>
(gdb) delete 1
(gdb) break *0x0804867b
Breakpoint 2 at 0x804867b
(gdb) r ABC
Starting program: /home/testuser/app ABC

Breakpoint 2, 0x0804867b in main ()
(gdb) x/s $eax
0xbffff96d:  "ABC"

This is the second format string vulnerability.

Buffer Overflow

So far we have found an integer overflow and 2 format string vulnerabilities.

Next we should look over the checkpass function, which is called at main+75 in the disassembly above. Here are the instructions related to the call to checkpass:

   0x0804864c <+64>:    mov    eax,DWORD PTR [ebp+0xc]
   0x0804864f <+67>:    add    eax,0x4
   0x08048652 <+70>:    mov    eax,DWORD PTR [eax]
   0x08048654 <+72>:    mov    DWORD PTR [esp],eax
   0x08048657 <+75>:    call   0x80486a2 <checkpass>

These are exactly the same instructions that set up printf(argv[1]) in the wrong password branch. So this function takes 1 argument: the second argument to the application (argv[1]).

Here is the disassembly of checkpass:

(gdb) disassemble checkpass
Dump of assembler code for function checkpass:
   0x080486a2 <+0>: push   ebp
   0x080486a3 <+1>: mov    ebp,esp
   0x080486a5 <+3>: sub    esp,0x228
   0x080486ab <+9>: mov    eax,DWORD PTR [ebp+0x8]
   0x080486ae <+12>:    mov    DWORD PTR [esp],eax
   0x080486b1 <+15>:    call   0x80484d0 <strlen@plt>
   0x080486b6 <+20>:    add    eax,0x1
   0x080486b9 <+23>:    mov    DWORD PTR [esp+0x8],eax
   0x080486bd <+27>:    mov    eax,DWORD PTR [ebp+0x8]
   0x080486c0 <+30>:    mov    DWORD PTR [esp+0x4],eax
   0x080486c4 <+34>:    lea    eax,[ebp-0x20c]
   0x080486ca <+40>:    mov    DWORD PTR [esp],eax
   0x080486cd <+43>:    call   0x8048510 <strncpy@plt>
   0x080486d2 <+48>:    mov    DWORD PTR [esp+0x4],0x8048815
   0x080486da <+56>:    lea    eax,[ebp-0x20c]
   0x080486e0 <+62>:    mov    DWORD PTR [esp],eax
   0x080486e3 <+65>:    call   0x8048460 <strcmp@plt>
   0x080486e8 <+70>:    mov    DWORD PTR [ebp-0xc],eax
   0x080486eb <+73>:    mov    eax,DWORD PTR [ebp-0xc]
   0x080486ee <+76>:    leave  
   0x080486ef <+77>:    ret    
End of assembler dump.

In the prologue, 0x228 bytes (or 552 bytes) are reserved for local variables and function call arguments.

The interesting call here is the one to strncpy, but we need to examine the call to strlen first because its return value appears to be used as the third argument to strncpy.

The call to strlen:

   0x080486ab <+9>: mov    eax,DWORD PTR [ebp+0x8]
   0x080486ae <+12>:    mov    DWORD PTR [esp],eax
   0x080486b1 <+15>:    call   0x80484d0 <strlen@plt>

It's clear the first argument to checkpass is being passed to strlen. Return values are normally passed back in the EAX register.

Here is the call to strncpy:

   0x080486b6 <+20>:    add    eax,0x1
   0x080486b9 <+23>:    mov    DWORD PTR [esp+0x8],eax
   0x080486bd <+27>:    mov    eax,DWORD PTR [ebp+0x8]
   0x080486c0 <+30>:    mov    DWORD PTR [esp+0x4],eax
   0x080486c4 <+34>:    lea    eax,[ebp-0x20c]
   0x080486ca <+40>:    mov    DWORD PTR [esp],eax
   0x080486cd <+43>:    call   0x8048510 <strncpy@plt>

You can see that 1 is added to the return value and it is put on the stack as the third argument to strncpy.

The pointer to the function argument is then put on the stack as the second argument (the instructions at checkpass+27 and +30).

Lastly, the address of the local variable, computed with lea, is put on the stack as the first argument (the instructions at checkpass+34 and +40).

Here we can see that the local variable starts 0x20c bytes (524 bytes) below EBP. Adding the 4 bytes of the old EBP saved during the prologue, we need to write 528 bytes before we reach the saved return address. The next 4 bytes overwrite it, and ret then loads that value into EIP.
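The arithmetic is simple enough to write down. Here is a sketch under the frame layout read from the disassembly of checkpass; bytes_before_return_address is a hypothetical name.

```c
#include <assert.h>

/* checkpass frame on 32-bit x86, as read from the disassembly:
 *   [ebp-0x20c]  start of the local buffer p
 *   [ebp+0x0]    old EBP saved by the prologue (4 bytes)
 *   [ebp+0x4]    saved return address, loaded into EIP by ret
 * Returns the number of bytes from the start of a buffer at
 * [ebp - buf_offset] up to the saved return address. */
static unsigned bytes_before_return_address(unsigned buf_offset)
{
    const unsigned saved_ebp_size = 4;
    assert(buf_offset > 0);
    return buf_offset + saved_ebp_size;
}
```

With buf_offset = 0x20c this gives 524 + 4 = 528. Note that the compiler placed the 512-byte buffer 524 bytes below EBP, so the distance has to be read from the disassembly rather than from the C declaration.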

Looking at the prototype for strncpy (using man strncpy), we can see that the first argument is the destination, second the source and third the maximum characters to copy:

       char *strncpy(char *dest, const char *src, size_t n);

Knowing all of this, it's easy to see that there is in fact a buffer overflow here. The developer bounded the copy by the length of the source string (strlen(a)+1) instead of the size of the destination buffer. We can even see how many bytes we have until we overwrite the saved return address, and therefore EIP.
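For comparison, here is a minimal sketch of how the copy could be bounded by the destination instead. checkpass_fixed is a hypothetical name. Rejecting over-long input, rather than silently truncating it, is a design choice for this sketch, not something from the original application.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PASS "topsecretpassword"

/* Hypothetical fixed checkpass: the copy is bounded by sizeof p (the
 * destination), not by strlen(a) + 1 (the source). Returns 0 on a match,
 * -1 for input too long to fit, otherwise the nonzero strcmp result. */
static int checkpass_fixed(const char *a)
{
    char p[512];
    assert(a != NULL);
    /* snprintf writes at most sizeof p bytes including the NUL and
     * returns the length the full string would have needed. */
    int n = snprintf(p, sizeof p, "%s", a);
    if (n < 0 || (size_t)n >= sizeof p)
        return -1;  /* too long: reject instead of overflowing */
    return strcmp(p, PASS);
}
```

With this change, the 528-byte input needed to reach the return address is simply rejected before anything is compared.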

Conclusion

While it's technically possible to just fuzz all of the application inputs, the more complex the application gets, the less feasible that becomes.

The same is true for reverse engineering every section of an application, so it's important that you know how to focus on the important parts.

Ultimately reverse engineering is much more powerful than fuzzing but both should be used in combination to increase efficiency.

Happy Hacking :-)