Reversing A Simple Obfuscated Application
Posted on Tue 30 September 2014 in Reverse Engineering
I created this application as a little challenge and some practice at manually obfuscating an application at the assembly level.
I wrote the application in IA32 assembly and then manually obfuscated it using a couple of different methods.
Here I will show how to solve the challenge in 2 different ways.
Lastly I will show how the obfuscation could have been done better so that it would have been a lot more difficult to solve this using a simple static disassembly.
The Challenge
We are given the static disassembly below of a 32bit linux application which says whether or not the author is going to some event:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 |
|
The challenge is to figure out whether or not the author is going based solely on this static disassembly.
Method 1: The Easy Way
In this method we'll rebuild the application and simply run it to get the answer.
The first step is to copy the instruction into a new nasm file, if we do that we get:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 |
|
When we try to assemble this we get:
1 2 3 4 5 6 7 8 9 10 11 |
|
Looking at the lines that have caused the errors:
1 2 3 4 5 6 7 8 9 10 11 12 |
|
You can see that its all lines that have [SIZE] PTR, we will remove any DWORD PTR and BYTE PTR and for the lines that had BYTE put that before the first operand, so they end up like this:
1 2 3 4 5 6 7 8 9 10 11 12 |
|
Now we try to assemble it again:
1 2 3 |
|
So there is still a problem with 2 lines, it looks as if these instructions are invalid, this could possibly be data, what we shall do is replace these 2 instructions with the raw opcodes from the disassembly, so our application ends up like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 |
|
If we assemble this and test it out:
1 2 3 4 |
|
So it assembles and links now but we get a segmentation fault. Let's investigate why:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 |
|
So it looks as if we've landed in the middle of an instruction.
Near the start of the application (on line 16 above), it jumps it a certain memory address which is the middle of an instruction. The resulting instruction, as seen on line 9, tries to move a value to the address pointed to by the EAX register.
On line 11 you can see that the value in EAX is 0, which is what caused the segfault, 0 is an invalid memory address.
The reason for this is because the original application jumped to static memory addresses, in the application the memory addresses are different so this will need to be fixed for the application to work.
What we need to do is replace any fixed memory addresses with labels. We can find where in the application the memory addresses are meant to go by looking at the original disassembly.
Once we have done this the resulting application is as follows:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 |
|
There are a couple of values here (on lines 55, 59 and 60) which look like memory addresses but they aren't valid memory addresses in the original disassembly so they could just be normal values or, as its in the same section as the invalid instructions, part of some data.
With this done we can test this application:
1 2 3 4 |
|
So we have our answer, the author is not going :-)
Method 2: The Hard Way
Here we will attempt to understand the application and figure out what the application does without building and running it.
Although you would have needed some understanding of IA32 to do the previous method, obviously you will need a better understanding of it to do this.
The first step would be what we have already done. Well, there would be no need for the ability to assemble the application, or even have a valid nasm file but we would need to replace any known addresses with labels because this will make the disassembly significantly easier to read.
For this will we just use the nasm file above (going-or-not-obf-test4.nasm), just because it will make this post a little shorter :-)
What we do now is follow the control flow of the application and simplfy it as we go by replacing more complex sequencies with less complex 1's or even only 1 instruction in some cases and removing any dead instructions (instructions which have no effect on the application at all) altogether.
This process is manual deobfuscation and can be applied to small sections of applications instead of just full applications like the last method.
Let's start with the first instruction mov edx,eax
, this looks like it is a junk line (or dead code) mainly because this is the first instruction of the application, if this was just a code segment instead of a full application this code would be more likely to be meaningful.
The second instruction mov edi,0x25
, is also very difficult to quickly determine its usefulness to the application, what we need to do here is take note of the value inside the EDI register.
The next 4 instructions do something interesting, if you follow the control flow of the application and line the instructions sequentially you get:
1 2 3 4 5 6 |
|
So the 3rd instruction (on line 5) is not related here, and is similar to the previous mov instruction, just make a note that bl contains 0x32.
The other 3 instructions are using a technique used in some shellcode to get the an address in memory when the code might start at a different point in memory.
Its called the JMP-CALL-POP technique and gets the address of the address immediately following the call instruction into the register used in the pop instruction.
Knowing this we can replace the entire code above with:
1 2 |
|
Let's look at the next 4 instructions:
1 2 3 4 5 |
|
So here, on line 5, we use the EDI register, we zero EAX, set it to 0xc9 (201), adds it to EDI (0x25 or 37) and stores the result in EAX, this series of instructions are what is called constant unfolding where a series of instructions are done to work out the actual required value instead of just assigning the value to begin with.
We could use the opposite, a common compiler optimization constant folding, to decrease the complexity of this code, so these 4 instructions could be replaced by:
1 |
|
The next 5 instructions are:
1 2 3 4 5 6 |
|
This set of instructions just sets EBP and ECX to 0 and EDX to 1. Now its obvious that the instrction at the beginning was dead code because EDX hasn't been used at all and now it has been overwritten.
We can rewrite the application so far in a much more simplfied way:
1 2 3 4 5 6 7 8 |
|
As you can see, this is much easier to read than the previous code that was jumping about all over the place.
I kept the assignment to EDI (on line 2) there because, although I've removed the need for it in assigning the value of EAX (on line 5), it still might be used in the future.
Also, the assignment to bl (on line 3) still might not be needed but we shall keep it there just incase.
Let's quickly review the state of the registers:
1 2 3 4 5 6 7 |
|
The register state and code rewrite should be constantly updated as you go through the code.
The next instruction is lea ebp,[esp+ecx*1]
, which is the same as EBP = ESP + ECX * 1 or EBP = ESP + 0 * 1 or EBP = ESP.
After this instruction we enter the following loop:
1 2 3 4 5 6 7 8 9 10 11 12 |
|
So this first moves a byte at ESI + EDX * 1, which is basically just ESI + EDX, into the cl register. We know at this point the value inside EDX is 1 and that ESI points to some address in the middle of the application, so our loop will start getting data 1 byte after that address.
This byte is them compared with al, which we know is 0xee, and if they are the same execution will jump to Six.
Providing the jump to Six isn't taken, the byte is moved to the top of the stack (which ESP points to), ESP is adjusted accordingly, EDX is incremented by 1 and the loop is rerun.
The mov instruction on line 8 doesn't do anything, dead code which can be removed.
Now we can find all of the data that is being worked on here:
1 |
|
The starting address of this data is 80480bc in the original disassembly, which is 1 byte after the address of the instruction following the call instruction in the jmp-call-pop routine at the start of the application.
It ends with the ee value because this is the point at which the jump to Six is taken.
Also, notice that nowhere here is a 0x0 (or 00) byte, this means that the jg (jump if greater than) instruction on line 10 will always be taken, every byte there is above 0 so the 2 instructions after are dead code and can be removed from the analysis and the jg can be replaced with a jmp.
It is clear that this data, which is sitting in the middle of the application, is being put on the stack for some reason, the lea instruction right before the loop just saved the address pointing to the beginning of the new location of the data on the stack into the EBP register.
We could try to figure out how meaningful this data is now but it would be best to have a look to see what the application does with it first.
Now let's take the jump to Six:
1 2 3 |
|
First it loads the address of the data on the stack, currently in EBP, into EDX.
cl, which is currently 0xee, is put onto the stack and ESP is adjusted accordingly.
We then enter into the 2nd loop:
1 2 3 4 5 6 7 8 9 10 |
|
This is a very unusual loop, you will only see this type of code when reversing obfuscated code.
It started by pushing its own address to the stack, this allows the ret on line 10 to return to Seven.
The test instruction on line 3 is dead code because all test does is set EFLAGS, but they are immediately overwritten by the cmp instruction that follows.
Lines 4 and 5 again test the value of a byte in the data, this time pointed to by EDX, against 0xee and jump's to Eight when its reached.
The next 2 instructions, lines 6 and 7, move the value from EDI into EBX and add's 0x1f to it. We already know that 0x25 is currently in EDI, so EBX = 0x25 + 0x1f or EBX = 0x44.
The byte in the data is then xor'd with bl (or 0x44) and EDX is decremented.
Clearly this is a simply xor encoding of the data, I wrote a python script a while ago to xor a number of bytes with 1 byte and output both the resulting bytes as ascii characters, and the same but with the characters reversed (due to little endian architectures), here is the script:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
|
This script is very simple, 1 thing to bare in mind though is that, because we are dealing with data outside of the printable ascii range (0x20 - 0x7e), we can just type the characters on the command line.
So we run the script like this:
1 2 3 4 5 6 7 8 9 10 |
|
So now we know what that data is in the middle of the application, clearly it was done like this to confuse but we have reversed enough of the application now to figure out what this is.
With this is mind, we no longer need those 2 loops, or any of the code aimed at moving and decoding the data, we can simply put it in as is.
Let's review our rewritten application:
1 2 3 4 5 6 7 8 9 10 11 12 |
|
I have obviously removed most of the code because it simply isn't needed now, I've made sure that EBP still points to the end of the data and EDX to the beginning just incase there is some reason for this, but most of the code so far was devoted to decoding the data which is no longer needed.
Now for the registers:
1 2 3 4 5 6 7 |
|
The next 5 instructions show another weird use of call and jmp:
1 2 3 4 5 6 7 8 |
|
Firstly there is an assignment to bh (the second 8 bits of the EBX register) but then, on line 5, the whole EBX register is cleared using xor so line 2 is dead code.
The call instruction on line 3 and the jmp instruction on line 8 seem to be used just to confuse the reverser, there is no reason for this, but bare in mind that this would have stuck 4 bytes on the stack, next to the decoded data, which hasn't been cleaned up (this could effect the application in some way).
The rest of this code just zero's out EBX, ECX and EDX.
The next 8 instructions are very interesting:
1 2 3 4 5 6 7 8 |
|
Lines 1 and 3 fix the value of ESP after the call, jmp sequence earlier.
The rest xor's 0x5 with the byte at One and compares the result with 0x4. We can test this out in python, we know the byte at One is 0xed, so:
1 2 3 4 5 6 7 8 |
|
This isn't equal to 0x4 so the jump on line 8 will not be taken.
The next instruction lea ecx,[ebp-0xf]
loads EBP - 16 into ECX, ECX will now point to somewhere in the middle of the data (it will actually point 16 characters from the end, which is the start of the string I am not going!).
We can probably guess at what this is going to do from here but let's finish the analysis.
0x10 is then loaded into EDX and then 2 unconditional jumps are taken:
1 2 3 |
|
The only reason for these jumps is to confuse the reverser, we can just ignore them.
The next 7 lines is a very important part of the application:
1 2 3 4 5 6 7 |
|
So lines 1-4 set EAX to 0x4, lines 5 and 6 set EBX to 0x1 and then the interrupt *0x80 is initiated.
Interrupt 0x80 is a special interrupt which initiates a system call, the system call number has to be stored in EAX, which is 0x4 at this moment in time.
We can figure out what system call this is:
1 2 |
|
This makes sense, the prototype for this syscall is:
1 |
|
Each of the arguments go in EBX, ECX and EDX. So to write to stdout, EBX should be 1 which it is.
ECX should point to the string, which it currently points to I am not going!, and EDX should contain the number of characters to print which it does.
The last 4 instructions just run another syscall, exit, you can check this yourself if you wish:
1 2 3 4 |
|
Obviously we can now wrtie this in a much simpler way, but there is no need, we know exactly what this application does and how it does it.
Improving Obfuscation
As I mentioned earlier, the obfuscation could have been done better to make the reversing process harder. I actually purposefully made the obfuscation weaker than I could have to make the challenge easier.
Inserting more junk data inbetween some instructions could make the static disassembly significantly more difficult to read and understand.
I have to actually add a byte (0x89) at the end of the data section because the next few instructions were being obfuscated in a way that made them unreadable:
1 2 3 4 5 6 |
|
The disassembly shown here has had the last byte of the data removed and is the last line of the data section; and a few lines after.
As you can see the byte following the data section has been moved to the data section and as a result the next few instructions have been incorrectly disassembled.
This method can be implemented throughout the whole application, making most of the instructions disassemble incorrectly.
Constant unfolding could be improved here, for instance:
1 2 3 4 5 6 |
|
Could be rewritten to:
1 2 3 4 5 6 7 8 9 |
|
They both do the same thing but the second is a little harder to read, you could obviously keep extending this by implementing more and more complex algorithms to work out your required value.
This can also be applied to references to memory addresses, for instance, if you want to jump to a certain memory address, do some maths to work out the memory address before jumping there.
More advanced instructions could be used like imul, idiv, cmpsb, rol, stosb, rep, movsx, fadd, fcom... The list goes on...
The MMX and other unusual registers could have been taken advantage of.
Also, the key to decrypt the data could have been a command line argument or somehow retreived from outside of the application, this way it would have been extremely difficult decode the data.
Conclusion
There are sometimes easier ways to get a result other than reversing the whole application, maybe just understanding a few bits might be enough.
Although there are ways to make the reversers job more difficult, its never possible to make it impossible to reverse, providing the reverser is able to run the application (if the CPU can see the instructions, then so can the reverser).
A good knowledge of assembly is needed to do any type of indepth reverse engineering.
Further Reading
Reversing: Secrets of Reverse Engineering by Eldad Eilam