Disassembly Required

If you really want to hack software, you are going to face a time when you have to take apart someone’s machine code. If you aren’t very organized, it might even be your own — source code does get lost. If you want to impress everyone, you’ll just read through the hex code (well, the really tough old birds will read it in binary). That was hard to do even when CPUs only had a handful of instructions.

A more practical approach is to use a tool called a disassembler. This is nothing more than a program that converts numeric machine code into symbolic instructions. The devil, of course, is in the details. Real programs are messy. The disassembler can’t always figure out the difference between code and data, for example. The transition points between data and code can also be tricky.
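
To make the idea concrete, here is a toy sketch in C, not a real disassembler. It knows just five one-byte x86 opcodes and prints anything else as a data byte. The names toy_mnemonic and toy_disassemble are made up for illustration, and real x86 decoding also has to cope with prefixes, operand bytes, and variable-length instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical lookup: a handful of one-byte x86 opcodes and nothing more. */
static const char *toy_mnemonic(unsigned char op)
{
    switch (op) {
    case 0x90: return "nop";
    case 0xC3: return "ret";
    case 0xCC: return "int3";
    case 0x55: return "push ebp";   /* push rbp in 64-bit mode */
    case 0x5D: return "pop ebp";    /* pop rbp in 64-bit mode */
    default:   return NULL;         /* not in this toy table */
    }
}

/* Write one line per input byte into out. Bytes the table does not know
   are emitted as "db 0xNN". Returns how many bytes were treated as code. */
size_t toy_disassemble(const unsigned char *code, size_t len,
                       char *out, size_t outsz)
{
    size_t known = 0, used = 0;

    assert(out != NULL || outsz == 0);
    if (outsz > 0)
        out[0] = '\0';

    for (size_t i = 0; i < len; i++) {
        char line[16];
        const char *m = toy_mnemonic(code[i]);
        int n = m ? snprintf(line, sizeof line, "%s\n", m)
                  : snprintf(line, sizeof line, "db 0x%02X\n", code[i]);

        if (n < 0 || used + (size_t)n >= outsz)
            break;                  /* output buffer is full */
        memcpy(out + used, line, (size_t)n + 1);
        used += (size_t)n;
        if (m)
            known++;
    }
    return known;
}
```

Feed it the bytes 0x55, 0x90, 0x5D, 0xC3 followed by the ASCII text "HU" and the code versus data problem shows up right away: the letter U is 0x55, so the sketch reports it as another push ebp even though it is data.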

When Not to Use

If you are coding your own program in assembly, a disassembler isn’t usually necessary. The disassembly can’t recover things like variable names, some function names, and, of course, comments. If you use a high-level language and you want to check your compiler output, you can easily have the compiler provide assembly language output (GCC’s -S option does this, for example).

The real value of a disassembler is when you don’t have the source code. But it isn’t easy, especially for anything nontrivial. Be prepared to do a lot of detective work in most cases.

An Online Tool

Exactly what tool you use will depend on what CPU architecture you want to work with. However, there is a very interesting online tool that can handle a lot of different architectures. In the old days, a disassembler just generated a lot of output in a file or a printout. But this online version does a lot of smart analysis and provides hyperlinked cross-references. Even better, you can interactively give it some hints about the subject code and it will improve the results. You can even collaborate with others, which would be really handy when working on a large project.

How’s It Work?

Just to get a taste of how the tool works, have a look at this simple program:


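#include <stdio.h>
#include <stdlib.h>

/* Small function so the disassembly has a call to follow */
void do_it(void)
{
    printf("Howdy Hackaday!\n");
}

int main(int argc, char *argv[])
{
    char *p = malloc(100);   /* allocated but never used */
    do_it();
    free(p);
    return 0;
}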

I compiled this to an executable using GCC under Cygwin. Of course, this is cheating because we already know too much about the code for it to be a fair test. In addition, the disassembler can pull information out of the executable file that helps it do things like segregate code and data. Don’t forget, if we really wanted to see what the compiler was generating, we could just ask it. If you want a more realistic example, the web site has a menu where you can pick several examples, but they are much more complex.

On the Web

Once you have the executable in hand, you can upload it to the disassembler using the File menu (use the Upload item, of course). Since the PE file format Windows uses has some information in it, the disassembler knows about some symbols and segments. The left side shows a kind of button bar that lets you select what appears in the left-hand navigation pane. The top button shows symbols, and if you click on main and make sure the selector at the top right is set to disassembly, you’ll see your main function.

The other left-hand panes let you pick strings, do searches, or identify data items. Along the top to the right you can choose to see a call graph, a hex dump, the file sections (populated because this was a structured file), and information about the file itself.
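
If you are curious how a tool can pull section names out of a PE file, here is a rough sketch in C. It is not the web site’s code, just an illustration based on the published PE layout: the DOS header starts with "MZ", the 32-bit value at offset 0x3C points to the "PE\0\0" signature, and the 20-byte COFF header that follows holds the section count. The helper names pe_section_count and pe_section_name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read little-endian values a byte at a time so host byte order is irrelevant. */
static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Return the number of sections, or -1 if the buffer does not look like PE. */
int pe_section_count(const unsigned char *img, size_t len)
{
    assert(img != NULL);
    if (len < 0x40 || img[0] != 'M' || img[1] != 'Z')
        return -1;

    uint32_t pe = rd32(img + 0x3C);            /* e_lfanew */
    if (pe > len || len - pe < 24)             /* 4-byte signature + 20-byte COFF header */
        return -1;
    if (memcmp(img + pe, "PE\0\0", 4) != 0)
        return -1;

    return rd16(img + pe + 6);                 /* NumberOfSections follows Machine */
}

/* Copy the 8-byte name of section idx into name. Returns 0 on success. */
int pe_section_name(const unsigned char *img, size_t len, int idx, char name[9])
{
    int count = pe_section_count(img, len);
    if (count < 0 || idx < 0 || idx >= count)
        return -1;

    uint32_t pe = rd32(img + 0x3C);
    size_t opt = rd16(img + pe + 20);          /* SizeOfOptionalHeader */
    size_t off = (size_t)pe + 24 + opt + (size_t)idx * 40;  /* 40-byte section headers */
    if (off > len || len - off < 40)
        return -1;

    memcpy(name, img + off, 8);
    name[8] = '\0';                            /* names are not always NUL terminated */
    return 0;
}
```

Load the executable into a buffer, call pe_section_count, and loop over pe_section_name to list names such as .text and .rdata. A real tool would go on to read each section’s file offset, size, and flags to decide what is code and what is data.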

Everything that makes sense is linked. If you click on the call to do_it, for example, the view will jump to that part of the code. That doesn’t always seem to work on data though. Here’s the do_it function:

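If you click on puts, you’ll jump to the code, but look at the lea instruction ahead of it that loads the string to print. No link. (The call is to puts and not printf because GCC typically replaces a printf of a constant string that ends in a newline with puts.)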

You can skim through the strings or do a search. However, you can also note the address and the section (.rdata). Clicking on the sections display lets you jump to .rdata directly. From there you can find the address quickly and see the string you expect.
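
A strings pane like this presumably works by scanning for runs of printable characters, much like the Unix strings utility. Here is a minimal sketch of that idea in C. The function name next_string and the minimum-length parameter are my own assumptions, not anything the tool documents.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Find the next run of at least min_len printable ASCII bytes at or after
   start. Returns the run's offset and stores its length in *run_len, or
   returns (size_t)-1 when no such run remains. */
size_t next_string(const unsigned char *buf, size_t len, size_t start,
                   size_t min_len, size_t *run_len)
{
    size_t i = start;

    assert(buf != NULL && run_len != NULL);
    while (i < len) {
        size_t j = i;
        while (j < len && isprint(buf[j]))
            j++;                        /* extend the printable run */
        if (j > i && j - i >= min_len) {
            *run_len = j - i;
            return i;
        }
        i = (j == i) ? i + 1 : j;       /* skip the byte or the short run */
    }
    return (size_t)-1;
}
```

Run over the bytes of .rdata with a minimum length of four or so, this would pick out "Howdy Hackaday!" while skipping the stray short runs that turn up in any binary. The offset it returns, added to the section’s base address, is the address you would look for in the lea instruction.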

By right clicking on the screen (or using keyboard commands) you can add comments, define variables and functions, or tell the tool what area is code vs data. In this case, some of that is done for us, but if you spend time you can document the disassembly very nicely. For example, here’s one of the samples provided:

The arrows showing the jumps are a nice touch.

Going Forward

You could do worse than to take the tutorial on the Help menu. The tool claims to support 60 CPUs, but to find the list you need to open the configuration menu for the “live view” where you can just type in hex codes or load a binary file. They do have quite a list including x86, ARM, AVR, VAX, System 390, MIPS, PPC, and even the Z80. I was sorry not to see the 1802, but I can still disassemble its code by myself.
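
Typing hex codes into a live view means something has to turn that text back into bytes before any decoding can happen. As a sketch of that first step (the function name parse_hex is hypothetical, and the site’s own input rules may well differ), this C routine accepts hex digit pairs separated by optional whitespace:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_val(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Parse text such as "55 89 E5 C3" into out. Returns the number of bytes
   stored, or -1 on a bad digit, a dangling half byte, or a full buffer. */
long parse_hex(const char *text, unsigned char *out, size_t outsz)
{
    size_t n = 0;

    assert(text != NULL && out != NULL);
    while (*text) {
        if (isspace((unsigned char)*text)) {
            text++;
            continue;
        }
        int hi = hex_val((unsigned char)text[0]);
        int lo = (hi >= 0) ? hex_val((unsigned char)text[1]) : -1;
        if (hi < 0 || lo < 0 || n >= outsz)
            return -1;
        out[n++] = (unsigned char)((hi << 4) | lo);
        text += 2;
    }
    return (long)n;
}
```

On x86, the four bytes in "55 89 E5 C3" are push ebp, mov ebp, esp, and ret, which makes a handy first thing to type in.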

The next time you want to peek inside some binary code, this web site is a useful tool. Just the fact that it has so many CPUs is worth something. I’m not likely to have a VAX disassembler handy, much less one with so many analysis and collaboration tools.



from Hackaday http://ift.tt/2i9F8oh