Monday, November 7, 2011

How does gdb work? Part 2

The magic behind INT 3

It is time to dig a bit into a subject that most programmers do not adore: assembly language. I am afraid we don’t have much choice, because breakpoints work at the assembly level.

We have to understand that each compiled program is actually a set of instructions that tell the CPU what to do. Some C expressions translate into a single instruction, while others may translate into hundreds or even thousands of instructions. Instructions also vary in size: on x86_64 they are from 1 to 15 bytes long.

Debuggers mostly operate at the CPU instruction level. The fact that gdb understands C/C++ code and lets you place a breakpoint on a particular C/C++ line is only an enhancement over gdb’s basic ability to place a breakpoint on a particular instruction.

There are several ways to place breakpoints. The most widely used is the INT 3 instruction. Its operation code is a single byte. When the CPU reaches it, the CPU calls a special breakpoint interrupt handler that the operating system installed during its initialization. Because the INT 3 operation code is only one byte long, we can safely substitute it for the first byte of any instruction without touching the instruction that follows. Once the operating system’s interrupt handler is called, it figures out which process reached the breakpoint and notifies that process and its debugger via signals.
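To watch the interrupt turn into a signal without any debugger involved, here is a minimal sketch, assuming Linux on x86 or x86_64 and GCC or Clang. The names trigger_breakpoint and on_sigtrap are made up for this illustration. The process executes INT 3 on itself; since nobody is tracing it, the resulting SIGTRAP goes straight to its own handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Incremented by the handler each time SIGTRAP is delivered. */
static volatile sig_atomic_t traps_seen = 0;

static void on_sigtrap(int signo)
{
    (void)signo;
    traps_seen++;
}

/* Hypothetical helper: executes one INT 3 and returns how many
 * SIGTRAPs arrived, or -1 if the handler could not be installed. */
int trigger_breakpoint(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigtrap;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGTRAP, &sa, NULL) != 0)
        return -1;

    traps_seen = 0;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile("int3");   /* assembles to the single byte 0xcc */
#else
    raise(SIGTRAP);             /* stand-in on CPUs without INT 3 */
#endif
    /* Execution resumes here: after the trap, the saved instruction
     * pointer already points past the INT 3 byte. */
    return (int)traps_seen;
}
```

Calling trigger_breakpoint() from main() should report one trap per call. When a debugger is attached, it is the debugger that gets to see the stop first, which is what the rest of this article relies on.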

Breakpoints hands on

Let’s return to our debuggie/debugger friends. As we mentioned, the debugger has a chance to place a breakpoint before letting the debuggie process run. Let’s see how this can be done.

Breakpoints are placed with the INT 3 instruction. Before writing the actual 0xcc (the INT 3 operation code), we should figure out where to place it. For the purposes of this article we will do it manually. Real debuggers, in contrast, include complex logic that calculates where and when to place breakpoints. gdb places several breakpoints by itself, without you even knowing about it. And obviously it has functionality that places breakpoints when you ask it to.

In our previous example the debuggie process executed ls. That is not suitable for our next demonstration. We need a sample program that lets us easily demonstrate breakpoints in action. Here it is.

#include <stdio.h>

int main()
{
    printf( "~~~~~~~~~~~~> Before breakpoint\n" );
    // The breakpoint
    printf( "~~~~~~~~~~~~> After breakpoint\n" );

    return 0;
}

And here is the disassembler output of the main() routine.

0000000000400508 <main>:
  400508: 55                push   %rbp
  400509: 48 89 e5          mov    %rsp,%rbp
  40050c: bf 18 06 40 00    mov    $0x400618,%edi
  400511: e8 12 ff ff ff    callq  400428
  400516: bf 2a 06 40 00    mov    $0x40062a,%edi
  40051b: e8 08 ff ff ff    callq  400428
  400520: b8 00 00 00 00    mov    $0x0,%eax
  400525: c9                leaveq
  400526: c3                retq

We can see that if we place a breakpoint at address 0x400516, we will see one printout before reaching the breakpoint and the other right after it. For the sake of our demonstration, we will place a breakpoint at this address. Once we reach the breakpoint, we will sleep and then let the debuggie continue running. We should see the debuggie produce the first printout, sleep for a few seconds, and then produce the second printout.

We’ll achieve our goal in several steps.

  1. First of all, we should fork() off the debuggie. We already did something similar.
  2. The next step is to intercept the execve() call in the debuggie. Been there, done that.
  3. Here’s something new. We should modify the byte at address 0x400516 from 0xbf to 0xcc, saving the original value (0xbf). This is how we place the breakpoint.
  4. Next, we’re going to wait() for the process. Once it reaches the breakpoint, we’ll be notified.
  5. Once the debuggie reaches the breakpoint, we want to restore the code we broke with our 0xcc to its original state.
  6. In addition, we want to fix the value of the RIP register. This register tells the CPU the memory location of the next instruction to execute. Its value will be 0x400517, one byte after the 0xcc that we placed. We want to set the RIP register to 0x400516, because we don’t want the CPU to skip over the MOV instruction that we broke with our 0xcc.
  7. Finally, we want to wait five seconds for the sake of demonstration and then let the debuggie continue running.

First things first. Let’s see how we do step 3.

    ...
    addr = 0x400516;

    /* Read the word at addr and keep a copy for restoring it later. */
    data = ptrace( PTRACE_PEEKTEXT, child, (void *)addr, NULL );
    orig_data = data;
    /* Overwrite the lowest byte with 0xcc (INT 3) and write the word back. */
    data = (data & ~0xff) | 0xcc;
    ptrace( PTRACE_POKETEXT, child, (void *)addr, data );
    ...

Again, we can see how ptrace() does the job for us. First we peek 8 bytes (sizeof( long )) from address 0x400516. On some architectures this could cause a lot of headaches because of unaligned memory access. Luckily, we’re on x86_64, where unaligned memory accesses are permitted. Next we set the lowest byte to 0xcc, the INT 3 instruction. Because x86_64 is little-endian, the lowest byte of the word is the byte stored at address 0x400516 itself. Finally, we write the 8 bytes back in place.

We’ve seen how to wait for a certain event in the debuggie. We also know now how to restore the original value at address 0x400516. So we can skip steps 4-5 and jump right to step 6. This is something that we haven’t done so far.

What we have to do is read the debuggie’s registers, change them and write them back. Again, ptrace() does the whole job for us.
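The masking can be tried out in isolation. The following is a small sketch, assuming a 64-bit long and a little-endian CPU; the helper names insert_int3 and low_byte are invented for this example. Reading the eight bytes at 0x400516 from the disassembly (bf 2a 06 40 00 e8 08 ff) as a little-endian word gives 0xff08e80040062abf, so that value serves as sample input.

```c
#define INT3_OPCODE 0xccUL

/* Hypothetical helper: replace the lowest byte of a peeked word
 * with the INT 3 opcode, leaving the other seven bytes untouched. */
unsigned long insert_int3(unsigned long word)
{
    return (word & ~0xffUL) | INT3_OPCODE;
}

/* Hypothetical helper: the lowest byte of the word, which on a
 * little-endian machine is the byte at the peeked address itself. */
unsigned char low_byte(unsigned long word)
{
    return (unsigned char)(word & 0xffUL);
}
```

With the sample word, low_byte() yields 0xbf (the MOV opcode) and insert_int3() yields 0xff08e80040062acc: only the first byte of the MOV instruction changes, which is why the saved original word is enough to undo the breakpoint later.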

    ...
    struct user_regs_struct regs;
    ...
    /* Read all registers, point RIP back at the breakpoint, write them back. */
    ptrace( PTRACE_GETREGS, child, NULL, &regs );
    regs.rip = addr;
    ptrace( PTRACE_SETREGS, child, NULL, &regs );
    ...

Things are not too well documented here. For instance, the ptrace() documentation never mentions struct user_regs_struct, yet this is what the ptrace() system call expects to receive in the kernel. Once we know what to pass as ptrace() arguments, it is easy. We use the PTRACE_GETREGS operation to obtain the values of the debuggie’s registers, modify the RIP register, and write them back with the PTRACE_SETREGS operation. Clear and simple.

Let’s see how things actually work. You can find the complete listing of the debugger process here. Compiling and running listing2.c produces the following output.
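The snippet above writes a known address into RIP. When the address is not at hand, the same value can be derived from the trap itself, since RIP always ends up one byte past the 0xcc. A hedged sketch of that variant follows; rip_at_breakpoint and rewind_over_int3 are hypothetical names, and the ptrace part assumes Linux on x86_64 with glibc’s struct user_regs_struct.

```c
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>

/* INT 3 is one byte long, so the breakpoint sits one byte before
 * the RIP value reported after the trap. */
unsigned long long rip_at_breakpoint(unsigned long long rip_after_trap)
{
    return rip_after_trap - 1;
}

#if defined(__x86_64__)
/* Hypothetical helper: move a stopped, traced child's RIP back onto
 * the breakpoint address. Returns 0 on success, -1 if ptrace() fails. */
int rewind_over_int3(pid_t child)
{
    struct user_regs_struct regs;

    if (ptrace(PTRACE_GETREGS, child, NULL, &regs) == -1)
        return -1;
    regs.rip = rip_at_breakpoint(regs.rip);
    if (ptrace(PTRACE_SETREGS, child, NULL, &regs) == -1)
        return -1;
    return 0;
}
#endif
```

For the run described in this article, rip_at_breakpoint(0x400517) gives 0x400516. Checking the return value of each ptrace() call is an addition of this sketch; the snippets in the article leave it out for brevity.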

In debuggie process 29843
In debugger process 29842
Process 29842 received signal 17
~~~~~~~~~~~~> Before breakpoint
Process 29842 received signal 17
RIP before resuming child is 400517
Time before debugger falling asleep: 1206346035
Time after debugger falling asleep: 1206346040. Resuming debuggie...
~~~~~~~~~~~~> After breakpoint
Process 29842 received signal 17
Debuggie exited...
Debugger exiting...

You can see that the “Before breakpoint” printout appears 5 seconds before the “After breakpoint” printout. The line “RIP before resuming child is 400517” clearly indicates that the debuggie stopped at address 0x400517, as we expected. Signal 17 is SIGCHLD, which the debugger receives each time the debuggie stops or exits.
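For readers who want to try the seven steps without the sample binary, here is a self-contained sketch, assuming Linux on x86_64 and an environment where ptrace() is permitted. It is not listing2.c. Instead of calling execve() and using the hard-coded address 0x400516, the debuggie is a plain fork() of the same program, and the breakpoint goes on the first byte of a function in that program. The names target, give_up and run_breakpoint_demo are invented for this example.

```c
#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

/* The volatile store keeps the call from being optimized away. */
static volatile int target_calls;

/* Hypothetical breakpoint target, standing in for the code at 0x400516. */
__attribute__((noinline)) int target(void)
{
    target_calls++;
    return 42;
}

/* Kill and reap a child that is still around on an error path. */
static int give_up(pid_t child)
{
    kill(child, SIGKILL);
    waitpid(child, NULL, 0);
    return -1;
}

/* Returns 0 if the breakpoint was hit at target() and the debuggie
 * then ran to a normal exit, -1 otherwise. */
int run_breakpoint_demo(unsigned int pause_seconds)
{
    uintptr_t addr = (uintptr_t)&target;
    struct user_regs_struct regs;
    long orig;
    int status;

    /* Step 1: fork off the debuggie. */
    pid_t child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(2);
        raise(SIGSTOP);                 /* stand-in for the stop at execve() */
        _exit(target() == 42 ? 0 : 3);
    }

    /* Step 2: wait for the initial stop; anything else means it died. */
    if (waitpid(child, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1;

    /* Step 3: save the original word, put 0xcc into its lowest byte. */
    errno = 0;
    orig = ptrace(PTRACE_PEEKTEXT, child, (void *)addr, NULL);
    if (orig == -1 && errno != 0)
        return give_up(child);
    if (ptrace(PTRACE_POKETEXT, child, (void *)addr,
               (void *)((orig & ~0xffL) | 0xcc)) == -1)
        return give_up(child);

    /* Step 4: let the debuggie run until it reaches the breakpoint. */
    if (ptrace(PTRACE_CONT, child, NULL, NULL) == -1)
        return give_up(child);
    if (waitpid(child, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1;
    if (WSTOPSIG(status) != SIGTRAP)
        return give_up(child);

    /* Step 5: restore the original code. */
    if (ptrace(PTRACE_POKETEXT, child, (void *)addr, (void *)orig) == -1)
        return give_up(child);

    /* Step 6: RIP is one byte past the 0xcc; point it back at addr. */
    if (ptrace(PTRACE_GETREGS, child, NULL, &regs) == -1)
        return give_up(child);
    if (regs.rip != addr + 1)
        return give_up(child);
    regs.rip = addr;
    if (ptrace(PTRACE_SETREGS, child, NULL, &regs) == -1)
        return give_up(child);

    /* Step 7: pause, then resume and wait for the debuggie to exit. */
    sleep(pause_seconds);
    if (ptrace(PTRACE_CONT, child, NULL, NULL) == -1)
        return give_up(child);
    if (waitpid(child, &status, 0) == -1)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Calling run_breakpoint_demo(5) from main() reproduces the five-second pause of the article. The function address is valid in the child because fork() duplicates the parent’s address space, which is the design choice that lets the sketch avoid a second binary.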

Single steps

After seeing how easy it is to place a breakpoint, you can guess that stepping over one line of C/C++ code is simply a matter of placing a breakpoint on the next line of code. This is exactly what gdb does when you ask it to single-step over an expression.
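One way to picture such a step is as a temporary breakpoint: a small record that is armed before the debuggie continues and disarmed as soon as it stops. The sketch below shows only that bookkeeping, on a plain byte array standing in for the debuggie’s memory. It makes no claim about gdb’s internals, and struct temp_breakpoint and both functions are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one temporary breakpoint. */
struct temp_breakpoint {
    size_t  offset;      /* where in the code buffer the 0xcc goes */
    uint8_t saved_byte;  /* original byte, kept for restoring */
    int     armed;       /* nonzero while the 0xcc is in place */
};

/* Arm: remember the original byte, then overwrite it with INT 3. */
void temp_bp_arm(struct temp_breakpoint *bp, uint8_t *code, size_t offset)
{
    bp->offset = offset;
    bp->saved_byte = code[offset];
    bp->armed = 1;
    code[offset] = 0xcc;
}

/* Disarm: put the original byte back so the instruction can run. */
void temp_bp_disarm(struct temp_breakpoint *bp, uint8_t *code)
{
    if (bp->armed) {
        code[bp->offset] = bp->saved_byte;
        bp->armed = 0;
    }
}
```

Applied to the bytes of main() from the disassembly, arming at offset 14 (0x400516 minus 0x400508) replaces 0xbf with 0xcc, and disarming puts 0xbf back. A real debugger would perform the same two writes through PTRACE_POKETEXT.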

Conclusion

Debuggers and how they work are often associated with some kind of magic.

Debuggers, with gdb as an example, are exceptionally complicated pieces of software. Placing breakpoints and single stepping are only a small fraction of what gdb is able to do. gdb in particular works on dozens of hardware architectures. It supports remote debugging. It is perhaps the most advanced and complicated executable analyzer. It knows when a program loads a dynamic library and analyzes the code of that library automatically. It supports a bunch of programming languages, from C/C++ to Ada. And these are just a few of its features.

On the other hand, we’ve seen how easy it is to start debugging a process, place a breakpoint, and so on. The basic functionality that allows debugging is in the operating system and in the CPU, waiting for us to use it.
