How Does a C Debugger Work?

When you use GDB, you can see that it has complete control over your
application process. Hit Ctrl-C while the
application is running and the process execution stops, and GDB shows
its current location, stack trace, etc.

But how can it do it?

How they don’t work?

Let’s start first with the way it does not work. It does not simulate the
execution by reading and interpreting the binary instructions. It
could, and that would work (that’s the way the valgrind memory debugger
works), but that would be too slow. Valgrind slows the application
down considerably (by an order of magnitude or more), GDB does not.
That’s also the way virtual machines like Qemu work.

So, what’s the trick? Black magic! … no, that would be too easy.

Another guess? … ? Hacking! Yes, there’s a good deal of that, plus
help from the OS kernel.

First of all, there is one thing to know about Linux processes:
parent processes can get additional information about their
children, in particular the ability to ptrace them. And, as you can
guess, the debugger is the parent of the debuggee process (or it
becomes so, processes can adopt a child in Linux :-).

Linux Ptrace API

The Linux ptrace API
allows a (debugger) process to access low-level information about
another process (the debuggee). In particular, the debugger can:

  • read and write the debuggee’s memory: PTRACE_PEEKTEXT,
    PTRACE_PEEKUSER, PTRACE_POKE…
  • read and write the debuggee’s CPU registers: PTRACE_GETREGSET,
    PTRACE_SETREGS,
  • be notified of system events: PTRACE_O_TRACEEXEC,
    PTRACE_O_TRACECLONE, PTRACE_O_EXITKILL, PTRACE_SYSCALL (you
    can recognize the exec syscall, clone, exit, and all the other
    syscalls)
  • control its execution: PTRACE_SINGLESTEP, PTRACE_KILL,
    PTRACE_INTERRUPT, PTRACE_CONT (notice the CPU single-stepping
    here)
  • alter its signal handling: PTRACE_GETSIGINFO, PTRACE_SETSIGINFO
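
To make these requests more concrete, here is a minimal sketch (not GDB's actual code) of a parent process that traces its own child, reads one word of the child's memory with PTRACE_PEEKDATA, and reads its instruction pointer with PTRACE_GETREGSET. The names inspect_child and marker are hypothetical, and the sketch assumes Linux with glibc on x86-64 or AArch64.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <elf.h>        /* NT_PRSTATUS */
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>    /* struct iovec */
#include <sys/user.h>   /* struct user_regs_struct */
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical variable: after fork() it lives at the same address in the
   parent and in the child, but each process has its own copy. */
static volatile long marker = 0x1234;

/* Hypothetical helper: fork a child that asks to be traced, then read its
   instruction pointer and its copy of `marker`. Returns 0 on success. */
int inspect_child(unsigned long long *ip, long *peeked)
{
    assert(ip != NULL && peeked != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        marker = 0x5678;                        /* only the child's copy changes */
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);  /* let the parent trace us */
        raise(SIGSTOP);                         /* stop until the parent resumes us */
        _exit(0);
    }

    int status, rc = -1;
    if (waitpid(pid, &status, 0) == pid && WIFSTOPPED(status)) {
        struct user_regs_struct regs;
        struct iovec iov = { .iov_base = &regs, .iov_len = sizeof regs };

        /* PEEKDATA returns the word itself, so errors are reported via errno. */
        errno = 0;
        long word = ptrace(PTRACE_PEEKDATA, pid, (void *)&marker, NULL);
        if (errno == 0 &&
            ptrace(PTRACE_GETREGSET, pid, (void *)(long)NT_PRSTATUS, &iov) == 0) {
#if defined(__x86_64__)
            *ip = regs.rip;
#elif defined(__aarch64__)
            *ip = regs.pc;
#else
#error "add the instruction-pointer field for this architecture"
#endif
            *peeked = word;
            rc = 0;
        }
    }
    kill(pid, SIGKILL);          /* the sketch never resumes the child */
    waitpid(pid, &status, 0);    /* reap it */
    return rc;
}
```

Calling inspect_child(&ip, &value) from the parent should yield the child's value of marker, while the parent's own copy is left untouched, which shows that the read really went through the kernel into the other process.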

How is Ptrace implemented?

Ptrace implementation is outside of the scope of this post, but I
don’t want to move the black-box one step above, so let me explain
quickly how it works (I’m no kernel expert, please correct me if I’m
wrong and excuse me if I simplify too much :-).

Ptrace
is part of the Linux kernel,
so it has access to all the kernel-level information about the
process:

  • reading
    and writing data? Linux’s
    copy_to/from_user
  • accessing
    CPU registers? Easy with
    copy_regset_to/from_user. (There
    is nothing complicated here, as CPU registers are saved somewhere in
    Linux’s
    struct task_struct *
    scheduler structures when the process is unscheduled.)
  • altering
    the signal handling? Update the field
    last_siginfo
  • single-stepping?
    Set the right flag
    (ARM,
    x86)
    on the process structure and, before triggering the execution, on the
    processor.
  • Ptrace is also hooked (see function
    ptrace_event)
    in many scheduling operations, so that it can send a
    SIGTRAP signal
    to the debugger if requested (PTRACE_O_TRACEEXEC option and its
    family).

What about systems without Ptrace?

The explanation above targeted Linux native debugging, but it’s valid
for most of the other environments. To get a clue on what GDB asks of
its different targets, you can take a look at the operations of its
target stack.

In this target interface, you can see all the high-level operations
required for C debugging:

struct target_ops
{
  struct target_ops *beneath;  /* To the target under this one.  */
  const char *to_shortname;    /* Name this target type.  */
  const char *to_longname;     /* Name for printing.  */
  const char *to_doc;          /* Documentation.  Does not include trailing
                                  newline, and starts with a one-line
                                  description (probably similar to
                                  to_longname).  */

  void (*to_attach) (struct target_ops *ops, const char *, int);
  void (*to_fetch_registers) (struct target_ops *, struct regcache *, int);
  void (*to_store_registers) (struct target_ops *, struct regcache *, int);
  int (*to_insert_breakpoint) (struct target_ops *, struct gdbarch *,
                               struct bp_target_info *);
  int (*to_insert_watchpoint) (struct target_ops *,
                               CORE_ADDR, int, int, struct expression *);
  ...
};

The generic part of GDB calls these functions, and the target-specific
components implement them. It is (conceptually) shaped as a stack, or a
pyramid: the top of the stack is quite generic.

The
remote
target is interesting, as it splits the execution stack between two
“computers”, through a communication protocol (TCP/IP, serial port).

The remote part can be gdbserver, running on another Linux box. But
it can also be an interface to a hardware-debugging port (JTAG) or a
virtual machine hypervisor (e.g.
Qemu), that will play the role of
the kernel+ptrace. Instead of querying the OS kernel structures, the
remote debugger stub will query the hypervisor structures, or directly
the hardware registers of the processor.

For further reading about this remote protocol, Embecosm wrote a
detailed guide about the different messages. The gdbserver
event-processing loop and the
Qemu gdb-server stub
are also available online.
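
As a small illustration of what travels over that link: packets in GDB's remote serial protocol are framed as `$`, the payload, `#`, and a two-digit hexadecimal checksum equal to the sum of the payload bytes modulo 256. The sketch below builds such a frame. The function names rsp_checksum and rsp_frame are hypothetical, and the sketch deliberately ignores the escaping that payloads containing `$`, `#` or `}` would need.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Checksum of a payload: sum of its bytes, modulo 256. */
unsigned char rsp_checksum(const char *payload)
{
    unsigned int sum = 0;
    for (const unsigned char *p = (const unsigned char *)payload; *p; p++)
        sum += *p;
    return (unsigned char)(sum & 0xff);
}

/* Write "$<payload>#<checksum>" into out.
   Returns the frame length, or -1 if out is too small. */
int rsp_frame(const char *payload, char *out, size_t out_size)
{
    assert(payload != NULL && out != NULL);
    int n = snprintf(out, out_size, "$%s#%02x", payload, rsp_checksum(payload));
    return (n < 0 || (size_t)n >= out_size) ? -1 : n;
}
```

For instance, the one-letter payload `g` (the request to read the general registers) is framed as `$g#67`, because the ASCII code of `g` is 0x67.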

To sum up

We can see here that all the low-level mechanisms required to
implement a debugger are there, provided by this ptrace API:

  • Catch the exec syscall and block the start of the execution,
  • Query the CPU registers to get the process’s current instruction and
    stack location,
  • Catch clone/fork events to detect new threads,
  • Peek and poke data addresses to read and alter memory variables.

But is that all a debugger does? No, those are just the very low-level
parts … It also deals with symbol handling, that is, the link between the
binary code and the program sources. And one thing is still missing,
maybe the most important one: breakpoints! I’ll first explain how
breakpoints work as it’s quite interesting and tricky, then I’ll come
back to symbol management.

Breakpoints are not part of Ptrace API

As we’ve seen above, breakpoints are not part of the ptrace API
services. But we can alter the memory, and receive the debuggee’s
signals. You can’t see the link? That’s because breakpoint
implementation is quite tricky and hacky! Let’s see how to set a
breakpoint at a given address:

  1. The debugger reads (ptrace peek) the binary instruction stored at
    this address, and saves it in its data structures.
  2. It writes an invalid instruction at this location. Whatever this
    instruction, it just has to be invalid. (In practice, x86 debuggers
    use the dedicated one-byte breakpoint instruction int3, opcode
    0xCC, which traps rather than being strictly invalid.)
  3. When the debuggee reaches this invalid instruction (or, put more
    correctly, the processor, set up with the debuggee memory context),
    it won’t be able to execute it (because it’s invalid).
  4. In modern multitask OSes, an invalid instruction doesn’t crash the
    whole system, but it gives the control back to the OS kernel, by
    raising an interruption (or a fault).
  5. This interruption is translated by Linux into a SIGTRAP signal,
    and transmitted to the process … or to its parent, as the debugger
    asked for.
  6. The debugger gets the information about the signal, and checks the
    value of the debuggee’s instruction pointer (i.e., where the trap
    occurred). If the IP address is in its breakpoint list, that means
    it’s a debugger breakpoint (otherwise, it’s a fault in the process,
    just pass the signal and let it crash).
  7. Now that the debuggee is stopped at the breakpoint, the debugger
    can let its user do whatever s/he wants, until it’s time to continue
    the execution.
  8. To continue, the debugger needs to 1/ write the correct instruction
    back in the debuggee’s memory, 2/ single-step it (continue the
    execution for one CPU instruction, with ptrace single-step) and 3/
    write the invalid instruction back (so that the execution can stop
    again next time). And 4/, let the execution flow normally.

Neat, isn’t it? As a side remark, you can notice that this algorithm
will not work if not all the threads are stopped at the same time
(because running threads may pass the breakpoint when the valid
instruction is in place). I won’t detail the way the GDB guys solved it,
but it’s discussed in detail in this paper:
Non-stop Multi-threaded Debugging in GDB. Put
briefly, they write the instruction somewhere else in memory, set the
instruction pointer to that location and single-step the
processor. But the problem is that some instructions are
address-related, for instance the jumps and conditional jumps …
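
The single-threaded breakpoint sequence from the numbered steps can be sketched with real ptrace calls. This is only an illustration under stated assumptions, not GDB's implementation: it assumes Linux on x86-64, uses the int3 opcode (0xCC) as the trapping instruction, and relies on the fact that after int3 executes the instruction pointer is one byte past the breakpoint address. The names target_fn, wait_for_stop and run_with_breakpoint are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

#if !defined(__x86_64__)
#error "This sketch assumes Linux on x86-64 (int3 = 0xCC, regs.rip)."
#endif

/* Hypothetical function the child calls; the breakpoint goes on its entry. */
__attribute__((noinline)) void target_fn(void)
{
    __asm__ volatile("" ::: "memory");   /* keep the call from being elided */
}

/* Wait until the child stops with the given signal; 1 on success. */
static int wait_for_stop(pid_t pid, int sig)
{
    int status;
    return waitpid(pid, &status, 0) == pid && WIFSTOPPED(status) &&
           WSTOPSIG(status) == sig;
}

/* Runs a child under one breakpoint. Returns the child's exit code
   (42 when all goes well) or -1; *hits counts breakpoint hits. */
int run_with_breakpoint(int *hits)
{
    assert(hits != NULL);
    *hits = 0;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        raise(SIGSTOP);              /* give the parent time to set up */
        target_fn();
        _exit(42);
    }

    uintptr_t addr = (uintptr_t)target_fn;  /* same address in the child */
    struct user_regs_struct regs;
    long orig, trap;
    int status;

    if (!wait_for_stop(pid, SIGSTOP)) goto fail;

    /* Step 1: save the original word. Step 2: patch in int3 (low byte). */
    errno = 0;
    orig = ptrace(PTRACE_PEEKTEXT, pid, (void *)addr, NULL);
    if (errno != 0) goto fail;
    trap = (orig & ~0xffL) | 0xcc;
    if (ptrace(PTRACE_POKETEXT, pid, (void *)addr, (void *)trap) < 0) goto fail;
    if (ptrace(PTRACE_CONT, pid, NULL, NULL) < 0) goto fail;

    /* Steps 3-6: the trap arrives as SIGTRAP; check where it happened. */
    if (!wait_for_stop(pid, SIGTRAP)) goto fail;
    if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) < 0) goto fail;
    if (regs.rip - 1 != addr) goto fail;     /* not our breakpoint */
    (*hits)++;

    /* Step 8: restore the instruction, rewind, single-step, re-insert. */
    if (ptrace(PTRACE_POKETEXT, pid, (void *)addr, (void *)orig) < 0) goto fail;
    regs.rip = addr;
    if (ptrace(PTRACE_SETREGS, pid, NULL, &regs) < 0) goto fail;
    if (ptrace(PTRACE_SINGLESTEP, pid, NULL, NULL) < 0) goto fail;
    if (!wait_for_stop(pid, SIGTRAP)) goto fail;
    if (ptrace(PTRACE_POKETEXT, pid, (void *)addr, (void *)trap) < 0) goto fail;
    if (ptrace(PTRACE_CONT, pid, NULL, NULL) < 0) goto fail;

    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);

fail:
    kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    return -1;
}
```

A real debugger would hand control to the user between the hit and the restore (step 7), and would keep a table of saved instructions rather than a single orig value.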

Symbol and debug information handling

Now, let’s come back to the symbol and debug information handling
aspect. I didn’t study that part in detail, so I’ll only present an
overview.

First of all, can we debug without debug information and symbol
addresses? The answer is yes because, as we’ve seen above, all the
low-level commands deal with CPU registers and memory addresses, and
not source-level information. Hence, the link with the sources is
only for the user’s convenience. Without debug information, you’ll see
your application the way the processor (and the kernel) see it: as
binary (assembly) instructions and memory bits. GDB doesn’t need any
further information to translate binary data into CPU instructions:

(gdb) x/10x $pc # heXadecimal representation
0x402c60:   0x56415741  0x54415541  0x55f48949  0x4853fd89
0x402c70:   0x03a8ec81  0x8b480000  0x8b48643e  0x00282504
0x402c80:   0x89480000  0x03982484
(gdb) x/10i $pc # Instruction representation
=> 0x402c60:    push   %r15
0x402c62:   push   %r14
0x402c64:   push   %r13
0x402c66:   push   %r12
0x402c68:   mov    %rsi,%r12
0x402c6b:   push   %rbp
0x402c6c:   mov    %edi,%ebp
0x402c6e:   push   %rbx
0x402c6f:   sub    $0x3a8,%rsp
0x402c76:   mov    (%rsi),%rdi

Now if we add symbol handling information, GDB can match addresses
with symbol names:

(gdb) p $pc
$1 = (void (*)()) 0x402c60 <main>

You can list the symbols of an ELF binary with nm -a $file:

nm -a /usr/lib/debug/usr/bin/ls.debug | grep " main"
0000000000402c60 T main

GDB will also be able to display the stack trace (more on that later),
but with a limited interest:

(gdb) where
#0  write ()
#1  0x0000003d492769e3 in _IO_new_file_write ()
#2  0x0000003d49277e4c in new_do_write ()
#3  _IO_new_do_write ()
#4  0x0000003d49278223 in _IO_new_file_overflow ()
#5  0x00000000004085bb in print_current_files ()
#6  0x000000000040431b in main ()

We’ve got the PC addresses and the corresponding function, but that’s
it. Inside a function, you’ll need to debug in assembly!

Now let’s add debug information: that’s the DWARF standard, gcc -g
option. I’m not very familiar with this standard, but I know it
provides:

  • address to line and line to address mapping
  • data type definitions, including typedefs and structures
  • local variables and function parameters, with their type

Try dwarfdump to see the information embedded in your
binaries. addr2line also uses this information:

$ dwarfdump /usr/lib/debug/usr/bin/ls.debug | grep 402ce4
0x00402ce4  [1289, 0] NS
$ addr2line -e /usr/lib/debug/usr/bin/ls.debug  0x00402ce4
/usr/src/debug/coreutils-8.21/src/ls.c: 1289

Many source-level debugging commands rely on this information,
like the command next, that sets a breakpoint at the address of the
next line, or the print command, that relies on the types to display the
variables in the right type (char, int, float, instead of
binary/hexadecimal!).
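
To give an idea of how an address-to-line mapping can be used, here is a simplified sketch that models the line information as a plain sorted array. This is an assumption made for illustration: real DWARF line tables are encoded as a compact state-machine program, not as an array. The names line_entry, sample_table, addr_to_line and line_to_addr are hypothetical, and only the pair 0x402ce4 / line 1289 comes from the addr2line output above; the other rows are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row: the first address of the code generated for a source line. */
struct line_entry {
    uint64_t addr;
    int line;
};

/* Illustrative table, sorted by address. Only the middle row is taken
   from the addr2line example; the two others are invented. */
static const struct line_entry sample_table[] = {
    { 0x402c60, 1286 },
    { 0x402ce4, 1289 },
    { 0x402cf0, 1290 },
};

/* Address -> line: binary search for the last row whose address is <= addr.
   Returns -1 if addr lies before the first row. */
int addr_to_line(const struct line_entry *table, size_t n, uint64_t addr)
{
    assert(table != NULL || n == 0);
    size_t lo = 0, hi = n;               /* search window is [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].addr <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* lo is now the number of rows with address <= addr. */
    return lo == 0 ? -1 : table[lo - 1].line;
}

/* Line -> address: first row recorded for that line, or 0 if none.
   This is the kind of lookup needed to place a breakpoint on a line. */
uint64_t line_to_addr(const struct line_entry *table, size_t n, int line)
{
    assert(table != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (table[i].line == line)
            return table[i].addr;
    return 0;
}
```

With this table, any address from 0x402ce4 up to (but not including) 0x402cf0 maps back to line 1289, which is why a program counter in the middle of a statement can still be reported with a source line.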

Final words

We’ve seen many aspects of a debugger’s internals, so I’ll just say a
few words about the last points:

  • the stack trace is “unwound” from the current frame ($sp and
    $bp/#fp) upwards, one frame at a time. Functions’ names,
    parameters and local variables are found in the debug information.
  • watchpoints are implemented (if available) with the help of the
    processor: write in its registers which addresses should be
    monitored, and it will raise an exception when the memory is read or
    written. If this support is not available, or if you request more
    watchpoints than the processor supports … then the debugger falls
    back to “hand-made” watchpoints: execute the application instruction
    by instruction, and check if the current operation touches a
    watchpointed address. Yes, that’s very slow!
  • Reverse debugging can be done this way too: record the effect of
    each instruction, and apply it backward for reverse execution.
  • Conditional breakpoints are normal breakpoints, except that,
    internally, the debugger checks the conditions before giving the
    control to the user. If the condition is not matched, the execution
    is silently continued.

And play with gdb gdb, or better (way better actually), gdb --pid $(pidof gdb), because two debuggers in the same terminal is insane :-). Another great thing for learning is system debugging:

qemu-system-i386 -gdb tcp::1234
gdb --pid $(pidof qemu-system-i386)
gdb /boot/vmlinuz -ex "target remote localhost:1234"

but I’ll keep that for another article!
