Explore variable handling techniques in Linux debuggers!

Introduction

Variables are sneaky. One moment they're sitting happily in a register, and as soon as you turn around they've been spilled to the stack. For the sake of optimization, the compiler may throw them out of the window entirely. However variables move around in memory, we need some way to track and manipulate them in the debugger. This article shows how to handle variables in a debugger and demonstrates a simple implementation using libelfin.

Series Article Index

Setting up the environment
Breakpoints
Registers and memory
ELF and DWARF
Source and signals
Source-level stepping
Source-level breakpoints
Stack unwinding
Handling variables
Advanced topics

Before starting, make sure that the version of libelfin you are using is the fbreg branch of my fork. It contains a few hacks to support getting the base address of the current stack frame and evaluating location lists, neither of which is supported by vanilla libelfin. You may need to pass -gdwarf-2 to GCC to get it to generate compatible DWARF information. But before getting into the implementation, I'll detail how locations are encoded in the latest DWARF 5 specification. If you want to know more, you can get the standard here.

DWARF locations

The location of a variable in memory at a given moment is encoded in the DWARF information using the DW_AT_location attribute. A location description can be a simple location description, a composite location description, or a location list.

Simple location description: Describes the location of one contiguous piece (usually all) of an object. A simple location description may describe a location in addressable memory or in a register, or the lack of a location (with or without a known value). For example, DW_OP_fbreg -32: a variable stored entirely at an offset of -32 bytes from the stack frame base.
Composite location description: Describes an object in terms of pieces, each of which may be contained in part of a register or stored in a memory location unrelated to the other pieces. For example, DW_OP_reg3 DW_OP_piece 4 DW_OP_reg10 DW_OP_piece 2: a variable whose first four bytes are in register 3 and whose next two bytes are in register 10.
Location list: Describes an object that has a limited lifetime or changes location during its lifetime. For example:
  - [ 0] DW_OP_reg0
  - [ 1] DW_OP_reg3
  - [ 2] DW_OP_reg2
A variable whose location moves between registers depending on the current value of the program counter.
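
To make the location list idea concrete, here is a minimal sketch of how a debugger might pick the entry that applies at the current program counter. The struct, the function names, and the address ranges are all hypothetical and chosen only for illustration; real entries hold expression bytes rather than text, and this is not how libelfin represents them.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One entry of a location list: the expression applies while
// low_pc <= pc < high_pc (the upper bound is exclusive).
struct loclist_entry {
    std::uint64_t low_pc;
    std::uint64_t high_pc;
    std::string   expr;  // kept as text here; real entries hold expression bytes
};

// Return the entry that is valid at `pc`, or nullptr if the variable has
// no location there (for example because it has been optimized out).
const loclist_entry* lookup_location(const std::vector<loclist_entry>& list,
                                     std::uint64_t pc) {
    for (const auto& entry : list) {
        assert(entry.low_pc <= entry.high_pc);  // sanity check on the range
        if (pc >= entry.low_pc && pc < entry.high_pc) {
            return &entry;
        }
    }
    return nullptr;
}

// Hypothetical address ranges for the three-entry list shown above.
std::vector<loclist_entry> example_list() {
    return {
        {0x1000, 0x1010, "DW_OP_reg0"},
        {0x1010, 0x1030, "DW_OP_reg3"},
        {0x1030, 0x1040, "DW_OP_reg2"},
    };
}
```

With these made-up ranges, a program counter of 0x1010 selects the DW_OP_reg3 entry, and 0x1040 falls outside every range, so the lookup returns nullptr.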

DW_AT_location is encoded in one of three ways, depending on the kind of location description. exprloc encodes simple and composite location descriptions. It consists of a byte length followed by a DWARF expression or location description. loclist and loclistptr encode location lists. They give an index or offset into the .debug_loclists section, which describes the actual location lists.
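
As a rough illustration of the exprloc layout, the sketch below splits a raw attribute value into its expression bytes. It assumes the length prefix is an unsigned LEB128 number, as used by DW_FORM_exprloc; the function names are my own, and libelfin does this parsing internally.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode an unsigned LEB128 value starting at data[pos]; advances pos.
std::uint64_t read_uleb128(const std::vector<std::uint8_t>& data, std::size_t& pos) {
    std::uint64_t result = 0;
    unsigned shift = 0;
    while (true) {
        assert(pos < data.size() && shift < 64);
        std::uint8_t byte = data[pos++];
        // Each byte contributes its low seven bits, least significant first.
        result |= static_cast<std::uint64_t>(byte & 0x7f) << shift;
        if ((byte & 0x80) == 0) break;  // high bit clear marks the last byte
        shift += 7;
    }
    return result;
}

// Read the length prefix, then return that many expression bytes.
std::vector<std::uint8_t> read_exprloc(const std::vector<std::uint8_t>& data,
                                       std::size_t& pos) {
    std::uint64_t length = read_uleb128(data, pos);
    assert(pos + length <= data.size());
    std::vector<std::uint8_t> expr(data.begin() + pos, data.begin() + pos + length);
    pos += length;
    return expr;
}
```

For instance, the bytes 0x02 0x91 0x60 describe a two-byte expression: 0x91 is DW_OP_fbreg and 0x60 is its operand, -32, encoded as signed LEB128.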

DWARF expressions

The actual location of a variable is calculated using DWARF expressions. These consist of a series of operations that manipulate a stack of values. There are a great many DWARF operations available, so I won't explain them all in detail. Instead, I'll give a few examples from each class of expression to give you something to work with. Also, don't be scared off by these; libelfin will handle all of this complexity for us.

Literal encoding
- DW_OP_lit0, DW_OP_lit1, ..., DW_OP_lit31
  - Push literals onto the stack
- DW_OP_addr
  - Push the address operand onto the stack
- DW_OP_constu
  - Push unsigned value onto stack
Register value
- DW_OP_fbreg
  - Push the frame base address of the current function, offset by the given signed value
- DW_OP_breg0, DW_OP_breg1... DW_OP_breg31
  - Push the contents of the given register plus the given offset onto the stack
Stack operations
- DW_OP_dup
  - Copy the value at the top of the stack
- DW_OP_deref
  - Treat the top of the stack as a memory address and replace it with the contents of that address
Arithmetic and logical operations
- DW_OP_and
  - Pop the two values at the top of the stack and push back their bitwise AND
- DW_OP_plus
  - Same as DW_OP_and, but adds the two values
Control flow operations
- DW_OP_le, DW_OP_eq, DW_OP_gt, etc.
  - Pop the top two values, compare them, and push 1 if the condition is true, 0 otherwise
- DW_OP_bra
  - Conditional branch: If the top of the stack is not 0, skip forward or backward in the expression via offset
Type conversions
- DW_OP_convert
  - Convert the value at the top of the stack to a different type, which is described by the DWARF info entry at the given offset
Special operations
- DW_OP_nop
  - Do nothing!
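
To show how these operations drive a stack, here is a minimal sketch of an evaluator for a handful of them. The opcode values are the ones assigned by the DWARF standard, but everything else is an assumption made for illustration: the frame base is passed in as a plain number rather than computed from DW_AT_frame_base, only a few operations are handled, and this is not how libelfin's evaluator is structured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Opcode values as assigned by the DWARF standard.
constexpr std::uint8_t DW_OP_dup   = 0x12;
constexpr std::uint8_t DW_OP_and   = 0x1a;
constexpr std::uint8_t DW_OP_plus  = 0x22;
constexpr std::uint8_t DW_OP_lit0  = 0x30;
constexpr std::uint8_t DW_OP_lit31 = 0x4f;
constexpr std::uint8_t DW_OP_fbreg = 0x91;
constexpr std::uint8_t DW_OP_nop   = 0x96;

// Decode a signed LEB128 operand starting at expr[pos]; advances pos.
std::int64_t read_sleb128(const std::vector<std::uint8_t>& expr, std::size_t& pos) {
    std::uint64_t result = 0;
    unsigned shift = 0;
    std::uint8_t byte = 0;
    do {
        assert(pos < expr.size() && shift < 64);
        byte = expr[pos++];
        result |= static_cast<std::uint64_t>(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);
    if (shift < 64 && (byte & 0x40)) {
        result |= ~std::uint64_t{0} << shift;  // sign-extend negative values
    }
    return static_cast<std::int64_t>(result);
}

// Evaluate an expression and return the value left on top of the stack.
// frame_base stands in for what a debugger would derive from DW_AT_frame_base.
std::uint64_t evaluate(const std::vector<std::uint8_t>& expr, std::uint64_t frame_base) {
    std::vector<std::uint64_t> stack;
    auto pop = [&stack]() -> std::uint64_t {
        if (stack.empty()) throw std::runtime_error{"stack underflow"};
        std::uint64_t v = stack.back();
        stack.pop_back();
        return v;
    };
    std::size_t pos = 0;
    while (pos < expr.size()) {
        std::uint8_t op = expr[pos++];
        if (op >= DW_OP_lit0 && op <= DW_OP_lit31) {
            stack.push_back(op - DW_OP_lit0);  // literal 0..31 encoded in the opcode
        } else if (op == DW_OP_fbreg) {
            stack.push_back(frame_base + static_cast<std::uint64_t>(read_sleb128(expr, pos)));
        } else if (op == DW_OP_dup) {
            auto v = pop(); stack.push_back(v); stack.push_back(v);
        } else if (op == DW_OP_and) {
            auto b = pop(); auto a = pop(); stack.push_back(a & b);
        } else if (op == DW_OP_plus) {
            auto b = pop(); auto a = pop(); stack.push_back(a + b);
        } else if (op == DW_OP_nop) {
            // nothing to do
        } else {
            throw std::runtime_error{"unsupported operation"};
        }
    }
    return pop();
}
```

Evaluating the bytes 0x91 0x60 (DW_OP_fbreg -32) with a made-up frame base of 0x1000 yields 0xfe0, the address 32 bytes below the frame base.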

DWARF types

DWARF type representations need to be powerful enough to provide useful variable representations to debugger users. Users often want to be able to debug at the application level rather than at the machine level, and they need to understand what their variables are doing.

DWARF types are encoded in DIEs along with most of the other debugging information. They can have attributes indicating their name, encoding, byte size, and so on. A myriad of type tags are available to represent pointers, arrays, structures, typedefs, and anything else you might see in a C or C++ program.

Take this simple structure as an example:

struct test {
    int i;
    float j;
    int k[42];
    test* next;
};

The parent DIE of this structure is like this:

< 1><0x0000002a>    DW_TAG_structure_type
                      DW_AT_name                  "test"
                      DW_AT_byte_size             0x000000b8
                      DW_AT_decl_file             0x00000001 test.cpp
                      DW_AT_decl_line             0x00000001

The above says that we have a structure called test, with a size of 0xb8, declared on line 1 of test.cpp. It is followed by a number of child DIEs that describe the members.

< 2><0x00000032>      DW_TAG_member
                        DW_AT_name                  "i"
                        DW_AT_type                  <0x00000063>
                        DW_AT_decl_file             0x00000001 test.cpp
                        DW_AT_decl_line             0x00000002
                        DW_AT_data_member_location  0
< 2><0x0000003e>      DW_TAG_member
                        DW_AT_name                  "j"
                        DW_AT_type                  <0x0000006a>
                        DW_AT_decl_file             0x00000001 test.cpp
                        DW_AT_decl_line             0x00000003
                        DW_AT_data_member_location  4
< 2><0x0000004a>      DW_TAG_member
                        DW_AT_name                  "k"
                        DW_AT_type                  <0x00000071>
                        DW_AT_decl_file             0x00000001 test.cpp
                        DW_AT_decl_line             0x00000004
                        DW_AT_data_member_location  8
< 2><0x00000056>      DW_TAG_member
                        DW_AT_name                  "next"
                        DW_AT_type                  <0x00000084>
                        DW_AT_decl_file             0x00000001 test.cpp
                        DW_AT_decl_line             0x00000005
                        DW_AT_data_member_location  176(as signed = -80)

Each member has a name, a type (which is a DIE offset), a declaration file and line, and a byte offset into the structure where the member is located. The types they point to come next.

< 1><0x00000063>    DW_TAG_base_type
                      DW_AT_name                  "int"
                      DW_AT_encoding              DW_ATE_signed
                      DW_AT_byte_size             0x00000004
< 1><0x0000006a>    DW_TAG_base_type
                      DW_AT_name                  "float"
                      DW_AT_encoding              DW_ATE_float
                      DW_AT_byte_size             0x00000004
< 1><0x00000071>    DW_TAG_array_type
                      DW_AT_type                  <0x00000063>
< 2><0x00000076>      DW_TAG_subrange_type
                        DW_AT_type                  <0x0000007d>
                        DW_AT_count                 0x0000002a
< 1><0x0000007d>    DW_TAG_base_type
                      DW_AT_name                  "sizetype"
                      DW_AT_byte_size             0x00000008
                      DW_AT_encoding              DW_ATE_unsigned
< 1><0x00000084>    DW_TAG_pointer_type
                      DW_AT_type                  <0x0000002a>

As you can see, int on my laptop is a 4-byte signed integer type, and float is a 4-byte floating point number. The integer array type is defined by pointing to the int type as its element type and to sizetype (think of it as size_t) as its index type, with a count of 0x2a (42) elements. The test* type is a DW_TAG_pointer_type, which refers to the test DIE.
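
The offsets and sizes in the dump can be cross-checked against what the compiler itself reports. The sketch below does this with offsetof and sizeof. The layout struct and both function names are hypothetical, and the expected numbers assume a typical x86-64 Linux build like the one that produced the dump; other ABIs may lay the structure out differently.

```cpp
#include <cassert>
#include <cstddef>

struct test {
    int i;
    float j;
    int k[42];
    test* next;
};

// Member offsets and total size, analogous to DW_AT_data_member_location
// and DW_AT_byte_size.
struct layout {
    std::size_t i, j, k, next, total;
};

// Report what the compiler actually chose for this build.
layout measure_layout() {
    // The first member of a standard-layout struct always sits at offset 0.
    assert(offsetof(test, i) == 0);
    return {offsetof(test, i), offsetof(test, j), offsetof(test, k),
            offsetof(test, next), sizeof(test)};
}

// Compare against the values shown in the DIE dump above.
bool layout_matches_dwarf_dump() {
    const layout l = measure_layout();
    return l.i == 0 && l.j == 4 && l.k == 8 && l.next == 176 && l.total == 0xb8;
}
```

The arithmetic behind the dump is visible here: k starts at byte 8 and occupies 42 * 4 = 168 bytes, so next lands at byte 176, and the 8-byte pointer brings the total to 184, which is 0xb8.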

Implementing a simple variable reader

As mentioned above, libelfin will handle most of the complexity for us. However, it does not implement all of the methods for representing variable locations, and handling them all in our code would get very complex. As such, I've chosen to support only exprloc for now. Please add support for more types of expressions as you see fit. If you're really brave, submit a patch to libelfin to help complete the necessary support!

Handling variables is mostly a matter of locating their different parts in memory or registers; reading and writing then work the same as before. To keep things simple, I'll only show how to implement reading.

First we need to tell libelfin how to read registers from our process. We create a class that inherits from expr_context and use ptrace to handle everything:

class ptrace_expr_context : public dwarf::expr_context {
public:
    ptrace_expr_context (pid_t pid) : m_pid{pid} {}

    dwarf::taddr reg (unsigned regnum) override {
        return get_register_value_from_dwarf_register(m_pid, regnum);
    }

    dwarf::taddr pc() override {
        struct user_regs_struct regs;
        ptrace(PTRACE_GETREGS, m_pid, nullptr, &regs);
        return regs.rip;
    }

    dwarf::taddr deref_size (dwarf::taddr address, unsigned size) override {
        //TODO take into account size
        return ptrace(PTRACE_PEEKDATA, m_pid, address, nullptr);
    }

private:
    pid_t m_pid;
};
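
One possible way to address the TODO in deref_size is to mask the word returned by ptrace down to the requested number of bytes. The helper below is a sketch of that idea only; truncate_to_size is a hypothetical name, and it assumes a little-endian target such as x86-64, where the bytes at the lowest addresses end up in the low-order bits of the word.

```cpp
#include <cassert>
#include <cstdint>

// Keep only the lowest `size` bytes of a word read from the tracee.
std::uint64_t truncate_to_size(std::uint64_t word, unsigned size) {
    assert(size >= 1 && size <= 8);
    if (size == 8) return word;  // shifting by 64 bits would be undefined
    std::uint64_t mask = (std::uint64_t{1} << (size * 8)) - 1;
    return word & mask;
}
```

Under those assumptions, deref_size could pass the result of its PTRACE_PEEKDATA call through truncate_to_size together with the size argument before returning it.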

Reading will be handled by the read_variables function in our debugger class:

void debugger::read_variables() {
    using namespace dwarf;

    auto func = get_function_from_pc(get_pc());

    //...
}

The first thing we do above is find the function we are currently in. Then we need to loop through the entries in that function to look for variables:

for (const auto& die : func) {
    if (die.tag == DW_TAG::variable) {
        //...
    }
}

We get the location information by looking up the DW_AT_location entry in the DIE:

auto loc_val = die[DW_AT::location];

Next we make sure it's an exprloc and ask libelfin to evaluate our expression:

if (loc_val.get_type() == value::type::exprloc) {
    ptrace_expr_context context {m_pid};
    auto result = loc_val.as_exprloc().evaluate(&context);

Now that we have evaluated the expression, we need to read the contents of the variable. It can be in memory or registers, so we'll handle both cases:

switch (result.location_type) {
case expr_result::type::address:
{
    auto value = read_memory(result.value);
    std::cout << at_name(die) << " (0x" << std::hex << result.value << ") = "
              << value << std::endl;
    break;
}

case expr_result::type::reg:
{
    auto value = get_register_value_from_dwarf_register(m_pid, result.value);
    std::cout << at_name(die) << " (reg " << result.value << ") = "
              << value << std::endl;
    break;
}

default:
    throw std::runtime_error{"Unhandled variable location"};
}

As you can see, I've simply printed out the raw value without interpreting it according to the type of the variable. Hopefully this code shows how you could add support for writing variables, or for searching for a variable with a given name.
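
As a hint at what type-aware printing could look like, here is a minimal sketch that formats a raw word according to a base type's encoding and byte size, the two attributes shown in the base type DIEs earlier. The encoding enum and format_value are hypothetical names that are not part of libelfin, and the sketch handles only 4-byte and 8-byte values on a little-endian machine.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <sstream>
#include <string>

// A small subset of base type encodings, mirroring DW_ATE_signed,
// DW_ATE_unsigned and DW_ATE_float.
enum class encoding { signed_int, unsigned_int, floating };

// Format the low `byte_size` bytes of `raw` as the given kind of value.
std::string format_value(std::uint64_t raw, encoding enc, unsigned byte_size) {
    assert(byte_size == 4 || byte_size == 8);
    std::ostringstream out;
    switch (enc) {
    case encoding::signed_int:
        if (byte_size == 4) out << static_cast<std::int32_t>(static_cast<std::uint32_t>(raw));
        else out << static_cast<std::int64_t>(raw);
        break;
    case encoding::unsigned_int:
        if (byte_size == 4) out << static_cast<std::uint32_t>(raw);
        else out << raw;
        break;
    case encoding::floating:
        if (byte_size == 4) {
            // Reinterpret the low 32 bits as an IEEE 754 float.
            std::uint32_t bits = static_cast<std::uint32_t>(raw);
            float f;
            std::memcpy(&f, &bits, sizeof f);
            out << f;
        } else {
            double d;
            std::memcpy(&d, &raw, sizeof d);
            out << d;
        }
        break;
    }
    return out.str();
}
```

The same 32 bits can read very differently depending on the type: 0xffffffff is -1 as a signed int but 4294967295 as an unsigned one.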

Finally we can add this to our command parser:

else if(is_prefix(command, "variables")) {
    read_variables();
}

Testing it out

Write some small function with some variables, compile it without optimization and with debug information, and then see if you can read the value of the variable. Try writing to the memory address where the variable is stored and see how the program changes behavior.

Nine articles down, one to go! Next time I will discuss some more advanced concepts that may be of interest to you. For now, you can find the code for this post here.
