Tracking variables and their values – Advanced IR Generation-2
Again, we have to specify a source location for the variable. An instance of llvm::DILocation is a container that holds the line and column of a location associated with a scope. The insertDeclare() method then adds the call to the llvm.dbg.declare intrinsic function to the LLVM IR. As parameters, it requires the address of the variable, stored in Val, and the debug metadata for the variable, stored in DbgValVar. We also pass an empty address expression and the debug location we created previously. As with normal instructions, we need to specify where the call is inserted. If we specify a basic block, then the call is appended to the end of that block. Alternatively, we can specify an instruction, and the call is inserted before that instruction. We also have the pointer to the alloca instruction, which is the last instruction that we inserted into the underlying basic block. Therefore, we can use the basic block of the alloca instruction, and the call is appended directly after it.
If the value of a local variable changes, then a call to llvm.dbg.value must be added to the IR to record the new value. The insertDbgValueIntrinsic() method of the llvm::DIBuilder class can be used to achieve this.
When we implemented the IR generation for functions, we used an advanced algorithm that mainly used values and avoided allocating storage for local variables. In terms of adding debug information, this only means that we use llvm.dbg.value much more often than you see it in clang-generated IR.
What can we do if the variable does not have dedicated storage space but is part of a larger, aggregate type? One situation where this arises is the use of nested functions. To implement access to the stack frame of the caller, you must collect all used variables in a structure and pass a pointer to this record to the called function. Inside the called function, you can refer to the variables of the caller as if they were local to the function. The difference is that these variables are now part of an aggregate.
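The frame-passing idea can be sketched in plain C++, independent of LLVM. All names here (OuterFrame, step, outer) are hypothetical and chosen only for illustration; a real compiler would generate the equivalent record and pointer parameter in IR rather than in source code.

```cpp
// Hypothetical frame record: the variables of the outer function that the
// nested function uses, collected into one aggregate.
struct OuterFrame {
  int Count;
  int Limit;
};

// Stand-in for the nested function. It receives a pointer to the caller's
// frame record and reads and writes the caller's variables through it.
bool step(OuterFrame *Frame) {
  Frame->Count += 1;
  return Frame->Count < Frame->Limit;
}

// Stand-in for the outer function. Its "local" variables live inside the
// frame record, so they are members of an aggregate, not separate allocas.
int outer() {
  OuterFrame Frame{0, 3};
  while (step(&Frame)) {
    // The loop body is empty; step() does all the work.
  }
  return Frame.Count;
}
```

In this sketch, Count and Limit have no storage of their own. A debugger that wants to show Limit inside step() has to start from the Frame pointer and locate the member within the record, which is exactly the case that needs a non-empty address expression.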
In the call to llvm.dbg.declare, you use an empty expression if the debug metadata describes the whole memory the first parameter is pointing to. However, if it only describes a part of the memory, then you need to add an expression indicating which part of the memory the metadata applies to.
In the case of the nested frame, you need to calculate the offset in the frame. You need access to a DataLayout instance, which you can get from the LLVM module into which you are creating the IR code. If the llvm::Module instance is named Mod, and the variable holding the nested frame structure is named Frame and is of the llvm::StructType type, you can access the third member of the frame in the following manner. This access gives you the offset of the member:
const llvm::DataLayout &DL = Mod->getDataLayout();
// Element indices are zero-based, so the third member has index 2.
uint64_t Offset = DL.getStructLayout(Frame)->getElementOffset(2);
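To make the meaning of such a member offset concrete, here is a simplified, LLVM-free model of how byte offsets in a non-packed structure come about. The Member type and the memberOffsets() function are hypothetical helpers, not part of LLVM. The real layout is computed from the alignments recorded in the DataLayout, so treat this only as an approximation of the idea.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one structure member: its size and its
// required alignment, both in bytes.
struct Member {
  std::uint64_t Size;
  std::uint64_t Align;
};

// Computes the byte offset of every member of a non-packed structure by
// rounding the running offset up to each member's alignment.
std::vector<std::uint64_t> memberOffsets(const std::vector<Member> &Members) {
  std::vector<std::uint64_t> Offsets;
  std::uint64_t Cur = 0;
  for (const Member &M : Members) {
    assert(M.Align != 0 && "alignment must be non-zero");
    // Round Cur up to the next multiple of the member's alignment.
    Cur = (Cur + M.Align - 1) / M.Align * M.Align;
    Offsets.push_back(Cur);
    Cur += M.Size;
  }
  return Offsets;
}
```

For a record with an 8-byte, a 4-byte, and another 8-byte member (each aligned to its size), the third member, at index 2, ends up at byte offset 16 because of the padding inserted after the second member.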
The expression is created from a sequence of operations. To access the third member of the frame, the debugger needs to add the offset to the base pointer. To express this, you create an array holding the operation and its operand, like so:
llvm::SmallVector<uint64_t, 2> AddrOps;
AddrOps.push_back(llvm::dwarf::DW_OP_plus_uconst);
AddrOps.push_back(Offset);
From this array, you can create the expression that you must then pass to llvm.dbg.declare instead of the empty expression:
llvm::DIExpression *Expr = DBuilder.createExpression(AddrOps);
You are not limited to this offset operation. DWARF defines many different operators, and you can create fairly complex expressions. You can find the complete list of operators in the LLVM include file llvm/include/llvm/BinaryFormat/Dwarf.def.
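To illustrate what a debugger does with such an operation sequence, the following sketch evaluates a tiny subset of DWARF operators on a stack that initially holds the base address. The evalAddress() function is a hypothetical helper written for this illustration and is not an LLVM API. The flat array of 64-bit values mirrors the array passed to createExpression(), not the byte encoding that ends up in the object file. The opcode values are the ones assigned by the DWARF standard.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Opcode values as assigned by the DWARF standard.
constexpr std::uint64_t DW_OP_constu = 0x10;      // push an unsigned constant
constexpr std::uint64_t DW_OP_plus = 0x22;        // add the two top entries
constexpr std::uint64_t DW_OP_plus_uconst = 0x23; // add a constant to the top

// Evaluates Ops with Base as the initial stack entry and stores the top of
// the stack in Result. Returns false for unsupported or malformed input.
bool evalAddress(std::uint64_t Base, const std::vector<std::uint64_t> &Ops,
                 std::uint64_t &Result) {
  std::vector<std::uint64_t> Stack{Base};
  for (std::size_t I = 0; I < Ops.size(); ++I) {
    switch (Ops[I]) {
    case DW_OP_plus_uconst:
      if (I + 1 >= Ops.size())
        return false; // the operand is missing
      Stack.back() += Ops[++I];
      break;
    case DW_OP_constu:
      if (I + 1 >= Ops.size())
        return false; // the operand is missing
      Stack.push_back(Ops[++I]);
      break;
    case DW_OP_plus: {
      if (Stack.size() < 2)
        return false; // needs two stack entries
      std::uint64_t Top = Stack.back();
      Stack.pop_back();
      Stack.back() += Top;
      break;
    }
    default:
      return false; // operator not covered by this sketch
    }
  }
  Result = Stack.back();
  return true;
}
```

With a frame at address 0x1000, the sequence DW_OP_plus_uconst, 16 yields 0x1010, and so does the longer sequence DW_OP_constu, 16, DW_OP_plus. Both describe the same location, which shows how more complex expressions are composed from simple stack operations.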
At this point, you can create debug information for variables. To enable the debugger to follow the control flow in the source, you also need to provide line number information. This is the topic of the next section.