DWARF: function return value types and parameter types
I've recently had to look deeper into the DWARF debugging standard. DWARF is the debugging format used with the ELF executable format. It is commonly used by debuggers, but also by tools like my Kcov code coverage tool. It's complex and quite daunting to get to grips with. Anyway, I have a small project where I needed help from DWARF. What I wanted from DWARF is the following:
- Looking up the type of function return values
- Looking up the type and number of arguments to functions
There are two libraries for reading DWARF: libdwarf and libdw from elfutils. libdwarf is the original one, while elfutils is newer and promises a slightly simplified and more modern API. They both install /usr/include/dwarf.h though, so you can only have one of the development packages installed at a time. Examples are easier to find for elfutils, so I wanted to use that. Unfortunately, Kcov is written for libdwarf (by Thomas Neumann), so I first rewrote it to use libdw instead.
So on to some DWARF concepts and an overview of my problem. A deeper introduction can be found in the Introduction to the DWARF debugging format by Michael J. Eager, written for the standard committee. Here, I'll limit myself to the steps needed to solve my particular issue. The figure below illustrates the data structures used for function return values and parameters.
The basic structure in DWARF is the DIE, the Debugging Information Entry. Each of the big, rounded blobs in the figure is a DIE. These are structured in a tree of sorts, where a DIE can have children and siblings. Each DIE is identified by a tag, which describes what type of DIE we're dealing with. The colors of the rounded blobs signify the tags, or DIE types. A DIE can also have attributes, which describe, well, attributes of the DIE. For example, DIEs often have names (a function name), types (a reference to int, double, etc.) and sizes (the size of a type).
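To make the child/sibling structure a bit more concrete, here is a small sketch of a DIE tree as a plain in-memory C++ structure. Everything in it (Die, Tag, add_child(), collect(), make_sample_cu()) is made up for illustration: real DIEs live in the debug sections of the ELF file and are read through the library, not through a struct like this. It only shows the shape of the data: each DIE has a tag, some attributes, a link to its first child and a link to its next sibling.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical tags, loosely mirroring a few DW_TAG_* values.
enum class Tag { CompileUnit, Subprogram, FormalParameter, BaseType };

// Hypothetical in-memory DIE: a tag, some attributes, and tree links.
struct Die {
    Tag tag;
    std::map<std::string, std::string> attrs;  // e.g. "name" -> "add"
    std::unique_ptr<Die> child;                // first child, if any
    std::unique_ptr<Die> sibling;              // next DIE on the same level
};

// Append a new DIE as the last child of parent and return a pointer to it.
Die *add_child(Die &parent, Tag tag, const std::string &name)
{
    auto node = std::make_unique<Die>();
    node->tag = tag;
    node->attrs["name"] = name;

    // Walk the sibling chain until we find the empty slot at the end.
    std::unique_ptr<Die> *slot = &parent.child;
    while (*slot)
        slot = &(*slot)->sibling;
    *slot = std::move(node);
    return slot->get();
}

// Depth-first walk: visit a DIE, then its children, then its siblings.
// Collects the names of all DIEs with the wanted tag.
void collect(const Die *die, Tag wanted, std::vector<std::string> &out)
{
    for (; die != nullptr; die = die->sibling.get()) {
        if (die->tag == wanted)
            out.push_back(die->attrs.at("name"));
        collect(die->child.get(), wanted, out);
    }
}

// A tiny tree: one compilation unit with "int add(int a, int b)" and "quit()".
std::unique_ptr<Die> make_sample_cu()
{
    auto cu = std::make_unique<Die>();
    cu->tag = Tag::CompileUnit;
    cu->attrs["name"] = "sample.c";

    add_child(*cu, Tag::BaseType, "int");
    Die *add = add_child(*cu, Tag::Subprogram, "add");
    add_child(*add, Tag::FormalParameter, "a");
    add_child(*add, Tag::FormalParameter, "b");
    add_child(*cu, Tag::Subprogram, "quit");
    return cu;
}
```

Calling collect() on the sample tree with Tag::Subprogram gathers the functions in the order they appear, and Tag::FormalParameter gathers the parameters found below them. The real traversal follows the same pattern: go to the first child, then step from sibling to sibling.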
Now, if we want to get the function return type and parameter types, it should be clear that we need to traverse the DIE tree to look up functions and their children. From there, we can get the types through the DIE attributes. I'll describe in code how to do this with libdw from elfutils. The full code is listed here, and the code below is simplified, skipping error checks and some application-specific stuff. So let's begin.
    /* Initialize libdw */
    dbg = dwarf_begin_elf(this->elf, DWARF_C_READ, NULL);

... by initializing libdw from an ELF structure. This returns a pointer to a DWARF context. We continue by iterating through the compilation units, which is done in two steps:
    while (dwarf_nextcu(dbg, offset, &offset, &hdr_size, 0, 0, 0) == 0) {
        Dwarf_Die result, cu_die;

        if (dwarf_offdie(dbg, last_offset + hdr_size, &cu_die) == NULL)
            break;
        last_offset = offset;

The first call, dwarf_nextcu(), looks up the offset of the next compilation unit, and the second, dwarf_offdie(), returns the DIE for it. As you can see in the figure, we're actually interested in the children of the compilation unit, which is where the functions are.
        if (dwarf_child(&cu_die, &result) != 0)
            continue;

So dwarf_child() just places the first child DIE in the result variable. We then want to iterate through the child's siblings and do something for each function:
        do {
            switch (dwarf_tag(&result))
            {
            case DW_TAG_subprogram:
            case DW_TAG_entry_point:
            case DW_TAG_inlined_subroutine:
                this->handleDwarfFunction(&result);
                break;
            default:
                break;
            }
        } while (dwarf_siblingof(&result, &result) == 0);
    }

Here we check the tag of the DIE and call a helper routine for each function. So on to handleDwarfFunction():
    void CibylElf::handleDwarfFunction(Dwarf_Die *fun_die)
    {
        attr = dwarf_attr_integrate(fun_die, DW_AT_type, &attr_mem);
        ret_size = mips_arg_size(elf, fun_die, attr);

We first look up the DW_AT_type attribute of the function, which holds the return type of the function. mips_arg_size() is a helper function which returns the size of this type (if it can be derived). It's taken from elfutils and will iterate through typedefs etc. to get to the actual type. We'll skip that here though. In the figure, the return value attribute is represented by the square box in the lower left. For the parameters, we'll need to recurse down to the children of the function:
        if (dwarf_child(fun_die, &result) != 0)
            return;

        do {
            switch (dwarf_tag(&result))
            {
            case DW_TAG_formal_parameter:
                attr = dwarf_attr_integrate(&result, DW_AT_type, &attr_mem);
                arg_size = mips_arg_size(elf, fun_die, attr);
                break;
            case DW_TAG_inlined_subroutine:
                /* Recurse further down */
                this->handleDwarfFunction(&result);
                break;
            default:
                break;
            }
        } while (dwarf_siblingof(&result, &result) == 0);
    }

So, similar to what we had before, we look up the first child and then visit each of its siblings. From the tag we can see whether the DIE is actually a parameter, and if so convert its type attribute to a size with the mips_arg_size() helper.
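Since the text skips what the size helper does, here is a rough sketch of the idea of "iterating through typedefs to get to the actual type". This is not the mips_arg_size() code and it does not use libdw at all. TypeDie, Kind, type_size() and make_sample_types() are hypothetical names, and the types are modelled as a plain table where ref plays the role of a DW_AT_type reference. The sizes in the sample table assume a 32-bit target.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical type kinds, loosely mirroring a few DW_TAG_* type tags.
enum class Kind { Base, Pointer, Typedef, Const, Volatile };

// Hypothetical type entry in a flat table.
struct TypeDie {
    Kind kind;
    int byte_size;  // only meaningful for Base and Pointer
    int ref;        // index of the referenced type, or -1 if there is none
};

// Follow typedef/const/volatile links until a type with a size is reached.
// Returns -1 if the size cannot be derived (no type, e.g. void, or a bad link).
int type_size(const std::vector<TypeDie> &types, int ref)
{
    // A well-formed chain visits each entry at most once, so bounding the
    // number of steps also protects against accidental cycles.
    for (std::size_t steps = 0; steps <= types.size(); ++steps) {
        if (ref < 0 || static_cast<std::size_t>(ref) >= types.size())
            return -1;

        const TypeDie &t = types[ref];
        if (t.kind == Kind::Base || t.kind == Kind::Pointer)
            return t.byte_size;

        ref = t.ref;  // typedef, const or volatile: peel one layer off
    }
    return -1;
}

// Sample table, assuming 4-byte ints and pointers.
std::vector<TypeDie> make_sample_types()
{
    return {
        {Kind::Base, 4, -1},     // 0: int
        {Kind::Typedef, 0, 0},   // 1: typedef int my_int
        {Kind::Const, 0, 1},     // 2: const my_int
        {Kind::Pointer, 4, 2},   // 3: const my_int *
        {Kind::Typedef, 0, -1},  // 4: typedef void nothing
        {Kind::Base, 8, -1},     // 5: double
        {Kind::Volatile, 0, 5},  // 6: volatile double
    };
}
```

With the sample table, a const typedef of int resolves to the size of int, a pointer stops the walk at the pointer itself, and a typedef of void has no size that can be derived, which corresponds to the "if it can be derived" caveat above.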
So that's it. Not so difficult, right? Well, not when you know how to do it - but I can assure you that this took quite a bit of time to find out.
And what's the use of it then? That's for the next post!