Pernosco

Tackling the C++ debugging UI nightmare with Pernosco

2025-01-27T00:00:00+00:00

The full unambiguous names of C++ functions and variables include type names and namespace names, which makes them verbose. This is especially true with heavy use of templates because the type names include template parameter names (recursively). The challenge for a debugger (and other tools) is to display names that convey enough information to the user without overwhelming the UI. Pernosco tackles this by abbreviating names but making the abbreviations interactive.

New features in Linux 6.10 contributed by Pernosco

2024-09-19T00:00:00+00:00

The Linux 6.10 kernel release contains two new features in the perf event subsystem contributed by Pernosco. These features are intended to benefit rr (and thus Pernosco), but they also have broader applications if adopted by other software. In this blog post I will discuss what we added, why it benefits rr, and the broader applications it could have.

ELF symbol interposition and RTLD_LOCAL

2022-07-19T00:00:00+00:00

You may be familiar with "the LD_PRELOAD trick". This "trick" is used to implement things like heaptrack. By interposing a third library between an application and libc's malloc/free you can track the state of the heap and recognize errors like double frees and memory leaks. But this doesn't work for libraries loaded with RTLD_LOCAL, which is the default behavior of dlopen. Why not? Let's look at how this sort of linking works normally first, and then we can figure out why it goes wrong with RTLD_LOCAL.

Shrink Your Compile Times With Split DWARF

2021-12-21T00:00:00+00:00

What if you could reduce the time it takes to link your program by 25%, reduce the memory it takes to link your program by 40%, and reduce the size of the binary by 50%, all by changing a compiler flag? That's the power of "split DWARF", a compiler and debugger feature that uses a new format for the DWARF debugging information that's specifically designed to reduce the work the linker is required to do. Let's dive into how it works and what is required for you to benefit from it.

Automatic Downcasting: How Does It Work?

2021-12-14T00:00:00+00:00

Many programming languages include mechanisms for dynamic polymorphism. These pose challenges for debuggers, because viewing only fields from the declared type of a variable may not be particularly useful. Automatically deducing the most-"derived" type and downcasting to it presents the entire object to developers and makes debugging code that uses dynamic polymorphism much more pleasant. Our Pernosco Omniscient Debugger automatically downcasts types that use dynamic polymorphism in supported languages (C++, Rust, and Ada). You might also be familiar with this technique in gdb via the "set print object on" command. But how is it actually implemented?

Making Debuggers Sad: C++ Identifier Canonicalization

2021-11-24T00:00:00+00:00

Why do debuggers like gdb take so long to start up on large programs? There are many reasons, but one surprising reason is that gdb spends significant amounts of time parsing C++ identifiers and re-emitting them into a canonical form. This is due to deficiencies in clang++ and g++ (and, arguably, DWARF), but not everyone agrees. The underlying reasons also apply to Pernosco so we have implemented something similar, although we're able to hide the startup impact more effectively by folding it into our "build the big database of everything" step.

Suppose a debugger user wants to evaluate the value of the variable Foo<short>::FOO, where Foo is declared with

    template <typename T> struct Foo {
      enum Enum { FOO = 1 };
    };

The DWARF debuginfo for this program contains a DW_TAG_structure_type for Foo<short>, which contains debuginfo for the Enum enum and its values. We'll have to search for this DW_TAG_structure_type by name. Unfortunately, the DW_AT_name in the debuginfo produced by gcc 9 is not Foo<short> but Foo<short int>, so we may not find the type with a naive search 😞.

The basic problem here is that there are many valid ways of writing the same template parameters, so the user might pick a different way than the compiler emitted. This applies not just to template parameters that are types, but also values, e.g. given template <unsigned long V> struct Bar { ... } the compiler might emit a type with name Bar<1UL> (as clang++ 12 does), while the user enters Bar<1>.

The only general solution here is for the debugger to parse all the C++ type names that contain template parameters and store them in a canonical form. When a user enters a type name, it is canonicalized using the same algorithm, so a match will be found if one exists. E.g. in the above examples the debugger could canonicalize the DWARF names Foo<short int> and Bar<1UL> to Foo<short> and Bar<1> respectively and use the latter for lookup. This requires parsing C++ type syntax, which is nasty, but the debugger already needs to do this to handle various forms of user input, so it's not a new problem. Potentially parsing many gigabytes of C++ symbols does subject the parser to increased performance stress, however.

There are situations where it gets very difficult or impossible to correctly parse C++ type syntax outside the context of a compilation unit, but let's studiously ignore that.

Interaction with demangling

C++ entities with "linkage", i.e. functions and variables, are assigned mangled names in their binaries. Debuggers demangle these into fully-qualified human-readable names. We take advantage of this in Pernosco by ensuring that the demangler always produces names in our canonical form (via options passed to cpp_demangle). This greatly reduces the number of C++ identifiers we would otherwise have to parse and canonicalize.

BTW you would hope that GNU's c++filt demangler at least produces names that are consistent with the names gcc emits into debuginfo, but it does not. Likewise llvm-cxxfilt produces names inconsistent with clang++.

Ideal solution

Ideally the text serialization of C++ names would be standardized, gcc and clang++ would produce standard names in their debuginfo, their demanglers would also emit standard names (at least when the right options are set), and debuggers could detect that this has been done and avoid a lot of work. That isn't likely to ever happen; as far as I can tell, compiler maintainers don't think that the current situation is a problem.

A slightly less ideal approach that would still be a big improvement is the same thing I suggested for structured identifiers: making debuginfo include mangled names for all C++ types.
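
For reference, the canonicalization step described earlier in this post (mapping Foo<short int> and Bar<1UL> to Foo<short> and Bar<1>) might be sketched roughly as follows. This is only an illustration, not Pernosco's or gdb's implementation: the function names, the small table of builtin spellings, and the choice of output format (", " between arguments, no space before ">") are all assumptions made for the example. It also ignores the hard cases mentioned above, such as operator< in names or expressions containing > inside template arguments.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Strip leading and trailing spaces.
static std::string trim(const std::string& s) {
    std::size_t b = s.find_first_not_of(' ');
    if (b == std::string::npos) return "";
    std::size_t e = s.find_last_not_of(' ');
    return s.substr(b, e - b + 1);
}

// Canonicalize a piece of a name that contains no template argument list:
// a builtin type spelling, an integer literal, or anything else (unchanged).
static std::string canonicalize_atom(const std::string& atom) {
    // Hypothetical, deliberately incomplete table of builtin spellings.
    static const std::map<std::string, std::string> kBuiltins = {
        {"short int", "short"},
        {"long int", "long"},
        {"unsigned int", "unsigned"},
        {"short unsigned int", "unsigned short"},
        {"long unsigned int", "unsigned long"},
    };
    auto it = kBuiltins.find(atom);
    if (it != kBuiltins.end()) return it->second;

    // Non-negative integer literal: drop any U/L suffix, so "1UL" becomes "1".
    std::size_t digits = 0;
    while (digits < atom.size() &&
           std::isdigit(static_cast<unsigned char>(atom[digits]))) {
        ++digits;
    }
    if (digits > 0 &&
        atom.find_first_not_of("uUlL", digits) == std::string::npos) {
        return atom.substr(0, digits);
    }
    return atom;
}

// Canonicalize a (possibly nested) templated name such as "Foo<short int>::FOO".
std::string canonicalize_name(const std::string& name) {
    std::size_t open = name.find('<');
    if (open == std::string::npos) return canonicalize_atom(name);

    // Find the '>' matching the first '<', splitting at top-level commas.
    std::vector<std::string> args;
    int depth = 0;
    std::size_t arg_start = open + 1;
    std::size_t close = std::string::npos;
    for (std::size_t i = open; i < name.size(); ++i) {
        char c = name[i];
        if (c == '<') {
            ++depth;
        } else if (c == '>') {
            if (--depth == 0) {
                close = i;
                break;
            }
        } else if (c == ',' && depth == 1) {
            args.push_back(name.substr(arg_start, i - arg_start));
            arg_start = i + 1;
        }
    }
    if (close == std::string::npos) return name;  // Unbalanced: leave as-is.
    args.push_back(name.substr(arg_start, close - arg_start));

    // Rebuild the name with each argument canonicalized recursively.
    std::string out = name.substr(0, open) + "<";
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (i > 0) out += ", ";
        out += canonicalize_name(trim(args[i]));
    }
    out += ">";

    // Whatever follows the argument list (e.g. "::FOO") may hold more templates.
    std::string rest = name.substr(close + 1);
    if (!rest.empty()) out += canonicalize_name(rest);
    return out;
}
```

With this sketch, both the DWARF name Foo<short int> and the user's Foo<short> canonicalize to Foo<short>, so a lookup keyed on the canonical form finds the type. A real debugger needs a proper C++ type parser rather than string matching, which is where the performance stress comes from.
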
Having mangled names for types would let us rely on demangling instead of having to parse C++ type syntax in debuginfo names. But my guess is this won't happen either. So it looks like we'll just have to get good at parsing gigabytes of C++ type syntax really fast.

Where Should the Debugger Set a Breakpoint?

2021-11-09T00:00:00+00:00