The Pasture

The Cost of a Closure in C, The Rest

2025-12-30T00:00:00+00:00

The last article checked the landscape of various C and C extension implementations of Closures for their performance capabilities. But, there were a few tweaks and extra things we could do to check the performance of other techniques. At the time, we ignored such techniques because they were so common, but it helps to quantify their performance relative to everything else, so we re-ran the benchmarks with a few new categories!

Skipping the Introductions

If you want an introduction to what is going on, there’s a gentle description with some 10,000 foot overview in the previous article. Additionally, if you’d like to learn more about specific kinds of Closures as they exist in C and/or C++, you can read a much older article or read the entire introduction in this work-in-progress C proposal. The much older article is a much gentler introduction; the work-in-progress C proposal goes through a lot of the technical and design nitty-gritty and why things work or do not work very well.

The purpose of this article will, once again, be performance and deducing the performance characteristics of various designs. Much of this was covered in the previous article, so we’re going to focus on the new additions to the Benchmarks since then and the important takeaways.

As always, the implementations of my benchmarks are publicly available¹.

Experimental Setup

The only thing that changed from the last time we did this was to use 150 repetitions of the whole 100,000+ sample iterations benchmarks rather than just 50 or 100 repetitions. You can find the full, detailed explanation at the bottom of this article.

Plain C - New Categories

The new benchmarking categories reflected in the new bar graphs explicitly track the performance of a few different kinds of “Plain C” testing.

Normal Functions: regular C functions which add an extra argument to the function call in order to pass more data. Somewhat similar in representation to rewriting qsort to qsort_r/qsort_s to pass a user data pointer.
Normal Functions (Rosetta Code): regular C functions which add an extra argument to the function call in order to pass more data. Taken directly from the Rosetta Code weekly, and uses a pointer int* k to refer to an already-existing value of k during a series of recursive calls.
Normal Functions (Static): regular C function which uses a static variable to pass the specific context to the next function. Not thread safe. Does not modify the function call signature.
Normal Functions (Thread Local): same as “Normal Functions (Static)” but using a thread_local variable instead of a static variable. Obviously thread safe. Does not modify the function call signature.

These are different from the “Normal Functions” in small but important ways, and – critically – two of them do not modify the signature of the function call, meaning they can be used with the old-style of qsort APIs that do not take a void* user_data parameter. In particular, rather than taking an extra or dummy argument like arg* in:

int f0(arg* unused) {
	(void)unused;
	return 0;
}

int f1(arg* unused) {
	(void)unused;
	return 1;
}

int f_1(arg* unused) {
	(void)unused;
	return -1;
}

It instead preserves the initial interface, without the (potentially unused) argument. This is important for Foreign Function Interfaces (FFI) and other shenanigans that gets used with closure-style code. Thus, rather than needing to write new functions with an extra argument, the return 1, return -1, and return 0 helpers can be written in the normal, plain, usual way:

int f0() {
	return 0;
}

int f1() {
	return 1;
}

int f_1() {
	return -1;
}

One would imagine that such a change would not actually have any meaningful performance impact, and that using something like static variables or global variables to shuttle that data over into whatever function that needed it wouldn’t cause any measurable performance difference.

Results

Of course, if it were true that there was no performance difference, I wouldn’t be forced to write about it! So, here we are, the cost or non-cost for the various kinds of “Normal Functions” usages, as compared to all the others:

_{For the vision-impaired, a text description is available.}

As shown in the last article, performance is SO TERRIBLE for some solutions that it completely crowds out any useful visual from the linear graphs. So, we need to swap to the logarithmic graphs to get a better picture:

_{For the vision-impaired, a text description is available.}

Still, the logarithmic graphs render things like the black error bars on each bar graph completely useless. So, we swap back to linear this time, but with the caveat that we remove some of the worst “outliers” (e.g., the things that had the most awful performance metrics). This, effectively, means cutting out the “Lambda (Rosetta Code)” category and bar graph. This gives us the following linearly-scaled graph:

_{For the vision-impaired, a text description is available.}

There, that’s much better and easier to read! It also gives us a more precise look at the faster-performing functions, and lets us talk about it much more clearly!

Insights

There are quite a few insights here that are important to elaborate on. We will start first with the obvious DRASTIC improvements we need from the original code contained in the previous article to where are are now: “Normal Functions (Rosetta Code)” to “Normal Functions”.

Becoming the Most Normal Function

The only difference between this and “Normal Functions (Rosetta Code)” is us not holding onto a pointer. Specifically, the all structure in the Normal Functions is just:

typedef struct all {
	int (*B)(struct all*);
	int k;
	struct all *x1, *x2, *x3, *x4, *x5;
} all;

static int A(int k, all* x1, all* x2, all* x3, all* x4, all* x5);

static int B(all* self) {
	return A(--self->k, self, self->x1, self->x2, self->x3, self->x4);
}

static int A(int k, all* x1, all* x2, all* x3, all* x4, all* x5) {
	if (k <= 0) {
		return x4->B(x4) + x5->B(x5);
	}
	else {
		all y = { .B = B, .k = k, .x1 = x1, .x2 = x2, .x3 = x3, .x4 = x4, .x5 = x5 };
		return B(&y);
	}
}

The only change here is that instead of using int* k like in the arg structure of Rosetta Code we use int k directly:

typedef struct arg {
	int (*fn)(struct arg*);
	int* k;
	struct arg *x1, *x2, *x3, *x4, *x5;
} arg;

static int f_1(arg* _) {
	return -1;
}

static int f0(arg* _) {
	return 0;
}

static int f1(arg* _) {
	return 1;
}

// --- helper
static int eval(arg* a) {
	return a->fn(a);
}

static int A(arg*);

// --- functions
static int B(arg* a) {
	int k    = *a->k -= 1;
	arg args = { B, &k, a, a->x1, a->x2, a->x3, a->x4 };
	return A(&args);
}

static int A(arg* a) {
	return *a->k <= 0 ? eval(a->x4) + eval(a->x5) : B(a);
}

It turns out needing to do that indirect load to get at int* k cost us a LOT more than any of us could hope for. This is surprising, given that the lambda uses a single default capture of & and references the k it was made with transparently. In essence: it works actually like the poorly-performing “Normal Functions (Rosetta Code)” example, and yet the compiler is able to outperform this in comparison to the structure passed as an explicit argument.

The problem is that the indirect load through both (a) the int* k and (b) the all*/arg* structure are actually impeding compiler optimization and slowing us down. In C, we like to imagine that doing in-place modification and operations directly on a given piece of memory can generally be better and faster than other techniques. This applies for big data sets and huge arrays, but for smaller work like what is in the Man or Boy test, it’s actually the opposite: pointers to smaller pieces of data are a big waste of time.

The good news is that removing the int* k only means we have one level of indirection to deal with, and that really boosts performance compared to the original, bad Rosetta Code Wiki example that this benchmark is based on. Unfortunately, despite getting a huge boost from its old performance…

Lambdas Are Still Peak

It is the encapsulation and the preservation of type information without hiding it behind an additional structures that keeps the performance lean. This means that the design of lambdas – a unique object with its own type that is not immediately hoisted or erased like it is in Apple Blocks, GNU Nested Functions, and other compiler techniques – is actually the leanest possible implementation.

The drawback of this that is especially egregious in C, unfortunately, is that unlike C++ there are no templates in C. There’s no “fake” recursion parameter we can add to limit an infinity-spiral of self-calls. This means that unique typings – while an unrestricted boon in C++ – is actually a bit of a drawback in C! In terms of passing arguments around and returning them, there’s no type-generics at compile-time that can help with this.

So either all the code interacting with it has to be macros (EWWWW), OR we need to develop at least one layer of indirection so we can prevent things like infinite recursion or realistically handle lots of data types. The much more sadder conclusion is that a programming language like C, unless you drop down to assembly or hand-unroll loops with your own selection of manually-crafted strong types, you will lose out on some degree of performance. This is not normally something anybody would be able to say about C, but it turns out that needing to do type-erasure imposes a cost. If the compiler cannot unroll that cost for any number of reasons, you will end up paying for it in performance. (But you can still get pretty good code size, so that part is nice at least.)

The Next Tier Up: Very Small Amounts of Type Erasure

While Lambdas are the best and standalone in what they are capable of, they are only the best under C++-ish, template-ish circumstances (like C macro generics). When you have to ditch the templates and the perfect type information, C++-style Lambdas lose a good bit of their competitive edge. Primarily, any amount of lean type erasure adds an non-negotiable impact to performance over the base case, as shown by “Normal Functions”, “Custom C++ Class”, “Lambdas std::function_ref”, and “Normal Functions (Statics)”.

I put “Normal Functions (Statics)” into this group despite it clearly having very bad performance implications from how GCC implements it that actually make it slightly wore than the others. It’s also surprising that passing a variable by static variable – a solution touted by many C developers and often said to be “just as good” as being able to hijack the function signature and add a new parameter – is actually strictly worse than “Normal Functions”. One can imagine that a static variable in charge of doing transportation is inevitably going to have to pay for the cost of loads and stores for each function call, and that compilers have to try to contest with that differently.

Slightly Worse: `thread_local`

No surprise that no matter the setup, using the thread_local keyword instead of the static keyword adds more overhead. I was, again, surprised by exactly how much assigning into it once and then reading it a single time once inside the function could have on the performance metrics, but it turns out that this is not free either.

It goes to show that having what the Closures WIP ISO C proposal asks for both C++-style Lambdas and C-style “Capture Functions” (nested functions that do not have the design, ABI issues, and Implementation Baggage of regular GNU Nested Functions)² along with a Wide Function Pointer type would be better than trying to figure out a magic static or magic thread_local style of implementation.

We are not sure what to think of the Local Functions and Function Literals proposals³, because neither of them try to allow you to access local variables. Which is 90%⁴ of the reason anyone uses Nested Functions to begin with!

What Is Going On With GNU Nested Functions???

Honestly, I do NOT know at this point.

It’s worth saying that I almost had to cut out GNU Nested Functions because of how god-awfully the were performing in the GCC graphs. It made it exponentially harder to get a good, zoomed-in look at the rest of the entries. While some have talked about standardizing just GNU Nested Functions, I do not think that ISO C could standardize an extension like this in any kind of Good Faith and still call itself a language concerned about low-level code and speed. Its existing implementations are so performance-deleting it’s a wonder why the decades-old code generated for it hasn’t been improved or touched up. I can only hope that the forthcoming -ftrampoline-impl=heap code from GCC puts it more in-line with the “Normal Functions (Static)” or “Normal Functions (Thread Local)” category, but if the performance of the new trampoline is just as awful as the current one I’d consider GNU Nested Functions to be dead-on-arrival for a lot of use cases.

This sort of awful performance also retroactively justifies Clang’s public and open decision to never, ever implement GNU Nested Functions. On top of the security issues the typical stack-based trampoline creates, the performance qualities are so egregious that just asking everyone to use -fblocks and the Apple Blocks extension for this functionality is probably the lesser of two evils. It also brings into question whether a “lean” approach that grabs the “environment pointer” or the “stack frame” pointer directly, as in n3654⁵ is a good idea to start with.

But, it’s premature to condemn n3654 because it’s unknown if the problem is the fact that the use of accessing variables through what is effectively __builtin_stack_address and a trampoline is why performance sucks so bad, or if it’s the way the trampoline screws with the stack. There are many compounding reasons why GNU Nested Functions as they exist today do so poorly, and more investigation is needed to make sure the approach in n3654 of accessing the “Context” of a nested function isn’t actually a huge performance footgun.

Final Takeways

Now that we have thoroughly evaluated the solution space for C, including many of the home-cooked favorite solutions written in plain C, I think the safe conclusions I can draw are:

Lambdas (and the proposed Capture Functions²) are the best for performance, so long as perfect information is retained.
A type-preserving closure (e.g. Lambdas or Capture Functions) combined with the smallest, thinnest possible type erasure (a Wide Function Pointer type) would bring immediate performance gains over existing C extensions and plain C code that does not modify the function signature.
Both Apple Blocks and GNU Nested Functions have parts of their designs and implementations that are deeply problematic for integration into normal compilers.
It is unclear if making what is effectively access to the function frame / “environment” through a pointer is an advisable course of action for the future of the C ecosystem.
C users writing typical C code will, at some point, suffer some degree of performance loss in complex scenarios due to necessary type erasure to work with complex, compiler-generated closure types. Type-generic macro programming can help here, but the tradeoff for code size versus speed should be considered on whether to use a normal, type-erased interface versus an entirely (macro-)generic set of function calls.

Finally, both static and thread_local have performance cost, moreso on GCC than on Clang. I’d be interested to run the MSVC numbers too as more than just a quick “this works on the damn compiler” check, but I think these numbers are more than enough to draw general conclusions about the viability of the various approaches.

Happy New Year, and until next weird niche performance bit. 💚

Banner and Title Photo by Lukas, from Pexels

P.S.

Methodology

The tests were ran on a 13-inch 2020 MacBook Pro M1. It has 16 GB of RAM and is on MacOS 15.7.2 Sequoia at the time the test was taken, using the stock MacOS AppleClang Compiler and the stock brew install gcc compiler in order to produce the numbers seen on December 28th, 2025.

The experimental setup used the Man or Boy test, but with the given k value loaded by calling a function in a DLL / Shared Object. The expected k value that the Man or Boy test is supposed to yield is also loaded from a DLL / Shared Object. This prevents optimizing out all recursion and doing enough ahead-of-time computation to simply collapse the benchmarked code into a constant-time, translation-time calculation. It ensures the benchmark is actually measuring the actual performance characteristics of the technique used, as all of them are computing from the same initial k value and all of them are expected to produce the same expected_k answer.

There 2 measures being conducted: Real (“wall clock”) Time and CPU Time. The time is gathered by running a single iteration of the code within a for loop. That loop runs anywhere from a couple thousand to hundreds of thousands of times to produce confidence in that run of the benchmark, and each loop run is considered an individual iteration. The iterations are then averaged to produce the first point after there is confidence that the measurement is accurate and the benchmark is warm. The iteration process to produce a single mean was then repeated 150 times. All 150 means are used as the points for the values (shown as transparent dots) on the bar graph, and the average of all of those 150 means is then used as the height of a bar in a bar graph.

The bars are presented side-by-side as a horizontal bar chart with various categories of C or C++ code being measured. The 13 total categories of C and C++ code are:

no-op: Literally doing nothing. It’s just there to test environmental noise and make sure none of our benchmarks are so off-base that we’re measuring noise rather than computation. Helps keep us grounded in reality.
Normal Functions: regular C functions which add an extra argument to the function call in order to pass more data. Somewhat similar in representation to rewriting qsort to qsort_r/qsort_s to pass a user data pointer.
Normal Functions (Static): regular C function which uses a static variable to pass the specific context to the next function. Not thread safe.
Normal Functions (Thread Local): same as “Normal Functions (Static)” but using a thread_local variable instead of a static variable. Obviously thread safe.
Lambdas (No Function Helpers): a solution using C++-style lambdas. Rather than using helper functions like f0, f1, and f_1, we compute a raw lambda that stores the value meant to be returned for the Man-or-Boy test (with a body of just return i;) in the lambda itself and then pass that uniquely-typed lambda to the core of the test. The entire test is templated and uses a fake recursion template parameter to halt the translation-time recursion after a certain depth.
Lambdas: The same as above but actually using int f0(void), etc. helper functions at the start rather than lambdas. Tries to reduce optimizer pressure by using “normal” types which do not add to the generated number of lambda-typed, recursive, templated function calls.
Lambdas (std::function_ref): The same as above, but rather than using a function template to handle each uniquely-typed lambda like a precious baby bird, it instead erases the lambda behind a std::function_ref. This allows the recursive function to retain exactly one signature.
Lambdas (std::function): The same as above, but replaces std::function_ref with std::function. This is an allocating, C++03-style type.
Lambdas (Rosetta Code): The code straight out of the C++11 Rosetta Code Lambda section on the Man-or-Boy Rosetta Code implementation.
Apple Blocks: Uses Apple Blocks to implement the test, along with the __block specifier to refer directly to certain variables on the stack.
GNU Nested Functions (Rosetta Code): The code straight out of the C Rosetta Code section on the Man-or-Boy Rosetta Code implementation.
GNU Nested Functions: GNU Nested Functions similar to the Rosetta Code implementation, but with some slight modifications in a hope to potentially alleviate some stack pressure if possible by using regular helper functions like f0, f1, and f_1.
Custom C++ Class: A custom-written C++ class using a discriminated union to decide whether it’s doing a straight function call or attempting to engage in the Man-or-Boy recursion.
C++03 shared_ptr (Rosetta Code): A C++ class using std::enable_shared_from_this and std::shared_ptr with a virtual function call to invoke the “right” function call during recursion.

Each bar graph has a black error bar at the end, representing the standard error of the measurements performed. At 150 iterations, the error bars (which are most easily understood and read in the linear graphs) are a decent visual approximation of whether or not two solutions are within a statistical threshold of one another.

The two compilers tested are Apple Clang 17 and GCC 15. There are two graph images for each kind of measurement (linear, logarithmic, and linear-but-with-outliers-removed) because one is for Apple Clang and the other is for GCC. This is particularly important because neither compiler implements the other’s closure extension (Clang does Apple Blocks but not Nested Functions, while GCC does Nested Functions in exclusively its C frontend but does not implement Apple Blocks).

MSVC was not tested because MSVC implements none of the extensions being tested, and we do not expect that its performance characteristics would be wildly different than what GCC or Clang are capable of. (In fact, we expect it might be a bit worse in all untested, non-scientific honesty.)

See: https://github.com/soasis/idk/tree/main/benchmarks/closures. ↩
See “Captures Functions: Rehydrated Nested Functions” from “Functions with Data - Closures in C”. ↩ ↩²
See “N3678 - Local functions” and “N3679 - Function literals”, https://www.open-std.org/JTC1/SC22/WG14/www/docs/n3678.pdf and https://www.open-std.org/JTC1/SC22/WG14/www/docs/n3679.pdf ↩
This is not a hard or scientific statistic. We simply catalogued a codebase that used GNU Nested Functions – of the thousands of uses, the overwhelming supermajority accessed variables contextually. A proposal that solves 10% of a codebases existing uses seems worthless. ↩
See “Access the Context of Nested Functions”, https://www.open-std.org/JTC1/SC22/WG14/www/docs/n3654.pdf. ↩

The Cost of a Closure in C

2025-12-10T00:00:00+00:00

I had a vague idea that closures could have a variety of performance implications; I did not believe that so many of the chosen and potential designs for C and C++ extensions ones, however, were so… suboptimal.

But, before we get into how these things perform and what the cost of their designs are, we need to talk about what Closures are.

“Closures”?

Closures in this instance are programming language constructs that include data alongside instructions that are not directly related to their input (arguments) and their results (return values). They can be seen as a “generalization” of the concept of a function or function call, in that a function call is a “subset” of closures (e.g., the set of closures that do not include this extra, spicy data that comes from places outside of arguments and returns). These generalized functions and generalized function objects hold the ability to do things like work with “instance” data that is not passed to it directly (i.e., variables surrounding the closure off the stack) and, usually, some way to carry around more data than is implied by their associated function signature.

Pretty much all recent and modern languages include something for Closures unless they are deliberately developing for a target audience or for a source code design that is too “low level” for such a concept (such as Stack programming languages, Bytecode languages, or ones that fashion themselves as assembly-like or close to it). However, we’re going to be focusing on and looking specifically at Closures in C and C++, since this is going to be about trying to work with and – eventually – standardize something for ISO C that works for everyone.

First, let’s show a typical problem that arises in C code to show why closure solutions have popped up all over the C ecosystem, then talk about it in the context of the various solutions.

The Closure Problem

The closure problem can be neatly described by as “how do I get extra data to use within this qsort call?”. For example, consider setting this variable, in_reverse, as part of a bit of command line shenanigans, to change how a sort happens:

#include 
#include 
#include 

static int in_reverse = 0;

int compare(const void* untyped_left, const void* untyped_right) {
  const int* left = untyped_left;
  const int* right = untyped_right;
  return (in_reverse) ? *right - *left : *left - *right;
}

int main(int argc, char* argv[]) {
  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
  qsort(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare);
	
  return list[0];
}

This uses a static variable to have it persist between both the compare function calls that qsort makes and the main call which (potentially) changes its value to be 1 instead of 0. Unfortunately, this isn’t always the best idea for more complex programs that don’t fit within a single snippet:

it is impossible to have different “copies” of a static variable, meaning all mutations done in all parts of the program that can see in_reverse are responsible for knowing the state before and after (e.g., heavily stateful programming of state that you may not own / cannot see);
working on static data may produce thread contention/race conditions in more complex programs;
using _Thread_local instead of static only solves the race condition problem but does not solve the “shared across several places on the same thread” problem;
referring to specific pieces of data or local pieces of data (like list itself) become impossible;

and so on, and so forth. This is the core of the problem here. It becomes more pronounced when you want to do things with function and data that are a bit more complex, such as Donald Knuth’s “Man-or-Boy” test code.

The solutions to these problems come in 4 major flavors in C and C++ code.

Just reimplement the offending function to take a userdata pointer so you can pass whatever data you want (typical C solution, e.g. going from qsort as the sorting function to BSD’s qsort_r¹ or Annex K’s qsort_s²).
Use GNU Nested Functions to just Refer To What You Want Anyways.
Use Apple Blocks to just Refer To What You Want Anyways.
Use C++ Lambdas and some elbow grease to just Refer To What You Want Anyways.

Each solution has drawbacks and benefits insofar as usability and design, but as a quick overview we’ll show what it’s like using qsort (or qsort_r/qsort_s, where applicable). Apple Blocks, for starters, looks like this:

#include 
#include 
#include 

int main(int argc, char* argv[]) {
	// local, non-static variable
	int in_reverse = 0;

	// value changed in-line
	if (argc > 1) {
		char* r_loc = strchr(argv[1], 'r');
		if (r_loc != NULL) {
			ptrdiff_t r_from_start = (r_loc - argv[1]);
			if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
				in_reverse = 1;
			} 
		}
	}
	
	int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
	
	qsort_b(list, (sizeof(list)/sizeof(*list)), sizeof(*list),
		// Apple Blocks are Block Expressions, meaning they do not have to be stored
		// in a variable first
		^(const void* untyped_left, const void* untyped_right) {
			const int* left = untyped_left;
			const int* right = untyped_right;
			return (in_reverse) ? *right - *left : *left - *right;
		}
	);
	
	return list[0];
}

and GNU Nested Functions look like this:

#include 
#include 
#include 

int main(int argc, char* argv[]) {
	// local, non-static variable
	int in_reverse = 0;

	// modify variable in-line
	if (argc > 1) {
		char* r_loc = strchr(argv[1], 'r');
		if (r_loc != NULL) {
			ptrdiff_t r_from_start = (r_loc - argv[1]);
			if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
				in_reverse = 1;
			} 
		}
	}
	
	int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
	
	// GNU Nested Function definition, can reference `in_reverse` directly
	// is a declaration/definition, and cannot be used directly inside of `qsort`
	int compare(const void* untyped_left, const void* untyped_right) {
		const int* left = untyped_left;
		const int* right = untyped_right;
		return (in_reverse) ? *right - *left : *left - *right;
	}
	// use in the sort function without the need for a `void*` parameter
	qsort(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare);
	
	return list[0];
}

or, finally, C++-style Lambdas:

#define __STDC_WANT_LIB_EXT1__ 1

#include 
#include 
#include 

int main(int argc, char* argv[]) {
	int in_reverse = 0;
	
	if (argc > 1) {
		char* r_loc = strchr(argv[1], 'r');
		if (r_loc != NULL) {
			ptrdiff_t r_from_start = (r_loc - argv[1]);
			if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
				in_reverse = 1;
			} 
		}
	}
	
	// lambdas are expressions, but we can assign their unique variable types with `auto`
	auto compare = [&](const void* untyped_left, const void* untyped_right) {
		const int* left = (const int*)untyped_left;
		const int* right = (const int*)untyped_right;
		return (in_reverse) ? *right - *left : *left - *right;
	};

	int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };	

	// C++ Lambdas don't automatically make a trampoline, so we need to provide
	// one ourselves for the `qsort_s/r` case so we can call the lambda
	auto compare_trampoline = [](const void* left, const void* right, void* user) {
		typeof(compare)* p_compare = user;
		return (*p_compare)(left, right);
	};
	qsort_s(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare_trampoline, &compare);

	return list[0];
}

To solve this gaggle of problems, pretty much every semi-modern language (that isn’t assembly-adjacent or based on some kind of state/stack programming) provide some idea of being able to associate some set of data with one or more function calls. And, particularly for Closures, this is done in a local way without passing it as an explicit argument. As it turns out, all of those design choices – including the ones in C – have pretty significant consequences on not just usability, but performance.

Not A Big Overview

This article is NOT going to talk in-depth about the design of all of the alternatives or other languages. We’re focused on the actual cost of the extensions and what they mean. A detailed overview of the design tradeoffs, their security implications, and other problems, can be read at the ISO C Proposal for Functions with Closures here; it also gets into things like Security Implications, ABI, current implementation impact, and more of the various designs. The discussion in the paper is pretty long and talks about the dozens of aspects of each solution down to both the design aspect and the implementation quirks. We encourage you to dive into that proposal and read it to figure out if there’s something more specific you care about insofar as some specific design portion. But, this article is going to be concerned about one thing and one thing only:

Purrrrrrrformance :3!

In order to measure this cost, we are going to take Knuth’s Man-or-Boy test and benchmark various styles of implementation in C and C++ using various different extensions / features for the Closure problem. The Man-or-Boy test is an efficient measure of how well your programming language can handle referring to specific entities while engaging in a large degree of recursion and self-reference. It can stress test various portions of how your program creates and passes around data associated with a function call, and if your programming language design is so goofy that it can’t refer to a specific instance of a variable or function argument, it will end up producing the wrong answer and breaking horrifically.

Anatomy of a Benchmark: Raw C

Here is the core of the Man-or-Boy test, as implemented in raw C. This implementation³ and all the others are available online for us all to scrutinize and yell at me for messing up, to make sure I’m not slandering your favorite solution for Closures in this space.

// ...

static int eval(ARG* a) {
	return a->fn(a);
}

static int B(ARG* a) {
	int k    = *a->k -= 1;
	ARG args = { B, &k, a, a->x1, a->x2, a->x3, a->x4 };
	return A(&args);
}

static int A(ARG* a) {
	return *a->k <= 0 ? eval(a->x4) + eval(a->x5) : B(a);
}

// ...

You will notice that there is a big, fat, ugly ARG* parameter hanging around all of these functions. That is because, as stated before, plain ISO C cannot handle passing the data around unless it’s part of a function’s arguments. Because the actual core of the Man-or-Boy experiment is the ability to refer to specific values of k that exist during the recursive run of the program, we need to actually modify the function signature and thereby cheat some of the implicit Man-or-Boy requirements of not passing the value in directly. Here’s what ARG looks like:

typedef struct arg {
	int (*fn)(struct arg*);
	int* k;
	struct arg *x1, *x2, *x3, *x4, *x5;
} ARG;

static int f_1(ARG* _) {
	return -1;
}

static int f0(ARG* _) {
	return 0;
}

static int f1(ARG* _) {
	return 1;
}

static int eval(ARG* a) {
	// ...
}
// ...

And this is how it gets used in the main body of the function in order to compute the right answer and benchmark it:

static void normal_functions_rosetta(benchmark::State& state) {
	const int initial_k  = k_value();
	const int expected_k = expected_k_value();
	int64_t result       = 0;

	for (auto _ : state) {
		int k     = initial_k;
		ARG arg1  = { f1, NULL, NULL, NULL, NULL, NULL, NULL };
		ARG arg2  = { f_1, NULL, NULL, NULL, NULL, NULL, NULL };
		ARG arg3  = { f_1, NULL, NULL, NULL, NULL, NULL, NULL };
		ARG arg4  = { f1, NULL, NULL, NULL, NULL, NULL, NULL };
		ARG arg5  = { f0, NULL, NULL, NULL, NULL, NULL, NULL };
		ARG args  = { B, &k, &arg1, &arg2, &arg3, &arg4, &arg5 };
		int value = A(&args);
		result += value == expected_k ? 1 : 0;
	}

	if (result != state.iterations()) {
		state.SkipWithError("failed: did not produce the right answer!");
	}
}

BENCHMARK(normal_functions_rosetta);

Everything within the for (auto _ : state) { ... } is benchmarked. For those paying attention to the code and find it looking familiar, it’s because that code is the basic structure all Google Benchmark⁴ code finds itself looking like. I’ve wanted to swap to Catch2⁵ for a long time now to change to their benchmarking infrastructure, but I’ve been stuck on Google Benchmark because I’ve made a lot of graph-making tools based on its JSON output and I have not vetted Catch2’s JSON output yet to see if it has all of the necessary bits ‘n’ bobbles I use to de-dedup runs and compute statistics.

Everything outside is setup (the part above the for loop) or teardown/test correction (the part below the for loop). The initialization of the ARG argss cannot be moved outside of the measuring loop because each invocation of A – the core of the Man-or-Boy experiment – modifies the k of the ARG parameter, so all of them have to be inside. Conceivably, arg1 .. 5 could be moved out of the loop, but I am very tired of looking at the eight or nine variations of this code so someone else can move it and tell me if Clang or GCC has lots of compiler optimization sauce and doesn’t understand that those 5 argIs can be hoisted out of the loop.

The value k is 10, and expected_k is -67. The expected, returned k value is dependent on the input k value, which controls how deep the Man-or-Boy test would recurse on itself to produce its answer. Therefore, to prevent GCC and Clang and other MEGA POWERFUL PILLAR COMPILERS from optimizing the entire thing out and just replacing the benchmark loop with ret -67, both k_value() and expected_k_value() come from a Dynamic Link Library (.dylib on MacOS, .so on *nix platforms, .dll on Windows platforms) to make sure that NO amount of optimization (Link Time Optimization/Link Time Code Generation, Inlining Optimization, Cross-Translation Unit Optimization, and Automatic Constant Expression Optimization) from C or C++ compilers could fully preempt all forms of computation.

This allows us to know, for sure, that we’re actually measuring something and not just testing how fast a compiler can load a number into a register and test it against state.iterations(). And, since we know for sure, we can now talk the general methodology.

Methodology

The tests were ran on a dying 13-inch 2020 MacBook Pro M1 that has suffered several toddler spills and two severe falls. It has 16 GB of RAM and is on MacOS 15.7.2 Sequoia at the time the test was taken, using the stock MacOS AppleClang Compiler and the stock brew install gcc compiler in order to produce the numbers seen on December 6th, 2025.

There 2 measures being conducted: Real Time and CPU Time. The time is gathered by running a single iteration of the code within the for loop anywhere from a couple thousand to hundreds of thousands of times to produce confidence in that run of the benchmark. This is then averaged to produce the first point. The process is repeated 50 times, repeating that many iterations to build further confidence in the measurement. All 50 means are used as the points for the values, and the average of all of those 50 means is then used as the height of a bar in a bar graph.

The bars are presented side-by-side as a horizontal bar chart with 11 categories of C or C++ code being measured. The 11 categories are:

no-op: Literally doing nothing. It’s just there to test environmental noise and make sure none of our benchmarks are so off-base that we’re measuring noise rather than computation. Helps keep us grounded in reality.
Lambdas (No Function Helpers): a solution using C++-style lambdas. Rather than using helper functions like f0, f1, and f_1, we compute a raw lambda that stores the value meant to be returned for the Man-or-Boy test (return i;) in the lambda itself and then pass that uniquely-typed lambda to the core of the test. The entire test is templated and uses a fake recursion template parameter to halt the recursion after a certain depth.
Lambdas: The same as above but actually using int f0(void), etc. helper functions at the start rather than lambdas. Reduces inliner pressure by using “normal” types which do not add to the generated number of lambda-typed, recursive, templated function calls.
Lambdas (std::function_ref): The same as above, but rather than using a function template to handle each uniquely-typed lambda like a precious baby bird, it instead erases the lambda behind a std::function_ref. This allows the recursive function to retain exactly one signature.
Lambdas (std::function): The same as above, but replaces std::function_ref with std::function. This is its allocating, C++03-style type.
Lambdas (Rosetta Code): The code straight out of the C++11 Rosetta Code Lambda section on the Man-or-Boy Rosetta Code implementation.
Apple Blocks: Uses Apple Blocks to implement the test, along with the __block specifier to refer directly to certain variables on the stack.
GNU Nested Functions (Rosetta Code): The code straight out of the C Rosetta Code section on the Man-or-Boy Rosetta Code implementation.
GNU Nested Functions: GNU Nested Functions similar to the Rosetta Code implementation, but with some slight modifications in a hope to potentially alleviate some stack pressure if possible by using regular helper functions like f0, f1, and f_1.
Custom C++ Class: A custom-written C++ class using a discriminated union to decide whether it’s doing a straight function call or attempting to engage in the Man-or-Boy recursion.
C++03 shared_ptr (Rosetta Code): A C++ class using std::enable_shared_from_this and std::shared_ptr with a virtual function call to invoke the “right” function call during recursion.

The two compilers tested are Apple Clang 17 and GCC 15. There are two graph images because one is for Apple Clang and the other is for GCC. This is particularly important because neither compiler implements the other’s closure extension (Clang does Apple Blocks but not Nested Functions, while GCC does Nested Functions in exclusively its C frontend but does not implement Apple Blocks⁶).

The Results

Ta-da!

_{For the vision-impaired, a text description is available.}

… Oh. That looks awful.

It turns out that some solutions are so dogwater that it completely screws up our viewing graphs. But, it does let us know that Lambdas using the Rosetta Code style are so unbelievably awful that it is several orders of magnitude more expensive than any other solution presented! One has to wonder what the hell is going on in the code snippet there, but first we need to make the graphs more legible. To do this we’re going to be using the (slightly deceptive) LOGARITHMIC SCALING. This is a bit deadly to do because it tends to mislead people about how much of a change there is, so please pay attention to the potential order of magnitude gains and losses when going from one bar graph to another.

_{For the vision-impaired, a text description is available.}

There we go. Now we can talk about the various solutions and – in particular – why “lambdas” have 4 different entries with such wildly differing performance profiles. First up, let’s talk about the clear performance winners.

Lambdas: On Top!

Not surprising to anyone who has been checked in to C++, lambdas that are used directly and not type-erased are on top. This means there’s a one-to-one mapping between a function call and a given bit of execution. We are cheating by using a constant parameter to stop the uniquely-typed lambdas being passed into the functions from recursing infinitely, which makes the Man-or-Boy function look like this:

template <int recursion = 0>
static int a(int k, const auto& x1, const auto& x2, const auto& x3, const auto& x4, const auto& x5) {
	if constexpr (recursion == 11) {
		::std::cerr << "This should never happen and this code should never have been generated." << std::endl;
		::std::terminate();
		return 0;
	}
	else {
		auto B = [&](this const auto& self) { return a<recursion + 1>(--k, self, x1, x2, x3, x4); };
		return k <= 0 ? x4() + x5() : B();
	}
}

Every B is its own unique type and we are not erasing that unique type when using the expression as an initializer to B. This means that when we call a again with B (the self in this lambda here using Deduced This, a C++23 feature that cannot be part of the C version of lambdas) which means we need to use auto parameters (a shortcut way of writing template parameters) to take it. But, since every parameter is unique, and every B is unique, calling this recursively means that, eventually, C++ compilers will actually just completely crash out/toss out-of-memory errors/say we’ve compile-time recursed too hard, or similar. That’s why the compile-time if constexpr on the extra, templated recursion parameter needs to have some arbitrary limit. Because we know k starts at 10 for this test, we just have some bogus limit of “11”.

This results in a very spammy recursive chain of function calls, where the actual generated names of these template functions are far more complex than a and can run the compiler into the ground / cause quite a bit of instantiations if you let recursion get to a high enough value. But, once you add the limit, the compiler gets perfect information about this recursive call all the way to every leaf, and thus is able to not only optimize the hell out of it, but refuse to generate the other frivolous code it knows won’t be useful.

Lambdas are also Fast, even when Type-Erased

You can observe a slight bump up in performance penalty when a Lambda is erased by a std::function_ref. This is a low-level, non-allocating, non-owning, slim “view” type that is analogous to what a language-based wide function pointer type would be in C. From this, it allows us to guess how good Lambdas in C would be even if you had to hide them behind a non-unique type.

The performance metrics are about equivalent to if you hand-wrote a C++ class with a custom operator() that uses a discriminated union, no matter which compiler gets used to do it. It’s obviously not as fast as having access to a direct function call and being able to slurp-inline optimize, but the performance difference is acceptable when you do not want to engage in a large degree of what is called “monomorphisation” of a generic routine or type. And, indeed, outside of macros, C has no way of doing this innately that isn’t runtime-based.

A very strong contender for a good solution!

Lambdas: On…. Bottom, too?

One must wonder, then, why the std::function Lambdas and the Rosetta Code Lambdas are either bottom-middle-of-the-road or absolutely-teary-eyed-awful.

Starting off, the std::function Lambdas are bad because of exactly that: std::function. std::function is not a “cheap” closure; it is a potentially-allocating, meaty, owning function abstraction. This means that it’s safe to make one and pass it around and store it and call it later; the cost of this is, obviously, that you’re allocating (when the type is big enough) for that internal storage. Part of this is alleviated by using const std::function& parameters, taking things by reference and only generating a new object when necessary. This prevents copying on every function call. Both the Rosetta Lambdas and regular std::function Lambdas code do the by-reference parameters bit, though, so where does the difference come in? It actually has to do with the Captures. Here’s how std::function Lambdas defines the recursive, self-referential lambda and uses it:

using f_t = std::function<int(void)>;

inline static int A(int k, const f_t& x1, const f_t& x2, const f_t& x3, const f_t& x4, const f_t& x5) {
	f_t B = [&] { return A(--k, B, x1, x2, x3, x4); };
	return k <= 0 ? x4() + x5() : B();
}

And, here is how the Rosetta Code Lambdas defines the recursive, self-referential lambda and uses it:

using f_t = std::function<int(void)>;

inline static int A(int k, const f_t& x1, const f_t& x2, const f_t& x3, const f_t& x4, const f_t& x5) {
	f_t B = [=, &k, &B] { return A(--k, B, x1, x2, x3, x4); };
	return k <= 0 ? x4() + x5() : B();
}

The big problem here is in the use of the =. What = by itself in the front of a lambda capture clause means is “copy all the visible variables in and hold onto that copy” (unless the capture for that following variable is “overridden” by a &var, address capture). Meanwhile, the & is the opposite: it means “refer to all the visible variables directly by their address and do not copy them in”. So, while the std::function Lambda is (smartly) referring to stuff directly without copying because we know for the Man-or-Boy test that referring to things directly is not an unsafe operation, the general = causes that for the several dozen recursive iterations through the function, it is copying all five allocating std::function arguments. So the first call creates a B that copies everything in, and then passes that in, and then the next call copies the previous B and the 4 normal functions, and then passes that in to the next B, and then it copies both previous B’s, and this stacks for the depth of the callgraph (some 10 times since k = 10 to start).

You can imagine how much that completely screws with the performance, and it explains why the Rosetta Code Lambdas code behaves so poorly in terms of performance. But, this also raises a question: if referring to everything by-reference saves so much speed, then why does GNU Nested Functions – in all its variants – perform so poorly? After all, Nested Functions capture everything by reference / by address, exactly like a lambda does with [&].

Similarly, if allocating over and over again was so expensive, how come Apple Blocks and C++03 shared_ptr Rosetta Code-style versions of the Man-or-Boy test don’t perform nearly as badly as the Rosetta Code Lambdas? Are we not copying the value of the arguments into a newly created Apple Block and, thusly, tanking the performance metrics? Well, as it turns out, there’s many reasons for these things, so let’s start with GNU Nested Functions.

Nested Functions and The Stack

I’ve written about it dozens of times now, but the prevailing and most common implementation of Nested Functions is with an executable stack. The are a lot of security and other implications for this, but all you need to understand is that the reason GCC did this is because it was an at-the-time slick encoding of both the location of the variables and the routine itself. Allocating a chunk of data off of the current programming stack means that the “environment context”/”this closure” pointer has the same anchoring address as the routine itself. This means you can encode both the location of the data to know what to access and the address of a function’s entry point into a single thing that works with your typical setup-and-call convention that comes with invoking a standard ISO C function pointer.

But think about that, briefly, in terms of optimization.

You are using the function’s stack frame at that precise point in the program as the “base address” for this executable code. That base address also means that all the variables associated with it need to be reachable from that base address: i.e., that things are not stuffed in registers, but that you are referring to the same variables as modified by the enclosing function around your nested function. Principally, this means that your function needs to have all of the following now so that GNU Nested Functions actually work.

A stack that is executable so that the base address used for the trampoline can be run succinctly.
A real function frame that exists somewhere in memory to serve as the base address for the trampoline.
Real objects in memory backing the names of the captured variables to be accessed.

This all seems like regular consequences, until you tack on the second order effects from the point of optimization.

A stack that now has both data and instructions all blended into itself.
A real function frame, which means no omission of a frame pointer and no collapsing / inlining of that function frame.
Real objects that all have their address taken that are tied to the function frame, which must be memory-accessible and which the compiler now has a hard time telling if they can simply be exchanged through registers or if they need to actually sit somewhere in memory.

In other words: GNU Nested Functions have created the perfect little storm for what might be the best optimizer-murderer. The reason it performs so drastically poorly (worse than even allocating lambdas inside of a std::function or C++03-style virtual function calls inside of a bulky, nasty C++ std::shared_ptr) by a whole order of magnitude or more is that everything about Nested Functions and their current implementation is basically Optimizer Death. If the compiler can’t see through everything – and the Man-or-Boy test with a non-constant value of k and expected_k – GNU Nested Functions deteriorate rapidly. It takes every core optimization technique that we’ve researched and maximized on in the last 30 years and puts a shotgun to the side of its head once it can’t pre-compute k and expected_k.

The good news is that GCC has completed a new backing implementation for GNU Nested Functions, which uses a heap-based trampoline. Such a trampoline does not interfere with the stack, would allow for omission of frame pointers while referring directly to the data itself (which may prevent the wrecking of specific kinds of inlining optimizations), and does not need an executable stack (just a piece of memory from ✨somewhere✨ it can mark executable). This may have performance closer to Apple Blocks, but we don’t have a build of the latest GCC to test it with. But, when we do, we can simply add the compilation flag -ftrampoline-impl=heap to the two source files in CMake and then let the benchmarks run again to see how it stacks up!

Finally, there is a minor performance degradation because our benchmarking software is in C++ and this extension exists exclusively in the C frontend of GCC. That means I have to use an extern function call within the benchmark loop to get to the actual code. Within the function call, however, all of this stuff should be optimized down, so the cost of a single function call’s stack frame shouldn’t be so awful, but I expect to try to dig into this better to help make sure the extern of a C function call isn’t making things dramatically worse than they are. Given it’s a different translation unit and it’s not being compiled as a separate static or dynamic library, it should still link together and optimize cleanly, but given how bad it’s performing? Every possible issue is on the table.

What about Apple Blocks?

Apple Blocks are not the fastest, but they are the best of the C extensions while being the worst of the “fast” solutions. They are not faster than just hacking the ARG* into the function signature and using regular normal C function calls, unfortunately, and that’s likely due to their shared, heap-ish nature. The saddest part about Apple Blocks is that it works using a Blocks Runtime that is already as optimized as it can possibly be: Clang and Apple both document that while the Blocks Runtime does manage an Automatic Reference Counted (ARC) Heap of Block pointers, when a Block is first created it will literally have its memory stored on the stack rather than in the heap. In order to move it to the heap, one must call Block_copy to trigger the “normal” heap-based shenanigans. We never call Block_copy, so this is with as-fast-as-possible variable access and management with few allocations.

It’s very slightly disappointing that: normal C functions with an ARG* blob; a custom C++ class using a discriminated union and operator(); any mildly conscientious use of lambdas; and, any other such shenanigans perform better than the very best Apple Blocks has to offer. One has to imagine that all of the ARC management functions made to copy the int^(void) “hat-style” function pointers, even if they end up not doing much for the data stored on the stack, impacted the results here. But, this is also somewhat good news: because Apple Block hat pointers are cheaply-copiable entities (they are just pointers to a Block object), it means that even if we copy all of the arguments into the closure every function call, that copying is about as cheap as it can get. Obviously, as regular “Lambdas” and “Lambdas (No Function Helpers)” demonstrate, being able to just slurp everything up by address/by reference – including visible function arguments – with [&] saves us a teensy, tiny bit of time⁷.

The cheapness of int^(void) hat-pointer function types is likely the biggest saving grace for Apple Blocks in this benchmark. In the one place we need to be careful, we rename the input argument k to arg_k and then make a __block variable to actually refer to a shared int k (and get the right answer):

static int a(int arg_k, fn_t ^ x1, fn_t ^ x2, fn_t ^ x3, fn_t ^ x4, fn_t ^ x5) {
	__block int k    = arg_k;
	__block fn_t ^ b = ^(void) { return a(--k, b, x1, x2, x3, x4); };
	return k <= 0 ? x4() + x5() : b();
}

All of the x1, x2, and x3 – like the bad Lambda case – are copied over and over and over again. One could change the name of all the arguments arg_xI and then have an xI variable inside that is marked __block, but that’s more effort and very unlikely to have any serious impact on the code while possibly degrading performance for the setup of multiple shared variables that all have to also be ARC-reference-counted and be stored inside each and every new b block that is created.

A Brief Aside: Self-Referencing Functions/Closures

It’s also important to note that just writing this:

static int a(int arg_k, fn_t ^ x1, fn_t ^ x2, fn_t ^ x3, fn_t ^ x4, fn_t ^ x5) {
	__block int k    = arg_k;
	fn_t ^ b = ^(void) { return a(--k, b, x1, x2, x3, x4); };
	return k <= 0 ? x4() + x5() : b();
}

(no __block on the b variable) is actually a huge bug. Apple Blocks, like older C++ Lambdas, cannot technically refer to “itself” inside. You have to refer to the “self” by capturing the variable it is assigned to. For those who use C++ and are familiar with the lambdas over there, it’s like making sure you capture the variable you initialize with the lambda by reference while also making sure it has a concrete type. It can only be escaped by using auto and Deducing This, or some other combination of referential-use. That is:

auto x = [&x](int v) { if (v != limit) x(v + 1); return v + 8; } does not compile, as the type auto isn’t figured out yet;
std::function_ref x = [&x](int v) { if (v != limit) x(v + 1); return v + 8; } compiles but due to C++ shenanigans produces a dangling reference to a temporary lambda that dies after the full expression (the initialization);
std::function x = [&x](int v) { if (v != limit) x(v + 1); return v + 8; } compiles and works with no segfaults because std::function allocates, and the reference to itself &x is just fine.
and, finally, auto x = [](this const auto& self, int v) { if (v != limit) self(v + 1); return v + 8; } which compiles and works with no segfaults because the invisible self parameter is just a reference to the current object.

The problem with the most recent Apple Blocks snippet just above is that it’s the equivalent of doing

std::function x = [x](int v) { if (v != limit) x(v + 1); return v + 8; }

Notice that there’s no &x in the lambda initializer’s capture list. It’s copying an (uninitialized) variable by-value into the lambda. This is what Apple Blocks set into a variable that does not have a __block specifier, like in our bad code case with b.

All variations of this on all implementations which allow for self-referencing allow this and compile some form of this. You would imagine some implementations would warn about this, but this is leftover nonsense from allowing a variable to refer to itself in its initialization. The obvious reason this happens in C and C++ is because you can create self-referential structures, but unfortunately neither language provided a safe way to do this generally. C++23’s Deducing This does not work inside of regular functions and non-objects, so good luck applying it to other places and other extensions⁸. The only extension which does not suffer this problem is GNU Nested Functions, because it creates a function declaration / definition rather than a variable with an initializer. Thus, this code from the benchmarks works:

inline static int gnu_nested_functions_a(int k, int xl(void), int x2(void), int x3(void), int x4(void), int x5(void)) {
	int b(void) {
		return gnu_nested_functions_a(--k, b, xl, x2, x3, x4);
	}
	return k <= 0 ? x4() + x5() : b();
}

And it has the semantics one would expect, unlike how Blocks, Lambdas, or others with default by-value copying work.

In the general case, this is what the paper __self_func was going to solve⁹, but… that’s going to need some time for me to convince WG14 that maybe it IS actually a good idea. We can probably just keep writing the buggy code a few dozen more times for the recursion case and keep leaving it error prone, but I’ll try my best to convince them one more time that the above situation is very not-okay.

Thinking It Over

While the Man-or-Boy test isn’t exactly the end-all, be-all performance test, due to flexing both (self)-referential data and utilization of local copies with recursion, it is surprisingly suitable for figuring out if a closure design is decent enough in a mid to high-level programming language. It also gives me some confidence that, at the very least, the baseline for performance of statically-known, compile-time understood, non type-erased, callable Closure objects will have the best implementation quality and performance tradeoffs for a language like ISO C no matter the compiler implementation.

In the future, at some point, I’ll have to write about why that is. It’s a bit upside down from the perspective of readers of this blog to first address performance and then later write about the design, but it’s nice to make sure we’re not designing ourselves into a bad performance corner at the outset of this whole adventure.

Learned Insights

Surprising nobody, the more information the compiler is allowed to accrue (the Lambda design), the better its ability to make the code fast. What might be slightly more surprising is that a slim, compact layer of type erasure – not a bulky set of Virtual Function Calls (C++03 shared_ptr Rosetta Code design) – does not actually cost much at all (Lambdas with std::function_ref). This points out something else that’s part of the ISO C proposal for Closures (but not formally in its wording): Wide Function Pointers.

The ability to make a thin { some_function_type* func; void* context; } type backed by the compiler in C would be extremely powerful. Martin Uecker has a proposal that has received interest and passing approval in the Committee, but it would be nice to move it along in a nice direction. My suggestion is having % as a modifier, so it can be used easily since wide function pointers are an extremely prevalent concept. Being able to write something like the following would be very easy and helpful.

typedef int(compute_fn_t)(int);

int do_computation(int num, compute_fn_t% success_modification);

A wide function pointer type like this would also be traditionally convertible from a number of already existing extensions, too, where GNU Nested Functions, Apple Blocks, C++-style Lambdas, and more could create the appropriate wide function pointer type to be cheaply used. Additionally, it also works for FFI: things like Go closures already use GCC’s __builtin_call_with_static_chain to transport through their Go functions in C. Many other functions from other languages could be cheaply and efficiently bridged with this, without having to come up with harebrained schemes about where to put a void* userdata or some kind of implicit context pointer / implicit environment pointer.

Existing Extensions?

Unfortunately – except for the Borland closure annotation – there’s too many things that are performance-stinky about existing C extensions to this problem. It’s no wonder GCC is trying to add -ftrampoline-impl=heap to the story of GNU Nested Functions; they might be able to tighten up that performance and make it more competitive with Apple Blocks. But, unfortunately, since it is heap-based, there’s a real chance that its maximum performance ceiling is only as good as Apple Blocks, and not as good as a C++-style Lambda.

Both GNU Nested Functions and Apple Blocks – as they are implemented – do not really work well in ISO C. GNU Nested Functions because their base design and most prevalent implementation are performance-awful, but also Apple Blocks because of the copying and indirection runtime of Blocks that manage ARC pointers providing a hard upper limit on how good the performance can actually be in complex cases.

Regular C code, again, performs middle-of-the-road here. It’s not the worst of it, but it’s not the best at all, which means there’s some room beneath how we could go having the C code run. While it’s hard to fully trust the Rosetta Code Man-or-Boy code for C as the best, it is a pretty clear example of how a “normal” C developer would do it and how it’s not actually able to hit maximum performance for this situation.

I wanted to add a version of regular C code that used a dynamic array with statics to transfer data, or a bunch of thread_locals, but I could not bring myself to actually care enough to write a complex association scheme from a specific invocation of the recursive function a and the slot of dynamic data that represented the closure’s data. I’m sure there’s schemes for it and I could think of a few, but at that point it’s such a violent contortion to get a solution going that I figured it simply wasn’t worth the effort. But, as always,

pull requests are welcome. 💚

Banner and Title Photo by Lukas, from Pexels

See https://man.freebsd.org/cgi/man.cgi?query=qsort_r. ↩
See https://en.cppreference.com/w/c/algorithm/qsort. ↩
See: https://github.com/soasis/idk/tree/main/benchmarks/closures. ↩
See https://github.com/google/benchmark. ↩
See https://github.com/catchorg/Catch2/blob/devel/docs/benchmarks.md. And try it out. It’s pretty good, I just haven’t gotten off my butt to make the swap to it yet. ↩
Apple Blocks used to have an implementation in GCC that could be turned on and it used a Blocks Runtime to achieve it. But, I think it was gutted when some NeXT support and Objective-C stuff was wiped out after being unmaintained for some time. There’s been talk of reintroducing it, but obviously someone has to actually sit down and either redo it from scratch (advantageous because Apple has changed the ABI of Blocks) or try to resurrect / fix the old support for this stuff. ↩
Apple Blocks cannot have the “by address” capturing mechanism it has – the __block storage class modifier – applied to function arguments, for some reason. So, all function arguments are de-facto copied into a Block Expression unless someone saves a temporary inside the body of the function before the Block and then uses __block on that to make it a by-reference capture. ↩
It also works on a template basis in order to deduce this – the const auto& is a templated parameter and is usually used to do things like allow a member function to be both const and non-const where possible when generated. ↩
WG14 rejected the paper last meeting, unfortunately, as not motivated enough. Funnily enough, it was immediately after this meeting that I got slammed in the face with this bug. Foresight and “being prepared” is just not something even the most diehard C enthusiasts really embodies, unfortunately, and most industry vendors tend to take a more strongly conservative position over a bigger one. ↩

C2y: Hitting the Ground Running

2025-06-13T00:00:00+00:00

Surprise! Just because we released C23, doesn’t mean we’ve stopped working on C as a whole! There is a TON of things to do, and we have absolutely been busy working on things!

This is a rollup of some of the more exciting things that WG14 has gotten up to in the last 10 months. A huge shoutout to Compiler Developer and Amazing Software Engineer Alex Celeste, who submitted the majority of the papers talked about in this blog and achieved GREAT SUCCESS in setting C on the path for better! We’re not resting on our accomplishments for C23, as there is much to do and still yet more to accomplish! And, speaking of accomplishments, it’s likely appropriate to start with your accomplishments:

`_Countof` and `countof`

N3469

Thanks to all of you participating in our great Managed Democracy, you have convinced WG14 to change the name from lengthof to countof for the operator name based on your feedback. Previously, it had gone into C2y as _Lengthof/lengthof. When I conducted the survey, I was expecting that the consensus would match what the ARM survey showed and what most people I talked to felt: that lengthof was the proper name. Imagine my surprise when the survey came back and countof pulled ahead both in terms of raw votes in favor and was EXTREMELY ahead when using weighted votes as well!

Unfortunately, the countof part is still locked behind a header. That’s just how C works when introducing new keywords of this nature: we have to be conservative, and the maybe in 2 to 3 standard releases we can transition it into being a serious keyword and obsolete the header. So, now, the code looks like:

#include 
#include 

int main () {
	int arr[5];
	char arr2[20];
	const size_t n = countof(arr); // from header
	const size_t n2 = _Countof(arr2); // language keyword
	return n + n2;
}

This doesn’t necessarily stop certain compilers from making countof an implementation-defined keyword anyways, but I imagine that nobody’s implementation will be that brave. But, that concludes that for the foreseeable future: thank you for helping us reach this decision!

`if` Declarations

N3356

This is a feature similar to the one deployed in C++, and one that became oft-requested for C after its utility was proven out pretty quickly in the C++ world and in C compilers that implemented C++ extensions. Fought for by Alex Celeste, this proposal mirrors the C++ version for most of its functionality for declaring a variable that’s scoped to the if statement that can be immediately used for a test. It even comes with shortened, clean syntax that implicitly converts to bool to do the truth test:

extern int fire_off(int val);

int main (int argc, char* argv[]) {
	if (int num_fired = fire_off(argc)) {
		// checks for num_fired is non-zero
	}
	else {

	}
}

This is equivalent to doing…

extern int fire_off(int val);

int main (int argc, char*[]) {
	{
		int num_fired = fire_off(argc);
		if (num_fired) {
			// checks for num_fired is non-zero
		}
		else {

		}
	}
}

Now, occasionally you still need custom logic for the check, even with the declaration. You can do that by adding a semi-colon ; and then putting a typical allowed conditional check afterwards. A common idiom is using 0 for the success result of an API, so you don’t want to check with if (some_val), you want to use if (!some_val), like so:

#include 

enum err_code_t : unsigned { // C23: enum type specifiers
	err_code_ok = 0,
	err_code_invalid = 1,
	// ...
}

extern err_code_t checking_operation();

int main () {
	if (err_code_t e = checking_operation(); !e) {
		// checks for if e IS equal to zero
	}
	else {
		printf("error code: %x", (int)e);
		return 1;
	}
	return 0;
}

Notably, as per the “equivalent” expansion from the very first example, the e is available in all branches of the if/else/else if (but not outside of it). The motivation from this example is clear: getting an error code and checking if it’s non-zero means you might want to do something if it actually does end up being an error, such as printing! This is mostly a usability improvement for people writing C code, and makes a few macro-based idioms easier to use and handle without things breaking irreversibly.

New Escape Sequences (and Deprecating Octals)

N3353

Octals have long been shown as extremely poorly designed in C and C-adjacent languages that picked up the very, VERY weird habit of leading zeros turning numbers into base-8 (octal) numbers. The justification was, as ever, “Unix Permissions!!!”. Unfortunately, that’s a feature for 0.001% of absolute and complete nerds, and when your programming language takes over the world for some 50 years it turns out that optimizing for something that doesn’t even scale across operating systems properly becomes a really bad idea. This should have never been elevated to the status of a real language feature, or at the very least it should have never been “leading zeros change a number’s base” which stands in stark contrast with all of mathematics and science. It doesn’t even make sense, because hexadecimal – an infinitely more useful form of bit explanation, second only to actual base-2 bit literals standardized in C23 – used the x from “hexadecimal”. Was c from “octal” not good enough either? What about the o, the t? Even if o is way too visually similar, there were plenty of choices that do not end with “A raw 0 is actually an octal integer literal, actually” nerd-style trivia.

But, here we are.

Thankfully, just as K&R deprecated (but did not remove) K&R function declarations, we have finally reached a point in C where we’re not going to just sit there and let old mistakes that constantly trip people up continue to slide decade after decade. Alex Celeste is here with another simple & clean proposal to get us a little bit closer to a better world. We have new escape sequences both inside of strings and a new prefix for octal numbers:

int main () {
	const int v0 = 55; // decimal
	const int v1 = 0b00110111; // binary
	const int v2 = 0x37; // hexadecimal
	const int v3 = 0o67; // octal
	const char s0[] = "\x{37}"; // string hexadecimal
	const char s1[] = "\o{67}"; // string octal
#if 0
	// preceding line must be 0 to prevent this from compiling, because it is wrong!
	// We do not have string decimal because Octal Ruins Everything
	const char s2[] = "\55"; // byte value 45, for some fucking reason
	// We do not have string binary because \b is already bell
	const char s3[] = "\b{00110111}"; // ASCII backspace, plus some random crap
#endif
	const int STOP_DOING_THIS = 067; // CEASE!
	const char FOR_THE_LOVE_OF_GOD[] = "\067"; // PLEASE!!!
	return 0;
}

The hope here is that, one day, "\987" in a string literal won’t be an ugly compiler error, but a regular decimal literal. There’s also the eventual hope that leader zeroes, for ALL forms of integer literals, will become irrelevant noise rather than tweaking it to suddenly become a different numeric base. The bell situation is, currently, very unfortunate, but the bell has actual uses (even if only partially as a joke) so the folks here can likely be forgiven for their hubris. Future language designers should get this stuff squared away properly and provide up-front both string and literal notations for hexadecimal, octal, decimal, and binary as their first thought. More sophisticated folks can developer more general, flexible forms, but please try not to be consistent between your strings, characters, and elsewhere: benefit from C making a dumb decision early and improve on the situation in your own language!

For now, in C, we have to sit with 070 being octal for at least 2-4 more standards cycles and then, hopefully, completely change the old behavior into decimal. This is, of course, a serious amount of cope I’m engaging in: chances are even though we finally did the right thing and obsoleted it, it’ll never be fully fixed in the core language. Alas!

Case Ranges

N3370

This is another extension that I am unsure why it wasn’t standardized before I even realized what C was as a proper programming language. It’s been existence since forever and a ton of compilers use it; I also was FREQUENTLY asked about standardizing exactly this in both C and C++. While I can’t help the C++ people (they’d likely put a gun to the back of the head of such a proposal to start with and instead endorse the pattern matching proposal), the C folks were happy to get this one across the finish line the moment it appeared. This one was Yet Another Banger from Alex Celeste, and it just standardizes what is existing practice:

void foo (const char* s);

int main (int n, char* argv[]) {
	switch (n) {
	case 1:
		foo(argv[0]);
		break;
	// case 4 : // error, overlaps 2 ... 5
	//   foo ();
	//   break;
	case 2 ... 5:
		foo(argv[3]);
		break;
	case 6 ... 6: // OK (but questionable)
		foo(argv[5]);
		break;
	case 8 ... 7: // not an error, for some reason
		foo("");
		break;
	case 10 ... 4: // not an error, despite the overlap, lmao
		foo("");
		break;
	}
}

I’m happy that the feature is here, though as the last two cases show: it’s problematic in the way it can be used. Empty ranges have to be specified by swapping the numbers: a range of a single number is just using the same value twice. It’s a bit wonky the way it works in existing implementations like GCC and Clang, and the fact that it’s a fully closed range instead of half-open means that it’s problematic to access the size of an array:

extern int index;

extern void access_arr(int* arr, int idx);

int main () {
	const int N = 30;
	int arr[N] = {};
	switch (index) {
	case 0 ... N:
		access_arr(arr, index); // ahhh damnit!
		break;
	default:
		return 1;
	}
	return 0;
}

This has to be written as, instead:

extern int index;

extern void access_arr(int* arr, int idx);

int main () {
	const int N = 30;
	int arr[N] = {};
	switch (index) {
	case 0 ... N-1: // weird spelling...
		access_arr(arr, index); // but will work.
		break;
	default:
		return 1;
	}
	return 0;
}

This makes me not that happy about Case Ranges in C, but only because I consider this a Design Failure and not an implementation failure. The feature is incomplete if it doesn’t work the Normal Way It Is Supposed To with things like array indices and what not. Every other language, from Kotlin to Rust, addresses this problem directly by having a second syntax: one for fully closed ranges, and another for a half open range. (A half-open range, one where the low number is included but the high number isn’t, is how most things in C work!).

I addressed that in a technical writeup here: Additional Half-Open Case Range Syntax. The hope is that we’ll be able to move forward with something like is in this paper and go ahead and patch this hole.

More Bit Utilities

N3367

This is a hold over from this paper’s previous iterations that didn’t make the cut for C23. So, the full bit functionality is split between C23 and C2y; this paper brings a bunch of typical functions that you may or may not know about, such as:

uintN_t stdc_memreverse8uN(uintN_t value); (byteswap/bswap, effectively, for some bit size N);
void stdc_memreverse8(size_t n, unsigned char ptr[static n]); (generally-sized byteswap for an array);
generic_value_type stdc_rotate_left(generic_value_type value, generic_count_type count);
generic_value_type stdc_rotate_right(generic_value_type value, generic_count_type count);

The last two are macros, but work in the typical way as a rotate left and rotate right. There’s also concrete versions for unsigned char, unsigned short, etc. etc. that use suffixes:

unsigned char stdc_rotate_left_uc(unsigned char value, unsigned int count);
unsigned short stdc_rotate_left_us(unsigned short value, unsigned int count);
unsigned int stdc_rotate_left_ui(unsigned int value, unsigned int count);
unsigned long stdc_rotate_left_ul(unsigned long value, unsigned int count);
unsigned long long stdc_rotate_left_ull(unsigned long long value, unsigned int count);

unsigned char stdc_rotate_right_uc(unsigned char value, unsigned int count);
unsigned short stdc_rotate_right_us(unsigned short value, unsigned int count);
unsigned int stdc_rotate_right_ui(unsigned int value, unsigned int count);
unsigned long stdc_rotate_right_ul(unsigned long value, unsigned int count);
unsigned long long stdc_rotate_right_ull(unsigned long long value, unsigned int count);

These are in the standard now, which means C now catches up to Rust where we can use these functions in the standard and get a proper rotl or rotr without memorizing compiler intrinsics or pray that a compiler bug hasn’t accidentally screwed us out of good code generation. (Not hypothetical: this stuff was VERY poorly optimized, and just writing the paper exposed deficiencies that needed to be fixed in GCC 12 and 13 and Microsoft’s absolute awful quality of implementation on both x64 and ARM32 and ARM64 in this regard (thankfully, now fixed in their recent releases)).

Similarly, there’s also a family of other functions for loading and storing integers in an endian-aware manner, and in both an aligned and unaligned fashion:

uint_leastN_t stdc_load8_leuN(const unsigned char ptr[static ( N / 8)]);
uint_leastN_t stdc_load8_beuN(const unsigned char ptr[static ( N / 8)]);
uint_leastN_t stdc_load8_aligned_leuN(const unsigned char ptr[static ( N / 8)]);
uint_leastN_t stdc_load8_aligned_beuN(const unsigned char ptr[static ( N / 8)]);

int_leastN_t stdc_load8_lesN(const unsigned char ptr[static ( N / 8)]);
int_leastN_t stdc_load8_besN(const unsigned char ptr[static ( N / 8)]);
int_leastN_t stdc_load8_aligned_lesN(const unsigned char ptr[static ( N / 8)]);
int_leastN_t stdc_load8_aligned_besN(const unsigned char ptr[static ( N / 8)]);

void stdc_store8_leuN(uint_leastN_t value, unsigned char ptr[static ( N / 8)]);
void stdc_store8_beuN(uint_leastN_t value, unsigned char ptr[static ( N / 8)]);
void stdc_store8_aligned_leuN(uint_leastN_t value, unsigned char ptr[static ( N / 8)]);
void stdc_store8_aligned_beuN(uint_leastN_t value, unsigned char ptr[static ( N / 8)]);

void stdc_store8_lesN(int_leastN_t value, unsigned char ptr[static ( N / 8)]);
void stdc_store8_besN(int_leastN_t value, unsigned char ptr[static ( N / 8)]);
void stdc_store8_aligned_lesN(int_leastN_t value, unsigned char ptr[static ( N / 8)]);
void stdc_store8_aligned_besN(int_leastN_t value, unsigned char ptr[static ( N / 8)]);

There’s big/little endian variants combined with signed/unsigned variants. If you are concerned about i.e. int_least32_t and int32_t not being the same size when you use stdc_load8_les32, don’t: we added clauses in C23 to say that if int32_t exists, it must be the same type as int_least32_t, so you can use these functions with the exact-width integer types without being worried that things might not fit properly. You can get some significant speedups when processing data in bulk for both storing and loading such integers and get much tighter code if you know the pointer you are loading from is aligned properly for the int64_t or int_least16_t you happen to be using.

Still, a gentle word of caution for those who program fringe embedded devices: everything except the rotate left/right are gated behind #if CHAR_BIT == 8, so it might not exist on embedded platforms if they don’t follow the type of implementation I deploy in ztd.idk that provides cross-platform, 8-bit-steady behavior. I would encourage all embedded implementations, even if they use CHAR_BIT == 16 or CHAR_BIT == 32 to try to use a fully bit-packed, 8-bit-aligned implementation for these things (there’s a reason why I pushed to keep the name of it as store8 and load8, after all).

Labeled Breaks

N3370

Three years ago, I mentioned in a C23 article how we did not have a proposal for labeled loops and that I would have preferred it over the current break break;, continue break; and continue continue; stuff that was in progress from Eskil Steenberg. I’m happy to report that, Yet Again, Alex Celeste crushed it by getting this contentious piece of extremely necessary functionality through into C, and even managed to get C++ to turn their eyes favorably upon this functionality.

For those who live a blissful and peaceful life, there’s been a persistent problem in C-style languages because break, in particular, was a keyword doubly-used for both loops like while and for, as well as switches:

extern int n;

int main () {
	int x = 0;
	for (;; n -= 1) {
		switch (n) {
			default:
				// sure, do whatever
				x += n / 2;
				break;
			case 0:
				// break out of the `for` loop now
				// ...
				// ... ... ...
				// uuuuuhhhhhhhhhhhhhh
				break /*?????*/;
		}
	}
	return x;
}

There’s nothing you can do in this situation, except set up a boolean flag, use an if/else ladder, or write a separate function and then pray you can use return to jump out of the nested for/switch combination. This, of course, doesn’t work or scale great with triply-nested loops/switches or quadruply-nested things (albeit by the time you hit quadruple nesting of anything, some folks will tell you that things have gone too far); trying to jump back to the 1st loop from the 3rd loop is an annoying task, and it gets thorny. It’s a Really Fun Thing that’s been a problem in the language since Forever, and every other language has various solutions for this problem.

HEARTBREAKING: you tried to break out of a for loop inside of a switch statement in dumbass languages like C and C++. Your code fails and everyone laughs at you.

⸻ Björkus Dorkus, May 25th, 2025

There’s a better way to figure this out. And that way is Labeled Loops:

extern int n;

int main () {
	int x = 0;
	das_loopen:
	for (;; n -= 1) {
		switch (n) {
			default:
				// sure, do whatever
				x += n / 2;
				break;
			case 0:
				// yay!!!!
				break das_loopen;
		}
	}
	return x;
}

You can break SOME_LABEL; or continue SOME_LABEL; out of there, and it’ll work as you’d expect it to. Most other languages have this functionality, too, and it should help C developers with complicated, nested structures traverse them easily. It also dispels the heavy Moral, Social, And Technological Weight of a goto on Software Engineers soldiers and stay away from the scathing critiques and wary code reviewers that view it with deep suspicion. Though, if you know what you’re doing? Well…

You can try it in GCC, right now; others are cooking up implementations in their trunks, too. There’s been an (unsuccessful) attempt by N3377 to change the location of the label in the loop after discussion in WG14, so for now it’s going to stay a free-ranging label that just happens to be before the for or while or similar without any intervening statements. That means there is still room for the technological issue if reuse of labels (prevalent in macros in C), but honestly the solution for that should be getting better macro technology or a way to save a token concatenation in a macro so it can be used/reused properly. There’s been some ideas around that, but nothing which has taken off (e.g., potentially having __COUNTER__(IDENTIFIER) as a way to make a custom incrementing counter per “IDENTIFIER” and then allowing to reference it without increment it with something like __READ_COUNTER__(IDENTIFIER)). But whether or not such things take off…

is for a future article. 💚

Banner and Title Photo by Pixabay, from Pexels under CC0

The Defer Technical Specification: It Is Time

2025-03-15T00:00:00+00:00

After the Graz, Austria February 2025 WG14 Meeting, I am now confident in the final status of the defer TS, and it is now time.

… Time to What?

Time for me to write this blog post and prepare everyone for the implementation blitz that needs to happen to make defer a success for the C programming language. If you’re smart and hip like Navi who wrote the GCC patch, the maintainer of slimcc who implemented defer from the early spec and found it both easy and helpful, and several others who are super cool and great, you can skip to the (DRAFT) ISO/DIS 25755 - defer Technical Specification and get started! But, for everyone else…

What is `defer`?

For the big brain 10,000 meter view, defer ⸺ and the forthcoming TS 25755 ⸺ is a general-purpose block/scope-based “undo” mechanism that allows you to ensure that no matter what happens a set of behavior (statements) are run. While there are many, many more usages beyond what will be discussed in this article, defer is generally used to cover these cases:

unlock() of a mutex or other synchronization primitive after a lock();
free() of memory after a malloc();
deref() of a reference-counted parameter after a ref() or (shallow) copy() operation;
rollback on a transaction if something bad happens;

and so, so much more. For C++ people who are going “wait a second, this sounds like destructors!”, just go ahead and skip down below and read about the C++ part while ignoring all the stuff in-between about defer and WG14 and voting and consensus and blah blah blah.

For everyone else, we’re going to go over some pretty simple examples of defer, using a series of printf’s to construct (or fail to construct) a phrase, just to get an idea of how it works. Here’s a basic example showing off some of its core properties:

#include 

int main () {
	const char* s = "this is not going to appear because it's going to be reassigned";
	defer printf(" bark!\"");
	defer printf("%s", s);
	defer {
		defer printf(" woof");
		printf(" says");
	}
	printf("\"dog");
	s = " woof";
	return 0;
}

The output of this program is as follows:

$> ./a.out
"dog says woof woof bark!"

The following principles become evident:

The contents of a defer are run at the end of the block that contains it.
- defer can be nested.
- The rules for nested defer are the same as normal ones: it executes at the end of its containing block (defer introduces its own block.)
Multiple defer statements run in reverse lexicographic order.
defer does not need any braces for simple expression statements, same as for, while, if, etc. constructs.
defer can have braces to stack multiple statements inside of it, same as for, while, if, etc. constructs.
defer uses the value of the variable at the time defer is run at the end of the scope, not at the time when the defer statement is encountered.

This forms the core of the defer feature, and the basis by which we can build, compare, and evaluate this new feature.

“Build?” Wait… Are You Just Making This Up Entirely From Scratch?

Thankfully, no. This is something that has been cooked up for a long time by existing implementations in a variety of ways, such as:

__attribute__((cleanup(func))) void* some_var;, where func takes the address of some_var and gets invoked when some_var’s lifetime ends/the scope is finished (Clang, GCC, and SO many more compilers);
__try/__finally, where the __finally block is invoked on the exit/finish of the __try block (MSVC);
and, various different library hacks, such as this high-quality defer library and this other library-based library hack.

It has a lot of work and understanding behind it, and a ton of existing practice. Variations of it exist in Apple’s MacOS SDK, the C parts of Swift, the Linux Kernel, GTK’s g_autoptr (and qemu’s Lockable), and so much more. It’s also featured in many other languages in exactly the format specified here, including C++ (with RAII), Zig (with defer), and Swift (also as defer, but also a guard feature as well). This, of course, begs the question: if this has so much existing implementations in various styles, and so many years of experience, why is this going into a Technical Specification (or just “TS”) rather than directly into the C standard? Well, honestly, there’s 2 reasons.

The first reason is that vendors claim they can put it into C ⸺ and make it globally available ⸺ faster than if it’s put in the C working draft. Personally, I’m not sure I believe the vendors here; there are many features they have put into C, or even back ported from later versions of C into older versions of C. But, I’m not really at a point in my life that I feel like arguing with the vendors about a boring reskin of feature that’s been in C compilers for just under as long as I’ve been alive, so I’m just going to take their word for it.

The second, more unfortunate, reason is that defer was proposed before I got my hands on it. It was not in a good shape and ready for standardization, and the ideas about what defer should be were somewhat all over the place. Which is fair, because many of the initial papers were exploratory: the problem was that when we had to cut a C23 release, there was a (minor) panic about new features and there was a lot of concentrated effort to try and slim defer down into something ready to go. Going from the wishy-washy status of before that wasn’t grounded in existing practice to something material caused the Committee to reject the idea, and state that if it came back it should come back as a TS.

I could argue that this is not fair, because that vote was based off older version of the paper that was not ready and was subject to C23 pressures. The older papers were discussing various ideas like whether to capture variables by value at the point of the defer statement (catastrophic) or whether defer should be stapled to a higher scope / function scope like Go (also catastrophic), and whether writing a for loop would accumulate a (potentially infinite) amount of extra space and allocations to store variables and other data that would be needed to run at the end of the scope (yikes!). None of those shenanigans apply anymore, but we still have to go to a TS, even though it’s a mirror-image of how existing practice works (in fact, less powerful than existing practice). Somewhat recently, we took new polls about whether it should go in a TS or whether it should go directly into the IS (International Standard; the working draft basically). There was support and consensus for both, but more consensus for a TS.

It’s not really worth fighting about, though, so into a defer TS it goes.

My only worry is that Microsoft is going to do what it usually does and ignore literally everybody else doing things and not do any forward progress with just a defer TS. (As they do with most GNU or Clang or not-Microsoft extensions, some Technical Reports, and some TSs.) So, the only place we’ll get experience is in places that already rely pretty heavily on the existence of the compiler feature. But, I’m more than willing to be pleasantly surprised. It could be driven by users demanding Microsoft make some of their C stuff safer through their User Voice / Feature Request submission portal. But, the message from Microsoft since Time Immemorial was always “just write C++”, so I can imagine we’ll just get the same messaging here, too, and have to wait until defer hits the C Standard before they implement it.

Nevertheless, this TS will be interesting for me. I have several other ideas that should go through a TS process; if I get to watch over the next couple of years that vendors weren’t being honest about how quickly they could implement defer in their compilers ⸺ if only they had a TS to justify it! ⸺ that will strongly color my opinion on whether or not any future improvements should use the TS process at all.

So we’ll see! In the meantime, however, let’s talk about how defer differs from its similarly-named predecessors in other languages.

Scope-based

The central idea behind defer is that, unlike its Go counterpart, defer in C is lexically bound, or “translation-time” only, or “statically scoped”. What that means is that defer runs unconditionally at the end of the block or the scope it is bound to based on its lexical position in the order of the program. This gives it well-defined, deterministic behavior that requires no extra storage, no control flow tracking, no clever optimizations to reduce memory footprint, and no additional compiler infrastructure beyond what would normally be the case for typical variable automatic storage duration (i.e., normal-ass variable) lifetime tracking. Here’s a tiny example using mtx_t:

#include 

extern int do_sync_work(int id, mtx_t* m);

int main () {
	mtx_t m = {};
	if (mtx_init(&m, mtx_plain) != thrd_success) {
		return 1;
	}
	// we have successful initialization: destroy this when we're done
	defer mtx_destroy(&m);

	for (int i = 0; i < 12; ++i) {
		if (mtx_lock(&m) != thrd_success) {
			// return exits both the loop and the main() function,
			// defer block called:
			// - mtx_destroy
			return 1;
		}
		// now that we have succesfully init & locked,
		// make sure unlock is called whenever we leave
		defer mtx_unlock(&m);

		// …
		// do a bunch of stuff!
		// …
		if (do_sync_work(i, &m) == 0) {
			// something went wrong: get out of there!
			// return exits both the loop and the main() function,
			// defer blocks called:
			// - mtx_unlock
			// - mtx_destroy
			return 1;
		}
		
		// re-does the loop, and thus:
		// defer block called:
		// - mtx_unlock
	}

	// defer block called:
	// - mtx_destroy
	return 0;
}

The key takeaway from the comment annotations in the above is that: no matter if you early return from the 6th iteration of the for loop, or you bail early because of an error code sometime after the loop:

if needed, mtx_unlock is always called on m, first;
and, mtx_destroy is called on m, last.

Notably, the mtx_unlock call only happens if execution is still inside of the for loop, and only happens with exits from that specific scope after defer is passed. This is an important distinction from Go, where every defer is actually “lifted” from its current context and attached to run at the end of the function itself that is around it. This tends to make sense as a “last minute check before a function exits about some error conditions”, but it has some devastating consequences for simple code. Take, for example, the following code from above, slightly simplified and modified to make a normal-looking Go program:

package main

import (
	"fmt"
	"sync"
)

var x  = 0

func work(wg *sync.WaitGroup, m *sync.Mutex) {
	defer wg.Done()	
	for i := 0; i < 42; i++ {
		m.Lock()
		defer m.Unlock()
		x = x + 1
	}
}


func main() {
	var w sync.WaitGroup
	var m sync.Mutex
	for i := 0; i < 20; i++ {
		w.Add(1)
		go work(&w, &m)
	}
	w.Wait()
	fmt.Println("final value of x", x)
}

The output of this program, on Godbolt, is:

Killed - processing time exceeded
Program terminated with signal: SIGKILL
Compiler returned: 143

Yeah, that’s right: it never finishes running. This is because this code deadlocks: the defer call is hoisted to the outside of the for loop in func work. This means that it calls m.Lock(), does the increment, loops around, and then attempts to call m.Lock() again. This is a classic deadlock situation, and one that hits most people often enough in Go that they have to add a little caveat. “Use an immediately invoked function to clamp the defer’s reach” is one of those quick caveats:

package main

import (
	"fmt"
	"sync"
)

var x  = 0

func work(wg *sync.WaitGroup, m *sync.Mutex) {
	defer wg.Done()	
	for i := 0; i < 42; i++ {
		func() {
			m.Lock()
			defer m.Unlock()
			x = x + 1
		}()
	}
}


func main() {
	var w sync.WaitGroup
	var m sync.Mutex
	for i := 0; i < 20; i++ {
		w.Add(1)
		go work(&w, &m)
	}
	w.Wait()
	fmt.Println("final value of x", x)
}

This runs without locking up Godbolt’s resource until a SIGKILL. Of course, this is pathological behavior; while it works great for a simple, direct use case (“catch errors and act on them”), it unfortunately results in other problematic behaviors. This is why the version in the defer TS does not cleave strongly to the scope of the function definition (or immediately invoked lambda), but instead directly to the innermost block and its associated scope. This also highlights another important quality of defer that we need when working with a language like C (and also applies to Zig and Swift).

Refer to Variables Directly

Also known as “capture by reference”, defer blocks refer to variables in their scope directly (e.g., as if defer captured pointers to everything that was in scope and then automatically dereferenced those pointers so you could just refer to a previous foo directly as foo). This is something that people sometimes struggle with, but the choice is extremely obvious for a lot of both safety and usability reasons. Looking back at the examples above, there would be severe problems if a defer block would copy the m value, so that the lock/unlock paired calls would actually work on different entities. This would be a different kind of messed up that not even Go attempted, and no language should ever try.

When you have an in-line, scope-based, compile-time feature like defer that does not create an “object” and cannot “travel” to different scopes, capturing directly by reference is fine. Referring to variables directly is perfectly fine. You don’t need to be careful and worry about captures, or be preemptively careful by capturing things through copying in order to be “safe”. defer – unlike RAII objects – can’t go anywhere. You don’t need to be explicit about how it gets access to things in the local scope, because defer can’t leave that scope. This is also a secondary consequence of not following in Go’s footsteps; by not scoping it to the function, there’s no concerns about whether or not the C-style automatic storage duration variables that are in, say, a for loop or an if statement need to be “lifetime extended” to the whole function’s scope.

Direct variable reference and keeping things scope-based does mean that defer does not need to “store” its executions up until the end of the function, nor does it need to record predicates or track branches to know which defer is taken by the end of some arbitrary outer scope. In fact, for any defer block, the model of behavior for the defer TS is pretty much that it takes all the code inside of the defer block and dumps it out onto each and every translation-time (compile-time) exit of that scope. This applies to early return, breaking/continueing out of a loop scope, and also gotoing towards a label.

Oh, even `goto`?

In general, goto is banned from jumping over a defer or jumping into the sequence of statements in a defer. It can jump back before a defer in that scope. The same goes for trying to use switch, break/continue (with or without a label), and other things. Here’s a few examples where things would not compile if you tried it:

#include 

int main () {
	void* p = malloc(1);
	switch (1) {
		defer free(p); // No.
	default:
		defer free(p); // fine
		break;
	}
	return 0;
}

int main () {
	switch (1) {
	default:
		defer {
			break; // No.
		}
	}
	for (;;) {
		defer {
			break; // No.
		}
	}
	for (;;) {
		defer {
			continue; // No.
		}
	}
	return 0;
}

It’s also important to be aware that defer that are not reached in terms of execution do not affect the things that come before them. That is, this is a leak still:

#include 

int main () {
	void* p = malloc(1);
	return 0; // scope is exited here, `defer` is unreachable
	defer free(p); // p is leaked!!
}

Similar to the bans on break, goto, continue, and similar, return also can’t exit a defer block:

int main () {
	defer { return 24; } // No.
	return 5;
}

Though, if you’re an avid user of both __attribute__((cleanup(...))) and __try/__finally, you’ll find that some of these restrictions are actually harsher than what is allowed by the mirrored existing practice, today.

Wait…. Existing Practice Can Do WHAT, Now?

The bans written about in the preceding section are a bit of a departure from existing practice. Both __attribute__((cleanup(...))) and __try/__finally ⸺ the original versions of this present in GCC/Clang/tcc/etc., and MSVC, respectively ⸺ allowed for some (cursed) uses of goto, pre-empting returns, and more in those implementation-specific kinds of defer.

An MSVC example (with Godbolt):

int main () {
	__try {
		return 1;
	}
	__finally {
		return 5;
	}
	// main returns 5 ⸺ can stack this infinitely
}

A GCC example (with Godbolt):

#include 
#include 

int main () {
	__label__ loop_endlessly_and_crash;
	loop_endlessly_and_crash:;
	void horrible_crimes(void* pp) {
		void* p = *(void**)pp;
		printf("before goto...\n");
		goto loop_endlessly_and_crash; // this program never exits successfully or frees memory
		printf("after goto...\n");
		printf("deallocating...\n");
		free(p);
	}
	[[gnu::cleanup(horrible_crimes)]] void* p = malloc(1);
	printf("allocated...\n");
	printf("before label...\n");
	printf("after label...\n");
	return 0;
}

The vast majority of people ⸺ both inside and outside of the Committee ⸺ agreed that allowing this directly in defer for the first go-around was Bad and Evil. I also personally agree that I don’t like it, though I would actually be okay with relaxing the constraint in the future because even if I don’t personally like what I’m seeing from this, I can still write out a tangible, understandable, well-defined behavior for “goto leaves a defer block” or “return is called from within a defer block”. The things I won’t move on, though, are “goto into a defer block” (which exit of the scope is the goto taking execution to??), or jumping over a defer statement in a given scope: there’s no clear, unambiguous, well-defined behavior for that, and it only gets worse with additional control flow.

But, even if you can’t return from the TS’s deferred block, you still have to be aware of when and how the defer actually runs in relation to the actual expression contained in a return statement or similar scope escape.

`defer` Timing

Matching existing practice and also C++ destructors, defer is run before the function actually returns but after the computation of the return’s value. In a language like this, this is not observable in simple programs. But, in complex programs, this absolutely matters. For example, consider the following code:

#include 

extern int important_func_needs_buffer(size_t sz, void* p);
extern int* get_important_buffer(int* p_err, size_t* p_size, int val);
extern void drop_important_buffer(int val, size_t size);

int f (int val) {
	int err = 0;
	size_t size = 0;
	int* p = get_important_buffer(&err, &size, val);
	if (p == nullptr || err != 0) {
		return err;
	}
	defer {
		drop_important_buffer(val, size);
	}
	return important_func_needs_buffer(sizeof(*p) * size, p);
}

int main () {
	if (f(42) == 0) {
		printf("bro definitely cooked. peak.");
		return 0;
	}
	printf("what was bro cooking???");
	return 1;
}

There’s 2 times in which you can run the defer block and its drop_important_buffer(…) call.

before the function returns and before important_func_needs_buffer(…);
or, before the function returns but after important_func_needs_buffer(…).

The problem becomes immediately apparent, here: if the defer runs before the expression in the return statement (before important_func_needs_buffer(…)), then you actually drop the buffer before the function has a chance to use it. That’s a one-way ticket to a use-after-free, or other extremely security-negative shenanigans. So, the only logical and plausible choice is to run the second option, which is that the defer block runs after the return expression is evaluated but before we leave the function itself.

This does frustrate some people, who want to use defer as a last-minute “return value change” like so:

int main (int argc, char* argv[]) {
	int val = 0;
	int* p_val = &val;
	defer {
		if ((argc % 2) == 0) {
			*p_val = 30;
		}
	}
	return val; // returns 0, not 30, even if argc is e.g. 2
}

But I value much more highly compatibility with existing practice (both __try/__finally and __attribute__((cleanup(…))))), compatibility with C++ destructors, and avoiding the absolute security nightmare. If someone wants to evaluate the return expression but still modify the value, they can write a paper or submit feedback to implementations that they want defer { if (whatever) { return ...; } } to be a thing. That way, such a behavior is formalized. And, again, even if I don’t personally want to write code like this or see code like this, there’s still a detectable, tangible, completely well-defined behavior for what happens if a return is evaluated in a defer. This is also not nearly as complex as e.g. Go’s defer, because the defer TS uses a translation-time scoped defer.

It won’t result in “dynamically-determined and executed defer causes spooky action at a distance”. One would still need to be careful about having nested defers that also overwrite the return, or subsequent defers that attempt to change the return value. (One would also have to contend that every defer-nested return would need to have its expression evaluated, and potentially discarded, sans optimization to stop it.) Given needing to answer all of these questions, though, it is still icky and I’m glad we don’t have to go through with return (or goto or break or continue) within defer statements.

… What About Control Flow Outside of Compilation Time?

Run-time style control flow like longjmp, or similar _Noreturn/[[_Noreturn]]/[[noreturn]]-marked functions, are a-okay if they mimic the above allowed uses of goto. If it jumps out of the function entirely, or jumps into a previous scope but beyond the point where a defer would be, the behavior can end up undefined. That means use of functions like exit, quick_exit, or similar explicitly by the user may leak resources by not executing any currently open defer blocks. This is similar to C++, where calling any of the C standard library exit functions (and, specifically, NOT std::terminate()) means destructors will not get run. The only function that this is not fully true on is thrd_exit, as glibc has built-in behavior where thrd_exit will actually provoke unwinding of thread resources by calling destructors on that thread. (You can then use thrd_exit on the main thread, even in a single-threaded program, as a means to trigger unwinding; this is an implementation detail of glibc, though, and most other C standard libraries don’t behave like this.)

The exact wording in the TS and the proposal is that its “unspecified” behavior, but it doesn’t actually proscribe any specific set of behaviors that can happen. So, even if we use the “magic” word of “unspecified” for these run-time jumps, the behavior is effectively as bad as undefined behavior because there really isn’t any document-provided guarantee about what happens when you run off somewhere with e.g. setjmp/longjmp in these situations. I guess the only thing it prevents is some compiler optimization junkie trying to optimize based on whether or not defer with a run-time jump would trigger undefined behavior, though it’s effectively an optimization you can maybe get by only combining defer and one of these run-time jumps. At that point, I’d question what the hell the engineer was doing submitting that kind of “improvement” in the first place to the optimizer, and reject it on the grounds of “Please find something better to do”.

But, you never know I guess?

Maybe there would be real gains, but I’m not holding my breath nor making any space for it. But beyond just ignoring dubious weird optimization corners for defer…

Does…. Defer Actually Solve Any Problems, Though?

Believe it or not: yes. I’m not one to waste my time on things with absolutely no real value; there’s just too little time and standardization takes too much damn effort to focus on worthless things¹. Though, if you were to take it from others, you’d hear about how defer complicates the language for not much/no benefit:

… The proposal authors show a complex solution to make the code free storage and then show how it can be “simplified” using defer. But it is trivial to centralize cleanup in one function, no new features needed. If I was developing this code for real, I’d take the next step and make it single exit. …

⸺ Victor Yodaiken, “Don’t Defer”, December 12, 2023

The code Yodaiken is referring to is code contained in the original proposal (the original proposal is being updated in lock-step with the TS), specifically this section. The code in question was offered to me by its author, and I was told to simply / work with the code. So, after a bit of cleanup and checking and review, this is the first-effort defer version of the original code:

h_err* h_build_plugins(const char* rootdir, h_build_outfiles outfiles, const h_conf* conf)
{
	char* pluginsdir = h_util_path_join(rootdir, H_FILE_PLUGINS);
	if (pluginsdir == NULL)
		return h_err_create(H_ERR_ALLOC, NULL);
	defer free(pluginsdir);
	char* outpluginsdirphp = h_util_path_join(
		rootdir,
		H_FILE_OUTPUT "/" H_FILE_OUT_META "/" H_FILE_OUT_PHP
	);
	if (outpluginsdirphp == NULL)
	{
		return h_err_create(H_ERR_ALLOC, NULL);
	}
	defer free(outpluginsdirphp);
	char* outpluginsdirmisc = h_util_path_join(
		rootdir,
		H_FILE_OUTPUT "/" H_FILE_OUT_META "/" H_FILE_OUT_MISC
	);
	if (outpluginsdirmisc == NULL)
	{
		return h_err_create(H_ERR_ALLOC, NULL);
	}
	defer free(outpluginsdirmisc);
	//Check status of rootdir/plugins, returning if it doesn't exist
	{
		int err = h_util_file_err(pluginsdir);
		if (err == ENOENT)
		{
			return NULL;
		}
		if (err && err != EEXIST)
		{
			return h_err_from_errno(err, pluginsdir);
		}
	}

	//Create dirs if they don't exist
	if (mkdir(outpluginsdirphp, 0777) == -1 && errno != EEXIST) {
		return h_err_from_errno(errno, outpluginsdirphp);
	}
	if (mkdir(outpluginsdirmisc, 0777) == -1 && errno != EEXIST) {
		return h_err_from_errno(errno, outpluginsdirmisc);
	}

	//Loop through plugins, building them
	struct dirent** namelist;
	int n = scandir(pluginsdir, &namelist, NULL, alphasort);
	if (n == -1)
	{
		return h_err_from_errno(errno, namelist);
	}
	defer {
		for (int i = 0; i < n; ++i)
		{
			free(namelist[i]);
		}
		free(namelist);
	}
	for (int i = 0; i < n; ++i)
	{
		struct dirent* ent = namelist[i];
		if (ent->d_name[0] == '.')
		{
			continue;
		}
		char* dirpath = h_util_path_join(pluginsdir, ent->d_name);
		if (dirpath == NULL)
		{
			return h_err_create(H_ERR_ALLOC, NULL);
		}
		defer free(dirpath);
		char* outdirphp = h_util_path_join(outpluginsdirphp, ent->d_name);
		if (outdirphp == NULL)
		{
			return h_err_create(H_ERR_ALLOC, NULL);
		}
		defer free(outdirphp);
		char* outdirmisc = h_util_path_join(outpluginsdirmisc, ent->d_name);
		if (outdirmisc == NULL)
		{
			return h_err_create(H_ERR_ALLOC, NULL);
		}
		defer free(outdirmisc);

		h_err* err;
		err = build_plugin(dirpath, outdirphp, outdirmisc, outfiles, conf);
		if (err)
		{
			return err;
		}
	}
		
	return NULL;
}

This code has some improvements over the original, insofar that it actually protects against a few leaks that were happening in that general purpose code. Instead of this approach, Yodaiken instead changed it to this:

struct plugins {
	char *pluginsdir;
	char *outpluginsdirphp;
	char *outpluginsdirmisc;
	char *dirpath;
	char *outdirphp;
	char *outdirmisc;
	int n;
	struct dirent **namelist;
};

void freeall(struct plugins *x)
{
	if (x->pluginsdir)
		free(x->pluginsdir);
	if (x->outpluginsdirphp)
		free(x->outpluginsdirphp);
	if (x->outpluginsdirmisc)
		free(x->outpluginsdirmisc);
	if (x->dirpath)
		free(x->dirpath);
	if (x->outdirphp)
		free(x->outdirphp);
	if (x->outdirmisc)
		free(x->outdirmisc);
	for (int i = 0; i < x->n; i++) {
		free(x->namelist[i]);
	}
}

h_err *h_build_plugins(const char *rootdir, h_build_outfiles outfiles,
		       const h_conf * conf)
{
	struct plugins x = { 0, };
	x.pluginsdir = h_util_path_join(rootdir, H_FILE_PLUGINS);
	if (pluginsdir == NULL)
		return h_err_create(H_ERR_ALLOC, NULL);
	x.outpluginsdirphp = h_util_path_join(rootdir,
					      H_FILE_OUTPUT "/" H_FILE_OUT_META
					      "/" H_FILE_OUT_PHP);
	if (outpluginsdirphp == NULL) {
		freeall(&x);
		return h_err_create(H_ERR_ALLOC, NULL);
	}
	x.outpluginsdirmisc = h_util_path_join(rootdir,
					       H_FILE_OUTPUT "/" H_FILE_OUT_META
					       "/" H_FILE_OUT_MISC);
	if (x.outpluginsdirmisc == NULL) {
		freeall(&x);
		return h_err_create(H_ERR_ALLOC, NULL);
	}
	//Check status of rootdir/plugins, returning if it doesn’t exist
	{
		int err = h_util_file_err(x.pluginsdir);
		if (err == ENOENT) {
			freeall(&x);
			return NULL;
		}
		if (err && err != EEXIST) {
			freeall(&x);
			return h_err_from_errno(err, x.pluginsdir);
		}
	}

	//Create dirs if they don’t exist
	if (mkdir(x.outpluginsdirphp, 0777) == -1 && errno != EEXIST) {
		freeall(&x);
		return h_err_from_errno(errno, x.outpluginsdirphp);
	}
	if (mkdir(outpluginsdirmisc, 0777) == -1 && errno != EEXIST) {
		freeall(&x);
		return h_err_from_errno(errno, outpluginsdirmisc);
	}
	//Loop through plugins, building them
	x.n = scandir(x.pluginsdir, &x.namelist, NULL, alphasort);
	if (n == -1) {
		freeall(&x);
		return h_err_from_errno(errno, x.namelist);
	}
	for (int i = 0; i < n; ++i) {
		struct dirent *ent = namelist[i];
		if (ent->d_name[0] == '.') {
			continue;
		}
		x.dirpath = h_util_path_join(x.pluginsdir, ent->d_name);
		if (x.dirpath == NULL) {
			freeall(&x);
			return h_err_create(H_ERR_ALLOC, NULL);
		}
		x.outdirphp = h_util_path_join(outpluginsdirphp, ent->d_name);
		if (x.outdirphp == NULL) {
			freeall(&x);
			return h_err_create(H_ERR_ALLOC, NULL);
		}
		x.outdirmisc =
		    h_util_path_join(x.outpluginsdirmisc, ent->d_name);
		if (x.outdirmisc == NULL) {
			freeall(&x);
			return h_err_create(H_ERR_ALLOC, NULL);
		}

		h_err *err;
		err =
		    build_plugin(dirpath, outdirphp, outdirmisc, outfiles,
				 conf);
		if (err) {
			freeall(&x);
			return err;
		}
	}

	freeall(&x);
	return NULL;
}

This works too, and one would argue that Yodaiken has done the same as defer but without the new feature or a TS or any shenanigans. But there’s a critical part of Yodaiken’s argument where his premise falls apart in the example code provided: refactoring. While he states that in “serious” code he would change this to be a single exit, the example code provided is just one that replaces all of the defer or manual frees of the original to instead be freeall. This was not unanticipated by the proposal he linked to, which not only discusses defer in terms of code savings, but also in terms of vulnerability prevention. And it is exactly that which Yodaiken has fallen into, much like his peers and predecessors who work on large software like the Linux Kernel.

However, one should note that Yodaiken’s changes here actually don’t account for everything. Inside of the loop, it’s not just freeall on error: users need to actually free x.dirpath, x.outdirmisc, and x.outdirphp every single loop. freeall doesn’t account for that, so this is actually a downgrade over the defer version (which fixed these problems). It also didn’t pull from the correct namelist (it should be x.namelist), but we can just chock that up to a quick blog post from 2 years ago trying to fix some typos.

CVE-2021-3744, and the Truth About Programmers

The problem, that Yodaiken misses in his example code rewrite and his advice to developers, is the same one that the programmers responsible for CVE-2021-3744. You see, much like Yodaiken’s rewrite of the code, the function in question here had an object. That object’s name was tag. And just like Yodaiken’s rewrite, it had a function call like freeall that was meant to be called at the exit point of the function: ccp_dm_free. The problem, of course, is that along one specific error path, in conjunction with other flow control issues, the V5 CCP’s tag structure was not being properly freed. That’s a leak of (potentially sensitive) information; thankfully, at most it could provoke a Denial of Service, per the original reporter’s claims.

This is the exact pitfall that Yodaiken’s own code is subject to.

It’s not that there isn’t a way, in code as plain as C90, to write a function that frees everything. The problem is that in any sufficiently complex system, even with one that has as many eyeballs as bits of the cryptography code in the Linux Kernel, one might not be able to trace all the through-lines for any specifically used data. The function in question for CVE-2021-3744 had exactly what Yodaiken wanted: a single exit point after doing preliminary returns for precondition/invalid checks, goto to a series of laddered cleanup statements for the very end, highly reviewed code, and being developed in as real a context as it gets (the Linux Kernel). But, it still didn’t work out.

Thankfully, this CVE is only a 5.5 – denial of service, maybe a bit of information leakage – but it’s not the first screwup of this sort. This is only one of hundreds of CVEs that follow the same premise, that have been unearthed over the last 25-summat years² of vulnerability tracking. And, most importantly, Yodaiken’s code can be changed in the face of defer, in a way that both reduces the number of lines written and does all the same things Yodaiken’s code does, but with better future proofing and less potential leaks:

struct plugins {
	char *pluginsdir;
	char *outpluginsdirphp;
	char *outpluginsdirmisc;
	char *dirpath;
	char *outdirphp;
	char *outdirmisc;
	int n;
	struct dirent **namelist;
};

void freeall(struct plugins *x)
{
	free(x->pluginsdir);
	free(x->outpluginsdirphp);
	free(x->outpluginsdirmisc);
	free(x->dirpath);
	free(x->outdirphp);
	free(x->outdirmisc);
	for (int i = 0; i < x->n; i++) {
		free(x->namelist[i]);
	}
}

void freeloop_all(struct plugins *x) {
	free(x->dirpath);
	free(x->outdirphp);
	free(x->outdirmisc);
	x->dirpath = nullptr;
	x->outdirphp = nullptr;
	x->outdirmisc = nullptr;
}

h_err *h_build_plugins(const char *rootdir, h_build_outfiles outfiles,
		       const h_conf * conf)
{
	struct plugins x = { 0, };
	defer freeall(&x);
	x.pluginsdir = h_util_path_join(rootdir, H_FILE_PLUGINS);
	if (pluginsdir == NULL)
		return h_err_create(H_ERR_ALLOC, NULL);
	x.outpluginsdirphp = h_util_path_join(rootdir,
					      H_FILE_OUTPUT "/" H_FILE_OUT_META
					      "/" H_FILE_OUT_PHP);
	if (outpluginsdirphp == NULL) {
		return h_err_create(H_ERR_ALLOC, NULL);
	}
	x.outpluginsdirmisc = h_util_path_join(rootdir,
					       H_FILE_OUTPUT "/" H_FILE_OUT_META
					       "/" H_FILE_OUT_MISC);
	if (x.outpluginsdirmisc == NULL) {
		return h_err_create(H_ERR_ALLOC, NULL);
	}
	//Check status of rootdir/plugins, returning if it doesn’t exist
	{
		int err = h_util_file_err(x.pluginsdir);
		if (err == ENOENT) {
			return NULL;
		}
		if (err && err != EEXIST) {
			return h_err_from_errno(err, x.pluginsdir);
		}
	}

	//Create dirs if they don’t exist
	if (mkdir(x.outpluginsdirphp, 0777) == -1 && errno != EEXIST) {
		return h_err_from_errno(errno, x.outpluginsdirphp);
	}
	if (mkdir(outpluginsdirmisc, 0777) == -1 && errno != EEXIST) {
		return h_err_from_errno(errno, outpluginsdirmisc);
	}
	//Loop through plugins, building them
	x.n = scandir(x.pluginsdir, &x.namelist, NULL, alphasort);
	if (n == -1) {
		return h_err_from_errno(errno, x.namelist);
	}
	for (int i = 0; i < n; ++i) {
		struct dirent *ent = x.namelist[i];
		if (ent->d_name[0] == '.') {
			continue;
		}
		defer freeloop_all(&x);
		x.dirpath = h_util_path_join(x.pluginsdir, ent->d_name);
		if (x.dirpath == NULL) {
			return h_err_create(H_ERR_ALLOC, NULL);
		}
		x.outdirphp = h_util_path_join(outpluginsdirphp, ent->d_name);
		if (x.outdirphp == NULL) {
			return h_err_create(H_ERR_ALLOC, NULL);
		}
		x.outdirmisc =
		    h_util_path_join(x.outpluginsdirmisc, ent->d_name);
		if (x.outdirmisc == NULL) {
			return h_err_create(H_ERR_ALLOC, NULL);
		}

		h_err *err;
		err =
		    build_plugin(dirpath, outdirphp, outdirmisc, outfiles,
				 conf);
		if (err) {
			return err;
		}
	}

	return NULL;
}

As you can see here, we made three ⸺ just three ⸺ change to Yodaiken’s code here: we use defer freeall(&x) at the very start of the function and delete it everywhere else. We fix the loop part (again) correctly with defer freeloop_all(&x);, which was forgotten in the Yodaiken version. And, to make that possible, we have an additional function of freeloop_all and a modified freeall, to accomodate this. (The removal of the if checks is not necessary, but it should be noted free is one of the very, VERY few functions in the C standard library that’s explicitly documented to be a no-op with a null pointer input).

With defer, we no longer need to add a freeall(&x) at every exit point, nor do we need a ladder of gotos cleaning up specific things (in the case where the structure didn’t exist and we tried to use a single exit point). We also don’t accidentally leak loop resources, too.

It’s not that Yodaiken’s principle of change wasn’t an improvement over the existing code (consolidating the frees), it’s just that it simply failed to capture the point of the use of defer: no matter how you exit from this function now (save by using runtime control flow), there is no way to forget to free anything. Nor is there any way to forget to free anything on some specific path. The problems of CVE-2021-3744 ⸺ and the hundreds of CVEs like it ⸺ are not really a plausible issue anymore. It means that the C code you write becomes resistant to problems with later changes or refactors: adding additional checks and exits (as we did compared to the original code in the repository, to cover some cases not covered by the original) means a forgotten freeall(&x) doesn’t result in a leak.

This is the power of `defer` in C

Focusing on things that are actually difficult and worth your time is what your talents and efforts are made for. Menial tasks like “did I forget to free this thing or goto the correct cleanup target” are a waste of your time. Even the Linux Kernel is embracing these ideas, because bugs around forgetting to unlock() something or forgetting to free something are awful wastes of everyone’s life, from people who have to report ‘n’ confirm basic resource failures to getting annoying security advisories over fairly mundane failures. We have more interesting code and greater performance gains to be putting our elbow grease into that do not include fiddling with the same basic crud thousands of times.

This is what the defer TS is supposed to bring for C.

But… What About C++?

For C++ people, MOST (but not all) of defer is covered by destructors (and constructors) and by C++’s object model. The chance of having defer in C++, properly, is less than 0. The authors of C++’s library version of this (scope_guard) have intentionally and deliberately abandoned having this in the C++ standard library, and efforts to revive it (including efforts to revive it to spite defer and tell C to stop using defer) have either gone eerily/swiftly quiet or been abandoned. This does not mean there is no dislike or dissent for defer, just that its C++ compatriots have seemed to ⸺ mostly ⸺ calm down and step back from just trying to put raw RAII into C. Not that I would fully object to actually working out an object model and having real RAII, as stated in a previous article and in the rationale of the proposal itself discussing C++ compatibility of defer, certainly not! It’s just that everyone who’s trying has so far done a rather half-baked job of attempting it, mostly in service of their favorite pet feature rather than as a full, intentional integration of a complete object model that C++ is still working out the extreme edge-case kinks of to this day through Core Working Group issues.

There are also some edge cases where defer is actually better than C++, as mentioned in the rationale of the proposal. For example, exceptions butt up against the very strict noexcept rule for destructors (especially since its not just a rule, but required for standard library objects). This means that using RAII to model defer becomes painful when you intentionally want to use defer ⸺ or scope_guard ⸺ as an exception-detection mechanism and a transactional rollback feature. Destructors overwhelming purpose are, furthermore, to make repeatable resource cleanup easy, but in tying it to the object model must store all of the context that is accessible within the object itself so it can be appropriately accessed. Carrying that context can be antithetical to the goals of the given algorithm or procedure, meaning that a lot more effort goes into effective state management and transfer when just having key defer blocks in certain in-line cases would save on both object size and context move/transfer implementation effort. One can get fairly close by having a defer_t<...> templated type in C++ with all move/copy/etc. functions

Destructors can also fall apart in certain specific cases, like in the input and output file streams of C++. Because the destructor needs to finish to completion, cannot throw (per the Standard Library ironclad blanket rules), and must not block or stall (usually), the specification for the C++ standard streams will swallow up any failures to flush the stream when it goes out of scope and the destructor is run. This usually isn’t a problem, but I’ve had to sit in presentations in real life during my C++ Meetup where the engineers gave talks on standard streams (and many of their boost counterparts) making it impossible for them to have high-reliability file operations. They had to build up their own from scratch instead. (I don’t think Niall Douglass’s (ned13’s) Low-Level File IO had made it into Boost by then.)

Nevertheless, while RAII covers the overwhelming majority of use cases (reusable resource and policy), defer stands by itself as something uniquely helpful for the way that C operates. And, in particular, it can help cover real vulnerabilities that happen in C code due to the simple fact that most people are human beings.

Thusly…

The Time is Now

This is the specification for the defer TS. If you are reading this and you are a compiler vendor, beloved patch writer, or even just a compiler hobbyist, the time to implement this is today. Right now. The whole point of a TS ⸺ and the reason I was forced by previous decisions and discussion out of my control to pick a TS ⸺ is to obtain deployment experience. Early implementers have already found, recovered, and discovered bugs in their code thanks to defer. There is a wealth of places where using defer will drastically improve the quality of code. Removing a significant chunk of human error as well as reducing risk during refactors or rewrites because someone might forget to add a goto CLEANUP; or a necessary freeThat() call are tangible, real benefits we can do to prevent classes of leaks.

Implement defer. Tell me about it. Tell others about it.

The time is now, before C2Y ships. That’s why it’s a TS. Whether you gate it behind -fdefer-ts/-fexperimental-defer-ts, or you simply make it part of the base offering without needing extra flags, now is the time. The Committee is starting to constrict and retract heavily from the improvements in C23, and vendors are starting to get skittish again. They want to see serious groundswells in support; you cannot just sit around quietly, hoping that vendors “get the memo” to make fixes or pick up on your frustrations in mailing lists. Go to them. Register on their bug trackers (and look for existing open bugs). E-mail their lists (but search for threads already addressing things). You must be vocal. You must be loud. You must be direct.

You Must Not Be Ignorable.

With: compiler vendors ⸺ especially the big ones ⸺ getting more and more serious about telling people to Do It In The Standard Or #$&^! Off (with some exceptions); pressure being applied to have greater and greater consensus in the standard itself making that bar higher and higher; and, vendors and individuals getting more and more pissed off about changes to C jeopardizing their implementation efforts and what they view as the integrity of the C language, extensions and changes are more at risk now than ever. Please. Please, please, prettiest of pleases.

Don’t let good changes go down quietly. 💚

Banner and Title Photo by Ethan Sarkar, from Pexels

Footnotes

Author’s note: This is a lie. #embed took 7 years total. ↩
Just spitballing the time, I haven’t actually checked. ↩

Results! - The Big Array Size Survey for C

2025-01-21T00:00:00+00:00

Happy New Year! It is time report the results of the Array Size Operator survey and answer some comments people have been asking for!

The “What” Survey?

As a quick refresher:

#define SIZE_KEYWORD(...) (sizeof(__VA_ARGS__) / sizeof(*(__VA_ARGS__)))

int main () {
	int arfarf[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
	return SIZE_KEYWORD(arfarf); // same as: `return 10;`
}

We were making a built-in operator for this, and that built-in operator was accepted into C2y, the next version of the C standard. The reason we wanted a built-in operator for this was to prevent the typical problems we have with macro, which (at least with the above definition) manifests a few issues:

double-evaluation of e.g. getting the size of the 1-d part of a 2-d array int meow[3][4]; /* ... */ SIZE_KEYWORD(meow[first_idx()]);
macro-trampling of normal user code without warning e.g. #define array_size(....) /* ... /* (hope you weren’t using the word “array_size” anywhere important!);
and, better type safety e.g., SIZE_KEYWORD((int**)0) is a legal call given the above definition, and takes significant additional effort to improve type safety beyond the bogstandard basic definition.

Of course, the easier it is to understand the feature (3 bullets in a bulleted list and one code snippet), the more debate perverts crawl out of the woodwork to start getting their Bikeshedding-jollies in on things like the name. The flames of argumentation raged powerfully for an hour or so in the Committee, and the e-mails back and forth on the Committee Reflector were fairly involved. It spawned several spin-off papers trying to ascertain the history of the spelling of size functionality (see Jakub Łukasiewicz N3402: Words Used for Retrieving Number of Elements in Arrays and Array-like Objects Across Computer Languages), and even before hand had a survey conducted at ARM for it (see Chris Bazley’s N3350: Survey Results for Naming of New nelementsof() Operator).

I had my own opinions about the subject, but rather than wax poetical, I figured I’d follow Chris Bazley’s lead and just…. ask everyone. So, I just went and asked everyone.

How?

If you want to read the methodology for how all this worked, you can read the “Methodology” section of N3440: The Big Array Size Survey. We’re going to dive straight into the results, both the fluffy results and the serious results. There were 1,049 unique responses to the survey. A few had to be culled out. A few were partial responses; followup responses with those people (when possible) did not allow us to complete their responses, so they were recorded down as being neutral. You can access the data and see the Python Script that generated the graphs and the data at this repository. You can replicate the graphs NOT by running the script (that parses the raw data that only we have access to), but by doing the same matplotlib shenanigans after parsing the CSV. We’re not handing out the raw AllCounted data because it includes e-mail address, IP Addresses, and general location information, and we figure that’d be a big breach of privacy if we just handed all that shit over to anyone, so it’s all deleted now after outputting the necessary information instead!

The Respondents

We had quite a large selection of folks from almost every continent (except Antartica). The majority were Professional / Industrial software developers, and a LOT had 5+ years of experience, so we feel this is a pretty good selection of the C populace. Or at least, the population of C people willing to read my blog / check Reddit / check Twitter / check Mastodon / keep their finger on the pulse for a little over 1 month:

We had people from all sorts of cities participating:

The skill level and usage experience distributions were also fairly Professional-oriented, too, with some standout folks using it for 20+ or 30+ years:

Value breakdown:

Just Reading / Just Learning	15	1.43%
Hobbyist / Personal Projects	237	22.64%
Professional / Industrial Software Development	626	59.68%
Academic / Research Software Development	101	9.63%
Software Mentor, Professor / Teacher, or Trainer	59	5.62%
(Used to) Attend Standard Committee Meetings	9	0.86%

Value breakdown:

30+ years	72	6.86%
20 to 30 years	138	13.16%
10 to 20 years	254	24.21%
5 to 10 years	257	24.50%
2 to 5 years	248	23.64%
Recently (0 to 2 years)	78	7.44%

I feel this is a pretty good mix of opinions to have out of a standard 1,049 person survey for a programming language, especially one as old as C! It’s pretty heartening to see folks are reading (and responding) to this website in those kinds of numbers, which is not bad considering I’m not exactly Stack Overflow over here! The overwhelming majority have also used C very, VERY recently:

Value breakdown:

20 to 30 years ago	5	0.48%
10 to 20 years ago	9	0.86%
5 to 10 years ago	20	1.91%
2 to 5 years ago	58	5.53%
Recently (0-2 years ago)	955	91.04%

Still, this is all just set dressing so that we can bring up the part everyone actually cares about.

The Results

Prefacing what will be an endless shitstorm of opinions and interpretations, the results are not exactly an OVERWHELMING mandate in any specific direction.

But.

There is a fairly convincing argument that there’s a few things the C community are beginning to lean towards in these recent years, exemplified in the results and the comments. Of course, this is not a unanimous lean, as the C community is huge and there’s quite a few different needs it needs to fill. But there’s a clear preference for specific options, which we’re going to start getting into below.

On the Delivery Mechanism: Keyword/Macro Style Regardless of Spelling

Here’s the results for the three options of:

lowercase keyword with no header;
_Keyword + stdkeyword.h macro;
and, _Keyword with no header.

There is a clear preference for a lowercase keyword, here, though it is not by the biggest margin. One would imagine that with the way we keep standardizing things since C89 (starting with _Keyword and then adding a header with a macro) that C folks would be overwhelmingly in favor of simply continuing that style. The graph here, however, tells a different story: while there’s a large contingency that clearly hates having _Keyword by itself, it’s not the _Keyword + stdkeyword.h macro that comes out on top! It’s just having a plain lowercase keyword, instead.

One can imagine this is a far less conservative set of professionals and industry members who have begun to realize that the payoff for working with _Bool and is just not worth the struggle. Users already have to opt-in to breaking changes with standard flags. Constantly having code break because you’re not manically and persistently writing things in the ugliest way possible – and then having it breaking in some source file because you didn’t include the right things or some transitive include didn’t work – is annoying.

This doesn’t necessarily represent everyone’s ideas on the subject material, though. Some comments are strongly in-favor of the traditional spelling, for obvious reasons. From Social Media:

huh, new lowercase keyword? Have these people not heard of not breaking existing code?

— mirabilos, January 18, 2025

This perception was immediately countered in a reply to the post:

we do and we prefer to have nice things that we can actually use.

And spend the time fixing old code

— Thomas Depierre, January 18, 2025

Both perspectives can also be found in the comments of the survey itself:

I think C23 is a great turning point to implement disruptive changes, so if we want a keyword (which I’m sure we want) now is the moment to introduce it. Who knows when there will be another chance of breaking away with the past like we have right now.

My 2 cents: this decision affects people twenty years from now and forward. Think about them. Make it easy for newcomers to learn C, i.e. avoid/limit arcane incantations.

I suspect if there was a header then I wouldn’t use it, but I guess it wouldn’t hurt; _Countof seems slightly easier than ‘#include ... countof(...)'. One benefit I can see to '_Countof' (etc.) over 'countof' is that it makes clear 'this is new in C2y' (so C99/etc compatible code beware), but I can also see why it standing out might not be good (since it fits in less, and C already_has_lots_of_underscores). As for the name, _Lengthof is OK but sounds a bit similar to sizeof, and I can see _Lengthof("")==1 being odd. _Nelemsof looks weird but makes a lot of sense.

Have some guts for land’s sake and just add the dang keyword!

_Keyword sucks. Officially provided functions should all be lowercase.

Header macro seems the only sane way.

In general, I’m strongly against any alteration of the global, unprefixed namespace at this point; there are enough rules as it is. Chances are whatever it is will be #if‘d in, b/c compilers won’t support this for decades, so chances are the extra macro and header would be pointless machinery.

If an _Underscore keyword with a macro in the header is selected, I would imagine that it could transition to a lowercase-no-underscore keyword after a transition period (compare bool, alignof, etc.)

This, of course, is in opposition to other comments made:

While I hate the transition period between underscores macros and lowercase keywords, I recognize it is necessary for such a basic and core concept that will have been implemented independently many times over the last 40+ years. Opting in with a header feels too obtuse however.

why not just a macro in a header? strong dislike for a keyword. especially since the operator already conflicts with names I am aware of.

And, as normal, _Generic-style underscore keywords only are the least popular idea ever:

using _Under naming and not including a macro in a header would be frustrating

Interestingly, there was an idea to have an explicit in-source way of opting into the new spelling. Because there’s no such controls in the C language at this time, it manifested in the usual request for improvements to C being cordoned off into a new header entirely:

I would like macro headers (like stdbool.h, stdalign.h, assert.h (I think?)) if we could get all of the ones relevant to a given version of C under one single umbrella header, like stdc23.h or similar

(The “I think?” here is correct - until C23 static_assert was actually spelled _Static_assert instead.) I think the desire to be able to opt into a specific standards version are usually something left to command line flags, but I will say that such command line flags – as they generally come from outside the source and from a build system (or a… ““build system””) – are annoying to library developers. Getting clean builds across multiple compilers is often an exercise in futility, especially if you abandon the open source world and start doing proprietary work (MSVC, ${Embedded and Accelerator Devs here}, …). A header seems like the best “what we do with current technology” bit right now, but others have ideas to make dialects more recognizable through source code like N3407.

My personal opinion is that the opposition to the traditional method may honestly be a pragmatic long-term choice. Introducing a _Keyword, waiting 12 to 30 years, and then just making it a lowercase version anyways as the roar of “it’s very stupid that I have to write things the ass-backwards way unless I include a header” grows louder is a song and dance a lot of people have not been happy to do over time. This flies in the face of “old code should port to new versions of the standard fairly simply”, however, so of course the usual conservative concerns are likely to prevail overall in Committee discussion when this survey is brought up.

The point that C23 – and perhaps C2y – may be disruptive enough to justify just adding the keywords directly is a tempting idea, though. And I’m certainly not one to really enjoy the underscore-keyword + header two-step we’ve developed in C. But, if we were doing raw democracy, the lowercase keyword folks would prevail here.

On the Spelling: Which Word To Use Regardless of Delivery

There was a clear preference among the results out of the following choices:

extentof/_Extentof
nelementsof/_Nelementsof;
nelemsof/_Nelemsof;
countof/_Countof;
lengthof/_Lengthof;
and, lenofof/_Lenof.

This one is actually more interesting after perusing the comments and seeing what people wrote on social media and in forums in reponse to this. There is actually a small degree of backlash against _Lengthof/lengthof due to its associations with strings, and the problem of length and strlen implying a count of N-1 (or up to the null terminator) when the operator doesn’t behave like that at all. In fact:

The off-by-ones are real with string literals. When we banned span construction from string literals in Chromium we found code expecting to make a span without the nul but it was including it of course. We have two explicit ways to make a span from a string literals that make the user choose to include nul or not (the default, which matches what you see in the code).

— dana, November 5th, 2024

This sentiment was repeated in the comments of the survey:

I think countof is the best option because it’s less likely than lengthof/lemon to get confused with string length, much easier to remember how to spell than nelements/nelems/etc. (especially for non-English speakers), and extremely clear in its meaning.

I’d like different terminologies for different things. Let “length” be for “string length”, “size” be for “in-memory size (in bytes)”, so “count” is for “element count”.

Some people had less technical reasons for hating any given option, though. Some of it boiled down to raw preference, or just simply being reminded of things they disliked:

Count reminds me of PHP, which is why I hate. The most appealing option is having beginners learn that the size of something is often in bytes while the length of something is in blocks of arbitrary size. Something simple that’s not hard to remember or to write.

And others clung to the strict mathematics / old-person’s like of “extent”:

Neither count, size or length do well with multidimensional arrays. One might justifiably expect countof((int[4][4]){}) or lengthof((int[4][4]){}) to be 16 instead of 4. So while I like countof more, I think extentof is the most unambiguous naming.

But, ultimately, the stacked bar chart shows that not only is countof and _Countof the most liked, it’s also the least disliked. It’s better on just about every metric insofar as the counted votes are concerned, really. This isn’t the say that it would have always been on top, given different spellings. There were a lot of protesting comments, wanting either more options or completely different options entirely:

nelems() would be better than nelemsof(), to be consistent with nitems().

Please consider “arraysizeof” or “asizeof” or “arraysize”

Why not refer to prior arts? What are these options??

_Array_size

arraycount()

Just use nitems. What existing definition” is there to clobber that isn’t already exactly what you’re trying to achieve? Why do we need to invent yet another name? All the suggestions are trying to contort themselves around not being nitems. “of” suffix is not important to chase.

I’d rather that you standardized existing practice unchanged; the BSD macros are fine. But if you must standardize an operator, at least let me pretend it doesn’t exist. I won’t use it, because there’s only portability-related downside over the macro based version.

I don’t see how ARRAY_SIZE would be awkward, it’s what I have in my own code

My macro is C_ARRAY_SIZE(a)

arrsizeof - 42 files on github

I feel like nof or noof should’ve been an option

There’s a lot of ask for arraycount/arraysize that showed up, but the reason those were culled from the running early (just like nitems) is simply because the blast radius was known to be enormous; any spelling of that was going to blow up a million people. This was even worse for comments that suggested we take the of off of lenof or lengthof or countof to just be count, len, or length; the number of identifiers people would need to goosestep around would be enormous. nelementsof was the original plan from the paper before the ARM Survey conducted by Bazley swayed Committee opinion. I, personally, expected lengthof/_Lengthof to win in this wider survey I conducted; I expected ARM’s engineering consensus to be the dominant consensus throughout the industry.

But, that seems not to be the case!

On the Exact Spelling: A Cross-Section of Delivery and Spelling

There’s not too much to say about this: it’s got a lot less responses since it was an optional question (~650 filled out, versus the 1040+ for the other mandatory questions). But, even with a reduced pool, the same trends and ideas from combining the other two polls manifest fairly reliably for the exact spelling options:

Namely, countof as a keyword with no macro or header has the least dislike and the most likes. Various options steadily fall off from there. In the specific options, lengthof as a keyword with no macro or header comes close, and then from there it’s lengthof/countof as macros in a header, and then various worse options as one continues to look for different combinations. It more or less reinforces the previously points. There’s more comments (some funny/irrelevant ones too), but I think this should provide a solid basis for the necessary data.

I expect people to simply keep bikeshedding. Even with all of this data people will still argue for and against things, but at least I can say I did get the data for all of this! 💚

Banner and Title Photo by Mikael Blomkvist, from Pexels: https://www.pexels.com/photo/person-using-white-ipad-6476590/

The Big Array Size Survey for C

2024-11-06T00:00:00+00:00

New in C2y is an operator that does something people have been asking us for, for decades: something that computes the size in elements (NOT bytes) of an array-like thing. This is a great addition and came from the efforts of Alejandro Colomar in N3369, and was voted into C2y during the recently-finished Minneapolis, MN, USA 2024 standardization meeting. But, there’s been some questions about whether we chose the right name or not, and rather than spend an endless amount of Committee time bikeshedding and arguing about this, I wanted to put this question to you, the user, with a survey! (Link to the survey at the bottom of the article.)

The Operator

Before we get to the survey (link at the bottom), the point of this article is to explain the available choices so you, the user, can make a more informed decision. The core of this survey is to provide a built-in, language name to the behavior of the following macro named SIZE_KEYWORD:

#define SIZE_KEYWORD(...) (sizeof(__VA_ARGS__) / sizeof(*(__VA_ARGS__)))

int main () {
	int arfarf[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
	return SIZE_KEYWORD(arfarf); // same as: `return 10;`
}

This is called nitems() in BSD-style C, ARRAY_SIZE() by others in C with macros, _countof() in MSVC-style C, std::size() (a library feature) and std::extent_v<...> in C++, len() in Python, ztdc_size() in my personal C library, extent in Fortran and other language terminology, and carries many other names both in different languages but also in C itself.

The survey here is not for the naming of a library-based macro (though certain ways of accessing this functionality could be through a macro): there is consensus in the C Standard Committee to make this a normal in-language operator so we can build type safety directly into the language operator rather than come up with increasingly hideous uses of _Generic to achieve the same goal. This keeps compile-times low and also has the language accept responsibility for things that it, honestly, should’ve been responsible for since 1985.

This is the basic level of knowledge you need to access the survey and answer. Further below is an explanation of each important choice in the survey related to the technical features. We encourage you to read this whole blog article before accessing the survey to understand the rationale. The link is at the bottom of this article.

The Choices

The survey has a few preliminary questions about experience level and current/past usage of C; this does not necessarily change how impactful your choice selection will be! It just might reveal certain trends or ideas amongst certain subsets of individuals. It is also not meant to be extremely specific or even all that deeply accurate. Even if you’re not comfortable with C, but you are forced to use it at your Day Job because Nobody Else Will Do This Damn Work, well. You may not like it, but that’s still “Professional / Industrial” C development!

The core part of the survey, however, revolve around 2 choices:

the usage pattern required to get to said operator/keyword;
and, the spelling of the operator/keyword itself.

There’s several spellings, and three usage patterns. We’ll elucidate the usage patterns first, and then discuss the spellings. Given this paper and feature were already accepted to C2y, but that C2y has only JUST started and is still in active development, the goal of this survey is to determine if the community has any sort of preference for the spelling of this operator. Ideally, it would have been nice if people saw the papers in the WG14 document log and made their opinions known ahead-of-time, but this time I am doing my best to reach out to every VIA this article and the survey that is linked at the bottom of the article.

Usage Pattern

Using SIZE_KEYWORD like in the first code sample, this section will explain the three usage patterns and their pros/cons. The program is always meant to return 42.

const double barkbark[] = { 0.0, 0.5, 7.0, 14.7, 23.3, 42.0 };

static_assert(SIZE_KEYWORD(barkbark) == 6, "must have a size of 6");

int main () {
	return (int)barkbark[SIZE_KEYWORD(barkbark) - 1];
}

Underscore and capital letter `_Keyword`; Macro in a New Header

This technique is a common, age-old way of providing a feature in C. It avoids clobbering the global user namespace with a new keyword that could be affected by user-defined or standards-defined macros (from e.g. POSIX or that already exist in your headers). A keyword still exists, but it’s spelled with an underscore and a capital letter to prevent any failures. The user-friendly, lowercase name is only added through a new macro in a new header, so as to prevent breaking old code. Some notable features that USED to be like this:

_Static_assert/static_assert with
_Alignof/alignof with
_Thread_local/thread_local with
_Bool/bool with

As an example, it would look like this:

#include 

const double barkbark[] = { 0.0, 0.5, 7.0, 14.7, 23.3, 42.0 };

_Static_assert(keyword_macro(barkbark) == 6, "must have a size of 6");

int main () {
	return (int)barkbark[_Keyword(barkbark) - 1];
}

Underscore and capital letter `_Keyword`; No Macro in Header

This is a newer way of providing functionality where no effort is made to provide a nice spelling. It’s not used very often, except in cases where people expect that the spelling won’t be used often or the lowercase name might conflict with an important concept that others deem too important to take for a given spelling. This does not happen often in C, and as such there’s really only one prominent example that exists in the standard outside of extensions:

_Generic; no macro ever provided in a header

As an example, it would look like this:

// no header
const double barkbark[] = { 0.0, 0.5, 7.0, 14.7, 23.3, 42.0 };

static_assert(_Keyword(barkbark) == 6, "must have a size of 6");

int main () {
	return (int)barkbark[_Keyword(barkbark) - 1];
}

Lowercase `keyword`; No Macro in Header

This is the more bolder way of providing functionality in the C programming language. Oftentimes, this does not happen in C without a sister language like C++ bulldozing code away from using specific lowercase identifiers. It can also happen if a popular extension dominates the industry and makes it attractive to keep a certain spelling. Technically, everyone acknowledges that the lowercase spelling is what we want in most cases, but we settle for the other two solutions because adding keywords of popular words tends to break somebody’s code. That leads to a lot of grumbling and pissed off developers who view code being “broken” in this way as an annoying busywork task added onto their workloads. For C23, specifically, a bunch of things were changed from the _Keyword + macro approach to using the lowercase name since C++ has already effectively turned them into reserved names:

true, false, and bool
thread_local
static_assert
alignof
typeof (already an existing extension in many places)

As an example, it would look like this:

// no header
const double barkbark[] = { 0.0, 0.5, 7.0, 14.7, 23.3, 42.0 };

static_assert(keyword(barkbark) == 6, "must have a size of 6");

int main () {
	return (int)barkbark[keyword(barkbark) - 1];
}

Keyword Spellings

By far the biggest war over this is not with the usage pattern of the feature, but the actual spelling of the keyword. This prompted a survey from engineer Chris Bazley at ARM, who published his results in N3350 Feedback for C2y - Survey results for naming of new nelementsof() operator. The survey here is not going to query the same set of names, but only the names that seemed to have the most discussion and support in the various e-mails, Committee Meeting discussion, and other drive-by social media / Hallway talking people have done.

Most notably, these options are presented as containing both the lowercase keyword name and the uppercase capital letter _Keyword name. Specific combinations of spelling and usage pattern can be given later during an optional question in the survey, along with any remarks you’d like to leave at the end in a text box that can handle a fair bit of text. There are only 6 names, modeled after the most likely spellings similar to the sizeof operator. If you have another name you think is REALLY important, please add it at the end of the comments section. Some typical names not included with the reasoning:

size/SIZE is too close to sizeof and this is not a library function; it would also bulldoze over pretty much every codebase in existence and jeopardize other languages built on top of / around C.
nitems/NITEMS is a BSD-style way of spelling this and we do not want to clobber that existing definition.
ARRAY_SIZE/stdc_size and similar renditions are not provided because this is an operator exposed through a keyword and not a macro, but even then array_size/_Array_size were deemed too awkward to spell.
dimsof/dimensionsof was, similarly, not all that popular and dimensions as a word did not convey the meaning very appropriately to begin with.
Other brave but unfortunately unmentioned spellings that did not make the cut.

The options in the survey are as below:

`lenof` / `_Lenof`

A very short spelling that utilizes the word “length”, but shortened in the typical C fashion. Very short and easy to type, and it also fits in with most individual’s idea of how this works. It is generally favored amongst C practitioners, and is immediately familiar to Pythonistas. A small point of contention: doing _Lenof(L"barkbark") produces the answer “9”, not “8” (the null terminator is counted, just as in sizeof("barkbark")). This has led some to believe this would result in “confusion” when doing string processing. It’s unclear whether this worry is well-founded in any data and not just a nomenclature issue.

As “len” and lenof are popular in C code, this one would likely need a underscore-capital letter keyword and a macro to manage its introduction, but it is short.

const double barkbark[] = { 0.0, 0.5, 7.0, 14.7, 23.3, 42.0 };

static_assert(_Lenof(barkbark) == 6, "must have an length of 6");

int main () {
	return (int)barkbark[lenof(barkbark) - 1];
}

`lengthof` / `_Lengthof`

This spelling won in Chris Bazley’s ARM survey of the 40 highly-qualified C/C++ engineers and is popular in many places. Being spelled out fully seems to be of benefit and heartens many users who are sort of sick of a wide variety of C’s crunchy, forcefully shortened spellings like creat (or len, for that matter, though len is much more understood and accepted). It is the form that was voted into C2y as _Lengthof, though it’s noted that the author of the paper that put _Lengthof into C is strongly against its existence and thinks this choice will encourage off-by-one errors (similarly to lenof discussed above). Still, it seems like both the least hated and most popular among the C Committee and the adherents who had responded to Alejandro Colomar’s GCC patch for this operator. Whether it will continue to be popular with the wider community has yet to be seen.

As “length” and lengthof are popular in C code, this one would likely need a underscore-capital letter keyword and a macro to introduce it carefully into existing C code.

const double barkbark[] = { 0.0, 0.5, 7.0, 14.7, 23.3, 42.0 };

static_assert(_Lengthof(barkbark) == 6, "must have an length of 6");

int main () {
	return (int)barkbark[lengthof(barkbark) - 1];
}

`countof` / `_Countof`

This spelling is a favorite of many people who want a word shorter than length but still fully spelled out that matches its counterpart size/sizeof. It has strong existing usage in codebases around the world, including a definition of this macro in Microsoft’s C library. It’s favored by a few on the C Committee, and I also received an e-mail about COUNT being provided by the C library as a macro. It was, unfortunately, not polled in the ARM survey. It also conflicts with C++’s idea of count as an algorithm rather than an operation (C++ just uses size for counting the number of elements). It is dictionary-definition accurate to what this feature is attempting to do, and does not come with off-by-one concerns associated with strings and “length”, typically.

As “count” and countof are popular in C code, this too would need some management in its usage pattern to make it available everywhere without getting breakage in some existing code.

const double barkbark[] = { 0.0, 0.5, 7.0, 14.7, 23.3, 42.0 };

static_assert(_Countof(barkbark) == 6, "must have an length of 6");

int main () {
	return (int)barkbark[countof(barkbark) - 1];
}

`nelemsof` / `_Nelemsof`

This spelling is an alternative spelling to nitems() from BSD (to avoid taking nitems from BSD). nelemsof is also seem as the short, cromulent spelling of another suggestion in this list, nelementsof. It is a short spelling but lacks spaces between n and elems, but emphasizes this is the number of elements being counted and not anything else. The n is seen as a universal letter for the count of things, and most people who encounter it understand it readily enough. It lacks problems about off-by-one counts by not being associated with strings in any manner, though n being a common substitution for “length” might bring this up in a few people’s minds.

As “nelems” and nelems are popular in C code, this too would need some management in its usage pattern to make it available everywhere without getting breakage in some existing code.

const double barkbark[] = { 0.0, 0.5, 7.0, 14.7, 23.3, 42.0 };

static_assert(_Nelemsof(barkbark) == 6, "must have an length of 6");

int main () {
	return (int)barkbark[nelemsof(barkbark) - 1];
}

`nelementsof` / `_Nelementsof`

This is the long spelling of the nelemsof option just prior. It is the preferred name of the author of N3369, Alejandro Colomar, before WG14 worked to get consensus to change the name to _Lengthof for C2y. It’s a longer name that very clearly states what it is doing, and all of the rationale for nelems applies.

This is one of the only options that has a name so long and unusual that it shows up absolutely nowhere that matters. It can be standardized without fear as nelements with no macro version whatsoever, straight up becoming a keyword in the Core C language without any macro/header song-and-dance.

const double barkbark[] = { 0.0, 0.5, 7.0, 14.7, 23.3, 42.0 };

static_assert(nelementsof(barkbark) == 6, "must have an length of 6");

int main () {
	return (int)barkbark[nelementsof(barkbark) - 1];
}

`extentof` / `_Extentof`

During the discussion of the paper in the Minneapolis 2024 meeting, there was a surprising amount of in-person vouching for the name extentof. They also envisioned it coming with a form that allowed to pass in which dimension of a multidimensional array you wanted to get the extent of, similar to C++’s std::extent_v and std::rank_v, as seen here and here. Choosing this name comes with the implicit understanding that additional work would be done to furnish a rankof/_Rankof (or similar spelling) operator for C as well in some fashion to allow for better programmability over multidimensional arrays. This option tends to appeal to Fortran and Mathematically-minded individuals in general conversation, and has a certain appeal among older folks for some reason I have not been able to appropriately pin down in my observations and discussions; whether or not this will hold broadly in the C community is anyone’s guess.

As “extent” is a popular word and extentof similarly, this one would likely need a macro version with an underscore capital-letter keyword, but the usage pattern can be introduced gradually and gracefully.

const double barkbark[] = { 0.0, 0.5, 7.0, 14.7, 23.3, 42.0 };

static_assert(_Extentof(barkbark) == 6, "must have an extent of 6");

int main () {
	return (int)barkbark[extentof(barkbark) - 1];
}

The Survey

Here’s the survey: https://www.allcounted.com/s?did=qld5u66hixbtj&lang=en_US.

There is an optional question at the end of the survey, before the open-ended comments, that allows for you to also rank and choose very specific combinations of spelling and feature usage mechanism. This allows for greater precision beyond just answering the two core questions, if you want to explain it.

Employ your democratic right to have a voice and inform the future of C, today!

Good Luck! 💚

Banner and Title Photo by Luka, from Pexels

5 Years Later: The First Win

2024-10-08T00:00:00+00:00

N3366 - Restartable Functions for Efficient Character Conversions has made it into the C2Y Standard (A.K.A., “the next C standard after C23”). And one of my longest struggles — the sole reason I actually came down to the C Standards Committee in the first place —has come to a close.

Yes.

When I originally set out on this journey, it was over 6 years ago in the C++ Unicode Study Group, SG16. I had written a text renderer in C#, and then in C++. As I attempted to make that text renderer cross-platform in the years leading up to finally joining Study Group 16, and kept running into the disgustingly awful APIs for doing text conversions in C and C++. Why was getting e.g. Windows Command Line Arguments into UTF-8 so difficult in standard C and C++? Why was using the C standard functions on a default-rolled Ubuntu LTS at the time handing me data that was stripping off accent marks? It was terrible. It was annoying. It didn’t make sense.

It needed to stop.

Originally, I went to C++. But the more I talked and worked in the C++ Committee, the more I learned that they weren’t exactly as powerful or as separate from C as they kept claiming. This was especially when it came to the C standard library, where important questions about wchar_t, the execution encoding, and the wide execution encoding were constantly punted to the C standard library rather than changed or mandated in C++ to be better. Every time I wanted to pitch the idea of just mandating a UTF-8 execution encoding by default, or a UTF-8 literal encoding by default, I just kept getting the same qualms: “C owns the execution encoding” and “C owns the wide encoding” and “C hasn’t really separated wchar_t from its historical mistakes”. And on and on and on. So, well.

I went down there.

Of course, there were even more problems. Originally, I had proposed interfaces that looked fairly identical to the existing set of functions already inside of and . This was, unfortunately, a big problem: the existing design, as enumerated in presentation after presentation and blog post after blog post, was truly abysmal. These 1980s/1990s functions are wholly incapable of handling the encodings that were present even at 1980, and due to certain requirements on types such as wchar_t we ended up creating problematic functions with unbreakable Application Binary Interfaces (ABIs).

During a conversation on very-very-old Twitter now, I was expressing my frustration about these functions and how they’re fundamentally broken. But that if I wanted to see success, there was probably no other way to get the job done. After all, what is the most conservative and new-stuff-hostile language if not C, the language that’s barely responded to everything from world-shattering security concerns to unearthed poor design decisions for some 40 years at that point? And yet, Henri Sivonen pointed out that going that route was still just as bad: why would I standardize something I know is busted beyond all hope?

Contending with that was difficult. Why should I be made to toil due to C’s goofed up 1989 deficiencies? But, at the same time, how could I be responsible for continuing that failure into the future in-perpetuity? Neither of these questions was more daunting than the fact that what was supposed to be a “quick detour” into C would instantly crumble away if I accepted this burden. Doing things the right way meant I was signing up for not just a quick, clean, 1-year-max brisk journey, but a deep dungeon dive that could take an unknown and untold amount of time. I had to take a completely different approach from iconv and WideCharToMultiByte and uconvConvert and mbrtowc; I would need to turn a bunch of things upside down and inside out and come up with something entirely new that could handle everything I was talking about. I had to take the repulsive force of the oldest C APIs, and grasp the attractive forces of all of the existing transcoding APIs,

and unite them into something entirely different and powerful…

_{Imaginary Technique: Cuneicode}

Henri was right.

It took a lot of me to make this happen. But, I made it happen. Obviously, it will take some time for me to make the patches to implement this for glibc, then musl-libc. I don’t quite remember if the Bionic team handling Android’s standard library takes submissions, and who knows if Apple’s C APIs are something I can contribute usefully to. Microsoft’s C standard library, unlike its C++ one, is also still proprietary and hidden. Microsoft still does a weird thing where, on some occasions, it completely ignores its own Code Page setting and just decides to use UTF-8 only, but only for very specific functions and not all of them.

I GENUINELY hope Microsoft doesn’t make the mistake in these new functions to not provide proper conversions to UTF-8, UTF-16, and UTF-32 through their locale-based execution encoding. These APIs are supposed to give them all the room to do proper translation of locale-based execution encoding data to the UTFs, so that customers can rely on the standard to properly port older and current application data out into Unicode. They can use the dedicated UTF-8-to-UTF-16 and vice versa functions if needed. The specification also makes it so they don’t have to accumulate data in the mbstate_t except for radical stateful encodings, meaning there’s no ABI concerns for their existing stuff so long as they’re careful!

But Microsoft isn’t exactly required to listen to me, personally, and the implementation-defined nature of execution encoding gives them broad latitude to do whatever the hell they want. This includes ignoring their own OEM/Active CodePage settings and just forcing the execution encoding for specific functions to be “UTF-8 only”, while keeping it not-UTF-8 for other functions where it does obey the OEM/Active CodePage.

All in All, Though?

The job is done. The next target is for P1629 to be updated and to start attending SG16 and C++ again (Hi, Tom!). There’s an open question if I should just abandon WG14 now that the work is done, and it is kind of tempting, but for now… I’m just going to try to get some sleep in, happy in the thought that it finally happened.

We did it, chat.

A double-thanks to TomTom and Peter Bindels, as well as the Netherlands National Body, NEN. They allowed me to attend C meetings as a Netherlands expert for 5 years now, ensuring this result could happen. A huge thanks to all the Sponsors and Patrons too. We haven’t written much in either of those places so it might feel barren and empty but I promise you every pence going into those is quite literally keeping me and the people helping going.

And, most importantly, an extremely super duper megathanks h-vetinari, who spent quite literally more than a year directly commenting on every update to the C papers directly in my repository and keeping me motivated and in the game. It cannot be understated how much those messages and that review aided me in moving forward.

God Bless You. 💚

Banner and Title Photo by Coco Championship, from Pexels
Imaginary Technique: Purple Image by ZerosPanda (NSFW Artist, Careful Clicking Through!)

Improving _Generic in C2y

2024-08-01T00:00:00+00:00

The first two meetings of C after C23 was finalized are over, and we have started working on C2y. We decided that this cycle we’re not going to do that “Bugfix” followed by “Release” stuff, because that proved to be a REALLY bad idea that killed a ton of momentum and active contributors during the C11 to C17 timeframe. So, this time, we’re hitting both bugfixes AND features so we can make sure we don’t lose valuable contributions and fixes by stalling for 5 to 6 years again. So, with that… on to fixes!

Generic Selection, a Primer

_Generic — the keyword that’s used for a feature that is Generic Selection — is a deeply hated C feature that everyone likes to dunk on for both being too much and also not being good enough at the same time. It was introduced during C11, and the way it works is simple: you pass in an expression, and it figures out the type of that expression and allows you to match on that type. With each match, you can insert an expression that will be executed thereby giving you the ability to effectively have “type-based behavior”. It looks like this:

int f () {
	return 45;
}

int main () {
	const int a = 1;
	return _Generic(a,
		int: a + 2,
		default: f() + 4
	);
}

As demonstrated by the snippet above, _Generic(...) is considered an expression itself. So it can be used anywhere an expression can be used, which is useful for macros (which was its primary reason for being). The feature was cooked up in C11 and was based off of a GCC built-in (__builtin_choose_expr) and an EDG special feature (__generic) available at the time, after a few papers came in that said type-generic macros were absolutely unimplementable. While C has a colloquial rule that the C standard library can “require magic not possible by normal users”, it was exceedingly frustrating to implement type-generic macros — specifically, — without any language support at all. Thus, _Generic was created and a language hole was patched out.

There are, however, 2 distinct problems with _Generic as it exists at the moment.

Problem 0: “l-value conversion”

One of the things the expression put into a _Generic expression undergoes is something called “l-value conversion” for determining the type. “l-value conversion” is a fancy “phrase of power” (POP) in the standard that means a bunch of things, but the two things we’re primarily concerned about are:

arrays turn into pointers;
and, qualifiers are stripped off.

This makes some degree of sense. After all, if we took the example above:

int f () {
	return 45;
}

int main () {
	const int a = 1;
	return _Generic(a,
		int: a + 2,
		default: f() + 4
	);
}

and said that this example returns 49 (i.e., that it takes the default: branch here because the int: branch doesn’t match), a lot of people would be mad. This helps _Generic resolve to types without needing to write something very, very convoluted and painful like so:

int f () {
	return 45;
}

int main () {
	const int a = 1;
	return _Generic(a,
		int: a + 2,
		const int: a + 2,
		volatile int: a + 2,
		const volatile int: a + 2,
		default: f() + 4
	);
}

In this way, the POP “l-value conversion” is very useful. But, it becomes harder: if you want to actually check if something is const or if it has a specific type, you have to make a pointer out of it and make the expression a pointer. Consider this TYPE_MATCHES_EXPR bit, Version Draft 0:

#define TYPE_MATCHES_EXPR(DESIRED_TYPE, ...) \
	_Generic((__VA_ARGS__),\
		DESIRED_TYPE: 1,\
		default: 0 \
	)

If you attempt to use it, it will actually just straight up fail due to l-value conversion:

static const int a;
static_assert(TYPE_MATCHES_EXPR(const int, a), "AAAAUGH!"); // fails with "AAAAUGH!"

We can use a trick of hiding the qualifiers we want behind a pointer to prevent “top-level” qualifiers from being stripped off the expression:

#define TYPE_MATCHES_EXPR(DESIRED_TYPE, ...) \
	_Generic(&(__VA_ARGS__),\
		DESIRED_TYPE*: 1,\
		default: 0\
	)

And this will work in the first line below, but FAIL for the second line!

static const int a;
static_assert(TYPE_MATCHES_EXPR(const int, a), "AAAAUGH!"); // works, nice!
static_assert(TYPE_MATCHES_EXPR(int, 54), "AAAAUGH!"); // fails with "AAAAUGH!"

In order to combat this problem, you can use typeof (standardized in C23) to add a little spice by creating a null pointer expression:

#define TYPE_MATCHES_EXPR(DESIRED_TYPE, ...) \
	_Generic((typeof((__VA_ARGS__))*)0,\
		DESIRED_TYPE*: 1,\
		default: 0\
	)

Now it’ll work:

static const int a;
static_assert(TYPE_MATCHES_EXPR(const int, a), "AAAAUGH!"); // works, nice!
static_assert(TYPE_MATCHES_EXPR(int, 54), "AAAAUGH!"); // works, yay!

But, in all reality, this sort of “make a null pointer expression!!” nonsense is esoteric, weird, and kind of ridiculous to learn. We never had typeof when _Generic was standardized so the next problem just happened as a natural consequence of “standardize exactly what you need to solve the problem”.

Problem 1: Expressions Only?!

The whole reason we need to form a pointer to the DESIRED_TYPE we want is to (a) avoid the consequences of l-value conversion and (b) have something that is guaranteed (more or less) to not cause any problems when we evaluate it. Asides from terrible issues with Variably-Modified Types/Variable-Length Arrays and all of the Deeply Problematic issues that come from being able to use side-effectful functions/expressions as part of types in C (even if _Generic guarantees it won’t evaluate the selection expression), this means forming a null pointer to something is the LEAST problematic way we can handle any given incoming expression with typeof.

More generally, however, this was expected to just solve the problem of “make type-generic macros in C to implement ”. There was no other benefit, even if a whole arena of cool uses grew out of _Generic and its capabilities (including very very basic type inspection / queries at compile-time). The input to type-generic macros was always an expression, and so _Generic only needed to take an expression to get started. There was also no standardized typeof, so there was no way to take the INPUT parameter or __VA_ARG__ parameter of a macro and get a type out of it in standard C anyways. So, it only seemed natural that _Generic took only an expression. Naturally, as brains got thinking about things,

someone figured out that maybe we can do a lot better than that!

Moving the Needle

Implementers had, at the time, been complaining about not having a way to match on types directly without doing the silly pointer tricks above because they wanted to implement tests. And some of them complained that the standard wasn’t giving them the functionality to solve the problem, and that it was annoying to reinvent such tricks from first principles. This, of course, is at the same time that implementers were also saying we shouldn’t just bring papers directly to the standard, accusing paper authors of “inventing new stuff and not standardizing existing practice”. This, of course, did not seem to apply to their own issues and problems, for which they were happy to blame ISO C for not figuring out a beautiful set of features that could readily solve the problems they were facing.

But, one implementer then got a brilliant idea. What if they flexed their implementer muscles? What if they improved _Generic and reported on that experience without waiting for C standard to do it first? What if implementers fulfilled their end of the so-called “bargain” where they actually implemented extensions? And then, as C’s previous charters kept trying to promise (and then fail to deliver on over and over again over decades), what if those implementers then turned around to the C standard to standardize their successful existing practice so that we could all be Charter-Legal about all of this? After all, it would be way, WAY better than being perpetually frozen with fear that if they implemented a (crappy) extension they’d be stuck with it forever, right? It seems like a novel idea in this day and age where everything related to C seems conservative and stiff and boring. But?

Aaron Ballman decided to flex those implementer muscles, bucking the cognitive dissonance of complaining that ISO C wasn’t doing anything, not writing a paper, and not follow up on his own implementation. He kicked off the discussion. He pushed through with the feature. And you wouldn’t believe it, but:

it worked out great.

N3260 - Generic Selection Expression with Type Operand

It’s as simple as the paper title: N3260 puts a type where the expression usually goes. Aaron got it into Clang in a few months, since it was such a simple paper and had relatively small wording changes. Using a type name rather than an expression in there, _Generic received the additional power to get direct matching with no l-value conversion. This meant that qualifier stripping — and more – did not happen. So we can now write TYPE_MATCHES_EXPR like so:

#define TYPE_MATCHES_EXPR(DESIRED_TYPE, ...) \
	_Generic(typeof((__VA_ARGS__)),\
		DESIRED_TYPE: 1,\
		default: 0\
	)

static const int a;
static_assert(TYPE_MATCHES_EXPR(const int, a), "AAAAUGH!"); // works, nice!
static_assert(TYPE_MATCHES_EXPR(int, 54), "AAAAUGH!"); // works, nice!

This code looks normal. Reads normal. Has no pointer shenanigans, no null pointer constant casting; none of that crap is included. You match on a type, you check for exactly that type, and life is good.

Clang shipped this quietly after some discussion and enabled it just about everywhere. GCC soon did the same in its trunk, because it was just a good idea. Using the flag -pedantic will have it be annoying about the fact that it’s a “C2y extension” if you aren’t using the latest standard flag, but this is C. You should be using the latest standard flag, the standard has barely changed in any appreciable way in years; the risk is minimal. And now, the feature is in C2y officially, because Aaron Ballman was willing to kick the traditional implementer Catch-22 in the face and be brave.

Thank you, Aaron!

The other compilers are probably not going to catch up for a bit, but now _Generic is much easier to handle on the two major implementations. It’s more or less a net win! Though, it… DOES provide for a bit of confusion when used in certain scenarios, however. For example, using the same code from the beginning of the article, this:

int f () {
	return 45;
}

int main () {
	const int a = 1;
	return _Generic(typeof(a),
		int: a + 2,
		default: f() + 4
	);
}

does not match on int anymore, IF you use the type-based match. In fact, it will match on default: now and consequently will call f() and add 4 to it to return 49. That’s gonna fuck some people’s brains up, and it will also expose some people to the interesting quirks and flaws about whether certain expressions — casts, member accesses, accesses into qualified arrays, etc. — result in specific types. We’ve already uncovered one fun issue in the C standard about whether this:

struct x { const int i; };

x f();

int main () {
	return _Generic(typeof(f().i),
		int: 1,
		const int: 2,
		default: 0
	);
}

will make the program return 1 or 2 (the correct answer is 2, but GCC and Clang disagree because of course they do). More work will need to be done to make this less silly, and I have some papers I’m writing to make this situation better by tweaking _Generic. _Generic, in general, still needs a few overhauls so it works better with the compatibility rules and also doesn’t introduce very silly undefined behavior with respect to Variable-Length Arrays and Fixed-Size Array generic types. But that’s a topic

for another time. 💚

Constant Integer Type Declarations Initialized With Constant Expressions Should Be Constants

2024-06-16T00:00:00+00:00

Constant integer-typed (including enumeration-typed) object declarations in C that are immediately initialized with an integer constant expression should just be constant expressions. That’s it. That’s the whole article; it’s going to be one big propaganda piece for an upcoming change I would like to make to the C standard for C2y/C3a!

Doing The “Obvious”, Obviously

As per usual, everyone loves complaining about the status quo and then not doing anything about it. Complaining is a fine form of feedback, but the problem with a constant stream of crticism/feedback is that nominally it has to be directed — eventually — into some kind of material change for the better. Otherwise, it’s just a good way to waste time and burn yourself out! As one would correctly imagine, this “duh, this is obvious” feature is not in the C standard. But, it seemed like making this change would take too much time, effort, and would be too onerous to wrangle. However, this is no longer the case anymore!

Thanks to changes made in C23 by Eris Celeste and Jens Gustedt (woo, thanks you two!), we can now write a very simple and easy specification for this that makes it terrifyingly simple to accomplish. We also know this will not be an (extra) implementation burden to conforming C23 compilers for the next revision of the standard thanks to constexpr being allowed in C23 for object declarations (but not functions!). As we now have such constexpr machinery for objects, there is no need to go the C++ route of trying to accomplish this in the before-constexpr times. This makes both the wording and the semantics easy to write about and reason about.

How It Works

The simple way to achieve this is to take every non-extern, const-qualified (with no other storage class specifiers except static in some cases) integer-typed (including enum-typed) declaration and upgrade it implicitly to be a constexpr declaration. It only works if you’re initializing it with an integer constant expression (a specific kind of Phrase of Power in C standardese), as well as a few other constraints. There are a few reasons for it to be limited to non-extern declarations, and a few reasons for it to be limited to integer and integer-like types rather than the full gamut of floating/struct/union/etc. types. Let’s take a peak into some of the constraints and reasonings, and why it ended up this way.

Non-`extern` only!

An extern object declaration could refer to read-only memory that is only read-only from the perspective of the C program. For example, it could refer to a location in memory written to by the OS, or handled by lower level routines that pull their values from a register or other hardware. (Typically, these are also marked volatile, but the point still stands.) We cannot have things that are visible outside of the translation unit and (potentially) affected by other translation units / powers outside of C marked as true constants; it would present a sincere conflict as interest. But, because of extern, we have a clear storage class specifier that allows us to know when things follow this rule or when things do not. This makes it trivially simple to know when something is entirely internal to the translation unit and the C program and does not “escape” the C abstract machine!

This makes it easy to identify which integer typed declarations would meet our goals, here. Though, it does bring up the important question of “why not the other stuff, too?”. After all, if we can do this for integers, why not structures with compound literals? Why not with string literals? Why not with full array initializers and array object declarations inside of a function?! All of these things can be VERY useful to make standards-mandated available to the optimizer.

Integer-Typed Declarations? Why Not “Literally Everything™”?

Doing this for integer types is more of a practicality than a full-on necessity. The reason it is practical is because 99% of all compilers already compute integer constant expressions for the purposes of the preprocessor and the purposes of the most basic internal compiler improvements. Any serious commercial compiler (and most toy compilers) can compute 1 + 1 at compile-time, and not offload that expression off to a run-time calculation.

However, we know that most C compilers do not go as far as GCC or Clang which will do its damnedest to compute not only every integer constant expression, but compound literal and structure initialization expression and string/table access at compile-time. If we extend this paper to types beyond integers, then we quickly exit the general blessing we obtain from “We Are Standardizing Widely-Deployed Existing Practice”. At that point, we would not be standardizing widespread existing practice, but instead the behavior of a select few powerful compilers whose built-in constant folders and optimizers are powerhouses among the industry and the flagships of their name.

C++ does process almost everything it can at compile-time when possible, under the “manifestly constant evaluated” rules and all of its derivatives. This has resulted in serious work on the forward progress of constant expression parsers, including a whole new constant expression interpreter in Clang¹. However, C is not really that much of a brave language; historically, standard and implementation-provided C has been at least a decade (or a few decades) behind what could be considered basic functionality, requiring an independent hackup of what are bogstandard basic features from users and vendors alike. Given my role as “primary agitator for the destruction of C” (or improvement of C; depends on who’s being asked at the time), it seems fitting to take yet another decades-old idea and try to get it through the ol’ Standards Committee Gauntlet.

With that being the case, the changes to C23’s constant expression rules were already seen as potentially harmful for smaller implementations. (Personally, I think we went exactly as far as we needed to in order to make the situation less objectively awful.) So, trying to make ALL initializers be parsed for potential constant expressions would likely be a bridge too far and ultimately tank the paper and halt any progress. Plus, it turns out we tried to do the opposite of what I’m proposing here! And,

it actually got dunked on by C implementers?!

We Failed To Do It The Opposite Way

A while back, I wrote about the paper N2713 and how it downgraded implementation-defined integer constant expressions to be treated like normal numbers “for the purposes of treatment by the language and its various frontends”. This was a conservative fix because, as the very short paper stated, there was implementation divergence and smaller compilers were not keeping up with the larger ones. Floating point-to-integer conversions being treated as constants, more complex expressions, even something like __builtin_popcount(…) function calls with constants being treated as a constant expression by GCC and Clang were embarrassing the smaller commercial offerings and their constant expression parsers.

It turns out that implementation divergence mattered a lot. A competing paper got published during the “fix all the bugs before C23” timeframe, and it pointed all of this out in paper N3138 “Rebuttal to N2713”. The abstract of N3138 makes it pretty clear: “[N2713] diverges from existing practice and breaks code.” While we swear up and down that existing implementations are less important in our Charter (lol), the Committee DOES promise that existing code in C (and sometimes, C-derivative) languages will be protected and prioritized as highly as is possible. This ultimately destroyed N2713, and resulted in it being considered implementation-defined again whether or not non-standards-blessed constant expressions could be considered constants.

Effectively, the world rejected the idea that being downgraded and needing to ignore warnings about potential VLAs (that would get upgraded to constant arrays at optimization time) was appropriate. Therefore, if C programmers rejected going in the direction that these had to be treated for compiler frontend purposes as not-constants, we should instead go in the opposite direction, and start treating these things as constant expressions. So, rather than downgrading the experience (insofar as making certain expressions be not constants and not letting implementations upgrade them in their front-ends, but only their optimizers), let’s try upgrading it!

Formalizing the Upgrade

In order to do this, I have written a paper currently colloquially named NXXX1 until I order a proper paper number. The motivation is similar to what’s in this blog post, and it contains a table that can explain the changes better than I possibly ever could in text. So, let’s take a look:

int file_d0 = 1;
_Thread_local int file_d1 = 1;
extern int file_d2;
static int file_d3 = 1;
_Thread_local static int file_d4 = 1;
const int file_d5 = 1;
constexpr int file_d6 = 1;
static const int file_d7 = 1;

int file_d2 = 1;

int main (int argc, char* argv[]) {
	int block_d0 = 1;
	extern int block_d1;
	static int block_d2 = 1;
	_Thread_local static int block_d3 = 1;
	const int block_d4 = 1;
	const int block_d5 = file_d6;
	const int block_d6 = block_d4;
	static const int block_d7 = 1;
	static const int block_d8 = file_d5;
	static const int block_d9 = file_d6;
	constexpr int block_d10 = 1;
	static constexpr int block_d11 = 1;
	int block_d12 = argc;
	const int block_d13 = argc;
	const int block_d14 = block_d0;
	const volatile int block_d15 = 1;

	return 0;
}

int block_d1 = 1;

Declaration	`constexpr` Before ?	`constexpr` After ?	Comment
file_d0	❌	❌	no change; `extern` implicitly, non-`const`
file_d1	❌	❌	no change; `_Thread_local`, `extern` implicitly, non-`const`
file_d2	❌	❌	no change; `extern` explicitly, non-`const`
file_d3	❌	❌	no change; non-`const`
file_d4	❌	❌	no change; `_Thread_local`, non-`const`
file_d5	❌	❌	no change; `extern` implicitly
file_d6	✅	✅	no change; `constexpr` explicitly
file_d7	❌	✅	`static` and `const`, initialized by constant expression
block_d0	❌	❌	no change; non-`const`
block_d1	❌	❌	no change; `extern` explicitly, non-`const`
block_d2	❌	❌	no change; non-`const`, `static`
block_d3	❌	❌	no change; `_Thread_local`, `static`, non-`const`
block_d4	❌	✅	`const`; initialized with literal
block_d5	❌	✅	`const`; initialized with other `constexpr` variable
block_d6	❌	✅	`const`, initialized by other constant expression
block_d7	❌	✅	`static` and `const`, initialized with literal
block_d8	❌	❌	no change; non-constant expression initializer
block_d9	❌	✅	`static` and `const`, initialized by constant expression
block_d10	✅	✅	no change; `constexpr` explicitly
block_d11	✅	✅	no change; `constexpr` explicitly
block_d12	❌	❌	no change; non-`const`, non-constant expression initializer
block_d13	❌	❌	no change; non-constant expression initializer
block_d14	❌	❌	no change; non-constant expression initializer
block_d15	❌	❌	no change; `volatile`

For the actual “words in the standard” changes, we’re effectively just making a small change to “§6.7 Declarations, §6.7.1 General” in the latest C standard. It’s an entirely new paragraph that just spins up a bulleted list, saying:

^(NEW)13✨ If one of a declaration’s init declarator matches the second form (a declarator followed by an equal sign = and an initializer) meets the following criteria:

— it is the first visible declaration of the identifier;

— it contains no other storage-class specifiers except static, auto, or register;

— it does not declare the identifier with external linkage;

— its type is an integer type or an enumeration type that is const-qualified but not otherwise qualified, and is non-atomic;

— and, its initializer is an integer constant expression (6.6);

then it behaves as if a constexpr storage-class specifier is implicitly added for that declarator specifically. The declared identifier is then a named constant and is valid in all contexts where a named constant of the corresponding type is valid to form a constant expression of that specific kind (6.6).

Thanks to the improvements to §6.6 from Celeste and Gustedt, and their work on constexpr, the change here is very small, simple, and minimal. This covers all the widely-available existing practice we care about, without providing undue burden for many serious C implementations of C23 and beyond. It also would make a wide variety of integer constant expressions from the “Rebuttal” paper N3138 into valid constant expressions, according to the current rules of the latest C standard. This would be an improvement as it would mean the constant expressions written by users could be relied on across platforms that use a -std=c2y flag or claim to conform to the latest (working draft) C standard.

All in All, Though?

I’m just hoping I can get something as simple as this into C. It’s been long overdue given the number of ways folks complain about how C++ has this but C doesn’t, and it would deeply unify existing practice across implementations. It also helps to remove an annoying style of diagnostic warnings from -Wpedantic/-Wall-style warning lists, too!

The next meeting for C is around October, 2024. I’ll be trying to bring the paper there, to get it formalized, along with the dozens of other papers and features I am working on. Even if my hair will go fully grey by the time this is available on all platforms, I will keep working at it. We deserve the C that people keep talking about, on all implementations.

If not in my lifetime, in yours. 💚

You can read a writeup about it on RedHat’s blog (Part 1, Part 2), or directly from the LLVM documentation. ↩

Why Not Just Do Simple C++ RAII in C?

2024-05-21T00:00:00+00:00

Ever since I finished publishing the “defer” paper and successfully defended it on its first go-around (it now has tentative approval to go to a Technical Specification, I just need to obtain the necessary written boilerplate to do so), an old criticism repeats itself frequently. Both internally to the C and C++ Standards Committee, as well as to many outside, the statement is exactly as the title implies: to implement a general-purpose undo mechanism for C, why not just make Objects with Well-Scoped, Deterministic Lifetimes and build it out of that like C++? This idiom, known as Resource Acquisition Is Initialization (RAII), is C++’s biggest bread and butter and its main claim to fame over just about every other language that grew up near it and after it (including all of the garbage collected languages such as Managed C++, D, Go, etc.). I have received no less than 5 external-to-WG14 (the formal abbreviation for the C Standards Committee) requests/asks about this, and innumerable posts internal to the C Standard mailing lists.

So, let’s just get this off the table right now so I can keep referring to this post every time somebody asks:

You ✨Cannot✨ Have “Simple RAII” in C

That’s the entire premise of this article. There’s a few reasons this is not possible – some mentioned in the defer paper version N3199, and others that I just sort of took for granted that people would understand but do not – and so, to clear up confusion, they will be written down here. There are two MAJOR reasons one cannot take the object-oriented semantics and syntax of RAII from C++ as-is, without jeopardizing sincere qualities about C:

RAII is syntactically difficult to achieve in C due to the semantics imbued on those syntax constructs by C++;
and, RAII is semantically impossible in C due to C’s utterly underwhelming type/object model.

To start with, let’s go over the syntax of C++, and how it achieves RAII. We will also discuss a version of RAII that uses not-C++ syntax, which would work…. at least until the second bulleted reason above dropkicks that in the face. So, let’s begin:

RAII: C++ Syntax

As a quick primer for those who are not familiar, C++ achieves its general purpose do-and-undo mechanism through the use of constructors and destructors. Constructors are function calls that are always invoked on the creation of an object, and destructors are always invoked when an object leaves scope. One can handle doing the construction and destruction manually, but we don’t have to talk about such complicated cases yet. The syntax looks as follows:

#include 

struct ObjectType {
	int a;
	double b;
	void* c;

	/* CONSTRUCTOR: */
	ObjectType() : a(1), b(2.2), c(malloc(30)) {

	}

	/* DESTRUCTOR: */
	~ObjectType() {
		free(c);
	}
};

In the above code snippet, we have a structure named ObjectType. It has a single constructor, that takes no arguments, and initializes all 3 of its members to some default values. It also has a destructor, which is meant to “undo” anything in the class that the programmer likes. In this case, I an using it to purposefully free the data that I originally mallocd into the member c during construction. Thus, when I use the class in this manner:

#include 

int main () {
	ObjectType thing = {};
	printf("%d %f %p", thing.a, thing.b, thing.c);
	return 0;
}

despite not seeing any other code in that snippet, that code will:

create automatic storage duration memory to put thing in (A.K.A. stack space for a stack variable);
call the constructor on that automatic storage duration memory location (A.K.A. the thing that sets those values and does malloc)
perform the printf call
prepares the return statement with the value of 0
call the destructor on that automatic storage duration memory location (A.K.A. the thing that calls free to release the memory)
release the automatic storage duration memory (A.K.A. cleans up the stack)
return from the function with the value 0 being transported in whatever manner the implementation has defined

This is a fairly simple set of steps, but it’s a powerful concept in C++ because no matter what happens (modulo some of the more completely bananas situations), once an object is “properly constructed” (all the data members are initialized from the TypeName (...) : … { list and reach the opening brace) in some region of memory, the compiler will always deterministically call the destructor at a fixed location. There is no wibbly-wobbly semantics like .NET IL finalizers or Lua __gc methods: the object is created, the objected is destroyed, always. (Again, we are ignoring more manual cases at the moment such as using new/delete, its array friends, or placement new & other sorts of shenanigans.) As Scott Meyers described it, this is a “powerful, general-purpose undo mechanism” and its one of the most influential concepts in deterministic, non-garbage-collected systems programming. Every other language worth being so much as spit on either employs deep garbage collection (Go, D, Java, Lua, C#, etc.) or automatic reference counting (Objective-C, Objective-C++, Swift, etc.), uses RAII (Rust with Drop, C++, etc.), or does absolutely nothing while saying to Go Fuck Yourself™ and kicking the developer in the shins for good measure (C, etc.).

The first problem with this, however, is a technical hangup. When C++ created their constructors, they created them with a concept called function overloading in mind. This very quickly gets into the weeds of Application Binary Interfaces and other thorny issues, which are thankfully already thoroughly written about in this expansive blog post, but for the sake of brevity revisiting these concepts is helpful to understand the issue.

Problem 0: Function Overloading

Function overloading is a technique where software engineers, in source code and syntactically, name what are at their core two different functions the same name. That single name is used as a way to referring to two different, distinct function calls by employing extra information, such as the number of arguments, the types of the arguments, and other clues when that single name gets used:

// FUNCTION 0
int func (int a);
// FUNCTION 1
double func (double b);

int main () {
    int x = func(2); // calls FUNCTION 0, f(int)
    double y = func(3.3); // calls FUNCTION 1, f(double)
    return (int)(x + y);   
}

However, when the source code has to stop being source code and instead needs to be serialized as an actual, runnable, on-the-0s-and-1s-machine binary, linkers and loaders do not have things like compile-time “type” information and what not at-the-ready. It is too expensive to carry that information around, all the time, in perpetuity so that when someone runs a program it can differentiate between “call f that does stuff with an integer” versus “call f that does stuff with a 64-bit IEEE 754 floating point number”. So, it undergoes a transformation that transforms f(int) or f(double) into something that looks like this at the assembly level:

main:
        push    rbx
        mov     edi, 2
        call    _Z4funci # call FUNCTION 0
        movsd   xmm0, QWORD PTR .LC0[rip]
        mov     ebx, eax
        call    _Z4funcd # call FUNCTION 1
        movapd  xmm1, xmm0
        pxor    xmm0, xmm0
        cvtsi2sd        xmm0, ebx
        pop     rbx
        addsd   xmm0, xmm1
        cvttsd2si       eax, xmm0
        ret
.LC0:
        .long   1717986918
        .long   1074423398

The code looks messy because we’re working with doubles and so it generates all sorts of stuff for passing arguments and also casting it down to a 32-bit int for the return expression, but the 2 important lines are call _Z4funci and call _Z4funcd. Believe it or not, these weird identifiers in the assembly correspond to the func(int) and func(double) identifiers in the code. This technique is called “name mangling”, and it powers a huge amount of C++’s featureset. Name mangling is how, so long a argument types or number of arguments change, things like the Application Binary Interface (ABI) of function calls can be preserved. The compiler is taking the name of the function func and the arguments int/double and mangling it into the final identifier present in the binary, so that it can call the right function without having a full type system present at the machine instruction level. This has the obvious benefit that the same conceptual name can be used multiple different ways in code with different data types, mapping strongly to the “this is the algorithm, and it can work with multiple data types” idea. Thus, the compiler worries about the actual dispatch details and resolves at compile-time, which means there no runtime cost to do matching on argument count or argument types. Having it resolved at compile-time and mapped out through mangling allows it to just directly call the right code during execution. The reason this becomes important is because this is how constructors must be implemented.

Problem 1: Member Functions

Consider the same ObjectType from before, but with multiple constructors:

#include 

struct ObjectType {
	int a;
	double b;
	void* c;

	/* CONSTRUCTOR 0: */
	ObjectType() : a(1), b(2.2), c(malloc(30)) {

	}

	/* CONSTRUCTOR 1: */
	ObjectType(double value) : a((int)(value / 2.0)), b(value), c(malloc(30)) {

	}

	/* DESTRUCTOR: */
	~ObjectType() {
		free(c);
	}
};

#include 

int main () {
	ObjectType x = {};
	ObjectType y = {50.0};
	printf("x: %d %f %p\n", x.a, x.b, x.c);
	printf("y: %d %f %p\n", y.a, y.b, y.c);
	return 0;
}

We can see the following assembly:

.LC1:
	.string "x: %d %f %p\n"
.LC2:
	.string "y: %d %f %p\n"
main:
	push    r12
	push    rbp
	push    rbx
	sub     rsp, 64
	mov     rdi, rsp
	lea     rbp, [rsp+32]
	mov     rbx, rsp
	call    _ZN10ObjectTypeC1Ev
	movsd   xmm0, QWORD PTR .LC0[rip]
	mov     rdi, rbp
	call    _ZN10ObjectTypeC1Ed
	mov     rdx, QWORD PTR [rsp+16]
	movsd   xmm0, QWORD PTR [rsp+8]
	mov     edi, OFFSET FLAT:.LC1
	mov     eax, 1
	mov     esi, DWORD PTR [rsp]
	call    printf
	mov     rdx, QWORD PTR [rsp+48]
	movsd   xmm0, QWORD PTR [rsp+40]
	mov     edi, OFFSET FLAT:.LC2
	mov     eax, 1
	mov     esi, DWORD PTR [rsp+32]
	call    printf
	mov     rdi, rbp
	call    _ZN10ObjectTypeD1Ev
	mov     rdi, rsp
	call    _ZN10ObjectTypeD1Ev
	add     rsp, 64
	xor     eax, eax
	pop     rbx
	pop     rbp
	pop     r12
	ret
	mov     r12, rax
	jmp     .L3
	mov     r12, rax
	jmp     .L2
main.cold:
.L2:
	mov     rdi, rbp
	call    _ZN10ObjectTypeD1Ev
.L3:
	mov     rdi, rbx
	call    _ZN10ObjectTypeD1Ev
	mov     rdi, r12
	call    _Unwind_Resume
.LC0:
	.long   0
	.long   1078525952 

Again, we notice in particular the use of these special, mangled identifiers for the call instructions: call _ZN10ObjectTypeC1Ev, call _ZN10ObjectTypeC1Ed, and call _ZN10ObjectTypeD1Ev. It has the name of the type (…10ObjectType…) in it this time, but more or less just mangles it out. This is where the heart of our problems lie. If C wants to steal C++’s syntax for RAII, and C wants to be able to share (header file) source code that enjoys simple RAII objects, every single C implementation needs to implement a Name Mangler compatible with C++ for the platforms they target. And how hard could that possibly be?

Hm.

Here are some name manglings for the one argument ObjectType constructor:

_ZN10ObjectTypeC1Ed (GCC/Clang on Linux; x86-64, ARM, ARM64, and i686)
??0ObjectType@@QEAA@N@Z (MSVC; x86-64, ARM64)
??0ObjectType@@QAE@N@Z (MSVC; i686)

That’s three different name manglings for only a handful of platforms! And while some name manglers are partially documented or at least provided as a library so that it can be built upon, the name manglers for others are not only utterly undocumented but completely inscrutable. So much so that on some platforms (like MSVC on any architecture), certain name manglings are not guaranteed to be 1:1 and can infact “demangle” into multiple different plausible entities. If an implementation gets the name mangling wrong, well, that’s just a damn shame for the end user who has to deal with it! Of course, nobody’s claiming that name mangling is an unsolvable problem; it is readily solved in codebases such as Clang and GCC. But, it is worth noting that, as C’s specification stands now, there is no requirement to mangle any functions.

This is both a blessing, and a curse. The former because functions that users write are pretty much 1:1 when they are written under a C compiler. If a functioned is named glorbinflorbin in C, the name that shows up in the binary is glorbinflorbin with maybe some extra underscores added in places somewhere on certain implementations. But, the latter comes in to play for precisely this reason: if there is no name mangling performed that considers things such as related enclosing member object, argument types, and similar, then it is impossible to have even mildly useful features that can do things like avoid name clashes a function prototype is generated with the wrong types. It is, in fact, the primary reason that C ends up in interesting problems when using typedefs inside of its function declarations. Even if the typedefs change, the function names do not because there is no concept of “member functions” or “function overloading” or anything like that. It’s why the intmax_t problem is such an annoying one.

What Does This Have To Do With RAII?

Well, the devil is in these sorts of details. In order to introduce nominal support for something like constructors, name mangling (or something that allows the user to control how names come out on the other side) need to be made manifest in C. If name mangling is chosen as the implementation choice and a syntax identical to C++ is chosen, the implementation becomes responsible for furnishing a name mangler. And, because people are (usually) not trying to be evil, there should be ABI compatibility between the C implementation’s name mangler and C++’s name mangler so that code written with constructors in one language interoperate just fine with the other, without requiiring extern "C" to be placed on every constructor. (Right now, extern "C" is not legal to place on any member function in any C++ implementation.)

The reason this is desirable is obvious: header code could be shared between the languages, which makes sense in a world where “just steal C++’s constructors and destructors” is the overall design decision for C. But this is very much a nonstarter implementation reasons. Most implementers get annoyed when we require them to implement things that might take significant effort. While Clang and GCC likely won’t give an over damn so long as its not C++-modules levels of difficult (and MSVC ignores us until it ships in a real standard), there’s hundreds of C compilers and implementers of WILDLY varying quality. Unlike the 4-5 C++ compilers that exist today, C compilers and their implementers are still cranking things out, sometimes as significant pillars of their (local) software economy. Now, while I personally loathe to use things like lines of code as a functional metric for code, it can help us estimate complexity in a very crude, contextless way. Checking in on Clang’s Itanium Mangler, it clocks in somewhere on the order of about 7,000 lines of code. Which really doesn’t sound so bad,

until chibicc’s entire codebase measures somewhere around 7,300 lines of code.

“Double the amount of code in my entire codebase excluding tests for this feature” very much does not pass the smell test of implementability for C. This is also not including, you know, all the rest of the code required for actually implementing the “parse constructors and destructors” bit. Though, thankfully, that part is a lot less work than the name mangler. and I can guarantee that since there’s quite literally hundreds of C implementations, many of them will… “have fun”. If two or three different ways to mangle ObjectType::ObjectType(double) is bad, wait until a couple dozen implementers who have concerns outside of “C++ Compatibility” – some even with an active bone to pick with C++ – are handed a gaggle of features that relies on a core mechanic that is entirely unspecified. I am certainly not the smartest person out there,

but I know a goddamn interoperability bloodbath when I see one.

But… What If Name Mangling Was not a Problem?

This is the other argument I have received a handful of times on both the C mailing list, and in my inbox. It’s not a bad argument; after all, the entire above argument hinges on the idea of stealing the syntax from C++ entirely and copying their semantics bit-for-bit. By simply refusing to do it the way C++ does it, does it make the above argument go away? Thusly appears the following suggestion, which boils down to something like the following snippet. However, before we continue, note that this syntax comes partially from an e-mail sent to me. PLEASE, second-to-last person who sent me an e-mail about this and notices the syntax looks similar to what was in the e-mail: I am not trying to make fun of you or the syntax you have shown me, I am just trying to explain as best as I can. With that said:

#include 

struct nya {
	void* data_that_must_be_freed;
};

_Constructor void nya_init(struct nya *nya, int n) {
	nya->data_that_must_be_freed = malloc(n);
}

_Destructor void nya_clear(struct nya *nya) {
	free(nya->data_that_must_be_freed);
}

int main () {
	struct nya n = {30};
	return 0;
}

The following uses the _Constructor and _Destructor tags on function declarations/definitions to associate either the returned type struct nya and the destructed type struct nya * (a pointer to an already-existing struct nya to destroy). The sequence of events, here, is pretty simple too:

n’s memory is allocated (off of the stack), its memory is taken from the appropriate location on the stack and passed to;
nya_init, which then calls malloc to initialize its data member;
the return 0 is processed, storing the 0 value to do the actual return later, while;
nya_clear is called on the memory for n, and the data member is appropriately freed;
finally, main returns 0.

It has the same deterministic destruction properties as RAII here. But, notably, it is attached to a free-floating function.

This does the smart thing and gets around the name mangling issue! The person e-mailing me here has sidestepped the whole issue about sharing syntax with C++ and its function overloading issue, which is brilliant! If you can associate a regular, normal function call with these actions, it is literally no longer necessary to provide a name mangling scheme. It does not need to exist, so nobody will implement one: it’s just calling a normal function. (Kudos to Rust for figuring part of this out themselves as well, though they still need name mangling thanks to Traits and Generics.) It avoids all of the very weird fixes other people tried to propose on the C standards internal mailing list by saying things like “only allow one constructor” or “make C++ have extern "C" on constructors work and then have C and C++ mangle them differently” or “just implement name manglers for all C compilers that implement C2y/C3a, it’s fine”. Implementability can certainly be achieved with this.

Other forms of this come from a derivation of the two existing Operators proposals (Marcus Johnson’s n3201 and Jacob Navia’s n3051), most particularly n3201. The recommendation for n3201 by the author was to just use a different “association” that did not actually affect the syntax of the function itself, so the above code that produces the same affect but under n3201’s guidance (but slightly modified from the way it was presented in n3201 because that syntax has Problems™) might look like:

#include 

struct nya {
	void* data_that_must_be_freed;
};

void nya_init(struct nya *nya, int n) {
	nya->data_that_must_be_freed = malloc(n);
}

void nya_clear(struct nya *nya) {
	free(nya->data_that_must_be_freed);
}

_Operator = nya_init;
_Operator ~ nya_clear;

int main () {
	struct nya n = {30};
	return 0;
}

Completely ignoring syntax choices here and the consequences therein, these _Operator statements would associate a function call with an action. = in this case seems to apply to construction, and ~ seems to apply to destruction. As usual, because the association is made using a statement and type information at compile-time, the compiler can know to simply call nya_init and nya_clear without needing to set up a complex, implementation-internal name mangling scheme to figure out which constructor/member/whatever function it needs to call to initialize the object correctly. It also doesn’t rob C++ of its syntax but try to impose different semantics. Nor does it just tell C implementations the functional equivalent of “git gud” with respect to implementing the name mangler(s) required to play nice with other systems. There is, unfortunately, one really annoying problem with having this form of constructors and destructors, and it’s the same problem that C++ had when it first started out trying to tackle the same problem back in the 80s and 90s:

none of these proposals come with an Object Model, and C does not have a real Object Model aside from its Effective Types model!

RAII: C++ Semantics

While the syntax problem can be designed around with any number of interesting permutations or fixes, whether it’s _Operator or _Constructor or whatever, the actual brass-and-tack semantics that C++ endows on memory obtained from these objects is very strict and direct. When someone allocates some memory and casts it to a type and begins to access it, both [c.malloc] and [intro.object]/11-13 cover them by giving them implicitly created objects, so long as those types satisfy the requirements of being trivial and implicitly-creatable types. On top of that, for constructors and destructors, there is an ENORMOUSLY robust system that comes with it beyond these implicitly created objects. This post was going to be extremely long, but thanks to an excellent writeup by Basit Ayuntande, everything anyone needs to know about the C++ object model is already all written up. To fully understand all the details, shortcuts, tricks, and more, please read Basit’s article; becoming a better C++ developer (if that’s desirable) is an inevitably from digesting it.

This, of course, leaves us to talk about just C and RAII and how those semantics play out.

C: Effective Types

In C, we do not have a robust object model. The closest are effective type rules, and they work VIA lvalue accesses rather than applying immediately on cast. The full wording is found in §6.5.1 “General” of N3220, which states:

The effective type of an object for an access to its stored value is the declared type of the object, if any. If a value is stored into an object having no declared type through an lvalue having a type that is not a non-atomic character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value. If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one. For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.

This is a bunch of text to say something really simple: if a region of memory (like a pointer obtained from malloc) is present, and it is cast to a specific type for the purposes of reading or writing, that region is marked with a given type and the type plus region informs what is the effective type of the memory. The first write or access is what solidifies it as such. The effective type follows a memory region through memmove or memcpy done with appropriate objects of the appropriate size. Fairly straightforward, standard stuff. The next paragraph after this then creates a list of scenarios wherein about any accesses or writes performed through casts or pointers aliasing that region afterwards:

a type compatible with the effective type of the object,

a qualified version of a type compatible with the effective type of the object,

the signed or unsigned type compatible with the underlying type of the effective type of the object,

the signed or unsigned type compatible with a qualified version of the underlying type of the effective type of the object,

an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or

a character type.

This is, effectively, C’s aliasing rules. Once a type is set into that region of memory, once casting happens from one type to another (e.g. casting it first to uint32_t* to write to it, and then try to read it as a float* next), that action must be on that list to be standard-sanctioned. If it isn’t, then undefined behavior is invoked and programs are free to behave in very strange ways, at the whim of implementations or hardware or whatever. While I am not holding the person who sent me the simple one-off e-mail accountable to this, in the wider C ecosystem and in discussion even on the C mailing list, there seemed to be a distinct lack of appreciation for how thought-through the C++ system is and why it is this way in the first place. This also becomes glaringly clear after reading n3201 and going through 95% of the discussions around “RAII in C” that just tries to boil it down to simple syntactical solutions with basic code motion. The bigger picture is NOT being considered. There is not even a tiny amount of respect for where C or C++ comes from. It is not just about effective types and shadowy rules about how do they handle dynamic memory: even simpler things just completely fall apart in these counterproposals. Take, for example, a very simple question.

“How do you handle copies?”

Taking the _Operator example from above again, let’s add a single line of spice to this:

#include 

struct nya {
	void* data_that_must_be_freed;
};

void nya_init(struct nya *nya, int n) {
	nya->data_that_must_be_freed = malloc(n);
}

void nya_clear(struct nya *nya) {
	free(nya->data_that_must_be_freed);
}

_Operator = nya_init;
_Operator ~ nya_clear;

int main () {
	struct nya n = {30};
	struct nya n2 = n; // OH SHIT--
	return 0;
}

In a proposal like n3201, what happens here? The actual answer is “the proposal literally does not answer this question”. Assuming (briefly, if I can be allowed such for a moment) the “basic” or “default” for how it works right now, the answer is probably “just memcpy like before”, which is wrong. n3201 is not the first “just do a quick RAII in C” proposal sent to me over e-mail to make this mistake. Simply performing a memberwise copy of struct nya from n to n2 leads to an obvious double-free when n2 goes out of scope, frees the memory pointed to by data_that_must_be_freed, and then n will attempt attempt to free that data as well. This is an infinitely classic blunder, and in critical enough code becomes a security blunder. The suggestions that stem from pointing this out range from unserious to just disappointing, including things like “just ban copying the structure”. Nobody needs a degree in Programming Language Design to communicate that “just ban simple automatic storage duration structure copying” is a terrible usability and horrific ergonomics decision to make, but that’s where we are. And it’s so confusingly baffling that it is impossible to be mad that the suggestion is brought up.

Or, take in n3201’s case (which updates the previous paper, n3182). When responding to the (ever-present) criticism that operators – including for initialization/assignment – that someone could do something weird inside of the operator, n3201 adds a constraint which reads:

Functions must contain the matching operator in their function bodies. i.e. _Operator declarations that associate the compares-equal operator with a function, must contain the compares-equal operator in the body of the function named in the _Operator declaration. (iostream-esque shenanigans with overloading the bitwise shift operators to read/write characters and strings isn’t allowed).

The fact that the proposal has something for initialization (but not cleanup), does not mention anything about the fact that the code snippet in the proposal itself apparently (?) leaks memory, that this constraint is very much deeply unsettling to impose on any type (there’s plenty of vec4 or other mathematics code where I’m using intrinsics that look nothing like the operators they’re being implemented for) does not seem to bother the author in the slightest. Instead, there’s just a palpable hatred of C++ there, apparently so strong that it overrides any practical engagement with the problem space. The proposal – and much of the counter-backlash I had to sift through on the mailing lists and elsewhere as people proposed stripped down RAII solutions for C under the guise of being “simple” – is too busy taking potshots at C++ to address clear and present dangers to its own functionality.

C as an Anti-C++

And this is where things just keep getting worse, because so much of C’s culture seems to swirl around the idea of either being “budget, simple, understandable C++” or “Anti/Nega-C++”. Instead of engaging on C’s stated merits or goals, like:

what-you-write-is-what’s-inside (a function foo produces a binary symbol named foo);
uncompromised, direct access to the hardware (through close collaboration with implementation-defined asm, intrinsics, and unparalleled control of the compiler (severe work in progress, honestly));
simple enough that it can always be used to glue two or more languages together (for any single given platform/compiler combination);
and, being a smaller language focused on its use cases (K&R literally sold C on being good at strings – we can see how that’s been going in the last 30 years).

We instead get “why doesn’t this PRIMITIVE, UNHOLY C just become C++” proposals, and similar just-as-ill-considered “here is my simpler (AND BETTER THAN THAT CRAPPY C++ STUFF) feature” proposals. Sometimes, like the person who e-mailed me with the struct nya example, there’s a genuine curiosity for exploring a different design space that serves as an actually better basis. But at even our highest echelons, the constant spectre of C++ that continually drives an underlying and utterly unhelpful antagonism that prevents actual technical evaluation. It results in things like _Operator throwing itself in the way of RAII, to try and half-heartedly solve the RAII problem without actually engaging with the sincere, instructive merit of the C++ object model. It also prevents actually evaluating the things that make RAII weak, including problems with the strong association with objects that actually manifest in its own standard library.

The negative tradeoffs for defer are numerous, especially since it absolutely loses many of the abilities that come from being a regular object with a well-defined lifetime. This means it is not as powerful as constructors and destructors, including that it is prone to Repeat-Yourself-Syndrome since the defer entity itself is not reusable. It cannot be attached to individual members of a structure, nor can it be passed through function calls or stored on the heap. It cannot be transferred with move constructors or duplicated with copy constructors in a natural way, or in any way as a matter of fact! It can only exist at function scope, not at file scope, and only exists procedurally.

The beneficial tradeoffs are it avoids the Static Initialization Order Fiasco that comes with having objects with constructors at file scope or marked static at function scope. It also does not combine lambdas with object-based destructors to torch 15+ years of life asking the C++ Standards Committee to standardize std::scope_guard only to ultimately be denied success at retirement age (sorry, Peter Sommerlad) because of the C++ Standard Library’s ironclad exceptions-and-destructors rule. And, to be clear, it was the right decision for them to do that! Poking a hole in the “all destructors from the standard library are noexcept” mandate adds needless library complexity gymnastics for a feature that the language should be taking care of! The proper realization after that would be that a language feature is required to sidestep the concerns that come with the Object Model. Of course, I do not expect the C++ Standard Committee’s Evolution Working Group to take that situation seriously as a body; likely, they will leave Library Evolution Working Group out to dry on the matter.

Coming to these sorts of conclusions only arises through behaving as an engineer that is looking to improve at their craft and strengthen their tools, rather than getting into a hammer-measuring pissing contest with the engineers down the hall.

But. Alas!

It still leaves a sour taste, though. It sort of lingers at the back of anyone’s mouth when they sit down to think about it, because it is kind of distasteful.

Genuinely, I understand that C can be behind. Very behind, in fact: taking 30 years to standardize typeof, not performing macro-gymnastics to get to typeof_unqual in the same 30 years, and not making any meaningful moves to work on things like e.g. “Statement Expressions” (something even the Tiny C Compiler implements) easily illustrates just how gut wrenchingly difficult it is to move the needle just a centimeter in this increasingly Godless industry. But when people propose a feature that has had 40+ years of work and refinement and care put into it, but at no point do they sit down and think about “what happens if I copy this object using the usual syntax” or “do we need some syntax for moving objects from one place to another” or “maybe I should not provoke a double free in the world’s most harmless looking code”, the thoughts start coming in. Is this being taken seriously? Is it just forgetfulness? Is it just so automatic nobody thinks about it? Is the pedagogy what is behind here, and is there a teaching crisis for this language?

So Many Questions

And yet, I will see not one damn answer, that’s for sure. Genuinely, I yearn for it because getting things half-baked things like they are in n3201 or similar is kind of rough to deal with. On one hand there’s the overwhelming urge to just grab the proposal and rip it up and get a white board and just go “here, HERE. WHERE IS YOUR OBJECT MODEL. WHAT HAPPENS TO THE EFFECTIVE TYPE RULES. DID YOU THINK ABOUT COPYING AND MOVING THINGS. WHAT HAPPENS IF SOMEBODY USES THESE IN AN COMPOUND ASSIGNMENT EXPRESSION. WHAT HAPPENS IF THEY ARE ASSIGNED FROM A TEMPORARY. HOW DO YOU PASS THAT IN TO THE USER. WHAT ARE THE THINGS THEY CAN CONTROL. HOW DO WE HANDLE THIS FROM HEAP MEMORY OR A STACK ARRAY UNSIGNED CHARACTERS.”

But that kind of tone, that sort of engagement is antagonistic, probably in the extreme.

It’s also not how I would like to engage with anyone. Like, the person who sent me an e-mail with the cute struct nya and the very simple and nice _Constructor syntax might not even have gotten that deep in the C standard and likely barely knows the effective type rules; I sure as hell barely understand them and I’m in charge of goddamn editing them when a few of the big upcoming papers finally make their way through the C Committee.

If I respond to an e-mail like that – with all the capital letters and everything – it would be completely out of line and also would be very unfair, because it is not their fault. I haven’t done that to anyone so far, but the fact that the thought exists in my head is Not Fun™. It’s not anyone’s fault, it’s just an internal struggle with thinking the whole industry is a lot farther along on these problems and continuously feeling like I am very much too stupid to be here. Like, I’m a goddamn moron, a genuine idiot, I cannot be ahead of the game, am I being pranked? Am I being tested, to see if I really belong here? Is someone going to swing in out of the blue and go “AHA, YOU MISSED THE OBVIOUS!”? Something is absolutely not adding up.

The utterly pervasive and constant feeling that a lot of people – way too many people – are really trying to invent these things from first principles and pretend like they were the first people to ever conceive of these ideas… it feels pretty miserable, all things considered. Going through life evaluating effectively no prior art in other languages, domains, C codebases as they exist today, just… anything. It’s a constant nagging pull when working on things like standard C that for the life of me I cannot seem to shake no matter how carefully I break it down. Hell, part of writing this post is so I can stick a link to it in my defer paper and in the defer Technical Specification when it happens so I don’t have to sit down and walk through why I chose a procedural-style, object-less idiom for C rather than trying to load the RAII shotgun and point it at our beloved 746-and-counting page C standard.

Changing a programming language’s whole object model is hard. Adding “things that must be run in order to bring an object into existence, and thing that must be run in order to kill an object, modulo Effective Type rules, with No Other Exceptions” is a big deal. Where in the proposals do they discuss new/delete, and why they are used as wrappers around malloc to ensure construction and destruction are coupled with memory creation to prevent very common bugs? Where is the consideration for placement new or being able to call destructors manually on an object or a region of memory? RAII enables simple idioms but it is not a simple endeavor! Weakening portions of RAII makes it so much less useful and so much less powerful, which is really weird! Is not the thing people keep telling me about C is that its the language of ultimate power and ultimate control? Why does that repeatedly not show up in these discussions?!

It feels so bizarre to have to actually sit down and explain some of these things sometimes because a lot of these things have become second nature to me, but it is just a part of the curse.

“It was Just Some E-mails, Man, Calm Down!”

To be very clear, the person who sent the e-mail – whose syntax I stole using struct nya * for this post for the _Constructor/_Destructor idea – is not someone I actually expect to send me a 5 page e-mail thesis on enhancements to the C object model. That person CLEARLY was just trying to give me a quick simple idea they thought up of that made it easy on them / solved the problem at hand, and I certainly don’t fault them for thinking of it! Their initiative actually demonstrates that rather than just doing the copy-paste roboticism of people who would blindly steal syntax from C++ and then strip off the bits they don’t like and go “See? Simple!” they’re actually thinking about and engaging with the technical merits of the problem. I certainly wish n3201 and other solutions had a fraction of that spark and curiosity and eagerness to explore the space and actually push the needle for C forward, rather than just being driven by trying to define C as “anti-C++”.

My intention is to keep moving forward with proposals like defer, among many others over the next few years, to start actually improving C for C’s sake. Sometimes this will mean cloning an idea right out of C++ and putting it in C; other times, weighing the pros and cons and addressing the inherent deficiencies in approaches to produce something better will be much more desirable. Knee jerk reactions like those in n3201 rarely serve to help either language and are producing demonstrably worse outcomes; which also concern me because I had an idea for handling operators in C for a long time now and seeing the current proposals do a poor job of handling the landscape is not going to bolster anyone’s confidence in how to do it…!

But, the person who inquired VIA e-mail deserves an enthusiastic “NICE”, a thumbs up, and maybe a cookie and a warm glass of milk for actually thinking about the problem domain. … In fact.

Cookies and milk sounds real good right now… 💚

The Pasture

The Cost of a Closure in C, The Rest

Skipping the Introductions

Experimental Setup

Plain C - New Categories

Results

Insights

Becoming the Most Normal Function

Lambdas Are Still Peak

The Next Tier Up: Very Small Amounts of Type Erasure

Slightly Worse: thread_local

What Is Going On With GNU Nested Functions???

Final Takeways

P.S.

Methodology

The Cost of a Closure in C

“Closures”?

The Closure Problem

Not A Big Overview

Purrrrrrrformance :3!

Anatomy of a Benchmark: Raw C

Methodology

The Results

Lambdas: On Top!

Lambdas are also Fast, even when Type-Erased

Lambdas: On…. Bottom, too?

Nested Functions and The Stack

What about Apple Blocks?

A Brief Aside: Self-Referencing Functions/Closures

Thinking It Over

Learned Insights

Existing Extensions?

C2y: Hitting the Ground Running

_Countof and countof

if Declarations

New Escape Sequences (and Deprecating Octals)

Case Ranges

More Bit Utilities

Labeled Breaks

The Defer Technical Specification: It Is Time

… Time to What?

What is defer?

“Build?” Wait… Are You Just Making This Up Entirely From Scratch?

Scope-based

Refer to Variables Directly

Oh, even goto?

Wait…. Existing Practice Can Do WHAT, Now?

defer Timing

… What About Control Flow Outside of Compilation Time?

Does…. Defer Actually Solve Any Problems, Though?

CVE-2021-3744, and the Truth About Programmers

This is the power of defer in C

But… What About C++?

The Time is Now

You Must Not Be Ignorable.

Footnotes

Results! - The Big Array Size Survey for C

The “What” Survey?

How?

The Respondents

The Results

On the Delivery Mechanism: Keyword/Macro Style Regardless of Spelling

On the Spelling: Which Word To Use Regardless of Delivery

On the Exact Spelling: A Cross-Section of Delivery and Spelling

The Big Array Size Survey for C

The Operator

The Choices

Usage Pattern

Underscore and capital letter _Keyword; Macro in a New Header

Underscore and capital letter _Keyword; No Macro in Header

Lowercase keyword; No Macro in Header

Keyword Spellings

lenof / _Lenof

lengthof / _Lengthof

countof / _Countof

nelemsof / _Nelemsof

nelementsof / _Nelementsof

extentof / _Extentof

The Survey

Employ your democratic right to have a voice and inform the future of C, today!

Slightly Worse: `thread_local`

`_Countof` and `countof`

`if` Declarations

What is `defer`?

Oh, even `goto`?

`defer` Timing

This is the power of `defer` in C

Underscore and capital letter `_Keyword`; Macro in a New Header

Underscore and capital letter `_Keyword`; No Macro in Header

Lowercase `keyword`; No Macro in Header

`lenof` / `_Lenof`

`lengthof` / `_Lengthof`

`countof` / `_Countof`

`nelemsof` / `_Nelemsof`

`nelementsof` / `_Nelementsof`

`extentof` / `_Extentof`

Non-`extern` only!