1963
Integrated Data Store (IDS)
Developed by Charles Bachman at General Electric, IDS introduced the network data model, one of the earliest database systems.
An interesting discussion started on X about the Cloudflare outage, with some people linking Rust to it, e.g.:
The Cloudflare outage was caused by an unwrap()
This is like saying that a gun killed someone… It wasn’t the gun; it was the person who pulled the trigger.
The Cloudflare developers certainly know about the risks of unwrap. The bug
was probably caused by a failed assumption: the config file will fit in
memory.
And even if the developer didn’t know that unwrap can panic:
We can’t blame a tool that is well-documented just because someone misused it.
From: https://doc.rust-lang.org/std/option/enum.Option.html#method.unwrap
Returns the contained [`Some`] value, consuming the `self` value.
Because this function may panic, its use is generally discouraged.
Panics are meant for unrecoverable errors, and
[may abort the entire program][panic-abort].
Instead, prefer to use pattern matching and handle the [`None`]
case explicitly, or call [`unwrap_or`], [`unwrap_or_else`], or
[`unwrap_or_default`]. In functions returning `Option`, you can use
[the `?` (try) operator][try-option].
# Panics
Panics if the self value equals [`None`].
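The alternatives the documentation lists can be sketched like this (an illustrative example of mine, not code from the original post):

```rust
// The `?` operator in a function returning Option: bail out early on None.
fn first_char_upper(s: &str) -> Option<char> {
    let c = s.chars().next()?; // returns None if the string is empty
    Some(c.to_ascii_uppercase())
}

fn main() {
    let maybe: Option<i32> = Some(7);

    // Pattern matching: handle the None case explicitly.
    match maybe {
        Some(n) => println!("got {n}"),
        None => println!("got nothing"),
    }

    // Fallbacks instead of panicking.
    assert_eq!(maybe.unwrap_or(0), 7);              // fixed default
    assert_eq!(maybe.unwrap_or_else(|| 0), 7);      // lazily computed default
    assert_eq!(None::<i32>.unwrap_or_default(), 0); // the type's Default value

    assert_eq!(first_char_upper("abc"), Some('A'));
    assert_eq!(first_char_upper(""), None);
}
```

None of these can panic: every failure path is either handled in place or propagated to the caller.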
The argument that languages and standard libraries should avoid having constructs that enable bad patterns is actually valid, though.
So let’s investigate:
It is best to read this post by Andrew Gallant (the ripgrep author) first: Using unwrap() in Rust is Okay
It’s a great and comprehensive post.
note: read everything. Initially it seems that he only uses unwrap for tests and documentation, but that is not the case.
Some important points:
If the value is always what the caller expects, then it follows that unwrap() and expect() will never result in a panic. If a panic does occur, then this generally corresponds to a violation of the expectations of the programmer. In other words, a runtime invariant was broken and it led to a bug.
This is starkly different from “don’t use unwrap() for error handling.” The key difference here is we expect errors to occur at some frequency, but we never expect a bug to occur. And when a bug does occur, we seek to remove the bug (or declare it as a problem that won’t be fixed).
Of the different ways to handle errors in Rust, this one is regarded as best practice:
One can handle errors as normal values, typically with Result<T, E>. If an error bubbles all the way up to the main function, one might print the error to stderr and then abort.
One of the most important parts of this approach is the ability to attach additional context to error values as they are returned to the caller. The anyhow crate makes this effortless.
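The recommended pattern — return Result, attach context as errors bubble up, report at the top — can be sketched with std only (the anyhow crate reduces the map_err below to a one-line .context() call; the file name and messages here are illustrative):

```rust
use std::fs;

// Attach context as the error bubbles up, so the final report says
// *what* failed, not just the raw OS error.
fn load_config(path: &str) -> Result<String, String> {
    fs::read_to_string(path).map_err(|e| format!("failed to read {path}: {e}"))
}

fn main() {
    // Report at the top level instead of unwrapping deep in the call stack.
    match load_config("definitely-missing.toml") {
        Ok(cfg) => println!("loaded {} bytes", cfg.len()),
        Err(err) => eprintln!("error: {err}"),
    }
}
```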
Of the use cases for unwrap, the runtime invariant is the one that can cause more controversy.
One could ask: if it is guaranteed to have a value/return ok, why is it an
Option or Result?
It’s a great question and it is well explained in So why not make all invariants compile-time invariants?
In essence, in many cases that will lead to much more complex code and lots of duplication.
I’ve personally never found this particular conceptualization to be helpful. The problem, as I see it, is the ambiguity in determining whether a particular error is “recoverable” or not. What does it mean, exactly?
I don’t see ambiguity there, and it’s a very important distinction that has to
be made.
Basically: can the program continue working in a valid state?
Until you find a way to recover, it is unrecoverable.
It will depend on the program and its use case.
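To make the distinction concrete, a hypothetical sketch of mine (the functions and the invariant are invented, not from any of the quoted code): a malformed input from outside is recoverable; a broken internal invariant is not.

```rust
// Recoverable: bad input arriving from outside. Reject it and keep going.
fn handle_request(input: &str) -> Result<u32, String> {
    input.parse::<u32>().map_err(|_| format!("bad request: {input:?}"))
}

// Unrecoverable: count == 0 here means our own bookkeeping is broken.
// Continuing would produce garbage, so panicking (assert) is the honest choice.
fn average_per_item(total: u32, count: u32) -> u32 {
    assert!(count > 0, "invariant violated: count must be non-zero");
    total / count
}

fn main() {
    assert!(handle_request("42").is_ok());
    assert!(handle_request("not a number").is_err()); // program keeps running
    assert_eq!(average_per_item(10, 2), 5);
}
```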
crates/core/haystack.rs
impl Haystack {
pub(crate) fn path(&self) -> &Path {
if self.strip_dot_prefix && self.dent.path().starts_with("./") {
self.dent.path().strip_prefix("./").unwrap()
} else {
self.dent.path()
}
}
}
crates/core/main.rs
if let Some(ref mut stats) = stats {
*stats += search_result.stats().unwrap();
}
if matched && args.quit_after_match() {
break;
}
tokio/src/io/poll_evented.rs
/// Deregisters the inner io from the registration and returns a Result containing the inner io.
#[cfg(any(feature = "net", feature = "process"))]
pub(crate) fn into_inner(mut self) -> io::Result<E> {
let mut inner = self.io.take().unwrap(); // As io shouldn't ever be None, just unwrap here.
self.registration.deregister(&mut inner)?;
Ok(inner)
}
serde/src/private/de.rs
fn deserialize_seq<V>(self, visitor: V) -> Result<V::Value, Self::Error>
where
V: Visitor<'de>,
{
let mut pair_visitor = PairVisitor(Some(self.0), Some(self.1), PhantomData);
let pair = tri!(visitor.visit_seq(&mut pair_visitor));
if pair_visitor.1.is_none() {
Ok(pair)
} else {
let remaining = pair_visitor.size_hint().unwrap();
// First argument is the number of elements in the data, second
// argument is the number of elements expected by the Deserialize.
Err(de::Error::invalid_length(2, &ExpectedInSeq(2 - remaining)))
}
}
serde_derive/src/internals/ctxt.rs
/// Add one of Syn's parse errors.
pub fn syn_error(&self, err: syn::Error) {
self.errors.borrow_mut().as_mut().unwrap().push(err);
}
https://github.com/clap-rs/clap/blob/8d87e70789ed7832aa93473779d248090712d66b/clap_builder/src/util/flat_map.rs#L172
pub(crate) fn or_insert_with<F: FnOnce() -> V>(self, default: F) -> &'a mut V {
match self {
Entry::Occupied(entry) => &mut entry.v.values[entry.index],
Entry::Vacant(entry) => {
entry.v.keys.push(entry.key);
entry.v.values.push(default());
entry.v.values.last_mut().unwrap()
}
}
}
https://github.com/clap-rs/clap/blob/8d87e70789ed7832aa93473779d248090712d66b/clap_builder/src/util/color.rs#L83
impl std::str::FromStr for ColorChoice {
type Err = String;
fn from_str(s: &str) -> Result<Self, Self::Err> {
for variant in Self::value_variants() {
if variant.to_possible_value().unwrap().matches(s, false) {
return Ok(*variant);
}
}
Err(format!("invalid variant: {s}"))
}
}
https://github.com/seanmonstar/reqwest/blob/a2aa5a34e48724be0c1089b0f5afe49b82ece30e/src/async_impl/h3_client/pool.rs#L63
/// Forget the lock and return corresponding Key
fn forget(mut self) -> Key {
// Unwrap is safe because the Option can be None only after dropping the
// lock
self.0.take().unwrap().key
}
From the valid use cases above, I think unwrap has its place.
Gemini 3 response:
This is one of the most debated philosophical questions in the Rust community.
The short answer is no. While unwrap is the source of many production crashes, removing it entirely would make the language significantly more frustrating to use and, paradoxically, could lead to messier code.
Here is an analysis of why unwrap exists, why it is dangerous, and why we still need it.
**The case against unwrap (“Why it’s bad”)**
If you are asking this, you have likely encountered a production crash caused by a lazy unwrap(). Rust’s type system forces you to acknowledge failure up front (via Option and Result). unwrap is an explicit instruction to the compiler to ignore that safety and crash if things go wrong. And the generic panic message — called Result::unwrap() on an Err value — tells you nothing about why it failed, unlike expect("Failed to load config").
**The case for unwrap (“Why we need it”)**
If Rust removed unwrap today, developers would immediately re-implement it. Here are the three scenarios where unwrap is actually the correct tool:
When learning the language or sketching out an algorithm, you don’t want to spend 50% of your time writing error handling boilerplate. You want to see if the logic works. unwrap allows you to defer error handling until the logic is proven.
In a test environment, a panic is exactly what you want. If a setup step returns an error, the test should fail immediately.
#[test]
fn test_calculation() {
// If this fails, the test fails. Perfect usage.
let result = complex_calculation().unwrap();
assert_eq!(result, 42);
}
Sometimes, you possess logic that the compiler cannot verify. If you are mathematically certain a value cannot be None or Err, handling the error is dead code that clutters the screen.
Example:
// We know "10" is a valid integer.
// Handling the error here would be verbose and pointless.
let number: u32 = "10".parse().unwrap();
In this case, unwrap acts as an assertion. You are telling the compiler: “Trust me. If I am wrong, I deserve to crash.”
If you feel the urge to use unwrap, Rust provides alternatives that are almost always better in production code.
**expect()**
Identical to unwrap, but lets you provide a custom error message:
- file.open().unwrap() (crashes with a generic error)
- file.open().expect("CRITICAL: Configuration file missing") (crashes with context)

**? Operator (Propagation)**
Instead of crashing, pass the problem to the caller:
file.open()?
This keeps the current function safe and lets the parent function decide how to handle the failure.

**unwrap_or / unwrap_or_else**
Provide a fallback value instead of crashing.
// If the env var is missing, don't crash; just use port 8080.
let port = std::env::var("PORT").unwrap_or("8080".to_string());
| Tool | Behavior | Best Use Case |
|---|---|---|
| unwrap() | Panic on failure | Quick scripts, prototypes, unit tests. |
| expect() | Panic with message | When you are 100% sure it won’t fail (invariants). |
| ? | Return error | Production code, libraries, standard logic. |
| unwrap_or | Return fallback | When a default value is acceptable. |
It wouldn’t be better if Rust didn’t have unwrap, because panicking is sometimes the correct response. However, it is better if you use a linter (like Clippy) to forbid unwrap in your production codebase, forcing you to use expect or proper error propagation.
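Clippy does ship restriction lints for exactly this, enabled crate-wide with an attribute (shown with warn here; deny also works):

```rust
// At the top of lib.rs or main.rs; enforced when running `cargo clippy`.
#![warn(clippy::unwrap_used, clippy::expect_used)]
```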
fn fetch_features(..., features: &mut Features)
-> Result<(), (ErrorFlags, i32)> {
...
let (feature_values, _) = features
.append_with_names(&self.config.feature_names)
.unwrap();
}
Possible cases:
- append_with_names really can’t fail: then it should not return an Option/Result. Callers should not have to think about it. If there is a bug, it should panic inside append_with_names.
- append_with_names can fail: callers should not use unwrap, and the bad case should be handled properly.
- append_with_names can fail, but not at that point: then the assumption failed.

The bug happened because a bad configuration file was sent to this code.
The root cause was creating and allowing this bad configuration file to spread.
Now the question is: should the Rust code handle that?
Maybe, that’s up to Cloudflare to decide.
As it’s part of a critical system that affects many other companies, it
probably should be handled.
One option might be to accept only the features that fit in the
pre-allocated memory, and log an error or send an alarm if the threshold is hit.
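That mitigation could look roughly like this sketch; clamp_features, MAX_FEATURES, and the log line are all hypothetical, not Cloudflare’s actual code:

```rust
const MAX_FEATURES: usize = 200; // hypothetical pre-allocated capacity

/// Keep only the features that fit, and report how many were dropped
/// so an alarm can fire instead of the process panicking.
fn clamp_features(feature_names: Vec<String>) -> (Vec<String>, usize) {
    let dropped = feature_names.len().saturating_sub(MAX_FEATURES);
    let mut kept = feature_names;
    kept.truncate(MAX_FEATURES);
    if dropped > 0 {
        eprintln!("config exceeded capacity: dropped {dropped} features");
    }
    (kept, dropped)
}

fn main() {
    let names: Vec<String> = (0..250).map(|i| format!("f{i}")).collect();
    let (kept, dropped) = clamp_features(names);
    assert_eq!(kept.len(), 200);
    assert_eq!(dropped, 50);
}
```

The service would then run with a degraded feature set instead of crashing, which may or may not be acceptable — that is the product decision the post is pointing at.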
Cloudflare outage on November 18, 2025
Richard Feldman: The Cloudflare outage was caused by an unwrap()
Using unwrap() in Rust is Okay
ThePrimeTime - Another Internet outage???
Blessed.rs An unofficial guide to the Rust ecosystem
Two points, if I may:
In short: if a programmer uses a panic, that just means the program does not have a way to handle the error. Whether that is the right choice, and whether that is an actual bug or not if it happens, and whether abort or unwind should be used, etc., it all depends on the requirements.
# Appendix B
**Prompting Gemini 3: Show me some valid uses of unwrap in important Rust crates**
It is validating to see that even the "gods" of the Rust ecosystem use `unwrap()`. However, they use it very differently than a beginner does.
In high-quality crates, `unwrap()` is rarely used to say "I hope this works." It is used to say **"It is logically impossible for this to fail, and if it does, the world is broken."**
Here are four specific, valid patterns found in major crates.
-----
### 1\. The "Poisoned" Mutex (Standard Library)
This is arguably the most common `unwrap` in production Rust code.
**The Pattern:** Locking a Mutex.
**Where you see it:** Almost every multi-threaded crate (`std`, `tokio`, `actix`).
```rust
use std::sync::{Arc, Mutex};
let data = Arc::new(Mutex::new(0));
// 👇 This is considered idiomatic, valid Rust.
let mut lock = data.lock().unwrap();
```
Why it’s valid:
Mutex::lock only returns an Err if a previous thread holding the lock panicked (this is called “poisoning”). If a thread crashed while modifying the data, the data is likely corrupted, and refusing to continue is usually the right response.
### 2\. Hardcoded Inputs (Regex Compilation)
**The Pattern:** compiling a hardcoded string into a complex object during startup.
**Where you see it:** regex, once_cell, lazy_static.
use regex::Regex;
// We are hardcoding the pattern. We aren't accepting user input here.
let re = Regex::new(r"^\d{4}-\d{2}-\d{2}$").unwrap();
Why it’s valid: The input string is a string literal written by the programmer. It will not change at runtime.
The unwrap acts as an assertion: “I refuse to handle a RegexError that can never happen once the code is correct.”
### 3\. Infallible Writes (std::fmt::Write on a String)
**The Pattern:** Using the std::fmt::Write trait on a String.
**Where you see it:** serde_json, log, and formatters.
The Write trait is designed for I/O (files, network), so it returns a Result because disks can be full or networks can disconnect. However, you can also implement Write for a standard String (RAM).
use std::fmt::Write;
let mut buffer = String::new();
// We are writing to RAM, not a disk.
// The only way this fails is OOM (Out of Memory), which aborts anyway.
write!(&mut buffer, "Hello {}", "World").unwrap();
Why it’s valid:
Writing to a String in memory does not produce I/O errors. The Result is a formality of the Trait signature.
### 4\. Checked Invariants
**The Pattern:** You have checked a condition explicitly, so the subsequent operation is guaranteed.
**Where you see it:** crossbeam, rand, Vec implementations.
Imagine you have a logic flow where you check a queue size.
// Simplified logic often found in channel implementations
if let Some(item) = queue.pop() {
// We just proved 'item' exists.
process(item);
} else {
// ...
}
// But sometimes, the check is separated:
if queue.is_empty() {
return;
}
// We checked is_empty() above.
// We know pop() returns Some().
let item = queue.pop().unwrap();
Why it’s valid: The compiler’s borrow checker is smart, but it isn’t omniscient. Sometimes you (the human) know the state of the system better than the compiler.
The unwrap says: “I have already checked; the None case is unreachable code.”

When a beginner uses unwrap, it means:
“I’m too lazy to handle the error right now.”
When a library author uses unwrap, it means:
“I have proven that this error is impossible, or if it happens, the program is in a fatal state.”
I didn’t know about intrusive data structures, then I saw it mentioned in this X thread:
CedarDB on why they picked C++ over Rust
Let’s learn about it.
container_of implementations

A simple linked list in C:
typedef struct Node {
int data;
struct Node *next;
} Node;
An intrusive linked list:
typedef struct IntrusiveLink {
struct IntrusiveLink *next;
} IntrusiveLink;
Where is the data?! =)
Here it is:
// Our data structure that we want to put into the list
// It *must* contain an IntrusiveLink member.
typedef struct MyIntData {
int value; // The actual data we care about
IntrusiveLink node; // The embedded list hook
} MyIntData;
How does it work?
The real magic is in this macro:
// ptr: pointer to the member (e.g., &myIntDataInstance->node)
// type: the type of the containing struct (e.g., MyIntData)
// member: the name of the member (e.g., node)
#define container_of(ptr, type, member) ({ \
const typeof( ((type *)0)->member ) *__mptr = (ptr); \
(type *)( (char *)__mptr - offsetof(type,member) );})
With this macro, you can get from a node pointer back to the struct containing the intrusive node (container_of) by passing the node pointer, the container type, and the field name, e.g.:
MyIntData* data_item = container_of(current_link, MyIntData, node);
printf("%d -> ", data_item->value);
The main difference between an intrusive collection and a normal one is that while normal collections allocate memory behind your back to keep track of a set of values, intrusive collections never allocate memory themselves and instead keep track of a set of objects. Such collections are called intrusive because they require explicit support in objects to allow them to be inserted into a collection.
https://docs.rs/intrusive-collections/latest/intrusive_collections/
Amanieu:
The main use case for intrusive collections is to allow an object to be a member of multiple collections simultaneously. Say you are writing a game and you have a set of objects. You will have one main list of all objects, and a separate list of objects that are “owned” by a player. If a player dies, all objects owned by him need to be removed. This can be implemented by making objects belong to two lists at once.
https://www.reddit.com/r/rust/comments/46yubp/comment/d090zxg/
Usually a linked list structure contains a pointer to the item. The implementation of linked list in Linux kernel does not. So the main question is - where does the list store the data? The actual implementation of linked list in the kernel is - Intrusive list. An intrusive linked list does not contain data in its nodes - A node just contains pointers to the next and previous node and list nodes part of the data that are added to the list. This makes the data structure generic, so it does not care about entry data type anymore.
https://github.com/0xAX/linux-insides/blob/master/DataStructures/linux-datastructures-1.md
This link is great and it has a clear explanation and examples. Take a look! =)
Intrusive data structures represent a specialized class of linked data structures where the organizational metadata, such as next and prev pointers for a list or child/parent pointers for a tree, is embedded directly within the user’s data object. This contrasts sharply with traditional data structures, which typically reside in a separate, externally managed node object. The design paradigm of intrusive structures necessitates that the data object itself “cooperates” with the data structure by containing the necessary “hooks” or linking fields.
A fundamental characteristic of this approach is that intrusive containers do not store copies of user-provided values; instead, they directly manage and link the original objects, with the linking information intrinsically part of those objects. This means that the nodes of the data structure contain only the metadata (i.e., pointers to other nodes) but not the data itself. The programmer embeds these nodes into their data, thereby avoiding additional pointer indirections. This paradigm suggests a profound shift in design philosophy: instead of a container owning and managing external nodes that point to data, the data itself becomes “aware” of its participation in a data structure. This “self-awareness” (or “intrusion”) is the root cause of both the primary benefits, such as memory efficiency and cache locality, and the primary drawbacks, including the requirement for object modification and more complex lifetime management. This approach points to a tighter coupling between the data and its organizational structure, which is a hallmark of low-level systems programming and high-performance computing.
More here: https://g.co/gemini/share/2ed8ecea357e
Prompt: In an intrusive data structure, who should be responsible for freeing the actual object?
what if I just want to free the list, but still want to use the tree? Are there use cases for that?
Gemini:
Excellent questions. They get to the very heart of why and how intrusive data structures are used.
The short answer is: the code that allocated the object is responsible for freeing it.
Intrusive containers (like the list or tree) do not own the objects they contain. They are simply mechanisms for organizing pointers that live inside the objects. Think of the containers as different “views” or “indexes” into a set of data that you manage separately.
This is a fundamental shift from non-intrusive (e.g., std::list in C++) or standard library containers, where the container allocates its own nodes and is therefore responsible for freeing them.
Here are the common ownership patterns:
- A single owner frees every single object: one piece of code (typically whatever allocated the objects) iterates and frees them all. This is a simple and common approach when the objects’ lifetime is tied to one primary organization.
- Reference counting: embed a count in MyIntData. When you add an object to a container, you increment the count. When you remove it, you decrement the count. The object is only freed when its count reaches zero. This is the most robust method but adds complexity.

As for freeing just the list while still using the tree: yes, absolutely. This is one of the most powerful use cases for intrusive data structures.
However, you need to be precise about what “free the list” means. It does not mean iterating through the list and calling free() on each object, because that would deallocate the objects that the tree also points to, leaving you with a tree full of dangling pointers.
Instead, “freeing the list” means dismantling the list structure, leaving the objects intact. You would simply traverse the list and reset the next pointers, then set the list’s head pointer to NULL. The MyIntData objects themselves would remain allocated and valid, and the tree structure would be completely unaffected.
A function to do this would look like this:
// Dismantles the list structure WITHOUT freeing the underlying objects.
void intrusive_dismantle_list(IntrusiveLink** headRef) {
IntrusiveLink* current_link = *headRef;
while (current_link != NULL) {
IntrusiveLink* next_link = current_link->next;
current_link->next = NULL; // Unlink the node
current_link = next_link;
}
*headRef = NULL; // The list is now empty
}
This decoupling of the container’s lifetime from the data’s lifetime is extremely useful.
Temporary Re-ordering: Imagine your tree is the primary, canonical storage for your objects, sorted by value. For a specific task, you need to process these objects in the reverse order of their creation. You can temporarily build an intrusive list for this task, process it, and then simply dismantle the list when you’re done, leaving your tree untouched.
Filtering and Subsets: You have thousands of objects in your tree. You want to run an operation on a small subset that meets a certain criterion (e.g., value > 100). You can iterate through your tree and add only the matching objects to a temporary intrusive list. You then operate on that much smaller list, which is very efficient. Once finished, you dismantle the list.
State Management (e.g., Task Schedulers): An object might represent a task. It could live in a main tree of all known tasks. When a task is ready to run, it’s added to an intrusive run_queue list. When it’s waiting for I/O, it’s moved to a wait_queue list. The object itself is never created or destroyed; it’s just linked into different lists that represent its current state. This is a core pattern in OS kernel development.
Intrusive containers have some important advantages; Boost documents them in its intrusive vs. non-intrusive comparison:
https://www.boost.org/doc/libs/1_60_0/doc/html/intrusive/intrusive_vs_nontrusive.html
Performance comparison between Intrusive and Non-intrusive containers: https://www.boost.org/doc/libs/1_60_0/doc/html/intrusive/performance.html
https://news.ycombinator.com/item?id=43680655
There are several advantages to intrusive node-based data structures…:
The core motivations for their use are multifaceted and directly tied to performance optimization. They aim to avoid costly pointer indirections, which can incur run-time costs on every data read. Furthermore, they minimize dynamic memory allocations, which can be computationally expensive and unpredictable. A significant advantage is their ability to facilitate the inclusion of a single data object in multiple distinct data structures simultaneously. For example, an element might be part of several search trees and a priority queue, allowing efficient retrieval in different orders.
By Gemini:
#include <stdio.h>
#include <stdlib.h> // For malloc, free
#include <stddef.h> // For offsetof
// Define the "list hook" that will be embedded in our data structures
// For a singly linked list, we only need a 'next' pointer.
typedef struct IntrusiveLink {
struct IntrusiveLink *next;
} IntrusiveLink;
// Our data structure that we want to put into the list
// It *must* contain an IntrusiveLink member.
typedef struct MyIntData {
int value; // The actual data we care about
IntrusiveLink node; // The embedded list hook
} MyIntData;
MyIntData* newIntData(int value) {
MyIntData* data = (MyIntData*)malloc(sizeof(MyIntData));
data->value = value;
data->node.next = NULL; // Initialize its embedded link
return data;
}
// A macro to get the containing structure from a pointer to its member
// This is crucial for intrusive lists.
// ptr: pointer to the member (e.g., &myIntDataInstance->node)
// type: the type of the containing struct (e.g., MyIntData)
// member: the name of the member (e.g., node)
#define container_of(ptr, type, member) ({ \
const typeof( ((type *)0)->member ) *__mptr = (ptr); \
(type *)( (char *)__mptr - offsetof(type,member) );})
// Function to insert a new data structure at the beginning of the intrusive list
// headRef: A pointer to the head of the list (which is an IntrusiveLink*)
// new_data_ptr: A pointer to the MyIntData instance to add
void intrusive_insert_at_beginning(IntrusiveLink** headRef, MyIntData* new_data_ptr) {
// The new node's 'next' points to the current head
new_data_ptr->node.next = *headRef;
// The head is updated to point to the new node's embedded link
*headRef = &(new_data_ptr->node);
}
// Function to print the intrusive linked list
void intrusive_print_list(IntrusiveLink* head) {
IntrusiveLink* current_link = head;
printf("Intrusive Linked List: ");
while (current_link != NULL) {
// Use container_of to get the MyIntData structure from its embedded link
MyIntData* data_item = container_of(current_link, MyIntData, node);
printf("%d -> ", data_item->value);
current_link = current_link->next;
}
printf("NULL\n");
}
// Function to free the memory allocated for the intrusive linked list
// Note: This frees the MyIntData objects themselves, not just "nodes".
void intrusive_free_list(IntrusiveLink** headRef) {
IntrusiveLink* current_link = *headRef;
IntrusiveLink* next_link;
while (current_link != NULL) {
next_link = current_link->next;
// Get the containing data structure
MyIntData* data_item = container_of(current_link, MyIntData, node);
free(data_item); // Free the entire MyIntData object
current_link = next_link;
}
*headRef = NULL;
printf("Intrusive List freed.\n");
}
int main() {
// The head of our intrusive list. It points to an IntrusiveLink
// embedded within a MyIntData structure.
IntrusiveLink* head = NULL;
MyIntData* data1 = newIntData(10);
MyIntData* data2 = newIntData(20);
MyIntData* data3 = newIntData(30);
MyIntData* data4 = newIntData(40);
// Insert elements into the intrusive list
// We pass the address of the embedded 'node' member
intrusive_insert_at_beginning(&head, data1); // List: 10 -> NULL
intrusive_insert_at_beginning(&head, data2); // List: 20 -> 10 -> NULL
intrusive_insert_at_beginning(&head, data3); // List: 30 -> 20 -> 10 -> NULL
intrusive_insert_at_beginning(&head, data4); // List: 40 -> 30 -> 20 -> 10 -> NULL
// Print the list
intrusive_print_list(head);
// Add another element
MyIntData* data5 = (MyIntData*)malloc(sizeof(MyIntData));
data5->value = 5;
data5->node.next = NULL;
intrusive_insert_at_beginning(&head, data5);
intrusive_print_list(head);
// Free the allocated memory
intrusive_free_list(&head);
// Try to print after freeing (should show NULL or be empty)
intrusive_print_list(head);
return 0;
}
It shows how different structures can use the same intrusive link definition.
Here we have Person and IntegerWrapper:
typedef struct Person {
IntrusiveNode node;
char *name;
int age;
} Person;
typedef struct IntegerWrapper {
IntrusiveNode node;
int value;
} IntegerWrapper;
By Gemini:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stddef.h> // <-- Add this header for offsetof
// Define a generic intrusive node structure.
// Any structure that wants to be part of an intrusive list will embed this.
typedef struct IntrusiveNode {
struct IntrusiveNode *next;
} IntrusiveNode;
// The LinkedList now just holds a pointer to the head of the intrusive nodes.
typedef struct IntrusiveLinkedList {
IntrusiveNode *head;
} IntrusiveLinkedList;
void initIntrusiveList(IntrusiveLinkedList *list) {
list->head = NULL;
}
// Function to insert a node at the end of the intrusive list.
// The 'node_to_insert' is already part of the data structure.
void intrusiveInsertEnd(IntrusiveLinkedList *list, IntrusiveNode *node_to_insert) {
node_to_insert->next = NULL; // Ensure the new node points to NULL initially
if (list->head == NULL) {
list->head = node_to_insert;
return;
}
IntrusiveNode *temp = list->head;
while (temp->next != NULL) {
temp = temp->next;
}
temp->next = node_to_insert;
}
// Function to print the intrusive list.
// Requires a print function that can cast the IntrusiveNode* back to the original struct.
void printIntrusiveList(IntrusiveLinkedList *list, void (*printFunc)(IntrusiveNode *)) {
IntrusiveNode *temp = list->head;
while (temp != NULL) {
printFunc(temp);
printf(", ");
temp = temp->next;
}
printf("\n");
}
// The free function for an intrusive list typically does nothing, as the memory
// for the nodes is managed by the data structures themselves.
// If the data structures themselves allocate internal memory (like char* for name),
// that should be freed by the caller or a specific free function for the data type.
void freeIntrusiveList(IntrusiveLinkedList *list) {
// In an intrusive list, we don't free the nodes here,
// as they are part of larger allocated structures.
// The responsibility of freeing the memory lies with the owner of the data structures.
list->head = NULL; // Just clear the head of the list
}
// Example data structure: Person, now containing the intrusive node.
typedef struct Person {
IntrusiveNode node; // Embedded intrusive node
char *name;
int age;
} Person;
// Helper to get the containing Person struct from the IntrusiveNode.
// This is the standard container_of pattern, implemented with offsetof.
#define container_of(ptr, type, member) ({ \
const typeof( ((type *)0)->member ) *__mptr = (ptr); \
(type *)( (char *)__mptr - offsetof(type,member) );})
void printPerson(IntrusiveNode *node) {
// Cast the IntrusiveNode back to the containing Person struct
Person *p = container_of(node, Person, node);
printf("%s is %d years old", p->name, p->age);
}
// Example data structure: IntegerWrapper, for demonstrating integers
typedef struct IntegerWrapper {
IntrusiveNode node;
int value;
} IntegerWrapper;
void printInt(IntrusiveNode *node) {
IntegerWrapper *iw = container_of(node, IntegerWrapper, node);
printf("%d ", iw->value);
}
void testIntList() {
IntrusiveLinkedList list;
initIntrusiveList(&list);
// Allocate IntegerWrapper structs on the heap or stack
IntegerWrapper val1 = {.value = 10};
IntegerWrapper val2 = {.value = 20};
IntegerWrapper val3 = {.value = 30};
IntegerWrapper val4 = {.value = 40};
intrusiveInsertEnd(&list, &val1.node);
intrusiveInsertEnd(&list, &val2.node);
intrusiveInsertEnd(&list, &val3.node);
intrusiveInsertEnd(&list, &val4.node);
printIntrusiveList(&list, printInt);
// For stack-allocated data, no explicit free is needed here.
// If allocated on heap, they would need to be freed by the caller.
freeIntrusiveList(&list); // This just clears the list head, doesn't free nodes
}
void testPersonList() {
IntrusiveLinkedList list;
initIntrusiveList(&list);
// Allocate Person structs on the heap to demonstrate memory management
Person *p1 = (Person *)malloc(sizeof(Person));
p1->name = strdup("Marco"); // strdup allocates memory for the string
p1->age = 22;
Person *p2 = (Person *)malloc(sizeof(Person));
p2->name = strdup("Mary");
p2->age = 20;
intrusiveInsertEnd(&list, &p1->node);
intrusiveInsertEnd(&list, &p2->node);
printIntrusiveList(&list, printPerson);
// When dealing with heap-allocated intrusive nodes, you must manually free them
// along with any dynamically allocated members (like 'name' here).
// This often involves iterating the list and freeing each container.
IntrusiveNode *current = list.head;
IntrusiveNode *next;
while (current != NULL) {
next = current->next;
Person *p = container_of(current, Person, node);
free(p->name); // Free the dynamically allocated name
free(p); // Free the Person struct itself
current = next;
}
freeIntrusiveList(&list); // Clears the list head
}
int main() {
testIntList();
printf("\n");
testPersonList();
return 0;
}
#include <stdio.h>
#include <stdlib.h> // For malloc, free
#include <stddef.h> // For offsetof
// A macro to get the containing structure from a pointer to its member
// This is crucial for intrusive data structures.
// ptr: pointer to the member (e.g., &myIntDataInstance->node)
// type: the type of the containing struct (e.g., MyIntData)
// member: the name of the member (e.g., node)
#define container_of(ptr, type, member) ({ \
const typeof( ((type *)0)->member ) *__mptr = (ptr); \
(type *)( (char *)__mptr - offsetof(type,member) );})
//====================================================================
// Intrusive Linked List Structures and Functions
//====================================================================
// Define the "list hook" that will be embedded in our data structures
typedef struct IntrusiveLink {
struct IntrusiveLink *next;
} IntrusiveLink;
//====================================================================
// Intrusive Binary Tree Structures and Functions
//====================================================================
// Define the "tree hook" for an intrusive binary tree
typedef struct IntrusiveTreeNode {
struct IntrusiveTreeNode *left;
struct IntrusiveTreeNode *right;
} IntrusiveTreeNode;
//====================================================================
// Our Main Data Structure with Multiple Intrusive Hooks
//====================================================================
// Our data structure now contains hooks for both a list and a tree.
typedef struct MyIntData {
int value;
IntrusiveLink node; // The embedded list hook
IntrusiveTreeNode tree_node; // The embedded binary tree hook
} MyIntData;
// Factory function to create and initialize a new data object
MyIntData* newIntData(int value) {
MyIntData* data = (MyIntData*)malloc(sizeof(MyIntData));
if (!data) {
perror("Failed to allocate memory for MyIntData");
exit(EXIT_FAILURE);
}
data->value = value;
// Initialize both intrusive hooks
data->node.next = NULL;
data->tree_node.left = NULL;
data->tree_node.right = NULL;
return data;
}
//====================================================================
// Functions for Managing the Intrusive List
//====================================================================
// Function to insert a new data structure at the beginning of the intrusive list
void intrusive_insert_at_beginning(IntrusiveLink** headRef, MyIntData* new_data_ptr) {
new_data_ptr->node.next = *headRef;
*headRef = &(new_data_ptr->node);
}
// Function to print the intrusive linked list
void intrusive_print_list(IntrusiveLink* head) {
IntrusiveLink* current_link = head;
printf("Intrusive Linked List (insertion order): ");
while (current_link != NULL) {
MyIntData* data_item = container_of(current_link, MyIntData, node);
printf("%d -> ", data_item->value);
current_link = current_link->next;
}
printf("NULL\n");
}
// Function to free the memory allocated for the intrusive linked list
void intrusive_free_list(IntrusiveLink** headRef) {
IntrusiveLink* current_link = *headRef;
IntrusiveLink* next_link;
while (current_link != NULL) {
next_link = current_link->next;
MyIntData* data_item = container_of(current_link, MyIntData, node);
free(data_item); // Free the entire MyIntData object
current_link = next_link;
}
*headRef = NULL;
printf("Intrusive List freed. All underlying MyIntData objects have been deallocated.\n");
}
// Dismantles the list structure WITHOUT freeing the underlying objects.
void intrusive_dismantle_list(IntrusiveLink** headRef) {
IntrusiveLink* current_link = *headRef;
while (current_link != NULL) {
IntrusiveLink* next_link = current_link->next;
current_link->next = NULL; // Unlink the node
current_link = next_link;
}
*headRef = NULL;
}
//====================================================================
// Functions for Managing the Intrusive Binary Tree
//====================================================================
// Function to insert a new data structure into the intrusive binary search tree
void intrusive_tree_insert(IntrusiveTreeNode** rootRef, MyIntData* new_data_ptr) {
if (*rootRef == NULL) {
*rootRef = &(new_data_ptr->tree_node);
return;
}
IntrusiveTreeNode* current = *rootRef;
while (1) {
MyIntData* current_data = container_of(current, MyIntData, tree_node);
if (new_data_ptr->value < current_data->value) {
if (current->left == NULL) {
current->left = &(new_data_ptr->tree_node);
break;
}
current = current->left;
} else {
if (current->right == NULL) {
current->right = &(new_data_ptr->tree_node);
break;
}
current = current->right;
}
}
}
// Recursive function to print the tree using in-order traversal
void intrusive_print_tree_in_order_recursive(IntrusiveTreeNode* root) {
if (root == NULL) {
return;
}
intrusive_print_tree_in_order_recursive(root->left);
MyIntData* data_item = container_of(root, MyIntData, tree_node);
printf("%d ", data_item->value);
intrusive_print_tree_in_order_recursive(root->right);
}
// Wrapper function to print the entire tree
void intrusive_print_tree(IntrusiveTreeNode* root) {
printf("Intrusive Binary Tree (sorted order): ");
intrusive_print_tree_in_order_recursive(root);
printf("\n");
}
//====================================================================
// Main Demo
//====================================================================
int main() {
// Heads for our two intrusive data structures
IntrusiveLink* list_head = NULL;
IntrusiveTreeNode* tree_root = NULL;
printf("Creating and inserting data into both list and tree...\n");
// Create several data objects
MyIntData* data1 = newIntData(30);
MyIntData* data2 = newIntData(10);
MyIntData* data3 = newIntData(40);
MyIntData* data4 = newIntData(20);
MyIntData* data5 = newIntData(50);
// Insert each object into BOTH data structures
intrusive_insert_at_beginning(&list_head, data1);
intrusive_tree_insert(&tree_root, data1);
intrusive_insert_at_beginning(&list_head, data2);
intrusive_tree_insert(&tree_root, data2);
intrusive_insert_at_beginning(&list_head, data3);
intrusive_tree_insert(&tree_root, data3);
intrusive_insert_at_beginning(&list_head, data4);
intrusive_tree_insert(&tree_root, data4);
intrusive_insert_at_beginning(&list_head, data5);
intrusive_tree_insert(&tree_root, data5);
printf("\n--- Data Structures Populated ---\n");
// Print both structures to show the same data is organized differently
intrusive_print_list(list_head);
intrusive_print_tree(tree_root);
printf("----------------------------------\n\n");
// Free the memory. We only need to do this once.
// We iterate through the list to free each MyIntData object.
// After this, the pointers in the tree (tree_root) are dangling.
intrusive_free_list(&list_head);
// The tree's pointers are now invalid. We must set its root to NULL
// to avoid using dangling pointers.
tree_root = NULL;
printf("Tree root set to NULL as its memory has been freed via the list.\n\n");
printf("--- Data Structures After Freeing ---\n");
intrusive_print_list(list_head);
intrusive_print_tree(tree_root);
printf("-----------------------------------\n");
return 0;
}
Prompt: Create a C code example for the task scheduler use case:
State Management (e.g., Task Schedulers): An object might represent a task. It could live in a main tree of all known tasks. When a task is ready to run, it’s added to an intrusive run_queue list. When it’s waiting for I/O, it’s moved to a wait_queue list. The object itself is never created or destroyed; it’s just linked into different lists that represent its current state. This is a core pattern in OS kernel development.
Gemini:
Here is a C code example that implements a simple task scheduler using intrusive data structures.
A single Task object can exist in the main all_tasks_tree for its entire lifetime while being moved between the run_queue and wait_queue lists without any additional memory allocations. The lists and the tree simply manage pointers embedded within the Task object itself.
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
#include <string.h>
// Macro to get the containing structure from a pointer to its member
#define container_of(ptr, type, member) \
((type *)((char *)(ptr) - offsetof(type, member)))
//====================================================================
// Intrusive Data Structure "Hooks"
//====================================================================
// A link for a doubly-linked list, useful for efficient queue removal
typedef struct IntrusiveLink {
struct IntrusiveLink *next;
struct IntrusiveLink *prev;
} IntrusiveLink;
// A node for a binary search tree
typedef struct IntrusiveTreeNode {
struct IntrusiveTreeNode *left;
struct IntrusiveTreeNode *right;
} IntrusiveTreeNode;
//====================================================================
// The Task Object
//====================================================================
typedef enum {
TASK_STATE_NEW,
TASK_STATE_READY,
TASK_STATE_WAITING,
TASK_STATE_DONE
} TaskState;
// The core object containing data and intrusive hooks
typedef struct Task {
int task_id;
const char* name;
TaskState state;
IntrusiveLink queue_link; // Hook for run/wait queues
IntrusiveTreeNode tree_node; // Hook for the master task tree
} Task;
//====================================================================
// "Scheduler" Global State
//====================================================================
// Master tree of all tasks that have ever been created, indexed by ID
IntrusiveTreeNode* all_tasks_tree = NULL;
// Queues representing the current state of tasks
IntrusiveLink run_queue = { &run_queue, &run_queue }; // Sentinels for an empty list
IntrusiveLink wait_queue = { &wait_queue, &wait_queue };
//====================================================================
// Intrusive Queue (List) Management
//====================================================================
// Removes a link from whichever queue it is currently in
void queue_unlink(IntrusiveLink* link) {
link->prev->next = link->next;
link->next->prev = link->prev;
link->next = link->prev = NULL; // Isolate the link
}
// Adds a link to the back of a queue
void queue_push_back(IntrusiveLink* head, IntrusiveLink* new_link) {
new_link->prev = head->prev;
new_link->next = head;
head->prev->next = new_link;
head->prev = new_link;
}
// Removes and returns the link at the front of a queue
IntrusiveLink* queue_pop_front(IntrusiveLink* head) {
if (head->next == head) return NULL; // Queue is empty
IntrusiveLink* link = head->next;
queue_unlink(link);
return link;
}
//====================================================================
// Core Task Management & Scheduler Logic
//====================================================================
// Factory to create a new task
Task* create_task(int id, const char* name) {
Task* task = (Task*)malloc(sizeof(Task));
task->task_id = id;
task->name = name;
task->state = TASK_STATE_NEW;
task->queue_link.next = task->queue_link.prev = NULL;
task->tree_node.left = task->tree_node.right = NULL;
// Add to the master tree (simple BST insertion)
IntrusiveTreeNode** current = &all_tasks_tree;
while (*current) {
Task* existing_task = container_of(*current, Task, tree_node);
if (task->task_id < existing_task->task_id) {
current = &(*current)->left;
} else {
current = &(*current)->right;
}
}
*current = &task->tree_node;
printf("CREATED: Task %d ('%s')\n", id, name);
return task;
}
// Moves a task to the ready queue to be run
void schedule_task(Task* task) {
if (task->queue_link.next) queue_unlink(&task->queue_link); // Unlink from old queue if any
task->state = TASK_STATE_READY;
queue_push_back(&run_queue, &task->queue_link);
printf("SCHEDULED: Task %d ('%s') moved to RUN queue.\n", task->task_id, task->name);
}
// Simulates running a task. Returns 1 if task is done, 0 otherwise.
int execute_task(Task* task) {
printf("EXECUTING: Task %d ('%s')...\n", task->task_id, task->name);
// Simple logic: tasks with ID > 100 need I/O, others finish
if (task->task_id > 100) {
task->state = TASK_STATE_WAITING;
queue_push_back(&wait_queue, &task->queue_link);
printf("WAITING: Task %d ('%s') moved to WAIT queue for I/O.\n", task->task_id, task->name);
return 0;
} else {
task->state = TASK_STATE_DONE;
printf("DONE: Task %d ('%s') has finished.\n", task->task_id, task->name);
return 1;
}
}
// Prints the current state of the scheduler queues
void print_scheduler_state() {
printf("\n--- Scheduler State ---\n");
printf("Run Queue: ");
for (IntrusiveLink* l = run_queue.next; l != &run_queue; l = l->next) {
printf("%d -> ", container_of(l, Task, queue_link)->task_id);
}
printf("END\nWait Queue: ");
for (IntrusiveLink* l = wait_queue.next; l != &wait_queue; l = l->next) {
printf("%d -> ", container_of(l, Task, queue_link)->task_id);
}
printf("END\n-----------------------\n\n");
}
// Free all tasks using a post-order traversal of the master tree
void free_all_tasks(IntrusiveTreeNode* node) {
if (!node) return;
free_all_tasks(node->left);
free_all_tasks(node->right);
free(container_of(node, Task, tree_node));
}
//====================================================================
// Main Demo
//====================================================================
int main() {
// 1. Create all tasks. They start in the NEW state.
Task* task1 = create_task(10, "Calculate report");
Task* task2 = create_task(205, "Fetch network data");
Task* task3 = create_task(30, "Update UI");
Task* task4 = create_task(155, "Write to disk");
// 2. Schedule initial tasks. They move to the READY state and enter the run queue.
schedule_task(task1);
schedule_task(task2);
schedule_task(task3);
print_scheduler_state();
// 3. Run a scheduler cycle: execute one task from the run queue.
IntrusiveLink* link = queue_pop_front(&run_queue);
execute_task(container_of(link, Task, queue_link)); // Task 10 will finish.
print_scheduler_state();
// 4. Run another cycle. This task will need I/O and move to the wait queue.
link = queue_pop_front(&run_queue);
execute_task(container_of(link, Task, queue_link)); // Task 205 will wait.
print_scheduler_state();
// 5. Schedule the last task.
schedule_task(task4);
print_scheduler_state();
// 6. Simulate I/O completion for task 205. It's found in the wait queue
// and moved back to the run queue.
printf("...I/O for task 205 has completed...\n");
schedule_task(task2); // Re-scheduling moves it from wait_queue to run_queue.
print_scheduler_state();
// 7. Run all remaining tasks until the run queue is empty.
printf("...Running all remaining tasks...\n");
while ((link = queue_pop_front(&run_queue))) {
execute_task(container_of(link, Task, queue_link));
}
print_scheduler_state();
// 8. Clean up all memory by freeing the master tree.
printf("All tasks processed. Freeing memory.\n");
free_all_tasks(all_tasks_tree);
return 0;
}
Gemini:
Here is a complete C implementation of an LRU (Least Recently Used) cache using intrusive data structures.
This example demonstrates a key strength of this pattern: a single allocated object (CacheEntry) is simultaneously part of two different data structures: a hash table that provides O(1) lookup by key, and a doubly-linked list that maintains recency order for O(1) eviction.
The CacheEntry object itself contains the “hooks” (hash_link and lru_link) that allow it to be managed by these external structures without any additional memory allocations per operation.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stddef.h>
// A macro to get the containing structure from a pointer to its member.
#define container_of(ptr, type, member) \
((type *)((char *)(ptr) - offsetof(type, member)))
//====================================================================
// Intrusive Data Structure "Hooks"
//====================================================================
// Hook for a singly-linked list (for hash table collision chains).
typedef struct HashLink {
struct HashLink *next;
} HashLink;
// Hook for a doubly-linked list (for LRU ordering).
typedef struct LruLink {
struct LruLink *next;
struct LruLink *prev;
} LruLink;
//====================================================================
// The Cache Entry Object
//====================================================================
// This is the core object we store. It contains the actual data
// and the hooks for both the hash table and the LRU list.
typedef struct CacheEntry {
int key;
char value[256];
HashLink hash_link; // Hook for hash collision chaining.
LruLink lru_link; // Hook for LRU ordering.
} CacheEntry;
//====================================================================
// The LRU Cache Structure
//====================================================================
typedef struct LRUCache {
size_t capacity;
size_t size;
HashLink **hash_table; // Array of pointers to collision chains.
size_t hash_table_size;
LruLink lru_list_head; // Sentinel node for the LRU list.
} LRUCache;
//====================================================================
// Doubly-Linked List (LRU) Management
//====================================================================
// Initializes a sentinel node for a doubly-linked list.
static void lru_list_init(LruLink *head) {
head->next = head;
head->prev = head;
}
// Removes a link from its current position in the list.
static void lru_list_unlink(LruLink *link) {
link->prev->next = link->next;
link->next->prev = link->prev;
}
// Adds a link to the front of the list (most recently used).
static void lru_list_push_front(LruLink *head, LruLink *new_link) {
new_link->next = head->next;
new_link->prev = head;
head->next->prev = new_link;
head->next = new_link;
}
//====================================================================
// Cache Internals
//====================================================================
// A simple hash function for integer keys.
static size_t hash_key(int key, size_t table_size) {
return (size_t)key % table_size;
}
// Finds a cache entry by key. Does NOT change LRU order.
static CacheEntry* find_entry(LRUCache* cache, int key) {
size_t index = hash_key(key, cache->hash_table_size);
HashLink* current = cache->hash_table[index];
while (current) {
CacheEntry* entry = container_of(current, CacheEntry, hash_link);
if (entry->key == key) {
return entry;
}
current = current->next;
}
return NULL;
}
// Removes an entry from the hash table.
static void remove_from_hash_table(LRUCache* cache, CacheEntry* entry) {
size_t index = hash_key(entry->key, cache->hash_table_size);
HashLink** indirect = &cache->hash_table[index];
while (*indirect && *indirect != &entry->hash_link) {
indirect = &(*indirect)->next;
}
if (*indirect) {
*indirect = entry->hash_link.next;
}
}
// Evicts the least recently used item from the cache.
static void evict_one(LRUCache* cache) {
// The LRU item is at the back of the list (before the sentinel).
LruLink* lru_link_to_evict = cache->lru_list_head.prev;
if (lru_link_to_evict == &cache->lru_list_head) {
// Should not happen if size > 0, but good practice.
return;
}
CacheEntry* entry_to_evict = container_of(lru_link_to_evict, CacheEntry, lru_link);
printf("EVICTING: Key %d ('%s')\n", entry_to_evict->key, entry_to_evict->value);
// Unlink from both data structures.
lru_list_unlink(&entry_to_evict->lru_link);
remove_from_hash_table(cache, entry_to_evict);
// Free the actual object and decrement cache size.
free(entry_to_evict);
cache->size--;
}
//====================================================================
// Public LRU Cache API
//====================================================================
// Creates and initializes an LRU cache.
LRUCache* lru_create(size_t capacity) {
LRUCache* cache = (LRUCache*)malloc(sizeof(LRUCache));
if (!cache) return NULL;
cache->capacity = capacity;
cache->size = 0;
// For simplicity, hash table size is twice the capacity.
cache->hash_table_size = capacity * 2;
cache->hash_table = (HashLink**)calloc(cache->hash_table_size, sizeof(HashLink*));
if (!cache->hash_table) {
free(cache);
return NULL;
}
lru_list_init(&cache->lru_list_head);
return cache;
}
// Puts a key-value pair into the cache.
void lru_put(LRUCache* cache, int key, const char* value) {
printf("PUT: Key %d, Value '%s'\n", key, value);
CacheEntry* entry = find_entry(cache, key);
if (entry) {
// --- Key exists: Update value and move to front ---
strncpy(entry->value, value, sizeof(entry->value) - 1);
entry->value[sizeof(entry->value) - 1] = '\0'; // strncpy does not NUL-terminate long strings
// Unlink from current position and move to front of LRU list.
lru_list_unlink(&entry->lru_link);
lru_list_push_front(&cache->lru_list_head, &entry->lru_link);
} else {
// --- Key does not exist: Create new entry ---
if (cache->size >= cache->capacity) {
evict_one(cache);
}
// Create the new entry.
entry = (CacheEntry*)malloc(sizeof(CacheEntry));
entry->key = key;
strncpy(entry->value, value, sizeof(entry->value) - 1);
entry->value[sizeof(entry->value) - 1] = '\0'; // the fresh buffer is uninitialized; terminate explicitly
// Add to hash table.
size_t index = hash_key(key, cache->hash_table_size);
entry->hash_link.next = cache->hash_table[index];
cache->hash_table[index] = &entry->hash_link;
// Add to front of LRU list.
lru_list_push_front(&cache->lru_list_head, &entry->lru_link);
cache->size++;
}
}
// Gets a value by key from the cache.
const char* lru_get(LRUCache* cache, int key) {
CacheEntry* entry = find_entry(cache, key);
if (entry) {
printf("GET: Key %d -> Found '%s'. Moving to front.\n", key, entry->value);
// Move to front of LRU list to mark as recently used.
lru_list_unlink(&entry->lru_link);
lru_list_push_front(&cache->lru_list_head, &entry->lru_link);
return entry->value;
}
printf("GET: Key %d -> Not found.\n", key);
return NULL;
}
// Frees all resources used by the cache.
void lru_destroy(LRUCache* cache) {
LruLink* current = cache->lru_list_head.next;
while (current != &cache->lru_list_head) {
LruLink* next = current->next;
CacheEntry* entry = container_of(current, CacheEntry, lru_link);
free(entry);
current = next;
}
free(cache->hash_table);
free(cache);
}
// Utility to print the cache contents in LRU order.
void lru_print(LRUCache* cache) {
printf("\n--- Cache State (MRU to LRU) ---\n");
printf("Size: %zu, Capacity: %zu\n", cache->size, cache->capacity);
printf("Order: ");
LruLink* current = cache->lru_list_head.next;
while (current != &cache->lru_list_head) {
CacheEntry* entry = container_of(current, CacheEntry, lru_link);
printf("%d -> ", entry->key);
current = current->next;
}
printf("END\n--------------------------------\n\n");
}
//====================================================================
// Main Demo
//====================================================================
int main() {
LRUCache* cache = lru_create(3);
lru_put(cache, 10, "apple");
lru_put(cache, 20, "banana");
lru_put(cache, 30, "cherry");
lru_print(cache); // Order: 30 -> 20 -> 10
lru_get(cache, 10); // Access 10, making it the most recent.
lru_print(cache); // Order: 10 -> 30 -> 20
// This next put will cause an eviction. '20' is the LRU item.
lru_put(cache, 40, "date");
lru_print(cache); // Order: 40 -> 10 -> 30
lru_get(cache, 20); // '20' was evicted, so this is a miss.
lru_print(cache);
lru_put(cache, 30, "cantaloupe"); // Update existing key '30'.
lru_print(cache); // Order: 30 -> 40 -> 10
lru_get(cache, 30);
lru_destroy(cache);
printf("Cache destroyed.\n");
return 0;
}
Prompt: Create an example in C of a good use case for intrusive data structures where it involves two threads. Should we worry about thread safety?
Gemini:
This is an excellent question because multi-threaded producer-consumer scenarios are one of the best use cases for intrusive data structures.
Yes, absolutely and unequivocally.
Any time you have data that can be accessed and modified by more than one thread, you must use synchronization mechanisms to protect it. In our case, the shared data is the queue itself—specifically, the pointers (next, prev) that form the linked list.
Without protection, you could face race conditions such as two threads rewriting the next/prev pointers at the same time and corrupting the list, or the consumer following a half-updated link into freed memory.
To solve this, we will use a pthread_mutex_t to serialize all access to the queue’s pointers, and a pthread_cond_t so the consumer can sleep until the producer signals that work is available.
Here is a classic example. We’ll have a producer thread that allocates WorkItem objects and pushes them onto a shared queue, and a consumer thread that pops them, processes them, and frees them.
The key benefit of using an intrusive list here is performance. The producer mallocs the WorkItem once. The queue itself performs zero allocations. It just shuffles pointers. This is much faster and creates less memory fragmentation than a non-intrusive queue that would need to allocate its own internal nodes for every item.
Here is the complete C code example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stddef.h>
#include <pthread.h>
#include <unistd.h> // for sleep()
// A macro to get the containing structure from a pointer to its member.
#define container_of(ptr, type, member) \
((type *)((char *)(ptr) - offsetof(type, member)))
//====================================================================
// Intrusive Data Structure Hook
//====================================================================
// Hook for a doubly-linked list.
typedef struct IntrusiveLink {
struct IntrusiveLink *next;
struct IntrusiveLink *prev;
} IntrusiveLink;
//====================================================================
// The Work Item Object
//====================================================================
// This is the object that is passed between threads.
typedef struct WorkItem {
int id;
char message[128];
IntrusiveLink link; // The hook for the shared queue.
} WorkItem;
//====================================================================
// The Thread-Safe Shared Queue
//====================================================================
typedef struct SharedQueue {
IntrusiveLink head; // Sentinel node for the queue.
pthread_mutex_t mutex; // Protects access to the queue.
pthread_cond_t cond; // For signaling between threads.
int is_done; // Flag to signal shutdown.
} SharedQueue;
// Initialize the shared queue.
void queue_init(SharedQueue* queue) {
queue->head.next = &queue->head;
queue->head.prev = &queue->head;
pthread_mutex_init(&queue->mutex, NULL);
pthread_cond_init(&queue->cond, NULL);
queue->is_done = 0;
}
// Add a work item to the back of the queue.
void queue_push(SharedQueue* queue, WorkItem* item) {
pthread_mutex_lock(&queue->mutex);
// Link the new item at the back of the list (before the head).
IntrusiveLink* new_link = &item->link;
IntrusiveLink* head = &queue->head;
new_link->prev = head->prev;
new_link->next = head;
head->prev->next = new_link;
head->prev = new_link;
// Signal the consumer that a new item is available.
pthread_cond_signal(&queue->cond);
pthread_mutex_unlock(&queue->mutex);
}
// Remove and return a work item from the front of the queue.
// Blocks until an item is available.
WorkItem* queue_pop(SharedQueue* queue) {
pthread_mutex_lock(&queue->mutex);
// Wait until the queue is not empty or shutdown is signaled.
while (queue->head.next == &queue->head && !queue->is_done) {
pthread_cond_wait(&queue->cond, &queue->mutex);
}
if (queue->head.next != &queue->head) {
// Unlink the item from the front of the list.
IntrusiveLink* link_to_pop = queue->head.next;
link_to_pop->prev->next = link_to_pop->next;
link_to_pop->next->prev = link_to_pop->prev;
pthread_mutex_unlock(&queue->mutex);
return container_of(link_to_pop, WorkItem, link);
}
// Queue is empty and shutdown was signaled.
pthread_mutex_unlock(&queue->mutex);
return NULL;
}
// Signal any remaining waiters to shut down and destroy queue resources.
// Call this only once no other thread can still be using the queue.
void queue_destroy(SharedQueue* queue) {
pthread_mutex_lock(&queue->mutex);
queue->is_done = 1;
// Broadcast to wake up all waiting threads.
pthread_cond_broadcast(&queue->cond);
pthread_mutex_unlock(&queue->mutex);
pthread_mutex_destroy(&queue->mutex);
pthread_cond_destroy(&queue->cond);
}
//====================================================================
// Thread Functions
//====================================================================
void* producer_thread(void* arg) {
SharedQueue* queue = (SharedQueue*)arg;
for (int i = 0; i < 10; ++i) {
// Create a new work item. This is the only allocation.
WorkItem* item = (WorkItem*)malloc(sizeof(WorkItem));
item->id = i;
snprintf(item->message, sizeof(item->message), "Message #%d", i);
printf("PRODUCER: Creating item %d\n", i);
queue_push(queue, item);
// Simulate work
usleep((rand() % 250) * 1000);
}
printf("PRODUCER: Finished.\n");
return NULL;
}
void* consumer_thread(void* arg) {
SharedQueue* queue = (SharedQueue*)arg;
while (1) {
WorkItem* item = queue_pop(queue);
if (item == NULL) {
// This means the queue is empty and we're shutting down.
break;
}
printf("CONSUMER: Processing item %d ('%s')\n", item->id, item->message);
// The consumer is responsible for freeing the object.
free(item);
}
printf("CONSUMER: Shutting down.\n");
return NULL;
}
//====================================================================
// Main Demo
//====================================================================
int main() {
pthread_t producer, consumer;
SharedQueue queue;
queue_init(&queue);
printf("Starting producer and consumer threads...\n");
pthread_create(&producer, NULL, producer_thread, &queue);
pthread_create(&consumer, NULL, consumer_thread, &queue);
// Wait for the producer to finish its work.
pthread_join(producer, NULL);
printf("Producer finished, signaling consumer to shut down...\n");
// Signal shutdown so the consumer exits once the queue drains.
pthread_mutex_lock(&queue.mutex);
queue.is_done = 1;
pthread_cond_broadcast(&queue.cond);
pthread_mutex_unlock(&queue.mutex);
// Wait for the consumer to finish processing remaining items and exit.
pthread_join(consumer, NULL);
// Destroy the mutex and condition variable only after the consumer has
// joined; destroying them while a thread may still be blocked on them
// is undefined behavior.
queue_destroy(&queue);
printf("All threads finished.\n");
return 0;
}
#![allow(unused_variables)] // Allow unused variables for demonstration purposes
#![allow(dead_code)] // Allow dead code for demonstration purposes
use std::{mem, ptr}; // Import mem for offset_of, ptr for raw pointer operations
// Helper to get the containing struct from the IntrusiveNode.
// This is the Rust equivalent of the C `container_of` macro.
// It relies on `offset_of!` from `core::mem`.
// IMPORTANT: This macro MUST be defined BEFORE any code that uses it.
macro_rules! container_of {
($ptr:expr, $Type:ty, $member:ident) => {
{
// Ensure the pointer is mutable. `*mut u8` is the generic byte pointer.
let member_ptr = $ptr as *mut u8;
// Calculate the offset of the member within the struct.
// This requires the actual type name ($Type) and member name ($member).
let offset = mem::offset_of!($Type, $member);
// Subtract the offset from the member's address to get the struct's base address.
(member_ptr.sub(offset)) as *mut $Type
}
};
}
// Define a generic intrusive node structure.
// Any structure that wants to be part of an intrusive list will embed this.
// #[repr(C)] ensures C-compatible memory layout.
#[repr(C)]
pub struct IntrusiveNode {
pub next: *mut IntrusiveNode, // Raw pointer to the next node
}
impl IntrusiveNode {
// A constructor for a new (unlinked) node.
pub const fn new() -> Self {
IntrusiveNode {
next: ptr::null_mut(), // Initialize next as a null pointer
}
}
}
// The IntrusiveLinkedList now just holds a raw pointer to the head.
pub struct IntrusiveLinkedList {
head: *mut IntrusiveNode, // Raw pointer to the head of the intrusive nodes
}
impl IntrusiveLinkedList {
pub fn new() -> Self {
IntrusiveLinkedList {
head: ptr::null_mut(),
}
}
// Function to insert a node at the end of the intrusive list.
// 'node_to_insert' is a raw pointer to the embedded IntrusiveNode.
// This function is `unsafe` because it deals with raw pointers and
// doesn't guarantee memory safety on its own.
pub unsafe fn intrusive_insert_end(&mut self, node_to_insert: *mut IntrusiveNode) {
// Ensure the new node points to NULL initially
unsafe {
(*node_to_insert).next = ptr::null_mut();
}
if self.head.is_null() {
self.head = node_to_insert;
return;
}
let mut temp = self.head;
unsafe {
while !(*temp).next.is_null() {
temp = (*temp).next;
}
(*temp).next = node_to_insert;
}
}
// Function to print the intrusive list.
// Requires a print function that can cast the IntrusiveNode* back to the original struct.
pub unsafe fn print_intrusive_list(&self, print_func: unsafe fn(*mut IntrusiveNode)) {
let mut temp = self.head;
while !temp.is_null() {
unsafe {
print_func(temp);
}
print!(", ");
unsafe {
temp = (*temp).next;
}
}
println!();
}
// This function will iterate through the list and free the containing structures.
// This is the user's responsibility in an intrusive list.
// This is highly unsafe and requires knowledge of the actual type.
pub unsafe fn free_list_person(&mut self) {
println!("free_list_person");
// let mut current = self.head;
// while !current.is_null() {
// let next_node = (*current).next; // Save the next pointer before freeing current
//
// // Cast the IntrusiveNode back to the containing Person struct using the macro
// println!("casting");
// let p_ptr = container_of!(current, Person, node);
//
// // Reconstruct Box from raw pointer and let it drop, which frees 'name' and 'Person'
// println!("reconstructing box");
// let _ = Box::from_raw(p_ptr);
// current = next_node;
// }
println!("clearing list");
self.head = ptr::null_mut(); // Clear the list head after freeing all elements
}
}
// Example data structure: Person, now containing the intrusive node.
// #[repr(C)] is essential for predictable memory layout.
#[repr(C)]
pub struct Person {
pub node: IntrusiveNode, // Embedded intrusive node
pub name: String,
pub age: i32,
}
impl Drop for Person {
fn drop(&mut self) {
// `String` handles its own memory, so no explicit free for `name` here.
// This `Drop` impl will be called when a `Person` value goes out of scope
// or is explicitly dropped (e.g., via `Box::from_raw`).
println!("Dropping Person: {}", self.name); // For debugging drops
}
}
// Print function for Person. Takes a raw pointer to IntrusiveNode.
unsafe fn print_person(node_ptr: *mut IntrusiveNode) {
unsafe {
let person_ptr = container_of!(node_ptr, Person, node);
let person_ref = &*person_ptr; // Dereference the raw pointer to get a reference
print!("{} is {} years old", person_ref.name, person_ref.age);
}
}
// Example data structure: IntegerWrapper, for demonstrating integers
#[repr(C)]
pub struct IntegerWrapper {
pub node: IntrusiveNode,
pub value: i32,
}
// Print function for IntegerWrapper. Takes a raw pointer to IntrusiveNode.
unsafe fn print_int(node_ptr: *mut IntrusiveNode) {
unsafe {
let int_wrapper_ptr = container_of!(node_ptr, IntegerWrapper, node);
let int_wrapper_ref = &*int_wrapper_ptr;
print!("{}", int_wrapper_ref.value);
}
}
fn test_int_list() {
let mut list = IntrusiveLinkedList::new();
let mut val1 = IntegerWrapper {
node: IntrusiveNode::new(),
value: 10,
};
let mut val2 = IntegerWrapper {
node: IntrusiveNode::new(),
value: 20,
};
let mut val3 = IntegerWrapper {
node: IntrusiveNode::new(),
value: 30,
};
let mut val4 = IntegerWrapper {
node: IntrusiveNode::new(),
value: 40,
};
unsafe {
list.intrusive_insert_end(&mut val1.node);
list.intrusive_insert_end(&mut val2.node);
list.intrusive_insert_end(&mut val3.node);
list.intrusive_insert_end(&mut val4.node);
list.print_intrusive_list(print_int);
}
list.head = ptr::null_mut();
}
fn test_person_list() {
let mut list = IntrusiveLinkedList::new();
let mut p1 = Box::new(Person {
node: IntrusiveNode::new(),
name: String::from("Marco"),
age: 22,
});
let mut p2 = Box::new(Person {
node: IntrusiveNode::new(),
name: String::from("Mary"),
age: 20,
});
unsafe {
let p1_node_ptr = &mut p1.node as *mut IntrusiveNode;
let p2_node_ptr = &mut p2.node as *mut IntrusiveNode;
list.intrusive_insert_end(p1_node_ptr);
list.intrusive_insert_end(p2_node_ptr);
list.print_intrusive_list(print_person);
list.free_list_person();
println!("bye unsafe");
}
println!("bye test_person_list");
}
fn main() {
test_int_list();
println!();
test_person_list();
}
https://www.openmymind.net/Zigs-New-LinkedList-API/
pub const SinglyLinkedList = struct {
first: ?*Node = null,
pub const Node = struct {
next: ?*Node = null,
};
};
intrusive-collection requires the nodes to outlive the linked-list (static hierarchical lifetimes) or use box/arena (heap allocation).
By intrusive, I mean not assuming where memory must live
https://github.com/mitchellh/libxev/blob/main/src/ThreadPool.zig Pool.schedule() takes Task ptr that can live anywhere until its callback is invoked. Store it on stack for fork/join structured concurrency pattern. Store in heap & free itself in callback for spawn(detached)/join pattern. Store in array for pool pattern.
https://github.com/mitchellh/libxev/blob/main/src/ThreadPool.zig#L377
const task: *Task = @fieldParentPtr("node", result.node);
(task.callback)(task);
container_of implementations
Prompt: What is the difference between these two macros?
#define container_of(ptr, type, member) \
((type *)((char *)(ptr) - offsetof(type, member)))
#define container_of(ptr, type, member) ({ \
const typeof( ((type *)0)->member ) *__mptr = (ptr); \
(type *)( (char *)__mptr - offsetof(type,member) );})
Gemini:
You’ve identified two common forms of the container_of macro, and the second one is the more robust and widely used version, especially in the Linux kernel. Let’s break down the differences:
#define container_of(ptr, type, member) \
((type *)((char *)(ptr) - offsetof(type, member)))
How it works: This version directly performs the pointer arithmetic.
- (char *)(ptr): Casts the member pointer ptr to a char*. This is crucial because offsetof returns a byte offset, and char* pointer arithmetic works in terms of bytes. If you didn’t cast to char*, the subtraction would be in terms of sizeof(*ptr), which would likely be incorrect.
- offsetof(type, member): This standard macro (from <stddef.h>) calculates the offset in bytes of the member within the type structure.
- ((char *)(ptr) - offsetof(type, member)): Subtracts the offset from the member’s address to get the address of the beginning of the containing structure.
- (type *): Casts the resulting byte address back to a pointer to the type of the container structure.

Pros:
- Simple and portable: it relies only on standard C (offsetof from <stddef.h>), so any conforming compiler accepts it.

Cons:
- No type safety: if you pass a ptr that is not actually a pointer to the member of the specified type, the compiler will not warn you. It will happily perform the calculation, resulting in undefined behavior or crashes at runtime. For example, if ptr points to an int and member is supposed to be a struct list_head, the compiler won’t catch the mismatch.

#define container_of(ptr, type, member) ({ \
const typeof( ((type *)0)->member ) *__mptr = (ptr); \
(type *)( (char *)__mptr - offsetof(type,member) );})
How it works: This version introduces two key enhancements:
GNU Statement Expression ({ ... }): The entire macro is wrapped in ({ ... }). This is a GNU C extension (supported by GCC and Clang) that allows you to have a compound statement (with local variable declarations, etc.) act as an expression. The value of the last expression in the block is the value of the overall statement expression. This is essential for declaring a local variable like __mptr within the macro without polluting the surrounding scope.
Compile-Time Type Checking with typeof:
((type *)0)->member: This is a clever trick. It creates a “null pointer” to the type structure and then attempts to access its member. Crucially, this expression is never actually evaluated at runtime (it doesn’t dereference a null pointer). Its sole purpose is to provide a context for typeof.
typeof( ((type *)0)->member ): The typeof operator (another GCC/Clang extension) gets the type of the expression ((type *)0)->member. So, if member is of type struct list_head, typeof(...) will yield struct list_head.
const typeof( ((type *)0)->member ) *__mptr = (ptr);: This is the magic line. It declares a temporary local pointer __mptr whose type is const pointer to the actual type of the member. Then, it initializes __mptr with ptr.
- If ptr has a different type than member, the compiler will issue a warning (or even an error, depending on compiler flags) about incompatible pointer types during this assignment. This catches many potential bugs at compile time that the simpler version would miss.
- The const qualifier is added to __mptr to prevent accidental modification of the pointer within the macro.
- (type *)( (char *)__mptr - offsetof(type,member) );: The rest is similar to the simpler version, performing the pointer arithmetic using the now type-checked __mptr.
Pros:
- Compile-time type checking: catches cases where ptr might not actually point to the correct type of member. This makes the code much safer and easier to debug.
- Scope safety: the statement expression ({ ... }) ensures that any temporary variables (like __mptr) used within the macro don’t interfere with variables in the code that uses the macro.

Cons:
- Non-standard: typeof and statement expressions are not part of standard C (C99, C11, C17). While widely supported by GCC and Clang, they are technically non-portable to other compilers. For most Linux kernel development and embedded systems where GCC/Clang are dominant, this is not a practical concern.
- Slightly more complex: the reader needs to know the typeof trick and statement expression.

| Feature | Simple Version | Linux Kernel Version |
|---|---|---|
| Type Safety | None (runtime errors possible) | Yes (compile-time warnings/errors) |
| Portability | Standard C | GNU C extension (GCC/Clang specific) |
| Readability | Slightly simpler | Slightly more complex |
| Error Detection | Poor | Excellent |
| Usage | Fine for quick, internal use, or very simple cases where type is guaranteed | Preferred for robust, large-scale, or kernel-style development |
In most real-world scenarios, especially when dealing with low-level C programming or libraries that need to be highly robust (like the Linux kernel), the second version (with typeof and statement expressions) is overwhelmingly preferred because of its crucial compile-time type-checking capabilities.
https://github.com/Rust-for-Linux/linux/pull/158
impl_list_item!
container_of!
macro_rules! container_of
macro_rules! impl_has_list_links
https://github.com/0xAX/linux-insides/blob/master/DataStructures/linux-datastructures-1.md
Pacific++ 2017: Matt Bentley “Can we make a faster linked list?”
Gemini: Intrusive Data Structures Analysis
https://www.gamedeveloper.com/programming/in-depth-intrusive-lists
Adopt intrusive data structures for better performance
Zig’s @fieldParentPtr for dumbos like me
Hacker News - Zig’s new LinkedList API (it’s time to learn fieldParentPtr)
Resource efficient Thread Pools with Zig
https://docs.rs/intrusive-collections/latest/intrusive_collections/
Safe Intrusive Collections with Pinning
What are intrusive linked lists and does Rust really struggle with them?
https://www.kernel.org/doc/rustdoc/next/kernel/list/struct.List.html
https://rust.docs.kernel.org/src/kernel/list.rs.html#255
https://github.com/bbatha/movecell_graph/blob/master/src/lib.rs
“Go’s memory model, for example, has no concept of isolating memory between co-routines or threads, nor of controlling mutability. Everything can always race on the same global mutable heap.”
Graydon Hoare https://www.infoq.com/news/2012/08/Interview-Rust/
Create a Go program where this behavior can lead to bugs, then show a Rust equivalent where it is safe, because Rust prevents that kind of bug.
Of course. This is a fantastic question that gets to the very heart of the philosophical differences between Go and Rust regarding concurrency and memory safety.
I will create two programs to illustrate this. The first is a Go program that demonstrates a classic race condition, a bug that can be subtle and hard to track down. The second is the equivalent program in Rust, which will not even compile without the proper safety mechanisms, effectively preventing the bug from ever existing.
The Go Example: Unchecked Shared Memory
Here is a Go program that starts several concurrent “goroutines.” Each goroutine attempts to increment a shared counter 1,000 times. Because Go’s memory model allows any goroutine to access the shared counter variable without restriction, they will interfere with each other, leading to an incorrect final count.
package main
import (
"fmt"
"sync"
)
func main() {
// We expect the final count to be 10 * 1000 = 10,000.
const numGoroutines = 10
const incrementsPerGoroutine = 1000
// A WaitGroup is used to wait for all the goroutines to finish.
// This is for control flow, not for memory safety.
var wg sync.WaitGroup
wg.Add(numGoroutines)
// This is our shared counter. It lives on the heap and is accessible
// by all goroutines launched from main.
var counter int // Defaults to 0
// Launch 10 concurrent goroutines.
for i := 0; i < numGoroutines; i++ {
go func() {
// Decrement the WaitGroup counter when the goroutine finishes.
defer wg.Done()
// Each goroutine will try to increment the counter 1000 times.
for j := 0; j < incrementsPerGoroutine; j++ {
// --- THIS IS THE DANGER ZONE ---
// The `counter++` operation is not atomic. It involves three steps:
// 1. Read the current value of `counter`.
// 2. Add 1 to that value.
// 3. Write the new value back to `counter`.
// A different goroutine can read the value *after* this one has
// read it but *before* it has written the new value back.
counter++
}
}()
}
// Wait for all goroutines to complete their work.
wg.Wait()
// The final result will almost certainly NOT be 10,000.
// Run this program multiple times, and you'll likely get a different result each time.
fmt.Printf("Expected final counter: %d\n", numGoroutines*incrementsPerGoroutine)
fmt.Printf("Actual final counter: %d\n", counter)
}
When you run the Go program, the output will show an “Actual final counter” that is less than the expected 10,000. This is because multiple goroutines read the same value of counter before any of them can write their incremented value back, causing some increments to be lost. This is a classic race condition.
The Rust Equivalent: Compile-Time Guarantees
Now, let’s try to write the same logic in Rust. A naive, direct translation of the Go code would be blocked by the compiler. Rust’s ownership and borrowing rules prevent you from sharing mutable data between threads without a synchronization primitive.
The compiler forces you to be explicit about shared ownership (Arc) and mutual exclusion (Mutex). This makes the code slightly more verbose but guarantees at compile time that a race condition of this nature is impossible.
use std::sync::{Arc, Mutex};
use std::thread;
fn main() {
// We expect the final count to be 10 * 1000 = 10,000.
const NUM_THREADS: usize = 10;
const INCREMENTS_PER_THREAD: usize = 1000;
// To share data between threads, Rust requires special "wrapper" types.
// `Arc` stands for "Atomically Reference Counted". It allows multiple
// owners of the same data by keeping track of how many references exist.
// `Mutex` stands for "Mutual Exclusion". It ensures that only one thread
// can access the data inside it at any given time.
//
// By combining them, we create a thread-safe, shared, mutable integer.
let counter = Arc::new(Mutex::new(0));
// We'll store the handles to our spawned threads in this vector.
let mut handles = vec![];
for _ in 0..NUM_THREADS {
// We clone the `Arc` to give ownership of a reference to the new thread.
// This just increments the reference count; it doesn't copy the data.
let counter_clone = Arc::clone(&counter);
let handle = thread::spawn(move || {
for _ in 0..INCREMENTS_PER_THREAD {
// --- THIS IS THE SAFE ZONE ---
// To access the data, we must first `lock()` the mutex.
// This call will block until the mutex is available.
// The `.unwrap()` handles a potential error if a thread panics
// while holding the lock.
//
// `num` is a "MutexGuard", a smart pointer to the data (our `i32`).
let mut num = counter_clone.lock().unwrap();
// Now that we have the lock, we can safely modify the data.
// No other thread can acquire the lock until `num` goes out of scope.
*num += 1;
// The lock is automatically released when `num` goes out of scope
// at the end of this statement. This is a key feature of Rust's
// RAII (Resource Acquisition Is Initialization) pattern.
}
});
handles.push(handle);
}
// Wait for all threads to complete their work.
for handle in handles {
handle.join().unwrap();
}
// The result will ALWAYS be 10,000 because the Mutex ensured that
// the increments were orderly and atomic from the perspective of the counter.
// We lock it one last time to read the final value.
println!("Expected final counter: {}", NUM_THREADS * INCREMENTS_PER_THREAD);
println!("Actual final counter: {}", *counter.lock().unwrap());
}
I’ve just provided the two code examples.
The Go program is simple and concise, but it contains a hidden bug that might only appear under heavy load. The burden is on the developer to know when to use synchronization primitives like sync.Mutex.
The Rust program requires you to think about memory sharing and locking upfront. The compiler acts as a strict partner, refusing to compile code that could lead to data races. This prevents the bug entirely, shifting the safety check from runtime (or a developer’s code review) to compile-time.
To address the query, we need to demonstrate how Go’s memory model, which lacks memory isolation between goroutines and control over mutability, can lead to concurrency bugs like data races. Then, we’ll provide a Rust equivalent where its safety features prevent such issues. Let’s dive into the examples.
In Go, goroutines share the same memory space, and variables are mutable by default. Without explicit synchronization, multiple goroutines accessing and modifying shared data can result in race conditions—unpredictable behavior due to unsynchronized memory access. Here’s a Go program that illustrates this problem:
package main
import (
"sync"
)
var counter int
func increment(wg *sync.WaitGroup) {
for i := 0; i < 1000; i++ {
counter++ // Unsynchronized access to shared variable
}
wg.Done()
}
func main() {
var wg sync.WaitGroup
for i := 0; i < 10; i++ {
wg.Add(1)
go increment(&wg) // Launch 10 goroutines
}
wg.Wait() // Wait for all goroutines to complete
println(counter) // Expected: 10000, Actual: Varies
}
- Shared state: counter is an integer accessible by all goroutines.
- Concurrency: 10 goroutines each increment counter 1000 times. Ideally, counter should reach 10 × 1000 = 10,000.
- The race: multiple goroutines read and write counter simultaneously. For example, two goroutines might read counter as 42, increment it to 43 locally, and write back 43, losing an update. This leads to a final value less than 10,000, varying per run.
- Detection: go run -race main.go will flag a data race, confirming the bug, but Go compiles and executes it regardless, leaving the error to manifest at runtime.

This showcases Go’s memory model: goroutines operate on a shared, mutable heap without inherent protection, making such bugs possible if synchronization is neglected.
Rust’s ownership model and borrow checker prevent data races at compile time by enforcing strict rules: only one mutable reference or multiple immutable references to data can exist at once, and shared data across threads requires synchronization. Let’s first attempt a direct equivalent to the Go program and see why it fails, then provide a safe version.
use std::thread;
fn main() {
let mut counter = 0;
let handles: Vec<_> = (0..10).map(|_| {
thread::spawn(|| {
for _ in 0..1000 {
counter += 1; // Attempt to mutate shared variable
}
})
}).collect();
for handle in handles {
handle.join().unwrap();
}
println!("{}", counter);
}
This code doesn’t compile. Rust’s compiler produces an error like:
error[E0373]: closure may outlive the current function, but it borrows `counter`, which is owned by the current function
--> src/main.rs:6:23
|
6 | thread::spawn(|| {
| ^^ may outlive borrowed value `counter`
7 | for _ in 0..1000 {
8 | counter += 1;
| ------- `counter` is borrowed here
|
note: function requires argument type to outlive `'static`
The closure captures counter by reference and is moved to a new thread, which may outlive main. Rust requires thread-spawned closures to have a 'static lifetime, but counter is a local variable. More critically, even if we adjusted lifetimes, Rust prohibits multiple threads from mutating counter without synchronization, preventing the race condition outright.

To achieve the same functionality safely, we use Rust’s synchronization primitives, such as Mutex with Arc (Atomic Reference Counting) for thread-safe sharing:
use std::sync::{Arc, Mutex};
use std::thread;
fn main() {
// Wrap counter in Arc<Mutex<i32>> for safe sharing and mutation
let counter = Arc::new(Mutex::new(0));
// Spawn 10 threads
let handles: Vec<_> = (0..10).map(|_| {
let counter_clone = Arc::clone(&counter); // Clone Arc for each thread
thread::spawn(move || {
for _ in 0..1000 {
let mut num = counter_clone.lock().unwrap(); // Lock for exclusive access
*num += 1; // Safely increment
}
})
}).collect();
// Wait for all threads to finish
for handle in handles {
handle.join().unwrap();
}
// Print the final value
println!("{}", *counter.lock().unwrap()); // Always prints 10000
}
For efficiency, we can use AtomicUsize instead of Mutex, which provides lock-free atomic operations:
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;
fn main() {
let counter = Arc::new(AtomicUsize::new(0));
let handles: Vec<_> = (0..10).map(|_| {
let counter_clone = Arc::clone(&counter);
thread::spawn(move || {
for _ in 0..1000 {
counter_clone.fetch_add(1, Ordering::Relaxed); // Atomic increment
}
})
}).collect();
for handle in handles {
handle.join().unwrap();
}
println!("{}", counter.load(Ordering::Relaxed)); // Always prints 10000
}
- Mutex: lock() provides exclusive access.
- Atomics: AtomicUsize offers lock-free fetch_add, suitable for simple operations like this, with memory ordering (e.g., Relaxed) specifying consistency rules.
- Result: counter reaches exactly 10,000 every time.
- In Go, the programmer must add synchronization (e.g., sync.Mutex) manually. Forgetting this leads to subtle, runtime bugs detectable only with tools like the race detector.
- In Rust, the compiler rejects the racy version, forcing the use of Mutex or AtomicUsize. This eliminates data races by design, aligning with the query’s requirement to show a safe Rust equivalent.

Thus, Go’s shared, mutable memory model permits concurrency bugs, while Rust’s type system and ownership enforce safety, making the latter immune to this class of errors when using its safe abstractions.
Great question. You’re right: Go’s memory model offers concurrency through goroutines and channels, but it does not enforce isolation or immutability at the type level. This makes it easy to accidentally write code with data races if shared memory is accessed without synchronization.
Rust, on the other hand, uses ownership, borrowing, and type system checks at compile time to prevent data races by design.
We will:
package main
import (
"fmt"
"time"
)
var shared = 0
func write() {
for i := 0; i < 1000; i++ {
shared++
}
}
func main() {
go write()
go write()
time.Sleep(100 * time.Millisecond)
fmt.Println("Final value:", shared)
}
This code starts two goroutines that both increment a global variable shared without synchronization. This will almost certainly lead to a data race.
You can verify this by running:
go run -race main.go
You’ll see something like:
WARNING: DATA RACE
use std::thread;
fn main() {
let mut shared = 0;
let handle1 = thread::spawn(|| {
for _ in 0..1000 {
// Cannot access shared here
}
});
let handle2 = thread::spawn(|| {
for _ in 0..1000 {
// Cannot access shared here
}
});
// This won’t compile: `shared` cannot be shared across threads without synchronization.
// handle1.join().unwrap();
// handle2.join().unwrap();
println!("Final value: {}", shared);
}
Rust will say something like:
error[E0373]: closure may outlive the current function, but it borrows `shared`, which is owned by the current function
You must use thread-safe types, like Mutex or Atomic*.
use std::sync::{Arc, Mutex};
use std::thread;
fn main() {
let shared = Arc::new(Mutex::new(0));
let mut handles = vec![];
for _ in 0..2 {
let shared_clone = Arc::clone(&shared);
let handle = thread::spawn(move || {
for _ in 0..1000 {
let mut num = shared_clone.lock().unwrap();
*num += 1;
}
});
handles.push(handle);
}
for handle in handles {
handle.join().unwrap();
}
println!("Final value: {}", *shared.lock().unwrap());
}
- Arc is an atomic reference counter for shared ownership across threads.
- Mutex ensures only one thread can mutate at a time.

| Language | Code Compiles? | Data Race Possible? | Safety Mechanism |
|---|---|---|---|
| Go | ✅ | ✅ Yes | Manual sync |
| Rust | ❌ (unsafe version) / ✅ (safe version) | ❌ Never (if it compiles) | Enforced via compiler |
Would you like to see the same kind of example using channels instead of shared memory?
prompt: what does “-race” do?
The -race flag in Go enables the race detector, a powerful tool that detects data races in your program at runtime.
When you compile or run a Go program with -race, the Go toolchain instruments memory accesses and synchronization operations, and detects concurrent accesses where:

- two goroutines touch the same memory location,
- at least one of the accesses is a write, and
- the accesses are not ordered by any synchronization (mutexes, channels, atomics).

To use it:
go run -race main.go
go build -race -o myapp main.go
./myapp
For a program with a data race, you’ll see output like:
==================
WARNING: DATA RACE
Read at 0x00c0000b6008 by goroutine 6:
main.write()
/path/to/main.go:10 +0x3c
Previous write at 0x00c0000b6008 by goroutine 5:
main.write()
/path/to/main.go:10 +0x3c
==================
It tells you which goroutines were involved, the memory address they raced on, and the file and line of each conflicting access.
| Flag | Purpose |
|---|---|
| -race | Enables runtime detection of data races in Go programs |
Would you like help detecting or fixing a specific race in your code?
I. Introduction: The Significance of Resource Efficiency in Java Development and the Role of JEPs
In the realm of modern software development, the efficient utilization of computational resources, particularly CPU and memory, stands as a cornerstone of application performance, scalability, and cost-effectiveness. Java applications, widely deployed across diverse environments ranging from enterprise servers to embedded systems and cloud-native architectures, are particularly sensitive to resource consumption. Inefficient resource usage can manifest in various detrimental ways, including elevated operational expenditures due to increased infrastructure demands, diminished application responsiveness leading to poor user experiences, and limitations in the ability of applications to scale effectively under high load. Addressing these challenges necessitates a continuous focus on optimizing how Java applications consume and manage CPU cycles and memory.
The Java Development Kit (JDK) Enhancement Proposal (JEP) process serves as the primary and structured pathway for introducing significant improvements and innovations to the Java platform and its associated ecosystem.1 These proposals, which are meticulously documented and reviewed by the Java community, cover a broad spectrum of potential enhancements, encompassing new language features designed to improve developer productivity, Application Programming Interface (API) extensions aimed at providing richer functionality, and crucial modifications to the underlying Java Virtual Machine (JVM) that are often geared towards bolstering performance and resource management.3 Within this landscape of ongoing development, numerous JEPs have been specifically conceived and implemented with the explicit goal of enhancing the CPU and memory efficiency of Java applications. This report undertakes a focused analysis of these key JEPs, drawing upon provided research materials to elucidate the proposed mechanisms, objectives, and potential impacts of these enhancements on the Java platform.
II. Methodology: Approach to Identifying and Analyzing Performance-Enhancing JEPs
To effectively address the user's query regarding JDK Enhancement Proposals that aim to improve the CPU and memory usage of Java applications, a systematic approach was employed, mirroring the steps outlined in the initial request. The investigation commenced with the acquisition of a comprehensive list of JEPs by examining the provided research snippets.1 Notably, snippet 4 presented a particularly detailed and categorized list of JEPs, offering a valuable foundation for this analysis.
Following the initial list acquisition, a keyword-based filtering process was conducted. This involved scrutinizing the titles and brief descriptions of the JEPs for the presence of terms directly related to the user's interest, such as "performance," "CPU," "memory," "optimization," "resource usage," and "efficiency," alongside other semantically similar terms. This filtering stage served to narrow down the extensive list to a more manageable set of potentially relevant proposals that explicitly indicated an intention to improve resource utilization.
For each JEP identified through the filtering process, the subsequent step involved retrieving more detailed descriptions and objectives. This was accomplished by systematically searching the provided research snippets for mentions of these specific JEP numbers. Snippets from 10 onwards proved particularly useful in this stage, often containing more in-depth information, especially for JEPs associated with recent JDK releases.
With the detailed descriptions in hand, a careful analysis was performed to ascertain whether each JEP explicitly mentioned improvements to CPU and/or memory usage. The focus on explicit mentions was a deliberate choice to ensure the report's accuracy and direct relevance to the user's query. This stage involved a thorough reading of the objectives, descriptions, and motivations outlined in the JEP documentation and associated articles.
Finally, the JEPs that explicitly targeted CPU improvements, memory improvements, or both were compiled into categorized lists. For each JEP in these lists, a summary of the key changes proposed and the underlying mechanisms intended to achieve these improvements was developed, drawing directly from the analyzed research materials.
III. JDK Enhancement Proposals Targeting CPU Usage
The analysis of the provided research material revealed several JDK Enhancement Proposals that primarily focus on improving the CPU usage of Java applications. These JEPs span various aspects of the JVM and core libraries, reflecting a multifaceted approach to enhancing performance.
JEP 509: JFR CPU-Time Profiling (Experimental) 10 introduces an experimental feature to the JDK Flight Recorder (JFR) specifically designed to capture CPU-time profiling information on Linux operating systems. The primary goal of this JEP is to empower developers with the ability to precisely measure the CPU cycles consumed by specific elements within their Java programs.14 This capability is crucial for identifying performance bottlenecks and pinpointing areas where CPU optimization efforts would be most effective. As highlighted in 14, without such profiling data, developers might inadvertently focus on optimizing program elements that have a negligible impact on overall performance, leading to wasted effort. By providing detailed insights into CPU consumption at a granular level, JEP 509 enables developers to make data-driven decisions regarding performance tuning and CPU efficiency.
JEP 475: Late Barrier Expansion for G1 15 aims to simplify the implementation of the G1 garbage collector's barriers, which are responsible for recording information about application memory accesses. This simplification is achieved by shifting the expansion of these barriers from an early stage to a later point in the C2 JIT (Just-In-Time) compilation pipeline.18 The primary goals of this JEP, as outlined in 18, include reducing the execution time of the C2 compiler when the G1 garbage collector is in use and enhancing the comprehensibility of G1 barriers for HotSpot developers. Furthermore, preliminary experiments have indicated that a naive implementation of late barrier expansion can already achieve code quality that is comparable to code optimized by C2 using the earlier expansion strategy.18 This enhancement directly contributes to CPU efficiency by reducing the overhead associated with the C2 compilation process itself, thereby decreasing the time and CPU resources required for application startup and warm-up phases.16 The JEP also strives to ensure that the C2 compiler preserves critical invariants regarding the relative ordering of memory accesses, safepoints, and barriers.18
JEP 423: Region Pinning for G1 addresses the latency incurred when Java applications interact with unmanaged code through the Java Native Interface (JNI). Traditionally, G1 disabled garbage collection entirely during JNI critical regions to prevent the movement of critical objects, which could cause significant latency and thread stalling. JEP 423 introduces region pinning, allowing G1 to pin the specific memory regions containing critical objects during garbage collection. By not evacuating pinned regions, G1 no longer needs to disable garbage collection during JNI critical regions, which reduces thread stalling, minimizes additional latency when starting a collection, and avoids regressions in GC pause times. The result is better CPU responsiveness and overall application performance in workloads with frequent JNI interactions.
JEP 416: Reimplement Core Reflection with Method Handles makes a significant internal change to the Java platform by reimplementing the core reflection mechanisms (java.lang.reflect.Method, Constructor, and Field) on top of java.lang.invoke method handles. The primary motivation is to reduce the maintenance and development costs of supporting both the reflection and method-handle APIs. The new implementation performs direct invocations of method handles for specific reflective objects. Notably, microbenchmarks show that when Method, Constructor, and Field instances are held in static final fields, the new implementation is significantly faster than the old one. Some degradation can appear in microbenchmarks when these instances are held in non-constant fields, but real-world benchmarks using established libraries have not shown significant regressions. The reimplementation also reduces the cost of upgrading reflection support for new language features and simplifies the HotSpot VM by removing the special treatment of MagicAccessorImpl subclasses.
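The relationship can be sketched with a small example that calls the same method through core reflection and through a direct method handle; since JEP 416, the former is implemented in terms of the latter. Class and method names here (ReflectionDemo, viaReflection, viaMethodHandle) are illustrative, not taken from the JEP:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;

public class ReflectionDemo {
    // Core reflection call: since JEP 416 this path is backed by method handles.
    static String viaReflection(String s) throws Exception {
        Method m = String.class.getMethod("toUpperCase");
        return (String) m.invoke(s);
    }

    // The equivalent direct method-handle invocation.
    static String viaMethodHandle(String s) throws Throwable {
        MethodHandle mh = MethodHandles.lookup()
                .findVirtual(String.class, "toUpperCase", MethodType.methodType(String.class));
        return (String) mh.invokeExact(s);
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(viaReflection("jep 416"));   // JEP 416
        System.out.println(viaMethodHandle("jep 416")); // JEP 416
    }
}
```

Holding the Method (or MethodHandle) in a static final field is what lets the JIT treat it as a constant and inline the call, which is where the reported speedups come from.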
JEP 376: ZGC: Concurrent Thread-Stack Processing minimizes garbage collection pauses in the Z Garbage Collector (ZGC) by moving the processing of thread stacks from safepoints to a concurrent phase. The goals are to make stack processing lazy, cooperative, concurrent, and incremental, and to remove all other per-thread root processing from ZGC safepoints, ensuring that the time spent inside GC safepoints does not exceed one millisecond even on systems with large memory configurations. The implementation introduces a stack watermark barrier, which makes concurrent processing of thread stacks safe. With this change, ZGC processes all roots in the JVM concurrently rather than in stop-the-world pauses, significantly reducing GC pause durations and improving overall CPU utilization by letting application threads execute more continuously.
JEP 346: Promptly Return Unused Committed Memory from G1 enhances the G1 garbage collector's ability to manage memory efficiently by automatically returning unused Java heap memory to the operating system when the application is idle. This behavior is particularly beneficial in containerized environments, where resource consumption translates directly into operational cost. During periods of inactivity, G1 periodically assesses heap usage and, if appropriate, uncommits unused portions of the heap, returning that memory to the operating system. The feature is controlled by several options, including G1PeriodicGCInvokesConcurrent, which determines the type of periodic garbage collection performed. While the primary focus is memory management, dynamically adjusting the JVM's footprint to match application activity can indirectly improve CPU utilization by reducing the overhead of managing a larger-than-necessary heap.
JEP 307: Parallel Full GC for G1 addresses the worst-case latencies of the G1 garbage collector. G1 is designed to largely avoid full collections, but they can still occur when concurrent collection cannot reclaim memory quickly enough. The original full GC in G1 used a single-threaded mark-sweep-compact algorithm; JEP 307 parallelizes it, using the same number of threads as the young and mixed collections (controlled by the -XX:ParallelGCThreads option). This significantly reduces the duration of full GC pauses, improving CPU responsiveness and overall application performance under heavy memory pressure.
JEP 246: Leverage CPU Instructions for GHASH and RSA targets the performance of cryptographic operations heavily used in Java applications, particularly in secure communication protocols such as TLS. The JEP improves the performance of GHASH (used in the GCM cipher mode) and RSA by leveraging then-recently introduced CPU instructions on SPARC and Intel x64 architectures. Hardware acceleration of these computationally intensive operations significantly improves their speed and efficiency within the JVM, reducing CPU utilization and speeding up secure applications.
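Because the GHASH intrinsic is applied transparently inside the JCE provider, ordinary AES/GCM code benefits without any changes. A minimal round-trip sketch (class and method names are illustrative):

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class GcmDemo {
    // AES/GCM round trip: the GHASH authentication-tag computation inside GCM is
    // what JEP 246 intrinsified on CPUs with carry-less multiply instructions.
    static String roundTrip(String plaintext) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey key = kg.generateKey();

        byte[] iv = new byte[12];                      // 96-bit IV, recommended for GCM
        new SecureRandom().nextBytes(iv);

        Cipher enc = Cipher.getInstance("AES/GCM/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ct = enc.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));

        Cipher dec = Cipher.getInstance("AES/GCM/NoPadding");
        dec.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
        return new String(dec.doFinal(ct), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("hello"));  // hello
    }
}
```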
JEP 143: Improve Contended Locking optimizes the performance of contended Java object monitors, which are fundamental to thread synchronization. The JEP explores several areas for improvement, including field reordering, cache-line alignment, and faster monitor enter, exit, and notification operations. Because contended monitors can become a bottleneck in highly concurrent applications, these improvements yield better CPU utilization by reducing synchronization overhead.
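Contended monitors arise whenever multiple threads compete for the same lock, as in this small sketch (names are illustrative; the JEP's optimizations apply transparently to any synchronized code):

```java
public class ContendedCounter {
    private long count = 0;

    // Eight threads hammering one monitor: the enter/exit fast paths that
    // JEP 143 targets sit directly on this synchronized method.
    synchronized void increment() { count++; }

    static long run(int threads, int perThread) throws InterruptedException {
        ContendedCounter c = new ContendedCounter();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) c.increment();
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        return c.count;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(8, 100_000));  // 800000
    }
}
```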
| JEP Number | Title | JDK Release Status | Key CPU Improvement Mechanisms |
|---|---|---|---|
| 509 | JFR CPU-Time Profiling (Experimental) | Candidate | Enhances JDK Flight Recorder to capture CPU-time profiling information on Linux, enabling better performance analysis. |
| 475 | Late Barrier Expansion for G1 | 24 | Simplifies G1 GC barriers by shifting expansion to later in the JIT pipeline, reducing overhead and improving code efficiency. |
| 423 | Region Pinning for G1 | 22 | Allows G1 GC to pin memory regions during JNI critical regions, avoiding the need to disable GC and reducing thread stalling. |
| 416 | Reimplement Core Reflection with Method Handles | 18 | Replaces bytecode generation for reflection with method handles, potentially improving performance for reflective operations. |
| 376 | ZGC: Concurrent Thread-Stack Processing | 16 | Moves ZGC thread-stack processing to a concurrent phase, significantly reducing stop-the-world pause times. |
| 346 | Promptly Return Unused Committed Memory from G1 | 12 | Enhances G1 to automatically return unused heap memory to the OS, indirectly improving CPU usage by managing a more appropriately sized heap. |
| 307 | Parallel Full GC for G1 | 10 | Parallelizes the full garbage collection phase in G1, reducing the duration of the longest GC pauses. |
| 246 | Leverage CPU Instructions for GHASH and RSA | 9 | Improves the performance of GHASH and RSA cryptographic operations by utilizing specific CPU instructions. |
| 143 | Improve Contended Locking | 9 | Optimizes the performance of contended Java object monitors, reducing synchronization overhead in multithreaded applications. |
IV. JDK Enhancement Proposals Targeting Memory Usage
Several JEPs have been specifically designed to improve the memory usage characteristics of Java applications. These proposals address various aspects of memory management within the JVM and the core libraries.
JEP 450: Compact Object Headers (Experimental) introduces an experimental feature that reduces the size of object headers in the HotSpot JVM. On 64-bit architectures, object headers, which hold per-object metadata, occupy between 96 and 128 bits; this JEP reduces them to 64 bits by compacting the mark word and the class pointer into a single 64-bit header. Shrinking each header from 12 bytes to 8 bytes reduces overall heap usage, improves deployment density by letting more applications run within the same memory footprint, and can increase data locality, improving performance through better CPU cache utilization. The JEP is inspired by Project Lilliput, whose broader goal is to reduce the memory footprint of the JVM. As an experimental feature in JDK 24, it must be explicitly enabled with JVM options.
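The feature requires no source changes; it is enabled purely with JVM flags, as sketched below. Class and record names are illustrative, and the quoted object sizes are typical layouts rather than guarantees:

```java
public class HeaderDemo {
    // JDK 24, experimental: enable compact 64-bit headers with
    //   java -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders HeaderDemo
    // The saving is 4 bytes per object, which adds up across millions of instances.
    record Point(int x, int y) {}

    static Point[] allocate(int n) {
        Point[] pts = new Point[n];
        for (int i = 0; i < n; i++) pts[i] = new Point(i, i);
        return pts;
    }

    public static void main(String[] args) {
        // With a 12-byte header, a Point typically occupies 24 bytes (header plus
        // two ints, padded to 8-byte alignment); with an 8-byte header it fits in 16.
        System.out.println(allocate(1_000_000).length);
    }
}
```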
JEP 439: Generational ZGC extends the Z Garbage Collector (ZGC) with separate generations for young and old objects. This lets ZGC collect young objects, which tend to die young, more frequently, reclaiming memory more efficiently. The primary goals are to lower the risk of allocation stalls, reduce the required heap memory overhead, and decrease the CPU overhead of garbage collection, all while preserving the essential low-pause-time characteristics of non-generational ZGC. The generational approach splits the heap into two logical generations that are collected independently, exploiting the weak generational hypothesis: younger objects are more likely to be garbage. By focusing collection effort on the young generation, ZGC improves overall memory efficiency and reduces the frequency of more resource-intensive old-generation collections.
JEP 387: Elastic Metaspace, delivered in JDK 16, reworks how the HotSpot JVM manages Metaspace, the native-memory area that stores class metadata. The new allocator commits and uncommits memory in smaller units, reduces fragmentation, and returns unused Metaspace memory to the operating system more promptly, lowering the overall footprint of class-related data, particularly in applications that load and unload many classes. It also simplifies the Metaspace code, reducing maintenance cost.
JEP 377: ZGC: A Scalable Low-Latency Garbage Collector (Production) marked a significant step in the maturity of the Z Garbage Collector. Introduced as an experimental feature in JDK 11 (JEP 333), ZGC was promoted to a production-ready feature in JDK 15 by this JEP. ZGC was designed to achieve GC pause times not exceeding 10 ms, to handle heaps ranging from a few hundred megabytes to multiple terabytes, and to limit the reduction in application throughput compared to G1. As a concurrent, single-generation, region-based, NUMA-aware, compacting collector, ZGC's design inherently involves efficient memory management in support of these goals, particularly for very large heaps. Features such as concurrent class unloading and the ability to uncommit unused memory (introduced in JEP 351) further contribute to its memory efficiency.
JEP 351: ZGC: Uncommit Unused Memory (Experimental) addresses the memory footprint of applications using the Z Garbage Collector by enhancing ZGC to return unused heap memory to the operating system. This is particularly beneficial where memory is a constrained resource, such as containerized deployments, or for applications with periods of low activity. ZGC identifies ZPages (heap regions) that have been unused for a specified period, evicts them from its page cache, and uncommits the associated memory. The capability is enabled by default and can be tuned with the -XX:ZUncommitDelay=<seconds> option, which sets the timeout after which unused memory is uncommitted. Importantly, ZGC will never uncommit memory below the minimum heap size (-Xms).
JEP 254: Compact Strings introduced a more space-efficient internal representation for java.lang.String and related classes such as StringBuilder and StringBuffer. Previously, strings used a char array with two bytes per character (UTF-16 encoding). Many String objects, however, contain only Latin-1 characters, which fit in a single byte. JEP 254 changes the internal storage to a byte array plus an encoding-flag field indicating whether characters are stored as ISO-8859-1/Latin-1 (one byte per character) or UTF-16 (two bytes per character), depending on the string's content. This optimization can reduce the memory required for strings containing only single-byte characters by up to 50%, yielding significant overall savings in string-heavy applications. The change was purely internal, with no modifications to public APIs.
JEP 192: String Deduplication in G1 reduces memory consumption by addressing duplicate String objects on the Java heap. In many large-scale Java applications, String objects occupy a significant portion of the live data set, and a substantial fraction of them are duplicates with identical content. JEP 192 enhances the G1 garbage collector to automatically and continuously deduplicate such strings by making duplicate String objects share the same underlying character array. A background deduplication thread processes a queue of candidate String objects identified during garbage collection; a hashtable tracks unique character arrays, and when a duplicate is found, the String object is updated to point to the existing array, allowing the redundant array to be garbage collected. The feature is implemented only for G1 and is expected to reduce heap usage by around 10% on average.
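Deduplication is transparent to application code; the sketch below (illustrative names) merely shows what deduplication candidates look like. Note that dedup shares the backing array, so the String objects remain distinct:

```java
public class DedupDemo {
    // Enable with: java -XX:+UseG1GC -XX:+UseStringDeduplication DedupDemo
    static String[] buildDuplicates(int n) {
        String[] out = new String[n];
        for (int i = 0; i < n; i++) {
            out[i] = new String("same-content");  // n distinct objects, equal content
        }
        return out;
    }

    public static void main(String[] args) {
        String[] dups = buildDuplicates(100_000);
        System.out.println(dups[0].equals(dups[1]));  // true: equal content, dedup candidates
        System.out.println(dups[0] == dups[1]);       // false: still distinct String objects
    }
}
```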
JEP 149: Reduce Core-Library Memory Usage represents a broad effort to decrease the dynamic memory used by core-library classes in the JDK without negatively impacting performance. The JEP explored various improvements to library classes and their native implementations to reduce heap usage. Candidate techniques included reducing the size of java.lang.Class objects by moving infrequently used fields (related to reflection and annotations) into separate helper classes, and potentially disabling the reflection compiler, which generates bytecode for method calls to improve performance but adds to the dynamic footprint. Other areas of investigation included tuning the initial sizes of internal tables, caches, and buffers to minimize wasted memory. The overall goal was to implement memory reductions that are effective, maintainable, and have minimal performance impact on typical workloads.
JEP 147: Reduce Class Metadata Footprint aimed to decrease the memory footprint of class metadata in the HotSpot JVM, with a particular focus on small devices, drawing on memory-reduction techniques used in CVM, an embedded JVM. Strategies explored included keeping rarely used fields out of the core class, method, and field data structures; using the smallest possible data types for struct fields; encoding certain fields to fit into smaller types; carefully grouping fields of similar sizes to avoid unnecessary padding; using 16-bit offsets instead of 32-bit pointers for some data; and employing unions for groups of fields of which only one is in use at a time. The goal was a 25% reduction in the memory footprint of class, method, and field metadata (excluding bytecodes and interned strings) with no more than a 1% regression in application startup and runtime performance.
JEP 122: Remove the Permanent Generation represents a significant architectural change in the HotSpot JVM's memory management. Before this JEP (delivered in JDK 8), class metadata, interned Strings, and class static variables were stored in a dedicated portion of the heap called the permanent generation. JEP 122 removed it, moving class metadata to native memory and interned Strings and class statics to the Java heap. This simplified memory management by eliminating the need to tune the permanent generation's size, a common source of confusion and performance issues. Managing class metadata in native memory and leveraging the existing garbage collection mechanisms for interned Strings and statics also paved the way for further memory-related optimizations in subsequent JDK releases.
| JEP Number | Title | JDK Release | Key Memory Improvement Mechanisms |
|---|---|---|---|
| 450 | Compact Object Headers (Experimental) | 24 | Reduces the size of object headers from 96-128 bits to 64 bits, decreasing heap size and improving density. |
| 439 | Generational ZGC | 21 | Extends ZGC to use generational garbage collection, focusing on young objects to reduce memory overhead. |
| 387 | Elastic Metaspace | 16 | Improves dynamic resizing and management of Metaspace for more efficient memory usage of class metadata. |
| 377 | ZGC: A Scalable Low-Latency Garbage Collector (Production) | 15 | Design inherently involves efficient memory management for large heaps with low latency. |
| 351 | ZGC: Uncommit Unused Memory (Experimental) | 13 | Allows ZGC to return unused heap memory to the operating system, reducing memory footprint. |
| 254 | Compact Strings | 9 | Changes internal string representation to use byte arrays for Latin-1 characters, reducing string memory usage. |
| 192 | String Deduplication in G1 | 8u20 | Enhances G1 GC to automatically deduplicate duplicate String instances, reducing heap live data. |
| 149 | Reduce Core-Library Memory Usage | 8 | Aims to reduce dynamic memory usage across core library classes through various techniques. |
| 147 | Reduce Class Metadata Footprint | 8 | Reduces the memory footprint of HotSpot's class metadata, especially beneficial for small devices. |
| 122 | Remove the Permanent Generation | 8 | Removes the permanent generation, simplifying memory management and moving metadata to native memory and heap. |
V. JDK Enhancement Proposals Targeting Both CPU and Memory Usage
Several JEPs have been identified that aim to improve both the CPU and memory usage of Java applications, often through mechanisms that have a dual impact on resource efficiency.
JEP 483: Ahead-of-Time Class Loading & Linking optimizes the application startup process by making an application's classes instantly available in a loaded and linked state when the HotSpot JVM starts, reducing the CPU cycles spent during initialization. This is achieved by monitoring the application during a training run and storing the loaded and linked forms of all its classes in a cache for use in subsequent runs. Pre-loading and linking can also reduce the runtime memory footprint, since classes no longer need to be loaded and linked on demand in each run. The AOT cache itself consumes memory, but the trade-off yields faster startup and potentially better overall runtime performance due to reduced overhead.
JEP 444: Virtual Threads introduces a new concurrency model to the Java platform. Virtual threads are lightweight threads that dramatically reduce the effort of writing, maintaining, and observing high-throughput concurrent applications. Because a much larger number of concurrent tasks can be handled with far less overhead than traditional platform threads, virtual threads improve CPU utilization, especially in I/O-bound workloads where many threads would otherwise sit blocked waiting for operations to complete. Virtual threads also have a much smaller memory footprint than platform threads: they do not each require a dedicated OS thread, and their stacks are smaller. This can yield substantial memory savings for applications managing large numbers of concurrent operations.
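A minimal sketch of the model, assuming JDK 21 or later (class and method names are illustrative):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualThreadsDemo {
    // One cheap virtual thread per task; a blocking call parks the virtual
    // thread and frees its carrier OS thread instead of blocking it.
    static int runTasks(int n) throws InterruptedException {
        AtomicInteger done = new AtomicInteger();
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < n; i++) {
                exec.submit(() -> {
                    Thread.sleep(10);  // would tie up an OS thread under platform threads
                    done.incrementAndGet();
                    return null;
                });
            }
        }  // close() waits for all submitted tasks to finish
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runTasks(10_000));  // 10000
    }
}
```

Running ten thousand platform threads this way would require ten thousand OS threads; with virtual threads the same code runs on a small pool of carriers.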
JEP 431: Sequenced Collections introduces new interfaces that represent collections with a defined encounter order, providing uniform operations to access the first and last elements and to iterate in reverse. Its primary focus is API design and functionality rather than direct performance optimization, but it can indirectly help in scenarios where the order of elements is semantically important and must be consistently maintained, since a standard ordering API can avoid custom ordering logic that would otherwise consume additional CPU cycles.
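A brief sketch of the new interface, assuming JDK 21 or later (names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SequencedCollection;

public class SequencedDemo {
    // JDK 21+: List, Deque, LinkedHashSet, and friends implement SequencedCollection,
    // so first/last access and reverse iteration no longer need per-type workarounds.
    static String summarize(SequencedCollection<String> c) {
        return c.getFirst() + ".." + c.getLast() + " / reversed=" + c.reversed();
    }

    public static void main(String[] args) {
        SequencedCollection<String> letters = new ArrayList<>(List.of("a", "b", "c"));
        System.out.println(summarize(letters));   // a..c / reversed=[c, b, a]
        letters.addFirst("z");                    // uniform API at both ends
        System.out.println(letters);              // [z, a, b, c]
    }
}
```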
JEP 350: Dynamic CDS Archives extends the Class-Data Sharing (CDS) feature to allow application classes to be archived dynamically at the end of a Java application's execution, building on the existing AppCDS (Application Class-Data Sharing) functionality. The dynamically generated archive, layered on top of the default system archive, contains all loaded application and library classes not present in the base archive. In subsequent runs of the same application, the dynamic archive is memory-mapped, making the application's classes instantly available in a loaded and linked state. This improves startup time, reducing the CPU cycles spent on class loading and linking. Furthermore, sharing the archive across multiple JVM instances running the same application reduces the overall memory footprint, since class metadata is shared rather than loaded separately by each instance.
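The two-step workflow is driven entirely by JVM flags, sketched in the comments below; the archive and class names (app.jsa, com.example.Main) are hypothetical placeholders:

```java
public class CdsDemo {
    // JEP 350 workflow (JDK 13+), no code changes required:
    //
    //   1) Trial run: record every class loaded and write a dynamic archive at exit.
    //        java -XX:ArchiveClassesAtExit=app.jsa -cp app.jar com.example.Main
    //
    //   2) Subsequent runs: memory-map the archive so classes start out loaded and linked.
    //        java -XX:SharedArchiveFile=app.jsa -cp app.jar com.example.Main
    public static void main(String[] args) {
        System.out.println("Every class this run loads becomes a candidate for the archive.");
    }
}
```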
JEP 310: Application Class-Data Sharing enhanced the existing Class-Data Sharing (CDS) feature by allowing application classes to be placed in the shared archive. At startup, the JVM loads classes into memory as a preliminary step, which introduces latency, especially for applications with many classes. AppCDS addresses this by pre-processing application classes and storing them in a shared archive that can be memory-mapped at JVM startup, making them instantly available in a loaded and linked state. This significantly reduces application startup time, lowering CPU consumption during initialization. Moreover, when multiple JVMs on the same host use the same shared archive, the class data is shared across processes, reducing the memory footprint.
JEP 197: Segmented Code Cache changed the organization of the JVM's code cache. Instead of a single monolithic code heap, the code cache is divided into distinct segments, each holding compiled code of a particular type (non-method, profiled, and non-profiled code). Segmentation improves CPU performance by enabling shorter sweep times during code-cache management, better instruction-cache locality, and reduced fragmentation of highly optimized code. It also improves memory usage by giving finer control over the JVM's footprint through the sizes of the individual code-heap segments.
JEP 158: Unified JVM Logging 126 introduced a common logging system for all components of the JVM. While the primary goal of this JEP was to provide a more flexible and configurable logging framework, efficient logging can indirectly contribute to better CPU and memory usage. By allowing for more granular control over the logging output and the ability to easily diagnose performance issues, developers and system administrators can more quickly identify and address areas of inefficiency in their applications, potentially leading to optimizations that reduce both CPU and memory consumption.
| JEP Number | Title | JDK Release | Key CPU Improvement Mechanisms | Key Memory Improvement Mechanisms |
|---|---|---|---|---|
| 483 | Ahead-of-Time Class Loading & Linking | 24 | Improves startup time by pre-loading and linking classes. | Reduces memory footprint by sharing loaded and linked class data in a cache. |
| 444 | Virtual Threads | 21 | Improves CPU utilization for I/O-bound tasks with lightweight threads. | Reduces memory footprint per concurrent task compared to platform threads. |
| 350 | Dynamic CDS Archives | 13 | Improves startup time by dynamically archiving and sharing application classes. | Reduces memory footprint by sharing the archive across JVM instances. |
| 310 | Application Class-Data Sharing | 10 | Improves startup time by memory-mapping shared application class data. | Reduces memory usage by sharing class data across multiple JVMs. |
| 197 | Segmented Code Cache | 9 | Improves performance through better code locality and reduced fragmentation. | Provides better control over JVM memory footprint by managing code heap segments. |
| 158 | Unified JVM Logging | 9 | Indirectly improves efficiency by facilitating quicker identification and resolution of performance bottlenecks. | Indirectly improves efficiency by facilitating quicker identification and resolution of memory-related issues. |
VI. Conclusion: Key Trends and Future Directions in Java Performance Optimization through JEPs
The analysis of these JDK Enhancement Proposals reveals several key trends in the ongoing efforts to optimize the performance and resource efficiency of the Java platform. A significant emphasis continues to be placed on refining the garbage collection mechanisms, with advancements seen across different collectors like G1 and ZGC. These improvements target both reducing CPU overhead through concurrent processing and parallelization, as well as minimizing memory footprint through techniques like generational collection and the ability to uncommit unused memory.
Memory footprint reduction remains a critical area of focus, with JEPs like Compact Object Headers and Compact Strings directly aiming to decrease the memory consumed by fundamental data structures. The Class-Data Sharing (CDS) feature, along with its extensions for applications and dynamic archiving, demonstrates a sustained effort to improve both startup time and memory usage by enabling the sharing of class metadata across JVM instances.
The introduction of virtual threads in JEP 444 represents a notable advancement in how Java handles concurrency, offering a more resource-efficient model for managing large numbers of concurrent tasks, particularly in I/O-bound applications. This signifies an adaptation to modern application architectures and the demands of high-throughput systems.
Furthermore, the ongoing optimization of specific runtime aspects, such as reflection and cryptography, highlights a comprehensive strategy for improving performance across various facets of Java application execution. The progression of features from experimental to production-ready status, as seen with ZGC and Shenandoah, indicates a mature and iterative approach to platform enhancement.
Looking towards the future, continued evolution in areas such as scoped values and structured concurrency could further impact resource efficiency. Moreover, ongoing projects like Valhalla, with its aim of introducing value types, promise additional opportunities for enhancing both the performance and memory usage of Java applications by allowing more efficient data representation and manipulation. The continuous stream of JEPs focused on these critical areas underscores the Java platform's commitment to remaining a high-performance and resource-efficient choice for a wide range of development needs.
| Feature | MQTT | Apache Kafka |
|---|---|---|
| Protocol Type | Lightweight messaging protocol | Distributed streaming platform |
| Designed For | IoT devices, constrained networks | High-throughput, large-scale data pipelines |
| Message Model | Pub/Sub | Pub/Sub + log-based storage |
| Persistence | Optional, often not used | Built-in, durable, distributed log |
| QoS Levels | 0 (at most once), 1 (at least once), 2 (exactly once) | At least once (default), exactly once (configurable) |
| Broker Complexity | Lightweight (e.g., Mosquitto) | Heavy (e.g., Kafka needs Zookeeper or KRaft) |
| Latency | Low | Low to medium, optimized for throughput |
| Storage | Minimal to none | Retains messages (configurable retention) |
| Scalability | Limited (single broker common) | Highly scalable and distributed |
| Security | TLS, username/password | TLS, SASL, ACLs, etc. |
| Typical Use Cases | IoT sensors, mobile devices, telemetry | Event sourcing, log aggregation, analytics pipelines |
Integrated Data Store (IDS)
Developed by Charles Bachman at General Electric, IDS introduced the network data model, one of the earliest database systems.
IBM Information Management System (IMS)
Released by IBM, IMS used a hierarchical model, organizing data in tree-like structures.
Edgar F. Codd’s Relational Model
Edgar Codd published a groundbreaking paper introducing the relational database model, laying the foundation for modern databases.
VSAM (Virtual Storage Access Method)
Introduced by IBM for mainframe systems, VSAM was a file access method that enhanced data retrieval and storage efficiency for large datasets.
IBM System R
IBM’s System R was one of the first relational database management systems (RDBMS), introducing SQL (Structured Query Language) to query relational data.
Adabas
Developed by Software AG, Adabas was a high-performance inverted list database optimized for large-scale transaction processing.
Oracle Database
Released by Oracle Corporation, this was the first commercial RDBMS based on Codd’s relational model.
IBM DB2
Built on System R, DB2 became a leading RDBMS for mainframes, widely adopted for robust transaction support.
SQL ANSI Standard
The American National Standards Institute (ANSI) standardized SQL, establishing it as the universal language for relational databases.
MySQL and PostgreSQL
These open-source RDBMS systems emerged: MySQL for its speed and simplicity, and PostgreSQL for its robustness and SQL standards compliance.
Redis
An in-memory data structure store, Redis functions as a database, cache, or message broker, known for its speed and versatility.
Google Spanner
A globally distributed RDBMS, Spanner combined relational consistency with NoSQL scalability, introducing synchronized global clocks for distributed transactions.