1963
Integrated Data Store (IDS)
Developed by Charles Bachman at General Electric, IDS introduced the network data model, one of the earliest database systems.
An interesting discussion started on X about the Cloudflare outage, with some people linking Rust to it, e.g.:
The Cloudflare outage was caused by an unwrap()
This is like saying that a gun killed someone… It wasn’t the gun; it was the person who pulled the trigger.
The Cloudflare developers certainly know about the risks of unwrap. The bug
was probably caused by a failed assumption: the config file will fit in
memory.
And even if the developer didn’t know that unwrap can panic:
We can’t blame a tool that is well-documented just because someone misused it.
From: https://doc.rust-lang.org/std/option/enum.Option.html#method.unwrap
Returns the contained [`Some`] value, consuming the `self` value.
Because this function may panic, its use is generally discouraged.
Panics are meant for unrecoverable errors, and
[may abort the entire program][panic-abort].
Instead, prefer to use pattern matching and handle the [`None`]
case explicitly, or call [`unwrap_or`], [`unwrap_or_else`], or
[`unwrap_or_default`]. In functions returning `Option`, you can use
[the `?` (try) operator][try-option].
# Panics
Panics if the self value equals [`None`].
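The alternatives the documentation lists can be sketched like this (an illustrative example of mine, not code from the original post):

```rust
// The `?` operator in a function returning Option: bail out early on None.
fn first_char_upper(s: &str) -> Option<char> {
    let c = s.chars().next()?; // returns None if the string is empty
    Some(c.to_ascii_uppercase())
}

fn main() {
    let maybe: Option<i32> = Some(7);

    // Pattern matching: handle the None case explicitly.
    match maybe {
        Some(n) => println!("got {n}"),
        None => println!("got nothing"),
    }

    // Fallbacks instead of panicking.
    assert_eq!(maybe.unwrap_or(0), 7);              // fixed default
    assert_eq!(maybe.unwrap_or_else(|| 0), 7);      // lazily computed default
    assert_eq!(None::<i32>.unwrap_or_default(), 0); // the type's Default value

    assert_eq!(first_char_upper("abc"), Some('A'));
    assert_eq!(first_char_upper(""), None);
}
```

None of these can panic: every failure path is either handled in place or propagated to the caller.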
The argument that languages and standard libraries should avoid having constructs that enable bad patterns is actually valid, though.
So let’s investigate:
It is best to read this post by Andrew Gallant (the ripgrep author) first: Using unwrap() in Rust is Okay
It’s a great and comprehensive post.
note: read everything. Initially it seems that he only uses unwrap for tests and documentation, but that is not the case.
Some important points:
If the value is always what the caller expects, then it follows that unwrap() and expect() will never result in a panic. If a panic does occur, then this generally corresponds to a violation of the expectations of the programmer. In other words, a runtime invariant was broken and it led to a bug.
This is starkly different from “don’t use unwrap() for error handling.” The key difference here is we expect errors to occur at some frequency, but we never expect a bug to occur. And when a bug does occur, we seek to remove the bug (or declare it as a problem that won’t be fixed).
Of the different ways to handle errors in Rust, this one is regarded as best practice:
One can handle errors as normal values, typically with Result<T, E>. If an error bubbles all the way up to the main function, one might print the error to stderr and then abort.
One of the most important parts of this approach is the ability to attach additional context to error values as they are returned to the caller. The anyhow crate makes this effortless.
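The recommended pattern — return Result, attach context as errors bubble up, report at the top — can be sketched with std only (the anyhow crate reduces the map_err below to a one-line .context() call; the file name and messages here are illustrative):

```rust
use std::fs;

// Attach context as the error bubbles up, so the final report says
// *what* failed, not just the raw OS error.
fn load_config(path: &str) -> Result<String, String> {
    fs::read_to_string(path).map_err(|e| format!("failed to read {path}: {e}"))
}

fn main() {
    // Report at the top level instead of unwrapping deep in the call stack.
    match load_config("definitely-missing.toml") {
        Ok(cfg) => println!("loaded {} bytes", cfg.len()),
        Err(err) => eprintln!("error: {err}"),
    }
}
```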
Of the use cases for unwrap, the runtime invariant is the one that can cause more controversy.
One could ask: if it is guaranteed to have a value/return ok, why is it an
Option or Result?
It’s a great question and it is well explained in So why not make all invariants compile-time invariants?
In essence, in many cases that will lead to much more complex code and lots of duplication.
I’ve personally never found this particular conceptualization to be helpful. The problem, as I see it, is the ambiguity in determining whether a particular error is “recoverable” or not. What does it mean, exactly?
I don’t see ambiguity there, and it’s a very important distinction that has to
be made.
Basically: can the program continue working in a valid state?
Until you find a way to recover, it is unrecoverable.
It will depend on the program and its use case.
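To make the distinction concrete, a hypothetical sketch of mine (the functions and the invariant are invented, not from any of the quoted code): a malformed input from outside is recoverable; a broken internal invariant is not.

```rust
// Recoverable: bad input arriving from outside. Reject it and keep going.
fn handle_request(input: &str) -> Result<u32, String> {
    input.parse::<u32>().map_err(|_| format!("bad request: {input:?}"))
}

// Unrecoverable: count == 0 here means our own bookkeeping is broken.
// Continuing would produce garbage, so panicking (assert) is the honest choice.
fn average_per_item(total: u32, count: u32) -> u32 {
    assert!(count > 0, "invariant violated: count must be non-zero");
    total / count
}

fn main() {
    assert!(handle_request("42").is_ok());
    assert!(handle_request("not a number").is_err()); // program keeps running
    assert_eq!(average_per_item(10, 2), 5);
}
```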
crates/core/haystack.rs
impl Haystack {
pub(crate) fn path(&self) -> &Path {
if self.strip_dot_prefix && self.dent.path().starts_with("./") {
self.dent.path().strip_prefix("./").unwrap()
} else {
self.dent.path()
}
}
}
crates/core/main.rs
if let Some(ref mut stats) = stats {
*stats += search_result.stats().unwrap();
}
if matched && args.quit_after_match() {
break;
}
tokio/src/io/poll_evented.rs
/// Deregisters the inner io from the registration and returns a Result containing the inner io.
#[cfg(any(feature = "net", feature = "process"))]
pub(crate) fn into_inner(mut self) -> io::Result<E> {
let mut inner = self.io.take().unwrap(); // As io shouldn't ever be None, just unwrap here.
self.registration.deregister(&mut inner)?;
Ok(inner)
}
serde/src/private/de.rs
fn deserialize_seq<V>(self, visitor: V) -> Result<V::Value, Self::Error>
where
V: Visitor<'de>,
{
let mut pair_visitor = PairVisitor(Some(self.0), Some(self.1), PhantomData);
let pair = tri!(visitor.visit_seq(&mut pair_visitor));
if pair_visitor.1.is_none() {
Ok(pair)
} else {
let remaining = pair_visitor.size_hint().unwrap();
// First argument is the number of elements in the data, second
// argument is the number of elements expected by the Deserialize.
Err(de::Error::invalid_length(2, &ExpectedInSeq(2 - remaining)))
}
}
serde_derive/src/internals/ctxt.rs
/// Add one of Syn's parse errors.
pub fn syn_error(&self, err: syn::Error) {
self.errors.borrow_mut().as_mut().unwrap().push(err);
}
https://github.com/clap-rs/clap/blob/8d87e70789ed7832aa93473779d248090712d66b/clap_builder/src/util/flat_map.rs#L172
pub(crate) fn or_insert_with<F: FnOnce() -> V>(self, default: F) -> &'a mut V {
match self {
Entry::Occupied(entry) => &mut entry.v.values[entry.index],
Entry::Vacant(entry) => {
entry.v.keys.push(entry.key);
entry.v.values.push(default());
entry.v.values.last_mut().unwrap()
}
}
}
https://github.com/clap-rs/clap/blob/8d87e70789ed7832aa93473779d248090712d66b/clap_builder/src/util/color.rs#L83
impl std::str::FromStr for ColorChoice {
type Err = String;
fn from_str(s: &str) -> Result<Self, Self::Err> {
for variant in Self::value_variants() {
if variant.to_possible_value().unwrap().matches(s, false) {
return Ok(*variant);
}
}
Err(format!("invalid variant: {s}"))
}
}
https://github.com/seanmonstar/reqwest/blob/a2aa5a34e48724be0c1089b0f5afe49b82ece30e/src/async_impl/h3_client/pool.rs#L63
/// Forget the lock and return corresponding Key
fn forget(mut self) -> Key {
// Unwrap is safe because the Option can be None only after dropping the
// lock
self.0.take().unwrap().key
}
From the valid use cases above, I think unwrap has its place.
Gemini 3 response:
This is one of the most debated philosophical questions in the Rust community.
The short answer is no. While unwrap is the source of many production crashes, removing it entirely would make the language significantly more frustrating to use and, paradoxically, could lead to messier code.
Here is an analysis of why unwrap exists, why it is dangerous, and why we still need it.
**The case against unwrap (“Why it’s bad”)**
If you are asking this, you have likely encountered a production crash caused by a lazy unwrap(). Rust’s type system forces you to acknowledge failure up front (via Option and Result). unwrap is an explicit instruction to the compiler to ignore that safety and crash if things go wrong. And the generic panic message — called Result::unwrap() on an Err value — tells you nothing about why it failed, unlike expect("Failed to load config").
**The case for unwrap (“Why we need it”)**
If Rust removed unwrap today, developers would immediately re-implement it. Here are the three scenarios where unwrap is actually the correct tool:
When learning the language or sketching out an algorithm, you don’t want to spend 50% of your time writing error handling boilerplate. You want to see if the logic works. unwrap allows you to defer error handling until the logic is proven.
In a test environment, a panic is exactly what you want. If a setup step returns an error, the test should fail immediately.
#[test]
fn test_calculation() {
// If this fails, the test fails. Perfect usage.
let result = complex_calculation().unwrap();
assert_eq!(result, 42);
}
Sometimes, you possess logic that the compiler cannot verify. If you are mathematically certain a value cannot be None or Err, handling the error is dead code that clutters the screen.
Example:
// We know "10" is a valid integer.
// Handling the error here would be verbose and pointless.
let number: u32 = "10".parse().unwrap();
In this case, unwrap acts as an assertion. You are telling the compiler: “Trust me. If I am wrong, I deserve to crash.”
If you feel the urge to use unwrap, Rust provides alternatives that are almost always better in production code.
**expect()**
Identical to unwrap, but lets you provide a custom error message:
- file.open().unwrap() (crashes with a generic error)
- file.open().expect("CRITICAL: Configuration file missing") (crashes with context)

**? Operator (Propagation)**
Instead of crashing, pass the problem to the caller:
file.open()?
This keeps the current function safe and lets the parent function decide how to handle the failure.

**unwrap_or / unwrap_or_else**
Provide a fallback value instead of crashing.
// If the env var is missing, don't crash; just use port 8080.
let port = std::env::var("PORT").unwrap_or("8080".to_string());
| Tool | Behavior | Best Use Case |
|---|---|---|
| unwrap() | Panic on failure | Quick scripts, prototypes, unit tests. |
| expect() | Panic with message | When you are 100% sure it won’t fail (invariants). |
| ? | Return error | Production code, libraries, standard logic. |
| unwrap_or | Return fallback | When a default value is acceptable. |
It wouldn’t be better if Rust didn’t have unwrap, because panicking is sometimes the correct response. However, it is better if you use a linter (like Clippy) to forbid unwrap in your production codebase, forcing you to use expect or proper error propagation.
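Clippy does ship restriction lints for exactly this, enabled crate-wide with an attribute (shown with warn here; deny also works):

```rust
// At the top of lib.rs or main.rs; enforced when running `cargo clippy`.
#![warn(clippy::unwrap_used, clippy::expect_used)]
```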
fn fetch_features(..., features: &mut Features)
-> Result<(), (ErrorFlags, i32)> {
...
let (feature_values, _) = features
.append_with_names(&self.config.feature_names)
.unwrap();
}
Possible cases:
- append_with_names really can’t fail: then it should not return an Option/Result. Callers should not have to think about it. If there is a bug, it should panic inside append_with_names.
- append_with_names can fail: callers should not use unwrap, and the bad case should be handled properly.
- append_with_names can fail, but not at that point: then the assumption failed.

The bug happened because a bad configuration file was sent to this code.
The root cause was creating and allowing this bad configuration file to spread.
Now the question is: should the Rust code handle that?
Maybe, that’s up to Cloudflare to decide.
As it’s part of a critical system that affects many other companies, it
probably should be handled.
One option might be to accept only the features that fit in the
pre-allocated memory, and log an error or send an alarm if the threshold is hit.
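That mitigation could look roughly like this sketch; clamp_features, MAX_FEATURES, and the log line are all hypothetical, not Cloudflare’s actual code:

```rust
const MAX_FEATURES: usize = 200; // hypothetical pre-allocated capacity

/// Keep only the features that fit, and report how many were dropped
/// so an alarm can fire instead of the process panicking.
fn clamp_features(feature_names: Vec<String>) -> (Vec<String>, usize) {
    let dropped = feature_names.len().saturating_sub(MAX_FEATURES);
    let mut kept = feature_names;
    kept.truncate(MAX_FEATURES);
    if dropped > 0 {
        eprintln!("config exceeded capacity: dropped {dropped} features");
    }
    (kept, dropped)
}

fn main() {
    let names: Vec<String> = (0..250).map(|i| format!("f{i}")).collect();
    let (kept, dropped) = clamp_features(names);
    assert_eq!(kept.len(), 200);
    assert_eq!(dropped, 50);
}
```

The service would then run with a degraded feature set instead of crashing, which may or may not be acceptable — that is the product decision the post is pointing at.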
Cloudflare outage on November 18, 2025
Richard Feldman: The Cloudflare outage was caused by an unwrap()
Using unwrap() in Rust is Okay
ThePrimeTime - Another Internet outage???
Blessed.rs An unofficial guide to the Rust ecosystem
Two points, if I may:
In short: if a programmer uses a panic, that just means the program does not have a way to handle the error. Whether that is the right choice, and whether that is an actual bug or not if it happens, and whether abort or unwind should be used, etc., it all depends on the requirements.
# Appendix B
**Prompting Gemini 3: Show me some valid uses of unwrap in important Rust crates**
It is validating to see that even the "gods" of the Rust ecosystem use `unwrap()`. However, they use it very differently than a beginner does.
In high-quality crates, `unwrap()` is rarely used to say "I hope this works." It is used to say **"It is logically impossible for this to fail, and if it does, the world is broken."**
Here are four specific, valid patterns found in major crates.
-----
### 1\. The "Poisoned" Mutex (Standard Library)
This is arguably the most common `unwrap` in production Rust code.
**The Pattern:** Locking a Mutex.
**Where you see it:** Almost every multi-threaded crate (`std`, `tokio`, `actix`).
```rust
use std::sync::{Arc, Mutex};
let data = Arc::new(Mutex::new(0));
// 👇 This is considered idiomatic, valid Rust.
let mut lock = data.lock().unwrap();
```
Why it’s valid:
Mutex::lock only returns an Err if a previous thread holding the lock panicked (this is called “poisoning”). If a thread crashed while modifying the data, the data is likely corrupted, and refusing to continue is usually the right response.
### 2\. Hardcoded Inputs (Regex Compilation)
**The Pattern:** compiling a hardcoded string into a complex object during startup.
**Where you see it:** regex, once_cell, lazy_static.
use regex::Regex;
// We are hardcoding the pattern. We aren't accepting user input here.
let re = Regex::new(r"^\d{4}-\d{2}-\d{2}$").unwrap();
Why it’s valid: The input string is a string literal written by the programmer. It will not change at runtime.
The unwrap acts as an assertion: “I refuse to handle a RegexError that can never happen once the code is correct.”
### 3\. Infallible Writes (std::fmt::Write on a String)
**The Pattern:** Using the std::fmt::Write trait on a String.
**Where you see it:** serde_json, log, and formatters.
The Write trait is designed for I/O (files, network), so it returns a Result because disks can be full or networks can disconnect. However, you can also implement Write for a standard String (RAM).
use std::fmt::Write;
let mut buffer = String::new();
// We are writing to RAM, not a disk.
// The only way this fails is OOM (Out of Memory), which aborts anyway.
write!(&mut buffer, "Hello {}", "World").unwrap();
Why it’s valid:
Writing to a String in memory does not produce I/O errors. The Result is a formality of the Trait signature.
### 4\. Checked Invariants
**The Pattern:** You have checked a condition explicitly, so the subsequent operation is guaranteed.
**Where you see it:** crossbeam, rand, Vec implementations.
Imagine you have a logic flow where you check a queue size.
// Simplified logic often found in channel implementations
if let Some(item) = queue.pop() {
// We just proved 'item' exists.
process(item);
} else {
// ...
}
// But sometimes, the check is separated:
if queue.is_empty() {
return;
}
// We checked is_empty() above.
// We know pop() returns Some().
let item = queue.pop().unwrap();
Why it’s valid: The compiler’s borrow checker is smart, but it isn’t omniscient. Sometimes you (the human) know the state of the system better than the compiler.
The unwrap says: “I have already checked; the None case is unreachable code.”

When a beginner uses unwrap, it means:
“I’m too lazy to handle the error right now.”
When a library author uses unwrap, it means:
“I have proven that this error is impossible, or if it happens, the program is in a fatal state.”
I didn’t know about intrusive data structures, then I saw it mentioned in this X thread:
CedarDB on why they picked C++ over Rust
Let’s learn about it.
container_of implementations

A simple linked list in C:
typedef struct Node {
int data;
struct Node *next;
} Node;
An intrusive linked list:
typedef struct IntrusiveLink {
struct IntrusiveLink *next;
} IntrusiveLink;
Where is the data?! =)
Here it is:
// Our data structure that we want to put into the list
// It *must* contain an IntrusiveLink member.
typedef struct MyIntData {
int value; // The actual data we care about
IntrusiveLink node; // The embedded list hook
} MyIntData;
How does it work?
The real magic is in this macro:
// ptr: pointer to the member (e.g., &myIntDataInstance->node)
// type: the type of the containing struct (e.g., MyIntData)
// member: the name of the member (e.g., node)
#define container_of(ptr, type, member) ({ \
const typeof( ((type *)0)->member ) *__mptr = (ptr); \
(type *)( (char *)__mptr - offsetof(type,member) );})
With this macro, you can get from a node pointer back to the struct containing the intrusive node (container_of) by passing the node pointer, the container type, and the field name, e.g.:
MyIntData* data_item = container_of(current_link, MyIntData, node);
printf("%d -> ", data_item->value);
The main difference between an intrusive collection and a normal one is that while normal collections allocate memory behind your back to keep track of a set of values, intrusive collections never allocate memory themselves and instead keep track of a set of objects. Such collections are called intrusive because they require explicit support in objects to allow them to be inserted into a collection.
https://docs.rs/intrusive-collections/latest/intrusive_collections/
Amanieu:
The main use case for intrusive collections is to allow an object to be a member of multiple collections simultaneously. Say you are writing a game and you have a set of objects. You will have one main list of all objects, and a separate list of objects that are “owned” by a player. If a player dies, all objects owned by him need to be removed. This can be implemented by making objects belong to two lists at once.
https://www.reddit.com/r/rust/comments/46yubp/comment/d090zxg/
Usually a linked list structure contains a pointer to the item. The implementation of linked list in Linux kernel does not. So the main question is - where does the list store the data? The actual implementation of linked list in the kernel is - Intrusive list. An intrusive linked list does not contain data in its nodes - A node just contains pointers to the next and previous node and list nodes part of the data that are added to the list. This makes the data structure generic, so it does not care about entry data type anymore.
https://github.com/0xAX/linux-insides/blob/master/DataStructures/linux-datastructures-1.md
This link is great and it has a clear explanation and examples. Take a look! =)
Intrusive data structures represent a specialized class of linked data structures where the organizational metadata, such as next and prev pointers for a list or child/parent pointers for a tree, is embedded directly within the user’s data object. This contrasts sharply with traditional data structures, which typically reside in a separate, externally managed node object. The design paradigm of intrusive structures necessitates that the data object itself “cooperates” with the data structure by containing the necessary “hooks” or linking fields.
A fundamental characteristic of this approach is that intrusive containers do not store copies of user-provided values; instead, they directly manage and link the original objects, with the linking information intrinsically part of those objects. This means that the nodes of the data structure contain only the metadata (i.e., pointers to other nodes) but not the data itself. The programmer embeds these nodes into their data, thereby avoiding additional pointer indirections. This paradigm suggests a profound shift in design philosophy: instead of a container owning and managing external nodes that point to data, the data itself becomes “aware” of its participation in a data structure. This “self-awareness” (or “intrusion”) is the root cause of both the primary benefits, such as memory efficiency and cache locality, and the primary drawbacks, including the requirement for object modification and more complex lifetime management. This approach points to a tighter coupling between the data and its organizational structure, which is a hallmark of low-level systems programming and high-performance computing.
More here: https://g.co/gemini/share/2ed8ecea357e
Prompt: In an intrusive data structure, who should be responsible for freeing the actual object?
what if I just want to free the list, but still want to use the tree? Are there use cases for that?
Gemini:
Excellent questions. They get to the very heart of why and how intrusive data structures are used.
The short answer is: the code that allocated the object is responsible for freeing it.
Intrusive containers (like the list or tree) do not own the objects they contain. They are simply mechanisms for organizing pointers that live inside the objects. Think of the containers as different “views” or “indexes” into a set of data that you manage separately.
This is a fundamental shift from non-intrusive (e.g., std::list in C++) or standard library containers, where the container allocates its own nodes and is therefore responsible for freeing them.
Here are the common ownership patterns:
- A single owner frees every single object: one piece of code (typically whatever allocated the objects) iterates and frees them all. This is a simple and common approach when the objects’ lifetime is tied to one primary organization.
- Reference counting: embed a count in MyIntData. When you add an object to a container, you increment the count. When you remove it, you decrement the count. The object is only freed when its count reaches zero. This is the most robust method but adds complexity.

As for freeing just the list while still using the tree: yes, absolutely. This is one of the most powerful use cases for intrusive data structures.
However, you need to be precise about what “free the list” means. It does not mean iterating through the list and calling free() on each object, because that would deallocate the objects that the tree also points to, leaving you with a tree full of dangling pointers.
Instead, “freeing the list” means dismantling the list structure, leaving the objects intact. You would simply traverse the list and reset the next pointers, then set the list’s head pointer to NULL. The MyIntData objects themselves would remain allocated and valid, and the tree structure would be completely unaffected.
A function to do this would look like this:
// Dismantles the list structure WITHOUT freeing the underlying objects.
void intrusive_dismantle_list(IntrusiveLink** headRef) {
IntrusiveLink* current_link = *headRef;
while (current_link != NULL) {
IntrusiveLink* next_link = current_link->next;
current_link->next = NULL; // Unlink the node
current_link = next_link;
}
*headRef = NULL; // The list is now empty
}
This decoupling of the container’s lifetime from the data’s lifetime is extremely useful.
Temporary Re-ordering: Imagine your tree is the primary, canonical storage for your objects, sorted by value. For a specific task, you need to process these objects in the reverse order of their creation. You can temporarily build an intrusive list for this task, process it, and then simply dismantle the list when you’re done, leaving your tree untouched.
Filtering and Subsets: You have thousands of objects in your tree. You want to run an operation on a small subset that meets a certain criterion (e.g., value > 100). You can iterate through your tree and add only the matching objects to a temporary intrusive list. You then operate on that much smaller list, which is very efficient. Once finished, you dismantle the list.
State Management (e.g., Task Schedulers): An object might represent a task. It could live in a main tree of all known tasks. When a task is ready to run, it’s added to an intrusive run_queue list. When it’s waiting for I/O, it’s moved to a wait_queue list. The object itself is never created or destroyed; it’s just linked into different lists that represent its current state. This is a core pattern in OS kernel development.
Intrusive containers have some important advantages; Boost documents them in its intrusive vs. non-intrusive comparison:
https://www.boost.org/doc/libs/1_60_0/doc/html/intrusive/intrusive_vs_nontrusive.html
Performance comparison between Intrusive and Non-intrusive containers: https://www.boost.org/doc/libs/1_60_0/doc/html/intrusive/performance.html
https://news.ycombinator.com/item?id=43680655
There are several advantages to intrusive node-based data structures…:
The core motivations for their use are multifaceted and directly tied to performance optimization. They aim to avoid costly pointer indirections, which can incur run-time costs on every data read. Furthermore, they minimize dynamic memory allocations, which can be computationally expensive and unpredictable. A significant advantage is their ability to facilitate the inclusion of a single data object in multiple distinct data structures simultaneously. For example, an element might be part of several search trees and a priority queue, allowing efficient retrieval in different orders.
By Gemini:
#include <stdio.h>
#include <stdlib.h> // For malloc, free
#include <stddef.h> // For offsetof
// Define the "list hook" that will be embedded in our data structures
// For a singly linked list, we only need a 'next' pointer.
typedef struct IntrusiveLink {
struct IntrusiveLink *next;
} IntrusiveLink;
// Our data structure that we want to put into the list
// It *must* contain an IntrusiveLink member.
typedef struct MyIntData {
int value; // The actual data we care about
IntrusiveLink node; // The embedded list hook
} MyIntData;
MyIntData* newIntData(int value) {
MyIntData* data = (MyIntData*)malloc(sizeof(MyIntData));
data->value = value;
data->node.next = NULL; // Initialize its embedded link
return data;
}
// A macro to get the containing structure from a pointer to its member
// This is crucial for intrusive lists.
// ptr: pointer to the member (e.g., &myIntDataInstance->node)
// type: the type of the containing struct (e.g., MyIntData)
// member: the name of the member (e.g., node)
#define container_of(ptr, type, member) ({ \
const typeof( ((type *)0)->member ) *__mptr = (ptr); \
(type *)( (char *)__mptr - offsetof(type,member) );})
// Function to insert a new data structure at the beginning of the intrusive list
// headRef: A pointer to the head of the list (which is an IntrusiveLink*)
// new_data_ptr: A pointer to the MyIntData instance to add
void intrusive_insert_at_beginning(IntrusiveLink** headRef, MyIntData* new_data_ptr) {
// The new node's 'next' points to the current head
new_data_ptr->node.next = *headRef;
// The head is updated to point to the new node's embedded link
*headRef = &(new_data_ptr->node);
}
// Function to print the intrusive linked list
void intrusive_print_list(IntrusiveLink* head) {
IntrusiveLink* current_link = head;
printf("Intrusive Linked List: ");
while (current_link != NULL) {
// Use container_of to get the MyIntData structure from its embedded link
MyIntData* data_item = container_of(current_link, MyIntData, node);
printf("%d -> ", data_item->value);
current_link = current_link->next;
}
printf("NULL\n");
}
// Function to free the memory allocated for the intrusive linked list
// Note: This frees the MyIntData objects themselves, not just "nodes".
void intrusive_free_list(IntrusiveLink** headRef) {
IntrusiveLink* current_link = *headRef;
IntrusiveLink* next_link;
while (current_link != NULL) {
next_link = current_link->next;
// Get the containing data structure
MyIntData* data_item = container_of(current_link, MyIntData, node);
free(data_item); // Free the entire MyIntData object
current_link = next_link;
}
*headRef = NULL;
printf("Intrusive List freed.\n");
}
int main() {
// The head of our intrusive list. It points to an IntrusiveLink
// embedded within a MyIntData structure.
IntrusiveLink* head = NULL;
MyIntData* data1 = newIntData(10);
MyIntData* data2 = newIntData(20);
MyIntData* data3 = newIntData(30);
MyIntData* data4 = newIntData(40);
// Insert elements into the intrusive list
// We pass the address of the embedded 'node' member
intrusive_insert_at_beginning(&head, data1); // List: 10 -> NULL
intrusive_insert_at_beginning(&head, data2); // List: 20 -> 10 -> NULL
intrusive_insert_at_beginning(&head, data3); // List: 30 -> 20 -> 10 -> NULL
intrusive_insert_at_beginning(&head, data4); // List: 40 -> 30 -> 20 -> 10 -> NULL
// Print the list
intrusive_print_list(head);
// Add another element
MyIntData* data5 = (MyIntData*)malloc(sizeof(MyIntData));
data5->value = 5;
data5->node.next = NULL;
intrusive_insert_at_beginning(&head, data5);
intrusive_print_list(head);
// Free the allocated memory
intrusive_free_list(&head);
// Try to print after freeing (should show NULL or be empty)
intrusive_print_list(head);
return 0;
}
It shows how different structures can use the same intrusive link definition.
Here we have Person and IntegerWrapper:
typedef struct Person {
IntrusiveNode node;
char *name;
int age;
} Person;
typedef struct IntegerWrapper {
IntrusiveNode node;
int value;
} IntegerWrapper;
By Gemini:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stddef.h> // <-- Add this header for offsetof
// Define a generic intrusive node structure.
// Any structure that wants to be part of an intrusive list will embed this.
typedef struct IntrusiveNode {
struct IntrusiveNode *next;
} IntrusiveNode;
// The LinkedList now just holds a pointer to the head of the intrusive nodes.
typedef struct IntrusiveLinkedList {
IntrusiveNode *head;
} IntrusiveLinkedList;
void initIntrusiveList(IntrusiveLinkedList *list) {
list->head = NULL;
}
// Function to insert a node at the end of the intrusive list.
// The 'node_to_insert' is already part of the data structure.
void intrusiveInsertEnd(IntrusiveLinkedList *list, IntrusiveNode *node_to_insert) {
node_to_insert->next = NULL; // Ensure the new node points to NULL initially
if (list->head == NULL) {
list->head = node_to_insert;
return;
}
IntrusiveNode *temp = list->head;
while (temp->next != NULL) {
temp = temp->next;
}
temp->next = node_to_insert;
}
// Function to print the intrusive list.
// Requires a print function that can cast the IntrusiveNode* back to the original struct.
void printIntrusiveList(IntrusiveLinkedList *list, void (*printFunc)(IntrusiveNode *)) {
IntrusiveNode *temp = list->head;
while (temp != NULL) {
printFunc(temp);
printf(", ");
temp = temp->next;
}
printf("\n");
}
// The free function for an intrusive list typically does nothing, as the memory
// for the nodes is managed by the data structures themselves.
// If the data structures themselves allocate internal memory (like char* for name),
// that should be freed by the caller or a specific free function for the data type.
void freeIntrusiveList(IntrusiveLinkedList *list) {
// In an intrusive list, we don't free the nodes here,
// as they are part of larger allocated structures.
// The responsibility of freeing the memory lies with the owner of the data structures.
list->head = NULL; // Just clear the head of the list
}
// Example data structure: Person, now containing the intrusive node.
typedef struct Person {
IntrusiveNode node; // Embedded intrusive node
char *name;
int age;
} Person;
// Helper to get the containing Person struct from the IntrusiveNode.
// This is the standard container_of pattern, implemented with offsetof.
#define container_of(ptr, type, member) ({ \
const typeof( ((type *)0)->member ) *__mptr = (ptr); \
(type *)( (char *)__mptr - offsetof(type,member) );})
void printPerson(IntrusiveNode *node) {
// Cast the IntrusiveNode back to the containing Person struct
Person *p = container_of(node, Person, node);
printf("%s is %d years old", p->name, p->age);
}
// Example data structure: IntegerWrapper, for demonstrating integers
typedef struct IntegerWrapper {
IntrusiveNode node;
int value;
} IntegerWrapper;
void printInt(IntrusiveNode *node) {
IntegerWrapper *iw = container_of(node, IntegerWrapper, node);
printf("%d ", iw->value);
}
void testIntList() {
IntrusiveLinkedList list;
initIntrusiveList(&list);
// Allocate IntegerWrapper structs on the heap or stack
IntegerWrapper val1 = {.value = 10};
IntegerWrapper val2 = {.value = 20};
IntegerWrapper val3 = {.value = 30};
IntegerWrapper val4 = {.value = 40};
intrusiveInsertEnd(&list, &val1.node);
intrusiveInsertEnd(&list, &val2.node);
intrusiveInsertEnd(&list, &val3.node);
intrusiveInsertEnd(&list, &val4.node);
printIntrusiveList(&list, printInt);
// For stack-allocated data, no explicit free is needed here.
// If allocated on heap, they would need to be freed by the caller.
freeIntrusiveList(&list); // This just clears the list head, doesn't free nodes
}
void testPersonList() {
IntrusiveLinkedList list;
initIntrusiveList(&list);
// Allocate Person structs on the heap to demonstrate memory management
Person *p1 = (Person *)malloc(sizeof(Person));
p1->name = strdup("Marco"); // strdup allocates memory for the string
p1->age = 22;
Person *p2 = (Person *)malloc(sizeof(Person));
p2->name = strdup("Mary");
p2->age = 20;
intrusiveInsertEnd(&list, &p1->node);
intrusiveInsertEnd(&list, &p2->node);
printIntrusiveList(&list, printPerson);
// When dealing with heap-allocated intrusive nodes, you must manually free them
// along with any dynamically allocated members (like 'name' here).
// This often involves iterating the list and freeing each container.
IntrusiveNode *current = list.head;
IntrusiveNode *next;
while (current != NULL) {
next = current->next;
Person *p = container_of(current, Person, node);
free(p->name); // Free the dynamically allocated name
free(p); // Free the Person struct itself
current = next;
}
freeIntrusiveList(&list); // Clears the list head
}
int main() {
testIntList();
printf("\n");
testPersonList();
return 0;
}
#include <stdio.h>
#include <stdlib.h> // For malloc, free
#include <stddef.h> // For offsetof
// A macro to get the containing structure from a pointer to its member
// This is crucial for intrusive data structures.
// ptr: pointer to the member (e.g., &myIntDataInstance->node)
// type: the type of the containing struct (e.g., MyIntData)
// member: the name of the member (e.g., node)
#define container_of(ptr, type, member) ({ \
const typeof( ((type *)0)->member ) *__mptr = (ptr); \
(type *)( (char *)__mptr - offsetof(type,member) );})
//====================================================================
// Intrusive Linked List Structures and Functions
//====================================================================
// Define the "list hook" that will be embedded in our data structures
typedef struct IntrusiveLink {
struct IntrusiveLink *next;
} IntrusiveLink;
//====================================================================
// Intrusive Binary Tree Structures and Functions
//====================================================================
// Define the "tree hook" for an intrusive binary tree
typedef struct IntrusiveTreeNode {
struct IntrusiveTreeNode *left;
struct IntrusiveTreeNode *right;
} IntrusiveTreeNode;
//====================================================================
// Our Main Data Structure with Multiple Intrusive Hooks
//====================================================================
// Our data structure now contains hooks for both a list and a tree.
typedef struct MyIntData {
int value;
IntrusiveLink node; // The embedded list hook
IntrusiveTreeNode tree_node; // The embedded binary tree hook
} MyIntData;
// Factory function to create and initialize a new data object
MyIntData* newIntData(int value) {
MyIntData* data = (MyIntData*)malloc(sizeof(MyIntData));
if (!data) {
perror("Failed to allocate memory for MyIntData");
exit(EXIT_FAILURE);
}
data->value = value;
// Initialize both intrusive hooks
data->node.next = NULL;
data->tree_node.left = NULL;
data->tree_node.right = NULL;
return data;
}
//====================================================================
// Functions for Managing the Intrusive List
//====================================================================
// Function to insert a new data structure at the beginning of the intrusive list
void intrusive_insert_at_beginning(IntrusiveLink** headRef, MyIntData* new_data_ptr) {
new_data_ptr->node.next = *headRef;
*headRef = &(new_data_ptr->node);
}
// Function to print the intrusive linked list
void intrusive_print_list(IntrusiveLink* head) {
IntrusiveLink* current_link = head;
printf("Intrusive Linked List (insertion order): ");
while (current_link != NULL) {
MyIntData* data_item = container_of(current_link, MyIntData, node);
printf("%d -> ", data_item->value);
current_link = current_link->next;
}
printf("NULL\n");
}
// Function to free the memory allocated for the intrusive linked list
void intrusive_free_list(IntrusiveLink** headRef) {
IntrusiveLink* current_link = *headRef;
IntrusiveLink* next_link;
while (current_link != NULL) {
next_link = current_link->next;
MyIntData* data_item = container_of(current_link, MyIntData, node);
free(data_item); // Free the entire MyIntData object
current_link = next_link;
}
*headRef = NULL;
printf("Intrusive List freed. All underlying MyIntData objects have been deallocated.\n");
}
// Dismantles the list structure WITHOUT freeing the underlying objects.
void intrusive_dismantle_list(IntrusiveLink** headRef) {
IntrusiveLink* current_link = *headRef;
while (current_link != NULL) {
IntrusiveLink* next_link = current_link->next;
current_link->next = NULL; // Unlink the node
current_link = next_link;
}
*headRef = NULL;
}
//====================================================================
// Functions for Managing the Intrusive Binary Tree
//====================================================================
// Function to insert a new data structure into the intrusive binary search tree
void intrusive_tree_insert(IntrusiveTreeNode** rootRef, MyIntData* new_data_ptr) {
if (*rootRef == NULL) {
*rootRef = &(new_data_ptr->tree_node);
return;
}
IntrusiveTreeNode* current = *rootRef;
while (1) {
MyIntData* current_data = container_of(current, MyIntData, tree_node);
if (new_data_ptr->value < current_data->value) {
if (current->left == NULL) {
current->left = &(new_data_ptr->tree_node);
break;
}
current = current->left;
} else {
if (current->right == NULL) {
current->right = &(new_data_ptr->tree_node);
break;
}
current = current->right;
}
}
}
// Recursive function to print the tree using in-order traversal
void intrusive_print_tree_in_order_recursive(IntrusiveTreeNode* root) {
if (root == NULL) {
return;
}
intrusive_print_tree_in_order_recursive(root->left);
MyIntData* data_item = container_of(root, MyIntData, tree_node);
printf("%d ", data_item->value);
intrusive_print_tree_in_order_recursive(root->right);
}
// Wrapper function to print the entire tree
void intrusive_print_tree(IntrusiveTreeNode* root) {
printf("Intrusive Binary Tree (sorted order): ");
intrusive_print_tree_in_order_recursive(root);
printf("\n");
}
//====================================================================
// Main Demo
//====================================================================
int main() {
// Heads for our two intrusive data structures
IntrusiveLink* list_head = NULL;
IntrusiveTreeNode* tree_root = NULL;
printf("Creating and inserting data into both list and tree...\n");
// Create several data objects
MyIntData* data1 = newIntData(30);
MyIntData* data2 = newIntData(10);
MyIntData* data3 = newIntData(40);
MyIntData* data4 = newIntData(20);
MyIntData* data5 = newIntData(50);
// Insert each object into BOTH data structures
intrusive_insert_at_beginning(&list_head, data1);
intrusive_tree_insert(&tree_root, data1);
intrusive_insert_at_beginning(&list_head, data2);
intrusive_tree_insert(&tree_root, data2);
intrusive_insert_at_beginning(&list_head, data3);
intrusive_tree_insert(&tree_root, data3);
intrusive_insert_at_beginning(&list_head, data4);
intrusive_tree_insert(&tree_root, data4);
intrusive_insert_at_beginning(&list_head, data5);
intrusive_tree_insert(&tree_root, data5);
printf("\n--- Data Structures Populated ---\n");
// Print both structures to show the same data is organized differently
intrusive_print_list(list_head);
intrusive_print_tree(tree_root);
printf("----------------------------------\n\n");
// Free the memory. We only need to do this once.
// We iterate through the list to free each MyIntData object.
// After this, the pointers in the tree (tree_root) are dangling.
intrusive_free_list(&list_head);
// The tree's pointers are now invalid. We must set its root to NULL
// to avoid using dangling pointers.
tree_root = NULL;
printf("Tree root set to NULL as its memory has been freed via the list.\n\n");
printf("--- Data Structures After Freeing ---\n");
intrusive_print_list(list_head);
intrusive_print_tree(tree_root);
printf("-----------------------------------\n");
return 0;
}
Prompt: Create a C code example for the task scheduler use case:
State Management (e.g., Task Schedulers): An object might represent a task. It could live in a main tree of all known tasks. When a task is ready to run, it’s added to an intrusive run_queue list. When it’s waiting for I/O, it’s moved to a wait_queue list. The object itself is never created or destroyed; it’s just linked into different lists that represent its current state. This is a core pattern in OS kernel development.
Gemini:
Here is a C code example that implements a simple task scheduler using intrusive data structures.
A single Task object can exist in the main all_tasks_tree for its entire lifetime while being moved between the run_queue and wait_queue lists without any additional memory allocations. The lists and the tree simply manage pointers embedded within the Task object itself.
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
#include <string.h>
// Macro to get the containing structure from a pointer to its member
#define container_of(ptr, type, member) \
((type *)((char *)(ptr) - offsetof(type, member)))
//====================================================================
// Intrusive Data Structure "Hooks"
//====================================================================
// A link for a doubly-linked list, useful for efficient queue removal
typedef struct IntrusiveLink {
struct IntrusiveLink *next;
struct IntrusiveLink *prev;
} IntrusiveLink;
// A node for a binary search tree
typedef struct IntrusiveTreeNode {
struct IntrusiveTreeNode *left;
struct IntrusiveTreeNode *right;
} IntrusiveTreeNode;
//====================================================================
// The Task Object
//====================================================================
typedef enum {
TASK_STATE_NEW,
TASK_STATE_READY,
TASK_STATE_WAITING,
TASK_STATE_DONE
} TaskState;
// The core object containing data and intrusive hooks
typedef struct Task {
int task_id;
const char* name;
TaskState state;
IntrusiveLink queue_link; // Hook for run/wait queues
IntrusiveTreeNode tree_node; // Hook for the master task tree
} Task;
//====================================================================
// "Scheduler" Global State
//====================================================================
// Master tree of all tasks that have ever been created, indexed by ID
IntrusiveTreeNode* all_tasks_tree = NULL;
// Queues representing the current state of tasks
IntrusiveLink run_queue = { &run_queue, &run_queue }; // Sentinels for an empty list
IntrusiveLink wait_queue = { &wait_queue, &wait_queue };
//====================================================================
// Intrusive Queue (List) Management
//====================================================================
// Removes a link from whichever queue it is currently in
void queue_unlink(IntrusiveLink* link) {
link->prev->next = link->next;
link->next->prev = link->prev;
link->next = link->prev = NULL; // Isolate the link
}
// Adds a link to the back of a queue
void queue_push_back(IntrusiveLink* head, IntrusiveLink* new_link) {
new_link->prev = head->prev;
new_link->next = head;
head->prev->next = new_link;
head->prev = new_link;
}
// Removes and returns the link at the front of a queue
IntrusiveLink* queue_pop_front(IntrusiveLink* head) {
if (head->next == head) return NULL; // Queue is empty
IntrusiveLink* link = head->next;
queue_unlink(link);
return link;
}
//====================================================================
// Core Task Management & Scheduler Logic
//====================================================================
// Factory to create a new task
Task* create_task(int id, const char* name) {
Task* task = (Task*)malloc(sizeof(Task));
task->task_id = id;
task->name = name;
task->state = TASK_STATE_NEW;
task->queue_link.next = task->queue_link.prev = NULL;
task->tree_node.left = task->tree_node.right = NULL;
// Add to the master tree (simple BST insertion)
IntrusiveTreeNode** current = &all_tasks_tree;
while (*current) {
Task* existing_task = container_of(*current, Task, tree_node);
if (task->task_id < existing_task->task_id) {
current = &(*current)->left;
} else {
current = &(*current)->right;
}
}
*current = &task->tree_node;
printf("CREATED: Task %d ('%s')\n", id, name);
return task;
}
// Moves a task to the ready queue to be run
void schedule_task(Task* task) {
if (task->queue_link.next) queue_unlink(&task->queue_link); // Unlink from old queue if any
task->state = TASK_STATE_READY;
queue_push_back(&run_queue, &task->queue_link);
printf("SCHEDULED: Task %d ('%s') moved to RUN queue.\n", task->task_id, task->name);
}
// Simulates running a task. Returns 1 if task is done, 0 otherwise.
int execute_task(Task* task) {
printf("EXECUTING: Task %d ('%s')...\n", task->task_id, task->name);
// Simple logic: tasks with ID > 100 need I/O, others finish
if (task->task_id > 100) {
task->state = TASK_STATE_WAITING;
queue_push_back(&wait_queue, &task->queue_link);
printf("WAITING: Task %d ('%s') moved to WAIT queue for I/O.\n", task->task_id, task->name);
return 0;
} else {
task->state = TASK_STATE_DONE;
printf("DONE: Task %d ('%s') has finished.\n", task->task_id, task->name);
return 1;
}
}
// Prints the current state of the scheduler queues
void print_scheduler_state() {
printf("\n--- Scheduler State ---\n");
printf("Run Queue: ");
for (IntrusiveLink* l = run_queue.next; l != &run_queue; l = l->next) {
printf("%d -> ", container_of(l, Task, queue_link)->task_id);
}
printf("END\nWait Queue: ");
for (IntrusiveLink* l = wait_queue.next; l != &wait_queue; l = l->next) {
printf("%d -> ", container_of(l, Task, queue_link)->task_id);
}
printf("END\n-----------------------\n\n");
}
// Free all tasks using a post-order traversal of the master tree
void free_all_tasks(IntrusiveTreeNode* node) {
if (!node) return;
free_all_tasks(node->left);
free_all_tasks(node->right);
free(container_of(node, Task, tree_node));
}
//====================================================================
// Main Demo
//====================================================================
int main() {
// 1. Create all tasks. They start in the NEW state.
Task* task1 = create_task(10, "Calculate report");
Task* task2 = create_task(205, "Fetch network data");
Task* task3 = create_task(30, "Update UI");
Task* task4 = create_task(155, "Write to disk");
// 2. Schedule initial tasks. They move to the READY state and enter the run queue.
schedule_task(task1);
schedule_task(task2);
schedule_task(task3);
print_scheduler_state();
// 3. Run a scheduler cycle: execute one task from the run queue.
IntrusiveLink* link = queue_pop_front(&run_queue);
execute_task(container_of(link, Task, queue_link)); // Task 10 will finish.
print_scheduler_state();
// 4. Run another cycle. This task will need I/O and move to the wait queue.
link = queue_pop_front(&run_queue);
execute_task(container_of(link, Task, queue_link)); // Task 205 will wait.
print_scheduler_state();
// 5. Schedule the last task.
schedule_task(task4);
print_scheduler_state();
// 6. Simulate I/O completion for task 205. It's found in the wait queue
// and moved back to the run queue.
printf("...I/O for task 205 has completed...\n");
schedule_task(task2); // Re-scheduling moves it from wait_queue to run_queue.
print_scheduler_state();
// 7. Run all remaining tasks until the run queue is empty.
printf("...Running all remaining tasks...\n");
while ((link = queue_pop_front(&run_queue))) {
execute_task(container_of(link, Task, queue_link));
}
print_scheduler_state();
// 8. Clean up all memory by freeing the master tree.
printf("All tasks processed. Freeing memory.\n");
free_all_tasks(all_tasks_tree);
return 0;
}
Gemini:
Here is a complete C implementation of an LRU (Least Recently Used) cache using intrusive data structures.
This example demonstrates a key strength of this pattern: a single allocated object (CacheEntry) is simultaneously part of two different data structures: a hash table that provides O(1) lookup by key, and a doubly-linked list that maintains recency order for O(1) eviction.
The CacheEntry object itself contains the “hooks” (hash_link and lru_link) that allow it to be managed by these external structures without any additional memory allocations per operation.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stddef.h>
// A macro to get the containing structure from a pointer to its member.
#define container_of(ptr, type, member) \
((type *)((char *)(ptr) - offsetof(type, member)))
//====================================================================
// Intrusive Data Structure "Hooks"
//====================================================================
// Hook for a singly-linked list (for hash table collision chains).
typedef struct HashLink {
struct HashLink *next;
} HashLink;
// Hook for a doubly-linked list (for LRU ordering).
typedef struct LruLink {
struct LruLink *next;
struct LruLink *prev;
} LruLink;
//====================================================================
// The Cache Entry Object
//====================================================================
// This is the core object we store. It contains the actual data
// and the hooks for both the hash table and the LRU list.
typedef struct CacheEntry {
int key;
char value[256];
HashLink hash_link; // Hook for hash collision chaining.
LruLink lru_link; // Hook for LRU ordering.
} CacheEntry;
//====================================================================
// The LRU Cache Structure
//====================================================================
typedef struct LRUCache {
size_t capacity;
size_t size;
HashLink **hash_table; // Array of pointers to collision chains.
size_t hash_table_size;
LruLink lru_list_head; // Sentinel node for the LRU list.
} LRUCache;
//====================================================================
// Doubly-Linked List (LRU) Management
//====================================================================
// Initializes a sentinel node for a doubly-linked list.
static void lru_list_init(LruLink *head) {
head->next = head;
head->prev = head;
}
// Removes a link from its current position in the list.
static void lru_list_unlink(LruLink *link) {
link->prev->next = link->next;
link->next->prev = link->prev;
}
// Adds a link to the front of the list (most recently used).
static void lru_list_push_front(LruLink *head, LruLink *new_link) {
new_link->next = head->next;
new_link->prev = head;
head->next->prev = new_link;
head->next = new_link;
}
//====================================================================
// Cache Internals
//====================================================================
// A simple hash function for integer keys.
static size_t hash_key(int key, size_t table_size) {
return (size_t)key % table_size;
}
// Finds a cache entry by key. Does NOT change LRU order.
static CacheEntry* find_entry(LRUCache* cache, int key) {
size_t index = hash_key(key, cache->hash_table_size);
HashLink* current = cache->hash_table[index];
while (current) {
CacheEntry* entry = container_of(current, CacheEntry, hash_link);
if (entry->key == key) {
return entry;
}
current = current->next;
}
return NULL;
}
// Removes an entry from the hash table.
static void remove_from_hash_table(LRUCache* cache, CacheEntry* entry) {
size_t index = hash_key(entry->key, cache->hash_table_size);
HashLink** indirect = &cache->hash_table[index];
while (*indirect && *indirect != &entry->hash_link) {
indirect = &(*indirect)->next;
}
if (*indirect) {
*indirect = entry->hash_link.next;
}
}
// Evicts the least recently used item from the cache.
static void evict_one(LRUCache* cache) {
// The LRU item is at the back of the list (before the sentinel).
LruLink* lru_link_to_evict = cache->lru_list_head.prev;
if (lru_link_to_evict == &cache->lru_list_head) {
// Should not happen if size > 0, but good practice.
return;
}
CacheEntry* entry_to_evict = container_of(lru_link_to_evict, CacheEntry, lru_link);
printf("EVICTING: Key %d ('%s')\n", entry_to_evict->key, entry_to_evict->value);
// Unlink from both data structures.
lru_list_unlink(&entry_to_evict->lru_link);
remove_from_hash_table(cache, entry_to_evict);
// Free the actual object and decrement cache size.
free(entry_to_evict);
cache->size--;
}
//====================================================================
// Public LRU Cache API
//====================================================================
// Creates and initializes an LRU cache.
LRUCache* lru_create(size_t capacity) {
LRUCache* cache = (LRUCache*)malloc(sizeof(LRUCache));
if (!cache) return NULL;
cache->capacity = capacity;
cache->size = 0;
// For simplicity, hash table size is twice the capacity.
cache->hash_table_size = capacity * 2;
cache->hash_table = (HashLink**)calloc(cache->hash_table_size, sizeof(HashLink*));
if (!cache->hash_table) {
free(cache);
return NULL;
}
lru_list_init(&cache->lru_list_head);
return cache;
}
// Puts a key-value pair into the cache.
void lru_put(LRUCache* cache, int key, const char* value) {
printf("PUT: Key %d, Value '%s'\n", key, value);
CacheEntry* entry = find_entry(cache, key);
if (entry) {
// --- Key exists: Update value and move to front ---
strncpy(entry->value, value, sizeof(entry->value) - 1);
entry->value[sizeof(entry->value) - 1] = '\0'; // strncpy does not NUL-terminate long strings
// Unlink from current position and move to front of LRU list.
lru_list_unlink(&entry->lru_link);
lru_list_push_front(&cache->lru_list_head, &entry->lru_link);
} else {
// --- Key does not exist: Create new entry ---
if (cache->size >= cache->capacity) {
evict_one(cache);
}
// Create the new entry.
entry = (CacheEntry*)malloc(sizeof(CacheEntry));
entry->key = key;
strncpy(entry->value, value, sizeof(entry->value) - 1);
entry->value[sizeof(entry->value) - 1] = '\0'; // the fresh buffer is uninitialized; terminate explicitly
// Add to hash table.
size_t index = hash_key(key, cache->hash_table_size);
entry->hash_link.next = cache->hash_table[index];
cache->hash_table[index] = &entry->hash_link;
// Add to front of LRU list.
lru_list_push_front(&cache->lru_list_head, &entry->lru_link);
cache->size++;
}
}
// Gets a value by key from the cache.
const char* lru_get(LRUCache* cache, int key) {
CacheEntry* entry = find_entry(cache, key);
if (entry) {
printf("GET: Key %d -> Found '%s'. Moving to front.\n", key, entry->value);
// Move to front of LRU list to mark as recently used.
lru_list_unlink(&entry->lru_link);
lru_list_push_front(&cache->lru_list_head, &entry->lru_link);
return entry->value;
}
printf("GET: Key %d -> Not found.\n", key);
return NULL;
}
// Frees all resources used by the cache.
void lru_destroy(LRUCache* cache) {
LruLink* current = cache->lru_list_head.next;
while (current != &cache->lru_list_head) {
LruLink* next = current->next;
CacheEntry* entry = container_of(current, CacheEntry, lru_link);
free(entry);
current = next;
}
free(cache->hash_table);
free(cache);
}
// Utility to print the cache contents in LRU order.
void lru_print(LRUCache* cache) {
printf("\n--- Cache State (MRU to LRU) ---\n");
printf("Size: %zu, Capacity: %zu\n", cache->size, cache->capacity);
printf("Order: ");
LruLink* current = cache->lru_list_head.next;
while (current != &cache->lru_list_head) {
CacheEntry* entry = container_of(current, CacheEntry, lru_link);
printf("%d -> ", entry->key);
current = current->next;
}
printf("END\n--------------------------------\n\n");
}
//====================================================================
// Main Demo
//====================================================================
int main() {
LRUCache* cache = lru_create(3);
lru_put(cache, 10, "apple");
lru_put(cache, 20, "banana");
lru_put(cache, 30, "cherry");
lru_print(cache); // Order: 30 -> 20 -> 10
lru_get(cache, 10); // Access 10, making it the most recent.
lru_print(cache); // Order: 10 -> 30 -> 20
// This next put will cause an eviction. '20' is the LRU item.
lru_put(cache, 40, "date");
lru_print(cache); // Order: 40 -> 10 -> 30
lru_get(cache, 20); // '20' was evicted, so this is a miss.
lru_print(cache);
lru_put(cache, 30, "cantaloupe"); // Update existing key '30'.
lru_print(cache); // Order: 30 -> 40 -> 10
lru_get(cache, 30);
lru_destroy(cache);
printf("Cache destroyed.\n");
return 0;
}
Prompt: Create an example in C of a good use case for intrusive data structures where it involves two threads. Should we worry about thread safety?
Gemini:
This is an excellent question because multi-threaded producer-consumer scenarios are one of the best use cases for intrusive data structures.
Yes, absolutely and unequivocally.
Any time you have data that can be accessed and modified by more than one thread, you must use synchronization mechanisms to protect it. In our case, the shared data is the queue itself—specifically, the pointers (next, prev) that form the linked list.
Without protection, you could face race conditions such as two threads rewriting the next/prev pointers at the same time and corrupting the list, or the consumer following a half-updated link into freed memory.
To solve this, we will use a pthread_mutex_t to serialize all access to the queue’s pointers, and a pthread_cond_t so the consumer can sleep until the producer signals that work is available.
Here is a classic example. We’ll have a producer thread that allocates WorkItem objects and pushes them onto a shared queue, and a consumer thread that pops them, processes them, and frees them.
The key benefit of using an intrusive list here is performance. The producer mallocs the WorkItem once. The queue itself performs zero allocations. It just shuffles pointers. This is much faster and creates less memory fragmentation than a non-intrusive queue that would need to allocate its own internal nodes for every item.
Here is the complete C code example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stddef.h>
#include <pthread.h>
#include <unistd.h> // for sleep()
// A macro to get the containing structure from a pointer to its member.
#define container_of(ptr, type, member) \
((type *)((char *)(ptr) - offsetof(type, member)))
//====================================================================
// Intrusive Data Structure Hook
//====================================================================
// Hook for a doubly-linked list.
typedef struct IntrusiveLink {
struct IntrusiveLink *next;
struct IntrusiveLink *prev;
} IntrusiveLink;
//====================================================================
// The Work Item Object
//====================================================================
// This is the object that is passed between threads.
typedef struct WorkItem {
int id;
char message[128];
IntrusiveLink link; // The hook for the shared queue.
} WorkItem;
//====================================================================
// The Thread-Safe Shared Queue
//====================================================================
typedef struct SharedQueue {
IntrusiveLink head; // Sentinel node for the queue.
pthread_mutex_t mutex; // Protects access to the queue.
pthread_cond_t cond; // For signaling between threads.
int is_done; // Flag to signal shutdown.
} SharedQueue;
// Initialize the shared queue.
void queue_init(SharedQueue* queue) {
queue->head.next = &queue->head;
queue->head.prev = &queue->head;
pthread_mutex_init(&queue->mutex, NULL);
pthread_cond_init(&queue->cond, NULL);
queue->is_done = 0;
}
// Add a work item to the back of the queue.
void queue_push(SharedQueue* queue, WorkItem* item) {
pthread_mutex_lock(&queue->mutex);
// Link the new item at the back of the list (before the head).
IntrusiveLink* new_link = &item->link;
IntrusiveLink* head = &queue->head;
new_link->prev = head->prev;
new_link->next = head;
head->prev->next = new_link;
head->prev = new_link;
// Signal the consumer that a new item is available.
pthread_cond_signal(&queue->cond);
pthread_mutex_unlock(&queue->mutex);
}
// Remove and return a work item from the front of the queue.
// Blocks until an item is available.
WorkItem* queue_pop(SharedQueue* queue) {
pthread_mutex_lock(&queue->mutex);
// Wait until the queue is not empty or shutdown is signaled.
while (queue->head.next == &queue->head && !queue->is_done) {
pthread_cond_wait(&queue->cond, &queue->mutex);
}
if (queue->head.next != &queue->head) {
// Unlink the item from the front of the list.
IntrusiveLink* link_to_pop = queue->head.next;
link_to_pop->prev->next = link_to_pop->next;
link_to_pop->next->prev = link_to_pop->prev;
pthread_mutex_unlock(&queue->mutex);
return container_of(link_to_pop, WorkItem, link);
}
// Queue is empty and shutdown was signaled.
pthread_mutex_unlock(&queue->mutex);
return NULL;
}
// Signal any remaining waiters to shut down and destroy queue resources.
// Call this only once no other thread can still be using the queue.
void queue_destroy(SharedQueue* queue) {
pthread_mutex_lock(&queue->mutex);
queue->is_done = 1;
// Broadcast to wake up all waiting threads.
pthread_cond_broadcast(&queue->cond);
pthread_mutex_unlock(&queue->mutex);
pthread_mutex_destroy(&queue->mutex);
pthread_cond_destroy(&queue->cond);
}
//====================================================================
// Thread Functions
//====================================================================
void* producer_thread(void* arg) {
SharedQueue* queue = (SharedQueue*)arg;
for (int i = 0; i < 10; ++i) {
// Create a new work item. This is the only allocation.
WorkItem* item = (WorkItem*)malloc(sizeof(WorkItem));
item->id = i;
snprintf(item->message, sizeof(item->message), "Message #%d", i);
printf("PRODUCER: Creating item %d\n", i);
queue_push(queue, item);
// Simulate work
usleep((rand() % 250) * 1000);
}
printf("PRODUCER: Finished.\n");
return NULL;
}
void* consumer_thread(void* arg) {
SharedQueue* queue = (SharedQueue*)arg;
while (1) {
WorkItem* item = queue_pop(queue);
if (item == NULL) {
// This means the queue is empty and we're shutting down.
break;
}
printf("CONSUMER: Processing item %d ('%s')\n", item->id, item->message);
// The consumer is responsible for freeing the object.
free(item);
}
printf("CONSUMER: Shutting down.\n");
return NULL;
}
//====================================================================
// Main Demo
//====================================================================
int main() {
pthread_t producer, consumer;
SharedQueue queue;
queue_init(&queue);
printf("Starting producer and consumer threads...\n");
pthread_create(&producer, NULL, producer_thread, &queue);
pthread_create(&consumer, NULL, consumer_thread, &queue);
// Wait for the producer to finish its work.
pthread_join(producer, NULL);
printf("Producer finished, signaling consumer to shut down...\n");
// Signal shutdown so the consumer exits once the queue drains.
pthread_mutex_lock(&queue.mutex);
queue.is_done = 1;
pthread_cond_broadcast(&queue.cond);
pthread_mutex_unlock(&queue.mutex);
// Wait for the consumer to finish processing remaining items and exit.
pthread_join(consumer, NULL);
// Destroy the mutex and condition variable only after the consumer has
// joined; destroying them while a thread may still be blocked on them
// is undefined behavior.
queue_destroy(&queue);
printf("All threads finished.\n");
return 0;
}
#![allow(unused_variables)] // Allow unused variables for demonstration purposes
#![allow(dead_code)] // Allow dead code for demonstration purposes
use std::{mem, ptr}; // Import mem for offset_of, ptr for raw pointer operations
// Helper to get the containing struct from the IntrusiveNode.
// This is the Rust equivalent of the C `container_of` macro.
// It relies on `offset_of!` from `core::mem`.
// IMPORTANT: This macro MUST be defined BEFORE any code that uses it.
macro_rules! container_of {
($ptr:expr, $Type:ty, $member:ident) => {
{
// Ensure the pointer is mutable. `*mut u8` is the generic byte pointer.
let member_ptr = $ptr as *mut u8;
// Calculate the offset of the member within the struct.
// This requires the actual type name ($Type) and member name ($member).
let offset = mem::offset_of!($Type, $member);
// Subtract the offset from the member's address to get the struct's base address.
(member_ptr.sub(offset)) as *mut $Type
}
};
}
// Define a generic intrusive node structure.
// Any structure that wants to be part of an intrusive list will embed this.
// #[repr(C)] ensures C-compatible memory layout.
#[repr(C)]
pub struct IntrusiveNode {
pub next: *mut IntrusiveNode, // Raw pointer to the next node
}
impl IntrusiveNode {
// A constructor for a new (unlinked) node.
pub const fn new() -> Self {
IntrusiveNode {
next: ptr::null_mut(), // Initialize next as a null pointer
}
}
}
// The IntrusiveLinkedList now just holds a raw pointer to the head.
pub struct IntrusiveLinkedList {
head: *mut IntrusiveNode, // Raw pointer to the head of the intrusive nodes
}
impl IntrusiveLinkedList {
pub fn new() -> Self {
IntrusiveLinkedList {
head: ptr::null_mut(),
}
}
// Function to insert a node at the end of the intrusive list.
// 'node_to_insert' is a raw pointer to the embedded IntrusiveNode.
// This function is `unsafe` because it deals with raw pointers and
// doesn't guarantee memory safety on its own.
pub unsafe fn intrusive_insert_end(&mut self, node_to_insert: *mut IntrusiveNode) {
// Ensure the new node points to NULL initially
unsafe {
(*node_to_insert).next = ptr::null_mut();
}
if self.head.is_null() {
self.head = node_to_insert;
return;
}
let mut temp = self.head;
unsafe {
while !(*temp).next.is_null() {
temp = (*temp).next;
}
(*temp).next = node_to_insert;
}
}
// Function to print the intrusive list.
// Requires a print function that can cast the IntrusiveNode* back to the original struct.
pub unsafe fn print_intrusive_list(&self, print_func: unsafe fn(*mut IntrusiveNode)) {
let mut temp = self.head;
while !temp.is_null() {
unsafe {
print_func(temp);
}
print!(", ");
unsafe {
temp = (*temp).next;
}
}
println!();
}
// This function will iterate through the list and free the containing structures.
// This is the user's responsibility in an intrusive list.
// This is highly unsafe and requires knowledge of the actual type.
pub unsafe fn free_list_person(&mut self) {
println!("free_list_person");
// let mut current = self.head;
// while !current.is_null() {
// let next_node = (*current).next; // Save the next pointer before freeing current
//
// // Cast the IntrusiveNode back to the containing Person struct using the macro
// println!("casting");
// let p_ptr = container_of!(current, Person, node);
//
// // Reconstruct Box from raw pointer and let it drop, which frees 'name' and 'Person'
// println!("reconstructing box");
// let _ = Box::from_raw(p_ptr);
// current = next_node;
// }
println!("clearing list");
self.head = ptr::null_mut(); // Clear the list head after freeing all elements
}
}
// Example data structure: Person, now containing the intrusive node.
// #[repr(C)] is essential for predictable memory layout.
#[repr(C)]
pub struct Person {
pub node: IntrusiveNode, // Embedded intrusive node
pub name: String,
pub age: i32,
}
impl Drop for Person {
fn drop(&mut self) {
// `String` handles its own memory, so no explicit free for `name` here.
// This `Drop` impl will be called when a `Person` value goes out of scope
// or is explicitly dropped (e.g., via `Box::from_raw`).
println!("Dropping Person: {}", self.name); // For debugging drops
}
}
// Print function for Person. Takes a raw pointer to IntrusiveNode.
unsafe fn print_person(node_ptr: *mut IntrusiveNode) {
unsafe {
let person_ptr = container_of!(node_ptr, Person, node);
let person_ref = &*person_ptr; // Dereference the raw pointer to get a reference
print!("{} is {} years old", person_ref.name, person_ref.age);
}
}
// Example data structure: IntegerWrapper, for demonstrating integers
#[repr(C)]
pub struct IntegerWrapper {
pub node: IntrusiveNode,
pub value: i32,
}
// Print function for IntegerWrapper. Takes a raw pointer to IntrusiveNode.
unsafe fn print_int(node_ptr: *mut IntrusiveNode) {
unsafe {
let int_wrapper_ptr = container_of!(node_ptr, IntegerWrapper, node);
let int_wrapper_ref = &*int_wrapper_ptr;
print!("{}", int_wrapper_ref.value);
}
}
fn test_int_list() {
let mut list = IntrusiveLinkedList::new();
let mut val1 = IntegerWrapper {
node: IntrusiveNode::new(),
value: 10,
};
let mut val2 = IntegerWrapper {
node: IntrusiveNode::new(),
value: 20,
};
let mut val3 = IntegerWrapper {
node: IntrusiveNode::new(),
value: 30,
};
let mut val4 = IntegerWrapper {
node: IntrusiveNode::new(),
value: 40,
};
unsafe {
list.intrusive_insert_end(&mut val1.node);
list.intrusive_insert_end(&mut val2.node);
list.intrusive_insert_end(&mut val3.node);
list.intrusive_insert_end(&mut val4.node);
list.print_intrusive_list(print_int);
}
list.head = ptr::null_mut();
}
fn test_person_list() {
let mut list = IntrusiveLinkedList::new();
let mut p1 = Box::new(Person {
node: IntrusiveNode::new(),
name: String::from("Marco"),
age: 22,
});
let mut p2 = Box::new(Person {
node: IntrusiveNode::new(),
name: String::from("Mary"),
age: 20,
});
unsafe {
let p1_node_ptr = &mut p1.node as *mut IntrusiveNode;
let p2_node_ptr = &mut p2.node as *mut IntrusiveNode;
list.intrusive_insert_end(p1_node_ptr);
list.intrusive_insert_end(p2_node_ptr);
list.print_intrusive_list(print_person);
list.free_list_person();
println!("bye unsafe");
}
println!("bye test_person_list");
}
fn main() {
test_int_list();
println!();
test_person_list();
}
https://www.openmymind.net/Zigs-New-LinkedList-API/
pub const SinglyLinkedList = struct {
first: ?*Node = null,
pub const Node = struct {
next: ?*Node = null,
};
};
intrusive-collection requires the nodes to outlive the linked-list (static hierarchical lifetimes) or use box/arena (heap allocation).
By intrusive, I mean not assuming where memory must live
https://github.com/mitchellh/libxev/blob/main/src/ThreadPool.zig Pool.schedule() takes Task ptr that can live anywhere until its callback is invoked. Store it on stack for fork/join structured concurrency pattern. Store in heap & free itself in callback for spawn(detached)/join pattern. Store in array for pool pattern.
https://github.com/mitchellh/libxev/blob/main/src/ThreadPool.zig#L377
const task: *Task = @fieldParentPtr("node", result.node);
(task.callback)(task);
container_of implementations
Prompt: What is the difference between these two macros?
#define container_of(ptr, type, member) \
((type *)((char *)(ptr) - offsetof(type, member)))
#define container_of(ptr, type, member) ({ \
const typeof( ((type *)0)->member ) *__mptr = (ptr); \
(type *)( (char *)__mptr - offsetof(type,member) );})
Gemini:
You’ve identified two common forms of the container_of macro, and the second one is the more robust and widely used version, especially in the Linux kernel. Let’s break down the differences:
#define container_of(ptr, type, member) \
((type *)((char *)(ptr) - offsetof(type, member)))
How it works: This version directly performs the pointer arithmetic.
- (char *)(ptr): Casts the member pointer ptr to a char*. This is crucial because offsetof returns a byte offset, and char* pointer arithmetic works in terms of bytes. If you didn’t cast to char*, the subtraction would be in terms of sizeof(*ptr), which would likely be incorrect.
- offsetof(type, member): This standard macro (from <stddef.h>) calculates the offset in bytes of the member within the type structure.
- ((char *)(ptr) - offsetof(type, member)): Subtracts the offset from the member’s address to get the address of the beginning of the containing structure.
- (type *): Casts the resulting byte address back to a pointer to the type of the container structure.

Pros:
- Simple and portable: it relies only on standard C (offsetof from <stddef.h>), so any conforming compiler accepts it.

Cons:
- No type safety: if you pass a ptr that is not actually a pointer to the member of the specified type, the compiler will not warn you. It will happily perform the calculation, resulting in undefined behavior or crashes at runtime. For example, if ptr points to an int and member is supposed to be a struct list_head, the compiler won’t catch the mismatch.

#define container_of(ptr, type, member) ({ \
const typeof( ((type *)0)->member ) *__mptr = (ptr); \
(type *)( (char *)__mptr - offsetof(type,member) );})
How it works: This version introduces two key enhancements:
GNU Statement Expression ({ ... }): The entire macro is wrapped in ({ ... }). This is a GNU C extension (supported by GCC and Clang) that allows you to have a compound statement (with local variable declarations, etc.) act as an expression. The value of the last expression in the block is the value of the overall statement expression. This is essential for declaring a local variable like __mptr within the macro without polluting the surrounding scope.
Compile-Time Type Checking with typeof:
((type *)0)->member: This is a clever trick. It creates a “null pointer” to the type structure and then attempts to access its member. Crucially, this expression is never actually evaluated at runtime (it doesn’t dereference a null pointer). Its sole purpose is to provide a context for typeof.
typeof( ((type *)0)->member ): The typeof operator (another GCC/Clang extension) gets the type of the expression ((type *)0)->member. So, if member is of type struct list_head, typeof(...) will yield struct list_head.
const typeof( ((type *)0)->member ) *__mptr = (ptr);: This is the magic line. It declares a temporary local pointer __mptr whose type is const pointer to the actual type of the member. Then, it initializes __mptr with ptr.
- If ptr has a different type than member, the compiler will issue a warning (or even an error, depending on compiler flags) about incompatible pointer types during this assignment. This catches many potential bugs at compile time that the simpler version would miss.
- The const qualifier is added to __mptr to prevent accidental modification of the pointer within the macro.
- (type *)( (char *)__mptr - offsetof(type,member) );: The rest is similar to the simpler version, performing the pointer arithmetic using the now type-checked __mptr.
Pros:
- Compile-time type checking: catches cases where ptr might not actually point to the correct type of member. This makes the code much safer and easier to debug.
- Scope safety: the statement expression ({ ... }) ensures that any temporary variables (like __mptr) used within the macro don’t interfere with variables in the code that uses the macro.

Cons:
- Non-standard: typeof and statement expressions are not part of standard C (C99, C11, C17). While widely supported by GCC and Clang, they are technically non-portable to other compilers. For most Linux kernel development and embedded systems where GCC/Clang are dominant, this is not a practical concern.
- Slightly more complex: the reader needs to know the typeof trick and statement expression.

| Feature | Simple Version | Linux Kernel Version |
|---|---|---|
| Type Safety | None (runtime errors possible) | Yes (compile-time warnings/errors) |
| Portability | Standard C | GNU C extension (GCC/Clang specific) |
| Readability | Slightly simpler | Slightly more complex |
| Error Detection | Poor | Excellent |
| Usage | Fine for quick, internal use, or very simple cases where type is guaranteed | Preferred for robust, large-scale, or kernel-style development |
In most real-world scenarios, especially when dealing with low-level C programming or libraries that need to be highly robust (like the Linux kernel), the second version (with typeof and statement expressions) is overwhelmingly preferred because of its crucial compile-time type-checking capabilities.
https://github.com/Rust-for-Linux/linux/pull/158
impl_list_item!
container_of!
macro_rules! container_of
macro_rules! impl_has_list_links
https://github.com/0xAX/linux-insides/blob/master/DataStructures/linux-datastructures-1.md
Pacific++ 2017: Matt Bentley “Can we make a faster linked list?”
Gemini: Intrusive Data Structures Analysis
https://www.gamedeveloper.com/programming/in-depth-intrusive-lists
Adopt intrusive data structures for better performance
Zig’s @fieldParentPtr for dumbos like me
Hacker News - Zig’s new LinkedList API (it’s time to learn fieldParentPtr)
Resource efficient Thread Pools with Zig
https://docs.rs/intrusive-collections/latest/intrusive_collections/
Safe Intrusive Collections with Pinning
What are intrusive linked lists and does Rust really struggle with them?
https://www.kernel.org/doc/rustdoc/next/kernel/list/struct.List.html
https://rust.docs.kernel.org/src/kernel/list.rs.html#255
https://github.com/bbatha/movecell_graph/blob/master/src/lib.rs
“Go’s memory model, for example, has no concept of isolating memory between co-routines or threads, nor of controlling mutability. Everything can always race on the same global mutable heap.”
Graydon Hoare https://www.infoq.com/news/2012/08/Interview-Rust/
Create a Go program where this behavior can lead to bugs, then show a Rust equivalent where it is safe, because Rust prevents that kind of bug.
Of course. This is a fantastic question that gets to the very heart of the philosophical differences between Go and Rust regarding concurrency and memory safety.
I will create two programs to illustrate this. The first is a Go program that demonstrates a classic race condition, a bug that can be subtle and hard to track down. The second is the equivalent program in Rust, which will not even compile without the proper safety mechanisms, effectively preventing the bug from ever existing.
The Go Example: Unchecked Shared Memory
Here is a Go program that starts several concurrent “goroutines.” Each goroutine attempts to increment a shared counter 1,000 times. Because Go’s memory model allows any goroutine to access the shared counter variable without restriction, they will interfere with each other, leading to an incorrect final count.
package main
import (
"fmt"
"sync"
)
func main() {
// We expect the final count to be 10 * 1000 = 10,000.
const numGoroutines = 10
const incrementsPerGoroutine = 1000
// A WaitGroup is used to wait for all the goroutines to finish.
// This is for control flow, not for memory safety.
var wg sync.WaitGroup
wg.Add(numGoroutines)
// This is our shared counter. It lives on the heap and is accessible
// by all goroutines launched from main.
var counter int // Defaults to 0
// Launch 10 concurrent goroutines.
for i := 0; i < numGoroutines; i++ {
go func() {
// Decrement the WaitGroup counter when the goroutine finishes.
defer wg.Done()
// Each goroutine will try to increment the counter 1000 times.
for j := 0; j < incrementsPerGoroutine; j++ {
// --- THIS IS THE DANGER ZONE ---
// The `counter++` operation is not atomic. It involves three steps:
// 1. Read the current value of `counter`.
// 2. Add 1 to that value.
// 3. Write the new value back to `counter`.
// A different goroutine can read the value *after* this one has
// read it but *before* it has written the new value back.
counter++
}
}()
}
// Wait for all goroutines to complete their work.
wg.Wait()
// The final result will almost certainly NOT be 10,000.
// Run this program multiple times, and you'll likely get a different result each time.
fmt.Printf("Expected final counter: %d\n", numGoroutines*incrementsPerGoroutine)
fmt.Printf("Actual final counter: %d\n", counter)
}
When you run the Go program, the output will show an “Actual final counter” that is less than the expected 10,000. This is because multiple goroutines read the same value of counter before any of them can write their incremented value back, causing some increments to be lost. This is a classic race condition.
The Rust Equivalent: Compile-Time Guarantees
Now, let’s try to write the same logic in Rust. A naive, direct translation of the Go code would be blocked by the compiler. Rust’s ownership and borrowing rules prevent you from sharing mutable data between threads without a synchronization primitive.
The compiler forces you to be explicit about shared ownership (Arc) and mutual exclusion (Mutex). This makes the code slightly more verbose but guarantees at compile time that a race condition of this nature is impossible.
use std::sync::{Arc, Mutex};
use std::thread;
fn main() {
// We expect the final count to be 10 * 1000 = 10,000.
const NUM_THREADS: usize = 10;
const INCREMENTS_PER_THREAD: usize = 1000;
// To share data between threads, Rust requires special "wrapper" types.
// `Arc` stands for "Atomically Reference Counted". It allows multiple
// owners of the same data by keeping track of how many references exist.
// `Mutex` stands for "Mutual Exclusion". It ensures that only one thread
// can access the data inside it at any given time.
//
// By combining them, we create a thread-safe, shared, mutable integer.
let counter = Arc::new(Mutex::new(0));
// We'll store the handles to our spawned threads in this vector.
let mut handles = vec![];
for _ in 0..NUM_THREADS {
// We clone the `Arc` to give ownership of a reference to the new thread.
// This just increments the reference count; it doesn't copy the data.
let counter_clone = Arc::clone(&counter);
let handle = thread::spawn(move || {
for _ in 0..INCREMENTS_PER_THREAD {
// --- THIS IS THE SAFE ZONE ---
// To access the data, we must first `lock()` the mutex.
// This call will block until the mutex is available.
// The `.unwrap()` handles a potential error if a thread panics
// while holding the lock.
//
// `num` is a "MutexGuard", a smart pointer to the data (our `i32`).
let mut num = counter_clone.lock().unwrap();
// Now that we have the lock, we can safely modify the data.
// No other thread can acquire the lock until `num` goes out of scope.
*num += 1;
// The lock is automatically released when `num` goes out of scope
// at the end of this statement. This is a key feature of Rust's
// RAII (Resource Acquisition Is Initialization) pattern.
}
});
handles.push(handle);
}
// Wait for all threads to complete their work.
for handle in handles {
handle.join().unwrap();
}
// The result will ALWAYS be 10,000 because the Mutex ensured that
// the increments were orderly and atomic from the perspective of the counter.
// We lock it one last time to read the final value.
println!("Expected final counter: {}", NUM_THREADS * INCREMENTS_PER_THREAD);
println!("Actual final counter: {}", *counter.lock().unwrap());
}
I’ve just provided the two code examples.
The Go program is simple and concise, but it contains a hidden bug that might only appear under heavy load. The burden is on the developer to know when to use synchronization primitives like sync.Mutex.
The Rust program requires you to think about memory sharing and locking upfront. The compiler acts as a strict partner, refusing to compile code that could lead to data races. This prevents the bug entirely, shifting the safety check from runtime (or a developer’s code review) to compile-time.
To address the query, we need to demonstrate how Go’s memory model, which lacks memory isolation between goroutines and control over mutability, can lead to concurrency bugs like data races. Then, we’ll provide a Rust equivalent where its safety features prevent such issues. Let’s dive into the examples.
In Go, goroutines share the same memory space, and variables are mutable by default. Without explicit synchronization, multiple goroutines accessing and modifying shared data can result in race conditions—unpredictable behavior due to unsynchronized memory access. Here’s a Go program that illustrates this problem:
package main
import (
"sync"
)
var counter int
func increment(wg *sync.WaitGroup) {
for i := 0; i < 1000; i++ {
counter++ // Unsynchronized access to shared variable
}
wg.Done()
}
func main() {
var wg sync.WaitGroup
for i := 0; i < 10; i++ {
wg.Add(1)
go increment(&wg) // Launch 10 goroutines
}
wg.Wait() // Wait for all goroutines to complete
println(counter) // Expected: 10000, Actual: Varies
}
- Shared state: counter is an integer accessible by all goroutines.
- Concurrency: 10 goroutines each increment counter 1000 times. Ideally, counter should reach 10 × 1000 = 10,000.
- The race: multiple goroutines read and write counter simultaneously. For example, two goroutines might read counter as 42, increment it to 43 locally, and write back 43, losing an update. This leads to a final value less than 10,000, varying per run.
- Detection: go run -race main.go will flag a data race, confirming the bug, but Go compiles and executes it regardless, leaving the error to manifest at runtime.

This showcases Go’s memory model: goroutines operate on a shared, mutable heap without inherent protection, making such bugs possible if synchronization is neglected.
Rust’s ownership model and borrow checker prevent data races at compile time by enforcing strict rules: only one mutable reference or multiple immutable references to data can exist at once, and shared data across threads requires synchronization. Let’s first attempt a direct equivalent to the Go program and see why it fails, then provide a safe version.
use std::thread;
fn main() {
let mut counter = 0;
let handles: Vec<_> = (0..10).map(|_| {
thread::spawn(|| {
for _ in 0..1000 {
counter += 1; // Attempt to mutate shared variable
}
})
}).collect();
for handle in handles {
handle.join().unwrap();
}
println!("{}", counter);
}
This code doesn’t compile. Rust’s compiler produces an error like:
error[E0373]: closure may outlive the current function, but it borrows `counter`, which is owned by the current function
--> src/main.rs:6:23
|
6 | thread::spawn(|| {
| ^^ may outlive borrowed value `counter`
7 | for _ in 0..1000 {
8 | counter += 1;
| ------- `counter` is borrowed here
|
note: function requires argument type to outlive `'static`
The closure captures counter by reference and is moved to a new thread, which may outlive main. Rust requires thread-spawned closures to have a 'static lifetime, but counter is a local variable. More critically, even if we adjusted lifetimes, Rust prohibits multiple threads from mutating counter without synchronization, preventing the race condition outright.

To achieve the same functionality safely, we use Rust’s synchronization primitives, such as Mutex with Arc (Atomic Reference Counting) for thread-safe sharing:
use std::sync::{Arc, Mutex};
use std::thread;
fn main() {
// Wrap counter in Arc<Mutex<i32>> for safe sharing and mutation
let counter = Arc::new(Mutex::new(0));
// Spawn 10 threads
let handles: Vec<_> = (0..10).map(|_| {
let counter_clone = Arc::clone(&counter); // Clone Arc for each thread
thread::spawn(move || {
for _ in 0..1000 {
let mut num = counter_clone.lock().unwrap(); // Lock for exclusive access
*num += 1; // Safely increment
}
})
}).collect();
// Wait for all threads to finish
for handle in handles {
handle.join().unwrap();
}
// Print the final value
println!("{}", *counter.lock().unwrap()); // Always prints 10000
}
For efficiency, we can use AtomicUsize instead of Mutex, which provides lock-free atomic operations:
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;
fn main() {
let counter = Arc::new(AtomicUsize::new(0));
let handles: Vec<_> = (0..10).map(|_| {
let counter_clone = Arc::clone(&counter);
thread::spawn(move || {
for _ in 0..1000 {
counter_clone.fetch_add(1, Ordering::Relaxed); // Atomic increment
}
})
}).collect();
for handle in handles {
handle.join().unwrap();
}
println!("{}", counter.load(Ordering::Relaxed)); // Always prints 10000
}
- Mutex: lock() provides exclusive access.
- Atomics: AtomicUsize offers lock-free fetch_add, suitable for simple operations like this, with memory ordering (e.g., Relaxed) specifying consistency rules.
- Result: counter reaches exactly 10,000 every time.
- In Go, the programmer must add synchronization (e.g., sync.Mutex) manually. Forgetting this leads to subtle, runtime bugs detectable only with tools like the race detector.
- In Rust, the compiler rejects the racy version, forcing the use of Mutex or AtomicUsize. This eliminates data races by design, aligning with the query’s requirement to show a safe Rust equivalent.

Thus, Go’s shared, mutable memory model permits concurrency bugs, while Rust’s type system and ownership enforce safety, making the latter immune to this class of errors when using its safe abstractions.
Great question. You’re right: Go’s memory model offers concurrency through goroutines and channels, but it does not enforce isolation or immutability at the type level. This makes it easy to accidentally write code with data races if shared memory is accessed without synchronization.
Rust, on the other hand, uses ownership, borrowing, and type system checks at compile time to prevent data races by design.
We will:
package main
import (
"fmt"
"time"
)
var shared = 0
func write() {
for i := 0; i < 1000; i++ {
shared++
}
}
func main() {
go write()
go write()
time.Sleep(100 * time.Millisecond)
fmt.Println("Final value:", shared)
}
This code starts two goroutines that both increment a global variable shared without synchronization. This will almost certainly lead to a data race.
You can verify this by running:
go run -race main.go
You’ll see something like:
WARNING: DATA RACE
use std::thread;
fn main() {
let mut shared = 0;
let handle1 = thread::spawn(|| {
for _ in 0..1000 {
// Cannot access shared here
}
});
let handle2 = thread::spawn(|| {
for _ in 0..1000 {
// Cannot access shared here
}
});
// This won’t compile: `shared` cannot be shared across threads without synchronization.
// handle1.join().unwrap();
// handle2.join().unwrap();
println!("Final value: {}", shared);
}
Rust will say something like:
error[E0373]: closure may outlive the current function, but it borrows `shared`, which is owned by the current function
You must use thread-safe types, like Mutex or Atomic*.
use std::sync::{Arc, Mutex};
use std::thread;
fn main() {
let shared = Arc::new(Mutex::new(0));
let mut handles = vec![];
for _ in 0..2 {
let shared_clone = Arc::clone(&shared);
let handle = thread::spawn(move || {
for _ in 0..1000 {
let mut num = shared_clone.lock().unwrap();
*num += 1;
}
});
handles.push(handle);
}
for handle in handles {
handle.join().unwrap();
}
println!("Final value: {}", *shared.lock().unwrap());
}
- Arc is an atomic reference counter for shared ownership across threads.
- Mutex ensures only one thread can mutate at a time.

| Language | Code Compiles? | Data Race Possible? | Safety Mechanism |
|---|---|---|---|
| Go | ✅ | ✅ Yes | Manual sync |
| Rust | ❌ (unsafe version) / ✅ (safe version) | ❌ Never (if it compiles) | Enforced via compiler |
Would you like to see the same kind of example using channels instead of shared memory?
prompt: what does “-race” do?
The -race flag in Go enables the race detector, a powerful tool that detects data races in your program at runtime.
When you compile or run a Go program with -race, the Go toolchain instruments memory accesses and synchronization operations, and detects concurrent accesses where:

- two goroutines touch the same memory location,
- at least one of the accesses is a write, and
- the accesses are not ordered by any synchronization (mutexes, channels, atomics).

To use it:
go run -race main.go
go build -race -o myapp main.go
./myapp
For a program with a data race, you’ll see output like:
==================
WARNING: DATA RACE
Read at 0x00c0000b6008 by goroutine 6:
main.write()
/path/to/main.go:10 +0x3c
Previous write at 0x00c0000b6008 by goroutine 5:
main.write()
/path/to/main.go:10 +0x3c
==================
It tells you which goroutines were involved, the memory address they raced on, and the file and line of each conflicting access.
| Flag | Purpose |
|---|---|
| -race | Enables runtime detection of data races in Go programs |
Would you like help detecting or fixing a specific race in your code?
I. Introduction: The Significance of Resource Efficiency in Java Development and the Role of JEPs
In the realm of modern software development, the efficient utilization of computational resources, particularly CPU and memory, stands as a cornerstone of application performance, scalability, and cost-effectiveness. Java applications, widely deployed across diverse environments ranging from enterprise servers to embedded systems and cloud-native architectures, are particularly sensitive to resource consumption. Inefficient resource usage can manifest in various detrimental ways, including elevated operational expenditures due to increased infrastructure demands, diminished application responsiveness leading to poor user experiences, and limitations in the ability of applications to scale effectively under high load. Addressing these challenges necessitates a continuous focus on optimizing how Java applications consume and manage CPU cycles and memory.
The Java Development Kit (JDK) Enhancement Proposal (JEP) process serves as the primary and structured pathway for introducing significant improvements and innovations to the Java platform and its associated ecosystem.1 These proposals, which are meticulously documented and reviewed by the Java community, cover a broad spectrum of potential enhancements, encompassing new language features designed to improve developer productivity, Application Programming Interface (API) extensions aimed at providing richer functionality, and crucial modifications to the underlying Java Virtual Machine (JVM) that are often geared towards bolstering performance and resource management.3 Within this landscape of ongoing development, numerous JEPs have been specifically conceived and implemented with the explicit goal of enhancing the CPU and memory efficiency of Java applications. This report undertakes a focused analysis of these key JEPs, drawing upon provided research materials to elucidate the proposed mechanisms, objectives, and potential impacts of these enhancements on the Java platform.
II. Methodology: Approach to Identifying and Analyzing Performance-Enhancing JEPs
To effectively address the user's query regarding JDK Enhancement Proposals that aim to improve the CPU and memory usage of Java applications, a systematic approach was employed, mirroring the steps outlined in the initial request. The investigation commenced with the acquisition of a comprehensive list of JEPs by examining the provided research snippets.1 Notably, snippet 4 presented a particularly detailed and categorized list of JEPs, offering a valuable foundation for this analysis.
Following the initial list acquisition, a keyword-based filtering process was conducted. This involved scrutinizing the titles and brief descriptions of the JEPs for the presence of terms directly related to the user's interest, such as "performance," "CPU," "memory," "optimization," "resource usage," and "efficiency," alongside other semantically similar terms. This filtering stage served to narrow down the extensive list to a more manageable set of potentially relevant proposals that explicitly indicated an intention to improve resource utilization.
For each JEP identified through the filtering process, the subsequent step involved retrieving more detailed descriptions and objectives. This was accomplished by systematically searching the provided research snippets for mentions of these specific JEP numbers. Snippets from 10 onwards proved particularly useful in this stage, often containing more in-depth information, especially for JEPs associated with recent JDK releases.
With the detailed descriptions in hand, a careful analysis was performed to ascertain whether each JEP explicitly mentioned improvements to CPU and/or memory usage. The focus on explicit mentions was a deliberate choice to ensure the report's accuracy and direct relevance to the user's query. This stage involved a thorough reading of the objectives, descriptions, and motivations outlined in the JEP documentation and associated articles.
Finally, the JEPs that explicitly targeted CPU improvements, memory improvements, or both were compiled into categorized lists. For each JEP in these lists, a summary of the key changes proposed and the underlying mechanisms intended to achieve these improvements was developed, drawing directly from the analyzed research materials.
III. JDK Enhancement Proposals Targeting CPU Usage
The analysis of the provided research material revealed several JDK Enhancement Proposals that primarily focus on improving the CPU usage of Java applications. These JEPs span various aspects of the JVM and core libraries, reflecting a multifaceted approach to enhancing performance.
JEP 509: JFR CPU-Time Profiling (Experimental) 10 introduces an experimental feature to the JDK Flight Recorder (JFR) specifically designed to capture CPU-time profiling information on Linux operating systems. The primary goal of this JEP is to empower developers with the ability to precisely measure the CPU cycles consumed by specific elements within their Java programs.14 This capability is crucial for identifying performance bottlenecks and pinpointing areas where CPU optimization efforts would be most effective. As highlighted in 14, without such profiling data, developers might inadvertently focus on optimizing program elements that have a negligible impact on overall performance, leading to wasted effort. By providing detailed insights into CPU consumption at a granular level, JEP 509 enables developers to make data-driven decisions regarding performance tuning and CPU efficiency.
JEP 475: Late Barrier Expansion for G1 15 aims to simplify the implementation of the G1 garbage collector's barriers, which are responsible for recording information about application memory accesses. This simplification is achieved by shifting the expansion of these barriers from an early stage to a later point in the C2 JIT (Just-In-Time) compilation pipeline.18 The primary goals of this JEP, as outlined in 18, include reducing the execution time of the C2 compiler when the G1 garbage collector is in use and enhancing the comprehensibility of G1 barriers for HotSpot developers. Furthermore, preliminary experiments have indicated that a naive implementation of late barrier expansion can already achieve code quality that is comparable to code optimized by C2 using the earlier expansion strategy.18 This enhancement directly contributes to CPU efficiency by reducing the overhead associated with the C2 compilation process itself, thereby decreasing the time and CPU resources required for application startup and warm-up phases.16 The JEP also strives to ensure that the C2 compiler preserves critical invariants regarding the relative ordering of memory accesses, safepoints, and barriers.18
JEP 423: Region Pinning for G1 addresses the latency incurred when Java applications interact with unmanaged code through the Java Native Interface (JNI). Traditionally, G1 disabled garbage collection entirely during JNI critical regions to prevent the movement of critical objects, which could cause significant latency and thread stalling. JEP 423 introduces region pinning, allowing G1 to pin the specific memory regions containing critical objects during garbage collection. By not evacuating pinned regions, G1 no longer needs to disable garbage collection during JNI critical regions, which reduces thread stalling, minimizes additional latency when starting a collection, and avoids regressions in GC pause times. The result is better CPU responsiveness and overall application performance in workloads with frequent JNI interactions.
JEP 416: Reimplement Core Reflection with Method Handles makes a significant internal change to the Java platform by reimplementing the core reflection mechanisms (java.lang.reflect.Method, Constructor, and Field) on top of java.lang.invoke method handles. The primary motivation is to reduce the maintenance and development costs of supporting both the reflection and method-handle APIs. The new implementation performs direct invocations of method handles for specific reflective objects. Notably, microbenchmarks show that when Method, Constructor, and Field instances are held in static final fields, the new implementation is significantly faster than the old one. Some degradation can appear in microbenchmarks when these instances are held in non-constant fields, but real-world benchmarks using established libraries have not shown significant regressions. The reimplementation also reduces the cost of upgrading reflection support for new language features and simplifies the HotSpot VM by removing the special treatment of MagicAccessorImpl subclasses.
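The relationship can be sketched with a small example that calls the same method through core reflection and through a direct method handle; since JEP 416, the former is implemented in terms of the latter. Class and method names here (ReflectionDemo, viaReflection, viaMethodHandle) are illustrative, not taken from the JEP:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;

public class ReflectionDemo {
    // Core reflection call: since JEP 416 this path is backed by method handles.
    static String viaReflection(String s) throws Exception {
        Method m = String.class.getMethod("toUpperCase");
        return (String) m.invoke(s);
    }

    // The equivalent direct method-handle invocation.
    static String viaMethodHandle(String s) throws Throwable {
        MethodHandle mh = MethodHandles.lookup()
                .findVirtual(String.class, "toUpperCase", MethodType.methodType(String.class));
        return (String) mh.invokeExact(s);
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(viaReflection("jep 416"));   // JEP 416
        System.out.println(viaMethodHandle("jep 416")); // JEP 416
    }
}
```

Holding the Method (or MethodHandle) in a static final field is what lets the JIT treat it as a constant and inline the call, which is where the reported speedups come from.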
JEP 376: ZGC: Concurrent Thread-Stack Processing minimizes garbage collection pauses in the Z Garbage Collector (ZGC) by moving the processing of thread stacks from safepoints to a concurrent phase. The goals are to make stack processing lazy, cooperative, concurrent, and incremental, and to remove all other per-thread root processing from ZGC safepoints, ensuring that the time spent inside GC safepoints does not exceed one millisecond even on systems with large memory configurations. The implementation introduces a stack watermark barrier, which makes concurrent processing of thread stacks safe. With this change, ZGC processes all roots in the JVM concurrently rather than in stop-the-world pauses, significantly reducing GC pause durations and improving overall CPU utilization by letting application threads execute more continuously.
JEP 346: Promptly Return Unused Committed Memory from G1 enhances the G1 garbage collector's ability to manage memory efficiently by automatically returning unused Java heap memory to the operating system when the application is idle. This behavior is particularly beneficial in containerized environments, where resource consumption translates directly into operational cost. During periods of inactivity, G1 periodically assesses heap usage and, if appropriate, uncommits unused portions of the heap, returning that memory to the operating system. The feature is controlled by several options, including G1PeriodicGCInvokesConcurrent, which determines the type of periodic garbage collection performed. While the primary focus is memory management, dynamically adjusting the JVM's footprint to match application activity can indirectly improve CPU utilization by reducing the overhead of managing a larger-than-necessary heap.
JEP 307: Parallel Full GC for G1 addresses the worst-case latencies of the G1 garbage collector. G1 is designed to largely avoid full collections, but they can still occur when concurrent collection cannot reclaim memory quickly enough. The original full GC in G1 used a single-threaded mark-sweep-compact algorithm; JEP 307 parallelizes it, using the same number of threads as the young and mixed collections (controlled by the -XX:ParallelGCThreads option). This significantly reduces the duration of full GC pauses, improving CPU responsiveness and overall application performance under heavy memory pressure.
JEP 246: Leverage CPU Instructions for GHASH and RSA targets the performance of cryptographic operations heavily used in Java applications, particularly in secure communication protocols such as TLS. The JEP improves the performance of GHASH (used in the GCM cipher mode) and RSA by leveraging then-recently introduced CPU instructions on SPARC and Intel x64 architectures. Hardware acceleration of these computationally intensive operations significantly improves their speed and efficiency within the JVM, reducing CPU utilization and speeding up secure applications.
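Because the GHASH intrinsic is applied transparently inside the JCE provider, ordinary AES/GCM code benefits without any changes. A minimal round-trip sketch (class and method names are illustrative):

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class GcmDemo {
    // AES/GCM round trip: the GHASH authentication-tag computation inside GCM is
    // what JEP 246 intrinsified on CPUs with carry-less multiply instructions.
    static String roundTrip(String plaintext) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey key = kg.generateKey();

        byte[] iv = new byte[12];                      // 96-bit IV, recommended for GCM
        new SecureRandom().nextBytes(iv);

        Cipher enc = Cipher.getInstance("AES/GCM/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ct = enc.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));

        Cipher dec = Cipher.getInstance("AES/GCM/NoPadding");
        dec.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
        return new String(dec.doFinal(ct), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("hello"));  // hello
    }
}
```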
JEP 143: Improve Contended Locking optimizes the performance of contended Java object monitors, which are fundamental to thread synchronization. The JEP explores several areas for improvement, including field reordering, cache-line alignment, and faster monitor enter, exit, and notification operations. Because contended monitors can become a bottleneck in highly concurrent applications, these improvements yield better CPU utilization by reducing synchronization overhead.
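Contended monitors arise whenever multiple threads compete for the same lock, as in this small sketch (names are illustrative; the JEP's optimizations apply transparently to any synchronized code):

```java
public class ContendedCounter {
    private long count = 0;

    // Eight threads hammering one monitor: the enter/exit fast paths that
    // JEP 143 targets sit directly on this synchronized method.
    synchronized void increment() { count++; }

    static long run(int threads, int perThread) throws InterruptedException {
        ContendedCounter c = new ContendedCounter();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) c.increment();
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        return c.count;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(8, 100_000));  // 800000
    }
}
```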
| JEP Number | Title | JDK Release Status | Key CPU Improvement Mechanisms |
|---|---|---|---|
| 509 | JFR CPU-Time Profiling (Experimental) | Candidate | Enhances JDK Flight Recorder to capture CPU-time profiling information on Linux, enabling better performance analysis. |
| 475 | Late Barrier Expansion for G1 | 24 | Simplifies G1 GC barriers by shifting expansion to later in the JIT pipeline, reducing overhead and improving code efficiency. |
| 423 | Region Pinning for G1 | 22 | Allows G1 GC to pin memory regions during JNI critical regions, avoiding the need to disable GC and reducing thread stalling. |
| 416 | Reimplement Core Reflection with Method Handles | 18 | Replaces bytecode generation for reflection with method handles, potentially improving performance for reflective operations. |
| 376 | ZGC: Concurrent Thread-Stack Processing | 16 | Moves ZGC thread-stack processing to a concurrent phase, significantly reducing stop-the-world pause times. |
| 346 | Promptly Return Unused Committed Memory from G1 | 12 | Enhances G1 to automatically return unused heap memory to the OS, indirectly improving CPU usage by managing a more appropriately sized heap. |
| 307 | Parallel Full GC for G1 | 10 | Parallelizes the full garbage collection phase in G1, reducing the duration of the longest GC pauses. |
| 246 | Leverage CPU Instructions for GHASH and RSA | 9 | Improves the performance of GHASH and RSA cryptographic operations by utilizing specific CPU instructions. |
| 143 | Improve Contended Locking | 9 | Optimizes the performance of contended Java object monitors, reducing synchronization overhead in multithreaded applications. |
IV. JDK Enhancement Proposals Targeting Memory Usage
Several JEPs have been specifically designed to improve the memory usage characteristics of Java applications. These proposals address various aspects of memory management within the JVM and the core libraries.
JEP 450: Compact Object Headers (Experimental) introduces an experimental feature that reduces the size of object headers in the HotSpot JVM. On 64-bit architectures, object headers, which hold per-object metadata, occupy between 96 and 128 bits; this JEP reduces them to 64 bits by compacting the mark word and the class pointer into a single 64-bit header. Shrinking each header from 12 bytes to 8 bytes reduces overall heap usage, improves deployment density by letting more applications run within the same memory footprint, and can increase data locality, improving performance through better CPU cache utilization. The JEP is inspired by Project Lilliput, whose broader goal is to reduce the memory footprint of the JVM. As an experimental feature in JDK 24, it must be explicitly enabled with JVM options.
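The feature requires no source changes; it is enabled purely with JVM flags, as sketched below. Class and record names are illustrative, and the quoted object sizes are typical layouts rather than guarantees:

```java
public class HeaderDemo {
    // JDK 24, experimental: enable compact 64-bit headers with
    //   java -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders HeaderDemo
    // The saving is 4 bytes per object, which adds up across millions of instances.
    record Point(int x, int y) {}

    static Point[] allocate(int n) {
        Point[] pts = new Point[n];
        for (int i = 0; i < n; i++) pts[i] = new Point(i, i);
        return pts;
    }

    public static void main(String[] args) {
        // With a 12-byte header, a Point typically occupies 24 bytes (header plus
        // two ints, padded to 8-byte alignment); with an 8-byte header it fits in 16.
        System.out.println(allocate(1_000_000).length);
    }
}
```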
JEP 439: Generational ZGC extends the Z Garbage Collector (ZGC) with separate generations for young and old objects. This lets ZGC collect young objects, which tend to die young, more frequently, reclaiming memory more efficiently. The primary goals are to lower the risk of allocation stalls, reduce the required heap memory overhead, and decrease the CPU overhead of garbage collection, all while preserving the essential low-pause-time characteristics of non-generational ZGC. The generational approach splits the heap into two logical generations that are collected independently, exploiting the weak generational hypothesis: younger objects are more likely to be garbage. By focusing collection effort on the young generation, ZGC improves overall memory efficiency and reduces the frequency of more resource-intensive old-generation collections.
JEP 387: Elastic Metaspace, delivered in JDK 16, reworks how the HotSpot JVM manages Metaspace, the native-memory area that stores class metadata. The new allocator commits and uncommits memory in smaller units, reduces fragmentation, and returns unused Metaspace memory to the operating system more promptly, lowering the overall footprint of class-related data, particularly in applications that load and unload many classes. It also simplifies the Metaspace code, reducing maintenance cost.
JEP 377: ZGC: A Scalable Low-Latency Garbage Collector (Production) marked a significant step in the maturity of the Z Garbage Collector. Introduced as an experimental feature in JDK 11 (JEP 333), ZGC was promoted to a production-ready feature in JDK 15 by this JEP. ZGC was designed to achieve GC pause times not exceeding 10 ms, to handle heaps ranging from a few hundred megabytes to multiple terabytes, and to limit the reduction in application throughput compared to G1. As a concurrent, single-generation, region-based, NUMA-aware, compacting collector, ZGC's design inherently involves efficient memory management in support of these goals, particularly for very large heaps. Features such as concurrent class unloading and the ability to uncommit unused memory (introduced in JEP 351) further contribute to its memory efficiency.
JEP 351: ZGC: Uncommit Unused Memory (Experimental) addresses the memory footprint of applications using the Z Garbage Collector by enhancing ZGC to return unused heap memory to the operating system. This is particularly beneficial where memory is a constrained resource, such as containerized deployments, or for applications with periods of low activity. ZGC identifies ZPages (heap regions) that have been unused for a specified period, evicts them from its page cache, and uncommits the associated memory. The capability is enabled by default and can be tuned with the -XX:ZUncommitDelay=<seconds> option, which sets the timeout after which unused memory is uncommitted. Importantly, ZGC will never uncommit memory below the minimum heap size (-Xms).
JEP 254: Compact Strings introduced a more space-efficient internal representation for java.lang.String and related classes such as StringBuilder and StringBuffer. Previously, strings used a char array with two bytes per character (UTF-16 encoding). Many String objects, however, contain only Latin-1 characters, which fit in a single byte. JEP 254 changes the internal storage to a byte array plus an encoding-flag field indicating whether characters are stored as ISO-8859-1/Latin-1 (one byte per character) or UTF-16 (two bytes per character), depending on the string's content. This optimization can reduce the memory required for strings containing only single-byte characters by up to 50%, yielding significant overall savings in string-heavy applications. The change was purely internal, with no modifications to public APIs.
JEP 192: String Deduplication in G1 reduces memory consumption by addressing duplicate String objects on the Java heap. In many large-scale Java applications, String objects occupy a significant portion of the live data set, and a substantial fraction of them are duplicates with identical content. JEP 192 enhances the G1 garbage collector to automatically and continuously deduplicate such strings by making duplicate String objects share the same underlying character array. A background deduplication thread processes a queue of candidate String objects identified during garbage collection; a hashtable tracks unique character arrays, and when a duplicate is found, the String object is updated to point to the existing array, allowing the redundant array to be garbage collected. The feature is implemented only for G1 and is expected to reduce heap usage by around 10% on average.
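Deduplication is transparent to application code; the sketch below (illustrative names) merely shows what deduplication candidates look like. Note that dedup shares the backing array, so the String objects remain distinct:

```java
public class DedupDemo {
    // Enable with: java -XX:+UseG1GC -XX:+UseStringDeduplication DedupDemo
    static String[] buildDuplicates(int n) {
        String[] out = new String[n];
        for (int i = 0; i < n; i++) {
            out[i] = new String("same-content");  // n distinct objects, equal content
        }
        return out;
    }

    public static void main(String[] args) {
        String[] dups = buildDuplicates(100_000);
        System.out.println(dups[0].equals(dups[1]));  // true: equal content, dedup candidates
        System.out.println(dups[0] == dups[1]);       // false: still distinct String objects
    }
}
```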
JEP 149: Reduce Core-Library Memory Usage represents a broad effort to decrease the dynamic memory used by core-library classes in the JDK without negatively impacting performance. The JEP explored various improvements to library classes and their native implementations to reduce heap usage. Candidate techniques included reducing the size of java.lang.Class objects by moving infrequently used fields (related to reflection and annotations) into separate helper classes, and potentially disabling the reflection compiler, which generates bytecode for method calls to improve performance but adds to the dynamic footprint. Other areas of investigation included tuning the initial sizes of internal tables, caches, and buffers to minimize wasted memory. The overall goal was to implement memory reductions that are effective, maintainable, and have minimal performance impact on typical workloads.
JEP 147: Reduce Class Metadata Footprint aimed to decrease the memory footprint of class metadata in the HotSpot JVM, with a particular focus on small devices, drawing on memory-reduction techniques used in CVM, an embedded JVM. Strategies explored included keeping rarely used fields out of the core class, method, and field data structures; using the smallest possible data types for struct fields; encoding certain fields to fit into smaller types; carefully grouping fields of similar sizes to avoid unnecessary padding; using 16-bit offsets instead of 32-bit pointers for some data; and employing unions for groups of fields of which only one is in use at a time. The goal was a 25% reduction in the memory footprint of class, method, and field metadata (excluding bytecodes and interned strings) with no more than a 1% regression in application startup and runtime performance.
JEP 122: Remove the Permanent Generation represents a significant architectural change in the HotSpot JVM's memory management. Before this JEP (delivered in JDK 8), class metadata, interned Strings, and class static variables were stored in a dedicated portion of the heap called the permanent generation. JEP 122 removed it, moving class metadata to native memory and interned Strings and class statics to the Java heap. This simplified memory management by eliminating the need to tune the permanent generation's size, a common source of confusion and performance issues. Managing class metadata in native memory and leveraging the existing garbage collection mechanisms for interned Strings and statics also paved the way for further memory-related optimizations in subsequent JDK releases.
| JEP Number | Title | JDK Release | Key Memory Improvement Mechanisms |
|---|---|---|---|
| 450 | Compact Object Headers (Experimental) | 24 | Reduces the size of object headers from 96-128 bits to 64 bits, decreasing heap size and improving density. |
| 439 | Generational ZGC | 21 | Extends ZGC to use generational garbage collection, focusing on young objects to reduce memory overhead. |
| 387 | Elastic Metaspace | 16 | Improves dynamic resizing and management of Metaspace for more efficient memory usage of class metadata. |
| 377 | ZGC: A Scalable Low-Latency Garbage Collector (Production) | 15 | Design inherently involves efficient memory management for large heaps with low latency. |
| 351 | ZGC: Uncommit Unused Memory (Experimental) | 13 | Allows ZGC to return unused heap memory to the operating system, reducing memory footprint. |
| 254 | Compact Strings | 9 | Changes internal string representation to use byte arrays for Latin-1 characters, reducing string memory usage. |
| 192 | String Deduplication in G1 | 8u20 | Enhances G1 GC to automatically deduplicate duplicate String instances, reducing heap live data. |
| 149 | Reduce Core-Library Memory Usage | 8 | Aims to reduce dynamic memory usage across core library classes through various techniques. |
| 147 | Reduce Class Metadata Footprint | 8 | Reduces the memory footprint of HotSpot's class metadata, especially beneficial for small devices. |
| 122 | Remove the Permanent Generation | 8 | Removes the permanent generation, simplifying memory management and moving metadata to native memory and heap. |
V. JDK Enhancement Proposals Targeting Both CPU and Memory Usage
Several JEPs have been identified that aim to improve both the CPU and memory usage of Java applications, often through mechanisms that have a dual impact on resource efficiency.
JEP 483: Ahead-of-Time Class Loading & Linking optimizes the application startup process by making an application's classes instantly available in a loaded and linked state when the HotSpot JVM starts, reducing the CPU cycles spent during initialization. This is achieved by monitoring the application during a training run and storing the loaded and linked forms of all its classes in a cache for use in subsequent runs. Pre-loading and linking can also reduce the runtime memory footprint, since classes no longer need to be loaded and linked on demand in each run. The AOT cache itself consumes memory, but the trade-off yields faster startup and potentially better overall runtime performance due to reduced overhead.
JEP 444: Virtual Threads introduces a new concurrency model to the Java platform. Virtual threads are lightweight threads that dramatically reduce the effort of writing, maintaining, and observing high-throughput concurrent applications. Because a much larger number of concurrent tasks can be handled with far less overhead than traditional platform threads, virtual threads improve CPU utilization, especially in I/O-bound workloads where many threads would otherwise sit blocked waiting for operations to complete. Virtual threads also have a much smaller memory footprint than platform threads: they do not each require a dedicated OS thread, and their stacks are smaller. This can yield substantial memory savings for applications managing large numbers of concurrent operations.
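A minimal sketch of the model, assuming JDK 21 or later (class and method names are illustrative):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualThreadsDemo {
    // One cheap virtual thread per task; a blocking call parks the virtual
    // thread and frees its carrier OS thread instead of blocking it.
    static int runTasks(int n) throws InterruptedException {
        AtomicInteger done = new AtomicInteger();
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < n; i++) {
                exec.submit(() -> {
                    Thread.sleep(10);  // would tie up an OS thread under platform threads
                    done.incrementAndGet();
                    return null;
                });
            }
        }  // close() waits for all submitted tasks to finish
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runTasks(10_000));  // 10000
    }
}
```

Running ten thousand platform threads this way would require ten thousand OS threads; with virtual threads the same code runs on a small pool of carriers.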
JEP 431: Sequenced Collections introduces new interfaces that represent collections with a defined encounter order, providing uniform operations to access the first and last elements and to iterate in reverse. Its primary focus is API design and functionality rather than direct performance optimization, but it can indirectly help in scenarios where the order of elements is semantically important and must be consistently maintained, since a standard ordering API can avoid custom ordering logic that would otherwise consume additional CPU cycles.
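A brief sketch of the new interface, assuming JDK 21 or later (names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SequencedCollection;

public class SequencedDemo {
    // JDK 21+: List, Deque, LinkedHashSet, and friends implement SequencedCollection,
    // so first/last access and reverse iteration no longer need per-type workarounds.
    static String summarize(SequencedCollection<String> c) {
        return c.getFirst() + ".." + c.getLast() + " / reversed=" + c.reversed();
    }

    public static void main(String[] args) {
        SequencedCollection<String> letters = new ArrayList<>(List.of("a", "b", "c"));
        System.out.println(summarize(letters));   // a..c / reversed=[c, b, a]
        letters.addFirst("z");                    // uniform API at both ends
        System.out.println(letters);              // [z, a, b, c]
    }
}
```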
JEP 350: Dynamic CDS Archives extends the Class-Data Sharing (CDS) feature to allow application classes to be archived dynamically at the end of a Java application's execution, building on the existing AppCDS (Application Class-Data Sharing) functionality. The dynamically generated archive, layered on top of the default system archive, contains all loaded application and library classes not present in the base archive. In subsequent runs of the same application, the dynamic archive is memory-mapped, making the application's classes instantly available in a loaded and linked state. This improves startup time, reducing the CPU cycles spent on class loading and linking. Furthermore, sharing the archive across multiple JVM instances running the same application reduces the overall memory footprint, since class metadata is shared rather than loaded separately by each instance.
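The two-step workflow is driven entirely by JVM flags, sketched in the comments below; the archive and class names (app.jsa, com.example.Main) are hypothetical placeholders:

```java
public class CdsDemo {
    // JEP 350 workflow (JDK 13+), no code changes required:
    //
    //   1) Trial run: record every class loaded and write a dynamic archive at exit.
    //        java -XX:ArchiveClassesAtExit=app.jsa -cp app.jar com.example.Main
    //
    //   2) Subsequent runs: memory-map the archive so classes start out loaded and linked.
    //        java -XX:SharedArchiveFile=app.jsa -cp app.jar com.example.Main
    public static void main(String[] args) {
        System.out.println("Every class this run loads becomes a candidate for the archive.");
    }
}
```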
JEP 310: Application Class-Data Sharing enhanced the existing Class-Data Sharing (CDS) feature by allowing application classes to be placed in the shared archive. At startup, the JVM loads classes into memory as a preliminary step, which introduces latency, especially for applications with many classes. AppCDS addresses this by pre-processing application classes and storing them in a shared archive that can be memory-mapped at JVM startup, making them instantly available in a loaded and linked state. This significantly reduces application startup time, lowering CPU consumption during initialization. Moreover, when multiple JVMs on the same host use the same shared archive, the class data is shared across processes, reducing the memory footprint.
JEP 197: Segmented Code Cache changed the organization of the JVM's code cache. Instead of a single monolithic code heap, the code cache is divided into distinct segments, each holding compiled code of a particular type (non-method, profiled, and non-profiled code). Segmentation improves CPU performance by enabling shorter sweep times during code-cache management, better instruction-cache locality, and reduced fragmentation of highly optimized code. It also improves memory usage by giving finer control over the JVM's footprint through the sizes of the individual code-heap segments.
JEP 158: Unified JVM Logging 126 introduced a common logging system for all components of the JVM. While the primary goal of this JEP was to provide a more flexible and configurable logging framework, efficient logging can indirectly contribute to better CPU and memory usage. By allowing for more granular control over the logging output and the ability to easily diagnose performance issues, developers and system administrators can more quickly identify and address areas of inefficiency in their applications, potentially leading to optimizations that reduce both CPU and memory consumption.
| JEP Number | Title | JDK Release | Key CPU Improvement Mechanisms | Key Memory Improvement Mechanisms |
|---|---|---|---|---|
| 483 | Ahead-of-Time Class Loading & Linking | 24 | Improves startup time by pre-loading and linking classes. | Reduces memory footprint by sharing loaded and linked class data in a cache. |
| 444 | Virtual Threads | 21 | Improves CPU utilization for I/O-bound tasks with lightweight threads. | Reduces memory footprint per concurrent task compared to platform threads. |
| 350 | Dynamic CDS Archives | 13 | Improves startup time by dynamically archiving and sharing application classes. | Reduces memory footprint by sharing the archive across JVM instances. |
| 310 | Application Class-Data Sharing | 10 | Improves startup time by memory-mapping shared application class data. | Reduces memory usage by sharing class data across multiple JVMs. |
| 197 | Segmented Code Cache | 9 | Improves performance through better code locality and reduced fragmentation. | Provides better control over JVM memory footprint by managing code heap segments. |
| 158 | Unified JVM Logging | 9 | Indirectly improves efficiency by facilitating quicker identification and resolution of performance bottlenecks. | Indirectly improves efficiency by facilitating quicker identification and resolution of memory-related issues. |
VI. Conclusion: Key Trends and Future Directions in Java Performance Optimization through JEPs
The analysis of these JDK Enhancement Proposals reveals several key trends in the ongoing efforts to optimize the performance and resource efficiency of the Java platform. A significant emphasis continues to be placed on refining the garbage collection mechanisms, with advancements seen across different collectors like G1 and ZGC. These improvements target both reducing CPU overhead through concurrent processing and parallelization, as well as minimizing memory footprint through techniques like generational collection and the ability to uncommit unused memory.
Memory footprint reduction remains a critical area of focus, with JEPs like Compact Object Headers and Compact Strings directly aiming to decrease the memory consumed by fundamental data structures. The Class-Data Sharing (CDS) feature, along with its extensions for applications and dynamic archiving, demonstrates a sustained effort to improve both startup time and memory usage by enabling the sharing of class metadata across JVM instances.
The introduction of virtual threads in JEP 444 represents a notable advancement in how Java handles concurrency, offering a more resource-efficient model for managing large numbers of concurrent tasks, particularly in I/O-bound applications. This signifies an adaptation to modern application architectures and the demands of high-throughput systems.
Furthermore, the ongoing optimization of specific runtime aspects, such as reflection and cryptography, highlights a comprehensive strategy for improving performance across various facets of Java application execution. The progression of features from experimental to production-ready status, as seen with ZGC and Shenandoah, indicates a mature and iterative approach to platform enhancement.
Looking towards the future, continued evolution in areas such as scoped values and structured concurrency could further impact resource efficiency. Moreover, ongoing projects like Valhalla, with its aim of introducing value types, promise additional opportunities for enhancing both the performance and memory usage of Java applications by allowing more efficient data representation and manipulation. The continuous stream of JEPs focused on these critical areas underscores the Java platform's commitment to remaining a high-performance and resource-efficient choice for a wide range of development needs.
| Feature | MQTT | Apache Kafka |
|---|---|---|
| Protocol Type | Lightweight messaging protocol | Distributed streaming platform |
| Designed For | IoT devices, constrained networks | High-throughput, large-scale data pipelines |
| Message Model | Pub/Sub | Pub/Sub + log-based storage |
| Persistence | Optional, often not used | Built-in, durable, distributed log |
| QoS Levels | 0 (at most once), 1 (at least once), 2 (exactly once) | At least once (default), exactly once (configurable) |
| Broker Complexity | Lightweight (e.g., Mosquitto) | Heavy (e.g., Kafka needs Zookeeper or KRaft) |
| Latency | Low | Low to medium, optimized for throughput |
| Storage | Minimal to none | Retains messages (configurable retention) |
| Scalability | Limited (single broker common) | Highly scalable and distributed |
| Security | TLS, username/password | TLS, SASL, ACLs, etc. |
| Typical Use Cases | IoT sensors, mobile devices, telemetry | Event sourcing, log aggregation, analytics pipelines |
Integrated Data Store (IDS)
Developed by Charles Bachman at General Electric, IDS introduced the network data model, one of the earliest database systems.
IBM Information Management System (IMS)
Released by IBM, IMS used a hierarchical model, organizing data in tree-like structures.
Edgar F. Codd’s Relational Model
Edgar Codd published a groundbreaking paper introducing the relational database model, laying the foundation for modern databases.
VSAM (Virtual Storage Access Method)
Introduced by IBM for mainframe systems, VSAM was a file access method that enhanced data retrieval and storage efficiency for large datasets.
IBM System R
IBM’s System R was one of the first relational database management systems (RDBMS), introducing SQL (Structured Query Language) to query relational data.
Adabas
Developed by Software AG, Adabas was a high-performance inverted list database optimized for large-scale transaction processing.
Oracle Database
Released by Oracle Corporation, this was the first commercial RDBMS based on Codd’s relational model.
IBM DB2
Built on System R, DB2 became a leading RDBMS for mainframes, widely adopted for robust transaction support.
SQL ANSI Standard
The American National Standards Institute (ANSI) standardized SQL, establishing it as the universal language for relational databases.
MySQL and PostgreSQL
These open-source RDBMS systems emerged: MySQL for its speed and simplicity, and PostgreSQL for its robustness and SQL standards compliance.
Redis
An in-memory data structure store, Redis functions as a database, cache, or message broker, known for its speed and versatility.
Google Spanner
A globally distributed RDBMS, Spanner combined relational consistency with NoSQL scalability, introducing synchronized global clocks for distributed transactions.