FastLabs / Blog

Stop Forwarding Errors, Start Designing Them

Sun, 04 Jan 2026 00:00:00 GMT

It’s 3am. Production is down. You’re staring at a log line that says:

Error: serialization error: expected ',' or '}' at line 3, column 7

You know JSON is broken. But you have zero idea why, where, or who caused it. Was it the config loader? The user API? The webhook consumer?

The error has successfully bubbled up through 20 layers of your stack, preserving its original message perfectly, yet losing every scrap of meaning along the way.

We have a name for this. We call it “Error Handling.” But in reality, it’s just Error Forwarding. We treat errors like hot potatoes—catch them, wrap them (maybe), and throw them up the stack as fast as possible.

You add a println!, restart the service, wait for the bug to reproduce. It’s going to be a long night.

As noted in a detailed analysis of error handling in a large Rust project:

“There’re tons of opinionated articles or libraries promoting their best practices, leading to an epic debate that never ends. We were all starting to notice that there was something wrong with the error handling practices, but pinpointing the exact problems is challenging.”

What’s Wrong with Current Practices

The `std::error::Error` Trait: A Noble but Flawed Abstraction

The standard Error trait is built around source(): one error optionally points to another. That matches a lot of failures.

But some of the nastiest problems aren’t a single line of causality. Validation can fail in five places at once. A batch operation can partially succeed. Timeouts can come with partial results. Those want something closer to a set or a tree of causes, not a single chain.
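
To make the "single chain" point concrete, here is a minimal sketch using only the standard library. The ParseError and ConfigError types and the chain_messages helper are made up for illustration; the only real API involved is Error::source(). Note that each error can name at most one underlying cause, so a validation failure with five independent causes has nowhere to put the other four.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical low-level error, standing in for something like a parser failure.
#[derive(Debug)]
struct ParseError {
    line: usize,
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "parse error at line {}", self.line)
    }
}

impl Error for ParseError {}

// Hypothetical higher-level error that wraps exactly one cause.
#[derive(Debug)]
struct ConfigError {
    path: String,
    cause: ParseError,
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "failed to load config {}", self.path)
    }
}

impl Error for ConfigError {
    // `source()` returns at most one underlying error, so causes form a line.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.cause)
    }
}

// Follow the `source()` links and collect each message, outermost first.
fn chain_messages(err: &dyn Error) -> Vec<String> {
    let mut messages = vec![err.to_string()];
    let mut source = err.source();
    while let Some(e) = source {
        messages.push(e.to_string());
        source = e.source();
    }
    messages
}
```

Walking the chain is a simple loop precisely because the structure is linear; representing several sibling causes would need a different shape than this trait offers.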

Backtraces: Expensive Medicine for the Wrong Disease

Rust’s std::backtrace::Backtrace was meant to improve error observability. Backtraces are better than nothing, but they have serious limitations:

In async code, they can be noisy or misleading. Your backtrace will contain 49 stack frames, of which 12 are calls to GenFuture::poll(). The Async Working Group notes that suspended tasks are invisible to traditional stack traces.

They only show the origin, not the path. A backtrace tells you where the error was created, not the logical path it took through your application. It won’t tell you “this was the request handler for user X, calling service Y, with parameters Z.”

Capturing backtraces is expensive. The standard library documentation acknowledges: “Capturing a backtrace can be a quite expensive runtime operation.”

The Provide/Request API: Overengineering in Action

The Provider API (RFC 3192) and generic member access (RFC 2895) add dynamic type-based data access to errors:

fn provide<'a>(&'a self, request: &mut Request<'a>) {
    request.provide_ref::<Backtrace>(&self.backtrace);
}

The unstable Provide/Request API represents the latest attempt to make errors more flexible. The idea: errors can dynamically provide typed context (like HTTP status codes or backtraces) that callers can request at runtime. In practice, it introduces new problems:

Unpredictability: Your error might provide an HTTP status code. Or it might not. You won’t know until runtime.

Complexity: The API is subtle enough that LLVM struggles to optimize multiple provide calls.

Most of the time, a boring struct with named fields is still the thing you want.

`thiserror`: Categorizing by Origin, Not by Action

thiserror makes it easy to define error enums:

#[derive(Debug, thiserror::Error)]
pub enum DatabaseError {
    #[error("connection failed: {0}")]
    Connection(#[from] ConnectionError),
    #[error("query failed: {0}")]
    Query(#[from] QueryError),
    #[error("serialization failed: {0}")]
    Serde(#[from] serde_json::Error),
}

This looks reasonable. But notice how this common practice categorizes errors: by origin, not by what the caller can do about it.

When you receive a DatabaseError::Query, what should you do? Retry? Report raw SQL to the user? The error doesn’t tell you. It just tells you which dependency failed.

As one blogger aptly put it: “This error type does not tell the caller what problem you are solving but how you solve it.”

`anyhow`: So Convenient You’ll Forget to Add Context

anyhow takes the opposite approach: type erasure. Just use anyhow::Result<T> everywhere and propagate with ?. No more enum variants, no more #[from] annotations.

The problem is that it’s too convenient.

fn process_request(req: Request) -> anyhow::Result<Response> {
    let user = db.get_user(req.user_id)?;
    let data = fetch_external_api(user.api_key)?;
    let result = compute(data)?;
    Ok(result)
}

Every ? is a missed opportunity to add context. What was the user ID? What API were we calling? What computation failed? The error knows none of this.

The anyhow documentation encourages using .context() to add information. But .context() is optional—the type system doesn’t require it. And “I’ll add context later” is the easiest lie to tell yourself.

The Problem: Error Handling Without Purpose

Consider this common pattern in Rust codebases:

#[derive(thiserror::Error, Debug)]
pub enum ServiceError {
    #[error("database error: {0}")]
    Database(#[from] sqlx::Error),
    #[error("http error: {0}")]
    Http(#[from] reqwest::Error),
    #[error("serialization error: {0}")]
    Serde(#[from] serde_json::Error),
    // ... ten more variants
}

It looks neat, well-structured, and it compiles. But pause and ask:

If you are holding a ServiceError::Database, is it retryable? Should you show the raw SQL error to users? The error type doesn’t help answer these questions.
When debugging, does “serialization error: expected , or }” tell you which request, which field, which code path led here?

This is the fundamental disconnect in how we think about error handling. We focus on propagating errors exactly, on making the types line up. But we forget that errors are messages—messages that will eventually be read by either a machine trying to recover, or a human trying to debug.

The “Library vs Application” Myth

You’ve probably heard the conventional wisdom: “Use thiserror for libraries, anyhow for applications.”

It’s a nice, simple rule, just not quite right. As Luca Palmieri notes: “It is not the right framing. You need to reason about intent.”

The real question isn’t whether you’re writing a library or an application. The real question is: what do you expect the caller to do with this error?

Two Audiences, Two Needs

Audience	Goal	Needs
Machines	Automated recovery	Flat structure, clear error kinds, predictable codes
Humans	Debugging	Rich context, call path, business-level information

Most error handling designs optimize for neither. They optimize for the compiler.

For Machines: Flat, Actionable, Kind-Based

When errors need to be handled programmatically, complexity is the enemy. Your retry logic doesn’t want to traverse a nested error chain checking for specific variants. It wants to ask: is_retryable()?

Apache OpenDAL’s error design shows one way to do this:

pub struct Error {
    kind: ErrorKind,
    message: String,
    status: ErrorStatus,
    operation: &'static str,
    context: Vec<(&'static str, String)>,
    source: Option<anyhow::Error>,
}

pub enum ErrorKind {
    NotFound,
    PermissionDenied,
    RateLimited,
    // ... categorized by what the caller CAN DO
}

pub enum ErrorStatus {
    Permanent,   // Don't retry
    Temporary,   // Safe to retry
    Persistent,  // Was retried, still failing
}

Then the call site stays straightforward:

match result {
    Err(e) if e.kind() == ErrorKind::RateLimited && e.is_temporary() => {
        sleep(Duration::from_secs(1)).await;
        retry().await
    }
    Err(e) if e.kind() == ErrorKind::NotFound => {
        create_default().await
    }
    Err(e) => return Err(e),
    Ok(v) => v,
}

A few things to note:

ErrorKind is categorized by response, not origin. NotFound means “the thing doesn’t exist, don’t retry.” RateLimited means “slow down and try again.” The caller doesn’t need to know whether it was an S3 404 or a filesystem ENOENT—they need to know what to do about it.

ErrorStatus is explicit. Instead of guessing retryability from error types, it’s a first-class field. Services can mark errors as temporary when they know a retry might help.

One Error type per library. Instead of scattering error enums across modules, a single flat structure keeps things simple. The context field provides all the specificity you need without type proliferation.

No more traversing error chains, no more guessing from error types. Just ask the error directly.
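
As a rough, standard-library-only sketch of the same idea (this is not OpenDAL’s actual code; the type layout, the with_retry helper, and its signature are assumptions made for illustration), a flat error with an explicit status lets a generic retry loop decide what to do without inspecting where the error came from:

```rust
use std::fmt;

// Hypothetical kinds, categorized by what the caller can do.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ErrorKind {
    NotFound,
    RateLimited,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ErrorStatus {
    Permanent, // don't retry
    Temporary, // safe to retry
}

// One flat error type: nothing nested to traverse.
#[derive(Debug)]
struct Error {
    kind: ErrorKind,
    status: ErrorStatus,
    message: String,
}

impl Error {
    // Errors are permanent unless explicitly marked otherwise.
    fn new(kind: ErrorKind, message: &str) -> Self {
        Error { kind, status: ErrorStatus::Permanent, message: message.to_string() }
    }

    fn temporary(mut self) -> Self {
        self.status = ErrorStatus::Temporary;
        self
    }

    fn kind(&self) -> ErrorKind {
        self.kind
    }

    fn is_temporary(&self) -> bool {
        self.status == ErrorStatus::Temporary
    }
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:?} ({:?}): {}", self.kind, self.status, self.message)
    }
}

// Re-run `op` while it reports a temporary failure, up to `max_attempts` calls.
// The closure receives the 1-based attempt number.
fn with_retry<T>(
    max_attempts: u32,
    mut op: impl FnMut(u32) -> Result<T, Error>,
) -> Result<T, Error> {
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Err(e) if e.is_temporary() && attempt < max_attempts => attempt += 1,
            other => return other,
        }
    }
}
```

The retry loop never matches on a dependency-specific variant. It only asks the error whether a retry makes sense, which is the property the section argues for.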

For Humans: Low-Friction Context Capture

The biggest enemy of good error context isn’t capability—it’s friction. If adding context is annoying, developers won’t do it.

The exn library (294 lines of Rust, zero dependencies) demonstrates one approach: errors form a tree of frames, each automatically capturing its source location via #[track_caller]. Unlike linear error chains, trees can represent multiple causes—useful when parallel operations fail or validation produces multiple errors.

The key ingredients:

Automatic location capture. Instead of expensive backtraces, use #[track_caller] to capture file/line/column at negligible cost. Every error frame should know where it was created.
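
A minimal sketch of location capture with the standard library alone: #[track_caller] and std::panic::Location::caller() are real, stable APIs, while the Frame struct and load_config function are hypothetical names used only for this example.

```rust
use std::panic::Location;

// Hypothetical error frame that remembers where it was created.
#[derive(Debug)]
struct Frame {
    message: String,
    location: &'static Location<'static>,
}

impl Frame {
    // With #[track_caller], `Location::caller()` reports the call site of
    // `Frame::new`, not this line inside the constructor.
    #[track_caller]
    fn new(message: &str) -> Self {
        Frame {
            message: message.to_string(),
            location: Location::caller(),
        }
    }
}

fn load_config() -> Result<(), Frame> {
    // The frame records the file, line, and column of this expression.
    Err(Frame::new("config file missing"))
}

// Render a frame as "message, at file:line:column".
fn describe(frame: &Frame) -> String {
    format!(
        "{}, at {}:{}:{}",
        frame.message,
        frame.location.file(),
        frame.location.line(),
        frame.location.column()
    )
}
```

No stack unwinding or symbol resolution happens here: the location is a reference to static data passed along with the call, which is why this is so much cheaper than a backtrace.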

Ergonomic context addition. The API for adding context should be so natural that not adding it feels wrong:

fetch_user(user_id)
    .or_raise(|| AppError(format!("failed to fetch user {user_id}")))?;

Compare this to thiserror, where adding the same context requires defining a new variant and manual wrapping:

#[derive(thiserror::Error, Debug)]
pub enum AppError {
    #[error("failed to fetch user {user_id}: {source}")]
    FetchUser {
        user_id: String,
        #[source]
        source: DbError,
    },
    // ... one variant per call site that needs context
}

fn fetch_user(user_id: &str) -> Result<User, AppError> {
    // Wrap the low-level error by hand; the mapped Result is returned directly.
    db.query(user_id).map_err(|e| AppError::FetchUser {
        user_id: user_id.to_string(),
        source: e,
    })
}

Enforce context at module boundaries. This is where exn differs critically from anyhow. With anyhow, every error is erased to anyhow::Error, so you can always use ? and move on—the type system won’t stop you. The context methods exist, but nothing prevents you from ignoring them.

exn takes a different approach: Exn<E> preserves the outermost error type. If your function returns Result<T, Exn<ServiceError>>, you can’t directly ? a Result<U, Exn<DatabaseError>>—the types don’t match. The compiler forces you to call or_raise() and provide a ServiceError, which is exactly the moment you should be adding context about what your module was trying to do.

// This won't compile--type mismatch forces you to add context
pub fn fetch_user(user_id: &str) -> Result<User, Exn<ServiceError>> {
    let user = db.query(user_id)?;  // Error: expected Exn<ServiceError>, found Exn<DbError>
    Ok(user)
}

// You must provide context at the boundary
pub fn fetch_user(user_id: &str) -> Result<User, Exn<ServiceError>> {
    let user = db.query(user_id)
        .or_raise(|| ServiceError(format!("failed to fetch user {user_id}")))?;  // Now it compiles
    Ok(user)
}

The type system becomes your ally: it won’t let you be lazy at module boundaries.
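
To see why the compiler ends up enforcing this, here is a toy stand-in written against the standard library only. It is not exn’s implementation: Traced, OrRaise, DbError, ServiceError, and query are all hypothetical names, and the real library stores a tree of frames rather than a single boxed cause. The mechanism it illustrates is the same, though: there is no From conversion between wrappers with different outer types, so ? alone cannot cross the boundary.

```rust
use std::fmt::Debug;
use std::panic::Location;

// Toy context wrapper: the outermost error, where it was raised, and its cause.
#[derive(Debug)]
struct Traced<E> {
    error: E,
    location: &'static Location<'static>,
    cause: Option<Box<dyn Debug>>,
}

// Extension trait that converts any failing Result into a Traced<E>.
trait OrRaise<T> {
    fn or_raise<E, F: FnOnce() -> E>(self, make_error: F) -> Result<T, Traced<E>>;
}

impl<T, C: Debug + 'static> OrRaise<T> for Result<T, C> {
    #[track_caller]
    fn or_raise<E, F: FnOnce() -> E>(self, make_error: F) -> Result<T, Traced<E>> {
        match self {
            Ok(value) => Ok(value),
            Err(cause) => Err(Traced {
                error: make_error(),
                location: Location::caller(),
                cause: Some(Box::new(cause)),
            }),
        }
    }
}

#[derive(Debug)]
struct DbError(String);

#[derive(Debug)]
struct ServiceError(String);

// Hypothetical lower layer that always fails, to keep the example short.
fn query(user_id: &str) -> Result<String, DbError> {
    Err(DbError(format!("no row for {}", user_id)))
}

fn fetch_user(user_id: &str) -> Result<String, Traced<ServiceError>> {
    // `query(user_id)?` would not compile here: nothing converts a DbError
    // into a Traced<ServiceError>, so the boundary must be crossed explicitly.
    let user = query(user_id)
        .or_raise(|| ServiceError(format!("failed to fetch user {}", user_id)))?;
    Ok(user)
}
```

The design choice worth noticing is the absence of a blanket From impl. Adding one would restore the convenience of a bare ? and, with it, the ability to skip the context.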

In practice:

pub async fn execute(&self, task: Task) -> Result<Output, Exn<ExecutorError>> {
    let make_error = || ExecutorError(format!("failed to execute task {}", task.id));

    let user = self.fetch_user(task.user_id)
        .await
        .or_raise(make_error)?;

    let result = self.process(user)
        .or_raise(make_error)?;

    Ok(result)
}

Every ? has context. When this fails at 3am, instead of the cryptic serialization error, you see:

failed to execute task 7829, at src/executor.rs:45:12
|
|-> failed to fetch user "John Doe", at src/executor.rs:52:10
|
|-> connection refused, at src/client.rs:89:24

Putting It Together

In real systems, you often need both: machine-readable errors for automated recovery, and human-readable context for debugging. The pattern: use a flat, kind-based error type (like Apache OpenDAL’s) for the structured data, and wrap it in a context-tracking mechanism for propagation.

// Machine-oriented: flat struct with status
pub struct StorageError {
    pub status: ErrorStatus,
    pub message: String,
}

// Human-oriented: propagate with context at each layer
pub async fn save_document(doc: Document) -> Result<(), Exn<StorageError>> {
    let data = serialize(&doc)
        .or_raise(|| StorageError::permanent("serialization failed"))?;

    storage.write(&doc.path, data)
        .await
        .or_raise(|| StorageError::temporary("write failed"))?;

    Ok(())
}

At the boundary, walk the error tree to find the structured error:

// Extract a typed error from anywhere in the tree
// `T: 'static` is required because the lookup goes through `Any` downcasting.
fn find_error<T: 'static>(exn: &Exn<impl Error>) -> Option<&T> {
    fn walk<T: 'static>(frame: &Frame) -> Option<&T> {
        if let Some(e) = frame.as_any().downcast_ref::<T>() {
            return Some(e);
        }
        frame.children().iter().find_map(walk)
    }
    walk(exn.as_frame())
}

match save_document(doc).await {
    Ok(()) => Ok(()),
    Err(report) => {
        // For humans: log the full context tree
        log::error!("{:?}", report);

        // For machines: find and handle the structured error
        if let Some(err) = find_error::<StorageError>(&report) {
            if err.status == ErrorStatus::Temporary {
                return queue_for_retry(report);
            }
            // Pass the whole structured error; StorageError above has no `kind` field.
            return Err(map_to_http_status(err));
        }
        Err(StatusCode::INTERNAL_SERVER_ERROR)
    }
}

You do have to walk the tree—but compare that to the Provide/Request API. Here you’re searching for a concrete type, like StorageError: it has named fields, it’s documented, and your IDE can autocomplete it. No guesswork, no runtime surprises—just a well-defined struct you can understand and maintain.

Closing thought

Propagating errors is easy in Rust. Explaining them is the part we tend to postpone.

Next time you return a Result, take 30 seconds to ask: “If this fails in production, what would I wish the log said?” Then make it say that.


StackSafe: Taming Recursion in Rust Without Stack Overflow

Thu, 24 Jul 2025 00:00:00 GMT

TL;DR

Recursive algorithms in Rust can easily cause stack overflows that crash your program:

fn tree_depth(node: &TreeNode) -> usize {
    match node {
        TreeNode::Leaf => 1,
        TreeNode::Branch(left, right) => {
            1 + tree_depth(left).max(tree_depth(right))
        }
    }
}

// This aborts the process: thread 'main' has overflowed its stack
let deep_tree = create_deep_tree(100000);
println!("{}", tree_depth(&deep_tree));

StackSafe solves this by automatically growing the stack in recursive functions and data structures. Just add #[stacksafe] and your code works without crashes:

use stacksafe::stacksafe;

#[stacksafe]  // Add attribute to recursive functions
fn tree_depth(node: &TreeNode) -> usize {
    match node {
        TreeNode::Leaf => 1,
        TreeNode::Branch(left, right) => {
            1 + tree_depth(left).max(tree_depth(right))
        }
    }
}

// No stack overflow, works as expected
let deep_tree = create_deep_tree(100000);
println!("{}", tree_depth(&deep_tree));

StackSafe is being used in production by products like ScopeDB, where it helps trace and debug petabyte-scale observability data workloads.

The Stack Overflow Problem

Recursive algorithms are elegant and intuitive, but they come with a fundamental limitation: stack overflow. Consider another common scenario:

// JSON parsing - will crash on deeply nested JSON
fn parse_value(tokens: &mut TokenStream) -> JsonValue {
    match tokens.peek() {
        Token::LeftBrace => parse_object(tokens),  // Recursive call
        Token::LeftBracket => parse_array(tokens), // Recursive call
        _ => parse_primitive(tokens),
    }
}

A sufficiently deep structure will cause your program to crash with stack overflow, and there’s no clean way to predict or handle this in standard Rust.

Existing Solutions

Manual Transformation to Iterative Code

The most common approach is rewriting recursive algorithms as loops with explicit stacks:

fn parse_value_iterative(tokens: &mut TokenStream) -> JsonValue {
    let mut stack = vec![ParseState::ParseValue];
    let mut results = Vec::new();

    while let Some(state) = stack.pop() {
        match state {
            ParseState::ParseValue => {
                match tokens.peek() {
                    Token::LeftBrace => {
                        stack.push(ParseState::ParseObject(HashMap::new()));
                    }
                    Token::LeftBracket => {
                        stack.push(ParseState::ParseArray(Vec::new()));
                    }
                    _ => {
                        results.push(parse_primitive(tokens));
                    }
                }
            }
            ParseState::ParseObject(mut obj) => {
                // Complex state management for nested objects...
            }
            ParseState::ParseArray(mut arr) => {
                // Complex state management for nested arrays...
            }
        }
    }

    results.pop().unwrap()
}

This approach works for simple cases but becomes extremely complex or impossible when any of these conditions apply:

The algorithm transforms data structures rather than just evaluating them (e.g., optimizing an AST)
Multiple recursive calls need to be coordinated (e.g., tree balancing algorithms)
The algorithm doesn’t fit the tail-recursion pattern
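
For a case where the manual transformation is still manageable, here is a sketch of the tree_depth function from the TL;DR rewritten with an explicit stack. The TreeNode shape is assumed from that earlier snippet, and left_chain is a hypothetical helper added only to build test input.

```rust
enum TreeNode {
    Leaf,
    Branch(Box<TreeNode>, Box<TreeNode>),
}

// Compute the depth with a heap-allocated worklist instead of call-stack recursion.
fn tree_depth_iterative(root: &TreeNode) -> usize {
    let mut max_depth = 0;
    // Each entry pairs a node with its depth; the root has depth 1.
    let mut stack = vec![(root, 1usize)];
    while let Some((node, depth)) = stack.pop() {
        max_depth = max_depth.max(depth);
        if let TreeNode::Branch(left, right) = node {
            // `&**left` turns `&Box<TreeNode>` into `&TreeNode`.
            stack.push((&**left, depth + 1));
            stack.push((&**right, depth + 1));
        }
    }
    max_depth
}

// Hypothetical helper: a left-leaning chain with the requested depth,
// built in a loop so that construction itself does not recurse.
fn left_chain(depth: usize) -> TreeNode {
    let mut node = TreeNode::Leaf;
    for _ in 1..depth {
        node = TreeNode::Branch(Box::new(node), Box::new(TreeNode::Leaf));
    }
    node
}
```

This works because depth is a pure evaluation that only needs a running maximum. Note that it fixes only this one function: dropping a very deep TreeNode still recurses through the compiler-generated drop code, so the example should be tried with moderate depths.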

Lower-Level Crates: `stacker` and `recursive`

stacker: Provides low-level stack growth mechanisms
recursive: Provides macro #[recursive] to ease the application of stacker

Limitations:

You must take care not to leave any recursive function without the #[recursive] annotation
Derived traits like Debug and Clone, as well as the compiler-generated drop glue, still overflow the stack on deeply nested structures, so you must implement each of them manually with stack protection:

#[derive(Clone, Debug)]
enum Tree {
    Leaf(i32),
    Node(Box<Tree>, Box<Tree>),
}

#[recursive::recursive]
fn create_deep_tree(depth: usize) -> Tree {
    if depth == 0 {
        Tree::Leaf(42)
    } else {
        Tree::Node(
            Box::new(create_deep_tree(depth - 1)),
            Box::new(Tree::Leaf(0))
        )
    }
}

let deep_tree = create_deep_tree(10000);
let cloned = deep_tree.clone();  // Stack overflow: derived Clone is recursive!
println!("{:?}", cloned);        // Stack overflow: derived Debug is recursive!
// Stack overflow when `deep_tree` is dropped: the compiler-generated drop glue is recursive!
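
To give a sense of what "manually implement with stack protection" involves, here is one possible hand-written Drop for the Tree type above, using only the standard library. It is a sketch of the general worklist technique, not what stacker, recursive, or StackSafe do internally; detach and deep_tree are hypothetical helpers.

```rust
enum Tree {
    Leaf(i32),
    Node(Box<Tree>, Box<Tree>),
}

// Move a non-leaf child onto the worklist, leaving a cheap leaf in its place.
fn detach(child: &mut Box<Tree>, pending: &mut Vec<Box<Tree>>) {
    if matches!(**child, Tree::Node(..)) {
        pending.push(std::mem::replace(child, Box::new(Tree::Leaf(0))));
    }
}

impl Drop for Tree {
    fn drop(&mut self) {
        // Subtrees waiting to be freed live on the heap, not on the call stack.
        let mut pending: Vec<Box<Tree>> = Vec::new();
        if let Tree::Node(left, right) = self {
            detach(left, &mut pending);
            detach(right, &mut pending);
        }
        while let Some(mut tree) = pending.pop() {
            if let Tree::Node(left, right) = &mut *tree {
                detach(left, &mut pending);
                detach(right, &mut pending);
            }
            // `tree` now has only leaf children, so dropping it at the end of
            // this iteration nests a constant number of calls.
        }
    }
}

// Hypothetical helper: build a deep left-leaning tree in a loop (no recursion).
fn deep_tree(depth: usize) -> Tree {
    let mut tree = Tree::Leaf(42);
    for _ in 0..depth {
        tree = Tree::Node(Box::new(tree), Box::new(Tree::Leaf(0)));
    }
    tree
}
```

That is a fair amount of subtle code for a single trait on a single type, and Clone, Debug, and PartialEq would each need their own version.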

`StackSafe`: The Complete Solution

StackSafe addresses both recursive functions and recursive data structures with a simple, unified approach.

Recursive Functions Made Safe

Transform any recursive function by adding #[stacksafe]:

use stacksafe::stacksafe;

#[stacksafe]
fn fibonacci(n: u64) -> u64 {
    match n {
        0 | 1 => n,
        _ => fibonacci(n - 1) + fibonacci(n - 2),
    }
}

#[stacksafe]
fn evaluate_ast(expr: &Expr) -> i32 {
    match expr {
        Expr::Number(n) => *n,
        Expr::Add(left, right) => evaluate_ast(left) + evaluate_ast(right),
        Expr::Multiply(left, right) => evaluate_ast(left) * evaluate_ast(right),
    }
}

Recursive Data Structures Made Safe

Wrap recursive fields with StackSafe<T> for automatic stack-safe trait implementations:

use stacksafe::{stacksafe, StackSafe};

#[derive(Debug, Clone, PartialEq)]  // All traits are automatically stack-safe!
enum BinaryTree {
    Leaf(i32),
    Node {
        value: i32,
        left: StackSafe<Box<BinaryTree>>,
        right: StackSafe<Box<BinaryTree>>,
    },
}

// All operations work safely on arbitrarily deep trees:
let deep_tree = create_deep_tree(100000);
let cloned = deep_tree.clone();           // No stack overflow
let are_equal = deep_tree == cloned;      // No stack overflow
println!("{:?}", deep_tree);              // No stack overflow
drop(deep_tree);                          // No stack overflow

Debug-Time Safety Checks

StackSafe<T> exposes the wrapped value through Rust’s Deref trait, allowing transparent access to the underlying data. What’s more, it includes an important safety mechanism: in debug builds, it checks whether the current function is properly annotated with #[stacksafe] whenever you access the wrapped value.

fn unsafe_tree_sum(tree: &BinaryTree) -> i32 {
    match tree {
        BinaryTree::Leaf(value) => *value,
        BinaryTree::Node { value, left, right } => {
            // This will panic in debug builds:
            // "StackSafe should only be accessed within a stack-safe context"
            // Missing #[stacksafe] annotation!
            value + unsafe_tree_sum(left) + unsafe_tree_sum(right)
        }
    }
}

#[stacksafe]
fn safe_tree_sum(tree: &BinaryTree) -> i32 {
    match tree {
        BinaryTree::Leaf(value) => *value,
        BinaryTree::Node { value, left, right } => {
            // Works fine - properly protected
            value + safe_tree_sum(left) + safe_tree_sum(right)
        }
    }
}

This debug-time check helps you identify all potential stack overflow locations during development, rather than discovering them in production when they cause crashes.

Adopting `StackSafe` in Existing Code

Converting existing recursive code is straightforward. Here’s a real-world example:

Before (crash-prone):

#[derive(Debug, Clone)]
pub enum JsonValue {
    Object(HashMap<String, JsonValue>),
    Array(Vec<JsonValue>),
    String(String),
    Number(f64),
}

fn parse_json(input: &str) -> JsonValue {
    parse_value(&mut tokenize(input))
}

fn stringify(value: &JsonValue) -> String {
    match value {
        JsonValue::Object(map) => {
            let items: Vec<_> = map.iter()
                .map(|(k, v)| format!("\"{}\":{}", k, stringify(v)))
                .collect();
            format!("{{{}}}", items.join(","))
        }
        // ...other cases
    }
}

After (stack-safe):

use stacksafe::{stacksafe, StackSafe};

#[derive(Debug, Clone)]
pub enum JsonValue {
    Object(HashMap<String, StackSafe<JsonValue>>),  // Wrap recursive fields
    Array(Vec<StackSafe<JsonValue>>),               // Wrap recursive fields
    String(String),
    Number(f64),
}

fn parse_json(input: &str) -> JsonValue {
    parse_value(&mut tokenize(input))
}

#[stacksafe]  // Add attribute to recursive functions
fn stringify(value: &JsonValue) -> String {
    match value {
        JsonValue::Object(map) => {
            let items: Vec<_> = map.iter()
                .map(|(k, v)| format!("\"{}\":{}", k, stringify(v)))
                .collect();
            format!("{{{}}}", items.join(","))
        }
        // ...other cases
    }
}

The changes are minimal, but the result is a completely stack-safe JSON processor that can handle arbitrarily deep nesting.

Conclusion

StackSafe eliminates the fundamental tension between writing elegant recursive code and avoiding stack overflows. By handling both recursive functions and data structures comprehensively, it lets you focus on algorithm logic rather than stack management.

Simple adoption: Add #[stacksafe] to functions and StackSafe<T> to recursive fields
Complete protection: Covers both function calls and trait operations

Resources

Crate: https://crates.io/crates/stacksafe
Documentation: https://docs.rs/stacksafe
GitHub: https://github.com/fast/stacksafe

Fastrace: A Modern Approach to Distributed Tracing in Rust

Sat, 22 Mar 2025 00:00:00 GMT

TL;DR

Distributed tracing is critical for understanding modern microservice architectures. While tokio-rs/tracing is widely used in Rust, it comes with significant challenges: ecosystem fragmentation, complex configuration, and high overhead.

Fastrace provides a production-ready solution with seamless ecosystem integration, out-of-the-box OpenTelemetry support, and a more straightforward API that works naturally with the existing logging infrastructure.

The following example demonstrates how to trace functions with fastrace:

#[fastrace::trace]
pub fn send_request(req: HttpRequest) -> Result<(), Error> {
    // ...
}

It’s being used in production by products like ScopeDB, where it helps trace and debug petabyte-scale observability data workloads.

[Figure: Distributed Tracing Visualization]

Why Distributed Tracing Matters

In today’s world of microservices and distributed systems, understanding what is happening inside your applications has never been more challenging. A user request might touch dozens of services before completion, and traditional logging approaches quickly fall short.

Consider a typical request flow:

User → API Gateway → Auth Service → User Service → Database

When an exception is thrown, or the app performs poorly, where exactly is the root cause? Individual service logs only show fragments of the whole trace, lacking the crucial context of how the request flows through your entire system.

This is where distributed tracing becomes essential. Tracing creates a connected view of your request’s flow across service boundaries, making it possible to:

Identify performance bottlenecks across services
Debug complex interactions between components
Understand dependencies and service relationships
Analyze latency distributions and outliers
Correlate logs and metrics with request context

A Famous Approach: `tokio-rs/tracing`

For some Rust developers, tokio-rs/tracing is the go-to solution for application instrumentation. Let’s look at how a typical implementation works:

fn main() {
    // Initialize the tracing subscriber
    // Complex configuration code omitted...

    // Create a span and record some data
    let span = tracing::info_span!("processing_request",
        user_id = 42,
        request_id = "abcd1234"
    );

    // Enter the span (activates it for the current execution context)
    let _guard = span.enter();

    // Log within the span context
    tracing::info!("Starting request processing");

    process_data();

    tracing::info!("Finished processing request");
}

For instrumenting functions, tokio-rs/tracing provides attribute macros:

#[tracing::instrument(skip(password), fields(user_id = user.id))]
async fn authenticate(user: &User, password: &str) -> Result<AuthToken, AuthError> {
    tracing::info!("Authenticating user {}", user.id);
    // ...more code...
}

The Challenges with `tokio-rs/tracing`

In our experience as users, tokio-rs/tracing comes with several significant challenges:

1. Ecosystem Fragmentation

By introducing its own logging macros, tokio-rs/tracing creates a division with code using the standard log crate:

// Using log crate
log::info!("Starting operation");

// Using tracing crate (different syntax)
tracing::info!("Starting operation");

This fragmentation is particularly problematic for library authors. When creating a library, authors face a difficult choice:

Use the log crate for compatibility with the broader ecosystem
Use tokio-rs/tracing for better observability features

Many libraries choose the first option for simplicity, but miss out on the benefits of tracing.

While tokio-rs/tracing does provide a feature flag ‘log’ that allows emitting log records to the log crate when using tokio-rs/tracing’s macros, library authors must manually enable this feature flag to ensure all users properly receive log records regardless of which logging framework they use. This creates additional configuration complexity for library maintainers.

Furthermore, applications using tokio-rs/tracing must additionally install and configure the tracing-log bridge to properly receive log records from libraries that use the log crate. This creates a bidirectional compatibility problem requiring explicit configuration:

# Library's Cargo.toml
[dependencies]
tracing = { version = "0.1", features = ["log"] }  # Emit log records for log compatibility

# Application's Cargo.toml
[dependencies]
tracing = "0.1"
tracing-log = "0.2"  # Listen to log records for log compatibility

2. Performance Impact for Libraries

Library authors are particularly sensitive to performance overhead, as their code may be called in tight loops or performance-critical paths. The overhead of tokio-rs/tracing instrumentation can be substantial, which creates a dilemma:

Always instrument tracing (and impose overhead on all users)
Don’t instrument at all (and lose observability)
Create an additional feature flag system (increasing maintenance burden)

Here is a common pattern in libraries using tokio-rs/tracing:

```
#[cfg_attr(feature = "tracing", tracing::instrument(skip(password), fields(user_id = user.id)))]
async fn authenticate(user: &User, password: &str) -> Result<AuthToken, AuthError> {
    // ...more code...
}
```

Different libraries may define feature flags with subtly different names, making it hard for the final application to configure all of them.

With tokio-rs/tracing, there’s no clean way to disable tracing at zero cost. This makes library authors reluctant to add instrumentation to performance-sensitive code paths.

3. No Context Propagation

Distributed tracing requires propagating context across service boundaries, but tokio-rs/tracing leaves this largely as an exercise for the developer. For example, this is tonic’s official example demonstrating how to trace a gRPC service:

Server::builder()
    .trace_fn(|_| tracing::info_span!("grpc_server"))
    .add_service(MyServiceServer::new(MyService::default()))
    .serve(addr)
    .await?;

The above example only creates a basic span but doesn’t extract tracing context from the incoming request.

The consequences of missing context propagation are severe in distributed systems: without the context, a single request no longer shows up as a single trace.

Instead of seeing a complete flow of a request like:

Trace #1: Frontend → API Gateway → User Service → Database → Response

You’ll see disconnected fragments from a request:

Trace #1: Frontend → API Gateway
Trace #2: User Service → Database
Trace #3: API Gateway → Response

Even worse, when multiple requests are interleaved, the traces become a chaotic mess:

Trace #1: Frontend → API Gateway
Trace #2: Frontend → API Gateway
Trace #3: Frontend → API Gateway
Trace #4: User Service → Database
Trace #6: API Gateway → Response
Trace #5: User Service → Database

This fragmentation makes it extremely difficult to follow request flows, isolate performance issues, or understand causal relationships between services.
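
What actually travels between services is small. The sketch below encodes and decodes the W3C Trace Context traceparent header (version 00: a 32-hex-digit trace ID, a 16-hex-digit parent span ID, and a flags byte) using only the standard library. The TraceContext struct and its methods are hypothetical and simplified, and are not part of tokio-rs/tracing or fastrace.

```rust
// Hypothetical, simplified carrier for the fields of a `traceparent` header.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct TraceContext {
    trace_id: u128,
    parent_span_id: u64,
    sampled: bool,
}

// True if `s` has exactly `len` characters, all of them hex digits.
fn is_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| b.is_ascii_hexdigit())
}

impl TraceContext {
    // Client side: produce the header value to attach to an outgoing request.
    fn encode(&self) -> String {
        format!(
            "00-{:032x}-{:016x}-{:02x}",
            self.trace_id, self.parent_span_id, self.sampled as u8
        )
    }

    // Server side: recover the context from an incoming header value.
    // Returns None for anything that is not a well-formed version-00 header.
    fn decode(header: &str) -> Option<Self> {
        let mut parts = header.split('-');
        let version = parts.next()?;
        let trace_id = parts.next()?;
        let span_id = parts.next()?;
        let flags = parts.next()?;
        if parts.next().is_some()
            || version != "00"
            || !is_hex(trace_id, 32)
            || !is_hex(span_id, 16)
            || !is_hex(flags, 2)
        {
            return None;
        }
        let trace_id = u128::from_str_radix(trace_id, 16).ok()?;
        let parent_span_id = u64::from_str_radix(span_id, 16).ok()?;
        let flags = u8::from_str_radix(flags, 16).ok()?;
        // All-zero IDs are invalid per the specification.
        if trace_id == 0 || parent_span_id == 0 {
            return None;
        }
        Some(TraceContext { trace_id, parent_span_id, sampled: flags & 1 == 1 })
    }
}
```

If the server skips the decode step, as in the tonic example above, it starts a fresh trace ID for every request, and that is exactly how one request ends up scattered across several unrelated traces.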

Introducing `fastrace`: A Fast and Complete Solution

1. Zero-cost Abstraction

fastrace is designed with real zero-cost abstraction. When disabled, instrumentations are completely omitted from compilation, resulting in no runtime overhead. This makes it ideal for libraries concerned about performance.
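
The general technique behind this kind of claim can be sketched with plain conditional compilation. The code below is an illustration under assumptions, not fastrace’s actual implementation: the span module, the Guard type, and the checksum function are made up, and only the feature name 'enable' is borrowed from the text.

```rust
// With the hypothetical `enable` feature on, the guard times the enclosing scope.
#[cfg(feature = "enable")]
mod span {
    pub struct Guard {
        name: &'static str,
        start: std::time::Instant,
    }

    pub fn enter(name: &'static str) -> Guard {
        Guard { name, start: std::time::Instant::now() }
    }

    impl Drop for Guard {
        fn drop(&mut self) {
            eprintln!("{} took {:?}", self.name, self.start.elapsed());
        }
    }
}

// With the feature off, the guard is a zero-sized type with no Drop impl,
// so the compiler has nothing to emit for it.
#[cfg(not(feature = "enable"))]
mod span {
    pub struct Guard;

    #[inline(always)]
    pub fn enter(_name: &'static str) -> Guard {
        Guard
    }
}

// Library code is written once and looks the same in both configurations.
fn checksum(data: &[u8]) -> u32 {
    let _guard = span::enter("checksum");
    data.iter().map(|&b| u32::from(b)).sum()
}
```

The call site never mentions the feature flag, which is what spares each library from inventing its own cfg_attr scheme.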

2. Ecosystem Compatibility

fastrace focuses exclusively on distributed tracing. Through its composable design, it integrates seamlessly with the existing Rust ecosystem, including compatibility with the standard log crate. This architectural approach allows libraries to implement comprehensive tracing while preserving their users’ freedom to use their preferred logging setup.

3. Simplicity First

The API is designed to be intuitive and require minimal boilerplate, focusing on the most common use cases while still providing extensibility when needed.

4. Insanely Fast

[Figure: Fastrace Performance]

fastrace is designed for high-performance applications. It can handle massive amounts of spans with minimal impact on CPU and memory usage.

5. Ergonomic for both Applications and Libraries

Libraries can use fastrace without imposing performance penalties when not needed:

#[fastrace::trace]  // Zero-cost when the application doesn't enable the 'enable' feature
pub fn process_data(data: &[u8]) -> Result<Vec<u8>, Error> {
    // Library uses standard log crate
    log::debug!("Processing {} bytes of data", data.len());

    // ...more code...
}

The key point here is that libraries should include fastrace without enabling any features:

[dependencies]
fastrace = "0.7"  # No 'enable' feature

When an application uses this library and doesn’t enable the ‘enable’ feature of fastrace:

All tracing code is completely optimized away at compile time
Zero runtime overhead is added to the library
No impact on performance-critical code paths

When the application does enable tracing via the ‘enable’ feature:

Instrumentation in the library becomes active
Spans are collected and reported
The application gets full visibility into library behavior

This is a significant advantage over other tracing solutions that either always impose overhead or require libraries to implement complex feature-flag systems.
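To make the mechanism concrete, here is a minimal, std-only sketch of how a Cargo feature can switch instrumentation on and off at compile time. It is not fastrace's actual implementation: `enter_span`, `RECORDED_SPANS`, and `recorded_spans` are hypothetical names invented for this illustration, and the real crate does considerably more. The point is only that the library calls the instrumentation unconditionally, and the feature decides whether that call does anything.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for a span collector: counts spans that were recorded.
static RECORDED_SPANS: AtomicUsize = AtomicUsize::new(0);

/// With the `enable` feature on, entering a span does real work.
#[cfg(feature = "enable")]
#[inline]
fn enter_span(_name: &'static str) {
    RECORDED_SPANS.fetch_add(1, Ordering::Relaxed);
}

/// Without the feature, the body is empty, so the optimizer can drop the call.
#[cfg(not(feature = "enable"))]
#[inline(always)]
fn enter_span(_name: &'static str) {}

/// Library code instruments itself unconditionally and never inspects the feature.
pub fn process_data(data: &[u8]) -> Vec<u8> {
    enter_span("process_data");
    data.iter().map(|b| b.wrapping_add(1)).collect()
}

/// Number of spans recorded so far (stays at zero when the feature is off).
pub fn recorded_spans() -> usize {
    RECORDED_SPANS.load(Ordering::Relaxed)
}
```

Built without the `enable` feature, `process_data` still works and `recorded_spans()` stays at zero. Because Cargo unifies features across the dependency graph, an application that turns the feature on activates the same instrumentation in every library that depends on the crate.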

6. Seamless Context Propagation

fastrace provides companion crates for popular frameworks that handle context propagation automatically:

// For HTTP clients with reqwest
let response = client.get(&format!("https://user-service/users/{}", user_id))
    .headers(fastrace_reqwest::traceparent_headers())  // Automatically inject trace context
    .send()
    .await?;

// For gRPC servers with tonic
Server::builder()
    .layer(fastrace_tonic::FastraceServerLayer)  // Automatically extracts context from incoming requests
    .add_service(MyServiceServer::new(MyService::default()))
    .serve(addr)
    .await?;  // The server only runs once the future is awaited

// For gRPC clients
let channel = ServiceBuilder::new()
    .layer(fastrace_tonic::FastraceClientLayer)  // Automatically injects context into outgoing requests
    .service(channel);

// For data access with Apache OpenDAL
let op = Operator::new(services::Memory::default())?
    .layer(opendal::layers::FastraceLayer)  // Automatically traces all data operations
    .finish();
op.write("test", "0".repeat(16 * 1024 * 1024).into_bytes())
    .await?;

This provides out-of-the-box distributed tracing without manual context handling.
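It helps to see what actually travels between services. The helper name `traceparent_headers` suggests the W3C Trace Context `traceparent` header, whose value has the form `version-traceid-parentid-flags` in hex. The following is a std-only sketch of encoding and decoding that value. The `SpanContext` struct and its methods here are hypothetical stand-ins written for this illustration, not the types exported by fastrace.

```rust
/// Hypothetical stand-in for a span context; not fastrace's own type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SpanContext {
    pub trace_id: u128,
    pub span_id: u64,
    pub sampled: bool,
}

/// True if `s` has exactly `len` characters and all of them are hex digits.
fn is_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| b.is_ascii_hexdigit())
}

impl SpanContext {
    /// Encode as `00-<32 hex trace id>-<16 hex span id>-<2 hex flags>`.
    pub fn encode_traceparent(&self) -> String {
        format!(
            "00-{:032x}-{:016x}-{:02x}",
            self.trace_id, self.span_id, self.sampled as u8
        )
    }

    /// Decode a header value, returning `None` for anything malformed.
    pub fn decode_traceparent(header: &str) -> Option<Self> {
        let mut parts = header.trim().split('-');
        let (version, trace_id, span_id, flags) =
            (parts.next()?, parts.next()?, parts.next()?, parts.next()?);
        // This sketch only accepts version 00 with exactly four fields.
        if version != "00" || parts.next().is_some() {
            return None;
        }
        if !is_hex(trace_id, 32) || !is_hex(span_id, 16) || !is_hex(flags, 2) {
            return None;
        }
        let trace_id = u128::from_str_radix(trace_id, 16).ok()?;
        let span_id = u64::from_str_radix(span_id, 16).ok()?;
        let flags = u8::from_str_radix(flags, 16).ok()?;
        // All-zero identifiers are treated as invalid.
        if trace_id == 0 || span_id == 0 {
            return None;
        }
        Some(Self { trace_id, span_id, sampled: flags & 1 == 1 })
    }
}
```

A client layer would call something like `encode_traceparent` on the current span and attach the result as a request header. A server layer would decode it and start its root span as a child of the remote one, which is what keeps the traces from fragmenting across services.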

The Complete Solution: `fastrace` + `log` + `logforth`

fastrace deliberately focuses on doing one thing well: tracing. Thanks to its composable design and Rust’s rich ecosystem, a powerful combination emerges:

log: The standard Rust logging facade
logforth: A flexible logging implementation with industrial-ready features
fastrace: High-performance tracing with distributed context propagation

This integration automatically associates your logs with trace spans, so you get correlation without switching to a different set of logging macros:

log::info!("Processing started");

// Later, in your logging infrastructure, you can see which trace and span
// each log entry belongs to.
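One way such correlation can work, sketched with the standard library only: the tracer keeps the active span context in thread-local storage, and the logging backend reads it whenever it formats a record. This is an assumption about the general technique, not a description of how fastrace or logforth implement it. `enter`, `SpanGuard`, and `format_record` are hypothetical names, and an async runtime would need more care than a plain thread-local.

```rust
use std::cell::Cell;

thread_local! {
    /// The (trace_id, span_id) active on this thread, if any.
    static CURRENT: Cell<Option<(u128, u64)>> = Cell::new(None);
}

/// Guard that restores the previous context when it goes out of scope.
pub struct SpanGuard {
    previous: Option<(u128, u64)>,
}

/// Mark a span as active on the current thread.
pub fn enter(trace_id: u128, span_id: u64) -> SpanGuard {
    let previous = CURRENT.with(|c| c.replace(Some((trace_id, span_id))));
    SpanGuard { previous }
}

impl Drop for SpanGuard {
    fn drop(&mut self) {
        CURRENT.with(|c| c.set(self.previous));
    }
}

/// What a logging backend could do per record: look up the active context
/// and attach it, so call sites keep using their usual logging macros.
pub fn format_record(message: &str) -> String {
    match CURRENT.with(|c| c.get()) {
        Some((trace_id, span_id)) => {
            format!("[trace_id={trace_id:032x} span_id={span_id:016x}] {message}")
        }
        None => message.to_string(),
    }
}
```

The call site never mentions tracing. The same `format_record("Processing started")` call yields a plain message outside a span and a message tagged with IDs inside one, which is the property that lets libraries keep using the `log` facade.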

To illustrate the simplicity of this approach, here’s a streamlined example of building a microservice with complete observability:

#[poem::handler]
#[fastrace::trace]  // Automatically creates and manages spans
async fn get_user(Path(user_id): Path<String>) -> Json<User> {
    // Standard log calls are automatically associated with the current span
    log::info!("Fetching user {}", user_id);

    let user_details = fetch_user_details(&user_id).await;

    Json(User {
        id: user_id,
        name: user_details.name,
        email: user_details.email,
    })
}

#[fastrace::trace]
async fn fetch_user_details(user_id: &str) -> UserDetails {
    let client = reqwest::Client::new();

    let response = client.get(&format!("https://user-details-service/users/{}", user_id))
        .headers(fastrace_reqwest::traceparent_headers())  // Automatic trace context propagation
        .send()
        .await
        .expect("Request failed");

    response.json::<UserDetails>().await.expect("Failed to parse JSON")
}

#[tokio::main]
async fn main() {
    // Configure logging and tracing
    setup_observability("user-service");

    let app = poem::Route::new()
        .at("/users/:id", poem::get(get_user))
        .with(fastrace_poem::FastraceMiddleware); // Automatic trace context extraction

    poem::Server::new(poem::listener::TcpListener::bind("0.0.0.0:3000"))
        .run(app)
        .await
        .unwrap();

    fastrace::flush();
}

fn setup_observability(service_name: &str) {
    // Setup logging with logforth
    logforth::stderr()
        .dispatch(|d| {
            d.filter(log::LevelFilter::Info)
                // Attaches trace id to logs
                .diagnostic(logforth::diagnostic::FastraceDiagnostic::default())
                // Attaches logs to spans
                .append(logforth::append::FastraceEvent::default())
        })
        .apply();

    // Setup tracing with fastrace
    fastrace::set_reporter(
        fastrace_jaeger::JaegerReporter::new("127.0.0.1:6831".parse().unwrap(), service_name).unwrap(),
        fastrace::collector::Config::default()
    );
}

Conclusion

fastrace represents a modern approach to distributed tracing in Rust. The most significant advantages of fastrace are:

Zero Runtime Overhead When Disabled: Libraries can add rich instrumentation without worrying about performance impact when tracing is not enabled by the application.
No Ecosystem Lock-In: Libraries can use fastrace without forcing their users into a specific logging ecosystem.
Simple API Surface: The minimal API surface makes it easy to add comprehensive tracing with little code.
Predictable Performance: fastrace’s performance characteristics are consistent and predictable, even under high load.

An ecosystem where libraries are comprehensively instrumented with fastrace would enable unprecedented visibility into applications, without the performance or compatibility concerns that have historically prevented such instrumentation.
