<p>FastLabs / Blog: We develop fast Rust crates and release them fast. https://fast.github.io</p> <h1>Stop Forwarding Errors, Start Designing Them</h1> <p>https://fast.github.io/blog/stop-forwarding-errors-start-designing-them (Sun, 04 Jan 2026 00:00:00 GMT)</p> <p>It’s 3am. Production is down. You’re staring at a log line that says:</p> <div><figure><figcaption></figcaption><pre><code><div><div><span>Error: serialization error: expected ',' or '}' at line 3, column 7</span></div></div></code></pre><div><div></div></div></figure></div> <p>You know JSON is broken. But you have zero idea <em>why</em>, <em>where</em>, or <em>who</em> caused it. Was it the config loader? The user API? The webhook consumer?</p> <p>The error has successfully bubbled up through 20 layers of your stack, preserving its original message perfectly, yet losing every scrap of meaning along the way.</p> <p>We have a name for this. We call it “Error Handling.” But in reality, it’s just <strong>Error Forwarding</strong>. We treat errors like hot potatoes: catch them, wrap them (maybe), and throw them up the stack as fast as possible.</p> <p>You add a <code>println!</code>, restart the service, wait for the bug to reproduce. It’s going to be a long night.</p> <p>As noted in a <a href="https://bugenzhao.com/2024/04/24/error-handling-1/">detailed analysis of error handling in a large Rust project</a>:</p> <blockquote> <p>“There’re tons of opinionated articles or libraries promoting their best practices, leading to an epic debate that never ends. We were all starting to notice that there was something wrong with the error handling practices, but pinpointing the exact problems is challenging.”</p> </blockquote> <hr /> <h2>What’s Wrong with Current Practices</h2> <h3>The <code>std::error::Error</code> Trait: A Noble but Flawed Abstraction</h3> <p>The standard <code>Error</code> trait is built around <code>source()</code>: one error optionally points to another. 
That matches a lot of failures.</p> <p>But some of the nastiest problems aren’t a single line of causality. Validation can fail in five places at once. A batch operation can partially succeed. Timeouts can come with partial results. Those want something closer to a set or a tree of causes, not a single chain.</p> <h3>Backtraces: Expensive Medicine for the Wrong Disease</h3> <p>Rust’s <code>std::backtrace::Backtrace</code> was meant to improve error observability. Backtraces are better than nothing, but they have serious limitations:</p> <p><strong>In async code, they can be noisy or misleading.</strong> A single backtrace can contain <a href="https://github.com/rust-lang/rust/issues/74779">49 stack frames, of which 12 are calls to <code>GenFuture::poll()</code></a>. The <a href="https://rust-lang.github.io/wg-async/design_docs/async_stack_traces.html">Async Working Group notes</a> that suspended tasks are invisible to traditional stack traces.</p> <p><strong>They only show the origin, not the path.</strong> A backtrace tells you where the error was <em>created</em>, not the logical path it took through your application. 
It won’t tell you “this was the request handler for user X, calling service Y, with parameters Z.”</p> <p><strong>Capturing backtraces is expensive.</strong> The standard library documentation acknowledges: “Capturing a backtrace can be a quite expensive runtime operation.”</p> <h3>The Provide/Request API: Overengineering in Action</h3> <p>The <a href="https://github.com/rust-lang/rust/issues/96024">Provider API (RFC 3192)</a> and <a href="https://github.com/rust-lang/rfcs/pull/2895">generic member access (RFC 2895)</a> add dynamic type-based data access to errors:</p> <div><figure><figcaption></figcaption><pre><code><div><div><span>fn</span><span> </span><span>provide</span><span>&lt;'</span><span>a</span><span>&gt;(</span><span>&amp;</span><span>'</span><span>a</span><span> </span><span>self</span><span>, request</span><span>:</span><span> </span><span>&amp;mut</span><span> </span><span>Request</span><span>&lt;'</span><span>a</span><span>&gt;) {</span></div></div><div><div><span><span> </span></span><span>request</span><span>.</span><span>provide_ref</span><span>::</span><span>&lt;</span><span>Backtrace</span><span>&gt;(</span><span>&amp;</span><span>self</span><span>.</span><span>backtrace);</span></div></div><div><div><span>}</span></div></div></code></pre><div><div></div></div></figure></div> <p>The unstable <code>Provide</code>/<code>Request</code> API represents the latest attempt to make errors more flexible. The idea: errors can dynamically provide typed context (like HTTP status codes or backtraces) that callers can request at runtime. In practice, it introduces new problems:</p> <p><strong>Unpredictability</strong>: Your error <em>might</em> provide an HTTP status code. Or it might not. 
You won’t know until runtime.</p> <p><strong>Complexity</strong>: The API is subtle enough that <a href="https://github.com/rust-lang/rfcs/pull/3192#issuecomment-1018020335">LLVM struggles to optimize multiple provide calls</a>.</p> <p>Most of the time, a boring struct with named fields is still the thing you want.</p> <h3><code>thiserror</code>: Categorizing by Origin, Not by Action</h3> <p><code>thiserror</code> makes it easy to define error enums:</p> <div><figure><figcaption></figcaption><pre><code><div><div><span>#[derive(</span><span>Debug</span><span>, thiserror</span><span>::</span><span>Error</span><span>)]</span></div></div><div><div><span>pub</span><span> </span><span>enum</span><span> </span><span>DatabaseError</span><span> {</span></div></div><div><div><span><span> </span></span><span>#[error(</span><span>"connection failed: {0}"</span><span>)]</span></div></div><div><div><span> </span><span>Connection</span><span>(#[from] </span><span>ConnectionError</span><span>),</span></div></div><div><div><span><span> </span></span><span>#[error(</span><span>"query failed: {0}"</span><span>)]</span></div></div><div><div><span> </span><span>Query</span><span>(#[from] </span><span>QueryError</span><span>),</span></div></div><div><div><span><span> </span></span><span>#[error(</span><span>"serialization failed: {0}"</span><span>)]</span></div></div><div><div><span> </span><span>Serde</span><span>(#[from] </span><span>serde_json</span><span>::</span><span>Error</span><span>),</span></div></div><div><div><span>}</span></div></div></code></pre><div><div></div></div></figure></div> <p>This looks reasonable. But notice how this common practice categorizes errors: by origin, not by what the caller can do about it.</p> <p>When you receive a <code>DatabaseError::Query</code>, what should you do? Retry? Report raw SQL to the user? The error doesn’t tell you. 
It just tells you which dependency failed.</p> <p>As one blogger <a href="https://mmapped.blog/posts/12-rust-error-handling">aptly put it</a>: “This error type does not tell the caller what problem you are solving but how you solve it.”</p> <h3><code>anyhow</code>: So Convenient You’ll Forget to Add Context</h3> <p><code>anyhow</code> takes the opposite approach: type erasure. Just use <code>anyhow::Result&lt;T&gt;</code> everywhere and propagate with <code>?</code>. No more enum variants, no more <code>#[from]</code> annotations.</p> <p>The problem is that it’s <em>too</em> convenient.</p> <div><figure><figcaption></figcaption><pre><code><div><div><span>fn</span><span> </span><span>process_request</span><span>(req</span><span>:</span><span> </span><span>Request</span><span>) </span><span>-&gt;</span><span> </span><span>anyhow</span><span>::</span><span>Result</span><span>&lt;</span><span>Response</span><span>&gt; {</span></div></div><div><div><span> </span><span>let</span><span> user </span><span>=</span><span> db</span><span>.</span><span>get_user</span><span>(req</span><span>.</span><span>user_id)</span><span>?</span><span>;</span></div></div><div><div><span> </span><span>let</span><span> data </span><span>=</span><span> </span><span>fetch_external_api</span><span>(user</span><span>.</span><span>api_key)</span><span>?</span><span>;</span></div></div><div><div><span> </span><span>let</span><span> result </span><span>=</span><span> </span><span>compute</span><span>(data)</span><span>?</span><span>;</span></div></div><div><div><span> </span><span>Ok</span><span>(result)</span></div></div><div><div><span>}</span></div></div></code></pre><div><div></div></div></figure></div> <p>Every <code>?</code> is a missed opportunity to add context. What was the user ID? What API were we calling? What computation failed? The error knows none of this.</p> <p>The <code>anyhow</code> documentation encourages using <code>.context()</code> to add information. 
But <code>.context()</code> is optional: the type system doesn’t require it. And “I’ll add context later” is the easiest lie to tell yourself.</p> <hr /> <h2>The Problem: Error Handling Without Purpose</h2> <p>Consider this common pattern in Rust codebases:</p> <div><figure><figcaption></figcaption><pre><code><div><div><span>#[derive(thiserror</span><span>::</span><span>Error</span><span>, </span><span>Debug</span><span>)]</span></div></div><div><div><span>pub</span><span> </span><span>enum</span><span> </span><span>ServiceError</span><span> {</span></div></div><div><div><span><span> </span></span><span>#[error(</span><span>"database error: {0}"</span><span>)]</span></div></div><div><div><span> </span><span>Database</span><span>(#[from] </span><span>sqlx</span><span>::</span><span>Error</span><span>),</span></div></div><div><div><span><span> </span></span><span>#[error(</span><span>"http error: {0}"</span><span>)]</span></div></div><div><div><span> </span><span>Http</span><span>(#[from] </span><span>reqwest</span><span>::</span><span>Error</span><span>),</span></div></div><div><div><span><span> </span></span><span>#[error(</span><span>"serialization error: {0}"</span><span>)]</span></div></div><div><div><span> </span><span>Serde</span><span>(#[from] </span><span>serde_json</span><span>::</span><span>Error</span><span>),</span></div></div><div><div><span> </span><span>// ... ten more variants</span></div></div><div><div><span>}</span></div></div></code></pre><div><div></div></div></figure></div> <p>It looks neat, well-structured, and it compiles. But pause and ask:</p> <ul> <li> <p>If you are holding a <code>ServiceError::Database</code>, is it retryable? Should you show the raw SQL error to users? 
The error type doesn’t help answer these questions.</p> </li> <li> <p>When debugging, does “serialization error: expected <code>,</code> or <code>}</code>” tell you which request, which field, which code path led here?</p> </li> </ul> <p>This is the fundamental disconnect in how we think about error handling. We focus on <em>propagating</em> errors exactly, on making the types line up. But we forget that errors are messages—messages that will eventually be read by either a machine trying to recover, or a human trying to debug.</p> <h2>The “Library vs Application” Myth</h2> <p>You’ve probably heard the conventional wisdom: <em>“Use <code>thiserror</code> for libraries, <code>anyhow</code> for applications.”</em></p> <p>It’s a nice, simple rule, just not quite right. As <a href="https://lpalmieri.com/posts/error-handling-rust/">Luca Palmieri notes</a>: “It is not the right framing. You need to reason about intent.”</p> <p>The real question isn’t whether you’re writing a library or an application. The real question is: <strong>what do you expect the caller to do with this error?</strong></p> <h2>Two Audiences, Two Needs</h2> <table><thead><tr><th>Audience</th><th>Goal</th><th>Needs</th></tr></thead><tbody><tr><td><strong>Machines</strong></td><td>Automated recovery</td><td>Flat structure, clear error kinds, predictable codes</td></tr><tr><td><strong>Humans</strong></td><td>Debugging</td><td>Rich context, call path, business-level information</td></tr></tbody></table> <p>Most error handling designs optimize for neither. They optimize for <em>the compiler</em>.</p> <h3>For Machines: Flat, Actionable, Kind-Based</h3> <p>When errors need to be handled programmatically, complexity is the enemy. Your retry logic doesn’t want to traverse a nested error chain checking for specific variants. 
It wants to ask: <code>is_retryable()?</code></p> <p><a href="https://github.com/apache/opendal/pull/977">Apache OpenDAL’s error design</a> shows one way to do this:</p> <div><figure><figcaption></figcaption><pre><code><div><div><span>pub</span><span> </span><span>struct</span><span> </span><span>Error</span><span> {</span></div></div><div><div><span><span> </span></span><span>kind</span><span>:</span><span> </span><span>ErrorKind</span><span>,</span></div></div><div><div><span><span> </span></span><span>message</span><span>:</span><span> </span><span>String</span><span>,</span></div></div><div><div><span><span> </span></span><span>status</span><span>:</span><span> </span><span>ErrorStatus</span><span>,</span></div></div><div><div><span><span> </span></span><span>operation</span><span>:</span><span> </span><span>&amp;</span><span>'</span><span>static</span><span> </span><span>str</span><span>,</span></div></div><div><div><span><span> </span></span><span>context</span><span>:</span><span> </span><span>Vec</span><span>&lt;(</span><span>&amp;</span><span>'</span><span>static</span><span> </span><span>str</span><span>, </span><span>String</span><span>)&gt;,</span></div></div><div><div><span><span> </span></span><span>source</span><span>:</span><span> </span><span>Option</span><span>&lt;anyhow</span><span>::</span><span>Error</span><span>&gt;,</span></div></div><div><div><span>}</span></div></div><div><div> </div></div><div><div><span>pub</span><span> </span><span>enum</span><span> </span><span>ErrorKind</span><span> {</span></div></div><div><div><span> </span><span>NotFound</span><span>,</span></div></div><div><div><span> </span><span>PermissionDenied</span><span>,</span></div></div><div><div><span> </span><span>RateLimited</span><span>,</span></div></div><div><div><span> </span><span>// ... 
categorized by what the caller CAN DO</span></div></div><div><div><span>}</span></div></div><div><div> </div></div><div><div><span>pub</span><span> </span><span>enum</span><span> </span><span>ErrorStatus</span><span> {</span></div></div><div><div><span> </span><span>Permanent</span><span>, </span><span>// Don't retry</span></div></div><div><div><span> </span><span>Temporary</span><span>, </span><span>// Safe to retry</span></div></div><div><div><span> </span><span>Persistent</span><span>, </span><span>// Was retried, still failing</span></div></div><div><div><span>}</span></div></div></code></pre><div><div></div></div></figure></div> <p>Then the call site stays straightforward:</p> <div><figure><figcaption></figcaption><pre><code><div><div><span>match</span><span> result {</span></div></div><div><div><span> </span><span>Err</span><span>(e) </span><span>if</span><span> e</span><span>.</span><span>kind</span><span>() </span><span>==</span><span> </span><span>ErrorKind</span><span>::</span><span>RateLimited</span><span> </span><span>&amp;&amp;</span><span> e</span><span>.</span><span>is_temporary</span><span>() </span><span>=&gt;</span><span> {</span></div></div><div><div><span> </span><span>sleep</span><span>(</span><span>Duration</span><span>::</span><span>from_secs</span><span>(</span><span>1</span><span>))</span><span>.await</span><span>;</span></div></div><div><div><span> </span><span>retry</span><span>()</span><span>.await</span></div></div><div><div><span><span> </span></span><span>}</span></div></div><div><div><span> </span><span>Err</span><span>(e) </span><span>if</span><span> e</span><span>.</span><span>kind</span><span>() </span><span>==</span><span> </span><span>ErrorKind</span><span>::</span><span>NotFound</span><span> </span><span>=&gt;</span><span> {</span></div></div><div><div><span> </span><span>create_default</span><span>()</span><span>.await</span></div></div><div><div><span><span> </span></span><span>}</span></div></div><div><div><span> 
</span><span>Err</span><span>(e) </span><span>=&gt;</span><span> </span><span>return</span><span> </span><span>Err</span><span>(e),</span></div></div><div><div><span> </span><span>Ok</span><span>(v) </span><span>=&gt;</span><span> v,</span></div></div><div><div><span>}</span></div></div></code></pre><div><div></div></div></figure></div> <p>A few things to note:</p> <p><strong>ErrorKind is categorized by response, not origin.</strong> <code>NotFound</code> means “the thing doesn’t exist, don’t retry.” <code>RateLimited</code> means “slow down and try again.” The caller doesn’t need to know whether it was an S3 404 or a filesystem ENOENT—they need to know what to do about it.</p> <p><strong>ErrorStatus is explicit.</strong> Instead of guessing retryability from error types, it’s a first-class field. Services can mark errors as temporary when they know a retry might help.</p> <p><strong>One Error type per library.</strong> Instead of scattering error enums across modules, a single flat structure keeps things simple. The <code>context</code> field provides all the specificity you need without type proliferation.</p> <p>No more traversing error chains, no more guessing from error types. Just ask the error directly.</p> <h3>For Humans: Low-Friction Context Capture</h3> <p>The biggest enemy of good error context isn’t capability—it’s friction. If adding context is annoying, developers won’t do it.</p> <p>The <a href="https://github.com/fast/exn">exn</a> library (294 lines of Rust, zero dependencies) demonstrates one approach: errors form a <em>tree</em> of frames, each automatically capturing its source location via <code>#[track_caller]</code>. 
Unlike linear error chains, trees can represent multiple causes—useful when parallel operations fail or validation produces multiple errors.</p> <p>The key ingredients:</p> <p><strong>Automatic location capture.</strong> Instead of expensive backtraces, use <code>#[track_caller]</code> to capture file/line/column at <strong>zero cost</strong>. Every error frame should know where it was created.</p> <p><strong>Ergonomic context addition.</strong> The API for adding context should be so natural that <em>not</em> adding it feels wrong:</p> <div><figure><figcaption></figcaption><pre><code><div><div><span>fetch_user</span><span>(user_id)</span></div></div><div><div><span> </span><span>.</span><span>or_raise</span><span>(</span><span>||</span><span> </span><span>AppError</span><span>(</span><span>format!</span><span>(</span><span>"failed to fetch user {user_id}"</span><span>)))</span><span>?</span><span>;</span></div></div></code></pre><div><div></div></div></figure></div> <p>Compare this to <code>thiserror</code>, where adding the same context requires defining a new variant and manual wrapping:</p> <div><figure><figcaption></figcaption><pre><code><div><div><span>#[derive(thiserror</span><span>::</span><span>Error</span><span>, </span><span>Debug</span><span>)]</span></div></div><div><div><span>pub</span><span> </span><span>enum</span><span> </span><span>AppError</span><span> {</span></div></div><div><div><span><span> </span></span><span>#[error(</span><span>"failed to fetch user {user_id}: {source}"</span><span>)]</span></div></div><div><div><span> </span><span>FetchUser</span><span> {</span></div></div><div><div><span><span> </span></span><span>user_id</span><span>:</span><span> </span><span>String</span><span>,</span></div></div><div><div><span><span> </span></span><span>#[source]</span></div></div><div><div><span><span> </span></span><span>source</span><span>:</span><span> </span><span>DbError</span><span>,</span></div></div><div><div><span><span> 
</span></span><span>},</span></div></div><div><div><span> </span><span>// ... one variant per call site that needs context</span></div></div><div><div><span>}</span></div></div><div><div> </div></div><div><div><span>fn</span><span> </span><span>fetch_user</span><span>(user_id</span><span>:</span><span> </span><span>&amp;</span><span>str</span><span>) </span><span>-&gt;</span><span> </span><span>Result</span><span>&lt;</span><span>User</span><span>, </span><span>AppError</span><span>&gt; {</span></div></div><div><div><span><span> </span></span><span>db</span><span>.</span><span>query</span><span>(user_id)</span><span>.</span><span>map_err</span><span>(</span><span>|</span><span>e</span><span>|</span><span> </span><span>AppError</span><span>::</span><span>FetchUser</span><span> {</span></div></div><div><div><span><span> </span></span><span>user_id</span><span>:</span><span> user_id</span><span>.</span><span>to_string</span><span>(),</span></div></div><div><div><span><span> </span></span><span>source</span><span>:</span><span> e,</span></div></div><div><div><span><span> </span></span><span>})</span></div></div><div><div><span>}</span></div></div></code></pre><div><div></div></div></figure></div> <p><strong>Enforce context at module boundaries.</strong> This is where exn differs critically from <code>anyhow</code>. With <code>anyhow</code>, every error is erased to <code>anyhow::Error</code>, so you can always use <code>?</code> and move on; the type system won’t stop you. The context methods exist, but <em>nothing</em> prevents you from ignoring them.</p> <p>exn takes a different approach: <code>Exn&lt;E&gt;</code> preserves the outermost error type. If your function returns <code>Result&lt;T, Exn&lt;ServiceError&gt;&gt;</code>, you can’t directly <code>?</code> a <code>Result&lt;U, Exn&lt;DbError&gt;&gt;</code>: the types don’t match. 
The compiler <em>forces</em> you to call <code>or_raise()</code> and provide a <code>ServiceError</code>, which is exactly the moment you should be adding context about what your module was trying to do.</p> <div><figure><figcaption></figcaption><pre><code><div><div><span>// This won't compile--type mismatch forces you to add context</span></div></div><div><div><span>pub</span><span> </span><span>fn</span><span> </span><span>fetch_user</span><span>(user_id</span><span>:</span><span> </span><span>&amp;</span><span>str</span><span>) </span><span>-&gt;</span><span> </span><span>Result</span><span>&lt;</span><span>User</span><span>, </span><span>Exn</span><span>&lt;</span><span>ServiceError</span><span>&gt;&gt; {</span></div></div><div><div><span> </span><span>let</span><span> user </span><span>=</span><span> db</span><span>.</span><span>query</span><span>(user_id)</span><span>?</span><span>; </span><span>// Error: expected Exn&lt;ServiceError&gt;, found Exn&lt;DbError&gt;</span></div></div><div><div><span> </span><span>Ok</span><span>(user)</span></div></div><div><div><span>}</span></div></div><div><div> </div></div><div><div><span>// You must provide context at the boundary</span></div></div><div><div><span>pub</span><span> </span><span>fn</span><span> </span><span>fetch_user</span><span>(user_id</span><span>:</span><span> </span><span>&amp;</span><span>str</span><span>) </span><span>-&gt;</span><span> </span><span>Result</span><span>&lt;</span><span>User</span><span>, </span><span>Exn</span><span>&lt;</span><span>ServiceError</span><span>&gt;&gt; {</span></div></div><div><div><span> </span><span>let</span><span> user </span><span>=</span><span> db</span><span>.</span><span>query</span><span>(user_id)</span></div></div><div><div><span> </span><span>.</span><span>or_raise</span><span>(</span><span>||</span><span> </span><span>ServiceError</span><span>(</span><span>format!</span><span>(</span><span>"failed to fetch user 
{user_id}"</span><span>)))</span><span>?</span><span>; </span><span>// Now it compiles</span></div></div><div><div><span> </span><span>Ok</span><span>(user)</span></div></div><div><div><span>}</span></div></div></code></pre><div><div></div></div></figure></div> <p>The type system becomes your ally: it won’t let you be lazy at module boundaries.</p> <p>In practice:</p> <div><figure><figcaption></figcaption><pre><code><div><div><span>pub</span><span> </span><span>async</span><span> </span><span>fn</span><span> </span><span>execute</span><span>(</span><span>&amp;</span><span>self</span><span>, task</span><span>:</span><span> </span><span>Task</span><span>) </span><span>-&gt;</span><span> </span><span>Result</span><span>&lt;</span><span>Output</span><span>, </span><span>ExecutorError</span><span>&gt; {</span></div></div><div><div><span> </span><span>let</span><span> make_error </span><span>=</span><span> </span><span>||</span><span> </span><span>ExecutorError</span><span>(</span><span>format!</span><span>(</span><span>"failed to execute task {}"</span><span>, task</span><span>.</span><span>id));</span></div></div><div><div> </div></div><div><div><span> </span><span>let</span><span> user </span><span>=</span><span> </span><span>self</span><span>.</span><span>fetch_user</span><span>(task</span><span>.</span><span>user_id)</span></div></div><div><div><span> </span><span>.await</span></div></div><div><div><span> </span><span>.</span><span>or_raise</span><span>(make_error)</span><span>?</span><span>;</span></div></div><div><div> </div></div><div><div><span> </span><span>let</span><span> result </span><span>=</span><span> </span><span>self</span><span>.</span><span>process</span><span>(user)</span></div></div><div><div><span> </span><span>.</span><span>or_raise</span><span>(make_error)</span><span>?</span><span>;</span></div></div><div><div> </div></div><div><div><span> 
</span><span>Ok</span><span>(result)</span></div></div><div><div><span>}</span></div></div></code></pre><div><div></div></div></figure></div> <p>Every <code>?</code> has context. When this fails at 3am, instead of the cryptic <code>serialization error</code>, you see:</p> <div><figure><figcaption></figcaption><pre><code><div><div><span>failed to execute task 7829, at src/executor.rs:45:12</span></div></div><div><div><span>|</span></div></div><div><div><span>|-&gt; failed to fetch user "John Doe", at src/executor.rs:52:10</span></div></div><div><div><span>|</span></div></div><div><div><span>|-&gt; connection refused, at src/client.rs:89:24</span></div></div></code></pre><div><div></div></div></figure></div> <hr /> <h2>Putting It Together</h2> <p>In real systems, you often need both: machine-readable errors for automated recovery, and human-readable context for debugging. The pattern: use a flat, kind-based error type (like Apache OpenDAL’s) for the structured data, and wrap it in a context-tracking mechanism for propagation.</p> <div><figure><figcaption></figcaption><pre><code><div><div><span>// Machine-oriented: flat struct with kind and status</span></div></div><div><div><span>pub</span><span> </span><span>struct</span><span> </span><span>StorageError</span><span> {</span></div></div><div><div><span> </span><span>pub</span><span> kind</span><span>:</span><span> </span><span>ErrorKind</span><span>,</span></div></div><div><div><span> </span><span>pub</span><span> status</span><span>:</span><span> </span><span>ErrorStatus</span><span>,</span></div></div><div><div><span> </span><span>pub</span><span> message</span><span>:</span><span> </span><span>String</span><span>,</span></div></div><div><div><span>}</span></div></div><div><div> </div></div><div><div><span>// Human-oriented: propagate with context at each layer</span></div></div><div><div><span>pub</span><span> </span><span>async</span><span> </span><span>fn</span><span> </span><span>save_document</span><span>(doc</span><span>:</span><span> </span><span>Document</span><span>) </span><span>-&gt;</span><span> </span><span>Result</span><span>&lt;(), 
</span><span>Exn</span><span>&lt;</span><span>StorageError</span><span>&gt;&gt; {</span></div></div><div><div><span> </span><span>let</span><span> data </span><span>=</span><span> </span><span>serialize</span><span>(</span><span>&amp;</span><span>doc)</span></div></div><div><div><span> </span><span>.</span><span>or_raise</span><span>(</span><span>||</span><span> </span><span>StorageError</span><span>::</span><span>permanent</span><span>(</span><span>"serialization failed"</span><span>))</span><span>?</span><span>;</span></div></div><div><div> </div></div><div><div><span><span> </span></span><span>storage</span><span>.</span><span>write</span><span>(</span><span>&amp;</span><span>doc</span><span>.</span><span>path, data)</span></div></div><div><div><span> </span><span>.await</span></div></div><div><div><span> </span><span>.</span><span>or_raise</span><span>(</span><span>||</span><span> </span><span>StorageError</span><span>::</span><span>temporary</span><span>(</span><span>"write failed"</span><span>))</span><span>?</span><span>;</span></div></div><div><div> </div></div><div><div><span> </span><span>Ok</span><span>(())</span></div></div><div><div><span>}</span></div></div></code></pre><div><div></div></div></figure></div> <p>At the boundary, walk the error tree to find the structured error:</p> <div><figure><figcaption></figcaption><pre><code><div><div><span>// Extract a typed error from anywhere in the tree</span></div></div><div><div><span>fn</span><span> </span><span>find_error</span><span>&lt;</span><span>T</span><span>&gt;(exn</span><span>:</span><span> </span><span>&amp;</span><span>Exn</span><span>&lt;</span><span>impl</span><span> </span><span>Error</span><span>&gt;) </span><span>-&gt;</span><span> </span><span>Option</span><span>&lt;</span><span>&amp;</span><span>T</span><span>&gt; {</span></div></div><div><div><span> </span><span>fn</span><span> </span><span>walk</span><span>&lt;</span><span>T</span><span>&gt;(frame</span><span>:</span><span> 
</span><span>&amp;</span><span>Frame</span><span>) </span><span>-&gt;</span><span> </span><span>Option</span><span>&lt;</span><span>&amp;</span><span>T</span><span>&gt; {</span></div></div><div><div><span> </span><span>if</span><span> </span><span>let</span><span> </span><span>Some</span><span>(e) </span><span>=</span><span> frame</span><span>.</span><span>as_any</span><span>()</span><span>.</span><span>downcast_ref</span><span>::</span><span>&lt;</span><span>T</span><span>&gt;() {</span></div></div><div><div><span> </span><span>return</span><span> </span><span>Some</span><span>(e);</span></div></div><div><div><span><span> </span></span><span>}</span></div></div><div><div><span><span> </span></span><span>frame</span><span>.</span><span>children</span><span>()</span><span>.</span><span>iter</span><span>()</span><span>.</span><span>find_map</span><span>(walk)</span></div></div><div><div><span><span> </span></span><span>}</span></div></div><div><div><span> </span><span>walk</span><span>(exn</span><span>.</span><span>as_frame</span><span>())</span></div></div><div><div><span>}</span></div></div><div><div> </div></div><div><div><span>match</span><span> </span><span>save_document</span><span>(doc)</span><span>.await</span><span> {</span></div></div><div><div><span> </span><span>Ok</span><span>(()) </span><span>=&gt;</span><span> </span><span>Ok</span><span>(()),</span></div></div><div><div><span> </span><span>Err</span><span>(report) </span><span>=&gt;</span><span> {</span></div></div><div><div><span> </span><span>// For humans: log the full context tree</span></div></div><div><div><span> </span><span>log</span><span>::</span><span>error!</span><span>(</span><span>"{:?}"</span><span>, report);</span></div></div><div><div> </div></div><div><div><span> </span><span>// For machines: find and handle the structured error</span></div></div><div><div><span> </span><span>if</span><span> </span><span>let</span><span> </span><span>Some</span><span>(err) </span><span>=</span><span> 
</span><span>find_error</span><span>::</span><span>&lt;</span><span>StorageError</span><span>&gt;(</span><span>&amp;</span><span>report) {</span></div></div><div><div><span> </span><span>if</span><span> err</span><span>.</span><span>status </span><span>==</span><span> </span><span>ErrorStatus</span><span>::</span><span>Temporary</span><span> {</span></div></div><div><div><span> </span><span>return</span><span> </span><span>queue_for_retry</span><span>(report);</span></div></div><div><div><span><span> </span></span><span>}</span></div></div><div><div><span> </span><span>return</span><span> </span><span>Err</span><span>(</span><span>map_to_http_status</span><span>(err</span><span>.</span><span>kind));</span></div></div><div><div><span><span> </span></span><span>}</span></div></div><div><div><span> </span><span>Err</span><span>(</span><span>StatusCode</span><span>::</span><span>INTERNAL_SERVER_ERROR</span><span>)</span></div></div><div><div><span><span> </span></span><span>}</span></div></div><div><div><span>}</span></div></div></code></pre><div><div></div></div></figure></div> <p>You do have to walk the tree—but compare that to the Provide/Request API. Here you’re searching for a concrete type, like <code>StorageError</code>: it has named fields, it’s documented, and your IDE can autocomplete it. No guesswork, no runtime surprises—just a well-defined struct you can understand and maintain.</p> <hr /> <h2>Closing thought</h2> <p>Propagating errors is easy in Rust. 
Explaining them is the part we tend to postpone.</p> <p>Next time you return a <code>Result</code>, take 30 seconds to ask: “If this fails in production, what would I wish the log said?” Then make it say that.</p> <h2>Resources</h2> <ul> <li><a href="https://github.com/apache/opendal/pull/977">OpenDAL Error Design RFC</a></li> <li><a href="https://xuanwo.io/en-us/reports/2022-46/">OpenDAL’s Error Handling Practices</a></li> <li><a href="https://github.com/fast/exn">exn: Context-aware errors for Rust</a></li> <li><a href="https://greptime.com/blogs/2024-05-07-error-rust">Error Handling in Large Rust Projects (GreptimeDB)</a></li> <li><a href="https://bugenzhao.com/2024/04/24/error-handling-1/">A Guide to Error Handling that Just Works</a></li> <li><a href="https://matklad.github.io/2020/10/15/study-of-std-io-error.html">Study of std::io::Error</a></li> <li><a href="https://lpalmieri.com/posts/error-handling-rust/">Error Handling In Rust - A Deep Dive</a></li> <li><a href="https://github.com/rust-lang/rust/issues/96024">Tracking Issue for Provider API</a></li> <li><a href="https://rust-lang.github.io/wg-async/design_docs/async_stack_traces.html">Async Stack Traces Working Group</a></li> </ul>
StackSafe: Taming Recursion in Rust Without Stack Overflow
https://fast.github.io/blog/stacksafe-taming-recursion-in-rust-without-stack-overflow
Thu, 24 Jul 2025 00:00:00 GMT
<h2>TL;DR</h2> <p>Recursive algorithms in Rust can easily cause stack overflows that crash your program:</p> <div><figure><figcaption></figcaption><pre><code><div><div><span>fn</span><span> </span><span>tree_depth</span><span>(node</span><span>:</span><span> </span><span>&amp;</span><span>TreeNode</span><span>) </span><span>-&gt;</span><span> </span><span>usize</span><span> {</span></div></div><div><div><span> </span><span>match</span><span> node {</span></div></div><div><div><span> 
</span><span>TreeNode</span><span>::</span><span>Leaf</span><span> </span><span>=&gt;</span><span> </span><span>1</span><span>,</span></div></div><div><div><span> </span><span>TreeNode</span><span>::</span><span>Branch</span><span>(left, right) </span><span>=&gt;</span><span> {</span></div></div><div><div><span> </span><span>1</span><span> </span><span>+</span><span> </span><span>tree_depth</span><span>(left)</span><span>.</span><span>max</span><span>(</span><span>tree_depth</span><span>(right))</span></div></div><div><div><span><span> </span></span><span>}</span></div></div><div><div><span><span> </span></span><span>}</span></div></div><div><div><span>}</span></div></div><div><div> </div></div><div><div><span>// This aborts the process: thread 'main' has overflowed its stack</span></div></div><div><div><span>let</span><span> deep_tree </span><span>=</span><span> </span><span>create_deep_tree</span><span>(</span><span>100000</span><span>);</span></div></div><div><div><span>println!</span><span>(</span><span>"{}"</span><span>, </span><span>tree_depth</span><span>(</span><span>&amp;</span><span>deep_tree));</span></div></div></code></pre><div><div></div></div></figure></div> <p><a href="https://github.com/fast/stacksafe">StackSafe</a> solves this by automatically growing the stack in recursive functions and data structures. 
Just add <code>#[stacksafe]</code> and your code works without crashes:</p> <div><figure><figcaption></figcaption><pre><code><div><div><span>use</span><span> </span><span>stacksafe</span><span>::</span><span>stacksafe;</span></div></div><div><div> </div></div><div><div><span>#[stacksafe] </span><span>// Add attribute to recursive functions</span></div></div><div><div><span>fn</span><span> </span><span>tree_depth</span><span>(node</span><span>:</span><span> </span><span>&amp;</span><span>TreeNode</span><span>) </span><span>-&gt;</span><span> </span><span>usize</span><span> {</span></div></div><div><div><span> </span><span>match</span><span> node {</span></div></div><div><div><span> </span><span>TreeNode</span><span>::</span><span>Leaf</span><span> </span><span>=&gt;</span><span> </span><span>1</span><span>,</span></div></div><div><div><span> </span><span>TreeNode</span><span>::</span><span>Branch</span><span>(left, right) </span><span>=&gt;</span><span> {</span></div></div><div><div><span> </span><span>1</span><span> </span><span>+</span><span> </span><span>tree_depth</span><span>(left)</span><span>.</span><span>max</span><span>(</span><span>tree_depth</span><span>(right))</span></div></div><div><div><span><span> </span></span><span>}</span></div></div><div><div><span><span> </span></span><span>}</span></div></div><div><div><span>}</span></div></div><div><div> </div></div><div><div><span>// No panic, works perfectly!</span></div></div><div><div><span>let</span><span> deep_tree </span><span>=</span><span> </span><span>create_deep_tree</span><span>(</span><span>100000</span><span>);</span></div></div><div><div><span>println!</span><span>(</span><span>"{}"</span><span>, </span><span>tree_depth</span><span>(</span><span>&amp;</span><span>deep_tree));</span></div></div></code></pre><div><div></div></div></figure></div> <p><code>StackSafe</code> is being used in production by products like <a 
href="https://www.scopedb.io/blog/manage-observability-data-in-petabytes">ScopeDB</a>, where it helps trace and debug petabyte-scale observability data workloads.</p> <h2>The Stack Overflow Problem</h2> <p>Recursive algorithms are elegant and intuitive, but they come with a fundamental limitation: stack overflow. Consider another common scenario:</p> <div><figure><figcaption></figcaption><pre><code><div><div><span>// JSON parsing - will crash on deeply nested JSON</span></div></div><div><div><span>fn</span><span> </span><span>parse_value</span><span>(tokens</span><span>:</span><span> </span><span>&amp;mut</span><span> </span><span>TokenStream</span><span>) </span><span>-&gt;</span><span> </span><span>JsonValue</span><span> {</span></div></div><div><div><span> </span><span>match</span><span> tokens</span><span>.</span><span>peek</span><span>() {</span></div></div><div><div><span> </span><span>Token</span><span>::</span><span>LeftBrace</span><span> </span><span>=&gt;</span><span> </span><span>parse_object</span><span>(tokens), </span><span>// Recursive call</span></div></div><div><div><span> </span><span>Token</span><span>::</span><span>LeftBracket</span><span> </span><span>=&gt;</span><span> </span><span>parse_array</span><span>(tokens), </span><span>// Recursive call</span></div></div><div><div><span><span> </span></span><span>_ </span><span>=&gt;</span><span> </span><span>parse_primitive</span><span>(tokens),</span></div></div><div><div><span><span> </span></span><span>}</span></div></div><div><div><span>}</span></div></div></code></pre><div><div></div></div></figure></div> <p>A sufficiently deep structure will cause your program to crash with <code>stack overflow</code>, and there’s no clean way to predict or handle this in standard Rust.</p> <h2>Existing Solutions</h2> <h3>Manual Transformation to Iterative Code</h3> <p>The most common approach is rewriting recursive algorithms as loops with explicit stacks:</p> 
<div><figure><figcaption></figcaption><pre><code><div><div><span>fn</span><span> </span><span>parse_value_iterative</span><span>(tokens</span><span>:</span><span> </span><span>&amp;mut</span><span> </span><span>TokenStream</span><span>) </span><span>-&gt;</span><span> </span><span>JsonValue</span><span> {</span></div></div><div><div><span> </span><span>let</span><span> </span><span>mut</span><span> stack </span><span>=</span><span> </span><span>vec!</span><span>[</span><span>ParseState</span><span>::</span><span>ParseValue</span><span>];</span></div></div><div><div><span> </span><span>let</span><span> </span><span>mut</span><span> results </span><span>=</span><span> </span><span>Vec</span><span>::</span><span>new</span><span>();</span></div></div><div><div> </div></div><div><div><span> </span><span>while</span><span> </span><span>let</span><span> </span><span>Some</span><span>(state) </span><span>=</span><span> stack</span><span>.</span><span>pop</span><span>() {</span></div></div><div><div><span> </span><span>match</span><span> state {</span></div></div><div><div><span> </span><span>ParseState</span><span>::</span><span>ParseValue</span><span> </span><span>=&gt;</span><span> {</span></div></div><div><div><span> </span><span>match</span><span> tokens</span><span>.</span><span>peek</span><span>() {</span></div></div><div><div><span> </span><span>Token</span><span>::</span><span>LeftBrace</span><span> </span><span>=&gt;</span><span> {</span></div></div><div><div><span><span> </span></span><span>stack</span><span>.</span><span>push</span><span>(</span><span>ParseState</span><span>::</span><span>ParseObject</span><span>(</span><span>HashMap</span><span>::</span><span>new</span><span>()));</span></div></div><div><div><span><span> </span></span><span>}</span></div></div><div><div><span> </span><span>Token</span><span>::</span><span>LeftBracket</span><span> </span><span>=&gt;</span><span> {</span></div></div><div><div><span><span> 
</span></span><span>stack</span><span>.</span><span>push</span><span>(</span><span>ParseState</span><span>::</span><span>ParseArray</span><span>(</span><span>Vec</span><span>::</span><span>new</span><span>()));</span></div></div><div><div><span><span> </span></span><span>}</span></div></div><div><div><span><span> </span></span><span>_ </span><span>=&gt;</span><span> {</span></div></div><div><div><span><span> </span></span><span>results</span><span>.</span><span>push</span><span>(</span><span>parse_primitive</span><span>(tokens));</span></div></div><div><div><span><span> </span></span><span>}</span></div></div><div><div><span><span> </span></span><span>}</span></div></div><div><div><span><span> </span></span><span>}</span></div></div><div><div><span> </span><span>ParseState</span><span>::</span><span>ParseObject</span><span>(</span><span>mut</span><span> obj) </span><span>=&gt;</span><span> {</span></div></div><div><div><span> </span><span>// Complex state management for nested objects...</span></div></div><div><div><span><span> </span></span><span>}</span></div></div><div><div><span> </span><span>ParseState</span><span>::</span><span>ParseArray</span><span>(</span><span>mut</span><span> arr) </span><span>=&gt;</span><span> {</span></div></div><div><div><span> </span><span>// Complex state management for nested arrays...</span></div></div><div><div><span><span> </span></span><span>}</span></div></div><div><div><span><span> </span></span><span>}</span></div></div><div><div><span><span> </span></span><span>}</span></div></div><div><div> </div></div><div><div><span><span> </span></span><span>results</span><span>.</span><span>pop</span><span>()</span><span>.</span><span>unwrap</span><span>()</span></div></div><div><div><span>}</span></div></div></code></pre><div><div></div></div></figure></div> <p>This approach works for simple cases but becomes extremely complex or impossible when any of these conditions apply:</p> <ol> <li>The algorithm transforms data structures 
rather than just evaluating them (e.g., optimizing an AST)</li> <li>Multiple recursive calls need to be coordinated (e.g., tree balancing algorithms)</li> <li>The algorithm doesn’t fit the tail-recursion pattern</li> </ol> <h3>Lower-Level Crates: <code>stacker</code> and <code>recursive</code></h3> <ul> <li><a href="https://crates.io/crates/stacker">stacker</a>: Provides low-level stack growth mechanisms</li> <li><a href="https://crates.io/crates/recursive">recursive</a>: Provides the <code>#[recursive]</code> macro to make <code>stacker</code> easier to apply</li> </ul> <p><strong>Limitations</strong>:</p> <ul> <li>You must take care not to leave any recursive function without the <code>#[recursive]</code> annotation</li> <li>Derived traits like <code>Debug</code> and <code>Clone</code>, as well as the compiler-generated <code>Drop</code> logic, still cause stack overflow on deeply nested structures, so you must manually implement each of them with stack protection:</li> </ul> <div><figure><figcaption></figcaption><pre><code><div><div><span>#[derive(</span><span>Clone</span><span>, </span><span>Debug</span><span>)]</span></div></div><div><div><span>enum</span><span> </span><span>Tree</span><span> {</span></div></div><div><div><span> </span><span>Leaf</span><span>(</span><span>i32</span><span>),</span></div></div><div><div><span> </span><span>Node</span><span>(</span><span>Box</span><span>&lt;</span><span>Tree</span><span>&gt;, </span><span>Box</span><span>&lt;</span><span>Tree</span><span>&gt;),</span></div></div><div><div><span>}</span></div></div><div><div> </div></div><div><div><span>#[recursive</span><span>::</span><span>recursive]</span></div></div><div><div><span>fn</span><span> </span><span>create_deep_tree</span><span>(depth</span><span>:</span><span> </span><span>usize</span><span>) </span><span>-&gt;</span><span> </span><span>Tree</span><span> {</span></div></div><div><div><span> </span><span>if</span><span> depth </span><span>==</span><span> </span><span>0</span><span> 
{</span></div></div><div><div><span> </span><span>Tree</span><span>::</span><span>Leaf</span><span>(</span><span>42</span><span>)</span></div></div><div><div><span><span> </span></span><span>} </span><span>else</span><span> {</span></div></div><div><div><span> </span><span>Tree</span><span>::</span><span>Node</span><span>(</span></div></div><div><div><span> </span><span>Box</span><span>::</span><span>new</span><span>(</span><span>create_deep_tree</span><span>(depth </span><span>-</span><span> </span><span>1</span><span>)),</span></div></div><div><div><span> </span><span>Box</span><span>::</span><span>new</span><span>(</span><span>Tree</span><span>::</span><span>Leaf</span><span>(</span><span>0</span><span>))</span></div></div><div><div><span><span> </span></span><span>)</span></div></div><div><div><span><span> </span></span><span>}</span></div></div><div><div><span>}</span></div></div><div><div> </div></div><div><div><span>let</span><span> deep_tree </span><span>=</span><span> </span><span>create_deep_tree</span><span>(</span><span>10000</span><span>);</span></div></div><div><div><span>let</span><span> cloned </span><span>=</span><span> deep_tree</span><span>.</span><span>clone</span><span>(); </span><span>// Stack overflow: derived Clone is recursive!</span></div></div><div><div><span>println!</span><span>(</span><span>"{:?}"</span><span>, cloned); </span><span>// Stack overflow: derived Debug is recursive!</span></div></div><div><div><span>// Stack overflow when `deep_tree` is dropped: the compiler-generated drop glue is recursive!</span></div></div></code></pre><div><div></div></div></figure></div> <h2><code>StackSafe</code>: The Complete Solution</h2> <p><code>StackSafe</code> addresses both recursive functions and recursive data structures with a simple, unified approach.</p> <h3>Recursive Functions Made Safe</h3> <p>Transform any recursive function by adding <code>#[stacksafe]</code>:</p> <div><figure><figcaption></figcaption><pre><code><div><div><span>use</span><span> 
</span><span>stacksafe</span><span>::</span><span>stacksafe;</span></div></div><div><div> </div></div><div><div><span>#[stacksafe]</span></div></div><div><div><span>fn</span><span> </span><span>fibonacci</span><span>(n</span><span>:</span><span> </span><span>u64</span><span>) </span><span>-&gt;</span><span> </span><span>u64</span><span> {</span></div></div><div><div><span> </span><span>match</span><span> n {</span></div></div><div><div><span> </span><span>0</span><span> </span><span>|</span><span> </span><span>1</span><span> </span><span>=&gt;</span><span> n,</span></div></div><div><div><span><span> </span></span><span>_ </span><span>=&gt;</span><span> </span><span>fibonacci</span><span>(n </span><span>-</span><span> </span><span>1</span><span>) </span><span>+</span><span> </span><span>fibonacci</span><span>(n </span><span>-</span><span> </span><span>2</span><span>),</span></div></div><div><div><span><span> </span></span><span>}</span></div></div><div><div><span>}</span></div></div><div><div> </div></div><div><div><span>#[stacksafe]</span></div></div><div><div><span>fn</span><span> </span><span>evaluate_ast</span><span>(expr</span><span>:</span><span> </span><span>&amp;</span><span>Expr</span><span>) </span><span>-&gt;</span><span> </span><span>i32</span><span> {</span></div></div><div><div><span> </span><span>match</span><span> expr {</span></div></div><div><div><span> </span><span>Expr</span><span>::</span><span>Number</span><span>(n) </span><span>=&gt;</span><span> </span><span>*</span><span>n,</span></div></div><div><div><span> </span><span>Expr</span><span>::</span><span>Add</span><span>(left, right) </span><span>=&gt;</span><span> </span><span>evaluate_ast</span><span>(left) </span><span>+</span><span> </span><span>evaluate_ast</span><span>(right),</span></div></div><div><div><span> </span><span>Expr</span><span>::</span><span>Multiply</span><span>(left, right) </span><span>=&gt;</span><span> </span><span>evaluate_ast</span><span>(left) 
</span><span>*</span><span> </span><span>evaluate_ast</span><span>(right),</span></div></div><div><div><span><span> </span></span><span>}</span></div></div><div><div><span>}</span></div></div></code></pre><div><div></div></div></figure></div> <h3>Recursive Data Structures Made Safe</h3> <p>Wrap recursive fields with <code>StackSafe&lt;T&gt;</code> for automatic stack-safe trait implementations:</p> <div><figure><figcaption></figcaption><pre><code><div><div><span>use</span><span> </span><span>stacksafe</span><span>::</span><span>{stacksafe, </span><span>StackSafe</span><span>};</span></div></div><div><div> </div></div><div><div><span>#[derive(</span><span>Debug</span><span>, </span><span>Clone</span><span>, </span><span>PartialEq</span><span>)] </span><span>// All traits are automatically stack-safe!</span></div></div><div><div><span>enum</span><span> </span><span>BinaryTree</span><span> {</span></div></div><div><div><span> </span><span>Leaf</span><span>(</span><span>i32</span><span>),</span></div></div><div><div><span> </span><span>Node</span><span> {</span></div></div><div><div><span><span> </span></span><span>value</span><span>:</span><span> </span><span>i32</span><span>,</span></div></div><div><div><span><span> </span></span><span>left</span><span>:</span><span> </span><span>StackSafe</span><span>&lt;</span><span>Box</span><span>&lt;</span><span>BinaryTree</span><span>&gt;&gt;,</span></div></div><div><div><span><span> </span></span><span>right</span><span>:</span><span> </span><span>StackSafe</span><span>&lt;</span><span>Box</span><span>&lt;</span><span>BinaryTree</span><span>&gt;&gt;,</span></div></div><div><div><span><span> </span></span><span>},</span></div></div><div><div><span>}</span></div></div><div><div> </div></div><div><div><span>// All operations work safely on arbitrarily deep trees:</span></div></div><div><div><span>let</span><span> deep_tree </span><span>=</span><span> 
</span><span>create_deep_tree</span><span>(</span><span>100000</span><span>);</span></div></div><div><div><span>let</span><span> cloned </span><span>=</span><span> deep_tree</span><span>.</span><span>clone</span><span>(); </span><span>// No stack overflow</span></div></div><div><div><span>let</span><span> are_equal </span><span>=</span><span> deep_tree </span><span>==</span><span> cloned; </span><span>// No stack overflow</span></div></div><div><div><span>println!</span><span>(</span><span>"{:?}"</span><span>, deep_tree); </span><span>// No stack overflow</span></div></div><div><div><span>drop</span><span>(deep_tree); </span><span>// No stack overflow</span></div></div></code></pre><div><div></div></div></figure></div> <h3>Debug-Time Safety Checks</h3> <p><code>StackSafe&lt;T&gt;</code> exposes the wrapped value through Rust’s <code>Deref</code> trait, allowing transparent access to the underlying data. What’s more, it includes an important safety mechanism: in debug builds, it checks whether the current function is properly annotated with <code>#[stacksafe]</code> whenever you access the wrapped value.</p> <div><figure><figcaption></figcaption><pre><code><div><div><span>fn</span><span> </span><span>unsafe_tree_sum</span><span>(tree</span><span>:</span><span> </span><span>&amp;</span><span>BinaryTree</span><span>) </span><span>-&gt;</span><span> </span><span>i32</span><span> {</span></div></div><div><div><span> </span><span>match</span><span> tree {</span></div></div><div><div><span> </span><span>BinaryTree</span><span>::</span><span>Leaf</span><span>(value) </span><span>=&gt;</span><span> </span><span>*</span><span>value,</span></div></div><div><div><span> </span><span>BinaryTree</span><span>::</span><span>Node</span><span> { value, left, right } </span><span>=&gt;</span><span> {</span></div></div><div><div><span> </span><span>// This will panic in debug builds:</span></div></div><div><div><span> </span><span>// "StackSafe should only be accessed within a 
stack-safe context"</span></div></div><div><div><span> </span><span>// Missing #[stacksafe] annotation!</span></div></div><div><div><span><span> </span></span><span>value </span><span>+</span><span> </span><span>unsafe_tree_sum</span><span>(left) </span><span>+</span><span> </span><span>unsafe_tree_sum</span><span>(right)</span></div></div><div><div><span><span> </span></span><span>}</span></div></div><div><div><span><span> </span></span><span>}</span></div></div><div><div><span>}</span></div></div><div><div> </div></div><div><div><span>#[stacksafe]</span></div></div><div><div><span>fn</span><span> </span><span>safe_tree_sum</span><span>(tree</span><span>:</span><span> </span><span>&amp;</span><span>BinaryTree</span><span>) </span><span>-&gt;</span><span> </span><span>i32</span><span> {</span></div></div><div><div><span> </span><span>match</span><span> tree {</span></div></div><div><div><span> </span><span>BinaryTree</span><span>::</span><span>Leaf</span><span>(value) </span><span>=&gt;</span><span> </span><span>*</span><span>value,</span></div></div><div><div><span> </span><span>BinaryTree</span><span>::</span><span>Node</span><span> { value, left, right } </span><span>=&gt;</span><span> {</span></div></div><div><div><span> </span><span>// Works fine - properly protected</span></div></div><div><div><span><span> </span></span><span>value </span><span>+</span><span> </span><span>safe_tree_sum</span><span>(left) </span><span>+</span><span> </span><span>safe_tree_sum</span><span>(right)</span></div></div><div><div><span><span> </span></span><span>}</span></div></div><div><div><span><span> </span></span><span>}</span></div></div><div><div><span>}</span></div></div></code></pre><div><div></div></div></figure></div> <p>This debug-time check helps you identify all potential stack overflow locations during development, rather than discovering them in production when they cause crashes.</p> <h3>Adopting <code>StackSafe</code> in Existing Code</h3> <p>Converting existing 
recursive code is straightforward. Here’s a real-world example:</p> <p><strong>Before</strong> (crash-prone):</p> <div><figure><figcaption></figcaption><pre><code><div><div><span>#[derive(</span><span>Debug</span><span>, </span><span>Clone</span><span>)]</span></div></div><div><div><span>pub</span><span> </span><span>enum</span><span> </span><span>JsonValue</span><span> {</span></div></div><div><div><span> </span><span>Object</span><span>(</span><span>HashMap</span><span>&lt;</span><span>String</span><span>, </span><span>JsonValue</span><span>&gt;),</span></div></div><div><div><span> </span><span>Array</span><span>(</span><span>Vec</span><span>&lt;</span><span>JsonValue</span><span>&gt;),</span></div></div><div><div><span> </span><span>String</span><span>(</span><span>String</span><span>),</span></div></div><div><div><span> </span><span>Number</span><span>(</span><span>f64</span><span>),</span></div></div><div><div><span>}</span></div></div><div><div> </div></div><div><div><span>fn</span><span> </span><span>parse_json</span><span>(input</span><span>:</span><span> </span><span>&amp;</span><span>str</span><span>) </span><span>-&gt;</span><span> </span><span>JsonValue</span><span> {</span></div></div><div><div><span> </span><span>parse_value</span><span>(</span><span>&amp;mut</span><span> </span><span>tokenize</span><span>(input))</span></div></div><div><div><span>}</span></div></div><div><div> </div></div><div><div><span>fn</span><span> </span><span>stringify</span><span>(value</span><span>:</span><span> </span><span>&amp;</span><span>JsonValue</span><span>) </span><span>-&gt;</span><span> </span><span>String</span><span> {</span></div></div><div><div><span> </span><span>match</span><span> value {</span></div></div><div><div><span> </span><span>JsonValue</span><span>::</span><span>Object</span><span>(map) </span><span>=&gt;</span><span> {</span></div></div><div><div><span> </span><span>let</span><span> items</span><span>:</span><span> 
</span><span>Vec</span><span>&lt;_&gt; </span><span>=</span><span> map</span><span>.</span><span>iter</span><span>()</span></div></div><div><div><span> </span><span>.</span><span>map</span><span>(</span><span>|</span><span>(k, v)</span><span>|</span><span> </span><span>format!</span><span>(</span><span>"</span><span>\"</span><span>{}</span><span>\"</span><span>:{}"</span><span>, k, </span><span>stringify</span><span>(v)))</span></div></div><div><div><span> </span><span>.</span><span>collect</span><span>();</span></div></div><div><div><span> </span><span>format!</span><span>(</span><span>"{{{}}}"</span><span>, items</span><span>.</span><span>join</span><span>(</span><span>","</span><span>))</span></div></div><div><div><span><span> </span></span><span>}</span></div></div><div><div><span> </span><span>// ...other cases</span></div></div><div><div><span><span> </span></span><span>}</span></div></div><div><div><span>}</span></div></div></code></pre><div><div></div></div></figure></div> <p><strong>After</strong> (stack-safe):</p> <div><figure><figcaption></figcaption><pre><code><div><div><span>use</span><span> </span><span>stacksafe</span><span>::</span><span>{stacksafe, </span><span>StackSafe</span><span>};</span></div></div><div><div> </div></div><div><div><span>#[derive(</span><span>Debug</span><span>, </span><span>Clone</span><span>)]</span></div></div><div><div><span>pub</span><span> </span><span>enum</span><span> </span><span>JsonValue</span><span> {</span></div></div><div><div><span> </span><span>Object</span><span>(</span><span>HashMap</span><span>&lt;</span><span>String</span><span>, </span><span>StackSafe</span><span>&lt;</span><span>JsonValue</span><span>&gt;&gt;), </span><span>// Wrap recursive fields</span></div></div><div><div><span> </span><span>Array</span><span>(</span><span>Vec</span><span>&lt;</span><span>StackSafe</span><span>&lt;</span><span>JsonValue</span><span>&gt;&gt;), </span><span>// Wrap recursive fields</span></div></div><div><div><span> 
</span><span>String</span><span>(</span><span>String</span><span>),</span></div></div><div><div><span> </span><span>Number</span><span>(</span><span>f64</span><span>),</span></div></div><div><div><span>}</span></div></div><div><div> </div></div><div><div><span>fn</span><span> </span><span>parse_json</span><span>(input</span><span>:</span><span> </span><span>&amp;</span><span>str</span><span>) </span><span>-&gt;</span><span> </span><span>JsonValue</span><span> {</span></div></div><div><div><span> </span><span>parse_value</span><span>(</span><span>&amp;mut</span><span> </span><span>tokenize</span><span>(input))</span></div></div><div><div><span>}</span></div></div><div><div> </div></div><div><div><span>#[stacksafe] </span><span>// Add attribute to recursive functions</span></div></div><div><div><span>fn</span><span> </span><span>stringify</span><span>(value</span><span>:</span><span> </span><span>&amp;</span><span>JsonValue</span><span>) </span><span>-&gt;</span><span> </span><span>String</span><span> {</span></div></div><div><div><span> </span><span>match</span><span> value {</span></div></div><div><div><span> </span><span>JsonValue</span><span>::</span><span>Object</span><span>(map) </span><span>=&gt;</span><span> {</span></div></div><div><div><span> </span><span>let</span><span> items</span><span>:</span><span> </span><span>Vec</span><span>&lt;_&gt; </span><span>=</span><span> map</span><span>.</span><span>iter</span><span>()</span></div></div><div><div><span> </span><span>.</span><span>map</span><span>(</span><span>|</span><span>(k, v)</span><span>|</span><span> </span><span>format!</span><span>(</span><span>"</span><span>\"</span><span>{}</span><span>\"</span><span>:{}"</span><span>, k, </span><span>stringify</span><span>(v)))</span></div></div><div><div><span> </span><span>.</span><span>collect</span><span>();</span></div></div><div><div><span> </span><span>format!</span><span>(</span><span>"{{{}}}"</span><span>, 
items</span><span>.</span><span>join</span><span>(</span><span>","</span><span>))</span></div></div><div><div><span><span> </span></span><span>}</span></div></div><div><div><span> </span><span>// ...other cases</span></div></div><div><div><span><span> </span></span><span>}</span></div></div><div><div><span>}</span></div></div></code></pre><div><div></div></div></figure></div> <p>The changes are minimal, but the result is a completely stack-safe JSON processor that can handle arbitrarily deep nesting.</p> <h2>Conclusion</h2> <p><code>StackSafe</code> eliminates the fundamental tension between writing elegant recursive code and avoiding stack overflows. By handling both recursive functions and data structures comprehensively, it lets you focus on algorithm logic rather than stack management.</p> <ul> <li><strong>Simple adoption</strong>: Add <code>#[stacksafe]</code> to recursive functions and wrap recursive fields in <code>StackSafe&lt;T&gt;</code></li> <li><strong>Complete protection</strong>: Covers both function calls and trait operations</li> </ul> <h2>Resources</h2> <ul> <li>Crate: <a href="https://crates.io/crates/stacksafe">https://crates.io/crates/stacksafe</a></li> <li>Documentation: <a href="https://docs.rs/stacksafe">https://docs.rs/stacksafe</a></li> <li>GitHub: <a href="https://github.com/fast/stacksafe">https://github.com/fast/stacksafe</a></li> </ul>
Fastrace: A Modern Approach to Distributed Tracing in Rust
https://fast.github.io/blog/fastrace-a-modern-approach-to-distributed-tracing-in-rust
Sat, 22 Mar 2025 00:00:00 GMT
<h2>TL;DR</h2> <p>Distributed tracing is critical for understanding modern microservice architectures. 
While <code>tokio-rs/tracing</code> is widely used in Rust, it comes with significant challenges: ecosystem fragmentation, complex configuration, and high overhead.</p> <p><a href="https://github.com/fast/fastrace">Fastrace</a> provides a production-ready solution with seamless ecosystem integration, out-of-the-box OpenTelemetry support, and a more straightforward API that works naturally with your existing logging infrastructure.</p> <p>The following example demonstrates how to trace functions with <code>fastrace</code>:</p> <div><figure><figcaption></figcaption><pre><code><div><div><span>#[fastrace</span><span>::</span><span>trace]</span></div></div><div><div><span>pub</span><span> </span><span>fn</span><span> </span><span>send_request</span><span>(req</span><span>:</span><span> </span><span>HttpRequest</span><span>) </span><span>-&gt;</span><span> </span><span>Result</span><span>&lt;(), </span><span>Error</span><span>&gt; {</span></div></div><div><div><span> </span><span>// ...</span></div></div><div><div><span>}</span></div></div></code></pre><div><div></div></div></figure></div> <p>It’s being used in production by products like <a href="https://www.scopedb.io/blog/manage-observability-data-in-petabytes">ScopeDB</a>, where it helps trace and debug petabyte-scale observability data workloads.</p> <figure><img src="/_astro/scopedb-traces.BnaDCPay_2ou32s.webp" alt="Distributed Tracing Visualization" width="1618" height="1274" loading="lazy" /><figcaption>Distributed Tracing Visualization</figcaption></figure> <h2>Why Distributed Tracing Matters</h2> <p>In today’s world of microservices and distributed systems, understanding what is happening inside your applications has never been more challenging. 
A user request might touch dozens of services before completion, and traditional logging approaches quickly fall short.</p> <p>Consider a typical request flow:</p> <div><figure><figcaption></figcaption><pre><code><div><div><span>User → API Gateway → Auth Service → User Service → Database</span></div></div></code></pre><div><div></div></div></figure></div> <p>When a request fails or the app performs poorly, where exactly is the root cause? Individual service logs only show fragments of the whole trace, lacking the crucial context of how the request flows through your entire system.</p> <p>This is where distributed tracing becomes essential. Tracing creates a connected view of your request’s flow across service boundaries, making it possible to:</p> <ul> <li>Identify performance bottlenecks across services</li> <li>Debug complex interactions between components</li> <li>Understand dependencies and service relationships</li> <li>Analyze latency distributions and outliers</li> <li>Correlate logs and metrics with request context</li> </ul> <h2>A Popular Approach: <code>tokio-rs/tracing</code></h2> <p>For some Rust developers, <code>tokio-rs/tracing</code> is the go-to solution for application instrumentation. 
Let’s look at how a typical implementation works:</p> <div><figure><figcaption></figcaption><pre><code><div><div><span>fn</span><span> </span><span>main</span><span>() {</span></div></div><div><div><span> </span><span>// Initialize the tracing subscriber</span></div></div><div><div><span> </span><span>// Complex configuration code omitted...</span></div></div><div><div> </div></div><div><div><span> </span><span>// Create a span and record some data</span></div></div><div><div><span> </span><span>let</span><span> span </span><span>=</span><span> </span><span>tracing</span><span>::</span><span>info_span!</span><span>(</span><span>"processing_request"</span><span>,</span></div></div><div><div><span><span> </span></span><span>user_id </span><span>=</span><span> </span><span>42</span><span>,</span></div></div><div><div><span><span> </span></span><span>request_id </span><span>=</span><span> </span><span>"abcd1234"</span></div></div><div><div><span><span> </span></span><span>);</span></div></div><div><div> </div></div><div><div><span> </span><span>// Enter the span (activates it for the current execution context)</span></div></div><div><div><span> </span><span>let</span><span> _guard </span><span>=</span><span> span</span><span>.</span><span>enter</span><span>();</span></div></div><div><div> </div></div><div><div><span> </span><span>// Log within the span context</span></div></div><div><div><span> </span><span>tracing</span><span>::</span><span>info!</span><span>(</span><span>"Starting request processing"</span><span>);</span></div></div><div><div> </div></div><div><div><span> </span><span>process_data</span><span>();</span></div></div><div><div> </div></div><div><div><span> </span><span>tracing</span><span>::</span><span>info!</span><span>(</span><span>"Finished processing request"</span><span>);</span></div></div><div><div><span>}</span></div></div></code></pre><div><div></div></div></figure></div> <p>For instrumenting functions, <code>tokio-rs/tracing</code> provides 
attribute macros:</p> <div><figure><figcaption></figcaption><pre><code><div><div><span>#[tracing</span><span>::</span><span>instrument(skip(password), fields(user_id </span><span>=</span><span> user</span><span>.</span><span>id))]</span></div></div><div><div><span>async</span><span> </span><span>fn</span><span> </span><span>authenticate</span><span>(user</span><span>:</span><span> </span><span>&amp;</span><span>User</span><span>, password</span><span>:</span><span> </span><span>&amp;</span><span>str</span><span>) </span><span>-&gt;</span><span> </span><span>Result</span><span>&lt;</span><span>AuthToken</span><span>, </span><span>AuthError</span><span>&gt; {</span></div></div><div><div><span> </span><span>tracing</span><span>::</span><span>info!</span><span>(</span><span>"Authenticating user {}"</span><span>, user</span><span>.</span><span>id);</span></div></div><div><div><span> </span><span>// ...more code...</span></div></div><div><div><span>}</span></div></div></code></pre><div><div></div></div></figure></div> <h2>The Challenges with <code>tokio-rs/tracing</code></h2> <p>In our experience as users, <code>tokio-rs/tracing</code> comes with several significant challenges:</p> <h3>1. Ecosystem Fragmentation</h3> <p>By introducing its own logging macros, <code>tokio-rs/tracing</code> creates a divide from code that uses the standard <code>log</code> crate:</p> <div><figure><figcaption></figcaption><pre><code><div><div><span>// Using log crate</span></div></div><div><div><span>log</span><span>::</span><span>info!</span><span>(</span><span>"Starting operation"</span><span>);</span></div></div><div><div> </div></div><div><div><span>// Using tracing crate (a separate set of macros)</span></div></div><div><div><span>tracing</span><span>::</span><span>info!</span><span>(</span><span>"Starting operation"</span><span>);</span></div></div></code></pre><div><div></div></div></figure></div> <p>This fragmentation is particularly problematic for library authors. 
When creating a library, authors face a difficult choice:</p> <ol> <li>Use the <code>log</code> crate for compatibility with the broader ecosystem</li> <li>Use <code>tokio-rs/tracing</code> for better observability features</li> </ol> <p>Many libraries choose the first option for simplicity, but miss out on the benefits of tracing.</p> <p>While <code>tokio-rs/tracing</code> does provide a feature flag ‘log’ that allows emitting log records to the <code>log</code> crate when using <code>tokio-rs/tracing</code>’s macros, library authors must manually enable this feature flag to ensure all users properly receive log records regardless of which logging framework they use. This creates additional configuration complexity for library maintainers.</p> <p>Furthermore, applications using <code>tokio-rs/tracing</code> must additionally install and configure the <code>tracing-log</code> bridge to properly receive log records from libraries that use the <code>log</code> crate. This creates a bidirectional compatibility problem requiring explicit configuration:</p> <div><figure><figcaption></figcaption><pre><code><div><div><span># Library's Cargo.toml</span></div></div><div><div><span>[</span><span>dependencies</span><span>]</span></div></div><div><div><span>tracing = { version = </span><span>"0.1"</span><span>, features = [</span><span>"log"</span><span>] } </span><span># Emit log records for log compatibility</span></div></div><div><div> </div></div><div><div><span># Application's Cargo.toml</span></div></div><div><div><span>[</span><span>dependencies</span><span>]</span></div></div><div><div><span>tracing = </span><span>"0.1"</span></div></div><div><div><span>tracing-log = </span><span>"0.2"</span><span> </span><span># Listen to log records for log compatibility</span></div></div></code></pre><div><div></div></div></figure></div> <h3>2. 
Performance Impact for Libraries</h3> <p>Library authors are particularly sensitive to performance overhead, as their code may be called in tight loops or performance-critical paths. The overhead of code instrumented with <code>tokio-rs/tracing</code> can be substantial, which leaves library authors with three imperfect options:</p> <ol> <li> <p>Always instrument (and impose overhead on all users)</p> </li> <li> <p>Don’t instrument at all (and lose observability)</p> </li> <li> <p>Create an additional feature flag system (increasing maintenance burden)</p> <p>Here is a common pattern in libraries using <code>tokio-rs/tracing</code>:</p> <div><figure><figcaption></figcaption><pre><code><div><div><span>#[cfg_attr(feature </span><span>=</span><span> </span><span>"tracing"</span><span>, tracing</span><span>::</span><span>instrument(skip(password), fields(user_id </span><span>=</span><span> user</span><span>.</span><span>id)))]</span></div></div><div><div><span>async</span><span> </span><span>fn</span><span> </span><span>authenticate</span><span>(user</span><span>:</span><span> </span><span>&amp;</span><span>User</span><span>, password</span><span>:</span><span> </span><span>&amp;</span><span>str</span><span>) </span><span>-&gt;</span><span> </span><span>Result</span><span>&lt;</span><span>AuthToken</span><span>, </span><span>AuthError</span><span>&gt; {</span></div></div><div><div><span> </span><span>// ...more code...</span></div></div><div><div><span>}</span></div></div></code></pre><div><div></div></div></figure></div> <p>Different libraries may define feature flags with subtly different names, making it hard for the final application to configure all of them.</p> </li> </ol> <p>With <code>tokio-rs/tracing</code>, there’s no clean way to make tracing zero-cost when it is disabled. This makes library authors reluctant to add instrumentation to performance-sensitive code paths.</p> <h3>3. 
No Context Propagation</h3> <p>Distributed tracing requires propagating context across service boundaries, but <code>tokio-rs/tracing</code> leaves this largely as an exercise for the developer. For example, this is <a href="https://github.com/hyperium/tonic/blob/master/examples/src/tracing/server.rs">tonic’s official example</a> demonstrating how to trace a gRPC service:</p> <div><figure><figcaption></figcaption><pre><code><div><div><span>Server</span><span>::</span><span>builder</span><span>()</span></div></div><div><div><span> </span><span>.</span><span>trace_fn</span><span>(</span><span>|</span><span>_</span><span>|</span><span> </span><span>tracing</span><span>::</span><span>info_span!</span><span>(</span><span>"grpc_server"</span><span>))</span></div></div><div><div><span> </span><span>.</span><span>add_service</span><span>(</span><span>MyServiceServer</span><span>::</span><span>new</span><span>(</span><span>MyService</span><span>::</span><span>default</span><span>()))</span></div></div><div><div><span> </span><span>.</span><span>serve</span><span>(addr)</span></div></div><div><div><span> </span><span>.await?</span><span>;</span></div></div></code></pre><div><div></div></div></figure></div> <p>The above example only creates a basic span but doesn’t extract tracing context from the incoming request.</p> <p>The consequences of missing context propagation are severe in distributed systems. 
When a trace disconnects due to missing context:</p> <ul> <li> <p>Instead of seeing the complete flow of a request, like:</p> <div><figure><figcaption></figcaption><pre><code><div><div><span>Trace #1: Frontend → API Gateway → User Service → Database → Response</span></div></div></code></pre><div><div></div></div></figure></div> </li> <li> <p>You’ll see disconnected fragments of a single request:</p> <div><figure><figcaption></figcaption><pre><code><div><div><span>Trace #1: Frontend → API Gateway</span></div></div><div><div><span>Trace #2: User Service → Database</span></div></div><div><div><span>Trace #3: API Gateway → Response</span></div></div></code></pre><div><div></div></div></figure></div> </li> <li> <p>Even worse, when multiple requests are interleaved, the traces become a chaotic mess:</p> <div><figure><figcaption></figcaption><pre><code><div><div><span>Trace #1: Frontend → API Gateway</span></div></div><div><div><span>Trace #2: Frontend → API Gateway</span></div></div><div><div><span>Trace #3: Frontend → API Gateway</span></div></div><div><div><span>Trace #4: User Service → Database</span></div></div><div><div><span>Trace #6: API Gateway → Response</span></div></div><div><div><span>Trace #5: User Service → Database</span></div></div></code></pre><div><div></div></div></figure></div> </li> </ul> <p>This fragmentation makes it extremely difficult to follow request flows, isolate performance issues, or understand causal relationships between services.</p> <h2>Introducing <code>fastrace</code>: A Fast and Complete Solution</h2> <h3>1. Zero-cost Abstraction</h3> <p><code>fastrace</code> is designed as a true zero-cost abstraction. When it is disabled, instrumentation is completely omitted from compilation, resulting in no runtime overhead. This makes it ideal for libraries concerned about performance.</p> <h3>2. Ecosystem Compatibility</h3> <p><code>fastrace</code> focuses exclusively on distributed tracing. 
Through its composable design, it integrates seamlessly with the existing Rust ecosystem, including compatibility with the standard <code>log</code> crate. This architectural approach allows libraries to implement comprehensive tracing while preserving their users’ freedom to use their preferred logging setup.</p> <h3>3. Simplicity First</h3> <p>The API is designed to be intuitive and require minimal boilerplate, focusing on the most common use cases while still providing extensibility when needed.</p> <h3>4. Insanely Fast</h3> <figure><img src="/_astro/trace-100-spans.DyxTLW8f_Z23JryK.webp" alt="Fastrace Performance" width="1772" height="842" loading="lazy" /><figcaption>Fastrace Performance</figcaption></figure> <p><code>fastrace</code> is designed for high-performance applications. It can handle massive amounts of spans with minimal impact on CPU and memory usage.</p> <h3>5. Ergonomic for both Applications and Libraries</h3> <p>Libraries can use <code>fastrace</code> without imposing performance penalties when not needed:</p> <div><figure><figcaption></figcaption><pre><code><div><div><span>#[fastrace</span><span>::</span><span>trace] </span><span>// Zero-cost when the application doesn't enable the 'enable' feature</span></div></div><div><div><span>pub</span><span> </span><span>fn</span><span> </span><span>process_data</span><span>(data</span><span>:</span><span> </span><span>&amp;</span><span>[</span><span>u8</span><span>]) </span><span>-&gt;</span><span> </span><span>Result</span><span>&lt;</span><span>Vec</span><span>&lt;</span><span>u8</span><span>&gt;, </span><span>Error</span><span>&gt; {</span></div></div><div><div><span> </span><span>// Library uses standard log crate</span></div></div><div><div><span> </span><span>log</span><span>::</span><span>debug!</span><span>(</span><span>"Processing {} bytes of data"</span><span>, data</span><span>.</span><span>len</span><span>());</span></div></div><div><div> </div></div><div><div><span> </span><span>// ...more 
code...</span></div></div><div><div><span>}</span></div></div></code></pre><div><div></div></div></figure></div> <p>The key point here is that libraries should depend on <code>fastrace</code> without enabling any features:</p> <div><figure><figcaption></figcaption><pre><code><div><div><span>[</span><span>dependencies</span><span>]</span></div></div><div><div><span>fastrace = </span><span>"0.7"</span><span> </span><span># No 'enable' feature</span></div></div></code></pre><div><div></div></div></figure></div> <p>When an application uses this library and doesn’t enable the <code>enable</code> feature of <code>fastrace</code>:</p> <ul> <li>All tracing code is completely optimized away at compile time</li> <li>Zero runtime overhead is added to the library</li> <li>Performance-critical code paths are unaffected</li> </ul> <p>When the application does enable tracing via the <code>enable</code> feature:</p> <ul> <li>Instrumentation in the library becomes active</li> <li>Spans are collected and reported</li> <li>The application gets full visibility into library behavior</li> </ul> <p>This is a significant advantage over other tracing solutions that either always impose overhead or require libraries to implement complex feature-flag systems.</p> <h3>6. 
Seamless Context Propagation</h3> <p><code>fastrace</code> provides companion crates for popular frameworks that handle context propagation automatically:</p> <div><figure><figcaption></figcaption><pre><code><div><div><span>// For HTTP clients with reqwest</span></div></div><div><div><span>let</span><span> response </span><span>=</span><span> client</span><span>.</span><span>get</span><span>(</span><span>&amp;</span><span>format!</span><span>(</span><span>"https://user-service/users/{}"</span><span>, user_id))</span></div></div><div><div><span> </span><span>.</span><span>headers</span><span>(</span><span>fastrace_reqwest</span><span>::</span><span>traceparent_headers</span><span>()) </span><span>// Automatically injects trace context</span></div></div><div><div><span> </span><span>.</span><span>send</span><span>()</span></div></div><div><div><span> </span><span>.await?</span><span>;</span></div></div><div><div> </div></div><div><div><span>// For gRPC servers with tonic</span></div></div><div><div><span>Server</span><span>::</span><span>builder</span><span>()</span></div></div><div><div><span> </span><span>.</span><span>layer</span><span>(</span><span>fastrace_tonic</span><span>::</span><span>FastraceServerLayer</span><span>) </span><span>// Automatically extracts context from incoming requests</span></div></div><div><div><span> </span><span>.</span><span>add_service</span><span>(</span><span>MyServiceServer</span><span>::</span><span>new</span><span>(</span><span>MyService</span><span>::</span><span>default</span><span>()))</span></div></div><div><div><span> </span><span>.</span><span>serve</span><span>(addr)</span></div></div><div><div><span> </span><span>.await?</span><span>;</span></div></div><div><div> </div></div><div><div><span>// For gRPC clients</span></div></div><div><div><span>let</span><span> channel </span><span>=</span><span> </span><span>ServiceBuilder</span><span>::</span><span>new</span><span>()</span></div></div><div><div><span> 
</span><span>.</span><span>layer</span><span>(</span><span>fastrace_tonic</span><span>::</span><span>FastraceClientLayer</span><span>) </span><span>// Automatically injects context into outgoing requests</span></div></div><div><div><span> </span><span>.</span><span>service</span><span>(channel);</span></div></div><div><div> </div></div><div><div><span>// For data access with Apache OpenDAL</span></div></div><div><div><span>let</span><span> op </span><span>=</span><span> </span><span>Operator</span><span>::</span><span>new</span><span>(</span><span>services</span><span>::</span><span>Memory</span><span>::</span><span>default</span><span>())</span><span>?</span></div></div><div><div><span> </span><span>.</span><span>layer</span><span>(</span><span>opendal</span><span>::</span><span>layers</span><span>::</span><span>FastraceLayer</span><span>) </span><span>// Automatically traces all data operations</span></div></div><div><div><span> </span><span>.</span><span>finish</span><span>();</span></div></div><div><div><span>op</span><span>.</span><span>write</span><span>(</span><span>"test"</span><span>, </span><span>"0"</span><span>.</span><span>repeat</span><span>(</span><span>16</span><span> </span><span>*</span><span> </span><span>1024</span><span> </span><span>*</span><span> </span><span>1024</span><span>)</span><span>.</span><span>into_bytes</span><span>())</span></div></div><div><div><span> </span><span>.await?</span><span>;</span></div></div></code></pre><div><div></div></div></figure></div> <p>These integrations provide out-of-the-box distributed tracing without manual context handling.</p> <h2>The Complete Solution: <code>fastrace</code> + <code>log</code> + <code>logforth</code></h2> <p><code>fastrace</code> deliberately focuses on doing one thing well: tracing. 
Thanks to its composable design and Rust’s rich ecosystem, a powerful combination emerges:</p> <ul> <li><strong>log</strong>: The standard Rust logging facade</li> <li><strong>logforth</strong>: A flexible logging implementation with production-ready features</li> <li><strong>fastrace</strong>: High-performance tracing with distributed context propagation</li> </ul> <p>This integration automatically associates your logs with trace spans, providing correlation without requiring a different set of logging macros:</p> <div><figure><figcaption></figcaption><pre><code><div><div><span>log</span><span>::</span><span>info!</span><span>(</span><span>"Processing started"</span><span>);</span></div></div><div><div> </div></div><div><div><span>// Later, in your logging infrastructure, you can see which trace and span</span></div></div><div><div><span>// each log entry belongs to.</span></div></div></code></pre><div><div></div></div></figure></div> <p>To illustrate the simplicity of this approach, here’s a streamlined example of building a microservice with complete observability:</p> <div><figure><figcaption></figcaption><pre><code><div><div><span>#[poem</span><span>::</span><span>handler]</span></div></div><div><div><span>#[fastrace</span><span>::</span><span>trace] </span><span>// Automatically creates and manages spans</span></div></div><div><div><span>async</span><span> </span><span>fn</span><span> </span><span>get_user</span><span>(</span><span>Path</span><span>(user_id)</span><span>:</span><span> </span><span>Path</span><span>&lt;</span><span>String</span><span>&gt;) </span><span>-&gt;</span><span> </span><span>Json</span><span>&lt;</span><span>User</span><span>&gt; {</span></div></div><div><div><span> </span><span>// Standard log calls are automatically associated with the current span</span></div></div><div><div><span> </span><span>log</span><span>::</span><span>info!</span><span>(</span><span>"Fetching user {}"</span><span>, 
user_id);</span></div></div><div><div> </div></div><div><div><span> </span><span>let</span><span> user_details </span><span>=</span><span> </span><span>fetch_user_details</span><span>(</span><span>&amp;</span><span>user_id)</span><span>.await</span><span>;</span></div></div><div><div> </div></div><div><div><span> </span><span>Json</span><span>(</span><span>User</span><span> {</span></div></div><div><div><span><span> </span></span><span>id</span><span>:</span><span> user_id,</span></div></div><div><div><span><span> </span></span><span>name</span><span>:</span><span> user_details</span><span>.</span><span>name,</span></div></div><div><div><span><span> </span></span><span>email</span><span>:</span><span> user_details</span><span>.</span><span>email,</span></div></div><div><div><span><span> </span></span><span>})</span></div></div><div><div><span>}</span></div></div><div><div> </div></div><div><div><span>#[fastrace</span><span>::</span><span>trace]</span></div></div><div><div><span>async</span><span> </span><span>fn</span><span> </span><span>fetch_user_details</span><span>(user_id</span><span>:</span><span> </span><span>&amp;</span><span>str</span><span>) </span><span>-&gt;</span><span> </span><span>UserDetails</span><span> {</span></div></div><div><div><span> </span><span>let</span><span> client </span><span>=</span><span> </span><span>reqwest</span><span>::</span><span>Client</span><span>::</span><span>new</span><span>();</span></div></div><div><div> </div></div><div><div><span> </span><span>let</span><span> response </span><span>=</span><span> client</span><span>.</span><span>get</span><span>(</span><span>&amp;</span><span>format!</span><span>(</span><span>"https://user-details-service/users/{}"</span><span>, user_id))</span></div></div><div><div><span> </span><span>.</span><span>headers</span><span>(</span><span>fastrace_reqwest</span><span>::</span><span>traceparent_headers</span><span>()) </span><span>// Automatic trace context 
propagation</span></div></div><div><div><span> </span><span>.</span><span>send</span><span>()</span></div></div><div><div><span> </span><span>.await</span></div></div><div><div><span> </span><span>.</span><span>expect</span><span>(</span><span>"Request failed"</span><span>);</span></div></div><div><div> </div></div><div><div><span><span> </span></span><span>response</span><span>.</span><span>json</span><span>::</span><span>&lt;</span><span>UserDetails</span><span>&gt;()</span><span>.await.</span><span>expect</span><span>(</span><span>"Failed to parse JSON"</span><span>)</span></div></div><div><div><span>}</span></div></div><div><div> </div></div><div><div><span>#[tokio</span><span>::</span><span>main]</span></div></div><div><div><span>async</span><span> </span><span>fn</span><span> </span><span>main</span><span>() {</span></div></div><div><div><span> </span><span>// Configure logging and tracing</span></div></div><div><div><span> </span><span>setup_observability</span><span>(</span><span>"user-service"</span><span>);</span></div></div><div><div> </div></div><div><div><span> </span><span>let</span><span> app </span><span>=</span><span> </span><span>poem</span><span>::</span><span>Route</span><span>::</span><span>new</span><span>()</span></div></div><div><div><span> </span><span>.</span><span>at</span><span>(</span><span>"/users/:id"</span><span>, </span><span>poem</span><span>::</span><span>get</span><span>(get_user))</span></div></div><div><div><span> </span><span>.</span><span>with</span><span>(</span><span>fastrace_poem</span><span>::</span><span>FastraceMiddleware</span><span>); </span><span>// Automatic trace context extraction</span></div></div><div><div> </div></div><div><div><span> 
</span><span>poem</span><span>::</span><span>Server</span><span>::</span><span>new</span><span>(</span><span>poem</span><span>::</span><span>listener</span><span>::</span><span>TcpListener</span><span>::</span><span>bind</span><span>(</span><span>"0.0.0.0:3000"</span><span>))</span></div></div><div><div><span> </span><span>.</span><span>run</span><span>(app)</span></div></div><div><div><span> </span><span>.await</span></div></div><div><div><span> </span><span>.</span><span>unwrap</span><span>();</span></div></div><div><div> </div></div><div><div><span> </span><span>fastrace</span><span>::</span><span>flush</span><span>();</span></div></div><div><div><span>}</span></div></div><div><div> </div></div><div><div><span>fn</span><span> </span><span>setup_observability</span><span>(service_name</span><span>:</span><span> </span><span>&amp;</span><span>str</span><span>) {</span></div></div><div><div><span> </span><span>// Setup logging with logforth</span></div></div><div><div><span> </span><span>logforth</span><span>::</span><span>stderr</span><span>()</span></div></div><div><div><span> </span><span>.</span><span>dispatch</span><span>(</span><span>|</span><span>d</span><span>|</span><span> {</span></div></div><div><div><span><span> </span></span><span>d</span><span>.</span><span>filter</span><span>(</span><span>log</span><span>::</span><span>LevelFilter</span><span>::</span><span>Info</span><span>)</span></div></div><div><div><span> </span><span>// Attaches trace id to logs</span></div></div><div><div><span> </span><span>.</span><span>diagnostic</span><span>(</span><span>logforth</span><span>::</span><span>diagnostic</span><span>::</span><span>FastraceDiagnostic</span><span>::</span><span>default</span><span>())</span></div></div><div><div><span> </span><span>// Attaches logs to spans</span></div></div><div><div><span> 
</span><span>.</span><span>append</span><span>(</span><span>logforth</span><span>::</span><span>append</span><span>::</span><span>FastraceEvent</span><span>::</span><span>default</span><span>())</span></div></div><div><div><span><span> </span></span><span>})</span></div></div><div><div><span> </span><span>.</span><span>apply</span><span>();</span></div></div><div><div> </div></div><div><div><span> </span><span>// Setup tracing with fastrace</span></div></div><div><div><span> </span><span>fastrace</span><span>::</span><span>set_reporter</span><span>(</span></div></div><div><div><span> </span><span>fastrace_jaeger</span><span>::</span><span>JaegerReporter</span><span>::</span><span>new</span><span>(</span><span>"127.0.0.1:6831"</span><span>.</span><span>parse</span><span>()</span><span>.</span><span>unwrap</span><span>(), service_name)</span><span>.</span><span>unwrap</span><span>(),</span></div></div><div><div><span> </span><span>fastrace</span><span>::</span><span>collector</span><span>::</span><span>Config</span><span>::</span><span>default</span><span>()</span></div></div><div><div><span><span> </span></span><span>);</span></div></div><div><div><span>}</span></div></div></code></pre><div><div></div></div></figure></div> <h2>Conclusion</h2> <p><code>fastrace</code> represents a modern approach to distributed tracing in Rust. 
The most significant advantages of <code>fastrace</code> are:</p> <ul> <li><strong>Zero Runtime Overhead When Disabled</strong>: Libraries can add rich instrumentation without worrying about performance impact when tracing is not enabled by the application.</li> <li><strong>No Ecosystem Lock-In</strong>: Libraries can use <code>fastrace</code> without forcing their users into a specific logging ecosystem.</li> <li><strong>Simple API Surface</strong>: The minimal API surface makes it easy to add comprehensive tracing with little code.</li> <li><strong>Predictable Performance</strong>: <code>fastrace</code>’s performance characteristics are consistent and predictable, even under high load.</li> </ul> <p>An ecosystem where libraries are comprehensively instrumented with <code>fastrace</code> would enable unprecedented visibility into applications, without the performance or compatibility concerns that have historically prevented such instrumentation.</p> <h2>Resources</h2> <ul> <li><a href="https://github.com/fast/fastrace">fastrace</a></li> <li><a href="https://crates.io/crates/fastrace-jaeger">fastrace-jaeger</a></li> <li><a href="https://crates.io/crates/fastrace-opentelemetry">fastrace-opentelemetry</a></li> <li><a href="https://crates.io/crates/fastrace-reqwest">fastrace-reqwest</a></li> <li><a href="https://crates.io/crates/fastrace-poem">fastrace-poem</a></li> <li><a href="https://crates.io/crates/fastrace-tonic">fastrace-tonic</a></li> <li><a href="https://crates.io/crates/logforth">logforth</a></li> </ul>