Skip to content

Implement stable session identifier headers for telemetry#295

Open
khanayan123 wants to merge 52 commits intomainfrom
ayan.khan/stable-session-id-headers
Open

Implement stable session identifier headers for telemetry#295
khanayan123 wants to merge 52 commits intomainfrom
ayan.khan/stable-session-id-headers

Conversation

@khanayan123
Copy link
Copy Markdown
Contributor

@khanayan123 khanayan123 commented Mar 20, 2026

Summary

Implements the Stable Service Instance Identifier RFC for the C++ SDK.

  • Adds DD-Session-ID (= runtime_id) and conditional DD-Root-Session-ID headers to all telemetry requests (app-started, heartbeats, app-closing, etc.)
  • DD-Root-Session-ID is only sent when it differs from DD-Session-ID (i.e., in child processes)
  • Root session ID is stored in a Meyer's singleton (static const local), set once by the first Tracer to initialize. Immutable after init, so fork-safe.
  • Fork propagation is automatic — the singleton value persists in child process memory

Integration testing note

System-tests for cpp_nginx and cpp_httpd cannot validate session headers until the downstream modules (nginx-datadog, httpd-datadog) release a version that includes these dd-trace-cpp changes. The system-tests manifest entries remain missing_feature until then. Unit tests in this PR validate the implementation directly.

Companion system-tests PR: DataDog/system-tests#6510

@pr-commenter
Copy link
Copy Markdown

pr-commenter bot commented Mar 20, 2026

Benchmarks

Benchmark execution time: 2026-04-10 17:51:36

Comparing candidate commit e61e812 in PR branch ayan.khan/stable-session-id-headers with baseline commit 967246f in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 1 metrics, 0 unstable metrics.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

  • 🟩 = significantly better candidate vs. baseline
  • 🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because often changes are not that big:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'

@datadog-prod-us1-6
Copy link
Copy Markdown

datadog-prod-us1-6 bot commented Mar 23, 2026

🎯 Code Coverage (details)
Patch Coverage: 100.00%
Overall Coverage: 90.94%

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: e61e812 | Docs | Datadog PR Page | Was this helpful? React with 👍/👎 or give us feedback!

Adds DD-Session-ID and DD-Root-Session-ID HTTP headers to all telemetry
requests per the Stable Service Instance Identifier RFC.

- DD-Session-ID is always set to the tracer's runtime_id
- DD-Root-Session-ID is only set when it differs from the session ID
  (i.e. when running as a child process)
- Root session ID is propagated to exec'd children via
  _DD_ROOT_CPP_SESSION_ID env var, read in finalize_config()
- _DD_ROOT_CPP_SESSION_ID registered in the environment variable
  registry and supported-configurations.json

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
@khanayan123 khanayan123 force-pushed the ayan.khan/stable-session-id-headers branch from 85d241a to e40dd5d Compare March 25, 2026 14:35
khanayan123 and others added 3 commits April 6, 2026 10:43
Protect concurrent set() calls with std::mutex, matching the codebase's
standard pattern for shared state protection.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Replace std::mutex with std::atomic<bool> compare_exchange_strong to
avoid potential pthread issues in Docker/CI environments.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Revert the app_started() → send_payload() simplification as it was out
of scope for this PR. Keep only the session headers addition.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
signature_{runtime_id_, config.defaults.service,
config.defaults.environment},
signature_{runtime_id_,
config.root_session_id.value_or(runtime_id_.string()),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to sound really weird, but I have a suggestion for updating this line of code. I think Method 1 Meyers Singleton from this Google AI response would be the best way to initialize the root session ID singleton. If we change the singleton implementation to align with that, then what we will be doing is:

  1. We are guaranteed to have a valid runtime_id_ from the previous line
  2. Right after we've set the runtime_id_ field, we try to initialize the Singleton root session ID with the runtime_id_ value, but the returned value will always be the singleton from the first runtime_id_ that was set
  3. Once we get the singleton instance back, then we get the string value so we can assign it to the root_session_id field
Suggested change
config.root_session_id.value_or(runtime_id_.string()),
Singleton::getInstance(runtime_id_.string()).getValue(),

khanayan123 and others added 9 commits April 6, 2026 21:09
Replace separate set()/get() with a single get_or_init(runtime_id) that
initializes on first call and returns the stored value on subsequent
calls. Remove root_session_id from TracerConfig/FinalizedTracerConfig
— the singleton is now the sole source of truth, read directly in
Tracer construction.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Verify get_or_init returns the first runtime ID on subsequent calls
with different values (first-writer-wins semantics).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
The 3-param constructor now calls root_session_id::get_or_init internally,
so every TracerSignature construction via the short form goes through the
singleton. Tracer no longer needs to call get_or_init directly.

Session header tests use the 4-param constructor to control values
directly without depending on singleton state.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
The singleton may already be set from a prior test case. Compare
the child's result against what get_or_init actually returned,
not the runtime ID passed in this test.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Use root_session_id::get_or_init to acquire the root session ID and
use it throughout the session header test cases.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Keep the 3-param TracerSignature constructor as a simple inline
delegating constructor. The get_or_init call stays in tracer.cpp
where the Tracer is constructed.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Copy link
Copy Markdown
Collaborator

@xlamorlette-datadog xlamorlette-datadog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please review the singleton implementation.


#include <atomic>

namespace datadog {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rather use C++17 compound namespaces:

namespace datadog::tracing::root_session_id {

return instance();
}

} // namespace root_session_id
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, pedantic, personal preference: I think such comments are noise


const std::string& get_or_init(const std::string& runtime_id) {
bool expected = false;
if (initialized().compare_exchange_strong(expected, true)) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you need this, rather than implementing the singleton directly:

const std::string& get_or_init(const std::string& runtime_id) {
  static const std::string id = runtime_id;
  return id;
}

By using inline in the header file, you can avoid the whole .cpp file:

inline const std::string& get_or_init(const std::string& runtime_id) {
  static const std::string id = runtime_id;
  return id;
}

}

void set_session_headers(DictWriter& headers,
const tracing::TracerSignature& sig) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, clean code: use explicit names rather than uncommon abbreviation: signature
same line 322 and 378

: RuntimeID::generate()),
signature_{runtime_id_, config.defaults.service,
config.defaults.environment},
signature_{runtime_id_,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I agree one should generalize the usage of braces initialization calls, but here I think we should favor consistency with existing code

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

up

"test"};

Telemetry telemetry{*finalize_config(), sig, logger, client,
scheduler, *url};
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: extra spaces; same below

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

up


SECTION("root process: DD-Session-ID present, DD-Root-Session-ID absent") {
auto session_rid = RuntimeID::generate();
const TracerSignature sig{session_rid, session_rid.string(), "testsvc",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the 3 parameters constructor.

auto url = HTTPClient::URL::parse("http://localhost:8000");

auto init_rid = RuntimeID::generate();
const auto& root_id = root_session_id::get_or_init(init_rid.string());
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should avoid calling the singleton in the unit tests.

- Simplify singleton to static const Meyer's pattern, remove .cpp file
- Use C++17 compound namespaces in root_session_id.h
- Rename sig -> signature in telemetry_impl.cpp
- Remove singleton usage from tests, use 3-param constructor for match
- Add blank lines between constructors in tracer_signature.h
- Fix extra spaces in test Telemetry init alignment

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Copy link
Copy Markdown
Contributor

@zacharycmontoya zacharycmontoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for incorporating all our feedback, this is a much more simplified implementation now! Please also get approval @xlamorlette-datadog before merging.

"test"};

Telemetry telemetry{*finalize_config(), sig, logger, client,
scheduler, *url};
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

up

khanayan123 and others added 3 commits April 7, 2026 13:50
clang-format requires the extra spaces for argument alignment.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Using the longer variable name avoids clang-format's column alignment
padding while matching the naming convention in the rest of the file.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Integrations (nginx, httpd, kong) need to set root_session_id in the
master process before workers fork. The Tracer passes it to get_or_init,
falling back to runtime_id if not provided.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Copy link
Copy Markdown
Collaborator

@xlamorlette-datadog xlamorlette-datadog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

// Root session ID for stable telemetry correlation across forked workers.
// Integrations (nginx, httpd, kong) should set this in the master process
// before workers fork so all Tracers share the same root.
Optional<std::string> root_session_id;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we should have the same order for the fields in TracerConfig and FinalizedTracerConfig.

final_config.runtime_id = user_config.runtime_id;
}

if (user_config.root_session_id) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test if (user_config.root_session_id) is useless since both fields are Optional and since final_config has just been instantiated with default values.
Same above for if (user_config.runtime_id).

: RuntimeID::generate()),
signature_{runtime_id_, config.defaults.service,
config.defaults.environment},
signature_{runtime_id_,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

up

CHECK(root_it->second == root_rid.string());
}

SECTION("heartbeat includes session headers") {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit puzzled: is is expected that the only difference with the child process: DD-Root-Session-ID present when different test is the following two calls:

    client->clear();
    scheduler->trigger_heartbeat();

But, after that, there is no difference in what is checked.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test verifies headers persist on heartbeat requests (after client->clear() + trigger_heartbeat()), not just app-started. The assertions are intentionally identical same headers should appear on every telemetry event

…redundant if checks

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AI Generated Largely based on code generated by an AI or LLM. This label is the same across all dd-trace-* repos

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants