Essentially, a flag change in flagd is not actually picked up right away by the corresponding services (which is why we restarted some of them, IIRC).
From a debugging session with an AI, whose analysis seems correct:
No, it only restarts flagd, not the ad service. The injection flow (sketched below) is:

1. Update the flagd-config ConfigMap: set adFailure.defaultVariant to "on".
2. `kubectl rollout restart deployment flagd`, so flagd picks up the new config.
3. That's it; the ad deployment is never restarted.
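For reference, that flow amounts to roughly the following. This is a minimal sketch: the namespace (`otel-demo`) and the ConfigMap data key (`demo.flagd.json`) are assumptions and may differ in your deployment.

```sh
NS=otel-demo  # assumed namespace

# 1. Flip adFailure's defaultVariant to "on" in the flagd config
#    (assumes the config lives under the "demo.flagd.json" key)
kubectl -n "$NS" get configmap flagd-config \
  -o jsonpath='{.data.demo\.flagd\.json}' \
  | jq '.flags.adFailure.defaultVariant = "on"' > /tmp/demo.flagd.json
kubectl -n "$NS" create configmap flagd-config \
  --from-file=demo.flagd.json=/tmp/demo.flagd.json \
  --dry-run=client -o yaml | kubectl apply -f -

# 2. Restart flagd so it reads the updated ConfigMap
kubectl -n "$NS" rollout restart deployment flagd

# 3. That's it: the ad deployment is never restarted
```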
The assumption is that the ad service's OpenFeature SDK is connected to flagd via a gRPC EventStream and will pick up the flag change in real time. But in practice there's a race:

1. flagd restarts → the ad service's EventStream connection drops.
2. The OpenFeature SDK falls back to the code-level default (false, i.e. no failure) while reconnecting.
3. Only after the SDK re-establishes the EventStream and syncs the flag state does adFailure=true take effect.

That reconnection can take minutes, which explains the ~6–30 minute delay before errors appear. You can check which side is lagging, as shown below.
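One way to confirm this is to ask flagd directly what it resolves while the ad service is still behaving as if the flag were off. A minimal sketch, assuming flagd's stock Connect/HTTP evaluation endpoint on its default port 8013 (path and port are flagd defaults; adjust if your deployment customizes them):

```sh
# Forward flagd's evaluation port locally (8013 is flagd's default)
kubectl -n otel-demo port-forward deploy/flagd 8013:8013 &

# flagd's Connect endpoint accepts plain JSON over HTTP
curl -s -X POST \
  http://localhost:8013/flagd.evaluation.v1.Service/ResolveBoolean \
  -H 'Content-Type: application/json' \
  -d '{"flagKey": "adFailure", "context": {}}'
# Once the new config is live, this should return something like:
#   {"value": true, "variant": "on", "reason": "STATIC"}
```

If flagd already returns the new variant while the ad service keeps serving ads normally, the delay is on the SDK side (the EventStream reconnect), not in flagd itself.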
This seems to be the root cause.
Thanks @tianyi-tz for finding this and reaching out!