
bug: coder-logstream-kube connections interfere with agent connectivity status #21625

@spikecurtis

Description

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

When https://github.com/coder/coder-logstream-kube is in use, we see spurious changes to the agent connectivity state. These include marking the agent connected before the agent process actually connects to coderd, and marking the agent disconnected even though the agent never disconnected from coderd. The coderd log often contains entries like the one under Relevant Log Output, which do not correspond to the agent itself disconnecting.

The root problem is that coder-logstream-kube impersonates the agent to do its work with coderd. It learns the agent token from the Kubernetes pod spec, then connects to the same RPC endpoint as the real agent. Coderd is designed on the assumption that only the real agent connects to this RPC endpoint, so it uses the connectivity of that RPC connection to set the agent's connection and disconnection timestamps. These timestamps are then used to compute the agent's connectivity state.

Consequences

At least one customer has observed VSCode attempting to connect to the agent before it actually starts.

At least two customers have observed disconnections in both VSCode and JetBrains caused by the agent being marked disconnected. This second issue can be mitigated by not triggering disconnections at the SSH layer in response to momentary agent disconnects reported by coderd.

The dashboard and metrics briefly and incorrectly indicate that the agent is disconnected.

Relevant Log Output

coderd.agentrpc.yamux.stdlib: [ERR] yamux: Failed to read header: failed to get reader: failed to read frame header: EOF  owner=spike  workspace_name=dogfood  agent_name=main  request_id=a64d2f7f-631c-4daf-9692-53e95c4bbd0e

Expected Behavior

coder-logstream-kube connections should not affect the connectivity status of the agent.

Steps to Reproduce

  1. Set up Coder with Kubernetes as the workspace infrastructure (including a template)
  2. Install coder-logstream-kube in the Kubernetes cluster as usual
  3. Start a workspace

Environment

IaaS: Kubernetes
coder-logstream-kube: since v0.0.10
coder: since v2.7 at least, possibly older

Additional Context

No response

Labels

must-do: Issues that must be completed by the end of the Sprint. Or else. Only humans may set this.
