
feat: add automations backend — webhooks to chat bridge#23631

Draft
kylecarbs wants to merge 9 commits into main from automations

Adds the backend infrastructure for automations, a user-owned resource that bridges external webhooks to Coder chats. When a webhook arrives, the system verifies the HMAC-SHA256 signature, evaluates a gjson-based filter, resolves a chat session via labels, and logs the event.
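The signature check in the first step can be sketched with Go's standard library. This is a minimal sketch, not the PR's code. It assumes the sender puts a hex-encoded HMAC-SHA256 of the raw request body in a header value of the form `sha256=<hex>`, a common convention (GitHub's `X-Hub-Signature-256` uses it); the header this PR actually reads is not stated here. `verifySignature` and `sign` are hypothetical names.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// verifySignature reports whether header carries a valid HMAC-SHA256 of body
// keyed with secret. The "sha256=<hex>" header format is an assumption.
func verifySignature(secret, body []byte, header string) bool {
	if !strings.HasPrefix(header, "sha256=") {
		return false
	}
	got, err := hex.DecodeString(strings.TrimPrefix(header, "sha256="))
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body) // hash.Hash.Write never returns an error
	// hmac.Equal compares in constant time, so response timing does not
	// reveal how many leading bytes of a forged signature were correct.
	return hmac.Equal(got, mac.Sum(nil))
}

// sign builds the header value a sender would attach (demo helper only).
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return "sha256=" + hex.EncodeToString(mac.Sum(nil))
}

func main() {
	secret := []byte("trigger-secret")
	body := []byte(`{"action":"opened"}`)
	fmt.Println(verifySignature(secret, body, sign(secret, body)))          // true
	fmt.Println(verifySignature(secret, body, sign([]byte("wrong"), body))) // false
}
```

The signature has to be checked against the raw bytes before any JSON decoding, because re-serialized JSON rarely matches the sender's bytes exactly.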

What's included

Database (migration 000452): automations table with webhook secret (dbcrypt-ready), filter, session_labels, rate limit columns; automation_webhook_events table for observability and rate limiting; chats.automation_id FK.

RBAC: New "automation" resource type with Create/Read/Update/Delete actions. dbauthz wrappers for all 12 query methods with full test coverage (48 sub-tests).

HTTP API: Authenticated CRUD under /api/experimental/automations (experiment-gated), plus /events, /test (dry-run), and /regenerate-secret. Unauthenticated webhook at /api/v2/automations/{id}/webhook.

Core logic: gjson-based filter matching, HMAC-SHA256 signature verification, label resolution from webhook payloads — all with unit tests.

SDK: Full codersdk.Automation types and ExperimentalClient methods for all endpoints.

Deferred to follow-up

  • Active-mode chat creation/continuation through chatd.Server
  • Audit logging (requires resource_type enum migration)
  • dbcrypt encryption of webhook secrets
  • Event purge job registration in dbpurge

Depends on #23594 (chat labels, merged).

Database: migration 000452 adds automations table, automation_webhook_events
table, and chats.automation_id FK. RBAC: new "automation" resource type
with CRUD actions. Routes: authenticated CRUD under /api/experimental/automations
(experiment-gated), unauthenticated webhook at /api/v2/automations/{id}/webhook.

Active-mode chat creation through chatd.Server is deferred to a follow-up.
Audit logging requires a resource_type enum addition (separate migration).

- Move convertAutomation/convertWebhookEvent to db2sdk package
- Rename session_labels → label_paths (clearer: maps label keys to gjson paths)
- Rename system_prompt → instructions (sent as user message, not system prompt)
- Remove workspace_id column (chat creation handles workspace selection)
- Add cron_schedule column for scheduled automations (nullable, mutually
  exclusive with webhook_secret in v1)
- Make webhook_secret nullable (cron automations don't need it)

Trigger-specific fields (webhook_secret, cron_schedule, filter,
label_paths) move from the automations table into a new
automation_triggers table. An automation can have multiple triggers,
each with its own type (webhook or cron), secret, and filter config.

Each webhook trigger gets its own URL and HMAC secret, so external
systems configure per-trigger webhooks that can be independently
revoked.
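To make the automation/trigger split concrete, here is a hypothetical Go model. The fields mirror the trigger columns named in this commit (type, webhook secret, cron schedule, filter, label paths), but the Go types, the `webhookURL` helper, and its `accessURL` parameter are assumptions for illustration. Only the `/api/v2/automations/triggers/{trigger_id}/webhook` path comes from the text.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// TriggerType distinguishes how an automation fires.
type TriggerType string

const (
	TriggerWebhook TriggerType = "webhook"
	TriggerCron    TriggerType = "cron"
)

// Trigger is a hypothetical in-memory view of an automation_triggers row.
// Each trigger carries its own secret and filter config.
type Trigger struct {
	ID            string
	AutomationID  string
	Type          TriggerType
	WebhookSecret string            // set for webhook triggers only
	CronSchedule  string            // set for cron triggers only
	Filter        map[string]any    // path -> expected value
	LabelPaths    map[string]string // label key -> path
}

// Automation owns any number of triggers.
type Automation struct {
	ID       string
	Name     string
	Triggers []Trigger
}

// webhookURL builds a trigger's own ingestion URL. Deleting the trigger or
// regenerating its secret affects only this URL, not the whole automation.
func webhookURL(accessURL string, t Trigger) string {
	return strings.TrimSuffix(accessURL, "/") +
		"/api/v2/automations/triggers/" + url.PathEscape(t.ID) + "/webhook"
}

func main() {
	a := Automation{ID: "a1", Name: "triage", Triggers: []Trigger{
		{ID: "t1", AutomationID: "a1", Type: TriggerWebhook, WebhookSecret: "s1"},
		{ID: "t2", AutomationID: "a1", Type: TriggerWebhook, WebhookSecret: "s2"},
		{ID: "t3", AutomationID: "a1", Type: TriggerCron, CronSchedule: "0 9 * * 1"},
	}}
	for _, t := range a.Triggers {
		if t.Type == TriggerWebhook {
			fmt.Println(webhookURL("https://coder.example.com/", t))
		}
	}
}
```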

The event log table is renamed from automation_webhook_events to
automation_events (events can now come from cron triggers too) and
gains a trigger_id FK to trace which trigger fired.

New CRUD endpoints for triggers nested under
/api/experimental/automations/{automation}/triggers. The webhook
ingestion endpoint moves to /api/v2/automations/triggers/{trigger_id}/webhook.

Critical fixes:
- postAutomationWebhook: parse trigger UUID manually to always return
  200 (never leak 400 to webhook sources)
- deleteAutomationTrigger/regenerateAutomationTriggerSecret: verify
  trigger.AutomationID matches automation from middleware (prevents
  cross-user trigger manipulation)
- truncatePayload: return valid JSON stub instead of byte-slicing
  (which produced broken JSON at the 64KB boundary)
- InsertAutomationEvent: scope RBAC check to specific automation
  instead of bare resource type
- Add webhook_secret_key_id to trigger insert/update SQL queries
  (enables dbcrypt key tracking)
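The truncatePayload fix can be sketched as follows. This is a minimal illustration, not the PR's implementation: it assumes the limit is exactly `64 * 1024` bytes, and the stub's `truncated` and `original_size` fields are hypothetical. The point is that an oversized body is replaced by something `json.Valid` accepts, rather than being cut at an arbitrary byte offset.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// maxPayloadBytes stands in for the 64KB storage limit (assumed value).
const maxPayloadBytes = 64 * 1024

// truncatePayload returns body unchanged when it fits, and otherwise a small,
// valid JSON stub. Slicing body[:maxPayloadBytes] instead would usually cut a
// JSON document mid-token and store invalid JSON.
func truncatePayload(body []byte) []byte {
	if len(body) <= maxPayloadBytes {
		return body
	}
	// Marshaling a map of a bool and an int cannot fail.
	stub, _ := json.Marshal(map[string]any{
		"truncated":     true,
		"original_size": len(body),
	})
	return stub
}

func main() {
	fmt.Println(string(truncatePayload([]byte(`{"ok":true}`))))
	fmt.Println(string(truncatePayload(make([]byte, maxPayloadBytes+1))))
}
```

`json.Marshal` sorts map keys, so the stub for a 65,537-byte body is `{"original_size":65537,"truncated":true}`.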

High fixes:
- Add index on chats.automation_id for ON DELETE SET NULL performance
- Add unique constraint on automations(owner_id, org_id, name)
- Add index on automation_events(received_at) for purge query
- patchAutomation: validate status against allowed values, reject
  empty name
- postAutomationWebhook: verify trigger.Type is 'webhook' before
  processing
- DeleteAutomationTriggerByID: use ActionUpdate (not ActionDelete)
  on parent automation, consistent with child-entity RBAC pattern
- TestAutomation SDK client: send proper struct with payload/filter/
  label_paths instead of raw JSON
- Remove dead UpdateAutomationTriggerRequest type
- postAutomationTrigger: validate trigger type against known values
- CountAutomationMessagesInWindow: count both 'created' and
  'continued' events toward message rate limit
- Scope CountAutomationChatCreatesInWindow and
  CountAutomationMessagesInWindow RBAC to specific automation

High fixes:
- MatchFilter: use reflect.DeepEqual instead of != to prevent
  runtime panics on array/object filter values. Add test cases
  for booleans, arrays, and nested objects.
- postAutomation/patchAutomation: handle unique constraint
  violations with 409 Conflict instead of leaking raw Postgres
  errors as 500s. Uses database.IsUniqueViolation pattern.
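A sketch of why `reflect.DeepEqual` matters in filter matching. It assumes the filter is a JSON object mapping paths to expected values, and it replaces gjson with a hypothetical `lookup` helper that only follows dot-separated object keys. In Go, comparing two interface values that both hold slices or maps with `!=` panics at run time; `reflect.DeepEqual` compares them structurally instead.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"strings"
)

// lookup is a simplified stand-in for a gjson path: it only follows
// dot-separated object keys (no array indexes, wildcards, or modifiers).
func lookup(doc any, path string) (any, bool) {
	cur := doc
	for _, key := range strings.Split(path, ".") {
		obj, ok := cur.(map[string]any)
		if !ok {
			return nil, false
		}
		if cur, ok = obj[key]; !ok {
			return nil, false
		}
	}
	return cur, true
}

// matchFilter reports whether every path in filter resolves in payload to a
// value equal to the expected one. An empty filter matches everything.
func matchFilter(payload, filter []byte) (bool, error) {
	var doc any
	if err := json.Unmarshal(payload, &doc); err != nil {
		return false, err
	}
	var want map[string]any
	if err := json.Unmarshal(filter, &want); err != nil {
		return false, err
	}
	for path, expected := range want {
		got, ok := lookup(doc, path)
		// DeepEqual is safe for arrays and objects, where == would panic.
		if !ok || !reflect.DeepEqual(got, expected) {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	payload := []byte(`{"action":"opened","pull_request":{"draft":false,"labels":["bug"]}}`)
	ok, _ := matchFilter(payload, []byte(`{"action":"opened","pull_request.labels":["bug"]}`))
	fmt.Println(ok) // true
	ok, _ = matchFilter(payload, []byte(`{"pull_request.draft":true}`))
	fmt.Println(ok) // false
}
```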

Medium fixes:
- regenerateAutomationTriggerSecret: reject non-webhook triggers
  with 400 instead of silently setting an unused secret.
- coderd.go: fix mangled indentation on automations, chats, webhook,
  and deployment route blocks. Split r.Use(apiKeyMiddleware) onto
  its own line in /deployment route.
- Add session_test.go with 8 test cases for ResolveLabels covering
  nil/empty input, path extraction, missing paths, and type coercion.
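The label resolution exercised by those tests might look roughly like this. `resolveLabels` and `lookup` are hypothetical names, `lookup` is a dot-path simplification of gjson, and the coercion rules (numbers and booleans formatted as text, objects and arrays kept as compact JSON) are assumptions about what "type coercion" covers.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// lookup follows dot-separated object keys (a simplified gjson stand-in).
func lookup(doc any, path string) (any, bool) {
	cur := doc
	for _, key := range strings.Split(path, ".") {
		obj, ok := cur.(map[string]any)
		if !ok {
			return nil, false
		}
		if cur, ok = obj[key]; !ok {
			return nil, false
		}
	}
	return cur, true
}

// resolveLabels maps each label key to the string form of the value found at
// its path in payload. Missing paths and JSON nulls are skipped.
func resolveLabels(payload []byte, labelPaths map[string]string) map[string]string {
	labels := map[string]string{}
	if len(payload) == 0 || len(labelPaths) == 0 {
		return labels
	}
	var doc any
	if err := json.Unmarshal(payload, &doc); err != nil {
		return labels
	}
	for key, path := range labelPaths {
		v, ok := lookup(doc, path)
		if !ok || v == nil {
			continue
		}
		switch val := v.(type) {
		case string:
			labels[key] = val
		case float64, bool:
			// Coerce numbers and booleans, e.g. 42 -> "42", false -> "false".
			labels[key] = fmt.Sprint(val)
		default:
			// Objects and arrays keep their compact JSON text (an assumption).
			b, _ := json.Marshal(val)
			labels[key] = string(b)
		}
	}
	return labels
}

func main() {
	payload := []byte(`{"repository":{"full_name":"coder/coder"},"number":23631,"draft":true}`)
	fmt.Println(resolveLabels(payload, map[string]string{
		"repo":  "repository.full_name",
		"pr":    "number",
		"draft": "draft",
	}))
}
```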

Low fixes:
- testAutomation: return 400 on malformed label_paths instead of
  silently swallowing the parse error.
- Validate rate limit fields (1-1000 range) in both create and
  update handlers to prevent DB check constraint 500s.

…e non-JSON

M1: Add experiment gate in postAutomationWebhook — checks
api.Experiments.Enabled(ExperimentAgents) and returns 200 early
if disabled. Done in-handler (not middleware) to preserve the
always-200 contract.
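The always-200 contract, including the in-handler experiment gate, can be sketched with `net/http` alone. The PR uses its own router and UUID parsing; here a regular expression and manual path trimming are hypothetical stand-ins, and `experimentEnabled` and `process` are placeholder callbacks. Every branch ends in the same `200 OK`.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"regexp"
	"strings"
)

var uuidRE = regexp.MustCompile(`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)

// triggerIDFromPath extracts {trigger_id} from
// /api/v2/automations/triggers/{trigger_id}/webhook by hand, so a malformed
// ID is reported as not ok instead of producing a 400 response.
func triggerIDFromPath(p string) (string, bool) {
	const prefix, suffix = "/api/v2/automations/triggers/", "/webhook"
	if !strings.HasPrefix(p, prefix) || !strings.HasSuffix(p, suffix) {
		return "", false
	}
	id := strings.TrimSuffix(strings.TrimPrefix(p, prefix), suffix)
	return id, uuidRE.MatchString(id)
}

// webhookHandler answers every request with 200: when the experiment is
// disabled, when the ID is malformed, and when the event is processed.
// Senders therefore cannot probe which triggers exist.
func webhookHandler(experimentEnabled func() bool, process func(triggerID string)) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if experimentEnabled() {
			if id, ok := triggerIDFromPath(r.URL.Path); ok {
				process(id)
			}
		}
		w.WriteHeader(http.StatusOK)
	}
}

// serve runs h against a recorded POST request and returns the status code.
func serve(h http.HandlerFunc, path string) int {
	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest(http.MethodPost, path, strings.NewReader("{}")))
	return rec.Code
}

func main() {
	h := webhookHandler(func() bool { return true }, func(id string) { fmt.Println("processing", id) })
	fmt.Println(serve(h, "/api/v2/automations/triggers/not-a-uuid/webhook"))
	fmt.Println(serve(h, "/api/v2/automations/triggers/3f0c1a52-7b1e-4c1d-9a55-0d6f2f7d8e11/webhook"))
}
```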

M2: Replace nondeterministic orgs[0] selection with explicit
OrganizationID field on CreateAutomationRequest. Returns 400
if not provided. Follows the established pattern from chats.

M7: Add WebhookSecret field (omitempty) to AutomationTrigger
SDK type. Populated only in postAutomationTrigger and
regenerateAutomationTriggerSecret responses so the user can
configure their webhook provider. Never returned from list/get.
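A trimmed, hypothetical version of the SDK type shows how `omitempty` keeps the secret out of list and get responses: those handlers leave the field empty, and only trigger creation and secret regeneration populate it. Only the `webhook_secret` JSON name is taken from the text; the other fields are illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AutomationTrigger is a trimmed, hypothetical version of the SDK type.
type AutomationTrigger struct {
	ID            string `json:"id"`
	Type          string `json:"type"`
	WebhookSecret string `json:"webhook_secret,omitempty"`
}

// toJSON marshals v and panics on error (acceptable for a demo).
func toJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// List/get: the secret is empty, so the key is omitted entirely.
	fmt.Println(toJSON(AutomationTrigger{ID: "t1", Type: "webhook"}))
	// {"id":"t1","type":"webhook"}

	// Create/regenerate: the secret is returned once.
	fmt.Println(toJSON(AutomationTrigger{ID: "t1", Type: "webhook", WebhookSecret: "s3cret"}))
	// {"id":"t1","type":"webhook","webhook_secret":"s3cret"}
}
```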

L2: Add safePayload() that wraps non-JSON webhook bodies in a
valid JSON envelope before storing. Preserves the audit trail
when webhook providers send form-encoded or XML payloads.

H1: Regenerate typesGenerated.ts to match SDK changes
  (organization_id, webhook_secret, TestAutomationRequest).

H2: Fix gofmt alignment in AutomationTrigger struct.

M1: Fix safePayload data loss — truncate inner body before
  wrapping in JSON envelope, not after. Prevents complete data
  loss when non-JSON payloads are close to the 64KB limit.
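Combining the two safePayload changes (wrap non-JSON bodies, and truncate the raw body before wrapping) gives roughly the following. The envelope's `raw` and `truncated` fields and the 1 KiB headroom are assumptions. Before the fix, the full envelope was presumably built first and then, being over the limit, replaced by a stub, which is the data loss the commit describes.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// maxPayloadBytes stands in for the 64KB storage limit (assumed value).
const maxPayloadBytes = 64 * 1024

// envelopeHeadroom is an assumed allowance for the envelope's own bytes.
const envelopeHeadroom = 1024

// safePayload stores valid JSON bodies unchanged and wraps anything else
// (form-encoded, XML, plain text) in a JSON envelope. The raw body is cut to
// size before wrapping, so an oversized body keeps a usable prefix.
func safePayload(body []byte) []byte {
	if json.Valid(body) {
		return body
	}
	inner, truncated := body, false
	if len(inner) > maxPayloadBytes-envelopeHeadroom {
		inner, truncated = inner[:maxPayloadBytes-envelopeHeadroom], true
	}
	// json.Marshal escapes the text (and replaces invalid UTF-8 with U+FFFD),
	// so the envelope is always valid JSON. Heavily escaped bodies can still
	// exceed the limit; a production version would need to account for that.
	out, _ := json.Marshal(map[string]any{"raw": string(inner), "truncated": truncated})
	return out
}

func main() {
	fmt.Println(string(safePayload([]byte(`{"ok":true}`))))
	fmt.Println(string(safePayload([]byte("hello world"))))
}
```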

M2: Validate filter (must be JSON object) and label_paths
  (must be map[string]string) in postAutomationTrigger. Prevents
  silent webhook filtering from malformed trigger config.
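Both validations can be expressed as decoding checks. `validateTriggerConfig` is a hypothetical helper; in a handler, a non-nil error would become a 400 response before anything is written to the database.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// validateTriggerConfig checks that filter decodes as a JSON object and that
// label_paths decodes as an object whose values are all strings. Empty input
// means "not provided" and is accepted; note that a literal null also decodes
// into a map without error here.
func validateTriggerConfig(filter, labelPaths []byte) error {
	if len(filter) > 0 {
		var obj map[string]any
		if err := json.Unmarshal(filter, &obj); err != nil {
			return fmt.Errorf("filter must be a JSON object: %w", err)
		}
	}
	if len(labelPaths) > 0 {
		var paths map[string]string
		if err := json.Unmarshal(labelPaths, &paths); err != nil {
			return fmt.Errorf("label_paths must map label keys to string paths: %w", err)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateTriggerConfig([]byte(`{"action":"opened"}`), []byte(`{"repo":"repository.full_name"}`)))
	fmt.Println(validateTriggerConfig([]byte(`["action"]`), nil))
}
```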

M3: Replace inline anonymous struct in testAutomation with
  codersdk.TestAutomationRequest to prevent type drift.

M4: Remove unused accessURL parameter from db2sdk.Automation()
  and all 4 call sites.

M5: Add format:"uuid" tags to MCPServerIDs on Automation,
  CreateAutomationRequest, and UpdateAutomationRequest.

L2: Catch FK violations in postAutomation — return 400 with
  descriptive message instead of generic 500.

Adds a background scheduler that evaluates cron-based automation
triggers every minute. Follows the dbpurge pattern with quartz.Clock
for testability and advisory locking for single-replica execution.

Changes:
- Migration 000453: adds last_triggered_at to automation_triggers
- SQL queries: GetActiveCronTriggers (JOIN with automations),
  UpdateAutomationTriggerLastTriggeredAt
- cron.Standard(): full 5-field cron parser without Weekly/Daily
  restrictions
- cronscheduler package: background goroutine, advisory lock,
  schedule evaluation, event creation, preview/active mode
- Handler validation: cron_schedule validated on trigger creation
- Wired into cli/server.go alongside dbpurge and autobuild
- 20 new tests (13 cron parser + 7 scheduler with quartz mock)

…ate limiting

Adds the Fire() executor that handles the full automation trigger
lifecycle for both webhook and cron paths:

- Rate limiting via CountAutomationChatCreatesInWindow and
  CountAutomationMessagesInWindow (both queries already existed
  but were never called)
- Label-based chat lookup to continue existing conversations
- New chat creation through chatd.Server via ChatCreator interface
- Event recording with matched_chat_id / created_chat_id
- Preview mode still only logs events without acting

Changes:
- automations/executor.go: shared Fire() function with full flow
- automations/chatadapter.go: ChatdAdapter bridges ChatCreator to
  chatd.Server
- exp_automations.go: webhook handler now calls Fire() instead of
  stub event insertion
- cronscheduler: accepts ChatCreator, delegates to Fire()
- coderd.go: ChatDaemon() public getter for chatd.Server
- GetActiveCronTriggers query expanded with rate limit and model
  fields