feat: add automations backend — webhooks to chat bridge#23631
Draft
Adds the backend infrastructure for automations, a user-owned resource
that bridges external webhooks to Coder chats. When a webhook arrives,
the system verifies the HMAC-SHA256 signature, evaluates a gjson-based
filter, resolves a chat session via labels, and logs the event.
Database: migration 000452 adds automations table, automation_webhook_events
table, and chats.automation_id FK. RBAC: new "automation" resource type
with CRUD actions. Routes: authenticated CRUD under /api/experimental/automations
(experiment-gated), unauthenticated webhook at /api/v2/automations/{id}/webhook.
Active-mode chat creation through chatd.Server is deferred to a follow-up.
Audit logging requires a resource_type enum addition (separate migration).
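The signature check described above can be sketched as follows. This is a minimal illustration, not the PR's code: the function names and the GitHub-style `sha256=<hex>` header format are assumptions, since the description only says the signature is HMAC-SHA256.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// verifySignature reports whether header carries a valid HMAC-SHA256 of body
// under secret. The "sha256=<hex>" header format is an assumption borrowed
// from GitHub-style webhooks; the PR does not state the format.
func verifySignature(secret, body []byte, header string) bool {
	hexSig, ok := strings.CutPrefix(header, "sha256=")
	if !ok {
		return false
	}
	got, err := hex.DecodeString(hexSig)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	// hmac.Equal compares in constant time to avoid timing leaks.
	return hmac.Equal(got, mac.Sum(nil))
}

// sign is a hypothetical helper that produces the header a sender would attach.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return "sha256=" + hex.EncodeToString(mac.Sum(nil))
}

func main() {
	secret := []byte("example-secret")
	body := []byte(`{"action":"opened"}`)
	fmt.Println(verifySignature(secret, body, sign(secret, body))) // valid signature
	fmt.Println(verifySignature(secret, body, "sha256=deadbeef"))  // forged signature
}
```

The MAC is computed over the raw request body, so a handler built this way would need to read the body bytes before any JSON decoding.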
- Move convertAutomation/convertWebhookEvent to db2sdk package
- Rename session_labels → label_paths (clearer: maps label keys to gjson paths)
- Rename system_prompt → instructions (sent as user message, not system prompt)
- Remove workspace_id column (chat creation handles workspace selection)
- Add cron_schedule column for scheduled automations (nullable, mutually exclusive with webhook_secret in v1)
- Make webhook_secret nullable (cron automations don't need it)
Trigger-specific fields (webhook_secret, cron_schedule, filter,
label_paths) move from the automations table into a new
automation_triggers table. An automation can have multiple triggers,
each with its own type (webhook or cron), secret, and filter config.
Each webhook trigger gets its own URL and HMAC secret, so external
systems configure per-trigger webhooks that can be independently
revoked.
The event log table is renamed from automation_webhook_events to
automation_events (events can now come from cron triggers too) and
gains a trigger_id FK to trace which trigger fired.
New CRUD endpoints for triggers nested under
/api/experimental/automations/{automation}/triggers. The webhook
ingestion endpoint moves to /api/v2/automations/triggers/{trigger_id}/webhook.
Critical fixes:
- postAutomationWebhook: parse trigger UUID manually to always return 200 (never leak 400 to webhook sources)
- deleteAutomationTrigger/regenerateAutomationTriggerSecret: verify trigger.AutomationID matches automation from middleware (prevents cross-user trigger manipulation)
- truncatePayload: return valid JSON stub instead of byte-slicing (which produced broken JSON at 64KB boundary)
- InsertAutomationEvent: scope RBAC check to specific automation instead of bare resource type
- Add webhook_secret_key_id to trigger insert/update SQL queries (enables dbcrypt key tracking)

High fixes:
- Add index on chats.automation_id for ON DELETE SET NULL performance
- Add unique constraint on automations(owner_id, org_id, name)
- Add index on automation_events(received_at) for purge query
- patchAutomation: validate status against allowed values, reject empty name
- postAutomationWebhook: verify trigger.Type is 'webhook' before processing
- DeleteAutomationTriggerByID: use ActionUpdate (not ActionDelete) on parent automation, consistent with child-entity RBAC pattern
- TestAutomation SDK client: send proper struct with payload/filter/label_paths instead of raw JSON
- Remove dead UpdateAutomationTriggerRequest type
- postAutomationTrigger: validate trigger type against known values
- CountAutomationMessagesInWindow: count both 'created' and 'continued' events toward message rate limit
- Scope CountAutomationChatCreatesInWindow and CountAutomationMessagesInWindow RBAC to specific automation
High fixes:
- MatchFilter: use reflect.DeepEqual instead of != to prevent runtime panics on array/object filter values. Add test cases for booleans, arrays, and nested objects.
- postAutomation/patchAutomation: handle unique constraint violations with 409 Conflict instead of leaking raw Postgres errors as 500s. Uses database.IsUniqueViolation pattern.

Medium fixes:
- regenerateAutomationTriggerSecret: reject non-webhook triggers with 400 instead of silently setting an unused secret.
- coderd.go: fix mangled indentation on automations, chats, webhook, and deployment route blocks. Split r.Use(apiKeyMiddleware) onto its own line in /deployment route.
- Add session_test.go with 8 test cases for ResolveLabels covering nil/empty input, path extraction, missing paths, and type coercion.

Low fixes:
- testAutomation: return 400 on malformed label_paths instead of silently swallowing the parse error.
- Validate rate limit fields (1-1000 range) in both create and update handlers to prevent DB check constraint 500s.
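To illustrate the MatchFilter fix: in Go, comparing two `any` values with `!=` panics at runtime when both hold an uncomparable dynamic type such as `[]any` or `map[string]any`, which is what `encoding/json` produces for arrays and objects. The sketch below is a simplified stand-in, assuming top-level key lookup rather than the gjson paths the PR uses; `matchFilter` and `mustDecode` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// matchFilter reports whether every key in filter has a deeply equal value in
// payload. Top-level keys stand in for the gjson paths the PR uses.
func matchFilter(filter, payload map[string]any) bool {
	for key, want := range filter {
		got, ok := payload[key]
		// got != want would panic here when both hold []any or map[string]any,
		// because those dynamic types are not comparable.
		if !ok || !reflect.DeepEqual(got, want) {
			return false
		}
	}
	return true
}

// mustDecode parses a JSON object literal for the examples below.
func mustDecode(s string) map[string]any {
	var m map[string]any
	if err := json.Unmarshal([]byte(s), &m); err != nil {
		panic(err)
	}
	return m
}

func main() {
	payload := mustDecode(`{"action":"opened","labels":["bug","p1"],"draft":false}`)
	fmt.Println(matchFilter(mustDecode(`{"labels":["bug","p1"]}`), payload)) // array filter value
	fmt.Println(matchFilter(mustDecode(`{"draft":true}`), payload))          // boolean mismatch
}
```

Decoding both sides with `encoding/json` keeps numeric values as `float64` on each side, so `reflect.DeepEqual` does not fail on an int/float type mismatch.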
…e non-JSON

M1: Add experiment gate in postAutomationWebhook, which checks api.Experiments.Enabled(ExperimentAgents) and returns 200 early if disabled. Done in-handler (not middleware) to preserve the always-200 contract.

M2: Replace nondeterministic orgs[0] selection with explicit OrganizationID field on CreateAutomationRequest. Returns 400 if not provided. Follows the established pattern from chats.

M7: Add WebhookSecret field (omitempty) to AutomationTrigger SDK type. Populated only in postAutomationTrigger and regenerateAutomationTriggerSecret responses so the user can configure their webhook provider. Never returned from list/get.

L2: Add safePayload() that wraps non-JSON webhook bodies in a valid JSON envelope before storing. Preserves the audit trail when webhook providers send form-encoded or XML payloads.
H1: Regenerate typesGenerated.ts to match SDK changes (organization_id, webhook_secret, TestAutomationRequest).

H2: Fix gofmt alignment in AutomationTrigger struct.

M1: Fix safePayload data loss: truncate inner body before wrapping in JSON envelope, not after. Prevents complete data loss when non-JSON payloads are close to the 64KB limit.

M2: Validate filter (must be JSON object) and label_paths (must be map[string]string) in postAutomationTrigger. Prevents silent webhook filtering from malformed trigger config.

M3: Replace inline anonymous struct in testAutomation with codersdk.TestAutomationRequest to prevent type drift.

M4: Remove unused accessURL parameter from db2sdk.Automation() and all 4 call sites.

M5: Add format:"uuid" tags to MCPServerIDs on Automation, CreateAutomationRequest, and UpdateAutomationRequest.

L2: Catch FK violations in postAutomation and return 400 with descriptive message instead of generic 500.
Adds a background scheduler that evaluates cron-based automation triggers every minute. Follows the dbpurge pattern with quartz.Clock for testability and advisory locking for single-replica execution.

Changes:
- Migration 000453: adds last_triggered_at to automation_triggers
- SQL queries: GetActiveCronTriggers (JOIN with automations), UpdateAutomationTriggerLastTriggeredAt
- cron.Standard(): full 5-field cron parser without Weekly/Daily restrictions
- cronscheduler package: background goroutine, advisory lock, schedule evaluation, event creation, preview/active mode
- Handler validation: cron_schedule validated on trigger creation
- Wired into cli/server.go alongside dbpurge and autobuild
- 20 new tests (13 cron parser + 7 scheduler with quartz mock)
…ate limiting

Adds the Fire() executor that handles the full automation trigger lifecycle for both webhook and cron paths:
- Rate limiting via CountAutomationChatCreatesInWindow and CountAutomationMessagesInWindow (both queries already existed but were never called)
- Label-based chat lookup to continue existing conversations
- New chat creation through chatd.Server via ChatCreator interface
- Event recording with matched_chat_id / created_chat_id
- Preview mode still only logs events without acting

Changes:
- automations/executor.go: shared Fire() function with full flow
- automations/chatadapter.go: ChatdAdapter bridges ChatCreator to chatd.Server
- exp_automations.go: webhook handler now calls Fire() instead of stub event insertion
- cronscheduler: accepts ChatCreator, delegates to Fire()
- coderd.go: ChatDaemon() public getter for chatd.Server
- GetActiveCronTriggers query expanded with rate limit and model fields
Adds the backend infrastructure for automations, a user-owned resource that bridges external webhooks to Coder chats. When a webhook arrives, the system verifies the HMAC-SHA256 signature, evaluates a gjson-based filter, resolves a chat session via labels, and logs the event.
What's included
- Database (migration 000452): automations table with webhook secret (dbcrypt-ready), filter, session_labels, rate limit columns; automation_webhook_events table for observability and rate limiting; chats.automation_id FK.
- RBAC: New "automation" resource type with Create/Read/Update/Delete actions. dbauthz wrappers for all 12 query methods with full test coverage (48 sub-tests).
- HTTP API: Authenticated CRUD under /api/experimental/automations (experiment-gated), plus /events, /test (dry-run), and /regenerate-secret. Unauthenticated webhook at /api/v2/automations/{id}/webhook.
- Core logic: gjson-based filter matching, HMAC-SHA256 signature verification, label resolution from webhook payloads, all with unit tests.
- SDK: Full codersdk.Automation types and ExperimentalClient methods for all endpoints.

Deferred to follow-up
- Active-mode chat creation through chatd.Server
- Audit logging (requires resource_type enum migration)
- … dbpurge

Depends on #23594 (chat labels, merged).