feat: add chat automations database schema and queries#23675
feat: add chat automations database schema and queries#23675
Conversation
d54f12a to
5ef6764
Compare
| -- chats. An automation defines *what* to say, *which* model and tools to | ||
| -- use, and *how fast* it is allowed to create or continue chats. | ||
| CREATE TABLE automations ( | ||
| id uuid DEFAULT gen_random_uuid() NOT NULL, |
There was a problem hiding this comment.
We generate the uuids in golang rather than in the sql.
This has been helpful when unable to returning * and in dbgen when the uuid is nice to know up front.
| -- Lifecycle state: | ||
| -- disabled — trigger events are silently dropped. | ||
| -- preview — events are logged but no chat is created (dry-run). | ||
| -- active — events create or continue chats. | ||
| status text NOT NULL DEFAULT 'disabled', |
| created_at timestamp with time zone NOT NULL DEFAULT now(), | ||
| updated_at timestamp with time zone NOT NULL DEFAULT now(), |
There was a problem hiding this comment.
Do not use defaults. Make the Insert specify the times. Really helps with unit testing and mocking different timestamps.
| -- Webhook and cron triggers share the same row shape with type-specific | ||
| -- nullable columns to keep the schema simple. | ||
| CREATE TABLE automation_triggers ( | ||
| id uuid DEFAULT gen_random_uuid() NOT NULL, |
| -- Discriminator: 'webhook' or 'cron'. Determines which nullable | ||
| -- columns are meaningful. | ||
| type text NOT NULL, |
| created_at timestamp with time zone NOT NULL DEFAULT now(), | ||
| updated_at timestamp with time zone NOT NULL DEFAULT now(), |
9f5762f to
3004e2d
Compare
3004e2d to
007645d
Compare
007645d to
0ae7379
Compare
| "chat:delete", | ||
| "chat:read", | ||
| "chat:update", | ||
| "chat_automation:*", |
There was a problem hiding this comment.
no docs on experimental endpoints - sadly agents need to be reminded to look at other methods in x/chatd
| mcp_server_ids uuid[] NOT NULL DEFAULT '{}', | ||
| -- Tool allowlist. Empty means all tools available to the model | ||
| -- config are permitted. | ||
| allowed_tools text[] NOT NULL DEFAULT '{}', |
There was a problem hiding this comment.
mcps will add their own tools on top of this from their allowed list? or is this tools from our harness?
| -- Maximum number of *new* chats this automation may create in a | ||
| -- rolling one-hour window. Prevents runaway webhook storms from | ||
| -- flooding the system. | ||
| max_chat_creates_per_hour integer NOT NULL DEFAULT 10, |
There was a problem hiding this comment.
should this be not null? cron jobs shouldn't have to worry about this. do subagents go into that count?
ibetitsmike
left a comment
There was a problem hiding this comment.
Nice schema design. The three-table split (automations, triggers, events) is clean and well-normalized. The partial index on rate-limit events, the dbcrypt integration, and the check constraints for webhook/cron field exclusivity are all well-thought-out. A few findings from a deep review below.
1 P2, 3 P3, 4 Nit across 8 inline comments. This review contains no blockers, but the P2 and the P3 constraint findings are worth discussing before merge.
This review was generated by an automated agent (best-of-10 deep-review synthesis across database, security, concurrency, contract, edge-case, test, style, and dbcrypt domains).
| return database.ChatAutomationEvent{}, err | ||
| } | ||
| if err := q.authorizeContext(ctx, policy.ActionUpdate, automation); err != nil { | ||
| return database.ChatAutomationEvent{}, err |
There was a problem hiding this comment.
P2 InsertChatAutomationEvent checks ActionUpdate on the automation, but events are append-only audit records created by the system (webhook handlers, cron scheduler), not user mutations of the automation config. (8/10 independent reviewers flagged this)
The wrapper requires
ActionUpdatepermission to insert an event, but events are system-generated side effects of trigger evaluation, not user-initiated modifications.
If the webhook handler or cron scheduler runs as AsSystemRestricted(ctx), this passes fine. But the semantic mismatch could cause confusion: a future developer reading this code will think "inserting events requires the ability to modify the automation," when really it's "the system is recording what happened." Could this be ActionRead instead (since the system is reading the automation config to process the event), or should there be a comment explaining why ActionUpdate was chosen?
The same pattern is used for InsertChatAutomationTrigger at position 159, which is more defensible since creating a trigger is genuinely modifying the automation's configuration.
This comment was written by an automated agent.
| return err | ||
| } | ||
| if err := q.authorizeContext(ctx, policy.ActionUpdate, automation); err != nil { | ||
| return err |
There was a problem hiding this comment.
P3 DeleteChatAutomationTriggerByID authorizes with ActionUpdate, not ActionDelete. (3/10 reviewers flagged this)
Deleting a trigger requires only
updatepermission on the parent automation, notdelete. This is inconsistent with other delete operations in the codebase (e.g.,DeleteChatAutomationByIDusesdeleteQwhich checksActionDelete).
This may be intentional if triggers are considered sub-resources whose lifecycle is managed via "updating" the automation. If so, a short comment explaining the rationale would help future readers. If not, this should be ActionDelete for consistency.
This comment was written by an automated agent.
| PRIMARY KEY (id), | ||
| FOREIGN KEY (automation_id) REFERENCES chat_automations(id) ON DELETE CASCADE, | ||
| CONSTRAINT chat_automation_triggers_webhook_fields CHECK ( | ||
| type != 'webhook' OR (cron_schedule IS NULL AND last_triggered_at IS NULL) |
There was a problem hiding this comment.
P3 The CHECK constraints enforce that webhook triggers don't have cron fields and vice versa, but they don't enforce that a webhook trigger actually has a webhook_secret, or that a cron trigger actually has a cron_schedule. (3/10 reviewers flagged this)
A row like
(type='webhook', webhook_secret=NULL, cron_schedule=NULL)passes both constraints but is logically invalid: a webhook trigger with no shared secret can't verify HMAC signatures.
This may be acceptable if application-level validation catches it before insert, but a database-level guard would prevent malformed rows from any code path:
CONSTRAINT chat_automation_triggers_webhook_has_secret CHECK (
type != 'webhook' OR webhook_secret IS NOT NULL
),
CONSTRAINT chat_automation_triggers_cron_has_schedule CHECK (
type != 'cron' OR cron_schedule IS NOT NULL
)Non-blocking, but worth considering for defense-in-depth.
This comment was written by an automated agent.
| FOREIGN KEY (matched_chat_id) REFERENCES chats(id) ON DELETE SET NULL, | ||
| FOREIGN KEY (created_chat_id) REFERENCES chats(id) ON DELETE SET NULL, | ||
| CONSTRAINT chat_automation_events_chat_exclusivity CHECK ( | ||
| matched_chat_id IS NULL OR created_chat_id IS NULL |
There was a problem hiding this comment.
P3 The chat_exclusivity constraint ensures at most one of matched_chat_id/created_chat_id is set, but doesn't enforce that at least one is set when status IN ('created', 'continued'). (1/10 reviewers flagged this, but the logic is sound)
An event with
status='created'and bothmatched_chat_id=NULLandcreated_chat_id=NULLpasses the constraint but is logically inconsistent: it claims a chat was created but doesn't reference which one.
A status-dependent constraint would tighten this:
CONSTRAINT chat_automation_events_chat_required CHECK (
status NOT IN ('created', 'continued')
OR matched_chat_id IS NOT NULL
OR created_chat_id IS NOT NULL
)This is lower-priority since the insert code controls these values, but it prevents silent data integrity issues from any code path.
This comment was written by an automated agent.
| } | ||
|
|
||
| func (q *querier) GetActiveChatAutomationCronTriggers(ctx context.Context) ([]database.GetActiveChatAutomationCronTriggersRow, error) { | ||
| if err := q.authorizeContext(ctx, policy.ActionRead, rbac.ResourceChatAutomation.All()); err != nil { |
There was a problem hiding this comment.
Nit GetActiveChatAutomationCronTriggers checks rbac.ResourceChatAutomation.All(), which is a binary admin-or-reject gate with no per-row filtering. This is correct for a system-level scheduler, but worth a short comment explaining that this function must only be called from a privileged/system context (e.g., dbauthz.AsSystemRestricted). (8/10 reviewers flagged this)
Without the comment, a future developer might wonder why there's no SQL-level row filtering like GetChatAutomations has.
This comment was written by an automated agent.
| AND received_at > @window_start::timestamptz; | ||
|
|
||
| -- name: PurgeOldChatAutomationEvents :exec | ||
| DELETE FROM chat_automation_events |
There was a problem hiding this comment.
Nit The 7-day retention is hardcoded in SQL. (5/10 reviewers noted this)
This is fine for a V1, but if the retention period ever needs to change, it requires modifying the SQL query rather than a configuration knob. Also, for high-volume deployments, a single unbounded DELETE could hold locks for a while. Consider batching or accepting a parameter from the caller:
DELETE FROM chat_automation_events
WHERE received_at < @cutoff::timestamptz;Non-blocking for this PR since handlers aren't included yet.
This comment was written by an automated agent.
| SET webhook_secret = NULL, | ||
| webhook_secret_key_id = NULL | ||
| WHERE webhook_secret_key_id IS NOT NULL;COMMIT; | ||
| ` |
There was a problem hiding this comment.
Nit Formatting: COMMIT; is concatenated directly to the WHERE clause without a newline. The SQL is technically valid (; terminates the UPDATE, then COMMIT; runs), but it's easy to misread. (1/10 reviewers flagged this)
- WHERE webhook_secret_key_id IS NOT NULL;COMMIT;
+ WHERE webhook_secret_key_id IS NOT NULL;
+COMMIT;This comment was written by an automated agent.
| LockIDReconcileSystemRoles | ||
| LockIDBoundaryUsageStats | ||
| LockIDChatAutomationCron | ||
| ) |
There was a problem hiding this comment.
Nit Missing comment on LockIDChatAutomationCron. Other lock constants in this file have (or should have) comments explaining their purpose. (4/10 reviewers noted this)
Suggestion:
// LockIDChatAutomationCron prevents concurrent cron trigger
// evaluation across coderd replicas.
LockIDChatAutomationCronThis comment was written by an automated agent.
ibetitsmike
left a comment
There was a problem hiding this comment.
Nice schema design. The three-table split (automations, triggers, events) is clean and well-normalized. The partial index on rate-limit events, the dbcrypt integration, and the check constraints for webhook/cron field exclusivity are all well-thought-out. A few findings from a deep review below.
1 P2, 3 P3, 4 Nit across 8 inline comments. This review contains no blockers, but the P2 and the P3 constraint findings are worth discussing before merge.
This review was generated by an automated agent (best-of-10 deep-review synthesis across database, security, concurrency, contract, edge-case, test, style, and dbcrypt domains).
| return database.ChatAutomationEvent{}, err | ||
| } | ||
| if err := q.authorizeContext(ctx, policy.ActionUpdate, automation); err != nil { | ||
| return database.ChatAutomationEvent{}, err |
There was a problem hiding this comment.
P2 InsertChatAutomationEvent checks ActionUpdate on the automation, but events are append-only audit records created by the system (webhook handlers, cron scheduler), not user mutations of the automation config. (8/10 independent reviewers flagged this)
The wrapper requires
ActionUpdatepermission to insert an event, but events are system-generated side effects of trigger evaluation, not user-initiated modifications.
If the webhook handler or cron scheduler runs as AsSystemRestricted(ctx), this passes fine. But the semantic mismatch could cause confusion: a future developer reading this code will think "inserting events requires the ability to modify the automation," when really it's "the system is recording what happened." Could this be ActionRead instead (since the system is reading the automation config to process the event), or should there be a comment explaining why ActionUpdate was chosen?
The same pattern is used for InsertChatAutomationTrigger at position 159, which is more defensible since creating a trigger is genuinely modifying the automation's configuration.
This comment was written by an automated agent.
mafredri
left a comment
There was a problem hiding this comment.
Well-designed schema. The three-state lifecycle, two-tier rate limiting, and trigger-type discriminator with cross-check constraints are thoughtful. The dbcrypt integration covers all paths and the cron trigger query correctly excludes secret columns.
2 P1, 3 P2, 2 P3, 2 obs, 2 nits across 11 inline comments.
Killua: "A deployment receiving 1 event/second accumulates ~600K rows/week. If the purge job falls behind, this single DELETE attempts to remove all accumulated rows in one transaction: long lock hold, massive WAL generation, vacuum pressure."
Hisoka: "An org admin could set owner_id to another admin or a user with different permissions, attributing automated chat actions to that user. This is the kind of thing that holds up fine until the handler trusts the data layer to enforce ownership."
Nit: the commit message body references migration 000454 (doesn't exist, the column is in 000453), and uses wrong file/query names: automations.sql (actual: chatautomations.sql), automationtriggers.sql (actual: chatautomationtriggers.sql), automationevents.sql (actual: chatautomationevents.sql), GetActiveCronTriggers (actual: GetActiveChatAutomationCronTriggers), LockIDAutomationCron (actual: LockIDChatAutomationCron). (Leorio)
PS. The PR description says webhook_secret_key_id reserved for future dbcrypt encryption, but the diff implements full dbcrypt integration (seven methods in dbcrypt.go, rotation/decrypt/delete in cliutil.go). Worth updating so reviewers don't skip the encryption paths.
🤖 This review was automatically generated with Coder Agents.
0ae7379 to
cabaa9b
Compare
mafredri
left a comment
There was a problem hiding this comment.
Thanks for fixing the unbounded purge, the missing .WithOwner(), the COMMIT formatting, and adding the status consistency constraint. All four look correct.
One P1 remains (dbcrypt test coverage), one P2 (TOCTOU, may be fine to address in the handler PR). The rest are nits.
1 P1, 1 P2, 1 P3, 1 obs, 3 nits.
PS. The PR description still says webhook_secret_key_id reserved for future dbcrypt encryption and the commit message still references migration 000454 and wrong file/query names (automations.sql instead of chatautomations.sql, GetActiveCronTriggers instead of GetActiveChatAutomationCronTriggers, LockIDAutomationCron instead of LockIDChatAutomationCron).
🤖 This review was automatically generated with Coder Agents.
cabaa9b to
227578d
Compare
Adds the database layer for the automations feature: Schema (migration 000453): - automations table with status, rate limits, instructions, MCP config - automation_triggers table (webhook + cron types) - automation_events table for audit trail - Indexes for performance and unique constraint on (owner, org, name) - automation_id FK on chats table Schema (migration 000454): - last_triggered_at column on automation_triggers for cron scheduling SQL queries: - automations.sql: CRUD with authorize_filter support - automationtriggers.sql: CRUD + GetActiveCronTriggers JOIN query - automationevents.sql: insert, list, count windows, purge RBAC: - automation resource with CRUD actions - dbauthz wrappers for all 18 query methods Also adds LockIDAutomationCron advisory lock constant.
227578d to
51de284
Compare
Why
Automations bridge external events to Coder chats. A GitHub webhook, a PagerDuty alert, or a cron schedule fires a trigger, and Coder turns it into a chat conversation — either creating a new one or continuing an existing one by matching labels.
This PR adds the database layer only: three tables, their indexes, RBAC resource definition, dbauthz authorization wrappers, and all generated code. No HTTP handlers, SDK types, or application logic — those stay in #23631 and will land on top of this.
Schema design
automations— what to do when a trigger firesEach automation belongs to a user+org pair (RBAC-scoped) and carries:
instructions— the user-role message injected into every chat. This is the core prompt.model_config_id/mcp_server_ids/allowed_tools— which model, MCP servers, and tools to use. Config references are SET NULL on delete so automations survive config removal.status— three-state lifecycle:disabled(silently drop events),preview(log events but take no chat action),active(create/continue chats). Preview mode lets users verify their filter and label config before going live.max_chat_creates_per_hour/max_messages_per_hour— two-tier rate limiting. The first caps new chats (protects against webhook storms creating hundreds of conversations). The second caps total messages including continuations (protects against tight cron loops flooding a single chat). Both have CHECK > 0 constraints.(owner_id, organization_id, name)so automations are addressable by name.automation_triggers— how an automation is invokedEach automation can have multiple triggers. Two types share one table with type-specific nullable columns (simpler than inheritance):
webhook_secretfor HMAC-SHA256 verification,webhook_secret_key_idreserved for future dbcrypt encryption.cron_schedule(5-field standard cron),last_triggered_atfor the scheduler to compute the next fire time. Falls back tocreated_atwhen NULL (first fire).filter(gjson path→value conditions that must all match),label_paths(gjson paths that extract chat labels from payloads for routing events to the right conversation).automation_events— audit trail and rate-limit data sourceEvery trigger invocation produces an event row regardless of outcome. The
statuscolumn captures what happened:filtered,preview,created,continued,rate_limited, orerror. Rate-limit window counts query this table. Rows are append-only and purged by retention.Two indexes:
(automation_id, received_at DESC)for the per-automation event list UI, and(received_at)for the purge job.chats.automation_id— backlinkNullable FK from chats to the automation that created them. SET NULL on delete. Indexed for "list chats spawned by automation X" queries.
What is included
000453_automations) with all three tables, indexes, constraints, and the chats FKautomationresource with create/read/update/delete actionsLockIDAutomationCronadvisory lock constant for the cron schedulermake genoutput (models, querier, dbmock, dbmetrics, constraints, dump.sql)What is NOT included
HTTP handlers, SDK types, db2sdk converters, the automations package, the cron scheduler, and CLI wiring all stay in #23631.