Upgrade Guide

General Principles

  • Database migrations run automatically on startup. No manual SQL required.
  • Config files are backward compatible. New fields have sensible defaults; old configs continue to work.
  • Mitigations survive restarts. Active mitigations are stored in PostgreSQL and re-announced by the reconciliation loop after startup.
  • Always back up the database before upgrading.

Docker Compose Upgrade

# 1. Back up the database
docker compose exec -T postgres pg_dump -U prefixd prefixd > backup-$(date +%F).sql

# 2. Pull latest code
git pull origin main

# 3. Rebuild containers
docker compose build

# 4. Restart (prefixd applies migrations on startup)
docker compose up -d

# 5. Verify
curl http://localhost/v1/health
docker compose logs prefixd | grep "database migrations applied"
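
Step 1 only helps if the dump actually finished. A plain-format pg_dump writes a `-- PostgreSQL database dump complete` comment after the last object, so a pre-upgrade guard can refuse to continue when that marker is missing (for example, when the postgres container was unreachable and the redirect left an empty file). This is a minimal sketch: `check_backup` and `backup_and_check` are hypothetical helpers, not part of prefixd, and the check assumes the plain SQL format produced by the command above.

```shell
# Hypothetical pre-upgrade helpers (POSIX sh), not part of prefixd.

# check_backup FILE: succeed only if FILE is non-empty and contains
# pg_dump's completion marker.
check_backup() {
  local file="$1"
  if [ ! -s "$file" ]; then
    echo "backup $file is missing or empty" >&2
    return 1
  fi
  # pg_dump emits this comment after the last object has been written.
  if ! grep -qF 'PostgreSQL database dump complete' "$file"; then
    echo "backup $file looks truncated" >&2
    return 1
  fi
  echo "backup $file looks complete"
}

# backup_and_check: step 1 of the Docker Compose upgrade, followed by the check.
backup_and_check() {
  local file
  file="backup-$(date +%F).sql"
  docker compose exec -T postgres pg_dump -U prefixd prefixd > "$file" &&
    check_backup "$file"
}
```

Calling `backup_and_check || exit 1` at the top of an upgrade script stops before `git pull` if the backup is unusable.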

Zero-Downtime Upgrade

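For environments where mitigation downtime is unacceptable (the prefixd API is still unreachable for the few seconds it takes to recreate the container, but announced routes stay in place):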

  1. Pull the new code and build the new image while the old container is still running (docker compose build prefixd)
  2. Run docker compose up -d --no-deps prefixd to replace only the prefixd container
  3. The reconciliation loop will re-announce any rules within 30 seconds
  4. Active mitigations are not affected (fail-open: GoBGP retains routes until explicitly withdrawn)
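
The steps above can be scripted. This is a minimal sketch assuming the Compose service is named `prefixd` (as in step 2) and that the API answers on `http://localhost/v1/health` (as in the Docker Compose verify step); `wait_until` and `rollout_prefixd` are hypothetical helpers. Polling the health endpoint instead of sleeping a fixed time tells you when the new container is actually serving.

```shell
# Hypothetical rollout helpers (POSIX sh), not part of prefixd.

# wait_until ATTEMPTS DELAY CMD [ARGS...]: run CMD up to ATTEMPTS times,
# sleeping DELAY seconds between failed attempts. Succeeds on the first success.
wait_until() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# rollout_prefixd: build while the old container runs, replace only prefixd,
# then wait up to ~60 s (30 attempts, 2 s apart) for the health check to pass.
rollout_prefixd() {
  docker compose build prefixd || return 1
  docker compose up -d --no-deps prefixd || return 1
  wait_until 30 2 curl -fsS http://localhost/v1/health
}
```

The 60-second budget is arbitrary. Once the health check passes, give the reconciliation loop its 30 seconds before checking announced rules.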

Bare Metal Upgrade

# 1. Back up the database
pg_dump -U prefixd prefixd > backup-$(date +%F).sql

# 2. Build new version
git pull origin main
cargo build --release

# 3. Stop the daemon
sudo systemctl stop prefixd

# 4. Install new binary
sudo cp target/release/prefixd /usr/local/bin/
sudo cp target/release/prefixdctl /usr/local/bin/

# 5. Start (migrations run automatically)
sudo systemctl start prefixd

# 6. Verify
prefixdctl status
prefixdctl migrations
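
The bare metal rollback procedure needs the previous binary at hand. One way to guarantee that is to copy each installed binary aside before overwriting it. In this sketch the helper `install_with_backup` and the `.prev` suffix are hypothetical conventions, not something prefixd provides; it replaces the two `cp` commands in step 4 and refuses to touch anything if either new binary is missing.

```shell
# install_with_backup SRC_DIR DST_DIR (hypothetical helper, POSIX sh):
# keep the currently installed prefixd/prefixdctl as *.prev, then install
# the freshly built ones. Only one previous generation is kept.
install_with_backup() {
  local src="$1" dst="$2" bin
  # Check both new binaries first so a failed build changes nothing.
  for bin in prefixd prefixdctl; do
    if [ ! -x "$src/$bin" ]; then
      echo "missing or not executable: $src/$bin" >&2
      return 1
    fi
  done
  for bin in prefixd prefixdctl; do
    if [ -e "$dst/$bin" ]; then
      cp -p "$dst/$bin" "$dst/$bin.prev" || return 1
    fi
    install -m 0755 "$src/$bin" "$dst/$bin" || return 1
  done
}
```

Run it with root privileges between steps 3 and 5 (for example `install_with_backup target/release /usr/local/bin` from a root shell). A rollback then only needs `cp /usr/local/bin/prefixd.prev /usr/local/bin/prefixd`, and likewise for prefixdctl.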

Rollback

If an upgrade causes issues:

Docker Compose

# Check out the previous version
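git checkout v0.8.5  # or whatever the previous tag was

# Stop prefixd so nothing writes to the database during the restore
docker compose stop prefixd

# Restore database (if a migration changed the schema).
# A plain pg_dump does not drop existing objects, so recreate the database first.
docker compose exec -T postgres dropdb -U prefixd prefixd
docker compose exec -T postgres createdb -U prefixd prefixd
docker compose exec -T postgres psql -v ON_ERROR_STOP=1 -U prefixd prefixd < backup-2026-02-20.sql

# Rebuild and restart
docker compose build
docker compose up -d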

Bare Metal

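# Stop
sudo systemctl stop prefixd

# Restore database backup (if a migration changed the schema).
# Recreate the database first: a plain pg_dump does not drop existing objects.
# dropdb/createdb run as the PostgreSQL superuser (here the local postgres account).
sudo -u postgres dropdb prefixd
sudo -u postgres createdb -O prefixd prefixd
psql -v ON_ERROR_STOP=1 -U prefixd prefixd < backup-2026-02-20.sql

# Install previous binary
sudo cp /path/to/previous/prefixd /usr/local/bin/

# Start
sudo systemctl start prefixd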

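Important: Database migrations are forward-only. If a migration altered the existing schema, you must restore from backup to roll back. Migrations that only add tables or nullable columns (and use IF NOT EXISTS, so they are safe to re-run) can be left in place when rolling back without a restore, because the older binary ignores the extra schema.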


Checking Migration Status

After an upgrade, confirm all migrations applied:

# CLI
prefixdctl migrations

# Expected output:
# VERSION   NAME                            APPLIED AT
# -----------------------------------------------------------------
# 1         initial                         2026-01-15 10:00:00
# 2         operators_sessions              2026-01-15 10:00:00
# 3         raw_details                     2026-01-28 12:00:00
# 4         schema_migrations               2026-02-20 10:00:00
# 5         acknowledge                     2026-03-18 14:37:00
# 6         notification_preferences        2026-03-18 14:37:00
# 7         signal_groups                   2026-03-19 18:00:00
#
# 7 migration(s) applied
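
To make this check scriptable (for example in a post-upgrade job), a small filter can read the summary line and compare it with the count expected for the target version: 7 for v0.14.0, per the listing above. This sketch assumes the real output ends with a line of the form `N migration(s) applied`, without the leading `# ` used above for display; `applied_migrations` and `check_migrations` are hypothetical helpers.

```shell
# applied_migrations: print N from an "N migration(s) applied" line on stdin.
applied_migrations() {
  sed -n 's/^\([0-9][0-9]*\) migration(s) applied[[:space:]]*$/\1/p'
}

# check_migrations WANT: read `prefixdctl migrations` output on stdin and
# fail unless at least WANT migrations are reported as applied.
check_migrations() {
  local want="$1" got
  got=$(applied_migrations)
  if [ -z "$got" ]; then
    echo "no 'migration(s) applied' summary found" >&2
    return 1
  fi
  if [ "$got" -lt "$want" ]; then
    echo "only $got of $want migrations applied" >&2
    return 1
  fi
  echo "$got migration(s) applied"
}
```

Usage: `prefixdctl migrations | check_migrations 7`.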

Version-Specific Notes

v0.13.0 -> v0.14.0

New: Multi-Signal Correlation Engine

This release adds the correlation engine, signal adapters, and correlation dashboard. No breaking changes — correlation is opt-in.

New config file: correlation.yaml

Create configs/correlation.yaml (or add a correlation: section to prefixd.yaml). If neither is present, correlation is disabled and behavior is identical to v0.13.0.

# configs/correlation.yaml
enabled: true
window_seconds: 300
min_sources: 1          # Set to 2+ to require corroboration
confidence_threshold: 0.5
default_weight: 1.0
sources:
  fastnetmon:
    weight: 1.0
    type: detector
  alertmanager:
    weight: 0.8
    type: telemetry

With min_sources: 1, events flow through the correlation engine but mitigate immediately on the first signal — identical to pre-correlation behavior. Increase to 2+ to require corroboration from multiple detectors.

See configuration.md#correlation for full reference.

New database migration (007)

Migration 007 (signal_groups) runs automatically on startup. It adds:

  • signal_groups table (group_id, victim_ip, vector, window, confidence, status)
  • signal_group_events junction table
  • mitigations.signal_group_id nullable FK column
  • Two indexes for performance (victim/vector lookup, expiry sweep)

This is an additive migration — it only creates new tables and adds a nullable column. Safe to roll back without a database restore (the new tables/column will be ignored by v0.13.0).

New API endpoints

| Endpoint | Auth | Description |
|---|---|---|
| GET /v1/signal-groups | Yes | List signal groups (cursor pagination, status/vector/date filters) |
| GET /v1/signal-groups/{id} | Yes | Signal group detail with contributing events |
| POST /v1/signals/alertmanager | Yes | Alertmanager v4 webhook adapter |
| POST /v1/signals/fastnetmon | Yes | FastNetMon native JSON webhook adapter |
| GET /v1/config/correlation | Yes | Correlation config (secrets redacted) |
| PUT /v1/config/correlation | Admin | Update correlation config (validates, writes YAML, hot-reloads) |

New API response fields

  • GET /v1/mitigations and GET /v1/mitigations/{id} responses now include an optional correlation field on correlated mitigations. Contains signal_group_id, derived_confidence, source_count, corroboration_met, contributing_sources, and explanation. This field is null for mitigations created without correlation. Backward-compatible — clients that don't read this field are unaffected.

Signal adapter integration

If you use Alertmanager, point a webhook receiver at POST /v1/signals/alertmanager. The adapter maps Alertmanager labels/annotations to attack event fields:

| Alertmanager field | Maps to |
|---|---|
| labels.instance or annotations.victim_ip | victim_ip |
| labels.vector or annotations.vector | vector |
| labels.severity (critical/warning/info) | confidence (0.9/0.7/0.5) |
| annotations.bps, annotations.pps | Traffic metrics |
| fingerprint | Dedup key (idempotent) |
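
Before pointing production Alertmanager at the adapter, it can help to send one hand-built notification and confirm it is accepted. The sketch below builds a payload in Alertmanager's version 4 webhook format using the labels and annotations from the table above. Assumptions, all hypothetical: the `udp_flood` vector name (use one your playbooks define), bearer-token authentication via a `PREFIXD_TOKEN` variable (use whatever API auth your deployment requires), the `PREFIXD_URL` variable, and the helper names. Adapter events go through the full pipeline, so a test signal can announce a real FlowSpec rule: send it only in a lab or for a victim IP you are prepared to mitigate, and withdraw afterwards.

```shell
# am_test_payload VICTIM_IP VECTOR SEVERITY: print one firing alert in
# Alertmanager webhook format (version "4"). Values are not JSON-escaped,
# so pass plain tokens only.
am_test_payload() {
  local ip="$1" vector="$2" severity="$3" now
  now=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  printf '{"version":"4","groupKey":"prefixd-adapter-test","truncatedAlerts":0,"status":"firing","receiver":"prefixd","groupLabels":{},"commonLabels":{},"commonAnnotations":{},"externalURL":"","alerts":[{"status":"firing","labels":{"alertname":"PrefixdAdapterTest","vector":"%s","severity":"%s"},"annotations":{"victim_ip":"%s"},"startsAt":"%s","endsAt":"0001-01-01T00:00:00Z","generatorURL":"","fingerprint":"prefixd-adapter-test"}]}\n' \
    "$vector" "$severity" "$ip" "$now"
}

# send_am_test VICTIM_IP VECTOR SEVERITY: POST the payload to the adapter.
# Bearer auth here is an assumption about your deployment.
send_am_test() {
  am_test_payload "$1" "$2" "$3" |
    curl -fsS -X POST "${PREFIXD_URL:-http://localhost}/v1/signals/alertmanager" \
      -H 'Content-Type: application/json' \
      -H "Authorization: Bearer $PREFIXD_TOKEN" \
      --data-binary @-
}
```

For example, `send_am_test 203.0.113.10 udp_flood warning` with a lab victim IP. Because the fingerprint is fixed and serves as the dedup key, re-sending the same test alert should be deduplicated rather than creating a second event.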

If you use FastNetMon, point the webhook notify URL at POST /v1/signals/fastnetmon. The adapter classifies vector from the traffic breakdown and maps action to confidence (ban=0.9, partial_block=0.7, alert=0.5).

Both adapters feed events through the full pipeline (correlation → policy → guardrails → announce).

New Prometheus metrics

| Metric | Type | Description |
|---|---|---|
| prefixd_signal_groups_total | Counter | Total signal groups created |
| prefixd_signal_group_sources | Histogram | Source count distribution per group |
| prefixd_correlation_confidence | Histogram | Derived confidence distribution |
| prefixd_corroboration_met_total | Counter | Groups that met corroboration threshold |
| prefixd_corroboration_timeout_total | Counter | Groups that expired without corroboration |

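
The two corroboration counters are most useful as a ratio when tuning min_sources: if most groups time out, the setting may be stricter than your detectors can satisfy. A minimal sketch that reads Prometheus text-format metrics on stdin and prints met / (met + timeout), summing across any label sets. The `/metrics` path used in the usage line is an assumption (point `curl` at whatever URL your Prometheus scrapes), and `corroboration_rate` is a hypothetical helper.

```shell
# corroboration_rate: read Prometheus text exposition on stdin and print
# met / (met + timeout) for prefixd's corroboration counters, or "n/a"
# when neither counter has been incremented yet.
corroboration_rate() {
  awk '
    $1 == "prefixd_corroboration_met_total" ||
      index($1, "prefixd_corroboration_met_total{") == 1 { met += $2 }
    $1 == "prefixd_corroboration_timeout_total" ||
      index($1, "prefixd_corroboration_timeout_total{") == 1 { timeout += $2 }
    END {
      total = met + timeout
      if (total == 0) print "n/a"
      else printf "%.2f\n", met / total
    }'
}
```

Usage: `curl -fsS http://localhost/metrics | corroboration_rate`. The counters are cumulative since process start; in Prometheus, applying rate() to both counters over a window gives the recent ratio instead.
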
Dashboard: Correlation page

The dashboard adds a /correlation page with three tabs:

  • Signals — Recent events with source, confidence, and group assignment
  • Groups — Signal groups with status, source count, confidence, corroboration
  • Config — Visual correlation configuration editor

It also adds a signal group detail page at /correlation/groups/[id] and a correlation context section on the mitigation detail page.

Docker: Config volume now writable by default

The default docker-compose.yml now mounts ./configs:/etc/prefixd read-write so the dashboard config editors work out of the box; previously the mount was :ro. If your customized docker-compose.yml still has :ro, PUT /v1/config/correlation (and the playbooks and alerting PUT endpoints) will return 500. Either remove :ro, or edit configs on the host and apply them with POST /v1/config/reload.
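
If you maintain your own docker-compose.yml, a quick grep shows whether it still carries the old read-only mount. This sketch only recognizes the short `./configs:/etc/prefixd:ro` form used in the default file; a long-form volume with `read_only: true` or a different host path will not be detected. `configs_mount_readonly` and `warn_if_readonly_configs` are hypothetical helpers.

```shell
# configs_mount_readonly [COMPOSE_FILE]: succeed if the file still mounts
# ./configs at /etc/prefixd with :ro (short volume syntax only).
configs_mount_readonly() {
  grep -qF './configs:/etc/prefixd:ro' "${1:-docker-compose.yml}"
}

# warn_if_readonly_configs [COMPOSE_FILE]: print a warning and fail if the
# dashboard config editors would hit a read-only mount.
warn_if_readonly_configs() {
  if configs_mount_readonly "$@"; then
    echo "configs mounted :ro; PUT /v1/config/* will fail, edit on the host and use POST /v1/config/reload" >&2
    return 1
  fi
}
```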

Upgrade steps
  1. Back up the database (as always)
  2. Add configs/correlation.yaml if you want correlation (optional — omit to keep existing behavior)
  3. Rebuild and restart: docker compose build && docker compose up -d
  4. Migration 007 runs automatically
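  5. Verify: an authenticated request to GET /v1/config/correlation (e.g. curl http://localhost/v1/config/correlation with your API credentials) should return the config, or defaults if no file is present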
  6. If using Alertmanager/FastNetMon, point webhook URLs at the new adapter endpoints
  7. Tune min_sources and confidence_threshold to taste

v0.11.0 -> v0.12.0

Breaking: Offset pagination removed

GET /v1/mitigations, GET /v1/events, and GET /v1/audit no longer accept the offset query parameter. Use cursor-based pagination instead. See ADR 016 for rationale.

Before (v0.11.0):

# Page 1
curl 'http://localhost/v1/mitigations?limit=50&offset=0'
# Page 2
curl 'http://localhost/v1/mitigations?limit=50&offset=50'

After (v0.12.0):

# Page 1 (no cursor = first page)
curl 'http://localhost/v1/mitigations?limit=50'
# Response includes: {"mitigations": [...], "next_cursor": "MjAyNi0w...", "has_more": true}

# Page 2 (pass next_cursor from previous response)
curl 'http://localhost/v1/mitigations?limit=50&cursor=MjAyNi0w...'

If you use prefixdctl, replace --offset with --cursor or omit it for the first page.

If you have custom scripts or integrations that page through results with offset, update them to use the next_cursor value from each response. The cursor is an opaque base64 string — do not construct it manually.
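
A loop that follows next_cursor until has_more is false looks roughly like this. It is a sketch: the helper names and the `PREFIXD_URL` variable are hypothetical, authentication headers are omitted (add whatever your deployment requires), and the JSON is picked apart with sed and grep, which assumes each key/value pair sits on one line as in the response shown above (jq is more robust if available). The cursor is passed with `curl --data-urlencode` because base64 can contain `+`, `/` and `=`, which are not safe to put into a query string unescaped.

```shell
# fetch_page ENDPOINT [CURSOR]: fetch one page (limit=50) of a list endpoint.
fetch_page() {
  local url="${PREFIXD_URL:-http://localhost}$1"
  if [ -n "$2" ]; then
    curl -fsS -G "$url" --data-urlencode 'limit=50' --data-urlencode "cursor=$2"
  else
    curl -fsS -G "$url" --data-urlencode 'limit=50'
  fi
}

# next_cursor_of: print next_cursor from a JSON body on stdin; prints
# nothing when it is null or absent.
next_cursor_of() {
  sed -n 's/.*"next_cursor": *"\([^"]*\)".*/\1/p'
}

# fetch_all ENDPOINT: print every page body, following next_cursor until
# has_more is false. The cursor is treated as opaque and never built by hand.
fetch_all() {
  local endpoint="$1" cursor="" body
  while :; do
    body=$(fetch_page "$endpoint" "$cursor") || return 1
    printf '%s\n' "$body"
    printf '%s\n' "$body" | grep -q '"has_more": *true' || break
    cursor=$(printf '%s\n' "$body" | next_cursor_of)
    [ -n "$cursor" ] || break
  done
}
```

The loop does not depend on the list key, so `fetch_all /v1/audit` works the same way for the wrapped `entries` response.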

Breaking: Audit response shape changed

GET /v1/audit now returns a wrapped object instead of a bare array:

// Before: [{"id": "...", "action": "...", ...}, ...]
// After:  {"entries": [...], "count": 5, "next_cursor": "...", "has_more": false}

Other changes

  • New migration (005): Adds acknowledged_at and acknowledged_by columns to mitigations table. Runs automatically on startup. Uses IF NOT EXISTS so safe to re-run.
  • New migration (006): Adds notification_preferences table for per-operator toast settings. Runs automatically. FK to operators table.
  • New endpoint: POST /v1/mitigations/acknowledge for bulk acknowledging mitigations.
  • New endpoint: GET/PUT /v1/preferences for notification preferences (muted events, quiet hours). Quiet hours require both start and end or both null — partial configuration is rejected (400).
  • New query params: ?start=, ?end= (ISO 8601) for date range filtering on all list endpoints. ?acknowledged=true|false on mitigations.
  • Per-destination event routing: Alerting destinations now accept an optional events array to override the global event filter. Existing configs without per-destination events are unaffected (backward-compatible, see ADR 017).
  • Frontend: API hooks now return response objects ({mitigations, count, next_cursor, has_more}) instead of bare arrays. If you have custom frontend code consuming these hooks, update accordingly.
  • No config file changes required. Existing alerting.yaml files work as-is.

v0.10.1 -> v0.11.0

  • CVE gate in CI: cargo audit and bun audit now gate Docker publishing. CycloneDX SBOM generated on version tags.
  • Vendor capability matrix: New docs/vendors.md.
  • Security fixes: Next.js 16.1.7, undici 7.24.4, quinn-proto 0.11.14, rollup 4.59.0.
  • New features: Bulk withdraw (POST /v1/mitigations/withdraw), FlowSpec rule preview on mitigation detail page.
  • No database migrations required.
  • No breaking API changes.

v0.10.0 -> v0.10.1

  • FastNetMon integration fix: Deterministic event IDs replaced with UUIDs. Unban now queries GET /v1/mitigations?victim_ip=X then withdraws, instead of constructing a deterministic ID.
  • New query param: victim_ip on GET /v1/mitigations.
  • No database migrations required.

v0.9.1 -> v0.10.0

  • Playbook editor: New PUT /v1/config/playbooks endpoint (admin-only). Config page now has form-based and raw YAML editors for playbooks. Playbooks are saved with atomic write + .bak backup.
  • Interactive alerting config: New PUT /v1/config/alerting endpoint (admin-only). Alerting configuration moved from prefixd.yaml to standalone alerting.yaml (backward-compatible fallback). Config page alerting tab now supports add/edit/remove destinations with type-specific forms. Secrets preserved via *** sentinel on update.
  • SSRF protection: Webhook destination URLs now require HTTPS and reject localhost/private IPs.
  • Event cross-links: Mitigation detail page shows clickable triggering and last event links.
  • GHCR publishing: CI publishes Docker images to ghcr.io on push to main and version tags.
  • No database migrations required
  • No breaking API changes

v0.9.0 -> v0.9.1

  • Security fixes: Login brute-force throttle hardened (atomic check, bounded state), CIDR validation uses ipnet::IpNet, generic webhook headers redacted in API, alerting test endpoint admin-only, CSV formula regex strengthened
  • Webhook alerting: New src/alerting/ module with 7 destination types (Slack, Discord, Teams, Telegram, PagerDuty, OpsGenie, generic). Fire-and-forget dispatch with bounded concurrency (64 tasks). New endpoints: GET /v1/config/alerting, POST /v1/config/alerting/test
  • Dashboard: Alerting config tab, audit log detail expansion, customer/POP filters, timeseries range selector (1h/6h/24h/7d), active count sidebar badge, severity badges
  • New config section: Optional alerting: in prefixd.yaml (see example in config file)
  • No database migrations required
  • No breaking API changes

v0.8.5 -> v0.9.0

  • New table: schema_migrations (migration 004) -- tracks applied migrations
  • Reconciliation loop now pages through all active mitigations (previously capped at 1000)
  • New metrics: prefixd_reconciliation_active_count, prefixd_db_pool_connections
  • New endpoints: GET /v1/stats/timeseries, GET /v1/ip/{ip}/history
  • prefixdctl default endpoint changed from http://127.0.0.1:8080 to http://127.0.0.1 (nginx)
  • Timeseries bucketing fix -- sub-hour buckets now align correctly
  • No config file changes required