Version: 1.0
Last Updated: 2025-12-22
- Quick Reference
- Deployment Procedures
- Scaling Operations
- Monitoring and Health Checks
- Database Operations
- Troubleshooting
- Emergency Procedures
# Check app status
fly status --app ampel-api
fly status --app ampel-worker
fly status --app ampel-frontend
# View logs (real-time)
fly logs --app ampel-api -f
# Check health
fly checks list --app ampel-api
# SSH into machine
fly ssh console --app ampel-api
# Scale machines
fly scale count 2 --app ampel-api
# Restart app
fly apps restart ampel-api

- Frontend: https://ampel-frontend.fly.dev
- API: https://ampel-api.fly.dev
- API Health: https://ampel-api.fly.dev/health
- API Docs: https://ampel-api.fly.dev/api/docs
- Fly.io Dashboard: https://fly.io/dashboard
- API App: https://fly.io/apps/ampel-api
- Worker App: https://fly.io/apps/ampel-worker
- Frontend App: https://fly.io/apps/ampel-frontend
- Database: https://fly.io/apps/ampel-db
- Redis: https://fly.io/apps/ampel-redis
- Push code to the production branch (or merge a PR into production)
- GitHub Actions automatically runs tests
- If tests pass, the workflow deploys to Fly.io
- Monitor deployment:
  fly logs --app ampel-api -f

Note: The deploy workflow only triggers on the production branch to prevent accidental deployments. Use manual workflow dispatch for staging deployments or when deploying from other branches.
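Watching logs confirms that a deploy ran, but not that the API came back up. As a rough sketch, the health endpoint from the quick reference can be polled until it answers with HTTP 200. The function name wait_for_health and its retry defaults are assumptions made for illustration, not part of the Ampel tooling, and the sketch assumes /health returns 200 once the API is ready.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a URL until it returns HTTP 200 or attempts run out.
# Usage: wait_for_health URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_health() {
  local url="$1" attempts="${2:-30}" delay="${3:-5}"
  local i=1 code=""
  while [ "$i" -le "$attempts" ]; do
    # -s: quiet, -o /dev/null: discard the body, -w: print only the status code
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url") || code="000"
    if [ "$code" = "200" ]; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not healthy after $attempts attempts (last status: $code)" >&2
  return 1
}
```

For example, wait_for_health https://ampel-api.fly.dev/health would retry for roughly two and a half minutes with the defaults before giving up.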
# Deploy API
fly deploy --app ampel-api --config fly.api.toml --remote-only
# Deploy Worker
fly deploy --app ampel-worker --config fly.worker.toml --remote-only
# Deploy Frontend
fly deploy --app ampel-frontend --config fly.frontend.toml --remote-only

# Deploy without restart
fly deploy --app ampel-api --stage
# Later, restart to apply
fly apps restart ampel-api

# List recent releases
fly releases --app ampel-api
# Rollback to previous version
fly releases rollback --app ampel-api --version <previous-version>
# Verify rollback
fly status --app ampel-api
fly logs --app ampel-api

# Run migrations after API deployment
fly ssh console --app ampel-api -C "/app/ampel-api migrate run"
# Check migration status
fly ssh console --app ampel-api -C "/app/ampel-api migrate status"
# Rollback migration (if needed)
fly ssh console --app ampel-api -C "/app/ampel-api migrate down"

# Scale API to 3 machines
fly scale count 3 --app ampel-api
# Scale Worker to 2 machines
fly scale count 2 --app ampel-worker
# Scale Frontend to 2 machines
fly scale count 2 --app ampel-frontend
# Check current count
fly status --app ampel-api

# Upgrade to performance-1x (1 dedicated vCPU, 2GB RAM)
fly scale vm performance-1x --app ampel-api
# Downgrade to shared-cpu-1x (1 shared vCPU, 256MB RAM)
fly scale vm shared-cpu-1x --app ampel-api
# Increase memory only
fly scale memory 512 --app ampel-api
# Check available VM sizes
fly platform vm-sizes

Edit fly.toml:
[scaling]
min_count = 1
max_count = 5
[http_service]
auto_stop_machines = true
auto_start_machines = true

Then redeploy:
fly deploy --app ampel-api

# Add a machine in Frankfurt
fly scale count 1 --region fra --app ampel-api
# List machines by region
fly machines list --app ampel-api
# Remove machine by ID
fly machine destroy <machine-id> --app ampel-api

# Check all health checks
fly checks list --app ampel-api
# Watch health checks (every 10s)
watch -n 10 'fly checks list --app ampel-api'
# Test health endpoint manually
curl https://ampel-api.fly.dev/health

# Real-time logs
fly logs --app ampel-api -f
# Last 100 lines
fly logs --app ampel-api
# Filter by machine
fly logs --app ampel-api --machine <machine-id>
# Search logs
fly logs --app ampel-api | grep ERROR

# Check VM metrics
fly vm status --app ampel-api
# Monitor metrics (if configured)
curl https://ampel-api.fly.dev/metrics

# Connect to database
fly postgres connect --app ampel-db
# Check database metrics
fly postgres db list --app ampel-db
# Check slow queries
fly ssh console --app ampel-db
# Then in psql:
# SELECT * FROM pg_stat_activity WHERE state = 'active';

# Connect to Redis
fly redis connect --app ampel-redis
# Check Redis info
fly ssh console --app ampel-api -C "redis-cli -u \$REDIS_URL INFO"
# Monitor Redis
fly ssh console --app ampel-api -C "redis-cli -u \$REDIS_URL MONITOR"

# List available backups (Managed Postgres has automatic backups)
fly postgres db list --app ampel-db
# Create manual backup
fly postgres db backup --app ampel-db
# Download backup
fly ssh console --app ampel-db -C "pg_dump ampel" > ampel_backup_$(date +%Y%m%d).sql

# Restore from backup
fly postgres db restore --app ampel-db --backup <backup-id>
# Restore from local file
fly ssh console --app ampel-db < ampel_backup_20251222.sql

# Connect to database
fly postgres connect --app ampel-db
# Run single query
fly ssh console --app ampel-db -C "psql ampel -c 'SELECT COUNT(*) FROM users;'"
# Execute SQL file
fly ssh console --app ampel-db < migration.sql

# Connect to database
fly postgres connect --app ampel-db
# Then, in psql:
-- Vacuum database
VACUUM ANALYZE;
-- Reindex
REINDEX DATABASE ampel;
-- Check database size
SELECT pg_size_pretty(pg_database_size('ampel'));
-- Check table sizes
SELECT
  schemaname || '.' || tablename AS table,
  pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS size
FROM pg_tables
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC
LIMIT 10;

Symptoms: App shows as unhealthy, logs show startup errors
Steps:
- Check logs for errors:
  fly logs --app ampel-api
- Verify environment variables:
  fly secrets list --app ampel-api
- Test locally with same environment:
  docker build -f Dockerfile.api -t ampel-api .
  docker run -e DATABASE_URL=... ampel-api
- SSH into machine and debug:
  fly ssh console --app ampel-api
  /app/ampel-api --version
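Scanning the secrets listing by eye is easy to get wrong during an incident. The sketch below checks a list of expected names against the output of fly secrets list. It assumes the secret name is the first whitespace-separated column of that output; the helper name missing_secrets is hypothetical, and which secrets each app actually requires is something to confirm against the application's configuration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which expected secret names are not set on an app.
# Usage: missing_secrets APP NAME [NAME...]
missing_secrets() {
  local app="$1" listing names name rc=0
  shift
  listing=$(fly secrets list --app "$app") || return 2
  # Assumption: the secret name is the first column of the listing.
  names=$(printf '%s\n' "$listing" | awk '{ print $1 }')
  for name in "$@"; do
    if ! printf '%s\n' "$names" | grep -qx -- "$name"; then
      echo "missing: $name"
      rc=1
    fi
  done
  return "$rc"
}
```

A call such as missing_secrets ampel-api DATABASE_URL JWT_SECRET ENCRYPTION_KEY uses names that appear elsewhere in this runbook; it prints one line per missing name and returns non-zero if any are absent.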
Symptoms: Logs show "connection refused" or "timeout"
Steps:
- Verify DATABASE_URL is set:
  fly secrets list --app ampel-api | grep DATABASE
- Check database is running:
  fly status --app ampel-db
- Test connection from API machine:
  fly ssh console --app ampel-api -C "psql \$DATABASE_URL -c 'SELECT 1;'"
- Verify Flycast address:
  fly postgres db list --app ampel-db  # Should show .flycast address
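The Flycast check can also be made against the connection string itself. The following is a minimal sketch that pulls the host out of a postgres:// URL and tests whether it ends in .flycast. The helper names db_url_host and uses_flycast are hypothetical, and the parsing assumes a plain host name (no bracketed IPv6 literal) with any special characters in the password percent-encoded.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for inspecting a postgres:// connection URL.

# Print the host part of the URL.
db_url_host() {
  local rest="${1#*://}"        # drop the scheme
  rest="${rest%%/*}"            # drop /database and anything after it
  rest="${rest%%\?*}"           # drop a query string that follows the host directly
  rest="${rest##*@}"            # drop user:password@
  printf '%s\n' "${rest%%:*}"   # drop :port
}

# Succeed only when the host ends in .flycast.
uses_flycast() {
  case "$(db_url_host "$1")" in
    *.flycast) return 0 ;;
    *) return 1 ;;
  esac
}
```

Run on the API machine, uses_flycast "$DATABASE_URL" || echo "DATABASE_URL does not point at a Flycast address" gives a quick yes/no answer without printing the credentials.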
Symptoms: App crashes with OOM errors
Steps:
- Check current memory usage:
  fly vm status --app ampel-api
- Review logs for memory leaks:
  fly logs --app ampel-api | grep -i "out of memory"
- Scale up memory:
  fly scale memory 512 --app ampel-api
- Profile application locally for memory leaks
Symptoms: API responds slowly, timeouts
Steps:
- Check database query performance:
  fly postgres connect --app ampel-db
  # Run EXPLAIN ANALYZE on slow queries
- Check Redis hit rate:
  fly redis connect --app ampel-redis
  INFO stats
- Review API logs for slow endpoints:
  fly logs --app ampel-api | grep -i "slow"
- Consider scaling:
  fly scale count 2 --app ampel-api
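INFO stats reports raw counters rather than a hit rate, so the rate has to be derived from keyspace_hits and keyspace_misses as hits / (hits + misses). A small sketch of that calculation, reading INFO output on standard input; the function name redis_hit_rate is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Redis INFO output on stdin and print the
# cache hit rate as a percentage.
redis_hit_rate() {
  # Redis terminates lines with \r\n, so strip the carriage returns first.
  tr -d '\r' | awk -F: '
    $1 == "keyspace_hits"   { hits = $2 }
    $1 == "keyspace_misses" { misses = $2 }
    END {
      total = hits + misses
      if (total == 0) { print "n/a (no lookups yet)"; exit }
      printf "%.1f%%\n", 100 * hits / total
    }'
}
```

The input can come from any source of INFO output, for instance the redis-cli invocation shown in the monitoring commands piped into redis_hit_rate. The counters are cumulative since the last server restart, so a low figure right after a restart or flush is expected.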
Symptoms: Machines marked unhealthy, traffic not routed
Steps:
- Check health endpoint manually:
  curl -v https://ampel-api.fly.dev/health
- Review health check configuration in fly.toml:
  [[http_service.checks]]
  grace_period = "10s"  # Increase if app is slow to start
  interval = "30s"
  timeout = "5s"  # Increase if endpoint is slow
- Check logs during health check failures:
  fly logs --app ampel-api -f
- Test health endpoint from within machine:
  fly ssh console --app ampel-api -C "curl http://localhost:8080/health"
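Because the check is configured with a timeout, a health endpoint that answers correctly but slowly will still be marked unhealthy. One way to see this is to print the response time next to the status code. This is a sketch only: the helper name health_timing is hypothetical, and its default limit of 5 seconds simply mirrors the timeout value in the configuration excerpt above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report status code and response time for a health URL.
# Usage: health_timing URL [TIMEOUT_SECONDS]
health_timing() {
  local url="$1" limit="${2:-5}" out code secs
  # Give up after the same number of seconds as the configured check timeout.
  out=$(curl -s -o /dev/null -w '%{http_code} %{time_total}' --max-time "$limit" "$url") || true
  code=${out%% *}
  secs=${out#* }
  echo "status=$code time=${secs}s (limit ${limit}s)"
  # Succeed only on HTTP 200.
  [ "$code" = "200" ]
}
```

Running health_timing https://ampel-api.fly.dev/health a few times shows whether responses sit comfortably under the limit or drift close to it.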
Severity: Critical
Steps:
- Check all app statuses:
  fly status --app ampel-api
  fly status --app ampel-worker
  fly status --app ampel-frontend
  fly status --app ampel-db
- Review logs for all apps:
  fly logs --app ampel-api
  fly logs --app ampel-worker
  fly logs --app ampel-frontend
- Restart all apps:
  fly apps restart ampel-api
  fly apps restart ampel-worker
  fly apps restart ampel-frontend
- If database is down, contact Fly.io support:
  fly postgres db list --app ampel-db
  # If down, create support ticket
- If restart fails, rollback to last known good version:
  fly releases rollback --app ampel-api
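During an outage the first step, checking every app, is quicker as a single loop. A minimal sketch, using the app names from this runbook; the helper name status_sweep is hypothetical. Note that a zero exit code from fly status only means the command ran, so the printed output still needs to be read.

```shell
#!/usr/bin/env bash
# App names taken from this runbook.
APPS="ampel-api ampel-worker ampel-frontend ampel-db"

# Hypothetical helper: run `fly status` for every app and list the ones
# where the command itself failed.
status_sweep() {
  local app failed=""
  for app in $APPS; do
    echo "== $app =="
    fly status --app "$app" || failed="$failed $app"
  done
  if [ -n "$failed" ]; then
    echo "status command failed for:$failed" >&2
    return 1
  fi
}
```

The loop deliberately continues past a failing app so that one unreachable component does not hide the state of the others.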
Severity: Critical
Steps:
- Stop all database writes:
  fly scale count 0 --app ampel-api
  fly scale count 0 --app ampel-worker
- Restore from latest backup:
  fly postgres db list --app ampel-db
  fly postgres db restore --app ampel-db --backup <latest-backup-id>
- Verify data integrity:
  fly postgres connect --app ampel-db
  # Run validation queries
- Restart apps:
  fly scale count 1 --app ampel-api
  fly scale count 1 --app ampel-worker
Severity: Critical
Steps:
- Rotate all secrets immediately:
  # Generate new secrets
  NEW_JWT=$(openssl rand -hex 32)
  NEW_ENCRYPTION=$(openssl rand -hex 32)
  # Set new secrets
  fly secrets set --app ampel-api JWT_SECRET="$NEW_JWT"
  fly secrets set --app ampel-api ENCRYPTION_KEY="$NEW_ENCRYPTION"
  fly secrets set --app ampel-worker ENCRYPTION_KEY="$NEW_ENCRYPTION"
- Review access logs:
  fly logs --app ampel-api | grep -E "(POST|PUT|DELETE)"
- Check database for suspicious activity:
  fly postgres connect --app ampel-db
  # Review audit tables
- Notify users to rotate their Personal Access Tokens (PATs) in provider settings
- Force logout all users (clear sessions in Redis). FLUSHDB removes every key in the selected Redis database, not only sessions:
  fly redis connect --app ampel-redis
  FLUSHDB
- Document incident and notify stakeholders
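The rotation step can be wrapped in one function so that the API and the worker are guaranteed to receive the same ENCRYPTION_KEY. This is a sketch under the assumption that the secret names are exactly those used in the rotation step above; the function name rotate_shared_secrets is hypothetical. It relies on fly secrets set accepting several NAME=VALUE pairs in one call, which updates the API app once instead of twice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: generate fresh secrets and set them on both apps.
rotate_shared_secrets() {
  local jwt enc
  jwt=$(openssl rand -hex 32) || return 1
  enc=$(openssl rand -hex 32) || return 1
  # Several NAME=VALUE pairs in one call mean a single update for the API app.
  fly secrets set --app ampel-api JWT_SECRET="$jwt" ENCRYPTION_KEY="$enc" || return 1
  # The worker must receive the same ENCRYPTION_KEY as the API.
  fly secrets set --app ampel-worker ENCRYPTION_KEY="$enc"
}
```

The generated values stay in local variables and are never echoed, which keeps them out of terminal scrollback while the incident is being handled.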
Severity: High
Steps:
- Scale up immediately:
  fly scale count 3 --app ampel-api
  fly scale vm performance-1x --app ampel-api
- Identify bottleneck:
  # Check database
  fly postgres connect --app ampel-db
  SELECT * FROM pg_stat_activity;
  # Check Redis
  fly redis connect --app ampel-redis
  INFO stats
  # Check API logs
  fly logs --app ampel-api | grep -i "slow\|timeout\|error"
- Apply immediate fixes:
  - Add database indexes for slow queries
  - Increase cache TTL
  - Enable rate limiting
- Monitor improvement:
  watch -n 5 'fly checks list --app ampel-api'
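The log grep used to identify the bottleneck returns matching lines but not how the matches split between slowness, timeouts, and errors. A short sketch that tallies the same three keywords from log text on standard input; the function name tally_log_keywords is hypothetical, and the matching is a plain case-insensitive substring test, so a line can count toward more than one keyword.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count log lines mentioning slow, timeout, or error.
# Reads log text on stdin and prints one summary line.
tally_log_keywords() {
  awk '
    { line = tolower($0) }
    line ~ /slow/    { n["slow"]++ }
    line ~ /timeout/ { n["timeout"]++ }
    line ~ /error/   { n["error"]++ }
    END { printf "slow=%d timeout=%d error=%d\n", n["slow"], n["timeout"], n["error"] }'
}
```

It needs finite input, so feed it a saved chunk of log output (for example tally_log_keywords < api.log, where api.log is a file you captured) rather than a live stream.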
- Primary: [Team Lead Name] - [Phone/Email]
- Secondary: [Senior Engineer Name] - [Phone/Email]
- Escalation: [CTO/VP Engineering] - [Phone/Email]
- Fly.io Support: https://fly.io/docs/about/support/
- Fly.io Community: https://community.fly.io/
- Emergency Contact: Email [email protected] with "URGENT" in subject
To stop compute costs while preserving configuration:
Via GitHub Actions (Recommended):
- Go to Actions → "Undeploy from Fly.io"
- Click "Run workflow"
- Set scale_to_zero: true
- Type "DESTROY" in confirmation field
- Run workflow
Via CLI:
# Scale all apps to zero instances
fly scale count 0 --app ampel-api --yes
fly scale count 0 --app ampel-worker --yes
fly scale count 0 --app ampel-frontend --yes
# Verify
fly status --app ampel-api

To restore:
fly scale count 1 --app ampel-api
fly scale count 1 --app ampel-worker
fly scale count 1 --app ampel-frontend

Or run the deploy workflow with force_deploy: true.
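The CLI route for scaling to zero and back repeats one command per app, which can be folded into a loop. A minimal sketch, using the three compute app names from this runbook; the helper name scale_all is hypothetical. It sets the same count everywhere, so adjust afterwards if an app normally runs more machines than the others.

```shell
#!/usr/bin/env bash
# App names taken from this runbook.
COMPUTE_APPS="ampel-api ampel-worker ampel-frontend"

# Hypothetical helper: set every compute app to the same machine count.
# Usage: scale_all COUNT
scale_all() {
  local count="$1" app
  # Accept only a non-negative integer.
  case "$count" in
    ''|*[!0-9]*) echo "usage: scale_all COUNT" >&2; return 2 ;;
  esac
  for app in $COMPUTE_APPS; do
    # Stop at the first failure so a partial result is noticed.
    fly scale count "$count" --app "$app" --yes || return 1
  done
}
```

With this in place, scale_all 0 stops compute and scale_all 1 brings each app back with a single machine.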
Via GitHub Actions:
- Go to Actions → "Undeploy from Fly.io"
- Click "Run workflow"
- Select environment (staging or production)
- Check components to destroy
- Type "DESTROY" in confirmation field
- Run workflow
Via CLI:
# Destroy apps (preserves database)
fly apps destroy ampel-api --yes
fly apps destroy ampel-worker --yes
fly apps destroy ampel-frontend --yes
# DANGEROUS: Destroy database (all data lost)
fly postgres destroy ampel-db --yes
# DANGEROUS: Destroy Redis
fly redis destroy ampel-redis --yes

# Check billing
fly billing show
# Check app resource usage
fly vm status --app ampel-api
# List all apps in org
fly apps list --org ampel-org

Runbook Version: 1.1
Next Review Date: 2025-03-22