This directory contains the configuration for Ampel's observability stack.
monitoring/
├── README.md                      # This file
├── prometheus.yml                 # Prometheus configuration
├── alerts/
│   └── ampel.yml                  # Alert rules
├── grafana/
│   ├── datasources/
│   │   └── prometheus.yml         # Prometheus datasource
│   └── dashboards/
│       ├── ampel-overview.json    # Main dashboard
│       └── dashboard-provider.yml
# Start monitoring stack
make monitoring-up
# Access Grafana
open http://localhost:3000 # admin/admin
# Access Prometheus
open http://localhost:9090
# View logs
make monitoring-logs
# Stop monitoring
make monitoring-down
Prometheus is the metrics storage and querying engine. It scrapes metrics from:
- Ampel API (:8080/metrics)
- Ampel Worker (:8081/metrics)
- PostgreSQL Exporter (:9187)
- Redis Exporter (:9121)
Grafana provides visualization and dashboarding. It is pre-configured with:
- Prometheus datasource
- Ampel Overview dashboard
- Default credentials: admin/admin
- postgres-exporter (port 9187) - PostgreSQL database metrics
- redis-exporter (port 9121) - Redis cache metrics
Log aggregation service for centralized logging.
- http_requests_total{method, path, status} - Total requests
- http_request_duration_seconds{method, path, status} - Request duration histogram
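To make the label sets above concrete, here is a minimal sketch of how one sample looks in the Prometheus text exposition format that a /metrics endpoint serves. The helper names (`render_sample`, `escape_label_value`) and the example label values are hypothetical and only for illustration; in practice the metrics exporter produces this output for you.

```rust
/// Escape a label value for the text exposition format
/// (backslash, double quote, and newline must be escaped).
fn escape_label_value(raw: &str) -> String {
    raw.replace('\\', "\\\\")
        .replace('"', "\\\"")
        .replace('\n', "\\n")
}

/// Render one sample as `name{label="value",...} value`.
fn render_sample(name: &str, labels: &[(&str, &str)], value: f64) -> String {
    if labels.is_empty() {
        return format!("{} {}", name, value);
    }
    let rendered: Vec<String> = labels
        .iter()
        .map(|(key, val)| format!("{}=\"{}\"", key, escape_label_value(val)))
        .collect();
    format!("{}{{{}}} {}", name, rendered.join(","), value)
}

fn main() {
    // Hypothetical label values; real ones depend on the routes being served.
    let line = render_sample(
        "http_requests_total",
        &[("method", "GET"), ("path", "/api/health"), ("status", "200")],
        42.0,
    );
    println!("{}", line);
}
```

Each distinct combination of `method`, `path`, and `status` becomes its own time series, so keeping `path` to route templates rather than raw URLs helps keep the series count manageable.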
Add custom metrics in your Rust code:
use metrics::{counter, histogram, gauge};
// Increment counter
counter!("pull_requests_synced_total",
"provider" => "github",
"status" => "success"
).increment(1);
// Record histogram
histogram!("sync_duration_seconds",
"provider" => "github"
).record(duration.as_secs_f64());
// Set gauge
gauge!("active_repositories").set(count as f64);
Configured alerts in alerts/ampel.yml:
- HighErrorRate - Triggers when error rate >5% for 5 minutes
- HighLatency - Triggers when P95 latency >1s for 10 minutes
- DatabaseDown - Triggers when PostgreSQL is unreachable
- HighDatabaseConnections - Triggers when connections >80
- ServiceDown - Triggers when service is unavailable for 2 minutes
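The "for N minutes" part of these alerts can be easy to misread, so here is a simplified model of it in Rust. It is a sketch under assumptions, not the rule definitions from alerts/ampel.yml: the function names are hypothetical, and it assumes the 15s evaluation interval and the >5% for 5 minutes threshold of HighErrorRate.

```rust
use std::time::Duration;

/// Share of failed requests as a percentage; 0.0 when there was no traffic.
fn error_rate_percent(errors: u64, total: u64) -> f64 {
    if total == 0 {
        return 0.0;
    }
    errors as f64 * 100.0 / total as f64
}

/// Simplified hold-period model: the alert fires once the condition has
/// stayed true for at least `hold` across consecutive evaluations.
/// A single evaluation below the threshold resets the pending timer.
fn fires(breaches: &[bool], interval: Duration, hold: Duration) -> bool {
    let mut pending: Option<Duration> = None;
    for &breached in breaches {
        pending = match (breached, pending) {
            (false, _) => None,
            (true, None) => Some(Duration::ZERO),
            (true, Some(elapsed)) => Some(elapsed + interval),
        };
        if let Some(elapsed) = pending {
            if elapsed >= hold {
                return true;
            }
        }
    }
    false
}

fn main() {
    let interval = Duration::from_secs(15);
    let hold = Duration::from_secs(5 * 60);
    // Hypothetical window: 6 failed requests out of 100 at every evaluation.
    let breached = error_rate_percent(6, 100) > 5.0;
    let evaluations = vec![breached; 21];
    println!("fires: {}", fires(&evaluations, interval, hold));
}
```

Under these assumptions a brief spike does not page anyone: the condition has to hold for 21 consecutive evaluations (20 intervals of 15s) before the alert fires.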
Main dashboard showing:
- HTTP request rate by endpoint
- Request duration (P95)
- HTTP status code distribution
- Database connections
- Active pull requests
- Open Grafana at http://localhost:3000
- Create new dashboard
- Add panels with PromQL queries
- Export JSON to grafana/dashboards/
Example PromQL queries:
# Request rate
rate(http_requests_total[5m])
# Error rate percentage
(sum(rate(http_requests_total{status=~"5.."}[5m]))
/
sum(rate(http_requests_total[5m]))) * 100
# P95 latency
histogram_quantile(0.95,
rate(http_request_duration_seconds_bucket[5m])
)
# Database connections by database
pg_stat_database_numbackends
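The P95 query above estimates a quantile from cumulative histogram buckets rather than from raw observations. The following Rust sketch mirrors the linear interpolation that `histogram_quantile` performs, as a way to reason about what the number means. It is a simplification: the `Bucket` type, the function names, and the bucket boundaries are hypothetical, buckets are assumed to be sorted with `+Inf` last, and invalid input yields `None` where Prometheus would return NaN.

```rust
/// One cumulative bucket: `cumulative_count` observations were <= `upper_bound`.
struct Bucket {
    upper_bound: f64,
    cumulative_count: f64,
}

/// Estimate the `q` quantile (0.0..=1.0) by interpolating inside the bucket
/// that contains the target rank.
fn histogram_quantile(q: f64, buckets: &[Bucket]) -> Option<f64> {
    let total = buckets.last()?.cumulative_count;
    if total == 0.0 || !(0.0..=1.0).contains(&q) {
        return None;
    }
    let rank = q * total;
    let idx = buckets.iter().position(|b| b.cumulative_count >= rank)?;
    let bucket = &buckets[idx];
    if bucket.upper_bound.is_infinite() {
        // Rank falls in the +Inf bucket: report the highest finite bound.
        return buckets.get(idx.checked_sub(1)?).map(|b| b.upper_bound);
    }
    let (lower, below) = if idx == 0 {
        (0.0, 0.0)
    } else {
        (buckets[idx - 1].upper_bound, buckets[idx - 1].cumulative_count)
    };
    let in_bucket = bucket.cumulative_count - below;
    if in_bucket == 0.0 {
        return Some(lower);
    }
    Some(lower + (bucket.upper_bound - lower) * (rank - below) / in_bucket)
}

/// Hypothetical request-duration buckets, in seconds.
fn example_buckets() -> Vec<Bucket> {
    vec![
        Bucket { upper_bound: 0.1, cumulative_count: 50.0 },
        Bucket { upper_bound: 0.5, cumulative_count: 90.0 },
        Bucket { upper_bound: 1.0, cumulative_count: 98.0 },
        Bucket { upper_bound: f64::INFINITY, cumulative_count: 100.0 },
    ]
}

fn main() {
    // Rank 95 of 100 lands in the (0.5, 1.0] bucket, 5/8 of the way through.
    let p95 = histogram_quantile(0.95, &example_buckets());
    println!("p95 = {:?}", p95);
}
```

Because the result is interpolated, its accuracy depends on how finely the bucket boundaries are spaced around the latencies you care about, such as the 1s threshold used by the HighLatency alert.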
Edit prometheus.yml:
global:
  scrape_interval: 15s      # How often to scrape targets
  evaluation_interval: 15s  # How often to evaluate rules
Default retention: 15 days (configurable in docker-compose)
To change, update command in docker/docker-compose.monitoring.yml:
prometheus:
  command:
    - '--storage.tsdb.retention.time=30d'
    - '--storage.tsdb.retention.size=10GB'
To add alert notifications:
- Add alertmanager service to docker-compose
- Update prometheus.yml alerting section
- Configure notification channels (email, Slack, PagerDuty)
Example alertmanager config:
receivers:
  - name: 'team-notifications'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'
        channel: '#alerts'
Check targets page: http://localhost:9090/targets
Common issues:
- Service not exposing /metrics endpoint
- Wrong port in prometheus.yml
- Network connectivity (check docker network)
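To narrow down which of these issues applies, a quick TCP reachability check can help. The sketch below uses only the Rust standard library. The ports come from the scrape list earlier in this file, while the `localhost` host names are an assumption that holds only if the ports are published to the host; inside the docker network the service names would differ. The `is_reachable` helper is hypothetical.

```rust
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Returns true if a TCP connection to `addr` (host:port) succeeds within `timeout`.
/// This only shows that the port is open, not that /metrics is being served.
fn is_reachable(addr: &str, timeout: Duration) -> bool {
    match addr.to_socket_addrs() {
        Ok(mut candidates) => {
            candidates.any(|candidate| TcpStream::connect_timeout(&candidate, timeout).is_ok())
        }
        Err(_) => false,
    }
}

fn main() {
    // Assumed host names; adjust to match your docker network or port mappings.
    let targets = [
        ("Ampel API", "localhost:8080"),
        ("Ampel Worker", "localhost:8081"),
        ("PostgreSQL Exporter", "localhost:9187"),
        ("Redis Exporter", "localhost:9121"),
    ];
    for (name, addr) in targets.iter() {
        let status = if is_reachable(addr, Duration::from_secs(2)) {
            "reachable"
        } else {
            "unreachable"
        };
        println!("{} ({}): {}", name, addr, status);
    }
}
```

A target that is reachable here but still shown as down on the targets page most likely points to a wrong path or port in prometheus.yml rather than to a network problem.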
- Verify Prometheus datasource: Configuration > Data Sources
- Check data exists: Explore > Run query
- Verify time range matches your data
Reduce scrape frequency or retention:
global:
  scrape_interval: 30s  # Increase from 15s
# Or add retention limits
--storage.tsdb.retention.time=7d
--storage.tsdb.retention.size=5GB
- Change Grafana admin password
- Enable authentication on Prometheus
- Restrict metrics endpoint to monitoring network
- Use TLS for all connections
- Set up alerting with on-call rotation
- Configure backup for Prometheus data
For Fly.io deployments, metrics are automatically available:
# View Fly.io metrics
fly dashboard -a ampel-api
# Configure /metrics endpoint
# Add to fly.toml:
[metrics]
port = 8080
path = "/metrics"
See docs/observability/OBSERVABILITY.md for details.