Ampel Monitoring Configuration

This directory contains the configuration for Ampel's observability stack.

Directory Structure

monitoring/
├── README.md                    # This file
├── prometheus.yml               # Prometheus configuration
├── alerts/
│   └── ampel.yml                # Alert rules
├── grafana/
│   ├── datasources/
│   │   └── prometheus.yml       # Prometheus datasource
│   └── dashboards/
│       ├── ampel-overview.json  # Main dashboard
│       └── dashboard-provider.yml

Quick Start

# Start monitoring stack
make monitoring-up

# Access Grafana
open http://localhost:3000  # default login: admin/admin

# Access Prometheus
open http://localhost:9090

# View logs
make monitoring-logs

# Stop monitoring
make monitoring-down

Services

Prometheus (port 9090)

Metrics storage and querying engine. Scrapes metrics from:

Ampel API (:8080/metrics)
Ampel Worker (:8081/metrics)
PostgreSQL Exporter (:9187)
Redis Exporter (:9121)

Grafana (port 3000)

Visualization and dashboarding. Pre-configured with:

Prometheus datasource
Ampel Overview dashboard
Default credentials: admin/admin

Exporters

postgres-exporter (port 9187) - PostgreSQL database metrics
redis-exporter (port 9121) - Redis cache metrics

Loki (port 3100)

Log aggregation service for centralized logging.

Metrics

HTTP Metrics

http_requests_total{method, path, status} - Total requests
http_request_duration_seconds{method, path, status} - Request duration histogram

Custom Application Metrics

Add custom metrics in your Rust code:

use metrics::{counter, histogram, gauge};

// Increment counter
counter!("pull_requests_synced_total",
    "provider" => "github",
    "status" => "success"
).increment(1);

// Record histogram
histogram!("sync_duration_seconds",
    "provider" => "github"
).record(duration.as_secs_f64());

// Set gauge
gauge!("active_repositories").set(count as f64);
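The histogram call above records a `duration` value that the snippet does not define. One way to obtain it, sketched here with the standard library only, is to wrap the operation in a small timing helper. The `timed` function and the closure standing in for a sync call are hypothetical names for illustration, not part of Ampel:

```rust
use std::time::Instant;

/// Runs `op` and returns its result together with the elapsed
/// wall-clock time in seconds.
fn timed<T>(op: impl FnOnce() -> T) -> (T, f64) {
    let start = Instant::now();
    let result = op();
    (result, start.elapsed().as_secs_f64())
}

fn main() {
    // Hypothetical stand-in for a real provider sync.
    let (synced, seconds) = timed(|| (1..=1000u64).sum::<u64>());
    println!("synced={synced} in {seconds:.6}s");
    // With the `metrics` crate in scope, `seconds` is the value that would be
    // passed to histogram!("sync_duration_seconds", ...).record(seconds).
}
```

Measuring with `Instant` rather than system time keeps the reading monotonic, so clock adjustments cannot produce a negative duration.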

Alerts

Configured alerts in alerts/ampel.yml:

HighErrorRate - Fires when the error rate stays above 5% for 5 minutes
HighLatency - Fires when P95 latency stays above 1s for 10 minutes
DatabaseDown - Fires when PostgreSQL is unreachable
HighDatabaseConnections - Fires when database connections exceed 80
ServiceDown - Fires when a service is unavailable for 2 minutes
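Several of these alerts only fire after a condition has held for some time. The sketch below models that behaviour as a small state machine: a reading above the threshold makes the alert pending, and it fires only once the condition has held for the full duration. It assumes the "for 5 minutes" wording corresponds to a hold duration on the rule; `ThresholdRule` and the sample readings are hypothetical, and the real definitions live in alerts/ampel.yml.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq)]
enum AlertState {
    Inactive,
    Pending,
    Firing,
}

/// Hypothetical model of a threshold rule with a hold duration.
struct ThresholdRule {
    threshold: f64,
    hold_for: Duration,
    // Evaluation time at which the condition first became true, if it still holds.
    active_since: Option<Duration>,
}

impl ThresholdRule {
    fn new(threshold: f64, hold_for: Duration) -> Self {
        Self { threshold, hold_for, active_since: None }
    }

    /// `now` is the time elapsed since monitoring started; `value` is the current reading.
    fn evaluate(&mut self, now: Duration, value: f64) -> AlertState {
        if value <= self.threshold {
            // Condition cleared: forget any pending period.
            self.active_since = None;
            return AlertState::Inactive;
        }
        let since = *self.active_since.get_or_insert(now);
        if now.saturating_sub(since) >= self.hold_for {
            AlertState::Firing
        } else {
            AlertState::Pending
        }
    }
}

fn main() {
    // Error rate above 5% must hold for 5 minutes; one evaluation per minute.
    let mut rule = ThresholdRule::new(5.0, Duration::from_secs(300));
    let readings = [2.0, 6.0, 7.0, 8.0, 9.0, 6.5, 7.0, 3.0];
    for (i, pct) in readings.iter().enumerate() {
        let now = Duration::from_secs(60 * i as u64);
        println!("t={:>3}s error_rate={pct}% -> {:?}", now.as_secs(), rule.evaluate(now, *pct));
    }
}
```

A single reading back under the threshold resets the pending period, which is why short spikes do not page anyone.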

Dashboards

Ampel Overview

Main dashboard showing:

HTTP request rate by endpoint
Request duration (P95)
HTTP status code distribution
Database connections
Active pull requests

Creating Custom Dashboards

Open Grafana at http://localhost:3000
Create a new dashboard
Add panels with PromQL queries
Export the dashboard JSON to grafana/dashboards/

Example PromQL queries:

# Request rate
rate(http_requests_total[5m])

# Error rate percentage
(sum(rate(http_requests_total{status=~"5.."}[5m]))
/
sum(rate(http_requests_total[5m]))) * 100

# P95 latency (one result per label combination; aggregate the rate with sum by (le) for a single value)
histogram_quantile(0.95,
  rate(http_request_duration_seconds_bucket[5m])
)

# Database connections by database
pg_stat_database_numbackends
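The P95 query returns an estimate, not an exact percentile, because a histogram only stores cumulative bucket counts. The sketch below shows the general idea of interpolating linearly inside the bucket that contains the target rank. It is a simplified illustration, not Prometheus's actual implementation; the `Bucket` type, the `estimate_quantile` function, and the bucket values are made up.

```rust
/// One cumulative histogram bucket: number of observations with value <= `le`.
struct Bucket {
    le: f64,
    count: f64,
}

/// Estimates quantile `q` (0.0..=1.0) from cumulative buckets sorted by `le`,
/// where the last bucket has `le = +Inf`. Returns None if there is no data.
fn estimate_quantile(q: f64, buckets: &[Bucket]) -> Option<f64> {
    if buckets.len() < 2 {
        return None;
    }
    let total = buckets.last()?.count;
    if total == 0.0 {
        return None;
    }
    let rank = q * total;
    // First bucket whose cumulative count reaches the target rank.
    let idx = buckets.iter().position(|b| b.count >= rank)?;
    if idx == buckets.len() - 1 {
        // Rank falls in the +Inf bucket: report the highest finite bound.
        return Some(buckets[idx - 1].le);
    }
    let (lower, below) = if idx == 0 {
        (0.0, 0.0)
    } else {
        (buckets[idx - 1].le, buckets[idx - 1].count)
    };
    let in_bucket = buckets[idx].count - below;
    if in_bucket <= 0.0 {
        return Some(buckets[idx].le);
    }
    // Assume observations are spread evenly between the bucket bounds.
    Some(lower + (buckets[idx].le - lower) * (rank - below) / in_bucket)
}

fn main() {
    // Made-up request durations: 50 under 0.1s, 90 under 0.5s, 98 under 1s, 100 total.
    let buckets = [
        Bucket { le: 0.1, count: 50.0 },
        Bucket { le: 0.5, count: 90.0 },
        Bucket { le: 1.0, count: 98.0 },
        Bucket { le: f64::INFINITY, count: 100.0 },
    ];
    if let Some(p95) = estimate_quantile(0.95, &buckets) {
        println!("estimated P95: {p95:.4}s");
    }
}
```

The accuracy of the estimate depends on the bucket boundaries, so buckets close to the latency thresholds you alert on give more useful P95 values.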

Configuration

Prometheus Scrape Intervals

Edit prometheus.yml:

global:
  scrape_interval: 15s # How often to scrape targets
  evaluation_interval: 15s # How often to evaluate rules

Retention Settings

Default retention: 15 days (configurable in docker-compose)

To change it, update the prometheus command in docker/docker-compose.monitoring.yml:

prometheus:
  command:
    - '--storage.tsdb.retention.time=30d'
    - '--storage.tsdb.retention.size=10GB'
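When picking retention values, a rough disk estimate helps. The sketch below applies the rule of thumb from the Prometheus storage documentation: disk space is roughly retention time in seconds, times ingested samples per second, times bytes per sample (typically 1 to 2 bytes). The series count and the `estimate_disk_bytes` helper are hypothetical; substitute figures from your own deployment.

```rust
/// Rough disk estimate:
/// retention_seconds * ingested_samples_per_second * bytes_per_sample.
fn estimate_disk_bytes(
    active_series: u64,
    scrape_interval_secs: u64,
    retention_days: u64,
    bytes_per_sample: f64,
) -> f64 {
    // Each active series yields one sample per scrape.
    let samples_per_sec = active_series as f64 / scrape_interval_secs as f64;
    let retention_secs = (retention_days * 24 * 60 * 60) as f64;
    retention_secs * samples_per_sec * bytes_per_sample
}

fn main() {
    // Hypothetical load: 10,000 active series, 2 bytes per sample.
    for (interval, days) in [(15, 15), (15, 30), (30, 30)] {
        let bytes = estimate_disk_bytes(10_000, interval, days, 2.0);
        println!(
            "scrape every {interval}s, keep {days}d -> about {:.2} GB",
            bytes / 1e9
        );
    }
}
```

Doubling the scrape interval halves the estimate, which is the same lever the scrape_interval setting exposes.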

Alertmanager (Optional)

To add alert notifications:

Add alertmanager service to docker-compose
Update prometheus.yml alerting section
Configure notification channels (email, Slack, PagerDuty)

Example alertmanager config:

# A root route naming a default receiver is required
route:
  receiver: 'team-notifications'

receivers:
  - name: 'team-notifications'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'
        channel: '#alerts'

Troubleshooting

Prometheus not scraping

Check targets page: http://localhost:9090/targets

Common issues:

Service not exposing /metrics endpoint
Wrong port in prometheus.yml
Network connectivity (check docker network)

Grafana dashboard empty

Verify Prometheus datasource: Configuration > Data Sources
Check data exists: Explore > Run query
Verify time range matches your data

High memory usage

Reduce scrape frequency or retention:

global:
  scrape_interval: 30s  # Increase from 15s

# Or add retention limits to the prometheus command in docker-compose
--storage.tsdb.retention.time=7d
--storage.tsdb.retention.size=5GB

Production Deployment

Security Checklist

Change Grafana admin password
Enable authentication on Prometheus
Restrict metrics endpoint to monitoring network
Use TLS for all connections
Set up alerting with on-call rotation
Configure backup for Prometheus data

Fly.io Integration

For Fly.io deployments, metrics are automatically available:

# View Fly.io metrics
fly dashboard -a ampel-api

# Configure /metrics endpoint
# Add to fly.toml:
[metrics]
  port = 8080
  path = "/metrics"

See docs/observability/OBSERVABILITY.md for details.
