The API provides the web interface and REST endpoints for managing the itsup infrastructure.
Purpose: Web-based management interface for infrastructure operations.
Technology: Python FastAPI application running as a host process (not containerized).
Access: https://api.srv.instrukt.ai
Port: 8080 (local only, proxied through Traefik)
The API runs as a host process for several reasons:
- Direct System Access: Needs to manage Docker, systemd services, and system files
- Zero-Downtime Deployments: Traefik runs on host network for scaling deployments
- Operational Flexibility: Easier debugging and log access during incidents
- Consistency: CLI and API share the same codebase and environment
# projects/itsup/ingress.yml
enabled: true
host: 192.168.1.x # Router IP (dynamic)
ingress:
  - service: api
    domain: api.srv.instrukt.ai
    port: 8080
    router: http
This configuration:
- Has no docker-compose.yml (no containers)
- Skips artifact generation
- Still generates Traefik routing config for reverse proxy
Location: api/ directory (exact structure TBD)
Key Features:
- REST API for infrastructure operations
- Web UI for monitoring and management
- Real-time log streaming
- Project deployment triggers
- Health checks and status monitoring
Endpoints (examples, actual API may vary):
GET /health # Health check
GET /projects # List all projects
POST /projects/{name}/deploy # Deploy project
GET /projects/{name}/logs # Stream project logs
GET /stacks/{name}/status # Stack status
POST /stacks/{name}/restart # Restart stack
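As a rough illustration of how a script might call the example endpoints above, the sketch below builds (but does not send) requests with the standard library. The helper name `build_request` and the `BASE_URL` constant are hypothetical, the base URL assumes the local port 8080 described earlier, and the real API may require authentication headers that are not shown here.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumption: the API listens locally on port 8080, as described in this document.
BASE_URL = "http://localhost:8080"


def build_request(method: str, template: str, name: str = "") -> Request:
    """Build (but do not send) a request for one of the example endpoints.

    `template` is a path such as "/projects/{name}/deploy".
    """
    # quote(..., safe="") also escapes "/", so a name cannot add path segments.
    path = template.replace("{name}", quote(name, safe=""))
    return Request(BASE_URL + path, method=method)


if __name__ == "__main__":
    req = build_request("POST", "/projects/{name}/deploy", "my-project")
    print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it. Escaping the name on the client side does not replace validation on the server.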
Start: bin/start-api.sh (systemd service recommended)
Stop: Kill Python process or use systemd
Logs: logs/api.log (rotated via logrotate)
# From project root
bin/start-api.sh
Create /etc/systemd/system/itsup-api.service:
[Unit]
Description=itsUP Infrastructure API
After=network.target docker.service
[Service]
Type=simple
User=morriz
WorkingDirectory=/home/morriz/srv
ExecStart=/home/morriz/srv/bin/start-api.sh
Restart=always
RestartSec=10
StandardOutput=append:/home/morriz/srv/logs/api.log
StandardError=append:/home/morriz/srv/logs/api.log
[Install]
WantedBy=multi-user.target
Enable and start:
sudo systemctl daemon-reload
sudo systemctl enable itsup-api
sudo systemctl start itsup-api
sudo systemctl status itsup-api
Requires:
- Python virtual environment (.venv/)
- secrets/itsup.txt for environment variables
- Proxy stack (Traefik for routing)
- Docker daemon
Start Order:
- DNS stack (creates proxynet network)
- Proxy stack (Traefik)
- API (can start anytime after proxy is up)
Loaded from secrets/itsup.txt:
# API-specific secrets
API_SECRET_KEY=...
API_ADMIN_TOKEN=...
# Shared infrastructure secrets
ROUTER_IP=...
The API inherits environment from:
- Shell environment (from bin/start-api.sh)
- secrets/itsup.txt (loaded via lib/data.py:get_env_with_secrets())
- .env file (if present)
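The layering of environment sources can be sketched as follows. This is not the actual `lib/data.py:get_env_with_secrets()` implementation, which is not shown in this document. The function names are hypothetical, and the precedence (shell environment overrides values from the secrets file) is an assumption made for illustration.

```python
from __future__ import annotations

import os
from pathlib import Path
from typing import Optional


def parse_env_file(path: Path) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def env_with_secrets(
    secrets_file: Path, base: Optional[dict[str, str]] = None
) -> dict[str, str]:
    """Merge secrets with the process environment.

    Assumed precedence: values already in the environment win over the file.
    """
    env = parse_env_file(secrets_file) if secrets_file.exists() else {}
    env.update(os.environ if base is None else base)
    return env
```

Unlike `export $(... | xargs)`, parsing the file this way does not pass values through shell word splitting.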
File: logs/api.log
Format: Structured JSON or plain text (application-dependent)
Rotation:
- Method: copytruncate (Python process keeps writing to the same file)
- Size: 10M per rotation
- Keep: 5 rotations
- Compression: gzip (delayed)
View:
tail -f logs/api.log # Follow live logs
grep "ERROR" logs/api.log # Search for errors
zgrep "pattern" logs/api.log*.gz # Search compressed logs
See Logging Documentation for details.
API should implement authentication for all management endpoints:
- Admin token from secrets/itsup.txt
- Session-based auth for web UI
- API key auth for programmatic access
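A minimal sketch of the admin-token check, assuming the token is exposed to the process as `API_ADMIN_TOKEN` (the variable name comes from the secrets listing in this document; the function name `is_admin_token` is hypothetical). It uses a constant-time comparison and fails closed when no token is configured.

```python
import hmac
import os
from typing import Optional


def is_admin_token(presented: str, expected: Optional[str] = None) -> bool:
    """Return True only if the presented token matches the configured one."""
    if expected is None:
        expected = os.environ.get("API_ADMIN_TOKEN", "")
    if not expected:
        # Fail closed: an unset or empty token must never grant access.
        return False
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(presented.encode(), expected.encode())
```

How the token reaches the API (header, cookie, query parameter) is left open here, since the document does not specify it.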
Recommended RBAC model:
- Admin: Full access (deploy, restart, configure)
- Operator: Read/monitor access + safe operations (logs, status)
- Viewer: Read-only access
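The role model above could be expressed as a simple permission table. The action names (`read`, `logs`, `status`, `deploy`, `restart`, `configure`) are illustrative labels derived from the descriptions above, not identifiers from the actual codebase.

```python
from enum import Enum


class Role(Enum):
    ADMIN = "admin"
    OPERATOR = "operator"
    VIEWER = "viewer"


# Hypothetical action names; each role lists everything it may do.
PERMISSIONS = {
    Role.VIEWER: {"read"},
    Role.OPERATOR: {"read", "logs", "status"},
    Role.ADMIN: {"read", "logs", "status", "deploy", "restart", "configure"},
}


def is_allowed(role: Role, action: str) -> bool:
    """Return True if the role's permission set contains the action."""
    return action in PERMISSIONS.get(role, set())
```

Keeping the table explicit, rather than deriving it from a role hierarchy, makes it easy to audit what each role can do.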
- Bind Address: 127.0.0.1 (local only)
- External Access: Only via Traefik reverse proxy
- TLS: Terminated at Traefik (Let's Encrypt certificates)
- Rate Limiting: Should be configured via Traefik middleware
API must validate all inputs to prevent:
- Command injection (especially for project names, service names)
- Path traversal (file access endpoints)
- Resource exhaustion (unbounded log streaming)
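The three risks above map to three small checks, sketched below under stated assumptions: the naming rule in `NAME_RE` (lowercase letters, digits, `-`, `_`, at most 63 characters) is a guess rather than the project's actual rule, and the default cap of 1000 log lines is arbitrary. All function names are hypothetical.

```python
import re
from pathlib import Path

# Assumed naming rule; adjust to whatever the project actually allows.
NAME_RE = re.compile(r"[a-z0-9][a-z0-9_-]{0,62}")


def validate_name(name: str) -> str:
    """Reject names that could smuggle shell syntax or path separators."""
    if not NAME_RE.fullmatch(name):
        raise ValueError(f"invalid name: {name!r}")
    return name


def resolve_under(base: Path, relative: str) -> Path:
    """Resolve `relative` inside `base`, refusing anything that escapes it."""
    base = base.resolve()
    target = (base / relative).resolve()
    if target != base and base not in target.parents:
        raise ValueError("path escapes base directory")
    return target


def clamp_lines(requested: int, maximum: int = 1000) -> int:
    """Bound the number of log lines a single request may ask for."""
    return max(1, min(requested, maximum))
```

Even with validated names, passing arguments to `subprocess.run` as a list (without `shell=True`) avoids shell interpretation entirely.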
Endpoint: GET /health
Traefik Configuration:
# In proxy/traefik/api-log.conf.yaml
http:
  services:
    itsup-api:
      loadBalancer:
        servers:
          - url: "http://192.168.1.x:8080"
        healthCheck:
          path: /health
          interval: 30s
          timeout: 5s
Consider adding metrics endpoints:
- Request rates and latencies
- Deployment success/failure counts
- Active project status
- Container resource usage
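As one possible starting point for the first two items, the sketch below keeps request and deployment counters in memory. The class name `Metrics`, its methods, and the key names are hypothetical. A real deployment would more likely expose these through a Prometheus client library, which is not shown here.

```python
import threading
from collections import Counter, defaultdict


class Metrics:
    """Thread-safe in-memory counters for requests and deployments."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counts: Counter = Counter()
        self._latency_sum: defaultdict = defaultdict(float)

    def observe_request(self, route: str, seconds: float) -> None:
        """Count one request and add its duration to the route's total."""
        with self._lock:
            self._counts[f"requests:{route}"] += 1
            self._latency_sum[route] += seconds

    def record_deployment(self, success: bool) -> None:
        """Count one deployment as a success or a failure."""
        with self._lock:
            key = "deploy_success" if success else "deploy_failure"
            self._counts[key] += 1

    def snapshot(self) -> dict:
        """Return current counts and the average latency per route."""
        with self._lock:
            avg = {
                route: total / self._counts[f"requests:{route}"]
                for route, total in self._latency_sum.items()
            }
            return {"counts": dict(self._counts), "avg_latency_seconds": avg}
```

The counters reset whenever the API restarts, so this suits quick diagnostics rather than long-term trends.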
Check if running:
ps aux | grep "python.*api"
sudo systemctl status itsup-api # If using systemd
Check port binding:
netstat -tlnp | grep :8080
# Should show Python process listening on 127.0.0.1:8080
Check logs:
tail -100 logs/api.log
# Look for startup errors, exceptions
Verify Traefik routing:
itsup proxy logs traefik | grep itsup-api
# Should show service registered
Check ingress config:
cat projects/itsup/ingress.yml
# Verify domain, port, host IP correct
Test direct access (should work):
curl http://localhost:8080/health
Test via Traefik (should work):
curl -H "Host: api.srv.instrukt.ai" http://localhost/health
Python processes can accumulate memory over time:
Solution: Restart the API
sudo systemctl restart itsup-api
Prevention: Consider setting memory limits in systemd:
[Service]
MemoryMax=512M
MemoryHigh=400M
If API stops logging after rotation:
Check copytruncate is enabled in /etc/logrotate.d/itsup:
/home/morriz/srv/logs/api.log {
copytruncate # Required for Python processes
...
}
Nothing signals the API to reopen its log file after rotation (systemd keeps the file open via StandardOutput=append:), so copytruncate is mandatory.
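The reason copytruncate works with a long-lived writer can be demonstrated with a short simulation. It assumes the log is opened in append mode, which is what systemd's `append:` output setting does. The function name `simulate_copytruncate` and the `.1` suffix are illustrative only, and this is not how logrotate itself is implemented.

```python
import os
import shutil
from typing import Tuple


def simulate_copytruncate(log_path: str) -> Tuple[str, str]:
    """Write, copy and truncate the file in place, then write again.

    Returns (rotated_content, live_content).
    """
    rotated = log_path + ".1"
    with open(log_path, "a") as writer:  # append mode, like systemd's append:
        writer.write("before rotation\n")
        writer.flush()
        # What copytruncate does: copy the file, then truncate the original.
        shutil.copy(log_path, rotated)
        os.truncate(log_path, 0)
        # The same open handle keeps working; append mode writes at the new end.
        writer.write("after rotation\n")
        writer.flush()
    with open(rotated) as f_old, open(log_path) as f_new:
        return f_old.read(), f_new.read()
```

If the file were opened in plain write mode instead, the writer would continue at its old offset after truncation and leave a gap of null bytes at the start of the file.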
# Activate venv
source .venv/bin/activate
# Load secrets
export $(grep -v '^#' secrets/itsup.txt | xargs)
# Run API directly
python -m api.main # Or however API is structured
api/
├── main.py # FastAPI app entry point
├── routes/
│   ├── projects.py # Project management endpoints
│   ├── stacks.py # Stack management endpoints
│   └── health.py # Health check endpoint
├── models/
│   └── schemas.py # Pydantic models
└── services/
    ├── docker.py # Docker operations wrapper
    └── deploy.py # Deployment logic
API should have comprehensive tests:
# Unit tests
bin/test.sh # Runs all *_test.py files
# Integration tests (if API has them)
pytest api/tests/integration/
- WebSocket Support: Real-time log streaming and status updates
- Metrics Collection: Prometheus-compatible metrics endpoint
- Event Streaming: Server-sent events for deployment progress
- Audit Logging: Track all management operations with user attribution
- Role-Based Access: Fine-grained permission system
- API Documentation: OpenAPI/Swagger UI for interactive API docs