A machine learning experiment management system with a microservices architecture, featuring Kafka-based messaging and three-tier service separation.
GigaEvo Platform consists of three main components:

**Master API**

- Role: Experiment orchestration and coordination
- Technology: FastAPI, Kafka, PostgreSQL, Redis
- Features:
  - Kafka integration for async messaging
  - Experiment lifecycle management
  - Configuration storage and retrieval
  - uv-based dependency management
**Runner API**

- Role: Task execution with GigaEvolve integration
- Technology: FastAPI, GigaEvolve tools
- Features:
  - Experiment code execution
  - Results visualization
  - Best program extraction
  - Background task processing
**WebUI**

- Role: Gradio-based user interface
- Technology: Gradio, Plotly, Requests
- Features:
  - Interactive experiment creation
  - Real-time progress monitoring
  - Results visualization
  - System status dashboard
Prerequisites:

- Docker & Docker Compose
- Python 3.12+ (for local development)
- uv (recommended) or pip
The GigaEvo platform reads all LLM settings from a single repo-level file, `llm_models.yml`. Create `llm_models.yml` from the `llm_models.yml.example` template and fill in your credentials.
GigaEvo Platform uses the deploy.sh script with Docker Compose for service orchestration:
```bash
make deploy
# Or directly:
./deploy.sh deploy
```

This will deploy with automated health checks:
- Infrastructure: PostgreSQL, Kafka (KRaft), Redis (2 instances), MinIO
- Applications: Master API, Runner API, WebUI
- Networking: Docker network and shared volumes
- Health Monitoring: Automatic service health verification
```bash
make dev

# Run services locally for development (requires infrastructure running)
make master-api  # Master API on port 8000
make runner-api  # Runner API on port 8001
make web-ui      # WebUI on port 7860

# Check all services status
make status
# Or:
./deploy.sh status

# Stop all services
make stop
# Or:
./deploy.sh stop

# Remove deploy containers and volumes
./deploy.sh clean

# Full cleanup for both dev and deploy
make clean

# Restart specific service
make restart SERVICE=master-api
make restart SERVICE=runner-api
make restart SERVICE=web-ui
make restart SERVICE=kafka
```
```bash
# View service logs
./deploy.sh logs [service-name]
```

Service endpoints:

- WebUI: http://localhost:${WEB_UI_HOST_PORT} (default 7860)
- Master API: http://localhost:${MASTER_API_HOST_PORT} (default 8000)
- Runner API: http://localhost:${RUNNER_API_HOST_PORT} (default 8001)
- MinIO Console: http://localhost:${MINIO_CONSOLE_HOST_PORT} (default 9001; credentials come from MINIO_ROOT_USER/MINIO_ROOT_PASSWORD)
- Kafka Broker: localhost:${KAFKA_HOST_PORT} (default 9092)
- Kafka UI: available in dev mode at http://localhost:${KAFKA_UI_HOST_PORT} (default 8080)
By default, the platform starts with a single runner instance. To run multiple experiments in parallel, increase the runner pool size:
```bash
# In .env file (or export before running `make deploy`)
RUNNER_POOL_SIZE=3  # Number of runner instances (default: 1)
```

The system automatically generates a docker-compose.runner-pool.*.generated.yml file with N runner services. All generated runner services reuse one shared runner image tagged from COMPOSE_PROJECT_NAME.
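The generated pool can be pictured as N near-identical Compose services sharing one image. The sketch below only illustrates that shape; the field values and helper are assumptions, not the repo's actual generator:

```python
def runner_pool_services(pool_size: int, image: str) -> dict:
    """Illustrative shape of the generated runner-pool Compose services."""
    return {
        f"runner-api-{i}": {
            "image": image,  # every instance reuses the one shared runner image
            "labels": {"com.docker.compose.service": f"runner-api-{i}"},
        }
        for i in range(1, pool_size + 1)
    }
```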
The main deployment-related environment variables for this repo are:
- `COMPOSE_PROJECT_NAME` for Compose resource naming and the shared runner image tag
- `GIGAEVO_NETWORK_NAME` for the shared Docker network used by the deploy stack
- `GIGAEVO_CORE_REPO_URL` / `GIGAEVO_CORE_REF` for the baked `gigaevo-core` runner image inputs
- `MEMORY_API_URL` for the external `gigaevo-memory` API endpoint as seen from runner containers
- `MINIO_ROOT_USER` and `MINIO_ROOT_PASSWORD` for MinIO server and platform storage access
- `*_HOST_PORT` values for published service ports
COMPOSE_PROJECT_NAME is required for supported container flows.
Example:

```bash
COMPOSE_PROJECT_NAME=gigaevo-platform
GIGAEVO_NETWORK_NAME=gigaevo-platform-network
MINIO_ROOT_USER=gigaevoadmin
MINIO_ROOT_PASSWORD=change-this-minio-password
POSTGRES_HOST_PORT=5432
REDIS_HOST_PORT=6379
REDIS_GIGAVOLVE_HOST_PORT=6380
KAFKA_HOST_PORT=9092
KAFKA_DOCKER_HOST_PORT=29092
MASTER_API_HOST_PORT=8000
RUNNER_API_HOST_PORT=8001
WEB_UI_HOST_PORT=7860
MINIO_API_HOST_PORT=9000
MINIO_CONSOLE_HOST_PORT=9001
KAFKA_UI_HOST_PORT=8080
GIGAEVO_CORE_REPO_URL=https://github.com/FusionBrainLab/gigaevo-core
GIGAEVO_CORE_REF=main
MEMORY_API_URL=http://host.docker.internal:8002
```

MEMORY_API_URL must be reachable from runner containers. For local development, http://host.docker.internal:8002 is the recommended default for an externally started gigaevo-memory API.
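Before starting runners, it can help to confirm the memory API actually answers. A minimal sketch using only the standard library — note that the `/health` path is an assumption, so substitute whatever endpoint your gigaevo-memory deployment exposes:

```python
import urllib.error
import urllib.request

def memory_api_reachable(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET to base_url's health endpoint succeeds."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```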
The WebUI "Runner Instances" tab calls the Master API (`/api/v1/instances/*`) to start/stop/restart runners and fetch container logs.
With a Compose-managed runner pool (RUNNER__MANAGE_CONTAINERS=false, the default in make dev/make deploy), Master controls the already-created runner-api-N containers via Docker.
Requirements:
- `master-api` has Docker CLI access via the host socket: mount `/var/run/docker.sock` (and ensure the container user can read/write it; otherwise run `master-api` as root or align the socket group).
- Runner containers are started by Docker Compose (Master finds them via `com.docker.compose.service=runner-api-N` labels; keep `COMPOSE_PROJECT_NAME` and `GIGAEVO_NETWORK_NAME` aligned with this deployment).
Security note: mounting the Docker socket grants the master-api container root-equivalent control over the host Docker engine.
Quick manual checks (requires the stack running):
```bash
bash ./smoke_runner_instances.sh status
bash ./smoke_runner_instances.sh health
bash ./smoke_runner_instances.sh logs runner-1 100
```

Master API endpoints:

- `POST /api/v1/experiments/` - Initialize experiment
- `GET /api/v1/experiments/` - Get list of experiments
- `GET /api/v1/experiments/{experiment_id}/status` - Request status
- `POST /api/v1/experiments/{experiment_id}/start` - Start experiment
- `POST /api/v1/experiments/{experiment_id}/stop` - Stop experiment
- `GET /api/v1/experiments/{experiment_id}/results` - Get results

Runner API endpoints:

- `POST /api/v1/experiments/{experiment_id}/upload` - Load experiment code
- `POST /api/v1/experiments/{experiment_id}/start` - Start experiment
- `POST /api/v1/experiments/{experiment_id}/stop` - Stop experiment
- `GET /api/v1/experiments/{experiment_id}/status` - Get execution status
- `GET /api/v1/experiments/{experiment_id}/visualization` - Get visualization
- `GET /api/v1/experiments/{experiment_id}/best-program` - Get best program
- `GET /api/v1/experiments/{experiment_id}/logs` - Get logs (optional)
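The experiment lifecycle (initialize → start → poll status → fetch results) can be sketched with a small helper that composes the endpoint URLs listed above. The base URL uses the Master API default port from this README; everything else is an illustrative assumption:

```python
BASE = "http://localhost:8000/api/v1"  # Master API default host port

def experiment_endpoint(experiment_id: str = "", action: str = "") -> str:
    """Compose one of the experiment endpoint URLs listed above."""
    url = f"{BASE}/experiments/"
    if experiment_id:
        url += experiment_id
        if action:
            url += f"/{action}"
    return url

# Typical call order once the stack is up (POST/GET via any HTTP client):
#   POST experiment_endpoint()                  -> initialize, returns an id
#   POST experiment_endpoint(exp_id, "start")   -> start execution
#   GET  experiment_endpoint(exp_id, "status")  -> poll until finished
#   GET  experiment_endpoint(exp_id, "results") -> fetch results
```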
The system uses these Kafka topics for coordination:
- `experiment-config` - Experiment configuration received
- `experiment-prepared` - Experiment prepared for execution
- `experiment-started` - Experiment execution started
- `experiment-stopped` - Experiment execution stopped
- `runner-status` - Runner status updates
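One way a consumer might route these topics to handlers is a simple registry-and-dispatch pattern. This is a toy sketch — the handler body and payload fields are illustrative, and the real services consume these topics through Kafka clients:

```python
HANDLERS: dict = {}

def on(topic: str):
    """Register a handler for one of the coordination topics above."""
    def register(fn):
        HANDLERS[topic] = fn
        return fn
    return register

@on("experiment-started")
def handle_started(payload: dict) -> str:
    # Payload shape is assumed for illustration.
    return f"experiment {payload['experiment_id']} is running"

def dispatch(topic: str, payload: dict):
    """Route a consumed message to the handler registered for its topic."""
    if topic not in HANDLERS:
        raise KeyError(f"no handler registered for topic {topic!r}")
    return HANDLERS[topic](payload)
```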
```bash
# Verify required local tools (docker, docker compose, python3)
make check-tools

# Install all dependencies
make install

# Run services individually (infrastructure must be running first)
make master-api  # Master API on port 8000
make runner-api  # Runner API on port 8001
make web-ui      # WebUI on port 7860

# Development with hot reload (foreground)
make dev

# Stop containers
make stop

# Stop foreground session
# Ctrl+C

# Full cleanup (including volumes)
make clean-dev

make check-runner-compose  # Validate the pool resolves one shared runner image
make lint    # Run linting with ruff
make format  # Format code with ruff
make test    # Run tests (individual components)

make db-reset    # Drop and recreate database
make db-migrate  # Run database migrations
```
- Port Conflicts: Ensure these ports are free:
  - 5432: PostgreSQL
  - 6379, 6380: Redis (2 instances)
  - 7860: WebUI
  - 8000: Master API
  - 8001: Runner API
  - 9000, 9001: MinIO
  - 9092, 29092: Kafka
- Deployment Issues:

  ```bash
  # Check deployment status
  ./deploy.sh status
  # Or:
  make status

  # View service logs
  ./deploy.sh logs [service-name]
  # Or for all services:
  ./deploy.sh logs

  # Restart specific service
  make restart SERVICE=master-api
  make restart SERVICE=runner-api
  make restart SERVICE=web-ui
  make restart SERVICE=kafka
  ```
- Service Health Check Failures:

  ```bash
  # The deploy script automatically checks service health.
  # If services fail to start, check logs:
  ./deploy.sh logs postgres
  ./deploy.sh logs kafka
  ./deploy.sh logs master-api
  ```
- Database Connection Issues:

  ```bash
  # Reset database (use after schema changes)
  make db-reset
  # Check PostgreSQL logs
  ./deploy.sh logs postgres
  ```
Key environment variables for Master API:
- `DATABASE__URL` - PostgreSQL connection string
- `KAFKA__BOOTSTRAP_SERVERS` - Kafka bootstrap servers
- `REDIS_URL` - Redis connection URL
- `STORAGE__ENDPOINT_URL` - MinIO endpoint
- `STORAGE__ACCESS_KEY` - MinIO access key
- `STORAGE__SECRET_KEY` - MinIO secret key
In the provided Compose setup, STORAGE__ACCESS_KEY and STORAGE__SECRET_KEY are populated from MINIO_ROOT_USER and MINIO_ROOT_PASSWORD.
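The double-underscore names above look like the common pydantic-settings convention of using `__` as a nested-field delimiter. A sketch of that mapping, under that assumption (the helper is illustrative, not the repo's settings code):

```python
def nest_env(env: dict) -> dict:
    """Turn {'STORAGE__ACCESS_KEY': 'x'} into {'storage': {'access_key': 'x'}}."""
    out: dict = {}
    for key, value in env.items():
        parts = [part.lower() for part in key.split("__")]
        node = out
        for part in parts[:-1]:
            node = node.setdefault(part, {})  # descend, creating nested dicts
        node[parts[-1]] = value
    return out
```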
The platform uses a modern microservices architecture with:
- Kafka Message Broker - Asynchronous service communication with topics for experiment coordination
- Separate Docker Compose Files - Modular deployment with infrastructure and application services
- Health Monitoring - Automated service health checks and recovery
- Resource Isolation - Dedicated Redis instances and MinIO storage
- uv Dependency Management - Fast package installation and dependency caching
- deploy.sh: Main deployment script with health checks and service management
- docker-compose.kafka.yml: Core infrastructure services
- docker-compose.*.yml: Individual application service configurations
- Makefile: Development commands and shortcuts
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests and linting: `make test && make lint`
- Submit a pull request
- For deployment from a prebuilt image bundle, see docs/bundle.md.
MIT License - see LICENSE file for details.