ResiliKube is a lightweight framework for automating fault-tolerance testing in Kubernetes. Its CLI is implemented in Python and uses shell scripts for direct cluster operations.
├── cli/
│ ├── __init__.py
│ ├── main.py
│ ├── cluster_initialization.py
│ ├── cluster_cleaning.py
│ ├── pod_failure_recovery_test.py
│ ├── rolling_update_and_rollback.py
│ ├── workflow.py
│ └── utils.py
├── chaos-mesh/
│ ├── pod-failure.yaml
│ └── network-partition.yaml
├── config/
│ └── config.properties
├── scripts/
│ ├── check_status.sh
│ ├── initialize_cluster.sh
│ ├── setup_chaos_mesh.sh
│ ├── setup_monitoring.sh
│ ├── inject_fault.sh
│ ├── create_namespace.sh
│ ├── deploy_nginx.sh
│ ├── start_rollout.sh
│ └── start_rollback.sh
├── setup.py
└── README.md
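The layout above pairs Python modules in cli/ with shell scripts in scripts/. As a rough illustration of how a Python command could hand work to one of those scripts, here is a minimal sketch. The helper names `build_script_command` and `run_script` are hypothetical and are not taken from the ResiliKube source; the sketch also assumes it is run from the repository root and that `bash` is on the PATH (on Windows, the Git Bash executable could be passed instead).

```python
import subprocess
from pathlib import Path

# Assumption: the current working directory is the repository root,
# so the shell scripts are found under "scripts/".
SCRIPTS_DIR = Path("scripts")


def build_script_command(script_name, *args, bash="bash"):
    """Return the argv list for running one of the scripts under bash.

    Forward slashes are used so the same path works under Git Bash.
    """
    return [bash, (SCRIPTS_DIR / script_name).as_posix(), *args]


def run_script(script_name, *args, bash="bash"):
    """Run a script and return its stdout.

    Raises subprocess.CalledProcessError if the script exits non-zero.
    """
    result = subprocess.run(
        build_script_command(script_name, *args, bash=bash),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, `run_script("check_status.sh")` would be roughly equivalent to running `bash scripts/check_status.sh` by hand. Keeping command construction in its own function makes it easy to inspect or test without touching a cluster.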
- Python 3.x
- Minikube
- kubectl
- Helm
- Git Bash (for running shell scripts on Windows)
pip install -e .
reskube init
reskube clean
reskube test_pod_failure
reskube rolling_update_and_rollback
reskube workflow pod-failure
reskube workflow rolling-update-and-rollback
bash scripts/setup_chaos_mesh.sh
bash scripts/setup_monitoring.sh
bash scripts/check_status.sh
Update the config/config.properties file to configure Minikube and, if you are on Windows, Git Bash.
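To show how a properties file like this can be read from Python, here is a minimal sketch. It is not ResiliKube's actual loader, and the keys in the usage example (`minikube.driver`, `git.bash.path`) are hypothetical placeholders; check config/config.properties for the real key names. The parser only handles plain `key=value` lines and comments, not backslash escapes or line continuations.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # ignore lines that have no "=" separator
        props[key.strip()] = value.strip()
    return props


def load_properties(path="config/config.properties"):
    """Read and parse the properties file at the given path."""
    with open(path, encoding="utf-8") as handle:
        return parse_properties(handle.read())


# Usage with hypothetical keys (not the project's real ones):
sample = """
# example only
minikube.driver = docker
git.bash.path=C:/Program Files/Git/bin/bash.exe
"""
settings = parse_properties(sample)
print(settings["minikube.driver"])
```

Splitting on the first `=` only means values that themselves contain `=` are preserved intact.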
- Run the pod-failure experiment workflow:
reskube workflow pod-failure
- Observe the results in Grafana (http://localhost:3000) and the NGINX logs.
- Run the rolling-update-and-rollback experiment workflow:
reskube workflow rolling-update-and-rollback
- Observe the results in Grafana (http://localhost:3000) and the NGINX logs.
- Run the persistence experiment workflow:
reskube workflow persistence
- Observe the results in Grafana (http://localhost:3000) and query data from PostgreSQL to check persistence.
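Each experiment above ends with checking Grafana at http://localhost:3000. Before opening the dashboards, it can help to confirm that Grafana is reachable. The following is a minimal sketch, assuming the instance exposes Grafana's standard `/api/health` endpoint at that address; the function names are hypothetical and not part of ResiliKube.

```python
import json
import urllib.error
import urllib.request

GRAFANA_URL = "http://localhost:3000"  # address given in this README


def health_url(base_url=GRAFANA_URL):
    """Build the health-check URL for a Grafana base URL."""
    return base_url.rstrip("/") + "/api/health"


def is_healthy(payload):
    """Return True if a /api/health JSON body reports database 'ok'."""
    try:
        return json.loads(payload).get("database") == "ok"
    except (ValueError, AttributeError):
        # Not valid JSON, or valid JSON that is not an object.
        return False


def grafana_is_up(base_url=GRAFANA_URL, timeout=3.0):
    """Return True if Grafana answers the health check, False otherwise."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return resp.status == 200 and is_healthy(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError):
        return False
```

Calling `grafana_is_up()` returns `False` instead of raising when nothing is listening on port 3000, which makes it convenient as a pre-check before running a workflow.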