I specialize in building self-healing, scalable infrastructure and making complex distributed systems reliable. I am currently focusing on Kubernetes-native automation and high-availability database architectures.
- Cloud & Infrastructure: AWS (EKS, EC2, S3), Oracle Cloud Infrastructure (OCI).
- Orchestration: Kubernetes (StatefulSets, Operators, Service Mesh).
- Infrastructure as Code: Terraform, Ansible.
- CI/CD & Automation: Python, Jenkins, GitHub Actions, ArgoCD.
- Databases: PostgreSQL (High Availability, Streaming Replication, CNPG Operator), MySQL Galera.
- Monitoring: Prometheus, Grafana (Service Reliability & Alerting), Zabbix.
- HA-Database-on-K8s: Implementing 3-node PostgreSQL clusters with CloudNativePG, focusing on automated failover, leader election, and horizontal read-scaling.
- Chaos Engineering: Testing system resilience by simulating node failures and verifying that recovery meets recovery point objectives (RPO).
- Python for SRE: Building automation scripts for infrastructure health checks and log analysis.
- AI-augmented workflows: Running LLMs locally (Ollama) and building automation pipelines (n8n) to speed up incident triage, runbook generation, and infra documentation, keeping the boring parts automated.
- Experience: About 4 years in the Cloud/SRE and Private Cloud space.
- Learning: Currently mastering distributed systems at scale, reliability engineering, and operations.
- Fun Fact: Aspiring film analyst and Christopher Nolan fan.
- Reach me: [email protected] | Portfolio
"Simplicity is the prerequisite for reliability." — Edsger W. Dijkstra