Harshit Mani Tripathi
SRE / Infrastructure Engineer
I automate infrastructure, observe everything,
and break things on purpose.
About
I'm a DevOps intern at Unstop (since March 2026), where I work on infrastructure automation and cloud deployments. I also maintain a hybrid homelab where I design and run infrastructure end to end. Most experiments start there before they become repeatable systems.
The homelab pairs bare-metal Proxmox on-prem with Oracle Cloud Infrastructure, connected through a WireGuard mesh. I use it to test clustering, networking, identity, and observability at realistic scale.
I am focused on making infrastructure boring in the best way: automated, observable, and reproducible from a clean checkout. If a fix requires repeated manual SSH steps, the system is not finished yet.
Experience
March 2026 · Present
DevOps Intern
Unstop
- Built a Kubernetes-based developer platform POC on AWS EKS using Terraform and Helm, provisioning isolated workspaces for 100+ users
- Debugged a silent PVC failure traced to missing IRSA permissions on the EBS CSI driver
- Refactored Terraform modules to decouple cluster infrastructure from the application layer
- Work across AWS, Kubernetes, and infrastructure-as-code daily
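For context on the IRSA item above: IRSA (IAM Roles for Service Accounts) lets a pod assume an IAM role through the cluster's OIDC provider, and the EBS CSI controller cannot create volumes without such a role, which leaves PVCs pending. The sketch below shows the general shape of the trust policy involved. It is illustrative only and not the configuration used at Unstop: the account ID and OIDC provider are placeholders, the helper name `irsa_trust_policy` is hypothetical, and `kube-system` / `ebs-csi-controller-sa` are the add-on's usual defaults, assumed here.

```python
import json


def irsa_trust_policy(account_id, oidc_provider, namespace, service_account):
    """Build an IAM trust policy that lets one Kubernetes service account assume a role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The cluster's OIDC provider is the federated principal.
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                # Restrict the role to a single service account in a single namespace.
                "Condition": {
                    "StringEquals": {
                        f"{oidc_provider}:aud": "sts.amazonaws.com",
                        f"{oidc_provider}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                    }
                },
            }
        ],
    }


policy = irsa_trust_policy(
    account_id="111122223333",  # placeholder account ID
    oidc_provider="oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE",  # placeholder provider
    namespace="kube-system",  # assumed default namespace for the driver
    service_account="ebs-csi-controller-sa",  # assumed default service account name
)
print(json.dumps(policy, indent=2))
```

The role also needs a permissions policy for the EBS API calls themselves (AWS publishes a managed `AmazonEBSCSIDriverPolicy` for this), and the service account must carry the role ARN annotation for the binding to take effect.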
Skills
Infrastructure & Cloud
IaC & Automation
SRE & Kubernetes
Observability
Networking & Identity
Languages
Projects
- Proxmox and Oracle Cloud Infrastructure hybrid setup connected via a WireGuard mesh. Keycloak handles SSO across all services and the entire state is managed through OpenTofu with HCP remote state, so a full rebuild takes a single run.
- A k3s cluster with three control-plane nodes and three worker nodes running on Proxmox VMs, automated end to end. Terraform provisions the VMs, Ansible configures the OS and bootstraps k3s, and Bash glues the handoffs together.
- Full LGTM stack deployed on the homelab. Prometheus scrapes metrics, Loki aggregates logs, Tempo handles distributed traces, and Grafana ties it all into one pane with unified alerting.
- Self-hosted Unbound recursive resolver processing roughly 500k queries per month at a 75% cache hit rate and 40 to 80 ms p50 latency, integrated with split-horizon zones for internal service discovery.
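To put the resolver figures in perspective, here is a back-of-envelope sketch using only the approximate numbers quoted above. The 30-day month and the function name `resolver_load` are assumptions made for illustration; nothing here is measured data.

```python
def resolver_load(queries_per_month, cache_hit_rate, days=30):
    """Return (cache_misses_per_month, average_qps, average_miss_qps)."""
    seconds = days * 24 * 60 * 60
    # Queries that miss the cache are the ones needing full recursion upstream.
    misses = round(queries_per_month * (1.0 - cache_hit_rate))
    return misses, queries_per_month / seconds, misses / seconds


# Approximate figures from the project description: ~500k queries, 75% hit rate.
misses, qps, miss_qps = resolver_load(500_000, 0.75)
print(f"cache misses per month: {misses}")
print(f"average query rate:     {qps:.3f} qps")
print(f"average upstream rate:  {miss_qps:.3f} qps")
```

At these volumes about 125,000 queries per month need full recursion, so the cache hit rate affects perceived latency far more than it affects upstream load.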
Open Source
- Contributed favicon support, analytics integration, Lucide icon support, and two other features to this Go-based static site generator.
- Participated in cloud-native incident response simulations in May 2025 and December 2025, working through real-world failure scenarios under time pressure.