👨‍💻 About Me

Cloud DevOps & SRE Engineer with 3+ years of hands-on experience designing, automating, and operating cloud-native infrastructure at scale. Passionate about building resilient systems, automating everything, and driving observability from day one.

  • 🔭 Working extensively on AWS, GCP, Azure & Huawei Cloud, with certifications on all four
  • ⚙️ Managing 50+ microservices on production Kubernetes clusters
  • 🏗️ Built HA Kubernetes clusters from scratch on on-premises VMs
  • 🐄 Managing multiple K8s clusters across on-prem and cloud, using Rancher as a centralized control plane
  • 🔄 GitOps with ArgoCD ApplicationSets: multi-app, multi-environment (dev/staging/prod) via Kustomize overlays
  • 🐘 Running PostgreSQL HA clusters in K8s using CloudNativePG (CNPG) with WAL archiving & PITR
  • ⬆️ Performed zero-downtime Kubernetes cluster upgrades (v1.30 → v1.34) across 4 minor versions on on-premises infrastructure
  • 📑 Deep expertise in full-stack observability: Prometheus, Grafana, Loki, Alertmanager & more
  • 🌱 Currently advancing skills in Platform Engineering & FinOps
  • 💬 Ask me about K8s, Docker, CI/CD, IaC, Monitoring, SRE practices
  • 📫 Reach me at [email protected]

🏅 Certifications

| Cloud Provider | Certification |
|---|---|
| ☁️ AWS | AWS Certified (Solutions Architect) |
| 🌐 Google Cloud | GCP Associate Cloud Engineer |
| 🔷 Microsoft Azure | Azure Fundamentals (AZ-900) |
| 🟥 Huawei Cloud | HCCDA |

🛠️ Tech Stack & Expertise

☁️ Cloud Platforms

AWS GCP Azure Huawei Cloud

🐳 Containers & Orchestration

Kubernetes Docker Rancher Helm Harbor

🔁 CI/CD & GitOps

GitHub Actions Jenkins ArgoCD GitLab CI Self-Hosted Runners Kustomize Argo Rollouts

📦 Infrastructure as Code

Terraform Ansible Pulumi

📊 Observability & SRE

Prometheus Grafana Loki Alertmanager

🗄️ Databases & Messaging

PostgreSQL MySQL MongoDB Redis Kafka CloudNativePG PgBouncer

🌐 Networking & Security

Nginx Istio Vault Cert Manager

🐧 OS & Scripting

Linux Bash Python Git


🚀 Key Projects & Highlights

🏗️ HA Kubernetes Cluster (On-Prem)

Designed and deployed a production-grade, highly available Kubernetes cluster on on-premises VMs with a multi-master setup, clustered etcd, and automated failover, all managed via Rancher.

🐄 Rancher: Multi-Cluster Kubernetes Management

Deployed and managed Rancher as a centralized control plane for multiple Kubernetes clusters across on-premises and cloud environments.

  • Imported and managed multiple K8s clusters (on-prem HA + cloud-managed) from a single Rancher dashboard
  • Configured role-based access control (RBAC) across clusters, mapping teams to namespaces and projects with fine-grained permissions
  • Used Rancher Projects to group namespaces and enforce resource quotas and network policies across clusters
  • Managed cluster catalogs and Helm app deployments via Rancher Apps & Marketplace
  • Monitored all clusters centrally using Rancher's integrated Prometheus & Grafana stack
  • Used Rancher to perform node pool scaling, OS upgrades, and certificate rotation without touching kubeconfig directly

🔭 Full-Stack Observability Platform

Built a complete observability stack: Prometheus for metrics, Grafana for dashboards, Loki + Promtail for logs, Alertmanager for notifications via Slack/PagerDuty, and Blackbox Exporter for external API/endpoint uptime monitoring.
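
Blackbox Exporter runs these endpoint checks itself; purely to illustrate what a single HTTP probe measures, here is a minimal standard-library Python sketch. The function `http_probe` and its dict keys are hypothetical: they loosely mirror the exporter's `probe_success`, `probe_http_status_code` and `probe_duration_seconds` metrics, and counting only 2xx responses as success is an assumption that matches the exporter's default for HTTP modules.

```python
import time
import urllib.error
import urllib.request


def http_probe(url: str, timeout: float = 5.0) -> dict:
    """Hypothetical blackbox-style HTTP check: success flag, status code, duration."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with a non-2xx status (e.g. 503).
        status = err.code
    except (urllib.error.URLError, OSError):
        # No usable answer: connection refused, DNS failure, timeout, ...
        status = 0
    duration = time.monotonic() - start
    return {
        "success": 1 if 200 <= status < 300 else 0,  # only 2xx counts as "up"
        "status": status,
        "duration_seconds": duration,
    }
```

With the real exporter, Prometheus instead scrapes its `/probe` endpoint, passing the `module` and `target` as query parameters.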

🐳 Private Container Registry β€” Harbor

Deployed and maintained a self-hosted Harbor registry with role-based access control, image vulnerability scanning, and replication policies integrated into CI/CD pipelines.

⚡ Self-Hosted GitHub Actions Runners

Configured ephemeral self-hosted runners on Kubernetes for secure, scalable CI/CD, reducing pipeline costs and enabling workloads that need access to private network resources.

🔄 GitOps: Multi-App, Multi-Environment at Scale

Implemented a GitOps platform with ArgoCD managing 50+ microservices across dev, staging, and production environments. ArgoCD ApplicationSets with Git directory generators auto-deploy new apps from the repo structure, so no per-app ArgoCD configuration has to be written by hand. Each environment maps to a dedicated Kustomize overlay in a monorepo (apps/<service>/overlays/<env>/), with Helm charts for third-party dependencies. Sync waves enforce deployment ordering, and health checks gate promotions between environments.

📁 gitops-repo/
├── apps/
│   ├── payment-service/
│   │   ├── base/
│   │   └── overlays/
│   │       ├── dev/        ← lower replicas, debug logging
│   │       ├── staging/    ← mirror of prod resources
│   │       └── production/ ← HPA, PDB, resource limits enforced
│   └── auth-service/ ...
├── infrastructure/
│   ├── monitoring/         ← Prometheus, Grafana, Loki stack
│   ├── ingress/            ← Nginx Ingress + cert-manager
│   └── cnpg/               ← CloudNativePG operator + clusters
└── applicationsets/        ← ArgoCD ApplicationSet manifests
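
To make the directory-generator idea concrete: ArgoCD evaluates the generator itself from the ApplicationSet manifest, but the discovery step can be emulated in a few lines of Python. This is a minimal sketch, assuming the `apps/*/overlays/*` layout shown above; `discover_apps`, `application_name` and the `<service>-<env>` naming convention are hypothetical and are not ArgoCD APIs.

```python
from pathlib import Path


def discover_apps(repo_root: str, pattern: str = "apps/*/overlays/*") -> list[dict]:
    """Emulate a Git directory generator: one parameter set per matching directory.

    apps/payment-service/overlays/dev becomes
    {"service": "payment-service", "env": "dev", "path": "apps/payment-service/overlays/dev"}.
    """
    root = Path(repo_root)
    params = []
    for path in sorted(root.glob(pattern)):
        if not path.is_dir():
            continue  # stray files (e.g. a README) never become applications
        rel = path.relative_to(root).as_posix()
        parts = rel.split("/")  # ["apps", <service>, "overlays", <env>]
        params.append({"service": parts[1], "env": parts[3], "path": rel})
    return params


def application_name(p: dict) -> str:
    # Hypothetical naming convention: <service>-<env>
    return f"{p['service']}-{p['env']}"
```

Adding a new `apps/<service>/overlays/<env>/` directory is enough for it to appear in the output, which is the property that removes per-app ArgoCD configuration; in a real ApplicationSet the same path segments would be templated into the Application name and source path.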

🐘 CloudNativePG (CNPG): PostgreSQL HA in Kubernetes

Deployed and operated the CloudNativePG operator to run highly available PostgreSQL clusters natively inside Kubernetes, replacing external managed database services for cost savings and full control.

  • Provisioned primary + 2 replica PostgreSQL clusters with streaming replication and automatic failover
  • Configured continuous WAL archiving to S3-compatible object storage for point-in-time recovery (PITR)
  • Managed scheduled backups, connection pooling via PgBouncer, and TLS-encrypted client connections
  • Integrated CNPG cluster credentials with an External Secrets Operator → HashiCorp Vault pipeline
  • Monitored replication lag, WAL sender/receiver status, and backup freshness via dedicated Grafana dashboards (CNPG community dashboard)
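
Point-in-time recovery restores the newest base backup taken before the recovery target and then replays archived WAL up to that target. The operator handles the actual restore; the Python sketch below only illustrates the backup-selection logic plus a simple freshness check of the kind a dashboard alert might encode. `BaseBackup`, `pick_base_backup`, `backup_is_stale` and the 24-hour threshold are hypothetical, and the sketch assumes the WAL archive is continuous from the chosen backup to the target.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class BaseBackup:
    name: str
    completed_at: datetime  # when the base backup finished (its consistent point)


def pick_base_backup(backups: list[BaseBackup], target: datetime) -> Optional[BaseBackup]:
    """Return the newest base backup completed at or before the recovery target.

    Recovery then replays archived WAL from that backup up to `target`.
    Returns None if the target predates every available backup.
    """
    eligible = [b for b in backups if b.completed_at <= target]
    return max(eligible, key=lambda b: b.completed_at, default=None)


def backup_is_stale(last_backup: datetime, now: datetime, max_age_hours: float = 24.0) -> bool:
    # Hypothetical freshness rule: alert when the last backup is older than the threshold.
    return (now - last_backup).total_seconds() > max_age_hours * 3600
```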

⬆️ Kubernetes Cluster Upgrade: v1.30 → v1.34

Performed a zero-downtime, rolling in-place upgrade of a production on-premises Kubernetes cluster across 4 minor versions (1.30 → 1.31 → 1.32 → 1.33 → 1.34), following Kubernetes' one-minor-version-at-a-time policy.

Upgrade sequence per version:

1. Review API deprecations & release notes for each target version
2. Upgrade kubeadm on the first control-plane node → run kubeadm upgrade apply for the target version
3. Upgrade the remaining control-plane nodes with kubeadm upgrade node (HA etcd stays healthy throughout)
4. Upgrade kubelet + kubectl on all control-plane nodes
5. For each worker node: cordon → drain → upgrade kubeadm/kubelet/kubectl → uncordon
6. Validate: kubectl get nodes, pod health, etcd member list, CNI/CSI compatibility
  • Pre-validated deprecated API removals (e.g., policy/v1beta1 PodSecurityPolicy, gone since 1.25; flowcontrol.apiserver.k8s.io/v1beta3, no longer served as of 1.32) and migrated manifests ahead of the upgrade
  • Verified the CNI plugin (Calico/Flannel) and CSI driver compatibility matrix before each hop
  • In parallel, upgraded Rancher-managed clusters through the Rancher UI, using its built-in node drain and upgrade orchestration
  • Validated workloads, Ingress, PVCs, and CNPG cluster health at every version boundary before proceeding
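
The one-minor-version-per-hop rule and the kubelet version-skew limit can be expressed as a small pre-flight check. This is a minimal Python sketch, assuming plain `vMAJOR.MINOR[.PATCH]` version strings; the helper names are hypothetical. The `max_skew=3` default reflects the Kubernetes version skew policy since v1.28, under which kubelet may be up to three minor versions older than kube-apiserver but never newer.

```python
import re


def parse_minor(version: str) -> tuple[int, int]:
    """Parse 'v1.30' or '1.30.4' into (major, minor)."""
    m = re.fullmatch(r"v?(\d+)\.(\d+)(?:\.\d+)?", version)
    if not m:
        raise ValueError(f"unrecognised version: {version!r}")
    return int(m.group(1)), int(m.group(2))


def upgrade_hops(current: str, target: str) -> list[str]:
    """List the minor versions to pass through, exactly one minor version per hop."""
    (cmaj, cmin), (tmaj, tmin) = parse_minor(current), parse_minor(target)
    if cmaj != tmaj or tmin < cmin:
        raise ValueError("only forward upgrades within one major version are supported")
    return [f"{cmaj}.{m}" for m in range(cmin + 1, tmin + 1)]


def kubelet_skew_ok(apiserver: str, kubelet: str, max_skew: int = 3) -> bool:
    """kubelet must not be newer than kube-apiserver and may lag by at most `max_skew` minors."""
    (amaj, amin), (kmaj, kmin) = parse_minor(apiserver), parse_minor(kubelet)
    return amaj == kmaj and 0 <= amin - kmin <= max_skew
```

For example, `upgrade_hops("v1.30", "v1.34")` returns `["1.31", "1.32", "1.33", "1.34"]`, the same four hops listed above, and `kubelet_skew_ok` can be run against every node before each control-plane hop.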

📐 SRE Practices I Follow

📏  SLO / SLA Definition     →  Error budgets for every critical service
🔍  Blameless Post-mortems   →  RCA docs after every incident
🚦  Traffic Management       →  Canary & blue-green deployments via K8s + Argo Rollouts
🔐  Secrets Management       →  HashiCorp Vault + External Secrets Operator
📦  GitOps                   →  ArgoCD for declarative, auditable deployments
📉  Capacity Planning        →  HPA / VPA / Cluster Autoscaler on cloud & on-prem
🌐  Service Mesh             →  Istio for mTLS, traffic shaping & observability
🛡️  Security Hardening       →  Pod Security Admission, NetworkPolicies, image scanning
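
Two of the practices above reduce to simple arithmetic. The sketch below computes the error budget implied by an availability SLO (a 99.9% SLO over 30 days allows 0.001 × 43,200 = 43.2 minutes of unavailability) and the core Horizontal Pod Autoscaler formula, desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The function names and the 30-day window are assumptions, and the HPA part deliberately omits the controller's tolerance band, min/max replica bounds and stabilization windows.

```python
import math


def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed unavailability, in minutes, for an availability SLO over a window."""
    if not 0 < slo < 1:
        raise ValueError("slo must be a fraction, e.g. 0.999")
    return (1 - slo) * window_days * 24 * 60


def budget_remaining(slo: float, bad_minutes: float, window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative means the SLO is breached)."""
    budget = error_budget_minutes(slo, window_days)
    return (budget - bad_minutes) / budget


def hpa_desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """Core HPA scaling formula: ceil(current * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * current_metric / target_metric)
```

For example, 4 replicas averaging 90% CPU against a 60% target scale to 6, while 3 replicas at 50% stay at 3.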

🤝 Connect With Me

LinkedIn Twitter Email


"Infrastructure is code. Reliability is a feature. Automation is the goal."
