Production systems fail in ways you never anticipated during development. A container starts, passes its initial health check, and gets traffic – but internally, it's deadlocked, half-initialised, or waiting on a database connection that never came back. Users hit 502s. Your on-call phone rings at 2 AM.
Kubernetes probes exist precisely to prevent that scenario.
A probe is a diagnostic check that the kubelet runs against a running container on a defined schedule. Depending on the probe type and result, Kubernetes either stops routing traffic to the pod, restarts it, or holds off on activating other probes until the application is genuinely ready. When tuned correctly, probes give your cluster the intelligence to detect and self-heal from failure states – automatically, without a human in the loop.
This repository demonstrates all three probe types using a real containerised web application (`rayeez/kubegame:v2`) deployed as a 3-replica Kubernetes Deployment. You'll see the full workflow: generating the deployment manifest with `--dry-run`, patching in probes, applying to the cluster, and verifying behaviour from inside running pods.
```
k8s-probes-deep-dive/
├── readiness-deployment.yaml            # Deployment with Readiness Probe only
├── readiness-liveness-deployment.yaml   # Deployment with both Readiness + Liveness Probes
├── commands_used.txt
└── README.md
```
Kubernetes does not know what your application does internally. It only knows whether a container process is running. Without probes, a pod sitting in `Running` state could be serving 500 errors, deadlocked, or stuck waiting on config that never loaded – and Kubernetes would happily keep routing traffic to it.
Probes solve this by giving kubelet a window into the actual health of your application. There are three:
| Probe | Question It Answers |
|---|---|
| Readiness | Is this container ready to accept traffic right now? |
| Liveness | Is this container still alive and functional? |
| Startup | Has this container finished its slow initialisation? |
Each probe uses one of three mechanisms (`httpGet`, `tcpSocket`, or `exec`) to perform its check. The kubelet executes the check on the defined schedule and acts on the result accordingly.
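The same probe can be expressed with any of the three mechanisms. A sketch of each — note that this repository's manifests use `httpGet` exclusively, so the `tcpSocket` and `exec` variants below (including the `/tmp/ready` file) are illustrative, not taken from the repo:

```yaml
# httpGet – kubelet sends an HTTP request; a 200-399 status means healthy
readinessProbe:
  httpGet:
    path: /index.html
    port: 80
---
# tcpSocket – kubelet only checks that the port accepts a TCP connection
readinessProbe:
  tcpSocket:
    port: 80
---
# exec – kubelet runs a command in the container; exit code 0 means healthy
# (/tmp/ready is a hypothetical sentinel file your app would create)
readinessProbe:
  exec:
    command: ["cat", "/tmp/ready"]
```

`httpGet` is the usual choice for web apps, `tcpSocket` for services without an HTTP endpoint, and `exec` for anything that needs custom logic inside the container.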
The simple answer: without probes, you are flying blind.
Concretely, probes let you:
- **Prevent traffic routing to unready containers** – A freshly scheduled pod that hasn't finished warming up won't receive a single request until the readiness probe passes.
- **Automatically recover from stuck processes** – Applications deadlock. Memory corruption happens. A liveness probe catches these states and triggers a restart before your users notice.
- **Protect slow-starting legacy apps** – Startup probes buy slow applications the time to initialise without getting killed by an overeager liveness check.
- **Improve SLA and uptime** – Probes are the foundational building block of self-healing infrastructure. They are what makes "it just restarts itself" actually true.
In any environment running real user traffic, operating without probes is a reliability risk you should not accept.
A readiness probe tells Kubernetes whether a container is ready to serve traffic. When the probe fails, the pod is removed from the Service's endpoints – no traffic reaches it. Critically, the container is not restarted. The pod keeps running; it simply stops receiving requests until the probe passes again.
This is the right tool for:
- Applications that need time to warm up after scheduling
- Pods that temporarily become unavailable (cache reload, dependency temporarily down)
- Controlled rollouts where you want each replica verified before the next is updated
Rather than writing the YAML from scratch, generate a base manifest and pipe it straight into the cluster:
```shell
kubectl create deployment app1 --image rayeez/kubegame:v2 --dry-run=client -o yaml
```
Generates a Deployment manifest for `rayeez/kubegame:v2` without applying it – output goes to stdout for review or editing.
```shell
echo 'apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: app1
  name: app1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: app1
  template:
    metadata:
      labels:
        app: app1
    spec:
      containers:
      - image: rayeez/kubegame:v2
        name: kubegame
        readinessProbe:
          httpGet:
            path: /index.html
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 10' | kubectl apply -f -
```
Applies the inline Deployment YAML directly to the cluster by piping the manifest into `kubectl apply`.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: app1
  name: app1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: app1
  template:
    metadata:
      labels:
        app: app1
    spec:
      containers:
      - image: rayeez/kubegame:v2
        name: kubegame
        readinessProbe:
          httpGet:
            path: /index.html      # Checks that the main page is being served correctly
            port: 80               # nginx inside the container listens on port 80
          initialDelaySeconds: 5   # Wait 5s after container start before the first check
          periodSeconds: 10        # Re-run the check every 10 seconds thereafter
```

| Field | Purpose |
|---|---|
| `httpGet.path` | The URL path the kubelet requests – any status from 200 to 399 means the pod is ready |
| `httpGet.port` | Port to target – must match what your container is actually listening on |
| `initialDelaySeconds` | Grace period before the first probe fires; set this based on your measured startup time |
| `periodSeconds` | Probe frequency – every 10 seconds is a sensible default for most apps |
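Beyond the four fields used in this manifest, probes accept three more tunables. The values shown below are the Kubernetes defaults, which apply whenever the field is omitted:

```yaml
readinessProbe:
  httpGet:
    path: /index.html
    port: 80
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 1      # Default: a probe attempt fails if no response within 1s
  successThreshold: 1    # Default: one pass marks the pod Ready again
  failureThreshold: 3    # Default: three consecutive failures mark the pod unready
```

Note that `successThreshold` must be 1 for liveness and startup probes; only readiness probes may require multiple consecutive passes.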
```shell
kubectl get pods
```
Lists all pods with their `READY` status, `STATUS`, and `RESTARTS` count – readiness probe failures show as `0/1` in the READY column.
Exec into a pod and manually simulate what the readiness probe does – confirm the target path is actually reachable from inside the container:
```shell
kubectl exec -it app1-66f78b979-2jm4k -- bash
```
Opens an interactive bash shell inside pod `app1-66f78b979-2jm4k` to inspect the container filesystem and test endpoints manually.

```shell
kubectl exec -it app1-66f78b979-jnxhg -- bash
```
Opens a shell in a second replica to verify consistent behaviour across all running pods in the Deployment.
Once inside, you can run:
```shell
curl -s -o /dev/null -w "%{http_code}" http://localhost:80/index.html
```
Hits the readiness probe endpoint directly – a `200` response confirms the probe will pass; anything else explains why the pod is marked unready.
A liveness probe answers a different question: is this container still doing useful work, or has it entered a broken state it can't recover from on its own?
When a liveness probe fails consecutively beyond `failureThreshold`, the kubelet restarts the container. This is the automatic recovery mechanism for deadlocks, OOM states, infinite loops, and corrupted internal state: the liveness probe catches them and gives the container a clean slate.
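Failure detection timing follows directly from these fields. A quick sketch of the arithmetic, assuming the Kubernetes default `failureThreshold` of 3 and the `periodSeconds: 20` liveness schedule used in this repository's manifest:

```shell
# Worst-case window from the first failed check to a container restart:
# failureThreshold consecutive misses, one per periodSeconds.
failure_threshold=3    # Kubernetes default when the field is omitted
period_seconds=20      # matches the livenessProbe in this repo's manifest
worst_case=$((failure_threshold * period_seconds))
echo "restart after up to ${worst_case}s of consecutive failures"
```

This arithmetic is where "restarts the container within ~60 seconds" figures come from: three missed checks, 20 seconds apart.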
Do not conflate liveness with readiness. A pod can be alive (liveness passes) but not ready (readiness fails) – this is normal and expected during startup or when waiting on a dependency. They serve different purposes and are evaluated independently.
```shell
echo 'apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: app1
  name: app1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: app1
  template:
    metadata:
      labels:
        app: app1
    spec:
      containers:
      - image: rayeez/kubegame:v2
        name: kubegame
        readinessProbe:
          httpGet:
            path: /index.html
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 10
        livenessProbe:
          httpGet:
            path: /index.html
            port: 80
          initialDelaySeconds: 15
          periodSeconds: 20' | kubectl apply -f -
```
Updates the existing `app1` Deployment in-place to add a liveness probe alongside the readiness probe – Kubernetes performs a rolling update across the 3 replicas.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: app1
  name: app1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: app1
  template:
    metadata:
      labels:
        app: app1
    spec:
      containers:
      - image: rayeez/kubegame:v2
        name: kubegame
        readinessProbe:
          httpGet:
            path: /index.html      # Readiness check – gates traffic into this pod
            port: 80
          initialDelaySeconds: 5   # First readiness check fires 5s after container start
          periodSeconds: 10        # Rechecks every 10 seconds
        livenessProbe:
          httpGet:
            path: /index.html      # Liveness check – detects broken/stuck container state
            port: 80
          initialDelaySeconds: 15  # Fires 15s after start – after readiness has already run
          periodSeconds: 20        # Rechecks every 20 seconds (less frequent than readiness)
```

The liveness probe is intentionally set to fire 10 seconds after the readiness probe. This ordering matters:

- At `t=5s`, the readiness probe runs for the first time. The pod may still be warming up; if the check fails, the pod is simply held out of rotation. No restart.
- At `t=15s`, the liveness probe fires for the first time. By this point the app should be up. If liveness starts failing here, it genuinely indicates a broken state – and a restart is appropriate.

Setting both to the same `initialDelaySeconds` risks the liveness probe killing a container that's still starting up normally.
```shell
kubectl get pods
```
After applying the updated manifest, watch pods cycle through `Terminating` → `ContainerCreating` → `Running` – this confirms the rolling update completed successfully with the new probe config active.
The pod names change after a rolling update. Exec into the new replicas to confirm both probes are reachable:
```shell
kubectl exec -it app1-5846c9b899-gkkt4 -- bash
```
Opens a shell in the first updated replica to verify the container is healthy and both probe endpoints are reachable.

```shell
kubectl exec -it app1-5846c9b899-jmjzf -- bash
```
Opens a shell in the second updated replica – cross-check that all pods in the Deployment behave identically.

```shell
kubectl exec -it app1-5846c9b899-xtmg7 -- bash
```
Opens a shell in the third replica – validates that the rolling update applied cleanly across all three pods.
Startup probes solve one specific problem: applications with long initialisation times that would be killed by the liveness probe before they finish starting.
The startup probe runs exclusively – liveness and readiness are both disabled until it passes. Once it succeeds, the kubelet hands control to the other two probes. If the startup probe never succeeds within `failureThreshold` × `periodSeconds`, the container is killed.
This is the pattern for JVM services, .NET apps, anything loading large ML models, or legacy applications with multi-minute startup sequences.
```yaml
startupProbe:
  httpGet:
    path: /index.html
    port: 80
  failureThreshold: 30   # 30 attempts × 10s each = up to 5 minutes to start
  periodSeconds: 10      # Checks every 10 seconds during the startup window
```

With this config, the app gets up to 5 minutes to pass a single startup probe check. Once it does, readiness and liveness take over with their own schedules. Without this, you'd need to bloat `initialDelaySeconds` on your liveness probe to 300 seconds – meaning a real deadlock would go undetected for 5 minutes. Startup probes solve that cleanly.
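The window arithmetic can be sanity-checked directly. The 45-second startup time below is a hypothetical slow-starting app, not a measurement of `rayeez/kubegame:v2`:

```shell
# Startup window = failureThreshold × periodSeconds
failure_threshold=30
period_seconds=10
window=$((failure_threshold * period_seconds))
echo "startup window: ${window}s"

# A hypothetical app needing 45s to initialise fits comfortably:
app_startup=45
[ "$app_startup" -le "$window" ] && echo "starts within the window"
```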
For a complete production deployment, all three probes work in sequence:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app1
  labels:
    app: app1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: app1
  template:
    metadata:
      labels:
        app: app1
    spec:
      containers:
      - name: kubegame
        image: rayeez/kubegame:v2
        # --- Startup Probe (runs first, blocks readiness + liveness) ---
        startupProbe:
          httpGet:
            path: /index.html
            port: 80
          failureThreshold: 30   # Up to 300s for the app to start
          periodSeconds: 10
        # --- Readiness Probe (gates traffic) ---
        readinessProbe:
          httpGet:
            path: /index.html
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 10
        # --- Liveness Probe (triggers restarts on broken state) ---
        livenessProbe:
          httpGet:
            path: /index.html
            port: 80
          initialDelaySeconds: 15
          periodSeconds: 20
```

```
User Request
     │
     ▼
Kubernetes Service (app1)
     │
     ├── Pod 1 [Startup: PASS → Readiness: PASS] ✅ Receives traffic
     ├── Pod 2 [Startup: PASS → Readiness: FAIL] ❌ Removed from endpoints
     └── Pod 3 [Startup: PASS → Readiness: PASS] ✅ Receives traffic

If Liveness fails on Pod 2:
     ▼
kubelet restarts container → Pod 2 re-enters the
startup → readiness cycle automatically
```
Picture this: your kubegame web app is deployed with 3 replicas behind a Service. During a rolling deploy:
**Without readiness probe:** New pods spin up and are immediately added to the Service endpoints. If nginx inside the container takes even 3–4 seconds to fully initialise, those first requests land on a container that hasn't finished starting – users get connection refused or a blank page.

**Without liveness probe:** Three days later, a pod gets into a state where nginx is running but returning 500 on every request due to a corrupted temp file. The process hasn't exited, so Kubernetes considers it healthy. The pod sits there silently failing. A third of your traffic is going to a broken replica.

**Without startup probe:** You're deploying a heavier variant of the app that takes 45 seconds to start. Your liveness probe has `initialDelaySeconds: 30`. At `t=30s` the liveness probe fires, gets a 503 (still starting), and kills the container. It starts again. Gets killed again. `CrashLoopBackOff`. An infinite loop that could have been avoided.
With all three probes:
- Startup probe gives the container a 300s window – more than enough
- Readiness probe holds the pod out of the Service until `/index.html` returns 200
- Liveness probe detects the nginx failure and restarts the container within ~60 seconds
- The bad pod is replaced cleanly; zero manual intervention, zero pager alerts
```shell
kubectl create deployment app1 --image rayeez/kubegame:v2 --dry-run=client -o yaml
```
Outputs a ready-to-edit Deployment YAML to stdout – the starting point for adding probe configuration before applying.

```shell
echo '...' | kubectl apply -f -
```
Applies a manifest piped from stdin directly into the cluster – no need to create a file on disk first.

```shell
kubectl get pods
```
Lists all pods in the current namespace showing READY state, STATUS, RESTARTS, and AGE – your quick health check after every apply.

```shell
kubectl exec -it app1-66f78b979-2jm4k -- bash
```
Drops into an interactive shell inside the first replica of the readiness-only Deployment for manual endpoint testing.

```shell
kubectl exec -it app1-66f78b979-jnxhg -- bash
```
Drops into the second replica – verify probe endpoint consistency across all pods in the Deployment.

```shell
echo '...' | kubectl apply -f -
```
Re-applies the Deployment manifest with the liveness probe added – Kubernetes performs a rolling update automatically.

```shell
kubectl exec -it app1-5846c9b899-gkkt4 -- bash
```
Accesses the first updated replica (post-rolling-update) to confirm both readiness and liveness probe paths are reachable.

```shell
kubectl exec -it app1-5846c9b899-jmjzf -- bash
```
Accesses the second updated replica for cross-replica verification.

```shell
kubectl exec -it app1-5846c9b899-xtmg7 -- bash
```
Accesses the third updated replica – ensures the rolling update applied cleanly to every pod.

```shell
kubectl get pods -w
```
Streams live pod status changes to the terminal – use this while applying to watch replicas cycle through readiness states.

```shell
kubectl describe pod <pod-name>
```
Dumps full pod detail including Liveness/Readiness/Startup probe config, container state, and the Events section showing any probe failures.

```shell
kubectl describe pod <pod-name> | grep -A 20 "Liveness\|Readiness\|Startup"
```
Filters `describe` output to show only the configured probe parameters – a quick way to confirm the right values are active.

```shell
kubectl describe pod <pod-name> | grep -A 30 Events
```
Shows the Events block where probe failures are logged explicitly with timestamps and failure reasons.

```shell
kubectl logs <pod-name>
```
Prints the container's stdout/stderr – check this if your probe path is returning unexpected status codes.

```shell
kubectl logs <pod-name> --previous
```
Retrieves logs from the previous container instance – the only way to see what happened before a liveness-triggered restart killed it.

```shell
kubectl get endpoints
```
Lists which pod IPs are currently in each Service's endpoint pool – only pods with passing readiness probes appear here.

```shell
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].restartCount}'
```
Outputs the exact restart count for a pod's container – a number that keeps climbing confirms repeated liveness probe failures.
When things don't behave as expected, work through this sequence:
**Step 1 – Read pod status**

```shell
kubectl get pods
```
`0/1 Running` = readiness probe failing: the container is running but not yet serving traffic. A high `RESTARTS` count = liveness probe triggering repeatedly.

**Step 2 – Dig into Events**

```shell
kubectl describe pod <pod-name>
```
The Events section at the bottom tells you exactly what's happening:

```
Warning  Unhealthy  8s   kubelet  Readiness probe failed: HTTP probe failed with statuscode: 503
Warning  Unhealthy  28s  kubelet  Liveness probe failed: Get "http://10.0.0.4:80/index.html": context deadline exceeded (Client.Timeout exceeded)
Normal   Killing    28s  kubelet  Container kubegame failed liveness probe, will be restarted
```

**Step 3 – Test the probe path from inside the pod**

```shell
kubectl exec -it <pod-name> -- bash
curl -v http://localhost:80/index.html
```
If `curl` fails from inside the pod, the application is the problem – not the probe configuration.

**Step 4 – Check previous container logs after a restart**

```shell
kubectl logs <pod-name> --previous
```
After a liveness-triggered restart, the current container is fresh and its logs may be empty. `--previous` retrieves the logs from the container that was just killed – this is where the failure context lives.

**Step 5 – Verify endpoint membership**

```shell
kubectl get endpoints
```
If your pod is `1/1 Running` but not receiving traffic, check whether its IP appears in the endpoints list. If not, the readiness probe is silently failing on a schedule.
| Probe Type | Primary Purpose | Restarts Pod? | Removes from Service? | Best Use Case |
|---|---|---|---|---|
| Readiness | Gate traffic until app is genuinely ready | ❌ No | ✅ Yes | Slow startup, temporary dependency unavailability |
| Liveness | Detect and recover from unrecoverable states | ✅ Yes | ✅ Yes (while restarting) | Deadlocks, infinite loops, corrupted state |
| Startup | Protect slow-starting apps from premature liveness failures | ✅ Yes (if never passes) | ❌ No | JVM apps, legacy services, large model loading |
**Always define readiness and liveness on every Deployment.** A workload without probes is trusting that your application never enters a degraded state. It will.

**Set `initialDelaySeconds` based on measured startup time, not guesswork.** Run your app under realistic load in staging, time how long it takes to be ready, and add a 20–30% buffer. Guessing too low causes false restarts; guessing too high delays failure detection.
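That sizing rule is easy to mechanise. A sketch, where the 40-second measurement is hypothetical – substitute whatever you actually observe in staging:

```shell
# initialDelaySeconds = measured startup time + ~25% buffer
measured_startup=40    # hypothetical staging measurement, in seconds
buffer_percent=25      # middle of the 20-30% range suggested above
initial_delay=$((measured_startup + measured_startup * buffer_percent / 100))
echo "initialDelaySeconds: ${initial_delay}"
```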
**Don't make probes too aggressive.** A `periodSeconds: 2` liveness probe creates constant load and generates false positives under CPU spikes. Start at 10–20 seconds for liveness, 5–10 for readiness, and tighten only if your SLA demands it.

**Use startup probes for any app with JVM, .NET CLR, or slow framework initialisation.** These runtimes regularly take 30–90 seconds to reach a ready state. Don't work around it with inflated `initialDelaySeconds` – use the right tool.

**Keep liveness endpoints lightweight – don't check external dependencies there.** If your liveness probe queries the database and the database is temporarily down, all your pods restart simultaneously. Liveness should reflect only the internal state of the process. Readiness can (and should) check downstream dependencies because it controls traffic, not pod lifecycle.

**Test probe failure paths before going to production.** Kill the `/index.html` path and confirm the pod leaves the Service endpoints. Block the liveness path and watch the restart counter climb. If you haven't deliberately triggered failure in staging, you don't actually know your probes work.
Running containers in Kubernetes without probes is like running a load balancer without health checks – technically functional, practically dangerous. Kubernetes knows when your containers crash. Probes are what teach it to understand when your containers are broken without crashing.

The workflow in this repository demonstrates the complete progression: start with a dry-run to generate a clean base manifest, layer in a readiness probe to protect traffic routing, then add a liveness probe to enable automatic recovery from unrecoverable states. Each step is verified by exec-ing into running pods to confirm behaviour from the inside – not just trusting that the status column looks right.

In production, every Deployment should have at minimum a readiness probe. Add liveness for any workload that can deadlock or accumulate state corruption over time. Add startup for anything that needs more than 30 seconds to initialise. Tune the thresholds to your actual application, test the failure paths, and your cluster will handle outages that previously required a 2 AM intervention – silently, automatically, before users notice.
That's what self-healing infrastructure actually looks like.