Kubernetes Health Probes

Health probes are the container-level health checks that let Kubernetes decide when to restart a container, when to stop sending it traffic, and when to give a slow starter time to initialize. They are a core Workloads & Troubleshooting topic on the CKA exam. Synthesized from CKA Day 18 — Kubernetes Health Probes Explained.

The Problem: Kubernetes Is Blind Without Probes

By default, Kubernetes only knows whether a container’s main process is running. It has no insight into:

Whether the application has finished initializing (database connections, cache warm-up)
Whether the application is functionally healthy (responding to requests, not deadlocked)
Whether the application is in a degraded state that warrants a restart

Probes are user-defined health checks that run inside or against containers. The kubelet on each node executes them and reacts according to their results.

The Three Probe Types

Probe	Question It Answers	Action on Failure	Scope
Liveness	Is the container alive and should keep running?	Restart the container	Container
Readiness	Is the container ready to serve traffic?	Remove from Service Endpoints	Pod + Service
Startup	Has a slow-starting container finished booting?	Restart the container (liveness and readiness are held off until it succeeds)	Container (guard)

Golden Rule: Liveness protects the container (kill & restart), Readiness protects the Service (stop routing traffic), Startup protects slow starters (prevent premature death).

Probe Mechanisms

Kubernetes can check health in four ways. The mechanism is declared under the probe block (livenessProbe, readinessProbe, or startupProbe).

1. HTTP GET Probe

Sends an HTTP GET request to a specified path and port. Any response code between 200 and 399 is considered success.

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
    httpHeaders:
      - name: Custom-Header
        value: probe
  initialDelaySeconds: 30
  periodSeconds: 10

Field	Description
`path`	URL path to request
`port`	Container port (name or number)
`httpHeaders`	Optional headers to send
`scheme`	`HTTP` (default) or `HTTPS`

Best for: Web applications, REST APIs, microservices with a dedicated health endpoint.
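A minimal Python sketch of the HTTP GET rule above, using only the standard library: a 2xx or 3xx status passes, while any other status, a refused connection, or a timeout fails. The function names are hypothetical and this is not the kubelet's code; for instance, it does not follow redirects.

```python
import http.client
from typing import Dict, Optional


def is_success_status(status: int) -> bool:
    """Any status from 200 up to and including 399 counts as a passing probe."""
    return 200 <= status < 400


def http_get_probe(host: str, port: int, path: str = "/healthz",
                   headers: Optional[Dict[str, str]] = None,
                   timeout_seconds: float = 1.0) -> bool:
    """Send one GET request; True on a 2xx/3xx response, False otherwise."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout_seconds)
    try:
        conn.request("GET", path, headers=headers or {})
        return is_success_status(conn.getresponse().status)
    except (OSError, http.client.HTTPException):
        # Refused connection, timeout (socket.timeout is an OSError subclass),
        # or a malformed response are all treated as a failed probe.
        return False
    finally:
        conn.close()
```

Calling http_get_probe("10.0.0.12", 8080, "/healthz", {"Custom-Header": "probe"}) roughly corresponds to the YAML above, with 10.0.0.12 standing in for the Pod IP, which is what the kubelet targets when httpGet.host is unset.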

2. TCP Socket Probe

Attempts to open a TCP connection to the specified port. Success = connection established.

readinessProbe:
  tcpSocket:
    port: 5432
  initialDelaySeconds: 5
  periodSeconds: 5

Best for: Databases, caches, message queues, and any service where an open port implies readiness.
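The TCP check reduces to opening and closing a connection. A minimal Python sketch with a hypothetical function name; note that it only proves something accepts connections on the port, not that the process behind it can answer requests.

```python
import socket


def tcp_socket_probe(host: str, port: int, timeout_seconds: float = 1.0) -> bool:
    """Succeed if a TCP connection can be opened within the timeout."""
    try:
        # create_connection completes the TCP handshake; no data is sent.
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Refused, unreachable, or timed out (socket.timeout subclasses OSError).
        return False
```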

3. Exec Probe

Runs a command inside the container. Exit code 0 = success; any non-zero exit code = failure.

livenessProbe:
  exec:
    command:
      - cat
      - /tmp/healthy
  initialDelaySeconds: 5
  periodSeconds: 5

Best for: Legacy applications, custom validation logic, checking PID files, or verifying file existence.
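A minimal Python sketch of the exec rule, using subprocess as a stand-in for running the command inside the container (an assumption: the kubelet actually goes through the container runtime). Exit code 0 passes; a non-zero code, a missing binary, or exceeding the timeout fails. The function name is hypothetical.

```python
import subprocess
from typing import List


def exec_probe(command: List[str], timeout_seconds: float = 1.0) -> bool:
    """Run the command; exit code 0 is success, anything else is failure."""
    try:
        result = subprocess.run(command, capture_output=True, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        return False  # ran longer than the probe's timeoutSeconds
    except OSError:
        return False  # e.g. the binary does not exist in the image
    return result.returncode == 0
```

exec_probe(["cat", "/tmp/healthy"]) mirrors the YAML above on any machine with cat, which exits non-zero when the file is missing.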

4. gRPC Probe

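Native gRPC health checking using the standard gRPC Health Checking Protocol (grpc.health.v1.Health), which the application must implement. Alpha in Kubernetes 1.23 behind the GRPCContainerProbe feature gate, beta and enabled by default in 1.24, and GA in 1.27.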

livenessProbe:
  grpc:
    port: 50051
  initialDelaySeconds: 10

Best for: gRPC-first microservices where an HTTP endpoint would be artificial.

Probe Parameters (Timing & Thresholds)

Every probe shares these fields. Understanding their interaction is critical for both production tuning and the CKA exam.

Parameter	Default	Description
`initialDelaySeconds`	`0`	Seconds to wait after container start before the first probe
`periodSeconds`	`10`	How often to run the probe
`timeoutSeconds`	`1`	Seconds to wait for a response before counting it as failed
`successThreshold`	`1`	Consecutive successes required to transition from `Failure` → `Success`; must be `1` for liveness and startup probes
`failureThreshold`	`3`	Consecutive failures required to transition from `Success` → `Failure`
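How successThreshold and failureThreshold interact can be modelled as a small state machine over consecutive results. The Python sketch below is illustrative (class and method names are hypothetical): any result opposite to the current streak resets the count, so intermittent failures never accumulate. The starting state is a parameter; a readiness probe starts out failing (a new Pod is NotReady until its first success), whereas liveness is treated as passing until proven otherwise.

```python
class ProbeState:
    """Track consecutive probe results and flip between Success and Failure."""

    def __init__(self, success_threshold: int = 1, failure_threshold: int = 3,
                 initial: str = "Failure") -> None:
        self.success_threshold = success_threshold
        self.failure_threshold = failure_threshold
        self.state = initial
        self._successes = 0  # current run of consecutive passes
        self._failures = 0   # current run of consecutive failures

    def record(self, passed: bool) -> str:
        """Feed one probe result and return the (possibly updated) state."""
        if passed:
            self._successes += 1
            self._failures = 0  # a single pass breaks any failure streak
            if self.state == "Failure" and self._successes >= self.success_threshold:
                self.state = "Success"
        else:
            self._failures += 1
            self._successes = 0  # a single failure breaks any success streak
            if self.state == "Success" and self._failures >= self.failure_threshold:
                self.state = "Failure"
        return self.state
```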

Time-to-Action Math

The approximate time before a probe triggers its action, for a container that fails every probe from the start:

total_wait = initialDelaySeconds + (periodSeconds × failureThreshold)

Example:

initialDelaySeconds: 30
periodSeconds: 10
failureThreshold: 3

Total time before liveness restart = 30 + (10 × 3) = 60 seconds

Exam Trap: Many candidates assume failureThreshold: 3 means 3 seconds. It means 3 consecutive failed probes, which takes about 3 × periodSeconds (30 seconds at the default period of 10).
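The formula as a small Python helper, handy for checking answers during practice. It is the same rough estimate as above, ignoring timeoutSeconds and kubelet scheduling jitter; the function name is hypothetical.

```python
def seconds_until_action(initial_delay_seconds: int, period_seconds: int,
                         failure_threshold: int) -> int:
    """Approximate seconds from container start until the probe's action fires,
    assuming every probe fails; timeouts and scheduling jitter are ignored."""
    return initial_delay_seconds + period_seconds * failure_threshold
```

seconds_until_action(30, 10, 3) returns 60, matching the example; with the defaults (0, 10, 3) the failures alone take 30 seconds, not 3.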

Liveness Probe Deep Dive

Purpose

Detect when a container has entered a broken but running state — infinite loops, deadlocks, memory leaks that haven’t caused an OOMKill, or thread starvation.

What Happens on Failure?

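kubelet marks the container as failed and records an Unhealthy event (“Liveness probe failed”)
kubelet kills the container process (SIGTERM, then SIGKILL after the termination grace period)
If the Pod’s restartPolicy is Always (the default) or OnFailure, kubelet starts a new container from the same image; with Never, the container stays terminated
The Pod stays on the same node and keeps its IP, because only the container is replaced, not the Pod
The restart count increments (the RESTARTS column in kubectl get pod)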

YAML Example

apiVersion: v1
kind: Pod
metadata:
  name: liveness-demo
spec:
  containers:
  - name: app
    image: myapp:v1
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 30
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3

Danger: Liveness Without Startup Probe

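If a container takes 2 minutes to start but liveness begins after 10 seconds with periodSeconds: 10 and failureThreshold: 3, it is killed after roughly 40 seconds (10 + 10 × 3), long before it becomes healthy. Every restart starts the clock again, so the container never finishes booting: the classic restart loop caused by misconfigured probes, which soon shows up as CrashLoopBackOff.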

Fix: Add a startup probe whose failureThreshold × periodSeconds covers the worst-case startup time.
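Sizing that threshold is a ceiling division: the budget failureThreshold × periodSeconds must be at least the slowest startup you expect. A minimal Python sketch; the function name and the safety_factor padding are assumptions of this example, not Kubernetes settings.

```python
import math


def startup_failure_threshold(worst_case_startup_seconds: float, period_seconds: int,
                              safety_factor: float = 1.0) -> int:
    """Smallest failureThreshold whose budget (failureThreshold x periodSeconds)
    covers the worst-case startup time, optionally padded by a safety factor."""
    budget = worst_case_startup_seconds * safety_factor
    return max(1, math.ceil(budget / period_seconds))
```

For the 2-minute app above with periodSeconds: 10, startup_failure_threshold(120, 10) gives 12; padding by 1.5× gives 18, a 180-second budget.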

Readiness Probe Deep Dive

Purpose

Determine whether a container is ready to accept traffic. An application may be running but not yet usable (e.g., loading configuration, warming caches, waiting for a leader election).

What Happens on Failure?

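kubelet marks the container as not ready, so the Pod’s Ready condition becomes False (shown as 0/1 in the READY column)
The EndpointSlice controller marks the Pod’s endpoint as not ready in the Service’s EndpointSlice (in the legacy Endpoints object, the IP moves to notReadyAddresses)
kube-proxy stops routing new traffic to this Pod
Existing connections are NOT terminated; only new requests are affected
The container is NOT restarted; a readiness failure never kills anything
Once readiness succeeds again, the endpoint is marked ready and traffic resumes automatically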
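A toy Python model of the routing effect, assuming a simplified endpoint record (the Endpoint class and helper names are hypothetical, not kube-proxy's implementation). NotReady entries stay in the list, as EndpointSlice entries do with a ready: false condition, but only ready ones receive new connections, and flipping back to ready restores routing with no manual step.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Endpoint:
    """Toy stand-in for one EndpointSlice entry (hypothetical, simplified)."""
    pod_name: str
    ip: str
    ready: bool  # follows the Pod's Ready condition, i.e. the readiness probe


def routable_ips(endpoints: List[Endpoint]) -> List[str]:
    """IPs that should receive new Service connections: ready endpoints only."""
    return [ep.ip for ep in endpoints if ep.ready]


def set_ready(endpoints: List[Endpoint], pod_name: str, ready: bool) -> None:
    """Apply a readiness transition; the entry stays listed either way."""
    for ep in endpoints:
        if ep.pod_name == pod_name:
            ep.ready = ready
```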

YAML Example

apiVersion: v1
kind: Pod
metadata:
  name: readiness-demo
spec:
  containers:
  - name: app
    image: myapp:v1
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
      successThreshold: 1
      failureThreshold: 3

Readiness and Deployments

During a rolling update:

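New Pods must pass readiness (and stay ready for minReadySeconds, default 0) before the Deployment counts them as “available”
Old Pods are scaled down only as new Pods become available, within the maxSurge and maxUnavailable limits of the RollingUpdate strategy (both default to 25%)
If readiness never succeeds, the rollout stalls, and after progressDeadlineSeconds (default 600) the Deployment reports ProgressDeadlineExceeded; this is a common deployment failure mode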
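The rolling-update limits decide how far old Pods can be scaled down while new ones are still failing readiness. A Python sketch of the arithmetic, assuming percentage values for maxSurge and maxUnavailable: Kubernetes rounds surge up and unavailable down, and forces maxUnavailable to 1 if both would otherwise be 0. The function name is hypothetical.

```python
import math
from typing import Tuple


def rollout_bounds(replicas: int, max_surge_pct: int = 25,
                   max_unavailable_pct: int = 25) -> Tuple[int, int]:
    """Return (max total Pods, min available Pods) during a RollingUpdate."""
    # Percentages: surge rounds up, unavailable rounds down.
    max_surge = math.ceil(replicas * max_surge_pct / 100)
    max_unavailable = math.floor(replicas * max_unavailable_pct / 100)
    if max_surge == 0 and max_unavailable == 0:
        # Both zero would let the rollout make no progress, so allow one.
        max_unavailable = 1
    return replicas + max_surge, replicas - max_unavailable
```

With 4 replicas and the 25%/25% defaults, rollout_bounds(4) returns (5, 3): at most 5 Pods exist at once and at least 3 must stay available, so if new Pods never pass readiness the rollout stalls with 3 old Pods still serving.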

Readiness and Autoscaling

For CPU-based scaling, the HPA sets aside Pods that are not yet Ready when calculating current utilization. If the remaining Pods call for a scale-up, it recomputes the average assuming the not-yet-ready Pods use 0% of the target, which dampens the scale-up rather than amplifying it. Source: CKA Day 17
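The HPA documentation's algorithm can be sketched in Python to show how not-yet-ready Pods enter the calculation. This is a simplified model under stated assumptions: equal CPU requests per Pod, no missing metrics, and no min/max replicas or stabilization window. The function name is hypothetical; the 10% tolerance is the controller's documented default.

```python
import math
from typing import List

# Default HPA tolerance (kube-controller-manager --horizontal-pod-autoscaler-tolerance).
TOLERANCE = 0.1


def desired_replicas(ready_utilization: List[float], unready_count: int,
                     target_utilization: float, current_replicas: int) -> int:
    """Simplified CPU replica calculation; utilization values are percentages."""
    if not ready_utilization:
        return current_replicas
    ratio = sum(ready_utilization) / len(ready_utilization) / target_utilization
    if unready_count == 0 or ratio <= 1.0:
        # No not-yet-ready Pods to account for, or not a scale-up.
        if abs(1.0 - ratio) <= TOLERANCE:
            return current_replicas
        return math.ceil(ratio * len(ready_utilization))
    # Scale-up with not-yet-ready Pods: assume they use 0% and recompute.
    padded = ready_utilization + [0.0] * unready_count
    new_ratio = sum(padded) / len(padded) / target_utilization
    if new_ratio <= 1.0 + TOLERANCE:
        # Within tolerance, or the direction flipped to scale-down: hold steady.
        return current_replicas
    return max(current_replicas, math.ceil(new_ratio * len(padded)))
```

With three Ready Pods at 60% and one not-yet-ready Pod against a 50% target, the Ready-only ratio is 1.2, but counting the fourth Pod at 0% lowers it to 0.9, so the sketch holds at 4 replicas.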

Startup Probe Deep Dive

Purpose

Give slow-starting containers (JVM apps, ML model loading, large dependency downloads) enough time to initialize without being killed by aggressive liveness checks.

How It Works

While the startup probe is running, liveness and readiness probes are disabled
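Once the startup probe succeeds, liveness and readiness begin their normal cycles; the startup probe does not run again unless the container restarts
If the startup probe fails failureThreshold consecutive times, kubelet kills the container and restarts it according to the Pod’s restartPolicy
If no startup probe is defined, liveness and readiness start right away, subject only to their own initialDelaySeconds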
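The gating rules above can be sketched as a loop over startup-probe results, one per periodSeconds tick. A minimal Python sketch with a hypothetical function name; it models only the startup phase, after which liveness and readiness would take over.

```python
from typing import Iterable, Tuple


def run_startup_probe(results: Iterable[bool], failure_threshold: int) -> Tuple[str, int]:
    """Walk startup-probe results in order, one per periodSeconds tick.

    Returns ("started", n) on the first success (liveness/readiness take over),
    ("restart", n) once failure_threshold consecutive failures accumulate, or
    ("pending", n) if the results run out first. n counts probe attempts.
    """
    failures = 0
    attempts = 0
    for attempts, passed in enumerate(results, start=1):
        if passed:
            # successThreshold is always 1 for startup probes.
            return "started", attempts
        failures += 1
        if failures >= failure_threshold:
            return "restart", attempts
    return "pending", attempts
```

With failureThreshold: 30 and periodSeconds: 10, an app that first passes on its 18th attempt yields ("started", 18), about 180 seconds in; one that never passes yields ("restart", 30) at about the 300-second mark.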

YAML Example

apiVersion: v1
kind: Pod
metadata:
  name: slow-start
spec:
  containers:
  - name: app
    image: large-java-app:v1
    startupProbe:
      httpGet:
        path: /healthz
        port: 8080
      failureThreshold: 30
      periodSeconds: 10
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
      failureThreshold: 3
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5

Time budget: 30 × 10s = 300s (5 minutes) to start. After that, liveness checks every 10s and readiness every 5s.

Full Pod Example: All Three Probes

apiVersion: v1
kind: Pod
metadata:
  name: probes-demo
  labels:
    app: web
spec:
  containers:
  - name: nginx
    image: nginx:alpine
    ports:
    - containerPort: 80
    startupProbe:
      httpGet:
        path: /
        port: 80
      failureThreshold: 30
      periodSeconds: 10
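    livenessProbe:
      httpGet:
        # Stock nginx:alpine has no /healthz or /ready endpoint (both would
        # return 404 and fail), so this demo probes /. Point these at
        # dedicated endpoints in your own application.
        path: /
        port: 80
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 3
      failureThreshold: 3
    readinessProbe:
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 5
      successThreshold: 1
      failureThreshold: 3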

Probes in the Kubernetes Control Loop

Probes are not isolated features — they participate in the cluster-wide state machine:

┌─────────────┐     ┌─────────────┐     ┌───────────────┐
│   kubelet   │────▶│ API Server  │────▶│ EndpointSlice │
│   (runs     │     │  (stores    │     │  controller   │
│   probes)   │     │   status)   │     │               │
└─────────────┘     └─────────────┘     └───────┬───────┘
                                                │
                          ┌─────────────────────┘
                          ▼
                   ┌─────────────┐
                   │ kube-proxy  │
                   │ (programs   │
                   │ iptables or │
                   │ IPVS rules) │
                   └─────────────┘

kubelet runs probes on the node
kubelet writes probe results to the Pod’s status (containerStatuses[].ready and the Ready condition) via the API Server
EndpointSlice controller watches Pod readiness; ready Pods are added to EndpointSlices
kube-proxy reads EndpointSlices and updates iptables/ipvs rules
Deployment controller counts available replicas based on readiness state

Troubleshooting Matrix

Symptom	Likely Cause	Diagnostic Command
Pod restarts every 60s	Liveness probe too aggressive; app needs more `initialDelaySeconds` or a startup probe	`kubectl describe pod <name>` → look at `Events`
Pod `Running` but not receiving traffic	Readiness probe failing; app not fully initialized	`kubectl get endpoints <svc>` → check if Pod IP is listed (or `kubectl get endpointslices -l kubernetes.io/service-name=<svc>`)
Rolling update stuck	New Pods never become ready	`kubectl rollout status deployment/<name>`
HPA scales up less than expected	Many Pods not yet Ready; the HPA sets them aside and assumes 0% usage for them on scale-up	`kubectl get hpa` → compare current/target in `TARGETS`; `kubectl describe hpa <name>` → `Events`
`CrashLoopBackOff`	Liveness or startup probe failing repeatedly; or app actually crashing	`kubectl logs --previous <pod>`
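Probe works locally but fails in cluster	Path/port mismatch, or the app listens on `127.0.0.1` instead of `0.0.0.0` while the kubelet probes the Pod IP	Compare `kubectl exec <pod> -- curl -s localhost:8080/healthz` with the same request to the Pod IP from `kubectl get pod <pod> -o wide`; success only on localhost means a loopback bind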

CKA Speed Patterns

Write probes from memory: You will see YAML-writing questions. Memorize the structure: probeType: { mechanism: { ... }, timingFields }
Check events first: kubectl describe pod <name> | grep -i probe
Endpoint check: kubectl get endpoints <svc> shows whether readiness is working
Restart count: kubectl get pod <name> → RESTARTS column tells you if liveness is firing
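No imperative probe support: kubectl run has no probe flags. Generate a skeleton with kubectl run <name> --image=<image> --dry-run=client -o yaml > pod.yaml (or kubectl create deployment <name> --image=<image> --dry-run=client -o yaml), then add the probe blocks by hand.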

Production Best Practices

Practice	Rationale
Separate `/healthz` and `/ready`	Liveness checks “not deadlocked”; readiness checks “dependencies up”. They often test different things.
Always use startup probes for slow apps	JVM, .NET, ML models — anything with >30s startup time.
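Make liveness more conservative than readiness	A liveness failure restarts the container, so it should fire only on real, unrecoverable failure (cheap local check, higher failureThreshold); readiness can react faster because it only pauses traffic.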
Don’t probe external dependencies in liveness	If a database is down, you don’t want to restart every app Pod. Keep liveness local.
Set `timeoutSeconds` realistically	The default `1s` is too aggressive for endpoints that do real work or for containers under CPU pressure.
Log probe traffic separately	Exclude health-check endpoints from application request logs to reduce noise.
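To illustrate the separate-endpoint practice, here is a minimal standard-library Python server with a hypothetical HealthState: /healthz checks only a local heartbeat, so a database outage cannot trigger restarts, while /ready reflects dependency status. It is a sketch under these assumptions, not a production health framework.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthState:
    """Hypothetical app state; a real service would update these from its own code."""

    def __init__(self, max_stall_seconds: float = 30.0) -> None:
        self.lock = threading.Lock()
        self.last_heartbeat = time.monotonic()  # bumped by the main work loop
        self.max_stall_seconds = max_stall_seconds
        self.dependencies_ok = False            # e.g. True once a DB pool connects

    def heartbeat(self) -> None:
        with self.lock:
            self.last_heartbeat = time.monotonic()


def make_handler(state: HealthState):
    class ProbeHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            with state.lock:
                if self.path == "/healthz":
                    # Liveness: local check only (is the work loop still ticking?).
                    stalled = time.monotonic() - state.last_heartbeat > state.max_stall_seconds
                    status = 503 if stalled else 200
                elif self.path == "/ready":
                    # Readiness: may depend on other services; failing only pauses traffic.
                    status = 200 if state.dependencies_ok else 503
                else:
                    status = 404
            self.send_response(status)
            self.send_header("Content-Length", "0")
            self.end_headers()

        def log_message(self, format: str, *args) -> None:
            # Suppress per-request logging so probe hits do not flood the log.
            pass

    return ProbeHandler
```

In a container, serve it on all interfaces, e.g. ThreadingHTTPServer(("0.0.0.0", 8080), make_handler(state)).serve_forever(), so the kubelet can reach it on the Pod IP; the app's main loop calls state.heartbeat() as it makes progress.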

Related Pages

Pod Fundamentals — The unit that probes inspect
Deployment, ReplicaSet & Replication Controller — Rolling updates depend on readiness
Kubernetes Services — Readiness controls EndpointSlice membership
Kubernetes Architecture — How kubelet, API Server, and kube-proxy coordinate
Kubernetes Autoscaling — HPA counts ready replicas
Kubernetes Resource Requests and Limits — Resource context for probe behavior
CKA Certification — Exam domains where probes appear
CKA Study Roadmap — Day 18: Health Probes
CKA Day 18 — Kubernetes Health Probes Explained — Source video

Tags: kubernetes health-probes liveness readiness startup kubelet cka devops production troubleshooting

Rakesh's Brain

Explorer

Kubernetes Health Probes

Kubernetes Health Probes

The Problem: Kubernetes Is Blind Without Probes

The Three Probe Types

Probe Mechanisms

1. HTTP GET Probe

2. TCP Socket Probe

3. Exec Probe

4. gRPC Probe

Probe Parameters (Timing & Thresholds)

Time-to-Action Math

Liveness Probe Deep Dive

Purpose

What Happens on Failure?

YAML Example

Danger: Liveness Without Startup Probe

Readiness Probe Deep Dive

Purpose

What Happens on Failure?

YAML Example

Readiness and Deployments

Readiness and Autoscaling

Startup Probe Deep Dive

Purpose

How It Works

YAML Example

Full Pod Example: All Three Probes

Probes in the Kubernetes Control Loop

Troubleshooting Matrix

CKA Speed Patterns

Production Best Practices

Table of Contents

Graph View

Latest Blog Posts

Backlinks

Rakesh's Brain

Explorer

Kubernetes Health Probes

Kubernetes Health Probes

The Problem: Kubernetes Is Blind Without Probes

The Three Probe Types

Probe Mechanisms

1. HTTP GET Probe

2. TCP Socket Probe

3. Exec Probe

4. gRPC Probe

Probe Parameters (Timing & Thresholds)

Time-to-Action Math

Liveness Probe Deep Dive

Purpose

What Happens on Failure?

YAML Example

Danger: Liveness Without Startup Probe

Readiness Probe Deep Dive

Purpose

What Happens on Failure?

YAML Example

Readiness and Deployments

Readiness and Autoscaling

Startup Probe Deep Dive

Purpose

How It Works

YAML Example

Full Pod Example: All Three Probes

Probes in the Kubernetes Control Loop

Troubleshooting Matrix

CKA Speed Patterns

Production Best Practices

Related Pages

Table of Contents

Graph View

Latest Blog Posts

Backlinks