Horizontal Pod Autoscaler (HPA)

The Kubernetes controller that automatically scales the number of Pod replicas in a Deployment, ReplicaSet, or StatefulSet based on observed CPU, memory, or custom metrics. The primary production autoscaling tool and a core CKA topic. Synthesized from CKA Day 17 — Kubernetes Autoscaling Explained.

What Is HPA?

HPA is a control loop that runs inside the kube-controller-manager. At a fixed interval (15 seconds by default), it queries metrics (via the Metrics Server or custom metrics APIs), compares them against user-defined targets, and adjusts the replicas field of a target workload.

Metrics Server
      │
      ▼
┌─────────────┐      ┌─────────────┐      ┌─────────────┐
│  HPA        │─────▶│  Deployment │─────▶│   Pods      │
│  Controller │      │  replicas   │      │  (more/less)│
└─────────────┘      └─────────────┘      └─────────────┘

Prerequisites

For CPU and memory targets, HPA needs three things:

  1. Metrics Server installed: provides CPU/memory usage data
  2. Container resources.requests defined: HPA calculates utilization as current_usage / requested_resources
  3. Target has a scale subresource: Deployment, ReplicaSet, StatefulSet, or ReplicationController

If requests are missing, HPA status shows <unknown> and scaling does not occur. Source: CKA Day 16

YAML Anatomy (autoscaling/v2)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deploy
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60

Field           Description
--------------  ---------------------------------------------------------------------
scaleTargetRef  The workload to scale (Deployment, ReplicaSet, StatefulSet)
minReplicas     Lowest replica count HPA will maintain
maxReplicas     Highest replica count HPA will allow
metrics         One or more metrics to evaluate (CPU, memory, Pods, Object, External)
target.type     Utilization (% of request) or AverageValue (absolute value)
behavior        Optional fine-tuning of scale-up and scale-down speed

Exam Tip: The CKA exam uses autoscaling/v2. Older autoscaling/v1 only supported CPU and lacked behavior tuning. Source: CKA Day 17

How HPA Calculates Desired Replicas

desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]

Example:

  • Current replicas: 2
  • Target CPU utilization: 50%
  • Observed average CPU utilization: 80%
  • Desired replicas = ceil[2 * (80 / 50)] = ceil[3.2] = 4 replicas

HPA then updates the Deployment’s spec.replicas to 4. The Deployment controller creates the additional Pods, and the Service’s Endpoints list updates automatically.
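
The formula can be checked with a few lines of shell. This is a minimal sketch, not the controller's implementation: the function name desired_replicas and its argument order are made up for illustration, it uses integer arithmetic only, and it leaves out the controller's tolerance band and its handling of Pods that are not yet ready. It does clamp the result to minReplicas and maxReplicas, as HPA does.

```shell
#!/bin/sh
# desired_replicas CURRENT_REPLICAS CURRENT_METRIC TARGET_METRIC MIN MAX
# Hypothetical helper: ceil(current * metric / target), clamped to [MIN, MAX].
desired_replicas() {
  current=$1; metric=$2; target=$3; min=$4; max=$5
  # Integer ceiling division: (a + b - 1) / b
  desired=$(( (current * metric + target - 1) / target ))
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}

desired_replicas 2 80 50 1 10    # the worked example: prints 4
desired_replicas 8 200 50 1 10   # ceil(32) is clamped to maxReplicas: prints 10
```

The second call shows why a workload can stay overloaded even though HPA is working: the formula asks for 32 replicas, but maxReplicas caps the result at 10.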

Metric Types

Type      What It Measures                                                          Example Target
--------  ------------------------------------------------------------------------  --------------------------
Resource  Pod-level CPU or memory                                                   averageUtilization: 50
Pods      Custom metric averaged per Pod                                            Packets per second per Pod
Object    Metric from a Kubernetes object (e.g., Ingress requests/sec)              Requests per second
External  Metric from an external monitoring system (e.g., Prometheus, CloudWatch)  Queue depth

For the CKA exam, Resource metrics (CPU and memory) are the most relevant.
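
When an HPA lists several metrics, as the nginx-hpa manifest does with CPU and memory, the controller computes a desired replica count for each metric separately and uses the largest one. The sketch below illustrates that rule with integer arithmetic; ceil_div and max_desired_replicas are hypothetical helper names, and the utilization figures are invented sample inputs.

```shell
#!/bin/sh
# ceil_div A B: integer ceiling of A / B
ceil_div() { echo $(( ($1 + $2 - 1) / $2 )); }

# max_desired_replicas CURRENT CPU_PCT CPU_TARGET MEM_PCT MEM_TARGET
# Hypothetical helper: evaluates each metric on its own, then keeps the larger result.
max_desired_replicas() {
  current=$1
  by_cpu=$(ceil_div $(( current * $2 )) "$3")
  by_mem=$(ceil_div $(( current * $4 )) "$5")
  if [ "$by_cpu" -gt "$by_mem" ]; then echo "$by_cpu"; else echo "$by_mem"; fi
}

max_desired_replicas 2 80 50 60 70    # CPU asks for 4, memory for 2: prints 4
max_desired_replicas 2 40 50 140 70   # CPU asks for 2, memory for 4: prints 4
```

Taking the maximum means any single metric over its target is enough to trigger a scale-up, while a scale-down requires every metric to be under its target.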

Scale Behavior and Stabilization

By default, HPA:

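  • Scales up without a stabilization delay as soon as the target is exceeded; each 15-second period may add up to 4 Pods or 100% of current replicas, whichever is greater
  • Scales down only after a 5-minute stabilization window (stabilizationWindowSeconds: 300), which avoids flapping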
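
The scale-down stabilization window can be pictured as follows: the controller remembers the replica counts it recommended during the window and acts on the highest of them. The sketch below assumes those recent recommendations are passed in as arguments; stabilized_recommendation is a hypothetical name, and the numbers are sample values.

```shell
#!/bin/sh
# stabilized_recommendation RECOMMENDATION...
# Hypothetical helper: prints the largest recommendation seen inside the window.
stabilized_recommendation() {
  highest=$1
  for r in "$@"; do
    if [ "$r" -gt "$highest" ]; then highest=$r; fi
  done
  echo "$highest"
}

stabilized_recommendation 6 4 3 3   # a 6 is still inside the window: prints 6
stabilized_recommendation 3 3 3 3   # load has stayed low for the whole window: prints 3
```

With a 300-second window, a brief dip in load therefore does not remove Pods; the replica count drops only once every recommendation in the window is lower.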

You can customize this with the behavior field:

behavior:
  scaleUp:
    stabilizationWindowSeconds: 0
    policies:
    - type: Percent
      value: 100
      periodSeconds: 15
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
    - type: Pods
      value: 1
      periodSeconds: 120

This says: scale up without a stabilization delay, at most doubling replicas every 15 seconds; scale down by no more than 1 Pod every 2 minutes, and only after 5 minutes of sustained low load.
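
To see how the two policies bound a single scaling step, the sketch below applies a Percent scale-up limit and a Pods scale-down limit to a desired replica count. It is a simplification under stated assumptions: apply_behavior is a hypothetical helper, it looks at one period in isolation, and it ignores the stabilization window and the minReplicas/maxReplicas bounds.

```shell
#!/bin/sh
# apply_behavior CURRENT DESIRED UP_PERCENT DOWN_PODS
# Hypothetical helper: limits DESIRED to what one policy period allows.
apply_behavior() {
  current=$1; desired=$2; up_percent=$3; down_pods=$4
  # Percent policy: may add up to ceil(current * up_percent / 100) Pods
  max_up=$(( current + (current * up_percent + 99) / 100 ))
  # Pods policy: may remove at most down_pods Pods
  min_down=$(( current - down_pods ))
  if [ "$desired" -gt "$max_up" ]; then
    echo "$max_up"
  elif [ "$desired" -lt "$min_down" ]; then
    echo "$min_down"
  else
    echo "$desired"
  fi
}

apply_behavior 4 20 100 1   # wants 20, one 15s period allows doubling: prints 8
apply_behavior 4 1 100 1    # wants 1, one 120s period removes one Pod: prints 3
apply_behavior 4 5 100 1    # within both limits: prints 5
```

Under these policies a jump from 4 to 20 replicas takes three scale-up periods (4, 8, 16, 20), while dropping from 4 to 1 takes three scale-down periods of 2 minutes each.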

Imperative Commands (CKA Speed Patterns)

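# Create HPA for a Deployment (fastest exam method)
# --name sets the HPA name; without it the HPA is named after the Deployment
kubectl autoscale deployment nginx-deploy --name=nginx-hpa --min=1 --max=10 --cpu-percent=50
 
# Create HPA with a memory target: generate the manifest first,
# then edit hpa.yaml (example file name) to add the memory metric before applying
kubectl autoscale deployment nginx-deploy --name=nginx-hpa --min=1 --max=10 --cpu-percent=50 --dry-run=client -o yaml > hpa.yaml
kubectl apply -f hpa.yaml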
 
# Check HPA status
kubectl get hpa
kubectl describe hpa nginx-hpa
 
# View HPA events for troubleshooting
kubectl get events --field-selector involvedObject.name=nginx-hpa
 
# Delete HPA
kubectl delete hpa nginx-hpa

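CKA Tip: Use kubectl autoscale with --dry-run=client -o yaml to generate an HPA manifest and customize it before applying. Check the apiVersion in the output: depending on the kubectl version, it may be autoscaling/v1 (CPU only, via targetCPUUtilizationPercentage) rather than autoscaling/v2, in which case rewrite it in the v2 layout before adding memory metrics or behavior. Source: CKA Day 17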

Troubleshooting Matrix

Symptom                                    Likely Cause                                  Diagnostic Command                                 Fix
-----------------------------------------  --------------------------------------------  -------------------------------------------------  -------------------------------------
HPA shows <unknown>                        Metrics Server missing or not ready           kubectl get pods -n kube-system | grep metrics     Install Metrics Server
HPA shows <unknown>, Metrics Server is up  No resources.requests in Pod template         kubectl get deploy <name> -o yaml | grep requests  Add CPU/memory requests
HPA does not scale up                      Target already at maxReplicas                 kubectl get hpa                                    Raise maxReplicas or investigate load
HPA scales but Pods stay Pending           Cluster has no available node capacity        kubectl get nodes                                  Add nodes or use Cluster Autoscaler
Scale flapping (up/down repeatedly)        Stabilization window too short                kubectl describe hpa                               Increase stabilizationWindowSeconds
Service not routing to new Pods            Selector mismatch or readiness probe failing  kubectl get endpoints                              Verify labels and readiness probes

HPA and Services

When HPA adds Pods, they inherit their labels from the workload’s Pod template, so they already match the Service’s selector. The Endpoints controller adds each new Pod IP once the Pod is Ready, and clients using the Service need no changes. Source: CKA Day 9

HPA and Resource Requests

HPA is the primary consumer of the resources.requests covered in CKA Day 16. The utilization percentage is always computed relative to the request:

CPU% = (measured CPU millicores) / (requested CPU millicores) * 100

If a container requests 100m and is using 80m, HPA sees 80% utilization. If the target is 50%, HPA scales up. This is why setting requests too low (e.g., 1m) causes premature scaling, while setting them too high (e.g., 2000m) keeps utilization under the target so that HPA rarely or never scales up. Source: CKA Day 16
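
The effect of the request size is easy to reproduce. The sketch below computes the percentage for a single container with integer arithmetic; cpu_utilization_pct is a hypothetical helper, and the real controller averages across all Pods of the workload. A missing request is reported as <unknown>, mirroring what the HPA status shows in that case.

```shell
#!/bin/sh
# cpu_utilization_pct USAGE_MILLICORES [REQUEST_MILLICORES]
# Hypothetical helper: usage as a percentage of the request (rounded down).
cpu_utilization_pct() {
  usage=$1; request=${2:-0}
  if [ "$request" -le 0 ]; then
    # No request defined: the percentage cannot be computed
    echo "<unknown>"
    return
  fi
  echo $(( usage * 100 / request ))
}

cpu_utilization_pct 80 100    # prints 80: above a 50% target, so HPA scales up
cpu_utilization_pct 80 1      # prints 8000: a tiny request triggers scaling at once
cpu_utilization_pct 80 2000   # prints 4: an oversized request hides real load
cpu_utilization_pct 80        # prints <unknown>: no request defined
```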

Tags: kubernetes hpa horizontal-pod-autoscaler autoscaling metrics-server devops cka