Horizontal Pod Autoscaler (HPA)
The Kubernetes controller that automatically scales the number of Pod replicas in a Deployment, ReplicaSet, or StatefulSet based on observed CPU, memory, or custom metrics. The primary production autoscaling tool and a core CKA topic. Synthesized from CKA Day 17 — Kubernetes Autoscaling Explained.
What Is HPA?
HPA is a control loop that runs inside the kube-controller-manager. At a fixed interval (15 seconds by default), it queries metrics (via the Metrics Server or custom metrics APIs), compares them against user-defined targets, and adjusts the replicas field of a target workload.
 Metrics Server
       │
       ▼
┌─────────────┐      ┌─────────────┐      ┌─────────────┐
│     HPA     │─────▶│ Deployment  │─────▶│    Pods     │
│ Controller  │      │  replicas   │      │ (more/less) │
└─────────────┘      └─────────────┘      └─────────────┘
Prerequisites
HPA cannot function without three conditions:
- Metrics Server installed: provides CPU/memory usage data
- Container resources.requests defined: HPA calculates utilization as current_usage / requested_resources
- Target has a scale subresource: Deployment, ReplicaSet, StatefulSet, or ReplicationController
If requests are missing, HPA status shows <unknown> and scaling does not occur. Source: CKA Day 16
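The requests prerequisite can be checked before creating an HPA. Below is a minimal Python sketch, assuming the Deployment manifest has already been loaded into a plain dict (for example from the JSON output of kubectl get deploy). The helper name containers_missing_requests and the sample manifest are hypothetical and not part of any Kubernetes client library.

```python
def containers_missing_requests(deployment, resource="cpu"):
    """Return names of containers whose Pod template lacks a request for `resource`."""
    containers = deployment["spec"]["template"]["spec"]["containers"]
    missing = []
    for container in containers:
        # `resources` or `requests` may be absent or empty; treat both as "no request"
        requests = (container.get("resources") or {}).get("requests") or {}
        if resource not in requests:
            missing.append(container["name"])
    return missing


# Illustrative manifest fragment (container names are made up for this example)
example = {
    "spec": {"template": {"spec": {"containers": [
        {"name": "nginx", "resources": {"requests": {"cpu": "100m"}}},
        {"name": "sidecar"},  # no resources block at all
    ]}}}
}

print(containers_missing_requests(example))            # ['sidecar']
print(containers_missing_requests(example, "memory"))  # ['nginx', 'sidecar']
```

Any container listed in the result would need a request added before a Utilization target on that resource can be evaluated.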
YAML Anatomy (autoscaling/v2)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deploy
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60

| Field | Description |
|---|---|
| scaleTargetRef | The workload to scale (Deployment, ReplicaSet, StatefulSet) |
| minReplicas | Lowest replica count HPA will maintain |
| maxReplicas | Highest replica count HPA will allow |
| metrics | One or more metrics to evaluate (CPU, memory, Pods, Object, External) |
| target.type | Utilization (% of request) or AverageValue (absolute value) |
| behavior | Optional fine-tuning of scale-up and scale-down speed |
Exam Tip: The CKA exam uses autoscaling/v2. Older autoscaling/v1 only supported CPU and lacked behavior tuning. Source: CKA Day 17
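When several metrics are listed, as in the manifest above, the controller evaluates each metric separately and uses the largest resulting replica count. The following is a rough Python sketch of that selection, using the 50% CPU and 70% memory targets from the example. The observed utilization values and the function name desired_for_metrics are assumptions made for illustration only.

```python
import math


def desired_for_metrics(current_replicas, observed, targets):
    """Compute a replica proposal per metric and return (chosen, proposals)."""
    proposals = {
        name: math.ceil(current_replicas * observed[name] / targets[name])
        for name in targets
    }
    # The highest proposal wins, so no metric is left above its target
    return max(proposals.values()), proposals


targets = {"cpu": 50, "memory": 70}    # targets from the manifest above
observed = {"cpu": 90, "memory": 35}   # hypothetical readings, in percent

chosen, proposals = desired_for_metrics(2, observed, targets)
print(proposals)  # {'cpu': 4, 'memory': 1}
print(chosen)     # 4
```

Here CPU alone drives the decision: memory would allow a single replica, but the CPU proposal of 4 is the one applied.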
How HPA Calculates Desired Replicas
desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
Example:
- Current replicas: 2
- Target CPU utilization: 50%
- Observed average CPU utilization: 80%
- Desired replicas = ceil[2 * (80 / 50)] = ceil[3.2] = 4 replicas
HPA then updates the Deployment’s spec.replicas to 4. The Deployment controller creates the additional Pods, and the Service’s Endpoints list updates automatically.
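The worked example can be reproduced in a few lines of Python. This is a sketch of the formula only, not the controller's actual code. The clamping step uses the minReplicas and maxReplicas values from the earlier manifest, and the 10% tolerance around the target is the commonly documented default, included here as an assumption.

```python
import math


def desired_replicas(current, observed, target,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """desiredReplicas = ceil(currentReplicas * observed / target), then clamped."""
    ratio = observed / target
    # Assumed default: skip scaling when the ratio is within 10% of 1.0
    if abs(ratio - 1.0) <= tolerance:
        return current
    desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(2, 80, 50))   # 4  (the example above)
print(desired_replicas(4, 52, 50))   # 4  (within tolerance, no change)
print(desired_replicas(8, 100, 50))  # 10 (16 requested, capped at maxReplicas)
```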
Metric Types
| Type | What It Measures | Example Target |
|---|---|---|
| Resource | Pod-level CPU or memory | averageUtilization: 50 |
| Pods | Custom metric averaged per Pod | Packets per second per replica |
| Object | Metric from a Kubernetes object (e.g., Ingress requests/sec) | Requests per second |
| External | Metric from an external monitoring system (e.g., Prometheus, CloudWatch) | Queue depth |
For the CKA exam, Resource metrics (CPU and memory) are the most relevant.
Scale Behavior and Stabilization
By default, HPA:
- Scales up with no stabilization delay once a metric exceeds its target
- Scales down only after a 5-minute (300-second) stabilization window, to avoid flapping
You can customize this with the behavior field:
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0
    policies:
    - type: Percent
      value: 100
      periodSeconds: 15
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
    - type: Pods
      value: 1
      periodSeconds: 120

This says: scale up without any stabilization delay, at most doubling the replica count every 15 seconds, but scale down by no more than 1 Pod every 2 minutes, and only after 5 minutes of sustained low load.
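To see how these policies bound a single adjustment, here is a simplified Python sketch. It assumes one policy per direction, as in the snippet above, and models the scale-down stabilization window as taking the highest recommendation seen during the window. The function names are hypothetical, and the real controller handles more cases (multiple policies, selectPolicy, changes made earlier in the same period).

```python
import math


def cap_scale_up(current, desired, percent=100):
    """Percent policy: add at most `percent` of the current replicas per period."""
    limit = math.ceil(current * (1 + percent / 100))
    return min(desired, limit)


def cap_scale_down(current, desired, pods=1):
    """Pods policy: remove at most `pods` replicas per period."""
    return max(desired, current - pods)


def stabilized(recommendations):
    """Scale-down stabilization: act on the highest recommendation in the window."""
    return max(recommendations)


print(cap_scale_up(4, 20))    # 8  (4 replicas can at most double in one period)
print(cap_scale_down(8, 2))   # 7  (only 1 Pod removed per period)
print(stabilized([3, 6, 2]))  # 6  (a brief dip to 2 does not trigger scale-down)
```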
Imperative Commands (CKA Speed Patterns)
# Create HPA for a Deployment (fastest exam method)
# --name sets the HPA name; without it the HPA is named after the Deployment
kubectl autoscale deployment nginx-deploy --name=nginx-hpa --min=1 --max=10 --cpu-percent=50
# Create HPA with a memory target: generate the CPU-based manifest,
# then edit the generated YAML to add memory metrics before applying it
kubectl autoscale deployment nginx-deploy --name=nginx-hpa --min=1 --max=10 --cpu-percent=50 --dry-run=client -o yaml
# Check HPA status
kubectl get hpa
kubectl describe hpa nginx-hpa
# View HPA events for troubleshooting
kubectl get events --field-selector involvedObject.name=nginx-hpa
# Delete HPA
kubectl delete hpa nginx-hpa

CKA Tip: kubectl autoscale generates a valid HPA manifest. Use --dry-run=client -o yaml to generate and customize it before applying, and check the apiVersion in the output, since older kubectl releases emit autoscaling/v1 rather than autoscaling/v2. Source: CKA Day 17
Troubleshooting Matrix
| Symptom | Likely Cause | Diagnostic Command | Fix |
|---|---|---|---|
| HPA shows <unknown> | Metrics Server missing or not ready | kubectl get pods -n kube-system \| grep metrics | Install Metrics Server |
| HPA shows <unknown> while Metrics Server is healthy | No resources.requests in Pod template | kubectl get deploy <name> -o yaml \| grep requests | Add CPU/memory requests |
| HPA does not scale up | Target already at maxReplicas | kubectl get hpa | Raise maxReplicas or investigate load |
| HPA scales but Pods stay Pending | Cluster has no available node capacity | kubectl get nodes | Add nodes or use Cluster Autoscaler |
| Scale flapping (up/down repeatedly) | Stabilization window too short | kubectl describe hpa | Increase stabilizationWindowSeconds |
| Service not routing to new Pods | Selector mismatch or readiness probe failing | kubectl get endpoints | Verify labels and readiness probes |
HPA and Services
When HPA adds Pods, the new Pods’ labels are already correct (inherited from the workload’s Pod template). The Service’s selector matches those labels, and the Endpoints controller adds the new Pod IPs automatically once the Pods are ready. Clients using the Service never notice the scaling event. Source: CKA Day 9
HPA and Resource Requests
HPA relies directly on the resources.requests covered in Day 16. The utilization percentage is computed entirely relative to the request:
CPU% = (measured CPU millicores) / (requested CPU millicores) * 100
If a container requests 100m and is using 80m, HPA sees 80% utilization. If the target is 50%, HPA scales up. This is why under-provisioning requests (e.g., 1m) causes premature scaling, while over-provisioning (e.g., 2000m) keeps utilization so low that the target is rarely reached and scale-up effectively never happens. Source: CKA Day 16
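The effect of the request size on the reported percentage is easy to check numerically. Below is a small Python sketch of the formula above. Only the 80m on 100m case comes from the text; pairing the same 80m usage with the 1m and 2000m requests is an illustrative assumption.

```python
def cpu_utilization_percent(used_millicores, requested_millicores):
    """CPU% = measured millicores / requested millicores * 100."""
    return used_millicores * 100 / requested_millicores


print(cpu_utilization_percent(80, 100))   # 80.0   -> above a 50% target, scale up
print(cpu_utilization_percent(80, 1))     # 8000.0 -> tiny request, constant scale-up pressure
print(cpu_utilization_percent(80, 2000))  # 4.0    -> huge request, target never reached
```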
Sources
Related Pages
- Kubernetes Autoscaling — overview of all four mechanisms
- Vertical Pod Autoscaler (VPA) — when vertical adjustment is preferred
- Kubernetes Resource Requests and Limits — prerequisite for HPA utilization calculations
- Deployment, ReplicaSet & Replication Controller — the workloads HPA scales
- Pod Fundamentals — the unit being replicated
- Kubernetes Services — load balancing across HPA-managed replicas
- Kubernetes Architecture — kube-controller-manager and Metrics Server
- Kubernetes Namespaces — HPA objects are namespace-scoped
- Kubernetes Labels and Selectors — how HPA identifies its target
- Why Kubernetes? — autoscaling as a core orchestration benefit
- CKA Certification — exam domains
- CKA Study Roadmap — Day 17 in the 40-day plan
- Tech Tutorials with Piyush — course creator
Tags: kubernetes hpa horizontal-pod-autoscaler autoscaling metrics-server devops cka