Horizontal Pod Autoscaler (HPA)

The Kubernetes controller that automatically scales the number of Pod replicas in a Deployment, ReplicaSet, or StatefulSet based on observed CPU, memory, or custom metrics. The primary production autoscaling tool and a core CKA topic. Synthesized from CKA Day 17 — Kubernetes Autoscaling Explained.

What Is HPA?

HPA is a control loop that runs inside the kube-controller-manager. On each sync (every 15 seconds by default), it queries metrics (via the Metrics Server or custom metrics APIs), compares them against user-defined targets, and adjusts the replicas field of the target workload.

Metrics Server
      │
      ▼
┌─────────────┐      ┌─────────────┐      ┌─────────────┐
│  HPA        │─────▶│  Deployment │─────▶│   Pods      │
│  Controller │      │  replicas   │      │  (more/less)│
└─────────────┘      └─────────────┘      └─────────────┘

Prerequisites

HPA needs three things before it can scale on CPU or memory utilization:

Metrics Server installed: provides CPU/memory usage data
Container resources.requests defined: HPA calculates utilization as current_usage / requested_resources
Target has a scale subresource: Deployment, ReplicaSet, StatefulSet, or ReplicationController

If requests are missing, HPA status shows <unknown> and scaling does not occur. Source: CKA Day 16
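The missing-requests case can be caught before the HPA is created. The sketch below is only an illustration: `containers_missing_requests` and the sample `template` dict are hypothetical names, and the dict is assumed to have the same shape as a Deployment's `spec.template` after parsing the manifest.

```python
def containers_missing_requests(pod_template, resource="cpu"):
    """Return the names of containers that declare no request for `resource`."""
    containers = pod_template.get("spec", {}).get("containers", [])
    missing = []
    for container in containers:
        requests = container.get("resources", {}).get("requests", {})
        if resource not in requests:
            missing.append(container.get("name", "<unnamed>"))
    return missing


# Hypothetical Pod template, shaped like Deployment.spec.template
template = {
    "spec": {
        "containers": [
            {"name": "nginx", "resources": {"requests": {"cpu": "100m"}}},
            {"name": "sidecar"},  # no resources block at all
        ]
    }
}

print(containers_missing_requests(template))            # ['sidecar']
print(containers_missing_requests(template, "memory"))  # ['nginx', 'sidecar']
```

Any container in the returned list would leave a Utilization target for that resource without a denominator.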

YAML Anatomy (autoscaling/v2)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deploy
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60

Field	Description
`scaleTargetRef`	The workload to scale (Deployment, ReplicaSet, StatefulSet)
`minReplicas`	Lowest replica count HPA will maintain
`maxReplicas`	Highest replica count HPA will allow
`metrics`	One or more metrics to evaluate (CPU, memory, Pods, Object, External)
`target.type`	`Utilization` (% of request), `AverageValue` (absolute per-Pod average), or `Value` (absolute total, for Object and External metrics)
`behavior`	Optional fine-tuning of scale-up and scale-down speed
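To see how the fields in the table nest, the following sketch assembles the same CPU-only structure as a Python dict. `build_hpa` is a hypothetical helper, not part of any Kubernetes client library, and the lower bound of 1 on minReplicas assumes a cluster without scale-to-zero enabled.

```python
import json


def build_hpa(name, deployment, min_replicas, max_replicas, cpu_percent):
    """Return an autoscaling/v2 HPA manifest as a plain dict (illustrative helper)."""
    if not 1 <= min_replicas <= max_replicas:
        raise ValueError("expected 1 <= minReplicas <= maxReplicas")
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [
                {
                    "type": "Resource",
                    "resource": {
                        "name": "cpu",
                        "target": {
                            "type": "Utilization",
                            "averageUtilization": cpu_percent,
                        },
                    },
                }
            ],
        },
    }


manifest = build_hpa("nginx-hpa", "nginx-deploy", 1, 10, 50)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and passed to kubectl apply -f, which accepts JSON as well as YAML.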

Exam Tip: The CKA exam uses autoscaling/v2. Older autoscaling/v1 only supported CPU and lacked behavior tuning. Source: CKA Day 17

How HPA Calculates Desired Replicas

desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]

Example:

Current replicas: 2
Target CPU utilization: 50%
Observed average CPU utilization: 80%
Desired replicas = ceil[2 * (80 / 50)] = ceil[3.2] = 4 replicas

HPA then updates the Deployment’s spec.replicas to 4. The Deployment controller creates the additional Pods, and the Service’s Endpoints list updates automatically.
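The arithmetic above can be checked with a few lines of Python. This is a minimal sketch, not the controller's actual code. The `tolerance` parameter is an addition that models the controller's default 10% tolerance, under which small deviations from the target cause no change.

```python
import math


def desired_replicas(current_replicas, current_value, target_value, tolerance=0.1):
    """Apply desiredReplicas = ceil[currentReplicas * (current / target)]."""
    ratio = current_value / target_value
    # Assumed default: ratios within 10% of 1.0 leave the replica count alone
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


print(desired_replicas(2, 80, 50))  # 4  (the worked example above)
print(desired_replicas(4, 52, 50))  # 4  (ratio 1.04 is inside the tolerance)
print(desired_replicas(4, 20, 50))  # 2  (scale down: ceil(1.6))
```

The result still has to be clamped to the minReplicas and maxReplicas bounds before it is written to the workload.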

Metric Types

Type	What It Measures	Example Target
Resource	Pod-level CPU or memory	`averageUtilization: 50`
Pods	Custom metric averaged per Pod	Packets per second per Pod
Object	Metric from a Kubernetes object (e.g., Ingress requests/sec)	Requests per second
External	Metric from an external monitoring system (e.g., Prometheus, CloudWatch)	Queue depth

For the CKA exam, Resource metrics (CPU and memory) are the most relevant.
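When an HPA lists several metrics, as the manifest earlier on this page does with CPU and memory, the controller computes a replica count for each metric and acts on the largest. The sketch below illustrates that rule with hypothetical function names and made-up readings, and it leaves out the tolerance check for brevity.

```python
import math


def replicas_for_metric(current_replicas, current_value, target_value):
    """Replica count proposed by a single metric."""
    return math.ceil(current_replicas * current_value / target_value)


def desired_across_metrics(current_replicas, metrics, min_replicas, max_replicas):
    """Take the largest per-metric proposal, then clamp to the HPA bounds."""
    proposals = [
        replicas_for_metric(current_replicas, current, target)
        for current, target in metrics.values()
    ]
    return max(min_replicas, min(max_replicas, max(proposals)))


# Hypothetical readings as (observed %, target %) pairs
readings = {"cpu": (40, 50), "memory": (90, 70)}

# cpu proposes ceil(3.2) = 4, memory proposes ceil(5.14...) = 6
print(desired_across_metrics(4, readings, min_replicas=1, max_replicas=10))  # 6
```

Here CPU alone would hold the workload at 4 replicas, but memory pressure pushes it to 6.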

Scale Behavior and Stabilization

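By default, HPA:

Scales up with no stabilization window, adding at most 4 Pods or 100% of the current replica count (whichever is greater) every 15 seconds
Scales down only after a 5-minute stabilization window, acting on the highest recommendation seen in that window to avoid flapping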

You can customize this with the behavior field:

behavior:
  scaleUp:
    stabilizationWindowSeconds: 0
    policies:
    - type: Percent
      value: 100
      periodSeconds: 15
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
    - type: Pods
      value: 1
      periodSeconds: 120

This says: scale up without delay, adding at most 100% of the current replicas (doubling) every 15 seconds; scale down by at most 1 Pod every 2 minutes, and only once the recommendations from the previous 5 minutes all call for fewer replicas.
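The scale-down half of that configuration can be modelled in a few lines. This is a simplified sketch with hypothetical function names and sample data. It assumes the stabilization step picks the highest recommendation inside the window, and it treats the Pods policy as a flat cap per decision rather than tracking past scale events the way the real controller does.

```python
def stabilized_recommendation(history, window_seconds, now):
    """Highest replica recommendation seen within the stabilization window.

    history is a list of (timestamp_seconds, recommended_replicas) pairs.
    """
    recent = [replicas for ts, replicas in history if now - ts <= window_seconds]
    return max(recent)


def apply_pods_policy(current_replicas, desired, max_pods_removed):
    """Limit a scale-down to at most max_pods_removed Pods per policy period."""
    return max(desired, current_replicas - max_pods_removed)


# Hypothetical recommendations while load is falling
history = [(0, 10), (100, 6), (200, 4), (300, 3), (400, 3)]

# At t=300 the t=0 recommendation (10) is still in the 300s window: no scale-down
print(stabilized_recommendation(history[:4], 300, now=300))  # 10

# At t=400 the window covers t=100..400, so the stabilized value drops to 6
target = stabilized_recommendation(history, 300, now=400)
print(target)                            # 6
print(apply_pods_policy(10, target, 1))  # 9  (only 1 Pod may go per 120s)
```

Even though the metric already justifies 3 replicas, the workload only steps from 10 to 9 in this period.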

Imperative Commands (CKA Speed Patterns)

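# Create HPA for a Deployment (fastest exam method)
# --name sets the HPA name; without it the HPA is named after the Deployment
kubectl autoscale deployment nginx-deploy --name=nginx-hpa --min=1 --max=10 --cpu-percent=50

# Create HPA with a memory target: generate the CPU-only manifest first
kubectl autoscale deployment nginx-deploy --name=nginx-hpa --min=1 --max=10 --cpu-percent=50 --dry-run=client -o yaml > hpa.yaml
# Then edit hpa.yaml to add the memory metric and run: kubectl apply -f hpa.yaml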
 
# Check HPA status
kubectl get hpa
kubectl describe hpa nginx-hpa
 
# View HPA events for troubleshooting
kubectl get events --field-selector involvedObject.name=nginx-hpa
 
# Delete HPA
kubectl delete hpa nginx-hpa

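CKA Tip: Use kubectl autoscale with --dry-run=client -o yaml to generate a manifest and customize it before applying. Depending on the kubectl version, the output may be autoscaling/v1 (with targetCPUUtilizationPercentage) rather than autoscaling/v2, so check apiVersion before adding v2-only fields such as metrics or behavior. Source: CKA Day 17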

Troubleshooting Matrix

Symptom	Likely Cause	Diagnostic Command	Fix
HPA shows `<unknown>`	Metrics Server missing or not ready	`kubectl get pods -n kube-system \| grep metrics`	Install Metrics Server
HPA still shows `<unknown>` with Metrics Server running	No `resources.requests` in Pod template	`kubectl get deploy <name> -o yaml \| grep requests`	Add CPU/memory requests
HPA does not scale up	Target already at `maxReplicas`	`kubectl get hpa`	Raise `maxReplicas` or investigate load
HPA scales but Pods stay `Pending`	Cluster has no available node capacity	`kubectl get nodes`	Add nodes or use Cluster Autoscaler
Scale flapping (up/down repeatedly)	Stabilization window too short	`kubectl describe hpa`	Increase `stabilizationWindowSeconds`
Service not routing to new Pods	Selector mismatch or readiness probe failing	`kubectl get endpoints`	Verify labels and readiness probes
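Several rows of the matrix also show up in the HPA's status conditions, which kubectl get hpa nginx-hpa -o json exposes under status.conditions. The sketch below reads those conditions from an already-parsed object. `diagnose` and the sample data are hypothetical, and the messages are illustrative rather than actual controller output.

```python
def diagnose(hpa):
    """Turn HPA status conditions into short human-readable findings."""
    findings = []
    for cond in hpa.get("status", {}).get("conditions", []):
        ctype, status = cond.get("type"), cond.get("status")
        reason = cond.get("reason", "unknown reason")
        if ctype == "ScalingActive" and status == "False":
            findings.append(f"metrics unavailable ({reason})")
        elif ctype == "ScalingLimited" and status == "True":
            findings.append(f"replica count capped ({reason})")
        elif ctype == "AbleToScale" and status == "False":
            findings.append(f"cannot update target ({reason})")
    return findings


# Hypothetical status, as if parsed from kubectl get hpa -o json
sample = {
    "status": {
        "conditions": [
            {"type": "AbleToScale", "status": "True", "reason": "ReadyForNewScale"},
            {"type": "ScalingActive", "status": "False", "reason": "FailedGetResourceMetric"},
            {"type": "ScalingLimited", "status": "True", "reason": "TooManyReplicas"},
        ]
    }
}

for finding in diagnose(sample):
    print(finding)
```

A ScalingActive condition of False usually points back to the first two rows of the matrix, while ScalingLimited with TooManyReplicas matches the maxReplicas row.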

HPA and Services

When HPA adds Pods, the new Pods get their labels from the workload's Pod template, so they already match the Service's selector. Once a Pod is Ready, the Endpoints controller adds its IP automatically, and clients using the Service do not notice the scaling event. Source: CKA Day 9

HPA and Resource Requests

HPA is the primary consumer of the resources.requests values covered in CKA Day 16. The utilization percentage is always computed relative to the request:

CPU% = (measured CPU millicores) / (requested CPU millicores) * 100

If a container requests 100m and is using 80m, HPA sees 80% utilization. With a 50% target, that exceeds the target and HPA scales up. This is why an undersized request (e.g., 1m) triggers premature scaling, while an oversized request (e.g., 2000m) keeps utilization so low that scale-up may never trigger. Source: CKA Day 16
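The effect of the request size is easy to see numerically. The sketch below applies the CPU% formula to the same 80m of usage under the three request sizes mentioned above. `cpu_utilization_percent` is a hypothetical helper name.

```python
def cpu_utilization_percent(usage_millicores, request_millicores):
    """CPU% = measured millicores / requested millicores * 100."""
    return usage_millicores * 100 / request_millicores


TARGET_PERCENT = 50
usage = 80  # measured usage in millicores

for request in (100, 1, 2000):
    percent = cpu_utilization_percent(usage, request)
    verdict = "above target" if percent > TARGET_PERCENT else "below target"
    print(f"request={request}m -> {percent:.0f}% ({verdict})")
```

With a 1m request the same workload reports 8000% utilization, and with a 2000m request it reports 4%, far below the 50% target.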

Sources

CKA Day 17 — Kubernetes Autoscaling Explained: HPA vs VPA

Related Pages

Kubernetes Autoscaling — overview of all four mechanisms
Vertical Pod Autoscaler (VPA) — when vertical adjustment is preferred
Kubernetes Resource Requests and Limits — prerequisite for HPA utilization calculations
Deployment, ReplicaSet & Replication Controller — the workloads HPA scales
Pod Fundamentals — the unit being replicated
Kubernetes Services — load balancing across HPA-managed replicas
Kubernetes Architecture — kube-controller-manager and Metrics Server
Kubernetes Namespaces — HPA objects are namespace-scoped
Kubernetes Labels and Selectors — how HPA identifies its target
Why Kubernetes? — autoscaling as a core orchestration benefit
CKA Certification — exam domains
CKA Study Roadmap — Day 17 in the 40-day plan
Tech Tutorials with Piyush — course creator

Tags: kubernetes hpa horizontal-pod-autoscaler autoscaling metrics-server devops cka
