CKA Day 17 — Kubernetes Autoscaling Explained: HPA vs VPA

Day 17 of the 40-day CKA Certification course by Tech Tutorials with Piyush. This lesson introduces the four Kubernetes autoscaling mechanisms, explains when each applies, and walks through HPA with a live load-simulation demo.

What Is Scaling in Kubernetes?

Scaling is the ability to change the capacity of a workload to match demand. In Kubernetes, scaling operates at two levels:

| Level | What Changes | Use Case |
|---|---|---|
| Pod-level | Number of replicas or size of individual Pods | Handle traffic spikes, optimize resource usage |
| Cluster-level | Number of worker nodes | Add nodes when the cluster runs out of room; remove them when they sit underutilized |

The lesson covers four autoscaling tools, each targeting a different layer of the stack: HPA and VPA work at the Pod level, while Cluster Autoscaler and Node Auto-Provisioning work at the node level.

Horizontal vs Vertical Scaling

| Dimension | Horizontal Scaling | Vertical Scaling |
|---|---|---|
| What it does | Adds or removes Pod replicas | Increases or decreases CPU/memory per Pod |
| Kubernetes tool | HPA (Horizontal Pod Autoscaler) | VPA (Vertical Pod Autoscaler) |
| Best for | Stateless workloads, variable traffic | Stateful or single-replica workloads, right-sizing |
| Limitation | Requires the app to run correctly as multiple instances | Applying new resource values usually evicts and recreates Pods |

Key Insight: Horizontal scaling is the default Kubernetes pattern because Pods are designed to be ephemeral and interchangeable. Vertical scaling is used when you cannot add replicas (e.g., a database pod) or when Pods are over-provisioned and wasting resources. Source: CKA Day 17

The Four Autoscaling Mechanisms

1. Horizontal Pod Autoscaler (HPA)

HPA watches Pod metrics (CPU, memory, or custom metrics) and automatically scales the number of replicas in a Deployment, ReplicaSet, or StatefulSet up or down.

How it works:

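  1. Metrics Server collects CPU/memory usage from the kubelet on each node and exposes it through the Metrics API
  2. The HPA controller (part of kube-controller-manager) reads those metrics, every 15 seconds by default
  3. If average CPU utilization is above the target (e.g., 50% of requests), HPA increases replicas
  4. If average CPU utilization is below the target, HPA decreases replicas after a stabilization window (5 minutes by default)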
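To make steps 3 and 4 concrete, here is the replica calculation documented for the HPA controller, followed by two worked examples. The 90% and 20% usage figures are made-up illustrations, not numbers from the lesson. In practice the controller also skips scaling when the ratio is within a tolerance of 1.0 (0.1 by default) and clamps the result to the minReplicas/maxReplicas range.

```latex
% HPA replica calculation (as described in the Kubernetes HPA documentation)
\text{desiredReplicas} = \left\lceil \text{currentReplicas} \times \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil

% Scale up: 2 replicas averaging 90 percent CPU (of requests) against a 50 percent target
\left\lceil 2 \times \frac{90}{50} \right\rceil = \lceil 3.6 \rceil = 4

% Scale down: 4 replicas averaging 20 percent CPU against the same 50 percent target
\left\lceil 4 \times \frac{20}{50} \right\rceil = \lceil 1.6 \rceil = 2
```

The scale-down result is not applied immediately: the HPA waits out the stabilization window before removing Pods.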

Prerequisites:

  • Metrics Server must be installed in the cluster
  • Target workload must have resources.requests defined (HPA uses requests as the baseline)
  • Target workload must have a scale subresource (Deployment, ReplicaSet, StatefulSet)
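A minimal sketch of a workload that meets these prerequisites, using the nginx-deploy name that the HPA manifest later in this lesson targets. The image tag, the app: nginx label, and the request/limit values are illustrative assumptions, not values from the lesson. The Service is named nginx-deploy, the same name `kubectl expose deployment nginx-deploy --port=80` would give it.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deploy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx                # assumed label, must match the template below
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.25       # image tag is an assumption
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 100m           # HPA utilization = actual CPU usage / this request
            memory: 64Mi
          limits:
            cpu: 200m
            memory: 128Mi
---
# Service exposing the Deployment (step 3 of the demo)
apiVersion: v1
kind: Service
metadata:
  name: nginx-deploy
spec:
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
```

With a CPU request of 100m, a 50% utilization target means the HPA aims for an average of 50m CPU per Pod.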

2. Vertical Pod Autoscaler (VPA)

VPA analyzes historical resource usage and recommends (or automatically applies) optimal CPU/memory requests and limits for containers.

Modes:

  • Off: Only generates recommendations (safe for learning)
  • Initial: Applies recommendations only to newly created Pods
  • Auto: Updates running Pods by evicting and recreating them with new resource values

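Caution: VPA “Auto” mode can conflict with HPA on the same workload when both act on CPU or memory: VPA changes the requests that HPA uses as its utilization baseline, while HPA changes the replica count based on that same usage. The lesson recommends using VPA in “Off” or “Initial” mode when HPA is active, or using them on different workloads. Source: CKA Day 17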
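A sketch of a VPA in recommendation-only mode next to an HPA on nginx-deploy. It assumes the VPA components and CRDs from the kubernetes/autoscaler project are installed, since VPA does not ship with Kubernetes itself; the name nginx-vpa and the min/max bounds are hypothetical values for illustration.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: nginx-vpa               # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deploy
  updatePolicy:
    # Quoted on purpose: unquoted Off can be parsed as boolean false by YAML 1.1 parsers
    updateMode: "Off"           # recommendations only, Pods are never evicted
  resourcePolicy:
    containerPolicies:
    - containerName: "*"        # apply these bounds to every container
      minAllowed:
        cpu: 50m
        memory: 64Mi
      maxAllowed:
        cpu: "1"
        memory: 512Mi
```

Once enough usage history has been collected, `kubectl describe vpa nginx-vpa` shows the recommended requests in the object's status.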

3. Cluster Autoscaler

Cluster Autoscaler runs as a Pod in the cluster and adjusts the number of worker nodes by talking to the cloud provider’s API (AWS, GCP, Azure).

Trigger conditions:

  • Scale-out: Pods are stuck Pending because no node has enough capacity
  • Scale-in: A node is underutilized (below a configurable threshold) and all its Pods can be rescheduled elsewhere
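The scale-in threshold is a Cluster Autoscaler command-line flag. The excerpt below sketches how these triggers might map to flags in an AWS deployment; it is a fragment of the Pod spec, not a complete manifest, and the cloud provider, image tag, and node-group name are assumptions. Scale-out on Pending Pods needs no flag; it is the autoscaler's core loop.

```yaml
# Excerpt from a Cluster Autoscaler Deployment's Pod spec (not a complete manifest)
containers:
- name: cluster-autoscaler
  image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.30.0  # assumed; match your cluster's minor version
  command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  # min:max:name of a node group CA may resize (name is hypothetical)
  - --nodes=1:10:my-node-group
  # Scale-in candidate: sum of Pod requests / node allocatable below 50%
  - --scale-down-utilization-threshold=0.5
  # The node must stay unneeded this long before it is removed
  - --scale-down-unneeded-time=10m
```

Note that "utilization" here is computed from Pod requests, not live usage, so over-requested Pods can keep a mostly idle node alive.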

Important: Cluster Autoscaler is not part of the core CKA curriculum (it is cloud-specific), but understanding the concept is valuable for real-world operations. Source: CKA Day 17

4. Node Auto-Provisioning (GKE-specific)

A Google Kubernetes Engine (GKE) feature that automatically creates new node pools with the right machine type when workloads cannot be scheduled. This extends Cluster Autoscaler by provisioning not just nodes, but entirely new node pools with custom machine shapes.

Note: Node Auto-Provisioning is mentioned in the lesson as an advanced cloud-managed Kubernetes feature. It is not required for the CKA exam but demonstrates how managed platforms extend native autoscaling. Source: CKA Day 17

HPA Demo: Simulating Load

The lesson demonstrates HPA end-to-end:

  1. Install Metrics Server (if not already present)
  2. Create a Deployment with resources.requests.cpu set
  3. Expose the Deployment via a Service
  4. Create an HPA targeting the Deployment with a CPU target (e.g., 50%)
  5. Generate load using a tool like ab (Apache Bench) or hey
  6. Watch HPA scale up replicas as CPU crosses the threshold
  7. Stop the load and watch HPA scale down after the scale-down stabilization window
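For step 5, if ab or hey is not available, a busybox Pod inside the cluster can generate load instead, similar to the approach in the upstream Kubernetes HPA walkthrough. This sketch assumes a Service named nginx-deploy on port 80 in the same namespace; the Pod name and image tag are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: load-generator          # hypothetical name
spec:
  restartPolicy: Never
  containers:
  - name: busybox
    image: busybox:1.36         # image tag is an assumption
    # Request the Service roughly every 10 ms and discard the response body
    command: ["/bin/sh", "-c", "while sleep 0.01; do wget -q -O- http://nginx-deploy > /dev/null; done"]
```

Deleting the Pod (`kubectl delete pod load-generator`) stops the load for step 7. nginx serving a static page is cheap, so one generator may not push utilization past the target; running a few copies or lowering the CPU request makes the scale-up easier to observe.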

Key observation: HPA computes utilization as a percentage of resources.requests, so the Pod template must declare requests. Without them, `kubectl get hpa` shows <unknown> as the current value and the HPA does not scale. This reinforces Day 16’s lesson on requests and limits. Source: CKA Day 16

HPA YAML Anatomy

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deploy
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```

| Field | Purpose |
|---|---|
| scaleTargetRef | Points to the Deployment/ReplicaSet/StatefulSet to scale |
| minReplicas | Floor: HPA will not scale below this |
| maxReplicas | Ceiling: HPA will not scale above this |
| metrics | List of metrics to evaluate (CPU, memory, or custom) |
| target.type | Utilization (percentage) or AverageValue (absolute) |
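To illustrate the rest of the table, this sketch extends nginx-hpa with a second metric that uses AverageValue and an optional behavior section that tunes scaling speed. The 100Mi memory target and the scale-down policy values are assumptions chosen for illustration. When several metrics are listed, the HPA computes a replica count for each and uses the highest.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deploy
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization          # percentage of resources.requests.cpu
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue         # absolute per-Pod average, not a percentage
        averageValue: 100Mi        # assumed value
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # the default: act on the highest recommendation of the last 5 minutes
      policies:
      - type: Pods
        value: 1
        periodSeconds: 60               # remove at most 1 Pod per minute (assumed policy)
    scaleUp:
      stabilizationWindowSeconds: 0     # react to spikes immediately (also the default)
```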

CKA Exam Relevance

  • Workloads & Scheduling (~15%): Expect tasks to create an HPA for a Deployment, or troubleshoot why HPA is not scaling.
  • Troubleshooting (~30%): Common HPA failures include missing Metrics Server, missing resources.requests, or incorrect scaleTargetRef.
  • Speed Patterns:
    # Imperative HPA creation (fastest for exam); the HPA is named after the Deployment
    kubectl autoscale deployment nginx-deploy --min=1 --max=10 --cpu-percent=50

    # Check HPA status (TARGETS shows current/target utilization)
    kubectl get hpa
    kubectl describe hpa nginx-deploy

    # Verify Metrics Server is running and serving metrics
    kubectl get pods -n kube-system | grep metrics-server
    kubectl top pods

Practical Tasks

Reinforce the lesson with the Day 17 exercises in the course GitHub repository: https://github.com/piyushsachdeva/CKA-2024
