CKA Day 17 — Kubernetes Autoscaling Explained: HPA vs VPA

Day 17 of the 40-day CKA Certification course by Tech Tutorials with Piyush. This lesson introduces four Kubernetes autoscaling mechanisms, explains when each applies, and walks through HPA with a live load simulation.

What Is Scaling in Kubernetes?

Scaling is the ability to change the capacity of a workload to match demand. In Kubernetes, scaling operates at two levels:

Level	What Changes	Use Case
Pod-level	Number of replicas or size of individual Pods	Handle traffic spikes, optimize resource usage
Cluster-level	Number of worker nodes	Add or remove capacity when the cluster itself runs out of room

The lesson covers four autoscaling tools, each targeting a different layer of the stack. Only HPA ships with core Kubernetes; VPA and Cluster Autoscaler are installed separately, and Node Auto-Provisioning is a GKE feature.

Horizontal vs Vertical Scaling

Dimension	Horizontal Scaling	Vertical Scaling
What it does	Adds or removes Pod replicas	Increases or decreases CPU/memory per Pod
Kubernetes tool	HPA (Horizontal Pod Autoscaler)	VPA (Vertical Pod Autoscaler)
Best for	Stateless workloads, variable traffic	Stateful or single-replica workloads, right-sizing
Limitation	Requires the app to handle multiple instances	Applying new values usually means evicting and recreating Pods

Key Insight: Horizontal scaling is the default Kubernetes pattern because Pods are designed to be ephemeral and interchangeable. Vertical scaling is used when you cannot add replicas (e.g., a database pod) or when Pods are over-provisioned and wasting resources. Source: CKA Day 17

The Four Autoscaling Mechanisms

1. Horizontal Pod Autoscaler (HPA)

HPA watches Pod metrics (CPU, memory, or custom metrics) and automatically scales the number of replicas in a Deployment, ReplicaSet, or StatefulSet up or down.

How it works:

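Metrics Server collects CPU/memory usage from the kubelet on each node
The HPA controller (part of kube-controller-manager) reads those metrics through the metrics API, every 15 seconds by default
If average CPU utilization across the Pods is above the target (e.g., 50% of requests), HPA increases replicas
If it is below the target, HPA decreases replicas, but only after a stabilization window (5 minutes by default)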
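
The scaling decision above follows the replica formula given in the Kubernetes HPA documentation. The worked numbers below (2 replicas, 90% average utilization, 50% target) are illustrative assumptions, not values from the lesson.

```latex
\[
\text{desiredReplicas} = \left\lceil \text{currentReplicas} \times \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
\]
% Illustrative numbers: 2 replicas averaging 90% CPU utilization against a 50% target
\[
\left\lceil 2 \times \frac{90}{50} \right\rceil = \lceil 3.6 \rceil = 4
\]
```

The result is then clamped to the minReplicas/maxReplicas range. By default the controller skips scaling when the ratio of current to desired metric is within 10% of 1.0, which avoids flapping on small fluctuations.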

Prerequisites:

Metrics Server must be installed in the cluster
Target workload must have resources.requests defined (for Utilization targets, HPA expresses usage as a percentage of the request)
Target workload must have a scale subresource (Deployment, ReplicaSet, StatefulSet)
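
A minimal sketch of a Deployment that satisfies these prerequisites. The name nginx-deploy matches the HPA manifest later in these notes; the `app: nginx` label, the image, and the CPU values are assumptions chosen for illustration.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deploy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx            # assumed label
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx        # assumed image
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 100m       # HPA utilization is measured against this value
          limits:
            cpu: 200m
```

With a 100m request and a 50% target, HPA would start adding replicas once average usage per Pod rises above roughly 50m.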

2. Vertical Pod Autoscaler (VPA)

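VPA analyzes historical resource usage and recommends (or automatically applies) CPU/memory requests for containers; by default, limits are scaled proportionally so the request-to-limit ratio is preserved. It is an add-on from the kubernetes/autoscaler project, not part of core Kubernetes, so it must be installed separately.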

Modes:

Off: Only generates recommendations (safe for learning)
Initial: Applies recommendations only to newly created Pods
Auto: Updates running Pods by evicting and recreating them with new resource values

Caution: VPA “Auto” mode can conflict with HPA on the same workload when both act on CPU or memory, because VPA changes the requests that HPA's utilization percentage is computed from. The lesson recommends using VPA in “Off” or “Initial” mode when HPA is active, or using them on different workloads. Source: CKA Day 17
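
A minimal sketch of a VPA object in recommendation-only mode, assuming the VPA components and CRDs are already installed. The name nginx-vpa is hypothetical, and the target reuses the nginx-deploy Deployment from the HPA example.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: nginx-vpa             # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deploy
  updatePolicy:
    updateMode: "Off"         # recommend only; never evict or modify Pods
```

In this mode the recommendations appear in the object's status (for example via `kubectl describe vpa nginx-vpa`), so it can coexist with an HPA on the same Deployment.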

3. Cluster Autoscaler

Cluster Autoscaler runs as a Pod in the cluster and adjusts the number of worker nodes by talking to the cloud provider’s API (AWS, GCP, Azure).

Trigger conditions:

Scale-out: Pods are stuck Pending because no node has enough capacity
Scale-in: A node is underutilized (below a configurable threshold) and all its Pods can be rescheduled elsewhere
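
The scale-in condition depends on every Pod on the node being movable. As a sketch, Cluster Autoscaler honors the annotation below: a node running a Pod marked this way is not removed during scale-in. The fragment is a Pod template excerpt for a hypothetical workload, not a complete manifest.

```yaml
# Pod template fragment (hypothetical workload)
spec:
  template:
    metadata:
      annotations:
        # Tells Cluster Autoscaler not to evict this Pod when considering scale-in
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
```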

Important: Cluster Autoscaler is not part of the core CKA curriculum (it is cloud-specific), but understanding the concept is valuable for real-world operations. Source: CKA Day 17

4. Node Auto-Provisioning (GKE-specific)

A Google Kubernetes Engine (GKE) feature that automatically creates new node pools with the right machine type when workloads cannot be scheduled. This extends Cluster Autoscaler by provisioning not just nodes, but entirely new node pools with custom machine shapes.

Note: Node Auto-Provisioning is mentioned in the lesson as an advanced cloud-managed Kubernetes feature. It is not required for the CKA exam but demonstrates how managed platforms extend native autoscaling. Source: CKA Day 17

HPA Demo: Simulating Load

The lesson demonstrates HPA end-to-end:

1. Install Metrics Server (if not already present)
2. Create a Deployment with resources.requests.cpu set
3. Expose the Deployment via a Service
4. Create an HPA targeting the Deployment with a CPU target (e.g., 50%)
5. Generate load using a tool like ab (Apache Bench) or hey
6. Watch HPA scale up replicas as CPU crosses the threshold
7. Stop the load and watch HPA scale down after the scale-down stabilization window (5 minutes by default)

Key observation: Without resources.requests, HPA cannot calculate utilization percentages. The Pod template must declare requests; otherwise HPA reports <unknown> and does not scale. This reinforces Day 16’s lesson on requests and limits. Source: CKA Day 16
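
The Service and load-generation steps of the demo can be sketched as manifests. This is an assumption-laden stand-in, not the lesson's exact setup: it swaps ab/hey for a busybox wget loop, assumes the Deployment's Pods carry the label `app: nginx`, and uses a hypothetical Pod name load-generator.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-deploy
spec:
  selector:
    app: nginx                # assumed Pod label on the Deployment
  ports:
  - port: 80
    targetPort: 80
---
apiVersion: v1
kind: Pod
metadata:
  name: load-generator        # hypothetical name
spec:
  restartPolicy: Never
  containers:
  - name: load
    image: busybox:1.36
    command: ["/bin/sh", "-c"]
    args:
    # Request the Service in a tight loop until the Pod is deleted
    - "while true; do wget -q -O /dev/null http://nginx-deploy; done"
```

Deleting the load-generator Pod stops the traffic. Serving a static page is cheap, so a single generator may not push CPU past the target unless the CPU request is small.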

HPA YAML Anatomy

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deploy
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50

Field	Purpose
`scaleTargetRef`	Points to the Deployment/ReplicaSet/StatefulSet to scale
`minReplicas`	Floor — HPA will not scale below this
`maxReplicas`	Ceiling — HPA will not scale above this
`metrics`	List of metrics to evaluate (CPU, memory, or custom)
`target.type`	`Utilization` (percentage of the Pods' requests) or `AverageValue` (absolute amount per Pod)
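
To show both target types side by side, here is a sketch of the same HPA with a second metric and an explicit scale-down window. The 200Mi memory target is an arbitrary illustrative value; the 300-second window simply restates the documented default.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deploy
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization       # percentage of requested CPU
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue      # absolute amount per Pod (illustrative value)
        averageValue: 200Mi
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 minutes before scaling down
```

When several metrics are listed, HPA computes a desired replica count for each and uses the largest.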

CKA Exam Relevance

Workloads & Scheduling (~15%): Expect tasks to create an HPA for a Deployment, or troubleshoot why HPA is not scaling.
Troubleshooting (~30%): Common HPA failures include missing Metrics Server, missing resources.requests, or incorrect scaleTargetRef.
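
When troubleshooting, the HPA's status conditions usually point at the cause. The excerpt below is an illustrative sketch of what a metrics failure tends to look like in `kubectl get hpa nginx-hpa -o yaml`; the message text is omitted and the exact output varies by cluster and version.

```yaml
# Illustrative status excerpt, not captured from a real cluster
status:
  conditions:
  - type: ScalingActive
    status: "False"
    reason: FailedGetResourceMetric   # metrics unavailable: check Metrics Server and resources.requests
```

A wrong scaleTargetRef typically surfaces on the AbleToScale condition instead, since the controller cannot read the target's scale subresource.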

Speed Patterns:

# Imperative HPA creation (fastest for exam); the HPA is named after the Deployment
kubectl autoscale deployment nginx-deploy --min=1 --max=10 --cpu-percent=50

# Check HPA status (TARGETS shows <unknown> until metrics are available)
kubectl get hpa
kubectl describe hpa nginx-deploy

# Verify Metrics Server is running and serving metrics
kubectl get pods -n kube-system | grep metrics-server
kubectl top pods

Practical Tasks

Reinforce the lesson with the Day 17 exercises in the course GitHub repository: https://github.com/piyushsachdeva/CKA-2024
