CKA Day 17 — Kubernetes Autoscaling Explained: HPA vs VPA
Day 17 of the 40-day CKA Certification course by Tech Tutorials with Piyush. This lesson introduces the four Kubernetes autoscaling mechanisms, explains when each applies, and demonstrates HPA with a live load-simulation demo.
What Is Scaling in Kubernetes?
Scaling is the ability to change the capacity of a workload to match demand. In Kubernetes, scaling operates at two levels:
| Level | What Changes | Use Case |
|---|---|---|
| Pod-level | Number of replicas or size of individual Pods | Handle traffic spikes, optimize resource usage |
| Cluster-level | Number of worker nodes | Add or remove capacity when the cluster itself runs out of room |
The lesson covers four autoscaling tools in the Kubernetes ecosystem, each targeting a different layer of the stack.
Horizontal vs Vertical Scaling
| Dimension | Horizontal Scaling | Vertical Scaling |
|---|---|---|
| What it does | Adds or removes Pod replicas | Increases or decreases CPU/memory per Pod |
| Kubernetes tool | HPA (Horizontal Pod Autoscaler) | VPA (Vertical Pod Autoscaler) |
| Best for | Stateless workloads, variable traffic | Stateful or single-replica workloads, right-sizing |
| Limitation | Requires the app to handle multiple instances | Applying new resource values usually means evicting and recreating Pods |
Key Insight: Horizontal scaling is the default Kubernetes pattern because Pods are designed to be ephemeral and interchangeable. Vertical scaling is used when you cannot add replicas (e.g., a database pod) or when Pods are over-provisioned and wasting resources. Source: CKA Day 17
The Four Autoscaling Mechanisms
1. Horizontal Pod Autoscaler (HPA)
HPA watches Pod metrics (CPU, memory, or custom metrics) and automatically scales the number of replicas in a Deployment, ReplicaSet, or StatefulSet up or down.
How it works:
- Metrics Server collects CPU/memory usage from kubelet
- HPA controller (part of kube-controller-manager) reads metrics
- If average CPU > target threshold (e.g., 50%), HPA increases `replicas`
- If average CPU < target, HPA decreases `replicas` (after a stabilization delay)
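The scaling decision in the steps above can be sketched with the replica formula given in the Kubernetes HPA documentation: desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric). The Python below is a simplified illustration, not the controller's code. The function name and parameters are hypothetical, the 10% tolerance is the documented default, and the sketch ignores stabilization windows, unready Pods, and missing metrics.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified HPA replica calculation (hypothetical helper, not Kubernetes code)."""
    ratio = current_utilization / target_utilization
    # Within the tolerance band around the target, the replica count is left alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # The result is always clamped to the minReplicas/maxReplicas bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    print(desired_replicas(2, 90, 50))  # load above target: scale out
    print(desired_replicas(4, 20, 50))  # load below target: scale in
    print(desired_replicas(3, 52, 50))  # inside tolerance: unchanged
```

With 2 replicas averaging 90% CPU against a 50% target, the ratio is 1.8, so the sketch asks for ceil(3.6) = 4 replicas.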
Prerequisites:
- Metrics Server must be installed in the cluster
- Target workload must have `resources.requests` defined (HPA uses requests as the baseline)
- Target workload must have a `scale` subresource (Deployment, ReplicaSet, StatefulSet)
2. Vertical Pod Autoscaler (VPA)
VPA analyzes historical resource usage and recommends (or automatically applies) optimal CPU/memory requests and limits for containers.
Modes:
- Off: Only generates recommendations (safe for learning)
- Initial: Applies recommendations only to newly created Pods
- Auto: Updates running Pods by evicting and recreating them with new resource values
Caution: VPA “Auto” mode can conflict with HPA on the same workload when both act on CPU or memory: VPA changes per-Pod requests, which shifts the utilization percentage that HPA scales on. The lesson recommends using VPA in “Off” or “Initial” mode when HPA is active, or using them on different workloads. Source: CKA Day 17
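A small arithmetic sketch shows why the two can fight over the same workload. The numbers and the helper name are made up for illustration. The only assumption taken from Kubernetes is that HPA's CPU utilization is usage divided by the Pod's CPU request.

```python
def utilization_percent(usage_millicores, request_millicores):
    """CPU usage as a percentage of the CPU request (illustrative helper)."""
    return usage_millicores * 100 // request_millicores


HPA_TARGET = 50  # percent, matching the example target used in this lesson

usage = 400  # millicores actually consumed by one Pod (made-up number)

before = utilization_percent(usage, 500)   # request before VPA acts
after = utilization_percent(usage, 1000)   # VPA doubles the request

# Same real load, but HPA sees a different percentage once VPA resizes the Pod.
print(before, before > HPA_TARGET)  # above target: HPA wants more replicas
print(after, after > HPA_TARGET)    # below target: HPA sees no pressure
```

The load never changed, yet the measured utilization dropped from 80% to 40%, so HPA's decision flips purely because VPA moved the baseline.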
3. Cluster Autoscaler
Cluster Autoscaler runs as a Pod in the cluster and adjusts the number of worker nodes by talking to the cloud provider’s API (AWS, GCP, Azure).
Trigger conditions:
- Scale-out: Pods are stuck `Pending` because no node has enough capacity
- Scale-in: A node is underutilized (below a configurable threshold) and all its Pods can be rescheduled elsewhere
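The two trigger conditions can be read as a simple decision rule. The sketch below is a deliberate simplification with hypothetical names and data shapes: the real Cluster Autoscaler simulates scheduling, also considers memory, and waits before removing a node. The 0.5 threshold mirrors the default of its `--scale-down-utilization-threshold` flag.

```python
def autoscaler_decision(pending_unschedulable, nodes, threshold=0.5):
    """Return 'scale-out', ('scale-in', node_name), or 'no-op'.

    nodes: list of dicts with hypothetical keys
    name, requested_cpu, allocatable_cpu, pods_movable.
    """
    # Scale-out comes first: Pods that cannot be scheduled need new capacity.
    if pending_unschedulable > 0:
        return "scale-out"
    for node in nodes:
        # Utilization here is based on Pod requests, not live usage.
        utilization = node["requested_cpu"] / node["allocatable_cpu"]
        # A node is a removal candidate only if it is underutilized
        # and every Pod on it can be rescheduled elsewhere.
        if utilization < threshold and node["pods_movable"]:
            return ("scale-in", node["name"])
    return "no-op"


nodes = [
    {"name": "node-a", "requested_cpu": 3000, "allocatable_cpu": 4000, "pods_movable": True},
    {"name": "node-b", "requested_cpu": 800, "allocatable_cpu": 4000, "pods_movable": True},
]
print(autoscaler_decision(0, nodes))  # node-b is at 20% of allocatable CPU
print(autoscaler_decision(2, nodes))  # pending Pods take priority
```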
Important: Cluster Autoscaler is not part of the core CKA curriculum (it is cloud-specific), but understanding the concept is valuable for real-world operations. Source: CKA Day 17
4. Node Auto-Provisioning (GKE-specific)
A Google Kubernetes Engine (GKE) feature that automatically creates new node pools with the right machine type when workloads cannot be scheduled. This extends Cluster Autoscaler by provisioning not just nodes, but entirely new node pools with custom machine shapes.
Note: Node Auto-Provisioning is mentioned in the lesson as an advanced cloud-managed Kubernetes feature. It is not required for the CKA exam but demonstrates how managed platforms extend native autoscaling. Source: CKA Day 17
HPA Demo: Simulating Load
The lesson demonstrates HPA end-to-end:
- Install Metrics Server (if not already present)
- Create a Deployment with `resources.requests.cpu` set
- Expose the Deployment via a Service
- Create an HPA targeting the Deployment with a CPU target (e.g., 50%)
- Generate load using a tool like `ab` (Apache Bench) or `hey`
- Watch HPA scale up replicas as CPU crosses the threshold
- Stop the load and watch HPA scale down after the scale-down stabilization window (5 minutes by default)
Key observation: Without resources.requests, HPA cannot calculate utilization percentages. The Pod template must declare requests; otherwise HPA reports <unknown> and does not scale. This reinforces Day 16’s lesson on requests and limits. Source: CKA Day 16
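Why missing requests stop HPA follows from how the utilization percentage is derived: total usage divided by total requests across the target's Pods. The sketch below is illustrative only. The function name and input shape are assumptions, and `None` stands in for the `<unknown>` shown by `kubectl get hpa`.

```python
def average_cpu_utilization(pods):
    """pods: list of (usage_millicores, request_millicores or None) tuples.

    Returns an integer percentage, or None when it cannot be computed.
    """
    if not pods:
        return None
    # Without a CPU request on every Pod there is no baseline to divide by.
    if any(not request for _, request in pods):
        return None
    total_usage = sum(usage for usage, _ in pods)
    total_request = sum(request for _, request in pods)
    return total_usage * 100 // total_request


print(average_cpu_utilization([(150, 200), (50, 200)]))   # requests set on both Pods
print(average_cpu_utilization([(150, None), (50, 200)]))  # one Pod has no request
```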
HPA YAML Anatomy
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deploy
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```

| Field | Purpose |
|---|---|
| `scaleTargetRef` | Points to the Deployment/ReplicaSet/StatefulSet to scale |
| `minReplicas` | Floor: HPA will not scale below this |
| `maxReplicas` | Ceiling: HPA will not scale above this |
| `metrics` | List of metrics to evaluate (CPU, memory, or custom) |
| `target.type` | `Utilization` (percentage) or `AverageValue` (absolute) |
CKA Exam Relevance
- Workloads & Scheduling (~15%): Expect tasks to create an HPA for a Deployment, or troubleshoot why HPA is not scaling.
- Troubleshooting (~30%): Common HPA failures include missing Metrics Server, missing `resources.requests`, or incorrect `scaleTargetRef`.
- Speed Patterns:

```bash
# Imperative HPA creation (fastest for exam)
kubectl autoscale deployment nginx-deploy --min=1 --max=10 --cpu-percent=50

# Check HPA status
kubectl get hpa
# An HPA created with kubectl autoscale takes the Deployment's name unless --name is given;
# use nginx-hpa instead if you applied the manifest from the YAML section
kubectl describe hpa nginx-deploy

# Verify Metrics Server is running
kubectl get pods -n kube-system | grep metrics-server
```
Practical Tasks
Reinforce the lesson with the Day 17 exercises in the course GitHub repository: https://github.com/piyushsachdeva/CKA-2024
See Also
Wiki Concepts
- Kubernetes Autoscaling — comprehensive overview of all four mechanisms
- Horizontal Pod Autoscaler (HPA) — deep dive into HPA internals, metrics, and YAML patterns
- Vertical Pod Autoscaler (VPA) — VPA modes, recommendations, and conflict avoidance
- Kubernetes Resource Requests and Limits — prerequisite for HPA utilization calculations
- Deployment, ReplicaSet & Replication Controller — the workloads HPA scales
- Pod Fundamentals — the unit of scaling
- Kubernetes Services — how traffic reaches HPA-scaled Pods
- Kubernetes Architecture — kube-controller-manager and Metrics Server context
- Kubernetes Namespaces — HPA is namespace-scoped
- Kubernetes Labels and Selectors — how HPA targets workloads
- Why Kubernetes? — autoscaling as a primary orchestration benefit
- CKA Certification — exam domains and preparation
- CKA Study Roadmap — Day 17 in the 40-day plan
Related Sources
- CKA Day 16 — Kubernetes Requests and Limits — prerequisite for HPA
Creator / Entity
- Tech Tutorials with Piyush — CKA course creator and Day 17 instructor