Kubernetes Autoscaling
The four mechanisms that let Kubernetes match workload capacity to demand: HPA scales Pod replicas horizontally, VPA adjusts Pod resource footprints vertically, Cluster Autoscaler adds/removes nodes, and Node Auto-Provisioning creates tailored node pools. Synthesized from CKA Day 17 — Kubernetes Autoscaling Explained.
Why Autoscaling Matters
Static replica counts and fixed resource allocations waste money during low traffic and fail during spikes. Kubernetes autoscaling provides:
| Benefit | Description |
|---|---|
| Cost efficiency | Reduce replicas or node count when demand drops |
| Performance resilience | Add capacity automatically before users experience latency |
| Operational simplicity | No more 3 AM pages to scale services by hand |
| Right-sizing | VPA recommends or applies optimal CPU/memory per container |
The Four Autoscaling Mechanisms
Kubernetes provides autoscaling at two levels: Pod-level (how big or numerous are my Pods?) and Cluster-level (how many nodes do I have?).
| Mechanism | Level | What It Adjusts | Best For |
|---|---|---|---|
| HPA | Pod | Number of replicas (horizontal) | Stateless apps, web APIs, microservices |
| VPA | Pod | CPU/memory per container (vertical) | Stateful apps, databases, right-sizing |
| Cluster Autoscaler | Cluster | Number of worker nodes | Cloud environments with variable total demand |
| Node Auto-Provisioning | Cluster | Number and type of node pools | GKE and managed Kubernetes with diverse workloads |
Exam Note: HPA is the most commonly tested autoscaling topic on the CKA. VPA is conceptual knowledge. Cluster Autoscaler and Node Auto-Provisioning are real-world tools but rarely appear on the exam. Source: CKA Day 17
Horizontal vs Vertical Scaling
| Dimension | Horizontal Scaling | Vertical Scaling |
|---|---|---|
| Direction | Out (more instances) | Up (bigger instances) |
| Kubernetes tool | HPA | VPA |
| App requirement | Must be stateless or shared-state | Can be stateful; single replica acceptable |
| Speed | Fast (seconds to create Pods) | Slower (may require evictions and restarts) |
| Ceiling | Limited by cluster node capacity | Limited by node size and resource quotas |
Design Principle: Prefer horizontal scaling in Kubernetes. Pods are designed to be cattle, not pets. Vertical scaling is reserved for workloads that cannot be replicated easily. Source: CKA Day 17
How the Mechanisms Interact
┌─────────────────────────────────────────────────────────────┐
│                         User Demand                         │
│                   (traffic, queue depth)                    │
└─────────────────────────────────────────────────────────────┘
                               │
           ┌───────────────────┼───────────────────┐
           ▼                   ▼                   ▼
    ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
    │     HPA     │     │     VPA     │     │   Cluster   │
    │ (replicas)  │     │  (CPU/mem)  │     │ Autoscaler  │
    │             │     │             │     │   (nodes)   │
    └──────┬──────┘     └──────┬──────┘     └──────┬──────┘
           │                   │                   │
           ▼                   ▼                   ▼
    ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
    │ Deployment  │     │  Pod specs  │     │  Cloud ASG  │
    │  replicas   │     │  resources  │     │    / MIG    │
    └─────────────┘     └─────────────┘     └─────────────┘
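The diagram shows the three controllers side by side, but in practice they chain: HPA raises the replica count, any new Pods that no node has room for stay Pending, and Cluster Autoscaler reacts to those Pending Pods (not to traffic directly) by adding nodes. The toy Python model below sketches that hand-off under simplifying assumptions: identical nodes, identical Pods that request only CPU, and a hypothetical `Cluster` class standing in for the real scheduler and Cluster Autoscaler, which simulate scheduling against real node-group templates.

```python
import math
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical toy model: identical nodes, each with the same allocatable CPU in millicores."""
    nodes: int
    node_cpu_m: int
    max_nodes: int

    def pods_per_node(self, pod_request_m: int) -> int:
        # The scheduler places Pods by their requests, not by their live usage.
        return self.node_cpu_m // pod_request_m

    def pending_pods(self, replicas: int, pod_request_m: int) -> int:
        """Replicas that fit on no current node; in a real cluster these sit in Pending."""
        capacity = self.nodes * self.pods_per_node(pod_request_m)
        return max(0, replicas - capacity)

    def scale_up_for(self, replicas: int, pod_request_m: int) -> int:
        """Cluster-Autoscaler-like step: add just enough nodes for the Pending Pods, up to max_nodes."""
        pending = self.pending_pods(replicas, pod_request_m)
        per_node = self.pods_per_node(pod_request_m)
        if pending == 0 or per_node == 0:
            return 0  # nothing pending, or the Pod is larger than any node this group provides
        added = max(0, min(math.ceil(pending / per_node), self.max_nodes - self.nodes))
        self.nodes += added
        return added


if __name__ == "__main__":
    cluster = Cluster(nodes=2, node_cpu_m=4000, max_nodes=5)
    # HPA has just raised a Deployment to 12 replicas, each requesting 1 CPU (1000m).
    print("pending before:", cluster.pending_pods(12, 1000))  # 4 Pods do not fit on 2 nodes
    print("nodes added:", cluster.scale_up_for(12, 1000))      # 1 extra node holds all 4
    print("pending after:", cluster.pending_pods(12, 1000))   # 0
```

The `max_nodes` cap mirrors the maximum size configured on a real node group: once it is reached, extra replicas stay Pending no matter how high HPA pushes the count.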
Prerequisites for HPA and VPA
Both Pod-level autoscalers depend on accurate resource data:
- Metrics Server must be running in `kube-system`
- Container `resources.requests` must be declared: HPA calculates utilization as `usage / request`
- Workload must support scaling: HPA requires a `scale` subresource; VPA requires a Pod spec it can mutate
Without requests, HPA cannot compute utilization: `kubectl get hpa` shows `<unknown>` as the current value and the HPA does not scale. This is a common troubleshooting scenario linking autoscaling back to resource requests and limits. Source: CKA Day 16
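The utilization math can be made concrete. This is a simplified Python sketch, assuming CPU-only targets: utilization is total usage divided by total requests across the targeted Pods, and the replica count follows the documented HPA formula desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), skipped when the ratio is within a tolerance (0.1 by default) and clamped to minReplicas/maxReplicas. The function names are hypothetical, and the real controller also handles unready Pods, missing metrics, and scale-down stabilization, which this sketch ignores.

```python
import math
from typing import Optional, Sequence

TOLERANCE = 0.1  # default HPA tolerance: ratios within 10% of 1.0 leave replicas unchanged


def cpu_utilization(usage_m: Sequence[float], requests_m: Sequence[Optional[float]]) -> Optional[float]:
    """Average utilization (%) across the targeted Pods: total usage / total requests.

    Returns None when any Pod lacks a CPU request; that is the case in which
    `kubectl get hpa` shows <unknown> as the current value.
    """
    if not requests_m or any(not r for r in requests_m):
        return None
    return 100.0 * sum(usage_m) / sum(requests_m)


def desired_replicas(current: int, utilization: Optional[float], target: float,
                     min_replicas: int, max_replicas: int) -> int:
    """ceil(current * utilization / target), clamped to [min_replicas, max_replicas]."""
    if utilization is None:
        return current  # no usable metric, so HPA does not scale
    ratio = utilization / target
    if abs(ratio - 1.0) <= TOLERANCE:
        return current
    return max(min_replicas, min(max_replicas, math.ceil(current * ratio)))


if __name__ == "__main__":
    util = cpu_utilization([180, 190, 170], [200, 200, 200])
    print(util, desired_replicas(3, util, 60, 1, 10))  # 90.0 5
```

For example, three Pods each requesting 200m and using 180m, 190m, and 170m sit at 90% against a 60% target, so the sketch scales from 3 to ceil(3 × 1.5) = 5 replicas.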
When to Use What
| Scenario | Recommended Tool | Reason |
|---|---|---|
| Web API traffic spikes | HPA | Stateless, fast to replicate, easy to load-balance via Service |
| Database or cache | VPA (Off/Initial) | Stateful, hard to replicate; right-size instead |
| Batch job queue depth | HPA + custom/external metrics | Scale on queue length, not just CPU |
| Cluster out of capacity | Cluster Autoscaler | Nodes are the bottleneck, not replicas |
| Mixed workloads on GKE | Node Auto-Provisioning | Need GPU nodes for ML, standard nodes for web |
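For the first two rows, the objects involved can be sketched as Python dicts serialized to JSON, which `kubectl apply -f` accepts as readily as YAML. The HPA uses the `autoscaling/v2` schema with a CPU `Utilization` target; the VPA is the `autoscaling.k8s.io/v1` custom resource from the Kubernetes autoscaler project, set to `updateMode: "Off"` so it only publishes recommendations. The workload names (`web-api`, `postgres`), replica bounds, and 60% target are illustrative assumptions, and the VPA object only has an effect if the VPA CRDs and components are installed in the cluster.

```python
import json


def hpa_manifest(name: str, deployment: str, min_replicas: int, max_replicas: int,
                 cpu_target_percent: int, namespace: str = "default") -> dict:
    """autoscaling/v2 HorizontalPodAutoscaler scaling a Deployment on average CPU utilization."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": deployment},
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    # Percent of the containers' CPU requests, averaged over Pods.
                    "target": {"type": "Utilization", "averageUtilization": cpu_target_percent},
                },
            }],
        },
    }


def vpa_manifest(name: str, statefulset: str, update_mode: str = "Off",
                 namespace: str = "default") -> dict:
    """VerticalPodAutoscaler (a CRD, not built in) targeting a StatefulSet.

    "Off" only records recommendations; "Initial" applies them when Pods are created.
    """
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "targetRef": {"apiVersion": "apps/v1", "kind": "StatefulSet", "name": statefulset},
            "updatePolicy": {"updateMode": update_mode},
        },
    }


if __name__ == "__main__":
    # A v1 List lets a single `kubectl apply -f -` create both objects.
    objects = [
        hpa_manifest("web-api", "web-api", min_replicas=2, max_replicas=10, cpu_target_percent=60),
        vpa_manifest("postgres", "postgres"),
    ]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": objects}, indent=2))
```

Piping the output to `kubectl apply -f -` would create both objects, assuming the target Deployment and StatefulSet already exist.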
Anti-Patterns
| Anti-Pattern | Why It Fails | Fix |
|---|---|---|
| HPA + VPA Auto on same workload | Both adjust capacity in response to the same CPU/memory signals → thrashing | Use VPA in "Off" or "Initial" mode, or separate their signals (HPA on custom/external metrics, VPA on CPU/memory) |
| HPA without resources.requests | HPA cannot calculate utilization percentage | Add CPU/memory requests to container specs |
| HPA on a DaemonSet | A DaemonSet runs one Pod per eligible node and has no scale subresource, so its replica count is not HPA's to set | Use VPA or Node Auto-Provisioning instead |
| Cluster Autoscaler without pod disruption budgets | Scale-in evicts Pods arbitrarily | Add PDBs for critical workloads |
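Three of these anti-patterns can be caught mechanically before anything reaches the cluster. The sketch below is a hypothetical lint pass over manifests represented as Python dicts (shaped like the Kubernetes objects but trimmed to the fields it reads): it flags HPAs that target a DaemonSet, HPAs on CPU/memory whose target is also managed by a VPA in `Auto` or `Recreate` mode (`Auto` is the VPA default when `updateMode` is omitted), and HPAs whose target containers lack a request for the scaled resource. The PDB check is left out, and real policy tooling would read live objects from the API server instead.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str, str]  # (namespace, kind, name) identifies a workload


def _ref_key(namespace: str, ref: dict) -> Key:
    return (namespace, ref["kind"], ref["name"])


def find_anti_patterns(objects: List[dict]) -> List[str]:
    """Return human-readable findings for HPA/VPA anti-patterns in a list of manifests."""
    workloads: Dict[Key, dict] = {}
    hpas: List[Tuple[str, dict]] = []
    vpas: List[Tuple[str, dict]] = []
    for obj in objects:
        ns = obj["metadata"].get("namespace", "default")
        if obj["kind"] == "HorizontalPodAutoscaler":
            hpas.append((ns, obj))
        elif obj["kind"] == "VerticalPodAutoscaler":
            vpas.append((ns, obj))
        else:
            workloads[(ns, obj["kind"], obj["metadata"]["name"])] = obj

    # Workloads whose Pod resources a VPA actively rewrites; updateMode defaults to "Auto".
    active_vpa_targets = {
        _ref_key(ns, v["spec"]["targetRef"])
        for ns, v in vpas
        if v["spec"].get("updatePolicy", {}).get("updateMode", "Auto") in ("Auto", "Recreate")
    }

    findings: List[str] = []
    for ns, hpa in hpas:
        ref = hpa["spec"]["scaleTargetRef"]
        label = f"HPA {ns}/{hpa['metadata']['name']}"
        if ref["kind"] == "DaemonSet":
            findings.append(f"{label}: targets a DaemonSet, which has no scale subresource")
            continue

        metrics = hpa["spec"].get("metrics")
        if metrics:
            scaled = {m["resource"]["name"] for m in metrics if m["type"] == "Resource"}
        else:
            scaled = {"cpu"}  # autoscaling/v2 default: 80% average CPU utilization

        key = _ref_key(ns, ref)
        if scaled and key in active_vpa_targets:
            findings.append(f"{label}: a VPA in Auto/Recreate mode also resizes {ref['name']} (thrashing risk)")

        workload = workloads.get(key)
        if workload is None:
            continue  # target not in the input, so its containers cannot be checked
        for res in sorted(scaled):
            missing = [c["name"] for c in workload["spec"]["template"]["spec"]["containers"]
                       if res not in c.get("resources", {}).get("requests", {})]
            if missing:
                findings.append(f"{label}: containers {missing} have no {res} request, so utilization is <unknown>")
    return findings
```

Feeding it a Deployment without requests, an HPA on that Deployment, and a VPA with no `updatePolicy` yields two findings: the thrashing risk and the missing CPU request.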
Related Pages
- Horizontal Pod Autoscaler (HPA) — detailed YAML, metrics, and exam commands
- Vertical Pod Autoscaler (VPA) — modes, recommendations, and conflict avoidance
- Kubernetes Resource Requests and Limits — prerequisite for utilization calculations
- Deployment, ReplicaSet & Replication Controller — the workloads autoscalers target
- Pod Fundamentals — the unit being scaled
- Kubernetes Services — traffic distribution across scaled Pods
- Kubernetes Architecture — kube-controller-manager and Metrics Server
- Kubernetes Namespaces — autoscaling objects are namespace-scoped
- Kubernetes Labels and Selectors — how HPA identifies target workloads
- Why Kubernetes? — autoscaling as a core orchestration benefit
- Kubernetes Health Probes — HPA counts only ready replicas for utilization calculations
- CKA Certification — exam domains and weightings
- CKA Study Roadmap — Day 17 in the 40-day plan
- Tech Tutorials with Piyush — course creator
Tags: kubernetes autoscaling hpa vpa cluster-autoscaler devops cka cost-optimization