Kubernetes Autoscaling

Four mechanisms let Kubernetes match workload capacity to demand: HPA scales Pod replicas horizontally, VPA adjusts per-container CPU and memory requests vertically, Cluster Autoscaler adds or removes nodes, and Node Auto-Provisioning creates tailored node pools. Synthesized from CKA Day 17 — Kubernetes Autoscaling Explained.

Why Autoscaling Matters

Static replica counts and fixed resource allocations waste money during low traffic and fail during spikes. Kubernetes autoscaling provides:

Benefit	Description
Cost efficiency	Reduce replicas or node count when demand drops
Performance resilience	Add capacity automatically before users experience latency
Operational simplicity	Eliminate manual 3 AM paging to scale services
Right-sizing	VPA recommends or applies optimal CPU/memory per container

The Four Autoscaling Mechanisms

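Autoscaling in Kubernetes operates at two levels: Pod-level (how big or numerous are my Pods?) and Cluster-level (how many nodes do I have?). Only HPA is built into the control plane; VPA and Cluster Autoscaler are separate components that must be installed or enabled, and Node Auto-Provisioning is a GKE feature.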

Mechanism	Level	What It Adjusts	Best For
HPA	Pod	Number of replicas (horizontal)	Stateless apps, web APIs, microservices
VPA	Pod	CPU/memory per container (vertical)	Stateful apps, databases, right-sizing
Cluster Autoscaler	Cluster	Number of worker nodes	Cloud environments with variable total demand
Node Auto-Provisioning	Cluster	Number and type of node pools	GKE and managed Kubernetes with diverse workloads

Exam Note: HPA is the most commonly tested autoscaling topic on the CKA. VPA is conceptual knowledge. Cluster Autoscaler and Node Auto-Provisioning are real-world tools but rarely appear on the exam. Source: CKA Day 17

Horizontal vs Vertical Scaling

Dimension	Horizontal Scaling	Vertical Scaling
Direction	Out (more instances)	Up (bigger instances)
Kubernetes tool	HPA	VPA
App requirement	Must be stateless or shared-state	Can be stateful; single replica acceptable
Speed	Fast (seconds to create Pods)	Slower (may require evictions and restarts)
Ceiling	Limited by cluster node capacity	Limited by node size and resource quotas

Design Principle: Prefer horizontal scaling in Kubernetes. Pods are designed to be cattle, not pets. Vertical scaling is reserved for workloads that cannot be replicated easily. Source: CKA Day 17

How the Mechanisms Interact

┌─────────────────────────────────────────────────────────────┐
│                        User Demand                            │
│                    (traffic, queue depth)                     │
└─────────────────────────────────────────────────────────────┘
                              │
          ┌───────────────────┼───────────────────┐
          ▼                   ▼                   ▼
   ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
   │     HPA     │     │     VPA     │     │   Cluster   │
   │  (replicas) │     │  (CPU/mem)  │     │  Autoscaler │
   │             │     │             │     │  (nodes)    │
   └──────┬──────┘     └──────┬──────┘     └──────┬──────┘
          │                   │                   │
          ▼                   ▼                   ▼
   ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
   │  Deployment │     │  Pod specs  │     │ Cloud ASG   │
   │  replicas   │     │  resources  │     │ / MIG       │
   └─────────────┘     └─────────────┘     └─────────────┘
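
The diagram shows the three controllers side by side, but in practice they chain: when HPA adds replicas that no longer fit on the existing nodes, the extra Pods stay Pending, and Cluster Autoscaler responds by adding nodes. A minimal sketch of that arithmetic follows, assuming CPU is the only constrained resource, all nodes are identical, and nothing else runs on them. The function names and numbers are illustrative and not part of any Kubernetes API.

```python
import math


def pods_per_node(pod_cpu_request_m: int, node_allocatable_cpu_m: int) -> int:
    """How many Pods fit on one node, counting CPU requests only (simplification)."""
    if pod_cpu_request_m <= 0:
        raise ValueError("the Pod CPU request must be positive")
    fit = node_allocatable_cpu_m // pod_cpu_request_m
    if fit == 0:
        raise ValueError("a single Pod requests more CPU than a node can offer")
    return fit


def pending_pods(replicas: int, nodes: int, fit: int) -> int:
    """Pods that cannot be scheduled on the current nodes."""
    return max(0, replicas - nodes * fit)


def nodes_needed(replicas: int, fit: int) -> int:
    """Smallest node count that schedules every replica."""
    return math.ceil(replicas / fit)


if __name__ == "__main__":
    fit = pods_per_node(pod_cpu_request_m=500, node_allocatable_cpu_m=2000)
    # HPA raises the Deployment from 4 to 10 replicas while the cluster has 1 node.
    print(pending_pods(replicas=10, nodes=1, fit=fit))  # 6 Pods stay Pending
    print(nodes_needed(replicas=10, fit=fit))           # 3 nodes in total
    # If VPA doubled the request to 1000m, the same replicas would need 5 nodes.
    print(nodes_needed(replicas=10, fit=pods_per_node(1000, 2000)))
```

The last line hints at why Pod-level and Cluster-level scaling cannot be tuned in isolation: a larger per-Pod request from VPA reduces how many Pods fit per node, which raises the node count for the same replica count.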

Prerequisites for HPA and VPA

Both Pod-level autoscalers depend on accurate resource data:

Metrics Server must be running in kube-system
Container resources.requests must be declared — HPA calculates utilization as usage / request
Workload must support scaling — HPA requires a scale subresource; VPA requires a Pod spec it can mutate

Without CPU or memory requests, `kubectl get hpa` shows <unknown> in the TARGETS column and the HPA does not scale on that metric. This is a common troubleshooting scenario linking autoscaling back to resource requests and limits. Source: CKA Day 16
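
The usage / request calculation and the replica count that follows from it can be sketched as below. This is a simplified model, not the controller's code: it assumes every Pod carries the same CPU request, and it leaves out the controller's tolerance band, its handling of not-yet-ready Pods, and the minReplicas/maxReplicas clamp. The function names are illustrative.

```python
import math
from typing import List, Optional


def cpu_utilization_percent(
    usage_millicores: List[int], request_millicores: Optional[int]
) -> Optional[float]:
    """Average Pod CPU usage as a percentage of the per-Pod request.

    Returns None when no request is declared, mirroring the <unknown>
    that HPA reports in that situation.
    """
    if not request_millicores or not usage_millicores:
        return None
    average_usage = sum(usage_millicores) / len(usage_millicores)
    return 100.0 * average_usage / request_millicores


def desired_replicas(current_replicas: int, current_util: float, target_util: float) -> int:
    """desired = ceil(current * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * current_util / target_util)


if __name__ == "__main__":
    # Four Pods each using 450m against a 500m request, target 50% utilization.
    util = cpu_utilization_percent([450, 450, 450, 450], 500)
    print(util)                             # 90.0
    print(desired_replicas(4, util, 50.0))  # 8
    # No request declared: utilization cannot be computed.
    print(cpu_utilization_percent([450], None))  # None
```

With utilization at 90% against a 50% target, the ratio 1.8 applied to 4 replicas gives 7.2, which rounds up to 8.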

When to Use What

Scenario	Recommended Tool	Reason
Web API traffic spikes	HPA	Stateless, fast to replicate, easy to load-balance via Service
Database or cache	VPA (Off/Initial)	Stateful, hard to replicate; right-size instead
Batch job queue depth	HPA + custom metrics	Scale on queue length, not just CPU
Cluster out of capacity	Cluster Autoscaler	Nodes are the bottleneck, not replicas
Mixed workloads on GKE	Node Auto-Provisioning	Need GPU nodes for ML, standard nodes for web
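
For the first row of the table, an HPA object targeting a Deployment on CPU utilization might look like the following. It is built here as a Python dictionary and printed as JSON, which `kubectl apply -f -` accepts just as it accepts YAML. The name `web-api`, the replica bounds, and the 60% target are placeholder values chosen for illustration; the field names follow the `autoscaling/v2` API.

```python
import json


def hpa_manifest(
    name: str,
    deployment: str,
    min_replicas: int,
    max_replicas: int,
    cpu_target_percent: int,
) -> dict:
    """Build an autoscaling/v2 HPA that scales a Deployment on average CPU utilization."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            # The workload whose scale subresource the HPA adjusts.
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [
                {
                    "type": "Resource",
                    "resource": {
                        "name": "cpu",
                        # Percentage of the containers' CPU requests.
                        "target": {
                            "type": "Utilization",
                            "averageUtilization": cpu_target_percent,
                        },
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical values; pipe the output to `kubectl apply -f -` to create the object.
    print(json.dumps(hpa_manifest("web-api", "web-api", 2, 10, 60), indent=2))
```

Scaling on queue depth, as in the batch job row, would replace the `Resource` metric with an `External` or `Pods` metric and requires a metrics adapter in addition to Metrics Server.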

Anti-Patterns

Anti-Pattern	Why It Fails	Fix
HPA + VPA Auto on same workload	Both react to the same CPU/memory signal and adjust capacity simultaneously → thrashing	Use VPA in “Off” or “Initial” mode, or drive HPA from custom or external metrics instead of CPU/memory
HPA without `resources.requests`	HPA cannot calculate utilization percentage	Add CPU/memory requests to container specs
HPA on a DaemonSet	A DaemonSet runs one Pod per eligible node and has no replica count to scale	Use VPA or Node Auto-Provisioning instead
Cluster Autoscaler without PodDisruptionBudgets	Scale-down drains nodes and can evict several replicas of a workload at once	Add PDBs for critical workloads
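
Two of these anti-patterns can be spotted by inspecting the autoscaler objects alone. The sketch below is a hypothetical lint helper, not an existing tool. It takes HPA and VPA manifests as plain dictionaries (for example, parsed from `kubectl get hpa,vpa -o json`), assumes all objects live in the same namespace, and assumes a VPA with no `updateMode` behaves as "Auto".

```python
from typing import Dict, List

# VPA modes that never evict running Pods, so they cannot fight with HPA.
SAFE_VPA_MODES = ("Off", "Initial")


def lint_autoscalers(hpas: List[Dict], vpas: List[Dict]) -> List[str]:
    """Return human-readable findings for HPA-on-DaemonSet and HPA+VPA conflicts."""
    findings = []
    hpa_targets = set()

    for hpa in hpas:
        ref = hpa["spec"]["scaleTargetRef"]
        hpa_targets.add((ref["kind"], ref["name"]))
        if ref["kind"] == "DaemonSet":
            findings.append(f"HPA {hpa['metadata']['name']} targets a DaemonSet")

    for vpa in vpas:
        ref = vpa["spec"]["targetRef"]
        # Assumption: a missing updatePolicy/updateMode is treated as "Auto".
        mode = vpa["spec"].get("updatePolicy", {}).get("updateMode", "Auto")
        if (ref["kind"], ref["name"]) in hpa_targets and mode not in SAFE_VPA_MODES:
            findings.append(
                f"VPA {vpa['metadata']['name']} (mode {mode}) shares "
                f"{ref['kind']}/{ref['name']} with an HPA"
            )
    return findings


if __name__ == "__main__":
    hpas = [
        {"metadata": {"name": "web"},
         "spec": {"scaleTargetRef": {"kind": "Deployment", "name": "web"}}},
        {"metadata": {"name": "agent"},
         "spec": {"scaleTargetRef": {"kind": "DaemonSet", "name": "agent"}}},
    ]
    vpas = [
        {"metadata": {"name": "web-vpa"},
         "spec": {"targetRef": {"kind": "Deployment", "name": "web"},
                  "updatePolicy": {"updateMode": "Auto"}}},
        {"metadata": {"name": "db-vpa"},
         "spec": {"targetRef": {"kind": "StatefulSet", "name": "db"},
                  "updatePolicy": {"updateMode": "Off"}}},
    ]
    for finding in lint_autoscalers(hpas, vpas):
        print(finding)
```

The check is deliberately conservative: it flags any shared target with an evicting VPA mode, even though an HPA driven purely by custom or external metrics can coexist with VPA.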

Sources

CKA Day 17 — Kubernetes Autoscaling Explained: HPA vs VPA

Related Pages

Horizontal Pod Autoscaler (HPA) — detailed YAML, metrics, and exam commands
Vertical Pod Autoscaler (VPA) — modes, recommendations, and conflict avoidance
Kubernetes Resource Requests and Limits — prerequisite for utilization calculations
Deployment, ReplicaSet & Replication Controller — the workloads autoscalers target
Pod Fundamentals — the unit being scaled
Kubernetes Services — traffic distribution across scaled Pods
Kubernetes Architecture — kube-controller-manager and Metrics Server
Kubernetes Namespaces — autoscaling objects are namespace-scoped
Kubernetes Labels and Selectors — how HPA identifies target workloads
Why Kubernetes? — autoscaling as a core orchestration benefit
Kubernetes Health Probes — HPA counts only ready replicas for utilization calculations
CKA Certification — exam domains and weightings
CKA Study Roadmap — Day 17 in the 40-day plan
Tech Tutorials with Piyush — course creator

Tags: kubernetes autoscaling hpa vpa cluster-autoscaler devops cka cost-optimization
