Kubernetes Autoscaling

Kubernetes offers four mechanisms for matching workload capacity to demand: the Horizontal Pod Autoscaler (HPA) scales Pod replicas horizontally, the Vertical Pod Autoscaler (VPA) adjusts Pod resource footprints vertically, Cluster Autoscaler adds and removes nodes, and Node Auto-Provisioning creates tailored node pools. Synthesized from CKA Day 17: Kubernetes Autoscaling Explained.

Why Autoscaling Matters

Static replica counts and fixed resource allocations waste money during low traffic and fail during spikes. Kubernetes autoscaling provides:

| Benefit | Description |
|---|---|
| Cost efficiency | Reduce replicas or node count when demand drops |
| Performance resilience | Add capacity automatically before users experience latency |
| Operational simplicity | Eliminate manual 3 AM paging to scale services |
| Right-sizing | VPA recommends or applies optimal CPU/memory per container |

The Four Autoscaling Mechanisms

Kubernetes provides autoscaling at two levels: Pod-level (how big or numerous are my Pods?) and Cluster-level (how many nodes do I have?).

| Mechanism | Level | What It Adjusts | Best For |
|---|---|---|---|
| HPA | Pod | Number of replicas (horizontal) | Stateless apps, web APIs, microservices |
| VPA | Pod | CPU/memory per container (vertical) | Stateful apps, databases, right-sizing |
| Cluster Autoscaler | Cluster | Number of worker nodes | Cloud environments with variable total demand |
| Node Auto-Provisioning | Cluster | Number and type of node pools | GKE and managed Kubernetes with diverse workloads |

Exam Note: HPA is the most commonly tested autoscaling topic on the CKA. VPA is conceptual knowledge. Cluster Autoscaler and Node Auto-Provisioning are real-world tools but rarely appear on the exam. Source: CKA Day 17

Horizontal vs Vertical Scaling

| Dimension | Horizontal Scaling | Vertical Scaling |
|---|---|---|
| Direction | Out (more instances) | Up (bigger instances) |
| Kubernetes tool | HPA | VPA |
| App requirement | Must be stateless or shared-state | Can be stateful; single replica acceptable |
| Speed | Fast (seconds to create Pods) | Slower (may require evictions and restarts) |
| Ceiling | Limited by cluster node capacity | Limited by node size and resource quotas |

Design Principle: Prefer horizontal scaling in Kubernetes. Pods are designed to be cattle, not pets. Vertical scaling is reserved for workloads that cannot be replicated easily. Source: CKA Day 17

How the Mechanisms Interact

┌─────────────────────────────────────────────────────────────┐
│                        User Demand                            │
│                    (traffic, queue depth)                     │
└─────────────────────────────────────────────────────────────┘
                              │
          ┌───────────────────┼───────────────────┐
          ▼                   ▼                   ▼
   ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
   │     HPA     │     │     VPA     │     │   Cluster   │
   │  (replicas) │     │  (CPU/mem)  │     │  Autoscaler │
   │             │     │             │     │  (nodes)    │
   └──────┬──────┘     └──────┬──────┘     └──────┬──────┘
          │                   │                   │
          ▼                   ▼                   ▼
   ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
   │  Deployment │     │  Pod specs  │     │ Cloud ASG   │
   │  replicas   │     │  resources  │     │ / MIG       │
   └─────────────┘     └─────────────┘     └─────────────┘
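
The hand-off between the replica-level and node-level boxes in the diagram can be sketched with simple arithmetic. This is a rough illustration only: it assumes identical nodes, considers CPU requests alone, and uses hypothetical function names. The real Cluster Autoscaler simulates scheduling and also accounts for memory, taints, affinity, and node-group limits.

```python
import math

def pending_replicas(replicas: int, cpu_request_m: int,
                     nodes: int, allocatable_m_per_node: int) -> int:
    """Replicas that would not fit on the current nodes (CPU requests only)."""
    pods_per_node = allocatable_m_per_node // cpu_request_m
    capacity = nodes * pods_per_node
    return max(0, replicas - capacity)

def nodes_to_add(replicas: int, cpu_request_m: int,
                 nodes: int, allocatable_m_per_node: int) -> int:
    """Extra identical nodes needed so every replica can be scheduled."""
    pods_per_node = allocatable_m_per_node // cpu_request_m
    if pods_per_node == 0:
        # A single Pod requests more CPU than a node offers; more nodes of
        # this size cannot help.
        raise ValueError("Pod request exceeds node allocatable CPU")
    pending = pending_replicas(replicas, cpu_request_m, nodes, allocatable_m_per_node)
    return math.ceil(pending / pods_per_node)

# HPA asks for 12 replicas at 500m each; 2 nodes offer 2000m allocatable each.
print(pending_replicas(12, 500, 2, 2000))  # 4
print(nodes_to_add(12, 500, 2, 2000))      # 1
```

The point of the sketch is the ordering: HPA raises the replica count first, and only the replicas that cannot be scheduled give the node-level autoscaler a reason to add capacity.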

Prerequisites for HPA and VPA

Both Pod-level autoscalers depend on accurate resource data:

  1. Metrics Server must be running in kube-system
  2. Container resources.requests must be declared — HPA calculates utilization as usage / request
  3. Workload must support scaling — HPA requires a scale subresource; VPA requires a Pod spec it can mutate

Without requests, HPA reports <unknown> and does not scale. This is a common troubleshooting scenario linking autoscaling back to resource requests and limits. Source: CKA Day 16
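
The usage / request calculation can be illustrated with a few lines of Python. This is a minimal sketch, not the controller's code: the function names are hypothetical, and the output string only imitates the current/target style shown in the HPA's targets column.

```python
from typing import Optional

def cpu_utilization_percent(usage_millicores: float,
                            request_millicores: Optional[float]) -> Optional[float]:
    """Usage as a percentage of the request, or None when no request is declared."""
    if not request_millicores:
        # No request means there is no denominator, so utilization is undefined.
        return None
    return 100.0 * usage_millicores / request_millicores

def format_target(current: Optional[float], target_percent: int) -> str:
    """Render current/target, using <unknown> when utilization is undefined."""
    shown = "<unknown>" if current is None else f"{current:.0f}%"
    return f"{shown}/{target_percent}%"

# A container using 150m against a 200m request sits at 75% utilization.
print(format_target(cpu_utilization_percent(150, 200), 50))   # 75%/50%
# The same container without a request cannot be evaluated.
print(format_target(cpu_utilization_percent(150, None), 50))  # <unknown>/50%
```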

When to Use What

| Scenario | Recommended Tool | Reason |
|---|---|---|
| Web API traffic spikes | HPA | Stateless, fast to replicate, easy to load-balance via Service |
| Database or cache | VPA (Off/Initial) | Stateful, hard to replicate; right-size instead |
| Batch job queue depth | HPA + custom metrics | Scale on queue length, not just CPU |
| Cluster out of capacity | Cluster Autoscaler | Nodes are the bottleneck, not replicas |
| Mixed workloads on GKE | Node Auto-Provisioning | Need GPU nodes for ML, standard nodes for web |
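
The same scaling rule covers both the CPU and the queue-depth rows: HPA's documented formula is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipped when the ratio is within a tolerance (0.1 by default). The sketch below assumes that formula; the function name and the min/max defaults are illustrative, and it omits stabilization windows and readiness handling.

```python
import math

def desired_replicas(current_replicas: int, current_value: float, target_value: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Apply the HPA ratio formula, then clamp to the configured bounds."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to target: keep the current count to avoid flapping.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))

# CPU: 4 replicas averaging 75% utilization against a 50% target.
print(desired_replicas(4, 75, 50))                    # 6
# Queue depth: 3 workers averaging 40 messages each, target 10 per worker.
print(desired_replicas(3, 40, 10, max_replicas=20))   # 12
# Within tolerance (52 vs 50): no change.
print(desired_replicas(5, 52, 50))                    # 5
```

Scaling on queue length uses exactly the same arithmetic; only the metric source changes from Metrics Server to a custom or external metrics adapter.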

Anti-Patterns

| Anti-Pattern | Why It Fails | Fix |
|---|---|---|
| HPA + VPA Auto on same workload | Both react to the same CPU/memory signal and adjust capacity simultaneously, causing thrashing | Use VPA in "Off" or "Initial" mode, or separate them |
| HPA without resources.requests | HPA cannot calculate utilization percentage | Add CPU/memory requests to container specs |
| HPA on a DaemonSet | DaemonSet runs one Pod per node; the replica count is not adjustable | Use VPA or Node Auto-Provisioning instead |
| Cluster Autoscaler without PodDisruptionBudgets | Node drains during scale-in can evict several replicas of a workload at once | Add PDBs for critical workloads |
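
The table above can be turned into a small pre-flight check. This is a hypothetical sketch: the `Workload` fields and `find_anti_patterns` are invented for illustration and do not correspond to any Kubernetes API object. It treats VPA's "Recreate" mode like "Auto" because both evict Pods to apply changes, which is an assumption beyond what the table states.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Workload:
    kind: str                       # e.g. "Deployment", "DaemonSet"
    has_requests: bool              # containers declare resources.requests
    hpa_enabled: bool = False
    vpa_mode: Optional[str] = None  # "Off", "Initial", "Recreate", "Auto", or None
    has_pdb: bool = False
    critical: bool = False

def find_anti_patterns(w: Workload) -> List[str]:
    """Return one warning per anti-pattern from the table that applies."""
    warnings = []
    if w.hpa_enabled and w.vpa_mode in ("Auto", "Recreate"):
        warnings.append("HPA and an updating VPA target the same workload")
    if w.hpa_enabled and not w.has_requests:
        warnings.append("HPA without resources.requests")
    if w.hpa_enabled and w.kind == "DaemonSet":
        warnings.append("HPA on a DaemonSet")
    if w.critical and not w.has_pdb:
        warnings.append("critical workload without a PodDisruptionBudget")
    return warnings

risky = Workload(kind="DaemonSet", has_requests=False, hpa_enabled=True,
                 vpa_mode="Auto", critical=True)
for warning in find_anti_patterns(risky):
    print(warning)

safe = Workload(kind="Deployment", has_requests=True, hpa_enabled=True,
                vpa_mode="Off", has_pdb=True, critical=True)
print(find_anti_patterns(safe))  # []
```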

Tags: kubernetes autoscaling hpa vpa cluster-autoscaler devops cka cost-optimization