Vertical Pod Autoscaler (VPA)

The Kubernetes tool that analyzes historical and current resource usage to recommend or automatically adjust container CPU and memory requests/limits. VPA solves the opposite problem from HPA: instead of adding more Pods, it makes each Pod the right size. Synthesized from CKA Day 17 — Kubernetes Autoscaling Explained.

What Is VPA?

VPA is not part of the core Kubernetes distribution; it is an official addon maintained by the Kubernetes Autoscaling Special Interest Group (SIG). It consists of three components:

| Component | Role |
| --- | --- |
| Recommender | Monitors metrics and computes recommended CPU and memory requests |
| Updater | Evicts Pods whose resources are far from the recommendation (“Auto” mode only) |
| Admission Controller | Mutates new Pod specs to inject recommended resources at creation time (“Initial” and “Auto” modes) |

CKA Note: VPA is conceptual knowledge for the exam. You should know what it does, its three modes, and why it conflicts with HPA. Detailed VPA installation and configuration are not exam topics. Source: CKA Day 17

VPA Modes

VPA operates in three modes that trade safety against automation (the upstream API also accepts Recreate, which has historically behaved like Auto):

| Mode | What Happens | Use Case |
| --- | --- | --- |
| Off | Generates recommendations only; does not modify workloads | Safe starting point; review recommendations before applying |
| Initial | Applies recommendations only to newly created Pods | Low risk; existing Pods keep running with old values |
| Auto | Evicts running Pods and recreates them with updated resources | Full automation; may cause brief downtime during eviction |

Critical Warning: Do not run VPA in “Auto” mode on the same workload as an HPA that scales on CPU or memory. Both controllers react to the same usage signal and undo each other’s changes, which causes thrashing: HPA adds replicas, VPA reduces per-Pod resources, HPA removes replicas, VPA increases resources. Choose one primary autoscaler per workload. Source: CKA Day 17

How VPA Works

┌─────────────────────────────────────────────────────────────┐
│                      VPA Architecture                       │
│                                                             │
│  ┌─────────────┐     ┌─────────────┐     ┌─────────────┐    │
│  │ Recommender │────▶│   Updater   │────▶│  Admission  │    │
│  │ (metrics)   │     │ (evictions) │     │  Controller │    │
│  └─────────────┘     └─────────────┘     └─────────────┘    │
│         │                                       │           │
│         ▼                                       ▼           │
│  ┌─────────────────┐                   ┌─────────────────┐  │
│  │ VPA Object      │                   │  New Pod specs  │  │
│  │ (recommendation)│                   │  (mutated)      │  │
│  └─────────────────┘                   └─────────────────┘  │
└─────────────────────────────────────────────────────────────┘

  1. Recommender reads current usage from the Metrics Server and, where configured, history from a time-series store such as Prometheus
  2. It produces a recommendation: “this container should request 200m CPU and 256Mi memory”
  3. The recommendation is stored in the VPA object’s status
  4. Updater (in Auto mode) evicts Pods whose current requests are far from the recommendation
  5. Admission Controller intercepts new Pod creation and patches resources.requests/limits before the Pod is scheduled
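
Steps 2 and 3 can be made concrete with a sketch of what a VPA object's status may look like once the Recommender has gathered enough data. The field names follow the autoscaling.k8s.io/v1 API. Every quantity is illustrative: the target reuses the 200m / 256Mi example from step 2, and the bounds are invented for this sketch.

```yaml
# Illustrative status block, as it might appear in `kubectl get vpa <name> -o yaml`.
# Only the field names come from the API; all quantities are hypothetical.
status:
  conditions:
  - type: RecommendationProvided
    status: "True"
  recommendation:
    containerRecommendations:
    - containerName: nginx
      lowerBound:          # requests below this are likely too small
        cpu: 100m
        memory: 128Mi
      target:              # the request VPA would apply
        cpu: 200m
        memory: 256Mi
      uncappedTarget:      # target before minAllowed/maxAllowed are applied
        cpu: 200m
        memory: 256Mi
      upperBound:          # requests above this are likely wasteful
        cpu: 500m
        memory: 512Mi
```

In “Off” mode this status block is the only output VPA produces; nothing on the workload changes.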

VPA YAML Anatomy

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: nginx-vpa
  namespace: default
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deploy
  updatePolicy:
    updateMode: "Off"   # Off | Initial | Auto
  resourcePolicy:
    containerPolicies:
    - containerName: nginx
      minAllowed:
        cpu: 50m
        memory: 100Mi
      maxAllowed:
        cpu: 1000m
        memory: 1Gi
      controlledResources: ["cpu", "memory"]

| Field | Description |
| --- | --- |
| targetRef | The workload to analyze (for example Deployment, ReplicaSet, StatefulSet, DaemonSet) |
| updatePolicy.updateMode | Off (recommend only), Initial (new Pods only), Auto (evict and recreate) |
| resourcePolicy.containerPolicies | Bounds and controlled resources per container |
| minAllowed / maxAllowed | Safety rails: VPA will not recommend outside this range |
| controlledResources | Which resources VPA manages (cpu, memory, or both) |
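
As a variation on the resourcePolicy fields above, the sketch below shows a policy that manages memory only and leaves limits alone. The wildcard container name, the controlledValues field, and the per-container mode are part of the autoscaling.k8s.io/v1 API as far as the upstream documentation describes it. The sidecar name and all quantities are assumptions made for illustration.

```yaml
# Sketch of an alternative resourcePolicy; drop it into spec: of a VPA object.
resourcePolicy:
  containerPolicies:
  - containerName: "*"                 # wildcard: default policy for every container
    controlledResources: ["memory"]    # leave CPU requests untouched
    controlledValues: RequestsOnly     # adjust requests, never limits
    minAllowed:
      memory: 100Mi                    # illustrative bounds
    maxAllowed:
      memory: 1Gi
  - containerName: log-sidecar         # hypothetical sidecar container
    mode: "Off"                        # exclude this container from VPA entirely
```

A narrower policy like this can be a cautious first step when CPU is already handled by another mechanism.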

VPA vs HPA: When to Choose Which

| Scenario | Best Tool | Reason |
| --- | --- | --- |
| Stateless web service with traffic spikes | HPA | Fast to replicate; load balancer distributes traffic |
| Database or cache that cannot be replicated | VPA | Single replica; right-size instead of replicate |
| Over-provisioned Pods wasting cluster capacity | VPA (Off) | Analyze first, then apply recommendations manually |
| Microservice with predictable daily patterns | HPA | Match replica count to demand curve |
| Pod constantly OOMKilled despite not being under load | VPA | Memory limit is too low; VPA raises it |
| Need both scale-out and right-sizing | HPA + VPA (Off/Initial) | HPA handles traffic; VPA informs baseline sizing |
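
The last row (HPA plus VPA in “Off” mode) might look like the sketch below: the HPA owns the replica count, while a separate VPA object with updateMode "Off" on the same Deployment only reports sizing recommendations. The HPA name, replica bounds, and utilization target are illustrative assumptions; nginx-deploy is the Deployment name used in the VPA manifest in this document.

```yaml
# Sketch: HPA scales replicas; pair it with a VPA whose updateMode is "Off".
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa              # hypothetical name
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deploy
  minReplicas: 2               # illustrative bounds
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70 # percent of the CPU request, illustrative
```

Because the VPA never changes requests on its own in this setup, the HPA's utilization math stays stable between manual manifest updates.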

VPA and Resource Requests/Limits

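VPA directly mutates the resources.requests and resources.limits fields that Day 16 teaches: it sets requests from the recommendation and, by default, scales limits to preserve the original request-to-limit ratio. This means: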

  • In “Off” mode: You manually apply VPA recommendations to your YAML manifests. This is the safest production workflow.
  • In “Auto” mode: VPA changes exist only on the live Pods; they are not written back to the workload manifest or saved to Git. This creates configuration drift, so GitOps teams usually prefer “Off” mode.

A typical VPA workflow:

  1. Deploy workload with conservative requests
  2. Run VPA in “Off” mode for 24–48 hours
  3. Read recommendations from kubectl describe vpa <name>
  4. Update your Git-tracked manifests with the recommended values
  5. Re-deploy and disable VPA, or keep it in “Off” mode for continuous monitoring
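
Step 4 of this workflow could look like the following sketch, in which recommended values have been copied by hand into a Git-tracked Deployment. The labels, image tag, replica count, and limit are assumptions for illustration, and the request values simply reuse the 200m / 256Mi example from earlier in this document.

```yaml
# Sketch: Deployment manifest after applying VPA recommendations manually.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deploy
  namespace: default
spec:
  replicas: 1                  # illustrative
  selector:
    matchLabels:
      app: nginx               # hypothetical label
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.27      # hypothetical image tag
        resources:
          requests:
            cpu: 200m          # copied from the VPA target (illustrative)
            memory: 256Mi
          limits:
            memory: 512Mi      # illustrative; choosing limits is a separate decision
```

Committing the change through the normal review process keeps Git as the source of truth and avoids the drift that “Auto” mode introduces.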

Limitations and Gotchas

| Limitation | Explanation |
| --- | --- |
| Not core Kubernetes | Must be installed separately; not available on all managed clusters by default |
| Requires eviction | “Auto” mode restarts Pods to apply new resource values |
| Conflicts with HPA | Both autoscalers acting on CPU or memory for the same workload cause oscillation |
| Does not handle limits-only | VPA adjusts requests; if you set limits without requests, behavior is undefined |
| No instant reaction | Recommendations are based on historical usage (percentiles over a decaying window), not real-time spikes |
| Limited to Pod resources | VPA does not scale nodes; use Cluster Autoscaler for that |

Tags: kubernetes vpa vertical-pod-autoscaler autoscaling resource-optimization devops cka