Vertical Pod Autoscaler (VPA)

The Kubernetes tool that analyzes historical and current resource usage to recommend or automatically adjust container CPU and memory requests/limits. VPA solves the opposite problem from HPA: instead of adding more Pods, it makes each Pod the right size. Synthesized from CKA Day 17 — Kubernetes Autoscaling Explained.

What Is VPA?

VPA is not part of the core Kubernetes distribution; it is an official addon maintained by the Kubernetes Autoscaling Special Interest Group (SIG). It consists of three components:

Component	Role
Recommender	Monitors metrics and computes recommended requests/limits
Updater	Evicts Pods whose requests are far from the recommendation so they can be recreated with new values (“Auto” mode only)
Admission Controller	Mutating webhook that injects recommended resources into new Pod specs at creation time (“Initial” and “Auto” modes)

CKA Note: VPA is conceptual knowledge for the exam. You should know what it does, its three modes, and why it conflicts with HPA. Detailed VPA installation and configuration are not exam topics. Source: CKA Day 17

VPA Modes

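The course covers three modes that trade off safety against automation. (The upstream API also accepts a fourth value, Recreate, which currently behaves the same as Auto.)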

Mode	What Happens	Use Case
Off	Generates recommendations only; does not modify workloads	Safe starting point; review recommendations before applying
Initial	Applies recommendations only to newly created Pods	Low risk; existing Pods keep running with old values
Auto	Evicts running Pods and recreates them with updated resources	Full automation; may cause brief downtime during eviction

Critical Warning: Do not run VPA in “Auto” mode on the same workload as an HPA that scales on CPU or memory. HPA measures utilization as a percentage of each Pod's request, so every time VPA changes the request it also shifts the signal HPA reacts to. The result is thrashing: HPA adds replicas, VPA reduces per-Pod resources, HPA removes replicas, VPA increases resources. Choose one primary autoscaler per workload. Source: CKA Day 17

How VPA Works

┌─────────────────────────────────────────────────────────────┐
│                      VPA Architecture                       │
│                                                             │
│  ┌─────────────┐     ┌─────────────┐     ┌─────────────┐    │
│  │ Recommender │────▶│   Updater   │────▶│  Admission  │    │
│  │ (metrics)   │     │ (evictions) │     │  Controller │    │
│  └─────────────┘     └─────────────┘     └─────────────┘    │
│         │                                       │           │
│         ▼                                       ▼           │
│  ┌─────────────────┐                   ┌─────────────────┐  │
│  │ VPA Object      │                   │  New Pod specs  │  │
│  │ (recommendation)│                   │  (mutated)      │  │
│  └─────────────────┘                   └─────────────────┘  │
└─────────────────────────────────────────────────────────────┘

1. The Recommender reads current usage from the Metrics Server and, when configured, history from a time-series store such as Prometheus.
2. It produces a recommendation, for example: “this container should request 200m CPU and 256Mi memory.”
3. The recommendation is stored in the VPA object’s status.
4. The Updater (in “Auto” mode) evicts Pods whose current requests are far from the recommendation.
5. The Admission Controller intercepts new Pod creation and patches resources.requests/limits before the Pod is scheduled.
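
To make the stored recommendation concrete, here is a sketch of what the status block of a VPA object might look like when retrieved with kubectl get vpa nginx-vpa -o yaml. The field names follow the autoscaling.k8s.io/v1 API. The numeric values are illustrative assumptions, with the target chosen to match the 200m / 256Mi example above.

```yaml
# Illustrative excerpt only; all values are made up for this example.
status:
  recommendation:
    containerRecommendations:
    - containerName: nginx
      lowerBound:        # lower end of the recommended range
        cpu: 100m
        memory: 128Mi
      target:            # the value VPA would set as the request
        cpu: 200m
        memory: 256Mi
      uncappedTarget:    # the target before minAllowed/maxAllowed are applied
        cpu: 200m
        memory: 256Mi
      upperBound:        # upper end of the recommended range
        cpu: 500m
        memory: 512Mi
```

In “Off” mode, target is the value you would copy into your manifests by hand.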

VPA YAML Anatomy

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: nginx-vpa
  namespace: default
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deploy
  updatePolicy:
    updateMode: "Off"   # Off | Initial | Auto
  resourcePolicy:
    containerPolicies:
    - containerName: nginx
      minAllowed:
        cpu: 50m
        memory: 100Mi
      maxAllowed:
        cpu: 1000m
        memory: 1Gi
      controlledResources: ["cpu", "memory"]

Field	Description
`targetRef`	The workload to analyze (Deployment, ReplicaSet, StatefulSet, DaemonSet)
`updatePolicy.updateMode`	`Off` (recommend only), `Initial` (new Pods only), `Auto` (evict and recreate)
`resourcePolicy.containerPolicies`	Bounds and controlled resources per container
`minAllowed` / `maxAllowed`	Safety rails — VPA will not recommend outside this range
`controlledResources`	Which resources VPA manages (`cpu`, `memory`, or both)

VPA vs HPA: When to Choose Which

Scenario	Best Tool	Reason
Stateless web service with traffic spikes	HPA	Fast to replicate; load balancer distributes traffic
Database or cache that cannot be replicated	VPA	Single replica; right-size instead of replicate
Over-provisioned Pods wasting cluster capacity	VPA (Off)	Analyze first, then apply recommendations manually
Microservice with predictable daily patterns	HPA	Match replica count to demand curve
Pod constantly OOMKilled despite not being under load	VPA	Memory limit is too low; VPA raises it
Need both scale-out and right-sizing	HPA + VPA (Off/Initial)	HPA handles traffic; VPA informs baseline sizing
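
One way the last row is often approached is to give each autoscaler its own resource so they never act on the same signal. The sketch below assumes HPA scales on CPU utilization while VPA is restricted to memory and only applies values to newly created Pods. The HPA name, replica bounds, and 70% threshold are hypothetical; treat this as a starting point rather than a recommended configuration.

```yaml
# HPA owns CPU: it changes the replica count based on CPU utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa            # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deploy
  minReplicas: 2             # assumed bounds
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # assumed threshold
---
# VPA owns memory only, so it does not rewrite the CPU request HPA depends on.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: nginx-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deploy
  updatePolicy:
    updateMode: "Initial"    # no evictions; applies to new Pods only
  resourcePolicy:
    containerPolicies:
    - containerName: nginx
      controlledResources: ["memory"]
```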

VPA and Resource Requests/Limits

VPA directly mutates the resources.requests and resources.limits fields covered in Day 16. This has two practical consequences:

In “Off” mode: You manually apply VPA recommendations to your YAML manifests. This is the safest production workflow.
In “Auto” mode: VPA patches Pods at admission time; the Deployment spec and your Git-tracked manifests are not updated. The running values therefore drift from the declared ones, which is why GitOps teams usually prefer “Off” mode.

A typical VPA workflow:

1. Deploy the workload with conservative requests
2. Run VPA in “Off” mode for 24–48 hours
3. Read the recommendations with kubectl describe vpa <name>
4. Update your Git-tracked manifests with the recommended values
5. Re-deploy and disable VPA, or keep it in “Off” mode for continuous monitoring
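
As a sketch of the manifest update in this workflow, the Deployment below carries the recommended requests in its own spec, so Git remains the source of truth. It assumes the VPA target was 200m CPU and 256Mi memory; the label, image tag, replica count, and memory limit are hypothetical choices, not VPA output.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deploy
  namespace: default
spec:
  replicas: 1                  # assumed
  selector:
    matchLabels:
      app: nginx               # hypothetical label
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.27      # assumed image tag
        resources:
          requests:
            cpu: 200m          # copied from the VPA target
            memory: 256Mi      # copied from the VPA target
          limits:
            memory: 512Mi      # chosen by the operator, not by VPA
```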

Limitations and Gotchas

Limitation	Explanation
Not core Kubernetes	Must be installed separately; not available on all managed clusters by default
Requires eviction	“Auto” mode restarts Pods to apply new resource values
Conflicts with HPA	Both autoscalers on the same workload cause oscillation
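Limits are not sized independently	VPA computes requests and scales limits to preserve the original limit-to-request ratio; if a container sets only limits, Kubernetes defaults the requests to the same values, so VPA starts from a 1:1 ratio
No instant reaction	Recommendations are derived from usage history (high percentiles over a decaying window), not from real-time spikes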
Limited to Pod resources	VPA does not scale nodes; use Cluster Autoscaler for that
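
On the question of limits: the VPA API has a per-container controlledValues field that accepts RequestsAndLimits (the default) or RequestsOnly. The sketch below reuses the nginx-vpa example and assumes you want VPA to leave your hand-set limits alone. Whether that is appropriate depends on the workload.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: nginx-vpa
  namespace: default
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deploy
  updatePolicy:
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
    - containerName: nginx
      controlledValues: RequestsOnly   # default is RequestsAndLimits
```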

Sources

CKA Day 17 — Kubernetes Autoscaling Explained: HPA vs VPA

Related Pages

Kubernetes Autoscaling — overview of all four mechanisms
Horizontal Pod Autoscaler (HPA) — when to add replicas instead of resizing
Kubernetes Resource Requests and Limits — the fields VPA adjusts
Deployment, ReplicaSet & Replication Controller — workloads VPA targets
Pod Fundamentals — the unit being resized
Kubernetes Architecture — controller and admission controller context
Kubernetes Namespaces — VPA objects are namespace-scoped
Why Kubernetes? — autoscaling as a core orchestration benefit
CKA Certification — exam domains
CKA Study Roadmap — Day 17 in the 40-day plan
Tech Tutorials with Piyush — course creator

Tags: kubernetes vpa vertical-pod-autoscaler autoscaling resource-optimization devops cka

Rakesh's Brain
