Kubernetes Node Affinity
The advanced positive scheduling primitive that gives Pods fine-grained control over which nodes they run on. Node Affinity is the expressive successor to nodeSelector, supporting rich operators, multiple values, and soft/hard constraints. Critical for dedicated node pools, zone-aware placement, and the CKA exam. Synthesized from CKA Day 15 — Kubernetes Node Affinity Explained.
What Is Node Affinity?
By default, the Kubernetes scheduler tries to spread Pods evenly across healthy nodes. But production workloads often have node preferences: SSD storage for databases, GPU acceleration for ML training, specific zones for data residency, or high-memory nodes for caching.
Node Affinity solves this by letting the Pod spec declare which node labels it prefers or requires. Unlike nodeSelector (which only supports exact equality), Node Affinity supports set-based operators (In, NotIn), existence checks (Exists, DoesNotExist), and numeric comparisons (Gt, Lt). It also distinguishes between hard constraints (must match) and soft preferences (best effort).
Key Insight: Node Affinity is a Pod-level property evaluated by the kube-scheduler: required rules are applied during the filtering phase and preferred rules during the scoring phase. If a Pod declares a required affinity and no node matches, the Pod remains Pending with a clear event message. If it declares a preferred affinity, the scheduler assigns a score boost to matching nodes but will place the Pod elsewhere if necessary. Source: CKA Day 15
The Two Scheduling Types
Node Affinity offers two scheduling strategies, both sharing the suffix IgnoredDuringExecution:
| Type | Full Name | Scheduling Behaviour | Existing Pods |
|---|---|---|---|
| Required | requiredDuringSchedulingIgnoredDuringExecution | Hard constraint — Pod is only scheduled on nodes that match. If no match, Pod stays Pending. | Unaffected by label changes |
| Preferred | preferredDuringSchedulingIgnoredDuringExecution | Soft preference — scheduler tries to match but will place on any available node. Uses weight (1–100). | Unaffected by label changes |
Critical Distinction: The suffix IgnoredDuringExecution means that once a Pod is scheduled, changes to node labels do not cause eviction. This is fundamentally different from NoExecute taints, which do evict existing Pods. Node Affinity only affects new scheduling decisions.
YAML Anatomy
Required (Hard Constraint)
    apiVersion: v1
    kind: Pod
    metadata:
      name: fast-db
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: disktype
                operator: In
                values:
                - ssd
                - nvme
      containers:
      - name: postgres
        image: postgres:15

Field breakdown:
- nodeSelectorTerms: a list; terms are ORed (any term can satisfy)
- matchExpressions: a list within each term; expressions are ANDed (all must match)
- matchFields: alternative to matchExpressions, matches node fields (e.g., metadata.name)
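The OR/AND rules above are easiest to see with two terms side by side. The following is a minimal sketch, not taken from the source: the Pod name, the tier=db label, and the node name worker-7 are hypothetical. The Pod can be scheduled on a node that satisfies the first term (both expressions together) or the second term (the node is named worker-7).

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: or-and-demo              # hypothetical name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        # Term 1: both expressions must hold (AND)
        - matchExpressions:
          - key: disktype
            operator: In
            values: ["ssd", "nvme"]
          - key: tier              # hypothetical label
            operator: In
            values: ["db"]
        # Term 2 (ORed with term 1): match on the node's metadata.name field
        - matchFields:
          - key: metadata.name
            operator: In
            values: ["worker-7"]   # hypothetical node name
  containers:
  - name: app
    image: busybox:1.36
    command: ["sleep", "3600"]
```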
Preferred (Soft Preference with Weight)
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: disktype
                operator: In
                values:
                - ssd
          - weight: 50
            preference:
              matchExpressions:
              - key: zone
                operator: In
                values:
                - us-east-1a

For each candidate node, the scheduler adds up the weights of the preference terms that the node satisfies. That sum is combined with the scores from the scheduler's other priority functions, and the node with the highest total score wins.
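Hard and soft rules can sit in the same Pod spec, and the weight arithmetic is easier to follow with concrete nodes. The sketch below is illustrative only: the Pod name and the nodes listed in the trailing comments are hypothetical, and the sums cover only the node-affinity contribution, before the scheduler combines it with its other scoring factors.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: weighted-demo            # hypothetical name
spec:
  affinity:
    nodeAffinity:
      # Hard rule: nodes without disktype=ssd or disktype=nvme are filtered out first
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values: ["ssd", "nvme"]
      # Soft rules: rank the nodes that survived filtering
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: disktype
            operator: In
            values: ["ssd"]
      - weight: 50
        preference:
          matchExpressions:
          - key: zone
            operator: In
            values: ["us-east-1a"]
  containers:
  - name: app
    image: busybox:1.36
    command: ["sleep", "3600"]
# Affinity weight sums for four hypothetical nodes:
#   node-a  disktype=ssd,  zone=us-east-1a  -> 100 + 50 = 150
#   node-b  disktype=ssd,  zone=us-east-1b  -> 100
#   node-c  disktype=nvme, zone=us-east-1a  -> 50
#   node-d  disktype=hdd                    -> filtered out by the required rule
```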
Operator Reference
| Operator | Matches When | Typical Use |
|---|---|---|
| In | Key has one of the listed values | SSD or NVMe nodes for databases |
| NotIn | Key does not have any of the listed values | Avoid nodes whose label marks them for maintenance (e.g., label taint=maintenance) |
| Exists | Key exists (value irrelevant) | Any node labelled gpu (regardless of GPU model) |
| DoesNotExist | Key does not exist | Nodes without a dedicated label |
| Gt | Key value > specified integer (numeric) | memory > 64 GB nodes |
| Lt | Key value < specified integer (numeric) | cpu < 8 core nodes for lightweight jobs |
Exam Trap: Gt and Lt require the node label value to be a valid integer string (e.g., memory: "128"), and their values list must contain exactly one element. Non-numeric values cause the expression to evaluate as false.
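As a sketch of how several operators combine inside one term, the fragment below requires a node that has a gpu label of any value, is not labelled env=staging, and carries a memory label greater than 64. The label keys and values are hypothetical; in particular, the memory label is assumed to have been added by an administrator (for example memory=128).

```yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:         # all three must match (AND)
          - key: gpu
            operator: Exists        # no values field for Exists / DoesNotExist
          - key: env                # hypothetical label
            operator: NotIn         # also matches nodes that have no env label
            values: ["staging"]
          - key: memory             # hypothetical, administrator-defined label
            operator: Gt
            values: ["64"]          # single element, parsed as an integer
```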
Node Affinity vs nodeSelector
| Feature | nodeSelector | nodeAffinity |
|---|---|---|
| Operators | = only | In, NotIn, Exists, DoesNotExist, Gt, Lt |
| Soft constraints | ❌ No | ✅ preferredDuringScheduling... |
| Multiple values | ❌ No | ✅ Yes — values: [ssd, nvme] |
| OR logic | ❌ No | ✅ nodeSelectorTerms are ORed |
| AND logic | ✅ Implicit (all selectors) | ✅ matchExpressions within a term are ANDed |
When to use which:
- Use nodeSelector for quick, simple constraints (single label, exact match, hard requirement)
- Use nodeAffinity for production workloads with complex requirements, soft preferences, or multiple acceptable values
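To make the comparison concrete, here are two Pods that express the same hard constraint, first with nodeSelector and then with nodeAffinity. This is a sketch with hypothetical Pod names; the disktype=ssd label is the one used earlier on this page.

```yaml
# Pod 1: nodeSelector, exact match on a single label
apiVersion: v1
kind: Pod
metadata:
  name: selector-demo            # hypothetical name
spec:
  nodeSelector:
    disktype: ssd
  containers:
  - name: app
    image: busybox:1.36
    command: ["sleep", "3600"]
---
# Pod 2: the equivalent required nodeAffinity
apiVersion: v1
kind: Pod
metadata:
  name: affinity-demo            # hypothetical name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values: ["ssd"]
  containers:
  - name: app
    image: busybox:1.36
    command: ["sleep", "3600"]
```

If both fields are set on the same Pod, a node must satisfy both to be eligible.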
Node Affinity vs Taints/Tolerations
| Dimension | Node Affinity (Attract) | Taints/Tolerations (Repel) |
|---|---|---|
| Direction | Pod actively seeks matching nodes | Node actively rejects non-tolerating Pods |
| Guarantee | Hard affinity guarantees placement on matching nodes | Taints only block; a tolerating Pod may land on any node |
| Existing Pods | IgnoredDuringExecution — no eviction | NoExecute evicts existing Pods |
| Multiple conditions | ✅ Rich operators and expressions | ❌ Limited to key=value+effect |
| Production use | Attract workloads to specialised hardware | Keep general workloads off specialised hardware |
Key Realisation from Source: Taints and tolerations alone cannot guarantee that a workload lands on a specific node type — they only prevent unwanted workloads from landing there. If you want to ensure your GPU workload lands on GPU nodes (and not on untainted general nodes), you must combine taints + tolerations + node affinity. Source: CKA Day 15
The Production Pattern: Taints + Affinity + Tolerations
The canonical pattern for creating a dedicated node pool (e.g., GPU, high-memory, zone-specific):
    # 1. Label the node pool
    kubectl label node worker-gpu-1 tier=ml
    # 2. Taint the node pool (repel everything else)
    kubectl taint node worker-gpu-1 gpu=true:NoSchedule

    # 3. Deploy workload with both affinity and toleration
    apiVersion: v1
    kind: Pod
    metadata:
      name: ml-training
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: tier
                operator: In
                values:
                - ml
      tolerations:
      - key: "gpu"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"
      containers:
      - name: trainer
        image: tensorflow/tensorflow:latest-gpu

Why both are needed:
- Without the taint: A general Pod (no toleration) could still be scheduled on worker-gpu-1 if the scheduler chooses it, because the affinity only affects this Pod, not others.
- Without the affinity: A tolerating Pod could be scheduled on any node (including non-GPU nodes) because a toleration merely grants permission to land on tainted nodes.
- With both: Only ML Pods are allowed on GPU nodes (taint gate), and ML Pods are required to land on GPU nodes (affinity gate).
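In practice the same pairing usually lives in a controller's Pod template rather than in a bare Pod. The following Deployment is a sketch under that assumption: the app=ml-training label and the replica count are hypothetical, while the tier=ml label and the gpu=true:NoSchedule taint are the ones created in steps 1 and 2.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-training
spec:
  replicas: 2                      # hypothetical replica count
  selector:
    matchLabels:
      app: ml-training             # hypothetical label tying Pods to this Deployment
  template:
    metadata:
      labels:
        app: ml-training
    spec:
      # Affinity gate: replicas may only land on nodes labelled tier=ml
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: tier
                operator: In
                values: ["ml"]
      # Taint gate: replicas are permitted on nodes tainted gpu=true:NoSchedule
      tolerations:
      - key: "gpu"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"
      containers:
      - name: trainer
        image: tensorflow/tensorflow:latest-gpu
```

Every replica created from this template inherits both gates, so scaling the Deployment does not change where its Pods are allowed to run.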
Troubleshooting Node Affinity Mismatches
| Symptom | Likely Cause | Fix |
|---|---|---|
| Pod stuck Pending with "didn't match Pod's node affinity/selector" | requiredDuringScheduling with no matching node labels | Add matching labels to nodes, or relax the affinity to preferred |
| Pod scheduled on wrong node despite affinity | Using preferredDuringScheduling and no matching nodes exist | Switch to requiredDuringScheduling if placement must be guaranteed |
| Affinity ignored after node label change | Expected — IgnoredDuringExecution by design | To evict Pods on label change, use NoExecute taints, not affinity |
| Gt/Lt operators not matching | Node label value is not a valid integer string | Ensure label values are quoted numbers: memory: "128" |
CKA Exam Speed Patterns
    # Check node labels
    kubectl get nodes --show-labels
    # Label a node for affinity matching
    kubectl label node worker-1 disktype=ssd
    # Check why a Pod is Pending (look for affinity messages)
    kubectl describe pod <name> | grep -A 10 Events
    # Imperative run with nodeName (bypasses the scheduler, so it never evaluates affinity)
    kubectl run debug --image=busybox --restart=Never \
      --overrides='{"spec":{"nodeName":"worker-1"}}'

YAML Memory Trick: The required affinity struct is: affinity → nodeAffinity → requiredDuringSchedulingIgnoredDuringExecution → nodeSelectorTerms → [matchExpressions]. Each expression has key, operator, values (array). Practise typing this nested structure; it appears frequently on the exam and auto-completion is not available.
Related Pages
- Kubernetes Manual Scheduling: nodeName, nodeSelector, and comparison with nodeAffinity
- Kubernetes Taints and Tolerations: the negative scheduling counterpart; production pattern companion
- Kubernetes Labels and Selectors — the metadata system that affinity queries
- Kubernetes Architecture — kube-scheduler filtering and scoring phases
- Pod Fundamentals — the object that carries affinity rules
- Deployment, ReplicaSet & Replication Controller — controllers that replicate Pods with affinity constraints
- Kubernetes DaemonSet — uses tolerations to run on control plane nodes; affinity can restrict DaemonSet scope
- Kubernetes Static Pods — bypass scheduler entirely; affinity does not apply
- Kubernetes Services — routing is unaffected by affinity; only placement is constrained
- Kubernetes Namespaces — affinity is cluster-wide; namespaces do not limit node selection
- CKA Certification — exam domains and weightings
- CKA Study Roadmap — Day 15 in the 40-day plan
- Tech Tutorials with Piyush — course source
Tags: kubernetes node-affinity scheduling node-management cka devops kube-scheduler nodeselector