Kubernetes Node Affinity

The advanced positive scheduling primitive that gives Pods fine-grained control over which nodes they run on. Node Affinity is the expressive successor to nodeSelector, supporting rich operators, multiple values, and soft/hard constraints. Critical for dedicated node pools, zone-aware placement, and the CKA exam. Synthesized from CKA Day 15 — Kubernetes Node Affinity Explained.

What Is Node Affinity?

By default, the kube-scheduler treats every healthy node with enough free resources as a candidate and scores the candidates so that load tends to balance across the cluster. But production workloads often have node preferences: SSD storage for databases, GPU acceleration for ML training, specific zones for data residency, or high-memory nodes for caching.

Node Affinity solves this by letting the Pod spec declare which node labels it prefers or requires. Unlike nodeSelector (which only supports exact equality), Node Affinity supports set-based operators (In, NotIn), existence checks (Exists, DoesNotExist), and numeric comparisons (Gt, Lt). It also distinguishes between hard constraints (must match) and soft preferences (best effort).

Key Insight: Node Affinity is a Pod-level property evaluated by the kube-scheduler. Required rules are applied in the filtering phase: if no node matches, the Pod remains Pending and its events report the affinity mismatch. Preferred rules are applied in the scoring phase: matching nodes receive a score boost, but the scheduler will still place the Pod elsewhere if necessary. Source: CKA Day 15

The Two Scheduling Types

Node Affinity offers two scheduling strategies, both sharing the suffix IgnoredDuringExecution:

Type	Full Name	Scheduling Behaviour	Existing Pods
Required	`requiredDuringSchedulingIgnoredDuringExecution`	Hard constraint — Pod is only scheduled on nodes that match. If no match, Pod stays `Pending`.	Unaffected by label changes
Preferred	`preferredDuringSchedulingIgnoredDuringExecution`	Soft preference — scheduler tries to match but will place on any available node. Uses `weight` (1–100).	Unaffected by label changes

Critical Distinction: The suffix IgnoredDuringExecution means that once a Pod is scheduled, changes to node labels do not cause eviction. This is fundamentally different from NoExecute taints, which evict running Pods that do not tolerate them. Node Affinity only affects new scheduling decisions.
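The two types are not mutually exclusive: one Pod can carry a hard rule and a soft rule at the same time. The manifest below is a minimal sketch of that combination. The Pod name `web-ssd`, the `nginx` image, and the label values are assumptions chosen for illustration, not taken from the source.

```yaml
# Sketch only: Pod name, image, and label values are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: web-ssd
spec:
  affinity:
    nodeAffinity:
      # Hard rule: only nodes labelled disktype=ssd are candidates.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
      # Soft rule: among those candidates, favour one zone when possible.
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 50
        preference:
          matchExpressions:
          - key: zone
            operator: In
            values:
            - us-east-1a
  containers:
  - name: web
    image: nginx
```

If no SSD node exists the Pod stays Pending; if SSD nodes exist only outside `us-east-1a`, it is still scheduled on one of them.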

YAML Anatomy

Required (Hard Constraint)

apiVersion: v1
kind: Pod
metadata:
  name: fast-db
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
            - nvme
  containers:
  - name: postgres
    image: postgres:15

Field breakdown:

nodeSelectorTerms — a list; terms are ORed (any term can satisfy)
matchExpressions — a list within each term; expressions are ANDed (all must match)
matchFields — alternative to matchExpressions, matches node fields (e.g., metadata.name)
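To make the OR/AND rules concrete, here is a hedged sketch with two terms. The Pod name, image, label values, and the node name `worker-1` are assumptions for illustration. Note that `matchFields` is intended for node fields such as `metadata.name`, as described in the breakdown above.

```yaml
# Sketch only: names and label values are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: term-logic-demo
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        # Term 1: disktype is ssd AND zone is us-east-1a.
        - matchExpressions:
          - key: disktype
            operator: In
            values: ["ssd"]
          - key: zone
            operator: In
            values: ["us-east-1a"]
        # OR Term 2: the node whose metadata.name is worker-1.
        - matchFields:
          - key: metadata.name
            operator: In
            values: ["worker-1"]
  containers:
  - name: web
    image: nginx
```

A node qualifies if it satisfies every expression in Term 1, or if it satisfies Term 2; it does not need to satisfy both terms.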

Preferred (Soft Preference with Weight)

spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
      - weight: 50
        preference:
          matchExpressions:
          - key: zone
            operator: In
            values:
            - us-east-1a

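For each node that passes filtering, the scheduler sums the weights of the preference terms that the node matches. That sum is combined with the scores from the other scheduling plugins, and the node with the highest total wins. A node that matches every preference is therefore favoured, not guaranteed.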
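As a worked illustration of the weight arithmetic, the worksheet below applies the two preferences from the snippet above (disktype=ssd with weight 100, zone=us-east-1a with weight 50) to four hypothetical nodes. It is not a Kubernetes object, and the node names and labels are assumptions. It only covers the affinity contribution, before the scheduler folds in its other scoring plugins.

```yaml
# Worksheet, not a Kubernetes manifest: node names and labels are assumptions.
# weightSum = sum of the weights of the preference terms the node matches.
nodes:
- name: node-a
  labels: {disktype: ssd, zone: us-east-1a}
  matchedWeights: [100, 50]
  weightSum: 150
- name: node-b
  labels: {disktype: ssd, zone: us-east-1b}
  matchedWeights: [100]
  weightSum: 100
- name: node-c
  labels: {disktype: hdd, zone: us-east-1a}
  matchedWeights: [50]
  weightSum: 50
- name: node-d
  labels: {disktype: hdd, zone: us-east-1b}
  matchedWeights: []
  weightSum: 0
```

On affinity alone node-a ranks first, yet all four nodes remain eligible because preferred rules never filter nodes out.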

Operator Reference

Operator	Matches When	Typical Use
`In`	Key has one of the listed values	SSD or NVMe nodes for databases
`NotIn`	Key is absent, or its value is not one of the listed values	Avoid nodes carrying a label such as `status=maintenance` (affinity reads labels, not taints)
`Exists`	Key exists (value irrelevant)	Any node labelled `gpu` (regardless of GPU model)
`DoesNotExist`	Key does not exist	Nodes without a `dedicated` label
`Gt`	Key value > specified integer (numeric)	`memory > 64` GB nodes
`Lt`	Key value < specified integer (numeric)	`cpu < 8` core nodes for lightweight jobs

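Exam Trap: Gt and Lt require the node label value to be a valid integer string (e.g., memory: "128"), and the values list in the expression must contain exactly one element, itself written as a quoted integer. Non-numeric label values cause the expression to evaluate as false.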
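The sketch below shows how the numeric and existence operators are written in practice. The Pod name `big-cache`, the `redis:7` image, and the assumption that nodes carry a `memory` label holding a plain integer (for example set with `kubectl label node worker-1 memory=128`) are illustrative, not from the source.

```yaml
# Sketch only: Pod name, image, and label keys are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: big-cache
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          # Gt takes exactly one value, written as a quoted integer.
          - key: memory
            operator: Gt
            values: ["64"]
          # DoesNotExist (like Exists) takes no values at all.
          - key: dedicated
            operator: DoesNotExist
  containers:
  - name: cache
    image: redis:7
```

Both expressions sit in the same term, so a node must have a `memory` label greater than 64 and must not carry a `dedicated` label.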

Node Affinity vs nodeSelector

Feature	`nodeSelector`	`nodeAffinity`
Operators	`=` only	`In`, `NotIn`, `Exists`, `DoesNotExist`, `Gt`, `Lt`
Soft constraints	❌ No	✅ `preferredDuringScheduling...`
Multiple values	❌ No	✅ Yes — `values: [ssd, nvme]`
OR logic	❌ No	✅ `nodeSelectorTerms` are ORed
AND logic	✅ Implicit (all selectors)	✅ `matchExpressions` within a term are ANDed

When to use which:

Use nodeSelector for quick, simple constraints (single label, exact match, hard requirement)
Use nodeAffinity for production workloads with complex requirements, soft preferences, or multiple acceptable values
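To compare the two side by side, the following sketch expresses a hard disktype requirement first with nodeSelector and then with nodeAffinity. The Pod names and the `nginx` image are assumptions for illustration.

```yaml
# Sketch only: Pod names and image are illustrative assumptions.
# Variant 1: nodeSelector, exact match on a single label.
apiVersion: v1
kind: Pod
metadata:
  name: web-selector
spec:
  nodeSelector:
    disktype: ssd
  containers:
  - name: web
    image: nginx
---
# Variant 2: the same hard requirement as nodeAffinity,
# widened to accept nvme as well, which nodeSelector cannot express.
apiVersion: v1
kind: Pod
metadata:
  name: web-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values: ["ssd", "nvme"]
  containers:
  - name: web
    image: nginx
```

The nodeSelector form is shorter to type under exam pressure; the affinity form is the one to reach for once a second acceptable value or a soft preference is needed.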

Node Affinity vs Taints/Tolerations

Dimension	Node Affinity (Attract)	Taints/Tolerations (Repel)
Direction	Pod actively seeks matching nodes	Node actively rejects non-tolerating Pods
Guarantee	Hard affinity guarantees placement on matching nodes	Taints only block; a tolerating Pod may still land on any node
Existing Pods	`IgnoredDuringExecution` — no eviction	`NoExecute` evicts existing Pods
Multiple conditions	✅ Rich operators and expressions	❌ Limited to key, value, and effect (`Equal` or `Exists`)
Production use	Attract workloads to specialised hardware	Keep general workloads off specialised hardware

Key Realisation from Source: Taints and tolerations alone cannot guarantee that a workload lands on a specific node type — they only prevent unwanted workloads from landing there. If you want to ensure your GPU workload lands on GPU nodes (and not on untainted general nodes), you must combine taints + tolerations + node affinity. Source: CKA Day 15

The Production Pattern: Taints + Affinity + Tolerations

The canonical pattern for creating a dedicated node pool (e.g., GPU, high-memory, zone-specific):

# 1. Label the node pool
kubectl label node worker-gpu-1 tier=ml
 
# 2. Taint the node pool (repel everything else)
kubectl taint node worker-gpu-1 gpu=true:NoSchedule
 
# 3. Deploy workload with both affinity and toleration

apiVersion: v1
kind: Pod
metadata:
  name: ml-training
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: tier
            operator: In
            values:
            - ml
  tolerations:
  - key: "gpu"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
  containers:
  - name: trainer
    image: tensorflow/tensorflow:latest-gpu

Why both are needed:

Without the taint: A general Pod (no toleration) could still be scheduled on worker-gpu-1 if the scheduler chooses it — the affinity only affects this Pod, not others.
Without the affinity: A tolerating Pod could be scheduled on any node (including non-GPU nodes) because a toleration merely grants permission to land on tainted nodes; it does not steer the Pod towards them.
With both: Only ML Pods are allowed on GPU nodes (taint gate), and ML Pods are required to land on GPU nodes (affinity gate).
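Production workloads are usually created through a controller instead of a bare Pod. As a hedged sketch, the same affinity and toleration can be moved into a Deployment's Pod template under `spec.template.spec`. The Deployment name, the `app` label, the replica count, and the placeholder command are assumptions for illustration.

```yaml
# Sketch only: Deployment name, app label, replicas, and command are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-training
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ml-training
  template:
    metadata:
      labels:
        app: ml-training
    spec:
      # Affinity gate: replicas must land on nodes labelled tier=ml.
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: tier
                operator: In
                values: ["ml"]
      # Taint gate: replicas are permitted on nodes tainted gpu=true:NoSchedule.
      tolerations:
      - key: "gpu"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"
      containers:
      - name: trainer
        image: tensorflow/tensorflow:latest-gpu
        # Placeholder so the container keeps running; replace with the real entrypoint.
        command: ["sleep", "infinity"]
```

Every replica inherits both gates, so scaling the Deployment up never spills ML Pods onto general-purpose nodes.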

Troubleshooting Node Affinity Mismatches

Symptom	Likely Cause	Fix
Pod stuck `Pending` with `didn't match Pod's node affinity/selector`	`requiredDuringScheduling` with no matching node labels	Add matching labels to nodes, or relax the affinity to `preferred`
Pod stuck `Pending` with `Insufficient memory` or `Insufficient cpu`	Node affinity may match, but requested resources cannot fit on matching nodes	Lower requests, choose larger nodes, or add capacity
Pod scheduled on wrong node despite affinity	Using `preferredDuringScheduling`, and either no matching node was schedulable or another node scored higher overall	Switch to `requiredDuringScheduling` if placement must be guaranteed
Affinity ignored after node label change	Expected — `IgnoredDuringExecution` by design	To evict Pods on label change, use `NoExecute` taints, not affinity
`Gt`/`Lt` operators not matching	Node label value is not a valid integer string	Ensure label values are quoted numbers: `memory: "128"`

CKA Exam Speed Patterns

# Check node labels
kubectl get nodes --show-labels
 
# Label a node for affinity matching
kubectl label node worker-1 disktype=ssd
 
# Check why a Pod is Pending (look for affinity messages)
kubectl describe pod <name> | grep -A 10 Events
 
# Imperative run pinned with nodeName (skips the scheduler, so no affinity filtering or scoring takes place)
kubectl run debug --image=busybox --restart=Never \
  --overrides='{"spec":{"nodeName":"worker-1"}}'

YAML Memory Trick: The required affinity struct is: affinity → nodeAffinity → requiredDuringSchedulingIgnoredDuringExecution → nodeSelectorTerms → [matchExpressions]. Each expression has key, operator, values (array). Practise typing this nested structure — it appears frequently on the exam and auto-completion is not available.

Practical Practice

Exam-style hands-on tasks for this topic. Complete each task before reviewing the solution. Time yourself — CKA tasks average 5–7 minutes.

Task 1: Hard Node Affinity for GPU Nodes
You are asked to schedule Pod gpu-workload only on nodes labeled hardware=gpu.
Requirements: Use requiredDuringSchedulingIgnoredDuringExecution with In operator.
Verification: kubectl get pod gpu-workload -o wide
Solution:
kubectl run gpu-workload --image=nginx --restart=Never --dry-run=client -o yaml > gpu.yaml
# Edit gpu.yaml to add nodeAffinity under spec.affinity:
#   nodeAffinity:
#     requiredDuringSchedulingIgnoredDuringExecution:
#       nodeSelectorTerms:
#       - matchExpressions:
#         - key: hardware
#           operator: In
#           values: ["gpu"]
kubectl apply -f gpu.yaml
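
For reference, here is a sketch of how gpu.yaml might look once the affinity block is in place. The generated fields are trimmed, and the exact output of `kubectl run --dry-run=client -o yaml` may differ between kubectl versions, so treat the surrounding fields as an assumption.

```yaml
# gpu.yaml after editing (sketch; generated fields may vary by kubectl version).
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: gpu-workload
  name: gpu-workload
spec:
  # Added by hand: affinity is a sibling of containers under spec.
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: hardware
            operator: In
            values: ["gpu"]
  containers:
  - image: nginx
    name: gpu-workload
  restartPolicy: Never
```

If the Pod stays Pending after applying, check that at least one node actually carries the hardware=gpu label.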

Task 2: Preferred Node Affinity with Fallback
You are asked to prefer scheduling on hardware=gpu nodes but allow fallback to other nodes.
Requirements: Use preferredDuringSchedulingIgnoredDuringExecution with weight 100.
Verification: kubectl get pod gpu-workload -o wide
Solution:
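# Remove the Task 1 Pod first if it exists (a Pod's affinity cannot be changed in place)
kubectl delete pod gpu-workload --ignore-not-found
# Generate the manifest, add the preferred affinity block, then apply
kubectl run gpu-workload --image=nginx --restart=Never --dry-run=client -o yaml > gpu.yaml
# Edit gpu.yaml to add preferredDuringSchedulingIgnoredDuringExecution with weight 100
kubectl apply -f gpu.yaml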
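
A sketch of the edited manifest for this task follows. As with any `kubectl run --dry-run=client -o yaml` output, the generated fields are trimmed here and may differ by kubectl version.

```yaml
# gpu.yaml with a soft preference (sketch; generated fields may vary).
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: gpu-workload
  name: gpu-workload
spec:
  affinity:
    nodeAffinity:
      # A list of weighted preferences, not nodeSelectorTerms.
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: hardware
            operator: In
            values: ["gpu"]
  containers:
  - image: nginx
    name: gpu-workload
  restartPolicy: Never
```

The structural difference from the required form is worth memorising: preferred entries use `weight` plus `preference`, with `matchExpressions` nested one level deeper.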

Task 3: Troubleshoot a Pending Affinity Pod
A Pod with affinity rules is stuck Pending. You must verify whether the node labels match.
Requirements: List nodes with labels and confirm the required key exists.
Verification: kubectl get nodes --show-labels
Solution:
kubectl get nodes --show-labels
# If the required label is missing, add it:
kubectl label node worker1 hardware=gpu

Related Pages

Kubernetes Manual Scheduling: nodeName, nodeSelector, and comparison with nodeAffinity
Kubernetes Taints and Tolerations: the negative scheduling counterpart; production pattern companion
Kubernetes Labels and Selectors: the metadata system that affinity queries
Kubernetes Architecture: kube-scheduler filtering and scoring phases
Kubernetes Resource Requests and Limits: resource fit is evaluated with affinity during scheduling
Pod Fundamentals: the object that carries affinity rules
Deployment, ReplicaSet & Replication Controller: controllers that replicate Pods with affinity constraints
Kubernetes DaemonSet: uses tolerations to run on control plane nodes; affinity can restrict DaemonSet scope
Kubernetes Static Pods: bypass scheduler entirely; affinity does not apply
Kubernetes Services: routing is unaffected by affinity; only placement is constrained
Kubernetes Namespaces: affinity is cluster-wide; namespaces do not limit node selection
CKA Certification: exam domains and weightings
CKA Study Roadmap: Day 15 in the 40-day plan
Tech Tutorials with Piyush: course source

Tags: kubernetes node-affinity scheduling node-management cka devops kube-scheduler nodeselector
