Kubernetes Node Affinity

The advanced positive scheduling primitive that gives Pods fine-grained control over which nodes they run on. Node Affinity is the expressive successor to nodeSelector, supporting rich operators, multiple values, and soft/hard constraints. Critical for dedicated node pools, zone-aware placement, and the CKA exam. Synthesized from CKA Day 15 — Kubernetes Node Affinity Explained.

What Is Node Affinity?

By default, the Kubernetes scheduler tries to spread Pods evenly across healthy nodes. But production workloads often have node preferences: SSD storage for databases, GPU acceleration for ML training, specific zones for data residency, or high-memory nodes for caching.

Node Affinity solves this by letting the Pod spec declare which node labels it prefers or requires. Unlike nodeSelector (which only supports exact equality), Node Affinity supports set-based operators (In, NotIn), existence checks (Exists, DoesNotExist), and numeric comparisons (Gt, Lt). It also distinguishes between hard constraints (must match) and soft preferences (best effort).

Key Insight: Node Affinity is a Pod-level property evaluated by the kube-scheduler. A required affinity is applied during the filtering phase: if no node matches, the Pod remains Pending and its events report the affinity mismatch. A preferred affinity is applied during the scoring phase: matching nodes receive a score boost, but the scheduler will place the Pod elsewhere if necessary. Source: CKA Day 15

The Two Scheduling Types

Node Affinity offers two scheduling strategies, both sharing the suffix IgnoredDuringExecution:

Type | Full Name | Scheduling Behaviour | Existing Pods
Required | requiredDuringSchedulingIgnoredDuringExecution | Hard constraint: Pod is only scheduled on nodes that match. If no match, Pod stays Pending. | Unaffected by label changes
Preferred | preferredDuringSchedulingIgnoredDuringExecution | Soft preference: scheduler tries to match but will place on any available node. Uses weight (1–100). | Unaffected by label changes

Critical Distinction: The suffix IgnoredDuringExecution means that once a Pod is scheduled, changes to node labels do not cause eviction. This is fundamentally different from NoExecute taints, which do evict existing Pods. Node Affinity only affects new scheduling decisions.

YAML Anatomy

Required (Hard Constraint)

apiVersion: v1
kind: Pod
metadata:
  name: fast-db
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
            - nvme
  containers:
  - name: postgres
    image: postgres:15

Field breakdown:

  • nodeSelectorTerms: a list; terms are ORed (any one term can satisfy the requirement)
  • matchExpressions: a list within each term that matches node labels; expressions are ANDed (all must match)
  • matchFields: a list within each term that matches node fields instead of labels (e.g., metadata.name)
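
The OR/AND rules above can be sketched in a single fragment. This is an illustration only: the zone label, its value, and the node name worker-1 are assumptions chosen for the example. A node is eligible if it satisfies either term; within the first term, both expressions must hold.

```yaml
# Sketch only: label keys/values and the node name are illustrative assumptions.
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        # Term 1: disktype=ssd AND zone=us-east-1a (expressions are ANDed)
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
          - key: zone
            operator: In
            values:
            - us-east-1a
        # Term 2 (ORed with term 1): the node whose name is worker-1
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - worker-1
```

With this fragment, an SSD node in us-east-1a qualifies through term 1, and worker-1 qualifies through term 2 regardless of its labels.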

Preferred (Soft Preference with Weight)

spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
      - weight: 50
        preference:
          matchExpressions:
          - key: zone
            operator: In
            values:
            - us-east-1a

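For each node, the scheduler sums the weights of the preferred terms that the node satisfies. In the example above, a node with both labels contributes 150, a node with only disktype=ssd contributes 100, and a node with only zone=us-east-1a contributes 50. This sum is combined with the scores from the scheduler's other scoring rules, and the node with the highest total score wins.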
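
Both types can appear in the same Pod spec. The fragment below is a minimal sketch, reusing the disktype and zone labels from the examples above, of a hard requirement paired with a soft preference; the chosen weight is arbitrary.

```yaml
# Sketch: hard requirement on disktype, soft preference on zone.
spec:
  affinity:
    nodeAffinity:
      # Filtering: only ssd or nvme nodes are candidates at all
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
            - nvme
      # Scoring: among the candidates, favour nodes in us-east-1a
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 50
        preference:
          matchExpressions:
          - key: zone
            operator: In
            values:
            - us-east-1a
```

Nodes without a matching disktype label are excluded outright. If no remaining node is in us-east-1a, the Pod is still scheduled on one of the remaining nodes.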

Operator Reference

Operator | Matches When | Typical Use
In | Key has one of the listed values | SSD or NVMe nodes for databases
NotIn | Key does not have any of the listed values (also matches nodes that lack the key) | Avoid nodes labelled for maintenance (affinity matches labels, not taints)
Exists | Key exists (value irrelevant) | Any node labelled gpu (regardless of GPU model)
DoesNotExist | Key does not exist | Nodes without a dedicated label
Gt | Key value > specified integer (numeric) | memory > 64 GB nodes
Lt | Key value < specified integer (numeric) | cpu < 8 core nodes for lightweight jobs

Exam Trap: Gt and Lt take exactly one value and require the node label value to be a valid integer string (e.g., memory: "128"). Non-numeric label values cause the expression to evaluate as false.
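
The less common operators are easy to mistype, so here is a hedged sketch combining Exists and Gt. The label keys gpu and memory-gb are hypothetical; substitute whatever labels your nodes actually carry.

```yaml
# Sketch: "gpu" and "memory-gb" are hypothetical label keys.
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          # Exists (and DoesNotExist) take no values list
          - key: gpu
            operator: Exists
          # Gt takes exactly one value, written as a quoted integer string
          - key: memory-gb
            operator: Gt
            values:
            - "64"
```

A node labelled memory-gb=128 that also carries any gpu label would match. A node labelled memory-gb=128Gi would not, because the value is not an integer.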

Node Affinity vs nodeSelector

Feature | nodeSelector | nodeAffinity
Operators | Equality (=) only | In, NotIn, Exists, DoesNotExist, Gt, Lt
Soft constraints | ❌ No | ✅ Yes: preferredDuringScheduling...
Multiple values | ❌ No | ✅ Yes: values: [ssd, nvme]
OR logic | ❌ No | ✅ Yes: nodeSelectorTerms are ORed
AND logic | ✅ Implicit (all selectors) | ✅ Yes: matchExpressions within a term are ANDed

When to use which:

  • Use nodeSelector for quick, simple constraints (single label, exact match, hard requirement)
  • Use nodeAffinity for production workloads with complex requirements, soft preferences, or multiple acceptable values
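
To make the comparison concrete, the two fragments below sketch the same hard constraint written both ways, using the disktype label from the earlier examples. They are Pod spec fragments, not complete manifests.

```yaml
# Option A: nodeSelector (exact match on a single label)
spec:
  nodeSelector:
    disktype: ssd
---
# Option B: the same hard constraint expressed as node affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
```

Option B is longer but can later grow a second value or a preferred term without restructuring. If a Pod specifies both nodeSelector and nodeAffinity, a node must satisfy both to be eligible.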

Node Affinity vs Taints/Tolerations

Dimension | Node Affinity (Attract) | Taints/Tolerations (Repel)
Direction | Pod actively seeks matching nodes | Node actively rejects non-tolerating Pods
Guarantee | Hard affinity guarantees the Pod is only scheduled on matching nodes | Taints only block; a tolerating Pod may still land on any node
Existing Pods | IgnoredDuringExecution: no eviction | NoExecute evicts existing Pods
Multiple conditions | ✅ Rich operators and expressions | ❌ Limited to key=value+effect
Production use | Attract workloads to specialised hardware | Keep general workloads off specialised hardware

Key Realisation from Source: Taints and tolerations alone cannot guarantee that a workload lands on a specific node type — they only prevent unwanted workloads from landing there. If you want to ensure your GPU workload lands on GPU nodes (and not on untainted general nodes), you must combine taints + tolerations + node affinity. Source: CKA Day 15

The Production Pattern: Taints + Affinity + Tolerations

The canonical pattern for creating a dedicated node pool (e.g., GPU, high-memory, zone-specific):

# 1. Label the node pool
kubectl label node worker-gpu-1 tier=ml

# 2. Taint the node pool (repel everything else)
kubectl taint node worker-gpu-1 gpu=true:NoSchedule

# 3. Deploy workload with both affinity and toleration
#    (save the manifest below to a file and apply it with kubectl apply -f <file>)
apiVersion: v1
kind: Pod
metadata:
  name: ml-training
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: tier
            operator: In
            values:
            - ml
  tolerations:
  - key: "gpu"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
  containers:
  - name: trainer
    image: tensorflow/tensorflow:latest-gpu

Why both are needed:

  • Without the taint: A general Pod (no toleration) could still be scheduled on worker-gpu-1 if the scheduler chooses it, because the affinity only affects the Pod that declares it, not other Pods.
  • Without the affinity: A tolerating Pod could be scheduled on any node (including non-GPU nodes), because a toleration merely grants permission to land on tainted nodes.
  • With both: Only ML Pods are allowed on GPU nodes (taint gate), and ML Pods are required to land on GPU nodes (affinity gate).
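
In practice the workload is often a Deployment rather than a bare Pod. The sketch below assumes the same tier=ml label and gpu=true:NoSchedule taint as above. The replica count and the app label are illustrative assumptions. The point to notice is that affinity and tolerations sit under spec.template.spec, not under the Deployment's top-level spec.

```yaml
# Sketch: same pattern wrapped in a Deployment; replicas and the app label are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-training
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ml-training
  template:
    metadata:
      labels:
        app: ml-training
    spec:
      # Affinity gate: Pods must land on nodes labelled tier=ml
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: tier
                operator: In
                values:
                - ml
      # Taint gate: Pods are permitted onto nodes tainted gpu=true:NoSchedule
      tolerations:
      - key: "gpu"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"
      containers:
      - name: trainer
        image: tensorflow/tensorflow:latest-gpu
```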

Troubleshooting Node Affinity Mismatches

Symptom | Likely Cause | Fix
Pod stuck Pending with didn't match Pod's node affinity/selector | requiredDuringScheduling with no matching node labels | Add matching labels to nodes, or relax the affinity to preferred
Pod scheduled on wrong node despite affinity | Using preferredDuringScheduling, and no matching node was available or a non-matching node scored higher overall | Switch to requiredDuringScheduling if placement must be guaranteed
Affinity ignored after node label change | Expected: IgnoredDuringExecution by design | To evict Pods on label change, use NoExecute taints, not affinity
Gt/Lt operators not matching | Node label value is not a valid integer string | Ensure label values are quoted numbers: memory: "128"

CKA Exam Speed Patterns

# Check node labels
kubectl get nodes --show-labels
 
# Label a node for affinity matching
kubectl label node worker-1 disktype=ssd
 
# Check why a Pod is Pending (look for affinity messages)
kubectl describe pod <name> | grep -A 10 Events
 
# Imperative run with nodeName (bypasses the scheduler entirely, so no affinity scoring or filtering takes place)
kubectl run debug --image=busybox --restart=Never \
  --overrides='{"spec":{"nodeName":"worker-1"}}'

YAML Memory Trick: The required affinity structure is: affinity → nodeAffinity → requiredDuringSchedulingIgnoredDuringExecution → nodeSelectorTerms → [matchExpressions]. Each expression has key, operator, values (array). Practise typing this nested structure: it appears frequently on the exam and auto-completion is not available. If you forget a field name, kubectl explain pod.spec.affinity.nodeAffinity --recursive prints the field tree.


Tags: kubernetes node-affinity scheduling node-management cka devops kube-scheduler nodeselector