Kubernetes Jobs

A workload controller designed for finite, batch-style tasks that run to completion. Unlike Deployments or DaemonSets, a Job creates Pods that are expected to terminate successfully and then stop. Synthesized from CKA Day 12 — DaemonSet, Job & CronJob Explained.

What is a Job?

A Job creates one or more Pods and ensures that a specified number of them terminate successfully. Once the required number of completions is reached, the Job is marked Complete. If a Pod fails, the Job controller creates a replacement Pod (or, with restartPolicy: OnFailure, the kubelet restarts the container in place), retrying up to backoffLimit times before the Job is marked Failed.

Key Insight: Deployments manage Pods that should run forever. Jobs manage Pods that should finish and exit.

Why Use a Job?

| Requirement | Deployment | Job |
|---|---|---|
| Run a task once to completion | ❌ (keeps restarting) | ✅ |
| Run multiple parallel workers | ❌ (not designed for batch) | ✅ (parallelism > 1) |
| Track completion count | ❌ | ✅ (completions field) |
| Automatic retry on failure | ❌ (restarts containers indefinitely, with no retry limit) | ✅ (backoffLimit) |
| Finite lifetime | ❌ | ✅ (Pods terminate, Job stays) |

Canonical Use Cases

  • Database migrations and schema updates
  • One-off data processing or ETL pipelines
  • Batch report generation (monthly financial reports, analytics exports)
  • CI/CD pipeline steps that execute inside the cluster
  • Backup jobs (export data, compress, upload to object storage)
  • Testing and validation suites run as ephemeral workloads

YAML Structure

apiVersion: batch/v1
kind: Job
metadata:
  name: data-migration
spec:
  completions: 1
  parallelism: 1
  backoffLimit: 4
  activeDeadlineSeconds: 600
  template:
    metadata:
      labels:
        app: migration
    spec:
      restartPolicy: OnFailure
      containers:
      - name: migrator
        image: myapp:latest
        command: ["python", "migrate.py"]

Key fields:

| Field | Default | Description |
|---|---|---|
| completions | 1 | Total successful completions required |
| parallelism | 1 | Number of Pods running concurrently |
| backoffLimit | 6 | Retries before marking Job as failed |
| activeDeadlineSeconds | unset | Max duration; Job is terminated if exceeded |
| restartPolicy | Required | Must be OnFailure or Never; Always is invalid |

Parallel Execution Patterns

Sequential (default)

completions: 5, parallelism: 1 — run one Pod after another until 5 successes.

Parallel Workers

completions: 10, parallelism: 3 — run up to 3 Pods at a time until 10 total successes.
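
As a rough illustration of the parallel-workers pattern, a manifest along these lines could be used. The Job name, image, and command are placeholders chosen for the sketch, not values from the source material.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: parallel-workers          # hypothetical name
spec:
  completions: 10                 # 10 successful Pods required in total
  parallelism: 3                  # at most 3 Pods run at the same time
  backoffLimit: 4
  template:
    spec:
      restartPolicy: Never        # failed Pods are replaced, not restarted in place
      containers:
      - name: worker
        image: busybox            # placeholder image
        command: ["/bin/sh", "-c", "echo working; sleep 5"]
```

Setting parallelism back to 1 in the same manifest gives the sequential pattern described earlier.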

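Indexed Job (static work assignment)

Set completionMode: Indexed so each Pod gets a unique completion index (0 to completions-1), exposed through the JOB_COMPLETION_INDEX environment variable. Each index must succeed once, which suits sharded or partitioned batch processing where work is assigned up front rather than pulled from a queue.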
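
A minimal sketch of an Indexed Job, assuming four shards and a placeholder busybox command; the name and the shard count are illustrative only.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: shard-processor           # hypothetical name
spec:
  completionMode: Indexed
  completions: 4                  # Pods receive indexes 0, 1, 2, 3
  parallelism: 2                  # two indexes are processed at a time
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: busybox            # placeholder image
        # The shell inside the container expands JOB_COMPLETION_INDEX at runtime.
        command: ["/bin/sh", "-c", "echo processing shard $JOB_COMPLETION_INDEX"]
```

A real worker would typically map the index to a slice of its input, for example a file name or a key range.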

Job Lifecycle

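| State | Description |
|---|---|
| Pending | Job created, Pods not yet scheduled |
| Active | At least one Pod is running |
| Complete | Required number of completions reached successfully |
| Failed | backoffLimit exhausted or activeDeadlineSeconds exceeded |

Only Complete and Failed are reported as conditions in the Job's status.conditions. Pending and Active are informal labels here; the Job status exposes them through its active, succeeded, and failed Pod counts.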

Essential Commands

| Command | Purpose |
|---|---|
| kubectl get jobs | List Jobs |
| kubectl describe job <name> | Events, completions, and failures |
| kubectl logs job/<name> | Read logs from the Job’s Pod(s) |
| kubectl delete job <name> | Delete Job and its Pods |
| kubectl wait --for=condition=complete job/<name> | Block until Job finishes |

Important Restrictions

  • restartPolicy: Always is invalid inside a Job template. With Always, the kubelet would restart the container every time it exits, so the Pod would never reach a terminal state for the Job controller to count.
  • Finished Jobs and their Pods are not cleaned up automatically unless you set ttlSecondsAfterFinished; they remain so that you can inspect logs and status. kubectl delete job removes the Job object and, by default, all Pods it created, including completed ones.
  • A Job is not self-healing in the Deployment sense. If the Pod completes, the Job stays complete. To run it again, you must create a new Job.
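
For the cleanup behavior, a sketch of a Job that sets ttlSecondsAfterFinished might look like this. The 120-second value, the name, and the command are arbitrary choices for illustration.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: cleanup-demo              # hypothetical name
spec:
  ttlSecondsAfterFinished: 120    # Job and its Pods become eligible for deletion 120s after finishing
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: task
        image: busybox            # placeholder image
        command: ["/bin/sh", "-c", "echo done"]
```

Without this field, the finished Job stays in the cluster until someone deletes it.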

CKA Exam Relevance

  • Workloads & Scheduling (~15%): Create a Job from a given spec, or debug why a Job is stuck in Pending or Failed.
  • Troubleshooting (~30%): Check restartPolicy first. If it is Always, the Job will fail validation. Check backoffLimit and activeDeadlineSeconds for premature termination.
  • Speed pattern:
    kubectl create job my-job --image=busybox --dry-run=client -o yaml \
      -- /bin/sh -c "echo hello" > job.yaml
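    # The generated manifest already sets apiVersion: batch/v1 and restartPolicy: Never;
    # edit fields such as completions, parallelism, or backoffLimit as needed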

Tags: kubernetes job batch workload cka devops etl migration ci-cd