Backup & Disaster Recovery

The platform implements an automated “Snapshot & Ship” strategy for disaster recovery.

💾 Backup Strategy

A cron job runs daily at 11:00 PM and executes the backup.sh script, which performs the following steps:

  1. Database Dump: Runs pg_dumpall inside the PostgreSQL container.
  2. Certificates: Snapshots the Traefik acme.json file to preserve SSL certificates and avoid Let’s Encrypt rate limits on restore.
  3. Bundling: Collects all project directories and .env files.
  4. Cloud Storage: Uploads the compressed .tar.gz archive to a secure AWS S3 bucket.
  5. Cleanup: Deletes the local archive after a successful upload to save disk space.
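
The five steps above could be sketched roughly as follows. This is not the actual backup.sh, only a minimal illustration under stated assumptions: the container name, database role, file paths, and bucket name are all placeholders, and the helper names (archive_name, bundle, run_backup) are hypothetical.

```shell
#!/bin/sh
# Minimal sketch of a "Snapshot & Ship" backup script.
# Every default below is an assumption, not the platform's real value.
set -eu

DB_CONTAINER="${DB_CONTAINER:-db_container}"           # assumed container name
DB_USER="${DB_USER:-postgres}"                         # assumed superuser role
ACME_JSON="${ACME_JSON:-/root/traefik/acme.json}"      # assumed host path
PROJECTS_DIR="${PROJECTS_DIR:-/root/projects}"         # assumed project root
S3_BUCKET="${S3_BUCKET:-s3://example-backup-bucket}"   # hypothetical bucket

archive_name() {
    # $1 = date stamp (YYYY-MM-DD)
    printf 'backup-%s.tar.gz\n' "$1"
}

bundle() {
    # $1 = archive to create, $2 = dir holding dump.sql and acme.json,
    # $3 = projects dir (archived whole, so .env files are included)
    tar -czf "$1" -C "$2" dump.sql acme.json -C "$3" .
}

run_backup() {
    work_dir=$(mktemp -d)
    archive="/tmp/$(archive_name "$(date +%Y-%m-%d)")"

    # 1. Database dump: run pg_dumpall inside the PostgreSQL container.
    docker exec "$DB_CONTAINER" pg_dumpall -U "$DB_USER" > "$work_dir/dump.sql"

    # 2. Certificates: snapshot Traefik's acme.json.
    cp "$ACME_JSON" "$work_dir/acme.json"

    # 3. Bundling: dump, certificates, project directories, .env files.
    bundle "$archive" "$work_dir" "$PROJECTS_DIR"

    # 4. Cloud storage: upload the archive to S3.
    aws s3 cp "$archive" "$S3_BUCKET/"

    # 5. Cleanup: reached only if the upload succeeded, because of set -e.
    rm -f "$archive"
    rm -rf "$work_dir"
}

# Nothing runs unless explicitly requested.
if [ "${1:-}" = "--run" ]; then
    run_backup
fi
```

Assuming the script were installed at /root/backup.sh (a hypothetical path), the 11:00 PM schedule would correspond to a crontab entry such as: 0 23 * * * /root/backup.sh --run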

🚑 Restoration Procedure

To restore the environment on a new host:

  1. Extract the backup archive to /root.
  2. Restore SSL certificates: docker cp acme.json traefik_container:/letsencrypt/acme.json. The Traefik container must already exist for docker cp to work, although it may be stopped.
  3. Start up the core infrastructure.
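  4. Restore the database: docker exec -i db_container psql -U user -d db < dump.sql. Because the dump was produced by pg_dumpall, it also recreates roles and databases, so the connecting role needs superuser privileges.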
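
The procedure above could be scripted along these lines. This is a hedged sketch, not an official restore tool: it assumes the core infrastructure is a Docker Compose project in a hypothetical /root/infrastructure directory, that acme.json and dump.sql sit at the top level of the archive, and that helper names such as extract_backup and restore_environment are free to choose. The container, role, and database names are taken from the steps above.

```shell
#!/bin/sh
# Sketch of the restoration steps; paths and the Compose layout are assumptions.
set -eu

RESTORE_DIR="${RESTORE_DIR:-/root}"
COMPOSE_DIR="${COMPOSE_DIR:-/root/infrastructure}"   # hypothetical location

extract_backup() {
    # $1 = archive path, $2 = destination directory
    mkdir -p "$2"
    tar -xzf "$1" -C "$2"
}

restore_environment() {
    archive="$1"

    # 1. Extract the backup archive.
    extract_backup "$archive" "$RESTORE_DIR"

    # docker cp needs the target container to exist (it may be stopped),
    # so create the containers without starting them. Assumes Compose.
    (cd "$COMPOSE_DIR" && docker compose up --no-start)

    # 2. Restore SSL certificates. Traefik refuses an acme.json whose
    #    permissions are looser than 600.
    chmod 600 "$RESTORE_DIR/acme.json"
    docker cp "$RESTORE_DIR/acme.json" traefik_container:/letsencrypt/acme.json

    # 3. Start up the core infrastructure.
    (cd "$COMPOSE_DIR" && docker compose up -d)

    # Wait until PostgreSQL accepts connections before loading the dump.
    until docker exec db_container pg_isready -U user >/dev/null 2>&1; do
        sleep 2
    done

    # 4. Restore the database from the pg_dumpall output.
    docker exec -i db_container psql -U user -d db < "$RESTORE_DIR/dump.sql"
}

# Nothing runs unless explicitly requested.
if [ "${1:-}" = "--run" ]; then
    restore_environment "${2:?usage: restore.sh --run <archive.tar.gz>}"
fi
```

Waiting on pg_isready is an addition to the documented steps; it guards against piping the dump into a database that is still starting.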

⚙️ Log Rotation Policy

To prevent disk exhaustion, a global Docker policy is enforced via /etc/docker/daemon.json:

  • Max size: 100MB per file.
  • Max files: 3 files per container.
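
A policy with these two limits could be expressed in daemon.json as sketched below. The source does not name the logging driver; this example assumes Docker's default json-file driver, whose max-size and max-file options match the limits listed above. The script writes to a scratch file by default (an assumption made so it can be tried safely); on a real host the target would be /etc/docker/daemon.json.

```shell
#!/bin/sh
# Sketch: generate a daemon.json fragment for global log rotation.
# Assumes the json-file logging driver; the default target is a scratch file.
set -eu

DAEMON_JSON="${DAEMON_JSON:-${TMPDIR:-/tmp}/daemon.json.example}"

# log-opts values must be strings in daemon.json, hence "3" rather than 3.
cat > "$DAEMON_JSON" <<'EOF'
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "3"
  }
}
EOF

echo "Wrote $DAEMON_JSON"
```

After changing the real file, the Docker daemon has to be restarted, and the new defaults apply only to containers created afterwards; existing containers keep their previous logging settings until they are recreated.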

Source: Internal Infrastructure Manual