Backup & Disaster Recovery — The 3-2-1+ Strategy

Backups are the part of homelab infrastructure nobody finds exciting — until they need one. This post covers my multi-site, multi-technology backup strategy and the disaster recovery plan that gives me confidence to experiment freely.

The 3-2-1+ Model

The classic 3-2-1 rule says: 3 copies of data, on 2 different media types, with 1 copy offsite. I extend this with a + for cloud redundancy:

Copy | Location | Technology
1 | Home PBS (local) | Proxmox Backup Server on ZFS + Synology iSCSI
2 | DC PBS (offsite) | Proxmox Backup Server, pull sync from home
3 | Wasabi S3 (cloud) | PBS native S3 sync
4 | Synology NAS + Wasabi (contingency) | Raw vzdump archives via NFS + rclone

Every VM and LXC across both sites is protected by at least three independent backup copies.
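The rule itself is easy to express as a check. The following is a minimal sketch, not part of my actual tooling: the `BackupCopy` type and the medium labels are assumptions chosen for illustration, and the copies listed are those of a home workload from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    name: str
    medium: str    # storage technology class (illustrative labels)
    offsite: bool  # True if the copy lives away from the workload's site


def satisfies_3_2_1(copies: list[BackupCopy]) -> bool:
    """Return True if there are >= 3 copies, >= 2 media types, >= 1 offsite."""
    media = {c.medium for c in copies}
    offsite_count = sum(1 for c in copies if c.offsite)
    return len(copies) >= 3 and len(media) >= 2 and offsite_count >= 1


# Copies of a home workload, following the table above.
# The medium labels are assumptions, not measured facts.
HOME_WORKLOAD_COPIES = [
    BackupCopy("Home PBS", medium="local-disk", offsite=False),
    BackupCopy("DC PBS", medium="local-disk", offsite=True),
    BackupCopy("Wasabi S3 (PBS)", medium="object-storage", offsite=True),
    BackupCopy("Synology + Wasabi (vzdump)", medium="nas", offsite=False),
]

print(satisfies_3_2_1(HOME_WORKLOAD_COPIES))
```

Dropping any single copy from the list still leaves the rule satisfied, which is the point of the "+".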


Proxmox Backup Server — The Core

PBS is the workhorse. Two instances — one at home, one in the DC — handle all VM and container backups with deduplication, encryption, and verification.

PBS Home

Runs as a VM on the primary home node. It has two datastores:

  • Local ZFS — fast backups on the same NVMe pool as the VMs
  • Synology iSCSI LUN — additional capacity from the NAS over the storage VLAN

Home PVE pushes all VM/LXC backups to PBS Home daily. This is the fastest recovery path — restoring from local PBS takes minutes.

PBS DC

Runs as a VM on the Hetzner node. It serves two purposes:

  1. Local DC backups — DC PVE pushes all workloads to PBS DC daily
  2. Home replica — PBS DC pulls a sync from PBS Home daily over the WireGuard backup tunnel

The pull sync means the DC always has a recent copy of every home workload, stored in a separate namespace. If the home site burns down, I can restore everything from DC.
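A pull sync like this is configured on the pulling side (PBS DC) as a sync job that points at the home PBS as a remote. The sketch below only assembles a `proxmox-backup-manager sync-job create` command line and does not run it. The job ID, remote name, and datastore names are hypothetical placeholders, the remote is assumed to be configured already, and namespace options are deliberately left out.

```python
import shlex


def build_pull_sync_command(
    job_id: str,
    local_store: str,
    remote: str,
    remote_store: str,
    schedule: str,
) -> list[str]:
    """Assemble the argv for creating a PBS pull sync job (not executed here)."""
    return [
        "proxmox-backup-manager", "sync-job", "create", job_id,
        "--store", local_store,         # datastore on the pulling PBS (DC)
        "--remote", remote,             # name of the pre-configured remote (home PBS)
        "--remote-store", remote_store, # datastore on the home PBS
        "--schedule", schedule,         # PBS calendar event, e.g. "05:00"
    ]


# All names below are hypothetical; only the 05:00 schedule comes from this post.
cmd = build_pull_sync_command(
    job_id="home-replica",
    local_store="dc-store",
    remote="pbs-home",
    remote_store="home-store",
    schedule="05:00",
)
print(shlex.join(cmd))
```

Printing the command rather than executing it makes it easy to review before it touches a real datastore.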


Backup Flow Summary

Nine active backup flows keep data moving across four locations:

Flow | Path | Method | Schedule
F1 | Home PVE → Home PBS | PBS push | Daily 01:00
F2 | Home PBS → DC PBS | PBS pull sync | Daily 05:00
F3 | DC PBS → Wasabi S3 | PBS native S3 sync | Daily
F4 | DC PVE → DC PBS | PBS push | Daily 02:00
F5 | Home PVE → Home Synology | vzdump NFS | Weekly + Monthly
F6 | Home Synology → DC Synology | rsync over SSH | Daily 03:20
F7 | DC PVE → DC Synology | vzdump NFS | Daily 04:00
F8 | DC Synology → Wasabi S3 | rclone sync | Daily
F9 | Home PBS ↔ Synology | iSCSI LUN | Always-on

The schedule is staggered so flows don’t compete for bandwidth: home backups start at 01:00 and finish before the cross-site pull sync begins at 05:00.
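The staggering can be sanity-checked with a few lines of code. This is a rough sketch that looks only at the start times stated in the flow table (flows without a stated time are omitted). It knows nothing about how long each job actually runs, so it cannot prove that one job finishes before the next starts.

```python
from datetime import time

# Start times copied from the flow table.
START_TIMES = {
    "F1": time(1, 0),   # Home PVE -> Home PBS
    "F4": time(2, 0),   # DC PVE -> DC PBS
    "F6": time(3, 20),  # Home Synology -> DC Synology
    "F7": time(4, 0),   # DC PVE -> DC Synology
    "F2": time(5, 0),   # Home PBS -> DC PBS (pull sync)
}


def to_minutes(t: time) -> int:
    """Minutes since midnight."""
    return t.hour * 60 + t.minute


def min_gap_minutes(starts: dict[str, time]) -> int:
    """Smallest gap in minutes between consecutive start times."""
    ordered = sorted(to_minutes(t) for t in starts.values())
    return min(later - earlier for earlier, later in zip(ordered, ordered[1:]))


def starts_before(starts: dict[str, time], first: str, second: str) -> bool:
    """True if flow `first` is scheduled earlier in the day than `second`."""
    return starts[first] < starts[second]


print(min_gap_minutes(START_TIMES))            # tightest spacing in minutes
print(starts_before(START_TIMES, "F1", "F2"))  # home backup precedes pull sync
```

The tightest spacing is between F6 and F7, so that pair is the first place to look if jobs ever start overlapping.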


Synology NAS — iSCSI and NFS Target

The Synology DS423+ serves multiple backup roles:

  • iSCSI LUN for PBS Home — presented as a block device over the storage VLAN
  • NFS export for vzdump — weekly and monthly raw backup archives
  • QDevice host — also runs the corosync QDevice for cluster quorum
  • rsync source — pushes vzdump archives to the DC Synology daily

The Synology is approaching 80% capacity, so I monitor it closely. Raw vzdump files are larger than deduplicated PBS backups, but they’re valuable as a technology-independent contingency — I can restore a vzdump archive on any Proxmox node without needing PBS.
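A capacity check of this kind can be as simple as comparing used space against a threshold. The sketch below is an assumption about how such a check might look, not my actual monitoring setup. The 80% warning level mirrors the figure above, the 90% critical level is an arbitrary illustrative value, and `/` stands in for wherever the NAS export is mounted.

```python
import shutil


def usage_percent(path: str) -> float:
    """Percentage of the filesystem at `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def capacity_status(percent_used: float, warn_at: float = 80.0, crit_at: float = 90.0) -> str:
    """Classify a usage percentage as 'ok', 'warning', or 'critical'."""
    if percent_used >= crit_at:
        return "critical"
    if percent_used >= warn_at:
        return "warning"
    return "ok"


# "/" is a placeholder; point this at the NFS mount of the NAS instead.
percent = usage_percent("/")
print(f"{percent:.1f}% used: {capacity_status(percent)}")
```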


Wasabi S3 — Cloud Cold Storage

Wasabi S3 is the final tier — geographically independent cloud storage in EU-Central.

Two buckets:

Bucket | Source | Content
PBS backups | DC PBS | Deduplicated PBS datastore (~768 GB and growing)
vzdump archives | DC Synology | Raw vzdump files with smart recycle retention

PBS has native S3 datastore support, which means the sync is built into PBS itself — no external scripts or cron jobs needed. The DC Synology uses rclone for its vzdump sync.
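For the rclone side, the sketch below assembles an `rclone sync` command line without running it. The source path, remote name, and bucket name are hypothetical placeholders, not my real configuration. Because `rclone sync` makes the destination match the source, including deleting files that no longer exist at the source, the sketch defaults to `--dry-run`.

```python
import shlex


def build_rclone_sync(source: str, remote: str, bucket: str, dry_run: bool = True) -> list[str]:
    """Assemble the argv for an rclone sync to an S3-style remote (not executed here)."""
    cmd = ["rclone", "sync", source, f"{remote}:{bucket}"]
    if dry_run:
        # Report what would be copied or deleted without changing anything.
        cmd.append("--dry-run")
    return cmd


# Hypothetical names: adjust the path, remote, and bucket to your own setup.
cmd = build_rclone_sync("/volume1/vzdump", remote="wasabi", bucket="vzdump-archives")
print(shlex.join(cmd))
```

Reviewing the dry-run output first is a cheap safeguard before letting the real sync delete anything in the bucket.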

Why Wasabi?

During initial cloud backup evaluation, I trialed both Wasabi and Backblaze B2. Backblaze had a 1 GB/day free-tier upload cap that made the initial seed painfully slow. Wasabi’s flat-rate pricing with no egress fees won out.


The Borg Decommission

The original cloud backup chain was: PBS → Borg → Hetzner Storage Box → rclone → Wasabi. This worked, but it was fragile — four tools in a chain, each with its own failure modes.

When PBS added native S3 datastore support, I replaced the entire chain:

Before: PBS → Borg export → Hetzner Storage Box → rclone → Wasabi
After: PBS → Wasabi S3 (native sync)

Borg was uninstalled from PBS DC, all systemd timers and scripts cleaned up, and the Hetzner Storage Box decommissioned. The credentials are retained in Passbolt for reference, but the service is gone.


Disaster Recovery Plan

Backups are useless without tested recovery procedures. I maintain a DR test plan with three tiers:

Type 1 — Tabletop (After Major Changes)

Walk through recovery steps without executing. Scenarios: VM deletion, data corruption, PBS failure, site outage.

Type 2 — Partial Restore (Quarterly)

Actually restore something:

  • File-level restore from PBS
  • Full VM restore to an isolated network
  • LXC restore and verification
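One way to verify a file-level restore is to compare checksums of the restored file and a known-good reference. This is a minimal sketch of that idea rather than my exact procedure. It only makes sense when the reference file has not changed since the backup was taken, and the temporary files in the demo stand in for real paths.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(reference: Path, restored: Path) -> bool:
    """True if the restored file is byte-identical to the reference."""
    return sha256_of(reference) == sha256_of(restored)


# Demo with throwaway files standing in for a reference and its restored copy.
with tempfile.TemporaryDirectory() as tmp:
    reference = Path(tmp) / "reference.conf"
    restored = Path(tmp) / "restored.conf"
    reference.write_bytes(b"setting = 1\n")
    restored.write_bytes(b"setting = 1\n")
    print(restore_matches(reference, restored))
```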

Type 3 — Full DR (Annually)

Complete rebuild: reinstall Proxmox → reconfigure ZFS → deploy PBS → restore all workloads.

Recovery Time Targets

Scenario | Severity | RTO | Source
Accidental VM deletion | Low | ~15 min | PBS Home
PBS datastore corruption | Medium | ~1 hour | DC PBS
Home site failure | High | 2–4 hours | DC PBS
Ransomware / mass deletion | High | 4–8 hours | Wasabi S3
Full site disaster | Critical | 8–24 hours | Wasabi S3

The key insight: a local PBS restore is measured in minutes, a cross-site restore in hours, and a cloud restore can take up to a day. Design your backup tiers accordingly.
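The tiering logic amounts to "use the fastest source that is still reachable". The sketch below illustrates that decision. The `Tier` type is an assumption for illustration, and the hour values are the upper RTO bounds for each source taken from the table above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    worst_case_rto_hours: float  # upper RTO bound for this source in the table


# Ordered fastest first.
TIERS = [
    Tier("PBS Home", 0.25),
    Tier("DC PBS", 4.0),
    Tier("Wasabi S3", 24.0),
]


def pick_restore_source(available: set[str]) -> Optional[Tier]:
    """Return the fastest tier whose name is in `available`, or None."""
    for tier in TIERS:
        if tier.name in available:
            return tier
    return None


# Home site down: only the DC and cloud copies are reachable.
choice = pick_restore_source({"DC PBS", "Wasabi S3"})
print(choice.name if choice else "no restore source available")
```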


Lessons Learned

  • Deduplicated backups (PBS) for speed, raw vzdump for independence — having both gives you options
  • Pull sync > push sync for cross-site — the DC pulling from home means home doesn’t need DC credentials
  • Test your restores — a backup you haven’t tested is a hope, not a plan
  • Simplify the chain — Borg → Hetzner → rclone → Wasabi was clever but fragile. PBS native S3 is one step
  • Monitor capacity — NAS storage fills up faster than you think, especially with raw vzdump retention
  • Stagger your schedules — backup jobs competing for bandwidth and I/O cause failures

Next: the self-hosted services that all this infrastructure exists to support.

This post is licensed under CC BY 4.0 by the author.