Proxmox Backup Server - Multi-Site Replication

A backup strategy is only as good as the number of independent copies you maintain and how quickly you can restore from them. This post covers my multi-site Proxmox Backup Server deployment - two PBS instances, cross-site replication over WireGuard, cloud offloading to Wasabi S3, and the lessons learned building a 3-2-1+ backup architecture.

Strategy - 3-2-1 Plus Cloud

The classic 3-2-1 rule says: three copies of your data, on two different storage technologies, with one copy offsite. I extend this with a cloud tier for geographic redundancy beyond my own infrastructure:

| Copy | Location | Technology | Purpose |
|---|---|---|---|
| Primary | Home PBS (local ZFS) | PBS + ZFS | Fast local backups and restores |
| Secondary | Home PBS (Synology iSCSI LUN) | PBS + iSCSI | Overflow storage on NAS hardware |
| Offsite replica | DC PBS | PBS pull sync over WireGuard | Geographic separation |
| Cloud | Wasabi S3 | PBS native S3 sync | Cloud-tier redundancy |
| Contingency | Both Synology NAS devices | Raw vzdump archives | Technology diversity (non-PBS format) |

Every VM and LXC across both sites is protected by at least four independent copies. The primary and secondary copies are at home, the offsite replica is at the DC, and the cloud copy is at Wasabi. Raw vzdump archives on Synology NAS devices provide an additional technology-diverse contingency layer.


Architecture

PBS Home - The Source of Truth

The home PBS instance runs as a VM on the primary Proxmox node. It serves as the central collection point for all home cluster backups and the source for cross-site replication.

| Item | Detail |
|---|---|
| Type | VM on primary home node |
| NICs | Dual-homed - management VLAN + storage VLAN |
| Datastores | ZFS-backed primary + Synology iSCSI secondary |
| Role | Layer 1 - primary backup target for all home workloads |

Dual Datastores

| Datastore | Backing Storage | Purpose |
|---|---|---|
| Primary (Backups) | ZFS pool on host NVMe | Fast daily backups - shared with VMs on the host |
| Secondary (Synology) | iSCSI LUN from Synology NAS | Overflow and redundancy - dedicated block device |

The primary datastore shares the host’s ZFS pool with running VMs and LXCs. This is not ideal - a backup surge can compete with VM I/O. But with only a 1 TB NVMe, a dedicated backup disk was not an option after decommissioning the third node. I monitor ZFS pool usage closely and keep it below 80%.
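That 80% watch can be a simple threshold check. A minimal sketch, assuming the pool is named `rpool` (both the pool name and the limit are illustrative, not my actual values):

```shell
#!/bin/sh
# Warn when ZFS pool capacity crosses a threshold (default 80%)
check_pool() {
  # $1 = capacity percent, $2 = warning threshold
  cap=$1; limit=${2:-80}
  if [ "$cap" -ge "$limit" ]; then
    echo "WARN: pool at ${cap}%"
  else
    echo "OK: pool at ${cap}%"
  fi
}

# Only query zpool where it actually exists (i.e. on the PBS host)
if command -v zpool >/dev/null 2>&1; then
  check_pool "$(zpool list -H -o capacity rpool | tr -d '%')"
fi
```

Dropped into cron, this is enough to catch the shared-pool risk before a backup surge fills the datastore.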

The Synology datastore uses an iSCSI LUN presented as a block device. This gives PBS native filesystem access (ext4 on the LUN) rather than the overhead of NFS. The iSCSI target is bound to the storage VLAN interface on the Synology - not the management interface - to keep backup traffic off the management network.
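On the PBS VM side, attaching that LUN with open-iscsi looks roughly like this (portal address and IQN are illustrative, not my real target):

```shell
# Discover targets on the Synology's storage-VLAN portal
iscsiadm -m discovery -t sendtargets -p 10.0.40.20:3260

# Log in to the LUN's target
iscsiadm -m node -T iqn.2000-01.com.synology:nas.pbs-lun \
  -p 10.0.40.20:3260 --login

# Reconnect automatically after reboots
iscsiadm -m node -T iqn.2000-01.com.synology:nas.pbs-lun \
  -p 10.0.40.20:3260 --op update -n node.startup -v automatic
```

Once the block device appears, it gets an ext4 filesystem and a PBS datastore on top, same as any local disk.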

PBS DC - The Offsite Replica

The DC PBS instance runs at Hetzner on the standalone Proxmox node. It has two roles: receiving replicated home backups and backing up local DC workloads.

| Item | Detail |
|---|---|
| Type | VM on DC Proxmox node |
| NICs | Dual-homed - management VLAN + backup VLAN |
| Datastores | Local disk + Wasabi S3 (cloud tier) |
| Role | Layer 2 (offsite replica) + Layer 3 (cloud sync) |

Dual-Homed Networking

The DC PBS has two network interfaces for a reason:

| Interface | Subnet | Purpose |
|---|---|---|
| Management NIC | DC management VLAN | Default gateway, web UI, admin access |
| Backup NIC | DC backup VLAN | PBS replication traffic, Synology NFS |

Backup traffic must stay on the backup NIC. A static route ensures traffic destined for the home storage subnet routes via the backup interface and through the WireGuard backup tunnel - not the default management gateway. Without this static route, replication silently fails because traffic exits via the wrong interface.
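In `/etc/network/interfaces` terms, pinning that route might look like this (interface names, subnets, and gateway are illustrative):

```shell
# Backup NIC on the DC PBS - the management NIC keeps the default gateway
auto ens19
iface ens19 inet static
    address 10.20.30.10/24
    # Send home-storage-subnet traffic via the backup VLAN gateway,
    # which carries it into the WireGuard backup tunnel
    post-up   ip route add 10.0.40.0/24 via 10.20.30.1 dev ens19
    pre-down  ip route del 10.0.40.0/24 via 10.20.30.1 dev ens19
```

`ip route get 10.0.40.5` is a quick way to confirm the kernel will actually use the backup interface before trusting the next sync run.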


Backup Flows

I maintain nine active backup flows. Here are the key ones:

Flow 1 - Home PVE to Home PBS (Daily 01:00)

All home VMs and LXCs push to the home PBS primary datastore daily at 01:00. Retention: keep-daily 7, keep-hourly 24.
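Expressed as a one-off vzdump invocation, that policy looks roughly like this (the storage name `pbs-home` is an assumption; the recurring job itself lives in the PVE backup scheduler):

```shell
# Back up all guests to the home PBS datastore with the stated retention
# (storage name is illustrative)
vzdump --all --storage pbs-home --mode snapshot \
  --prune-backups keep-daily=7,keep-hourly=24
```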

Flow 2 - Home PBS to DC PBS - Pull Sync (Daily 05:00)

The DC PBS pulls a replica of all home backups over the WireGuard backup tunnel. This is the core offsite replication flow.

Why pull instead of push? Pull sync means the DC side initiates the connection. The home PBS only needs to expose its API - it does not need outbound access to the DC. This matches the security model: the DC pulls from home, home does not push to DC.

The sync uses a dedicated service account on the home PBS with read-only permissions scoped to the backup datastore. The DC stores replicated data in a separate namespace to avoid mixing home replicas with local DC backups.
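On the DC side, the remote and the pull sync job can be set up roughly like this with `proxmox-backup-manager` (hostnames, datastore and namespace names are illustrative, and flag names can differ slightly between PBS versions):

```shell
# Register the home PBS as a remote, authenticating with the
# read-only service account that lives on the home PBS
proxmox-backup-manager remote create home-pbs \
  --host 10.0.10.5 \
  --auth-id sync-dc@pbs \
  --password 'REDACTED' \
  --fingerprint 'AA:BB:...'

# Pull the home datastore into a dedicated namespace at 05:00 daily
proxmox-backup-manager sync-job create home-replica \
  --store dc-backups \
  --remote home-pbs \
  --remote-store backups \
  --ns home \
  --schedule '05:00'
```

The namespace keeps home replicas cleanly separated from local DC backups inside the same datastore.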

Flow 3 - DC PBS to Wasabi S3 (Daily)

The DC PBS pushes its entire datastore (including the home replica namespace) to Wasabi S3 via PBS’s native S3 datastore support. This replaced an earlier Borg + Hetzner Storage Box setup that was decommissioned in favor of the simpler native integration.

| S3 Detail | Value |
|---|---|
| Provider | Wasabi |
| Region | EU Central |
| Bucket | Dedicated PBS bucket |
| Access | IAM user with S3 full access policy |
| Cache | Local cache directory on DC PBS |

Flow 4 - DC PVE to DC PBS (Daily 02:00)

All DC VMs and LXCs push to the DC PBS daily at 02:00. Retention: keep-daily 7, keep-weekly 4.

Flow 5 - Raw vzdump to Synology NAS (Weekly/Monthly)

Independent of PBS, raw vzdump archives go to the home Synology via NFS. These are technology-diverse backups - if PBS has a catastrophic bug, vzdump archives are a separate recovery path.

| Schedule | Retention |
|---|---|
| Weekly (Sunday) | Keep last 4 |
| Monthly (1st) | Keep last 6 |

Flow 6 - Synology-to-Synology rsync (Daily 03:20)

The home Synology pushes vzdump archives to the DC Synology via rsync over SSH through the WireGuard backup tunnel. The rsync runs without --delete - both sites’ vzdump files coexist in the same directory on the DC Synology. This provides geographic redundancy for the raw vzdump contingency layer.
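The rsync invocation is straightforward; a sketch with illustrative paths, host, and key (note the deliberate absence of `--delete`):

```shell
# Push home vzdump archives to the DC Synology over SSH through the
# WireGuard backup tunnel. No --delete: archives from both sites
# coexist on the destination.
rsync -az -e "ssh -i /root/.ssh/backup_key" \
  /volume1/vzdump/ backup@10.20.40.30:/volume1/vzdump-replica/
```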

Flow 7 - DC Synology to Wasabi S3 (Daily)

The DC Synology pushes its vzdump archive directory to a dedicated Wasabi bucket via rclone. This is the cloud copy of the raw vzdump contingency layer.
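The rclone job is a plain one-way sync; remote and bucket names below are illustrative:

```shell
# Mirror the vzdump archive directory to a dedicated Wasabi bucket
rclone sync /volume1/vzdump-replica wasabi:pbs-vzdump-archive \
  --transfers 4 --checksum --log-level INFO
```

`rclone check` against the same paths makes a good periodic integrity spot-check for this layer.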


Full Backup Schedule

| Time | Flow | Direction |
|---|---|---|
| 01:00 | Home PVE to Home PBS | Local push |
| 02:00 | DC PVE to DC PBS | Local push |
| Sunday 02:00 | Home PVE to Synology (vzdump) | Weekly NFS push |
| 1st of month | Home PVE to Synology (vzdump) | Monthly NFS push |
| 03:20 | Synology to Synology rsync | Cross-site push via WireGuard |
| 04:00 | DC PVE to DC Synology (vzdump) | Local NFS push |
| 05:00 | Home PBS to DC PBS pull sync | Cross-site pull via WireGuard |
| Daily | DC PBS to Wasabi S3 | Cloud push |
| Daily | DC Synology to Wasabi S3 | Cloud push via rclone |

The schedule is staggered to avoid overlapping I/O windows. Home backups complete by 02:00, DC backups by 03:00, cross-site replication by 06:00, and cloud sync fills in the remaining hours.


User and Service Account Model

Both PBS instances use separate user accounts for each purpose:

Home PBS Accounts

| Account | Role | Used By |
|---|---|---|
| root | Superuser | Admin access |
| pve user | Backup operator | Home Proxmox backup jobs |
| DC sync user | Read-only sync | DC PBS pull sync credential |
| Monitor user | Read-only monitor | Pulse uptime dashboard |

DC PBS Accounts

| Account | Role | Used By |
|---|---|---|
| root | Superuser | Admin access |
| backup user | Backup operator | DC Proxmox backup jobs + pull sync |

Principle of least privilege - the DC sync account on the home PBS can only read backup data, not delete it. This means a compromised DC cannot wipe home backups.
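On the home PBS, that scoping maps onto PBS's built-in `DatastoreReader` role; a sketch with illustrative account and datastore names:

```shell
# Create the sync service account for the DC side
proxmox-backup-manager user create sync-dc@pbs --password 'REDACTED'

# Grant read-only access, scoped to the backup datastore only
proxmox-backup-manager acl update /datastore/backups DatastoreReader \
  --auth-id sync-dc@pbs
```

`DatastoreReader` can list and read snapshots but not prune or remove them, which is exactly the property that keeps a compromised DC from wiping home backups.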


Cloud Migration Story - From Borg to Native S3

When I first set up offsite cloud backups, I used a Borg + Hetzner Storage Box stack. The flow was: DC PBS exports to Borg, Borg pushes to Hetzner Storage Box, and rclone mirrors to Wasabi as a secondary cloud copy.

This stack worked but was complex - four moving parts (PBS export, Borg, Hetzner, rclone) with systemd timers, custom scripts, and multiple credential stores. When Proxmox added native S3 datastore support, I migrated the entire cloud tier to a single flow: DC PBS syncs directly to Wasabi S3.

The decommission involved removing:

  • Borg packages and scripts
  • All systemd timers and services for the Borg pipeline
  • Hetzner Storage Box credentials and targets
  • Backblaze B2 bucket (trialled and rejected due to upload caps)
  • rclone remotes for the old workflow

The result is a single, clean, PBS-native sync job replacing an entire pipeline.


iSCSI Persistence - The Silent Killer

The Synology iSCSI LUN backing the home PBS secondary datastore is rock-solid when configured correctly. When it’s not, you get silent data loss.

Key Configuration Points

| Setting | Value | Why |
|---|---|---|
| Target binding | Storage VLAN interface only | Prevents iSCSI traffic from traversing management VLAN |
| Startup mode | Automatic | iSCSI reconnects after reboot without manual intervention |
| fstab entry | By UUID | Device names can change after reboot; UUID is stable |

Common Gotchas

  • Ghost sessions after DSM reboot. If the iSCSI target reboots, stale sessions can prevent reconnection. Logout all sessions, restart the iSCSI initiator, then re-login.
  • VLAN binding drift. DSM updates can reset the iSCSI target’s network portal binding. After every DSM update, verify the target is still bound to the storage VLAN interface.
  • Device name changes. The iSCSI LUN might appear as /dev/sdc on one boot and /dev/sdd on the next. Always mount by UUID in fstab, never by device name.
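The recovery sequence and the UUID mount together look roughly like this (IQN, device, and mountpoint are illustrative; service names vary slightly by distro):

```shell
# Clear ghost sessions after a DSM reboot, then re-login
iscsiadm -m node --logoutall=all
systemctl restart iscsid
iscsiadm -m node --loginall=automatic

# Find the LUN filesystem's UUID - never hardcode /dev/sdX
blkid /dev/sdd

# /etc/fstab entry; _netdev defers mounting until networking
# (and therefore iSCSI) is up:
# UUID=<lun-uuid>  /mnt/pbs-secondary  ext4  defaults,_netdev  0  2
```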

NFS and Unprivileged LXC Backups

When backing up unprivileged LXC containers to NFS storage (like the Synology vzdump share), you can hit permission failures caused by user namespace ID mapping: the container's files are owned by high remapped UIDs on the host, and the NFS server rejects operations on vzdump's intermediate files under those unmapped IDs.

The fix is straightforward - set tmpdir: /var/tmp in /etc/vzdump.conf. This tells vzdump to use a local temporary directory instead of the NFS mount for intermediate files, avoiding the permission issue entirely.
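The whole fix is one line of configuration:

```shell
# /etc/vzdump.conf - write intermediate files to local disk,
# not the NFS mount, so UID mapping never touches the NFS server
tmpdir: /var/tmp
```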


Notifications

Both PBS instances send email notifications for backup job completions, failures, and warnings. They relay through the internal Mailcow instance via Postfix:

| Instance | Relay Target | Status |
|---|---|---|
| Home PBS | Mail server via submission port | Active |
| DC PBS | Mailcow internal IP via submission port | Active |

The DC PBS relays to Mailcow’s internal IP on the clients VLAN - not the public domain - to avoid hairpin NAT.
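The relay boils down to a few lines of Postfix configuration on the PBS host; a sketch with an illustrative internal IP:

```shell
# /etc/postfix/main.cf on the DC PBS
# Relay all mail to Mailcow's internal IP on the submission port,
# authenticated and TLS-encrypted
relayhost = [10.20.50.25]:587
smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
smtp_sasl_security_options = noanonymous
smtp_tls_security_level = encrypt
```

After editing, `postmap /etc/postfix/sasl_passwd` and `systemctl reload postfix` apply the change.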


Capacity Planning

| Location | Storage | Used | Free | Utilization |
|---|---|---|---|---|
| Home PBS - ZFS datastore | ~1 TB (shared with VMs) | Monitor closely | Below 80% target | Shared pool risk |
| Home PBS - Synology LUN | Synology iSCSI | Managed by PBS GC | Synology capacity | ~78% on NAS |
| DC PBS - Local disk | ~500 GB | ~170 GB | ~300 GB | 36% |
| DC PBS - Wasabi S3 | Cloud (unlimited) | Growing | N/A | Pay per GB |
| DC Synology - NFS | ~14 TB | ~260 GB | ~14 TB | 2% |

The home Synology NAS at 78% capacity is the most concerning metric. If it fills up, the vzdump contingency copies and the iSCSI LUN for PBS both stop working. I have alerts set for 85% and 90% thresholds.
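Those tiers are easy to encode in a small check script; a sketch assuming the volume is mounted at `/volume1` (`df --output` is GNU coreutils - Synology's busybox `df` needs different parsing):

```shell
#!/bin/sh
# Two-tier capacity alert matching the 85%/90% thresholds
alert_level() {
  # $1 = used percent; echoes the alert tier
  used=$1
  if [ "$used" -ge 90 ]; then echo "CRITICAL"
  elif [ "$used" -ge 85 ]; then echo "WARNING"
  else echo "OK"; fi
}

# Only check the volume where it exists (i.e. on the NAS itself)
if [ -d /volume1 ]; then
  used=$(df --output=pcent /volume1 | tail -1 | tr -d ' %')
  alert_level "$used"
fi
```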


Disaster Recovery Scenarios

| Scenario | Recovery Path |
|---|---|
| Single VM/LXC loss | Restore from Home PBS (fastest - local ZFS) |
| Home node failure | Restore from Home PBS on surviving node |
| Home site total loss | Restore from DC PBS (offsite replica) |
| DC site total loss | Home PBS retains all home backups; DC workloads restore from DC PBS Wasabi S3 |
| Both sites total loss | Restore from Wasabi S3 (cloud tier) |
| PBS software bug | Restore from raw vzdump archives on Synology NAS |

The key insight is that each layer is independent. A PBS bug does not affect vzdump archives. A WireGuard tunnel failure does not affect local backups. A site loss does not affect the cloud tier.


Operational Lessons

  • Pull sync is more secure than push. The remote site initiates replication with read-only credentials. The source site never needs outbound access to the destination.

  • Dual-homed PBS is essential. Backup traffic on the management network competes with admin access and monitoring. Separate interfaces, separate VLANs, separate subnets.

  • Native S3 beats Borg pipelines. Fewer moving parts means fewer failure modes. The migration from Borg to native S3 eliminated four services, three scripts, and two credential stores.

  • Raw vzdump is your insurance policy. PBS is excellent, but technology diversity matters. If PBS has a critical bug, vzdump archives in a simple tar format can be restored with basic Linux tools.

  • iSCSI needs babysitting. It’s fast and efficient, but the connection is fragile across reboots and NAS updates. Always mount by UUID, always verify bindings after updates.

  • Wasabi’s 90-day minimum billing matters. Aggressive pruning of recently uploaded objects still incurs charges for the full 90 days. Design your retention policy with this in mind.


What’s Next

The backup architecture is comprehensive and well-tested. Two improvements are on the list: consolidating the manual health checks (ZFS pool capacity, iSCSI mount status, sync job verification) into a single automated dashboard, and tracking Proxmox's evolving S3 datastore support - it may eventually let a native PBS sync job replace the rclone vzdump-to-Wasabi flow as well.

This post is licensed under CC BY 4.0 by the author.