Backup & restore
Wardengate holds durable state in three places: the Postgres database, the object store for recordings, and a 32-byte key that seals vault material. A full restore needs all three. The built-in wgctl backup command orchestrates the first two; the key is your responsibility to keep safe.
What to back up
- Postgres database — policies, identities, audit metadata, session index, encrypted vault
- Object store — session recordings and their signed manifests
- License bundle — signed file from your account; applied at restore time with wgctl license activate
- Vault sealing key (WG_SECRET_KEY) — without it, the vault ciphertext in Postgres is useless
- KMS key material if you bring your own KMS; the database holds ciphertext only
- TLS certificate and private key — trivially reissuable, but keeping a copy shortens restore time
What you do not need to back up: the internal CA, host keys, or anything else on the /var/lib/wardengate/data volume — that all regenerates on first boot once the database and sealing key are in place.
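Because a restore is impossible without the sealing key, it can be worth checking the stored copy before you need it. The sketch below is a hypothetical helper (check_sealing_key is not part of wgctl) and assumes the key is kept in a file as raw bytes. If your copy is hex- or base64-encoded, pass the encoded length as the second argument instead.

```shell
# Hypothetical pre-flight check, not a wgctl feature.
# Assumes the sealing key is stored in a file as raw bytes (32 by default).
check_sealing_key() (
    key_file="$1"
    expected="${2:-32}"
    if [ ! -r "$key_file" ]; then
        echo "missing or unreadable: $key_file" >&2
        exit 1
    fi
    # wc pads its output on some platforms, so strip spaces before comparing
    size=$(wc -c < "$key_file" | tr -d ' ')
    if [ "$size" -ne "$expected" ]; then
        echo "unexpected key size: $size bytes (want $expected)" >&2
        exit 1
    fi
    echo "ok: $key_file is $size bytes"
)
```

For example, check_sealing_key sealing.key could run as part of the same quarterly drill that exercises the restore.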
Backup cadence
- Postgres — daily full snapshot plus continuous WAL archiving
- Recordings — bucket replication to a second region or a separate account; treat the recording bucket itself as the backup and use versioning or object-lock against tampering
- Sealing key — stored in your secrets manager with a documented recovery procedure; not on a schedule
- License bundle — keep a copy alongside infra-as-code
Retain daily full backups for at least 30 days and keep monthly pinned copies for longer audit windows. Two weeks of retention is fine for WAL as long as you can always recover to a daily snapshot.
The wgctl backup command
wgctl backup create runs a consistent dump of the database and optionally copies recordings into a dedicated backup bucket. It is a convenience wrapper — if you already back up Postgres with pg_dump, WAL archiving, or a managed provider, keep doing that and skip the built-in.
wgctl backup create \
--name nightly-$(date +%F) \
--include-recordings \
--since 24h
# Output
# snapshot: nightly-2026-04-19
# database: 234 MB gzipped
# recordings: 48,120 objects (212 GB) replicated
# destination: s3://wg-backups/2026/04/19/
# duration: 6m12s
List snapshots and prune by age:
wgctl backup list
wgctl backup prune --older-than 60d
wgctl backup verify --name nightly-2026-04-19
wgctl backup verify streams the snapshot end-to-end, recomputes hashes, and confirms every referenced recording is present in the destination. Run it at least weekly; an unverified backup has roughly the same value as no backup.
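One way to make sure verification actually happens is to chain it to the backup itself. The wrapper below is a minimal sketch using only the create and verify subcommands shown above. The function name is hypothetical, and it assumes wgctl exits non-zero on failure, which this page does not confirm.

```shell
# Sketch: create a snapshot, then verify it, stopping at the first failure.
# Assumption: wgctl returns a non-zero exit status when a step fails.
backup_and_verify() (
    name="${1:-nightly-$(date +%F)}"
    if ! wgctl backup create --name "$name" --include-recordings --since 24h; then
        echo "backup create failed for $name" >&2
        exit 1
    fi
    if ! wgctl backup verify --name "$name"; then
        echo "backup verify failed for $name" >&2
        exit 2
    fi
    echo "verified: $name"
)
```

The distinct exit codes (1 for create, 2 for verify) let a scheduler or alert rule tell a failed backup apart from a backup that exists but did not verify.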
Backup target configuration
The backup destination is any S3-compatible object store. Configure it once, then the backup commands use it by default. Credentials come from the ambient environment (IAM role, workload identity, or an explicit access key stored as a Secret).
apiVersion: wardengate/v1
kind: BackupTarget
metadata:
  name: primary
spec:
  type: s3
  bucket: wg-backups
  region: us-east-1
  prefix: production/
  serverSideEncryption:
    kmsKeyId: arn:aws:kms:us-east-1:123456789012:key/abcd-...
  retention:
    dailyDays: 30
    monthlyMonths: 12
wgctl apply -f backup-target.yaml
wgctl backup target test primary
Object-lock is recommended on the backup bucket; it protects against operator error and against ransomware of the kind that targets your backups first.
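If the backup bucket lives in AWS S3, you can check whether object-lock is actually enabled with the AWS CLI. This is a sketch under that assumption: other S3-compatible stores may not implement this API, and the helper name object_lock_enabled is illustrative only.

```shell
# Sketch for AWS S3 only; other S3-compatible stores may not support this call.
# Returns success when the bucket reports ObjectLockEnabled = Enabled.
object_lock_enabled() (
    bucket="$1"
    # The call fails if the bucket has no object-lock configuration at all,
    # so treat any error as "not enabled".
    status=$(aws s3api get-object-lock-configuration \
        --bucket "$bucket" \
        --query 'ObjectLockConfiguration.ObjectLockEnabled' \
        --output text 2>/dev/null) || status="none"
    [ "$status" = "Enabled" ]
)
```

A typical use would be object_lock_enabled wg-backups || echo "object-lock is off" >&2 in a periodic compliance check.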
Scheduled backups
Schedule nightly backups with a Kubernetes CronJob or a systemd timer. The Helm chart ships an opt-in CronJob that runs wgctl backup create inside the cluster.
# values.yaml
backup:
  schedule:
    enabled: true
    cron: "0 2 * * *"
    args: ["--include-recordings", "--since", "24h"]
  target:
    name: primary
Recording retention vs backup
Recording retention (how long recordings are kept at all) is a policy setting and lives in the control plane. Recording backup (how recordings survive bucket loss) is infrastructure. Do not use one to solve the other. A retention policy that deletes recordings after 30 days means backups of older recordings will also be gone. That is usually the intent, but it is worth being explicit about.
Restore procedure
A full restore is four steps. Do them in order; each step is idempotent.
- Provision a fresh Wardengate install against an empty Postgres — same version the backup was taken from
- Restore the Postgres snapshot into the empty database before the new install finishes bootstrapping
- Rehydrate recordings from the backup bucket back to the live bucket (or point the install at the backup bucket as its live one)
- Re-supply the sealing key and license bundle
# 1. provision, with the service initially scaled to 0
helm upgrade --install wardengate wardengate/wardengate \
--namespace wardengate --create-namespace \
--version 2.4.1 \
--values values.yaml \
--set replicaCount=0 \
--set migrate.enabled=false
# 2. restore the database
pg_restore --clean --if-exists --no-owner \
-d "postgres://wardengate:...@pg.example:5432/wardengate" \
wardengate-2026-04-19.dump
# 3. rehydrate recordings (or skip if bucket is intact)
aws s3 sync s3://wg-backups/production/2026/04/19/recordings/ \
s3://wg-recordings/
# 4. supply the sealing key and activate license
kubectl -n wardengate create secret generic wg-secret-key \
--from-literal=secret="$(cat sealing.key)"
helm upgrade wardengate wardengate/wardengate -n wardengate \
--reuse-values --set replicaCount=3 --set migrate.enabled=true
kubectl -n wardengate exec deploy/wardengate -- \
wgctl license activate --file wardengate-offline.lic --offline
Once the pods are ready, run wgctl system verify to walk every policy and target and report any dangling references that crept in during the restore.
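The wait-then-verify step can be scripted. The sketch below assumes the deployment is named wardengate, as in the commands above, and that wgctl system verify exits non-zero when it finds problems. The function name and the 10-minute timeout are illustrative choices.

```shell
# Sketch: block until the rollout completes, then run the verification
# inside a pod. Assumes wgctl system verify signals problems via exit status.
post_restore_check() (
    ns="${1:-wardengate}"
    if ! kubectl -n "$ns" rollout status deploy/wardengate --timeout=10m; then
        echo "rollout did not complete in namespace $ns" >&2
        exit 1
    fi
    kubectl -n "$ns" exec deploy/wardengate -- wgctl system verify
)
```

Running post_restore_check wardengate as the last step of a restore runbook gives a single pass/fail result for the whole procedure.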
Point-in-time recovery
For finer-grained recovery than daily snapshots, use Postgres WAL archiving (wal-g, pgBackRest, RDS PITR). Restore the database to a specific timestamp, then start Wardengate against it. Recordings cannot be rolled back to a point in time because the recording store is append-only, so expect the recording set to be at least as current as the restored database.
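As an illustration of the database half, here is a minimal sketch for self-managed Postgres backed up with pgBackRest. The stanza name is hypothetical, Postgres must be stopped before the restore runs, and managed services such as RDS use their own console or API instead.

```shell
# Sketch only: assumes pgBackRest manages the WAL archive and base backups.
# Stop Postgres before calling this; the stanza name is deployment-specific.
pitr_restore() (
    stanza="$1"
    target_time="$2"   # e.g. "2026-04-19 13:45:00+00"
    if [ -z "$stanza" ] || [ -z "$target_time" ]; then
        echo "usage: pitr_restore <stanza> <timestamp>" >&2
        exit 2
    fi
    # --delta reuses files already in the data directory;
    # --target-action=promote ends recovery once the target time is reached.
    pgbackrest --stanza="$stanza" --delta \
        --type=time --target="$target_time" \
        --target-action=promote restore
)
```

After the restore finishes and Postgres is started again, continue with the remaining restore steps as for a snapshot restore.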
RTO and RPO guidance
The numbers below are reasonable defaults for a mid-size deployment; tune based on your own SLAs.
| Scenario | RTO | RPO |
|---|---|---|
| Pod loss in HA cluster | seconds | 0 |
| Postgres failover (managed HA) | < 1 minute | 0 (synchronous replica) |
| Cluster loss, rebuild in same region | < 1 hour | < 5 minutes (WAL) |
| Region loss, cross-region restore | < 4 hours | < 24 hours (daily snapshot) |
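An RPO target is only meaningful if something checks it. The helper below is a small sketch of that arithmetic: given the epoch timestamp of the newest usable backup, the current time, and an RPO in seconds, it succeeds only when the backup is recent enough. The function name is hypothetical, and how you obtain the backup timestamp depends on your tooling.

```shell
# Sketch: succeed when the newest backup is no older than the RPO.
# All three arguments are integers: two epoch timestamps and a duration in seconds.
within_rpo() (
    last_backup="$1"
    now="$2"
    rpo_seconds="$3"
    age=$(( now - last_backup ))
    # A negative age means clock skew or a bad timestamp; treat it as a failure.
    [ "$age" -ge 0 ] && [ "$age" -le "$rpo_seconds" ]
)
```

For the region-loss row, within_rpo "$last_epoch" "$(date +%s)" 86400 would fail as soon as the newest daily snapshot is more than 24 hours old.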
Testing restores
A backup you have not restored is a hypothesis. Exercise restores at least quarterly into a scratch cluster. The minimum test: restore the database, restore a week of recordings, bring a replica up, sign in, and play a recording. Automate it if you can; humans skip it otherwise.
# shorthand used by the Helm chart's restore-test job
wgctl system verify --mode restore-test \
--expect-policies 42 \
--expect-targets 617 \
--play-recording latest
Related
- High availability — prevent outages instead of recovering from them
- Upgrading — backups are a hard prerequisite
- Offline install — air-gapped restore flow