Backup & restore

Wardengate holds durable state in three places: the Postgres database, the object store for recordings, and a 32-byte key that seals vault material. A full restore needs all three. The built-in wgctl backup command orchestrates the first two; the key is your responsibility to keep safe.

What to back up

Postgres database — policies, identities, audit metadata, session index, encrypted vault
Object store — session recordings and their signed manifests
License bundle — signed file from your account; applied at restore time with wgctl license activate
Vault sealing key (WG_SECRET_KEY) — without it, the vault ciphertext in Postgres is useless
KMS key material if you bring your own KMS; the database holds ciphertext only
TLS certificate and private key — trivially reissuable, but keeping a copy shortens restore time

What you do not need to back up: the internal CA, host keys, or anything else on the /var/lib/wardengate/data volume. All of it regenerates on first boot once the database and sealing key are in place.
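Because the vault is unrecoverable without the sealing key, it can help to sanity-check the stored copy before you need it. The sketch below only confirms that a key file decodes to 32 bytes. The function name sealing_key_ok and the three encodings are assumptions for illustration; this page does not say how WG_SECRET_KEY is encoded on disk.

```shell
# Hypothetical pre-flight check: does a sealing key file decode to 32 bytes?
# The encoding (raw, hex, base64) is an assumption; pick whichever matches
# how your secrets manager stores the key.
sealing_key_ok() {
  local file="$1" encoding="${2:-raw}" len
  case "$encoding" in
    raw)    len=$(wc -c < "$file") ;;
    hex)    len=$(( $(tr -d '[:space:]' < "$file" | wc -c) / 2 )) ;;
    base64) len=$(base64 -d < "$file" | wc -c) ;;
    *)      echo "unknown encoding: $encoding" >&2; return 2 ;;
  esac
  len=$((len))  # normalise wc output, which may carry leading spaces
  if [ "$len" -eq 32 ]; then
    echo "ok: $file decodes to 32 bytes"
  else
    echo "error: $file decodes to $len bytes, expected 32" >&2
    return 1
  fi
}

# Example: sealing_key_ok sealing.key raw
```

A length check says nothing about whether the key is the right one for a given database; only a test restore proves that.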

Backup cadence

Postgres — daily full snapshot plus continuous WAL archiving
Recordings — bucket replication to a second region or a separate account; treat the recording bucket itself as the backup and use versioning or object-lock against tampering
Sealing key — stored in your secrets manager with a documented recovery procedure; not on a schedule
License bundle — keep a copy alongside infra-as-code

Retain daily full backups for at least 30 days and keep monthly pinned copies for longer audit windows. Two weeks of WAL retention is fine: point-in-time recovery then covers the last 14 days, and anything older falls back to the nearest daily snapshot.
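The retention rule above can be sketched as a small decision function. This is an illustration, not how wgctl backup prune works internally. It assumes GNU date, that the monthly pinned copy is the snapshot taken on the 1st of each month, and a month approximated as 31 days; the name keep_snapshot is hypothetical.

```shell
# Classify a snapshot dated YYYY-MM-DD as keep (daily), keep (monthly), or prune.
# Assumptions: GNU date; the monthly pinned copy is the snapshot from the 1st;
# a month is approximated as 31 days.
keep_snapshot() {
  local snap="$1" today="$2" daily_days="${3:-30}" monthly_months="${4:-12}"
  local snap_s today_s age_days day_of_month
  snap_s=$(date -u -d "$snap" +%s) || return 2
  today_s=$(date -u -d "$today" +%s) || return 2
  age_days=$(( (today_s - snap_s) / 86400 ))
  day_of_month=${snap##*-}
  if [ "$age_days" -le "$daily_days" ]; then
    echo "keep (daily)"
  elif [ "$day_of_month" = "01" ] && [ "$age_days" -le $(( monthly_months * 31 )) ]; then
    echo "keep (monthly)"
  else
    echo "prune"
  fi
}

# Example: keep_snapshot 2026-01-01 2026-04-19
```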

The `wgctl backup` command

wgctl backup create runs a consistent dump of the database and optionally copies recordings into a dedicated backup bucket. It is a convenience wrapper — if you already back up Postgres with pg_dump, WAL archiving, or a managed provider, keep doing that and skip the built-in.

wgctl backup create \
  --name nightly-$(date +%F) \
  --include-recordings \
  --since 24h

# Output
# snapshot:    nightly-2026-04-19
# database:    234 MB gzipped
# recordings:  48,120 objects (212 GB) replicated
# destination: s3://wg-backups/production/2026/04/19/
# duration:    6m12s

List snapshots, prune by age, and verify a snapshot:

wgctl backup list
wgctl backup prune --older-than 60d
wgctl backup verify --name nightly-2026-04-19

verify streams the snapshot end-to-end, recomputes hashes, and confirms every referenced recording is present in the destination. Run it at least weekly; an unverified backup has roughly the same value as no backup.
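A scheduled verification can be as small as the wrapper below. It is a sketch under two assumptions: snapshots follow the nightly-YYYY-MM-DD naming used in the example above, and wgctl backup verify exits non-zero when verification fails. The function name verify_nightly is hypothetical.

```shell
# Verify one nightly snapshot and fail loudly so a scheduler can alert on it.
# Assumptions: nightly-YYYY-MM-DD naming; wgctl exits non-zero on a failed verify.
verify_nightly() {
  local day="${1:-$(date +%F)}"
  local name="nightly-$day"
  if wgctl backup verify --name "$name"; then
    echo "verified: $name"
  else
    echo "VERIFY FAILED: $name" >&2
    return 1
  fi
}

# Example: verify_nightly 2026-04-19
```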

Backup target configuration

The backup destination is any S3-compatible object store. Configure it once and the backup commands use it by default. Credentials come from the ambient environment (IAM role, workload identity, or an explicit access key stored as a Secret).

apiVersion: wardengate/v1
kind: BackupTarget
metadata:
  name: primary
spec:
  type: s3
  bucket: wg-backups
  region: us-east-1
  prefix: production/
  serverSideEncryption:
    kmsKeyId: arn:aws:kms:us-east-1:123456789012:key/abcd-...
  retention:
    dailyDays: 30
    monthlyMonths: 12

wgctl apply -f backup-target.yaml
wgctl backup target test primary

Object-lock is recommended on the backup bucket — it protects against operator error and against ransomware of the kind that targets your backups first.
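On AWS S3, a default Object Lock retention can be set with aws s3api put-object-lock-configuration. The sketch below only builds the JSON document that command expects. The helper name object_lock_config, the 30-day period, and the GOVERNANCE default are assumptions for illustration; other S3-compatible stores may expose object lock differently.

```shell
# Build the JSON for a default Object Lock retention rule.
# Assumptions: AWS S3 semantics; the mode and retention days are example values.
object_lock_config() {
  local days="${1:-30}" mode="${2:-GOVERNANCE}"
  printf '{"ObjectLockEnabled":"Enabled","Rule":{"DefaultRetention":{"Mode":"%s","Days":%d}}}\n' \
    "$mode" "$days"
}

# Example (the bucket must have versioning enabled):
#   aws s3api put-object-lock-configuration \
#     --bucket wg-backups \
#     --object-lock-configuration "$(object_lock_config 30)"
```

GOVERNANCE mode lets specially privileged principals shorten or remove a lock, while COMPLIANCE mode does not, so choose the mode deliberately before applying it to a production bucket.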

Scheduled backups

Schedule nightly backups with a Kubernetes CronJob or a systemd timer. The Helm chart ships an opt-in CronJob that runs wgctl backup create inside the cluster.

# values.yaml
backup:
  schedule:
    enabled: true
    cron: "0 2 * * *"
    args: ["--include-recordings", "--since", "24h"]
  target:
    name: primary

Recording retention vs backup

Recording retention (how long recordings are kept at all) is a policy setting and lives in the control plane. Recording backup (how recordings survive bucket loss) is infrastructure. Do not use one to solve the other. A retention policy that deletes recordings after 30 days also removes them from the backups. That is usually the intent, but it is worth being explicit about.

Restore procedure

A full restore is four steps. Do them in order; each step is idempotent.

Provision a fresh Wardengate install against an empty Postgres — same version the backup was taken from
Restore the Postgres snapshot into the empty database before the new install finishes bootstrapping
Rehydrate recordings from the backup bucket back to the live bucket (or point the install at the backup bucket as its live one)
Re-supply the sealing key and license bundle

# 1. provision, with the service initially scaled to 0
helm upgrade --install wardengate wardengate/wardengate \
  --namespace wardengate --create-namespace \
  --version 2.4.1 \
  --values values.yaml \
  --set replicaCount=0 \
  --set migrate.enabled=false

# 2. restore the database
pg_restore --clean --if-exists --no-owner \
  -d "postgres://wardengate:...@pg.example:5432/wardengate" \
  wardengate-2026-04-19.dump

# 3. rehydrate recordings (or skip if bucket is intact)
aws s3 sync s3://wg-backups/production/2026/04/19/recordings/ \
             s3://wg-recordings/

# 4. supply the sealing key and activate license
kubectl -n wardengate create secret generic wg-secret-key \
  --from-literal=secret="$(cat sealing.key)"
helm upgrade wardengate wardengate/wardengate -n wardengate \
  --reuse-values --set replicaCount=3 --set migrate.enabled=true
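# kubectl exec runs inside the pod, so the .lic path must exist there
# (mounted or copied in beforehand)
kubectl -n wardengate exec deploy/wardengate -- \
  wgctl license activate --file wardengate-offline.lic --offline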

Once the pods are ready, run wgctl system verify to walk every policy and target and report any dangling references that crept in during the restore.
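One way to script that last step is to block on the rollout and then run the check inside a pod. This is a sketch: the namespace and deployment name are taken from the commands above, while the 10-minute timeout and the function name post_restore_verify are assumptions.

```shell
# Wait for the restored deployment to become ready, then run the verification.
# Assumptions: deployment name "wardengate" as in the steps above; a 10m
# timeout is an arbitrary example value.
post_restore_verify() {
  local ns="${1:-wardengate}"
  kubectl -n "$ns" rollout status deploy/wardengate --timeout=10m || return 1
  kubectl -n "$ns" exec deploy/wardengate -- wgctl system verify
}

# Example: post_restore_verify wardengate
```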

Point-in-time recovery

For finer-grained recovery than daily snapshots, use Postgres WAL archiving (wal-g, pgBackRest, RDS PITR). Restore the database to a specific timestamp, then run Wardengate against it. Recordings have no point-in-time equivalent: the recording bucket is append-only, so after a point-in-time restore expect the recording set to be at least as current as the database, and possibly newer.
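As an illustration with pgBackRest, a restore to a timestamp looks roughly like the command printed by the helper below. The helper only prints the command so it can be reviewed first. The stanza name, the timestamp, and the function name pitr_command are placeholders; wal-g and RDS use different mechanisms.

```shell
# Print (do not run) a pgBackRest point-in-time restore command for review.
# Assumptions: pgBackRest is the archiver; stanza and target are placeholders.
# Postgres must be stopped before the printed command is actually run.
pitr_command() {
  local stanza="$1" target="$2"
  printf 'pgbackrest --stanza=%s --delta --type=time "--target=%s" --target-action=promote restore\n' \
    "$stanza" "$target"
}

# Example: pitr_command main "2026-04-19 13:45:00+00"
```

Keep Wardengate scaled to zero until Postgres has finished replaying WAL and has been promoted, the same way the restore procedure holds the service back during a snapshot restore.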

RTO and RPO guidance

The numbers below are reasonable defaults for a mid-size deployment; tune based on your own SLAs.

Scenario	RTO	RPO
Pod loss in HA cluster	seconds	0
Postgres failover (managed HA)	< 1 minute	0 (synchronous replica)
Cluster loss, rebuild in same region	< 1 hour	< 5 minutes (WAL)
Region loss, cross-region restore	< 4 hours	< 24 hours (daily snapshot)

Testing restores

A backup you have not restored is a hypothesis. Exercise restores at least quarterly into a scratch cluster. The minimum test is to restore the database, restore a week of recordings, bring a replica up, sign in, and play a recording. Automate it if you can; humans skip it otherwise.

# shorthand used by the Helm chart's restore-test job
wgctl system verify --mode restore-test \
  --expect-policies 42 \
  --expect-targets 617 \
  --play-recording latest
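Timing each drill gives you a measured recovery time to compare against the RTO table. A minimal sketch, assuming the 1-hour same-region target from that table as the default budget; the function name rto_check is hypothetical.

```shell
# Compare a measured restore duration against an RTO budget, in seconds.
# Assumption: the default budget of 3600s mirrors the "< 1 hour" row above.
rto_check() {
  local start="$1" end="$2" budget="${3:-3600}"
  local elapsed=$(( end - start ))
  if [ "$elapsed" -le "$budget" ]; then
    echo "within RTO budget: ${elapsed}s of ${budget}s"
  else
    echo "RTO budget exceeded: ${elapsed}s of ${budget}s" >&2
    return 1
  fi
}

# Example:
#   start=$(date +%s)
#   ... run the restore steps, then wgctl system verify --mode restore-test ...
#   rto_check "$start" "$(date +%s)" 3600
```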

High availability — prevent outages instead of recovering from them
Upgrading — backups are a hard prerequisite
Offline install — air-gapped restore flow