Admin guide

This guide covers the operational surface administrators use after the platform is running: wiring identity, authoring policy, onboarding targets, configuring session recording, building approval workflows, and keeping the control plane highly available.

Users, groups, and directory sync

Wardengate is identity-native: it does not own users. Every principal originates in your IdP or directory and flows in via SCIM, LDAP sync, or just-in-time provisioning on first sign-in. Local accounts exist only for the bootstrap admin and for service principals used by automation.

We recommend SCIM where your IdP supports it — deprovisioning is the part that burns teams when it misfires, and SCIM gives you a clean off-boarding path. For directories without SCIM, the built-in LDAP sync runs every five minutes and honours nested groups.
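Honouring nested groups means a user's effective membership is the transitive closure of the groups they belong to directly. A minimal sketch of that expansion, assuming the directory has already been read into a plain mapping from each group to the groups it is nested inside (the `parents` mapping and group names are hypothetical; this is not Wardengate's sync code):

```python
from collections import deque


def effective_groups(direct, parents):
    """Return every group reachable from a user's direct memberships.

    direct  -- iterable of group names the user belongs to directly
    parents -- dict mapping a group to the groups it is nested inside
    """
    seen = set()
    queue = deque(direct)
    while queue:
        group = queue.popleft()
        if group in seen:
            continue  # already expanded; also guards against membership cycles
        seen.add(group)
        queue.extend(parents.get(group, ()))
    return seen


# Hypothetical directory with a deliberate cycle between sre and engineering.
parents = {"sre-oncall": ["sre"], "sre": ["engineering"], "engineering": ["sre"]}
print(sorted(effective_groups(["sre-oncall"], parents)))
```

The `seen` set matters in practice: real directories occasionally contain circular nesting, and an expansion without it would never terminate.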

IdP configuration (SAML, OIDC, SCIM)

An IdP connection is an IdentityProvider resource. You can define more than one (for example, Okta for employees and a partner-only OIDC connection for contractors) and scope policies to each separately.

apiVersion: wardengate/v1
kind: IdentityProvider
metadata:
  name: corp-okta
spec:
  protocol: saml
  metadataUrl: https://corp.okta.example.com/app/wardengate/sso/metadata
  attributeMapping:
    email: email
    displayName: name
    groups: wardengate_groups
  allowedDomains: ["corp.example.com"]
  scim:
    enabled: true
    tokenSecret: corp-okta-scim-token

Apply it with wgctl apply -f corp-okta.yaml. The admin console shows the provider as pending until the first successful SSO flow. Once a user has signed in, test the SCIM loop: disable that user in the IdP and confirm that their session terminates.
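When a user is disabled, a SCIM 2.0 IdP typically sends a PATCH to the service provider's /Users/{id} resource that sets active to false (RFC 7644). The sketch below only builds that request so you know what to look for in the IdP's provisioning log. The user id is hypothetical, the exact operation shape varies by IdP, and the base URL of the SCIM endpoint is deliberately left out because it is not given here.

```python
import json

PATCH_OP_SCHEMA = "urn:ietf:params:scim:api:messages:2.0:PatchOp"


def scim_deactivate_request(user_id):
    """Build the method, relative path, and JSON body of a SCIM 2.0 deactivation."""
    body = {
        "schemas": [PATCH_OP_SCHEMA],
        "Operations": [{"op": "replace", "path": "active", "value": False}],
    }
    return "PATCH", f"/Users/{user_id}", json.dumps(body)


method, path, body = scim_deactivate_request("2819c223")  # hypothetical id
print(method, path)
print(body)
```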

Policy authoring

Policies are the contract between identity and access. Keep them small, composable, and expressed in terms of groups — never in terms of individual users. A small pattern library goes a long way:

One baseline policy per environment (prod, staging, dev) granting read-only access to a named group.
One write policy per environment, approval-gated, with a shorter session cap.
Targeted break-glass policies bound to on-call rotations from your incident management tool.

apiVersion: wardengate/v1
kind: Policy
metadata:
  name: prod-ssh-write
spec:
  principals:
    groups: ["sre"]
  targets:
    tags: ["env=prod"]
  protocols: ["ssh"]
  accounts: ["deploy", "root"]
  constraints:
    mfa: step-up
    approval:
      required: true
      approvers: ["group:sre-leads"]
      quorum: 1
      window: 30m
    hours: "mon-fri 08:00-20:00 America/Los_Angeles"
    sessionMinutes: 45
  recording:
    mode: full
    redact: ["AWS_SECRET_ACCESS_KEY", "DATABASE_URL"]
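To reason about which requests a policy such as prod-ssh-write covers, it can help to model the selector fields as a predicate. This is a sketch under stated assumptions, not the product's evaluator: it assumes a user needs any one of the listed groups and a target must carry every listed tag, and the `Request` type is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    groups: frozenset       # groups the user holds, as asserted by the IdP
    target_tags: frozenset  # tags on the target, e.g. {"env=prod", "role=web"}
    protocol: str
    account: str


def policy_applies(spec, req):
    """True when the request falls inside the policy's four selectors."""
    return (
        bool(req.groups & set(spec["principals"]["groups"]))  # any listed group
        and set(spec["targets"]["tags"]) <= req.target_tags    # every listed tag
        and req.protocol in spec["protocols"]
        and req.account in spec["accounts"]
    )


# Mirrors the selector fields of prod-ssh-write.
spec = {
    "principals": {"groups": ["sre"]},
    "targets": {"tags": ["env=prod"]},
    "protocols": ["ssh"],
    "accounts": ["deploy", "root"],
}
req = Request(frozenset({"sre"}), frozenset({"env=prod", "role=web"}), "ssh", "deploy")
print(policy_applies(spec, req))
```

Constraints such as MFA, approval, and hours would be checked only after this predicate matches.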

Target onboarding (SSH, RDP, databases)

A target is anything Wardengate brokers a session to. Each target declares its protocol, network address, and the accounts available for brokered use. Accounts can be local credentials stored in the Wardengate vault or references to an external secrets manager.

SSH

wgctl target create \
  --name web-01.prod \
  --protocol ssh \
  --address 10.10.4.20:22 \
  --account deploy:vault/ssh/deploy-prod \
  --tag env=prod --tag role=web

RDP

wgctl target create \
  --name win-dc01.prod \
  --protocol rdp \
  --address 10.10.2.5:3389 \
  --account "CORP\\svc_admin:vault/win/dc-admin" \
  --tag env=prod --tag role=dc

Database

wgctl target create \
  --name billing-ro.prod \
  --protocol postgres \
  --address billing.prod.rds.example:5432 \
  --account readonly:vault/pg/billing-ro \
  --tag env=prod --tag data=pii
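All three examples pass --account as a login and a secret reference joined by a colon. If you generate these commands from an inventory, a small validator catches malformed values before they reach wgctl. The sketch assumes the format is exactly login:secret-ref, split on the first colon, which is inferred from the examples here and not from a published grammar; `AccountRef` is a hypothetical name.

```python
from typing import NamedTuple


class AccountRef(NamedTuple):
    login: str       # account name presented to the target
    secret_ref: str  # where the credential lives, e.g. a vault path


def parse_account(spec):
    """Split 'login:secret-ref' on the first colon, rejecting empty halves."""
    login, sep, secret_ref = spec.partition(":")
    if not sep or not login or not secret_ref:
        raise ValueError(f"expected 'login:secret-ref', got {spec!r}")
    return AccountRef(login, secret_ref)


print(parse_account("deploy:vault/ssh/deploy-prod"))
print(parse_account(r"CORP\svc_admin:vault/win/dc-admin"))
```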

Session recording

Recording is configured per policy, not globally. The three modes are off, metadata (connect/disconnect, commands attempted), and full (keystroke and screen capture for interactive protocols; wire capture for databases). Recordings are encrypted client-side with per-session keys and written to object storage.

Configure redaction rules on the policy to scrub matching content before a recording is persisted. Recordings are immutable once sealed; the console can produce a signed manifest for a given time range, suitable for handing to auditors.
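How redact entries are matched is not specified here. One plausible reading, given entries like AWS_SECRET_ACCESS_KEY and DATABASE_URL, is that each names a variable whose assigned value should be masked. The sketch below illustrates that reading only; `build_redactor` and the mask string are hypothetical.

```python
import re


def build_redactor(names, mask="[REDACTED]"):
    """Return a function that masks the value in NAME=value or NAME: value pairs."""
    if not names:
        return lambda text: text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in names) + r")(\s*[=:]\s*)(\S+)"
    )
    # Keep the name and separator, replace only the value.
    return lambda text: pattern.sub(lambda m: m.group(1) + m.group(2) + mask, text)


redact = build_redactor(["AWS_SECRET_ACCESS_KEY", "DATABASE_URL"])
print(redact("export DATABASE_URL=postgres://u:p@db/billing && ls"))
```

A rule of this kind only catches secrets that appear next to their variable name; a value pasted on its own line would pass through, so redaction complements rather than replaces keeping secrets out of interactive sessions.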

Approval workflows

Approval-gated policies produce an approval request when a user initiates a connection. Requests can be routed to Slack, Microsoft Teams, PagerDuty, or email; approvers click through to the admin console, see the session context, and approve or deny. Approvals carry their own audit trail tied to the resulting session.
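The lifecycle of an approval request can be pictured as a small state function over the decisions received so far. This sketch rests on assumptions the text does not confirm: that quorum counts distinct approvers, that a single denial rejects the request, and that the policy's window is the time approvers have to respond (it could equally be how long a granted approval stays valid).

```python
from datetime import datetime, timedelta


def approval_state(requested_at, decisions, quorum, window, now):
    """Return 'approved', 'denied', 'expired', or 'pending'.

    decisions -- list of (approver, verdict, decided_at); verdict is 'approve' or 'deny'
    """
    deadline = requested_at + window
    approvers = set()
    for approver, verdict, decided_at in sorted(decisions, key=lambda d: d[2]):
        if decided_at > deadline:
            break  # decisions after the window closes do not count
        if verdict == "deny":
            return "denied"
        approvers.add(approver)  # a set, so one person cannot satisfy quorum twice
        if len(approvers) >= quorum:
            return "approved"
    return "expired" if now > deadline else "pending"


t0 = datetime(2024, 1, 1, 9, 0)
half_hour = timedelta(minutes=30)
decisions = [("ana", "approve", t0 + timedelta(minutes=5))]
print(approval_state(t0, decisions, 1, half_hour, t0 + timedelta(minutes=6)))
```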

apiVersion: wardengate/v1
kind: ApprovalRoute
metadata:
  name: sre-leads-slack
spec:
  channel: slack
  target: "#sre-approvals"
  webhookSecret: slack-approvals-hook
  fallback:
    channel: email
    target: "sre-leads@example.com"
    afterMinutes: 10
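The fallback block reads as an escalation rule: if the request is still open after afterMinutes, notify a second destination. A minimal sketch of that rule, assuming the fallback is additive (the primary notification is not withdrawn) and that the route is available as a plain dictionary shaped like the spec here:

```python
def routes_to_notify(route, minutes_waiting):
    """Return the (channel, target) pairs that should have been notified so far."""
    notified = [(route["channel"], route["target"])]
    fallback = route.get("fallback")
    if fallback and minutes_waiting >= fallback["afterMinutes"]:
        notified.append((fallback["channel"], fallback["target"]))
    return notified


route = {
    "channel": "slack",
    "target": "#sre-approvals",
    "fallback": {"channel": "email", "target": "sre-leads@example.com", "afterMinutes": 10},
}
print(routes_to_notify(route, 3))
print(routes_to_notify(route, 12))
```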

High availability

A highly available Wardengate deployment has three control plane replicas spread across failure domains, two or more gateway replicas per region, and a Postgres instance with synchronous replication. Control plane replicas keep no local state: everything lives in Postgres and object storage, so replicas can be added or lost freely.

Gateways keep no persistent state and can be recycled without draining: active sessions are cut, and clients reconnect through whichever gateway the load balancer sends them to next. For graceful maintenance, drain a gateway with wgctl gateway drain <name>; it refuses new sessions and lets in-flight ones finish.
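The difference between recycling and draining comes down to one flag and a wait. The toy model below shows the drain semantics described here (refuse new sessions, let existing ones finish); the `Gateway` class and its method names are hypothetical and exist only to make the behaviour concrete.

```python
class Gateway:
    def __init__(self, name):
        self.name = name
        self.draining = False
        self.sessions = set()

    def accept(self, session_id):
        """Admit a new session unless the gateway is draining."""
        if self.draining:
            return False  # the load balancer should try another gateway
        self.sessions.add(session_id)
        return True

    def close(self, session_id):
        self.sessions.discard(session_id)

    def drain(self):
        """Stop admitting sessions; existing ones keep running."""
        self.draining = True

    @property
    def safe_to_stop(self):
        return self.draining and not self.sessions


gw = Gateway("gw-1")
gw.accept("s1")
gw.drain()
print(gw.accept("s2"), gw.safe_to_stop)  # new session refused, s1 still running
gw.close("s1")
print(gw.safe_to_stop)
```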

Backup and restore

Back up Postgres with whatever tool you use for your other stateful services; that covers policies, identities, audit metadata, and the encrypted vault. Recordings live in object storage and should be protected by bucket replication, not by the control plane. To validate a backup, restore it into a scratch cluster and point a gateway at it; the wgctl system verify command walks every policy and target and reports any dangling references.
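What counts as a dangling reference is not spelled out here. As an illustration of the kind of check such a walk might perform, the sketch below flags policies whose tag selector matches no target and policy accounts that no matched target defines. The data shapes and the `dangling_references` function are hypothetical and are not the output format of wgctl system verify.

```python
def dangling_references(policies, targets):
    """Yield (policy_name, problem) for selectors that resolve to nothing."""
    for policy in policies:
        wanted = set(policy["tags"])
        matched = [t for t in targets if wanted <= set(t["tags"])]
        if not matched:
            yield policy["name"], "no target carries tags " + ", ".join(sorted(wanted))
            continue
        offered = {a for t in matched for a in t["accounts"]}
        for account in policy["accounts"]:
            if account not in offered:
                yield policy["name"], f"account {account!r} not defined on any matched target"


policies = [{"name": "prod-ssh-write", "tags": ["env=prod"], "accounts": ["deploy", "root"]}]
targets = [{"name": "web-01.prod", "tags": ["env=prod", "role=web"], "accounts": ["deploy"]}]
for name, problem in dangling_references(policies, targets):
    print(f"{name}: {problem}")
```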

Observability

The control plane exposes Prometheus metrics on /metrics, OpenTelemetry traces for every API request, and a structured audit stream you can forward to a SIEM (Splunk, Sumo Logic, Elastic, and generic syslog are supported out of the box). Gateways publish their own metric set covering session count, policy evaluation latency, and protocol-level errors.
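For a quick health check without a full Prometheus stack, the text exposition format served on /metrics is simple enough to parse by hand. The sketch below assumes samples carry no trailing timestamp, and the metric names in the sample are hypothetical; check your own /metrics output for the real ones.

```python
def parse_metrics(text):
    """Parse Prometheus text exposition lines into {series: float}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, value = line.rsplit(None, 1)  # the value is the last field
        samples[series] = float(value)
    return samples


# Hypothetical scrape output.
sample = """\
# HELP gateway_sessions_active Sessions currently brokered
# TYPE gateway_sessions_active gauge
gateway_sessions_active{protocol="ssh"} 12
gateway_sessions_active{protocol="rdp"} 3
"""
metrics = parse_metrics(sample)
print(sum(metrics.values()))
```

For anything beyond ad hoc inspection, a maintained client library or a real Prometheus scrape is the better choice, since the full format also allows timestamps and escaped label values.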