Alerts

Alerts are the opinionated half of the audit stream: the subset of events that warrant waking somebody up, paging a rotation, or opening a ticket. Wardengate ships with a bundle of default rules tuned against the things that go wrong at 3 AM, and a small rule DSL so you can layer your own.

Built-in alerts

The shipped rules are enabled on a fresh install and tuned to fire only when there is real work to do. Each rule carries a default severity, a default destination, and a runbook link. You can override any of those without forking the rule.

Rule                        Fires on                                                     Default severity
policy-denied-burst         > 20 policy.denied from one actor in 5 min                   warning
approval-timeout            An approval request expired without a decision               info
recording-storage-degraded  Object-storage write errors > 1% over 5 min, or spool > 80%  critical
idp-sync-failure            Any idp.sync_failed, grouped per provider                    warning
license-expiring            30, 14, 7, 1 days before license expiry                      warning
gateway-unhealthy           Any gateway offline > 2 heartbeats                           critical
audit-chain-break           Hash-chain verification failure                              critical
siem-sink-dlq               Dead-letter queue for any sink > 100 events                  warning
command-filter-critical     command.denied with severity=critical                        critical

Custom rules

An AlertRule is a query against the event stream plus a threshold plus a destination. Rules run inside the control plane — the same events that feed SIEM feed these — so there is no external alert engine to keep in sync.

apiVersion: wardengate/v1
kind: AlertRule
metadata:
  name: offhours-prod-ssh
spec:
  query: |
    event_type = "session.opened"
    AND target.tags.env = "prod"
    AND target.protocol = "ssh"
    AND hour_of_day NOT IN 08..20
    AND weekday IN (mon, tue, wed, thu, fri)
  window: 5m
  threshold: { count: 1 }
  severity: warning
  title: "Off-hours prod SSH session by {{ .actor.email }}"
  description: |
    {{ .actor.email }} opened an SSH session to {{ .target.name }}
    at {{ .ts | local }} — outside business hours.
  runbook: "https://runbooks.example.com/wardengate/offhours-prod-ssh"
  destinations:
    - slack:sec-oncall
    - pagerduty:wardengate-soc
  silence:
    dedupeKey: "{{ .actor.email }}-{{ .target.name }}"
    for: 30m
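
To make the query's semantics concrete, the predicate can be restated in Python. This is only an illustrative sketch, not Wardengate's evaluator: the function name matches_offhours_prod_ssh is hypothetical, 08..20 is assumed to include both endpoints, and the timestamp is assumed to be already converted to the timezone the rule evaluates in.

```python
from datetime import datetime

BUSINESS_HOURS = range(8, 21)  # assumption: 08..20 includes hour 8 and hour 20
WEEKDAYS = range(0, 5)         # datetime.weekday(): Monday=0 .. Friday=4


def matches_offhours_prod_ssh(event: dict, local_ts: datetime) -> bool:
    """Return True when the event satisfies every clause of the example query."""
    target = event.get("target", {})
    return (
        event.get("event_type") == "session.opened"
        and target.get("tags", {}).get("env") == "prod"
        and target.get("protocol") == "ssh"
        and local_ts.hour not in BUSINESS_HOURS
        and local_ts.weekday() in WEEKDAYS
    )
```

Read this way, the weekday clause means weekend sessions never match the rule as written. Drop that clause if weekend access to prod should alert as well.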

Query DSL

The DSL is SQL-flavoured and field-typed. Nested envelope fields are accessed with dot notation (actor.email, target.tags.env, context.rule). Time helpers evaluate in the event's source timezone by default, or in the cluster timezone when the rule declares one.

  • Boolean: AND, OR, NOT.
  • Comparison: =, !=, >, <, IN, NOT IN, MATCHES (regex), EXISTS.
  • Aggregations: count_by, rate, distinct_count.
  • Time helpers: hour_of_day, weekday, in_window(…).

# More than 5 denies in 60s from any single actor
query: |
  event_type = "policy.denied"
  COUNT_BY actor.email OVER 60s > 5

# Any recording export by someone outside the auditors group
query: |
  event_type = "recording.exported"
  AND NOT "auditors" IN actor.groups
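
The COUNT_BY ... OVER ... clause in the first example behaves like a per-key sliding window. The sketch below shows one way such a window could be evaluated. It is not Wardengate's implementation: the class name CountByWindow is hypothetical, timestamps are plain seconds, and events are assumed to arrive in timestamp order.

```python
from collections import defaultdict, deque


class CountByWindow:
    """Count events per key over a trailing window and test a '>' threshold."""

    def __init__(self, window_seconds: float, threshold: int):
        self.window = window_seconds
        self.threshold = threshold
        self._seen = defaultdict(deque)  # key -> timestamps still inside the window

    def observe(self, key: str, ts: float) -> bool:
        """Record one event; return True if the key now exceeds the threshold."""
        timestamps = self._seen[key]
        timestamps.append(ts)
        # Drop timestamps that have aged out of the trailing window.
        while timestamps and timestamps[0] <= ts - self.window:
            timestamps.popleft()
        return len(timestamps) > self.threshold
```

With CountByWindow(60, 5), the sixth policy.denied event from the same actor inside 60 seconds is the first one to return True, which matches the strict > 5 comparison in the query.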

Destinations

Alert destinations are declared separately from rules so you can swap “page the primary” and “post to #sec-oncall” without editing every rule. A destination is referred to by a short ID like pagerduty:wardengate-soc.

apiVersion: wardengate/v1
kind: AlertDestination
metadata:
  name: wardengate-soc
spec:
  kind: pagerduty
  integrationKeySecret: pd-wardengate-soc
  severityMap:
    critical: critical
    warning: warning
    info: info
---
apiVersion: wardengate/v1
kind: AlertDestination
metadata:
  name: sec-oncall
spec:
  kind: slack
  channel: "#sec-oncall"
  webhookSecret: slack-sec-oncall
  mention:
    onSeverity: [critical]
    userGroup: "sec-leads"
---
apiVersion: wardengate/v1
kind: AlertDestination
metadata:
  name: corp-opsgenie
spec:
  kind: opsgenie
  apiKeySecret: opsgenie-api
  team: security
  region: us
---
apiVersion: wardengate/v1
kind: AlertDestination
metadata:
  name: corp-webhook
spec:
  kind: webhook
  url: https://corp.example.com/alerts/wardengate
  signing:
    algorithm: hmac-sha256
    secret: wg-alert-hmac
    header: X-Wardengate-Signature
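
A receiver for the webhook destination should check the signature header before trusting a payload. The sketch below shows one common way to do that with HMAC-SHA256. It assumes the header carries the hex-encoded digest of the raw request body, which this page does not specify, and the helper names sign and verify are hypothetical.

```python
import hashlib
import hmac


def sign(body: bytes, secret: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw body (encoding is an assumption)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(body: bytes, header_value: str, secret: bytes) -> bool:
    """Compare the received X-Wardengate-Signature value in constant time."""
    expected = sign(body, secret)
    return hmac.compare_digest(expected.encode(), header_value.encode())
```

Verify against the bytes exactly as received, before any JSON parsing or re-serialisation, because even a whitespace change alters the digest.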

Email is supported but deliberately not the default — email is a bad place for “wake the on-call” and a mediocre place for long-lived context. Use it for daily digests, not for pages.

Severity levels

  • info — reaches a channel or digest. No paging. Use for approval timeouts, license-approaching milestones, and successful admin actions worth surfacing.
  • warning — reaches a channel and, on sustained condition, pages a daytime rotation. Use for most operational signals.
  • critical — pages immediately. Reserve for integrity failures, recording storage loss, and signals that imply an attacker.

Silence and snooze

Silences mute matching alerts for a bounded window. They are first-class resources — not a per-destination mute — so a silence applies regardless of which destination the alert would have gone to. This matters when your SIEM and your paging provider would otherwise double up.

wgctl alerts silence add \
  --match "rule=offhours-prod-ssh AND actor.email=dan@corp" \
  --until "2026-04-21T06:00:00Z" \
  --reason "scheduled overnight maintenance INC-4420"

silence  sil_01HZ8F... created  expires 2026-04-21T06:00:00Z

Every silence is audit-logged, carries its author and reason, and automatically expires. Silences cannot be made permanent — an alert that needs to go away forever belongs either in the rule (disable it) or in the destination (route it elsewhere).
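
The properties described above (one check ahead of every destination, a mandatory expiry, and a recorded author and reason) can be sketched as follows. This is a simplified model under stated assumptions, not the product's data model: the Silence class and is_silenced function are hypothetical, and matching is reduced to exact equality on a flat dictionary of alert labels.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Silence:
    match: dict      # label -> required value, e.g. {"rule": "offhours-prod-ssh"}
    until: datetime  # required field: a silence always has an expiry
    author: str
    reason: str

    def applies(self, alert: dict, now: datetime) -> bool:
        """True while the silence is unexpired and every label matches."""
        if now >= self.until:
            return False
        return all(alert.get(key) == value for key, value in self.match.items())


def is_silenced(alert: dict, silences: list, now: datetime) -> bool:
    """Checked once, before fan-out, so it covers every destination."""
    return any(s.applies(alert, now) for s in silences)
```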

Runbook links

Every alert payload includes a runbook URL, rendered into the payload using the template fields from the matched event. This is the single highest-value field on an alert — the responder should never have to go hunting for context. For built-in rules the default URL points to the public Wardengate runbook page; override it to point at your own wiki with the specifics your team cares about.
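
As a rough illustration of how template fields from the matched event end up in a rendered string, the sketch below substitutes dotted placeholders such as {{ .actor.email }}. It is a hypothetical stand-in for whatever template engine Wardengate actually uses: the render function is not a Wardengate API, and piped forms such as {{ .ts | local }} are intentionally left untouched.

```python
import re

# Matches simple placeholders such as {{ .actor.email }}. Piped expressions
# like {{ .ts | local }} do not match and are left as-is by this sketch.
_PLACEHOLDER = re.compile(r"\{\{\s*\.([A-Za-z0-9_.]+)\s*\}\}")


def render(template: str, event: dict) -> str:
    """Replace each dotted placeholder with the value found in the event."""

    def lookup(match: re.Match) -> str:
        value = event
        for part in match.group(1).split("."):
            value = value[part]  # raises KeyError if the field is missing
        return str(value)

    return _PLACEHOLDER.sub(lookup, template)
```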

Alert-to-ticket automation

For low-severity alerts that should leave a paper trail but do not need a page, attach a ticket destination. Wardengate supports Jira, ServiceNow, GitHub Issues, and Linear out of the box. The first firing opens a ticket; follow-up firings within the silence window comment on the existing ticket instead of flooding the queue.

apiVersion: wardengate/v1
kind: AlertDestination
metadata:
  name: corp-jira-sec
spec:
  kind: jira
  endpoint: https://corp.atlassian.net
  project: SEC
  issueType: "Security alert"
  authSecret: jira-api-token
  titleTemplate: "[WG] {{ .rule }} - {{ .actor.email }}"
  bodyTemplate: |
    {{ .description }}

    **Actor:** {{ .actor.email }}
    **Target:** {{ .target.name }}
    **Time:** {{ .ts }}
    **Runbook:** {{ .runbook }}
  updatePolicy: commentWhileSilenced

The ticket ID is written back onto the firing record, so the next time the same dedupe key fires, Wardengate updates the same issue and can close it when the underlying condition clears.
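
The open-then-comment behaviour can be modelled in a few lines. The sketch below is a simplified in-memory model, not the Jira integration itself: TicketTracker, its methods, and the generated SEC-n identifiers are all hypothetical, and the window is measured from the firing that opened the ticket.

```python
from dataclasses import dataclass, field


@dataclass
class Ticket:
    ticket_id: str
    title: str
    comments: list = field(default_factory=list)
    is_open: bool = True


class TicketTracker:
    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self.tickets: dict[str, Ticket] = {}
        # dedupe key -> (ticket ID written back on first firing, time it opened)
        self._by_key: dict[str, tuple[str, float]] = {}

    def fire(self, dedupe_key: str, title: str, ts: float) -> str:
        """Open a ticket on first firing; comment on it while inside the window."""
        record = self._by_key.get(dedupe_key)
        if record is not None:
            ticket_id, opened_at = record
            if ts - opened_at < self.window:
                self.tickets[ticket_id].comments.append(f"fired again at t={ts}")
                return ticket_id
        ticket_id = f"SEC-{len(self.tickets) + 1}"
        self.tickets[ticket_id] = Ticket(ticket_id, title)
        self._by_key[dedupe_key] = (ticket_id, ts)
        return ticket_id

    def clear(self, dedupe_key: str) -> None:
        """Close the tracked ticket once the underlying condition clears."""
        record = self._by_key.pop(dedupe_key, None)
        if record is not None:
            self.tickets[record[0]].is_open = False
```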

Testing a rule before rollout

Every rule supports dryRun: true, which evaluates matches against live events but suppresses the notification. Combine it with wgctl alerts test to back-test against the last N days from the audit store. The output tells you how many times the rule would have fired, which actors would have triggered it, and which hours of the day concentrate the hits.

wgctl alerts test offhours-prod-ssh --since 30d

matches     47
actors      9 distinct
busiest day Mon 2026-04-06 (11 hits, mostly dan@corp)
suggest     consider adding dan@corp to the on-call exception list
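
If you export the matching events and want the same kind of breakdown in your own tooling, the aggregation is simple to reproduce. The sketch below is an assumption-laden illustration rather than what wgctl does internally: the summarize function and its (actor, timestamp) input shape are hypothetical.

```python
from collections import Counter
from datetime import datetime


def summarize(matches: list[tuple[str, datetime]]) -> dict:
    """Summarise back-test hits given (actor_email, timestamp) pairs."""
    by_actor = Counter(actor for actor, _ in matches)
    by_day = Counter(ts.date() for _, ts in matches)
    by_hour = Counter(ts.hour for _, ts in matches)
    return {
        "matches": len(matches),
        "distinct_actors": len(by_actor),
        # (value, hit count) pairs, or None when there were no matches
        "busiest_day": by_day.most_common(1)[0] if by_day else None,
        "top_actor": by_actor.most_common(1)[0] if by_actor else None,
        "hits_by_hour": dict(sorted(by_hour.items())),
    }
```

A rule whose hits cluster on one actor or one hour is usually a candidate for a narrower query or an exception, rather than a louder destination.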

Next steps

  • Notifications — the lower-intensity cousin of alerts: approvals, digests, report deliveries.
  • SIEM export — the raw event stream these rules run on top of.
  • Audit & reporting — where every alert firing also lands as a permanent record.