Alerts
Alerts are the opinionated half of the audit stream: the subset of events that warrant waking somebody up, paging a rotation, or opening a ticket. Wardengate ships with a bundle of default rules tuned against the things that go wrong at 3 AM, and a small rule DSL so you can layer your own.
Built-in alerts
The shipped rules are enabled on a fresh install and tuned to fire only when there is real work to do. Each rule carries a default severity, a default destination, and a runbook link. You can override any of those without forking the rule.
| Rule | Fires on | Default severity |
|---|---|---|
| policy-denied-burst | > 20 policy.denied from one actor in 5 min | warning |
| approval-timeout | An approval request expired without a decision | info |
| recording-storage-degraded | Object-storage write errors > 1% over 5 min, or spool > 80% | critical |
| idp-sync-failure | Any idp.sync_failed, grouped per provider | warning |
| license-expiring | 30, 14, 7, and 1 days before license expiry | warning |
| gateway-unhealthy | Any gateway offline for more than 2 heartbeat intervals | critical |
| audit-chain-break | Hash-chain verification failure | critical |
| siem-sink-dlq | Dead-letter queue for any sink > 100 events | warning |
| command-filter-critical | command.denied with severity=critical | critical |
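The policy-denied-burst row describes a per-actor count over a time window. One plausible reading is a sliding window, sketched below in Python under stated assumptions: events arrive in timestamp order as (seconds, actor) pairs, and the rule fires once an actor has more than 20 policy.denied events inside the trailing 5 minutes. BurstDetector is a hypothetical name, not Wardengate's implementation.

```python
from collections import defaultdict, deque


class BurstDetector:
    """Hypothetical per-actor sliding-window counter (not Wardengate's code).

    Fires when one actor produces more than `limit` events within the
    trailing `window_s` seconds, mirroring the policy-denied-burst row.
    """

    def __init__(self, limit: int = 20, window_s: float = 300.0) -> None:
        self.limit = limit
        self.window_s = window_s
        self._seen: dict[str, deque[float]] = defaultdict(deque)

    def observe(self, ts: float, actor: str) -> bool:
        """Record one policy.denied event; return True if the actor is over the limit."""
        q = self._seen[actor]
        q.append(ts)
        # Drop timestamps that have slid out of the (ts - window_s, ts] window.
        while q and q[0] <= ts - self.window_s:
            q.popleft()
        return len(q) > self.limit


if __name__ == "__main__":
    det = BurstDetector()
    # 21 denies from one actor, one every 10 seconds: the 21st crosses "> 20".
    fired = [det.observe(i * 10.0, "dan@corp") for i in range(21)]
    print(fired.index(True))  # 20
```

Keeping one deque per actor means each check only discards expired entries instead of rescanning the whole window.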
Custom rules
An AlertRule combines a query against the event stream, a threshold, and one or more destinations. Rules run inside the control plane and evaluate the same events that feed SIEM export, so there is no external alert engine to keep in sync.
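The query-plus-threshold-plus-destination model can be illustrated with a small Python sketch that evaluates one rule over a single evaluation window. It makes simplifying assumptions: events are nested dicts addressed by dotted paths (as in the DSL), the query is a Python predicate standing in for the DSL, and threshold: { count: N } is read as "at least N matches". Rule, get_field, and evaluate are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


def get_field(event: dict, path: str) -> Any:
    """Resolve a dotted path such as "target.tags.env"; return None if any segment is missing."""
    cur: Any = event
    for part in path.split("."):
        if not isinstance(cur, dict) or part not in cur:
            return None
        cur = cur[part]
    return cur


@dataclass
class Rule:
    """Hypothetical stand-in for an AlertRule: a predicate, a count threshold, destinations."""

    name: str
    query: Callable[[dict], bool]
    threshold_count: int
    destinations: list[str] = field(default_factory=list)


def evaluate(rule: Rule, window_events: list[dict]) -> list[str]:
    """Return the destinations to notify if at least threshold_count events match, else []."""
    matches = [e for e in window_events if rule.query(e)]
    if len(matches) >= rule.threshold_count:
        return list(rule.destinations)
    return []


if __name__ == "__main__":
    prod_ssh = Rule(
        name="prod-ssh-opened",
        query=lambda e: (
            get_field(e, "event_type") == "session.opened"
            and get_field(e, "target.tags.env") == "prod"
            and get_field(e, "target.protocol") == "ssh"
        ),
        threshold_count=1,
        destinations=["pagerduty:wardengate-soc"],
    )
    window = [
        {"event_type": "session.opened", "target": {"tags": {"env": "prod"}, "protocol": "ssh"}},
        {"event_type": "session.opened", "target": {"tags": {"env": "dev"}, "protocol": "ssh"}},
    ]
    print(evaluate(prod_ssh, window))  # ['pagerduty:wardengate-soc']
```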
```yaml
apiVersion: wardengate/v1
kind: AlertRule
metadata:
  name: offhours-prod-ssh
spec:
  query: |
    event_type = "session.opened"
    AND target.tags.env = "prod"
    AND target.protocol = "ssh"
    AND hour_of_day NOT IN 08..20
    AND weekday IN (mon, tue, wed, thu, fri)
  window: 5m
  threshold: { count: 1 }
  severity: warning
  title: "Off-hours prod SSH session by {{ .actor.email }}"
  description: |
    {{ .actor.email }} opened an SSH session to {{ .target.name }}
    at {{ .ts | local }}, outside business hours.
  runbook: "https://runbooks.example.com/wardengate/offhours-prod-ssh"
  destinations:
    - slack:sec-oncall
    - pagerduty:wardengate-soc
  silence:
    dedupeKey: "{{ .actor.email }}-{{ .target.name }}"
    for: 30m
```

Query DSL
The DSL is SQL-flavoured and field-typed. Nested envelope fields are accessed with dot notation (actor.email, target.tags.env, context.rule). Time is expressed in the source timezone of the event or the cluster timezone when a rule declares one.
- Boolean: AND, OR, NOT.
- Comparison: =, !=, >, <, IN, NOT IN, MATCHES(regex), EXISTS.
- Aggregations: count_by, rate, distinct_count.
- Time helpers: hour_of_day, weekday, in_window(…).
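The time helpers decide the off-hours condition in the example rule above. The Python sketch below shows one reading of that condition; it assumes 08..20 is an inclusive hour range and takes the evaluation timezone as an explicit argument, standing in for the event or cluster timezone described above. The helper names mirror the DSL but are hypothetical Python functions.

```python
from datetime import datetime, timedelta, timezone, tzinfo

WEEKDAYS = ("mon", "tue", "wed", "thu", "fri", "sat", "sun")  # index matches datetime.weekday()


def hour_of_day(ts: datetime, tz: tzinfo) -> int:
    """Hour (0-23) of an aware timestamp in the given timezone."""
    return ts.astimezone(tz).hour


def weekday(ts: datetime, tz: tzinfo) -> str:
    """Lowercase three-letter weekday of an aware timestamp in the given timezone."""
    return WEEKDAYS[ts.astimezone(tz).weekday()]


def is_offhours_weekday(ts: datetime, tz: tzinfo) -> bool:
    """hour_of_day NOT IN 08..20 AND weekday IN (mon..fri), reading 08..20 as inclusive."""
    in_business_hours = 8 <= hour_of_day(ts, tz) <= 20
    return (not in_business_hours) and weekday(ts, tz) in WEEKDAYS[:5]


if __name__ == "__main__":
    ts = datetime(2026, 4, 6, 2, 30, tzinfo=timezone.utc)  # Monday 02:30 UTC
    print(is_offhours_weekday(ts, timezone.utc))  # True: Monday night in UTC
    print(is_offhours_weekday(ts, timezone(timedelta(hours=-4))))  # False: still Sunday at UTC-4
```

The same instant can flip in or out of the condition depending on the timezone, which is why the choice of event or cluster timezone matters for rules like this one.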
```yaml
# More than 5 denies in 60s from any single actor
query: |
  event_type = "policy.denied"
  COUNT_BY actor.email OVER 60s > 5
```

```yaml
# Any recording export by someone outside the auditors group
query: |
  event_type = "recording.exported"
  AND NOT "auditors" IN actor.groups
```

Destinations
Alert destinations are declared separately from rules so you can swap “page the primary” and “post to #sec-oncall” without editing every rule. A destination is referred to by a short ID like pagerduty:wardengate-soc.
```yaml
apiVersion: wardengate/v1
kind: AlertDestination
metadata:
  name: wardengate-soc
spec:
  kind: pagerduty
  integrationKeySecret: pd-wardengate-soc
  severityMap:
    critical: critical
    warning: warning
    info: info
---
apiVersion: wardengate/v1
kind: AlertDestination
metadata:
  name: sec-oncall
spec:
  kind: slack
  channel: "#sec-oncall"
  webhookSecret: slack-sec-oncall
  mention:
    onSeverity: [critical]
    userGroup: "sec-leads"
---
apiVersion: wardengate/v1
kind: AlertDestination
metadata:
  name: corp-opsgenie
spec:
  kind: opsgenie
  apiKeySecret: opsgenie-api
  team: security
  region: us
---
apiVersion: wardengate/v1
kind: AlertDestination
metadata:
  name: corp-webhook
spec:
  kind: webhook
  url: https://corp.example.com/alerts/wardengate
  signing:
    algorithm: hmac-sha256
    secret: wg-alert-hmac
    header: X-Wardengate-Signature
```

Email is supported but deliberately not the default: it is a poor channel for waking the on-call and a mediocre one for long-lived context. Use it for daily digests, not for pages.
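On the receiving side, the signing block of corp-webhook implies the endpoint should verify X-Wardengate-Signature before trusting a payload. This page does not specify the signature encoding, so the Python sketch below assumes it is a hex-encoded HMAC-SHA256 of the raw request body, keyed with the value stored in wg-alert-hmac; confirm the real format before relying on it. sign and verify_signature are hypothetical helpers.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw request body (assumed signature format)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, header_value: str) -> bool:
    """Check an X-Wardengate-Signature value in constant time.

    Compares bytes rather than str so unexpected non-ASCII input returns False
    instead of raising TypeError.
    """
    expected = sign(secret, body).encode("ascii")
    received = header_value.strip().lower().encode("utf-8")
    return hmac.compare_digest(expected, received)


if __name__ == "__main__":
    secret = b"example-secret"  # placeholder for the value stored in the wg-alert-hmac secret
    body = b'{"rule":"audit-chain-break","severity":"critical"}'  # made-up payload
    sig = sign(secret, body)
    print(verify_signature(secret, body, sig))  # True
    print(verify_signature(secret, body + b" ", sig))  # False: body was altered
```

Verify against the raw bytes before any JSON parsing: re-serialised JSON rarely reproduces the exact bytes that were signed.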
Severity levels
- info: reaches a channel or digest, with no paging. Use for approval timeouts, license-expiry reminders, and successful admin actions worth surfacing.
- warning: reaches a channel and, if the condition persists, pages a daytime rotation. Use for most operational signals.
- critical: pages immediately. Reserve for integrity failures, recording storage loss, and signals that imply an attacker.
Silence and snooze
Silences mute matching alerts for a bounded window. They are first-class resources — not a per-destination mute — so a silence applies regardless of which destination the alert would have gone to. This matters when your SIEM and your paging provider would otherwise double up.
```
wgctl alerts silence add \
  --match "rule=offhours-prod-ssh AND actor.email=dan@corp" \
  --until "2026-04-21T06:00:00Z" \
  --reason "scheduled overnight maintenance INC-4420"
silence sil_01HZ8F... created expires 2026-04-21T06:00:00Z
```

Every silence is audit-logged, carries its author and reason, and expires automatically. Silences cannot be made permanent: an alert that needs to go away forever belongs either in the rule (disable it) or in the destination (route it elsewhere).
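The silence semantics above (always bounded, and checked independently of any destination) can be modelled as follows. This Python sketch supports only the field=value AND field=value form used in the wgctl example, treats alert fields as a flat dict keyed by dotted names, and illustrates the behaviour rather than reproducing Wardengate's matcher; Silence and parse_match are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


def parse_match(expr: str) -> dict[str, str]:
    """Parse "a=b AND c=d" into {"a": "b", "c": "d"}; only equality joined by AND is supported."""
    constraints: dict[str, str] = {}
    for clause in expr.split(" AND "):
        key, sep, value = clause.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"unsupported clause: {clause!r}")
        constraints[key.strip()] = value.strip()
    return constraints


@dataclass
class Silence:
    """Hypothetical silence record: constraints, a mandatory expiry, author, and reason."""

    constraints: dict[str, str]
    until: datetime
    author: str
    reason: str

    def mutes(self, alert: dict[str, str], now: datetime) -> bool:
        """True while unexpired and every constraint matches the alert's flattened fields."""
        if now >= self.until:
            return False  # silences always expire; there is no permanent form
        return all(alert.get(k) == v for k, v in self.constraints.items())


if __name__ == "__main__":
    sil = Silence(
        constraints=parse_match("rule=offhours-prod-ssh AND actor.email=dan@corp"),
        until=datetime(2026, 4, 21, 6, 0, tzinfo=timezone.utc),
        author="oncall@corp",  # hypothetical author
        reason="scheduled overnight maintenance INC-4420",
    )
    alert = {"rule": "offhours-prod-ssh", "actor.email": "dan@corp"}
    # Checked once per alert, before any destination is chosen, so one silence covers all of them.
    print(sil.mutes(alert, datetime(2026, 4, 21, 2, 0, tzinfo=timezone.utc)))  # True
    print(sil.mutes(alert, datetime(2026, 4, 21, 6, 0, tzinfo=timezone.utc)))  # False: expired
```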
Runbook links
Every alert payload includes a runbook URL, rendered into the payload using the template fields from the matched event. This is the single highest-value field on an alert — the responder should never have to go hunting for context. For built-in rules the default URL points to the public Wardengate runbook page; override it to point at your own wiki with the specifics your team cares about.
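The templates on this page use {{ .field.path }} placeholders. Assuming a runbook URL can contain such placeholders, a minimal Python renderer might look like this: it resolves plain dotted references against the matched event and percent-encodes each value, but deliberately leaves pipelines such as {{ .ts | local }} untouched. render_url, lookup, and the ?host= parameter in the example are hypothetical; the page does not describe Wardengate's actual template engine.

```python
import re
from typing import Any
from urllib.parse import quote

# Matches plain field references such as {{ .rule }} or {{ .target.name }};
# pipelines like {{ .ts | local }} do not match and are left untouched.
_PLACEHOLDER = re.compile(r"\{\{\s*\.([A-Za-z_][\w.]*)\s*\}\}")


def lookup(event: dict, path: str) -> Any:
    """Resolve a dotted path in the matched event, raising KeyError if it is missing."""
    cur: Any = event
    for part in path.split("."):
        if not isinstance(cur, dict) or part not in cur:
            raise KeyError(path)
        cur = cur[part]
    return cur


def render_url(template: str, event: dict) -> str:
    """Fill placeholders from the event, percent-encoding each value for use in a URL."""
    return _PLACEHOLDER.sub(
        lambda m: quote(str(lookup(event, m.group(1))), safe=""), template
    )


if __name__ == "__main__":
    event = {"rule": "offhours-prod-ssh", "target": {"name": "db-01 prod"}}
    url = "https://runbooks.example.com/wardengate/{{ .rule }}?host={{ .target.name }}"
    print(render_url(url, event))
    # https://runbooks.example.com/wardengate/offhours-prod-ssh?host=db-01%20prod
```

Raising on a missing field, rather than silently rendering an empty string, keeps a broken runbook link from reaching a responder unnoticed.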
Alert-to-ticket automation
For low-severity alerts that should leave a paper trail but do not need a page, attach a ticket destination. Wardengate supports Jira, ServiceNow, GitHub Issues, and Linear out of the box. The first firing opens a ticket; follow-up firings within the silence window comment on the existing ticket instead of flooding the queue.
```yaml
apiVersion: wardengate/v1
kind: AlertDestination
metadata:
  name: corp-jira-sec
spec:
  kind: jira
  endpoint: https://corp.atlassian.net
  project: SEC
  issueType: "Security alert"
  authSecret: jira-api-token
  titleTemplate: "[WG] {{ .rule }} - {{ .actor.email }}"
  bodyTemplate: |
    {{ .description }}
    **Actor:** {{ .actor.email }}
    **Target:** {{ .target.name }}
    **Time:** {{ .ts }}
    **Runbook:** {{ .runbook }}
  updatePolicy: commentWhileSilenced
```

The ticket ID is written back onto the firing record, so the next time the same dedupe key fires, Wardengate updates the same issue, and it can close that issue when the underlying condition clears.
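The open, comment, and close lifecycle described above can be modelled as a small state machine keyed by the rendered dedupe key. In this Python sketch, TicketTracker and its in-memory store are hypothetical stand-ins for a Jira, ServiceNow, GitHub Issues, or Linear client, the SEC-n ticket IDs are invented, and "the condition clears" is signalled by an explicit clear() call.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Ticket:
    id: str
    title: str
    body: str
    comments: list[str] = field(default_factory=list)
    closed: bool = False


class TicketTracker:
    """Hypothetical in-memory model of the open / comment / close ticket lifecycle."""

    def __init__(self, project: str = "SEC") -> None:
        self._next_number = itertools.count(1)
        self._project = project
        self.tickets: dict[str, Ticket] = {}
        # dedupe key -> ticket ID: the link "written back onto the firing record"
        self._open_by_key: dict[str, str] = {}

    def fire(self, dedupe_key: str, title: str, detail: str) -> Ticket:
        """Open a ticket on the first firing; comment on it for later firings with the same key."""
        ticket_id = self._open_by_key.get(dedupe_key)
        if ticket_id is None:
            ticket_id = f"{self._project}-{next(self._next_number)}"
            self.tickets[ticket_id] = Ticket(ticket_id, title, detail)
            self._open_by_key[dedupe_key] = ticket_id
        else:
            self.tickets[ticket_id].comments.append(detail)
        return self.tickets[ticket_id]

    def clear(self, dedupe_key: str) -> None:
        """Condition cleared: close the linked ticket so a later firing opens a new one."""
        ticket_id = self._open_by_key.pop(dedupe_key, None)
        if ticket_id is not None:
            self.tickets[ticket_id].closed = True


if __name__ == "__main__":
    tracker = TicketTracker()
    key = "dan@corp-db-01"  # a rendered dedupeKey of the form "{{ .actor.email }}-{{ .target.name }}"
    first = tracker.fire(key, "[WG] offhours-prod-ssh - dan@corp", "session at 22:15")
    again = tracker.fire(key, "[WG] offhours-prod-ssh - dan@corp", "session at 23:40")
    print(first.id, again.id, again.comments)  # SEC-1 SEC-1 ['session at 23:40']
    tracker.clear(key)
    print(first.closed)  # True
    print(tracker.fire(key, "[WG] offhours-prod-ssh - dan@corp", "next night").id)  # SEC-2
```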
Testing a rule before rollout
Every rule supports dryRun: true, which evaluates matches against live events but suppresses the notification. Combine it with wgctl alerts test to back-test the rule against the last N days in the audit store: the output reports how many times the rule would have fired, which actors would have triggered it, and when the hits concentrate.
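The back-test summary reduces to a few aggregations over the events the rule would have matched. The Python sketch below assumes those events have already been filtered by the rule query and reduced to (timestamp, actor) pairs, and computes the match count, distinct actors, the busiest calendar day with its top actor, and a per-hour histogram. backtest_summary and the sample data are hypothetical; this is not what wgctl runs internally.

```python
from collections import Counter
from datetime import datetime, timezone


def backtest_summary(hits: list[tuple[datetime, str]]) -> dict:
    """Summarise rule matches: totals, distinct actors, busiest day, and hits per hour."""
    if not hits:
        return {"matches": 0, "actors": 0, "busiest_day": None, "by_hour": {}}
    by_day = Counter(ts.date() for ts, _ in hits)
    busiest, busiest_count = by_day.most_common(1)[0]
    # Top actor on the busiest day, e.g. to suggest an exception-list entry.
    top_actor = Counter(a for ts, a in hits if ts.date() == busiest).most_common(1)[0][0]
    return {
        "matches": len(hits),
        "actors": len({a for _, a in hits}),
        "busiest_day": (busiest.isoformat(), busiest_count, top_actor),
        "by_hour": dict(sorted(Counter(ts.hour for ts, _ in hits).items())),
    }


if __name__ == "__main__":
    utc = timezone.utc
    sample = [  # made-up sample data
        (datetime(2026, 4, 6, 22, 15, tzinfo=utc), "dan@corp"),
        (datetime(2026, 4, 6, 23, 40, tzinfo=utc), "dan@corp"),
        (datetime(2026, 4, 7, 5, 5, tzinfo=utc), "eve@corp"),
    ]
    print(backtest_summary(sample))
```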
```
wgctl alerts test offhours-prod-ssh --since 30d
matches      47
actors       9 distinct
busiest day  Mon 2026-04-06 (11 hits, mostly dan@corp)
suggest      consider adding dan@corp to the on-call exception list
```

Next steps
- Notifications — the lower-intensity cousin of alerts: approvals, digests, report deliveries.
- SIEM export — the raw event stream these rules run on top of.
- Audit & reporting — where every alert firing also lands as a permanent record.