Incident response plan

How a probectl security incident is detected, triaged, handled, evidenced, and disclosed. Companion to SECURITY.md (reporting intake + coordinated disclosure) and threat-model.md. Two scopes, kept deliberately distinct:

Product/vendor incidents — a vulnerability or compromise in probectl code, releases, or build infrastructure. Vendor-led; affects all deployments.
Deployment incidents — a compromise inside one operator's installation. Operator-led under their own IR; probectl's job is to give them the evidence tooling and this playbook's deployment sections.

1. Severity matrix

Sev	Definition	Examples	Acknowledge	Mitigation target
SEV-1	Cross-tenant data exposure (the declared highest-severity failure — tenant isolation is the outermost boundary); RCE in agent/control plane; compromised release artifact or signing path	tenant A reads B; malicious artifact signed	4h	fix or kill-switch ≤ 24h; advisory ≤ 72h
SEV-2	Auth/RBAC bypass within a tenant; audit-chain integrity break; secret exposure; AI egress gate bypass	consent gate skipped; WORM verify fails	24h	≤ 7 days
SEV-3	Vulnerability with significant mitigating factors; DoS of a plane; dependency CVE in a reachable path	fairness bypass; exploitable parser crash	72h	next patch release
SEV-4	Hardening gap, defense-in-depth finding, non-reachable dep CVE	scanner findings	7 days	scheduled

Cross-tenant anything starts at SEV-1 until proven otherwise. When in doubt, classify up.

2. Roles

Solo-founder reality, stated plainly: the maintainer is Incident Commander, investigator, and fixer. The compensating structure is this written plan; predefined external deputies (counsel for notification-duty questions; the affected operator's security team for deployment incidents — co-responders by design, since they hold the infrastructure); and an immutable trail (the signed WORM audit export, see ../hardening.md §0b) so every action is reviewable after the fact. The role table will be re-cut at the first security hire.

Role	Holder today	Duties
Incident Commander	maintainer	declare sev, run timeline, own comms
Investigator/fixer	maintainer (+ operator's team for deployment scope)	root cause, patch, verify
Counsel	external (engaged per incident)	notification obligations, advisory wording

3. Response flow

Intake. A report through the SECURITY.md channels (private advisory / security.txt contact), a CI security gate (security-scan.yml), the WORM chain-verification alarm, or an operator report.
Declare + log. Assign a severity; open a private timeline doc; record UTC timestamps for every action from this point on.
Contain.
- Product scope: pull or revoke the affected artifacts (cosign identities make "which artifact" provable) and gate the release lane.
- Deployment scope: the operator isolates per the runbooks — agents halt-on-error via the rollout machinery, credentials rotate, and any insecure override is audited.
Preserve evidence — before remediation mutates state.
- Export the audit chains: the provider stream's signed WORM segments are already off-database and chain-verified (internal/audit/worm.go); snapshot the tenant streams.
- Run scripts/backup_postgres.sh and scripts/backup_clickhouse.sh to capture store state with SHA-256 manifests.
- Generate a support bundle (internal/support) for config/version posture.
- Hash and retain the agent binaries/objects in question, comparing against the baked-in integrity manifest and the release signatures.
Eradicate + recover. Fix forward; release through the normal lane (the red-CI refusal stays on); upgrade the fleet as staged waves of signed artifacts with halt-on-error — an emergency changes the wave sizes, never the verification.
Notify (§4) and disclose (§5).
Post-incident review within 7 days: timeline, root cause, a tracked follow-up for every systemic gap found, and a re-review of the threat model if a trust boundary was involved.

4. Customer notification

The self-hosted nuance: the vendor cannot see deployments, so notification means telling operators what to check and what to ship, fast.

Channels: GitHub Security Advisory (the canonical record) plus release notes; direct contact for known enterprise/MSP operators.
SEV-1: an initial notice within 72h of confirmation, even if incomplete — affected versions, severity, interim mitigations, and indicators of compromise (how an operator checks their own audit chains and registry for the pattern). Update cadence at least every 72h until resolved.
SEV-2: notice with the fix release.
Content discipline: facts only — affected versions, the upgrade path (../ops/fleet-rollout.md), and verification steps (../ops/verify-artifacts.md). MSP operators re-notify their tenants under their own obligations; the per-tenant audit trail gives them the evidence to do it.

5. Disclosure

Coordinated disclosure per SECURITY.md: reporter credit, advisory published when a fix is available, CVE requested via GitHub for qualifying issues. Embargo target ≤ 90 days, shorter when exploitation is observed.

6. Drill

Tabletop this plan twice yearly (one product-scope, one deployment-scope using the backup/failover drills as the recovery legs) and after any SEV-1/2. Record outcomes in the review log below.

Version	Date	Change
1.0	2026-06-07	Initial plan