Advanced data governance (Enterprise: `governance`)

What this is. One place for a privacy-strict organization to control how a tenant's data is classified, redacted, retained, located, and encrypted. This feature adds the new classification + redaction mechanism and composes it with capabilities that ship elsewhere in probectl, so an operator sees one coherent governance view per tenant rather than five scattered settings.

Concern	Where it lives	Edition
Data classification (IPs-as-PII)	`internal/govern`	core mechanism
Redaction / masking	`internal/govern`	core mechanism
Configurable retention + cross-store erasure	`internal/tenantlife`	core (a compliance right)
Residency controls	siloed stores / region topology	provider / core
BYOK / HYOK + no-downtime rotation	`ee/tenantkeys`	`byok` (Enterprise)
Remote-AI egress consent (enforcement)	`internal/ai` egress gate	core (fail-closed; consent is set via the governance policy)
The governance POLICY + composed view	`ee/governance`	`governance` (Enterprise)

The split that matters: the classification + redaction mechanism is core — a redacted export is useful to anyone, so it works on any deployment with no license. The per-tenant policy + operator surface is the governance Enterprise feature, installed onto the core govern seam at the attach seam.

Data classification

Every sensitive data category has a sensitivity class, ordered low → high: public < internal < confidential < pii < restricted.

Category	Default class	Examples
`ip_address`	pii (the headline)	source / dest / exporter / next-hop IPs, probe targets
`email`	pii	operator / contact emails
`geo`	pii	city / region / coordinates
`mac_address`	confidential	device MACs
`hostname`	internal	device / exporter hostnames
`user_agent`	internal	RUM user agents
`asn`	public	autonomous-system numbers
`credential`	restricted	secrets, tokens, wrapped keys, BYOK refs

IPs-as-PII is the headline. Under GDPR and similar regimes an IP address is personal data, so ip_address defaults to pii and is masked by default whenever redaction is active. A tenant's governance policy can re-classify any category (e.g. treat hostname as pii).
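The ordering and the override rule can be sketched in Go as follows. This is an illustrative sketch, not the `internal/govern` API: the type and function names (`Class`, `EffectiveClass`, `ShouldRedact`) are hypothetical, and only the category names, default classes, and class ordering come from the table above.

```go
package main

import "fmt"

// Class is a sensitivity class; a higher value means more sensitive.
type Class int

const (
	Public Class = iota
	Internal
	Confidential
	PII
	Restricted
)

var classNames = [...]string{"public", "internal", "confidential", "pii", "restricted"}

func (c Class) String() string { return classNames[c] }

// defaultClasses mirrors the default-class table in this section.
var defaultClasses = map[string]Class{
	"ip_address":  PII,
	"email":       PII,
	"geo":         PII,
	"mac_address": Confidential,
	"hostname":    Internal,
	"user_agent":  Internal,
	"asn":         Public,
	"credential":  Restricted,
}

// EffectiveClass returns the tenant override when one exists, otherwise the
// default. The second result is false for a category this sketch does not know.
func EffectiveClass(category string, overrides map[string]Class) (Class, bool) {
	if c, ok := overrides[category]; ok {
		return c, true
	}
	c, ok := defaultClasses[category]
	return c, ok
}

// ShouldRedact reports whether a class sits at or above the redaction floor.
func ShouldRedact(c, floor Class) bool { return c >= floor }

func main() {
	// Hypothetical tenant policy: treat hostnames as PII.
	overrides := map[string]Class{"hostname": PII}
	for _, cat := range []string{"ip_address", "hostname", "asn"} {
		c, _ := EffectiveClass(cat, overrides)
		fmt.Printf("%s: %s redact=%v\n", cat, c, ShouldRedact(c, PII))
	}
}
```

Modelling the classes as ordered integers keeps the floor check a single comparison, which is why re-classifying `hostname` to pii is enough to bring it under the default floor.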

Redaction / masking

When redaction is active, every category at or above the policy's redaction floor (default pii) is masked. The strategies:

Strategy	Behavior	Example (`203.0.113.42`)
`partial` (default)	keep a coarse, non-identifying prefix	`203.0.113.0/24` (IPv4 → /24; IPv6 → /48; email → `a***@domain`; MAC → OUI)
`hash`	stable, non-reversible pseudonym (correlatable)	`sha256:1a2b…` (16-hex prefix)
`drop`	remove entirely	`` (empty)
`none`	leave as-is	unchanged

restricted (credentials) is always dropped and never emitted in clear: secrets never leave the deployment in a governed export, regardless of strategy. All hashing routes through the FIPS-swappable internal/crypto provider; no raw crypto primitives are used outside it.
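A minimal Go sketch of the four strategies, under stated assumptions. The function names are hypothetical. The sketch calls `crypto/sha256` directly only to stay self-contained, whereas the real mechanism routes hashing through `internal/crypto` and may salt or key the digest. Dropping values that fail to parse, dropping categories with no partial rule, and rendering the OUI as `aa:bb:cc` are choices made for this sketch, not documented behavior.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net"
	"net/netip"
	"strings"
)

// maskIP keeps a coarse prefix: /24 for IPv4, /48 for IPv6.
func maskIP(s string) string {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return "" // assumption: drop what cannot be parsed rather than leak it
	}
	addr = addr.Unmap()
	bits := 48
	if addr.Is4() {
		bits = 24
	}
	p, err := addr.Prefix(bits)
	if err != nil {
		return ""
	}
	return p.String()
}

// maskEmail keeps the first character of the local part and the domain.
func maskEmail(s string) string {
	at := strings.LastIndex(s, "@")
	if at <= 0 {
		return ""
	}
	first := []rune(s[:at])[0]
	return string(first) + "***" + s[at:]
}

// maskMAC keeps the first three octets (the OUI).
func maskMAC(s string) string {
	hw, err := net.ParseMAC(s)
	if err != nil || len(hw) < 3 {
		return ""
	}
	return hw[:3].String()
}

// pseudonym returns a stable 16-hex-character digest prefix.
// Stand-in only: unsalted SHA-256, not the internal/crypto provider.
func pseudonym(s string) string {
	sum := sha256.Sum256([]byte(s))
	return "sha256:" + hex.EncodeToString(sum[:8])
}

// Mask applies one strategy to one value. Restricted values are always dropped.
func Mask(strategy, category, value string, restricted bool) string {
	if restricted {
		return ""
	}
	switch strategy {
	case "none":
		return value
	case "hash":
		return pseudonym(value)
	case "partial":
		switch category {
		case "ip_address":
			return maskIP(value)
		case "email":
			return maskEmail(value)
		case "mac_address":
			return maskMAC(value)
		}
	}
	// "drop", unknown strategies, and categories without a partial rule.
	return ""
}

func main() {
	fmt.Println(Mask("partial", "ip_address", "203.0.113.42", false))
	fmt.Println(Mask("partial", "email", "alice@example.com", false))
	fmt.Println(Mask("hash", "ip_address", "203.0.113.42", false))
}
```

The restricted check runs before the strategy switch, so even `none` cannot release a credential.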

Redacted export

The tenant-portability export gains a redacted mode:

GET /v1/lifecycle/export?redact=true     # mask PII per the tenant's policy

and a tenant whose governance policy sets redact_export: true always gets a redacted export, even without the query parameter. The manifest carries "redacted": true. Postgres rows and flow records are masked column-by-category (IPs, emails, geo, MACs, …) while non-sensitive fields (counts, protocol, names) survive. Malformed lines pass through untouched, so the bundle stays well-formed.

The redaction mechanism is core (the ?redact=true toggle works on any deployment with the PII-floor default). The governance feature adds per-tenant policy — custom classifications, a custom floor, and forced export redaction.
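The column-by-category masking and the pass-through of malformed lines might look like the Go sketch below. It assumes the bundle holds one JSON object per line, which this document does not state, and the column names in `columnCategory` and the `redactLine` helper are hypothetical. The `mask` callback stands in for whatever strategy the tenant's policy selects.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// columnCategory maps hypothetical column names to data categories.
var columnCategory = map[string]string{
	"src_ip":        "ip_address",
	"dst_ip":        "ip_address",
	"contact_email": "email",
}

// redactLine masks the sensitive string columns of one JSON record.
// A line that does not parse as a JSON object is returned unchanged.
func redactLine(line []byte, mask func(category, value string) string) []byte {
	// RawMessage keeps non-sensitive fields byte-for-byte (no float rounding).
	var rec map[string]json.RawMessage
	if err := json.Unmarshal(line, &rec); err != nil {
		return line
	}
	for col, cat := range columnCategory {
		raw, ok := rec[col]
		if !ok {
			continue
		}
		var s string
		if err := json.Unmarshal(raw, &s); err != nil {
			continue // not a string column in this record
		}
		masked, err := json.Marshal(mask(cat, s))
		if err != nil {
			continue
		}
		rec[col] = masked
	}
	out, err := json.Marshal(rec)
	if err != nil {
		// Not expected for values that just parsed; emit an empty record
		// rather than risk returning unmasked data.
		return []byte("{}")
	}
	return out
}

func main() {
	stub := func(category, value string) string { return "[" + category + "]" }
	fmt.Println(string(redactLine([]byte(`{"src_ip":"203.0.113.42","bytes":1200}`), stub)))
	fmt.Println(string(redactLine([]byte(`truncated {"src_ip":`), stub)))
}
```

Note that re-marshalling a map emits keys in sorted order, so field order within a record can change even though the values of non-sensitive fields do not.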

The governance policy + composed view

The provider plane exposes one place for a tenant's data governance (governance-gated; the routes 404 when unlicensed):

GET /provider/v1/tenants/{id}/governance — the composed view: the effective classification of every category + the redaction policy + remote-AI egress consent + residency + isolation model + retention + BYOK status.
PUT /provider/v1/tenants/{id}/governance — set the policy: classification overrides, the redaction floor (redact_from), redact_export, and the tenant's remote-AI egress consent (ai_remote_egress). Audited to the separate, tamper-evident provider audit stream (provider.governance_set), admin-only, and blocked by the read-only license degrade.
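As an illustration of the PUT call, the Go sketch below builds (but does not send) a request. The route and the field names `redact_from`, `redact_export`, and `ai_remote_egress` come from this page; the JSON body shape, the `classifications` key, the base URL, and the tenant ID are assumptions, and provider-plane authentication is omitted entirely.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// Policy is a hypothetical request body for the governance PUT.
type Policy struct {
	// Classifications is an assumed key name for classification overrides.
	Classifications map[string]string `json:"classifications,omitempty"`
	RedactFrom      string            `json:"redact_from"`
	RedactExport    bool              `json:"redact_export"`
	AIRemoteEgress  bool              `json:"ai_remote_egress"`
}

// newSetPolicyRequest builds the PUT request without sending it.
func newSetPolicyRequest(baseURL, tenantID string, p Policy) (*http.Request, error) {
	body, err := json.Marshal(p)
	if err != nil {
		return nil, err
	}
	u := baseURL + "/provider/v1/tenants/" + url.PathEscape(tenantID) + "/governance"
	req, err := http.NewRequest(http.MethodPut, u, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	p := Policy{
		Classifications: map[string]string{"hostname": "pii"},
		RedactFrom:      "pii",
		RedactExport:    true,
		AIRemoteEgress:  false,
	}
	// Placeholder host and tenant ID.
	req, err := newSetPolicyRequest("https://provider.example.internal", "t-123", p)
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```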

The policy persists in tenant_governance (migration 0033; migration 0037 adds the ai_remote_egress consent column): a tenant reads its own policy under RLS, the provider plane writes it. It is on the silo deny list (never copied into a per-tenant silo schema) and is erased with the tenant at offboarding. The resolver installs onto the core govern seam, so redacted exports honor per-tenant overrides.

ai_remote_egress is the tenant's opt-in for sending its telemetry summaries to a remote AI model. It defaults to false, and the core AI egress gate refuses remote synthesis for a non-consenting tenant: a missing consent row, an unavailable database, and a read error all resolve to denied (fail closed). The governance policy is only where the consent is recorded; the disclosure of exactly what a remote call sends, and the other gates in front of it, is documented in ai-egress.md.
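The fail-closed rule reduces to "allow only on a positive, successfully read consent". A minimal Go sketch, assuming a hypothetical `ConsentStore` interface (the real gate lives in `internal/ai` and its signatures are not shown in this document):

```go
package main

import (
	"errors"
	"fmt"
)

// ConsentStore is a hypothetical lookup for the ai_remote_egress flag.
type ConsentStore interface {
	// AIRemoteEgress returns the stored flag; found is false when the
	// tenant has no governance row.
	AIRemoteEgress(tenantID string) (consent, found bool, err error)
}

// RemoteEgressAllowed fails closed: every path except an explicit,
// successfully read "true" resolves to denied.
func RemoteEgressAllowed(store ConsentStore, tenantID string) bool {
	if store == nil {
		return false // no database configured
	}
	consent, found, err := store.AIRemoteEgress(tenantID)
	if err != nil || !found {
		return false // read error or no consent row
	}
	return consent
}

// mapStore and failingStore are in-memory stand-ins for the demo.
type mapStore map[string]bool

func (m mapStore) AIRemoteEgress(id string) (bool, bool, error) {
	v, ok := m[id]
	return v, ok, nil
}

type failingStore struct{}

func (failingStore) AIRemoteEgress(string) (bool, bool, error) {
	return true, true, errors.New("read failed")
}

func main() {
	fmt.Println(RemoteEgressAllowed(nil, "t1"))                   // no database
	fmt.Println(RemoteEgressAllowed(mapStore{}, "t1"))            // no row
	fmt.Println(RemoteEgressAllowed(failingStore{}, "t1"))        // read error
	fmt.Println(RemoteEgressAllowed(mapStore{"t1": true}, "t1"))  // opted in
}
```

The `failingStore` deliberately returns `true` alongside its error to show that the error, not the value, decides the outcome.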

Retention, erasure & residency (composed, not re-implemented)

The governance view shows these together; it does not re-enforce them. Each is owned by its own subsystem:

Retention + cross-store erasure is core (internal/tenantlife): configurable flow retention plus verifiable deletion across Postgres / ClickHouse / TSDB / object storage, with a recomputable attestation. Erasure covers all live stores; backups are not purged and instead age out under the operator's documented backup TTL (PROBECTL_BACKUP_RETENTION_NOTE), so a governed deletion is not a backup purge. See runbooks/tenant-offboarding.md.
Residency is siloed stores pinned to a region, plus the region topology. Strict tenants run siloed so their stores stay in the permitted region rather than replicating globally. See isolation.md, multi-region.md.
BYOK / HYOK + no-downtime rotation is the byok Enterprise feature (ee/tenantkeys): per-tenant customer-held keys, rotation with retired-versions-decrypt-only (no downtime), and crypto-offboarding. See byok.md.
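To make "retired-versions-decrypt-only" concrete, here is a toy versioned key ring in Go. It is a sketch of the general pattern, not the `ee/tenantkeys` design: every name is hypothetical, keys are held in memory rather than by the customer, and AES-GCM is called directly for self-containment where the real code would go through the crypto provider.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// ErrKeyUnavailable is returned instead of falling back to any other key.
var ErrKeyUnavailable = errors.New("tenant key unavailable")

// Sealed records which key version produced the ciphertext.
type Sealed struct {
	Version    int
	Nonce, Box []byte
}

// KeyRing seals with the current version only; older versions stay
// available for decryption until they are destroyed.
type KeyRing struct {
	current int
	keys    map[int][]byte // version -> 32-byte AES key
}

func NewKeyRing(key []byte) *KeyRing {
	return &KeyRing{current: 1, keys: map[int][]byte{1: key}}
}

func (r *KeyRing) aead(version int) (cipher.AEAD, error) {
	k, ok := r.keys[version]
	if !ok {
		return nil, ErrKeyUnavailable
	}
	block, err := aes.NewCipher(k)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// Rotate makes newKey current; new data uses it immediately.
func (r *KeyRing) Rotate(newKey []byte) {
	r.current++
	r.keys[r.current] = newKey
}

// Destroy removes a version, simulating a destroyed or unreachable key.
func (r *KeyRing) Destroy(version int) { delete(r.keys, version) }

func (r *KeyRing) Seal(plaintext []byte) (Sealed, error) {
	a, err := r.aead(r.current)
	if err != nil {
		return Sealed{}, err
	}
	nonce := make([]byte, a.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return Sealed{}, err
	}
	return Sealed{r.current, nonce, a.Seal(nil, nonce, plaintext, nil)}, nil
}

func (r *KeyRing) Open(s Sealed) ([]byte, error) {
	a, err := r.aead(s.Version)
	if err != nil {
		return nil, err
	}
	return a.Open(nil, s.Nonce, s.Box, nil)
}

// Rewrap re-seals old data under the current key, e.g. on its next write.
func (r *KeyRing) Rewrap(s Sealed) (Sealed, error) {
	pt, err := r.Open(s)
	if err != nil {
		return Sealed{}, err
	}
	return r.Seal(pt)
}

func main() {
	ring := NewKeyRing(make([]byte, 32)) // all-zero demo key; never do this in production
	old, _ := ring.Seal([]byte("flow record"))
	k2 := make([]byte, 32)
	k2[0] = 1
	ring.Rotate(k2)
	pt, err := ring.Open(old)
	fmt.Println(string(pt), err)
	ring.Destroy(1)
	_, err = ring.Open(old)
	fmt.Println(err)
}
```

Because reads of old data keep working after `Rotate`, re-sealing can happen lazily, and a missing version surfaces as an error rather than a silent fallback to another key.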

Watch-outs

Erasure must account for every store, including backups. Erasure clears the live stores and attests to it; backups are not purged and instead expire per your documented backup TTL.
BYOK key-unavailability fails safe. An unreachable / destroyed key is an error, never a silent fallback to a shared key.
Rotation across high-volume stores is deferred-rewrap. New data uses the new key immediately; old data re-seals on write — no downtime.
Redaction is best-effort masking, not anonymization. partial keeps a network prefix and hash is correlatable. For irreversible removal, use erasure.
