Advanced data governance (Enterprise: governance)
What this is. One place for a privacy-strict organization to control how a tenant's data is classified, redacted, retained, located, and encrypted. This feature adds the new classification + redaction mechanism and composes it with capabilities that ship elsewhere in probectl, so an operator sees one coherent governance view per tenant rather than five scattered settings.
| Concern | Where it lives | Edition |
|---|---|---|
| Data classification (IPs-as-PII) | internal/govern |
core mechanism |
| Redaction / masking | internal/govern |
core mechanism |
| Configurable retention + cross-store erasure | internal/tenantlife |
core (a compliance right) |
| Residency controls | siloed stores / region topology | provider / core |
| BYOK / HYOK + no-downtime rotation | ee/tenantkeys |
byok (Enterprise) |
| Remote-AI egress consent (enforcement) | internal/ai egress gate |
core (fail-closed; consent is set via the governance policy) |
| The governance POLICY + composed view | ee/governance |
governance (Enterprise) |
The split that matters: the classification + redaction mechanism is core — a
redacted export is useful to anyone, so it works on any deployment with no license.
The per-tenant policy + operator surface is the governance Enterprise
feature, installed onto the core govern seam at the attach seam.
Data classification
Every sensitive data category has a sensitivity class, ordered low → high:
public < internal < confidential < pii < restricted.
| Category | Default class | Examples |
|---|---|---|
ip_address |
pii (the headline) | source / dest / exporter / next-hop IPs, probe targets |
email |
pii | operator / contact emails |
geo |
pii | city / region / coordinates |
mac_address |
confidential | device MACs |
hostname |
internal | device / exporter hostnames |
user_agent |
internal | RUM user agents |
asn |
public | autonomous-system numbers |
credential |
restricted | secrets, tokens, wrapped keys, BYOK refs |
IPs-as-PII is the headline. Under GDPR and similar regimes an IP address is
personal data, so ip_address defaults to pii and is masked by default whenever
redaction is active. A tenant's governance policy can re-classify any category
(e.g. treat hostname as pii).
Redaction / masking
When redaction is active, every category at or above the policy's redaction
floor (default pii) is masked. The strategies:
| Strategy | Behavior | Example (203.0.113.42) |
|---|---|---|
partial (default) |
keep a coarse, non-identifying prefix | 203.0.113.0/24 (IPv4 → /24; IPv6 → /48; email → a***@domain; MAC → OUI) |
hash |
stable, non-reversible pseudonym (correlatable) | sha256:1a2b… (16-hex prefix) |
drop |
remove entirely | `` (empty) |
none |
leave as-is | unchanged |
restricted (credentials) always drops in clear — secrets never leave the
deployment in a governed export, regardless of strategy. All hashing routes
through the FIPS-swappable internal/crypto provider — no raw crypto
primitives outside it.
Redacted export
The tenant-portability export gains a redacted mode:
GET /v1/lifecycle/export?redact=true # mask PII per the tenant's policy
and a tenant whose governance policy sets redact_export: true always gets a
redacted export, even without the query parameter. The manifest carries
"redacted": true. Postgres rows and flow records are masked column-by-category
(IPs, emails, geo, MACs, …) while non-sensitive fields (counts, protocol, names)
survive. Malformed lines pass through untouched, so the bundle stays well-formed.
The redaction mechanism is core (the ?redact=true toggle works on any
deployment with the PII-floor default). The governance feature adds per-tenant
policy — custom classifications, a custom floor, and forced export redaction.
The governance policy + composed view
The provider plane exposes one place for a tenant's data governance
(governance-gated; the routes 404 when unlicensed):
GET /provider/v1/tenants/{id}/governance— the composed view: the effective classification of every category + the redaction policy + remote-AI egress consent + residency + isolation model + retention + BYOK status.PUT /provider/v1/tenants/{id}/governance— set the policy: classification overrides, the redaction floor (redact_from),redact_export, and the tenant's remote-AI egress consent (ai_remote_egress). Audited to the separate, tamper-evident provider audit stream (provider.governance_set), admin-only, and blocked by the read-only license degrade.
The policy persists in tenant_governance (migration 0033; migration 0037
adds the ai_remote_egress consent column): a tenant reads its
own policy under RLS, the provider plane writes it. It is on the silo deny list
(never copied into a per-tenant silo schema) and is erased with the tenant at
offboarding. The resolver installs onto the core govern seam, so redacted
exports honor per-tenant overrides.
Remote-AI egress consent
ai_remote_egress is the tenant's opt-in for sending its telemetry summaries to
a remote AI model. It defaults to false, and the core AI egress gate
refuses remote synthesis for a non-consenting tenant — no consent row, no
database, or a read error all resolve to denied (fail closed). The
governance policy is only where the consent is recorded; the disclosure of
exactly what a remote call sends, and the other gates in front of it, is
ai-egress.md.
Retention, erasure & residency (composed, not re-implemented)
The governance view shows these together; it does not re-enforce them. Each is owned by its own subsystem:
- Retention + cross-store erasure is core (
internal/tenantlife): configurable flow retention plus verifiable deletion across Postgres / ClickHouse / TSDB / object storage, with a recomputable attestation. Erasure covers all live stores; backups are the operator's documented backup-TTL (PROBECTL_BACKUP_RETENTION_NOTE) — a governed deletion is not a backup purge. Seerunbooks/tenant-offboarding.md. - Residency is siloed stores pinned to a region, plus the region topology.
Strict tenants run siloed so their stores stay in the permitted region rather
than replicating globally. See
isolation.md,multi-region.md. - BYOK / HYOK + no-downtime rotation is the
byokEnterprise feature (ee/tenantkeys): per-tenant customer-held keys, rotation with retired-versions-decrypt-only (no downtime), and crypto-offboarding. Seebyok.md.
Watch-outs
- Erasure must cover all stores, including the backup policy. Erasure clears the live stores and attests it; backups expire per your documented TTL.
- BYOK key-unavailability fails safe. An unreachable / destroyed key is an error, never a silent fallback to a shared key.
- Rotation across high-volume stores is deferred-rewrap. New data uses the new key immediately; old data re-seals on write — no downtime.
- Redaction is best-effort masking, not anonymization.
partialkeeps a network prefix andhashis correlatable. For irreversible removal, use erasure.