Per-tenant key isolation / BYOK
What this is. probectl can encrypt each tenant's sensitive at-rest values under that tenant's own encryption key. That makes tenants cryptographically separable — even with raw database access, you cannot read tenant A's sealed data without tenant A's key — and it turns offboarding into a key-destruction event: destroy the key, and any sealed data that ever lingers in a backup becomes permanently unreadable.
Think of it as the cryptographic complement to siloed storage (see isolation.md): silos separate where a tenant's data lives; per-tenant keys separate who can ever decrypt it. BYOK ("Bring Your Own Key") is the stronger variant where the key lives in your secret manager and probectl never holds it.
Edition. This lives under ee/ (the commercial tree) and is unlocked by the
byok license feature (Enterprise tier; MSP crypto-offboarding consumes it too).
Without the license, deployments keep the deployment-wide envelope key (below)
and the /v1/security/keys API surface stays hidden — it returns 404, not a
locked-feature error.
The sealing model
Every sensitive tenant-owned value passes through one core seam,
internal/tenantcrypto. (Today that means alert-channel secrets — webhook HMAC
keys, integration tokens; the set grows consumer-by-consumer, not by changing the
mechanism.) Stored values are self-describing — a short prefix tells the
reader exactly how the value was sealed:
| Stored prefix | Sealed by | Notes |
|---|---|---|
| (none) | nothing (legacy / dev plaintext) | read as-is; re-sealed on the next write |
dv1: |
the deployment envelope — one master key (PROBECTL_ENVELOPE_KEY) |
the default whenever an envelope key is configured |
tk1:<version>: |
the per-tenant keyring (this feature) | the tenant id and key version are bound into the AEAD |
Because reads dispatch on the prefix, turning on per-tenant keys never migrates
existing data: old dv1: rows keep decrypting (the deployment sealer stays
registered as an opener), and new writes seal under the tenant key. Backward
compatibility is structural, not a migration step you have to run.
The fail-safe rule (non-negotiable). Once a value is sealed under a tenant key, opening it requires that key. An unavailable, unresolvable, or destroyed key is an error — never a silent fallback to the deployment master or any shared key. (A recognized sealed prefix with no installed sealer also refuses to pass through as plaintext, so a misconfiguration cannot leak ciphertext as if it were cleartext.)
Key hierarchy
%%{init: {'theme':'base','themeVariables':{'background':'#0d1117','primaryColor':'#161b22','primaryTextColor':'#e6edf3','primaryBorderColor':'#3b82f6','lineColor':'#8b949e','secondaryColor':'#21262d','tertiaryColor':'#0d1117','clusterBkg':'#161b22','clusterBorder':'#30363d','fontFamily':'ui-monospace, SFMono-Regular, Menlo, monospace'},'flowchart':{'curve':'basis','nodeSpacing':55,'rankSpacing':55,'padding':12}}}%%
flowchart TD
subgraph managed["managed mode"]
direction TB
M1["deployment master key<br/>(PROBECTL_ENVELOPE_KEY)"]
M2["each tenant's key<br/>(random 32B, stored wrapped in tenant_keys)"]
M3["that tenant's values<br/>(AES-256-GCM via internal/crypto)"]
M1 -->|wraps| M2 -->|encrypts| M3
end
subgraph byok["byok mode"]
direction TB
B1["customer's secret manager<br/>(Vault / CyberArk / cloud KMS, via secret refs)"]
B2["tenant key<br/>(base64, exactly 32 bytes)"]
B3["that tenant's values"]
B1 -->|holds| B2 -->|encrypts| B3
end
REF["probectl stores ONLY the reference<br/>(e.g. vault:kv/tenants/acme#kek) and resolves it<br/>at use time — the material is never persisted"] -.-> B2
In managed mode probectl generates a random 32-byte key per tenant, encrypts
("wraps") it under the deployment master key, and stores the wrapped form in the
tenant_keys table. In BYOK mode the key lives in your secret manager;
probectl stores only the reference and resolves it at use time.
A tenant's first seal auto-provisions a managed v1 key. The encryption binds
tenant:<id>:<caller-context>:v<version> as additional authenticated data (AAD)
into every ciphertext, so a sealed blob cannot be replayed into another tenant's
row even by a direct database write — the decrypt would fail the authenticity
check.
Rotation (no downtime)
POST /v1/security/keys/rotate (requires the security.keys permission,
admin-seeded):
{"mode": "managed"}— mint a new wrapped key as version N+1.{"mode": "byok", "byok_ref": "vault:kv/tenants/acme#kek"}— activate BYOK against the given reference.
The previous version is retired, not destroyed: new data seals under the new version immediately, while existing data keeps decrypting under the version it was written with. There is no re-encryption pass and no downtime. (Bulk re-wrap of old rows is deliberately deferred — values re-seal naturally on their next write.)
The BYOK lockout guard. A byok_ref is probe-resolved before activation:
a dead reference is rejected, so you cannot rotate yourself into a key probectl
cannot reach. After activation, though, you own the lock. If you later revoke
probectl's access to the reference (or delete the secret), your sealed data
becomes unreadable and sealing fails — that is the feature, not a bug, and there
is no recovery path through probectl. Document your secret manager's own
backup/escrow policy accordingly.
Cryptographic offboarding
Tenant erasure destroys the tenant's entire key chain before the deletion
attestation is sealed: every version's wrapped key is nulled and the chain's
state set to destroyed. The tenant_keys rows survive as evidence, but
material-free. Any tk1: ciphertext that ever escaped into a backup window is
now permanently unreadable.
The deletion attestation carries a tenant_keys line recording how many key
versions were crypto-shredded. Deployments without the byok feature record
"no per-tenant keyring installed" — stated honestly, never implied. Destroyed
chains refuse re-keying: a destroyed tenant cannot silently get a fresh v1 by
writing new data.
API and UI surfaces
GET /v1/security/keys— the key chain (version / mode / state / timestamps). Key material never crosses this API in either direction.POST /v1/security/keys/rotate— managed rotation or BYOK activation.- Tenant Admin → Encryption keys card: chain state, managed rotation, BYOK activation. Hidden entirely when unlicensed.
- Rotations are audited (
security.key_rotate, in the tenant audit stream).
Configuration
| Key | Meaning |
|---|---|
PROBECTL_ENVELOPE_KEY |
base64 32-byte deployment master key. Required for dv1 sealing, and (as the wrap root) for managed per-tenant keys. A licensed BYOK deployment must still set it. |
PROBECTL_ENVELOPE_KEY_ID |
key identifier recorded in dv1 values (default dev). |
| Secret backends | BYOK references resolve through the same resolver as every other secret reference — vault:, cyberark:, aws:, azure:, gcp: (see secrets.md). |
Operational notes
- Key cache: unwrapped or resolved keys are cached in memory for 30 seconds (rotation and destroy purge the tenant's cache immediately). A BYOK revocation therefore takes effect within seconds, not at the next session.
- Key-store outage: if Postgres is unavailable for the
tenant_keystable, seal and open fail with an error (fail safe). Telemetry ingestion is unaffected — only sealed-value reads/writes (e.g. using an alert-channel secret) degrade. - What is NOT per-tenant-keyed: the bulk telemetry stores (ClickHouse / TSDB) rely on isolation-model separation plus verifiable deletion at offboarding; per-tenant keys cover the sensitive-value class. Extending coverage to another data class is a consumer-by-consumer decision through the same seam.