FinOps / egress cost observability
What this is
probectl already sees your network traffic (the flow stream). This engine puts
dollars on that traffic. It lives in the control plane (internal/cost) and
answers FinOps questions: which service or team is spending money on the
network, which conversations are crossing expensive boundaries ("chatty
services"), how cost trends hour to hour, and whether a team is about to blow its
monthly budget.
The mechanism, in one line: take the bytes probectl observes, label each flow with a traffic class (same zone, cross-AZ, cross-region, internet egress), multiply the bytes by a per-class price, and attribute the result to a service and team. Cheap to compute, because the bytes are already flowing.
Three ground rules shape the whole feature:
- Volume × public pricing, not billing. Cloud billing APIs differ between providers and lag by hours or days, so the engine prices observed egress volume against published list rates. This is an attribution-and-detection tool — "who is generating cross-region traffic and roughly what does it cost" — not a reconciliation of your actual invoice. Full cloud-billing reconciliation is out of scope by design.
- Degrade gracefully. With no price table, the engine runs in volume-only mode: bytes are still attributed, dollars are never invented, and `priced: false` is surfaced everywhere. With no zone rules, locality classes are `unknown` and the UI says so. It never refuses and never guesses a rate.
- Pricing freshness is visible. The built-in rates are representative list prices from public pricing pages, carrying a source, an as-of date, and a license note. Operators override them with their own current or negotiated rates. The as-of date is always displayed, so staleness is visible rather than hidden.
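To make the volume × price rule and the volume-only fallback concrete, here is a minimal Python sketch. It is an illustration only, not the code in internal/cost; the function name `price_bytes` and its return shape are assumptions made for this example.

```python
GIB = 2**30  # bytes per GiB

def price_bytes(nbytes, traffic_class, per_gib, priced=True):
    """Return (dollars, priced_flag) for an observed byte count.

    dollars is None whenever no rate is available. The caller still
    reports the volume, but no dollar figure is ever invented.
    (Hypothetical helper, named for this sketch only.)
    """
    if not priced or per_gib is None:
        return None, False          # volume-only mode
    rate = per_gib.get(traffic_class)
    if rate is None:
        return None, False          # class has no rate, e.g. "unknown"
    return nbytes / GIB * rate, True

# Built-in default rates quoted in this document, in $/GiB.
DEFAULT_PER_GIB = {
    "same_zone": 0.0,
    "inter_az": 0.01,
    "inter_region": 0.02,
    "internet_egress": 0.09,
}
```

With these defaults, 5 GiB of `inter_region` traffic comes to roughly $0.10, while the same volume classified as `unknown` yields no dollar figure at all.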
How traffic is classified and priced
First the flow is classified by where its two ends sit. probectl resolves each address to a zone/region using operator-declared CIDR rules (it cannot guess your subnet layout), then:
| Class | Meaning | Default rate ($/GiB) |
|---|---|---|
| `same_zone` | both ends map to the same zone | 0 (free on the major clouds) |
| `inter_az` | same region, different zones | 0.01 |
| `inter_region` | different regions | 0.02 |
| `internet_egress` | source is mapped, destination is a public address | 0.09 |
| `unknown` | zones unmapped (or destination is private/unresolvable) | unpriced; volume is still tracked |
unknown is the honest fallback: the bytes are counted, but no dollar figure is
attached (the default price table simply has no rate for it). Classification
uses longest-prefix matching, so a /24 rule wins over an overlapping /16.
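The classification rules can be sketched in a few lines of Python using the standard `ipaddress` module. This is a sketch under stated assumptions rather than the engine's actual code: the helper names are hypothetical, "public" is approximated with `is_global`, and the region is derived by dropping a trailing zone letter.

```python
import ipaddress

def lookup_zone(addr, zone_rules):
    """Longest-prefix match: the most specific CIDR containing addr wins."""
    ip = ipaddress.ip_address(addr)
    best = None  # (network, zone) of the most specific match so far
    for cidr, zone in zone_rules.items():
        net = ipaddress.ip_network(cidr)
        if ip.version == net.version and ip in net:
            if best is None or net.prefixlen > best[0].prefixlen:
                best = (net, zone)
    return best[1] if best else None

def region_of(zone):
    """Assumed derivation: us-east-1a -> us-east-1 (drop the zone letter)."""
    return zone[:-1] if zone and zone[-1].isalpha() else zone

def classify(src, dst, zone_rules):
    """Map a flow's two endpoints to one of the traffic classes."""
    src_zone = lookup_zone(src, zone_rules)
    dst_zone = lookup_zone(dst, zone_rules)
    if src_zone is None:
        return "unknown"
    if dst_zone is None:
        # Mapped source, unmapped destination: egress only if public.
        is_public = ipaddress.ip_address(dst).is_global
        return "internet_egress" if is_public else "unknown"
    if src_zone == dst_zone:
        return "same_zone"
    if region_of(src_zone) == region_of(dst_zone):
        return "inter_az"
    return "inter_region"
```

The linear scan keeps the sketch short; a production matcher would more likely use a prefix trie, but the outcome for overlapping rules is the same: the /24 wins over the /16.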
Zone and ownership maps are operator-declared:
```sh
# CIDR → zone (region is derived from the trailing zone letter, or set explicit zone/region)
export PROBECTL_COST_ZONES="10.0.1.0/24=us-east-1a,10.0.2.0/24=us-east-1b,10.9.0.0/16=eu-west-1a"
# CIDR → service:team (attribution + showback)
export PROBECTL_COST_SERVICES="10.0.1.0/24=checkout:payments,10.0.2.0/24=inventory:logistics"
# Monthly USD budgets; a breach raises ONE cost-plane signal per month
export PROBECTL_COST_BUDGETS="team:payments=500,service:checkout=120"
```
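For readers who want to see how these comma-separated specs decompose, the following Python sketch parses the three formats. The parser names and the strictness of the error handling are assumptions for illustration; probectl's own parsing may accept or reject inputs differently.

```python
import ipaddress

def _items(spec):
    """Split a comma-separated spec into (key, value) pairs, skipping blanks."""
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        if not sep or not key or not value:
            raise ValueError(f"expected key=value, got {item!r}")
        yield key.strip(), value.strip()

def parse_zones(spec):
    """'cidr=zone[/region],...' -> {cidr: (zone, region)}."""
    rules = {}
    for cidr, target in _items(spec):
        ipaddress.ip_network(cidr)  # raises ValueError on a bad CIDR
        zone, _, region = target.partition("/")
        if not zone:
            raise ValueError(f"missing zone for {cidr!r}")
        if not region:
            # Assumed derivation: drop the trailing zone letter.
            region = zone[:-1] if zone[-1].isalpha() else zone
        rules[cidr] = (zone, region)
    return rules

def parse_services(spec):
    """'cidr=service:team,...' -> {cidr: (service, team)}."""
    rules = {}
    for cidr, target in _items(spec):
        ipaddress.ip_network(cidr)
        service, sep, team = target.partition(":")
        if not sep or not service or not team:
            raise ValueError(f"expected service:team, got {target!r}")
        rules[cidr] = (service, team)
    return rules

def parse_budgets(spec):
    """'team:payments=500,...' -> {(scope, name): monthly USD limit}."""
    budgets = {}
    for key, amount in _items(spec):
        scope, _, name = key.partition(":")
        if scope not in ("team", "service") or not name:
            raise ValueError(f"expected team:<name> or service:<name>, got {key!r}")
        budgets[(scope, name)] = float(amount)
    return budgets
```

Splitting on `=` before `/` matters here, because the CIDR itself contains a slash.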
To override the prices, point PROBECTL_COST_PRICES_FILE at a JSON file in the
PriceTable shape. A malformed file fails startup — silently mispriced cost
data is worse than none. (To run with no pricing at all, set
PROBECTL_COST_PRICED=false instead.)
```json
{
  "per_gib": { "inter_az": 0.01, "inter_region": 0.02, "internet_egress": 0.08 },
  "source": "negotiated rates, FY26 agreement",
  "as_of": "2026-06-01",
  "license": "internal"
}
```
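The fail-fast behavior on a malformed file can be illustrated with a small Python loader. This is a sketch, not the real PriceTable validation: the specific checks (non-negative numeric rates, an ISO `as_of` date) are assumptions about what "malformed" might reasonably mean.

```python
import json
from datetime import date

def load_price_table(text):
    """Parse and validate a price-table JSON document, or raise ValueError.

    Raising instead of falling back mirrors the documented stance that a
    silently mispriced table is worse than none. (Hypothetical loader.)
    """
    try:
        raw = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"price table is not valid JSON: {exc}") from exc
    if not isinstance(raw, dict):
        raise ValueError("price table must be a JSON object")

    per_gib = raw.get("per_gib")
    if not isinstance(per_gib, dict):
        raise ValueError("per_gib must be an object of class -> $/GiB")
    for cls, rate in per_gib.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(rate, bool) or not isinstance(rate, (int, float)) or rate < 0:
            raise ValueError(f"rate for {cls!r} must be a non-negative number")

    try:
        as_of = date.fromisoformat(raw["as_of"])
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError("as_of must be a YYYY-MM-DD date") from exc

    return {
        "per_gib": {cls: float(rate) for cls, rate in per_gib.items()},
        "source": raw.get("source", ""),
        "as_of": as_of,
        "license": raw.get("license", ""),
    }
```

Keeping `as_of` as a real date, rather than a free-form string, is what lets a UI show how stale the rates are.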
Outputs
- `GET /v1/cost/summary` (permission `metrics.read`): the tenant's totals, by-class / by-service / by-team breakdowns, the top "chatty" zone pairs, a 7-day hourly trend, budget status, and the honesty flags (`cost_running`, `priced`, `zones_mapped`) plus the pricing provenance. A zone-pair conversation is flagged `chatty` once it crosses 1 GiB of paid cross-AZ or cross-region traffic (same-zone and internet traffic do not count toward chatty, since the point is to surface money quietly leaking across internal boundaries).
- Budget alerts: crossing a monthly budget raises a `cost.budget_exceeded` signal (plane `cost`) into the incident pipeline. It fires once per budget per month (alert-fatigue control) and re-arms on month rollover. Signals only: probectl never throttles traffic or touches your bill; a detection is a signal, never an enforcement point.
- Cost page (`/cost`): the light native summary: totals with pricing provenance, team showback, chatty cross-AZ conversations, budget status, and explicit volume-only / zones-unmapped notices. Deep dashboarding is federated to Grafana (see `docs/ecosystem-integrations.md`); cost series ride the same flow analytics the Grafana datasource already exposes, so there is no separate dashboard surface to maintain.
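The chatty flag and the once-per-month budget signal are both simple stateful rules, sketched below in Python. Everything here is illustrative: the class and function names, the signal's dictionary shape, treating zone pairs as directional, and reading "crosses" as "reaches at least 1 GiB" are all assumptions, not the engine's documented behavior.

```python
from datetime import date

GIB = 2**30
PAID_INTERNAL = {"inter_az", "inter_region"}  # classes that count toward chatty

def chatty_pairs(flows, threshold=GIB):
    """Return zone pairs whose paid internal traffic reaches the threshold.

    flows: iterable of (src_zone, dst_zone, traffic_class, nbytes).
    """
    totals = {}
    for src_zone, dst_zone, traffic_class, nbytes in flows:
        if traffic_class in PAID_INTERNAL:
            pair = (src_zone, dst_zone)
            totals[pair] = totals.get(pair, 0) + nbytes
    return {pair for pair, total in totals.items() if total >= threshold}

class BudgetAlerter:
    """Emit at most one signal per budget per calendar month."""

    def __init__(self, budgets):
        self.budgets = budgets  # {budget key: monthly limit in USD}
        self.spend = {}         # {(key, "YYYY-MM"): dollars so far}
        self.fired = set()      # {(key, "YYYY-MM")} already signalled

    def record(self, key, dollars, when):
        """Add spend; return a signal dict on the first breach this month."""
        limit = self.budgets.get(key)
        if limit is None:
            return None
        month = when.strftime("%Y-%m")
        slot = (key, month)
        self.spend[slot] = self.spend.get(slot, 0.0) + dollars
        if self.spend[slot] > limit and slot not in self.fired:
            self.fired.add(slot)
            # Hypothetical payload; only the signal name and plane come
            # from this document.
            return {"signal": "cost.budget_exceeded", "plane": "cost",
                    "budget": key, "month": month,
                    "spent": self.spend[slot], "limit": limit}
        return None
```

Keying the state by month means re-arming needs no explicit reset: the first breach in a new month lands in a fresh slot and fires again.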
Mechanics
```mermaid
%%{init: {'theme':'base','themeVariables':{'background':'#0d1117','primaryColor':'#161b22','primaryTextColor':'#e6edf3','primaryBorderColor':'#3b82f6','lineColor':'#8b949e','secondaryColor':'#21262d','tertiaryColor':'#0d1117','clusterBkg':'#161b22','clusterBorder':'#30363d','fontFamily':'ui-monospace, SFMono-Regular, Menlo, monospace'},'flowchart':{'curve':'basis','nodeSpacing':55,'rankSpacing':55,'padding':12}}}%%
flowchart LR
    FL[flow stream] --> CC[CostConsumer]
    CC --> E[cost.Engine<br/>tenant-partitioned]
    Z[zone rules] -.classify.-> E
    O[owner rules] -.attribute.-> E
    P[price table<br/>public list / override] -.price.-> E
    E -->|summary| API[GET /v1/cost/summary]
    E -->|"cost.budget_exceeded signal"| INC[incident pipeline]
    API --> UI[Cost page + Grafana]
```
All state is tenant-partitioned — tenant isolation is the platform's outermost
boundary (see the Non-negotiables).
Attribution maps are bounded: once a per-tenant map hits 1024 keys, further
entries collapse into (other) so memory can't grow without limit. A flow
record arriving without a tenant is dropped at the boundary. The in-memory
engine is rebuilt from the stream on restart — the durable, queryable series
live in the TSDB/Grafana path, not in this process.
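The bounded-map and tenant-boundary rules can be sketched as follows in Python. This illustrates the idea only; the class names are hypothetical, and in this sketch the `(other)` bucket sits on top of the cap, so a map holds at most `max_keys + 1` entries. The real engine may count the overflow bucket differently.

```python
OTHER = "(other)"
MAX_KEYS = 1024  # per-tenant cap quoted in this document

class BoundedCounter:
    """Byte totals per key, with overflow collapsed into one bucket."""

    def __init__(self, max_keys=MAX_KEYS):
        self.max_keys = max_keys
        self.totals = {}

    def add(self, key, nbytes):
        # A new key that would exceed the cap is folded into "(other)";
        # keys admitted earlier keep accumulating under their own name.
        if key not in self.totals and len(self.totals) >= self.max_keys:
            key = OTHER
        self.totals[key] = self.totals.get(key, 0) + nbytes

class TenantState:
    """One bounded map per tenant; records without a tenant are dropped."""

    def __init__(self, max_keys=MAX_KEYS):
        self.max_keys = max_keys
        self.by_tenant = {}

    def ingest(self, tenant, key, nbytes):
        if not tenant:
            return False  # dropped at the boundary
        counter = self.by_tenant.setdefault(tenant, BoundedCounter(self.max_keys))
        counter.add(key, nbytes)
        return True
```

The trade-off is deliberate: late-arriving keys lose their individual identity, but the byte totals stay exact and memory stays bounded.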
Configuration
| Variable | Default | Purpose |
|---|---|---|
| `PROBECTL_COST_ENABLED` | `true` | the engine + flow consumer (local-only processing) |
| `PROBECTL_COST_ZONES` | (none) | CIDR→zone rules (`cidr=zone[/region],…`) |
| `PROBECTL_COST_SERVICES` | (none) | CIDR→`service:team` attribution rules |
| `PROBECTL_COST_BUDGETS` | (none) | monthly USD budgets (`team:payments=500,…`) |
| `PROBECTL_COST_PRICES_FILE` | (none) | JSON price-table override (built-in public list rates otherwise) |
| `PROBECTL_COST_PRICED` | `true` | `false` switches to volume-only mode (no pricing at all) |