probectl /docs GitHub ↗

CI pipeline overview

A deeply-technical-ELI5 map of what happens every time you push — what runs, why each gate exists, and what "green" actually proves. The source of truth is .github/workflows/ci.yml; this doc explains it.

When it runs

ci.yml triggers on push to main and on every pull request. A concurrency group keyed on the ref cancels any in-progress run when you push again, so a rapid second push abandons the first (no wasted runners). The workflow starts with only contents: read permission — it can read the repo but not write to it, so CI can never push to main.

The repo's other three workflows do not run on a normal push:

Workflow Trigger Purpose
ci.yml push to main + every PR the gate described here
release.yml push of a v* tag build/sign/publish a release — its require-green-ci job refuses any tag whose commit doesn't have a green ci run
nightly.yml daily 03:17 UTC (+ manual) the slow stuff: e2e (black-box full stack against the real compose dependencies), ingest-bench (consumer hot-path benchmarks), and scale-gate-m (the M-profile SLO regression guard, make scale-gate-m, plus an M-tier full-stack run against real Kafka + Prometheus)
security-scan.yml weekly, Mondays 05:17 UTC (+ manual) scheduled dependency-CVE scans (govulncheck, npm-audit, trivy-fs) that upload evidence artifacts and fail on Critical findings — so a new CVE in an unchanged repo still goes red

The shape: fan-out, then one umbrella

A push is like sending the change through a checkpoint with 32 specialist inspectors, each examining one thing, all in parallel. A 33rd job — verify-all — is the supervisor: it needs: 28 of them, runs with if: always(), writes a receipt artifact, and fails loudly listing any non-green gate. It is the one status you actually watch: green verify-all = the whole pipeline passed.

Four jobs sit outside the roll-up and report as their own checks: commitlint and dco run only on pull requests (on a push to main they'd register as "skipped" and falsely redden the umbrella), and image-scan and sbom are the image-vulnerability and bill-of-materials jobs. If you turn on branch protection, docs/ops/branch-protection.md recommends requiring verify-all plus those four explicitly.

The guiding principle, visible throughout the repo: every rule in the Non-negotiables has a matching CI gate, and the gates execute the thing rather than just inspecting it — they boot real kernels, real databases, real Kafka, and drill real failovers. Verification you ran, not verification you described.

The jobs, by stage

1. Cheap gatekeeping — "is this change even well-formed?"

  • action-pins — every GitHub Action is pinned to a commit SHA, not a moving tag, so a hijacked upstream action can't slip in; a second pass (scripts/check_supply_pins.sh) rejects :latest image tags and unpinned tool installs anywhere in the workflows (supply-chain).
  • secret-scan — gitleaks: no credentials/keys committed. Deliberate redaction-test fixtures are allowlisted in .gitleaks.toml.
  • commitlint — commit messages follow Conventional Commits (feat(...), fix(ci): ...).
  • dco — every commit carries a Signed-off-by (the Developer Certificate of Origin; what git commit -s adds).
  • no-devauth-in-release — proves the dev-auth bypass is physically absent from a release binary (symbol + literal absence + a boot-time refusal of PROBECTL_AUTH_MODE=dev), and compiles a tagged build to confirm the check isn't vacuous.

2. Static checks — compile, lint, spec; no services yet

  • lint-go / lint-python — gofmt + go vet + golangci-lint on Go, plus the repo's guard scripts (crypto primitives only via internal/crypto, no swallowed errors, hardened HTTP clients, no string-built SQL, unified TLS config); ruff/black on the BGP analyzer.
  • editions-gate — enforces the open-core boundary via make editions-gate: scripts/check_editions_imports.sh (self-tested) proves core never imports ee/, then the core-only build (-tags probectl_core) compiles and passes its tests with the ee/ tree linked out entirely.
  • fips-gate — the FIPS artifact (-tags probectl_fips, validated Go Cryptographic Module via GOFIPS140) builds, and its power-on self-test passes with the validated module active — the build is real, not just tagged.
  • protobuf lint, then buf breaking against main (the schemas are the agent/bus wire contract, so incompatible changes block the merge — see CONTRIBUTING.md), then regenerates and asserts the committed *.pb.go match the .proto files (no drift, no codegen poisoning).
  • openapi-gate — every registered /v1 route exactly matches the OpenAPI 3.1 spec; no undocumented routes ship.
  • migration-gate — DB migrations are additive / expand-only: it rejects destructive or blocking changes (drop column, column-type change, rename, adding NOT NULL) so release N's schema still works with N-1's code during a rolling upgrade.

3. Tests + coverage

  • test-go — the Go unit suite across every workspace module, run with -race, plus a fuzz smoke over the untrusted-input parsers, the backup-encryption/erasure gate, an agent-overhead bench smoke, and cross-compile checks (linux amd64+arm64; the endpoint agent additionally builds for macOS and Windows).
  • test-python — the BGP analyzer's tests, installed from a hash-locked requirements file (which is itself checked against pyproject.toml for drift), with an 85% coverage floor.
  • coverage — per-package statement-coverage floors on the service-free logic/parser/probe packages (scripts/check_coverage.sh); retains coverage.out + a per-package summary as a downloadable receipt artifact (see docs/quality/coverage.md).
  • web — the frontend: typecheck, lint, production build; the test step is the WCAG 2.2 AA accessibility + theme-token gate.
  • browser-worker — the Playwright browser-synthetic worker, run inside the official Playwright image (a real scripted login, timings + failure screenshot).

4. eBPF — building and loading real BPF objects

  • ebpf-kernel-matrix — boots real LTS kernels (5.15 and 6.6) in QEMU (via vimto) and actually loads + attaches the BPF programs (tracepoint + uprobes, one flush cycle). One matrix entry raises kernel lockdown to INTEGRITY inside the VM and proves load+attach still works on a hardened kernel. amd64 runs under KVM; the arm64 runner has no /dev/kvm, so it compiles + digest-verifies the arm64 objects but skips the (too-slow emulated) boot.
  • ebpf-image-live — the shipped probectl-ebpf-agent image must carry the live CO-RE loader (built from Dockerfile.ebpf, asserting -tags=ebpf), not the fixture replayer.

5. Real-infrastructure suites — spin up actual Postgres / Kafka / ClickHouse

  • cross-tenant-isolation — the tenant-isolation non-negotiable, executed (make test-isolation): RLS posture + cross-tenant injection against a real Postgres (over TLS, sslmode=verify-full like production) and a real ClickHouse (including its row-policy DDL).
  • integration — boots the control plane against real Postgres (TLS, verify-full) plus a remote-write Prometheus; migrations are idempotent, /readyz passes, the result pipeline round-trips — and internal/store must hold its 60% integration-coverage floor.
  • perf-smoke — a cheap, repeatable latency/throughput baseline; the first place a pooled-cardinality or RLS-cost regression would surface.
  • backup-drill — backup → wipe → restore actually runs; nonce-marked rows must survive a full database drop.
  • failover-drill — kills the primary, promotes the streaming standby, and times RTO (first write on the new primary) / RPO (acked rows lost).
  • load-smoke — synthetic agents → real Kafka → the production consumer (retry/DLQ + cardinality caps) → Prometheus remote-write → tenant-scoped PromQL, end to end.

6. AI quality

  • rca-eval — a blocking evaluation of the AI root-cause analysis against fixtures, with quality floors (0.85 / 0.85). If RCA quality regresses, the build fails — the AI is held to a measured bar, not just "it compiled."

7. Deploy / packaging config

  • helm-gate — the Helm charts lint for every reference profile and uphold the secure-by-default invariants (non-root/read-only pod, NetworkPolicy/PDB/HPA, drain probe, HSTS, no default credentials); the rendered manifests are schema-validated with kubeconform, the GitOps (ArgoCD/Flux) manifests are checked, and the shipped compose file must pass docker compose config.
  • terraform-gate — the Terraform module is fmt-clean and terraform validates via an example root that consumes it.

8. Supply chain & artifacts

  • dependency-scan — vulnerability scan of the dependency graph.
  • build-images — builds the multi-arch container images.
  • image-scan — scans those built images for known vulnerabilities.
  • sbom — emits a software bill of materials for the build.

9. The umbrella

  • verify-all — depends on the gates above, writes verify-all-summary.json as a receipt, and goes red listing any gate that wasn't green. This is the single signal of overall pipeline health.

What a green pipeline proves

When verify-all is green for a commit, you've demonstrated — on that exact commit — that the code:

  • compiles and passes lint in every mode (normal, FIPS, eBPF);
  • passes unit, real-Postgres, real-ClickHouse, and real-Kafka tests;
  • keeps tenants isolated at the storage layer (RLS, enforced and tested);
  • ships container images that scan clean, with an SBOM and SHA-pinned actions;
  • didn't drift its API spec, protobufs, or migrations, and keeps zero-downtime upgrade compatibility;
  • holds the AI's root-cause quality above its floor;
  • survives a backup/restore and a primary failover.

That is a heavier pipeline than most projects run — closer to what a regulated enterprise expects — which matches where probectl is aimed (sovereign, multi-tenant, source-available).

What the pipeline cannot self-apply

Branch-protection enforcement (making GitHub refuse to merge a red PR) is a server-side repository setting, not something a workflow in the tree can turn on. Until an admin configures it, every check above is advisory at the merge button — the recommended ruleset (require verify-all, plus the four jobs outside its roll-up) and the one-time console steps are in docs/ops/branch-protection.md. The release path has its own independent backstop either way: release.yml refuses to publish a tag whose commit lacks a green ci run.

Rendered live from github.com/imfeelingtheagi/probectl — found a mistake? edit this page.