Enterprise Production Signoff Runbook
This runbook covers the evidence-collection steps that operators must run on their own infrastructure before a StacyVM enterprise multi-worker deployment is considered signed off for production. All automated CI gates (cluster conformance, mTLS smoke, runtime certification) must pass first.
Prerequisites
- StacyVM binary built for the target platform (make build or release artifact).
- A Postgres cluster accessible from the control plane.
- At least one worker host with the target runtime installed (Docker, Firecracker, etc.).
- A PKI that can issue TLS certificates (corporate CA, Vault, cert-manager, etc.).
1. Worker RPC mTLS smoke with deployment-issued certificates
CI validates mTLS with ephemeral certificates. This step proves the same path works with your actual PKI.
Prerequisites
Issue three certificates from your deployment CA:
| File | Subject | SAN |
|---|---|---|
| ca.crt | Your CA certificate | n/a |
| worker.crt / worker.key | Worker RPC server cert | IP:<worker-ip> or DNS:<worker-hostname> |
| cp.crt / cp.key | Control-plane client cert | CN=stacyvm-control-plane |
Run
Expected output
What it proves
- Control plane authenticates to worker RPC over TLS (mutual auth).
- Worker presents a valid server cert signed by the deployment CA.
- Sandbox spawn, status, exec, and destroy all succeed over the mTLS channel.
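To illustrate the mutual-auth handshake described above, here is a minimal Python sketch of a client that presents the control-plane certificate and verifies the worker certificate against the deployment CA. It is not StacyVM's RPC client and does not replace the smoke script. The helper names `control_plane_context` and `probe_worker` are hypothetical, the file names follow the certificate table, and the worker port is an assumption you must supply.

```python
import socket
import ssl


def control_plane_context():
    """Client-side TLS context; PROTOCOL_TLS_CLIENT verifies the server
    certificate chain and its SAN against the requested host by default."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def probe_worker(host, port, ca_file="ca.crt", cert_file="cp.crt", key_file="cp.key"):
    """Open one mTLS connection to the worker and return the negotiated TLS version.

    host must match the SAN in worker.crt (IP or DNS name).
    port is deployment-specific and not defined by this sketch.
    """
    ctx = control_plane_context()
    ctx.load_verify_locations(cafile=ca_file)                  # trust only the deployment CA
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)  # present the client cert
    with socket.create_connection((host, port), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

A successful return shows that the worker certificate chains to `ca.crt` and matches the host name or IP used. With TLS 1.3, a rejected client certificate may only surface on the first read or write after the handshake, so treat this as a quick probe rather than proof of the full spawn, status, exec, and destroy path.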
Record
Retain the script output (or a screenshot) as evidence. Reference it in your change-management ticket using the format:
2. Runtime certification on each worker host
Run this on every worker host for every runtime it will serve. The report becomes the durable evidence artifact.
Docker / gVisor / Kata
docker with gvisor or kata. The script will
check for the runtime in docker info and attempt docker run --runtime=runsc.
Firecracker
Auto-start mode (no external server needed)
If you want to certify the binary itself rather than a running cluster:
What the report covers
| Check | Meaning |
|---|---|
| docker.cli | Docker CLI found in PATH |
| docker.daemon | Docker daemon reachable |
| docker.seccomp | seccomp advertised by docker info |
| docker.run | docker run alpine echo ok succeeds |
| stacyvm.ready | StacyVM API responds to /api/v1/ready |
| stacyvm.provider_health | Provider health endpoint returns healthy |
| stacyvm.spawn | Sandbox spawned via target runtime |
| stacyvm.exec | Command executed in sandbox (exit 0) |
| stacyvm.destroy | Sandbox destroyed |
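Before filing a report, it can help to confirm mechanically that every expected check is present and passing. The Python sketch below shows one way to do that. The report layout is an assumption: it presumes each check name appears on its own line together with a PASS marker when it succeeded, which this runbook does not specify. `missing_or_failed` is a hypothetical helper, not part of StacyVM.

```python
# Check names copied from the table above, split so that non-Docker
# hosts (e.g. Firecracker) can be validated against the StacyVM checks only.
DOCKER_CHECKS = ["docker.cli", "docker.daemon", "docker.seccomp", "docker.run"]
STACYVM_CHECKS = [
    "stacyvm.ready", "stacyvm.provider_health",
    "stacyvm.spawn", "stacyvm.exec", "stacyvm.destroy",
]


def missing_or_failed(report_text, required):
    """Return the required checks that are absent or not marked PASS.

    Assumed layout: one check per line, with a PASS token on passing lines.
    """
    status = {}
    for line in report_text.splitlines():
        # Split on whitespace after removing table pipes and backticks,
        # then strip brackets so "[PASS]" and "PASS" are treated alike.
        tokens = line.replace("|", " ").replace("`", " ").split()
        cleaned = [t.strip("[]*:") for t in tokens]
        upper = {t.upper() for t in cleaned}
        for name in required:
            if name in cleaned:  # whole-token match, not substring
                status[name] = "PASS" in upper
    return [name for name in required if not status.get(name, False)]


if __name__ == "__main__":
    sample = "| stacyvm.ready | PASS |\n| stacyvm.spawn | FAIL |\n"
    print(missing_or_failed(sample, STACYVM_CHECKS))
```

An empty list means every required check was found and marked as passing. Adjust the parsing to the actual report layout before relying on it.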
Record
Retain the Markdown report. Every worker host that serves production traffic must have a report on file before go-live. Reference them in your change-management ticket:
3. Postgres migration rehearsal
Run before every binary upgrade that includes a database schema change.
Expected output
STACYVM_DATABASE_DSN set and the server will apply migrations automatically
on startup. Then re-run pg-rehearse to confirm.
4. OIDC/SSO sign-off
For deployments using auth.oidc_enabled, validate the configuration before
exposing the deployment to users.
All auth.oidc_* checks must be [PASS]. Common failure modes:
| Lint output | Fix |
|---|---|
| OIDC issuer is not set | Add auth.oidc_issuer pointing to your IdP |
| no OIDC verification key configured | Add auth.oidc_jwks_url or auth.oidc_public_key_file |
| OIDC audience not set | Add auth.oidc_audience matching your IdP’s client audience |
| no OIDC group-to-role mappings | Add at least auth.oidc_admin_groups |
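The table above can also be applied mechanically to saved lint output. The Python sketch below looks for the listed messages anywhere in the text and returns the matching fixes. It matches on message substrings only, so it does not depend on the exact line layout of the lint tool, which this runbook does not show. `suggest_fixes` is a hypothetical helper, not part of StacyVM.

```python
# Lint messages and fixes copied from the table above.
OIDC_FIXES = {
    "OIDC issuer is not set":
        "Add auth.oidc_issuer pointing to your IdP",
    "no OIDC verification key configured":
        "Add auth.oidc_jwks_url or auth.oidc_public_key_file",
    "OIDC audience not set":
        "Add auth.oidc_audience matching your IdP's client audience",
    "no OIDC group-to-role mappings":
        "Add at least auth.oidc_admin_groups",
}


def suggest_fixes(lint_output):
    """Return the fixes for every known OIDC lint message found in the output,
    in order of first appearance and without duplicates."""
    fixes = []
    for line in lint_output.splitlines():
        for message, fix in OIDC_FIXES.items():
            if message in line and fix not in fixes:
                fixes.append(fix)
    return fixes


if __name__ == "__main__":
    # Sample text is illustrative; the real lint line format may differ.
    sample = "OIDC issuer is not set\nno OIDC group-to-role mappings\n"
    for fix in suggest_fixes(sample):
        print(fix)
```

An empty result only means none of the four known messages appeared. It does not by itself confirm that every auth.oidc_* check passed.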

