sembl-stack

Documentation

Everything on this page is the behavior of the shipped tool, not aspiration. The guided run described below ships in v0.2.0; every other section applies from v0.1.1 onward.

Overview

sembl-stack runs the full pipeline around an AI coding agent and makes it accountable. The pipeline is nine stages; three you should understand before anything else:

  • Bounds — before the agent runs, the task is turned into a declared contract: which paths may change (editable_paths), which must not (forbidden_areas), and how much churn is acceptable (churn_budget).
  • Execute in a sandbox — the agent writes in a disposable clone of your repo. Your working tree is never touched during a run.
  • The gate — Sembl judges the actual diff against the bounds and returns PASS, WARN, or BLOCK, deterministically. A BLOCK is fed back to the agent as a retry; it is never applied and never merged.

What sembl-stack does not promise: that the model writes better code. It promises the process around the model is correct, recorded, and can't be talked out of its verdict.

Install

pip install sembl-stack

Requirements:

  • Python 3.10+
  • git on PATH — the sandbox stage clones your repo, so runs happen inside git repositories
  • Optionally, a coding agent for real runs: Claude Code, Aider, or OpenCode (see Executors). Without one, the mock executor runs the whole pipeline with zero credentials.

The core install includes the gate (sembl), the LangGraph orchestrator, MCP transport, and the guided run. There is nothing extra to install for the default experience.

The guided run

Bare sembl-stack in a repo directory starts an inline, step-by-step run in your own terminal — arrow-key choices and plain text prompts, in the style of the Claude Code / Codex CLIs. No flags, no YAML editing, no separate app.

1 · Repository

The guide confirms the directory it detected. If it isn't a git repository, you're offered a safe demo scaffold — a tiny module, a git repo, and a starter task, created in place without touching anything that exists. Decline it and the guide tells you exactly what to do instead (git init + commit, then relaunch).

2 · Agent & keys

One list of every way sembl-stack can run AI work, with live status — is Claude Code installed, which API-key environment variables are set, is OpenCode on PATH. Options you can't use yet are greyed out with the concrete fix next to them. Nothing here ever asks you to paste an API key: sembl-stack stores the name of the environment variable, never the value.

    ? How should AI work run?
    ❯ Claude Code (your login)       found: C:\...\claude.CMD
      API key (env var)              none of ANTHROPIC_API_KEY, OPENAI_API_KEY, OPENROUTER_API_KEY are set
      OpenCode (local/BYO model)     not found on PATH — install and ensure `opencode` is on PATH
      No AI — preview the mechanics  always available (deterministic demo executor)

Your choice is saved as a profile (~/.sembl/profile.json) and the step is skipped on the next run. Re-open it any time with sembl-stack --reconfigure.

3 · The task

Describe the change in plain English, then say which paths the agent may touch — prefilled with your repo's top-level directories, and with your previous answers on later runs. The guide writes task.yaml and bounds.json for you; you never edit them by hand. Typos in paths are caught at the prompt (src/componenets doesn't exist — did you mean src/components/?), because bounds pointing at a directory that doesn't exist would silently constrain the agent to nothing.

4 · The run

The loop streams as it happens — including the part most tools hide, a first attempt getting blocked and retried:

    [+] bounds
    [ ] execute   attempt 1 — agent writing (sandboxed)…
    [+] execute   attempt 1
    [x] gate      BLOCK
    [+] execute   attempt 2
    [+] gate      PASS

    PASS (after 2 attempts)
    receipt: .sembl/runs/20260704-175743-dc5901/
    apply:   sembl-stack apply 20260704-175743-dc5901

The verdict ends with the one next command. Nothing has touched your working tree yet — apply is the explicit step that does.

A guided run and a headless sembl-stack loop go through the same pipeline: same adapters, same artifacts, same verdicts. The guide is a front-end to the machinery, not a different machine.

Artifacts & the run store

Every hand-off between stages is a typed, serialized artifact. Every run leaves all of them on disk under .sembl/runs/<run-id>/:

    .sembl/runs/20260704-175743-dc5901/
    ├─ task.json          # what was asked
    ├─ bounds.json        # the declared contract the gate enforced
    ├─ change.json        # the real unified diff the executor produced
    ├─ verdict.json       # PASS/WARN/BLOCK + reasons + the diff's SHA-256
    ├─ merge-record.json  # written at merge: what shipped, under whose PASS
    └─ trace.json         # every stage transition, timed

This is the paper trail. If a change reached production through sembl-stack, this directory answers how: what was asked, what contract was declared, what diff was actually produced, what the gate said about it, and who merged it under which verdict. Inspect any run with sembl-stack runs <id>.
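
When archiving runs, it can be useful to confirm that a run directory is complete before relying on it. The sketch below only checks for the file names listed in the layout above; it does not read their contents, since the internal schema of each artifact is not described here. The function name missing_artifacts is hypothetical.

```python
from pathlib import Path

# File names taken from the run directory layout above.
# merge-record.json is written only at merge, so it is treated as optional.
EXPECTED_ARTIFACTS = (
    "task.json",
    "bounds.json",
    "change.json",
    "verdict.json",
    "trace.json",
)


def missing_artifacts(run_dir: Path) -> list[str]:
    """List the expected artifact files that are absent from run_dir."""
    return [name for name in EXPECTED_ARTIFACTS if not (run_dir / name).is_file()]
```

An empty list means every pre-merge artifact is present; anything else names exactly which part of the trail is missing.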

Command reference

Bare sembl-stack is the guided run. Everything below is the same machinery, scriptable.

Daily commands

sembl-stack     The guided run: repo → agent & keys → task → watch the gated loop. --reconfigure re-opens the agent step.
init            Scaffold sembl.stack.yaml + a starter task.yaml + bounds.json from a preset (--preset just-gate | gate+sandbox | full-loop). In a fresh directory it also creates the demo module and git repo, so the starter loop is runnable immediately.
doctor          Config-aware preflight: reads your config, then checks exactly the tools and credentials those layers need — git, the executor CLI, key env vars, a bounds source. Run it whenever something fails.
loop task.yaml  The full wiring headlessly: plan → execute → gate, retry on BLOCK, record the run. What CI calls. run is an alias.
runs [<id>]     List recorded runs, or inspect one run's complete artifact trail.
apply <run-id>  Apply a run's accepted patch to your real tree. Recomputes the patch hash against the verdict's recorded subject first — a verdict issued for a different diff, or a BLOCK, is refused.

Pipeline stages, individually

Each stage is also a standalone command, consuming and producing artifact files — useful for wiring sembl-stack into an existing pipeline one piece at a time.

bounds      L2: Task → Bounds.
specgraph   L2: Task(+Bounds) → SpecGraph.
context     L1: index the repo with the context graph and report its size.
execute     L3: Task + Bounds → Change (runs the configured agent).
verify      L5: Change + Bounds → Verdict (the gate, standalone).
review      L5.5: diff → advisory ReviewReport. A signal, never a gate.
reconcile   L5.5: SpecGraph + CodeGraph → advisory alignment report (--live runs it against the indexed repo).
merge       L6.5: Verdict(PASS) → MergeRecord. Refuses to ship files the verdict never saw.
deploy      L7: Verdict(PASS) → Delivery.
postdeploy  L8: Delivery → Verdict — gate the live deployment (health + payload), trigger rollback on failure.

Introspection

layers  List every available adapter per layer.
dash    Live run dashboard (optional TUI: pip install "sembl-stack[tui]"). The same data is always available as text via runs.
rsi     Per-executor iterations-to-green and cost readout across recorded runs.

Configuration

One file: sembl.stack.yaml in your repo. This is the full shape, as scaffolded by init --preset full-loop:

    layers:
      spec: sembl        # L2 — task becomes bounds
      execute: claude    # L3 — claude | aider | opencode | mock
      sandbox: clone     # L4 — clone | worktree
      verify: sembl      # L5 — the gate
    transport:
      spec: cli          # cli | mcp
      verify: cli
    options:
      execute:
        model:           # blank = the operator's default model
        timeout: 900     # seconds before an attempt counts as failed
    loop:
      max_attempts: 3
      strict: true       # WARN escalates to BLOCK when strict
    tracing:
      langfuse: false

Precedence — how the effective config is resolved, identically for the guided run and loop:

  1. A sembl.stack.yaml in the repo wins outright.
  2. Otherwise, your saved profile (~/.sembl/profile.json — the agent-&-keys choice from the guided run) supplies the executor.
  3. Otherwise, built-in defaults.
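
The precedence order can be expressed as a short lookup. This is a sketch of the rule as listed above, not the tool's loader: config_source is a hypothetical name, and it only reports which source would win, without parsing either file.

```python
from pathlib import Path


def config_source(repo_root: Path, home: Path) -> str:
    """Report which config source applies, following the listed precedence."""
    # 1. A sembl.stack.yaml in the repo wins outright.
    if (repo_root / "sembl.stack.yaml").is_file():
        return "repo"
    # 2. Otherwise the saved profile supplies the executor.
    if (home / ".sembl" / "profile.json").is_file():
        return "profile"
    # 3. Otherwise built-in defaults apply.
    return "defaults"
```

In normal use the second argument would be Path.home(); taking it as a parameter keeps the sketch easy to exercise against temporary directories.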

Swapping a layer is a one-line edit — execute: aider — and the rest of the pipeline doesn't notice, because stages only ever see each other's artifacts.

task.yaml

    text: "Add a live character counter under the feedback textarea"
    repo: "."
    # spec_path: "./specs/001-feature"   # optional: a Spec Kit feature dir / tasks.md

Bounds come from spec_path when present, otherwise from a bounds.json next to the task file:

    {
      "editable_paths": ["src/components/"],
      "forbidden_areas": ["supabase/", "scripts/"],
      "churn_budget": { "max_files": 20, "max_lines": 1000 }
    }
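
For scripts that generate or lint bounds.json outside the guide, a small parser helps catch malformed contracts early. The sketch below assumes all three top-level keys shown above are required; whether sembl-stack itself treats any of them as optional is not stated here. The Bounds class and parse_bounds function are hypothetical names, and the two sanity checks are illustrative rather than the tool's own rules.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    editable_paths: tuple[str, ...]
    forbidden_areas: tuple[str, ...]
    max_files: int
    max_lines: int


def parse_bounds(text: str) -> Bounds:
    """Parse bounds.json text into a Bounds value, rejecting obvious mistakes."""
    raw = json.loads(text)
    budget = raw["churn_budget"]
    bounds = Bounds(
        editable_paths=tuple(raw["editable_paths"]),
        forbidden_areas=tuple(raw["forbidden_areas"]),
        max_files=int(budget["max_files"]),
        max_lines=int(budget["max_lines"]),
    )
    # Illustrative sanity checks (assumptions, not documented behavior).
    if not bounds.editable_paths:
        raise ValueError("editable_paths is empty: the agent could change nothing")
    overlap = set(bounds.editable_paths) & set(bounds.forbidden_areas)
    if overlap:
        raise ValueError(f"paths are both editable and forbidden: {sorted(overlap)}")
    return bounds
```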

Presets

just-gate               Gate any diff, nothing else. Needs only sembl. The smallest possible adoption: your diff, the bounds, a verdict.
gate+sandbox (default)  The entire loop with the deterministic mock executor — zero API keys, zero cost. The mock deliberately misbehaves on its first attempt so you can watch BLOCK → feedback → retry → PASS work before connecting a real agent.
full-loop               A real agent writes, the sandbox contains, Sembl gates. Requires an executor (see below).

Executors & credentials

claude    Claude Code, driven headlessly (claude -p) under your own login. sembl-stack never handles or stores a token — install Claude Code, run claude once to log in, done.
aider     Aider, using an API key from an environment variable.
opencode  OpenCode — local or bring-your-own-model. Needs opencode on PATH.
mock      Deterministic demo executor. No credentials, no network, same pipeline.

For API-key executors, sembl-stack recognizes ANTHROPIC_API_KEY, OPENAI_API_KEY, and OPENROUTER_API_KEY. The profile stores which variable name to use — the value stays in your environment, is never written to disk by sembl-stack, and is scrubbed from recorded executor output.
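
The scrubbing step can be pictured as a plain substitution over captured output. This is a minimal sketch of the idea, not the shipped scrubber: the function name scrub and the <NAME> placeholder format are assumptions; only the three variable names come from the text above.

```python
import os

# The three recognized variable names, as listed above.
KEY_VARS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY", "OPENROUTER_API_KEY")


def scrub(output: str, environ=None) -> str:
    """Replace any key value found in output with a <NAME> placeholder."""
    env = os.environ if environ is None else environ
    for name in KEY_VARS:
        value = env.get(name)
        if value:  # skip unset or empty variables
            output = output.replace(value, f"<{name}>")
    return output
```

Because only the variable name is substituted back in, a recorded log still shows which credential was in play without exposing its value.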

Verdicts, BLOCK & retry

The gate returns one of three verdicts about a diff, judged against the bounds:

PASS   Every changed file is inside editable_paths, nothing touches forbidden_areas, churn is within budget.
WARN   Advisory findings. Under strict: true (the default), WARN escalates to BLOCK — the loop treats it as a failure.
BLOCK  The diff violated the contract. The verdict's reasons are fed back to the executor as context and the loop retries, up to max_attempts. A BLOCK is never applied, never merged.

The verdict is deterministic: the same diff against the same bounds always produces the same answer. There is no model in the gate to argue with.
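
To make the determinism concrete, here is a minimal sketch of a gate as a pure function of the diff summary and the bounds. It is an illustration only, not Sembl's implementation: the function name judge, the input shape (a mapping of changed path to changed-line count), and the reason strings are assumptions, and the sketch models only PASS and BLOCK, not WARN.

```python
def _under(path: str, prefixes: list[str]) -> bool:
    """True if path equals a prefix or sits inside that directory."""
    for prefix in prefixes:
        base = prefix.rstrip("/")
        if path == base or path.startswith(base + "/"):
            return True
    return False


def judge(changed: dict[str, int], bounds: dict) -> tuple[str, list[str]]:
    """Judge a change against bounds; returns (verdict, reasons).

    `changed` maps each changed file path to its changed-line count.
    """
    reasons: list[str] = []
    for path in sorted(changed):  # sorted so reasons come out in a stable order
        if _under(path, bounds["forbidden_areas"]):
            reasons.append(f"{path}: inside forbidden_areas")
        elif not _under(path, bounds["editable_paths"]):
            reasons.append(f"{path}: outside editable_paths")
    budget = bounds["churn_budget"]
    if len(changed) > budget["max_files"]:
        reasons.append(f"{len(changed)} files changed, budget is {budget['max_files']}")
    total_lines = sum(changed.values())
    if total_lines > budget["max_lines"]:
        reasons.append(f"{total_lines} lines changed, budget is {budget['max_lines']}")
    return ("BLOCK" if reasons else "PASS"), reasons
```

The function reads no clock, no network, and no model output, so the same inputs always yield the same verdict and the same reasons.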

The verdict binding

Most agent pipelines stop at "the check passed." sembl-stack binds every verdict to the exact change it judged:

  • Every verdict is stamped with the SHA-256 and file set of the diff it judged.
  • apply recomputes the patch hash and refuses a verdict issued for a different patch.
  • merge refuses if the merge would ship files the verdict never saw.
  • The escape hatch exists (--skip-binding-check) but its use is recorded permanently in the MergeRecord. There is no quiet way past the gate.

The consequence: a PASS can only ever ship the change it was issued for. Swap the diff after the verdict and the pipeline stops.
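
The binding checks can be sketched with the standard library alone. This is an illustration under assumptions, not the real verdict.json schema: the field names result, diff_sha256, and files are hypothetical, as are the function names.

```python
import hashlib


def diff_digest(diff: str) -> str:
    """SHA-256 of the diff text, hex-encoded."""
    return hashlib.sha256(diff.encode("utf-8")).hexdigest()


def stamp_verdict(result: str, diff: str, files: set[str]) -> dict:
    """Bind a verdict to the exact diff and file set it judged."""
    return {"result": result, "diff_sha256": diff_digest(diff), "files": sorted(files)}


def may_apply(verdict: dict, diff: str) -> bool:
    """Allow apply only for a PASS issued for this exact diff."""
    return verdict["result"] == "PASS" and verdict["diff_sha256"] == diff_digest(diff)


def may_merge(verdict: dict, merge_files: set[str]) -> bool:
    """Allow merge only if every shipped file was seen by the verdict."""
    return verdict["result"] == "PASS" and merge_files <= set(verdict["files"])
```

Changing a single byte of the diff after the verdict changes the digest, so may_apply fails without needing to re-run the gate.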

Advisory review (L5.5)

Separate from the gate, the stack can run a code-quality pass over the same diff. It is advisory by design — a signal for humans, never a merge condition, because a probabilistic reviewer must not hold override power over a deterministic contract.

llm         Bring-your-own agent-CLI reviewer — by default the same Claude Code login that can drive the executor. No extra account.
coderabbit  CodeRabbit CLI, if you use it. An optional second opinion.

Run it with sembl-stack review. The two axes are complementary and measured: the gate catches contract violations reviewers miss, reviewers catch quality issues outside the contract's vocabulary.

Headless & CI use

Everything the guide does is a command, so a gated loop drops into CI directly:

    # .github/workflows/gated-change.yml (job steps)
    - run: pip install sembl-stack
    - run: sembl-stack doctor
    - run: sembl-stack loop task.yaml   # exits non-zero if no PASS
    - run: sembl-stack runs             # the receipt, in the job log

Because runs live in .sembl/runs/, the artifact trail can be uploaded as a build artifact — the audit record travels with the build that produced it. apply and merge keep their binding checks in CI too; there is no CI-mode bypass.

Troubleshooting

sembl-stack doctor                  Always the first move. It reads your actual config and checks exactly what those layers need, with the one concrete fix per failing check.
"not a git repository"              The sandbox clones your repo, so runs need git. git init + one commit, or accept the guide's demo scaffold in an empty directory.
"could not derive bounds"           The task has no bounds source: add a bounds.json next to task.yaml, or set spec_path. The guided run and init always create one.
Executor not found / not logged in  The agent step shows live status and the exact fix per option. Headlessly: doctor reports the same checks.
Every attempt BLOCKS                Read the verdict's reasons in runs <id> — most often the bounds are narrower than the change actually requires. Widen editable_paths deliberately; that's the contract doing its job, not a bug.
Wrong saved agent                   sembl-stack --reconfigure re-opens the agent & keys step.

Found a bug? Open an issue. Security reports: see SECURITY.md.