Documentation
Everything on this page is the behavior of the shipped tool, not aspiration. The guided run described below ships in v0.2.0; every other section applies from v0.1.1 onward.
Overview
sembl-stack runs the full pipeline around an AI coding agent and makes it accountable. The pipeline is nine stages; three you should understand before anything else:
- Bounds — before the agent runs, the task is turned into a declared contract: which paths may change (editable_paths), which must not (forbidden_areas), and how much churn is acceptable (churn_budget).
- Execute in a sandbox — the agent writes in a disposable clone of your repo. Your working tree is never touched during a run.
- The gate — Sembl judges the actual diff against the bounds and returns PASS, WARN, or BLOCK, deterministically. A BLOCK is fed back to the agent as a retry; it is never applied and never merged.
What sembl-stack does not promise: that the model writes better code. It promises the process around the model is correct, recorded, and can't be talked out of its verdict.
Install
pip install sembl-stack
Requirements:
- Python 3.10+
- git on PATH — the sandbox stage clones your repo, so runs happen inside git repositories
- Optionally, a coding agent for real runs: Claude Code, Aider, or OpenCode (see Executors). Without one, the mock executor runs the whole pipeline with zero credentials.
The core install includes the gate (sembl), the LangGraph
orchestrator, MCP transport, and the guided run. There is nothing extra to install for the
default experience.
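The two hard requirements listed above can be checked from Python before installing. This is a minimal sketch, not part of sembl-stack: check_requirements is a hypothetical helper name, and sembl-stack doctor remains the tool's own preflight once it is installed.

```python
import shutil
import sys


def check_requirements(version_info=None, which=shutil.which):
    """Return a list of problems; an empty list means the machine is ready.

    Hypothetical helper for illustration, not part of sembl-stack.
    """
    version_info = sys.version_info if version_info is None else version_info
    problems = []
    # The docs require Python 3.10 or newer.
    if tuple(version_info[:2]) < (3, 10):
        problems.append("Python 3.10+ is required")
    # The sandbox stage clones the repo, so git must be on PATH.
    if which("git") is None:
        problems.append("git was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("missing:", problem)
```

The version and the PATH lookup are injectable only so the check can be exercised without changing the machine it runs on.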
The guided run
Bare sembl-stack in a repo directory starts an inline, step-by-step
run in your own terminal — arrow-key choices and plain text prompts, in the style of the
Claude Code / Codex CLIs. No flags, no YAML editing, no separate app.
1 · Repository
The guide confirms the directory it detected. If it isn't a git repository, you're offered a
safe demo scaffold — a tiny module, a git repo, and a starter task, created in place without
touching anything that exists. Decline it and the guide tells you exactly what to do instead
(git init + commit, then relaunch).
2 · Agent & keys
One list of every way sembl-stack can run AI work, with live status — is Claude Code installed, which API-key environment variables are set, is OpenCode on PATH. Options you can't use yet are greyed out with the concrete fix next to them. Nothing here ever asks you to paste an API key: sembl-stack stores the name of the environment variable, never the value.
? How should AI work run?
❯ Claude Code (your login) found: C:\...\claude.CMD
API key (env var) none of ANTHROPIC_API_KEY, OPENAI_API_KEY, OPENROUTER_API_KEY are set
OpenCode (local/BYO model) not found on PATH — install and ensure `opencode` is on PATH
No AI — preview the mechanics always available (deterministic demo executor)
Your choice is saved as a profile (~/.sembl/profile.json) and the step is skipped on the next run. Re-open it any time with sembl-stack --reconfigure.
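The live status in this step comes down to two kinds of lookup: is an executable on PATH, and is an environment variable set. The sketch below shows that idea under stated assumptions: agent_status and its dictionary keys are hypothetical, and the real guide may check more than this. The executable names claude and opencode and the three variable names are the ones this page mentions.

```python
import os
import shutil

# Environment variable names this page lists for API-key executors.
API_KEY_VARS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY", "OPENROUTER_API_KEY")


def agent_status(env=None, which=shutil.which):
    """Report which ways of running AI work look usable on this machine.

    Hypothetical helper: only the *names* of set variables are returned,
    never their values.
    """
    env = os.environ if env is None else env
    return {
        "claude": which("claude") is not None,
        "api_key_vars": [name for name in API_KEY_VARS if env.get(name)],
        "opencode": which("opencode") is not None,
        "mock": True,  # the deterministic demo executor needs nothing
    }


if __name__ == "__main__":
    for option, state in agent_status().items():
        print(f"{option}: {state}")
```

Returning variable names rather than values mirrors the rule stated above: the profile records which variable to read, and the secret itself stays in the environment.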
3 · The task
Describe the change in plain English, then say which paths the agent may touch — prefilled
with your repo's top-level directories, and with your previous answers on later runs. The guide
writes task.yaml and bounds.json for you; you
never edit them by hand. Typos in paths are caught at the prompt
(src/componenets doesn't exist — did you mean src/components/?),
because bounds pointing at a directory that doesn't exist would silently constrain the agent to nothing.
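A "did you mean" check of this kind can be approximated with the standard library. The sketch below is an assumption about the technique, not the guide's actual implementation: check_path is a hypothetical name, and fuzzy matching via difflib is one reasonable way to produce the suggestion.

```python
import difflib
from pathlib import Path


def check_path(repo_root, entered):
    """Return (exists, suggestion) for a path typed at the prompt.

    Hypothetical helper. `suggestion` is the closest existing directory,
    or None when the path exists or nothing is similar enough.
    """
    root = Path(repo_root)
    wanted = entered.strip("/")
    if (root / wanted).is_dir():
        return True, None
    # Candidates: every directory in the repo, relative and POSIX-style,
    # skipping git's own metadata.
    candidates = [
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_dir() and ".git" not in p.relative_to(root).parts
    ]
    matches = difflib.get_close_matches(wanted, candidates, n=1, cutoff=0.6)
    return False, (matches[0] + "/" if matches else None)
```

With a repo containing src/components/, calling check_path(repo, "src/componenets") reports that the path is missing and suggests src/components/, which is the behavior the prompt message describes.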
4 · The run
The loop streams as it happens — including the part most tools hide, a first attempt getting blocked and retried:
[+] bounds
[ ] execute attempt 1 — agent writing (sandboxed)…
[+] execute attempt 1
[x] gate BLOCK
[+] execute attempt 2
[+] gate PASS
PASS (after 2 attempts)
receipt: .sembl/runs/20260704-175743-dc5901/
apply: sembl-stack apply 20260704-175743-dc5901
The output ends with the one command to run next. Nothing has touched your working tree yet; apply is the explicit step that does.
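The BLOCK, feedback, retry sequence in the transcript above can be sketched as a plain loop. This is an illustration of the control flow only: Verdict, run_loop, and the execute and gate callables are hypothetical stand-ins, not sembl-stack's internal API.

```python
from dataclasses import dataclass, field


@dataclass
class Verdict:
    status: str  # "PASS", "WARN", or "BLOCK"
    reasons: list = field(default_factory=list)


def run_loop(execute, gate, max_attempts=3, strict=True):
    """Run execute -> gate until the gate accepts or attempts run out.

    `execute(feedback)` returns a diff and `gate(diff)` returns a Verdict.
    Both are caller-supplied stand-ins for the real stages.
    """
    feedback = []
    verdict = None
    for attempt in range(1, max_attempts + 1):
        diff = execute(feedback)
        verdict = gate(diff)
        # Under strict mode a WARN is treated as a BLOCK.
        if verdict.status == "WARN" and strict:
            verdict = Verdict("BLOCK", verdict.reasons)
        if verdict.status != "BLOCK":
            return attempt, verdict
        # A BLOCK is never applied; its reasons become context for the retry.
        feedback = verdict.reasons
    return max_attempts, verdict
```

An executor that misbehaves once and then corrects itself yields attempt 2 with a PASS, matching the transcript.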
A guided run and a headless sembl-stack loop are byte-identical:
same adapters, same artifacts, same verdicts. The guide is a front-end to the machinery, not a
different machine.
Artifacts & the run store
Every hand-off between stages is a typed, serialized artifact. Every run leaves all of them
on disk under .sembl/runs/<run-id>/:
.sembl/runs/20260704-175743-dc5901/
├─ task.json # what was asked
├─ bounds.json # the declared contract the gate enforced
├─ change.json # the real unified diff the executor produced
├─ verdict.json # PASS/WARN/BLOCK + reasons + the diff's SHA-256
├─ merge-record.json # written at merge: what shipped, under whose PASS
└─ trace.json # every stage transition, timed
This is the paper trail. If a change reached production through sembl-stack, this directory answers how: what was asked, what contract was declared, what diff was actually produced, what the gate said about it, and who merged it under which verdict. Inspect any run with sembl-stack runs <id>.
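Because the run store is plain files, a script can confirm that a run directory is complete before archiving it. The sketch below uses only the file names from the layout above. audit_run is a hypothetical helper, and it assumes every artifact except merge-record.json is present for a finished run. The contents of the JSON files are not documented on this page, so the sketch does not parse them.

```python
from pathlib import Path

# File names taken from the run-store layout above.
EXPECTED = ["task.json", "bounds.json", "change.json", "verdict.json", "trace.json"]
MERGE_RECORD = "merge-record.json"  # only written at merge


def audit_run(run_dir):
    """Return (missing, merged) for one run directory.

    `missing` lists expected artifacts that are absent; `merged` is True
    when a merge record exists. Hypothetical helper for illustration.
    """
    run_dir = Path(run_dir)
    missing = [name for name in EXPECTED if not (run_dir / name).is_file()]
    merged = (run_dir / MERGE_RECORD).is_file()
    return missing, merged
```

For example, audit_run(".sembl/runs/20260704-175743-dc5901") would return an empty list and False for a run that passed but has not been merged yet.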
Command reference
Bare sembl-stack is the guided run. Everything below is the same
machinery, scriptable.
Daily commands
sembl-stack | The guided run: repo → agent & keys → task → watch the gated loop. --reconfigure re-opens the agent step. |
init | Scaffold sembl.stack.yaml + a starter task.yaml + bounds.json from a preset (--preset just-gate, gate+sandbox, or full-loop). In a fresh directory it also creates the demo module and git repo, so the starter loop is runnable immediately. |
doctor | Config-aware preflight: reads your config, then checks exactly the tools and credentials those layers need — git, the executor CLI, key env vars, a bounds source. Run it whenever something fails. |
loop task.yaml | The full wiring headlessly: plan → execute → gate, retry on BLOCK, record the run. What CI calls. run is an alias. |
runs [<id>] | List recorded runs, or inspect one run's complete artifact trail. |
apply <run-id> | Apply a run's accepted patch to your real tree. Recomputes the patch hash against the verdict's recorded subject first — a verdict issued for a different diff, or a BLOCK, is refused. |
Pipeline stages, individually
Each stage is also a standalone command, consuming and producing artifact files — useful for wiring sembl-stack into an existing pipeline one piece at a time.
bounds | L2: Task → Bounds. |
specgraph | L2: Task(+Bounds) → SpecGraph. |
context | L1: index the repo with the context graph and report its size. |
execute | L3: Task + Bounds → Change (runs the configured agent). |
verify | L5: Change + Bounds → Verdict (the gate, standalone). |
review | L5.5: diff → advisory ReviewReport. A signal, never a gate. |
reconcile | L5.5: SpecGraph + CodeGraph → advisory alignment report (--live runs it against the indexed repo). |
merge | L6.5: Verdict(PASS) → MergeRecord. Refuses to ship files the verdict never saw. |
deploy | L7: Verdict(PASS) → Delivery. |
postdeploy | L8: Delivery → Verdict — gate the live deployment (health + payload), trigger rollback on failure. |
Introspection
layers | List every available adapter per layer. |
dash | Live run dashboard (optional TUI: pip install "sembl-stack[tui]"). The same data is always available as text via runs. |
rsi | Per-executor iterations-to-green and cost readout across recorded runs. |
Configuration
One file: sembl.stack.yaml in your repo. This is the full shape,
as scaffolded by init --preset full-loop:
layers:
  spec: sembl        # L2 — task becomes bounds
  execute: claude    # L3 — claude | aider | opencode | mock
  sandbox: clone     # L4 — clone | worktree
  verify: sembl      # L5 — the gate
transport:
  spec: cli          # cli | mcp
  verify: cli
options:
  execute:
    model:           # blank = the operator's default model
    timeout: 900     # seconds before an attempt counts as failed
loop:
  max_attempts: 3
  strict: true       # WARN escalates to BLOCK when strict
tracing:
  langfuse: false
Precedence — how the effective config is resolved, identically for the guided run and loop:
- A sembl.stack.yaml in the repo wins outright.
- Otherwise, your saved profile (~/.sembl/profile.json — the agent-&-keys choice from the guided run) supplies the executor.
- Otherwise, built-in defaults.
Swapping a layer is a one-line edit — execute: aider — and the rest
of the pipeline doesn't notice, because stages only ever see each other's artifacts.
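The precedence rules above are short enough to express as one function. This is a sketch under assumptions, not the tool's resolver: resolve_config is a hypothetical name, the profile key "executor" is invented for illustration, and the DEFAULTS value is a placeholder because the built-in defaults are not spelled out on this page.

```python
import copy

# Placeholder defaults for illustration only.
DEFAULTS = {"layers": {"execute": "mock"}}


def resolve_config(repo_config=None, profile=None, defaults=DEFAULTS):
    """Resolve the effective config from already-parsed dictionaries.

    1. A repo config wins outright (no merging with anything else).
    2. Otherwise the saved profile supplies the executor.
    3. Otherwise the defaults apply unchanged.
    """
    if repo_config is not None:
        return copy.deepcopy(repo_config)
    config = copy.deepcopy(defaults)
    if profile and profile.get("executor"):
        config.setdefault("layers", {})["execute"] = profile["executor"]
    return config
```

Note that rule 1 is a replacement, not a merge: a repo file that names aider is used as-is even when the saved profile says claude.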
task.yaml
text: "Add a live character counter under the feedback textarea"
repo: "."
# spec_path: "./specs/001-feature"   # optional: a Spec Kit feature dir / tasks.md
Bounds come from spec_path when present, otherwise from a bounds.json next to the task file:
{
  "editable_paths": ["src/components/"],
  "forbidden_areas": ["supabase/", "scripts/"],
  "churn_budget": { "max_files": 20, "max_lines": 1000 }
}
Presets
just-gate | Gate any diff, nothing else. Needs only sembl. The smallest possible adoption: your diff, the bounds, a verdict. |
gate+sandbox (default) | The entire loop with the deterministic mock executor — zero API keys, zero cost. The mock deliberately misbehaves on its first attempt so you can watch BLOCK → feedback → retry → PASS work before connecting a real agent. |
full-loop | A real agent writes, the sandbox contains, Sembl gates. Requires an executor (see below). |
Executors & credentials
claude | Claude Code, driven headlessly (claude -p) under your own login. sembl-stack never handles or stores a token — install Claude Code, run claude once to log in, done. |
aider | Aider, using an API key from an environment variable. |
opencode | OpenCode — local or bring-your-own-model. Needs opencode on PATH. |
mock | Deterministic demo executor. No credentials, no network, same pipeline. |
For API-key executors, sembl-stack recognizes ANTHROPIC_API_KEY,
OPENAI_API_KEY, and OPENROUTER_API_KEY. The
profile stores which variable name to use — the value stays in your environment, is
never written to disk by sembl-stack, and is scrubbed from recorded executor output.
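Scrubbing a secret from recorded output can be as simple as replacing the variable's current value wherever it appears. The sketch below shows that idea only. scrub and the "[REDACTED]" placeholder are hypothetical, and the actual scrubbing logic in sembl-stack is not described on this page.

```python
import os

# Variable names this page lists for API-key executors.
API_KEY_VARS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY", "OPENROUTER_API_KEY")


def scrub(text, env=None, var_names=API_KEY_VARS, placeholder="[REDACTED]"):
    """Replace the value of each named variable in `text` with a placeholder.

    Hypothetical helper. Only variable *names* are configured; values are
    read from the environment at call time and never stored.
    """
    env = os.environ if env is None else env
    for name in var_names:
        value = env.get(name)
        if value:  # skip unset or empty variables
            text = text.replace(value, placeholder)
    return text
```

Skipping empty values matters: replacing an empty string would insert the placeholder between every character of the output.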
Verdicts, BLOCK & retry
The gate returns one of three verdicts about a diff, judged against the bounds:
PASS | Every changed file is inside editable_paths, nothing touches forbidden_areas, churn is within budget. |
WARN | Advisory findings. Under strict: true (the default), WARN escalates to BLOCK — the loop treats it as a failure. |
BLOCK | The diff violated the contract. The verdict's reasons are fed back to the executor as context and the loop retries, up to max_attempts. A BLOCK is never applied, never merged. |
The verdict is deterministic: the same diff against the same bounds always produces the same answer. There is no model in the gate to argue with.
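To make the PASS and BLOCK conditions concrete, here is a small deterministic check over the bounds.json shape shown earlier. It is a sketch, not Sembl's gate: judge is a hypothetical function, paths are matched by simple prefix (an assumption), churn is taken as a caller-supplied line count per file, and WARN is left out because this page does not say what produces one.

```python
def judge(changed, bounds):
    """Judge a change against bounds and return (status, reasons).

    `changed` maps each changed path to its number of changed lines.
    Hypothetical sketch: prefix matching and the reason wording are
    assumptions, and only PASS and BLOCK are modeled.
    """
    reasons = []
    forbidden = bounds.get("forbidden_areas", [])
    editable = bounds.get("editable_paths", [])
    # Sorted iteration keeps the reasons in a stable order.
    for path in sorted(changed):
        if any(path.startswith(area) for area in forbidden):
            reasons.append(f"{path}: inside a forbidden area")
        elif not any(path.startswith(prefix) for prefix in editable):
            reasons.append(f"{path}: outside editable_paths")
    budget = bounds.get("churn_budget", {})
    if "max_files" in budget and len(changed) > budget["max_files"]:
        reasons.append(f"{len(changed)} files changed, budget is {budget['max_files']}")
    total_lines = sum(changed.values())
    if "max_lines" in budget and total_lines > budget["max_lines"]:
        reasons.append(f"{total_lines} lines changed, budget is {budget['max_lines']}")
    return ("BLOCK" if reasons else "PASS"), reasons
```

The function reads nothing but its arguments, so the same diff against the same bounds always yields the same verdict and the same reasons.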
The verdict binding
Most agent pipelines stop at "the check passed." sembl-stack binds every verdict to the exact change it judged:
- Every verdict is stamped with the SHA-256 and file set of the diff it judged.
- apply recomputes the patch hash and refuses a verdict issued for a different patch.
- merge refuses if the merge would ship files the verdict never saw.
- The escape hatch exists (--skip-binding-check) but its use is recorded permanently in the MergeRecord. There is no quiet way past the gate.
The consequence: a PASS can only ever ship the change it was issued for. Swap the diff after the verdict and the pipeline stops.
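The binding checks listed above reduce to a hash comparison and a subset test. The sketch below illustrates both with hashlib. The function names and the verdict fields "status", "diff_sha256", and "files" are hypothetical, since the real verdict.json schema is not documented on this page.

```python
import hashlib


def stamp(diff_text, files):
    """Build the binding fields for a verdict (field names are hypothetical)."""
    return {
        "diff_sha256": hashlib.sha256(diff_text.encode("utf-8")).hexdigest(),
        "files": sorted(files),
    }


def can_apply(verdict, diff_text):
    """Refuse a BLOCK, and refuse a verdict issued for a different diff."""
    if verdict["status"] == "BLOCK":
        return False
    actual = hashlib.sha256(diff_text.encode("utf-8")).hexdigest()
    return actual == verdict["diff_sha256"]


def can_merge(verdict, files_to_ship):
    """Merge needs a PASS and must not ship files the verdict never saw."""
    return verdict["status"] == "PASS" and set(files_to_ship) <= set(verdict["files"])
```

Changing even one byte of the diff after the verdict changes its SHA-256, so can_apply returns False and the swap is caught.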
Advisory review (L5.5)
Separate from the gate, the stack can run a code-quality pass over the same diff. It is advisory by design — a signal for humans, never a merge condition, because a probabilistic reviewer must not hold override power over a deterministic contract.
llm | Bring-your-own agent-CLI reviewer — by default the same Claude Code login that can drive the executor. No extra account. |
coderabbit | CodeRabbit CLI, if you use it. An optional second opinion. |
Run it with sembl-stack review. The two axes are complementary and
measured: the gate catches contract violations reviewers miss, reviewers catch quality issues
outside the contract's vocabulary.
Headless & CI use
Everything the guide does is a command, so a gated loop drops into CI directly:
# .github/workflows/gated-change.yml (job steps)
- run: pip install sembl-stack
- run: sembl-stack doctor
- run: sembl-stack loop task.yaml # exits non-zero if no PASS
- run: sembl-stack runs # the receipt, in the job log
Because runs live in .sembl/runs/, the artifact trail can be uploaded as a build artifact — the audit record travels with the build that produced it.
apply and merge keep their binding checks in
CI too; there is no CI-mode bypass.
Troubleshooting
sembl-stack doctor | Always the first move. It reads your actual config and checks exactly what those layers need, with one concrete fix per failing check. |
"not a git repository" | The sandbox clones your repo, so runs need git. Run git init and make one commit, or accept the guide's demo scaffold in an empty directory. |
"could not derive bounds" | The task has no bounds source: add a bounds.json next to task.yaml, or set spec_path. The guided run and init always create one. |
Executor not found / not logged in | The agent step shows live status and the exact fix per option. Headlessly, doctor reports the same checks. |
Every attempt BLOCKS | Read the verdict's reasons in runs <id>; most often the bounds are narrower than the change actually requires. Widen editable_paths deliberately; that's the contract doing its job, not a bug. |
Wrong saved agent | sembl-stack --reconfigure re-opens the agent & keys step. |
Found a bug? Open an issue. Security reports: see SECURITY.md.