Data-Workbench Codification
This session crystallised a working principle into code: LLMs cannot be trusted with invariants, so the destructive file operations of a data workbench (notebook patches, large-file writes, git checkouts, Downloads-to-repo syncs) now live in Python and hooks, not in prompt-level rules. Four documented-and-recurring gotchas were promoted from the "LLM should remember to…" layer to the "runtime refuses to proceed" layer. As evidence that the pipeline delivers, one patched notebook ran end-to-end on an A100 with zero warnings.
Architecture
Trust boundary
LLMs write intent. Code runtime performs actions and refuses when preconditions fail.
*(Diagram: the LLM layer proposes actions to `.workbench/proposed-actions.jsonl` and invokes the narrow CLI `python workbench/apply.py`. In the code runtime, the PreToolUse guard `data-workbench-guard.sh` denies or allows; the executor `workbench/apply.py` validates via the preflight `workbench/preflight.py`, mutates the large artefacts (`.ipynb` / `.parquet` / `.pkl`) and the git working tree, and appends to the `.workbench/log.jsonl` audit trail. The renderer `workbench/render.py` reads all three and writes the live dashboard `workbench.html`.)*
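The audit trail the executor appends to is a plain JSONL file. A minimal sketch of such an append-only log follows; the field names (`ts`, `action`, and the detail keys) are assumptions for illustration, not the real `apply.py` schema.

```python
import json
import time
from pathlib import Path

def log_action(log_path: Path, action: str, **detail) -> None:
    """Append one executor action as a single JSON object per line (JSONL)."""
    entry = {"ts": time.time(), "action": action, **detail}
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
```

Append-only JSONL keeps the trail greppable and crash-tolerant: a torn write corrupts at most the final line.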
Notebook lifecycle
Colab produces a fresh file on every download. The workbench discipline treats Downloads as canonical, the repo copy as derivative, and never overwrites in place.
*(Diagram: Colab downloads accumulate as `basic_notebook.ipynb`, `basic_notebook (1).ipynb`, `basic_notebook (2).ipynb`. `sync-colab-notebook` copies the newest into the repo file `basic/basic_notebook.ipynb`; `patch-notebook` writes to the separate target `basic/basic_notebook_patched.ipynb`, which is uploaded back to Colab. The guard hook denies in-place edits to the repo file; the compile check refuses the patched target on any SyntaxError.)*
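The `(N)`-suffix resolution that `sync-colab-notebook` relies on can be sketched like this; `newest_download` is a hypothetical helper illustrating the idea, not the real implementation.

```python
import re
from pathlib import Path
from typing import Optional

# Colab names repeated downloads "stem.ipynb", "stem (1).ipynb", "stem (2).ipynb", ...
_SUFFIX = re.compile(r"^(?P<stem>.+?)(?: \((?P<n>\d+)\))?\.ipynb$")

def newest_download(downloads: Path, stem: str) -> Optional[Path]:
    """Return the highest-numbered download for `stem` (the bare name counts as 0)."""
    best, best_n = None, -1
    for path in downloads.glob(f"{stem}*.ipynb"):
        m = _SUFFIX.match(path.name)
        if m and m.group("stem") == stem:   # skips e.g. stem_patched.ipynb
            n = int(m.group("n") or 0)
            if n > best_n:
                best, best_n = path, n
    return best
```

A real version might break ties by mtime as well; the sketch orders by suffix number only.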
Preflight and abort
Every expensive cell opens with checks. Any failure raises SystemExit(1), which halts Colab Run-All and which an `except Exception:` handler can't swallow (SystemExit subclasses BaseException, not Exception).

*(Diagram: the preflight checks print a banner on stderr with a recovery hint (`llm.model.to(cuda)`). With zero warnings, the run proceeds over 2,537 reviews at 10-15 it/s on an A100; with any warning, it raises SystemExit(1), lists every warning, and refuses the batch.)*
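A minimal sketch of the two preflight helpers, assuming only the contract stated in this doc (abort with `SystemExit(1)` and a loud stderr banner); the real `workbench/preflight.py` internals are not reproduced here.

```python
import sys
import warnings

def require_gpu(pipe) -> None:
    """Abort unless the pipeline's model is on CUDA (contract from the doc)."""
    device = getattr(pipe.model, "device", None)
    if device is None or device.type != "cuda":
        print("=" * 60, file=sys.stderr)
        print(f"[preflight] NOT on GPU (device={device}); "
              "recovery: llm.model.to('cuda')", file=sys.stderr)
        raise SystemExit(1)
    print(f"[preflight] GPU ok: {device}")

def require_clean_warnings(fn) -> None:
    """Run a warmup callable; strict rule: ANY captured warning aborts."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        fn()
    if caught:  # no keyword whitelist, so no stale list to false-green
        for w in caught:
            print(f"[preflight] {w.category.__name__}: {w.message}", file=sys.stderr)
        raise SystemExit(1)
```

The strictness is the point: the abort message lists every captured warning rather than matching against a hand-maintained substring list.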
The four pieces (what got built)
| File | What it does | Role |
|---|---|---|
| `workbench/hooks/data-workbench-guard.sh` | PreToolUse hook fragment, sourced from the global `file-guardrail-hook.sh`. Denies in-place `.ipynb` writes and in-place `.parquet`/`.pkl`/`.h5`/`.hdf5`/`.npz` writes ≥ 1 MB. Target must end in `_patched`, `_NEW`, `_draft`, or `_v\d+`. | Refusal at tool level. The LLM can't even attempt a bad write. |
| `workbench/apply.py` | Executor CLI. Subcommands: `sync-downloads`, `sync-colab-notebook` (handles the `(N)` suffix), `revert-file` (with Downloads-newer safety), `patch-notebook` (compile-check or refuse), `list-proposed`, `log-tail`. | Sole runtime actor for destructive ops. Logs every action to `.workbench/log.jsonl`. |
| `workbench/preflight.py` | Canonical `require_gpu(pipe)`, `require_files(*paths)`, `require_clean_warnings(fn)`. Each aborts with `SystemExit(1)` and a loud stderr banner on failure. | Cell-level refusal. Expensive pipelines don't run if preconditions fail. |
| `workbench/render.py` | HTML dashboard generator. Sections: git status, `~/Downloads` vs repo diff for watched file types, proposed actions from Claude, executor log tail. Auto-refreshes every 5 s. | Visibility surface. Pierre sees state changes without asking. |
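The guard's target test can be sketched in Python as follows. This is an illustration, not the real hook: the actual guard is bash and speaks the PreToolUse protocol, and the 1 MB threshold is assumed here to mean 10^6 bytes.

```python
import re

# Variant suffixes the doc says a destructive target must carry.
SAFE_SUFFIX = re.compile(r"(_patched|_NEW|_draft|_v\d+)\.(ipynb|parquet|pkl|h5|hdf5|npz)$")
BIG_ARTEFACTS = (".parquet", ".pkl", ".h5", ".hdf5", ".npz")

def write_allowed(path: str, size_bytes: int) -> bool:
    """True if an in-place write to `path` would pass the guard's filename rule."""
    if path.endswith(".ipynb"):
        return bool(SAFE_SUFFIX.search(path))
    if path.endswith(BIG_ARTEFACTS) and size_bytes >= 1_000_000:
        return bool(SAFE_SUFFIX.search(path))
    return True  # everything else falls through to the global guardrail
```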
Documented + Recurring → Code Enforcement
Four gotchas had been documented at prompt level but recurred because the rule lived in the LLM's memory, not in code; a fifth (mermaid syntax drift) had no rule at all. Each is now a refusal, not a reminder.
| # | Gotcha | Was (prompt rule) | Now (code-enforced) |
|---|---|---|---|
| 1 | In-place `.ipynb` overwrite stomps fresh Colab downloads | "Check Downloads before editing; diff first" (in skill since March). | Guard hook denies in-place `.ipynb` writes. Target must end in `_patched`/`_NEW`. Hook refusal is a structured `permissionDecision: deny`, so the LLM sees the rejection immediately. |
| 2 | HF pipeline device-pinned at creation runs silently on CPU | "Add an `assert pipe.model.device.type == 'cuda'`" (documented in the pinning learning since the morning of 2026-04-18). | `workbench/preflight.py :: require_gpu(pipe)`. `SystemExit(1)` banner with recovery instructions. Inline form in `patches/item32-gpu-preflight.json` for no-import Colab contexts. |
| 3 | Substring-whitelist warmup guard false-greens on new warnings | `gen_warnings = [w for w in caught if 'max_new_tokens' in str(w.message) or ...]` (keyword list maintained by hand). | Strict `assert not caught`: any warning captured in the warmup aborts, with every message printed. No new warning can slip past a stale keyword list. |
| 4 | `\n`-in-heredoc escape mangling produces broken Python in patched cells | "Escape carefully when building notebook source" (hit repeatedly for weeks). | `patch-notebook` compiles every modified code cell via `ast.parse` / `compile()`. Any `SyntaxError` → exit code 4 with a line-numbered report. Paired with a new learning mandating the line-list idiom over triple-quoted heredoc strings. |
| 5 | Mermaid syntax drift: training data is on v9/v10; the v11 parser is stricter | No rule. Rendered ` ` and unquoted labels with parens; broke silently on the deployed v11 CDN. | New learning `reference_mermaid_syntax.md` mandates the `@11` CDN, `<br>` not `<br/>`, and quoted labels for anything with special chars. Live docs URL captured. |
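The line-list idiom and the compile refusal from row 4 can be sketched together; the helper names are illustrative, and only the `ast.parse` check and the exit code 4 come from the doc.

```python
import ast

def build_cell(lines: list) -> str:
    """Line-list idiom: cell source as a list of lines joined with \\n,
    never a triple-quoted heredoc whose escapes can be mangled."""
    return "\n".join(lines)

def check_or_refuse(source: str) -> None:
    """Refuse the patch if a modified code cell doesn't parse (doc: exit code 4)."""
    try:
        ast.parse(source)
    except SyntaxError as exc:
        print(f"line {exc.lineno}: {exc.msg}")   # line-numbered report
        raise SystemExit(4)
```

Joining explicit lines sidesteps the escape-mangling entirely, and the parse check catches whatever slips through anyway.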
Item 32: Before and After
Same three sample reviews, same cell, same runtime. What changed is what the pipeline refuses to be vague about.
Before · basic_notebook (3).ipynb · 12:46
[stderr, cell 80 outputs[0]] The following generation flags are not valid and may be ignored: ['temperature', 'top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
[stdout, cell 80 outputs[1]] Pre-flight OK β no generation-config warnings. Running Qwen/Qwen2.5-7B-Instruct on 2,537 reviews (batch=16)... Review: Too many students from two local colleges go her leave rubbish... Topics: ['rubbish in changing rooms', 'overcrowding', 'disgusting behavior'] ...
Two output streams. The guard printed "Pre-flight OK" while stderr showed the actual warning: a false green. The run proceeded on CPU until the next cell exposed the device pinning.
After · basic_notebook_patched.ipynb · 13:43
[stdout, cell 80 outputs[0], sole output] [preflight] GPU ok: cuda:0 Pre-flight OK β no warnings captured. Running Qwen/Qwen2.5-7B-Instruct on 2,537 reviews (batch=16)... Review: Too many students from two local colleges go her leave rubbish... Topics: ['rubbish in changing rooms', 'overcrowding', 'disgusting behavior'] ...
Single clean stdout. `[preflight] GPU ok: cuda:0` confirms the device before anything heavy runs. The strict warmup captured zero warnings: the temperature/top_p/top_k warning is gone at source, not suppressed. The full pipeline ran cell 80 → cell 104 on the A100.
Why the warning existed: `model.generation_config` has non-None sampling defaults, and the user's `GenerationConfig` set `do_sample=False` without overriding them. The merged config therefore combined greedy mode with sampling params, so transformers warned. The fix nulls `temperature` / `top_p` / `top_k` explicitly in `BASE_GEN_CFG`; the warning never fires and the strict guard stays green.
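The override-vs-omit distinction is the whole fix; a toy merge illustrates it. The model defaults below are illustrative numbers, and the plain dict merge stands in for the `GenerationConfig` merge transformers performs.

```python
# Illustrative stand-ins for model.generation_config's non-None sampling defaults.
MODEL_DEFAULTS = {"do_sample": True, "temperature": 0.7, "top_p": 0.9, "top_k": 50}

# The fix: null the sampling knobs explicitly instead of omitting them.
BASE_GEN_CFG = {"do_sample": False, "temperature": None, "top_p": None, "top_k": None}

merged_fixed = {**MODEL_DEFAULTS, **BASE_GEN_CFG}      # greedy, no stale sampling params
merged_buggy = {**MODEL_DEFAULTS, "do_sample": False}  # omitted keys inherit defaults
```

`merged_buggy` is exactly the greedy-mode-plus-sampling-params shape that triggered the warning.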
Learnings captured
Updated
`brain-vault/learnings/feedback_pipeline_device_pinned_at_creation.md`: added a 2026-04-18 recurrence note, documented that the prior rule was prompt-level, and pointed to `workbench/preflight.py :: require_gpu(pipe)` as the canonical enforcement. Embedded the meta-principle explicitly: "documented + recurring → move to code enforcement."
New
- `brain-vault/learnings/feedback_notebook_patch_line_list_and_compile.md`: bans triple-quoted Python-source-in-heredoc construction for notebook cells. Mandates the line-list-joined-with-`"\n"` idiom. Documents the `compile()` refusal enforcement. Paired with the pinning learning as the prototype examples of the meta-pattern.
- `brain-vault/learnings/reference_mermaid_syntax.md`: current mermaid CDN (`@11`), quoted-labels rule, `<br>` vs `<br/>`, edge-label constraints. Training data through early 2025 references v9/v10 conventions that fail in v11's stricter parser. Source-of-truth URL captured.
Indexed
All three learnings are indexed in `brain-vault/learnings/_INDEX.md`. A fourth standalone learning for the meta-principle itself ("documented + recurring → code enforcement") is proposed; Pierre decides whether to split it out or leave it embedded.
File manifest
| Path | Status | Size / note |
|---|---|---|
| `pace-nlp-project/workbench/apply.py` | new + extended | ~280 lines (with compile-refuse) |
| `pace-nlp-project/workbench/render.py` | new | ~260 lines |
| `pace-nlp-project/workbench/preflight.py` | new | ~100 lines |
| `pace-nlp-project/workbench/hooks/data-workbench-guard.sh` | new | ~55 lines |
| `pace-nlp-project/workbench/patches/item32-gpu-preflight.json` | new | one-patch JSON for Colab |
| `pace-nlp-project/workbench/report/index.html` | new + fixed | this file · v11 mermaid |
| `pace-nlp-project/basic/basic_notebook_patched.ipynb` | new | 2.73 MB, three patches, syntax-clean |
| `pace-nlp-project/.gitignore` | edited | + `.workbench/` |
| `brain-vault/skills/workbench.md` | edited | 3 inserts: safe-execution, warning bullets, ledger stub |
| `brain-vault/learnings/feedback_pipeline_device_pinned_at_creation.md` | edited | recurrence note + enforcement pointer |
| `brain-vault/learnings/feedback_notebook_patch_line_list_and_compile.md` | new | line-list + `compile()` discipline |
| `brain-vault/learnings/reference_mermaid_syntax.md` | new | v11 syntax rules + CDN |
| `brain-vault/learnings/_INDEX.md` | edited | three new pointers |
| `brain-vault/sessions/2026-04-18-auditor-workbench-codification.md` | new | session handoff |
Next pickup
- Wire the guard hook: append one line to `C:/Users/acebu/projects/acebuddy/scripts/file-guardrail-hook.sh`, just before its final `exit 0`: `source "C:/Users/acebu/projects/pace-nlp-project/workbench/hooks/data-workbench-guard.sh"`. Until this is wired, the discipline is documentation plus narrow-CLI invocation, not enforcement at the LLM tool boundary.
- Sync the successful Colab run back: `python workbench/apply.py sync-colab-notebook basic/basic_notebook.ipynb`. Picks up the newest `basic_notebook*.ipynb` in Downloads automatically, including the `_patched` variant at 13:43.
- Revert or promote the dirty repo copy: `basic/basic_notebook.ipynb` still carries the earlier in-place patch. Clean it up via `python workbench/apply.py revert-file basic/basic_notebook.ipynb` (which safety-checks the Downloads version first), or let the sync in step 2 overwrite it with the fresh Colab result.
- Test the wired guard: in a new Claude Code session, ask to overwrite `basic_notebook.ipynb` in place. Expect a structured `[FILE-GUARD]` deny with a suggestion to write to `_patched.ipynb`.
- Commit everything: `/commit` covers the workbench tooling, patched notebook, skill updates, and learnings.
Deferred (documented, not built)
- Executor-level preflight refusal: extend `patch-notebook` so it refuses to write a cell whose source calls `pipe(` / `llm(` / `.generate(` / `clf(` / `.predict(` without `require_gpu(` or an equivalent CUDA assert appearing earlier in the same cell. Would make device-pinning regressions mechanically impossible via any workbench patch.
- Warnings ledger: `.workbench/warnings-ledger.jsonl` plus capture, normalize, and match against `brain-vault/learnings/` and `v_learnings`. Design documented in the workbench skill; no code yet.
- Lift `workbench/` to its own repo: once a second workbench-class project exists, promote the tooling out of `pace-nlp-project/workbench/` to a standalone `~/projects/data-workbench/`.
- v2 `persona_comments` table: flagged in the earlier spine/brain audit. Without it, Auditor can't produce per-persona rating tallies from captured comments, only qualitative notes.
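The first deferred item's cell scan could look roughly like this. The regexes are illustrative only; a real version would need to handle comments, strings, and multi-cell scope.

```python
import re

# Inference entry points that mark a cell as "heavy" (list from the doc).
HEAVY = re.compile(r"(\bpipe\(|\bllm\(|\.generate\(|\bclf\(|\.predict\()")
# Accepted preconditions: require_gpu(...) or an explicit CUDA device assert.
GUARDED = re.compile(r"(require_gpu\(|device\.type\s*==\s*['\"]cuda['\"])")

def cell_needs_refusal(source: str) -> bool:
    """True if the cell calls a heavy entry point with no GPU check before it."""
    m = HEAVY.search(source)
    if not m:
        return False
    return not GUARDED.search(source[: m.start()])
```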