Data-Workbench Codification
This session crystallised a working principle into code: LLMs cannot be trusted with invariants, so the destructive file operations of a data workbench (notebook patches, large-file writes, git checkouts, Downloads-to-repo syncs) now live in Python and hooks, not in prompt-level rules. Four documented-and-recurring gotchas were promoted from the "LLM should remember to…" layer to the "runtime refuses to proceed" layer. As evidence that the pipeline delivers, one patched notebook ran end-to-end on an A100 with zero warnings.
Architecture
Trust boundary
LLMs write intent. Code runtime performs actions and refuses when preconditions fail.
*(Diagram: the LLM layer proposes actions to `.workbench/proposed-actions.jsonl` and invokes the narrow CLI `python workbench/apply.py`. In the code runtime, the PreToolUse guard `data-workbench-guard.sh` denies or allows; the executor `workbench/apply.py` validates via the preflight `workbench/preflight.py`, mutates the large artefacts (`.ipynb` / `.parquet` / `.pkl`) and the git working tree, and appends to the `.workbench/log.jsonl` audit trail. The renderer `workbench/render.py` reads all three and writes the live dashboard `workbench.html`.)*
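The audit trail the executor appends to is a plain JSONL file. A minimal sketch of such an append-only log follows; the field names (`ts`, `action`, and the detail keys) are assumptions for illustration, not the real `apply.py` schema.

```python
import json
import time
from pathlib import Path

def log_action(log_path: Path, action: str, **detail) -> None:
    """Append one executor action as a single JSON object per line (JSONL)."""
    entry = {"ts": time.time(), "action": action, **detail}
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
```

Append-only JSONL keeps the trail greppable and crash-tolerant: a torn write corrupts at most the final line.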
Notebook lifecycle
Colab produces a fresh file on every download. The workbench discipline treats Downloads as canonical, the repo copy as derivative, and never overwrites in place.
*(Diagram: Colab downloads accumulate as `basic_notebook.ipynb`, `basic_notebook (1).ipynb`, `basic_notebook (2).ipynb`. `sync-colab-notebook` copies the newest into the repo file `basic/basic_notebook.ipynb`; `patch-notebook` writes to the separate target `basic/basic_notebook_patched.ipynb`, which is uploaded back to Colab. The guard hook denies in-place edits to the repo file; the compile check refuses the patched target on any SyntaxError.)*
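The `(N)`-suffix resolution that `sync-colab-notebook` relies on can be sketched like this; `newest_download` is a hypothetical helper illustrating the idea, not the real implementation.

```python
import re
from pathlib import Path
from typing import Optional

# Colab names repeated downloads "stem.ipynb", "stem (1).ipynb", "stem (2).ipynb", ...
_SUFFIX = re.compile(r"^(?P<stem>.+?)(?: \((?P<n>\d+)\))?\.ipynb$")

def newest_download(downloads: Path, stem: str) -> Optional[Path]:
    """Return the highest-numbered download for `stem` (the bare name counts as 0)."""
    best, best_n = None, -1
    for path in downloads.glob(f"{stem}*.ipynb"):
        m = _SUFFIX.match(path.name)
        if m and m.group("stem") == stem:   # skips e.g. stem_patched.ipynb
            n = int(m.group("n") or 0)
            if n > best_n:
                best, best_n = path, n
    return best
```

A real version might break ties by mtime as well; the sketch orders by suffix number only.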
Preflight and abort
Every expensive cell opens with checks. Any failure raises SystemExit(1), which halts Colab Run-All and which an `except Exception:` handler can't swallow (SystemExit subclasses BaseException, not Exception).

*(Diagram: the preflight checks print a banner on stderr with a recovery hint (`llm.model.to(cuda)`). With zero warnings, the run proceeds over 2,537 reviews at 10-15 it/s on an A100; with any warning, it raises SystemExit(1), lists every warning, and refuses the batch.)*
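A minimal sketch of the two preflight helpers, assuming only the contract stated in this doc (abort with `SystemExit(1)` and a loud stderr banner); the real `workbench/preflight.py` internals are not reproduced here.

```python
import sys
import warnings

def require_gpu(pipe) -> None:
    """Abort unless the pipeline's model is on CUDA (contract from the doc)."""
    device = getattr(pipe.model, "device", None)
    if device is None or device.type != "cuda":
        print("=" * 60, file=sys.stderr)
        print(f"[preflight] NOT on GPU (device={device}); "
              "recovery: llm.model.to('cuda')", file=sys.stderr)
        raise SystemExit(1)
    print(f"[preflight] GPU ok: {device}")

def require_clean_warnings(fn) -> None:
    """Run a warmup callable; strict rule: ANY captured warning aborts."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        fn()
    if caught:  # no keyword whitelist, so no stale list to false-green
        for w in caught:
            print(f"[preflight] {w.category.__name__}: {w.message}", file=sys.stderr)
        raise SystemExit(1)
```

The strictness is the point: the abort message lists every captured warning rather than matching against a hand-maintained substring list.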
The four pieces (what got built)
| File | What it does | Role |
|---|---|---|
| `workbench/hooks/data-workbench-guard.sh` | PreToolUse hook fragment, sourced from the global `file-guardrail-hook.sh`. Denies in-place `.ipynb` writes and in-place `.parquet`/`.pkl`/`.h5`/`.hdf5`/`.npz` writes ≥ 1 MB. Target must end in `_patched`, `_NEW`, `_draft`, or `_v\d+`. | Refusal at tool level. The LLM can't even attempt a bad write. |
| `workbench/apply.py` | Executor CLI. Subcommands: `sync-downloads`, `sync-colab-notebook` (handles the `(N)` suffix), `revert-file` (with Downloads-newer safety), `patch-notebook` (compile-check or refuse), `list-proposed`, `log-tail`. | Sole runtime actor for destructive ops. Logs every action to `.workbench/log.jsonl`. |
| `workbench/preflight.py` | Canonical `require_gpu(pipe)`, `require_files(*paths)`, `require_clean_warnings(fn)`. Each aborts with `SystemExit(1)` and a loud stderr banner on failure. | Cell-level refusal. Expensive pipelines don't run if preconditions fail. |
| `workbench/render.py` | HTML dashboard generator. Sections: git status, `~/Downloads` vs repo diff for watched file types, proposed actions from Claude, executor log tail. Auto-refreshes every 5 s. | Visibility surface. Pierre sees state changes without asking. |
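The guard's target test can be sketched in Python as follows. This is an illustration, not the real hook: the actual guard is bash and speaks the PreToolUse protocol, and the 1 MB threshold is assumed here to mean 10^6 bytes.

```python
import re

# Variant suffixes the doc says a destructive target must carry.
SAFE_SUFFIX = re.compile(r"(_patched|_NEW|_draft|_v\d+)\.(ipynb|parquet|pkl|h5|hdf5|npz)$")
BIG_ARTEFACTS = (".parquet", ".pkl", ".h5", ".hdf5", ".npz")

def write_allowed(path: str, size_bytes: int) -> bool:
    """True if an in-place write to `path` would pass the guard's filename rule."""
    if path.endswith(".ipynb"):
        return bool(SAFE_SUFFIX.search(path))
    if path.endswith(BIG_ARTEFACTS) and size_bytes >= 1_000_000:
        return bool(SAFE_SUFFIX.search(path))
    return True  # everything else falls through to the global guardrail
```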
Documented + Recurring → Code Enforcement
Four gotchas had been documented at prompt level but recurred because the rule lived in the LLM's memory, not in code; a fifth (mermaid syntax drift) had no rule at all. Each is now a refusal, not a reminder.
| # | Gotcha | Was (prompt rule) | Now (code-enforced) |
|---|---|---|---|
| 1 | In-place `.ipynb` overwrite stomps fresh Colab downloads | "Check Downloads before editing; diff first" (in skill since March). | Guard hook denies in-place `.ipynb` writes. Target must end in `_patched`/`_NEW`. Hook refusal is a structured `permissionDecision: deny`, so the LLM sees the rejection immediately. |
| 2 | HF pipeline device-pinned at creation runs silently on CPU | "Add an `assert pipe.model.device.type == 'cuda'`" (documented in the pinning learning since the morning of 2026-04-18). | `workbench/preflight.py :: require_gpu(pipe)`. `SystemExit(1)` banner with recovery instructions. Inline form in `patches/item32-gpu-preflight.json` for no-import Colab contexts. |
| 3 | Substring-whitelist warmup guard false-greens on new warnings | `gen_warnings = [w for w in caught if 'max_new_tokens' in str(w.message) or ...]` (keyword list maintained by hand). | Strict `assert not caught`: any warning captured in the warmup aborts, with every message printed. No new warning can slip past a stale keyword list. |
| 4 | `\n`-in-heredoc escape mangling produces broken Python in patched cells | "Escape carefully when building notebook source" (hit repeatedly for weeks). | `patch-notebook` compiles every modified code cell via `ast.parse` / `compile()`. Any `SyntaxError` → exit code 4 with a line-numbered report. Paired with a new learning mandating the line-list idiom over triple-quoted heredoc strings. |
| 5 | Mermaid syntax drift: training data is on v9/v10; the v11 parser is stricter | No rule. Rendered ` ` and unquoted labels with parens; broke silently on the deployed v11 CDN. | New learning `reference_mermaid_syntax.md` mandates the `@11` CDN, `<br>` not `<br/>`, and quoted labels for anything with special chars. Live docs URL captured. |
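The line-list idiom and the compile refusal from row 4 can be sketched together; the helper names are illustrative, and only the `ast.parse` check and the exit code 4 come from the doc.

```python
import ast

def build_cell(lines: list) -> str:
    """Line-list idiom: cell source as a list of lines joined with \\n,
    never a triple-quoted heredoc whose escapes can be mangled."""
    return "\n".join(lines)

def check_or_refuse(source: str) -> None:
    """Refuse the patch if a modified code cell doesn't parse (doc: exit code 4)."""
    try:
        ast.parse(source)
    except SyntaxError as exc:
        print(f"line {exc.lineno}: {exc.msg}")   # line-numbered report
        raise SystemExit(4)
```

Joining explicit lines sidesteps the escape-mangling entirely, and the parse check catches whatever slips through anyway.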
Item 32: Before and After
Same three sample reviews, same cell, same runtime. What changed is what the pipeline refuses to be vague about.
Before · basic_notebook (3).ipynb · 12:46
[stderr, cell 80 outputs[0]] The following generation flags are not valid and may be ignored: ['temperature', 'top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
[stdout, cell 80 outputs[1]] Pre-flight OK β no generation-config warnings. Running Qwen/Qwen2.5-7B-Instruct on 2,537 reviews (batch=16)... Review: Too many students from two local colleges go her leave rubbish... Topics: ['rubbish in changing rooms', 'overcrowding', 'disgusting behavior'] ...
Two output streams. The guard printed "Pre-flight OK" while stderr showed the actual warning: a false green. The run proceeded on CPU until the next cell exposed the device pinning.
After · basic_notebook_patched.ipynb · 13:43
[stdout, cell 80 outputs[0], sole output] [preflight] GPU ok: cuda:0 Pre-flight OK β no warnings captured. Running Qwen/Qwen2.5-7B-Instruct on 2,537 reviews (batch=16)... Review: Too many students from two local colleges go her leave rubbish... Topics: ['rubbish in changing rooms', 'overcrowding', 'disgusting behavior'] ...
Single clean stdout. `[preflight] GPU ok: cuda:0` confirms the device before anything heavy runs. The strict warmup captured zero warnings: the temperature/top_p/top_k warning is gone at source, not suppressed. The full pipeline ran cell 80 → cell 104 on the A100.
Why the warning existed: `model.generation_config` has non-None sampling defaults, and the user's `GenerationConfig` set `do_sample=False` without overriding them. The merged config therefore combined greedy mode with sampling params, so transformers warned. The fix nulls `temperature` / `top_p` / `top_k` explicitly in `BASE_GEN_CFG`; the warning never fires and the strict guard stays green.
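The override-vs-omit distinction is the whole fix; a toy merge illustrates it. The model defaults below are illustrative numbers, and the plain dict merge stands in for the `GenerationConfig` merge transformers performs.

```python
# Illustrative stand-ins for model.generation_config's non-None sampling defaults.
MODEL_DEFAULTS = {"do_sample": True, "temperature": 0.7, "top_p": 0.9, "top_k": 50}

# The fix: null the sampling knobs explicitly instead of omitting them.
BASE_GEN_CFG = {"do_sample": False, "temperature": None, "top_p": None, "top_k": None}

merged_fixed = {**MODEL_DEFAULTS, **BASE_GEN_CFG}      # greedy, no stale sampling params
merged_buggy = {**MODEL_DEFAULTS, "do_sample": False}  # omitted keys inherit defaults
```

`merged_buggy` is exactly the greedy-mode-plus-sampling-params shape that triggered the warning.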
Learnings captured
Updated
`brain-vault/learnings/feedback_pipeline_device_pinned_at_creation.md`: added a 2026-04-18 recurrence note, documented that the prior rule was prompt-level, and pointed to `workbench/preflight.py :: require_gpu(pipe)` as the canonical enforcement. Embedded the meta-principle explicitly: "documented + recurring → move to code enforcement."
New
- `brain-vault/learnings/feedback_notebook_patch_line_list_and_compile.md`: bans triple-quoted Python-source-in-heredoc construction for notebook cells. Mandates the line-list-joined-with-`"\n"` idiom. Documents the `compile()` refusal enforcement. Paired with the pinning learning as the prototype examples of the meta-pattern.
- `brain-vault/learnings/reference_mermaid_syntax.md`: current mermaid CDN (`@11`), quoted-labels rule, `<br>` vs `<br/>`, edge-label constraints. Training data through early 2025 references v9/v10 conventions that fail in v11's stricter parser. Source-of-truth URL captured.
Indexed
All three learnings are indexed in `brain-vault/learnings/_INDEX.md`. A fourth standalone learning for the meta-principle itself ("documented + recurring → code enforcement") is proposed; Pierre decides whether to split it out or leave it embedded.
File manifest
| Path | Status | Size / note |
|---|---|---|
| `pace-nlp-project/workbench/apply.py` | new + extended | ~280 lines (with compile-refuse) |
| `pace-nlp-project/workbench/render.py` | new | ~260 lines |
| `pace-nlp-project/workbench/preflight.py` | new | ~100 lines |
| `pace-nlp-project/workbench/hooks/data-workbench-guard.sh` | new | ~55 lines |
| `pace-nlp-project/workbench/patches/item32-gpu-preflight.json` | new | one-patch JSON for Colab |
| `pace-nlp-project/workbench/report/index.html` | new + fixed | this file · v11 mermaid |
| `pace-nlp-project/basic/basic_notebook_patched.ipynb` | new | 2.73 MB, three patches, syntax-clean |
| `pace-nlp-project/.gitignore` | edited | + `.workbench/` |
| `brain-vault/skills/workbench.md` | edited | 3 inserts: safe-execution, warning bullets, ledger stub |
| `brain-vault/learnings/feedback_pipeline_device_pinned_at_creation.md` | edited | recurrence note + enforcement pointer |
| `brain-vault/learnings/feedback_notebook_patch_line_list_and_compile.md` | new | line-list + `compile()` discipline |
| `brain-vault/learnings/reference_mermaid_syntax.md` | new | v11 syntax rules + CDN |
| `brain-vault/learnings/_INDEX.md` | edited | three new pointers |
| `brain-vault/sessions/2026-04-18-auditor-workbench-codification.md` | new | session handoff |
Next pickup
- Wire the guard hook: append one line to `C:/Users/acebu/projects/acebuddy/scripts/file-guardrail-hook.sh`, just before its final `exit 0`: `source "C:/Users/acebu/projects/pace-nlp-project/workbench/hooks/data-workbench-guard.sh"`. Until this is wired, the discipline is documentation plus narrow-CLI invocation, not enforcement at the LLM tool boundary.
- Sync the successful Colab run back: `python workbench/apply.py sync-colab-notebook basic/basic_notebook.ipynb`. Picks up the newest `basic_notebook*.ipynb` in Downloads automatically, including the `_patched` variant at 13:43.
- Revert or promote the dirty repo copy: `basic/basic_notebook.ipynb` still carries the earlier in-place patch. Clean it up via `python workbench/apply.py revert-file basic/basic_notebook.ipynb` (which safety-checks the Downloads version first), or let the sync in step 2 overwrite it with the fresh Colab result.
- Test the wired guard: in a new Claude Code session, ask to overwrite `basic_notebook.ipynb` in place. Expect a structured `[FILE-GUARD]` deny with a suggestion to write to `_patched.ipynb`.
- Commit everything: `/commit` covers the workbench tooling, patched notebook, skill updates, and learnings.
Deferred (documented, not built)
- Executor-level preflight refusal: extend `patch-notebook` so it refuses to write a cell whose source calls `pipe(` / `llm(` / `.generate(` / `clf(` / `.predict(` without `require_gpu(` or an equivalent CUDA assert appearing earlier in the same cell. Would make device-pinning regressions mechanically impossible via any workbench patch.
- Warnings ledger: `.workbench/warnings-ledger.jsonl` plus capture, normalize, and match against `brain-vault/learnings/` and `v_learnings`. Design documented in the workbench skill; no code yet.
- Lift `workbench/` to its own repo: once a second workbench-class project exists, promote the tooling out of `pace-nlp-project/workbench/` to a standalone `~/projects/data-workbench/`.
- v2 `persona_comments` table: flagged in the earlier spine/brain audit. Without it, Auditor can't produce per-persona rating tallies from captured comments, only qualitative notes.
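The first deferred item's cell scan could look roughly like this. The regexes are illustrative only; a real version would need to handle comments, strings, and multi-cell scope.

```python
import re

# Inference entry points that mark a cell as "heavy" (list from the doc).
HEAVY = re.compile(r"(\bpipe\(|\bllm\(|\.generate\(|\bclf\(|\.predict\()")
# Accepted preconditions: require_gpu(...) or an explicit CUDA device assert.
GUARDED = re.compile(r"(require_gpu\(|device\.type\s*==\s*['\"]cuda['\"])")

def cell_needs_refusal(source: str) -> bool:
    """True if the cell calls a heavy entry point with no GPU check before it."""
    m = HEAVY.search(source)
    if not m:
        return False
    return not GUARDED.search(source[: m.start()])
```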