πŸ”’ Data-Workbench Codification

Session-close report Β· 2026-04-18 Β· Auditor πŸ§‘β€βš–οΈ Β· session 4ee82f98-f7b0-4536-b663-139249451616 Β· Same pin as PACE.
Project: pace-nlp-project Β· Workbench-class

This session crystallised a working principle into code. LLMs cannot be trusted with invariants β€” so the destructive file operations of a data-workbench (notebook patches, large-file writes, git checkouts, Downloads-to-repo syncs) now live in Python and hooks, not in prompt-level rules. Four documented-and-recurring gotchas were pulled up from the "LLM should remember to…" layer to the "runtime refuses to proceed" layer. One patched notebook ran end-to-end on A100 with zero warnings as evidence that the pipeline delivers.

Core principle, crystallised: documented + recurring β†’ move the fix to code. Every new class of data-workbench bug gets a compile-time, pre-write, or pre-run check in the executor. Refusal is the point β€” a warning replicates the prompt-level failure mode.

Architecture

Trust boundary

LLMs write intent. Code runtime performs actions and refuses when preconditions fail.

```mermaid
flowchart TB
  subgraph LLM["LLM layer"]
    L1["Propose action<br>.workbench/proposed-actions.jsonl"]
    L2["Invoke narrow CLI<br>python workbench/apply.py"]
  end
  subgraph Runtime["Code runtime"]
    G["PreToolUse guard<br>data-workbench-guard.sh"]
    E["Executor<br>workbench/apply.py"]
    P["Preflight<br>workbench/preflight.py"]
    R["Renderer<br>workbench/render.py"]
  end
  subgraph Files["Files and state"]
    F1[".ipynb / .parquet / .pkl<br>large artefacts"]
    F2["git working tree"]
    F3[".workbench/log.jsonl<br>audit trail"]
  end
  L1 -->|propose| E
  L2 -->|invoke| G
  G -->|deny or allow| E
  E -->|validate| P
  E -->|mutate| F1
  E -->|mutate| F2
  E -->|append| F3
  R -->|read| F1
  R -->|read| F2
  R -->|read| F3
  R -->|write| H["workbench.html<br>live dashboard"]
  classDef llm fill:#1e1830,stroke:#a078d8,color:#e0d0ff
  classDef code fill:#12261a,stroke:#5fc95f,color:#c0f0c0
  classDef files fill:#261f12,stroke:#f0a040,color:#f0d8a0
  class L1,L2 llm
  class G,E,P,R code
  class F1,F2,F3,H files
```
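The audit trail in the diagram is an append-only JSONL file. A minimal sketch of what such a logger could look like (illustrative only; `log_action` and its field names are assumptions, not the actual `workbench/apply.py` code):

```python
import json
import time
from pathlib import Path

LOG_PATH = Path(".workbench/log.jsonl")

def log_action(action: str, target: str, outcome: str) -> dict:
    """Append one structured record per executor action; never rewrite history."""
    record = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "action": action,    # e.g. "patch-notebook"
        "target": target,    # file the action touched
        "outcome": outcome,  # "ok" or a refusal reason
    }
    LOG_PATH.parent.mkdir(parents=True, exist_ok=True)
    with LOG_PATH.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

Append-only matters here: a refused action leaves the same evidence as a completed one, so the dashboard's log tail shows both.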

Notebook lifecycle

Colab produces a fresh file on every download. The workbench discipline treats Downloads as canonical, the repo copy as derivative, and never overwrites in place.

```mermaid
flowchart TB
  A["Colab run on A100"] -->|download| B["~/Downloads/<br>basic_notebook.ipynb<br>basic_notebook (1).ipynb<br>basic_notebook (2).ipynb"]
  B -->|sync-colab-notebook| C["basic/basic_notebook.ipynb<br>repo file"]
  C -->|patch-notebook| D["basic/basic_notebook_patched.ipynb<br>separate target"]
  D -->|upload| A
  E["guard hook"] -.->|deny in-place| C
  F["compile check"] -.->|refuse on SyntaxError| D
  classDef cloud fill:#261212,stroke:#d07070,color:#f0b0b0
  classDef local fill:#12262a,stroke:#70c0d0,color:#b0e0e8
  classDef guard fill:#261f12,stroke:#f0a040,color:#f0d8a0
  class A cloud
  class B,C,D local
  class E,F guard
```

Preflight and abort

Every expensive cell opens with checks. Any failure raises SystemExit(1), so Colab Run-All halts; an `except Exception:` block can't swallow it, because SystemExit derives from BaseException.

```mermaid
flowchart TD
  Start["Cell 80 starts"] --> P1{"require_gpu"}
  P1 -->|CUDA present and on cuda| P2{"warmup guard"}
  P1 -->|CUDA missing or on CPU| AB1["SystemExit 1<br>banner on stderr<br>recovery: llm.model.to(cuda)"]
  P2 -->|0 warnings| Run["Run 2,537 reviews<br>10-15 it/s on A100"]
  P2 -->|any warning| AB2["SystemExit 1<br>list every warning<br>refuse batch"]
  classDef start fill:#12261a,stroke:#5fc95f,color:#c0f0c0
  classDef check fill:#12181e,stroke:#7090d0,color:#b0c8e0
  classDef abort fill:#261212,stroke:#d07070,color:#f0b0b0
  classDef run fill:#261f12,stroke:#f0a040,color:#f0d8a0
  class Start start
  class P1,P2 check
  class AB1,AB2 abort
  class Run run
```
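A minimal sketch of the two preflight checks in the diagram, using the signatures this report names (`require_gpu(pipe)`, `require_clean_warnings(fn)`); the real `workbench/preflight.py` may differ in detail:

```python
import sys
import warnings

def require_gpu(pipe) -> None:
    """Abort the cell if the pipeline's model is not on CUDA. SystemExit derives
    from BaseException, so `except Exception:` blocks cannot swallow it."""
    device = pipe.model.device
    if device.type != "cuda":
        print(f"[preflight] ABORT: model on {device}, expected cuda. "
              "Recovery: llm.model.to('cuda')", file=sys.stderr)
        raise SystemExit(1)

def require_clean_warnings(fn) -> None:
    """Run a warmup call; refuse the batch if *any* warning is captured.
    No keyword whitelist to go stale."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        fn()
    if caught:
        for w in caught:
            print(f"[preflight] warning: {w.message}", file=sys.stderr)
        raise SystemExit(1)
    print("Pre-flight OK β€” no warnings captured.")
```

The strictness of `require_clean_warnings` is deliberate: a never-seen warning class aborts exactly like a known one.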

The four pieces (what got built)

| File | What it does | Role |
|---|---|---|
| `workbench/hooks/data-workbench-guard.sh` | PreToolUse hook fragment, sourced from the global `file-guardrail-hook.sh`. Denies in-place `.ipynb` writes and in-place `.parquet`/`.pkl`/`.h5`/`.hdf5`/`.npz` writes β‰₯1 MB. Target must end in `_patched`, `_NEW`, `_draft`, or `_v\d+`. | Refusal at tool level. The LLM can't even attempt a bad write. |
| `workbench/apply.py` | Executor CLI. Subcommands: `sync-downloads`, `sync-colab-notebook` (handles the `(N)` suffix), `revert-file` (with Downloads-newer safety), `patch-notebook` (compile-check or refuse), `list-proposed`, `log-tail`. | Sole runtime actor for destructive ops. Logs every action to `.workbench/log.jsonl`. |
| `workbench/preflight.py` | Canonical `require_gpu(pipe)`, `require_files(*paths)`, `require_clean_warnings(fn)`. Each aborts with `SystemExit(1)` and a loud stderr banner on failure. | Cell-level refusal. Expensive pipelines don't run if preconditions fail. |
| `workbench/render.py` | HTML dashboard generator. Sections: git status, `~/Downloads` vs repo diff for watched file types, proposed actions from Claude, executor log tail. Auto-refreshes every 5 s. | Visibility surface. Pierre sees state changes without asking. |
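The guard's naming rule can be restated in a few lines of Python. This is a sketch of the same check the shell hook performs, not the hook itself, and it omits the β‰₯1 MB size condition for binary artefacts:

```python
import re
from pathlib import Path

# A write to a heavy artefact must target a clearly-derivative name.
SAFE_SUFFIX = re.compile(r"(_patched|_NEW|_draft|_v\d+)$")
GUARDED_EXT = {".ipynb", ".parquet", ".pkl", ".h5", ".hdf5", ".npz"}

def write_allowed(path: str) -> bool:
    """Deny in-place writes to guarded file types; allow everything else."""
    p = Path(path)
    if p.suffix not in GUARDED_EXT:
        return True                        # not a guarded artefact type
    return bool(SAFE_SUFFIX.search(p.stem))  # stem must end in a safe suffix
```

The point of encoding it as a predicate: the answer is deny-by-default for guarded extensions, so a new notebook name is blocked until it opts into the derivative-name convention.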

Documented + Recurring β†’ Code Enforcement

Four gotchas had been documented at prompt level yet recurred, because the rule lived in the LLM's memory, not in code; a fifth (mermaid syntax drift) had never even had a rule. Each is now a refusal, not a reminder.

| # | Gotcha | Was (prompt rule) | Now (code-enforced) |
|---|---|---|---|
| 1 | In-place `.ipynb` overwrite stomps fresh Colab downloads | "Check Downloads before editing; diff first" β€” in skill since March. | Guard hook denies in-place `.ipynb` writes. Target must end in `_patched`/`_NEW`. Hook refusal is a structured `permissionDecision: deny` β€” the LLM sees the rejection immediately. |
| 2 | HF pipeline device-pinned at creation runs silently on CPU | "Add an `assert pipe.model.device.type == 'cuda'`" β€” documented in the pinning learning since 2026-04-18 morning. | `workbench/preflight.py :: require_gpu(pipe)`. SystemExit(1) banner with recovery instructions. Inline form in `patches/item32-gpu-preflight.json` for no-import Colab contexts. |
| 3 | Substring-whitelist warmup guard false-greens on new warnings | `gen_warnings = [w for w in caught if 'max_new_tokens' in str(w.message) or ...]` β€” keyword list maintained by hand. | Strict `assert not caught` β€” any warning captured in the warmup aborts, with every message printed. No new warning can slip past a stale keyword list. |
| 4 | `\n`-in-heredoc escape mangling produces broken Python in patched cells | "Escape carefully when building notebook source" β€” hit repeatedly for weeks. | `patch-notebook` compiles every modified code cell via `ast.parse` / `compile()`. Any SyntaxError β†’ exit code 4 with a line-numbered report. Paired with a new learning mandating the line-list idiom over triple-quoted heredoc strings. |
| 5 | Mermaid syntax drift β€” training data is on v9/v10; the v11 parser is stricter | No rule. Rendered `<br/>` and unquoted labels with parens; broke silently in the deployed v11 CDN. | New learning `reference_mermaid_syntax.md` mandates the @11 CDN, `<br>` not `<br/>`, quoted labels for anything with special chars. Live docs URL captured. |
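Gotcha 4's compile-refuse step can be sketched like this (illustrative; `check_cells` is a hypothetical helper, and the real `patch-notebook` reports failures with exit code 4):

```python
import ast

def check_cells(cells: list[dict]) -> list[str]:
    """Compile every code cell; return line-numbered errors instead of ever
    writing a broken notebook. Notebook 'source' is a list of lines
    (the line-list idiom, not a triple-quoted heredoc)."""
    errors = []
    for i, cell in enumerate(cells):
        if cell.get("cell_type") != "code":
            continue
        src = "".join(cell["source"])
        try:
            ast.parse(src)
        except SyntaxError as e:
            errors.append(f"cell {i}, line {e.lineno}: {e.msg}")
    return errors
```

If the returned list is non-empty, the patcher refuses to write anything: the broken notebook never exists on disk.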

Item 32 β€” Before and After

Same three sample reviews, same cell, same runtime. What changed is what the pipeline refuses to be vague about.

Before Β· basic_notebook (3).ipynb Β· 12:46

```
[stderr, cell 80 outputs[0]]
The following generation flags are not valid and may be ignored: ['temperature', 'top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.

[stdout, cell 80 outputs[1]]
Pre-flight OK β€” no generation-config warnings.
Running Qwen/Qwen2.5-7B-Instruct on 2,537 reviews (batch=16)...

Review: Too many students from two local colleges go her leave rubbish...
Topics: ['rubbish in changing rooms', 'overcrowding', 'disgusting behavior']
...
```

Two output streams. The guard printed "Pre-flight OK" while stderr carried the actual warning: a false negative. The run proceeded on CPU until the next cell exposed the device pinning.

After Β· basic_notebook_patched.ipynb Β· 13:43

```
[stdout, cell 80 outputs[0] β€” sole output]
[preflight] GPU ok: cuda:0
Pre-flight OK β€” no warnings captured.
Running Qwen/Qwen2.5-7B-Instruct on 2,537 reviews (batch=16)...

Review: Too many students from two local colleges go her leave rubbish...
Topics: ['rubbish in changing rooms', 'overcrowding', 'disgusting behavior']
...
```

Single clean stdout. [preflight] GPU ok: cuda:0 confirms the device before anything heavy runs. Strict warmup captured zero warnings β€” the temperature/top_p/top_k warning is gone at source, not suppressed. Full pipeline ran cell 80 β†’ cell 104 on A100.

Why the "after" path only has one output: the temperature/top_p/top_k warning was emitted because Qwen's shipped model.generation_config has non-None sampling defaults, and the user's GenerationConfig set do_sample=False without overriding them. The merged config had greedy-mode + sampling params, so transformers warned. Fix: null temperature / top_p / top_k explicitly in BASE_GEN_CFG. Warning never fires; strict guard stays green.

Learnings captured

Updated: feedback_pipeline_device_pinned_at_creation.md (recurrence note + enforcement pointer).

New: feedback_notebook_patch_line_list_and_compile.md, reference_mermaid_syntax.md.

Indexed: three new pointers in _INDEX.md.

All three learnings are in brain-vault/learnings/_INDEX.md. A fourth standalone learning for the meta-principle itself β€” "documented + recurring β†’ code enforcement" β€” is proposed and open for Pierre to decide whether to split out or leave embedded.

File manifest

| Path | Status | Size / note |
|---|---|---|
| pace-nlp-project/workbench/apply.py | new + extended | ~280 lines (with compile-refuse) |
| pace-nlp-project/workbench/render.py | new | ~260 lines |
| pace-nlp-project/workbench/preflight.py | new | ~100 lines |
| pace-nlp-project/workbench/hooks/data-workbench-guard.sh | new | ~55 lines |
| pace-nlp-project/workbench/patches/item32-gpu-preflight.json | new | one-patch JSON for Colab |
| pace-nlp-project/workbench/report/index.html | new + fixed | this file Β· v11 mermaid |
| pace-nlp-project/basic/basic_notebook_patched.ipynb | new | 2.73 MB, three patches, syntax-clean |
| pace-nlp-project/.gitignore | edited | + .workbench/ |
| brain-vault/skills/workbench.md | edited | 3 inserts: safe-execution, warning bullets, ledger stub |
| brain-vault/learnings/feedback_pipeline_device_pinned_at_creation.md | edited | recurrence note + enforcement pointer |
| brain-vault/learnings/feedback_notebook_patch_line_list_and_compile.md | new | line-list + compile() discipline |
| brain-vault/learnings/reference_mermaid_syntax.md | new | v11 syntax rules + CDN |
| brain-vault/learnings/_INDEX.md | edited | three new pointers |
| brain-vault/sessions/2026-04-18-auditor-workbench-codification.md | new | session handoff |

Next pickup

  1. Wire the guard hook β€” append one line to C:/Users/acebu/projects/acebuddy/scripts/file-guardrail-hook.sh, just before its final exit 0:
    source "C:/Users/acebu/projects/pace-nlp-project/workbench/hooks/data-workbench-guard.sh"
    Until this is wired, the discipline is documentation plus narrow-CLI invocation β€” not enforced at the LLM tool boundary.
  2. Sync the successful Colab run back β€” python workbench/apply.py sync-colab-notebook basic/basic_notebook.ipynb. Will pick up the newest basic_notebook*.ipynb in Downloads automatically, including the _patched variant at 13:43.
  3. Revert or promote the dirty repo copy β€” basic/basic_notebook.ipynb still sits on the in-place patch from earlier. Clean it up via python workbench/apply.py revert-file basic/basic_notebook.ipynb (safety checks the Downloads version first) or let the sync in step 2 overwrite it with the fresh Colab result.
  4. Test the wired guard β€” in a new Claude Code session, ask to overwrite basic_notebook.ipynb in place. Expect a structured [FILE-GUARD] deny with a suggestion to write to _patched.ipynb.
  5. Commit everything β€” /commit. Workbench tooling, patched notebook, skill updates, learnings.

Deferred (documented, not built)