Pairs with: Lecture 11 — Why Observability Belongs Inside the Harness and Lecture 12 — Why Every Session Must Leave a Clean State. Time: ~90 min. Difficulty: Advanced. Prerequisites: Module 09 checkpoint.
Module 10. Observability and Clean State
Why this module
./verify.sh tells you whether a session passed; it does not tell you what happened. When an agent's run finishes red, you want enough trace to attribute the failure to a defense layer without re-running the session. And when a run finishes green, you want the next session to start from a clean state — five dimensions, no half-applied edits, no orphaned .noted/ data, no stale feature_list.json in_progress entries that nobody is actually working on.
Lecture 11 splits observability into runtime (what did the system do) and process (why did we accept that). Lecture 12 names the five clean-state dimensions and the empirical observation that ~30% of "broken" sessions are actually previous sessions' debris. This module gives you both.
Concepts
- Runtime observability — durable artifacts the system emits at runtime: structured logs, traces, exit codes. Answers what happened.
- Process observability — durable artifacts the human/agent emits at decision time: sprint contract, evaluator rubric, decision log. Answers why this was accepted.
- Sprint contract — pre-task agreement: scope, verification standard, exclusions. Written before implementation; checked at completion.
- Evaluator rubric — structured scoring of the finished work along several dimensions (correctness, scope adherence, verification rigor, handoff readiness, etc.).
- Five clean-state dimensions:
  - build/typecheck: Layer 1 of verify.sh is green.
  - tests: Layers 2 and 3 are green.
  - progress: PROGRESS.md's Next Action is current and accurate.
  - artifacts: no scratch files (*.tmp, .DS_Store, debug console.log calls).
  - startup: ./init.sh exits 0 from a freshly-cleaned state.
→ Read Lecture 11 and Lecture 12 for the long-form treatments.
Lab
Step 1 — Add structured runtime logging
src/log.ts:
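```ts
import { mkdirSync, appendFileSync } from "node:fs";
import { dirname } from "node:path";

const LOG_PATH = process.env.NOTED_LOG ?? "logs/run.jsonl";

// Create the log's parent directory if it does not exist yet.
// dirname() yields "." for a bare filename, so no stray directory is created.
function ensure() {
  mkdirSync(dirname(LOG_PATH), { recursive: true });
}

// Append one JSON object per line (JSONL): timestamp, event name, extra fields.
export function log(event: string, fields: Record<string, unknown> = {}) {
  ensure();
  const line = JSON.stringify({ ts: new Date().toISOString(), event, ...fields });
  appendFileSync(LOG_PATH, line + "\n");
}
```

Use it in src/cli.ts (one cmd_start/cmd_end pair per command):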
```ts
import { log } from "./log.ts";
// ...
log("cmd_start", { cmd, args: rest });
try {
  // existing dispatch
  log("cmd_end", { cmd, exit: 0 });
} catch (e) {
  // Record the failure, then rethrow so the process still exits non-zero.
  const error = e instanceof Error ? e.message : String(e);
  log("cmd_end", { cmd, exit: 1, error });
  throw e;
}
```

Add logs/ to .gitignore. The course never commits log lines — only the schema.
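The start/end pairing above can also be factored into a small wrapper so that no command forgets its cmd_end. The following is a minimal sketch, not part of the course repo: `withCommandLog` and `memoryLog` are hypothetical names, the "add" command is invented for illustration, and the logger is injected so the example runs without touching the file system. It handles synchronous commands only; an async dispatch would need an awaiting variant.

```js
// Hypothetical helper: wraps one command so it always emits exactly one
// cmd_start and one cmd_end event. `log` is injected; in the lab it would
// be the `log` function exported from src/log.ts.
function withCommandLog(log, cmd, args, run) {
  log("cmd_start", { cmd, args });
  try {
    const result = run();
    log("cmd_end", { cmd, exit: 0 });
    return result;
  } catch (e) {
    const error = e instanceof Error ? e.message : String(e);
    log("cmd_end", { cmd, exit: 1, error });
    throw e; // preserve the original failure for the caller
  }
}

// In-memory logger standing in for logs/run.jsonl.
const events = [];
const memoryLog = (event, fields = {}) => events.push({ event, ...fields });

withCommandLog(memoryLog, "add", ["note one"], () => "ok");
try {
  withCommandLog(memoryLog, "ask", [], () => { throw new Error("no index"); });
} catch {
  // the real CLI would exit non-zero here
}

console.log(events.map((e) => `${e.event}:${e.exit ?? "-"}`).join(" "));
// prints "cmd_start:- cmd_end:0 cmd_start:- cmd_end:1"
```

Injecting the logger keeps the wrapper testable: the same function can write to the JSONL file in production and to an array in a unit test.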
Step 2 — Wire logs into the harness
Add to init.sh:
```sh
echo "==> Tail of logs/run.jsonl (last 5):"
tail -n 5 logs/run.jsonl 2>/dev/null || true
```

Now every clock-in shows the last few events from the previous session. If the previous session crashed, you see it — no more starting on a red build without realizing it.
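Spotting a crash by eye works for five lines; for longer logs the same check can be done mechanically. A minimal sketch, assuming the log schema from Step 1 and that commands run one at a time so starts and ends alternate: `parseRunLog` and `findUnfinished` are hypothetical helpers, and the sample lines (including their timestamps) are made up for illustration.

```js
// Parse JSONL text into event objects. A line that is not valid JSON
// (for example, one cut short by a crash mid-write) is skipped, not fatal.
function parseRunLog(text) {
  const events = [];
  for (const line of text.split("\n")) {
    if (!line.trim()) continue;
    try {
      events.push(JSON.parse(line));
    } catch {
      // skip the truncated or corrupt line
    }
  }
  return events;
}

// Return the cmd_start events that never got a matching cmd_end.
function findUnfinished(events) {
  const open = [];
  for (const e of events) {
    if (e.event === "cmd_start") open.push(e);
    else if (e.event === "cmd_end") open.pop();
  }
  return open;
}

// Illustrative log: one clean command, then one that died mid-write.
const sample = [
  '{"ts":"2025-01-01T10:00:00.000Z","event":"cmd_start","cmd":"ask","args":["q"]}',
  '{"ts":"2025-01-01T10:00:01.000Z","event":"cmd_end","cmd":"ask","exit":0}',
  '{"ts":"2025-01-01T10:05:00.000Z","event":"cmd_start","cmd":"ask","args":["q2"]}',
  '{"ts":"2025-01-01T10:05:00.500Z","event":"cmd_e',
].join("\n");

const unfinished = findUnfinished(parseRunLog(sample));
for (const e of unfinished) {
  console.log(`${e.cmd} started at ${e.ts} but never logged cmd_end`);
}
```

A non-empty result is a strong hint that the previous session did not exit cleanly, which is worth knowing before starting new work.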
Step 3 — Author a sprint contract
Copy ../../resources/openai-advanced/repo-template/PLANS.md as a reference for the format. Then create sprint-contract.md:
```md
# Sprint Contract — Module 10 sprint

## Scope
- Add structured runtime logging to `src/cli.ts`.
- Add a sprint contract (this file) and an evaluator rubric.
- Write `scripts/clean-exit.mjs` that probes the five clean-state dimensions.
- Promote `clean-exit` into the bootstrap-contract probe.

## Verification standard
- `./verify.sh` exits 0.
- `node scripts/clean-exit.mjs` exits 0.
- `feature_list.json` shows no `in_progress` features at session end.

## Exclusions
- No new feature commands. No LLM integration (sidebar only, in `docs/LLM_SIDEBAR.md`).
- No changes to the citation rule.
- No removal of the boundary check.

## Definition of done
All four scope items shipped, all three verification standards green,
`PROGRESS.md` updated, commit recorded.
```

The contract is the written promise of what this session will and will not change. At the end you grade against it.
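The third verification standard is the only one without a script behind it, so it can be worth checking mechanically. A minimal sketch follows. The shape of feature_list.json is not shown in this module, so the array of objects with `id` and `status` fields is an assumption, as are the helper name `findInProgress` and the sample feature IDs.

```js
// Assumed shape (not confirmed by this module): feature_list.json is an
// array of objects with at least `id` and `status` fields.
function findInProgress(features) {
  return features
    .filter((f) => f.status === "in_progress")
    .map((f) => f.id);
}

// Illustrative data; in the lab this would come from
// JSON.parse(readFileSync("feature_list.json", "utf8")).
const sampleFeatures = [
  { id: "F-001", status: "done" },
  { id: "F-002", status: "in_progress" },
  { id: "F-003", status: "todo" },
];

const stale = findInProgress(sampleFeatures);
if (stale.length > 0) {
  console.log(`contract check failed: still in_progress: ${stale.join(", ")}`);
} else {
  console.log("contract check passed: no in_progress features");
}
// prints "contract check failed: still in_progress: F-002"
```

If your feature list uses different field names or nests the array under a key, adjust the filter accordingly.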
Step 4 — Author an evaluator rubric
Copy ../../resources/templates/evaluator-rubric.md. Save as evaluator-rubric.md. Tune it once for noted-cli:
```md
# Evaluator Rubric — noted-cli

Score each session 0/1/2 along these dimensions. A 0 anywhere is a fail.

| Dimension          | 0 (fail)                      | 1 (acceptable)      | 2 (strong)                       |
|--------------------|-------------------------------|---------------------|----------------------------------|
| Correctness        | verify.sh red                 | verify.sh green     | + e2e exercises new behavior     |
| Scope adherence    | edits outside `feature.scope` | scope respected     | + scope reduced where possible   |
| Verification rigor | unit tests only               | layers 1 + 2 green  | + layer 3 covers new behavior    |
| Handoff readiness  | PROGRESS.md stale             | PROGRESS.md current | + DECISIONS.md updated if needed |
| Clean state        | clean-exit fails              | clean-exit passes   | + logs/ tail informative         |
```

Step 5 — Build the clean-state probe
scripts/clean-exit.mjs:
```js
import { execSync } from "node:child_process";
import { readFileSync } from "node:fs";

let fail = 0;

// Run one dimension's probe; a thrown error marks the dimension as failed.
function check(name, fn) {
  process.stdout.write(`  ${name.padEnd(20)} `);
  try { fn(); console.log("PASS"); }
  catch (e) { console.log("FAIL — " + (e.message ?? e)); fail++; }
}

console.log("clean-exit dimensions:");
check("build/typecheck", () => execSync("pnpm typecheck", { stdio: "pipe" }));
check("tests", () => { execSync("pnpm test", { stdio: "pipe" }); execSync("pnpm e2e", { stdio: "pipe" }); });
check("progress", () => {
  // PROGRESS.md needs a Next Action section, and no line may end in a bare TODO.
  const md = readFileSync("PROGRESS.md", "utf8");
  if (!/## Next Action/.test(md)) throw new Error("no Next Action");
  if (/TODO\s*$/m.test(md)) throw new Error("Next Action ends with TODO");
});
check("artifacts", () => {
  // Any uncommitted change counts as debris, as do untracked scratch files.
  const dirty = execSync("git status --porcelain").toString();
  if (dirty.trim()) throw new Error("uncommitted changes");
  const tmp = execSync("git ls-files --others --exclude-standard | grep -E '\\.(tmp|swp|bak)$|^\\.DS_Store$' || true").toString();
  if (tmp.trim()) throw new Error("scratch files: " + tmp.trim());
});
check("startup", () => execSync("./init.sh", { stdio: "pipe" }));

if (fail) { console.error(`\n${fail} dimension(s) failed; not clean`); process.exit(1); }
console.log("\nclean state: all five dimensions green");
```

Run it:
```sh
node scripts/clean-exit.mjs
```

Iterate until all five are green. The artifacts check is usually the one that fails first — uncommitted edits are the most common debris.
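When the artifacts check fails, "uncommitted changes" alone does not say what kind of debris was left behind. A minimal sketch of grouping the `git status --porcelain` (v1) output by kind: `classifyPorcelain` is a hypothetical helper, the sample status text is made up, and rename entries (`old -> new`) are kept as a single path string for simplicity.

```js
// Group `git status --porcelain` (v1) lines by kind of debris.
// Each line is two status characters (index, work tree), a space, then the path.
// Do not trim the whole output first: a leading space is a meaningful status.
function classifyPorcelain(output) {
  const groups = { staged: [], unstaged: [], untracked: [] };
  for (const line of output.split("\n")) {
    if (line.length < 4) continue; // blank or malformed line
    const x = line[0];
    const y = line[1];
    const path = line.slice(3);
    if (x === "!") continue; // ignored entries, present only with --ignored
    if (x === "?" && y === "?") {
      groups.untracked.push(path);
    } else {
      if (x !== " ") groups.staged.push(path);
      if (y !== " ") groups.unstaged.push(path);
    }
  }
  return groups;
}

// Illustrative output from a session that stopped half-way.
const sampleStatus = [
  " M src/cli.ts",
  "A  src/log.ts",
  "?? notes.tmp",
  "MM PROGRESS.md",
].join("\n");

const groups = classifyPorcelain(sampleStatus);
console.log(groups);
```

A file with changes in both the index and the work tree (like `MM PROGRESS.md` here) appears in both `staged` and `unstaged`, which is usually the clearest sign of a half-applied edit.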
Step 6 — (Optional sidebar) Wire noted ask to a real LLM
If you want to see what changes when a real model touches the harness, this is the moment. Otherwise skip this step entirely — the rest of the course works offline.
Create docs/LLM_SIDEBAR.md:
```md
# Sidebar — wiring `noted ask` to Claude

This is optional course material; not required to finish Module 11.

## Steps
1. `pnpm add @anthropic-ai/sdk`
2. Set `ANTHROPIC_API_KEY` in your environment.
3. Add a flag `--llm` to `noted ask`. When set, the command:
   - retrieves top-k snippets via the existing keyword index;
   - calls Claude with a small prompt that includes the snippets and the
     query;
   - prints the model's answer followed by the citation lines.

## Caching
Use prompt caching: mark the snippet block with the SDK's
`cache_control: { type: "ephemeral" }` so repeated questions over the same
corpus are cheap.

## Why this is a sidebar
The course is about the harness, not the model. Whether `noted ask` uses
keyword retrieval or a frontier model, the verification rules from
`docs/citation-rule.md` and the layers in `verify.sh` are unchanged.
```

Step 7 — Update artifacts and commit
PROGRESS.md:
```md
## Current State
- Last verified: Module 10 checkpoint (`./verify.sh` 0, `node scripts/clean-exit.mjs` 0).
- Active feature: none.

## Next Action
Open Module 11 capstone; run the ablation study.
```

DECISIONS.md:
```md
## D-004 — `logs/run.jsonl` is the single runtime log
- Date: 2025-MM-DD (Module 10)
- Context: Multiple log files complicate session debugging. One JSONL file
  is greppable and easily tailed by `init.sh`.
- Consequence: All commands log via `src/log.ts`; no `console.log` for
  diagnostics. Style rule promoted.
```

```sh
git add .
git commit -q -m "module-10: runtime + process observability + clean-state probe"
```

Verification
```sh
./verify.sh >/dev/null 2>&1 && \
node scripts/clean-exit.mjs >/tmp/m10.log 2>&1 && \
grep -q "clean state: all five dimensions green" /tmp/m10.log && \
test -f sprint-contract.md && \
test -f evaluator-rubric.md && \
test -f src/log.ts && \
echo "M10 OK"
```

Expected:

```
M10 OK
```

Common pitfalls
- Logging to console.log. Mixed with command output, that breaks pipe-to-jq and Module 11's ablation script. Logs go to logs/run.jsonl only.
- Treating the sprint contract as a retrospective. It is prospective: written before the work. Updating it during the session to "match what you did" defeats the purpose.
- Letting clean-exit's artifacts check pass when git status is clean but ignored files exist. .gitignore is not the same as clean. Inspect ls -A if the probe disagrees with your eyes.
- Skipping init.sh in clean-exit. A clean state means init.sh runs from cold. If you do not run it, you have no proof the next session can start.
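The ignored-files pitfall can be narrowed with a small filter. The sketch below takes a list of ignored paths, for example from `git ls-files --others --ignored --exclude-standard`, and flags only those that look like scratch files. It is an illustration, not the course's probe: `findScratch` and the sample paths are hypothetical, and the pattern extends the probe's grep so that .DS_Store matches at any depth rather than only at the repository root.

```js
// Scratch patterns adapted from the probe's grep expression; .DS_Store is
// matched in any directory, not just the repository root.
const SCRATCH = /\.(tmp|swp|bak)$|(^|\/)\.DS_Store$/;

// `paths` is one ignored path per entry, e.g. the split output of
// `git ls-files --others --ignored --exclude-standard`.
function findScratch(paths) {
  return paths.filter((p) => SCRATCH.test(p));
}

// Illustrative list: the log file is ignored on purpose, the rest is debris.
const ignoredPaths = [
  "logs/run.jsonl",
  "dist/cli.js",
  "docs/.DS_Store",
  "draft.tmp",
];

console.log(findScratch(ignoredPaths));
// prints [ 'docs/.DS_Store', 'draft.tmp' ]
```

Intentionally ignored output such as logs/run.jsonl passes through untouched, so the filter only reports files that nobody meant to keep.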
Next
Module 11 — Capstone: Ablation Study. You have the full harness. Now run an agent against it twice — once with the harness intact, once with it stripped — and produce a quality document showing the measurable difference.
