Pairs with: Lecture 11 — Why Observability Belongs Inside the Harness and Lecture 12 — Why Every Session Must Leave a Clean State. Time: ~90 min. Difficulty: Advanced. Prerequisites: Module 09 checkpoint.

Module 10. Observability and Clean State

Why this module

./verify.sh tells you whether a session passed; it does not tell you what happened. When an agent's run finishes red, you want enough trace to attribute the failure to a defense layer without re-running the session. And when a run finishes green, you want the next session to start from a clean state — five dimensions, no half-applied edits, no orphaned .noted/ data, no stale feature_list.json in_progress entries that nobody is actually working on.

Lecture 11 splits observability into runtime (what did the system do) and process (why did we accept that). Lecture 12 names the five clean-state dimensions and the empirical observation that ~30% of "broken" sessions are actually previous sessions' debris. This module gives you both.

Concepts

Runtime observability — durable artifacts the system emits at runtime: structured logs, traces, exit codes. Answers what happened.
Process observability — durable artifacts the human/agent emits at decision time: sprint contract, evaluator rubric, decision log. Answers why this was accepted.
Sprint contract — pre-task agreement: scope, verification standard, exclusions. Written before implementation; checked at completion.
Evaluator rubric — structured scoring of the finished work along several dimensions (correctness, scope adherence, verification rigor, handoff readiness, etc.).
Five clean-state dimensions —
1. build/typecheck: Layer 1 of verify.sh is green.
2. tests: Layers 2 and 3 are green.
3. progress: PROGRESS.md's Next Action is current and accurate.
4. artifacts: no uncommitted changes and no scratch files (*.tmp, .DS_Store, leftover debug console.log calls).
5. startup: ./init.sh exits 0 from a freshly-cleaned state.
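The artifacts dimension is the most mechanical of the five. Below is a minimal sketch of a path classifier for it. It assumes the scratch patterns are `*.tmp` and `.DS_Store` from the list above, plus the editor leftovers `*.swp` and `*.bak` that the Step 5 probe also greps for. `isScratchFile` and `scratchFiles` are hypothetical helpers, not part of noted-cli.

```typescript
// Hypothetical helpers for the "artifacts" clean-state dimension.
// Assumed scratch patterns: *.tmp, *.swp, *.bak, and .DS_Store at any depth.
const SCRATCH_SUFFIXES = [".tmp", ".swp", ".bak"];

export function isScratchFile(path: string): boolean {
  // Normalize Windows separators, then look only at the basename.
  const base = path.replace(/\\/g, "/").split("/").pop() ?? "";
  if (base === ".DS_Store") return true;
  return SCRATCH_SUFFIXES.some((suffix) => base.endsWith(suffix));
}

// Keep only the paths that count as scratch debris.
export function scratchFiles(paths: string[]): string[] {
  return paths.filter(isScratchFile);
}
```

Matching on the basename catches a `.DS_Store` in any subdirectory, not just at the repo root. Leftover debug `console.log` calls live inside files, so they need a source scan rather than a path check.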

→ Read Lecture 11 and Lecture 12 for the long-form treatments.

Lab

Step 1 — Add structured runtime logging

src/log.ts:

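import { mkdirSync, appendFileSync } from "node:fs";
import { dirname } from "node:path";

// One JSON object per line; NOTED_LOG overrides the location.
const LOG_PATH = process.env.NOTED_LOG ?? "logs/run.jsonl";

let dirReady = false;
function ensureDir() {
  if (dirReady) return;
  // dirname("run.jsonl") is ".", so a bare filename no longer gets created as a directory.
  mkdirSync(dirname(LOG_PATH), { recursive: true });
  dirReady = true;
}

export function log(event: string, fields: Record<string, unknown> = {}) {
  ensureDir();
  const line = JSON.stringify({ ts: new Date().toISOString(), event, ...fields });
  appendFileSync(LOG_PATH, line + "\n");
}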

Use it in src/cli.ts, bracketing each command's dispatch with one cmd_start/cmd_end pair:

import { log } from "./log.ts";
// ...
log("cmd_start", { cmd, args: rest });
try {
  // existing dispatch
  log("cmd_end", { cmd, exit: 0 });
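} catch (e) {
  // Non-Error throws have no .message; stringify them instead of logging undefined.
  log("cmd_end", { cmd, exit: 1, error: e instanceof Error ? e.message : String(e) });
  throw e;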
}
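To keep the start/end pairing and its timing out of each command branch, you can factor the same try/catch into a wrapper. This is a minimal sketch under stated assumptions. `withCommandLog` is a hypothetical helper. The `duration_ms` field is an addition that is not in the snippet above. The logger is passed in, with the same signature as `log()`, so the wrapper can be tested without touching the filesystem.

```typescript
// Same signature as log() in src/log.ts; injected so the wrapper is testable.
type LogFn = (event: string, fields?: Record<string, unknown>) => void;

// Hypothetical helper: bracket one command with cmd_start/cmd_end and record
// how long it took. Rethrows so the CLI still exits non-zero on failure.
export function withCommandLog<T>(log: LogFn, cmd: string, args: string[], run: () => T): T {
  const started = Date.now();
  log("cmd_start", { cmd, args });
  try {
    const result = run();
    log("cmd_end", { cmd, exit: 0, duration_ms: Date.now() - started });
    return result;
  } catch (e) {
    const error = e instanceof Error ? e.message : String(e);
    log("cmd_end", { cmd, exit: 1, error, duration_ms: Date.now() - started });
    throw e;
  }
}
```

In src/cli.ts this would read `withCommandLog(log, cmd, rest, () => dispatch(cmd, rest))`, where `dispatch` stands in for your existing switch. If a command calls `process.exit()` directly, no `cmd_end` line is written at all. Throwing or returning an exit code keeps the log honest.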

Add logs/ to .gitignore. The course never commits log lines — only the schema.
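Committing "only the schema" assumes a schema exists. Here is a minimal sketch of what one could look like in TypeScript. It is inferred from the fields `log()` always writes (`ts`, `event`) and from the `cmd_start`/`cmd_end` calls above. `LogEvent` and `parseLogLine` are hypothetical names. Returning `null` for a bad line, rather than throwing, is a design choice: one truncated line should not hide the rest of the tail.

```typescript
// Hypothetical schema for one line of logs/run.jsonl.
export interface LogEvent {
  ts: string; // ISO-8601, from new Date().toISOString()
  event: string; // e.g. "cmd_start", "cmd_end"
  [field: string]: unknown; // command-specific fields: cmd, args, exit, error
}

// Parse and validate one JSONL line; null means "not a valid event".
export function parseLogLine(line: string): LogEvent | null {
  let value: unknown;
  try {
    value = JSON.parse(line);
  } catch {
    return null; // truncated or non-JSON line
  }
  if (typeof value !== "object" || value === null || Array.isArray(value)) return null;
  const rec = value as Record<string, unknown>;
  const ts = rec["ts"];
  if (typeof ts !== "string" || Number.isNaN(Date.parse(ts))) return null;
  if (typeof rec["event"] !== "string") return null;
  return rec as LogEvent;
}
```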

Step 2 — Wire logs into the harness

Add to init.sh:

echo "==> Tail of logs/run.jsonl (last 5):"
tail -n 5 logs/run.jsonl 2>/dev/null || true

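Now every clock-in shows the last few events from the previous session. If its last command threw, you see a cmd_end line with exit: 1. If the process died outright, you see a cmd_start with no cmd_end after it. Either way, you no longer start on a red build without realizing it.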
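Reading those two failure shapes by eye works for five lines. If you want init.sh (or Module 11's scripts) to report them, here is a minimal sketch. It assumes the tail has already been parsed into objects and that commands run one at a time, so events from different commands never interleave. `lastOutcome` and its result type are hypothetical.

```typescript
// Minimal view of a parsed log line; other fields are ignored.
type Tail = { event: string; cmd?: unknown; exit?: unknown };

export type LastOutcome =
  | { kind: "empty" }
  | { kind: "ok"; cmd: string }
  | { kind: "failed"; cmd: string } // cmd_end with a non-zero exit
  | { kind: "unterminated"; cmd: string }; // cmd_start with no cmd_end: likely a crash

// Walk backwards (newest first) to the most recent command boundary.
export function lastOutcome(events: Tail[]): LastOutcome {
  for (const e of [...events].reverse()) {
    const cmd = String(e.cmd ?? "?");
    if (e.event === "cmd_end") return e.exit === 0 ? { kind: "ok", cmd } : { kind: "failed", cmd };
    if (e.event === "cmd_start") return { kind: "unterminated", cmd };
  }
  return { kind: "empty" };
}
```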

Step 3 — Author a sprint contract

Copy ../../resources/openai-advanced/repo-template/PLANS.md as a reference for the format. Then create sprint-contract.md:

# Sprint Contract — Module 10 sprint

## Scope

- Add structured runtime logging to `src/cli.ts`.
- Add a sprint contract (this file) and an evaluator rubric.
- Write `scripts/clean-exit.mjs` that probes the five clean-state dimensions.
- Promote `clean-exit` into the bootstrap-contract probe.

## Verification standard

- `./verify.sh` exits 0.
- `node scripts/clean-exit.mjs` exits 0.
- `feature_list.json` shows no `in_progress` features at session end.

## Exclusions

- No new feature commands. No LLM integration (sidebar only, in `docs/LLM_SIDEBAR.md`).
- No changes to the citation rule.
- No removal of the boundary check.

## Definition of done

All four scope items shipped, all three verification standards green,
`PROGRESS.md` updated, commit recorded.

The contract is the written promise of what this session will and will not change. At the end you grade against it.
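A contract only helps if its sections are actually filled in. Here is a minimal sketch of a structural check. It assumes the four `##` headings used in the template above. `contractProblems` is a hypothetical helper, and it checks structure only. Whether the contract was written before the work is a process question that no script can answer.

```typescript
// Headings taken from the sprint contract template above.
const REQUIRED_SECTIONS = ["Scope", "Verification standard", "Exclusions", "Definition of done"];

// Text between "## <name>" and the next "## " heading (or end of file); null if absent.
function sectionBody(markdown: string, name: string): string | null {
  const lines = markdown.split(/\r?\n/);
  const start = lines.findIndex((l) => l.trim() === `## ${name}`);
  if (start === -1) return null;
  const rest = lines.slice(start + 1);
  const end = rest.findIndex((l) => l.startsWith("## "));
  return (end === -1 ? rest : rest.slice(0, end)).join("\n");
}

// Hypothetical check: every required section exists and is non-empty.
export function contractProblems(markdown: string): string[] {
  const problems: string[] = [];
  for (const name of REQUIRED_SECTIONS) {
    const body = sectionBody(markdown, name);
    if (body === null) problems.push(`missing section: ${name}`);
    else if (body.trim() === "") problems.push(`empty section: ${name}`);
  }
  return problems;
}
```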

Step 4 — Author an evaluator rubric

Copy ../../resources/templates/evaluator-rubric.md to evaluator-rubric.md at the repo root, then tune it once for noted-cli:

# Evaluator Rubric — noted-cli

Score each session 0/1/2 along these dimensions. A 0 anywhere is a fail.

| Dimension                | 0 (fail)                          | 1 (acceptable)                | 2 (strong)                    |
|--------------------------|-----------------------------------|-------------------------------|-------------------------------|
| Correctness              | verify.sh red                     | verify.sh green               | + e2e exercises new behavior  |
| Scope adherence          | edits outside `feature.scope`     | scope respected               | + scope reduced where possible|
| Verification rigor       | unit tests only                   | layers 1 + 2 green            | + layer 3 covers new behavior |
| Handoff readiness        | PROGRESS.md stale                 | PROGRESS.md current           | + DECISIONS.md updated if needed |
| Clean state              | clean-exit fails                  | clean-exit passes             | + logs/ tail informative      |
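The rubric's aggregation rule ("A 0 anywhere is a fail") is easy to misapply by summing. Here is a minimal sketch of the scoring. It assumes exactly the five dimensions in the table. `grade` and its result shape are hypothetical.

```typescript
type Score = 0 | 1 | 2;

// Dimension names copied from the rubric table.
const DIMENSIONS = [
  "Correctness",
  "Scope adherence",
  "Verification rigor",
  "Handoff readiness",
  "Clean state",
] as const;

type Dimension = (typeof DIMENSIONS)[number];

export interface Grade {
  pass: boolean; // false if any dimension scored 0
  total: number; // out of 2 * 5 = 10
  failed: Dimension[];
}

// Hypothetical scorer: a single 0 fails the session regardless of the total.
export function grade(scores: Record<Dimension, Score>): Grade {
  const failed = DIMENSIONS.filter((d) => scores[d] === 0);
  const total = DIMENSIONS.reduce((sum, d) => sum + scores[d], 0);
  return { pass: failed.length === 0, total, failed };
}
```

A session that scores 2 everywhere except a 0 on clean state totals 8/10. A sum alone would make that session look strong. The `failed` list makes the blocking dimension explicit.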

Step 5 — Build the clean-state probe

scripts/clean-exit.mjs:

import { execSync } from "node:child_process";
import { readFileSync } from "node:fs"; // .mjs is an ES module: require() is not defined here

let fail = 0;

// Run one dimension; any thrown error marks it FAIL.
function check(name, fn) {
  process.stdout.write(`  ${name.padEnd(20)} `);
  try { fn(); console.log("PASS"); }
  catch (e) { console.log("FAIL: " + (e.message ?? e)); fail++; }
}

console.log("clean-exit dimensions:");

// 1. build/typecheck (Layer 1)
check("build/typecheck", () => execSync("pnpm typecheck", { stdio: "pipe" }));

// 2. tests (Layers 2 and 3)
check("tests", () => {
  execSync("pnpm test", { stdio: "pipe" });
  execSync("pnpm e2e", { stdio: "pipe" });
});

// 3. progress: a Next Action section exists and nothing is left as a trailing TODO
check("progress", () => {
  const md = readFileSync("PROGRESS.md", "utf8");
  if (!/## Next Action/.test(md)) throw new Error("no Next Action");
  if (/TODO\s*$/m.test(md)) throw new Error("PROGRESS.md has a line ending in TODO");
});

// 4. artifacts: check scratch files first (more specific message), then any uncommitted change
check("artifacts", () => {
  const tmp = execSync("git ls-files --others --exclude-standard | grep -E '\\.(tmp|swp|bak)$|(^|/)\\.DS_Store$' || true").toString();
  if (tmp.trim()) throw new Error("scratch files: " + tmp.trim());
  const dirty = execSync("git status --porcelain").toString();
  if (dirty.trim()) throw new Error("uncommitted changes");
});

// 5. startup: init.sh must exit 0
check("startup", () => execSync("./init.sh", { stdio: "pipe" }));

if (fail) { console.error(`\n${fail} dimension(s) failed; not clean`); process.exit(1); }
console.log("\nclean state: all five dimensions green");

Run it:

node scripts/clean-exit.mjs

Iterate until all five are green. Expect the artifacts dimension to fail first. Uncommitted edits are the most common debris, and this dimension stays red until you commit in Step 7.
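As written, the progress check's TODO test matches a line ending in TODO anywhere in PROGRESS.md, not just under Next Action. If you want the check scoped to that section, here is a minimal sketch. It assumes Next Action is a level-2 heading that runs until the next `## ` heading. `nextAction` and `progressProblem` are hypothetical helpers. You would port them into clean-exit.mjs by dropping the type annotations.

```typescript
// Body of "## Next Action" up to the next level-2 heading, or null if absent.
export function nextAction(markdown: string): string | null {
  const lines = markdown.split(/\r?\n/);
  const start = lines.findIndex((l) => l.trim() === "## Next Action");
  if (start === -1) return null;
  const body: string[] = [];
  for (const line of lines.slice(start + 1)) {
    if (line.startsWith("## ")) break; // next section begins
    body.push(line);
  }
  return body.join("\n").trim();
}

// The probe's rules applied only to the Next Action body; null means pass.
export function progressProblem(markdown: string): string | null {
  const body = nextAction(markdown);
  if (body === null) return "no Next Action";
  if (body === "") return "Next Action is empty";
  if (/TODO\s*$/m.test(body)) return "Next Action ends with TODO";
  return null;
}
```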

Step 6 — (Optional sidebar) Wire `noted ask` to a real LLM

If you want to see what changes when a real model touches the harness, this is the moment. Otherwise skip this step entirely — the rest of the course works offline.

Create docs/LLM_SIDEBAR.md:

# Sidebar — wiring `noted ask` to Claude

This is optional course material; not required to finish Module 11.

## Steps

1. `pnpm add @anthropic-ai/sdk`
2. Set `ANTHROPIC_API_KEY` in your environment.
3. Add a flag `--llm` to `noted ask`. When set, the command:
   - retrieves top-k snippets via the existing keyword index;
   - calls Claude with a small prompt that includes the snippets and the
     query;
   - prints the model's answer followed by the citation lines.

## Caching

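Use prompt caching: mark the snippet block with the SDK's
`cache_control: { type: "ephemeral" }` and put the query after it. The cache
matches on an identical prompt prefix, so repeated questions that retrieve the
same snippets reuse it. A snippet block shorter than the model's minimum
cacheable length is not cached at all.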

## Why this is a sidebar

The course is about the harness, not the model. Whether `noted ask` uses
keyword retrieval or a frontier model, the verification rules from
`docs/citation-rule.md` and the layers in `verify.sh` are unchanged.
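If you do build the sidebar, most of the caching work is getting the block order right. Here is a minimal sketch that only builds the Messages API request body, with no SDK import and no network call. It assumes snippets come from the keyword index as `{ path, text }` pairs. `buildAskRequest`, `Snippet`, the prompt wording, and the `max_tokens` value are illustrative. The model id is a parameter rather than a guess.

```typescript
// Hypothetical snippet shape from the existing keyword index.
interface Snippet {
  path: string;
  text: string;
}

interface TextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

export interface AskRequest {
  model: string;
  max_tokens: number;
  messages: { role: "user"; content: TextBlock[] }[];
}

export function buildAskRequest(model: string, snippets: Snippet[], query: string): AskRequest {
  const context = snippets.map((s) => `[${s.path}]\n${s.text}`).join("\n\n");
  return {
    model,
    max_tokens: 512, // illustrative cap
    messages: [
      {
        role: "user",
        content: [
          // Stable prefix first; cache_control marks the end of the cacheable prefix.
          { type: "text", text: context, cache_control: { type: "ephemeral" } },
          // The query changes on every call, so it goes after the breakpoint (wording is illustrative).
          { type: "text", text: `Answer from the snippets above.\n\nQuestion: ${query}` },
        ],
      },
    ],
  };
}
```

The returned object has the shape `client.messages.create(...)` in `@anthropic-ai/sdk` accepts. Print the answer first, then the citation lines from the same snippets, so the citation rule is unchanged.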

Step 7 — Update artifacts and commit

PROGRESS.md:

## Current State
- Last verified: Module 10 checkpoint (`./verify.sh` 0, `node scripts/clean-exit.mjs` 0).
- Active feature: none.

## Next Action
Open Module 11 capstone; run the ablation study.

DECISIONS.md:

## D-004 — `logs/run.jsonl` is the single runtime log
- Date: 2025-MM-DD (Module 10)
- Context: Multiple log files complicate session debugging. One JSONL file
  is greppable and easily tailed by `init.sh`.
- Consequence: All commands log via `src/log.ts`; no `console.log` for
  diagnostics. Style rule promoted.

git add .
git commit -q -m "module-10: runtime + process observability + clean-state probe"

Verification

./verify.sh >/dev/null 2>&1 && \
node scripts/clean-exit.mjs >/tmp/m10.log 2>&1 && \
grep -q "clean state: all five dimensions green" /tmp/m10.log && \
test -f sprint-contract.md && \
test -f evaluator-rubric.md && \
test -f src/log.ts && \
echo "M10 OK"

Expected:

M10 OK

Common pitfalls

Logging to console.log. Mixed with command output, that breaks pipe-to-jq and Module 11's ablation script. Logs go to logs/run.jsonl only.
Treating the sprint contract as a retrospective. It is prospective — written before the work. Updating it during the session to "match what you did" defeats the purpose.
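Trusting the artifacts check when ignored files exist. The probe relies on git status and git ls-files --exclude-standard, so anything matched by .gitignore (including a globally ignored .DS_Store) never shows up. Ignored is not the same as clean: run git status --ignored, or ls -A, if the probe disagrees with your eyes.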
Skipping init.sh in clean-exit. A clean state means init.sh runs from cold. If you do not run it, you have no proof the next session can start.

Next

Module 11 — Capstone: Ablation Study. You have the full harness. Now run an agent against it twice — once with the harness intact, once with it stripped — and produce a quality document showing the measurable difference.
