Pairs with: Lecture 04 — Why One Giant Instruction File Fails. Time: ~60 min. Difficulty: Intermediate. Prerequisites: Module 03 checkpoint.
Module 04. Splitting Instructions
Why this module
The simplest way to make instructions thorough is to put everything in one file. The agent then follows the first few paragraphs faithfully and ignores the rules buried in the middle. Liu et al. (2023) documented this lost-in-the-middle effect empirically: models use relevant content best when it sits at the start or end of a long context, and markedly worse when it sits in the middle. This module reproduces that failure on your own repo and fixes it.
You will build noted-cli's noted ask command in this module too — but that is not the point. The point is that you will need a rule about citation formatting, and you will deliberately put it where the agent will miss it.
Concepts
- Instruction bloat: instructions consuming a large share of the context window. As a rule of thumb, once instructions pass about 10% of the agent's context, signal-to-noise drops sharply.
- Lost in the middle: an empirical effect in which long inputs underuse content at the center. The opening and closing few hundred tokens tend to get attention; the middle tends to get skimmed.
- Routing file: a short top-level entry file that names topics and points at where they live. Hard constraints sit at the top and the very bottom; everything else lives in topic docs the agent can pull on demand.
- Topic doc: a focused file (`docs/<topic>.md`) for one concern. Cited from the routing file by name.
- Hard constraint hoisting: the rules that must hold are duplicated at both extremes of the routing file (top and bottom), where attention is highest.
→ Read Lecture 04 for the long-form treatment, the Liu et al. citation, and the worked example.
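To make the instruction-bloat threshold concrete, here is a minimal sketch that estimates what share of a context window an instruction file occupies. The four-characters-per-token ratio is a rough assumption for English prose, not a real tokenizer, and `instructionShare` and the example sizes are hypothetical.

```ts
// Assumption: roughly 4 characters per token for English prose.
// A real tokenizer will give different counts.
const CHARS_PER_TOKEN = 4;

// Estimate the fraction of the context window an instruction file occupies.
function instructionShare(text: string, contextWindowTokens: number): number {
  const estimatedTokens = Math.ceil(text.length / CHARS_PER_TOKEN);
  return estimatedTokens / contextWindowTokens;
}

// Hypothetical sizes: an 80,000-character AGENTS.md and a 128,000-token window.
const share = instructionShare("x".repeat(80_000), 128_000);
console.log(`${(share * 100).toFixed(1)}% of context`); // prints "15.6% of context"
```

Anything this estimate puts well above the 10% rule of thumb is a candidate for splitting.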
Lab
Step 1 — Add the ask command
src/commands/ask.ts:
```ts
import { readFile } from "node:fs/promises";
import type { IndexFile, NotesFile } from "../store/types.ts";
import { readNotes } from "../store/io.ts";

// Common words that carry no search signal.
const STOP = new Set(["the", "and", "for", "with", "a", "an", "of", "to", "is", "in", "on"]);

// Lowercase alphanumeric tokens of three or more characters, minus stop words.
const tokenize = (s: string) =>
  s.toLowerCase().match(/[a-z0-9]{3,}/g)?.filter((t) => !STOP.has(t)) ?? [];

export async function runAsk(query: string, k = 3): Promise<number> {
  const idx: IndexFile = JSON.parse(await readFile(".noted/index.json", "utf8"));
  const { notes }: NotesFile = await readNotes();

  // Score each note by the number of distinct query tokens it contains.
  const score = new Map<string, number>();
  for (const t of new Set(tokenize(query))) {
    const entry = idx.tokens.find((e) => e.token === t);
    if (!entry) continue;
    for (const id of entry.note_ids) score.set(id, (score.get(id) ?? 0) + 1);
  }

  // Keep the top k ids and resolve them to notes, dropping ids with no note.
  const ranked = Array.from(score.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, k)
    .map(([id]) => notes.find((n) => n.id === id)!)
    .filter(Boolean);

  if (ranked.length === 0) {
    console.log("no matches");
    return 0;
  }

  // Print title, first non-empty body line, and the citation path.
  for (const n of ranked) {
    const snippet = n.body.split("\n").find((l) => l.trim()) ?? "";
    console.log(`[${n.id}] ${n.title}`);
    console.log(` ${snippet.slice(0, 120)}`);
    console.log(` cite: ${n.path}`);
  }
  return 0;
}
```
Wire it into src/cli.ts:
```ts
// add to the imports
import { runAsk } from "./commands/ask.ts";
```
```ts
// add to the switch
case "ask":
  if (!rest[0]) { console.error("ask requires a query"); process.exit(2); }
  process.exit(await runAsk(rest.join(" ")));
```
And update the help text to include `noted ask "<query>"`.
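To see what the ranking in `runAsk` does without touching disk, the following sketch runs the same token-overlap scoring against an in-memory index. The `IndexEntry` shape mirrors how `idx.tokens` is used above; the note ids, the shortened stop list, and the `rankNotes` helper are hypothetical and exist only for illustration.

```ts
// Shape assumed from how runAsk reads idx.tokens.
type IndexEntry = { token: string; note_ids: string[] };

// Shortened stop list, for illustration only.
const DEMO_STOP = new Set(["the", "and", "for", "with"]);

const demoTokenize = (s: string): string[] =>
  s.toLowerCase().match(/[a-z0-9]{3,}/g)?.filter((t) => !DEMO_STOP.has(t)) ?? [];

// Score each note by how many distinct query tokens it contains,
// then return the ids of the top k notes.
function rankNotes(query: string, tokens: IndexEntry[], k = 3): string[] {
  const score = new Map<string, number>();
  for (const t of new Set(demoTokenize(query))) {
    const entry = tokens.find((e) => e.token === t);
    if (!entry) continue;
    for (const id of entry.note_ids) score.set(id, (score.get(id) ?? 0) + 1);
  }
  return Array.from(score.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, k)
    .map(([id]) => id);
}

// Hypothetical index with invented note ids.
const demoIndex: IndexEntry[] = [
  { token: "alpha", note_ids: ["n1", "n2"] },
  { token: "release", note_ids: ["n2"] },
  { token: "budget", note_ids: ["n3"] },
];

console.log(rankNotes("the alpha release", demoIndex)); // n2 first (two hits), then n1
```

The score is a plain count of matching tokens, so a note that matches two query words outranks a note that matches one, regardless of how often the words appear in the body.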
Step 2 — Deliberately bloat AGENTS.md
Open AGENTS.md and insert the following giant section between the existing Working Rules and Required Artifacts sections. The point is to bury an important rule.
```md
## Implementation Notes

(Imagine ~400 lines of plausible-looking guidance here: TypeScript style,
test conventions, naming, file headers, import order, error-message
formatting, log-level conventions, performance budgets, the history of
why we use `tsx` instead of `ts-node`, branching strategy, commit-message
format, code-review etiquette, on-call rotation, etc.)

...

Because this is the middle of a long file, the rule below is exactly where
attention drops:

**Citation rule (load-bearing): every line of `noted ask` output that names a
note MUST include `cite: <absolute path>`.**

...

(Imagine more sections here: deployment notes, secrets handling, postmortem
templates, glossary.)
```
You do not need to type 400 lines. Add just enough filler that the citation rule sits roughly in the middle. About 80 – 120 lines of plausible-but-irrelevant headings will do.
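If typing filler by hand feels tedious, a short script can generate it. This is only a sketch: `buildBloatedSection` and the `Convention N` headings are invented for illustration, and the output is meant to be pasted into AGENTS.md manually.

```ts
// Build a filler section with `rule` placed at the midpoint of the headings.
// Every heading and filler line is a placeholder, not real guidance.
function buildBloatedSection(fillerHeadings: string[], rule: string): string[] {
  const mid = Math.floor(fillerHeadings.length / 2);
  const lines: string[] = ["## Implementation Notes", ""];
  if (fillerHeadings.length === 0) lines.push(rule, "");
  fillerHeadings.forEach((heading, i) => {
    if (i === mid) lines.push(rule, "");
    lines.push(`### ${heading}`, "(plausible but irrelevant guidance)", "");
  });
  return lines;
}

// Thirty hypothetical headings give a section inside the 80-120 line range.
const fillerHeadings = Array.from({ length: 30 }, (_, i) => `Convention ${i + 1}`);
const buriedRule =
  "**Citation rule (load-bearing): every line of `noted ask` output that names a note MUST include `cite: <absolute path>`.**";

const section = buildBloatedSection(fillerHeadings, buriedRule);
console.log(section.length, section.indexOf(buriedRule)); // prints "94 47"
console.log(section.join("\n"));
```

With 30 headings the section is 94 lines long and the rule lands at line index 47, the exact midpoint.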
Step 3 — Run an agent against the bloated file
Open a fresh agent session. Prompt:
Read AGENTS.md and modify `noted ask` to print only the title and body snippet (no path), so output is cleaner.
Most of the time the agent will do exactly that, silently violating the citation rule that was three screens of scrolling away. Capture the output:
```sh
./bin/noted ask "alpha"
```
If `cite:` is missing, you have reproduced lost-in-the-middle on your own laptop.
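Checking by eye works for three results; for longer output, a small checker is less error-prone. The sketch below assumes the output format produced by `runAsk` in Step 1 (a `[id] title` header line followed by indented snippet and `cite:` lines). `notesMissingCite` and the sample output are hypothetical.

```ts
// Return every "[id] title" header that has no "cite: " line
// before the next header (or before the end of the output).
function notesMissingCite(output: string): string[] {
  const missing: string[] = [];
  let current: string | null = null;
  let cited = false;
  for (const line of output.split("\n")) {
    if (line.startsWith("[")) {
      // A new note starts; settle the previous one first.
      if (current !== null && !cited) missing.push(current);
      current = line;
      cited = false;
    } else if (line.trim().startsWith("cite: ")) {
      cited = true;
    }
  }
  if (current !== null && !cited) missing.push(current);
  return missing;
}

// Invented sample: the second note has lost its citation.
const sampleOutput = [
  "[n1] Alpha plan",
  "  first line of the note",
  "  cite: /home/me/notes/alpha-plan.md",
  "[n2] Alpha retro",
  "  what went well",
].join("\n");

console.log(notesMissingCite(sampleOutput)); // lists "[n2] Alpha retro"
```

An empty result means every note in the output carries its citation.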
Step 4 — Reset and split
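```sh
git stash -u
git checkout -- .
```
Note that `git stash -u` shelves every uncommitted change, untracked files included, so the `git checkout -- .` that follows has nothing left to restore and is harmless. If your Step 1 and Step 2 work is still uncommitted, it is shelved along with the agent's edit; commit it before running these commands or recover it from the stash afterwards, because the next steps need both.

Now refactor AGENTS.md into a routing file. The shape: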
```md
# AGENTS.md — noted-cli

Hard constraints (read every session):

1. One feature at a time. Do not stack work.
2. Citations rule: every `noted ask` line that names a note MUST include
   `cite: <absolute path>`. See `docs/citation-rule.md`.
3. Definition of Done lives in `docs/PRODUCT.md`. Do not relax it.

## Topic docs

| Topic               | File                          |
|---------------------|-------------------------------|
| Architecture        | `docs/ARCHITECTURE.md`        |
| Product scope       | `docs/PRODUCT.md`             |
| Citation rule       | `docs/citation-rule.md`       |
| Style & conventions | `docs/STYLE.md`               |
| Session log         | `PROGRESS.md` (added M05)     |

## Workflow

1. `./init.sh` (added M06)
2. Read `PROGRESS.md` (added M05)
3. Pick the next item from `feature_list.json` (added M08)
4. Work
5. `./verify.sh` (added M09)
6. Update `PROGRESS.md`, commit

## Hard constraints (repeated)

1. One feature at a time.
2. Citations rule: every `noted ask` line that names a note MUST include
   `cite: <absolute path>`.
3. Definition of Done lives in `docs/PRODUCT.md`.
```
Notice the duplication of the hard constraints. That is intentional: the top and bottom of the file are where attention is highest.
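Hoisting can be checked mechanically. The following sketch tests whether a phrase appears near both the top and the bottom of a file. `isHoisted` and the default window of 10 lines are assumptions made for illustration; they are not part of the course tooling.

```ts
// True when `phrase` occurs on at least two lines, with the first occurrence
// inside the first `edge` lines and the last inside the final `edge` lines.
function isHoisted(text: string, phrase: string, edge = 10): boolean {
  const lines = text.trimEnd().split("\n");
  const hits: number[] = [];
  lines.forEach((line, i) => {
    if (line.includes(phrase)) hits.push(i);
  });
  if (hits.length < 2) return false;
  const first = Math.min(...hits);
  const last = Math.max(...hits);
  return first < edge && last >= lines.length - edge;
}

// Invented 30-line files: one with the rule at both ends, one with it buried.
const fillerLines = Array.from({ length: 30 }, () => "filler");
const hoistedFile = [...fillerLines];
hoistedFile[1] = "2. Citations rule: cite every note.";
hoistedFile[28] = "2. Citations rule: cite every note.";
const buriedFile = [...fillerLines];
buriedFile[15] = "Citations rule: cite every note.";

console.log(isHoisted(hoistedFile.join("\n"), "Citations rule")); // true
console.log(isHoisted(buriedFile.join("\n"), "Citations rule")); // false
```

Requiring two distinct occurrences matters for short files, where the top and bottom windows overlap and a single mention would otherwise pass.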
Step 5 — Move the bloated content into topic docs
Create:
- `docs/citation-rule.md`: just the citation rule plus a one-paragraph rationale.
- `docs/STYLE.md`: TypeScript conventions, naming, error formatting.
- Whatever other topics you invented in Step 2.
Each topic doc is one concern. None of them is more than ~150 lines. Anything bigger means the topic itself wants to split.
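The size ceiling is easy to check in code. The sketch below takes document contents as an in-memory map so it runs without a repo. `oversizedDocs` and the sample file names are hypothetical, and the 150-line limit is the rough ceiling stated above.

```ts
// Rough ceiling from this step; adjust to taste.
const MAX_TOPIC_LINES = 150;

// Given doc name -> content, list the docs whose line count exceeds the ceiling.
function oversizedDocs(docs: Record<string, string>, max = MAX_TOPIC_LINES): string[] {
  return Object.entries(docs)
    .filter(([, body]) => body.trimEnd().split("\n").length > max)
    .map(([name]) => name);
}

// Invented contents: one short doc, one that has outgrown its topic.
const sampleDocs: Record<string, string> = {
  "docs/citation-rule.md": "# Citation rule\n\nEvery note line needs a cite path.\n",
  "docs/STYLE.md": Array.from({ length: 200 }, (_, i) => `rule ${i + 1}`).join("\n"),
};

console.log(oversizedDocs(sampleDocs)); // lists "docs/STYLE.md"
```

A doc that shows up here is a candidate for splitting into two narrower topics.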
Step 6 — Re-run the agent task
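Reset again and re-run the same Step 3 prompt against the split repo. Be careful with the reset: `git stash -u` shelves every uncommitted change, so save the new routing file and topic docs first (for example, by committing them) or they will be shelved too.
```sh
git stash -u
git checkout -- .
# re-run the Step 3 prompt in a fresh agent session, then:
./bin/noted ask "alpha"
```
This time the agent should follow the citation rule, because the rule now sits at the top and bottom of AGENTS.md and in its own dedicated file. If the agent still violates it, the rule is still buried; check the routing file.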
Step 7 — Record the comparison
Append to docs/cold-start-log.md:
```md
## 2025-MM-DD (after Module 04)

Reproduced lost-in-the-middle: bloated AGENTS.md placed the citation rule
in the middle, agent missed it. Split into routing + topic docs, agent
followed the rule. Citation rule is now duplicated at the top and bottom
of `AGENTS.md`; full text in `docs/citation-rule.md`.
```
Step 8 — Commit
```sh
git add .
git commit -q -m "module-04: split AGENTS.md into routing + topic docs"
```
Verification
```sh
test -f docs/citation-rule.md && \
test -f docs/STYLE.md && \
[ "$(wc -l < AGENTS.md)" -lt 80 ] && \
grep -c "Citations rule" AGENTS.md | grep -q "^2$" && \
echo "M04 OK"
```
Expected:
```
M04 OK
```
The size check (< 80 lines) is the routing-file constraint. The duplicate-count check enforces hard-constraint hoisting.
Common pitfalls
- Splitting too eagerly. A topic doc per paragraph is worse than a bloated entry file. Aim for ~5 – 10 topic docs total at the end of the course.
- Forgetting the duplication. Hard constraints belong at both extremes of the routing file. If they appear only once, the agent is more likely to lose track of them in long sessions.
- Letting the topic-doc index drift. Every time you add a topic doc, add its row to the routing file's table. If the table and the docs fall out of sync, agents fail to find the doc.
- Putting examples in the routing file. Examples live in topic docs. The routing file points; it does not illustrate.
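Index drift in particular is easy to detect mechanically. The sketch below compares the `docs/...` paths cited in a routing file with a list of files that exist. It assumes paths are cited in backticks, as in the routing file from Step 4; `citedDocs`, `findIndexDrift`, and the sample inputs are hypothetical.

```ts
// Collect every `docs/<name>.md` path cited in backticks in the routing file.
function citedDocs(routing: string): string[] {
  const found = routing.match(/`docs\/[^`]+\.md`/g) ?? [];
  return found.map((m) => m.slice(1, -1)); // strip the surrounding backticks
}

// Compare cited paths with the files that actually exist.
function findIndexDrift(routing: string, existing: string[]) {
  const cited = citedDocs(routing);
  return {
    // Files on disk that the routing file never mentions.
    missingFromIndex: existing.filter((f) => !cited.includes(f)),
    // Paths the routing file cites that do not exist (deduplicated).
    danglingInIndex: cited.filter((f, i) => !existing.includes(f) && cited.indexOf(f) === i),
  };
}

// Invented inputs: STYLE.md is cited but absent; citation-rule.md exists but is uncited.
const sampleRouting = [
  "| Architecture | `docs/ARCHITECTURE.md` |",
  "| Style        | `docs/STYLE.md`        |",
].join("\n");
const filesOnDisk = ["docs/ARCHITECTURE.md", "docs/citation-rule.md"];

console.log(findIndexDrift(sampleRouting, filesOnDisk));
```

In a real repo the `existing` list would come from a directory listing; it is passed in here so the comparison stays a pure function.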
Next
Module 05 — Multi-Session Continuity. You can route an agent to the right doc; you cannot yet hand off across sessions. Module 05 fixes that with PROGRESS.md and DECISIONS.md.
