Journal · 16 May 2026 · 11 min read · keystone

The engineering rule library: how a methodology compounds across client engagements

Every engineering team accumulates lessons. Most teams accumulate them in heads (the senior engineer who remembers the gotcha), in chat history (the Slack thread someone DM-replies to when the question comes up again), or in wikis nobody reads. Those three accumulators have one thing in common: AI sessions don't read them. So the agent re-derives the same failure modes on PR #189 that the team already paid for on PR #142.

The fix is structural, not motivational. A rule library is a project-scoped set of files where each file is one rule, named for what it prevents, with the failure mode that produced it and the enforcement that catches it. The rules ship with the codebase via git. They auto-load into every new AI session via the project's CLAUDE.md. They get read before any work begins because the session-start invariant says they do. After enough rules accumulate, the library is the moat: any team that wants to ship at your reliability has to write the same rules from scratch, paying the same failure modes you already paid for.

This post is the full pattern. File shape, index pattern, session-start loop, what compounds across projects and what stays scoped. Worked examples from PickNDeal's rule library (50+ rules) and the cross-project layer (29 rules that travel between codebases).

The file shape

Every rule is a single markdown file in docs/context/ with a name that describes what the rule prevents. We use the feedback_*.md prefix because rules are observations fed back from production. The file content is opinionated:

# Rule headline — a one-line statement of what to do or not do.

## Why (the failure mode that produced this rule)
What went wrong, in concrete terms. Not "we had a bug"; the exact
class of failure with enough detail that an engineer reading this
recognises the shape if they see it in their own code.

## Where we caught it (or where it costs the most when missed)
The specific time and place. Anchors the rule to a real event;
makes it harder to argue around.

## Enforcement
- Concrete checks. CI gates, lint rules, runtime asserts, code
  review patterns. As code-level as possible. Where code-level
  enforcement isn't possible, the rule lives as a review gate.
- Each enforcement is independently load-bearing. If one fails,
  the others still catch the violation.

## Cross-references (optional)
- Links to related rules in the library, principles in /method,
  blog posts that describe the production story.
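
The shape is also checkable. A minimal CI sketch, assuming the layout above and a Node runner; the file name check-rule-shape.ts is illustrative, the heading strings come from the template:

// check-rule-shape.ts: fail the build when a rule file drifts from the
// template's required sections.
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

const RULE_DIR = "docs/context";
const REQUIRED = ["## Why", "## Where we caught it", "## Enforcement"];

let failures = 0;
for (const name of readdirSync(RULE_DIR)) {
  if (!name.startsWith("feedback_") || !name.endsWith(".md")) continue;
  const body = readFileSync(join(RULE_DIR, name), "utf8");
  if (!body.startsWith("# ")) {
    console.error(`${name}: first line must be the rule headline`);
    failures++;
  }
  for (const heading of REQUIRED) {
    if (!body.includes(heading)) {
      console.error(`${name}: missing "${heading}" section`);
      failures++;
    }
  }
}
process.exit(failures ? 1 : 0);

A rule file that doesn't declare its failure mode and enforcement is a rule the library can't trust, so this runs as a pre-merge gate alongside tests.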

The headline is the most important part. It is what shows in the index. It is what an agent reads first when scanning for relevance. It is what an engineer scans when bisecting a failure. Write it as “don't do X” or “always do Y”, never as a noun phrase. “Schema migration safety” is a noun phrase; “Schema changes require migration files alongside the consuming code” is a rule headline. The agent treats the first as a topic; the second as an instruction.

The index

docs/context/INDEX.md is the file the CLAUDE.md import auto-loads on every session. The index is link-first and one-line:

# Project context index

Read the file, not just the line. Summaries are pointers, not rules.

## Schema and deployment

- [feedback_schema_changes_require_migration.md](feedback_schema_changes_require_migration.md) —
  Schema changes need migration files alongside the consuming code, in the
  same commit. Three production-shaped 500s in three days during
  PickNDeal's pre-launch shakedown taught us this.
- [feedback_build_before_restart.md](feedback_build_before_restart.md) —
  Build to a temp dir, atomically swap, then restart. Don't let the
  running process see a partial build. Health-check before committing
  to the new build.

## Webhook and auth

- [feedback_hmac_plus_timestamp_plus_event_id.md](feedback_hmac_plus_timestamp_plus_event_id.md) —
  Every outbound webhook signed (HMAC-SHA256 over body), timestamped
  (rejects > 5 min skew), with X-Event-Id for receiver dedupe.
- [feedback_per_key_scopes_not_per_user.md](feedback_per_key_scopes_not_per_user.md) —
  Credentials carry their own scopes, never the owning user's. Forecloses
  least-privilege from the first integration onward.

[... 40+ more entries ...]

Every index entry follows the same pattern: link, em-dash, one-line summary that names the source of the lesson where there is one. The session reads the index on turn one, identifies which entries are relevant to the current task, then reads the full rule files for those entries. The summary is enough to know a rule exists; it is never enough to act on the rule. That distinction is itself a rule (the session-start invariant in the cross-project library).
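
The correspondence between index and files is enforceable the same way. A hedged sketch, under the same assumed layout, that fails CI when an entry points at a missing file or a rule file never made it into the index:

// check-index.ts: the index and the rule files must stay in sync in
// both directions.
import { readdirSync, readFileSync } from "node:fs";

const RULE_DIR = "docs/context";
const index = readFileSync(`${RULE_DIR}/INDEX.md`, "utf8");

// Every markdown link target in the index, and every rule file on disk.
const linked = new Set(
  [...index.matchAll(/\]\((feedback_[\w.-]+\.md)\)/g)].map((m) => m[1]),
);
const onDisk = readdirSync(RULE_DIR).filter(
  (f) => f.startsWith("feedback_") && f.endsWith(".md"),
);

const missingFile = [...linked].filter((f) => !onDisk.includes(f));
const orphaned = onDisk.filter((f) => !linked.has(f));
for (const f of missingFile) console.error(`INDEX.md links to absent ${f}`);
for (const f of orphaned) console.error(`${f} has no index entry`);
process.exit(missingFile.length + orphaned.length ? 1 : 0);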

The session-start loop

The auto-loading happens through the project's root CLAUDE.md. Claude Code (and the equivalents in Cursor / other agents) reads this file on every session, and the file imports the index:

# project/CLAUDE.md

## Session-start invariant

On the FIRST turn of every session, before any diff or factual claim:
1. Read the user's first message; extract every topic.
2. Open docs/context/INDEX.md.
3. For every index entry whose one-line summary overlaps a topic,
   call Read on the linked file. Summaries are not rules — they are
   pointers to rules.
4. If several entries look relevant, read them all up front. It is
   always cheaper than a wrong diff.

## Context index

See docs/context/INDEX.md.

## Hard-won rules (in-line, the most-cited ones)

[... the 5-10 most-cited rules inlined for fastest reference ...]

The session-start invariant is itself rule #1. Without it, the rest of the library never gets read — the session skims the index, thinks it knows what's there, and proceeds. With it, every relevant rule gets loaded before any work happens. The cost is small (4-10 file reads per session); the benefit is the rule library actually constrains behaviour instead of decorating the repo.
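
The wiring itself deserves a guard, because a broken import fails silently: the agent simply stops loading the rules. A small sketch, assuming CLAUDE.md at the repo root and the index at docs/context/INDEX.md:

// check-wiring.ts: fail CI if the auto-load chain is broken.
import { existsSync, readFileSync } from "node:fs";

if (!readFileSync("CLAUDE.md", "utf8").includes("docs/context/INDEX.md")) {
  console.error("CLAUDE.md no longer references docs/context/INDEX.md");
  process.exit(1);
}
if (!existsSync("docs/context/INDEX.md")) {
  console.error("docs/context/INDEX.md is missing");
  process.exit(1);
}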

Three-way reference: agents, Claude memory, index

The library lives in three reinforcing places. The agent reads it via the project CLAUDE.md on turn one of every session, no exception. The Claude memory layer (user-level + per-project) holds cross-project rules and prior decisions that survive the session boundary. The repo-versioned index travels with the git clone, so a brand-new contributor on a fresh laptop still gets the rules.

Any one of the three on its own fails: agent-only doesn't survive compaction; memory-only doesn't travel between contributors; repo-only requires the agent to remember to read it. Three together close the loop. The session reads memory + repo on turn one and starts fully loaded; compaction is recoverable because the memory persists; contributor onboarding is automatic because the repo travels.

What compounds across projects, what stays scoped

Some rules are universally true. “Don't commit secrets to git.” “Every outbound webhook needs HMAC + timestamp + event-id.” “Build to a temp dir before restart.” These live in the cross-project layer (a separate rule library that travels in the user-level CLAUDE.md, imported into every project).
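
To make the webhook rule concrete, here is a minimal sketch of the sign-and-verify pair; the header names and shared-secret scheme are illustrative, not PickNDeal's actual wire format:

// A hedged sketch of the cross-project webhook rule: HMAC over the raw
// body, a timestamp for freshness, an event id for receiver dedupe.
import { createHmac, randomUUID, timingSafeEqual } from "node:crypto";

export function signWebhook(body: string, secret: string) {
  return {
    "X-Signature": createHmac("sha256", secret).update(body).digest("hex"),
    "X-Timestamp": new Date().toISOString(),
    "X-Event-Id": randomUUID(), // the receiver dedupes on this
  };
}

export function verifyWebhook(
  body: string,
  headers: Record<string, string>,
  secret: string,
): boolean {
  // Reject anything outside the five-minute skew window; a missing or
  // unparseable timestamp yields NaN and is rejected too.
  const skewMs = Math.abs(Date.now() - Date.parse(headers["X-Timestamp"]));
  if (!(skewMs < 5 * 60 * 1000)) return false;
  const expected = createHmac("sha256", secret).update(body).digest("hex");
  const given = headers["X-Signature"] ?? "";
  return (
    given.length === expected.length &&
    timingSafeEqual(Buffer.from(given), Buffer.from(expected))
  );
}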

Other rules are project-specific. “Use order_item_assignments for multi-supplier mapping, not the order table.” “Mobile chat uses refetchInterval: 5000 because there's no Supabase Realtime on mobile.” These live in the project's docs/context/ and don't propagate.
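
For contrast, a project-scoped rule in code. A hypothetical shape of the mobile chat hook; fetchMessages and the Message type are stand-ins, not the real PickNDeal code:

// useChatMessages.ts: the project-scoped rule pins the polling interval
// because mobile has no Supabase Realtime channel to subscribe to.
import { useQuery } from "@tanstack/react-query";

type Message = { id: string; body: string; sentAt: string };
declare function fetchMessages(conversationId: string): Promise<Message[]>;

export function useChatMessages(conversationId: string) {
  return useQuery({
    queryKey: ["chat", conversationId],
    queryFn: () => fetchMessages(conversationId),
    // Rule: poll every 5 s. No Supabase Realtime on mobile, so this
    // interval is load-bearing, not a tuning knob.
    refetchInterval: 5000,
  });
}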

The classification matters. Cross-project bloat (project-specific rules creeping up into the shared layer) destroys the library's usefulness on Project N+1. Project bloat (cross-project rules duplicated locally) makes them harder to update consistently. The convention: anything that ever appeared on more than one project moves to the cross-project layer; anything specific to a single codebase stays local.

How rules get added (the codification ritual)

A rule does not get written from a meeting. A rule gets written from a failure. The cadence is:

  1. Failure mode lands. Production incident, agent doing the wrong thing, near-miss caught in review.
  2. Same failure mode lands a second time. You fix it again and move on; two occurrences are not yet proof of a class.
  3. Same failure mode is about to land a third time. Now it codifies. The third occurrence is the trigger; before that the rule is premature.
  4. Write the rule file with the failure mode, the enforcement, and the link to the third incident. Add the index entry.
  5. The CLAUDE.md import picks it up on the next session, and the rule starts catching the fourth (and Nth) occurrence before it ships.

The third-occurrence threshold matters. Codifying on the first incident produces a library of speculative rules that don't survive their first counter-example. Codifying on the second is tempting, but a pattern seen twice often turns out to be a one-off that won't recur. The third occurrence is the signal that this is a class, not an instance, and that the rule is worth the maintenance cost.

Why this is the moat

Any consultancy can claim a methodology. The rule library is the asset that can't be cloned from a deck. It is the accumulated cost of every failure mode you've ever shipped, encoded as files the next session reads. PickNDeal's library is 50+ rules; the cross-project layer adds 29 more. Each one represents a class of failure that doesn't recur on PickNDeal-derived projects.

The compounding return is not the time saved on the first occurrence (small, because writing the rule costs about the same as fixing it). The return is the second, third, tenth occurrence on adjacent projects that never happen. By the time we're shipping engagement #5, the rules from engagements 1-4 are catching failures before they ship, and the time saved per engagement keeps growing.

Worked examples (the production stories behind specific rules)

The library is not abstract. Each rule was written after a real incident, and three of the most-cited rules have full production stories elsewhere on this journal.

How to start your own rule library

You don't start with 50 rules. You start with 1. Pick the most recent failure that cost you a deploy or a customer; write the rule file. Add the index entry. Update the CLAUDE.md to import the index. The agent now reads the rule on the next session. Wait until the next failure; ask whether this is a recurrence of an existing rule (in which case the rule is incomplete, sharpen it) or a new class of failure (in which case it's a new rule). After three months you have 6-10 rules and the agent starts visibly catching things it would have shipped before.

The first 10 rules are the hardest. They define the shape of the library and the convention for what graduates from a one-off into a codified rule. After 10, the library starts paying for itself: the cost of adding rule #11 is mostly mechanical (copy the file shape, fill in the new failure mode) and the benefit compounds because every session inherits all 10 prior rules.
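
The mechanical part is scriptable. A hedged sketch of a scaffold, assuming the template and index conventions above; new-rule.ts and the tsx runner are illustrative:

// new-rule.ts, a hypothetical scaffold: creates the rule file from the
// template and appends a stub index entry. Fill in the TODOs by hand.
// Usage: npx tsx new-rule.ts <slug> "<headline>"
import { appendFileSync, writeFileSync } from "node:fs";

const [slug, headline] = process.argv.slice(2);
if (!slug || !headline) {
  console.error('usage: new-rule.ts <slug> "<headline>"');
  process.exit(1);
}

const file = `docs/context/feedback_${slug}.md`;
writeFileSync(
  file,
  `# ${headline}\n\n` +
    `## Why (the failure mode that produced this rule)\nTODO\n\n` +
    `## Where we caught it (or where it costs the most when missed)\nTODO\n\n` +
    `## Enforcement\n- TODO\n`,
  { flag: "wx" }, // refuse to overwrite an existing rule
);
appendFileSync(
  "docs/context/INDEX.md",
  `\n- [feedback_${slug}.md](feedback_${slug}.md) — ${headline}\n`,
);
console.log(`created ${file}; now write the failure mode and enforcement`);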

What we ship into client engagements

Every client engagement ships with a project-scoped rule library installed in their codebase. The starter set is roughly 5-10 rules adapted from our cross-project layer that apply to their stack (webhook handling, deploy safety, secrets at rest, etc.). For the first three months of the engagement, we codify their team's specific failure modes as they surface. By the time we hand over, the library has 15-25 rules and the team owns it: they extend it, sharpen it, and the next contributor who joins inherits everything that came before.

The library is the deliverable. Not the audit-trail infra, not the human-review-loop UI, not the MCP server scaffold (all of which we also ship). The library is what compounds after we leave. Everything else is supporting infrastructure for the library to do its work.


More on the broader methodology this is part of: the seven principles at /method. The full pattern applied to a production codebase: the PickNDeal case study.