Journal

Field notes from shipping agents.

What we learn shipping AI agent systems with audit trails, human-review gates, and a growing rule library. Failure modes caught and codified, deploy patterns hardened, extraction techniques generalised. Dated and technically specific, drawn from our own products. Client engagements stay confidential by default; only generalised patterns and anonymised lessons appear here.

24 May 2026 · 10 min read

Streaming Claude tool use in Next.js 16 without breaking the agent loop

How to stream a Claude response from a Next.js 16 App Router route handler while a tool call is still being assembled mid-stream, without dropping the tool-use block, firing the tool twice, or freezing the client when the model calls several tools in one turn. The production pattern from a live AI agent.
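
A minimal TypeScript sketch of the core idea, assuming the raw event shapes of the Anthropic Messages streaming API (`content_block_start`, `content_block_delta` carrying `text_delta` or `input_json_delta`, `content_block_stop`, `message_delta`, `message_stop`). Text is forwarded as it arrives, tool input JSON is buffered per content-block index and parsed only when its block closes, and tool calls are released once, after `message_stop`. `ToolUseAssembler` and its methods are hypothetical names, not the code from the post, and the route-handler wiring is left out.

```typescript
// Subset of the Anthropic Messages streaming event shapes this sketch relies on.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

type Delta =
  | { type: "text_delta"; text: string }
  | { type: "input_json_delta"; partial_json: string };

type StreamEvent =
  | { type: "content_block_start"; index: number; content_block: ContentBlock }
  | { type: "content_block_delta"; index: number; delta: Delta }
  | { type: "content_block_stop"; index: number }
  | { type: "message_delta"; delta: { stop_reason: string | null } }
  | { type: "message_stop" };

interface ToolCall {
  id: string;
  name: string;
  input: unknown;
}

// Hypothetical accumulator: one instance per streamed assistant turn.
class ToolUseAssembler {
  private onText: (chunk: string) => void;
  // Partial tool input JSON, keyed by content-block index so parallel tool calls never mix.
  private pending = new Map<number, { id: string; name: string; json: string }>();
  private completed: ToolCall[] = [];
  private stopReason: string | null = null;
  private finished = false;

  constructor(onText: (chunk: string) => void) {
    this.onText = onText;
  }

  push(event: StreamEvent): void {
    switch (event.type) {
      case "content_block_start": {
        const block = event.content_block;
        if (block.type === "tool_use") {
          this.pending.set(event.index, { id: block.id, name: block.name, json: "" });
        }
        break;
      }
      case "content_block_delta": {
        const delta = event.delta;
        if (delta.type === "text_delta") {
          this.onText(delta.text); // stream text to the client immediately
        } else {
          const buf = this.pending.get(event.index);
          if (buf) buf.json += delta.partial_json; // never parse partial JSON
        }
        break;
      }
      case "content_block_stop": {
        const buf = this.pending.get(event.index);
        if (buf) {
          // The block is complete, so its accumulated JSON is now parseable.
          // A tool with no arguments may send no deltas at all, hence the "{}" fallback.
          this.completed.push({ id: buf.id, name: buf.name, input: JSON.parse(buf.json || "{}") });
          this.pending.delete(event.index);
        }
        break;
      }
      case "message_delta":
        this.stopReason = event.delta.stop_reason;
        break;
      case "message_stop":
        this.finished = true;
        break;
    }
  }

  // Returns the turn's tool calls once, and only after the message has fully stopped
  // with stop_reason "tool_use"; later calls return [] so no tool fires twice.
  takeToolCalls(): ToolCall[] {
    if (!this.finished || this.stopReason !== "tool_use") return [];
    const calls = this.completed;
    this.completed = [];
    return calls;
  }
}
```

In a route handler, one instance would be fed every event from the SDK stream while text chunks are written to the `ReadableStream` returned to the client; only after the stream ends would the handler run the collected tools and start the next turn.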

21 May 2026 · 9 min read

8 weeks of an AI agent in production: what the engineering team actually does each week

A week-by-week account of running an AI agent in a real production system. Not what the agent does, but what the engineers around it do: which rules they add, which guards they tighten, which assumptions break. The worked example is the PickNDeal AI offer agent; the lessons generalise.

19 May 2026 · 7 min read

Why we test AI agents against real APIs by default, and only mock the exceptions

The dominant advice on testing AI agents is "mock by default, real API only for integration tests." We do the opposite. Real APIs are the default; mocks are reserved for upstream services without test environments. The four conditions that justify a mock when we use one.

19 May 2026 · 9 min read

How the AI agent’s tool dispatcher actually fires: typed tools, role-scoped callers, mutation gates

The prompt is throwaway; the dispatcher is the surface. A walkthrough of the typed-tool definition, the per-role allowlist, the closure that scopes every call to the acting user, and the audit trail every invocation writes. Stack-agnostic pattern with a worked example from our stack.
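
A stack-agnostic sketch of the typed tool, the per-role allowlist, the user-scoping closure, and the audit write, in TypeScript. The `Role` values, the allowlist contents, the `update_price` tool, and every function name are hypothetical, and the audit log is an in-memory array standing in for a table. The `mutates` flag is only recorded here; a mutation gate would key off it.

```typescript
type Role = "buyer" | "seller";

interface Actor {
  userId: string;
  role: Role;
}

interface AuditEntry {
  userId: string;
  tool: string;
  mutates: boolean;
  outcome: "ok" | "denied" | "error";
}

interface Tool {
  name: string;
  mutates: boolean;
  invoke: (actor: Actor, raw: unknown) => unknown;
}

// Typed tool: `parse` turns untrusted model input into typed args before `run` sees it.
function defineTool<A>(
  name: string,
  mutates: boolean,
  parse: (raw: unknown) => A,
  run: (actor: Actor, args: A) => unknown,
): Tool {
  return { name, mutates, invoke: (actor, raw) => run(actor, parse(raw)) };
}

// Hypothetical per-role allowlist: anything not listed is denied.
const ALLOWLIST: Record<Role, ReadonlySet<string>> = {
  buyer: new Set(["list_offers"]),
  seller: new Set(["list_offers", "update_price"]),
};

// The closure binds the acting user once, before the model sees any tools.
function makeDispatcher(actor: Actor, tools: Tool[], audit: AuditEntry[]) {
  const byName = new Map(tools.map((t) => [t.name, t] as const));
  return (name: string, raw: unknown): unknown => {
    const tool = byName.get(name);
    const entry = { userId: actor.userId, tool: name, mutates: tool?.mutates ?? false };
    if (!tool || !ALLOWLIST[actor.role].has(name)) {
      audit.push({ ...entry, outcome: "denied" });
      throw new Error(`tool "${name}" is not allowed for role ${actor.role}`);
    }
    try {
      const result = tool.invoke(actor, raw);
      audit.push({ ...entry, outcome: "ok" });
      return result;
    } catch (err) {
      audit.push({ ...entry, outcome: "error" });
      throw err;
    }
  };
}

// Example mutating tool; the owner comes from the actor, never from model input.
const updatePrice = defineTool(
  "update_price",
  true,
  (raw) => {
    const r = raw as { offerId?: unknown; price?: unknown };
    if (typeof r.offerId !== "string" || typeof r.price !== "number") {
      throw new Error("invalid update_price arguments");
    }
    return { offerId: r.offerId, price: r.price };
  },
  (actor, args) => ({ ...args, updatedBy: actor.userId }),
);
```

Because `makeDispatcher` is called once per request with the authenticated user, an injected instruction in the prompt can change which tool is requested, but not whose permissions the call runs with.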

16 May 2026 · 11 min read

The engineering rule library: how a methodology compounds across client engagements

The rule library is the asset that survives every reset, contributor change, and fork. File shape, index pattern, session-start enforcement loop, and how rules compound across projects. The keystone post for the agentic engineering method.

16 May 2026 · 9 min read

Building a human-in-the-loop approval queue for an AI agent

Reviewing every agent action manually defeats the point. Not reviewing any action defeats the point in a different way. The queue pattern that enforces human-in-the-loop on the fault line: single table, one-tap approve/reject, structured diff, demotion ladder.
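
A minimal in-memory sketch of the single-table queue and the structured diff, in TypeScript. `ApprovalQueue`, `diffRecord`, and the row fields are hypothetical, the array stands in for the one queue table, and the demotion ladder is not modelled. The property worth copying is that a decision only moves a row out of `pending`, so a second tap on the same item cannot apply it twice.

```typescript
type Status = "pending" | "approved" | "rejected";

interface FieldChange {
  field: string;
  before: unknown;
  after: unknown;
}

interface Proposal {
  id: number;
  action: string;
  diff: FieldChange[];
  status: Status;
  decidedBy?: string;
}

// Shallow structured diff (values compared by identity):
// the reviewer sees only the fields the agent would change.
function diffRecord(before: Record<string, unknown>, after: Record<string, unknown>): FieldChange[] {
  const fields = Array.from(new Set(Object.keys(before).concat(Object.keys(after))));
  return fields
    .filter((f) => before[f] !== after[f])
    .map((f) => ({ field: f, before: before[f], after: after[f] }));
}

class ApprovalQueue {
  private rows: Proposal[] = []; // stands in for the single queue table
  private nextId = 1;

  propose(action: string, before: Record<string, unknown>, after: Record<string, unknown>): Proposal {
    const row: Proposal = { id: this.nextId++, action, diff: diffRecord(before, after), status: "pending" };
    this.rows.push(row);
    return row;
  }

  pending(): Proposal[] {
    return this.rows.filter((r) => r.status === "pending");
  }

  // One-tap approve or reject; only a pending row can change state.
  decide(id: number, reviewer: string, approve: boolean): Proposal {
    const row = this.rows.find((r) => r.id === id);
    if (!row) throw new Error(`no proposal ${id}`);
    if (row.status !== "pending") throw new Error(`proposal ${id} is already ${row.status}`);
    row.status = approve ? "approved" : "rejected";
    row.decidedBy = reviewer;
    return row;
  }
}
```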

16 May 2026 · 8 min read

Idempotent webhook handlers in Next.js 16: HMAC + timestamp + event-id

Most webhook handler bugs we have audited share one assumption: that the sender delivers each event exactly once. No sender guarantees that. The three rules that prevent every duplicate charge, duplicate notification, and duplicate state write we have seen in production.
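
A minimal TypeScript sketch of the three checks in order, using `node:crypto`. It assumes a hypothetical signing scheme (hex HMAC-SHA256 over `<timestamp>.<raw body>`, similar in spirit to Stripe's), a 300-second tolerance, and a JSON body with an `id` field; the `Set` of seen event ids stands in for a table with a unique constraint on the event id. Real providers document their own header names and signed payload, so this is not a drop-in verifier for any of them.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";
import { Buffer } from "node:buffer";

const TOLERANCE_SECONDS = 300; // assumption: reject anything signed more than 5 minutes away

interface WebhookEvent {
  id: string;
  type: string;
}

type Outcome = "processed" | "duplicate" | "bad_signature" | "stale";

// Hypothetical scheme: hex HMAC-SHA256 over "<timestamp>.<raw body>".
function sign(secret: string, timestamp: number, rawBody: string): string {
  return createHmac("sha256", secret).update(`${timestamp}.${rawBody}`).digest("hex");
}

function handleWebhook(req: {
  secret: string;
  rawBody: string; // verify the exact bytes received, before any JSON parsing
  signature: string;
  timestamp: number;
  nowSeconds: number;
  seenEventIds: Set<string>; // stands in for a unique index on event id
  apply: (event: WebhookEvent) => void;
}): Outcome {
  // Rule 1: HMAC over the raw body, compared in constant time.
  const expected = Buffer.from(sign(req.secret, req.timestamp, req.rawBody), "hex");
  const given = Buffer.from(req.signature, "hex");
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) {
    return "bad_signature";
  }

  // Rule 2: the signed timestamp bounds how long a captured request can be replayed.
  if (Math.abs(req.nowSeconds - req.timestamp) > TOLERANCE_SECONDS) {
    return "stale";
  }

  // Rule 3: dedupe on the sender's event id; a redelivery is acknowledged, not re-applied.
  const event = JSON.parse(req.rawBody) as WebhookEvent;
  if (req.seenEventIds.has(event.id)) {
    return "duplicate";
  }
  req.apply(event);
  // Recorded only after apply succeeds, so a failed attempt can be retried by the sender.
  // In a database, the id insert and the state write would share one transaction.
  req.seenEventIds.add(event.id);
  return "processed";
}
```

In the route handler, `duplicate` should map to a 2xx response: most senders retry on any non-2xx status, which would re-deliver an event that has already been applied.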

12 May 2026 · 6 min read

MCP server authentication: the three invariants we built it on

Most public MCP examples ship with a static API key stored in an environment variable. We did not, because the agentic engineering rule library had ruled it out before we started. The design rationale, plus why rules compound across projects.

9 May 2026 · 8 min read

How we extracted PayoutKit from PickNDeal: agentic code surgery in one weekend

Most teams rebuild the same hardened patterns project after project because clean extraction from a running production system feels too expensive. Agentic engineering makes it routine: diagnostic agent run, manifest as contract, production-shape verification.

8 May 2026 · 6 min read

Three production outages from one schema migration, and the deploy script we hardened to stop them

Drizzle generates queries against the schema your code was deployed with, not the schema in production. Three 500s in a row taught us why schema-touch detection has to gate the deploy.
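
A minimal sketch of a schema-touch gate in TypeScript, assuming the Drizzle schema lives under `src/db/schema` and drizzle-kit writes SQL migrations to a top-level `drizzle/` folder; both paths and the step names are hypothetical rather than taken from the post's deploy script. A diff that changes the schema without a migration is refused, and a diff that adds a migration runs it before new code takes traffic, so for additive changes the new code's queries do not reference columns the production database lacks.

```typescript
// Hypothetical repository layout.
const SCHEMA_FILE = /^src\/db\/schema(\.ts|\/.+\.ts)$/;
const MIGRATION_FILE = /^drizzle\/[^\/]+\.sql$/;

type Step = "build" | "migrate" | "switch_traffic";

type GateResult = { ok: true; steps: Step[] } | { ok: false; reason: string };

// Decides the deploy plan from the list of files changed since the last deploy.
function gateDeploy(changedFiles: string[]): GateResult {
  const schemaTouched = changedFiles.some((f) => SCHEMA_FILE.test(f));
  const migrationAdded = changedFiles.some((f) => MIGRATION_FILE.test(f));

  if (schemaTouched && !migrationAdded) {
    // Code compiled against a schema that production does not have: refuse to ship it.
    return { ok: false, reason: "schema changed but no migration was generated" };
  }
  if (migrationAdded) {
    // Migrate first; new code only takes traffic once the database matches its schema.
    return { ok: true, steps: ["build", "migrate", "switch_traffic"] };
  }
  return { ok: true, steps: ["build", "switch_traffic"] };
}
```

Dropping or renaming a column needs more than step ordering, since the old code keeps serving while `migrate` runs; this sketch covers only the additive case.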
