Field notes from shipping agents.
What we learn shipping AI agent systems with audit trails, human-review gates, and a growing rule library. Failure modes caught and codified, deploy patterns hardened, extraction techniques generalised. Dated and technically specific, drawn from our own products. Client engagements stay confidential by default; only generalised patterns and anonymised lessons appear here.
Streaming Claude tool use in Next.js 16 without breaking the agent loop
How to stream a Claude response while a tool call is being assembled mid-stream, in a Next.js 16 App Router route handler, without dropping the tool-use block, double-firing the tool, or freezing the client when the model decides to call multiple tools in one turn. The production pattern from a live AI agent.
8 weeks of an AI agent in production: what the engineering team actually does each week
A week-by-week account of running an AI agent in a real production system. Not what the agent does, but what the engineers around it do: which rules they add, which guards they tighten, which assumptions break. The worked example is the PickNDeal AI offer agent; the lessons generalize.
Why we test AI agents against real APIs by default, and only mock the exceptions
The dominant advice on testing AI agents is "mock by default, real API only for integration tests." We do the opposite. Real APIs are the default; mocks are reserved for upstream services without test environments. The four conditions that justify a mock when we use one.
How the AI agent’s tool dispatcher actually fires: typed tools, role-scoped callers, mutation gates
The prompt is throwaway; the dispatcher is the surface. A walkthrough of the typed-tool definition, the per-role allowlist, the closure that scopes every call to the acting user, and the audit trail every invocation writes. Stack-agnostic pattern with a worked example from our stack.
The engineering rule library: how a methodology compounds across client engagements
The rule library is the asset that survives every reset, contributor change, and fork. File shape, index pattern, session-start enforcement loop, and how rules compound across projects. The keystone post for the agentic engineering method.
Building a human-in-the-loop approval queue for an AI agent
Reviewing every agent action manually defeats the point. Not reviewing any action defeats the point in a different way. The queue pattern that enforces human-in-the-loop on the fault line: single table, one-tap approve/reject, structured diff, demotion ladder.
Idempotent webhook handlers in Next.js 16: HMAC + timestamp + event-id
Most webhook handler bugs we have audited share one assumption: that the sender delivers each event exactly once. None do. The three rules that prevent every duplicate-charge, duplicate-notification, and duplicate-state-write we have seen in production.
MCP server authentication: the three invariants we built it on
Most public MCP examples ship with a hardcoded API key in env. We did not, because the agentic engineering rule library had ruled it out before we started. The design rationale, plus why rules compound across projects.
How we extracted PayoutKit from PickNDeal: agentic code surgery in one weekend
Most teams rebuild the same hardened patterns project after project because clean extraction from a running production system feels too expensive. Agentic engineering makes it routine: diagnostic agent run, manifest as contract, production-shape verification.
Three production outages from one schema migration, and the deploy script we hardened to stop them
Drizzle generates queries against the schema your code was deployed with, not the schema in production. Three 500s in a row taught us why schema-touch detection has to gate the deploy.