Most webhook handler bugs we have audited share one assumption: that the sender will deliver each event exactly once. No sender guarantees that. Stripe retries with backoff for up to 3 days. Polar retries five times in the first hour. GitHub does not retry automatically, but a failed delivery can be redelivered by hand or through the API, and the redelivery is the same event again. Every webhook source treats repeated delivery as part of the protocol, and any handler that does not is a duplicate charge waiting to happen.
The rule we ship into every agentic system is: every outbound webhook is signed (HMAC-SHA256 over body), timestamped (X-Timestamp, receiver rejects anything more than five minutes skewed from now), and carries a unique X-Event-Id the receiver dedupes against. Inbound handlers verify all three before touching state. This post walks through the implementation in Next.js 16 + Drizzle, with the specific gotchas that surface when you wire it against real Stripe webhooks.
Why the three together, not just signing
HMAC signing alone proves the sender is who they say they are. It does not stop replay attacks: an attacker who intercepts a valid signed request can replay it indefinitely. Timestamp checking (X-Timestamp + receiver clock-skew window) closes that gap. Event-id deduplication on the receiver side closes the legitimate-retry gap, where the same signed timestamped request lands twice because the sender did not get a 2xx within its retry window.
Drop any one and the other two leak. Drop signing and anyone can forge events. Drop timestamp and a replayed event from yesterday looks identical to a fresh one. Drop event-id and a legitimate retry double-credits the customer.
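To make the three checks concrete before the Stripe-specific route, here is a minimal sketch of both sides of a generic signed webhook. It rests on several assumptions that are ours, not part of any library: the X-Signature header name, the function names signOutbound and verifyInbound, an in-memory Set standing in for the dedupe table, and the choice to fold the timestamp and event id into the signed payload so neither header can be swapped on a captured request.

```typescript
import { createHmac, randomUUID, timingSafeEqual } from 'node:crypto';

const MAX_SKEW_SECONDS = 300; // the five-minute window from the rule above

type SignedHeaders = {
  'X-Signature': string; // hypothetical header name
  'X-Timestamp': string;
  'X-Event-Id': string;
};

type Verdict = 'ok' | 'stale' | 'bad-signature' | 'duplicate';

// Assumption: sign "timestamp.eventId.body" so the headers are covered too.
function hmacHex(secret: string, timestamp: string, eventId: string, body: string): string {
  return createHmac('sha256', secret).update(`${timestamp}.${eventId}.${body}`).digest('hex');
}

// Sender side: produce the three headers for one outbound event.
function signOutbound(
  body: string,
  secret: string,
  nowSeconds: number,
  eventId: string = randomUUID(),
): SignedHeaders {
  const timestamp = String(nowSeconds);
  return {
    'X-Signature': hmacHex(secret, timestamp, eventId, body),
    'X-Timestamp': timestamp,
    'X-Event-Id': eventId,
  };
}

// Receiver side: timestamp, then signature, then dedupe, before touching state.
function verifyInbound(
  body: string,
  headers: SignedHeaders,
  secret: string,
  nowSeconds: number,
  seen: Set<string>, // stand-in for the dedupe table
): Verdict {
  const ts = Number.parseInt(headers['X-Timestamp'], 10);
  if (!Number.isFinite(ts) || Math.abs(nowSeconds - ts) > MAX_SKEW_SECONDS) {
    return 'stale';
  }
  const expected = Buffer.from(
    hmacHex(secret, headers['X-Timestamp'], headers['X-Event-Id'], body),
    'hex',
  );
  const sent = Buffer.from(headers['X-Signature'], 'hex');
  // timingSafeEqual throws on unequal lengths, so compare lengths first.
  if (expected.length !== sent.length || !timingSafeEqual(expected, sent)) {
    return 'bad-signature';
  }
  if (seen.has(headers['X-Event-Id'])) {
    return 'duplicate';
  }
  seen.add(headers['X-Event-Id']);
  return 'ok';
}
```

Signing the headers along with the body is a design choice in this sketch: if only the body were signed, an attacker could attach a fresh timestamp or a new event id to a captured request and slip past the other two checks.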
The Next.js 16 route shape
    // app/api/webhooks/stripe/route.ts
    import { NextResponse } from 'next/server';
    import { createHmac, timingSafeEqual } from 'node:crypto';
    import { eq } from 'drizzle-orm';
    import { db, webhook_events } from '@/lib/db';
    // Assumed location of your own event router; adjust the path to your codebase.
    import { dispatch } from '@/lib/webhooks/dispatch';

    const SIGNING_SECRET = process.env.STRIPE_WEBHOOK_SECRET!;
    const MAX_SKEW_SECONDS = 300; // 5 min (Stripe's recommendation)

    export async function POST(req: Request) {
      // 1. Read body as raw text; JSON.parse later, after sig verification.
      //    Stripe's signature is over the bytes you received, not over the
      //    re-serialised JSON. Parse first and the signature breaks.
      const body = await req.text();

      // 2. Pull header values. Stripe sends t= and v1= comma-separated.
      const sig = req.headers.get('stripe-signature');
      if (!sig) {
        return NextResponse.json({ error: 'no signature' }, { status: 400 });
      }
      const parts = Object.fromEntries(sig.split(',').map((s) => s.split('=')));
      const ts = parseInt(parts['t'] ?? '0', 10);
      const sent = parts['v1'];
      if (!ts || !sent) {
        return NextResponse.json({ error: 'malformed signature' }, { status: 400 });
      }

      // 3. Timestamp window: reject anything stale or future-skewed.
      const now = Math.floor(Date.now() / 1000);
      if (Math.abs(now - ts) > MAX_SKEW_SECONDS) {
        return NextResponse.json({ error: 'stale' }, { status: 400 });
      }

      // 4. HMAC over "ts.body". Constant-time compare to the sent v1.
      //    timingSafeEqual throws if the buffers differ in length, so a
      //    truncated or non-hex v1 must be rejected by the length check first.
      const expected = Buffer.from(
        createHmac('sha256', SIGNING_SECRET).update(`${ts}.${body}`).digest('hex'),
        'hex',
      );
      const received = Buffer.from(sent, 'hex');
      if (expected.length !== received.length || !timingSafeEqual(expected, received)) {
        return NextResponse.json({ error: 'bad signature' }, { status: 400 });
      }

      // 5. Now safe to parse + dedupe on event id.
      const event = JSON.parse(body) as { id: string; type: string; data: unknown };
      const dedupe = await db.insert(webhook_events).values({
        id: event.id,
        type: event.type,
        received_at: new Date(),
      }).onConflictDoNothing().returning();
      if (dedupe.length === 0) {
        // We have already seen this event id. 200 OK, do nothing.
        // Returning 4xx here would make the sender keep retrying.
        return NextResponse.json({ ok: true, deduped: true });
      }

      // 6. Dispatch only after dedupe succeeded. Wrap in try/catch: if the
      //    dispatcher throws, we delete the dedupe row and rethrow, so the
      //    sender gets a non-2xx and its retry is processed, not deduped.
      try {
        await dispatch(event);
      } catch (err) {
        // Roll back the dedupe row so the retry can succeed next time.
        await db.delete(webhook_events).where(eq(webhook_events.id, event.id));
        throw err;
      }
      return NextResponse.json({ ok: true });
    }

The webhook_events table
The dedupe table is intentionally minimal. The primary key is the sender's event id (the id column), so an INSERT ... ON CONFLICT DO NOTHING with RETURNING yields a row only when the insert was new. That is the entire dedupe mechanism.
    // schema/webhook-events.ts
    import { pgTable, text, timestamp } from 'drizzle-orm/pg-core';

    export const webhook_events = pgTable('webhook_events', {
      id: text('id').primaryKey(), // sender's event id, never our own uuid
      type: text('type').notNull(), // e.g. 'payment_intent.succeeded'
      received_at: timestamp('received_at').notNull(),
    });

One foot-gun: do not use your own UUID as the primary key. The whole point is that the SENDER's event id is the dedupe key. Using a generated UUID means every retry inserts a new row and the dedupe never fires.
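The foot-gun is easy to see in miniature. The sketch below replaces the table with an in-memory Map and a hypothetical insertIfAbsent helper that mimics the "row returned only on a new insert" behaviour; it is an illustration of the keying choice, not the Drizzle query itself.

```typescript
import { randomUUID } from 'node:crypto';

type IncomingEvent = { id: string; type: string };

// Stand-in for INSERT ... ON CONFLICT DO NOTHING RETURNING:
// returns true only when the key was not already present.
function insertIfAbsent(
  table: Map<string, IncomingEvent>,
  key: string,
  event: IncomingEvent,
): boolean {
  if (table.has(key)) return false;
  table.set(key, event);
  return true;
}

const sampleEvent: IncomingEvent = { id: 'evt_123', type: 'payment_intent.succeeded' };

// Keyed by the sender's event id: the retry collides and is dropped.
const keyedBySenderId = new Map<string, IncomingEvent>();
const firstDelivery = insertIfAbsent(keyedBySenderId, sampleEvent.id, sampleEvent);
const retryDelivery = insertIfAbsent(keyedBySenderId, sampleEvent.id, sampleEvent);

// Keyed by our own generated UUID: every delivery looks new.
const keyedByOwnUuid = new Map<string, IncomingEvent>();
const firstWithUuid = insertIfAbsent(keyedByOwnUuid, randomUUID(), sampleEvent);
const retryWithUuid = insertIfAbsent(keyedByOwnUuid, randomUUID(), sampleEvent);
```

With the sender's id as the key, the second delivery is rejected; with a generated UUID both deliveries are accepted and the event would be dispatched twice.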
Three things that break this in production
1. The reverse proxy strips the raw body
Some hosting platforms parse JSON in their edge layer before your handler sees the request. The signature was computed over the original bytes; the parsed-and-reserialised version has different whitespace, different key order, different escaping. Signature fails. Next.js 16 with the App Router gives you the raw body via req.text(); do not use req.json() until after verification.
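A small demonstration of why the raw bytes matter: the sketch below signs a body as received, then signs the same payload after a JSON.parse and JSON.stringify round trip. The secret and payload are placeholders chosen for illustration.

```typescript
import { createHmac } from 'node:crypto';

const demoSecret = 'whsec_example'; // placeholder, not a real secret

const sign = (payload: string): string =>
  createHmac('sha256', demoSecret).update(payload).digest('hex');

// The sender's bytes include whitespace after the colons and comma.
const rawBody = '{"id": "evt_1", "amount": 100}';

// A proxy or framework that parses and re-serialises drops that whitespace.
const reserialised = JSON.stringify(JSON.parse(rawBody));

const rawSignature = sign(rawBody);
const reserialisedSignature = sign(reserialised);
const signaturesMatch = rawSignature === reserialisedSignature;
```

The two strings carry the same data, but they are different byte sequences, so the HMACs differ and verification against the re-serialised body fails.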
2. The dispatcher swallows errors silently
If your dispatch function catches its own errors and returns success, the dedupe row stays, the downstream state is wrong, the sender thinks the webhook succeeded, and no retry comes. The state is permanently inconsistent and you find out three days later from a customer support ticket. The rollback in the pattern above only helps if the error reaches it: dispatch must rethrow (or never catch) so the handler deletes the dedupe row, returns a non-2xx, and the sender's retry can actually replay the event.
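As a rough sketch of a dispatcher that keeps this contract, the code below routes by event type and lets handler errors propagate, with an in-memory Set standing in for the dedupe table. The names makeDispatcher and processOnce are hypothetical, introduced only for this illustration.

```typescript
type WebhookEvent = { id: string; type: string };
type Handler = (event: WebhookEvent) => Promise<void>;

// Routes by event type. There is deliberately no try/catch here:
// a handler failure must reach the caller so the dedupe row is rolled back.
function makeDispatcher(handlers: Record<string, Handler>): Handler {
  return async (event) => {
    const handler = handlers[event.type];
    if (!handler) return; // unknown types are acknowledged, not failed
    await handler(event);
  };
}

// Mirrors steps 5 and 6 of the route, with a Set instead of the table.
async function processOnce(
  seen: Set<string>,
  event: WebhookEvent,
  dispatch: Handler,
): Promise<'processed' | 'deduped'> {
  if (seen.has(event.id)) return 'deduped';
  seen.add(event.id);
  try {
    await dispatch(event);
  } catch (err) {
    seen.delete(event.id); // roll back so the sender's retry is not deduped
    throw err;
  }
  return 'processed';
}
```

If a handler needs to log a failure, it can catch, log, and rethrow; what it must not do is catch and return normally.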
3. The dedupe row outlives the retention window
If you never prune the webhook_events table, it grows without bound. After enough rows, the index behind the ON CONFLICT check stops fitting in memory and inserts slow down. Add a cron that deletes rows older than the sender's longest retry window (Stripe: 3 days; GitHub: no fixed window, so cap at 7 days; etc.). The cron should NOT run during peak webhook hours; pick the lowest-traffic window for your sender.
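The pruning logic itself is small. Below is a minimal sketch of the cutoff calculation, applied to an in-memory array so it runs without a database; pruneCutoff and prune are hypothetical helper names, and the retention value is whatever you choose for your sender.

```typescript
type StoredEvent = { id: string; received_at: Date };

const DAY_MS = 24 * 60 * 60 * 1000;

// Rows received before this instant are outside the retention window.
function pruneCutoff(now: Date, retentionDays: number): Date {
  return new Date(now.getTime() - retentionDays * DAY_MS);
}

// In-memory equivalent of:
//   DELETE FROM webhook_events WHERE received_at < cutoff
function prune(rows: StoredEvent[], cutoff: Date): StoredEvent[] {
  return rows.filter((row) => row.received_at.getTime() >= cutoff.getTime());
}
```

Against the real table, the same cutoff would feed a Drizzle delete along the lines of db.delete(webhook_events).where(lt(webhook_events.received_at, cutoff)), with lt imported from drizzle-orm. Setting the retention slightly longer than the sender's retry window leaves a margin for late retries.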
What this prevents in production
- Duplicate charges: legitimate retries do not double-credit. The dedupe key matches the sender's event id.
- Replay attacks: an intercepted request older than 5 min is rejected at the timestamp check before signature verification even runs. A replay inside the window carries an already-seen event id and is dropped at the dedupe check.
- Forged requests: HMAC signature verification rejects anything not signed by the sender's secret.
- Silent state corruption: dispatcher failures roll back the dedupe row so the retry can fix the inconsistency.
Where this rule comes from
The pattern was extracted from PickNDeal's Stripe Connect surface (Phase 7 + Phase 9 of the case study). Before we hardened it, we shipped a single webhook handler without dedupe and had two duplicate Stripe Connect destination charges during a single deploy where a retry landed during a partial outage. The fix made it into the rule library that day. Every subsequent webhook handler in our codebase (and every webhook surface in client engagements) starts from this template.
The rule lives in docs/context/feedback_hmac_plus_timestamp_plus_event_id.md and is read by every agent session that touches a webhook handler. See the agentic engineering method, principle 06: “Cron + queue + webhook is the holy trinity.”
More on the rule library this pattern comes from in The agentic engineering method. The same patterns wired into a production system: the PickNDeal case study.