How we extracted PayoutKit from PickNDeal, agentic code surgery in one weekend

Most engineering teams build operational systems that contain reusable patterns. A Stripe Connect integration that took months to harden against real edge cases. A webhook system with proper retry semantics. An ERP sync layer with idempotency guarantees. The same patterns get rebuilt from scratch on the next project because clean extraction from a running production system feels too expensive to attempt.

This post is how we used agentic engineering to extract a hardened Stripe Connect integration from PickNDeal (a running food marketplace) into its own clean repository. Once extracted, the same module can extend a sibling product in the portfolio, ship as the starter for the next system, or live as a standalone service. The point isn't the Stripe Connect code itself. The point is that when agents read and write your codebase under proper governance, diagnostic-before-write, manifest as contract, production-shape verification, audit trail of every decision, moving validated functionality from one operational system into another becomes routine instead of a quarter-long project.

The setup

PickNDeal had a hardened Stripe Connect surface: destination charges with on-top platform fees, KYC onboarding via Stripe Express, multi-supplier payouts, saved cards, business invoicing for Danish buyers, idempotent webhook handlers. All running in production, fixed against real edge cases over months.

The work was already done. The question was whether we could lift it out cleanly without taking another month, and end up with a module that holds together as its own codebase rather than a pile of files with broken cross-references back to the source.

Phase 1, diagnostic agent run (no writes)

The first agent run touched no files. Read the entire PickNDeal codebase, classify every file that touches Stripe Connect into one of three buckets:

Extract verbatim: generic Stripe handlers, types, error mappers, idempotency helpers. Nothing source-specific in these files; they move as-is.
Sanitise: files with Danish strings, business-specific naming (supplier, restaurant, group_order), or PickNDeal-specific role enums. These move with translation.
Drop: marketplace-specific business logic (RFQ flow, group-order matching, multi-supplier offer mechanics) that doesn't belong in a generic Stripe Connect module.

Output: a structured manifest of about 40 files with a one-line note per file. A diagnostic-only run is a method rule. Agents that read are infinitely safer than agents that write, and the manifest becomes the contract for the next phase. We reviewed it together, found two files misclassified (“sanitise” that should have been “drop”), corrected, ran phase 2.

Phase 2, generate the clean repo

Second agent run took the manifest and produced the new repo. Generic Express/Custom account onboarding flows. Two-role demo (vendor + buyer) that exercises the Stripe Connect mechanics without leaking the marketplace context. RoleSwitchPrompt component for the role-mismatch UX. Webhook handlers with the same idempotency patterns we'd hardened in production. All as a fresh git history with human-style commit messages, just clear English describing what each commit does.

Critical guardrail: the agent ran with no network access to the source PickNDeal repo during this phase. Everything was sourced from the manifest from phase 1. This prevents two failure modes: (a) accidentally pulling in adjacent files that weren't in the extraction set, and (b) leaking the original repo's commit history into the new one. The manifest is the contract; if the contract is wrong, you fix the manifest, not the generated code.

Phase 3, production-shape verification, not toy-shape

Reproducing in production-shape rather than toy-shape is one of the method's rules. For a Stripe Connect module, that means running the full flow against real Stripe APIs in test mode, not against mocked fixtures.

The plan generated by the method:

Create an Express account via the new repo's onboarding flow
Complete the Stripe-hosted onboarding redirect with test data
Trigger a destination charge: $50 base + $5 platform fee = $55 buyer-pays, $50 to vendor, $5 retained
Verify webhooks fire: account.updated + payment_intent.succeeded
Verify order auto-flips to paid in the demo DB
Verify saved cards CRUD (add / list / set default / detach)

Two issues surfaced. First: destination-charge metadata wasn't round-tripping correctly because the extraction had dropped a piece of code that re-attached metadata after webhook receipt. Fixed by re-adding the function with generic naming. Second: the webhook signing-secret env var name had been renamed in the extraction (good, generic name) but one usage site was missed. Caught by curl, fixed in two minutes. Both were exactly the class of failure that production-shape verification exists to catch before integration.

Phase 4, deploy the extracted module

The extracted module deploys with the same build-then-swap pattern that runs the source codebase. Build to a temp directory, atomically swap, the running process keeps serving the old build the entire time, health-check the new build before commit, rollback automatically if it fails. This is method rule 02 (codify what production teaches) showing its compounding return: the deploy pattern was extracted into the rule library from PickNDeal's pre-launch shakedown, and the new module inherits it for free.

What you do with the extracted module from here depends on the system. Wire it into a sibling product as a vendored library. Ship it as a microservice with its own runtime. Hand it to a client as a starter for their next system. The extraction phase is the part that used to cost a quarter; the integration phase is normal engineering.

What the method delivered

An extracted module (codebase + tests + demo + documentation) lifted out of a running production system in one focused extraction sprint
Zero leaked source-specific strings, zero source-codebase references in the new repo
End-to-end verification against real third-party APIs in test mode, not mocked fixtures
Audit trail of every agent decision, reviewable file-by-file
Shared maintenance: any production fix in the source codebase's Stripe Connect code is a candidate fix for the extracted module, and vice versa

What this looks like applied to your codebase

Your team likely has one or two systems where the same shape applies. An internal SDK that could be lifted out of a monolith. A workflow framework that has matured past its original use case and deserves to be reused across products. A hardened integration layer that the team next door rebuilds from scratch every quarter. A microservice candidate sitting inside a larger application, waiting for someone to find the energy for the extraction. The method is applicable wherever the source code is hardened and the extraction surface is well-defined.

The order, diagnostic, generate, verify, deploy, holds. The human-review gates on each phase hold. The production-shape verification rule holds. What changes per project is the extraction manifest in phase 1; that's the engagement-specific work, and it's where the method earns its time-saving.

The compounding return is bigger than the time saved on the first extraction. Every pattern you lift out cleanly is a pattern that doesn't need to be rewritten next time. Your portfolio extends itself from what's already working in production, rather than from scratch.

The method this came from: /method. The source codebase the extraction came out of: the PickNDeal case study.