Case story · finance · 24 people · 6 weeks

Three half-built agents and an owner who'd stopped opening Notion.

A 24-person finance ops firm. Its owner had been the bottleneck for every AI question for six months. We were brought in for a Layer 02 engagement. Six weeks later, she wasn't the bottleneck anymore.

Sector: Finance ops

Size: 24 people

Layer: 02 → 03

Engagement: 6 weeks

Stage: Active agents · 11 → 3

Before

Six months of agent prototypes nobody owned.

The owner — call her R — had spent the first half of 2025 building agents in Notion. Three of them. One for client onboarding emails, one for invoice reconciliation, one for a "research analyst" that had never quite worked. By Q4, the senior team had quietly stopped using two of them.

Every new AI question landed on R's desk. "Should we try this for refunds?" "Can we use it for the new client intake?" She'd open Notion, see the three half-finished things, and feel the cost of every previous "yes."

The Tuesday she called us, she'd been counting. Eleven different AI surfaces were live in the firm, most of which she hadn't built and didn't know existed. A senior had a Claude project running invoice categorization. Two analysts were using a custom GPT for client summaries. The phantom layer was bigger than the official one.

"I'm not afraid of AI. I'm afraid of being the only one in the company who knows what's actually running."

What we did · 6 weeks

Six weeks. Layer 02, then a sliver of Layer 03.

We didn't pitch. We started with a half-day with R and three seniors, mapping every workflow that had an agent in it or near it. By end of week one we had a single page with eleven entries on it — every one of them named, with an owner penciled in.

Week 1

Map what's already running

Half-day with the owner and three seniors. Every agent, prompt, custom GPT, automation. Eleven surfaces, four owners, two phantom agents nobody could remember building.

Week 2

Kill the phantoms

Two agents had no owner and no users. We retired them. A third had drift in its categorization that nobody had caught. We turned it off and wrote the postmortem with the senior who'd built it.
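The map from week one and the phantom check from week two can be sketched as a small register. This is an illustration only: the `Surface` fields, the `find_phantoms` helper, and every sample entry below are hypothetical, not the firm's actual inventory. The one rule taken from the story is the definition of a phantom: no owner and no users.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Surface:
    """One row of the one-page map (field names are assumptions)."""
    name: str
    kind: str               # e.g. "agent", "custom GPT", "automation"
    owner: Optional[str]    # None means nobody claims it
    active_users: int


def find_phantoms(surfaces: List[Surface]) -> List[Surface]:
    """Return surfaces with no owner and no users: candidates to retire."""
    return [s for s in surfaces if s.owner is None and s.active_users == 0]


# Illustrative entries only; names, owners, and user counts are made up.
inventory = [
    Surface("onboarding-emails", "agent", "R", 3),
    Surface("invoice-categorization", "Claude project", "senior-1", 1),
    Surface("client-summaries", "custom GPT", "analyst-1", 2),
    Surface("unknown-automation", "automation", None, 0),
]

phantoms = find_phantoms(inventory)
for s in phantoms:
    print(f"retire candidate: {s.name} ({s.kind})")
```

Anything the filter returns still deserves a human look before it is switched off; the register only makes the question askable.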

Week 3

HITL where it mattered

The refund agent was running unattended on a category that touched compliance. We installed a human-in-the-loop checkpoint, scoped its authority, and wrote the audit trail. Two-day build.
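A minimal sketch of a checkpoint of this shape, under stated assumptions: the story does not describe how the gate was built, so the `RefundGate` class, the `auto_limit` threshold, the `compliance_flag` field, and the stand-in reviewer are all hypothetical. It shows the three pieces named above: a human decision point, a scoped authority for the agent, and an audit entry for every decision.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List


@dataclass
class RefundGate:
    auto_limit: float                    # assumed scope: agent may act alone at or below this amount
    reviewer: Callable[[dict], bool]     # the human decision (a stand-in callable here)
    audit_log: List[dict] = field(default_factory=list)

    def decide(self, request: dict) -> bool:
        # Route to a human when the request exceeds the agent's scope
        # or touches a compliance category.
        needs_human = (
            request["amount"] > self.auto_limit
            or request.get("compliance_flag", False)
        )
        approved = self.reviewer(request) if needs_human else True
        # Every decision is recorded, whether or not a human was involved.
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "request_id": request["id"],
            "routed_to_human": needs_human,
            "approved": approved,
        })
        return approved


# Stand-in reviewer that rejects everything, to make the routing visible.
gate = RefundGate(auto_limit=50.0, reviewer=lambda req: False)
small = gate.decide({"id": "r-1", "amount": 20.0})
flagged = gate.decide({"id": "r-2", "amount": 20.0, "compliance_flag": True})
print(json.dumps(gate.audit_log, indent=2))
```

In a real deployment the reviewer would be a queue or approval step rather than a lambda, and the log would go to durable storage; the point of the sketch is only that the agent's authority is written down in one place.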

Week 4

Senior team owns the next round

R stopped attending agent design sessions. The seniors picked the next workflow themselves — client intake, with two HITL gates baked in from the start. We reviewed; we didn't build.

Weeks 5–6

One-page audit

A single page their CISO actually read. Layer 03 deliverable. The active agent count went from 11 to 3 — all named, owned, reviewed monthly. R stopped being CC'd on AI questions.

Outcome

What changed wasn't the AI. It was the org around it.

R got fourteen hours a week back — measured, not estimated. The senior team owns agent design now; they review the audit themselves at the monthly ops sync. After week four, exactly zero AI support tickets routed to R.

The agents that survived weren't more sophisticated than the ones we killed. They were owned. That was the difference.

11 → 3 active agents

From phantom layer to scoped, owned, reviewable.

14 hrs · owner / week recovered

Time spent answering AI questions. Now spent elsewhere.

0 tickets to owner after week 4

Senior team owns the answers. R reviews, doesn't decide.

"The owner stopped being the AI bottleneck for her own agency. That was the deliverable."