For about an hour on a Tuesday, I was certain the problem belonged to Cloudflare.
It didn’t. The problem was mine, wearing Cloudflare’s face. And the hour I spent being wrong taught me more about building with AI than anything I’ve shipped this year.
Here’s what I was building, and why it matters even if you never touch a line of code.
The AI system that runs the back office of my studio had a weight problem. Over months it had collected tools, dozens of small single-purpose ones. Look up a project. List the overdue work. Pull a contact. Each useful on its own. Each, on its own, harmless.
The cost of tools is invisible until you go looking for it. Every tool an AI agent can reach has to be described to the model, and that description sits in the model’s working memory on every request, whether the model uses the tool or not. Fifty-odd tools meant the model read a small instruction manual about everything it could do before it ever got to what I’d actually asked. Throat-clearing, every single time.
And the questions I cared about were never single-tool questions. “Which active projects are overdue, for a client I’ve gone quiet on?” isn’t one lookup. It’s three or four, fetched separately and stitched together by hand. When every one of those calls is metered and billed, that isn’t just slow. It’s a line item.
This year Cloudflare shipped a pattern it calls Code Mode. The idea is almost rude in its simplicity: stop handing the model fifty buttons. Hand it two. One to see what data exists. One to write real code that fetches what it needs, joins it, and returns only the answer. The program runs in a sealed sandbox. The model never holds the keys to anything. It gets one narrow way to ask a question.
I rebuilt the read layer on it. Fifty-odd tools became two. The manual the model hauled around on every request shrank to roughly an eighth of its former size. The four-call question became one call. On paper, a clean win.
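For readers who do touch code, here is a minimal sketch of the two-tool shape in TypeScript. It is not Cloudflare's API. Every name in it (describeData, runProgram, ReadApi, the sample projects and clients, the dates) is invented for illustration, and the "sandbox" is only a direct function call standing in for real isolation.

```typescript
// Hypothetical sample data standing in for a studio's back office.
interface Project { id: number; clientId: number; status: "active" | "done"; dueDate: string; }
interface Client { id: number; name: string; lastContact: string; }

const projects: Project[] = [
  { id: 1, clientId: 10, status: "active", dueDate: "2025-01-10" },
  { id: 2, clientId: 11, status: "active", dueDate: "2025-06-30" },
  { id: 3, clientId: 10, status: "done", dueDate: "2025-01-05" },
  { id: 4, clientId: 12, status: "active", dueDate: "2025-01-20" },
];
const clients: Client[] = [
  { id: 10, name: "Acme", lastContact: "2024-11-01" },
  { id: 11, name: "Birch", lastContact: "2025-02-20" },
  { id: 12, name: "Cedar", lastContact: "2025-02-25" },
];

// The only surface the model's program can touch: read-only, no credentials.
interface ReadApi {
  projects(): readonly Project[];
  clients(): readonly Client[];
}
const readApi: ReadApi = {
  projects: () => projects,
  clients: () => clients,
};

// Tool 1 (hypothetical): tell the model what data exists.
function describeData(): string[] {
  return ["projects", "clients"];
}

// Tool 2 (hypothetical): run the model's program and return only its result.
// A real deployment would execute this in an isolated sandbox; this sketch does not.
function runProgram<T>(program: (api: ReadApi) => T): T {
  return program(readApi);
}

// "Which active projects are overdue, for a client I've gone quiet on?"
// Dates are ISO strings, so plain string comparison orders them correctly.
const today = "2025-03-01";
const quietSince = "2025-01-01";

const answer = runProgram((api) => {
  const quietClientIds = new Set(
    api.clients().filter((c) => c.lastContact < quietSince).map((c) => c.id)
  );
  return api
    .projects()
    .filter((p) => p.status === "active" && p.dueDate < today && quietClientIds.has(p.clientId))
    .map((p) => p.id);
});

console.log(answer); // [ 1 ]
```

The filtering and the join happen inside one program, so the model gets back a single short answer instead of three or four raw result sets to stitch together.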
Then the centerpiece broke.
The part that actually runs the model’s little program failed. Not intermittently. Every time, with the same cryptic error. I ran it in my local test harness. Failed. I ran it against a local dev server. Same failure. I deployed it to Cloudflare’s real production edge, the live thing, and watched it fail there too. Three different environments. One identical error.
This is the moment that matters, and it has nothing to do with code.
The capability I was leaning on is new. Officially beta. And when something fails identically across every environment you can think of, the careful, experienced, responsible conclusion is the obvious one: it isn’t fully baked yet. The platform doesn’t support what you’re trying to do. You’re early.
I believed it. I had a second design, documented and blessed, that would technically work while quietly giving up the cleanest version of the security model I wanted. I wrote it up. I was one commit from shipping it. I’d already started rehearsing the story I’d tell other people: the runtime’s still in beta, so we went with the proven fallback. True-sounding. Tidy. Absolving.
The error message had been telling me the answer the whole time. I just wasn’t reading it.
It never said “unsupported.” It said, near enough, “you called this function without the one argument it requires.” The function expected a small configuration object. I’d handed it nothing. One missing pair of empty braces. I added them and redeployed.
Green. Everywhere. The test harness, the dev server, the production edge. The “platform limitation” that had survived three environments and nearly rewrote my architecture was a single missing argument in code I’d written myself.
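The shape of that bug is easy to reproduce in miniature. The sketch below is a hypothetical stand-in, not the real runtime call or its real error text: createRunner, RunnerOptions and the message wording are all invented. It shows why the failure was identical everywhere: the mistake lived in the call site, and the call site shipped unchanged to every environment.

```typescript
// Hypothetical options type; every field is optional, so {} is a valid value.
interface RunnerOptions {
  timeoutMs?: number;
}

// Hypothetical factory that insists on receiving a configuration object,
// even an empty one. The parameter is optional in the type only so this
// sketch can show the runtime failure.
function createRunner(options?: RunnerOptions): { timeoutMs: number } {
  if (options === undefined) {
    throw new TypeError("createRunner: required argument 'options' was not provided");
  }
  return { timeoutMs: options.timeoutMs ?? 1000 };
}

// Smallest possible reproduction: run a call and capture its error message, if any.
function errorFrom(fn: () => unknown): string | null {
  try {
    fn();
    return null;
  } catch (e) {
    return e instanceof Error ? e.message : String(e);
  }
}

const broken = errorFrom(() => createRunner());  // the call as first written
const fixed = errorFrom(() => createRunner({})); // one pair of empty braces added

console.log(broken); // createRunner: required argument 'options' was not provided
console.log(fixed);  // null
```

Nothing in that message says "unsupported". Shrinking the failure to a few lines like these makes the message hard to read as anything other than what it is.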
What stays with me is how good the wrong answer felt.
When you work at the edge of new models, new runtimes, anything beta, there is always a plausible, face-saving explanation within arm’s reach: the tool isn’t ready yet. It’s dangerous precisely because it’s sometimes true. And it’s seductive because it does two things at once. It ends the investigation, and it moves the fault outside of you.
But “the platform can’t” is a diagnosis, and a diagnosis can be wrong. The wrong ones are rarely the ones you couldn’t have caught. They’re the ones you accepted because they were convenient. It’s the same move as blaming the market instead of the offer.
The discipline is narrower than it sounds, and less heroic. Read the actual error, not your story about it. Reproduce the failure in the smallest case, and refuse to accept “the system can’t” until you’ve honestly ruled out “I haven’t yet.”
People keep asking what skill matters now that the tools are this good and getting cheaper by the month. This is it. The tooling is a commodity racing toward free. The leverage that’s left is judgment, and the most valuable judgment I exercised all week was knowing the difference between a wall and a door I hadn’t tried the handle on.
The agents do the work now. The humans build the pipelines and, more than that, check them. So the job isn’t knowing all the answers anymore. It’s keeping the nerve to ask, of a limitation everyone around you has already accepted: is that actually true?
It usually isn’t.
P.S. If you’re putting AI into the guts of how your business runs and want a second set of eyes on what’s a wall and what’s a door, that’s the conversation I have for a living: mlsebastian.com/book/diagnostic-call
