[Figure: What would refute Paper 006. Falsification criteria specified up front.]
Refutation #1, deflationary diagnosis. Observable: code production remains the binding constraint at 5-7 years in mature agentic orgs.
Refutation #2, Mixer Mode absent in transitioned practitioners. Observable: a qualitative study of 30+ orgs shows hat rotation, not the mixer pattern.
NOT a refutation: an org fails implementation; a product fails commercially; the sub-categories are the wrong cut.

What would actually refute this

June 1, 2026 · Foundations

What would make me retract. Not a rhetorical question — the answer is named, and you can use it to check the framework against your own organization without taking my word for any of it.

I've been writing about Mixer Mode and Meta-Software for a while now, and the most useful thing I can do this week is say out loud what would force me to throw the whole thing in the bin. Because if I can't, what I'm running isn't a framework. It's a sales pitch with footnotes.

Why Naming the Refutations Is Part of the Framework

Popper's old point still does the work: a theory that doesn't specify what would refute it isn't testable, and a theory that isn't testable isn't scientific in any operative sense (The Logic of Scientific Discovery, 1959). You can love it, you can sell it, you can run conferences on it. You just can't say it's wrong, which means you can't say it's right either.

I'll add a quieter corollary that applies specifically to theories of organizational practice: a theory about how organizations work that doesn't specify what would refute it ends up as consulting in disguise. The mechanics are familiar. The framework explains every outcome — success confirms it, failure is "poor implementation", partial adoption is "early days". Nothing observable counts against it. The author keeps the audience and the audience keeps the bill.

The paper names the refutations explicitly, in §7. This post unpacks them so that a CIO or VP Eng can run the check themselves, in their own context, without waiting for me to publish. If the test fails in your organization, I'd actually like to know — those are the cases that move the framework forward, not the ones that confirm it.

Fatal Refutation #1: The Deflationary Diagnosis

The claim sitting underneath everything else is deflationary, not hype-y: in organizations that produce software, code production is no longer the dominant problem. The dominant problem has moved upstream into specification, and downstream into verification and governance. The framework is built on top of that diagnosis. If the diagnosis is wrong, the framework collapses, and I should retract.

What would need to be true for the diagnosis to be wrong: at the 5-to-7-year horizon, in mature organizations that genuinely adopted agentic coding (not pilots, not vendor demos — actual senior practitioners using agents as part of their daily flow), the binding constraint still lives in line production. Senior time still goes to writing the line. Output is still throttled by typing speed, not by clarity of intent or by ability to audit what the machine produced.

The concrete observable is mundane: measure the distribution of senior practitioner time in a longitudinal cohort. Pick twenty mature organizations. Sample senior engineering time at year 1, year 3, year 5. If the composition of that time doesn't change — if the proportion spent on specification, on review, on governance doesn't grow at the expense of line writing — the diagnosis was wrong. Not partially wrong. Wrong in its central claim. And then everything I've built on top of it has to come down.
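To make that check concrete, here is a minimal sketch of the comparison, not a validated measurement protocol. It assumes each organization's senior time has already been sampled and bucketed into the four categories named above. The bucket labels, the `min_shift` threshold of 10 percentage points, and every number in the example data are assumptions for illustration; the paper does not fix a threshold.

```python
from statistics import mean

# Buckets named in the post; "line_writing" is the remaining bucket.
NON_CODING = ("specification", "review", "governance")


def non_coding_share(sample):
    """Fraction of sampled senior time spent outside line writing."""
    total = sum(sample.values())
    if total <= 0:
        raise ValueError("sample has no recorded time")
    return sum(sample.get(k, 0.0) for k in NON_CODING) / total


def mean_shift(cohort):
    """Mean change in non-coding share from first to last sampled year."""
    shifts = []
    for by_year in cohort.values():
        first, last = min(by_year), max(by_year)
        shifts.append(
            non_coding_share(by_year[last]) - non_coding_share(by_year[first])
        )
    return mean(shifts)


def diagnosis_survives(cohort, min_shift=0.10):
    """True if senior time moved away from line writing by min_shift or more.

    min_shift is an assumed threshold, not one taken from the paper.
    """
    return mean_shift(cohort) >= min_shift


# Made-up weekly hours for two hypothetical organizations (not real data).
cohort = {
    "org_a": {
        1: {"specification": 8, "review": 6, "governance": 2, "line_writing": 24},
        5: {"specification": 16, "review": 10, "governance": 4, "line_writing": 10},
    },
    "org_b": {
        1: {"specification": 6, "review": 6, "governance": 2, "line_writing": 26},
        5: {"specification": 7, "review": 6, "governance": 2, "line_writing": 25},
    },
}

print(diagnosis_survives(cohort))
```

A real cohort would have twenty organizations and a year-3 sample as well. The sketch only compares the first and last year, which is the simplest reading of "the composition doesn't change".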

Fatal Refutation #2: Systematic Absence of Mixer Mode

The second claim is about how the people who crossed the transition actually operate. Pillar 1 of the framework says that practitioners who made the jump to working productively with agents operate in a multi-channel, modulated way (Product, Architecture, Code, and QA held in parallel and faded up and down), not by rotating hats sequentially. That's the empirical claim. If it's wrong, Pillar 1 falls.

What would need to be true to refute it: a qualitative study across thirty or more organizations, cross-industry, where the practitioners who clearly succeeded with agents systematically describe their work as hat rotation with switching cost — "first I put on the architect hat for an hour, then I take it off and put on the developer hat" — and not as parallel channels modulated. Not one or two cases. A systematic pattern. The framework would be predicting one shape of expertise and reality would be showing another.

The articulation protocol matters here, and it comes from Polanyi (Personal Knowledge, 1958). You don't ask people to theorize their work. You ask them to narrate a concrete decision from last week, out loud, and you watch the shape of the narrative. If it consistently comes out sequential-by-role with explicit switching cost, Pillar 1 doesn't apply. The metaphor I chose was wrong, and a better one — closer to the hat rotation that the data is actually showing — needs to replace it.

What Doesn't Count as Refutation (Even Though It Sounds Like It)

This is the part where most frameworks cheat, so let me be precise. An organization that fails at operating Mixer Mode refutes nothing about the framework. At most it refutes its own implementation. Maybe they didn't have the senior practitioners. Maybe the tooling wasn't there. Maybe leadership pulled the plug at month four. None of that says anything about whether the underlying claim (that mature operation is multi-channel) is true. It says something about that organization's adoption path.

A Meta-Software product that fails commercially doesn't refute the need for the Meta-Software layer either. Product failure is overdetermined. Wrong timing, wrong pricing, wrong distribution, wrong founding team. The category can be real and any individual product in it can die. What would speak against the category is something different: that after five years, no product in the space finds purchase anywhere, and the work that the layer was supposed to do gets done well by a different layer entirely. That's a harder, slower signal — and the honest answer is it's too early to read it.

The four sub-categories of Meta-Software I proposed — the cut into governance, observability, contract enforcement, and orchestration — might also be the wrong cut. That's the easiest place for me to be wrong, and I expect to revise it. But "wrong cut" calls for re-cutting, not for abandoning. The difference matters: refuting the partition is local, refuting the category is structural. Conflating the two is the move I want to avoid.

[Figure: Interview protocol. How to test Pillar 1 in 90 minutes.]
1. Interview 5 senior practitioners working with agents 6+ months.
2. Ask about a concrete decision from last week.
3. Watch narrative shape: sequential hats (Pillar 1 does not apply) vs. parallel channels modulated (Pillar 1 consistent; preceptorship work ahead).

How You Test It in Your Org Without Waiting for My Paper

You don't need the paper. You don't need me. The test is small enough that any engineering leader can run it in a week, and the results will tell you more about your context than my framework will.

Pick five of your senior practitioners — the ones who've been working with agents for six months or more, daily, on real work. Sit with each of them for forty minutes. Ask them to walk you through a concrete decision from last week. Not a hypothetical, not a war story, a decision: a moment where they were doing something with an agent and chose a direction. Let them narrate it in their own words. Don't lead. Don't offer metaphors. Watch the shape of what comes out.

If the narrative comes out as sequential hats — "first I thought about the architecture, then I switched to the code, then I thought about the tests" — with explicit switching cost between phases, then Pillar 1 doesn't apply in your context. That's valuable information, not a failure of the test. It tells you that whatever your senior practitioners are doing, it's not what I'm describing, and the right intervention for your organization is different from the one the framework recommends.

If the narrative comes out as parallel channels — "I was looking at this and at the same time the tests were running in my head and I knew the architecture was going to bend if I let this happen" — but they struggle to articulate it cleanly, the theory is consistent and you've just discovered preceptorship work ahead. The skill is there; the language to transfer it to juniors isn't. That's the gap the framework predicts you'll find, and it's the one that pays back the most when you close it.

If the answer is messy — two practitioners run mixer, three run hats — you've learned that the transition isn't uniform in your shop, which is also useful. The framework doesn't claim everyone gets there. It claims that the ones who do, operate in a specific way. Five interviews aren't a study. They're enough to start telling you which conversation to have next.
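If it helps to keep the reading honest, the three outcomes above can be tallied mechanically once each interview has been coded by hand. This is a minimal sketch under loud assumptions: the labels `"hats"` and `"mixer"` are hypothetical shorthand, the coding itself remains a human judgment, and treating only unanimous results as a clean reading is my simplification, not a rule from the paper.

```python
from collections import Counter

VALID_CODES = {"hats", "mixer"}  # hypothetical labels for the two narrative shapes


def read_interviews(codes):
    """Map hand-coded interview shapes to one of the three readings in the post."""
    if not codes:
        raise ValueError("no interviews coded")
    counts = Counter(codes)
    unknown = set(counts) - VALID_CODES
    if unknown:
        raise ValueError(f"unknown codes: {sorted(unknown)}")
    if counts["mixer"] == 0:
        return "sequential hats: Pillar 1 does not apply in this context"
    if counts["hats"] == 0:
        return "parallel channels: consistent with Pillar 1"
    # Assumption: any split counts as "messy", as in the 2-versus-3 example.
    return "mixed: the transition is not uniform in this shop"


# The split described above: two practitioners run mixer, three run hats.
print(read_interviews(["mixer", "mixer", "hats", "hats", "hats"]))
```

The function returns a label, not a verdict. With five interviews the output only points to which conversation to have next.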

If you ran something like this in your org, what did you find? I'm especially interested in cases where Pillar 1 didn't show up — those are the ones that would make me rethink.

