How I organized a super-agent by planes instead of layers

Heads-up before we start: Atalaya is still WIP — I'll open-source it once it stabilizes. What follows is the working design, not a finished product. I started Atalaya with classic DDD — domain, application, infrastructure, presentation. Three months in I had a service where "causality" lived split across three layers and nobody could grab it whole. Back to the whiteboard. What came out was planes, not layers.

The public testbench at gitea.rlabs.cl/rlabs-cl/atalaya-testbench shows enough of the shape to follow along. The core itself is still internal because the design is moving — the testbench is the public-facing piece, and that's deliberate while the architectural decisions are still settling.

The Pain: DDD Layers Applied to a Super-Agent

Atalaya is a super-agent — its job is incident triage across the RLABS platform, with the agent reading signals, deciding interventions, executing them, and reporting back. The kind of thing that touches every operational system in the org over a single day.

The first cut used the DDD stack I'd used a hundred times before. Domain at the bottom, application coordinating use cases, infrastructure for adapters, presentation for human-facing output. Clean. Familiar. Easy to onboard new developers. For about a quarter, it looked like the right model.

The pain started showing up around the third month, around a specific concept: causality. The agent needed to be able to reconstruct, after the fact, why it did what it did — which signal triggered the path, what evidence backed the decision, which policies were in force at the time. Causality is the thing that makes an agent's decisions auditable, and in a regulated environment it's not optional.

The problem was that causality didn't fit in any one layer. The triggering signal lived in infrastructure (Slack webhook, Gitea event). The decision logic lived in domain. The orchestration of the response lived in application. The summary to the human lived in presentation. Reconstructing the causal chain meant assembling fragments from all four layers, in the right order, with no part of the codebase owning the thing as a whole.

And then the change cost: every time we wanted to adjust how triage was decided, we touched all four layers. Because layers assume a top-down flow — and an agent doesn't flow top-down. An agent flows in circles. The architecture was modeling something the system didn't actually do.

The Intuition: Concerns Are Orthogonal, Not Stacked

The insight that broke the layered model was that the cross-cutting concerns in an agent aren't really cross-cutting at all — they're the primary dimensions of the system, and treating them as orthogonal to a vertical stack is a category error.

In classic DDD, layers are vertical: presentation depends on application, application depends on domain, domain knows nothing of infrastructure (or knows it only through inversion). The dependencies point downward and the abstractions live in the lower layers. That structure makes total sense for systems where data flows top-to-bottom — user request enters at presentation, gets processed by application, hits domain logic, persists via infrastructure, returns up the stack.

But in a super-agent, the things that matter — causality, evidence, generation, state, policy — aren't stacked on top of each other. They're perpendicular to the flow. Causality is a concern that runs alongside every operation. Evidence is generated and consumed at every step. Policy is applied throughout. None of them sit naturally below or above the others. Forcing them into a vertical layering creates implicit cross-cuts that break encapsulation in the worst way: the cross-cuts aren't visible, they're just there, scattered.

The move I made was to redraw them not as layers but as planes, parallel to the flow rather than stacked across it. A feature still flows through the system — from trigger to decision to action to record. But as it flows, it crosses the planes it cares about, and the planes it doesn't care about remain undisturbed.

The 10 Planes (No Reverence — They're a Revisable Decision)

I want to be clear that these aren't a manifesto. They're a working list, and they've changed over the last few months. They started at six and grew to ten as the system met cases the original cut didn't handle. They'll change again.

Causality: why the agent decided X. Reconstructible at runtime from a trail that every decision-making site writes into. This is the plane that the layered model was failing to encapsulate, and it's the one that justified the whole redesign.

Evidence: what data backs each step. Immutable atoms with lineage. Every claim the agent makes is traceable to an evidence atom, and the atom carries its own provenance.

Generation: production of artifacts. Code, docs, mermaid diagrams, reports — anything the agent emits as a deliverable. Separated from execution because the lifecycle of an artifact (versioning, review, distribution) is different from the lifecycle of an action.

State: what's persistent in the agent between invocations. Not just data — also context, working hypotheses, in-flight decisions.

Policy: named business rules. Not buried in code branches but declared, with names, with versioning, with explicit applicability conditions.

Orchestration: the pull-mode runner that executes jobs. Pulls from a queue, executes, reports back. The runner is δ-2 in our internal versioning — there have been two architectural pivots on it already, and there will probably be more.

Observation: external telemetry. Prometheus for metrics, OpenTelemetry for traces, Loki for logs. The plane that exposes the agent's behavior to standard monitoring tooling.

Integration: adapters to Slack, Gitea, GitHub, MinIO. Each one is an adapter in the hexagonal sense — port on the domain side, concrete implementation on this plane.

Contracts: AsyncAPI schemas versioned as a separate sub-project. The contract repo is independent of the agent repo because the contracts have consumers who shouldn't be coupled to the agent's release cadence.

Auto-generated documentation: .atalaya/*.md files that the RepoDigestAgent commits. The agent documents its own state. This plane didn't exist in the original six — it emerged from operation, which I'll come back to.

How the Shift Materializes in Code

The planes aren't conceptual labels — they're enforced structurally. Each plane has its own Python module, its own test set, its own interface, and its own logical maintainer. The maintainer doesn't have to be a different person from every other plane's maintainer; what matters is that ownership is named per plane, not per layer.

A new feature — say, "triage of failed payouts" — declares the planes it touches. In our case that one would touch policy (the rules for what counts as a failed payout), integration (the connector to the payout system), causality (the audit trail of why a particular payout was flagged), evidence (the atoms that justify each flag), and generation (the report the agent emits for the human reviewer). It doesn't touch observation, state, contracts, orchestration, or auto-docs — those planes don't get modified for this feature.

DDD layers don't give you that guarantee. When a new feature lands in a layered system, it almost always touches all four layers, because that's how layers work. The vertical stack is a coupling structure. The horizontal planes are a separation structure.

The supporting stack underneath is conventional. Hexagonal FastAPI in core. The δ-2 pull-mode runner. A read-only mcp-adapter for external agent consumption. Schemas in a separate sub-repo. None of that is novel. What's novel — for me, in this system — is that the conventional stack is underneath the plane model rather than competing with it.

When Planes Are Worse Than Layers

I want to name the cases where this is the wrong call, because the model is not universal and pretending otherwise would be exactly the kind of move I criticize in other people's frameworks.

If your system has a genuinely linear top-down flow — a typical CRUD web app, a standard request-response API where the flow is request → validation → business logic → persistence → response — DDD layers are simpler and sufficient. The horizontal-planes model is overhead for a problem you don't have. Use the simpler model.

If you have fewer than three "big concerns" — fewer than three things that genuinely cut across the flow and demand their own ownership — the overhead of ten modules doesn't justify itself. The plane model pays back in the cases where the concerns are numerous and orthogonal, not where they're few and stacked.

If your team doesn't have the culture to revisit architecture every six months — planes get unbalanced and turn into chaos. The model assumes ongoing curation. Planes can grow new responsibilities and lose old ones, and without periodic pruning the inventory drifts. Layers, for all their other flaws, are more forgiving of architectural neglect.

And practically: Atalaya started with six planes, not ten. The four that joined later did so because operating cases demanded them. The model assumes you'll add and split planes as the system matures, and a team that treats the original cut as canonical will end up with a different problem than the one I started with.

What Three Months in Production Taught Me

A few things that surprised me, in order of impact.

Onboarding a new dev got faster, not slower. I expected the ten-plane model to be harder to explain than four layers. It wasn't. "Here are the ten planes, you'll start touching these two" turned out to be more concrete than "here's the layered structure, you'll touch all four for any feature". The plane-list is a map. The layer-stack was an abstraction.

Refactoring within a plane didn't break other planes. Rewriting the evidence plane to use a different atom format didn't touch policy at all. In the layered version, the equivalent refactor would have rippled across layers and required cross-team coordination. The blast radius shrank.

Tensions appeared at the edges, where something is 50% one plane and 50% another. The answer that emerged: the thing lives in one plane and exposes a port to the other, not in both. That choice is sometimes painful — there's a clear case for either home — but "in both" turns out to be the worst option, because then both maintainers feel responsibility and neither has authority.

A new plane emerged that I hadn't planned: auto-generated documentation. The RepoDigestAgent kept producing .atalaya/*.md files describing system state, and the lifecycle of those files (when they get regenerated, who owns the regeneration, how they get reviewed) didn't fit cleanly into any of the existing planes. So it became its own. The model absorbed it without major reorganization, which I take as a small signal that the underlying decomposition is at least reasonable.

Transferable Principle

When concerns are orthogonal — not stacked — planes beat layers. The model isn't universally better. It's better for systems where the cross-cutting concerns are numerous and the flow is non-linear, and worse for systems where the flow is linear and the concerns are few.

The test I'd offer, if you're considering whether your system fits: pick three imaginary future features. Trace each one through your current architecture. If every feature would touch all four DDD layers, and two of those features would touch the same four layers in different ways — meaning the layers aren't actually encapsulating the concerns, they're just slicing the codebase — then layers are the wrong model for what you're building. Something more like planes is probably the right move, whether you call it that or not.

And to close on the part I want to be most careful about: planes aren't a new architectural religion. They're the natural response when what you're designing is an agent, not a CRUD. The metaphor will get superseded the moment somebody finds a better one. The principle underneath — that decomposition should follow the shape of the system's actual concerns rather than a default vertical stack — is the part worth keeping.

Does your architecture assume a top-down flow your real system no longer respects? What concept in your codebase lives split across three layers with no owner? Send me a DM or reach out via the contact channels at rlabs.cl.

#AI #Engineering #Architecture #Platform #GenAI