
What we are not building, and under what conditions we would

Toutami publishes its deferred-capabilities discipline: how we decide what crosses the line from research-interesting to in-production, the four-step gate we apply, and two specific cases of how the rule played out.

Payel Chowdhury
May 19, 2026 · 8 min read

There is a file in our codebase called `agents/memory/LONG_HORIZON_MEMORY_STRATEGY.md`. It is two pages long. Most of it is a list of things we have decided not to build, alongside the conditions under which we would reconsider each one.

We wrote it for ourselves. We are publishing it because the discipline of writing down what you are deliberately not building, and stating the trigger that would change your mind, is one of the few things we have learned that we wish we had known earlier.

This is not a piece about restraint. We ship plenty. It is a description of how, at a four-product company at our stage, we decide what crosses the line from "research-interesting" to "in production." The answer turns out to be a small, dry rule with surprisingly large consequences.

The artifact

The file is structured as a series of capability entries. Each entry has four fields. The capability name. The current state in our system, which is usually "not implemented" or "stub only." The reason we are not building it now. And the trigger, written as a sentence, that would cause us to reconsider.
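As a rough illustration of that four-field shape, here is a minimal Python sketch. The file itself is Markdown prose, so the `DeferredCapability` class, its field names, and the `validate` helper are hypothetical names introduced for illustration only. The example values are condensed from the MCP adapter entry in the list that follows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeferredCapability:
    """One deferred-capability entry (hypothetical shape, not the real file format)."""

    name: str           # the capability name
    current_state: str  # usually "not implemented" or "stub only"
    reason: str         # why it is not being built now
    trigger: str        # the sentence that would cause reconsideration

    def to_markdown(self) -> str:
        """Render the entry with the same four fields, always in the same order."""
        return (
            f"- **{self.name}.** "
            f"Current state: {self.current_state} "
            f"Reason: {self.reason} "
            f"Trigger: {self.trigger}"
        )


def validate(entry: DeferredCapability) -> list[str]:
    """Return the names of any fields that were left blank."""
    return [field for field, value in vars(entry).items() if not value.strip()]


# Example values condensed from the MCP adapter entry.
mcp_adapter = DeferredCapability(
    name="MCP adapter",
    current_state="tool specs are MCP-compatible in shape; no wire-protocol adapter.",
    reason="no external consumer is asking for it.",
    trigger="a real integration partner, or a choice to expose tooling to outside agents.",
)

# An entry with no trigger is the case the format is meant to rule out.
missing_trigger = DeferredCapability("Example", "stub only", "no consumer yet.", "")
```

Under these assumptions, `validate(mcp_adapter)` returns an empty list, while `validate(missing_trigger)` reports the blank `trigger` field. The useful property is that an entry cannot be recorded as deferred without also stating what would end the deferral.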

At the time of writing, the list has ten entries. They are documented in detail in our public architecture record, ADR-004 §10. A representative subset:

- **Tiered agent memory** (working, session, episodic, semantic). Current state: a four-layer model lives inside one product, Voyami, where it is needed. Reason we are not extracting: only one product needs it today, and the contract that would make it reusable would have to be invented in advance of any second user. Trigger: a second product needs more than two of the four layers and the schemas converge.

- **Cross-product knowledge graph.** Current state: a domain-specific graph exists inside Voyami's reasoning code at `products/voyami/core/ai-runtime/src/agents/core/knowledge_graph/`. It is not exposed as a shared capability. Reason: outside Voyami, the graph would need a different schema, and we do not yet have evidence that any product besides Voyami would query it. Trigger: a second product proposes a query pattern that the existing graph could satisfy with non-trivial schema reuse.

- **MCP adapter.** Current state: our internal tool specs are MCP-compatible in shape (see `platform/ai/agent-sdk/src/toutami_agent_sdk/`). We have not built the adapter that exposes them over the MCP wire protocol. Reason: no external consumer is asking for it. Trigger: a real integration partner, or our own choice to expose tooling to outside agents.

- **Durable execution.** Current state: ACE handles long-running, multi-step orchestration through a PostgreSQL-backed job lifecycle (`platform/controlplane/ace/docs/ACE.md`). It does not yet handle resumable execution across process restarts beyond what the database provides. Reason: the cases where we have lost state to a process restart are bounded and recoverable. Trigger: a slice that cannot tolerate the recovery window we currently accept.

- **Semantic caching.** Current state: the model gateway caches by exact prompt. There is no embedding-based or normalized-form caching. Reason: our prompts have not yet stabilized enough that semantic caching would meaningfully reduce duplicate work. Trigger: we have a prompt corpus large enough and stable enough that the cache hit rate from a semantic key would justify the indexing cost.

The list goes on. It is, deliberately, boring to read. The point is not the content of any individual entry. It is the format. The same four fields. Every time.
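To make one entry concrete: the semantic caching entry above distinguishes caching by exact prompt from caching by a normalized form. The sketch below shows that difference in cache keys only. It is an assumption-laden illustration, not the gateway's code: the source says only that the gateway "caches by exact prompt", so the function names, the SHA-256 hashing, and the particular normalization (collapsing whitespace and case-folding) are all hypothetical choices.

```python
import hashlib


def exact_key(prompt: str) -> str:
    """Cache key for exact-prompt caching: any byte-level difference is a miss."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def normalized_key(prompt: str) -> str:
    """Cache key over a normalized form (hypothetical normalization).

    Collapses runs of whitespace and case-folds before hashing, so prompts
    that differ only in spacing or letter case share one key.
    """
    canonical = " ".join(prompt.split()).casefold()
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two prompts that differ only in case and whitespace.
a = "Summarize the itinerary."
b = "summarize  the itinerary. "

same_exact = exact_key(a) == exact_key(b)                  # False: exact keys differ
same_normalized = normalized_key(a) == normalized_key(b)   # True: normalized keys match
```

Even this mild normalization has a cost: case-folding treats prompts as equivalent when the case might carry meaning. That is a small version of the trade-off the entry describes, where a broader key only pays off once the prompt corpus is stable enough to know which differences are safe to ignore.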

Why we wrote it down

The pattern of failure that prompted the file is familiar to anyone who has watched an AI company at any stage. Research-leaning founders build too much platform. They extract the abstraction first and then look for a second user. By the time the second user shows up, the abstraction is wrong, and the platform team has spent six months serving an audience of one. Product-leaning founders do the opposite. They never extract. The same primitive gets reimplemented three times across three products, and by the time someone notices, the surface area of the reimplementations has diverged enough that consolidation costs more than the original duplication would have.

I have seen both. I have personally caused both, in earlier roles, at greater scale. The discipline of writing down "we are not building this, and here is what would change our minds" is the smallest tool I know of that addresses both failure modes. It forces the platform-leaning instinct to wait. It forces the product-leaning instinct to articulate. And it leaves a written record so that, in a year, when someone proposes the capability again, we can ask: did the trigger fire?

There is a second benefit, which we did not anticipate when we started the file. It is a fast way to onboard new engineers. They can read it in fifteen minutes and understand both what we build and what we deliberately do not, without having to reverse-engineer either from the codebase.

The four-step gate

The decision rule itself is described in `REPO_CONVENTIONS.md` §7 and developed in `docs/adr/004-ai-platform-layer.md` §3. A capability is allowed to move into our shared platform layer (the `platform/` directory) only when four conditions hold simultaneously.

First, it is already used in one product. We do not extract speculatively. The capability has to have been built once, against a real consumer, with the rough edges that come from real use.

Second, a second product needs it. This is the load-bearing condition. The pattern of failure described above is, almost always, the result of confusing "useful in principle" with "needed by a second user." A capability that is useful in principle but lives in one product is, for us, allowed to keep living there.

Third, the capability is stable enough to version. If the API would still be changing every week, the cost of every other product depending on it is higher than the cost of the duplication we are avoiding.

Fourth, the capability has an owner and a test suite. Not "we should eventually." Already does, before the move.

When all four hold, the capability moves to `platform/`. When any one of them fails, it stays where it is, and the deferral is documented in the same format used in the file described above.
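The rule is a conjunction of four conditions, which can be sketched as a small predicate. This is an illustrative Python sketch only: the gate is a written convention, not necessarily an automated check, and `GateInput`, `failed_conditions`, and `may_move_to_platform` are hypothetical names. The example values are placeholders, except that the single failing condition mirrors a capability with no second consumer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateInput:
    """The four conditions, as answered for one capability (hypothetical shape)."""

    used_in_one_product: bool
    needed_by_second_product: bool
    stable_enough_to_version: bool
    has_owner_and_tests: bool


def failed_conditions(c: GateInput) -> list[str]:
    """Return the conditions that do not hold, in the order the gate lists them."""
    checks = {
        "already used in one product": c.used_in_one_product,
        "a second product needs it": c.needed_by_second_product,
        "stable enough to version": c.stable_enough_to_version,
        "has an owner and a test suite": c.has_owner_and_tests,
    }
    return [name for name, holds in checks.items() if not holds]


def may_move_to_platform(c: GateInput) -> bool:
    """All four conditions must hold simultaneously."""
    return not failed_conditions(c)


# Placeholder inputs: one capability with no second consumer, one passing all four.
no_second_consumer = GateInput(True, False, True, True)
all_four_hold = GateInput(True, True, True, True)
```

Here `may_move_to_platform(no_second_consumer)` is false and `failed_conditions(no_second_consumer)` names the one condition that blocks the move, which is the text a deferral entry would need to record. Returning the failed conditions instead of a bare boolean keeps the decision and its documentation in one step.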

The analogy we have found most useful is the Java standard library versus the Stripe SDK. `java.util` is enormous and accumulates capabilities through inclusion. The Stripe SDK is small and accumulates capabilities through extraction. We are trying to build the Stripe SDK and not the Java standard library. The four-step gate is the only mechanism we know of that consistently produces the second shape.

Two cases in the wild

The model gateway crossed the gate. It is now at `platform/ai/model-gateway/`. The story is short. Voyami needed a single point of routing between domain agents and the multiple model providers we use. The first version lived inside Voyami's agent runtime. When Brickami began calling LLMs and reimplemented the routing logic in a slightly different shape, the second-product condition fired. Within a sprint, the gateway moved to `platform/`, the Voyami and Brickami consumers cut over, and the third condition (stability) was easier to enforce because both products were now using a single artifact. Total cost of extraction was about a week of engineering, almost entirely spent on cleaning up the call sites rather than on the gateway itself.

The knowledge graph has not crossed the gate. It lives inside Voyami at `products/voyami/core/ai-runtime/src/agents/core/knowledge_graph/`. Internally, it is a reasonable piece of infrastructure. Externally, there is no second consumer. Brickami's information model is structured around properties, decisions, and assumptions. The shape of the queries Brickami would want to ask is not isomorphic to Voyami's. We do not know what the right unified abstraction would be. The four-step gate's reading: the second-product condition has not fired, and so the graph stays where it is. The deferral entry in our memory strategy file records this and names the trigger that would cause us to reconsider, which would be a Brickami query pattern that aligned closely enough with Voyami's that consolidation looked cheaper than divergence.

Neither outcome is permanent. The model gateway could, in principle, fragment again if a future product needed routing semantics that did not fit the unified version. The knowledge graph could, in principle, become a shared capability if a second consumer's needs converged. The rule is not about the final architecture. It is about the moment of the decision.

What this is and is not

This is not a claim that we have figured out platform engineering. We have not. We have made specific mistakes already, two of which are documented in the same file alongside the deferrals. The discipline of writing things down does not eliminate the mistakes. It does narrow the range of possible mistakes, and it does make the mistakes easier to find when they happen.

This is also not a recommendation that every company write a file like this. At very early stages, the right move is to build the thing twice and not think about extraction at all. At very late stages, the right move is to have a real platform team and a real architecture review process. The middle, where we are, is where the deferred-capabilities file does the most work. We are large enough that the cost of unchecked extraction would be real. We are small enough that we do not have a separate platform team whose job is to enforce the rule. The file substitutes for the team.

The file will probably stop being useful at some point, and we will replace it with a better mechanism. When that happens, we expect the trigger will be the same kind of trigger that fires for any capability in the file: a specific moment, written down, where the cost of the existing rule exceeds the cost of inventing a new one.

We will let you know.


The detailed ADR is at `docs/adr/004-ai-platform-layer.md` in our architecture record. The deferred-capabilities file is maintained at `agents/memory/LONG_HORIZON_MEMORY_STRATEGY.md`. The repo conventions that govern the gate live at `REPO_CONVENTIONS.md` §7.