How Voyami's orchestration actually works

Last week I was tracing a Voyami request through the system to debug a hotel recommendation that came back two seconds slower than I expected. The trace passed through four distinct planes before any LLM call was made. The exercise reminded me that we have not written down what each plane actually does in production. This post is that record.

The four planes

A Voyami request travels through four planes between the user typing and a trip plan appearing on the screen.

The first plane is the web or mobile client. The web version is React 18 with Vite, Tailwind, Shadcn, and Zustand. The mobile version is React Native. The client carries the user's intent and the session context.

The second plane is a Node middleware service. It handles authentication, rate limiting, and conversation-state management. It is the system's outward-facing entry point.

The third plane is AMILEN, our routing facade. AMILEN is a Java service that does one job, routing, in three steps: it classifies the user's intent, picks the right downstream service or agent, and assembles the context that service needs. It is documented at `platform/amilen/README.md`. AMILEN is deliberately not allowed to grow into a product brain. It has an API/SPI separation: the public API (`IntentClassifier`, `Router`, `ContextAssembler`, `MemoryStore`) is small and stable, and the implementations sit behind the SPI. When a request lands, AMILEN decides where it goes and stops there.
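To make the API/SPI split concrete, here is a minimal sketch. AMILEN itself is Java, so the Python below only illustrates the shape. Apart from the name `IntentClassifier`, everything in it (the `classify` signature, `KeywordIntentClassifier`, and the `register` and `load` helpers) is an assumption made for illustration, not AMILEN's actual code.

```python
from typing import Callable, Dict, Protocol


class IntentClassifier(Protocol):
    """Public API: the only surface callers depend on (signature assumed)."""

    def classify(self, text: str) -> str: ...


# SPI side: implementations register themselves under a name.
# This registry and its helpers are hypothetical.
_PROVIDERS: Dict[str, Callable[[], IntentClassifier]] = {}


def register(name: str, factory: Callable[[], IntentClassifier]) -> None:
    _PROVIDERS[name] = factory


def load(name: str) -> IntentClassifier:
    """Callers ask for an implementation by name and get back the API type."""
    return _PROVIDERS[name]()


class KeywordIntentClassifier:
    """A toy implementation that lives behind the SPI."""

    def classify(self, text: str) -> str:
        lowered = text.lower()
        if "hotel" in lowered:
            return "hotel"
        if "flight" in lowered:
            return "flight"
        return "unknown"


register("keyword", KeywordIntentClassifier)

classifier = load("keyword")
print(classifier.classify("Find me a hotel in Lisbon"))
```

The point of the shape is that callers only ever see `IntentClassifier`. Swapping the keyword classifier for something else means registering a different factory, with no change to the calling code.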

The fourth plane is the serving fabric: seventeen Java gRPC services running on ports 50055 through 50069, and a Python `ai-runtime` that contains ten domain agents. This is where the work actually happens.

AMILEN: front-door, not brain

The most common source of orchestration confusion in AI startups is the routing layer. Teams that build a router often grow it into a generalized reasoning engine over the next six months. By month nine, the router knows everything about every product and has become the platform team's full-time job.

We have explicitly chosen not to do that. AMILEN's job is, in order: classify intent, route, assemble context, hand off. It does not reason about tradeoffs. It does not call LLMs. It does not aggregate results from multiple services. The reasoning is downstream.
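The four steps above can be sketched as a short pipeline. This is an illustrative Python sketch, not AMILEN's Java code: the `Handoff` type, the `ROUTES` table, the target names, and the keyword-based `classify_intent` are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Handoff:
    """What the router passes downstream: a target plus assembled context."""
    target: str
    context: Dict[str, str] = field(default_factory=dict)


def classify_intent(text: str) -> str:
    # Toy keyword classifier used as a stand-in; note there is no LLM call.
    return "hotel" if "hotel" in text.lower() else "itinerary"


# Hypothetical routing table: intent -> downstream target name.
ROUTES: Dict[str, str] = {
    "hotel": "hotel-agent",
    "itinerary": "itinerary-workflow",
}


def assemble_context(text: str, session: Dict[str, str]) -> Dict[str, str]:
    # Gathers what the target needs; does not reason about tradeoffs.
    return {"query": text, "user_id": session.get("user_id", "")}


def route(text: str, session: Dict[str, str]) -> Handoff:
    intent = classify_intent(text)             # 1. classify intent
    target = ROUTES[intent]                    # 2. route
    context = assemble_context(text, session)  # 3. assemble context
    return Handoff(target, context)            # 4. hand off, then stop


handoff = route("Need a hotel near the beach", {"user_id": "u1"})
print(handoff.target)
```

Nothing in `route` waits on a downstream result or merges outputs. If a function like that ever needed to, that would be the signal described above that the work belongs in a downstream service.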

The rule is documented at `docs/architecture/AGENT_SYSTEMS_POV.md` §4.2. If AMILEN starts doing work that belongs to a downstream service, we move it. The cost of letting the router accumulate intelligence is much higher than the cost of building a slightly thinner router and a slightly fatter service.

The ten domain agents

Inside the Python ai-runtime, there are ten domain agents. They live at `products/voyami/core/ai-runtime/src/agents/`. Each is a Python class. The names map directly to travel sub-domains: Hotel, Flight, Restaurant, Attraction, Destination, Experience, Shopping, Transportation, Weather, and TravelWisdom.

Each agent is responsible for one domain. Hotel calls Amadeus for inventory, applies user preferences, and returns ranked candidates. Flight does the same for flights, with its own scoring logic. Restaurant integrates with Viator and a few specialty providers. The agents do not talk to each other directly. They write to a shared session state, and AMILEN, or a downstream coordinator service, decides what to do with the combined output.
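The "write to shared state, never call each other" rule can be sketched as follows. The `SessionState` class, the `run` method, and the hard-coded candidates are assumptions for illustration; the real agents' interfaces and the Amadeus and Viator calls are not shown in this post.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SessionState:
    """Shared state keyed by domain; each agent writes only its own key."""
    results: Dict[str, List[Dict[str, Any]]] = field(default_factory=dict)


class HotelAgent:
    domain = "hotel"

    def run(self, state: SessionState) -> None:
        # Stand-in for fetching inventory and applying user preferences.
        candidates = [
            {"name": "Hotel B", "score": 0.7},
            {"name": "Hotel A", "score": 0.9},
        ]
        ranked = sorted(candidates, key=lambda c: c["score"], reverse=True)
        state.results[self.domain] = ranked


class FlightAgent:
    domain = "flight"

    def run(self, state: SessionState) -> None:
        state.results[self.domain] = [{"name": "Flight X", "score": 0.8}]


state = SessionState()
for agent in (HotelAgent(), FlightAgent()):
    # Agents hold no references to each other; they only touch the state.
    agent.run(state)

print(sorted(state.results))
```

Because neither agent imports or calls the other, either one can be removed or replaced without touching the rest. Whatever reads `state.results` afterwards decides how the per-domain outputs are combined.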

A small honest detail. The Python agents do not call LLMs today. The `anthropic` and `openai` packages are declared in `pyproject.toml`, but they are not imported by any agent. The reasoning that uses an LLM happens in Java, in a service called `LLMService` that the agents indirectly depend on through gRPC. This is documented as a current reality at `products/voyami/docs/00_SYSTEM_REALITY_SNAPSHOT.md` §1.5.

We have not collapsed this. The split exists because of how we built Voyami in stages. The Java services predate the Python ai-runtime. The LLM integration was implemented in Java first. The Python layer was added later as a place for domain logic that was easier to express in Python and that did not need to live in a Java service. The current architecture reflects that history.

We will probably consolidate at some point. The trigger we have written down is the point at which the cost of cross-language coordination exceeds the cost of moving one side. Until then, the system runs in two languages, the LLM calls happen in Java, and the domain agents in Python aggregate tool outputs.

Skills, agents, and workflows

Inside the Python ai-runtime, the architecture is governed by a three-word principle described at `products/voyami/docs/02_WORKFLOW_SKILL_AGENT_ARCHITECTURE.md` §3.

A skill is the unit of intelligence. It is one thing the system knows how to do. "Score a hotel for a given user persona" is a skill. So is "estimate a flight's reliability." Skills are testable in isolation.

An agent is the unit of orchestration. It composes skills and tool calls into a coherent response for one domain. The Hotel agent is an agent. It uses skills like "score a hotel," tool calls like "fetch availability from Amadeus," and produces an output that downstream services can consume.

A workflow is the unit of user value. It is what the user actually gets. Building an itinerary is a workflow. It is what AMILEN routes to.

What holds this together is a discipline: skills do not orchestrate, agents do not own user value, workflows do not own intelligence. When a skill starts orchestrating, we move it. When an agent starts to become the user-facing thing, we wrap it in a workflow. The lines blur in code from time to time, and the doc lists the anti-patterns we look for in review.
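A minimal sketch of the three layers, assuming a toy scoring formula and an injected availability tool. The names `score_hotel`, `HotelAgent.rank`, `build_itinerary`, and `fake_availability` are hypothetical and are not taken from the ai-runtime code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Hotel:
    name: str
    price: float
    rating: float


# Skill: one unit of intelligence, a pure function testable on its own.
def score_hotel(hotel: Hotel, budget: float) -> float:
    over_budget_penalty = max(0.0, hotel.price - budget) / budget
    return hotel.rating - over_budget_penalty


# Agent: orchestrates a tool call and a skill for one domain.
class HotelAgent:
    def __init__(self, fetch_availability: Callable[[str], List[Hotel]]):
        self._fetch = fetch_availability  # the tool is injected

    def rank(self, city: str, budget: float) -> List[Hotel]:
        hotels = self._fetch(city)
        return sorted(hotels, key=lambda h: score_hotel(h, budget), reverse=True)


# Workflow: the user-facing unit; composes agents, owns no scoring logic.
def build_itinerary(city: str, budget: float, hotel_agent: HotelAgent) -> Dict[str, str]:
    ranked = hotel_agent.rank(city, budget)
    return {"city": city, "hotel": ranked[0].name}


def fake_availability(city: str) -> List[Hotel]:
    # Stand-in for a real availability tool call.
    return [Hotel("Budget Inn", 80.0, 3.5), Hotel("Grand Palace", 300.0, 4.8)]


plan = build_itinerary("Lisbon", 100.0, HotelAgent(fake_availability))
print(plan)
```

With a budget of 100, the over-budget penalty pulls "Grand Palace" below "Budget Inn" despite its higher rating. That decision lives entirely in the skill, so changing how hotels are scored never touches the agent or the workflow.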

What we deliberately did not build

This post is titled "how Voyami's orchestration actually works," not "how Toutami orchestrates across life domains." That phrasing is deliberate. We do not have a Finance agent. We do not have a Health agent. We do not have a multi-product agent fabric. The orchestrator described here is for Voyami, runs against Voyami's travel domain, and has no scope beyond that.

The deferral is written down. The relevant entry is in `agents/memory/LONG_HORIZON_MEMORY_STRATEGY.md`, in the same form as every other deferred capability we track there: name, current state, reason, and the trigger that would change our mind.

When a second product calls for a capability that the existing Voyami orchestration would need to share, we will reconsider. Until then, the orchestrator is for one product.

Why this shape

The four-plane architecture is the result of three constraints, not a green-field design choice.

The first constraint is latency. A user expects an itinerary in seconds. That ruled out architectures where every cross-domain decision passes through a central reasoner. We needed the work to happen in parallel, in the domain agent that owns it, and to be assembled at the edges.
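The latency argument can be illustrated with a small fan-out sketch. It assumes an `asyncio`-style runtime, which this post does not confirm for the ai-runtime; `run_agent`, `fan_out`, and the fixed delay are hypothetical stand-ins for real agent work.

```python
import asyncio
from typing import Dict, List


async def run_agent(domain: str, delay: float) -> Dict[str, str]:
    # Stand-in for a domain agent doing its own tool calls.
    await asyncio.sleep(delay)
    return {"domain": domain, "status": "ok"}


async def fan_out(domains: List[str]) -> Dict[str, Dict[str, str]]:
    # Start every agent at once; the total wait is roughly the slowest
    # agent, not the sum of all of them.
    results = await asyncio.gather(*(run_agent(d, 0.05) for d in domains))
    # Assemble at the edge, keyed by domain.
    return {r["domain"]: r for r in results}


combined = asyncio.run(fan_out(["hotel", "flight", "weather"]))
print(sorted(combined))
```

A central reasoner that consulted each domain in turn would pay the sum of the delays. Here the three agents overlap, and the only sequential step is the final assembly.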

The second constraint is testability. We needed each domain agent to be testable in isolation, with mocked tools, without standing up the whole stack. That argued for clear domain boundaries and small public interfaces. The skill-agent-workflow principle is what fell out of that.
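As a sketch of what "testable in isolation, with mocked tools" can look like, the example below injects the tool into a hypothetical agent and replaces it with a `unittest.mock.Mock`. The `HotelAgent` shape and its `cheapest` method are assumptions for illustration, not the real agent interface.

```python
from typing import Any, Callable, Dict, List
from unittest.mock import Mock


class HotelAgent:
    """Hypothetical agent shape: the tool is injected so tests can replace it."""

    def __init__(self, fetch_availability: Callable[[str], List[Dict[str, Any]]]):
        self._fetch = fetch_availability

    def cheapest(self, city: str) -> Dict[str, Any]:
        return min(self._fetch(city), key=lambda h: h["price"])


def test_cheapest_uses_only_the_mocked_tool() -> None:
    # No network, no gRPC, no other services: just the agent and a fake tool.
    fake_tool = Mock(return_value=[
        {"id": 1, "price": 220.0},
        {"id": 2, "price": 140.0},
    ])
    agent = HotelAgent(fake_tool)

    result = agent.cheapest("Lisbon")

    assert result["id"] == 2
    fake_tool.assert_called_once_with("Lisbon")


test_cheapest_uses_only_the_mocked_tool()
```

The test only works because the agent's dependency on the outside world is a single injected callable. That is the kind of small public interface the constraint argues for.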

The third constraint is the team. The system was built primarily by one person, with two more people joining as the architecture solidified. A four-plane architecture with strict boundaries between planes is the kind of architecture a very small team can hold in its head. A multi-domain agent fabric is not.

We will let you know when that changes.

The architecture record for AMILEN is at `platform/amilen/README.md`. The domain agents live at `products/voyami/core/ai-runtime/src/agents/`. The skill-agent-workflow principle is documented at `products/voyami/docs/02_WORKFLOW_SKILL_AGENT_ARCHITECTURE.md`. The current state of what is and is not built is at `products/voyami/docs/00_SYSTEM_REALITY_SNAPSHOT.md`.