Tokamak

Tokamak is currently internal to Menlo. We’re using it ourselves first — testing, iterating, and getting it right before we open it up. If you’re here, you’re watching it get built.

The Name

A tokamak is a fusion device that uses powerful magnetic fields to confine superheated plasma, with the goal of producing a nearly limitless, clean source of energy. Inside Menlo, it’s our metaphor for the same idea applied to software: a factory that generates a nearly limitless source of output, without burning people out.

What Is Tokamak

Tokamak is Menlo’s internal product factory.

It’s the layer that sits above your coding tools — above Claude Code, above Codex — and turns product goals into shipped software.

If Claude Code is an engineer, Tokamak is the company around that engineer: the one that defines the work, assigns it, tracks what it costs, and remembers what was built and why.

It’s built for solo developers, PMs, and founders who want one place to run agents, review work, and track spend — and for teams who need shared visibility across all of it.

Stage	What it does
Plan	Convert ideas into structured issues and tasks with defined outcomes
Run	Deploy agents with the right context in sandboxed execution environments
Observe	See task status, execution traces, decision lineage, diffs, and costs in one place
Remember	Every output stays attached to the task. Nothing gets lost between sessions

Why We’re Building It

AI agents are multiplying. Nobody knows what they’re doing or what they cost.

Right now, AI usage at most companies is scattered — different tools, different keys, different accounts, no shared visibility. You end up with agents working in parallel on things that don’t connect, runaway token spend no one sees coming, and zero record of what was actually built.

Tokamak fixes this. One place where work is defined, agents are hired, outputs are reviewed, and every dollar is tracked.

The shift: Software development is moving from a headcount equation to a CapEx equation. “We need 3 more engineers” is becoming “we need $X of agent capacity.” Tokamak is built for that new question.

What Tokamak Does

Hire Agents, Not Subscriptions

You don’t subscribe to a tool. You hire a worker.

Pool your Claude Code seats, Codex API keys, and Jan models into one roster. Tokamak works with the agents you already use: Claude Code, OpenAI Codex, Amp, opencode, OpenClaw. Assign them to issues. Every session is bound to a task from the first token. Every output is traceable back to why it was built.
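To make the roster idea concrete, here is a minimal sketch of pooling agents and binding every session to an issue. It is not Tokamak’s actual data model: the `Agent`, `Session`, and `Roster` names, their fields, and the `ISSUE-12` identifier are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from uuid import uuid4


@dataclass(frozen=True)
class Agent:
    name: str      # illustrative label, e.g. "claude-code"
    provider: str  # where the seat or API key comes from (hypothetical field)


@dataclass
class Session:
    issue_id: str  # required, so a session cannot exist without a task
    agent: Agent
    session_id: str = field(default_factory=lambda: uuid4().hex)


class Roster:
    """Pools agents from several providers into one list (hypothetical class)."""

    def __init__(self) -> None:
        self._agents: dict[str, Agent] = {}

    def hire(self, agent: Agent) -> None:
        self._agents[agent.name] = agent

    def assign(self, agent_name: str, issue_id: str) -> Session:
        # Refuse to start work that is not tied to an issue.
        if not issue_id:
            raise ValueError("every session must be bound to an issue")
        return Session(issue_id=issue_id, agent=self._agents[agent_name])


roster = Roster()
roster.hire(Agent("claude-code", "anthropic"))
roster.hire(Agent("codex", "openai"))
session = roster.assign("codex", "ISSUE-12")
print(session.issue_id, session.agent.name)
```

Making `issue_id` a required field is one simple way to guarantee that no output exists without a task to trace it back to.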

Know What Things Cost

Every issue becomes a cost center.

Feature shipped? You’ll know the agent runtime, tokens used, and dollar cost — down to the issue. Over time, this becomes something genuinely new: a complete financial record of what it actually cost to build your product.
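As a rough illustration of treating an issue as a cost center, the sketch below rolls usage events up into runtime, tokens, and dollars per issue. The `UsageEvent` shape, the `cost_by_issue` helper, and the per-token prices are all hypothetical, not Tokamak’s schema or any provider’s real rates.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class UsageEvent:
    issue_id: str
    runtime_seconds: int
    input_tokens: int
    output_tokens: int


# Hypothetical prices in dollars per million tokens; not real rates.
PRICE_PER_MTOK = {"input": Decimal("3.00"), "output": Decimal("15.00")}
MILLION = Decimal(1_000_000)


def event_cost(event: UsageEvent) -> Decimal:
    """Dollar cost of one event under the assumed prices."""
    return (
        Decimal(event.input_tokens) / MILLION * PRICE_PER_MTOK["input"]
        + Decimal(event.output_tokens) / MILLION * PRICE_PER_MTOK["output"]
    )


def cost_by_issue(events):
    """Sum runtime, tokens, and dollars for each issue."""
    totals = defaultdict(
        lambda: {"runtime_seconds": 0, "tokens": 0, "dollars": Decimal("0")}
    )
    for e in events:
        row = totals[e.issue_id]
        row["runtime_seconds"] += e.runtime_seconds
        row["tokens"] += e.input_tokens + e.output_tokens
        row["dollars"] += event_cost(e)
    return dict(totals)


events = [
    UsageEvent("ISSUE-12", 120, 200_000, 50_000),
    UsageEvent("ISSUE-12", 60, 100_000, 10_000),
    UsageEvent("ISSUE-13", 30, 1_000_000, 0),
]
report = cost_by_issue(events)
for issue, row in report.items():
    print(issue, row["runtime_seconds"], row["tokens"], row["dollars"])
```

`Decimal` is used instead of floats so that dollar totals add up exactly when many small events are summed.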

Stay In Control

Agents do the work. You own the outcome.

Tokamak is not autonomous-first. Every agent runs in an isolated sandbox (containers, cloud terminals, ephemeral environments), so nothing bleeds between sessions or tasks. Every change is reviewable. Every decision is traceable. Context and outputs stay attached to tasks, so when you pick up where you left off, everything is still there. Humans define the goals; agents execute toward them. This is the practical version of AI-native software development: not a vision statement, but a workflow that works today.

Under the Hood

Tokamak is built on three integrated layers:

Agent Orchestration — routes tasks to the right agent, prepares context, manages sequencing and dependencies, handles model selection, and gates human review at the right moments
Agent Runtime — manages the full execution lifecycle: spinning up sandboxed environments, capturing diffs and logs, and monitoring resource usage
Observability — surfaces execution traces, decision lineage, cost per task, throughput metrics, and anomaly detection in a unified view

These aren’t separate products bolted together. They’re one system that coordinates from intent to output.
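One way to picture how the three layers hand off to each other is the toy sketch below. It is an assumption-heavy illustration, not Tokamak’s architecture: the class names, the routing table keyed by task kind, and the `approve` callback standing in for human review are all hypothetical, and the "sandbox" is just a Python function call.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Task:
    issue_id: str
    kind: str                 # illustrative labels such as "bugfix" or "docs"
    needs_review: bool = True


@dataclass
class RunResult:
    issue_id: str
    agent: str
    diff: str
    logs: list[str]


class Runtime:
    """Stands in for sandboxed execution; here it only calls a function."""

    def execute(self, agent: str, task: Task, work: Callable[[Task], str]) -> RunResult:
        logs = [f"start {task.issue_id} on {agent}"]
        diff = work(task)
        logs.append(f"finish {task.issue_id}")
        return RunResult(task.issue_id, agent, diff, logs)


@dataclass
class Observability:
    traces: list[dict] = field(default_factory=list)

    def record(self, result: RunResult, status: str) -> None:
        self.traces.append(
            {"issue": result.issue_id, "agent": result.agent, "status": status}
        )


class Orchestrator:
    """Picks an agent per task kind and gates the result on human review."""

    def __init__(self, routes: dict[str, str], default_agent: str,
                 runtime: Runtime, observability: Observability) -> None:
        self.routes = routes
        self.default_agent = default_agent
        self.runtime = runtime
        self.observability = observability

    def run(self, task: Task, work: Callable[[Task], str],
            approve: Callable[[RunResult], bool]) -> str:
        agent = self.routes.get(task.kind, self.default_agent)
        result = self.runtime.execute(agent, task, work)
        if task.needs_review and not approve(result):
            status = "rejected"
        else:
            status = "accepted"
        self.observability.record(result, status)
        return status


obs = Observability()
orchestrator = Orchestrator({"bugfix": "codex"}, "claude-code", Runtime(), obs)
status = orchestrator.run(
    Task("ISSUE-12", "bugfix"),
    work=lambda t: f"diff for {t.issue_id}",
    approve=lambda r: bool(r.diff),  # a human reviewer would decide here
)
print(status, obs.traces)
```

In this sketch the orchestrator is the only component that talks to the other two, which keeps routing, execution, and recording in a single path from intent to output.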

What We’re Working Toward

The goal: one system where work is defined, organized, executed, and remembered.

Right now we’re building the foundation — agent hiring, cost tracking, and shared team visibility. That’s Horizon 1.

Horizon 2 is where it gets interesting: version-controlled product strategy, goal-to-code traceability, and institutional memory that compounds the longer your team uses it. The longer Tokamak runs on your repo, the smarter your agents get about your product — without any retraining.

The long arc: you define a business goal. Agents work toward it. Tokamak tracks everything in between.

Why Not Just Use Claude?

Claude is one of the best coding agents in the world. We use it inside Tokamak.

But Claude won’t build a system for managing its own costs. It won’t route work to Codex when that’s the better call. It won’t version-control your product strategy or tell you what a feature cost to ship. That’s not Anthropic’s business. It’s ours.

That’s the layer Tokamak owns.

⚠️ Still a work in progress. We’re using Tokamak internally at Menlo to build Menlo’s products. If we see market fit, we’ll open it up. Watch tokamak.sh for updates.