How we built the loop that runs our company with AI

Every small company decides in the dark. The signals that matter (leads, clients, cash, affiliates) live on separate screens, and the decision comes from the gut of whoever holds the context in their head that day. Worse: almost nobody goes back to measure whether the bet paid off. In June 2026 we built a loop that takes the decision out of gut feel and closes it on measurement. We call it Company OS, and it runs inside our own site.

Why run the company as a loop

A decision is not the end of the work; it is the start of a cycle. A signal becomes a recommendation, an accepted recommendation becomes a decision with a hypothesis and a metric, and a week later the metric says whether the hypothesis held. Without that closing step, a company repeats the same mistakes because it has no memory of what worked. The loop exists to close the gap between deciding and measuring.

Repo-native, not a separate service

Our first architecture decision was to not build a separate service. The signals the loop needs are already in our Postgres database (Neon): pipeline leads, production client telemetry, Stripe milestones and reconciliation, affiliate referrals. An external service would only add syncing, latency and one more place to break. The loop lives in the same Next.js app, reads the same database and ships in the same deploy.

Typed collectors: raw signal before interpretation

Each source has a collector that returns a typed SignalBatch: the raw data before any judgment. That boundary matters. The collector does not decide what is relevant; it only delivers the facts (how many leads came in, how much cash there is, how many referrals arrived). The next stage interprets them. Separating collection from interpretation makes each part testable on its own and lets us swap a source without touching the synthesis.
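As a rough illustration of that boundary, here is a minimal sketch of what a collector contract could look like. Only the name SignalBatch comes from our codebase as described here; the field names, the Collector type, collectLeads and collectAll are hypothetical, and the stub returns a made-up number instead of querying Postgres.

```typescript
// Hypothetical shapes: only the name SignalBatch comes from the text above.
type SignalSource = "leads" | "clients" | "cash" | "affiliates";

interface SignalBatch {
  source: SignalSource;
  collectedAt: Date;
  // Raw facts only: named numeric readings with no judgment attached.
  facts: Record<string, number>;
}

// A collector takes no opinion on relevance; it only returns a batch.
type Collector = () => Promise<SignalBatch>;

// Stub collector with an illustrative value; a real one would read the database.
const collectLeads: Collector = async () => ({
  source: "leads",
  collectedAt: new Date(),
  facts: { newLeadsLast24h: 7 },
});

// Runs every collector and hands the batches, untouched, to the next stage.
async function collectAll(collectors: Collector[]): Promise<SignalBatch[]> {
  return Promise.all(collectors.map((collect) => collect()));
}
```

Because the synthesis stage would depend only on SignalBatch, a source could be swapped by replacing one Collector, and each collector could be tested with nothing but a database fixture.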

The hard part: making the decision measurable

The heart of the loop is not generating a recommendation; it is making the decision measurable automatically. Each decision carries a structured target metric: not loose text like "improve conversion", but enough structure to be queried by a machine later (the source, the query, the measurement window, the success direction and the baseline recorded the moment the decision was accepted). Without the baseline captured at that moment, "did it work?" turns into an argument. With it, the weekly review compares the measured value against the starting point and answers with a number.
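To make the idea concrete, a structured target metric could be sketched along these lines. The five fields mirror the list above, but every identifier here (TargetMetric, compareToBaseline, the leads table in the query) is a hypothetical name chosen for illustration, and the numbers are invented.

```typescript
// Hypothetical field names; the text only lists what the metric must contain.
type SuccessDirection = "up" | "down";

interface TargetMetric {
  source: string;                     // where the value is read from
  query: string;                      // machine-runnable query
  windowDays: number;                 // measurement window
  successDirection: SuccessDirection; // which way counts as success
  baseline: number;                   // recorded when the decision was accepted
}

interface Comparison {
  delta: number;
  movedInSuccessDirection: boolean;
}

// Compares a later measurement against the baseline stored at acceptance.
function compareToBaseline(metric: TargetMetric, measured: number): Comparison {
  const delta = measured - metric.baseline;
  const movedInSuccessDirection =
    metric.successDirection === "up" ? delta > 0 : delta < 0;
  return { delta, movedInSuccessDirection };
}

// Illustrative values only, not real company data.
const example: TargetMetric = {
  source: "leads",
  query: "SELECT count(*) FROM leads WHERE created_at >= now() - interval '7 days'",
  windowDays: 7,
  successDirection: "up",
  baseline: 12,
};

const result = compareToBaseline(example, 15);
// result.delta is 3 and result.movedInSuccessDirection is true
```

The point of the sketch is that the baseline is a required field: a decision without a recorded starting point would not type-check, so the comparison never depends on memory.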

The loop closes on measurement, not on a suggestion

This is where vocabulary discipline became a design rule. A proposed recommendation is never called a decision. The daily brief proposes 1 to 3 recommendations (status proposed); each one becomes a decision only when the founder accepts it. And a decision is closed only when the weekly review measures the result and writes an Outcome Score (-1, 0 or 1). Enforcing those boundaries in the schema prevents the most common management-dashboard mistake: confusing intent with commitment, and commitment with result.
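One way to express those boundaries in types is sketched below. It assumes three separate shapes and two transition functions. The status proposed and the Outcome Score values come from the rule above; the statuses open and closed, the field names and the function names are hypothetical.

```typescript
type OutcomeScore = -1 | 0 | 1;

// Intent: something the daily brief proposes.
interface Recommendation {
  kind: "recommendation";
  status: "proposed";
  title: string;
}

// Commitment: exists only after the founder accepts.
interface Decision {
  kind: "decision";
  status: "open";
  title: string;
  hypothesis: string;
  acceptedAt: Date;
}

// Result: exists only after the weekly review has measured.
interface ClosedDecision {
  kind: "decision";
  status: "closed";
  title: string;
  hypothesis: string;
  acceptedAt: Date;
  outcomeScore: OutcomeScore;
}

// The only path from recommendation to decision is an explicit acceptance.
function acceptRecommendation(
  rec: Recommendation,
  hypothesis: string,
  acceptedAt: Date,
): Decision {
  return { kind: "decision", status: "open", title: rec.title, hypothesis, acceptedAt };
}

// The only path to closed requires a score from the review.
function closeDecision(decision: Decision, outcomeScore: OutcomeScore): ClosedDecision {
  return { ...decision, status: "closed", outcomeScore };
}
```

With shapes like these, code that tries to score a recommendation or close a decision without a score fails at compile time rather than in a dashboard.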

The AI proposes, the founder decides

It is the same principle as in our client-facing AI projects: the model does not act on its own where the action is hard to reverse. Claude synthesizes the brief and proposes; accepting a recommendation is a human click, on a dedicated action endpoint. The loop does not decide for the founder; it guarantees that every decision taken has a hypothesis, a metric and a review date. The gain is not removing the human; it is leaving no bet without a scoreboard.

What we would do differently

We would have captured the baseline from the very first decision. In the early versions the target metric was free text, and measuring later meant reconstructing the starting point from memory (imprecise and arguable). Structuring the target metric with a baseline at acceptance time is what turned the loop from a pretty diary into a cycle that learns.

The loop we use is the one we sell

Company OS is an internal application (Track 3) running in production for ourselves. The architecture, the decisions and what it delivers day to day are covered in the full case study. If your operation decides in the dark and you want a loop that closes on measurement, tell us your context.