A Month of Building Rebyte From My Phone

For the last thirty days I shipped Rebyte — every service, every deploy, every production fix — without opening a laptop. Not because phones are the future of coding. Because once you understand what is actually in the box, the phone is enough.

Most people never think about doing this, because the picture in their head is "writing code on a 6-inch screen," which is absurd. The picture is wrong. The phone is not a coding device. It is a remote control for a fleet of computers that each already know what they are for. The interesting question is not "can you type fast enough?" It is "what does the system on the other end of that remote have to look like?" This is the answer.

A normal loop looks like this. I am outside, halfway through a walk. I speak a change into the phone. A few minutes later one machine tells me the frontend diff is ready, another tells me the logs are clean, a third tells me the production path still passes, and I decide whether to ship. The phone is not doing the work. It is collapsing the work into something small enough for a human to steer while moving through the day.

The scope: what is actually being managed

Rebyte is not a single app. It is a distributed system with five public surfaces and a stack of infrastructure underneath:

A TypeScript SPA at app.rebyte.ai.
An Astro landing site at rebyte.ai.
A Node.js relay on Cloud Run at api.rebyte.ai, with a Temporal worker shipped beside it.
The runtime that lives inside every cloud VM we hand to a user.
A TypeScript SDK with an e2b-compatible shape, plus iOS and Android apps under three product brands (Rebyte, Adits, Vox).

Underneath: Postgres on Cloud SQL, GCS event buckets, an LLM proxy, the VM image those machines boot from, and the orchestration that runs all of it across GCP and AWS. Every one of these surfaces shipped something in the last thirty days. None of it shipped from a laptop. The fact that this is even a coherent paragraph is the part most people miss when they hear "from a phone."

The machines behind the phone

I am not coding on the phone. The phone drives a small fleet of Rebyte cloud VMs that code on my behalf. Underneath it sit two layers: the supporting infrastructure that keeps the fleet usable, and the working machines that actually own pieces of the product.

Rebyte itself. Every agent in the fleet is a Rebyte cloud VM running on my Claude Code and Codex subscriptions. Rebyte holds the refresh tokens, mints short-lived access tokens into each VM as it starts, and rotates them on my behalf. I authenticate once; the long-lived secret never leaves the platform. This is the primitive that makes the rest possible — agents I can stand up, tear down, and trust without re-doing auth a dozen times a day.
Tailscale. The mesh that makes everything else look like one machine. Cloud VMs, the Mac mini at home, and my phone all sit on the same private network. The frontend agent on GCP can reach the mini at home as if they were on the same LAN. Without the mesh, every cross-machine call is a tunnel I'd have to think about. With it, "the network" is one flat thing.
A Mac mini at home. One job: be the residential corner of the mesh. Plenty of services — Reddit being the loudest example — block every major cloud provider's egress range. If I want an agent to read a Reddit thread, fetch a forum page, or hit a site that doesn't trust GCP, the request has to bounce through the mini. It is not a build farm. It is not a CI box. It is the one machine on the mesh with a normal home IP, and that turns out to be irreplaceable.
GCP service accounts, narrowly scoped. Each VM gets its own service account with the smallest scope it needs. The infrastructure VM can reach the build host and artifact bucket. The deploy VM can touch Cloud Run and Cloud SQL. The DevOps VM has read-only access to logs. The mobile VM can trigger release builds. No agent gets keys it does not need.
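The token arrangement described under Rebyte itself can be sketched from the VM's side as a small cache. In that arrangement, the platform keeps the refresh token and mints short-lived access tokens into each VM, and the cache asks for a fresh token shortly before the current one expires. This is a hypothetical illustration, not Rebyte's implementation. The injected mint function stands in for whatever exchange the platform actually performs, and the 60-second skew is an assumed value.

```typescript
// Hypothetical sketch: a VM-side cache for short-lived access tokens.
// The refresh token never appears here; minting happens platform-side,
// represented by the injected `mint` function (an assumption).

interface AccessToken {
  value: string;
  expiresAtMs: number; // absolute expiry, epoch milliseconds
}

type MintFn = () => Promise<AccessToken>;

class ShortLivedTokenCache {
  private current: AccessToken | null = null;
  private inFlight: Promise<AccessToken> | null = null;

  constructor(
    private readonly mint: MintFn,
    private readonly skewMs = 60_000, // re-mint this long before expiry (assumed)
    private readonly now: () => number = Date.now,
  ) {}

  async get(): Promise<string> {
    const tok = this.current;
    // Reuse the cached token while it is comfortably inside its lifetime.
    if (tok && tok.expiresAtMs - this.skewMs > this.now()) {
      return tok.value;
    }
    // Concurrent callers share a single in-flight mint.
    if (!this.inFlight) {
      this.inFlight = this.mint().finally(() => {
        this.inFlight = null;
      });
    }
    this.current = await this.inFlight;
    return this.current.value;
  }
}
```

Sharing one in-flight request means a burst of tool calls right at expiry triggers a single mint rather than a stampede.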

None of this is clever. The point is that the supporting cast collapses into a few primitives that every agent already knows how to use: files in a repo, machines on a mesh.
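The residential hop through the Mac mini boils down to a routing decision per request. The sketch below assumes the mini runs an HTTP proxy reachable over the tailnet. The post does not say how the bounce is implemented, so the proxy name, the port, and the host list are all hypothetical.

```typescript
// Hypothetical sketch: decide whether an outbound fetch leaves the cloud VM
// directly or hops through the residential machine on the tailnet.

type Egress =
  | { kind: "direct" }
  | { kind: "residential"; proxy: string };

// Assumed list of sites known to reject cloud-provider egress ranges.
const CLOUD_HOSTILE_HOSTS = ["reddit.com"];

// Assumed address of an HTTP proxy on the Mac mini, by tailnet name.
// Both the name and the port are illustrative only.
const RESIDENTIAL_PROXY = "http://mac-mini:3128";

function chooseEgress(hostname: string): Egress {
  // Normalize case and a trailing root dot before matching.
  const host = hostname.toLowerCase().replace(/\.$/, "");
  // Match the listed domain itself or any of its subdomains.
  const hostile = CLOUD_HOSTILE_HOSTS.some(
    (h) => host === h || host.endsWith("." + h),
  );
  return hostile
    ? { kind: "residential", proxy: RESIDENTIAL_PROXY }
    : { kind: "direct" };
}
```

Everything not on the list goes out directly, so the home connection only carries the traffic that genuinely needs a residential IP.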

Then there are the working machines themselves. On any given day there are five or six Rebyte VMs running, sometimes ten. Each one is specialized and stays specialized for weeks at a time.

The infrastructure VM. Owns the code that runs inside user VMs, the VM image, and the lower-level infrastructure. When something is wrong inside the machine a user is connected to, this is the agent I talk to.
The frontend / relay VM. Owns the React SPA and the Node.js relay. Auth, routing, data layer, UI. Most of the day-to-day product work happens here.
The deploy VM. Doesn't write code. Runs the deploy skills, gates production pushes, and is the only agent allowed to touch the production scripts.
The DevOps VM. The agent I rely on most and think about least. It owns the wall of scheduled skills described later in this post, and turns their output into a few sentences my phone can read.
The mobile VM. Owns the iOS and Android release flow. It handles app changes, kicks off signed builds, and reports back when a mobile release is ready.
One or two scratch VMs. Whatever the week needs — a research spike, a one-off migration, a draft of a new skill.

The hardest part of running the fleet is not the agents. It is the mental model. Each VM has a surface area, a credential set, a repo, a history. Routing the right ask to the right VM — and not letting their boundaries blur — is the actual skill. Get it wrong and you end up debugging low-level infrastructure work on the frontend VM, with neither the toolchain nor the context to make progress.
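Keeping the boundaries sharp can be expressed as a tiny ownership table in which every surface has exactly one owner. The role names follow the list above. The surface labels and the table itself are assumptions for illustration, not a Rebyte configuration format.

```typescript
// Hypothetical ownership table: each VM declares what it owns, and an ask
// is routed by the surface it touches.

type Surface =
  | "vm-runtime" | "vm-image" | "spa" | "relay"
  | "mobile" | "logs" | "prod-deploy";

interface VmRole {
  name: string;
  owns: Surface[];
}

const FLEET: VmRole[] = [
  { name: "infrastructure", owns: ["vm-runtime", "vm-image"] },
  { name: "frontend-relay", owns: ["spa", "relay"] },
  { name: "deploy", owns: ["prod-deploy"] }, // sole owner of production scripts
  { name: "devops", owns: ["logs"] },
  { name: "mobile", owns: ["mobile"] },
];

function routeAsk(surface: Surface): string {
  const owners = FLEET.filter((vm) => vm.owns.includes(surface));
  // Blurred boundaries show up as zero or several owners; refuse both.
  if (owners.length !== 1) {
    throw new Error(`surface ${surface} has ${owners.length} owners`);
  }
  return owners[0].name;
}
```

Failing loudly on an ambiguous owner is the code version of not letting boundaries blur: an ask with no clear home is a sign the map needs updating, not a reason to guess.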

The interface: voice in, voice out

None of this works if the loop on the phone is slow. You cannot peck out instructions to half a dozen agents on a thumb keyboard and call it a day. To make this usable, we had to do three specific things.

First: a highly optimized mobile PWA. The iOS app is not out yet. The thing I actually use every day is the PWA, and it had to feel native enough that I stopped thinking about the difference. Fast load, clean navigation, reliable background resume, and a layout that works on a phone without precision tapping. If the PWA felt like a shrunken desktop app, the whole experiment would have collapsed.

Second: a type-less mode. On a phone, sustained typing is the wrong interaction. So we added a mode where the product assumes you are not going to type. The input is dictated speech: noisy, misrecognized, corrected halfway through. The composer is built around that reality instead of pretending a keyboard is the primary path. If shipping from a phone required careful thumb-typing, it would not be real.

Third: spoken read-back. When an agent finishes a task, the result has to be something I can hear, not something I have to read. So we built a dedicated read-out feature in the app: every response, every diff summary, every DevOps report can be played as speech with one tap. The frontend renders the agent's output into something a TTS voice can speak and a listener can actually follow: not a wall of code, but a paragraph someone could read aloud. That deliberate compression is the whole point. It pulls me out of the detail and leaves a decision: ship, retry, ignore.
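Part of the compression step can be approximated with a few text rules. This is a hypothetical sketch of the idea, not the app's read-out logic. It swaps fenced code for a spoken placeholder and strips markdown symbols that a TTS voice would otherwise read out or stumble over. A real version would likely also summarize, which simple rules cannot do.

```typescript
// Hypothetical sketch: turn an agent's markdown reply into text a TTS
// voice can read. The specific rules are assumptions for illustration.

function toSpeakable(markdown: string): string {
  return markdown
    // Replace each fenced code block with a short spoken placeholder.
    .replace(/`{3}[^\n]*\n([\s\S]*?)`{3}/g, (_match: string, body: string) => {
      const n = body.split("\n").filter((line) => line.trim() !== "").length;
      return ` (code block, ${n} line${n === 1 ? "" : "s"} omitted) `;
    })
    // Inline code: keep the words, drop the backticks.
    .replace(/`([^`]+)`/g, "$1")
    // Markdown headings and list bullets at the start of a line.
    .replace(/^[ \t]*#+[ \t]*/gm, "")
    .replace(/^[ \t]*[-*][ \t]+/gm, "")
    // Bold and italic markers written with asterisks.
    .replace(/\*{1,2}([^*]+)\*{1,2}/g, "$1")
    // Collapse newlines and runs of spaces so the voice does not stall.
    .replace(/\s+/g, " ")
    .trim();
}
```

Underscore emphasis is deliberately left alone so identifiers like snake_case names survive intact.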

Mouth in, ear out. No screen attention required. I can correct mid-walk, mid-drive, mid-coffee. The phone becomes the smallest possible remote control for a fleet of computers that each know what they are for.

How the fleet shares memory

Once you have five or six machines doing different jobs, the next problem is obvious: how do they share context without me re-explaining everything to each one?

The answer we landed on is a GitHub repo called shared-memories. It is the company's long-term memory in markdown: one file per topic, organized by subsystem, with the important flows written down in the place the relevant agent will actually look.

The reason this works is simple. What agents natively want is not a fancy knowledge graph. They want a shared file system. They want files they can read, diff, edit, commit, and pull. GitHub is already a very good version of that. So instead of inventing a new memory layer, we use the thing the agents already understand best: a repo.

That repo gives us progressive disclosure for free. The top-level docs explain the broad shape of the system. The folder for a subsystem holds the operational details. The skill file for one task holds the exact steps for that task. An agent starts broad, then drills down only as far as it needs.
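The broad-to-narrow reading path can be sketched as a function that lists which files an agent would open for a topic. The layout here (a README.md at each folder level and one markdown file per leaf topic) is an assumption about how shared-memories might be organized, used only to make the idea concrete.

```typescript
// Hypothetical sketch of progressive disclosure over an assumed repo
// layout: top-level README, then each folder's README, then the leaf file.
// maxDepth lets an agent stop drilling once it has enough context.

function readingOrder(topic: string, maxDepth = Infinity): string[] {
  const parts = topic.split("/").filter((p) => p.length > 0);
  const files = ["README.md"]; // broad shape of the whole system
  for (let i = 1; i < parts.length && i <= maxDepth; i++) {
    files.push(parts.slice(0, i).join("/") + "/README.md"); // subsystem overview
  }
  if (parts.length > 0 && parts.length <= maxDepth) {
    files.push(parts.join("/") + ".md"); // exact steps for one task
  }
  return files;
}
```

For example, readingOrder("relay/deploy") walks from the repo overview to the relay folder's overview to the deploy notes, while readingOrder("relay/deploy", 1) stops at the folder overview.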

It also doubles as the issue tracker. Pending bugs live there too, one markdown file per issue, close to the code and workflow that own them. So the same place that stores "how this system works" also stores "what is currently broken" and "what we already know about it."

Every VM clones the repo. Every agent reads its local instructions on startup. When I learn a new flag, gotcha, or recovery step, I tell the nearest agent to write it down, commit it, and push it. The next machine that pulls has the update. That is the memory system.

The unseen half

The visible half of the month is the conversations — me talking to an agent, the agent doing a thing. The unseen half is everything that keeps running when I say nothing at all. Tests. Checks. Log scans. Quiet operational loops. We manage that with a wall of scheduled skills, each with one job, each running on its own clock. They are a big part of why a phone is enough.

Golden tests against production, hourly. A scheduled skill spins up a fresh sandbox and walks through the canonical user paths against the real production endpoints — sign in, create a workspace, run a task, tear it down. It writes a pass/fail line, plus the failing step if any. If something breaks, the next read-out on my phone leads with it. I learn about regressions before I would have noticed them by hand, and they are real regressions, against real prod, not a mocked test environment.
Routine test and verification tasks. Some scheduled jobs are simple quality-control loops: run a test, verify a flow, check that a critical integration still behaves the way it should, and report only if the answer changes. The point is not sophistication. The point is that I do not have to remember to ask.
Log scans, hourly, by signature. Another skill pulls the last hour of app and infrastructure logs, classifies them by error signature, deduplicates the noise, and produces a one-screen summary. New errors get flagged. Known errors get counted. The agent does not page me; it tells me which one of three or four ongoing fires is hottest.
System monitors, each a single skill. One scheduled job watches for stuck VMs. Another watches the LLM proxy budget. Another watches Temporal queues. Another watches certificate expiry. Each is single-purpose, scheduled, with one clear output line. If the line is "OK," I never see it. If it isn't, I do.
Slow background work. Database backfills, log archival, key rotation, dead-skill cleanup — anything that is "do this once a day and stop bothering me" — lives in the same scheduled-skill pattern.
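The hourly golden-path check reduces to a loop that stops at the first failing step and reports one line, with teardown running even when a step fails. This sketch assumes each step is an async function that throws on failure. The real steps would call production endpoints, which are not shown.

```typescript
// Hypothetical golden-path runner: canonical steps in order, first failure
// wins, one summary line out, and teardown always runs.

interface GoldenStep {
  name: string;
  run: () => Promise<void>; // assumed to throw on failure
}

async function runGoldenPath(
  steps: GoldenStep[],
  teardown?: () => Promise<void>,
): Promise<string> {
  let line = `PASS (${steps.length} steps)`;
  for (const step of steps) {
    try {
      await step.run();
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      line = `FAIL at "${step.name}": ${reason}`;
      break; // later steps depend on earlier ones, so stop here
    }
  }
  // Clean up the sandbox whether or not the path passed.
  if (teardown) {
    try {
      await teardown();
    } catch {
      line += " (teardown also failed)";
    }
  }
  return line;
}
```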

The shape is consistent. Each task is a Rebyte skill: a markdown spec the agent reads, a tightly scoped set of tools it is allowed to call, and a clock that wakes it up. None of them needs me on a laptop. Most of them do not need me at all. By the time I open the phone in the morning, the system has already been testing itself, checking itself, and deciding what is worth interrupting me for.
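A skill in that shape might be modeled as a spec path, a tool allowlist, and a schedule. The field names, the spec path, and the tool names below are hypothetical rather than Rebyte's skill format, and a fixed interval stands in for whatever scheduler is actually used.

```typescript
// Hypothetical shape of a scheduled skill: markdown spec, tool allowlist,
// and a clock. Not Rebyte's actual format.

interface ScheduledSkill {
  name: string;
  specPath: string;                 // markdown the agent reads on wake-up
  allowedTools: ReadonlySet<string>;
  everyMinutes: number;             // simple fixed interval instead of cron
}

const logScan: ScheduledSkill = {
  name: "hourly-log-scan",
  specPath: "skills/devops/log-scan.md",          // illustrative path
  allowedTools: new Set(["read_logs", "write_report"]), // illustrative tools
  everyMinutes: 60,
};

// Reject any tool call outside the skill's allowlist.
function authorizeToolCall(skill: ScheduledSkill, tool: string): void {
  if (!skill.allowedTools.has(tool)) {
    throw new Error(`${skill.name} may not call ${tool}`);
  }
}

// A skill is due if it never ran or its interval has fully elapsed.
function isDue(skill: ScheduledSkill, lastRunMs: number | null, nowMs: number): boolean {
  return lastRunMs === null || nowMs - lastRunMs >= skill.everyMinutes * 60_000;
}
```

Keeping the allowlist next to the schedule means a skill's blast radius is visible in the same place as how often it runs.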

This is the part most "ship from your phone" stories quietly skip. You cannot run testing, quality control, and DevOps from a 6-inch screen by reacting faster. You do it by pushing as much of that work as possible into scheduled systems that keep running without you.
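Of the scheduled jobs above, the hourly log scan carries the most logic. It normalizes away the parts of each message that vary, counts by the resulting signature, and splits new signatures from known ones. The normalization rules here are assumptions chosen for illustration; a production classifier would need more of them.

```typescript
// Hypothetical sketch of signature-based log classification.

// Replace variable fragments so equivalent errors share one signature.
function signatureOf(message: string): string {
  return message
    .replace(/\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b/gi, "<uuid>")
    .replace(/\b0x[0-9a-f]+\b/gi, "<hex>")
    .replace(/\d+/g, "<n>")
    .trim();
}

interface ScanSummary {
  newSignatures: Map<string, number>;   // flagged
  knownSignatures: Map<string, number>; // just counted
}

function scanLogs(messages: string[], known: Set<string>): ScanSummary {
  // Deduplicate: count occurrences per signature.
  const counts = new Map<string, number>();
  for (const msg of messages) {
    const sig = signatureOf(msg);
    counts.set(sig, (counts.get(sig) ?? 0) + 1);
  }
  // Split into signatures seen before and ones that are new this hour.
  const summary: ScanSummary = { newSignatures: new Map(), knownSignatures: new Map() };
  for (const [sig, n] of counts) {
    (known.has(sig) ? summary.knownSignatures : summary.newSignatures).set(sig, n);
  }
  return summary;
}
```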

Deploys, builds, and the things a sandbox cannot do

A sandbox is a real Linux VM, but it is not infinite. The trick is keeping the agent in the sandbox while the heavy work happens elsewhere.

Deploys go through CLIs. Production deploys land on GCP and AWS, and they all go through the official CLIs: gcloud (including gcloud run deploy for Cloud Run), aws for the AWS-side services, and firebase for hosting. The deploy VM has the right service accounts and the right scripts. I tell it what to ship; it runs the CLI and reports back. There is no magic deploy console. There is just a CLI, in a sandbox, on my behalf: exactly what I would have run on a laptop, except an agent runs it and reads the output back as a sentence.
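The deploy VM's job can be sketched as two pure steps. First, build the exact gcloud argument list a person would have typed. Second, turn the exit code and error output into one sentence. The gcloud run deploy flags used here (--image, --region, --project, and the global --quiet) are real. The service name and other values in the usage are placeholders. Actually spawning the process (for example with Node's child_process.execFile) is left out.

```typescript
// Hypothetical sketch: build the gcloud command for a Cloud Run deploy and
// summarize the result as one sentence. Values are placeholders.

interface CloudRunDeploy {
  service: string;
  image: string;
  region: string;
  project: string;
}

// Arguments to pass to the gcloud binary (without the binary itself).
function gcloudArgs(d: CloudRunDeploy): string[] {
  return [
    "run", "deploy", d.service,
    `--image=${d.image}`,
    `--region=${d.region}`,
    `--project=${d.project}`,
    "--quiet", // never block on an interactive prompt
  ];
}

// Compress the outcome into something a TTS voice can read back.
function summarizeDeploy(d: CloudRunDeploy, exitCode: number, stderr: string): string {
  if (exitCode === 0) {
    return `Deployed ${d.service} to Cloud Run in ${d.region}.`;
  }
  const lines = stderr.split("\n").map((l) => l.trim()).filter((l) => l !== "");
  const last = lines.length > 0 ? lines[lines.length - 1] : "no error output";
  return `Deploy of ${d.service} failed (exit ${exitCode}): ${last}`;
}
```

Passing an argument array to the process runner, rather than a single shell string, avoids quoting problems when values come from an agent.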

Mobile builds move to the cloud. iOS and Android builds are too heavy and too credential-bound for a general-purpose sandbox, so they are offloaded to a cloud build service, with the signing material managed at the build provider. The agent commits and pushes to GitHub; the build kicks off in the cloud; TestFlight gets the result. The phone never sees a build log. I just hear, eventually, that the build is up.

The VM image build. The image every Rebyte VM boots from is built on a dedicated build host on GCP. The infrastructure VM does not run the build itself; it knows which machine should do the heavy work, how to trigger it, and how to pull the result back. Sandbox is the brain. Build host is the muscle.

This pattern repeats. Wherever the sandbox cannot do the work, it learns to delegate to a machine that can. The agent is never the bottleneck.
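The delegation pattern (trigger heavy work elsewhere, then wait for the result) can be sketched generically. The trigger and status functions are hypothetical stand-ins for whatever the build host or cloud build service exposes, and the backoff numbers are illustrative.

```typescript
// Hypothetical sketch: the sandbox triggers a job on another machine and
// polls with exponential backoff instead of doing the work itself.

type JobState =
  | { state: "running" }
  | { state: "done"; artifact: string }
  | { state: "failed"; reason: string };

async function delegate(
  trigger: () => Promise<string>, // starts the job, returns its id
  status: (jobId: string) => Promise<JobState>,
  sleep: (ms: number) => Promise<void>, // injected so tests need not wait
  { initialDelayMs = 5_000, maxDelayMs = 60_000, maxPolls = 100 } = {},
): Promise<string> {
  const jobId = await trigger();
  let delay = initialDelayMs;
  for (let i = 0; i < maxPolls; i++) {
    await sleep(delay);
    const s = await status(jobId);
    if (s.state === "done") return s.artifact;
    if (s.state === "failed") throw new Error(`job ${jobId} failed: ${s.reason}`);
    delay = Math.min(delay * 2, maxDelayMs); // back off, capped
  }
  throw new Error(`job ${jobId} still running after ${maxPolls} polls`);
}
```

The sandbox holds nothing but a job id and a timer while it waits, which is what keeps it light enough to stay the brain rather than the muscle.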

What it actually feels like

I open the phone. There is a short list of agents, one per VM, named for the job it does. I say what I want. I hear back. I confirm. I move on.

One message says the frontend change is ready for review. Another says the deploy path is clean. Another says the latest production log scan found nothing new. If I need more detail, I can ask for it. If I do not, I do not have to go looking. The important part is not that the phone somehow replaced a laptop. It is that the system on the other end has already compressed the situation into something I can understand quickly.

That is why it feels a little strange at first. You are steering a large system through summaries and read-backs instead of through windows and terminals. Done badly, it would feel like guesswork. Done well, it feels like management: direct enough to move quickly, abstract enough that you are not trapped in the machinery.

While I am on a walk, the frontend agent is rebasing a branch. The deploy agent is shipping the relay. The infrastructure agent is finishing a fix inside the VM runtime that the deploy agent will pick up next. The DevOps agent has already filed today's hourly log report. The mobile agent has queued an iOS release. The golden-test skill is, as it does every hour, telling me prod is fine. None of it is on my screen. All of it is on my phone, in the form of a few quiet sentences read out loud.

Why bother

I did not do this because phones are the future of coding. I did it because pushing the claim to its extreme — you can ship the entire product from your phone — was the most honest stress test I could put on Rebyte itself. The fleet pattern, the voice surface, the read-back, the wall of scheduled skills, the boundary between what a sandbox can do and what it has to delegate — all of it had to be real, because there was no laptop to fall back on.

There is also an important caveat. This only works because I know this codebase extremely well. Working from a phone is a little like feeling your way around an elephant in the dark: you are never holding the whole thing in your hands at once. What makes that manageable is not the phone. It is familiarity. I already know the shape of the system well enough to guess where a change will land, what it is likely to touch, and which machine should own it.

If I were brand new to the codebase, I would be much less confident doing this. When you do not already understand the scope of a change, the phone becomes much more dangerous. The method works best once the mental map is already there. In that sense, it is not a replacement for deep knowledge of the system. It is what deep knowledge of the system makes possible.

Everything that survived this month is, by definition, good enough for the much easier desktop case. That is the point of the exercise. If it works from a phone, with your voice, in a noisy room, while you are doing something else — it works.

The laptop is open again now. I am in less of a hurry to use it.