Armeet Singh Jatyani

Sandbox Coding Agents Suck (right now)

Background sandbox coding agents like Cursor, Codex, and Copilot have not really taken off. As a YC founder who uses coding agents daily, I see two reasons why.

1) They cannot run the code or tests

If I see one more PR from Copilot that does not compile or fails a basic typecheck, I am going to lose it. What is missing: Coding agents should run on machines with the full developer environment. That means browser access, Sentry logs, Linear context, internal documentation, and debugging tools. Everything a software engineer at the company would have access to. Ramp figured this out, which is why they built their own internal tool. Cursor and Codex still fall short.
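To make the "run the code before opening a PR" point concrete, here is a minimal sketch of the kind of gate I mean. It is not how Ramp, Cursor, Codex, or Copilot work internally. The names run_checks, ready_for_pr, and CHECKS are hypothetical, and the commands are placeholders standing in for a project's real typecheck and test commands.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckResult:
    name: str
    passed: bool
    output: str


def run_checks(checks: dict, cwd: Optional[str] = None) -> list:
    """Run each check command and record whether it exited with status 0."""
    results = []
    for name, argv in checks.items():
        proc = subprocess.run(argv, cwd=cwd, capture_output=True, text=True)
        results.append(
            CheckResult(name, proc.returncode == 0, proc.stdout + proc.stderr)
        )
    return results


def ready_for_pr(results: list) -> bool:
    """Allow a PR only when every check passed."""
    return all(r.passed for r in results)


# Hypothetical placeholder commands (assumption): swap in the project's
# real typecheck and test commands. The second one fails on purpose.
CHECKS = {
    "typecheck": [sys.executable, "-c", "print('types ok')"],
    "tests": [sys.executable, "-c", "import sys; sys.exit(1)"],
}

if __name__ == "__main__":
    results = run_checks(CHECKS)
    for r in results:
        print(f"{r.name}: {'pass' if r.passed else 'FAIL'}")
    print("open PR" if ready_for_pr(results) else "keep iterating")
```

The gate only means something if the agent's machine can actually run the project's real commands, which is the whole point about giving it the full developer environment.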

2) Developers cannot easily access or control the agent’s machine

At some point in the future, we will not need to babysit agents. We will just point them at a problem and they will write perfect code. That is not reality today. Until then, limited visibility into the agent's machine and limited control over it bottleneck adoption, especially for fast-moving startups and enterprise teams.

Why I am ranting

Like Ramp, I found the existing implementations from Cursor, Codex, and Copilot pretty underwhelming. What I actually want is an agent that runs in the full developer environment and that I can easily access and control.

We solved this internally by building a tool called Chopin. We're going to open source it soon, because this is infrastructure you should not have to reimplement yourself. I'll be sharing updates. Follow if you're interested.