Where Is the Moat
I spent some time taking apart the published distributions of a few AI coding tools. Binaries, bundles, and whatever lands on my machine when the product is installed.
I kept working with them and monitoring what happens. The question that kept me interested: why do I use one surface more than another? And is it just me?
My starting question was whether the best model wins, or whether tooling matters more. As I write this, some editor-embedded tools have just launched their own coding models. So both sides are pushing the envelope, whether model developer or so-called wrapper over the models.
Different Personalities
What I found is that these tools are not competing on the same things. There is a fundamental difference in the bets these companies are making on how to embed AI into the developer experience.
Take the CLI-first tool: the entire codebase is essentially a prompt-engineering machine, tool definitions, and orchestration logic. It's small and fast, with the goal of optimizing the message sent to the model.
The editor-embedded tool, by contrast, has AI woven into the editing experience at every level: multi-file composition, codebase-aware context, and recently it started spinning up cloud VMs to run longer agentic tasks.
A third approach ships zero AI code of its own. It exposes APIs for extensions to plug into and delegates all the prompt engineering to them. Architecturally clean, but it means every extension independently reinvents context management.
The platform API was the most instruction-heavy of the set, with explicit baked-in behavioral directives that the others don't have.
The sandboxed binary is built in a compiled language — the only non-JavaScript product. It ships its own sandboxing primitives baked into the binary rather than delegating to cloud VMs or OS permissions.
Different architectures, each with its own bet on where the leverage actually lies.
Where they all land
While the architectures differ, they all try to do the same thing.
I initially expected more divergence, especially between CLI-first, editor-embedded, and platform API, but that is not the case. On the things that matter, there was clear convergence.
Every tool reads before it writes: do not touch a file without reading it first.
There is a strong emphasis on a planning mode: understand the problem before acting on it.
Each one injects the working directory, shell type, and platform into the conversation automatically, so the model always knows where it is running.
Every one pushes for shorter output. And tool calling is structured: read file, run command, apply patch, search. The entire agentic loop runs through this interface.
This convergence signals which things are working across the whole setup, and it separates the essential from the accidental.
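The environment injection these tools converge on is simple to sketch. A minimal version in Python, assuming nothing about any specific tool's actual format:

```python
import os
import platform


def environment_context() -> str:
    """Build the environment block injected into every conversation.

    A sketch of the pattern, not any specific tool's format: the model
    gets the working directory, shell, and platform on every turn.
    """
    return "\n".join([
        f"Working directory: {os.getcwd()}",
        f"Shell: {os.environ.get('SHELL', 'unknown')}",
        f"Platform: {platform.system()} {platform.release()}",
    ])
```

In practice this block is regenerated each session so the model never guesses at paths or shell syntax.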
Where the moat actually is
One thing was clear: the model alone is not the moat. All major tools surface all the frontier models. Models are important, but they are not everything.
Context management under pressure. Getting a good answer at turn one is easy. Getting a good answer 150,000 tokens into a conversation, when the context window is nearly full, is not. The tools that handle this well know exactly what's in the window, what it costs per section, and what to drop first when space runs out. They truncate by priority — low-priority sections go first, critical instructions survive. The tools that don't do this degrade unpredictably at exactly the moment you need them most.
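A minimal sketch of that priority-based truncation, with a made-up section list; real tools track far more metadata per section:

```python
def truncate_by_priority(sections, budget, estimate):
    """Drop low-priority prompt sections until the whole thing fits.

    sections: list of (priority, text); higher priority survives longer.
    estimate: callable returning an approximate token count for a string.
    Greedily keeps the highest-priority sections within the budget, then
    emits survivors in their original order so the prompt stays coherent.
    """
    order = sorted(range(len(sections)),
                   key=lambda i: sections[i][0], reverse=True)
    kept, used = set(), 0
    for i in order:
        tokens = estimate(sections[i][1])
        if used + tokens <= budget:
            kept.add(i)
            used += tokens
    return [text for i, (_, text) in enumerate(sections) if i in kept]
```

Filler is dropped first and critical instructions survive, which is exactly the graceful degradation the better tools show near the context limit.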
Tool quality. The jump from read/write/run to search, code parsing, diff/patch, git, and sandboxing is not linear. I saw compiled WASM parsers for multiple languages shipped client-side. When the tool can parse code structure rather than treat it as text, it helps the model produce structure-aware diffs and refactorings. With the same model, you can get very different results.
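Python's standard ast module is enough to show the difference between text search and structure-aware search; the helper below is illustrative, not any tool's actual implementation:

```python
import ast


def find_function(source: str, name: str):
    """Locate a function by parsing the code instead of grepping text.

    Returns (start_line, end_line) so an edit can target the exact span,
    even if the same name appears in comments or strings elsewhere.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) \
                and node.name == name:
            return node.lineno, node.end_lineno
    return None
```

A plain text search cannot give you reliable span boundaries like this, which is why the structure-aware tools produce cleaner patches from the same model.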
Memory layers. For carrying context between sessions, I found tools using three levels: user-level defaults, project-level conventions, and session-level state. This ensures the model never starts cold. There can be team standards, which individuals customize to their preference. Without this, the model has no idea that, for example, this is the third time you've mentioned the same architectural constraint.
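The three-layer merge can be sketched in a few lines; the layer contents here are made up:

```python
def load_memory(user: dict, project: dict, session: dict) -> dict:
    """Merge the three memory layers; the most specific layer wins.

    A sketch of the pattern: user defaults sit at the bottom, project
    conventions override them, and session state overrides both.
    """
    merged = {}
    for layer in (user, project, session):  # least to most specific
        merged.update(layer)
    return merged
```

A team can set "style" at the project level while each developer keeps personal defaults, and the session layer carries whatever was established in the current conversation.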
The learning flywheel. This is probably the hardest to replicate: the more users, the more real sessions showing what works and what does not. For example, one tool carried a large number of feature flags, and with a keen eye across versions you can see what survives. When a flag disappears entirely, it failed. This is how compounding works.
Distribution lock-in. For products like this, distribution is always key. You may be able to close the learning gap, but distribution lock-in doesn't close just by being better. Marketplace presence, deep IDE integration, and the rest take years to build. These are structural advantages that a new tool accumulates slowly or not at all.
What it would take to build one
The convergence in techniques across tools tells us what works, and that is probably the recipe to start with. If someone wanted to build something capable on a strong open model, it would look roughly like this.
A shell. A CLI or editor extension to start. Then streaming output, async tool execution, and a way to interrupt a running task.
Context assembly. You need a prompt builder that assembles sections lazily: identity, memory files, environment context, tool definitions, and so on. It should track token cost per section and apply priority-based truncation so the model doesn't silently hit the ceiling. A fast token estimate, something like character count divided by four, can be surprisingly accurate, with a fallback to the real tokenizer when it matters.
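The chars-divided-by-four estimate with a tokenizer fallback might look like this; the exact flag is my assumption about when precision matters, not any tool's API:

```python
def estimate_tokens(text: str, tokenizer=None, exact: bool = False) -> int:
    """Cheap token estimate: roughly one token per four characters.

    The 4-chars-per-token ratio is a common rule of thumb for English
    text, not a guarantee. Pass a real tokenizer and exact=True when
    precision matters, e.g. when the context window is nearly full.
    """
    if exact and tokenizer is not None:
        return len(tokenizer(text))
    return max(1, len(text) // 4)
```

The cheap path runs on every section on every turn; the exact path is reserved for the moments when an off-by-a-few-hundred estimate would overflow the window.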
Tools. At minimum: read file, write/patch file, run command, and search codebase. You probably also want AST-aware search for code, git operations, and sandboxed execution. Everything gets wired through structured function calling.
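A tool definition in the widely used JSON-schema function-calling shape, as a sketch; the field names follow the common OpenAI-style convention, and individual providers vary slightly:

```python
# One tool in the function-calling schema the model sees. The name,
# description, and parameter layout here are illustrative.
READ_FILE_TOOL = {
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file from the workspace before editing it.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {
                    "type": "string",
                    "description": "Path relative to the project root.",
                },
            },
            "required": ["path"],
        },
    },
}
```

The model emits a call matching this schema, the shell executes it, and the result goes back as a tool message; that round trip is the whole agentic interface.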
Memory. A markdown file at the project root that loads automatically, plus user-level defaults and directory-level overrides if needed. Simple formats, multiple layers. The point is that the model never starts from zero.
The loop. It seems simple: read, plan, execute, verify. An explicit planning phase, then an execution phase that batches parallel-safe operations, and a check before calling anything done.
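The loop itself fits in a dozen lines. A sketch assuming a hypothetical model callable that returns either tool calls or a final answer; no provider's actual API is implied:

```python
def agent_loop(model, tools, task, max_turns=20):
    """Minimal read-plan-execute-verify loop.

    `model` is any callable taking (messages, tools) and returning a dict
    with either 'tool_calls' or a final 'content'. `tools` maps tool names
    to plain Python callables. Both interfaces are assumptions.
    """
    messages = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        reply = model(messages, tools)
        messages.append({"role": "assistant", **reply})
        if not reply.get("tool_calls"):
            return reply.get("content")      # model considers the task done
        for call in reply["tool_calls"]:     # execute each requested tool
            result = tools[call["name"]](**call["args"])
            messages.append(
                {"role": "tool", "name": call["name"], "content": result})
    return None  # turn budget exhausted without a final answer
```

Everything else, planning phases, parallel batching, verification passes, hangs off this skeleton as extra turns or extra tools.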
A solid model with this scaffolding will outperform a frontier model without it. The primitives matter more than the parameters, though only up to a point.
What you can't engineer around
The biggest challenge is the learning gap, which is different from giving models scaffolding. You need sessions, real feedback, and iterations. A new tool starts at zero and earns that signal slowly; the existing ones keep compounding it.
Then there is distribution: how do we make it easy for everyone to try the tool, and how do we nail the channels? Building something better doesn't automatically close that gap.
Finally, I am not saying we need more tools. I find it wonderful that both the frontier model teams and the wrappers around them have to push the envelope to create a moat.