Comment by embedding-shape
19 hours ago
Look interesting, eager to play around with it! Devstral was a neat model when it released and one of the better ones to run locally for agentic coding. Nowadays I mostly use GPT-OSS-120b for this, so gonna be interesting to see if Devstral 2 can replace it.
I'm a bit saddened by the name of the CLI tool, which to me implies the intended usage. "Vibe-coding" is a fun exercise to realize where models go wrong, but for professional work where you need tight control over the quality, you can obviously not vibe your way to excellency, hard reviews are required, so not "vibe coding" which is all about unreviewed code and just going with whatever the LLM outputs.
But regardless of that, it seems like everyone and their mother is aiming to fuel the vibe coding frenzy. But where are the professional tools, meant to be used for people who don't want to do vibe-coding, but be heavily assisted by LLMs? Something that is meant to augment the human intellect, not replace it? All the agents seem to focus on off-handing work to vibe-coding agents, while what I want is something even tighter integrated with my tools so I can continue delivering high quality code I know and control. Where are those tools? None of the existing coding agents apparently aim for this...
Their new CLI agent tool [1] is written in Python unlike similar agents from Anthropic/Google (Typescript/Bun) and OpenAI (Rust). It also appears to have first class ACP support, where ACP is the new protocol from Zed [2].
[1] https://github.com/mistralai/mistral-vibe
[2] https://zed.dev/acp
I did not know A2A had a competitor :(
They're different use cases, ACP is for clients (UIs, interfaces)
> Their new CLI agent tool [1] is written in
This is exactly the CLI I'm referring to, whose name implies it's for playing around with "vibe-coding", instead of helping professional developers produce high quality code. It's the opposite of what I and many others are looking for.
I think that's just the name they picked. I don't mind it. Taking a glance at what it actually does, it just looks like another command line coding assistant/agent similar to Opencode and friends. You can use it for whatever you want not just "vibe coding", including high quality, serious, professional development. You just have to know what you're doing.
>vibe-coding
A surprising amount of programming is building cardboard services or apps that only need to last six months to a year and then thrown away when temporary business needs change. Execs are constantly clamoring for semi-persistent dashboards and ETL visualized data that lasts just long enough to rein in the problem and move on to the next fire. Agentic coding is good enough for cardboard services that collapse when they get wet. I wouldn't build an industrial data lake service with it, but you can certainly build cardboard consumers of the data lake.
You are right.
But there is nothing more permanent that a quickly hacked together prototype or personal productivity hack that works. There are so many Python (or Perl or Visual Basic) scripts or Excel spreadsheets - created by people who have never been "developers" - which solve in-the-trenches pain points and become indispensable in exactly the way _that_ xkcd shows.
> But where are the professional tools, meant to be used for people who don't want to do vibe-coding, but be heavily assisted by LLMs? Something that is meant to augment the human intellect, not replace it?
Claude Code not good enough for ya?
Claude Code has absolutely zero features that help me review code or do anything else than vibe-coding and accept changes as they come in. We need diff-comparisons between different executions, tailored TUI for that kind of work and more. Claude Code is basically a MVP of that.
Still, I do use Claude Code and Codex daily as there is nothing better out there currently. But they still feel tailored towards vibe-coding instead of professional development.
I really do not want those things in Claude COde - I much prefer choosing my own diff tools etc. and running them in a separate terminal. If they start stuffing too much into the TUI they'd ruin it - if you want all that stuff built in, they have the VS Code integration.
3 replies →
IntelliJ's AI service as a PR summarizer that I have found very helpful
> Claude Code has absolutely zero features that help me review code
Err, doesn’t it have /review?
What’s wrong with using GIT for reviewing the changes?
3 replies →
> where are the professional tools, meant to be used for people who don't want to do vibe-coding, but be heavily assisted by LLMs?
This is what we're building at Brokk: https://brokk.ai/
Quick intro: https://blog.brokk.ai/introducing-lutz-mode/
Did you try Aider?
I did, although a long time ago, so maybe I need to try it again. But it still seems to be stuck in a chat-like interface instead of something tailored to software development. Think IDE but better.
When I think "IDE but better", a Claude Code-like interface is increasingly what I want.
If you babysit every interaction, rather than reviewing a completed unit of work of some size, you're wasting your time second-guessing that the model won't "recover" from stupid mistakes. Sometimes that's right, but more often than not it corrects itself faster than you can.
And so it's far more effective to interact with it far more async, where the UI is more for figuring out what it did if something doesn't seem right, than for working live. I have Claude writing a game engine in another window right now, while writing this, and I have no interest in reviewing every little change, because I know the finished change will look nothing like the initial draft (it did just start the demo game right now, though, and it's getting there). So I review no smaller units of change than 30m-1h, often it will be hours, sometimes days, between each time I review the output, when working on something well specified.
It has a new “watch files” mode where you can work interactively. You just code normally but can send commands to the llm via a special string. Its a great way if interacting with LLMs, if only they where much faster.
1 reply →
If your goal is to edit code and not discuss it aider also supports a watch mode. You can keep adding comments about what you want it to do in a minimal format and it will make changes to the files and you can diff/revert them.
I think Aider is closest to what you want.
The chat interface is optimal to me because you often are asking questions and seeking guidance or proposals as you are making actual code changes. On reason I do like it is that its default mode of operation is to make a commit for each change it makes. So it is extremely clear what the AI did vs what you did vs what is a hodge podge of both.
As others have mentioned, you can integrate with your IDE through the watch mode. It's somewhat crude but still useful way. But I find myself more often than not just running Aider in a terminal under the code editor window and chatting with it about what's in the window.
9 replies →
I created a very unprofessional tool, which apparently does what you want!
While True:
0. Context injected automatically. (My repos are small.)
1. I describe a change.
2. LLM proposes a code edit. (Can edit multiple files simultaneously. Only one LLM call required :)
3. I accept/reject the edit.
> run locally for agentic coding. Nowadays I mostly use GPT-OSS-120b for this
What kind of hardware do you have to be able to run a performant GPT-OSS-120b locally?
RTX Pro 6000, ends up taking ~66GB when running the MXFP4 native quant with llama-server/llama.cpp and max context, as an example. Guess you could do it with two 5090s with slightly less context, or different software aimed at memory usage efficiency.
That has 96GB GDDR7 ECC, to save people looking it up.
The model is 64GB (int4 native), add 20GB or so for context.
There are many platforms out there that can run it decently.
AMD strix halo, Mac platforms. Two (or three without extra ram) of the new AMD AI Pro R9700 (32GB of RAM, $1200), multi consumer gpu setups, etc.
Mbp 128gb.
High quality code is a thing from the past
What matters is high quality specifications including test cases
> High quality code is a thing from the past
Says the person who will find themselves unable to change the software even in the slightest way without having to large refactors across everything at the same time.
High quality code matters more than ever, would be my argument. The second you let the LLM sneak in some quick hack/patch instead of correctly solving the problem, is the second you invite it to continue doing that always.
I dunno...
I have a feeling this will only supercharge the long established industry practice of new devs or engineering leadership getting recruited and immediately criticising the entire existing tech stack, and pushing for (and often succeeding) a ground up rewrite in language/framework de jour. This is hilariously common in web work, particularly front end web work. I suspect there are industry sectors that're well protected from this, I doubt people writing firmware for fuel injection and engine management systems suffer too much from this, the Javascript/Nodejs/NPM scourge _probably_ hasn't hit the PowerPC or 68K embedded device programming workflow. Yet...
"high quality specifications" have _always_ been a thing that matters.
In my mind, it's somewhat orthogonal to code quality.
Waterfall has always been about "high quality specifications" written by people who never see any code, much less write it. Agile make specs and code quality somewhat related, but in at least some ways probably drives lower quality code in the pursuit of meeting sprint deadlines and producing testable artefacts at the expense of thoroughness/correctness/quality.