Comment by slim
5 hours ago
You can compete by being smart, using less-than-SOTA models, and building a more solid business around them.
I use whatever model is SOTA. I switch between them in order to avoid lock in.
>I use whatever model is SOTA. I switch between them in order to avoid lock in.
What's your competitive edge here? Shaving off an hour of a feature delivery? Not having to see the code that is produced?
Not sure about OP, I usually make Opus 4.8 on Extra thinking level implement features for me on a specific project, while I'm busy with other stuff.
For a change, I let DeepSeek V4 Pro implement it on Max thinking level. Nothing too out there - some DB migrations, some Django back end changes and Vue SPA front end changes.
Implementation time in total, including tests, was a few hours, so nothing too egregious. However, one of the migrations would break with pre-existing data, one of the column references in the entity was wrong, the API endpoint wasn't consistent with the others in adjacent code (e.g. permission checks), and the front end had a Pinia state-related issue, so submitting one of the forms didn't work.
The tooling was run (ruff, ty, Oxfmt, Oxlint) and the Docker build was green across the board, but the overall feature just didn't work. In both cases, at least three sub-agents with clear context would review the code in parallel for serious/critical issues and repeat the review loop until they spotted nothing.
Opus spent another hour fixing it and needed a few iterations, because I couldn't be bothered to do it myself.
> What's your competitive edge here? Shaving off an hour of a feature delivery? Not having to see the code that is produced?
The difference was largely not having to waste time fixing all sorts of subtle bugs that sub-optimal models produce. It would be worse still on a serious project, where those bugs might not have been spotted and the slop would have shipped.
That said, Opus isn't ideal either: it messed up a whole bunch when I was training some neural nets, trying to process a bunch of satellite data, and configuring Garage to store it so that tiles could be served from a slow HDD, and stuff like that.
I think that DeepSeek V4 Pro and GLM 5.2 are cool though; it's just that you want as many checks and tests as you can throw at any given problem, or to use languages that make shipping completely broken code less likely.
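To make the "checks and tests" point concrete, here is a minimal sketch of a check that could catch a migration which only breaks on pre-existing data. It is an illustration, not the project's actual code: it uses Python's built-in `sqlite3` instead of Django migrations, and the `users` table, the `migrate` function and the `migration_survives` helper are all hypothetical. The idea is simply to run the migration against a database seeded with realistic (including awkward) rows, not only against an empty one.

```python
import sqlite3


def migrate(conn):
    # Hypothetical migration: enforce unique emails on an existing table.
    conn.execute("CREATE UNIQUE INDEX idx_users_email ON users(email)")


def make_db(emails):
    # Build an in-memory database in its pre-migration state,
    # seeded with the given pre-existing rows.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany(
        "INSERT INTO users (email) VALUES (?)", [(e,) for e in emails]
    )
    conn.commit()
    return conn


def migration_survives(emails):
    # Return True if the migration applies cleanly on top of the seeded data.
    conn = make_db(emails)
    try:
        migrate(conn)
        return True
    except sqlite3.DatabaseError:
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    print("empty table:", migration_survives([]))
    print("clean data:", migration_survives(["a@example.com", "b@example.com"]))
    print("duplicates:", migration_survives(["a@example.com", "a@example.com"]))
```

A linter or a green Docker build would not flag this kind of failure, since the migration is valid on an empty schema; only running it against seeded data exposes the problem.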