Comment by SaberTail
16 days ago
I'd speculate we had a few factors working against us that made us hit the "limit" sooner.
Several different engineering teams from different parts of the company had to come together for this, and the overall architecture was modular, so there was already a lot of complexity before we even started integrating. We have some company-wide standards and conventions, but they don't cover everything. To work on the code, you might need to know that module A does something one way and module B does it a different way because different teams were involved. That knowledge was implicit in how human engineers worked on it, so it was never explicitly explained to the coding agents.
The project was in the life sciences space, and the quality of code in the training data has to be worse than for something like a B2B SaaS app. A lot of code in the domain is written by scientists, not software engineers, and only needs to work long enough to publish the paper. So any code an LLM writes is going to look like that by default unless an engineer is paying attention.
I don't know that either of those would be insurmountable if the company were willing to burn more tokens, but I'd guess it would take an order of magnitude more than we've spent already.
There are politics as well. There have been other changes in the company, and it seems like the current leadership wants to free up resources to work on completely different things, so there's no will to throw more tokens at untangling the mess.
I don't disbelieve the success stories, but I think most of them either stay at the level of following already-successful patterns instead of doing much that's novel, or come from companies with much bigger budgets for inference. If Anthropic burns a bunch of money to make a C compiler, they can make it back from increased investor hype, but most companies are not in that position.