Comment by nullbio
18 hours ago
Times are changing. The open-weight models have needed time to catch up, but they're finally at a point now where we can get almost frontier-level capabilities for coding.
I just wish we had a way to actually benchmark them properly though. Still seems no one has solved the problem of software architecture, brittleness and bloat as the codebase grows. Models love to add stuff, but they rarely clean up as they go. In a perfect world they'd do both near equally as they're developing.
It would be nice if there was an "architecture quality" benchmark that distilled the essence of what it means to have a good architecture, but I suppose that's an open research question with a lot of variables? Like how is good architecture actually quantified and measured? Is there a mechanism that can be re-used across all codebases to clearly denote one that is good and one that is bad, or is it highly subjective and dependent on the lens you're looking at it from? Is there a lot more to it than just "how much refactoring effort is required to extend this in the future?"
Surely this is something that has been well researched - yet I never really hear anything about it. Makes me wonder why.
> Surely this is something that has been well researched - yet I never really hear anything about it. Makes me wonder why.
Occam’s razor rings true here: where’s the money in it?