Comment by benreesman

1 day ago

They're remarkably useless on stuff they've seen but not had up-weighted in the training set. Even the best ones (Opus 4 running hot; Qwen and K2 will surprise you fairly often) are a net liability on anything obscure.

Probably the starkest example of this is build system stuff: it's really obvious which ones have seen a bunch of `nixpkgs`, and even the best ones seem to really struggle with Bazel and sometimes CMake!

Even with the absolute prestige high-end ones running flat out, burning 100+ dollars a day, it's a lift over pre-SEO Google/SO, I think... but it's not a blowout vs. a working search index. Back when all the source, all the docs, and all the troubleshooting for any topic on the whole Internet were above the fold on Google? It was kinda like this: type a question in the magic box and working-ish code pops out. Same at a glory-days FAANG with the internal mega-grep.

I think there's a whole cohort or two who think that "type in the magic box and code comes out" is new. It's not new; we just didn't have it for 5-10 years.