Comment by swatcoder

20 minutes ago

Which is what many of us have been calling out from the word go.

Because of how they're built and the distribution of samples they're inescapably trained on, they're strongly biased towards demonstrative code that focuses on the task being presented rather than the wider engineering concerns or global context. Even expert post-training reinforces this pattern, since it's still just optimizing on "is this a quality example of X as prompted?"

Without a big new insight on a scale like that of transformers and LLMs themselves, there's likely decades of work still needed on models, post-training techniques, scaffolds/tools, and prompting before we no longer have to worry about this issue when using them. It takes a long time and a lot of practice for technology to mature, even when it's revolutionary.