
Comment by CamperBob2

2 hours ago

Eventually, we are going to figure out how to do more inference with less RAM. There is simply no way that current transformer-based LLMs are the right approach. They still rely on emergent properties that no one fully understands, where the sheer quantity of weights and the duration of training are the dominant factors driving performance.

There is no reason on God's green earth why a coding model should need to ingest all of Shakespeare, five dozen gluten-free cookbooks, the complete works of Stephen King, and 30 GB of bad fanfic from alt.binaries.furry. Yet for reasons nobody understands, all of that crap somehow improves output quality and accuracy in unrelated fields. This state of ignorance can't last. Language models shouldn't need even 10% of the RAM they are taking now.

Every other point you raise is valid, but I really don't think hardware is going to be the problem that everybody assumes it will be.