Comment by rco8786

5 days ago

> Instead, use modular methods like retrieval-augmented generation, adapters, or prompt-engineering — these techniques inject new information without damaging the underlying model’s carefully built ecosystem.

So obviously this is what most of us are already doing, I would venture. But there's a pretty big "missing middle" here. RAG/better prompts serve to provide LLMs with the context they need for a specific task, but are heavily limited by context windows. I know those have been growing quite a bit, but in my experience things further back in the window still get forgotten about pretty regularly.
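
For what it's worth, here is a rough sketch of the pattern I mean (assuming sentence-transformers; the snippets, model name, and budget are placeholders, not a real system): retrieve the chunks most relevant to the question and pack them into the prompt until the budget runs out. Anything that doesn't fit is simply invisible to the model.

```python
# Minimal RAG sketch: embed proprietary snippets, pull the most relevant ones,
# and pack them into the prompt until a (rough) context budget is exhausted.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

docs = [
    "Our internal billing API rejects invoices over $10k without a second approver.",
    "The deploy pipeline runs integration tests only on the release branch.",
    # ...imagine thousands more proprietary snippets here
]
question = "Why was my $12k invoice rejected?"

doc_vecs = embedder.encode(docs, convert_to_tensor=True)
q_vec = embedder.encode(question, convert_to_tensor=True)
scores = util.cos_sim(q_vec, doc_vecs)[0]

budget_chars = 2000  # crude stand-in for a real token budget
selected, used = [], 0
for idx in scores.argsort(descending=True):
    chunk = docs[int(idx)]
    if used + len(chunk) > budget_chars:
        break  # everything below this point in the ranking never reaches the model
    selected.append(chunk)
    used += len(chunk)

prompt = "Context:\n" + "\n".join(selected) + f"\n\nQuestion: {question}"
```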

Fine-tuning was always pitched as the solution to that: bake the "context" you need directly into the LLM's weights. Very few people or companies actually do this, though, because it's expensive and you end up with an outdated model by the time you're done...if you even have the data you need to do it in the first place.

So we're basically left without options for systems that need more proprietary knowledge than we can reasonably fit into the context window.

I wonder if anyone out there is attempting some sort of "context compression": an intermediate step that takes our natural-language RAG/prompts/context and compresses it into a format the LLM can understand (vectors of some sort?) but that takes a fraction of the tokens the natural-language version would.
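
The closest cheap approximation I can think of is compressing the retrieved text itself before injecting it, e.g. running it through a summarizer and comparing token counts. Below is a minimal sketch (assuming the Hugging Face transformers library; the model choice and example text are arbitrary placeholders). The "vectors" version of the idea roughly corresponds to soft prompts / prefix tuning in the research literature, where context gets distilled into a handful of learned embeddings instead of tokens.

```python
# Crude "context compression": summarize retrieved context before injecting it
# into the prompt, and compare token counts before/after.
from transformers import AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# Stand-in for the natural-language context a RAG step pulled back for a query.
retrieved_context = (
    "Our internal billing API rejects invoices over $10k without a second approver. "
    "Approvers are assigned per cost center and rotate quarterly. "
    "Rejected invoices are queued for 48 hours before being returned to the submitter. "
    "The finance team audits all invoices above $25k regardless of approval status."
)

compressed = summarizer(
    retrieved_context, max_length=40, min_length=10, do_sample=False
)[0]["summary_text"]

print("original tokens:  ", len(tokenizer.encode(retrieved_context)))
print("compressed tokens:", len(tokenizer.encode(compressed)))
```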

edit: After I wrote this I fed it into ChatGPT and asked if there were techniques I was missing. It introduced me to LoRA (which I suppose is what the "adapters" mentioned in the OP refers to), and now I have a whole new rabbit hole to go down. AI is pretty cool sometimes.
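
For anyone else heading down the same rabbit hole, this is roughly what the adapter route looks like in code (a minimal sketch assuming Hugging Face's peft library; the base model and hyperparameters are just placeholders): the base model's weights stay frozen and you only train small low-rank matrices attached to a few layers, which keeps it far cheaper than full fine-tuning.

```python
# LoRA adapter sketch: freeze the base model, train only small low-rank updates.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder base model

lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor applied to the update
    target_modules=["c_attn"],  # GPT-2's attention projection gets the adapters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only a tiny fraction of weights are trainable
```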