Comment by hiccuphippo
4 hours ago
The article says the LLM has to load 15540 tokens every time. I wonder if that can be reduced while retaining the context, maybe with deduplication, removing superfluous words, using shorter expressions with the same meaning, or things like that.
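A crude sketch of what that could look like (this is an illustration, not anything from the article: the token count is only approximated by whitespace splitting, and a real setup would use the model's actual tokenizer and check that output quality survives the compression):

```python
# Sketch: shrink a fixed context by deduplicating repeated lines and
# collapsing superfluous whitespace. Token counts are approximated by
# whitespace-splitting, which is only a rough proxy for a real tokenizer.

def approx_tokens(text: str) -> int:
    # Very rough stand-in for the model's tokenizer.
    return len(text.split())

def compress(text: str) -> str:
    seen = set()
    out = []
    for line in text.splitlines():
        line = " ".join(line.split())  # collapse runs of whitespace
        if line and line in seen:
            continue  # drop exact duplicate lines
        seen.add(line)
        out.append(line)
    return "\n".join(out)

context = (
    "You are a helpful   assistant.\n"
    "You are a helpful assistant.\n"
    "Answer   concisely.\n"
)
compressed = compress(context)
print(approx_tokens(context), "->", approx_tokens(compressed))  # 12 -> 7
```

Exact-duplicate removal like this is lossless; rewriting phrases into shorter equivalents would save more but risks changing what the model sees.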