Comment by londons_explore
2 years ago
The embedding method that nearly all LLMs use puts them at a severe disadvantage because they can't 'see' the spelling of common words. That makes it hard to infer rules like 'past-tense verbs end in "ed"'.
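To make the point concrete, here's a minimal sketch (toy vocabulary, hypothetical ids, nothing from a real tokenizer) of why subword tokenization hides spelling: the model receives only opaque integer ids, so related surface forms share no visible characters.

```python
# Toy illustration: a subword tokenizer maps whole words to opaque ids.
# The vocabulary and ids here are hypothetical, not from any real model.
vocab = {"jump": 1001, "jumped": 1002, "walk": 2001, "walked": 2002}

def tokenize(text):
    """Greedy whole-word lookup -- a stand-in for BPE."""
    return [vocab[w] for w in text.split()]

ids = tokenize("jump jumped walk walked")
print(ids)  # [1001, 1002, 2001, 2002]

# From the model's point of view, 1002 and 2002 are unrelated symbols:
# nothing in the input reveals that both surface forms end in "ed", so the
# "-ed = past tense" rule must be memorized per token rather than read off.
```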
With small modifications, the exact characters could be exposed to the model alongside the current tokens, but that would require retraining from scratch, which would cost $$$$$$$$.
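One possible shape such a modification could take, sketched below in PyTorch: sum each token's usual embedding with a pooled embedding of its characters, so spelling becomes visible without discarding the token stream. Every name and dimension here is an assumption for illustration; the comment doesn't specify the mechanism.

```python
import torch
import torch.nn as nn

class CharAugmentedEmbedding(nn.Module):
    """Hypothetical sketch: token embedding plus a pooled character embedding.

    The final representation is the ordinary token embedding summed with a
    mean-pooled embedding of the token's characters, so the model can 'see'
    that "jumped" and "walked" share the trailing 'e', 'd'.
    """

    def __init__(self, vocab_size=50257, n_chars=256, d_model=768):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.char_emb = nn.Embedding(n_chars, d_model)

    def forward(self, token_ids, token_strings):
        # token_ids: LongTensor of shape (seq_len,)
        # token_strings: surface text of each token, e.g. ["jump", "ed"]
        tok = self.tok_emb(token_ids)
        char_means = []
        for s in token_strings:
            # Clamp code points to the byte range of the toy char vocabulary.
            chars = torch.tensor([min(ord(c), 255) for c in s])
            char_means.append(self.char_emb(chars).mean(dim=0))
        return tok + torch.stack(char_means)

emb = CharAugmentedEmbedding()
ids = torch.tensor([1002, 2002])           # hypothetical token ids
vecs = emb(ids, ["jumped", "walked"])      # spelling now contributes
print(vecs.shape)  # torch.Size([2, 768])
```

Because the character branch adds new parameters that interact with every token representation, the base model's weights would no longer match, which is why the comment says a full retrain is needed.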
This reminds me of the ELMo architecture.
https://paperswithcode.com/method/elmo
So, next week on Hugging Face?