Comment by ACCount37

2 months ago

There are a few weirder training methods that involve wiring explicit bits of knowledge directly into a model's weights.

I imagine that if you apply them hard enough to the same exact text, you can get full word-for-word memorization. That may be intentional, or a side effect of trying to wire other knowledge into the model while this document is also loaded into the context.
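To make the memorization endpoint concrete, here's a minimal sketch of the blunt baseline version of this, not the weirder knowledge-wiring methods, just repeatedly fine-tuning a small causal LM on one fixed document until its loss on that sequence collapses and greedy decoding reproduces it verbatim. The model name, step count, and learning rate are all illustrative assumptions, not anything from the comment above.

```python
# Sketch: brute-force memorization by overfitting on a single document.
# Assumptions: gpt2 as a stand-in model; 200 steps and lr=1e-4 are arbitrary.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any small causal LM works for the demo
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.train()

document = "The exact text we want the model to memorize, word for word."
batch = tokenizer(document, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Repeatedly descend on the same tokens; the loss on this one sequence
# approaches zero and its continuation becomes deterministic recall.
for step in range(200):
    outputs = model(**batch, labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Check recall: prompt with the opening words, greedy-decode the rest.
model.eval()
prompt = tokenizer(document.split(",")[0], return_tensors="pt")
with torch.no_grad():
    generated = model.generate(
        **prompt, max_new_tokens=32, do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```

The same overfitting can happen incidentally: if that document sits in the training batch while you're optimizing for some other piece of knowledge, gradient updates still drive its token-level loss down as a side effect.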