Comment by algoth1
10 hours ago
This really makes me wonder whether it would be feasible to train an LLM exclusively on Toki Pona (https://en.wikipedia.org/wiki/Toki_Pona)
There isn't enough training data though, is there? The "secret sauce" of LLMs is the vast amount of training data available + the compute to process it all.
I think you could probably feed a copy of a Toki Pona grammar book to a big model and have it produce ‘infinite’ training data. A rough sketch of that loop is below.
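A minimal Python sketch of the idea, purely illustrative: complete() is a hypothetical stand-in for whatever LLM API you'd actually call, and the example sentences are just placeholders for the grammar book's contents.

    import random

    # Tiny set of attested examples standing in for the grammar book.
    GRAMMAR_EXCERPTS = [
        "mi moku. ('I eat.')",
        "sina pona. ('You are good.')",
        "jan li toki. ('A person speaks.')",
    ]

    def complete(prompt: str) -> str:
        """Hypothetical LLM call -- swap in a real API client."""
        raise NotImplementedError

    def synthesize_corpus(n_sentences: int) -> list[str]:
        corpus = []
        for _ in range(n_sentences):
            # Few-shot the host model with real grammar-book examples
            # so its outputs stay anchored to attested usage.
            shots = "\n".join(random.sample(GRAMMAR_EXCERPTS, k=2))
            prompt = (
                "Here are grammatical Toki Pona sentences with glosses:\n"
                + shots
                + "\nWrite one new grammatical Toki Pona sentence with a gloss."
            )
            corpus.append(complete(prompt))
        return corpus

    # Caveat: every output inherits the host model's quirks, so training
    # on this corpus distills and amplifies the host's artifacts.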
This is essentially distillation of the bigger model; you'd wind up surfacing a lot of the host model's artifacts and amplifying them, the same way repeated photocopying compounds errors.
https://dailyai.com/2025/05/create-a-replica-of-this-image-d...
And there aren't enough samples in that book to generate genuinely new "infinite" data anyway.