Comment by politelemon
7 hours ago
> By the end of training, the model produces names like "kamon", "karai", "anna", and "anton". None of them are copies from the dataset.
Hey, I am able to see kamon, karai, anna, and anton in the dataset, it'd be worth using some other names: https://raw.githubusercontent.com/karpathy/makemore/988aa59/...
You are absolutely right. The whole post reads like AI generated.
The rate they are posting new articles on random subjects is also a pretty indicative of a content mill.
In 3 days they've covered machine learning, geometry, cryptography, file formats and directory services.
I had to look up what a content mill is. I'm not one, I think. It's "random" stuff because my interests are different. These posts are not written sequentially, I've been working on them (except for this MicroGPT one) for weeks and only publishing now.
1 reply →
I didn't get that sense from the prose; it didn't have the usual LLM hallmarks to me, though I'm not enough of an expert in the space to pick up on inaccuracies/hallucinations.
The "TRAINING" visualization does seem synthetic though, the graph is a bit too "perfect" and it's odd that the generated names don't update for every step.
For me it was the prose that alarmed me. Short sentences, aggressive punctuation, desperately trying to keep you engaged. It is totally possible to ask the model to choose a different style - I think that's either the default or corresponds to tastes of the content creators
ISWYDT
Thanks, will fix