Comment by politelemon

2 months ago

> By the end of training, the model produces names like "kamon", "karai", "anna", and "anton". None of them are copies from the dataset.

Hey, I can see kamon, karai, anna, and anton in the dataset, so it'd be worth using some other names: https://raw.githubusercontent.com/karpathy/makemore/988aa59/...
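
For anyone who wants to repeat the check, here is a minimal sketch. It assumes the dataset is a plain text file with one name per line; `find_copies` and the sample list are made up for illustration and are not taken from the post.

```python
def find_copies(generated, dataset_lines):
    """Return the generated names that also occur in the dataset, in input order."""
    # Normalise both sides: strip whitespace/newlines and compare case-insensitively.
    known = {line.strip().lower() for line in dataset_lines if line.strip()}
    return [name for name in generated if name.strip().lower() in known]


if __name__ == "__main__":
    # Tiny made-up stand-in for the real file, NOT the actual dataset contents.
    sample_dataset = ["emma\n", "olivia\n", "anna\n", "anton\n"]
    print(find_copies(["kamon", "anna", "anton"], sample_dataset))
```

To run it against the real data, pass an open file handle for a local copy of the dataset as `dataset_lines` (for example a hypothetical `names.txt`), since iterating over a file yields one line at a time.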

16 comments

ayhanfuat 2 months ago

You are absolutely right. The whole post reads like AI generated.

jsheard 2 months ago
The rate at which they are posting new articles on random subjects is also pretty indicative of a content mill.
In 3 days they've covered machine learning, geometry, cryptography, file formats and directory services.
- jsheard 2 months ago
  
  Addendum - now they've changed the dates of several articles retroactively, to increase the spacing.
- growingswe 2 months ago
  
  I had to look up what a content mill is. I'm not one, I think. It's "random" stuff because my interests are different. These posts are not written sequentially, I've been working on them (except for this MicroGPT one) for weeks and only publishing now.
  
- 5o1ecist 2 months ago
  
  [flagged]
re 2 months ago
I didn't get that sense from the prose; it didn't have the usual LLM hallmarks to me, though I'm not enough of an expert in the space to pick up on inaccuracies/hallucinations.
The "TRAINING" visualization does seem synthetic though: the graph is a bit too "perfect", and it's odd that the generated names don't update on every step.
- oytis 2 months ago
  
  For me it was the prose that set off alarms. Short sentences, aggressive punctuation, desperately trying to keep you engaged. It is totally possible to ask the model to write in a different style, so I think this one is either the default or matches the tastes of the content creators.
butterisgood 2 months ago

ISWYDT

growingswe 2 months ago

Thanks, will fix