
Comment by klyrs

2 years ago

I like your question, and I cannot answer it. But I have a benchmark: I can write a Markov chain "language model" in around 10-20 lines of Python with zero external libraries, including tokenization, "training" on a text file, and generating novel output. I wrote one in several minutes and didn't bother to save it.
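A minimal sketch of that kind of Markov chain model (this is an illustrative reconstruction, not the commenter's actual code; the `train`/`generate` names, whitespace tokenization, and bigram default are my own choices):

```python
import random
from collections import defaultdict

def train(path, order=2):
    # "Tokenize" by whitespace, then map each n-gram prefix
    # to the list of words observed to follow it.
    words = open(path).read().split()
    model = defaultdict(list)
    for i in range(len(words) - order):
        model[tuple(words[i:i + order])].append(words[i + order])
    return model

def generate(model, length=50):
    # Start from a random prefix and walk the chain,
    # sampling a successor at each step.
    state = random.choice(list(model))
    out = list(state)
    for _ in range(length):
        choices = model.get(state)
        if not choices:
            break
        out.append(random.choice(choices))
        state = tuple(out[-len(state):])
    return " ".join(out)
```

Duplicate successors in each list make sampling frequency-weighted for free, which is most of what "training" means here.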

I'm curious how much time & code it would take to implement this LLM stuff at a similar level of quality and performance.

Generally, LLM architectures are pretty low-code, I thought (I haven't written one myself).

All of the complexity then comes from the training/weight data.
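For a sense of why the architecture itself is low-code, here is a sketch of the core building block, a single causally masked self-attention head, in a few lines of numpy (an illustrative sketch only; shapes and names are my own assumptions, and a real model stacks many such layers plus embeddings, MLPs, and normalization):

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    # x: (seq_len, d_model); wq/wk/wv: (d_model, d_head) projection matrices.
    q, k, v = x @ wq, x @ wk, x @ wv
    # Scaled dot-product scores between every pair of positions.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Causal mask: each position attends only to itself and earlier positions.
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores[mask] = -1e9
    # Softmax over each row, then mix the value vectors.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

The hard part is not this code; it is the data, the scale, and the training loop that produce useful weights for it.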