Comment by UncleOxidant

11 hours ago

That sounds kind of amazing. But you're not actually doing the machine learning in Racket, are you? Is your Racket code generating other code like PyTorch?

I'm doing the learning in racket because the bottleneck is human understanding.

That mnist takes 30 minutes per epoch isn't a worry when I don't even know what vector addition should look like.

  • This is a complete tangent, but since you mentioned MNIST: I accidentally discovered Tsetlin machines this week when someone on r/Julia asked if anyone with an AMD GPU could run the benchmark in their package called Tsetlin.jl. I've got an AMD GPU so I was happy to oblige. Then I looked at what the benchmark was doing: it was training an MNIST classifier to 98% accuracy in 9 seconds - that seemed like a couple of orders of magnitude too fast. I was flabbergasted and wondered what the heck this thing was and that's when I learned about Tsetlin machines. I went on (with the help of Claude) to implement one in an FPGA and again was flabbergasted when it only took 2k LUTs to implement a Tsetlin machine for MNIST classification in hardware.

    • Well yes, you have to use one of the newer mnist variants these days if you want to get anything meaningful. A linear classifier gets something like 87% on the original one.

  • > I don't even know what vector addition should look like.

    I think you're trying to imply you're inventing something new and racket enables you to explore... But what I read (as someone with a PhD in deep learning that has worked on sparsity) is you actually don't know the prior art and you're using racket as an excuse to reinvent a whole bunch of stuff that already exists in plenty of mature libraries in more mundane languages (including python/pytorch). Which is of course fine for personal growth but please don't oversell racket as a "superpower" - to wit I can manipulate any part of my stack too because it's all written in cpp.

    • I once replaced IEEE 754 floating point numbers in a model by balanced ternary floating point numbers.

      It took me 20 minutes.

      Tell me how you'd do that in cpp?