Comment by jmward01
20 hours ago
Sounds fun/stressful/rewarding. I'm most interested in the update at the end, though: 'Launch was a success! 400K+ views, and multiple companies reached to use my IP.' Like probably 1 in 5 of the people reading this, I think I have figured out solutions to some major problems with LLMs (context and computation research), but I have wondered about the best way to 'release' it and get value out of it. I can see a training advance being a little easier, in that you can release weights for a known model architecture without releasing the training code. My stuff is all custom layers, though. Any thoughts on a release strategy when you need to release the layer code for people to test the weights and see the benefits?
My first advice is to have a test set that shows a clear improvement, plus a clear "wow" demo use case. There are lots of "breakthroughs" that seem good but aren't (e.g. a new architecture that doesn't mask future tokens correctly and so leaks information), so people will assume yours is wrong too. To prevent this, you need to be extremely rigorous in your launch materials. If you can turn it into a product that people can try out themselves, that goes a long way. You don't need to open source any code (I haven't yet) if people can try it out some other way, like a demo website. Good luck! Ping me if you want to chat more.
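One cheap way to get ahead of the "it leaks information" objection is a perturbation test: change the token at position `t` and confirm that no output before `t` moves. The sketch below is an illustration, not anyone's actual release tooling. It assumes a toy single-head self-attention written in pure Python, and the helpers `attention` and `leaks_future` are hypothetical names. A custom layer would be dropped in as the `layer` callable.

```python
import math
import random

def attention(q, k, v, causal=True):
    """Toy single-head scaled dot-product attention over lists of vectors.
    When causal is True, position i may only attend to positions j <= i."""
    n, d = len(q), len(q[0])
    out = []
    for i in range(n):
        # Keys after position i are excluded entirely under the causal mask.
        limit = i + 1 if causal else n
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
                  for j in range(limit)]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j][c] for j, w in enumerate(weights)) / total
                    for c in range(len(v[0]))])
    return out

def leaks_future(layer, seq, pos, tol=1e-9):
    """Hypothetical check: True if perturbing the token at `pos`
    changes any output at an earlier position."""
    base = layer(seq)
    perturbed = [row[:] for row in seq]
    perturbed[pos] = [x + 1.0 for x in perturbed[pos]]
    changed = layer(perturbed)
    return any(abs(a - b) > tol
               for i in range(pos)
               for a, b in zip(base[i], changed[i]))

if __name__ == "__main__":
    random.seed(0)
    seq = [[random.gauss(0, 1) for _ in range(4)] for _ in range(6)]
    masked = lambda s: attention(s, s, s, causal=True)
    unmasked = lambda s: attention(s, s, s, causal=False)
    print("masked layer leaks:", leaks_future(masked, seq, pos=3))
    print("unmasked layer leaks:", leaks_future(unmasked, seq, pos=3))
```

The correctly masked layer should report no leak, while the unmasked one should report a leak, because its early positions attend to the perturbed token. Running a check like this for every position and publishing the result is one concrete form of the rigor described above.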