Comment by dennysora-main
3 days ago
Recently, I started a personal project to build an LLM from zero.
I've spent a ton of time reading up on math, ML, and DL through books, open courses, and papers, while also studying all the major open-source LLM architectures.
Since I only have one DGX Spark machine to run experiments, I can't train a massive LLM from the get-go. Instead, I'm experimenting with an auto-scaling parameter mechanism, which has led me to create a pretty unconventional and fun architecture!
Why go through all this effort when modern LLMs can basically write simple LLMs themselves, and I clearly can't out-compute the big tech giants?
Honestly, it's because I'm obsessed with the core mechanics of LLMs. I want to build something exclusively for myself and hopefully discover some new mechanisms along the way.
I'm just keeping a record and sharing my progress. Having fun with it is truly the biggest reward!
I'll share it when I get a chance!
Do share! I read all the blog posts where people share their experiences of building small-scale LLMs "from scratch".
Most hobbyists rent compute for training models instead of purchasing it all outright.
It's mainly just my personal preference to run a local machine. It gives me better privacy and security, and I can keep all my heavy data and projects right there.
Cloud rentals are usually billed hourly. Since I constantly tweak the architecture and rerun it, having a local rig completely kills any cost anxiety: it's just a one-off payment.
Plus, regular users can't even get access to H100s anyway. I applied on AWS and GCP before and couldn't get them.