Comment by dennysora-main

3 days ago

Recently, I started a personal project to build an LLM from zero.

I've spent a ton of time reading up on math, ML, and DL through books, open courses, and papers, while also studying all the major open-source LLM architectures.

Since I only have one DGX Spark machine to run experiments, I can't train a massive LLM from the get-go. Instead, I'm experimenting with an auto-scaling parameter mechanism, which has led me to create a pretty unconventional and fun architecture!

Why go through all this effort when modern LLMs can basically write simple LLMs themselves, and I clearly can't out-compute the big tech giants?

Honestly, it's because I'm obsessed with the core mechanics of LLMs. I want to build something exclusively for myself and, hopefully, discover some entirely new mechanisms along the way.

I'm just keeping a record and sharing my progress. Having fun with it is truly the biggest reward!

I'll share it when I get a chance!

Most hobbyists rent compute for training models instead of buying it all outright.

  • It's mainly just my personal preference to run a local machine. It gives me better privacy and security, and I can keep all my heavy data and projects right there.

    Cloud rentals are usually billed hourly. Since I constantly tweak the architecture and rerun it, having a local rig completely removes any cost anxiety, because it's a one-off payment.

    Plus, regular users often can't even get access to H100s anyway. I applied for them on AWS and GCP before and wasn't approved.