
Comment by huijzer

1 year ago

What is a bit weird about AI currently is that you basically always want to run the best model, but the price of the hardware is a bit ridiculous. In the 1990s, it was possible to run Linux on scrappy hardware. You could also always run other “building blocks” like Python, Docker, or C++ easily.

But the newest AI models require an order of magnitude more RAM than my system or the systems I typically rent have.

So I'm curious to ask people here: has this happened before in the history of software? Maybe computer games are a good example. There, people also had to upgrade their systems to run the latest games.

As with AI, there were exciting classes of applications in the 70s, 80s and 90s that mandated pricier hardware: anything 3D-related, running multi-user systems, higher-end CAD/EDA tooling, and running any server that actually got put under “real” load (more than 20 users).

If anything this isn’t so bad: $4K in 2025 dollars is about what an affordable desktop computer cost in the 90s.

  • The thing is I'm not that interested in running something that will run on a $4K rig. I'm a little frustrated by articles like this, because they claim to be running "R1" but it's a quantized version and/or it has a small context window... it's not meaningfully R1. I think to actually run R1 properly you need more like $250k.

    But it's hard to tell because most of the stuff posted is people trying to do duct tape and baling wire solutions.

    • I can run the 671B-Q8 version of R1 with a big context on a used dual-socket Xeon with 768GB of RAM that I bought for about $2k. It gets about 1-1.5 tokens/sec, which is fine if you give it a prompt and just come back an hour or so later. To get to many tens of tokens/sec, you would need >8 GPUs with 80GB of HBM each, and you're probably talking well north of $250k. For the price, the 'used workstation with a ton of DDR4' approach works amazingly well. (A minimal CPU-only launch sketch follows this thread.)

    • If you google around, there is a ~$6k setup running the non-quantized version at around 3-4 tokens/sec.

  • Indeed, even design and prepress required quite expensive hardware. There was a time when very expensive Silicon Graphics workstations were a thing.
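
For anyone curious what the "used workstation with a ton of DDR4" approach from the sub-thread above might look like in practice, here is a minimal sketch using llama-cpp-python for CPU-only inference. The model filename, context size, and thread count are illustrative assumptions rather than a reproduction of that exact setup; the real R1 Q8 weights ship as multiple GGUF shards and take on the order of 700GB of RAM for the weights alone.

```python
# Hedged sketch: CPU-only inference of a large Q8 GGUF with llama-cpp-python.
# The path, context size, and thread count are assumptions for illustration,
# not the commenter's actual configuration.
from llama_cpp import Llama

llm = Llama(
    model_path="DeepSeek-R1-Q8_0.gguf",  # hypothetical filename; the real quant is split across many GGUF shards
    n_ctx=8192,       # a "big context" costs additional RAM on top of the weights
    n_threads=64,     # tune to the number of physical cores across both sockets
    n_gpu_layers=0,   # pure CPU: every layer stays in system RAM
)

out = llm(
    "Explain the trade-offs of serving a 671B model from DDR4.",
    max_tokens=256,
)
print(out["choices"][0]["text"])
```

At 1-1.5 tokens/sec this is strictly a "submit a prompt and come back later" workflow, but it is the full 671B model rather than a small distill.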

Of course it has. Coughs in SGI and advanced 3D and video software like PowerAnimator, Softimage, Flame. The hardware + software combo started at around $60k in 90's dollars, but to do something really useful with it you'd be in the $100-250k (in 90's dollars) range.

> What is a bit weird about AI currently is that you basically always want to run the best model,

I think the problem is thinking that you always need to use the best LLM. Consider this:

- When you don't need correct output (such as when writing a blog post, where there's no right/wrong answer), "best" can be subjective.

- When you need correct output (such as when coding), you always need to review the result, no matter how good the model is.

IMO you can get 70% of the value of high-end proprietary models by just using something like Llama 8b, which is runnable on most commodity hardware. That should increase to something like 80-90% when using bigger open models such as the newly released "Mistral Small 3".
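
To make the "runnable on most commodity hardware" point concrete, here is a hedged sketch of querying a small local model through the Ollama Python client. The model tag and prompt are assumptions for illustration, and it presumes an Ollama server is installed and running and that the model has already been pulled.

```python
# Hedged sketch: chatting with a small (~8B) local model via the Ollama Python client.
# Assumes `ollama serve` is running and `ollama pull llama3.1:8b` has been done;
# the exact model tag is an assumption, not something specified above.
import ollama

response = ollama.chat(
    model="llama3.1:8b",  # a 4-bit quantized 8B model needs only ~5-6GB of RAM/VRAM
    messages=[
        {"role": "user", "content": "Summarize the trade-offs of small local models."},
    ],
)
print(response["message"]["content"])
```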

  • With o1 I had a hairy mathematical problem recently related to video transcoding. I explained my flawed reasoning to o1, and it was kind of funny in that it took roughly the same amount of time to figure out the flaw in my reasoning, but it did, and it also provided detailed reasoning with correct math to correct me. Something like Llama 8b would've been worse than useless. I ran the same prompt by ChatGPT and Gemini, and both gave me sycophantic confirmation of my flawed reasoning.

    > When you don't need correct output (such as when writing a blog post, there's no right/wrong answer), "best" can be subjective.

    This is like, everything that is wrong with the Internet in a single sentence. If you are writing a blog post, please write the best blog post you can; if you don't have a strong opinion on "best," don't write.

    • This isn’t the best comment I’ve seen on HN, you should delete it, or stop gatekeeping.

  • for coding insights / suggestions as you type, similar to copilot, i agree.

    for rapidly developing prototypes or working on side projects, i find llama 8b useless. it might take 5-6 iterations to generate something truly useful, compared to say 1-shot with claude sonnet 3.5 or openai gpt-4o. that’s a lot less typing and time wasted.

I'm not sure Linux is the best comparison; it was specifically created to run on standard PC hardware. We have user access to AI models for little or no monetary cost, but they can be insanely expensive to run.

Maybe a better comparison would be weather simulations in the 90s? We had access to their outputs, but running comparable calculations as a regular Joe might've actually been impossible without a huge bankroll.

  • Or 3D rendering, or even particularly intense graphic design-y stuff I think, right? In the 90’s… I mean, computers in the $1k-$2k range were pretty much entry level, right?

The early 90's and digital graphic production. Computer upgrades could make intensive alterations interactive. This was true of Photoshop and Excel. There were many bottlenecks to speed. Upgrading a network of graphics machines from 10Mbit to 100Mbit networking did wonders for server-based workflows.

well, if there were e.g. a model trained for coding - i.e. specialization as such, models trained mostly for this or that - instead of for everything incl. Shakespeare, the kitchen sink and the biology of the cockroaches under it, that would make those runnable on much lower-end hardware. But there is only one, The-Big-Deal.. in many incarnations.

Read “Masters of Doom”; it goes into quite some detail on how Carmack got himself a very expensive workstation to develop Doom/Quake.

We are finally entering an era where the demand for more memory is real. Small local AI models will be used for many things in the near future, and they require lots of memory. Even phones will need terabytes of fast memory in the future.

In the 90's it was really expensive to run 3D Studio or POV-Ray. It could take days to render a single image. Silicon Graphics workstations could do it faster but were out of the budget of non-professionals.

Raytracing decent scenes was a big CPU hog in the 80s/90s for me. I'd have to leave single frames running overnight.

How were you running Docker in the 1990s?

  • > you basically always want to run the best model, but the price of the hardware is a bit ridiculous. In the 1990s, it was possible to run Linux on scrappy hardware. You could also always run other “building blocks” like Python, Docker, or C++ easily

    = "When you needed to run common «building blocks» (such as, in other times, «Python, Docker, or C++» - normal fundamental software you may have needed), even scrappy hardware would suffice in the '90s"

    As a matter of fact, people would upgrade foremost for performance.

  • Heh. I caught that too, and was going to say "I totally remember running Docker on Slackware on my 386DX40. I had to upgrade to 8MB of RAM. Good times."