Comment by chao-

20 hours ago

Crazy to think that my first personal computer's entire storage (was 160MB IIRC?) could fit into the L3 of a single consumer CPU!

It's probably not possible architecturally, but it would be amusing to see an entire early 90's OS running entirely in the CPU's cache.


cwzwarich 20 hours ago

https://github.com/coreboot/coreboot/blob/main/src/soc/intel...

wmf 19 hours ago
Context: Early in the firmware boot process the memory controller isn't configured yet so the firmware uses the cache as RAM. In this mode cache lines are never evicted since there's no memory to evict them to.
- lathiat 18 hours ago
  
  I remember from the talk about Wii/Wii U hacking that they intentionally kept the early boot code in cache so that the memory couldn’t be sniffed or modified on the RAM bus, which was external to the CPU and thus glitchable.
- coppsilgold 16 hours ago
  
  There may be server workloads for which the L3 cache is sufficient; it would be interesting if it made sense to build boards with just the CPU and no memory at scale.
  I imagine that for such a workload you could always solder on a small memory chip to avoid wasting L3 on unused memory and needing a non-standard boot process, so probably not.

pwg 20 hours ago

In my case it began with 16K (yes, 16*1024 bytes) and 90K (yes, 90*1024 bytes) 5.25" floppy disks (although the floppies were a few months after the computer). Eventually upgraded to 48K RAM and 180K double density floppy disks. The computer: Atari 800.

MegaDeKay 20 hours ago
I'll see your Atari 800 and raise you my Atari 2600 with its whopping 128 bytes of RAM. Bytes with a B. I can kinda sorta call it a computer because you could buy a BASIC cartridge for it (I didn't and stand by that decision - it was pretty bad).
- acomjean 12 hours ago
  
  I thought the Timex Sinclair 1000 with 2 Kbytes of RAM was bad.
  The membrane keyboard wasn’t great (the lack of a space bar was a weird choice) but it did work. We had programs on cassette and did get the 16 Kbyte memory expansion.
  https://en.wikipedia.org/wiki/Timex_Sinclair_1000
  I didn’t realize the Atari 2600 had BASIC, always thought of it as a game console.

HerbManic 18 hours ago

My first PC had a 20MB HDD with 512KB of RAM. So yeah that could fit into cache 10 times now.

compounding_it 19 hours ago

Maybe in 50 years the cache of CPUs and GPUs will be 1TB. Enough to run multiple LLMs (a model entirely run for each task). Having robots like in the movies would need LLMs much much faster than what we see today.

nextaccountic 5 hours ago

doubtful that we will still have this computer architecture by then

basilikum 20 hours ago

KolibriOS would fit in there, even with the data in memory. You cannot load it into the cache directly, but when the cache capacity is larger than all the data you read there should be no cache eviction and the OS and all data should end up in the cache more or less entirely. In other words it should be really, really fast, which KolibriOS already is to begin with.

RiverCrochet 1 hour ago

I thought there was an MSR buried deep somewhere that enables "Cache as RAM" mode and basically maps the cache into the memory address space or something like that.
Lol a quick Google search leads me to a LinkedIn post with all the gory technical details?
https://www.linkedin.com/pulse/understanding-x86-cpu-cache-m...
vlovich123 20 hours ago
Unless you lay everything out contiguously in memory, you’ll still get cache eviction due to associativity and depending on the eviction strategy of the CPU. But certainly DOS or even early Windows 95 could conceivably just run out of the cache
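
The associativity point can be illustrated with a toy model. This is only a sketch under stated assumptions: the line size, way count, set count, modulo set indexing, and strict LRU replacement are all made up for illustration, and real L3 caches typically hash physical addresses and use other replacement policies. It compares a contiguous working set with a much smaller scattered one whose lines all land in the same set.

```python
from collections import OrderedDict

# All parameters are illustrative assumptions, not any real CPU's geometry.
LINE = 64     # bytes per cache line
WAYS = 16     # lines per set (associativity)
SETS = 1024   # number of sets, giving a 1 MiB toy cache


def count_evictions(addresses):
    """Run byte addresses through a toy LRU set-associative cache.

    Returns how many lines had to be evicted to make room.
    """
    sets = [OrderedDict() for _ in range(SETS)]
    evictions = 0
    for addr in addresses:
        line = addr // LINE
        index = line % SETS      # assumed simple modulo indexing
        tag = line // SETS
        ways = sets[index]
        if tag in ways:
            ways.move_to_end(tag)        # hit: mark as most recently used
            continue
        if len(ways) == WAYS:
            ways.popitem(last=False)     # set is full: evict the LRU line
            evictions += 1
        ways[tag] = True
    return evictions


# 512 KiB laid out contiguously, touched twice: 8 lines per set, under 16 ways.
contiguous = [i * LINE for i in range(8192)] * 2

# Only 32 lines (2 KiB), touched twice, but spaced so every line maps to set 0.
stride = LINE * SETS
scattered = [i * stride for i in range(32)] * 2

contiguous_evictions = count_evictions(contiguous)
scattered_evictions = count_evictions(scattered)
print("contiguous evictions:", contiguous_evictions)
print("scattered evictions:", scattered_evictions)
```

In this model the contiguous working set never evicts anything, while the scattered one keeps evicting even though it is a tiny fraction of the cache's capacity.
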
- tadfisher 19 hours ago
  
  Windows 95 only needed 4MB RAM and 50 MB disk, so that's certainly doable. The trick is to have a hypervisor spread that allocation across cache lines.
- chao- 20 hours ago
  
  Yeah, cache eviction is the reason I was assuming it is "probably not possible architecturally", but I also figured there could be features beyond my knowledge that might make it possible.
  Edit: Also this 192MB of L3 is spread across two Zen CCDs, so it's not as simple as "throw it all in L3" either, because any given core would only have access to half of that.
- basilikum 20 hours ago
  
  Well, yeah, reality strikes again. All you need is an exploit in the microcode to gain access to AMD's equivalent to the ME and now you can just map the cache as memory directly. Maybe. Can microcode do this or is there still hardware that cannot be overcome by the black magic of CPU microcode?
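
As a rough sanity check on the sizes quoted in this thread, the arithmetic can be sketched as below. It takes the 192MB total and the two-CCD split from the comment above at face value, treats every "MB" as the same unit, and ignores everything else that competes for cache space, so it is a back-of-the-envelope comparison rather than a claim about any specific CPU.

```python
# Figures quoted in the thread; all sizes in MB, units taken at face value.
L3_TOTAL_MB = 192
CCDS = 2
per_ccd_mb = L3_TOTAL_MB / CCDS  # share of L3 reachable from any one core

footprints_mb = {
    "first PC storage (160MB)": 160,
    "20MB HDD + 512KB RAM": 20 + 0.5,
    "Windows 95 (4MB RAM + 50MB disk)": 4 + 50,
}

# True when the footprint fits within a single CCD's share of the L3.
fits_in_one_ccd = {
    name: size <= per_ccd_mb for name, size in footprints_mb.items()
}

for name, size in footprints_mb.items():
    verdict = "fits" if fits_in_one_ccd[name] else "does not fit"
    print(f"{name}: {size} MB {verdict} in {per_ccd_mb:.0f} MB")
```
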
hrmtst93837 13 hours ago

That assumes KolibriOS or any major component is pinned to one core and one cache slice instead of getting dragged between CCDs or losing memory affinity. Throw actual users, IO, and interrupts at it and you get traffic across chiplets, or at least across L3 groups, so the nice 'everything lives in cache' story falls apart fast.
Nice demo, bad model. The funny part is that an entire OS can fit in cache now; the hard part is making the rest of the system act like that matters.

shric 19 hours ago

You had ~160,000 times more storage than I did for my first personal computer.

defrost 15 hours ago

Commodore PET for me - 8 KB of RAM and all the data you could store and read back from a TDK 120 cassette tape . . .

* https://en.wikipedia.org/wiki/Commodore_PET

Same time as the Trash-80 and BBC micro were making inroads.

bombcar 20 hours ago

IIRC some relatively strange CPUs could run with unbacked cache.

twbarr 20 hours ago

Intel's platforms, at the very least, use cache-as-RAM during the boot phase before the DDR interface can be trained and started up. https://github.com/coreboot/coreboot/blob/main/src/soc/intel...

alfiedotwtf 15 hours ago

> it would be amusing to see an entire early 90's OS running entirely in the CPU's cache.

There’s actually already two running (MINIX and UEFI), and it’s the opposite of amusing - https://www.zdnet.com/article/minix-intels-hidden-in-chip-op...

m463 20 hours ago

I wonder how much faster dos would boot, especially with floppy seek times...

userbinator 20 hours ago

Instantly.
If you run a VM on a CPU like this, using a baremetal hypervisor, you can get very close to "everything in cache".
RulerOf 18 hours ago

You can get close with a VM, but there's overhead in device emulation that slows things down.
Consider a VM where that kind of stuff has been removed, like the firecracker hypervisor used for AWS Lambda. You're talking milliseconds.

tumdum_ 13 hours ago

My first PC had a 40MB HDD and 8MB RAM :D

amelius 14 hours ago

640K ought to be enough for anybody.

Zardoz84 15 hours ago

My first computer's whole RAM could fit in the L1 of a single core (128K)