Comment by andrewstuart

1 year ago

AMD Strix Halo APU is a CPU with very powerful integrated GPU.

It’s faster at AI than an Nvidia RTX4090, because 96GB of the 128GB can be allocated to the GPU memory space. This means it doesn’t suffer the same swapping/memory thrashing that a discrete GPU experiences when processing large models.

16 CPU cores and 40 GPU compute units sounds pretty parallel to me.

Doesn’t that fit the bill?

> It’s faster at AI than an Nvidia RTX4090, because 96GB of the 128GB can be allocated to the GPU memory space

Definitely not. The RTX4090 uses fast graphics RAM (usually a previous-generation memory type, but overclocked and on a very wide bus). AMD Strix Halo uses standard DDR5, which is nowhere near as fast.

And yes, the Strix Halo GPU has a "3D cache", but as AMD officials said, the CPU doesn't get access to the GPU cache, because they "have not seen any app significantly benefit from such access".

So the internal SoC bus probably has lower latency than a discrete GPU over PCIe, but the difference shouldn't be large.
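The bandwidth argument above can be sketched with a back-of-envelope calculation. For single-stream LLM decoding, every token requires reading roughly the whole model from memory, so memory bandwidth divided by model size gives an upper bound on tokens per second. The bandwidth figures below are approximate published specs and the helper function is hypothetical, not a benchmark:

```python
# Back-of-envelope: dense-LLM decode speed is roughly bounded by
# memory bandwidth / bytes read per token (~ the model's size).
def tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on tokens/s for bandwidth-bound decoding."""
    return bandwidth_gb_s / model_size_gb

# Approximate published figures -- treat as assumptions:
RTX_4090_BW = 1008.0   # GB/s, GDDR6X on a 384-bit bus
STRIX_HALO_BW = 256.0  # GB/s, LPDDR5X-8000 on a 256-bit bus

llama_70b_q4 = 40.0    # GB, ~4-bit quantized 70B model

# Strix Halo: the whole model fits in its 96 GB GPU allocation,
# so the bandwidth bound is actually achievable.
print(tokens_per_second(STRIX_HALO_BW, llama_70b_q4))

# The 4090's raw bound is ~4x higher, but a 40 GB model does not fit
# in 24 GB of VRAM, so real throughput collapses to PCIe streaming speed.
print(tokens_per_second(RTX_4090_BW, llama_70b_q4))
```

This is why both sides of the thread can be right: the 4090 wins wherever the model fits in VRAM, and loses badly the moment it doesn't.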

It looks like it will be available in the Framework Desktop! I would love to see it in a more budget mini PC at some point from another company. (Framework is great but not in my price range.)

> It’s faster at AI than an Nvidia RTX4090, because 96GB of the 128GB can be allocated to the GPU memory space

I love AMD's Ryzen chips and will recommend their laptops over an Nvidia model all day. However, this is a pretty facetious comparison that falls apart when you normalize the memory. Any chip can be memory bottlenecked, and if we take away that arbitrary precondition the Strix Halo gets trounced in terms of compute capacity. You can look at the TDP of either chip and surmise this pretty easily.

  • > However, this is a pretty facetious comparison that falls apart when you normalize the memory

    Why would you normalize though? You can't buy a 96 GB RTX4090. So it's fair to compare the whole deal, slowish APU with large RAM versus very fast GPU with limited RAM.

  • “ AMD also claims its Strix Halo APUs can deliver 2.2x more tokens per second than the RTX 4090 when running the Llama 70B LLM (Large Language Model) at 1/6th the TDP (75W).”

    https://www.tomshardware.com/pc-components/cpus/amd-slides-c...

    You could argue it’s an invalid claim because it comes from AMD, not an independent source.

    • This is still a memory-constrained benchmark. The smallest Llama 70B model (gguf-q2) doesn't fit in the 4090's VRAM, so it's bottlenecked by the PCIe link. It's a valid benchmark, but it's still guilty of being stacked in the exact way I described before.

      A comparison of 7B/13B/32B model performance would actually test the compute performance of either card. AMD is appealing to the consumers that don't feel served by Nvidia's gaming lineup, which is fine but also doomed if Nvidia brings their DGX Spark lineup to the mobile form factor.
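The 7B/13B/32B cutoff above follows directly from quantized model sizes versus available memory. A rough sketch, assuming ~4.5 bits per weight (a typical q4_K_M-style GGUF; the exact ratio varies by quantization):

```python
# Rough quantized model footprint: params * bits-per-weight / 8 bits-per-byte.
BITS_PER_WEIGHT = 4.5  # assumption: typical ~4-bit GGUF quantization

def model_gb(params_billion: float) -> float:
    """Approximate on-disk/in-memory size of a quantized model, in GB."""
    return params_billion * BITS_PER_WEIGHT / 8

for params, vram in [(7, 24), (13, 24), (32, 24), (70, 24), (70, 96)]:
    size = model_gb(params)
    verdict = "fits" if size <= vram else "does NOT fit"
    print(f"{params}B @ ~{BITS_PER_WEIGHT} bpw ≈ {size:.1f} GB -> {verdict} in {vram} GB")
```

Everything up through 32B fits in the 4090's 24 GB, so those sizes would compare raw compute; only the 70B case forces the 4090 off-chip, which is exactly the regime AMD's slide picked.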