
Comment by bigyabai

1 year ago

This is still a memory-constrained benchmark. The smallest Llama 70B model (gguf-q2) doesn't fit in memory, so it's bottlenecked by your PCIe connection. It's a valid benchmark, but it's still stacked in exactly the way I described before.
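For a rough sense of the "doesn't fit" arithmetic, here's a back-of-the-envelope sketch. Everything numeric in it is an assumption on my part, not a figure from the benchmark: roughly 2.5 bits per weight for a q2-style quant, 4 bits for the smaller models, a 16 GiB card, and 2 GiB of headroom for KV cache and buffers. The helper names `weights_gib` and `fits` are made up for illustration.

```python
def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits(params_billion: float, bits_per_weight: float,
         vram_gib: float, overhead_gib: float = 2.0) -> bool:
    """True if weights plus an assumed overhead fit in the given memory."""
    return weights_gib(params_billion, bits_per_weight) + overhead_gib <= vram_gib


# All values below are assumed for illustration, not measured.
ASSUMED_VRAM_GIB = 16.0
candidates = [
    ("7B @ ~4 bpw", 7, 4.0),
    ("13B @ ~4 bpw", 13, 4.0),
    ("32B @ ~4 bpw", 32, 4.0),
    ("70B @ ~2.5 bpw", 70, 2.5),
]

for name, params, bpw in candidates:
    size = weights_gib(params, bpw)
    verdict = "fits" if fits(params, bpw, ASSUMED_VRAM_GIB) else "spills over"
    print(f"{name}: {size:.1f} GiB of weights, {verdict}")
```

Swap in the real memory size of whichever card you care about. Once a model spills over, the numbers you get mostly reflect how fast weights can be moved in, not how fast the card computes.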

A comparison of 7B/13B/32B model performance would actually test the compute performance of either card. AMD is appealing to the consumers who don't feel served by Nvidia's gaming lineup, which is fine, but that pitch is doomed if Nvidia brings its DGX Spark lineup to the mobile form factor.