Comment by littlestymaar
5 days ago
Re-using a comment I wrote some time ago:
Tenstorrent really needs to put more VRAM on their cards.
If Chinese companies can hack Nvidia GPUs up to 48 or 96GB of VRAM at a competitive price, surely Tenstorrent can too.
Variants of n300d at $2500 for 48GB and $3900 for 96GB would be instant hits.
~~24GB for $1500 simply isn't gonna do it.~~ (This old part of the comment referred to the old n300 and can be updated to: 32GB for $1400 still isn't enough for success. There's some progress, but that's still too low considering it's exotic hardware that will lead to tons of compatibility issues.)
Dedicated memory isn't the issue. Increase the DRAM on your card and your bandwidth goes down; increase the bandwidth and your price goes up reciprocally. The solution isn't to just solder more memory anywhere it fits; these are well-paid engineers working to optimize a complex problem space. The Chinese board fluxers are working with a different class of hardware that regularly ships with dark silicon, binned parts and die-chopped configurations.
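The capacity/bandwidth coupling comes from how GDDR bandwidth is determined: peak bandwidth is set by bus width times per-pin data rate, not by capacity, so adding DRAM without widening the bus just dilutes bandwidth per GB. A rough sketch of that arithmetic (the 256-bit/16 Gbit/s split is an illustrative assumption that happens to reproduce the p150's quoted 512 GB/s):

```python
def gddr_bandwidth_gbps(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak GDDR bandwidth in GB/s: bus width (bits) x per-pin rate (Gbit/s) / 8."""
    return bus_width_bits * pin_rate_gbps / 8

# Illustrative: a 256-bit bus at 16 Gbit/s per pin -> 512 GB/s,
# matching the figure quoted for the p150 in this thread.
bw = gddr_bandwidth_gbps(256, 16.0)

# Doubling capacity on the same bus leaves `bw` unchanged,
# so bandwidth *per GB* halves:
print(bw / 32, bw / 64)  # GB/s per GB at 32GB vs. 64GB
```

Capacity only moves the denominator here; to keep bandwidth per GB you have to widen the bus or raise the pin rate, which is where the cost goes up.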
You'll note that Apple didn't just immediately resume shipping systems with 1.5TB of RAM when they revised their own system architecture. It's taken them half a decade to recoup a third of that capacity at the VRAM-level speeds they require to unify the GPU and CPU's memory.
> Dedicated memory isn't the issue.
To run large MoE models it is.
> Increase DRAM on your card and your bandwidth goes down
Why would it?
> You'll note that Apple didn't just immediately resume shipping systems with 1.5TB of RAM when they revised their own system architecture. It's taken them half a decade to recoup a third of that capacity at the VRAM-level speeds they require to unify the GPU and CPU's memory
I fail to see how a unified architecture on a general purpose CPU is a good illustration when we're discussing PCIe accelerator cards. The problems they face have little in common.
It's 32GB for $1300 for the Blackhole p150a ([1]). The rest of your point is very true.
1. https://tenstorrent.com/hardware/blackhole
I updated my comment accordingly. I had just copy-pasted a comment of mine from Reddit from a few days ago, but this part needed an update.
The new p150 cards linked in the OP have 32GB GDDR6 @ 512GB/s for $1,300. Which isn't bad on paper, I guess. They're meant to be networked (quad 800Gb QSFP-DD) like Nvidia GPUs, so two of them would get you 64GB of VRAM at $2,600 for ~600W, which is basically what you're asking for? The power usage isn't good enough at scale yet I think, but for a workstation it's quite manageable.
Real workloads remain to be seen, but if they can actually get a working build of vLLM and their cards remain actually buyable, well, they're doing better than some of the competition...
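The multi-card arithmetic above can be sketched quickly. Prices and capacities are the figures quoted in this thread; the ~300W per card is an assumption back-derived from the ~600W estimate for two:

```python
# Per-card figures from the thread; power is an assumption (~600W / 2 cards).
CARD = {"vram_gb": 32, "price_usd": 1300, "power_w": 300}

def cluster(n: int) -> dict:
    """Aggregate VRAM, cost, and power for n networked cards."""
    return {k: n * v for k, v in CARD.items()}

print(cluster(2))  # -> {'vram_gb': 64, 'price_usd': 2600, 'power_w': 600}
print(cluster(3))  # -> {'vram_gb': 96, 'price_usd': 3900, 'power_w': 900}
```

Three cards land almost exactly on the hypothetical "$3900 for 96GB" from the top comment, just spread over more slots and watts.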
> so two of them would get you 64GB of VRAM at $2600 for ~600W which is basically what you're asking for?
Almost, except with respect to space in the box and power usage, which are critical IMHO.
> but if they can actually get a working build of vLLM and their cards remain actually buyable, well, they're doing better than some of the competition...
That's a big if though; poor software support is to be expected and you'll need to factor that in IMHO, and that's why they need to beef up the memory. Of course, if software support is stellar then it may be a good enough deal.
It's not meant as a workstation/tinkering system; the card without networking is not the main aim. If you're willing to pay $4k for 96GB, just get three with networking.
That said, it missed the boat on MoE. The future is two-tiered memory systems; NVIDIA has already announced they're doing that. Ideally these cards would have 4-8 DIMM slots for a couple of channels of DDR5.
That would also make them far more useful for workstations/tinkering.
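To put the two-tier idea in rough numbers: a couple of DDR5 channels are far slower than the card's GDDR6, but that can be fine for a capacity tier holding cold MoE experts. A back-of-envelope sketch (DDR5-5600 and the two-channel count are assumptions for illustration, not anything Tenstorrent has announced):

```python
def ddr5_channel_bw_gbs(mt_s: int, bus_bits: int = 64) -> float:
    """Peak bandwidth of one DDR5 channel in GB/s: MT/s x bus width (bits) / 8 / 1000."""
    return mt_s * bus_bits / 8 / 1000

gddr6_bw = 512.0                          # p150's quoted GDDR6 bandwidth, GB/s
ddr5_bw = 2 * ddr5_channel_bw_gbs(5600)   # assumed: two channels of DDR5-5600

print(ddr5_bw)             # -> 89.6  (GB/s for the slow tier)
print(gddr6_bw / ddr5_bw)  # fast tier is roughly 5.7x faster
```

A ~6x gap between tiers is workable for MoE-style access patterns, where only the active experts need to sit in the fast tier.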
> It's not meant as a workstation
They are literally selling a workstation built around these chips though: https://tenstorrent.com/hardware/tt-quietbox