Comment by gardnr
17 hours ago
> The training and deployment of LongCat-2.0 are built on large-scale clusters of tens of thousands of AI ASIC superpods. Compared to the mature Nvidia GPU ecosystem, the supporting software community is still less developed. We have therefore put significant effort into building a stable, secure, and scalable infrastructure.
This is the real news story. It looks like they may have used Huawei Ascend 910C chips: https://nitter.net/teortaxesTex/status/2071708141037781407#m
If they really managed everything from pre-training a 1.6T-parameter model through post-training without NVIDIA, Dwarkesh Patel got what he wanted.
It is interesting how much people doubt Huawei's capabilities in this area. Jensen does not (in the Dwarkesh Patel interview), though of course you can dismiss this as him talking his own book.
Who? What did he want?
Dwarkesh Patel has AI/ML guests on his podcast. BoorishBears may have been referring to the Jensen Huang episode where they discuss TPUs: https://youtu.be/Hrbq66XqtCo?t=982