Comment by BoorishBears

14 hours ago

If they really managed this without NVIDIA, from pre-training a 1.6T-parameter model through to post-training, Dwarkesh Patel got what he wanted.

5 comments

chvid 12 hours ago

It is interesting how much people doubt Huawei’s capabilities in this area. Jensen does not (in the Dwarkesh Patel interview), though of course you can dismiss this as him talking his own book.

Jabrov 14 hours ago

Who? What did he want?

gardnr 13 hours ago
Dwarkesh Patel has AI/ML guests on his podcast. BoorishBears may have been referring to the Jensen Huang episode where they discuss TPUs: https://youtu.be/Hrbq66XqtCo?t=982
- BoorishBears 7 hours ago
  
  Specifically, Dwarkesh couldn't understand that GPUs are not enough: it's GPUs plus multiple ecosystems to leverage them at massive scale during training vs inference.
  Instead of giving China open access to US-controlled chips and creating a misalignment between labs that want to train a model on whatever is best and hardware manufacturers that need labs to suffer the growing pains of their new ecosystems built from scratch, we removed the option from the board. Now they've beaten the growing pains decisively, with a speed that reflects the non-optionality.
  