Comment by BoorishBears

14 hours ago

If they really managed this, from pre-training a 1.6T-parameter model through to post-training, without NVIDIA, then Dwarkesh Patel got what he wanted.

It is interesting how much people doubt Huawei's capabilities in this area. Jensen does not (in the Dwarkesh Patel interview), though of course you can dismiss that as him talking his own book.

Who? What did he want?

  • Dwarkesh Patel has AI/ML guests on his podcast. BoorishBears may have been referring to the Jensen Huang episode where they discuss TPUs: https://youtu.be/Hrbq66XqtCo?t=982

    • Specifically, Dwarkesh couldn't understand that GPUs alone are not enough: it's GPUs plus multiple software ecosystems to leverage them at massive scale, for training vs. inference.

      Instead of giving China open access to US-controlled chips, and thereby creating a misalignment between labs that want to train a model on whatever hardware is best and hardware manufacturers that need those labs to suffer through the growing pains of their new, built-from-scratch ecosystems, we removed the option from the board. Now they've beaten the growing pains decisively, with a speed that reflects the non-optionality.