
Comment by Neywiny

3 days ago

32 cores on a die, 256 on a package. Still stunning though

How do people use these things? Map MPI ranks to dies, instead of compute nodes?

  • Yeah, there's an option to configure one NUMA node per CCD that can speed up some apps.

  • MPI is fine, but have you heard of threads?

    • Sure, the conventional approach is OpenMP within a node and MPI across nodes, but:

      * It just seems like a lot of threads to wrangle without some hierarchy. Nested OpenMP is also possible…

      * I’m wondering if explicit communication is better from one die to another in this sort of system.
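
To make the "one rank per die" idea concrete, here is a rough sketch of the arithmetic using the numbers from the parent comment (32 cores per die, 256 per package, so 8 dies). The function names are made up for illustration, and it assumes core IDs are numbered contiguously within each die, which is not guaranteed on real hardware (SMT siblings and firmware settings can change the numbering, so check the actual topology first).

```python
def die_core_ranges(cores_per_die=32, cores_per_package=256):
    """Return one range of core IDs per die.

    ASSUMPTION: core IDs are contiguous within a die (die 0 owns 0..31,
    die 1 owns 32..63, and so on). Real systems may number cores differently.
    """
    if cores_per_package % cores_per_die != 0:
        raise ValueError("package size must be a multiple of die size")
    dies = cores_per_package // cores_per_die
    return [range(d * cores_per_die, (d + 1) * cores_per_die) for d in range(dies)]


def cores_for_rank(local_rank, cores_per_die=32, cores_per_package=256):
    """Return the core range a node-local rank would own with one rank per die."""
    ranges = die_core_ranges(cores_per_die, cores_per_package)
    if not 0 <= local_rank < len(ranges):
        raise ValueError("more ranks than dies; this sketch maps one rank per die")
    return ranges[local_rank]


if __name__ == "__main__":
    for rank in range(len(die_core_ranges())):
        cores = cores_for_rank(rank)
        # On Linux, a process could pin itself with
        # os.sched_setaffinity(0, cores); left out here so the sketch runs anywhere.
        print(f"local rank {rank}: cores {cores.start}-{cores.stop - 1}")
```

Each rank would then run its threads only on its own die's cores, so threading stays inside a die and MPI handles the die-to-die traffic. Whether that beats plain threads across the whole package is exactly the open question here.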
