Comment by ggregoire

13 hours ago

> scaled up by increasing the instance size

I always wondered what kind of instances companies at that level of scale are using. Anyone here have some ideas? How much CPU/RAM? Do they use the same instance types available to everyone, or do AWS and co. offer custom hardware for these big customers?

The major hyperscalers all offer a plethora of virtual machine SKUs that are essentially an entire two-socket box with many-core CPUs.

For example, Azure's Standard_E192ibds_v6 has 96 cores, 1.8 TB of memory, and 10 TB of local SSD storage capable of 3 million IOPS.

Beyond those "general purpose" VMs you get the enormous machines with 8, 16, or even 32 sockets.[1] These are almost exclusively used for SAP HANA in-memory databases or similar ERP workloads.

Azure's Standard_M896ixds_24_v3 provides 896 cores, 32 TB of memory, and 185 Gbps Ethernet networking. It is generally available, but you have to request quota through a support ticket, and you may have to wait and/or get your finances "approved" by Microsoft. Something like this will set you back [edited] $175K per month[/edited]. (I suspect OpenAI is getting a huge effective discount.)

Personally, I'm a fan of "off label" use of the High Performance Compute (HPC) sizes[2] for database servers.

The Standard_HX176rs HPC VM size gives you 176 cores and 1.4 TB of memory. That's similar to the E-series VM above, but with a higher compute-to-memory ratio. The effective memory throughput is also way better because the CPUs carry a large slab of extra stacked L3 cache (AMD's 3D V-Cache). In my benchmarks it absolutely smoked the general-purpose VMs at a similar price point.
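
For the curious, the comparison I mean is a STREAM-style bandwidth test. Here's a minimal Go sketch of the triad kernel (the slice size and the GB/s accounting are illustrative choices, not the exact benchmark I ran):

```go
package main

import (
    "fmt"
    "time"
)

func main() {
    // ~1 GiB per slice so the working set dwarfs even the couple of
    // gigabytes of stacked L3 on the HX parts (size is my assumption).
    const n = 1 << 27
    a := make([]float64, n)
    b := make([]float64, n)
    c := make([]float64, n)
    for i := range b {
        b[i], c[i] = 1.0, 2.0
    }

    start := time.Now()
    for i := 0; i < n; i++ {
        a[i] = b[i] + 3.0*c[i] // STREAM "triad": two reads, one write per element
    }
    elapsed := time.Since(start).Seconds()

    // Three slices of 8-byte elements stream through memory once each.
    gb := float64(3*n*8) / 1e9
    fmt.Printf("triad bandwidth: %.1f GB/s\n", gb/elapsed)
}
```

It's single-threaded, so it understates what the machine can do in absolute terms, but the relative gap between the cache-heavy and standard SKUs shows up clearly.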

[1] https://learn.microsoft.com/en-us/azure/virtual-machines/siz...

[2] https://learn.microsoft.com/en-us/azure/virtual-machines/siz...

  • > Something like this will set you back $30K-$60K per year

    lol, no, cloud is nowhere near that good value. It’s $3.5M annually.

    > The Standard_HX176rs HPC VM size gives you 176 cores and 1.4 TB of memory

    This one is $124k per year.

    • Thanks for the correction, fixed.

      I noticed that the M896i is so obscure and rarely used that there are typos associated with it everywhere, including the official docs! In one place it says it has 23 TB of memory when it actually has 32 TB.

  • On the AWS side there are "HANA certified" instances that max out at 1920 cores and 32 TB of RAM: the u7inh-32tb.480xlarge.

    https://docs.aws.amazon.com/sap/latest/general/sap-hana-aws-...

    • I'm pretty sure both Azure and AWS are merely reselling the same HPE Compute Scale-up Server 3200 chassis with some variations. Azure seems to have only the 16-socket model, but AWS has the 32-socket model.

      That AWS instance uses these 60-core processors: https://www.intel.com/content/www/us/en/products/sku/231747/...

      To anyone wondering about these huge-memory systems: avoid them if at all possible! Use them only if you absolutely must.

      For one, these systems need specialised parts that are more expensive per unit of compute: roughly $283 per CPU core versus something like $85 for a current-gen AMD EPYC core, and those EPYCs are also about 2x as fast per core as the older Intel Xeon Scalable parts this chassis requires! So the cost-efficiency ratio works out to something like 6:1 in favour of AMD processors. (Comparing the cost of one large host against multiple smaller ones gets complicated beyond the CPUs, though.)
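
      The arithmetic behind that ratio, as a quick sketch (the per-core prices and the 2x speed factor are the rough figures above, not careful benchmarks):

      ```go
      package main

      import "fmt"

      func main() {
          const (
              xeonPerCore = 283.0 // $/core for the scale-up Xeon parts (figure from above)
              epycPerCore = 85.0  // $/core for a current-gen AMD EPYC (figure from above)
              epycSpeedup = 2.0   // assumed ~2x per-core speed advantage for the EPYC
          )
          priceRatio := xeonPerCore / epycPerCore // ~3.3:1 on price alone
          costEff := priceRatio * epycSpeedup     // ~6.7:1 with the speed gap folded in
          fmt.Printf("price %.1f:1, cost efficiency ~%.1f:1\n", priceRatio, costEff)
      }
      ```

      (I rounded that to 6:1 above.)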

      The second effect is that 32-way systems have huge inter-processor cache synchronisation overheads. Only very carefully coded software can scale to use thousands of cores without absolutely drowning in cache line invalidations.
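
      A small Go sketch of the effect, assuming 64-byte cache lines (the exact slowdown varies a lot by CPU): two goroutines bump counters that share a line, then counters padded onto separate lines. There's no data race here, since each goroutine owns its own field; the cost is pure cache-line ping-pong.

      ```go
      package main

      import (
          "fmt"
          "sync"
          "time"
      )

      const iters = 100_000_000

      // Both counters land on the same 64-byte cache line, so each
      // core's write invalidates the other core's cached copy.
      type sharedLine struct {
          a, b int64
      }

      // Padding pushes b onto its own cache line, so the two
      // goroutines stop fighting over the same line.
      type paddedLines struct {
          a int64
          _ [56]byte
          b int64
      }

      func run(incA, incB func()) time.Duration {
          start := time.Now()
          var wg sync.WaitGroup
          wg.Add(2)
          go func() {
              defer wg.Done()
              for i := 0; i < iters; i++ {
                  incA()
              }
          }()
          go func() {
              defer wg.Done()
              for i := 0; i < iters; i++ {
                  incB()
              }
          }()
          wg.Wait()
          return time.Since(start)
      }

      func main() {
          var s sharedLine
          var p paddedLines
          fmt.Println("same cache line:", run(func() { s.a++ }, func() { s.b++ }))
          fmt.Println("separate lines: ", run(func() { p.a++ }, func() { p.b++ }))
      }
      ```

      On a typical multi-core box the padded version runs several times faster, and that's with only two cores contending; at 32 sockets the ping-pong gets far worse.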

      At these scales you're almost always better off scaling out "medium" sized boxes. A single writer and multiple read-only secondary replicas will take you very far, up to hundreds of gigabits of aggregate database traffic.
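
      A hedged Go sketch of that pattern (the DSNs, the Postgres driver, and the round-robin policy are illustrative assumptions, not any particular product):

      ```go
      package main

      import (
          "database/sql"
          "log"
          "sync/atomic"

          _ "github.com/lib/pq" // any SQL driver works; Postgres is an assumption
      )

      type Cluster struct {
          primary  *sql.DB
          replicas []*sql.DB
          next     atomic.Uint64
      }

      // Writer returns the single writable primary.
      func (c *Cluster) Writer() *sql.DB { return c.primary }

      // Reader round-robins across the read-only secondaries.
      func (c *Cluster) Reader() *sql.DB {
          n := c.next.Add(1)
          return c.replicas[n%uint64(len(c.replicas))]
      }

      func main() {
          open := func(dsn string) *sql.DB {
              db, err := sql.Open("postgres", dsn)
              if err != nil {
                  log.Fatal(err)
              }
              return db
          }
          c := &Cluster{
              primary: open("host=primary dbname=app"),
              replicas: []*sql.DB{
                  open("host=replica1 dbname=app"),
                  open("host=replica2 dbname=app"),
              },
          }
          // All mutations funnel through the one writer...
          if _, err := c.Writer().Exec("UPDATE accounts SET balance = balance - 1 WHERE id = $1", 42); err != nil {
              log.Fatal(err)
          }
          // ...while the replica fleet absorbs the read traffic.
          var balance int64
          if err := c.Reader().QueryRow("SELECT balance FROM accounts WHERE id = $1", 42).Scan(&balance); err != nil {
              log.Fatal(err)
          }
          log.Println("balance:", balance)
      }
      ```

      The nice property is that read capacity scales by just adding replicas, with no exotic hardware anywhere.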