mmoskal, 4 months ago:
Their tech report says one inference deployment is around 400 GPUs...

fspeech, 4 months ago (reply):
You need that many to optimize load balancing. Unfortunately, that gain is not available to small or individual deployments.