Comment by gaeld

4 hours ago

Follow-up reading the most technical and research people here:

Monokernel deep dive (GPU Engineering): http://blog.kog.ai/building-a-single-kernel-latency-optimize...

Delayed Tensor Parallelism (research): http://blog.kog.ai/delayed-tensor-parallelism-for-faster-tra...

To try the speed on the playground: http://playground.kog.ai

1 comment

gaeld

It looks like DTP is a distinct architectural choice that would require training new models accordingly? This wouldn't be able to just run inference for existing models.