Comment by yu3zhou4
1 month ago
I’m recreating a tiny version of vLLM in C++ and CUDA from scratch (a high-throughput LLM inference server)
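vLLM's signature technique is a paged KV cache: instead of reserving contiguous memory for each sequence's maximum length, token KV entries are mapped onto fixed-size physical blocks allocated on demand. A minimal sketch of that bookkeeping in C++ might look like the following; all names here (`BlockAllocator`, `reserve`, `locate`) are hypothetical and not taken from vLLM's actual codebase.

```cpp
#include <cstdint>
#include <stdexcept>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical sketch of vLLM-style paged KV-cache bookkeeping.
// Each sequence maps logical token positions onto fixed-size physical
// blocks, so GPU memory is claimed on demand rather than per max length.
class BlockAllocator {
public:
    BlockAllocator(int num_blocks, int block_size) : block_size_(block_size) {
        for (int i = num_blocks - 1; i >= 0; --i) free_.push_back(i);
    }

    // Ensure `seq_id` owns enough blocks to hold `num_tokens` tokens.
    void reserve(int seq_id, int num_tokens) {
        auto& table = tables_[seq_id];
        int needed = (num_tokens + block_size_ - 1) / block_size_;
        while (static_cast<int>(table.size()) < needed) {
            if (free_.empty()) throw std::runtime_error("out of KV blocks");
            table.push_back(free_.back());
            free_.pop_back();
        }
    }

    // Translate a logical token index to (physical block, in-block offset).
    std::pair<int, int> locate(int seq_id, int token_idx) const {
        const auto& table = tables_.at(seq_id);
        return {table[token_idx / block_size_], token_idx % block_size_};
    }

    // Return all of a finished sequence's blocks to the free list.
    void release(int seq_id) {
        auto it = tables_.find(seq_id);
        if (it == tables_.end()) return;
        for (int b : it->second) free_.push_back(b);
        tables_.erase(it);
    }

    std::size_t free_blocks() const { return free_.size(); }

private:
    int block_size_;
    std::vector<int> free_;                            // free physical block ids
    std::unordered_map<int, std::vector<int>> tables_; // seq_id -> block table
};
```

With a 16-token block size, reserving room for 17 tokens consumes exactly two blocks, and releasing the sequence returns them, which is the property that lets many sequences share one GPU memory pool without fragmentation per sequence.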