Comment by nathan-barry 3 months ago
Actually, NVIDIA made one earlier this year. Check out their Fast-dLLM paper.
gdiamos 3 months ago
Thanks, I'll check it out!
gdiamos 3 months ago
Did I miss something? https://github.com/NVlabs/Fast-dLLM/blob/main/llada/chat.py
That's inference code, but where is the high-performance web server?