Comment by nathan-barry 3 months ago
Actually, NVIDIA made one earlier this year. Check out their Fast-dLLM paper.
gdiamos 3 months ago
Thanks, I'll check it out!
gdiamos 3 months ago
Did I miss something? https://github.com/NVlabs/Fast-dLLM/blob/main/llada/chat.py
That's inference code, but where is the high-performance web server?