Comment by gdiamos
3 months ago
One year later and there is still no inference engine for diffusion LLMs
Students looking for a project to break into AI - please!
Actually, NVIDIA made one earlier this year; check out their Fast-dLLM paper.
Thanks I’ll check it out!
Did I miss something? https://github.com/NVlabs/Fast-dLLM/blob/main/llada/chat.py
That’s inference code, but where is the high-perf web server?
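For anyone taking this on as a project, here is a minimal sketch of what the serving layer could look like (the FastAPI wrapper and the `diffusion_generate` stub are illustrative, not taken from Fast-dLLM; a real high-perf server would also need batching, streaming, KV/cache reuse, etc.):

```python
# Sketch of a minimal HTTP endpoint around a diffusion-LLM generate() call.
# NOT the Fast-dLLM API: diffusion_generate below is a stub you would replace
# with the actual iterative denoising/unmasking loop from e.g. llada/chat.py.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class GenerateRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 256
    denoising_steps: int = 64  # diffusion-specific knob; the name is illustrative


def diffusion_generate(prompt: str, max_new_tokens: int, steps: int) -> str:
    # Stub: just echoes the request. A real implementation would run the
    # masked-token denoising loop and detokenize the result.
    return f"[{steps}-step completion of {prompt!r}, up to {max_new_tokens} tokens]"


@app.post("/generate")
def generate(req: GenerateRequest) -> dict:
    text = diffusion_generate(req.prompt, req.max_new_tokens, req.denoising_steps)
    return {"text": text}
```

Save it as `server.py`, run `uvicorn server:app`, and POST a JSON body like `{"prompt": "hello"}` to `/generate`.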
A nanochat-inspired training repo for diffusion models: https://github.com/ZHZisZZ/dllm
now someone needs to make it work with vLLM or something
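If someone does wire a dLLM backend into vLLM, the client side would presumably just be the OpenAI-compatible API vLLM already serves; everything model-specific below is hypothetical, since (per this thread) nothing serves a diffusion LLM this way yet:

```python
# Hypothetical client call against an OpenAI-compatible endpoint such as the
# one vLLM serves at /v1. The model name is made up; treat this as a sketch of
# what the client side would look like once dLLM support exists.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="llada-8b-instruct",  # hypothetical diffusion LLM
    messages=[{"role": "user", "content": "Explain diffusion LLM decoding in one paragraph."}],
    max_tokens=256,
)
print(resp.choices[0].message.content)
```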