Comment by embedding-shape
5 hours ago
> Maybe just a direct layer on top of vllm
My dream would be something like vLLM, but without all the Python mess, packaged as a single binary that has both an HTTP server and a desktop GUI, and can browse/download models. Llama.cpp is like 70% there, but there's a large performance difference between llama.cpp and vLLM for the models I use.