
Comment by jdright

6 months ago

Yeah, I would love an actual alternative to Ollama, but RamaLama is not it, unfortunately. As the other commenter said, onboarding is important. I just want a one-step install that works, and the simple fact that RamaLama is written in Python assures it will never be that easy; this is even more true with LLM stuff when using an AMD GPU.

I know there will be people who disagree with this, and that's ok. This is my personal experience with Python in general, and it's 10x worse when I need to figure out all the compatible packages with specific ROCm support for my GPU. This is madness; even C and C++ setup and builds are easier than this Python hell.

RamaLama's use of Python is different: it appears to just be using Python for scripting its container management. It doesn't need ROCm to work with Python, and it has no difficult dependencies: I just installed it with `uv tool install ramalama` and it worked fine.

I'd agree that Python packaging is generally bad, and that within an LLM context it's a disastrous mess (especially for ROCm), but that doesn't appear to be how RamaLama is using it at all.
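For what it's worth, the whole setup was just this (a minimal sketch, assuming `uv` is already installed; the model name is only an example):

```sh
# install the RamaLama CLI as a standalone tool, no virtualenv to manage
uv tool install ramalama

# run a model; the accelerated bits run inside a container image it pulls
ramalama run deepseek-r1:1.5b
```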

  • @cge you have this right: the main Python script has no dependencies, it just uses the python3 stdlib. So if you have a python3 executable on your system you are good to go. All the stuff with dependencies runs in a container. On macOS, running with no containers works well too, as we basically just install llama.cpp via brew.

    There are really no major Python dependency problems; people have been running this on many Linux distros, macOS, etc.

    We deliberately don't use python libraries because of the packaging problems.

I gave RamaLama a shot today. I'm very impressed. `uvx ramalama run deepseek-r1:1.5b` just works™ for me. And that's saying A LOT, because I'm running Fedora Kinoite (the KDE spin of Silverblue) with nothing layered on the ostree. That means no ROCm or extra AMDGPU stuff on the base layer. Prior to this, I was running llamafile in a podman/toolbox container with ROCm installed inside. It looks like the container RamaLama is using has that stuff in there, and amdgpu_top tells me the GPU is cooking when I run a query.
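Roughly what that looked like (the model name is just the one I happened to try):

```sh
# run the CLI ad hoc with uvx, no install step on the base layer at all
uvx ramalama run deepseek-r1:1.5b

# in a second terminal, watch GPU utilisation while a query runs
amdgpu_top
```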

Side note: `uv` is a new package manager for Python that replaces pip, virtualenv, and more. It's quite good. https://github.com/astral-sh/uv
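For anyone who hasn't tried it, here's the rough mapping from the old tooling (from memory, so double-check against the uv docs):

```sh
uv venv                      # replaces python -m venv
uv pip install requests      # pip-compatible installs into that venv
uv tool install ramalama     # pipx-style install of a CLI tool
uvx ramalama --help          # run a tool ad hoc without installing it
```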

  • One of the main goals of RamaLama at the start was to be easy to install and run for Silverblue and Kinoite users (and funnily enough that machine had an AMD GPU, so we had almost identical setups). I quickly realized contributing to Ollama wasn't possible without being an Ollama employee:

    https://github.com/ollama/ollama/pulls/ericcurtin

    They merged a one-line change of mine, but you can't get any significant PRs in.

    • I just realized that ramalama is actually part of the whole Container Tools ecosystem (Podman, Buildah, etc). This is excellent! Thanks for doing this.

  • I'll try it then; if it can get a Docker setup using my GPU with no dependency hell, then good. I'll report back to correct myself once I try it.