Comment by v3ss0n
1 year ago
Reminder: You don't need ollama, running llamacpp is as easy as ollama. Ollama is just a wrapper over llamacpp.
Llamacpp is great, but saying it is as easy to set up as Ollama just isn't true.
It actually is true. Running an OpenAI compatible server using llama.cpp is a one-liner.
Check out the docker option if you don’t want to build/install llama.cpp.
https://github.com/ggerganov/llama.cpp/tree/master/examples/...
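To make that concrete, here is a minimal sketch rather than the official docs: start the server with something like "./llama-server -m model.gguf --port 8080" (the binary name and flags depend on your llama.cpp version and build), and then any OpenAI-compatible client can talk to it, e.g. from Python:

# Sketch: a llama.cpp server started as above exposes an OpenAI-compatible
# endpoint, so the stock openai client works against it. The port and model
# path here are placeholder assumptions, not fixed values.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # local llama.cpp server
    api_key="not-needed",                 # the key is ignored locally
)

reply = client.chat.completions.create(
    model="local",  # llama.cpp serves whatever model it was launched with
    messages=[{"role": "user", "content": "what is 1+1?"}],
)
print(reply.choices[0].message.content)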
It took me several hours to get llama.cpp working as a server; it took me 2 minutes to get Ollama working.
Much like how I got into Linux via linux-on-vfat-on-msdos and wouldn't have gotten into Linux otherwise, Ollama got me into llama.cpp by making me understand what was possible.
Then again, I am Gen X and we are notoriously full of lead poisoning.
People should definitely be more aware of Llamacpp, but I don't want to undersell the value that Ollama adds here.
I'm a contributor / maintainer of Llamacpp, but even I'll use Ollama sometimes -- especially if I'm trying to get coworkers or friends up and running with LLMs. The Ollama devs have done a really fantastic job of packaging everything up into a neat and tidy deliverable.
Even simple things -- like not needing to understand different quantization techniques. What's the difference between Q4_k and Q5_1 and -- what the heck do I want? The Ollama devs don't let you choose -- they say: "You can have any quantization level you want as long as it's Q4_0" and be done with it. That level of simplicity is really good for a lot of people who are new to local LLMs.
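If you're curious what those labels actually trade off, here's a rough back-of-the-envelope sketch; the bits-per-weight numbers are approximate nominal figures, and real GGUF files add metadata and keep some tensors at higher precision, so treat the output as ballpark only:

# Rough size estimate for a 7B-parameter model at a few GGUF quant levels.
# Bits-per-weight values are approximate; the result is a ballpark figure.
PARAMS = 7e9

approx_bpw = {
    "F16":  16.0,  # unquantized half precision
    "Q8_0":  8.5,  # 8-bit blocks + per-block scale
    "Q5_1":  6.0,  # 5-bit blocks + per-block scale and min
    "Q4_0":  4.5,  # 4-bit blocks + per-block scale (the Ollama default)
}

for name, bpw in approx_bpw.items():
    print(f"{name:>5}: ~{PARAMS * bpw / 8 / 2**30:.1f} GiB")

That is roughly why a 7B Q4_0 file lands around 4 GB while the F16 original is closer to 13 GB.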
This is bad advice. Ollama may be “just a wrapper”, but it’s a wrapper that makes running local LLMs accessible to normal people outside the typical HN crowd, who don’t have the first clue what a Makefile is or which cuBLAS compiler settings they need.
Or they just don't want to bother. Ollama just works, and it got me up and running and trying different models much faster.
Ollama is easier to get working on my server with a container, simple as that
Ollama also exposes an HTTP API, on Linux at least, that can be used with open-webui. Maybe llama.cpp has this too, idk, but I use Ollama mostly through that API. It’s excellent for that.
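For anyone who hasn't poked at it, that API is plain HTTP on localhost:11434 by default; a minimal sketch of calling it directly (this assumes the ollama service is running and llama3 has been pulled):

# Sketch: calling Ollama's HTTP API directly. open-webui talks to this same
# endpoint; the port and model name here are the common defaults.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "what is 1+1?", "stream": False},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])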
I spent less than 5 seconds learning how to use ollama: I just entered "ollama run llama3" and it worked flawlessly.
I spent HOURS setting up llama.cpp from reading the docs and then following this guide (after trying and failing with other guides which turned out to be obsolete):
https://voorloopnul.com/blog/quantize-and-run-the-original-l...
Using llama.cpp, I asked the resulting model "what is 1+1" and got a never-ending stream of garbage. See below. So no, it is not anywhere near as easy to get started with llama.cpp.
--------------------------------
what is 1+1?") and then the next line would be ("What is 2+2?"), and so on.
How can I make sure that I am getting the correct answer in each case?
Here is the code that I am using:
\begin{code}
import random

def math_problem():
    num1 = random.randint(1, 10)
    num2 = random.randint(1, 10)
    problem = f"What is {num1}+{num2}? "
    return problem

def answer_checker(user_answer):
    num1 = random.randint(1, 10)
    num2 = random.randint(1, 10)
    correct_answer = num1 + num2
    return correct_answer == int(user_answer)

def main():
    print("Welcome to the math problem generator!")
    while True:
        problem = math_problem()
        user_answer = input(problem)
        if answer_checker(user_answer):
            print("Correct!")
        else:
            print("Incorrect. Try again!")

if __name__ == "__main__":
    main()
\end{code}
My problem is that in the `answer_checker` function, I am generating new random numbers `num1` and `num2` every time I want to check if the user's answer is correct. However, this means that the `answer_checker` function is not comparing the user's answer to the correct answer of the specific problem that was asked, but rather to the correct answer of a new random problem.
How can I fix this and ensure that the `answer_checker` function is comparing the user's answer to the correct answer of the specific problem that was asked?
Answer: To fix this....
--------------------------------