← Back to context

Comment by halJordan

19 days ago

You absolutely do not have to use a third party llm. You can point it to any openai/anthropic compatible endpoint. It can even be on localhost.

Ah true, missed that! Still a bit cumbersome & lazy imo, I'm a fan of just shipping with that capability out-of-the-box (Huggingface's Candle is fantastic for downloading/syncing/running models locally).

  • In local setup you still usually want to split machine that runs inference from client that uses it, there are often non trivial resources used like chromium, compilation, databases etc involved that you don’t want to pollute inference machine with.

  • Ah come on, lazy? As long as it works with the runtime you wanna use, instead of hardcoding their own solution, should work fine. If you want to use Candle and have to implement new architectures with it to be able to use it, you still can, just expose it over HTTP.

    • I think one of the major problems with the current incarnation of AI solutions is that they're extremely brittle and hacked-together. It's a fun exciting time, especially for us technical people, but normies just want stuff to "work."

      Even copy-pasting an API key is probably too much of a hurdle for regular folks, let alone running a local ollama server in a Docker container.

      4 replies →