Show HN: Any-LLM – Lightweight router to access any LLM Provider

14 hours ago (github.com)

We built any-llm because we needed a lightweight router for LLM providers with minimal overhead. Switching between models is just a string change: update "openai/gpt-4" to "anthropic/claude-3" and you're done.

It uses official provider SDKs when available, which helps since providers handle their own compatibility updates. No proxy or gateway service needed either, so getting started is pretty straightforward - just pip install and import.
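
Here's a minimal sketch of the pattern (simplified; check the README for the exact package name and function signature):

  # Simplified sketch of the usage pattern described above; see the repo
  # README for the exact package name and API.
  from any_llm import completion

  messages = [{"role": "user", "content": "Say hello in five words."}]

  # Call OpenAI...
  response = completion(model="openai/gpt-4", messages=messages)

  # ...then switch providers by changing only the model string.
  response = completion(model="anthropic/claude-3", messages=messages)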

Currently supports 20+ providers including OpenAI, Anthropic, Google, Mistral, and AWS Bedrock. Would love to hear what you think!

> LiteLLM: While popular, it reimplements provider interfaces rather than leveraging official SDKs, which can lead to compatibility issues and unexpected behavior modifications

with no vested interest in litellm, i'll challenge you on this one. what compatibility issues have come up? (i expect text to have the least, and probably voice etc have more but for text i've had no issues)

you -want- to reimplement interfaces because you have to normalize APIs. in fact, without looking at the any-llm code deeply, i question how you do ANY router without reimplementing interfaces. that's basically the whole job of the router.

  • Both approaches work well for standard text completion. Issues tend to be around edge cases like streaming behavior, timeout handling, or new features rolling out.

    You're absolutely right that any router reimplements interfaces for normalization. The difference is what layer we reimplement at. We use SDKs where available for HTTP/auth/retries and reimplement normalization.

    Bottom line is we both reimplement interfaces, just at different layers. Our bet on SDKs is mostly about maintenance preferences, not some fundamental flaw in LiteLLM's approach.
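
    To make the layering concrete, here's a heavily simplified, illustrative sketch (not our actual code): the official SDK handles transport, auth, and retries, while the router only parses the model string and normalizes the response.

      # Illustrative sketch only, not the actual any-llm code: official SDKs
      # do HTTP/auth/retries, the router only dispatches and normalizes.
      def complete(model: str, messages: list[dict]) -> str:
          provider, _, model_name = model.partition("/")

          if provider == "openai":
              from openai import OpenAI  # official SDK
              resp = OpenAI().chat.completions.create(model=model_name, messages=messages)
              return resp.choices[0].message.content

          if provider == "anthropic":
              import anthropic  # official SDK
              resp = anthropic.Anthropic().messages.create(
                  model=model_name, max_tokens=1024, messages=messages
              )
              return resp.content[0].text

          raise ValueError(f"Unsupported provider: {provider}")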

  • Yeah, official SDKs are sometimes a problem too. Together's SDK included Apache Arrow, a ~60MB dependency, for a single feature (I patched it to make that dependency optional). If they ever pin dependency versions, it could conflict with your project.

    I'd rather have a library that just uses OpenAPI/REST than one that pulls in a ton of dependencies.

  • LiteLLM is quite battle tested at this point as well.

    > it reimplements provider interfaces rather than leveraging official SDKs, which can lead to compatibility issues and unexpected behavior modifications

    Leveraging official SDKs also doesn't solve compatibility issues: any_llm still needs to maintain compatibility with those official SDKs. I don't think one way is clearly better than the other here.

    • That's true. We traded API compatibility work for SDK compatibility work. Our bet is that providers are better at maintaining their own SDKs than we are at reimplementing their APIs. SDKs break less often and more predictably than APIs, plus we get provider-implemented features (retries, auth refresh, etc) "for free." Not zero maintenance, but definitely less. We use this in production at Mozilla.ai, so it'll stay actively maintained.

  • I use litellm as my personal AI gateway, and from a user's point of view there is no difference whether the proxy uses official SDKs or not; that might be a benefit for the proxy's developers.

    but I can give you one example: litellm recently had an issue with handling DeepSeek reasoning. they broke the implementation, and for a while reasoning was missing from both sync and streaming responses.

I'm excited to see this. Have been using LiteLLM, but it's honestly a huge mess once you peek under the hood, and it's being developed very iteratively and not very carefully. For example, for several months recently (I haven't checked in ~a month, though), their Ollama structured outputs were just straight-up broken. The docs are a hot mess, etc.

How does this differ from this project? https://github.com/simonw/llm

  • From my understanding of Simon's project, it only supports OpenAI and OpenAI-compatible models, in addition to local models. For example, if I wanted to use a model on Amazon Bedrock, I'd first have to deploy (and manage) a gateway/proxy layer[1] to make it OpenAI-compatible.

    Mozilla's project, much like LiteLLM, already boasts a lot of provider interfaces, which has the benefit of being able to directly use a wider range of supported models.

    > No Proxy or Gateway server required so you don't need to deal with setting up any other service to talk to whichever LLM provider you need.

    As for how it compares to LiteLLM, I don't have enough experience with either to tell.

    [1] https://github.com/aws-samples/bedrock-access-gateway

This looks awesome.

Why Python? Probably because most of the SDKs are Python, but something that could be ported across languages without requiring an interpreter would have been really amazing.

Crazy timing!

I shipped a similar abstraction for llms a bit over a week ago:

https://github.com/omarkamali/borgllm

pip install borgllm

I focused on making it LangChain-compatible so you could drop it in as a replacement. And it offers virtual providers for automatic fallback when you reach rate limits and so on.

There's LiteLLM, OpenRouter, Arch (although that's an edge/service proxy for agents), and now this. We all need a new problem to solve.

I use LiteLLM Proxy, even in a dev environment via Docker, because the Usage and Logs features provide great visibility into LLM usage, and the caching functionality helps reduce costs for repetitive testing.

What is mozilla-ai?

Seems like reputation parasitism.

  • Common question, thanks for asking! We’re a public benefit corporation focused on democratizing access to AI tech, on enabling non-AI experts to benefit from and control their own AI tools, and on empowering the open source AI ecosystem. Our majority shareholder is the Mozilla Foundation - the other shareholders being our employees, soon :). As access to knowledge and people shifts due to AI, we’re working to make sure people retain choice, ownership, privacy, and dignity.

    We're very small compared to the Mozilla mothership, but moving quickly to support open source AI in any way we can.

  • It is an official Mozilla Foundation subsidiary. Their website is here: https://www.mozilla.ai/

    • Interesting. I made my comment after visiting their repo and website. Didn't see a pixel's worth of the Mozilla brand there, hence my comment.

      On a second visit I noticed a link to mozilla.org in their footer.

      Still doesn't ring official to me, as a veteran Mozilla user (Netscape, MDN, Firefox), but ok, thanks for the explanation.

a proxy means you offload observability, filtering, caching rules, and global rate limiting to a specialized piece of software - pushing this into application code means you _cannot_ do things centrally, and it doesn't scale as more copies of your application code get deployed. You can bounce a single proxy server neatly vs. updating a fleet of application servers just to monkey patch some proxy functionality.

  • Good points! any-llm handles the LLM routing, but you can still put it behind your own proxy for centralized control. We just don't force that architectural decision on you. Think of it as composable: use any-llm for provider switching, add nginx/envoy/whatever for rate limiting if you need it.

    • How do I put this behind a proxy? You mean run the module as a containerized service?

      But provider switching is built into some of these. And the folks behind Envoy built https://github.com/katanemo/archgw - developers can use an OpenAI client to call any model, it offers preference-aligned intelligent routing to LLMs based on usage scenarios that developers define, and it acts as an edge proxy too.

  • You can do all of that without a proxy. Just store the current state in your database or a Redis instance.
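
    For example, a global rate limiter shared by every app instance is only a few lines with Redis (rough fixed-window sketch; the key name and limit are made up for illustration):

      # Rough sketch: fixed-window rate limit shared across all app servers.
      # Key name and limit are illustrative.
      import time
      import redis

      r = redis.Redis(host="localhost", port=6379)

      def allow_llm_call(limit_per_minute: int = 60) -> bool:
          window = int(time.time() // 60)
          key = f"llm:rate:{window}"
          count = r.incr(key)          # atomic increment, visible to every instance
          if count == 1:
              r.expire(key, 120)       # clean up old windows
          return count <= limit_per_minute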

    • And managed from among the application servers that are all greedily trying to store/retrieve this state? Not to mention you'll have to be in the business of defining, updating, and managing the schema, ensuring that upgrades to the DB don't break the application servers, etc. The proxy server is the right design decision if you're truly trying to build something production-worthy that you want to scale.

This is awesome, will give it a try tonight.

I’ve been looking for something a bit different though, related to Ollama: a load-balancing reverse proxy that queues requests across multiple Ollama servers and only sends a request when an Ollama server is up and idle (not processing). Does anything like that exist?

In truth, it wasn’t that hard for me to ask Claude Code to just implement the text completion API, so routing wasn’t much of a problem.

Interesting timing. Projects like Any-LLM or LiteLLM solve backend routing well but still involve server-side code. I’ve been tackling this from a different angle with Airbolt [1], which completely abstracts backend setup. Curious how others see the trade-offs between routing-focused tools and fully hosted backends like this.

[1] https://github.com/Airbolt-AI/airbolt

  • (retracted after GP edited their comment)

    • I didn’t intend my original comment to be overly promotional or irrelevant. I'm genuinely curious about the tradeoffs between different LLM API routing solutions, especially as a consumer.

    • don't you post links to your own stuff all the time? i don't think their comment was out of line.