← Back to context

Comment by therealpygon

12 hours ago

I agree with you in certain circumstances, but not really for internal user inference. OpenRouter is great if you need to maintain uptime, but for basic usage (chat/coding/self-agents) you can do all of what you mentioned and more with a LiteLLM instance. The number of companies that send a bill is rarely a concern when it comes to “is work getting done”, but I agree with you that minimizing user friction is best.

For general use, I personally don’t see much justification as to why I would want to pay a per-token fee just to not create a few accounts with my trusted providers and add them to an instance for users. It is transparent to users beyond them having a single internal API key (or multiple if you want to track specific app usage) for all the models they have access to, with limits and logging. They wouldn’t even need to know what provider is hosting the model and the underlying provider could be swapped without users knowing.

It is certainly easier to pay a fee per token on a small scale and not have to run an instance, so less technical users could definitely find advantage in just sticking with OpenRouter.

The two things I like about OpenRouter:

1. The LLM provider doesn't know it's you (unless you have personally identifiable information in your queries). If N people are accessing GPT-5.x using OpenRouter, OpenAI can't distinguish the people. It doesn't know if 1 person made all those requests, or N.

2. The ability to ensure your traffic is routed only to providers that claim not to log your inputs (not even for security purposes): https://openrouter.ai/docs/guides/routing/provider-selection...

It's been forever since I played with LiteLLM. Can I get these with it?

  • 1 - I can’t speak to whether that is the case with OpenRouter. However, I suspect that there is more than enough fingerprint and uniqueness inherent to the requests that an AI could probably do a fairly accurate job of reconstructing “possible” sources, even with such anonymity. The result is the same, all your information is still tied to OpenRouter in order to track the billing. That also ignores that OpenRouter is also privy to all that same information. In the end, it comes down to how much you trust your partners.

    As for LiteLLM, the company you would pay for inference is going to know it is “you” — the account — but LiteLLM would also have the same effect of appearing to be a single source to that provider.

    2 - well, you select the providers, so that’s pretty much on you? :-) basically, you are establishing accounts with the inference providers you trust. Bedrock has ZDR, SOC, HIPPA, etc available, even for token inference, as an example. Cost is higher without cache, but you can’t have true ZDR and Cache (that I know of), because a cache would have to be stored between requests. The closest you could get there is maybe a secure inference container but that piles on the cost. Still, plenty of providers with ZDR policies.

    LiteLLM is effectively just a proxy for whatever supported (or OpenAI, Anthropic, etc compatible api provider) you choose.

  • > It doesn't know if 1 person made all those requests, or N.

    FWIW this is highly unlikely to be true.

    It's true that the upstream provider won't know it's _you_ per se, but most LLM providers strongly encourage proxies like OpenRouter to distinguish between downstream clients for security and performance reasons.

    For example:

    - https://developers.openai.com/api/docs/guides/safety-best-pr...

    - https://developers.openai.com/api/docs/guides/prompt-caching...

    • Fair point. Would be good to hear from OpenRouter folks on how they handle the safety identifier.

      For prompt caching, they already say they permit it, and do not consider it "logging" (i.e. if you have zero retention turned on, it will still go to providers who do prompt caching).

      2 replies →

  • One additional major benefit of OpenRouter is that there is no rate limiting. This is the primary reason why we went with OpenRouter because of the tight rate limiting with the native providers.

    • I think it's more accurate to say that they switch providers when there is rate limiting.

      The underlying provider can still limit rates. What Openrouter provides is automatic switching between providers for the same model.

      (I could be wrong.)

      1 reply →

> The number of companies that send a bill is rarely a concern

Not true in any non startup where there is an actual finance department

A lot of inference providers for open models only accept prepaid payments, and managing multiple of those accounts is kind of cumbersome. I could limit myself to a smaller set of providers, but then I'm probably overpaying by more than the 5.5% fee

If you're only using flagship model providers then openrouter's value add is a lot more limited

  • The main thing about Openrouter is also that they take 100% of the risk in case of overcharges from the models, you have an actual hard cap.

    The minus is that context caching is only moderately working at best, rendering all savings nearly useless.

    • I haven't noticed any problems with large context requests through OR to e.g. Opus (other than the rate at which my budget gets spent!). Is this a performance thing?

Does OpenRouter perform better than LiteLLM on integration though? I found using Anthropic's models through a LiteLLM-laundered OpenAI-style API to perform noticably worse than using Anthropic's API directly. So I've scrapped considering LiteLLM as an option. It's also just a buggy mess from trying to use their MCP server. The errors it puts out are meaningless, and the UI behaves oddly even in the happy path (error message colored green with Success: prepended).

But if OpenRouter does better (even though it's the same sort of API layer) maybe it's worth it?