I run a privacy-focused inference company, Synthetic [1], and I use our API of course :P I actually like GLM-4.5 enough that it's currently our default recommended model for new users. But yes, otherwise I'd most likely use the official Z.ai API, or Fireworks. GLM-4.5-Air is quite good for a local model, but GLM-4.5 is better; it's up to you whether the tradeoff is worth it — there's definitely value in the data never leaving your machine, but it's not going to be as strong a model.
1: https://synthetic.new
What makes your service especially privacy friendly?
I think if you're striving for full privacy, you should implement the secure-enclave idea presented by Ollama: it makes the entire pipeline fully encrypted. I'm waiting for an actual provider to finally implement this.
https://ollama.com/blog/secureminions
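To make the "fully encrypted pipeline" idea concrete on the client side, here's a toy Python sketch (stdlib only). This is NOT real cryptography and has nothing to do with enclave attestation — a real implementation would use an AEAD cipher (e.g. AES-GCM) and verify the enclave's attestation report before releasing the key — it just illustrates the shape: the prompt is encrypted before it ever leaves the machine, and only the holder of the key can read it.

```python
import hashlib
import hmac
import os

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy pseudorandom keystream: HMAC-SHA256 run in counter mode.
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    # Prepend a fresh random nonce so the same prompt never encrypts
    # to the same ciphertext twice.
    nonce = os.urandom(16)
    ks = keystream(key, nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, ks))

def decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, ciphertext = blob[:16], blob[16:]
    ks = keystream(key, nonce, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, ks))

key = os.urandom(32)
msg = b"summarize my medical records"
assert decrypt(key, encrypt(key, msg)) == msg
```

The hard part a provider has to solve isn't this client-side step; it's proving (via attestation) that the decryption key only ever exists inside the enclave doing inference.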
We don't store prompts or completions for the API (our privacy policy says "for longer than 14 days," as mentioned elsewhere in this thread — we don't actually store them at all, but the 14-day legal guarantee gives us a little time to catch and revert an accidentally committed log statement without being in breach of policy). And we don't train on your data, even for messages in the UI: we only store UI messages so you can view your message history, not for training.
Compared to using — for example — DeepSeek from deepseek.com, I think we're much more private. Even compared to using OpenAI and opting-out of your data being used for training, we're still more private, since OpenAI makes no guarantees for individuals that they don't store the data — notably, any data ever sent to them is apparently now being shared with New York courts (and the New York Times!) due to their ongoing legal battle with the New York Times [1]. And compared to using OpenRouter with "data_collection: deny", we uh, actually work :P Surprisingly sad how many broken model implementations there are if you're just round-robin-ing between inference companies... Especially reasoning models, and especially with tool-calling.
(And if something's broken, you can email us and we'll generally fix it; OpenRouter doesn't actually host any models themselves, so there's not much they can do if one isn't working well other than just de-list.)
1: https://arstechnica.com/tech-policy/2025/07/nyt-to-start-sea...
You support logprobs, that's wonderful! Fireworks, Synthetic, (ik_)llama.cpp, now I have a quorum.
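For anyone who hasn't used them: with an OpenAI-compatible API, logprobs are just extra fields on the completion request. A minimal sketch of the request body (the model name and message are placeholders — check your provider's docs for which models actually return logprobs):

```python
import json

# Hypothetical OpenAI-compatible chat completion request asking for
# per-token log probabilities, with the top 5 alternatives per position.
payload = {
    "model": "placeholder-model",
    "messages": [{"role": "user", "content": "Hello"}],
    "logprobs": True,
    "top_logprobs": 5,
}

body = json.dumps(payload)
```

The response then carries a `logprobs` object per choice, which is what makes quorum-style comparisons across providers (Fireworks, Synthetic, llama.cpp) possible.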
I'm curious: for your service, if it's centered around privacy, why is the data stored for 14 days at all? My understanding with Fireworks is that it's zero logging — nothing to store. To me that's private.
Great question! We actually don't store prompts or completions for the API at all — but legally we say 14 days so that if someone accidentally commits a log statement, we're not in breach as long as we catch it quickly and revert.
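As an illustration of the "accidental log statement" failure mode, a common guard is a logging filter that scrubs prompt-like fields before anything reaches a handler. A hedged sketch in Python — the field names are made up, and this is not Synthetic's actual setup:

```python
import logging

# Hypothetical names of request fields that must never be logged.
SENSITIVE_KEYS = {"prompt", "completion", "messages"}

class RedactPrompts(logging.Filter):
    # Replace sensitive keys in dict-style log arguments with a placeholder.
    def filter(self, record: logging.LogRecord) -> bool:
        if isinstance(record.args, dict):
            record.args = {
                k: ("[REDACTED]" if k in SENSITIVE_KEYS else v)
                for k, v in record.args.items()
            }
        return True  # keep the record, just scrubbed

logger = logging.getLogger("api")
logger.addHandler(logging.StreamHandler())
logger.addFilter(RedactPrompts())

# An "accidentally committed" log statement: the filter catches it.
logger.warning("request received: %(prompt)s", {"prompt": "secret text"})
# logs "request received: [REDACTED]"
```

A belt-and-suspenders layer like this doesn't replace the policy guarantee, but it narrows the window the 14-day clause exists to cover.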
Amazing! So I’m assuming that because it’s privacy focused, you accept payment in cryptocurrencies like Monero and Zcash?
We accept USDC and USDP crypto payments via Stripe. We don't currently support Monero or Zcash — right now all our payments are via Stripe since it simplifies security + compliance for us. It would be a pretty neat feature to build though.
Not OP. Chutes.ai charges $0.20 per 1M tokens. I don't think it uses caching, though, because I ended up burning $30 in an hour or two. I had to move back to Claude Code.
Caching makes price comparisons hard. Does anyone have tips?
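One way to compare apples to apples is to fold your expected cache hit rate into an effective per-token price. A sketch with made-up rates (check each provider's actual pricing — the numbers below are purely illustrative):

```python
def effective_input_price(base: float, cached: float, hit_rate: float) -> float:
    """Blended $/1M input tokens, given the fraction of input served from cache."""
    return hit_rate * cached + (1.0 - hit_rate) * base

# Hypothetical rates: $3.00/1M uncached, $0.30/1M for cache hits, and an
# agent loop that re-sends ~80% of its context on every turn.
blended = effective_input_price(base=3.00, cached=0.30, hit_rate=0.80)
print(f"${blended:.2f} per 1M input tokens")  # $0.84
```

At a high hit rate, a nominally expensive provider with cache discounts can come out cheaper than a flat-rate one, which is consistent with a cheap uncached provider still burning money fast in an agent loop.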