Comment by BeetleB
9 hours ago
> If they care about privacy, they can rent cloud instances in order to setup, run, close and it will be both cheaper, faster (if they can afford it) but also with no upfront cost per project. This can be done with a lot of scaffolding, e.g. Mistral, HuggingFace, or not, e.g. AWS/Azure/GoogleCloud, etc.
I'm a somewhat tech heavy guy (compiles my own kernel, uses online hosting, etc).
Reading your comment doesn't sound appealing at all. I do almost no cloud stuff. I don't know which provider to choose. I have to compare costs. How can I trust they won't peek at my data (no, a Privacy Policy is not enough - I'd need encryption with only me having the key). What do I do if they suddenly jack up the rates or go out of business? I suddenly need a backup strategy as well. And repeat the whole painful loop.
I'll lose a lot more time figuring this out than with a Mac Studio. I'll probably lose money too. I'll rent from one provider, get stuck, and having a busy life, sit on it a month or two before I find a fix (paying money for nothing). At least if I use the Mac Studio as my primary machine, I don't have to worry about money going to waste because I'm actually utilizing it.
And chances are, a lot of the data I'll use it with (e.g. mail) is sitting on the same machine anyway. Getting something on the cloud to work with it is yet-another-pain.
To your second issue/question, all the cloud provide CMEK services/features (for many years now).
> suddenly jack up the rates or go out of business?
There is basically no lock-in, you don't even "move" your image, your data is basically some "context" or a history of prompts which probably fits in a floppy disk (not even being sarcastic) so if you know the basic about containerization (Docker, podman, etc) which most likely the cloud provider even takes care of, then it takes literally minutes to switch from one to another. It's really not more complex that setting up a PHP server, the only difference is the hardware you run on and that's basically a dropdown button on a Web interface (if you don't want to have scripts for that too) then selecting the right image (basically NVIDIA support).
Consequently even if that were to happen (which I have NEVER seen! at worst it's like 15% increase after years) then it would actually not matter to you. It's also very unlikely to happen based of the investment poured into the "industry". Basically everybody is trying to get "you" as a customer to rely on their stack.
... but OK, let's imagine that's not appealing to you, have you not done the comparison of what a Mac Studio (or whatever hardware) could actually buy otherwise?
Ok. I think I misunderstood. So the idea is to simple set up the LLM service on the server and access it with an API like I would with any LLM provider? This way whatever application I want to use it for stays at home?
That's a bit more appealing. How much would it cost per month to have it continually online?
Well it depends entirely on what you need. You can even do the training yourself on that infrastructure to rent if you want. The more you do yourself, the more private but also the more expensive it will be.
I don't want to make an ad here but I'm going to point to HuggingFace https://endpoints.huggingface.co (and to avoid singling them out just https://replicate.com/pricing too but I don't know them well) as an example with pricing.
The "beauty" IMHO of such solutions is that again you pay for what you want. If you want to use the endpoint only for 5min to test that the model and its API fits your need? OK. You want the whole month? Sure. You want 1 user, namely you? Fine, not a lot of power, you want your whole organization to use that endpoint? Scale up.
I'm going to give very rough approximation because honestly I'm not really into this so someone please adjust with source :
Apple Mac Studio M3 Ultra 96GB = $4K
~NVIDIA A100 with 80G ~ 10x perf compared to M3 Pro (obviously depends on models)
So on Replicate today a one can get an A100 for ~$5/hr which is ... about a month. But that's for 10x speed and electricity included. So very VERY approximately if you use a Mac Studio for 10 months on AI non stop (days and night) then it's arguably worth it.
If you use it less, say 2hrs/day only for inference, then I imagine it takes few years to have the equivalent and by that time I bet Replicate or HuggingFace is going to rent much faster setup for much cheaper simply because that's what they have ALL done for the last few years.
2 replies →