Comment by narrator
3 days ago
People are already patching these models using abliteration to prevent them from refusing any request, so it is possible for end users to change them in meaningful ways. You can download abliterated models right now from Hugging Face that will respond to all kinds of requests that frontier models refuse.
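For context, abliteration works by finding a "refusal direction" in activation space and projecting it out of the model's weight matrices. A minimal numpy sketch of just the projection step (the matrix, direction, and dimensions here are made up for illustration; real abliteration estimates the direction from the difference in mean activations between refused and answered prompts):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: a small weight matrix and a unit "refusal direction".
d_model = 8
W = rng.standard_normal((d_model, d_model))   # e.g. an output projection
refusal_dir = rng.standard_normal(d_model)
refusal_dir /= np.linalg.norm(refusal_dir)    # normalize to a unit vector

# Orthogonalize W against the direction: W' = (I - r r^T) W, so nothing
# this layer writes to the residual stream has a component along r.
W_abliterated = W - np.outer(refusal_dir, refusal_dir) @ W

x = rng.standard_normal(d_model)
out = W_abliterated @ x
print(abs(out @ refusal_dir))  # ~0: the refusal component is projected away
```

This also illustrates the parent's point: the edit is a blunt linear projection applied to opaque weights, so verifying it removed only the intended behavior is the hard part.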
The problem is you can't reverse engineer what was baked into the weights, because they are just weights. You'll never know if you've fixed everything, because it's not always going to be as obvious as request refusal. It's also not binary: you can never fully confirm that something is fixed, or that you haven't accidentally affected something else.
They're for sure impressive, but I don't see how anyone can push them as "open" when they are literally binary blobs. Worse, it's not practical for anyone to actually train LLMs that can even come close to competing with the ones corporations are pumping out.
Yup, there are a ton of people on HN sleeping on this new tech because they refuse to look at anything AI. We now have jailbroken models, but the average person on here doesn't even know how to download and try a model.
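To be fair, trying one can be a short exercise. A minimal sketch using llama.cpp (the repo and file names below are illustrative; any small quantized GGUF on Hugging Face works the same way, and the first run downloads a few GB):

```shell
# Download a 4-bit quantized model from Hugging Face and prompt it.
# Assumes llama.cpp is built or installed so llama-cli is on PATH.
llama-cli \
  --hf-repo TheBloke/Mistral-7B-Instruct-v0.2-GGUF \
  --hf-file mistral-7b-instruct-v0.2.Q4_K_M.gguf \
  -p "Say hello in five words."
```

Whether it runs acceptably still depends on hardware, which is the next commenter's complaint.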
It doesn't help that the guides I've seen have been pretty handwavy, or not specific enough to the individual situation ("I have Z hardware, here's how it's done"). It also doesn't help that every post I see on HN is like "oh wow, I did X on a Mac mini with 128GB of RAM." That spec is beyond many people. Running on generally available resources (such as hardware one might have lying around the house) doesn't seem fit for purpose, so it's back to building a new machine (good luck when RAM is worth twice its weight in gold), buying a $1000+ Mac mini, or some other device. Any low-end system can't turn out tokens fast enough, or doesn't have the resources for context or processing.
Local AI is not ready, and if you think it is, prove me wrong with a detailed guide for commodity hardware, with complete setup steps, that can use a decently sized model.
I spent 2 weeks trying to get anything running on an 8GB RX550XT, 12GB of RAM, and an 8-core CPU. I even tried turboquant to lower memory utilization and still couldn't get a 3B or 4B model loaded, and anything smaller won't suit my needs (3B/4B are even pushing it).
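For what it's worth, back-of-the-envelope memory math (a sketch; the bits-per-weight, layer count, and model width below are rough assumptions, and real runtimes add overhead for activations and compute buffers) suggests a 4-bit 3B model fits comfortably in 8 GB, so on older AMD cards the blocker is often backend support, such as ROCm not covering Polaris-era GPUs, rather than raw capacity:

```python
# Rough VRAM estimate for a 3B-parameter model at 4-bit quantization.
params = 3e9
bits_per_weight = 4.5            # ~Q4_K_M average, including scale factors
weights_gb = params * bits_per_weight / 8 / 1e9

# KV cache, assuming ~28 layers, d_model ~3072, 4096-token context, fp16,
# and full multi-head attention (GQA models would need less).
layers, d_model, ctx, bytes_per = 28, 3072, 4096, 2
kv_gb = 2 * layers * d_model * ctx * bytes_per / 1e9  # factor 2: K and V

print(f"weights ~ {weights_gb:.2f} GB, KV cache ~ {kv_gb:.2f} GB")
```

On paper that's roughly 3 GB total, well under the 8 GB card above.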
When Stallman was getting started writing Emacs in the early 80s, Unix machines were vastly out of reach price-wise for the common home user, but he did his open source work anyway, and eventually the 386 came along.
"Local AI is not ready" > proceeds to run a 7 year old budget GPU
You're like the kid showing up to a test without a pencil.
It's ridiculous for you to suggest that an advanced AI model needs to run on a budget 7-year-old graphics card that is already out of date even for today's gaming. My parents spent $2500 on a computer in 1995, and that was a 166MHz Pentium 1. If they spent that money today it would be $5261. Think of what you can get for that amount of money. Then you're over here saying a budget graphics card needs to somehow compete with the bleeding edge of computer innovation.
You do, in fact, need to spend money on appropriate gear if you expect to participate.
2 replies →
TBH I never understood people trying to run LLMs locally. Just rent a powerful machine in the cloud for a few hours. It's cheap enough, because you don't need to own the hardware. It doesn't introduce a dependency, because there are hundreds of hosters. It doesn't compromise your data, because nobody is going to extract data from your VM, not unless you're under investigation anyway, and even in that case just use a different jurisdiction.
Spending a humongous amount of money to get a machine that'll feel obsolete in 2 years? I don't know.