Comment by oleg_antonyan
2 days ago
I call these AI tools "proprietary non-deterministic database of the free internet". They belong to American companies which can cut off your access if the American government doesn't like your country's government. They fed on the free internet that many of us grew up in, store it in human-unreadable form and sell you access to it. If some day Claude starts to spit out compiled binaries instead of code nobody will notice, and we'll essentially get a proprietary cloud-hosted compiler that most of the world depends on to build software. With built-in telemetry and backdoors and a clause in the license that allows a full takeover of your business if the provider wants it, ofc. It's a great shift from the internet we all know and love towards the new subscription-based access to the world's proprietary knowledge base. It's a perfect "mind control" tool as well - you don't need USAID, "free media" and stuff like that in other countries when all people there, including politicians, ask ChatGPT everything from the meaning of life to recipes for pancakes. Once you see these political and philosophical dimensions it's hard not to see how Claude Code running on my PC could turn into a weapon some day. But in blissful ignorance it's fun to use, and companies love it for the promise of replacing people. Amen
The argument that LLMs are "feeding me back free and open internet" seems to skip the most useful aspect of the tool.
I could never, as an individual, read, let alone synthesize and make decisions with, the amount of information on the internet. The LLM takes that free and open information and feeds me back novel information based on it. It gives me ideas, opinions, and hard data based on that information.
It's the most powerful information synthesizing tool in existence. I don't find the argument that "it's built on free information and sold to you" fair or plausible at all.
It's like saying you're free to make your own bottled water. Technically true, but in reality not.
I think you undervalue the contribution of internet-scale data to foundation modeling, and because LLMs can obsolete the content they required, I think it's fair to characterize it as theft. Obviously RL contributes a lot to capabilities, but the judgement that an LLM uses to 'synthesize information' is born from the training data. The scale of the data really is beyond intuition. books3, for example, would take 230 yrs of continuous reading.
I actually think the "proprietary non-deterministic database of the free internet" does a lot to characterize the capabilities and effects for a lot of people. Obviously coders are more in tune with how well agents can work, but that's also due more to the RL breakthroughs than foundation modeling.
As I understand it, RL makes foundation models stupider (less capable, not more) but better at following instructions.
Can you steal something that is free and openly available?
I just don't understand this argument. "Theft" feels like a nice, heavy, moral accusation to toss at those you're debating with, but the actual prerequisites for theft don't even exist in this situation.
It’s the solution to the second information problem. Hypertext arose from Bush’s Memex, and the information problem it offered to solve. Now, there is simply so much information available on the modern Memex that it is impossible to make any sense of it all. So, we now have LLMs. There are still some issues with them, but they’re good at what they do.
I have mixed emotions about LLMs and AI more generally. I fear the dystopia, hope for some marginal improvement in human life, and I genuinely enjoy playing around with local models. But, I think there may be near term harms that outweigh the gains. We shall see.
Nothing about the information it feeds you is novel. It's all stolen repetition of someone else's work.
Bizarre to say that. When I have it perform work on a bespoke code base on a niche videogame, in a less commonly used language, is that still "regurgitating stuff"?
No, it is impossible for it to have seen this combination of things.
It routinely produces, suggests, and correctly implements novel things that had not existed.
You can see this yourself by learning how LLMs work, or anecdotally using these tools.
That is simply not true. The naive “glorified auto-complete / stochastic parrot” argument may have some merit when applied to generic pre-trained models, which only learn from unsupervised next-token prediction. But the post-training through reinforcement learning that the frontier models undergo is very sophisticated, and they genuinely learn to do novel things that are purely the work of the model being trained (and the work of the GPUs they burn along the way, of course).
Thank god I bought the alphabet before learning it unlike one of those stealing heathens.
In your hate of AI please don't build the world in The Right to Read.
I'm certain I've read this comment before.
You forgot the push forward towards more destruction of the planet we depend on to live, and the centralization of wealth in addition to the one of power.
To OP's point, I am curious why a tech-forward crowd would consider AI training/inference anywhere close to a significant contributor of greenhouse gases? Datacenters are like a tiny blip on emissions plots [1]
I think AI is a convenient foil to get people whipped up and out to vote, but I know HN is not the forum for that. The technical data clearly says that closed-loop water coolers don't use that much water, and energy use is a function of a country's energy infrastructure choices, not the existence of demand.
But instead we're going straight to destruction of planet as the exact verbiage, which seems way out of whack.
[1]: https://www.wri.org/insights/4-charts-explain-greenhouse-gas...
They are currently, but at the scale AI corps are trying to deploy at, they'll be the largest user of electricity.
> Datacenters are like a tiny blip on emissions plots
Maybe in 2023, but what about now?
They basically undo all the reduce, reuse, recycle efforts we've been doing since the 90's
Yes, which is nothing compared to the destruction wrought by oil companies, steel manufacturing, meat production, concrete, energy, and transport.
Which have actually destroyed our environment for over a century. With only a slight bend towards slowing.
And I assure you, if you wanna argue "those are useful": not all of those things are useful, or as useful as they could be, or as efficient as they could be, and some could be replaced today with better options if the will was there.
Data centers are a boogeyman and only cared about by Americans and some Europeans. The other 6 billion people in the world really don't think they are bad, nor have such strong feelings towards AI.
And I bet they feel a lot more animosity towards the military industrial complex and the oil empire financing it destroying our planet.
The wealth comment is valid. The resource stuff is overblown. Look up the energy cost of AI vs ordering a burrito on DoorDash.
Interesting point! How about if instead of a burrito I order yakisoba or chicken tendies?
That would be a world where there is very little value in local models. I don't think that will be the case.
I'd say the "threat" of local models and user independence is currently (successfully) being fought by cutting off the supply and development of general computing devices and hardware.
The mentioned big few are buying up everything regardless of need and making hardware unaffordable and unavailable for normal people (or smaller businesses). And some of the few manufacturers are already being convinced to stop developing/producing consumer hardware altogether.
And what's left might be taken care of via the rise of attestation. Just start framing local, unapproved models as "security risks" at some point.
Forcing people to go through you by buying up the market supply of a commodity at 10x the production cost is a strategy that will inevitably collapse. More DRAM will get made if prices hold.
Creating and updating those local models on a regular basis still requires the massive datacenters, doesn't it? You're assuming that those local models will continue to be provided for free or a reasonable price instead of eventually being monetized.
This would make you anti-OpenAI, not anti-AI given the explosion of local models. Two different ideas.
> the internet we all know and love
Only accessible via proprietary ISPs that require a subscription, actively monitor and censor (pirate sites). And managed by ICANN which is based in America (and though nonprofit, certainly influenced by money and government).
Meanwhile, local models package a large part of the past internet, can be stored in your computer and accessed by the most ignorant (but literate) laymen.
ICANN is incredibly corrupt now. They make money selling gTLDs which all have to play by American rules. Having American rules on .org .com etc is an acceptable accident of history - having them on every generic word outside of .us is not.
Open weight models remove most of the issues you list and require relatively affordable hardware like a MBP with 128 GB of RAM or even less.
Deepseek v4 flash is by any measure comparable to SOTA from 6 months ago. It's more than good enough for AI-assisted coding and there are no reasons to believe that one year from now or so, they won't be even better and faster.
Open weight models are released by the same companies whose revenue would be threatened by open weights - they won't continue to undercut themselves by releasing free models once that happens.
128 GiB MacBook Pro is like $8k! Thankfully, you can run local models on a $1,000 Pixel 10 Pro, which is still a lot, but slightly less insane.
> the free internet
It's so much worse than this, though. These companies have raped and pillaged every last store of human knowledge: art, transcripts of conversations, newspaper articles, etc., etc. Contrast that with what Aaron Swartz did and wound up taking his life over. It's beyond shameful.
Despite how the first few trial balloon lawsuits have played out, I think there are still a few shoes left to drop. For example, try asking Gemini (very nicely) to output an image containing your favorite comic book or Disney character and it will eventually do it -- despite the response you'll probably get on the first try saying it's not legally allowed to. In what rational universe did the companies training these models not either violate the T&C of their access to the source material (e.g. streamed content) or violate copyright/trademark by generating content containing this IP?
You probably mean USAGM (US Agency for Global Media) and its affiliated programs (Voice of America, Radio Free Europe, etc) rather than USAID.
USAID was a humanitarian aid agency that focused on programs like famine relief, disaster response, and medical aid in some of the poorest countries on Earth.
It is how these agencies operate. They mask malevolent activity behind a good front (“think of the children”, “age verification” -> censorship/total surveillance. Medical aid -> overthrowing governments).
Four Horsemen of the Infocalypse is what you're speaking of. Not wrong about there being a facade, IMF trap you with wells, but this distinguishes the boogeymen used to give up civil liberties.
It's similar to enshittification drawing attention to the decay cycle. These get used constantly, a good example being "KYC is needed to stop money laundering".
https://en.wikipedia.org/wiki/Four_Horsemen_of_the_Infocalyp...
Any Internet service could cancel your account. This isn't unique to AI.
The internet is for everyone. I am proud for us for building something so awesome that we get to train an entire replica of human reasoning on top of it. It's sad that most of the "new" internet will be made by these machines, but that's cool nonetheless.
Yes, it's billion dollar companies building it, but every technical revolution needs large funding before it becomes accessible. Even the internet itself was way too expensive back in the day. Now we access it from fridges and toasters. Electric cars had to start as a luxury purchase, and so did phones or even CD players.
Now that we know which quantization is optimal, so that we can build optimized accelerators, and how to architect/harness LLMs for our purpose, we can start to reclaim it.
Especially now when LLM APIs are starting to get expensive.
Respectfully, this looks like coping with the fact that a fundamentally new technology has been discovered, but people can’t cope with the immensity of it so they end up throwing in shallow and spread-out criticisms spanning sovereignty, USA bad, replacing people bad etc.
It’s my personal opinion, and it looks like an extremely incurious analysis of what’s going on. Even if a person doesn’t like AI, I would expect a curious person to have deeper opinions. “Non deterministic database” clearly tells me this.
There’s not a single coherent critique but just throwing some polemic to see what sticks.
the grandparent commenter described potential future paths of unfettered ai usage. it's not clear why the expectation arises that a discussion of future possibilities must adhere to a contemporary argumentative format (mainly because concrete evidence of future events does not exist). it's your right to interpret this discussion as 'USA bad', personally I think it's very likely that the USA will continue to exercise power until it cannot. do you have any arguments that support ai being unleashed en masse? at the moment it seems to me that students are mentally atrophying as a result of outsourcing thinking to robots, therefore from my perspective it's crystal clear that the current path of 'let it rip' is suboptimal
The USA is terrifying from a non-US perspective now, and it's never been great. We're furious at the states and sick of being dependent on its enshittified technology.
Anti-AI sentiment absolutely and correctly has a "USA bad" streak.
"Respectfully, <insert highly patronizing, incurious, belittling take about everyone who disagrees with me, which is completely divorced from any of the actual argumentation on the anti-AI side>"
Pure tech dimensions of AI are out of scope. The internet is full of tech critiques and praises, what's the point of yet another opinion on it. Can't open LinkedIn without seeing only AI-generated posts about AI replacing people next to AI dumb. But the political dimension seems completely absent from the public discussion
It seems to be the opposite to me. There are mostly relatively fact-free discussions of AI which are highly political. And rarely you can find some technical discussions. And even then politics will invade those discussions, but ego and naive optimism tend to drive those discussions more. Even on technical matters. NNs are still very much a black box after all, and people tend to project what they want onto what they think is happening inside.
This is extremely inflammatory. It’s like you’re trying to start a fight. You’re actively degrading the quality of the discussion.