Comment by ninjahawk1

20 hours ago

The way to develop in this space seems to be to give away free stuff, get your name out there, then make everything proprietary. I hope they still continue releasing open weights. The day no one releases open weights is a sad day for humanity. Normal people won’t own their own compute if that ever happens.

I think that's an overgeneralization. We've seen all the American models be closed and proprietary from the start, while the non-American ones (especially the Chinese) have been open since the start. In fact they often go in the opposite direction: many Chinese models started off proprietary and were later opened up (like many of the larger Qwen models).

  • > We've seen all the American models be closed and proprietary from the start

    What about Gemma and Llama and gpt-oss, not to mention lots of smaller/specialized models from Nvidia and others?

    I would never argue that China isn't ahead in the open weights game, of course, but it's not like it's "all" American models by any stretch.

    • gpt-oss is good, but I haven't heard anything about an update. It seems like a one-and-done release to quiet the people complaining that OpenAI isn't open.

    • The more accurate version is that only Chinese companies (plus Facebook, briefly) really open-source their frontier models. The rest are non-frontier: either older or specialized for something.

    • It's all openwashing. All of the ones you listed have at some point expressed how important and valuable open weights and locally usable models are, and every single one of them has since increasingly pushed closed, proprietary, or cloud-only options.

      I'm annoyed at myself, because I hoped for and praised Chinese AI while it was opening up as Llama was closing, but Qwen looks to be running the same playbook here as Llama/Meta, Gemma/Google, and gpt-oss/OpenAI.

  • > We've seen all the American models be closed and proprietary from the start.

    Most*.

    OpenAI, contrary to popular belief, actually used to believe in open research and (more or less) open models. GPT-1 and GPT-2 were both model-plus-code releases (although GPT-2 was a "staged" release); GPT-3 ended up API-only.

    • That's fair but those days seem so long gone now.

      Also, the Chinese models aren't following the typical American SaaS playbook, which relies on free/cheap proprietary software for early growth. They are not just publishing their weights but also their code, and often even publishing papers in open-access journals to explicitly highlight what methods and advancements made their results possible.

      4 replies →

I think it is in the interest of chip makers to make sure we all get local models

  • I think they're in a win-win situation. Big AI companies would love to see local computing die in favour of the cloud, because they are well aware that the moment an open model appears that can run on non-ludicrous consumer hardware, they're screwed. In that situation Nvidia, AMD and the like would be the only ones profiting, though I'm not convinced they'd prefer going back to fighting for B2C while B2B is so much simpler for them.

    • If you want to run AI models at scale and with reasonably quick response times, there aren't many alternatives to datacenter hardware. Consumer hardware is great for repurposing existing "free" compute (including gaming PCs, pro workstations, etc. at the higher end) and as basic insurance against rug pulls from the big AI vendors, but increased scale will probably still bring very real benefits.

      4 replies →

    • At a consistent amount of usage, datacenters are at least an order of magnitude more hardware efficient. I'm sure Nvidia and AMD would be fine fighting for B2C if it meant volume would be 10+x.

      Now, given they can't satisfy current volume, they are forced to settle for just having crazy margins.

      3 replies →

    • There are also many Chinese AI-targeted GPU/NPU producers. You can get hold of some boards on taobao.com, and they are usable in some ways.

      No, Nvidia and AMD are not the only ones benefiting.

This is obviously a strategic move at a national level. Keep publishing competing free models to erode the moat western companies could have with their proprietary models. As long as the narrative serves China there will be no turn to proprietary models.

  • >This is obviously a strategic move at a national level.

    No, it isn't. That's the kind of thing said by people who've never worked in the Chinese software ecosystem. It's how the Chinese internet has worked for 20+ years. The Chinese market is so large, and competition so rabid, that every company throws as much free stuff at consumers as it can to gain users. Entrepreneurs don't think about "grand strategic moves at the national level" while flipping through their copies of the Art of War and Confucius lol

    • If this were true, they'd build services around those models and provide them for free, or vastly cheaper than the western competition. But that's not what they're doing. Instead they're giving away the entire model for free. And by the way, Qwen isn't built by some random entrepreneur trying to solve the cold-start problem, but by Alibaba, which is a fucking behemoth. And, unsurprisingly, none of these models answer uncomfortable questions about China's past. Because, sure, the first thing any entrepreneur would think of is protecting their government and its history. Happens all the time, no state interference here, move on.

      1 reply →

That has been a viable commercial strategy for most modern, funded businesses: capture market share at a loss, then, once your name is established, turn on the profit.

Always has been; it's literally SaaS. The slight difference is that the lowest-tier subscriptions at the frontier labs are basically free trials nowadays, too.

I'm a little more optimistic than that. I suspect that the open-weight models we already have are going to be enough to support incremental development of new ones, using reasonably-accessible levels of compute.

The idea that every new foundation model needs to be pretrained from scratch, using warehouses of GPUs to crunch the same 50 terabytes of data from the same original dumps of Common Crawl and various Russian pirate sites, is hard to justify on an intuitive basis. I think the hard work has already been done. We just don't know how to leverage it properly yet.

  • Change layer size and you have to retrain. Change number of layers and you have to retrain. Change tokenization and you have to retrain.

    • Hopefully we will find a way to make it so that minor changes don't require a full retrain. Training how to train, as a concept, comes to mind.

    • And yet the KL divergence after changing all that stuff remains remarkably similar between different models, regardless of the specific hyperparameters and block diagrams employed at pretraining time. Some choices are better, some worse, but they all succeed at the game of next-token prediction to a similar extent.

      To me, that suggests that transformer pretraining creates some underlying structure or geometry that hasn't yet been fully appreciated, and that may be more reusable than people think.

      Ultimately, I also doubt that the model weights are going to turn out to be all that important. Not compared to the toolchains as a whole.

      1 reply →

    • None of that is true, at least in theory. You can trivially change layer size simply by adding extra columns initialized to zero, effectively embedding the smaller network in a larger one. You can add layers in a similar way, and in fact LLMs are surprisingly robust to having layers added and removed; you can sometimes actually improve performance simply by duplicating some middle layers[0]. Tokenization is probably the hardest, but all the layers between the first and last just operate on embeddings; it's probably not impossible to retrain the embedding layers while preserving the middle parts.

      [0] https://news.ycombinator.com/item?id=47431671 https://news.ycombinator.com/item?id=47322887

      3 replies →
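      The zero-padding trick above can be sketched in a few lines of NumPy. This is a minimal illustration of the idea, not code from any real model; the `widen_linear` helper is hypothetical:

```python
import numpy as np

# Embed a trained (out_dim x in_dim) linear layer inside a larger
# (new_out x new_in) one by padding with zeros. On inputs that are
# zero-padded the same way, the widened layer reproduces the original
# outputs exactly; the new rows/columns start as dead weights that
# further training can grow into.
def widen_linear(W, b, new_out, new_in):
    out_dim, in_dim = W.shape
    W_big = np.zeros((new_out, new_in))
    W_big[:out_dim, :in_dim] = W   # old weights sit in the top-left corner
    b_big = np.zeros(new_out)
    b_big[:out_dim] = b
    return W_big, b_big

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))    # a small "trained" layer
b = rng.standard_normal(4)
x = rng.standard_normal(3)

W_big, b_big = widen_linear(W, b, new_out=8, new_in=6)
x_big = np.zeros(6)
x_big[:3] = x                      # zero-pad the input to match

# The widened layer agrees with the original on the first 4 outputs,
# and the new outputs are exactly zero before any further training.
assert np.allclose(W @ x + b, (W_big @ x_big + b_big)[:4])
```

      Duplicating a middle layer is similarly mechanical in residual architectures (copy a block's weights and insert it into the stack), which is presumably why the linked results find models tolerate it.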

  • I do not think it's Common Crawl anymore; it's Common Crawl++, using paid human experts to generate and verify new content, whether it's code or research.

    I believe the US is building this off the cost difference with other countries, using companies like Scale, Outlier, etc., while China has the internal population to do it itself.

The Chinese state wants the world using their models.

People think that Chinese AI labs are just super cool bros that love sharing for free.

They don't understand that it's just a state-sponsored venture meant to further entrench China in global supply and logistics. China's VCs are Chinese banks and a sprinkle of "private" money. Private in quotes, because technically it still belongs to the state anyway.

China doesn't have companies and government like the US. It just has government, and a thin veil of "company" that readily fools westerners.

  • As opposed to the US, which just has companies and a thin veil of “government”.

    • Also, many of these Chinese companies aren't just opening their weights. They are open-sourcing their code AND publishing detailed research papers alongside it to reveal how they accomplished what they accomplished.

      That's very different from the American SaaS model, which relies on free but proprietary software for early growth.

  • I'm not sure how local AI models are meant to "entrench China in global supply and logistics". The two areas have nothing to do with one another. You can easily run a Chinese open model on all-American hardware.

    • They are building a pipeline, and the goal is to get people in the door.

      If you forever stand at the entrance eating the free samples, that's fine; they don't care. Other people are going through the door, and you are still consuming what they feed you. That doesn't mean it's going to be bad or evil, but they are staking out their territory of control.

      1 reply →

  • I'm Aussie. Please explain to me: why should I care whether Chinese SOEs or US tech companies are winning? Neither has my best interests at heart.

  • Like with nuclear technology, it's not healthy for only one country to dominate AI. The cat is already out of the bag and many countries now have the ability to train and run models. Silicon Valley has bootstrapped this space. But it should be noted that they are using AI talent from all over the world, and it was sort of inevitable that this technology would get around. Lots of Chinese, Indian, Russian, and European people are involved.

    As for what comes next, it's probably going to be a bit of a race for who can do the most useful and valuable things the cheapest. If OpenAI and Anthropic don't make it, the technology will survive them. If they do, they'll be competing on quality and cost.

    As for state sponsorship, a lot of things are state-sponsored, including in the US. Silicon Valley has a rich history rooted in massive government funding programs; there's a great documentary on this, The Secret History of Silicon Valley. Not to mention that all the "cheap" gas currently powering data centers comes on the back of a long history of public funding being channeled into the oil and gas industry.

    • >As for state sponsorship, a lot of things are state sponsored.

      You can make any comparison you want if you use adjectives rather than values. I could say that cars use a massive amount of water (all those radiators!) to try to downplay agricultural water usage, but it would be blatantly disingenuous.

      SV is overwhelmingly private (actually, constitutionally private) money, to the point that you should disregard people saying otherwise, just as you would the people saying cars use massive amounts of water.

  • So an OPEN model that I can run on my own fucking hardware will entrench China in global supply and logistics how?

    Contrary: How will the closed, proprietary models from Anthropic, "Open"AI and Co. lead us all to freedom? Freedom of what exactly? Freedom of my money?

    At some point this "anti-communism" bullshit propaganda has to stop. And that moment was decades ago!

  • So what?

    I still prefer that over US total dominance.

    Let them fight it out.

    • Yeah, a lot of people are still living within the paradigm of tribalism: my team good, other team bad.

      But the events of the past decade or so have clearly demonstrated that there are no "good" actors.

      I personally couldn't care less who wins in the China vs US AI competition, both sides have a long list of pros and cons.

    • I'd get a bit informed about what exactly Chinese dominance entails. Ask a few Uyghurs, Cantonese Hong Kongers, or even Tibetans.

      Then decide ...

      4 replies →

  • Well, isn't this what the US and really any other power in the world has always done, since forever?

Why is it sad? These things are useless all around, along with the people who overuse them.

It would be a great day for humanity if people stopped glazing text autocomplete as revolutionary.