https://nitter.space/sama/status/1943837550369812814
Not sure if it's coincidental that OpenAI's open weights release got delayed right after an ostensibly excellent open weights model (Kimi K2) got released today.
https://moonshotai.github.io/Kimi-K2/
OpenAI know they need to raise the bar with their release. It can't be a middle-of-the-pack open weights model.
They might also be focusing all their work on beating Grok 4 now, since xAI has a significant edge in accumulating computing power and has opened a considerable gap in raw intelligence tests like ARC and HLE. OpenAI is in this to win the competitive race, not the open one.
> They might also be focusing all their work on beating Grok 4 now,
With half the key team members they had a month prior
Yet it suspiciously can't draw a pelican?
Btw why is there no k2 discussion on HN? Isn’t it pretty huge news?
There is, but it’s not on the front page so you don’t find it unless you go through multiple pages or manually search it up.
Moonshot AI has released banger models without much noise about them. Kimi K1.5, for example, was quite impressive at the time.
Probably because maybe 1 or 2 folks on here can run it? It's a 1,000B-parameter model; at 16-bit precision you'd need roughly 2,000 GB of GPU VRAM just to hold the weights. That's on the order of 60-80 RTX 5090s hooked up to the same machine, or maybe 10-20 of them to run it at Q2 quantization.
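For a rough sanity check of those numbers, here's a tiny back-of-envelope sketch in Python. It only counts weight storage and ignores KV cache, activations, and the fact that K2 is a MoE, so treat the results as loose estimates:

    # Back-of-envelope VRAM needed just to hold the weights of a ~1T-parameter model.
    # Ignores KV cache, activations and framework overhead.
    PARAMS = 1_000e9       # ~1,000B parameters (Kimi K2 scale)
    GPU_VRAM_GB = 32       # e.g. an RTX 5090

    def weights_gb(bits_per_param: float) -> float:
        return PARAMS * bits_per_param / 8 / 1e9

    # Quantized formats carry per-group scales, hence ~2.5/4.5 bits rather than 2/4.
    for name, bits in [("FP16/BF16", 16), ("Q4", 4.5), ("Q2", 2.5)]:
        gb = weights_gb(bits)
        print(f"{name:>9}: ~{gb:,.0f} GB of weights -> ~{gb / GPU_VRAM_GB:.0f}x 32 GB GPUs")

    # Roughly: FP16/BF16 ~2 TB (~60+ GPUs), Q4 ~560 GB (~18), Q2 ~310 GB (~10)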
Why don’t you start one?
Had to search for the discussion; it's here, but it seems like nobody noticed it and it only got a couple hundred upvotes.
Here: https://news.ycombinator.com/item?id=44533403
This could be it, especially since they announced last week that it would be the best open-source model.
Technically they were right when they said it, in their minds. Things are moving so fast that in a week, it will be true again.
Every OpenAI model since GPT-4 has been behind the curve by miles
Am I the only one who thinks mention of “safety tests” for LLMs is a marketing scheme? Cars, planes and elevators have safety tests. LLMs don’t. Nobody is going to die if an LLM gives an output that its creators do not like, yet when they say “safety tests”, they mean that they are checking to what extent the LLM will say things they do not like.
An LLM can trivially instruct someone to take medications with adverse interactions, steer a mental health crisis toward suicide, or make a compelling case that a particular ethnic group is the cause of your society's biggest problem so they should be eliminated. Words can't kill people, but words can definitely lead to deaths.
That's not even considering tool use!
Part of the problem is due to the marketing of LLMs as more capable and trustworthy than they really are.
And the safety testing actually makes this worse, because it leads people to trust that LLMs are less likely to give dangerous advice, when they could still do so.
This is analogous to saying a computer can be used to do bad things if it is loaded with the right software. Incidentally, people do load computers with the right software to do bad things, yet people are overwhelmingly opposed to measures that would stifle such things.
If you hook an LLM up to a chat interface, or add tool use, it is probable that it will eventually output something that it should not, and that output will cause a problem. Preventing that is an unsolved problem, just as preventing people from abusing computers is an unsolved problem.
PDFs can do this too.
> An LLM can trivially make a compelling case that a particular ethnic group is the cause of your society's biggest problem so they should be eliminated
This is an extraordinary claim.
I trust that the vast majority of people are good and would ignore such garbage.
Even assuming that an LLM can trivially build a compelling case that convinces someone who is not already murderous to go on a killing spree against a large group of people, one killer has a limited impact radius.
By contrast, many books and religious texts have vastly more influence and persuasive power over huge groups of people. And they have demonstrably caused widespread death or other harm. And yet we don’t censor or ban them.
Yes, and a table saw can take your hand. As can a whole variety of power tools. That does not render them illegal to sell to adults.
> An LLM can trivially instruct someone to take medications with adverse interactions,
What’s an example of such a medication that does not require a prescription?
Yeah, give it access to some bitcoin and the internet, and it can definitely cause deaths.
The problem is “safety” prevents users from using LLMs to meet their requirements.
We typically don’t critique the requirements of users, at least not when it comes to functionality.
The marketing angle is that this measure is needed because LLMs are “so powerful it would be unethical not to!”
AI marketers are continually emphasizing how powerful their software is. “Safety” reinforces this.
“Safety” also brings up many of the debates “mis/disinformation” brings up. Misinformation concerns consistently overestimate the power of social media.
I’d feel much better if “safety” focused on preventing unexpected behavior, rather than evaluating the motives of users.
The closed weights models from OpenAI already do these things though
does your CPU, your OS, your web browser come with ~~built-in censorship~~ safety filters too?
AI 'safety' is one of the most neurotic twitter-era nanny bullshit things in existence, blatantly obviously invented to regulate small competitors out of existence.
Books can do this too.
At the end of the day an LM is just a machine that talks. It might say silly things, bad things, nonsensical things, or even crazy insane things. But at the end of the day it just talks. Words don't kill.
LM safety is just a marketing gimmick.
Especially since "safety" in this context often just means making sure the model doesn't say things that might offend someone or create PR headaches.
Don’t draw pictures of celebrities.
Don’t discuss making drugs or bombs.
Don’t call yourself MechaHitler… which I don’t actually care about; that whole scenario was objectively funny in its sheer ridiculousness.
There's a bias that smart people have in underestimating the stupidity of those on the left side of the intelligence curve.
Not even that: children believe anything, and even more so when it comes from a computer designed to be "harmless".
I also think it's marketing but kind of for the opposite reason. Basically I don't think any of the current technology can be made safe.
Yes, perfection is difficult, but safety is relative. It can definitely be made much safer. Looking at analyses of pre- vs. post-alignment models makes this obvious, including when the raw unaligned models are compared to "uncensored" models.
It’s about safety for the LLM provider, not necessarily the user.
> Am I the only one who thinks mention of “safety tests” for LLMs is a marketing scheme?
It is. It is also part of Sam Altman’s whole thing about being the guy capable of harnessing the theurgical magicks of his chat bot without shattering the earth. He periodically goes on Twitter or a podcast or whatever and reminds everybody that he will yet again single-handedly save mankind. Dude acts like he’s Buffy the Vampire Slayer
I hope the same people questioning AI safety (which is reasonable) aren't also concerned about Grok after the recent incident.
You have to understand that a lot of people do care about these kinds of things.
At my company (which produces models), almost all the responsible-AI jazz is about DEI and banning naughty words. There's little action on preventing bad outcomes.
Why is your definition of safety so limited? Death isn't the only type of harm...
There are other forms of safety, but whether a digital parrot says something that people do not like is not a form of safety. They are abusing the term safety for marketing purposes.
You could be right about this being an excuse for some other reason, but lots of software has “safety tests” beyond life or death situations.
Most companies, for better or worse (I say for better) don’t want their new chatbot to be a RoboHitler, for example.
It is possible to turn any open weight model into that with fine tuning. It is likely possible to do that with closed weight models, even when there is no creator provided sandbox for fine tuning them, through clever prompting and trying over and over again. It is unfortunate, but there really is no avoiding that.
That said, I am happy to accept the term safety used in other places, but here it just seems like a marketing term. From my recollection, OpenAI made a push for regulation that would stifle competition by talking about these things as dangerous and in need of safety measures. Then they backtracked somewhat when they found the proposed regulations would restrict them rather than just their competitors. However, they are still pushing this safety narrative that was never really appropriate. They have a term for this: alignment. What they are doing are tests to verify alignment in areas they deem sensitive, so that they have a rough idea of the extent to which the outputs might contain things they do not like in those areas.
It's overblown. Elon shipped Hitler grok straight to prod
Nobody died
Playing devil's advocate, what if it was more subtle?
Prolonged use of conversational programs does reliably induce certain mental states in vulnerable populations. When ChatGPT got a bit too agreeable, that was enough for a man to kill himself in a psychotic episode [1]. I don't think this magnitude of delusion was possible with ELIZA, even if the fundamental effect remains the same.
Could this psychosis be politically weaponized by biasing the model to include certain elements in its responses? We know this rhetoric works: cults have been using love-bombing, apocalypticism, us-vs-them dynamics, assigned special missions, and isolation from external support systems to great success. What we haven't seen is what happens when everyone has a cult recruiter in their pocket, waiting for a critical moment to offer support.
ChatGPT has an estimated 800 million weekly active users [2]. How many of them would be vulnerable to indoctrination? About 3% of the general population has been involved in a cult [3], but that might be a reflection of conversion efficiency, not vulnerability. Even assuming 5% are vulnerable, that's still 40 million people ready to sacrifice their time, possessions, or even their lives in their delusion.
[1] https://www.rollingstone.com/culture/culture-features/chatgp...
[2] https://www.forbes.com/sites/martineparis/2025/04/12/chatgpt...
[3] https://www.peopleleavecults.com/post/statistics-on-cults
> Nobody is going to die
Callous. Software does have real impact on real people.
Ex: https://news.ycombinator.com/item?id=44531120
To be completely and utterly fair, I trust Deepseek and Qwen (Alibaba) more than American AI companies.
American AI companies have shown they are money and compute eaters, and massively so at that. Billions later, and well, not much to show.
But Deepseek cost $5M to develop, and made multiple novel ways to train.
Oh, and their models and code are all FLOSS. The US companies are closed. Basically, the US AI companies are too busy treating each other as vultures.
> But Deepseek cost $5M to develop, and made multiple novel ways to train
This is highly contested; depending on who you ask, it was either a big misunderstanding by everyone reporting it, or maliciously planted (by a quant company, right before NVDA and the rest fell a lot).
If we're being generous and assume no malicious intent (big if), anyone who has trained a big model can tell you that the cost of one run is meaningless in the grand scheme of things. There is a lot of cost in getting there: in the failed runs, in the subsequent runs, and so on. The fact that R2 isn't there after ~6 months should say a lot. Sometimes you get a great training run, but no one is looking at the failed ones and adding up that cost...
They were pretty explicit that this was only the GPU-hour cost, in USD, of the final run. Journalists and Twitter tech bros just saw an easy headline there. It's the same with Sandfall, the developer of Clair Obscur, where people say the game was made by 30 people when around 200 people were involved.
Actually the majority of Google models are open source, and Google was also pretty fundamental in pushing a lot of training techniques forward. Working in the AI space, I've read quite a few of their research papers, and I really appreciate what they've done to share their work and to release their models under licenses that allow commercial use.
"Actually the majority of Google models are open source"
That's not accurate. The Gemini family of models are all proprietary.
Google's Gemma models (which are some of the best available local models) are open weights but not technically OSI-compatible open source - they come with usage restrictions: https://ai.google.dev/gemma/terms
$5 million was the gpu hour cost of a single training run.
Exactly. Not to minimize DeepSeek's tremendous achievement, but that $5 million was just for the training run, not the GPUs they purchased beforehand, nor all the OpenAI API calls they likely used to assist with synthetic data generation.
> American AI companies have shown they are money and compute eaters
Don't forget they also quite literally eat books
Who is literally eating books?
Wasn’t that figure just the cost of the GPUs and nothing else?
Yeah, I hate that this figure keeps getting thrown around. IIRC, it's the price of 2048 H800s for 2 months at $2/hour/GPU. If you consider months to be 30 days, that's around $5.9M, which roughly lines up. What doesn't line up is ignoring the costs of facilities, salaries, non-cloud hardware, etc., which I'd expect to dominate. $100M seems like a fairer estimate, TBH. The original paper had more than a dozen authors, and DeepSeek had about 150 researchers working on R1, which supports the notion that personnel costs would likely dominate.
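To make that arithmetic concrete, here's the same estimate as a few lines of Python (the 2048-GPU, 2-month, $2/hour figures are the recollection above, not official numbers):

    # Sanity-check the "2048 H800s for 2 months at $2/GPU-hour" recollection.
    gpus = 2048
    days = 60                     # two 30-day months
    usd_per_gpu_hour = 2.0

    gpu_hours = gpus * days * 24  # ~2.95M GPU-hours
    cost = gpu_hours * usd_per_gpu_hour
    print(f"{gpu_hours/1e6:.2f}M GPU-hours -> ${cost/1e6:.1f}M")  # ~2.95M GPU-hours -> ~$5.9M

    # DeepSeek's V3 technical report cites ~2.788M H800 GPU-hours for the final
    # training run, i.e. ~$5.6M at $2/hour, so this lands in the same ballpark.
    # Either way it excludes salaries, failed runs, and the hardware itself.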
That is also just the final production run. How many experimental runs were performed before starting the final batch? It could be some ratio like 10 hours of research to every one hour of final training.
It was more than $5m
https://interestingengineering.com/culture/deepseeks-ai-trai...
> But Deepseek cost $5M to develop
Not true. It was $5M to train - it was many more millions in R&D.
Deepseek R1 was trained at least partially on the output of other LLMs. So, it might have been much more expensive if they needed to do it themselves from scratch.
Lawsuit, since it was against OpenAI TOS: https://hls.harvard.edu/today/deepseek-chatgpt-and-the-globa...
> Billions later, and well, not much to show.
This is obviously false, I'm curious why you included it.
> Oh, and their models and code are all FLOSS.
No?
Deepseek is far more worthy of the name OpenAI than Sam Altman's ClosedAI.
My hobby: monetizing cynicism.
I go on Polymarket and find things that would make me happy or optimistic about society and tech, and then bet a couple of dollars (of some shitcoin) against them.
e.g. OpenAI releasing an open weights model before September is trading at 81% at time of writing - https://polymarket.com/event/will-openai-release-an-open-sou...
Last month I was up about ten bucks because OpenAI wasn't open, the ceasefire wasn't a ceasefire, and the climate metrics got worse. You can't hedge away all the existential despair, but you can take the sting out of it.
> go on Polymarket and find things that would make me happy or optimistic about society and tech, and then bet a couple of dollars (of some shitcoin) against them.
Classic win win bet. Your bet wins -> you make money (win). Your bet loses -> something good happened for society (win).
My friend does this and calls it “hedging humanity”. Every time some big political event has happened that bums me out, he’s made a few hundred.
people still use crypto? I thought the hype died around the time when AI boomed.
People use crypto on Polymarket because it doesn't comply with gambling regulations, so in theory isn't allowed to have US customers. Using crypto as an intermediary lets Polymarket pretend not to know where the money is coming from. Though I think a more robust regulator would call them out on the large volume of betting on US politics on their platform...
Bitcoin is higher than ever. People can't wait until it gets high enough that they can sell it for dollars, and use those dollars to buy things and make investments in things that are valuable.
people use crypto for speculation, and for (semi)illegal purposes
only a small percentage of use is for actual legitimate money transfers
Unfortunately crypto hype is still high, and I think still on the up, but that's vibes, not market analysis.
"Gambling can be addictive. Please gamble responsibly. You must be 18 years or older to gamble. If you need help, please contact your local gambling advice group or your doctor"
Probably the results were worse than the K2 model released today. No serious engineer would say it's for "safety" reasons, given that ablation nullifies any safety post-training.
I'm expecting (and indeed hoping) that the open weights OpenAI model is a lot smaller than K2. K2 is 1 trillion parameters and almost a terabyte to download! There's no way I'm running that on my laptop.
I think the sweet spot for local models may be around the 20B size - that's Mistral Small 3.x and some of the Gemma 3 models. They're very capable and run in less than 32GB of RAM.
I really hope OpenAI put one out in that weight class, personally.
Early rumours (from a hosting company that apparently got early access) were that you'd need "multiple H100s to run it", so I doubt it's a Gemma / Mistral Small tier model.
You will get a 20GB model. Distillation is so compute-efficient that it’s all but inevitable: if not OpenAI, numerous other companies will do it.
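For readers unfamiliar with why distillation is cheap, here's a minimal sketch of the standard soft-label recipe (PyTorch-flavoured, hypothetical names, not anything OpenAI has described): the expensive teacher is frozen and only runs forward passes, while gradients flow through the much smaller student.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        """Soft-label KL loss: nudge the student's next-token distribution
        toward the temperature-softened teacher distribution."""
        t = temperature
        student_log_probs = F.log_softmax(student_logits / t, dim=-1)
        teacher_probs = F.softmax(teacher_logits / t, dim=-1)
        # batchmean KL, scaled by t^2 as is conventional for distillation
        return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * t * t

    # Training-loop sketch: the big teacher only does inference, so most of the
    # compute (and all of the backprop) goes into the small student.
    # with torch.no_grad():
    #     teacher_logits = teacher(input_ids).logits
    # student_logits = student(input_ids).logits
    # loss = distillation_loss(student_logits, teacher_logits)
    # loss.backward(); optimizer.step(); optimizer.zero_grad()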
I would rather have an open weights model that’s the best possible one I can run and fine tune myself, allowing me to exceed SOTA models on the narrower domain my customers care about.
It's worth remembering that the safety constraints can be successfully removed, as demonstrated by uncensored fine-tunes of Llama.
Pointless security theatre. The community worked out long ago how to strip away any safeguards.
Whenever I read something similar I immediately remember how "Open"AI refused to release GPT-2 XL at the time because it was "too powerful".
> this is new for us
So much for the company that should never be new to that
My pet theory is that they delayed this because Grok 4 released, and they explicitly don't want to be seen as competing with xAI by pulling their usual trick of releasing right around when a rival (usually Google) does. Feels like a very Sam Altman move in my model of his mind.
What is their business purpose for releasing an open-weights model? How does it help them? I asked an LLM but it just said vague unconvincing things about ecosystem plays and fights for talent.
PR
Wow. Twitter is not a serious website anymore. Why are companies and professionals still using it? Is it really like that now, with all that noise from grok floating to the top?
Delays aside, I wonder what kind of license they're planning to use for their weights.
Will it be restricted like Llama, or fully open like Whisper or Granite?
why would OpenAI release an open weight model? Genuinely curious.
Probably ClosedAI's model was not as good as some of the models being released now. They are delaying it to do some last minute benchmark hacking.
Maybe they’re making last minute changes to compete with Grok 4?
Honestly, they’re distancing themselves optically/temporally from the HerrGrokler news cycle
Is it now coming before or after the release of AGI, which OpenAI, "knows how to build now" ?
we'll never hear about this again