Is this what happened to Gemini 2.5 Pro? It used to be very good, but it's started struggling on basic tasks.
The thing that gets me is it seems to be lying about fetching a web page. It will say things are there that were never on any version of the page and it sometimes takes multiple screenshots of the page to convince it that it's wrong.
IMO 2.5 Pro 03-25 was insanely good. I suspect it was also very expensive to run. The 05-06 release was a huge regression in quality, with most people saying it was a better coder and a worse writer. They tested a few different variants and some were less bad than others, but overall it was painful to lose access to such a good model. The just-released 06-05 version seems to be uniformly better than 05-06, with far fewer "wow this thing is dumb as a rock" failure modes, but it still is not as strong as the 03-25 release.
Entirely anecdotally, 06-05 seems to exactly ride the line of "good enough to be the best, but no better than that" presumably to save costs versus the OG 03-25.
In addition, Google is doing something notably different between what you get on AI Studio versus the Gemini site/app. Maybe a different system prompt. There have been a lot of anecdotal comparisons on /r/bard and I do think the AI Studio version is better.
Good question, and I don't know of any, although it's a no-brainer that someone should make one.
A proxy for that may be the anecdotal evidence of users who report back in a month that model X has gotten dumber (it started with gpt-4 and keeps happening, esp. with Anthropic and OpenAI models). I haven't heard such anecdotal stories about Gemini, R1, etc.
Quantization is a massive efficiency gain for a near-negligible drop in quality. If the tradeoff is quantization for an 80 percent price drop, I would take that any day of the week.
You may be right that the tradeoff is worth it, but it should be advertised as such. You shouldn't think you're paying for full o3, even if they're heavily discounting it.
I would like the option to pay for the unquantized version. For creative or story writing (D&D campaign materials and such) quantization seems to result in much weaker word selection and phrasing. There are small semantic missteps that break the illusion the LLM understands what it's writing. I find it jarring and deeply immersion-breaking. I'd prefer to prototype prompts on a cheaper quantized version, but I want to be able to spend 50 cents an API call to get golden output.
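As a rough illustration of the tradeoff being discussed, here is a toy sketch of per-tensor int8 post-training quantization. Real serving stacks use much more sophisticated schemes (per-channel scales, Q6/Q4 formats, activation quantization), so this only shows where the "near-negligible drop in quality" framing comes from, not what any provider actually does:

```python
# Toy illustration (not how any provider actually serves models): symmetric
# int8 post-training quantization of a weight matrix, showing the kind of
# small reconstruction error being traded away for cheaper inference.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(4096, 4096)).astype(np.float32)  # a stand-in weight matrix

scale = np.abs(w).max() / 127.0          # one scale for the whole tensor
w_q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_deq = w_q.astype(np.float32) * scale   # what inference effectively computes with

print("memory: fp32 %.0f MB -> int8 %.0f MB" % (w.nbytes / 1e6, w_q.nbytes / 1e6))
print("mean abs error:", float(np.mean(np.abs(w - w_deq))))
```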
I don't work for OAI so obviously I can't say for them. But we don't do this.
We don't make the hobbyist mistake of randomly YOLO-trying various "quantization" methods after all the training is done and calling it a day. Quantization was done before it went live.
It's probably optimized in some way, but if the optimizations degrade performance, let's hope it is reflected in various benchmarks. One alternative hypothesis is that it's the same model, but in the early days they make it think "harder" and run a meta-process to collect training data for reinforcement learning for use on future models.
You can just give it a go for very little money (in Windsurf it's 1x right now), and see what it does. There is no room for conspiracy here, because you can simply look at what it does. If you don't like it, neither will others, and then people will not use it. People are obviously very capable of (collectively) forming opinions on models, and then voting with their wallet.
Is there also a corresponding increase in weekly messages for ChatGPT Plus users with o3?
In my experience, o4-mini and o4-mini-high are far behind o3 in utility, but since I’m rate-limited for the latter, I end up primarily using the former, which has kind of reinforced the perception that OpenAI’s thinking models are behind the competition altogether.
My usage has also reflected the pretty heavy rate limits on o3. I find o4-mini-high to be quite good, but I agree that I would much rather use o3. Hoping this means an increase in the limits.
That’s already been the case for a few weeks though, right? and it’s up from 50, whereas a price reduction by 80% would correspond to 5x the quota extrapolating linearly.
Despite the popular take that LLMs have no moat and are burning cash, I find OpenAI's situation really promising.
Just yesterday, they reported an annualized revenue run rate of 10B. Their last funding round in March valued them at 300B. Despite losing 5B last year, they are growing really fast - 30x revenue with over 500M active users.
It reminds me a lot of Uber in its earlier years—fast growth, heavy investment, but edging closer to profitability.
The problem is your costs also scale with revenue. Ideally you want to control costs as you scale (the first unit you build is expensive, but as you make more, your costs come down).
For OpenAI, the more people use the product, the more they spend on compute, unless they can supplement it with other ways of generating revenue.
Unfortunately, I don't think OpenAI will be able to hit sustained profitability (see Netflix for another example).
All costs are not equal. There is a classic pattern of dogfights for winner-take-most product categories where the long term winner does the best job of acquiring customers at the expense of things like "engineering to reduce costs". I have no idea how the AI space is going to shake out, but if I had to pick between OpenAI's mindshare in the broadest possible cohort of users vs. best/most efficient model, I'd pick the customers.
Obviously, lots of nerds on HN have preferences for Gemini and Claude, and having used all three I completely get why that is. But we should remember we're not representative of the whole addressable market. There were probably nerds on like ancient dial-up bulletin boards explaining why Betamax was going to win, too.
Anyone concerned about cost should remember that those costs are dropping exponentially.
Similarly, nearly all AI products, but especially OpenAI, are heavily _under_ monetized. OpenAI is an excellent personal shopper - the ad revenue that could be generated from that rivals Facebook or Google.
You raise a good point that this isn't a low marginal cost business like software, telecom, or (most of) the web. Efficiency will be a big advantage for companies that can achieve it, in part because it will let them scale to new AI use cases.
With the race to get new models out the door, I doubt any of these companies have done much to optimize cost so far. Google is a partial exception – they began developing the TPU ten years ago and the rest of their infrastructure has been optimized over the years to serve computationally expensive products (search, gmail, youtube, etc.).
As an anecdote, they have first-mover advantage on me. I pay monthly, but mostly because it's good enough and I can't be bothered to try a bunch out and switch. But if the dust settles and prices drop, I would be motivated to switch. How much that matters maybe depends on whether their revenue comes from app users or API plans. And first mover only works once. Now they may be coasting on name recognition, but otherwise new users may be load-balanced among all the options.
I mean sure, it's very promising if OpenAI's future is your only metric. It gets notably darker if you look at the broader picture of ChatGPT (and company)'s impact on our society.
* We have people uploading tons of zero-effort slop pieces to all manner of online storefronts, and making people less likely to buy overall because they assume everything is AI now
* We have an uncomfortable community of, to be blunt, actual cultists emerging around ChatGPT, doing all kinds of shit from annoying their friends and family all the way up to divorcing their spouses
* Education is struggling in all kinds of ways due to students using (and abusing) the tech, with already strained administrations struggling to figure out how to navigate it
Like yeah, if your only metric is OpenAI's particular line going up, it's looking alright. And much like Uber, its success seems to be corrosive to the society in which it operates. Is this supposed to be good news?
Yes, but in a typical western business sense they are merely optimizing for user engagement and profits. What happens to society a decade from now because of all the slop being produced is not their concern. Facebook is just about connecting friends, right, it totally won't become a series of information moats and bubbles controlled by the algorithms...
A great communicator on the risks of AI being too heavily integrated into society is Zak Stein. As someone who works in education, they see first-hand how people are becoming dependent on this stuff rather than pursuing any kind of self-improvement. The people who are just handing over all their thinking to the machine. It is very bizarre and I am seeing it in my personal experience a lot more over the last few months.
The moat is increasingly becoming having access to billions needed to finance the infrastructure needed to serve billions. That's why Google is still in the game. They have that and they are very good at massive scale and have some cost advantages there.
OpenAI is very good at this as well because of their brand name. For many people ChatGPT is all they know. That's the one that's in the news. That's the one everybody keeps talking about. They have many millions of paying users at this point.
This is a non-trivial moat. If you can only be successful by not serving most of the market for cost reasons, then you can't be successful. It's how Google has been able to guard its search empire for a quarter century. It's easy to match what they do algorithmically. But growing from a niche search engine that has maybe a few tens of thousands of users (e.g. Kagi) to Google scale, serving essentially most of this planet (minus some firewalled countries like Russia and China), is a bit of a journey.
So Google rolling out search integration is a big deal. It means they are readying themselves for that scale and will have billions of users exposed to this soon.
> Their last funding round in March valued them at 300B. Despite losing 5B last year, they are growing really fast
Yes, they are valued based on world+dog needing agentic AIs and subscribing to the extent of tens or hundreds of dollars/month. It's going to outstrip the revenue of things like MS Office in its prime.
5B loss is peanuts compared to that. If they weren't burning that, their ambition level would be too low.
Uber now has a substantial portion of the market. They have about 3-4 billion in revenue per month. A lot of cost obviously. But they managed 10B profit last year. And they are not done growing yet. They were overvalued at some point and then they crashed, but they are still there and it's a pretty healthy business at this point, and that reflects in their stock price. It's basically valued higher now than at the time of the SoftBank investment pre-IPO. Of course a lot of stuff needed to be sorted out for that to happen.
I would wager most of their revenue is from the subscriptions - both consumer and business. That pricing is detached from the API pricing. The heavy emphasis on applications more recently is because they realize this as well.
I don't think the no-moat approach makes sense. In a world where more and more content and interaction is done with and via LLMs, the data of your users chatting with your LLM is a super valuable dataset.
when the race to the bottom reaches the bottom, the foundation model companies will be bought by ... energy companies. You 'll be paying for AI with your electricity bill
Yes - it’s common in traditional industries too. In my home town the aluminum refinery bought the power station to improve reliability (I should add - through upgrades not screwing over the rest of the power users).
You know, because LLMs can only be built by corporations... but because they're so easy to build, I see the price going down massively thanks to competition. Consumers benefit because all the companies are trying to outrun each other.
And then they all go out of business, since models cost a fortune to build, and their fan club is left staring at their computers trying to remember how to do anything without getting it served on a silver platter.
With investors pouring money in, it's probably impossible to go out of business, at least for the big ones, until investors realise this is the wrong hill to die on.
I expect they don't go out of business: at worst they don't start their next training run quite as aggressively and instead let their new very good model be profitable for a minute
Many many companies are currently thrilled to pay the current model prices for no performance improvement for 2-3 years
We still have so many features to build on top of current capabilities
Sure is - and o3 is missing from the OpenAI models that Azure is serving, which I suspect isn’t a coincidence - if OpenAI has some secret sauce that lets them undercut resellers this might shake up agreements for a bit.
Can we know for sure whether the price drop is accompanied by a change in the model, such as quantization?
On Twitter, some people say that some models perform better at night when there is less demand, which allows them to serve a non-quantized model.
Since the models are only available through the API and there is no test to check which version of the model is served, it's hard to know what we're buying...
Curious that the number of uses for Plus users remained the same. I don't think they're actually doing anything material to lower the cost by a meaningful amount. It's just margin they've always had, and they cut it because Magistral is pretty incredible for being completely free.
It was only a matter of time considering Deepseek R1’s recent release. OpenAI’s competitor is an open-source product that offers similar quality at a tenth of the cost. Now they’re just trying to prevent customers from leaving.
I've asked it a question not in its dataset three different ways and I see the same three sentences in the response, word for word, which could imply it's caching the core answer. I hadn't seen this behavior before this last week.
> Now cheaper than gpt-4o and same price as gpt-4.1 (!).
This is where the naming choices get confusing. "Should" o3 cost more or less than GPT-4.1? Which is more capable? A generation 3 of tech intuitively feels less advanced than a 4.1 of a (similar) tech.
No, people had tested it after Altman's announcement and had confirmed that they were still being billed at the original price. And I checked the docs ~1h after and they still showed the original price.
The speculation of only input pricing being lowered was because yesterday they gave out vouchers for 1M free input tokens while output tokens were still billed.
thinking models produce a lot of internal output tokens making them more expensive than non-reasoning models for similar prompt and visible output lengths
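A back-of-the-envelope sketch of that effect; the $2/$8 per-million-token prices and the token counts are illustrative assumptions, not official figures:

```python
# Back-of-the-envelope sketch of why reasoning models cost more than the
# per-token price suggests. Prices and token counts are illustrative
# assumptions, not published numbers.
def cost_usd(input_tokens, visible_output, reasoning_tokens,
             in_price_per_m, out_price_per_m):
    # Reasoning ("thinking") tokens are billed as output tokens even though
    # they never appear in the visible response.
    billed_output = visible_output + reasoning_tokens
    return (input_tokens * in_price_per_m + billed_output * out_price_per_m) / 1e6

# Same prompt and visible answer, with and without a long hidden reasoning trace:
print(cost_usd(2_000, 500, 0,     in_price_per_m=2.0, out_price_per_m=8.0))  # ~$0.008
print(cost_usd(2_000, 500, 8_000, in_price_per_m=2.0, out_price_per_m=8.0))  # ~$0.072
```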
I got the opposite experience. Not with Opus (too expensive), but with Sonnet. I got things done way more efficiently when using Sonnet with Roo than with Claude Code.
I found them all disappointing in their own ways. At least DeepSeek models actually listen to what I say instead of ignoring me and doing their own thing like a toddler.
Presumably because the price was about 5x higher to begin with than any of the competitors at the same tier of performance? Perhaps it's better to get paid anything at all than to just lose 100% of the customers.
Deepseek made a few major innovations allowing them to achieve major compute efficiency and then published them. My guess is that OpenAI just implemented these themselves.
It used to take decades of erosion to make Google search a hot mess; now that everything's happening at light speed, it takes only days for AI models to decay to the point of hot mess again...
Especially now that they are second in the race (behind Anthropic) and lot of free-to-download and free-to-use models are now starting to be viable competitors.
Once new MacBooks and iPhones have enough memory onboard this is going to be a disaster for OpenAI and other providers.
I'm not sure they're scared of Anthropic - they're doing great work but afaict running into some scaling issues and really focused on winning over developers at the moment.
If I was OpenAI (or Anthropic for that matter) I would remain scared of Google, who is now awake and able to dump Gemini 2.5 pro on the market at costs that I'm not sure people without their own hardware can compete with, and with the infrastructure to handle everyone switching to them tomorrow.
While mac unified ram inference is great for prosumers+ I really don't foresee Apple making 128GB+ options affordable enough to be attractive for inference for the general public. iPhone even less so considering the latest is only at 8GB. Meanwhile the best model sizes will just keep growing.
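Some rough weights-only arithmetic (ignoring KV cache and activations) shows why 8 GB phones are a long way off and why quantization matters so much for local inference; the parameter counts below are illustrative, not any specific model:

```python
# Rough, weights-only arithmetic (illustrative) for on-device inference:
# bytes needed = parameters * bits per weight / 8.
def model_gb(params_billions, bits_per_weight):
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

for params in (8, 70, 400):
    for bits in (16, 8, 4):
        print(f"{params:>4}B params @ {bits:>2}-bit ~ {model_gb(params, bits):7.1f} GB (weights only)")
```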
Third behind Anthropic/Google. People are too quick to discount mindshare though. For the vast majority of the world's population AI = LLM = ChatGPT, and that itself will keep OpenAI years ahead of the competition as long as they don't blunder away that audience.
LLM inferencing is a race to the bottom, but the service layers on top aren't. People always pay much more for convenience; those are the things OpenAI focuses on, and they are harder to replicate.
My understanding was that OpenAI couldn't make money at their previous price point, and I don't think operating and training costs have gone down sufficiently to make up for those shortcomings. So how are they going to make money by lowering the price by 80%?
I get that the point is to be the last man standing, poaching customers by lowering the price, and perhaps attracting a few people who wouldn't have bought a subscription at the higher price. I just question how long investors can justify pouring money into OpenAI. OpenAI is also the poster child for modern AI, so if they fail the market will react badly.
Mostly I don't understand Silicon Valley venture capital, but dumping prices, making wild purchases with investor money, and mostly only leading on branding: why isn't this a sign that OpenAI is failing?
There was an article on here a week or two ago on batch inference.
Do you not think that batch inference gives at least a bit of a moat whereby unit costs fall with more prompts per unit of time, especially if models get more complicated and larger in the future?
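For anyone who missed that article, the intuition can be sketched with a toy cost model; every number below is an assumption chosen for illustration, not a measured figure:

```python
# Toy model (all numbers made up) of why batching lowers unit cost: each
# forward pass has a roughly fixed cost, and serving more prompts per pass
# amortizes it. Real serving is bounded by memory bandwidth, KV cache, and
# latency targets, so this shows the shape of the curve, not real values.
GPU_COST_PER_HOUR = 4.0          # assumed hourly cost of one accelerator
STEP_SECONDS_BASE = 0.05         # assumed time for a batch-of-1 decode step
STEP_SECONDS_PER_EXTRA = 0.002   # assumed marginal time per extra sequence

def cost_per_1k_tokens(batch_size, tokens=1000):
    step_s = STEP_SECONDS_BASE + STEP_SECONDS_PER_EXTRA * (batch_size - 1)
    steps_per_hour = 3600 / step_s
    tokens_per_hour = steps_per_hour * batch_size   # one token per sequence per step
    return GPU_COST_PER_HOUR / tokens_per_hour * tokens

for b in (1, 8, 32, 128):
    print(f"batch {b:>3}: ~${cost_per_1k_tokens(b):.4f} per 1k tokens")
```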
For sure they are no longer clear winners, but they try to stay just barely on top of the others.
Right now the new Gemini surpassed their o3 (barely) in benchmarks for significantly less money, so they cut pricing to stay competitive.
I bet they didn't release o4 not because it's not competitive, but because they are playing the Nvidia game: release a new product that is just enough better to convince people to buy it.
So IMO they are holding the full o4 model back to have something to release after the competition releases something better than their top horse.
I don't know if this is OpenAI's intention, but the little message "you've reached your usage limit!" is actively disincentivizing me from subscribing. For my purposes, the free model is more than good enough; the difference before and after is negligible. I honestly wouldn't pay a dollar.
That said, I'm absolutely willing to hear people out on "value-adds" I am missing out on; I'm not a knee-jerk hater (for context, I work with large, complex & private databases/platforms, so it's not really possible for me to do anything but ask for scripting suggestions).
Also, I am 100% expecting a sad day when I'll be forced to subscribe, unless I want to read dick pill ads shoehorned in to the answers (looking at you, YouTube). I do worry about getting dependent on this tool and watching it become enshittified.
Just switch to a competitors free offering. There are enough to cycle through not to be hindered by limits. I wonder how much money I have cost those companies by now?
How anyone believes there is any moat for anyone here is beyond me.
I'd like to offer a cautionary tale that involves my experience after seeing this post.
First, I tried enabling o3 via OpenRouter since I have credits with them already. I was met with the following:
"OpenAI requires bringing your own API key to use o3 over the API. Set up here: https://openrouter.ai/settings/integrations"
So I decided I would buy some API credits with my OpenAI account. I ponied up $20 and started Aider with my new API key set and o3 as the model. I get the following after sending a request:
"litellm.NotFoundError: OpenAIException - Your organization must be verified to use the model `o3`. Please go to: https://platform.openai.com/settings/organization/general and click on Verify Organization. If you just verified, it can take up to 15 minutes for access to propagate."
At that point, the frustration was beginning to creep in. I returned to OpenAI and clicked on "Verify Organization". It turns out, "Verify Organization" actually means "Verify Personal Identity With Third Party" because I was given the following:
"To verify this organization, you’ll need to complete an identity check using our partner Persona."
Sigh. I click "Start ID Check" and it opens a new tab for their "partner" Persona. The initial fine print says:
"By filling the checkbox below, you consent to Persona, OpenAI’s vendor, collecting, using, and utilizing its service providers to process your biometric information to verify your identity, identify fraud, and conduct quality assurance for Persona’s platform in accordance with its Privacy Policy and OpenAI’s privacy policy. Your biometric information will be stored for no more than 1 year."
OK, so now, we've gone from "I guess I'll give OpenAI a few bucks for API access" to "I need to verify my organization" to "There's no way in hell I'm agreeing to provide biometric data to a 3rd party I've never heard of that's a 'partner' of the largest AI company and Worldcoin founder. How do I get my $20 back?"
I actually contacted the California AG to get a refund from another AI company after they failed to refund me.
The AG office followed up and I got my refund. Worth my time to file because we should stop letting companies get away with this stuff where they show up with more requirements after paying.
Separately they also do not need my phone number after having my name, address and credit card.
Has anyone got info on why they are taking everyone’s phone number?
(having no insider info:) Because it can be used as a primary key ID across aggregated marketing databases including your voting history / party affiliation, income levels, personality and risk profiles etc etc etc. If a company wants to, and your data hygiene hasn't been tip top, your phone number is a pointer to a ton of intimate if not confidential data. Twitter was fined $150 million for asking for phone numbers under pretense of "protecting your account" or whatever but they actually used it for ad targeting.
>> Wednesday's 9th Circuit decision grew out of revelations that between 2013 and 2019, X mistakenly incorporated users' email addresses and phone numbers into an ad platform that allows companies to use their own marketing lists to target ads on the social platform.
>> In 2022, the Federal Trade Commission fined X $150 million over the privacy gaffe.
>> That same year, Washington resident Glen Morgan brought a class-action complaint against the company. He alleged that the ad-targeting glitch violated a Washington law prohibiting anyone from using “fraudulent, deceptive, or false means” to obtain telephone records of state residents.
>> X urged Dimke to dismiss Morgan's complaint for several reasons. Among other arguments, the company argued merely obtaining a user's phone number from him or her doesn't violate the state pretexting law, which refers to telephone “records.”
>> “If the legislature meant for 'telephone record' to include something as basic as the user’s own number, it surely would have said as much,” X argued in a written motion.
https://www.mediapost.com/publications/article/405501/None
Thank you for this comment… a relative of mine spent a ton of money on an AI product that never came, a license he cannot use. I told him to contact his state's AG just in case.
Source: have dealt with fraud at scale before.
Phone number is the only way to reliably stop MOST abuse on a freemium product that doesn't require payment/identity verification upfront. You can easily block VOIP numbers and ensure the person connected to this number is paying for an actual phone plan, which cuts down dramatically on bogus accounts.
Hence why even Facebook requires a unique, non-VOIP phone number to create an account these days.
I'm sure this comment will get downvoted in favor of some other conspiratorial "because they're going to secretly sell my data!" tinfoil post (this is HN of course). But my explanation is the actual reason.
I would love if I could still just use email to sign up for free accounts everywhere, but it's just too easily gamed at scale.
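A rough sketch of the screening logic described above, with the lookup service left as a hypothetical helper (no specific provider is implied, and the exact line-type categories vary):

```python
# Sketch of phone-based abuse screening as described above, using a
# hypothetical lookup_line_type() helper; real services wire this to a
# commercial phone-intelligence API, and category names differ by provider.
BLOCKED_LINE_TYPES = {"voip", "toll_free", "unknown"}

def lookup_line_type(phone_e164: str) -> str:
    """Hypothetical stand-in for a carrier/line-type lookup service."""
    raise NotImplementedError("wire this to your phone-intelligence provider")

def allow_signup(phone_e164: str, already_used: set[str]) -> bool:
    if phone_e164 in already_used:                      # one account per number
        return False
    return lookup_line_type(phone_e164) not in BLOCKED_LINE_TYPES
```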
Doesn't Sam Altman own a cryptocurrency company [1] that specifically collects biometric data to identify people?
Seems familiar…
[1] https://www.forbes.com/advisor/investing/cryptocurrency/what...
GP did mention this :)
> I've never heard of that's a 'partner' of the largest AI company and Worldcoin founder
The core tech and premise doesn't collect biometric data, but biometric data is collected for training purposes with consent and compensation. There is endless misinformation (willful and ignorant) around Worldcoin, but it is not, at its core, a biometric collection company.
I also am using OpenRouter because OpenAI isn't a great fit for me. I also stopped using OpenAI because they expire your API credits even if you don't use them. Yeah, it's only $10, but I'm not spending another dime with them.
Hi - I'm the COO of OpenRouter. In practice we don't expire the credits, but have to reserve the right to, or else we have an uncapped liability literally forever. Can't operate that way :) Everyone who issues credits on a platform has to have some way of expiring them. It's not a profit center for us, or part of our P&L; just a protection we have to have.
I wonder if they do this everywhere, in certain jurisdictions this is illegal.
then you shouldn't use OpenRouter.
ToS: 4.2 Credit Expiration; Auto Recharge
OpenRouter reserves the right to expire unused credits three hundred sixty-five (365) days after purchase
That is so sleazy.
After how long do they expire?
I suspect their data collection might not be legal in the EU.
https://withpersona.com/legal/privacy-policy
To me it looks like an extremely aggressive data pump.
There are stories about e.g. Hetzner requiring all sorts of data from people who want to open/verify accounts so perhaps not. Might just be an anti “money laundering” thing. Especially if the credit card company ends up refunding everything..
As someone not in the US, I do a straight nope out whenever I see a Persona request. I advise everyone else to do the same. Afaik, it's used by LinkedIn and Doordash too.
Oh I also recently got locked out of my linkedin account until I supply data to Persona.
(So I’m remaining locked out of my linkedin account.)
> How do I get my $20 back?
Contact support and ask for a refund. Then a charge back.
KYC requirement + OpenAI preserving all logs in the same week?
OpenAI introduced this with the public availability of o3, so no.
It's also the only LLM provider which has this.
What OpenAI has that the others don't is SamA's insatiable thirst for everyone's biometric data.
I think KYC has already been beaten by AI agents, according to RepliBench [0]: obtaining compute requires KYC, and that task shows a high success rate in their graphic.
[0] https://www.aisi.gov.uk/work/replibench-measuring-autonomous...
KYC has been around for a few months I believe. Whenever they released some of the additional thought logs you had to be verified.
Meanwhile the FSB and Mossad happily generate fake identities on demand.
The whole point of identity verification is for the same Mossad to gather your complete profile and everything else they can from OpenAI.
Since Mossad and CIA is essentially one organization they already do it, 100%.
You are even lucky to be able to verify. Mine gives me an error about "Session expired", and has for months!! Support does not reply.
I was more excited by the process, like, there exists a model out there so powerful it requires KYC
which, after using it, fair! It found a zero day
I think they're probably more concerned about fake accounts and people finding ways to get free stuff.
> which, after using it, fair! It found a zero day
Source?
I was excited about trying o3 for my apps but I'm not doing this validation.. thanks for the heads up.
> OK, so now, we've gone from "I guess I'll give OpenAI a few bucks for API access" to "I need to verify my organization" to "There's no way in hell I'm agreeing to provide biometric data to a 3rd party I've never heard of that's a 'partner' of the largest AI company and Worldcoin founder. How do I get my $20 back?"
This should be illegal. How many are going to do the same as you, but then think that the effort/time/hassle they would waste to try to get their money back would not be worth it? At which point you've effectively donated money to a corp that implements anti-consumer anti-patterns.
Yeah, same. I am a paying API customer but I am not doing biometric KYC to talk to a bot.
This is in part "abuse prevention"[1] and in part marketing. Making customers feel like they're signing up to access state secrets makes the models seem more "special". Sama is well known to use these SV marketing tricks, like invite-only access, waiting lists, etc to psychologically manipulate users into thinking they're begging for entry to an exclusive club instead of just swiping a credit card to access an API.
Google tried this with Google Plus and Google Wave, failed spectacularly, and have ironically stopped with this idiotic "marketing by blocking potential users". I can access Gemini Pro 2.5 without providing a blood sample or signing parchment in triplicate.
[1] Not really though, because a significant percentage of OpenAI's revenue is from spammers and bulk generation of SEO-optimised garbage. Those are valued customers!
Gemini doesn't give you reasoning via API though, at least as far as I'm aware.
HN Don’t Hate Marketing Challenge
Difficulty: Impossible
This feels eerily similar to a post I've read within the last month. Either I'm having déjà vu, it's a coincidence that the same exact story is mentioned, or there's something else going on.
What should be going on? A regular Google search for "openai persona verify organization" shows withpersona.com in the second search result.
This is OpenAI’s fairly dystopian process, so the exact same thing happens to lots of people.
It's a concerted attempt to de-anonymise the internet. Corporate entities are jostling for position as id authorities.
This is just the process for OpenAI. It's the same process I went through as well.
this reminds me of how broadcom maintains the “free” tier of vmware.
Can you explain? Is it not actually free?
Interesting, it works for me through OpenRouter, without a configured OpenAI integration. Although, I have an OpenAI account and did the verification with OpenAI before. Conspiracy theory would say that they are exchanging PII so OpenRouter knows who I am :)
With all this plus the saving of all chats, they can't operate in the EU. But they do...
I don't recall Persona being in the mix last year when I signed up. Interesting development.
I think it was a recent thing [1], but I thought they were only considering it
[1] https://techcrunch.com/2025/04/13/access-to-future-ai-models...
Thank you for reminding me why I’ll never touch these LLM services.
>ID Check
Just send them a random passport photo from the Internet, what's the deal? Probably they are just vibe-verifying the photo with "Is it legit passport?" prompt anyways.
I think modern face verification has moved on, it's been video in all my encounters.
It requires video and an app. They are collecting facial biometrics.
Crazy, I already gave up registering for chatgpt because they asked for my phone number. I'm not giving that to any random startup.
Welcome to tech dystopia.
Hello Human Resource, we have all your data, please upload your bio-metric identity, as well as your personal thoughts.
Building the next phase of a corporate totalitarian state, thank you for your cooperation.
Follow-up. I pasted the above into ChatGPT because I genuinely wanted to know how to get my 20 bucks back. That request is in and they say I'll get a response in 2-3 days. I then asked it "What are the potential risks of agreeing to their terms?" Here's the response:
Agreeing to Persona’s terms, especially for biometric identity verification, involves both privacy and long-term data security risks. Here’s a clear breakdown of the main risks you should be aware of:
1. Biometric Data Collection
Risk: Biometric identifiers (like facial recognition, voiceprints, etc.) are extremely sensitive and irreplaceable if compromised.
What they collect: Persona may collect a selfie, video, and metadata, and extract biometric templates from those for facial comparison and liveness detection.
If leaked or abused: Unlike passwords, you can't change your face. A future data breach or misuse could lead to permanent identity compromise.
2. Data Storage & Retention
Risk: Persona says biometric data is kept for up to one year, but:
You’re relying on their internal policies, not a legal guarantee.
There’s no technical detail on how securely it’s stored or whether it’s encrypted at rest.
Worst-case scenario: Poorly secured biometric templates could be stolen, reused, or matched against other data sets by bad actors or governments.
3. Third-Party Sharing and Surveillance Risks
Risk: Your biometric and ID data may be shared with subprocessors (partners/vendors) that you haven’t explicitly vetted.
Persona may transfer your data to cloud providers (like AWS, GCP), verification specialists, or fraud prevention services.
Depending on jurisdiction, data could be subject to subpoenas, surveillance laws, or government backdoors (especially in the U.S.).
4. Consent Ambiguity & Future Use
Risk: The fine print often includes vague consent for "quality assurance", "model improvement", or "fraud detection".
This opens the door to retraining algorithms on your biometric data—even if anonymized, that's still a use of your body as data.
Their privacy policy may evolve, and new uses of your data could be added later unless you opt out (which may not always be possible).
Should You Agree?
Only if:
You absolutely need the service that requires this verification.
You’re aware of the privacy tradeoff and are okay with it.
You trust that Persona and its partners won’t misuse your biometric data—even a year down the line.
If you’re uneasy about this, you’re not alone. Many developers and privacy advocates refuse to verify with biometrics for non-critical services, and companies like OpenAI are increasingly facing criticism for requiring this.
Huh, I never considered using ChatGPT to read terms of service for me. Nice idea.
o3 is really powerful. I understand it tbh. They don't want scammers and abusers easily accessing it
Has anyone noticed that OpenAI has become "lazy"? When I ask questions now it will not give me a complete file or fix. Instead it tells me what I should do and I need to ask a second or third time to just do the thing I asked.
I don't see this happening with for example deepseek.
Is it possible they are saving on resources by having it answer that way?
Yeah, our models are sometimes too lazy. It’s not intentional, and future models will be less lazy.
When I worked at Netflix I sometimes heard the same speculation about intentionally bad recommendations, which people theorized would lower streaming and increase profit margins. It made even less sense there as streaming costs are usually less than a penny. In reality, it’s just hard to make perfect products!
(I work at OpenAI.)
Please be careful about the alternative. I’ve seen o3 doing excessive tool calls and research for relatively simple problems.
That was a problem in GPT 4 Turbo as well...
I think it's good. The model will probably make some mistake at first. Not doing the whole thing and just telling the user the direction it's going in gives us a chance to correct its mistakes.
but maybe you are saying that because you are a CIA plant that's trying to make the product bad because of complex reasons.
takes tinfoil hat off
Oh, nvm, that makes sense.
Can you share what are the main challenges OpenAI has been facing in terms of increasing access to top-tier and non-lazy models?
Had a fun experience the other day asking "make a graph of [X] vs [Y]" (some chemistry calculations), and the response was blah blah blah explain explain "let me know if you want a graph of this!" Yeah ok thanks for offering.
I don't think that's laziness but maybe agent tuning.
I've been turned off with OpenAI and have been actively avoiding using any of their models for a while, luckily this is easy to do given the quality of Sonnet 4 / Gemini Pro 2.5.
Although I've always wondered how OpenAI could get away with o3's astronomical pricing, what does o3 do better than any other model to justify their premium cost?
It's just a highly unoptimized space. There is very little market consolidation at this point, everyone is trying things out that lead to wildly different outcomes and processes and costs, even though in the end it's always just a bunch of utf-8 characters. o3 was probably just super expensive to run, and now, apparently, it's not anymore and can beat sonnet/opus 4 on pricing. It's fairly wild.
Very few customers pick the model based on cost; for many, ChatGPT is the only one they know of.
> Very few customers pick the model based on cost.
What? 3 out of 4 companies I consulted for that started using AI for coding marked cost as an important criterion. The 4th one has virtually infinite funding so they just don't care.
Google has been catching up. Funny how fast this space is evolving. Just a few months ago, it was all about DeepSeek.
Many would say Google's Gemini models are SOTA, although Claude seems to be doing well with coding tasks.
Gemini has been better than Claude for me on a coding project. Claude kept telling me it updated some code but the update wasn't in the output. Like, I had to re-prompt just for updated output 5 times in a row.
If the competition boils down to who has access to the largest amount of high quality data, it's hard to see how anyone but Google could win in the end: through Google Books they have scans of tens of millions of books, and published books are the highest quality texts there are.
I've been learning Vietnamese. Unfortunately, a lot of social media (reddit, fb, etc) has a new generation of language. The younger generation uses so many abbreviations and acronyms, ChatGPT and Google Translate can't keep up.
I think if your goal is to have properly written language using older writing styles, then you're correct.
I have been using Google’s models the past couple months, and was surprised to see how sycophantic ChatGPT is now. It’s not just at the start or end of responses, it’s interspersed within the markdown, with little substance. Asking it to change its style makes it overuse technical terms.
I have observed that DeepSeek hallucinates a lot more than others for the same task. Anyone else experienced it?
Deepseek was exciting because you could download their model. They are seemingly 3rd place and have been since Gemini 2.5.
I would put them on the fourth after Google, OpenAI and Anthropic. Still the best open weight llm.
how do we know it's not a quantized version of o3? what's stopping these firms from announcing the full model to perform well on the benchmarks and then gradually quantizing it (first at Q8 so no one notices, then Q6, then Q4, ...).
I have a suspicion that's how they were able to get gpt-4-turbo so fast. In practice, I found it inferior to the original GPT-4 but the company probably benchmaxxed the hell out of the turbo and 4o versions so even though they were worse models, users found them more pleasing.
This is almost certainly what they're doing and rebranding the original o3 model as "o3-pro"
Nope, not what we’re doing.
o3 is still o3 (no nerfing) and o3-pro is new and better than o3.
If we were lying about this, it would be really easy to catch us - just run evals.
(I work at OpenAI.)
51 replies →
Where are you getting this information? What basis do you have for making this claim? OpenAI, despite its public drama, is still a massive brand and if this were exposed, would tank the company's reputation. I think making baseless claims like this is dangerous for HN
1 reply →
> rebranding the original o3 model as "o3-pro"
interesting take, I wouldn't be surprised if they did that.
-pro models appear to be a best-of-10 sampling of the original full size model
5 replies →
I swear every time a new model is released it's great at first but then performance gets worse over time. I figured they were fine-tuning it to get rid of bad output which also nerfed the really good output. Now I'm wondering if they were quantizing it.
I've heard lots of people say that, but no objective reproducible benchmarks confirm such a thing happening often. Could this simply be a case of novelty/excitement for a new model fading away as you learn more about its shortcomings?
17 replies →
I suspect what's happening is that lots of people have a collection of questions / private evals that they've been testing on every new model, and when a new model comes out it sometimes can answer a question that previous models couldn't. So that selects for questions where the new model is at the edge of its capabilities and probably got lucky. But when you come up with a new question, it's generally going to be on the level of the questions the new model is newly able to solve.
Like I suspect if there was a "new" model which was best-of-256 sampling of gpt-3.5-turbo that too would seem like a really exciting model for the first little bit after it came out, because it could probably solve a lot of problems current top models struggle with (which people would notice immediately) while failing to do lots of things that are a breeze for top models (which would take people a little bit to notice).
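To make the best-of-N idea concrete, here's a minimal sketch using the OpenAI Python SDK's `n` parameter. The model name and the length-based scorer are placeholders; a real setup would score candidates with a verifier or reward model, and nothing here claims this is how any "-pro" model actually works.

```python
# Toy best-of-N sampling sketch: draw N candidates at some temperature and keep
# the one a scorer likes best. Model name and scorer are placeholders.
from openai import OpenAI

client = OpenAI()

def best_of_n(prompt: str, n: int = 10) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",          # placeholder model
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,
        n=n,                          # ask for n independent samples
    )
    candidates = [c.message.content or "" for c in resp.choices]
    # Placeholder scorer: prefer longer answers. Swap in a verifier model in practice.
    return max(candidates, key=len)
```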
It seems that at least Google is overselling its compute capacity.
You pay a monthly fee, but Gemini is completely jammed for 5-6 hours while North America is working.
12 replies →
I'm pretty sure this is just a psychological phenomenon. When a new model is released, all the capabilities the new model has that the old model lacks are very salient. This makes it seem amazing. Then you get used to the model, push it to the frontier, and suddenly the most salient memories of the new model are its failures.
There are tons of benchmarks that don't show any regressions. Even small and unpublished ones rarely show regressions.
That was my suspicion when I first deleted my account, when it felt like ChatGPT's output had gotten worse and I found it highly suspicious to see an errant davinci model keyword in the ChatGPT URL.
Now I'm feeling similarly with their image generation (which is the only reason I created a paid account two months ago, and the output looks more generic by default).
2 replies →
It’s easy to measure the models getting worse, so you should be suspicious that nobody who claims this has scientific evidence to back it up.
I have seen this behavior as well.
[flagged]
3 replies →
It's the same model, no quantization, no gimmicks.
In the API, we never make silent changes to models, as that would be super annoying to API developers [1]. In ChatGPT, it's a little less clear when we update models because we don't want to bombard regular users with version numbers in the UI, but it's still not totally silent/opaque - we document all model updates in the ChatGPT release notes [2].
[1] chatgpt-4o-latest is an exception; we explicitly update this model pointer without warning.
[2] ChatGPT Release Notes document our updates to gpt-4o and other models: https://help.openai.com/en/articles/6825453-chatgpt-release-...
(I work at OpenAI.)
From the announcement email:
> Today, we dropped the price of OpenAI o3 by 80%, bringing the cost down to $2 / 1M input tokens and $8 / 1M output tokens.
> We optimized our inference stack that serves o3—this is the same exact model, just cheaper.
I got 700+ tokens/sec on o3 after the announcement, I suspect it's very much a quantized version.
https://x.com/hyperknot/status/1932476190608036243
Or maybe they just brought online much faster much cheaper hardware.
1 reply →
Do you also have numbers on intelligence before and after?
Is that input tokens or output tokens/s?
An OpenAI researcher claims it's the exact same model on X: https://x.com/aidan_mclau/status/1932507602216497608
Is this what happened to Gemini 2.5 Pro? It used to be very good, but it's started struggling on basic tasks.
The thing that gets me is it seems to be lying about fetching a web page. It will say things are there that were never on any version of the page and it sometimes takes multiple screenshots of the page to convince it that it's wrong.
The Aider discord community has proposed and disproven the theory that 2.5 Pro became worse, several times, through many benchmark runs.
It had a few bugs here or there when they pushed updates, but it didn't get worse.
2 replies →
My use case is mostly creative writing.
IMO 2.5 Pro 03-25 was insanely good. I suspect it was also very expensive to run. The 05-06 release was a huge regression in quality, with most people saying it was a better coder and a worse writer. They tested a few different variants and some were less bad than others, but overall it was painful to lose access to such a good model. The just-released 06-05 version seems to be uniformly better than 05-06, with far fewer "wow this thing is dumb as a rock" failure modes, but it still is not as strong as the 03-25 release.
Entirely anecdotally, 06-05 seems to exactly ride the line of "good enough to be the best, but no better than that" presumably to save costs versus the OG 03-25.
In addition, Google is doing something notably different between what you get on AI Studio versus the Gemini site/app. Maybe a different system prompt. There have been a lot of anecdotal comparisons on /r/bard and I do think the AI Studio version is better.
Are there any benchmarks that track historical performance?
good question, and I don't know of any, although it's a no-brainer that someone should make one.
a proxy for that may be the anecdotal evidence of users who report back in a month that model X has gotten dumber (it started with gpt-4 and keeps happening, esp. with Anthropic and OpenAI models). I haven't heard such anecdotal stories about Gemini, R1, etc.
Aider has one, but it hasn't been updated in months. People kept claiming models were getting worse, but the results proved that they weren't.
4 replies →
> users found them more pleasing.
Some users. For me the drop was so huge it became almost unusable for the things I had used it for.
Same here. One of my apps straight out stopped working because the gpt-4o outputs were noticeably worse than the gpt-4 that I built the app based on.
Quantization is a massive efficiency gain for near negligible drop in quality. If the tradeoff is quantization for an 80 percent price drop I would take that any day of the week.
> for near negligible drop in quality
Hmm, that's evidently and anecdotally wrong:
https://github.com/ggml-org/llama.cpp/discussions/4110
You may be right that the tradeoff is worth it, but it should be advertised as such. You shouldn't think you're paying for full o3, even if they're heavily discounting it.
I would like the option to pay for the unquantized version. For creative or story writing (D&D campaign materials and such) quantization seems to end up in much weaker word selection and phrasing. There are small semantic missteps that break the illusion the LLM understands what it's writing. I find it jarring and deeply immersion breaking. I'd prefer prototype prompts on a cheaper quantized version, but I want to be able to spend 50 cents an API call to get golden output.
The API lists o3 and o3-2025-04-16 as the same thing with the same price. The date based models are set in stone.
I don't work for OAI so obviously I can't say for them. But we don't do this.
We don't make the hobbyist mistake of randomly YOLOing various "quantization" methods after all training is done and calling it a day, at all. Quantization was done before the model went live.
Related, when o3 finally came out ARC-AGI updated their graph because it didn’t perform nearly as well as the version of o3 that “beat” the benchmark.
https://arcprize.org/blog/analyzing-o3-with-arc-agi
The o3-preview test was with very expensive amounts of compute, right? I remember it was north of $10k, so it makes sense it did better.
2 replies →
Even classic GPT-4 from March 2023 was quantized to 4.5 bits.
How about testing the same input with the same seed on different dates? If it's a different model, it will return a different output.
Isn’t this not true since these models run with a non-zero temperature?
1 reply →
It's probably optimized in some way, but if the optimizations degrade performance, let's hope it is reflected in various benchmarks. One alternative hypothesis is that it's the same model, but in the early days they make it think "harder" and run a meta-process to collect training data for reinforcement learning for use on future models.
It's a bit dated now, but it would be cool if people submitted PRs for this one: https://aider.chat/docs/leaderboards/by-release-date.html
2 replies →
I think the API has some special IDs to check for reproducibility of the environment.
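For what it's worth, a rough sketch of that kind of check with the chat completions API, using the `seed` parameter and the `system_fingerprint` field the response carries. The model name is a placeholder (reasoning models restrict some sampling parameters), and OpenAI only promises best-effort determinism, so a differing output is a hint rather than proof of a model change.

```python
# Sketch: send the same prompt with a fixed seed on different days and compare
# both the text and system_fingerprint, which is documented to change when the
# backend configuration changes. Placeholder model; determinism is best-effort.
from openai import OpenAI

client = OpenAI()

def snapshot(prompt: str):
    resp = client.chat.completions.create(
        model="gpt-4o",     # placeholder; some reasoning models reject temperature/seed
        messages=[{"role": "user", "content": prompt}],
        seed=12345,
        temperature=0,
    )
    return resp.system_fingerprint, resp.choices[0].message.content

print(snapshot("Explain the birthday paradox in two sentences."))
```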
You can just give it a go for very little money (in Windsurf it's 1x right now) and see what it does. There is no room for conspiracy here, because you can simply look at what it does. If you don't like it, neither will others, and then people will not use it. People are obviously very capable of (collectively) forming opinions on models and then voting with their wallet.
you don't, so run your own model.
Is there also a corresponding increase in weekly messages for ChatGPT Plus users with o3?
In my experience, o4-mini and o4-mini-high are far behind o3 in utility, but since I’m rate-limited for the latter, I end up primarily using the former, which has kind of reinforced the perception that OpenAI’s thinking models are behind the competition altogether.
200 per week now: https://x.com/kevinweil/status/1932565467736027597
My usage has also reflected the pretty heavy rate limits on o3. I find o4-mini-high to be quite good, but I agree that I would much rather use o3. Hoping this means an increase in the limits.
Before: 50 messages per week
Now: 100 messages per week
That’s already been the case for a few weeks though, right? and it’s up from 50, whereas a price reduction by 80% would correspond to 5x the quota extrapolating linearly.
1 reply →
Why does OpenAI require me to verify my "organization" (which requires my state issued ID) to use o3?
Don't bother anyway. There are lots of cases of people trying and failing to go through the process, and there is no way to try a second time.
https://community.openai.com/t/session-expired-verify-organi...
https://community.openai.com/t/callback-from-persona-id-chec...
https://community.openai.com/t/verification-issue-on-second-...
https://community.openai.com/t/verification-not-working-and-...
https://community.openai.com/t/organization-verfication-fail...
https://community.openai.com/t/help-organization-could-not-b...
https://community.openai.com/t/to-verify-an-organization-acc...
Yikes! Indeed, I won't bother.
To prevent Deepseek R2 from being trained on it.
If only there were people with multiple passports or, I don’t know, Kyrgyzstan.
How exactly will passport check prevent any training?
At most this will block API access to your average Ivan, not a state actor
4 replies →
Only at HN can the reaction to an 80% price drop be a wall of criticism.
"80% price drop" is just a title. The wall of criticism is for the fine-print.
The wall of criticism is all wild speculation, not fine print.
What in the fine print are we criticising? Most of the negative comments make no reference to any fine print on their website.
Despite the popular take that LLMs have no moat and are burning cash, I find OpenAI's situation really promising.
Just yesterday, they reported an annualized revenue run rate of 10B. Their last funding round in March valued them at 300B. Despite losing 5B last year, they are growing really fast - 30x revenue with over 500M active users.
It reminds me a lot of Uber in its earlier years—fast growth, heavy investment, but edging closer to profitability.
The problem is that your costs also scale with revenue. Ideally you want to control costs as you scale (the first unit you build is expensive, but as you make more, your costs come down).
For OpenAI, the more people use the product, the more they spend on compute, unless they can supplement it with other ways of generating revenue.
Unfortunately, I don't think OpenAI will be able to hit sustained profitability (see Netflix for another example).
"... as you make more your costs come down"
I'd say dropping the price of o3 by 80% due to "engineers optimizing inferencing" is a strong sign that they're doing exactly that.
10 replies →
>(see Netflix for another example)
Netflix has been profitable for over a decade though? They reported $8.7 billion in profit in 2024.
1 reply →
The bulk of AI costs are NOT in inference. They're in R&D and frontier training runs.
The more inference customers OpenAI has, the easier it is for them to reach profitability.
All costs are not equal. There is a classic pattern of dogfights for winner-take-most product categories where the long term winner does the best job of acquiring customers at the expense of things like "engineering to reduce costs". I have no idea how the AI space is going to shake out, but if I had to pick between OpenAI's mindshare in the broadest possible cohort of users vs. best/most efficient model, I'd pick the customers.
Obviously, lots of nerds on HN have preferences for Gemini and Claude, and having used all three I completely get why that is. But we should remember we're not representative of the whole addressable market. There were probably nerds on like ancient dial-up bulletin boards explaining why Betamax was going to win, too.
4 replies →
Anyone concerned about cost should remember that those costs are dropping exponentially.
Similarly, nearly all AI products but especially OpenAI are heavily _under_ monetized. OpenAI is an excellent personal shopper - the ad revenue that could be generated from that rivals Facebook or Google.
2 replies →
You raise a good point that this isn't a low marginal cost business like software, telecom, or (most of) the web. Efficiency will be a big advantage for companies that can achieve it, in part because it will let them scale to new AI use cases.
With the race to get new models out the door, I doubt any of these companies have done much to optimize cost so far. Google is a partial exception – they began developing the TPU ten years ago and the rest of their infrastructure has been optimized over the years to serve computationally expensive products (search, gmail, youtube, etc.).
> sustained profitability (see Netflix for another example)
What? Netflix is incredibly profitable.
2 replies →
As an anecdote, they have first-mover advantage on me. I pay monthly, but mostly because it's good enough and I can't be bothered to try a bunch out and switch. But if the dust settles and prices drop, I would be motivated to switch. How much that matters maybe depends on whether their revenue comes from app users or API plans. And first mover only works once. Now they may be coasting on name recognition, but otherwise new users may be load-balanced among all the options.
I mean sure, it's very promising if OpenAI's future is your only metric. It gets notably darker if you look at the broader picture of ChatGPT (and company)'s impact on our society.
* We have people uploading tons of zero-effort slop pieces to all manner of online storefronts, and making people less likely to buy overall because they assume everything is AI now
* We have an uncomfortable community of, to be blunt, actual cultists emerging around ChatGPT, doing all kinds of shit from annoying their friends and family all the way up to divorcing their spouses
* Education is struggling in all kinds of ways due to students using (and abusing) the tech, with already strained administrations struggling to figure out how to navigate it
Like yeah, if your only metric is OpenAI's particular line going up, it's looking alright. And much like Uber, its success seems to be corrosive to the society in which it operates. Is this supposed to be good news?
I absolutely agree. I find it abhorrent.
Dying for a reference on the cult stuff, a quick search didn’t provide anything interesting.
8 replies →
Yes, but in a typical Western business sense they are merely optimizing for user engagement and profits. What happens to society a decade from now because of all the slop being produced is not their concern. Facebook is just about connecting friends, right? It totally won't become a series of information moats and bubbles controlled by the algorithms...
A great communicator on the risks of AI being too heavily integrated into society is Zak Stein. As someone who works in education, he sees first-hand how people are becoming dependent on this stuff rather than pursuing any kind of self-improvement. People are just handing over all their thinking to the machine. It is very bizarre, and I am seeing a lot more of it in my personal experience over the last few months.
The moat is increasingly becoming having access to billions needed to finance the infrastructure needed to serve billions. That's why Google is still in the game. They have that and they are very good at massive scale and have some cost advantages there.
OpenAI is very good at this as well because of their brand name. For many people ChatGPT is all they know. That's the one that's in the news. That's the one everybody keeps talking about. They have many millions of paying users at this point.
This is a non trivial moat. If you can only be successful by not serving most of the market for cost reasons, then you can't be successful. It's how Google has been able to guard its search empire for a quarter century. It's easy to match what they do algorithmically. But then growing from a niche search engine that has maybe a few tens of thousands of users (e.g. Kagi) to Google scale serving essentially most of this planet (minus some fire walled countries like Russia and China), is a bit of a journey.
So Google rolling out search integration is a big deal. It means they are readying themselves for that scale and will have billions of users exposed to this soon.
> Their last funding round in March valued them at 300B. Despite losing 5B last year, they are growing really fast
Yes, they are valued based on world+dog needing agentic AIs and subscribing to the extent of tens or hundreds of dollars/month. It's going to outstrip the revenue of things like MS Office in its prime.
5B loss is peanuts compared to that. If they weren't burning that, their ambition level would be too low.
Uber now has a substantial portion of the market. They have about 3-4 billion in revenue per month. A lot of cost, obviously, but they managed a 10B profit last year. And they are not done growing yet. They were overvalued at some point and then they crashed, but they are still there, and it's a pretty healthy business at this point, which is reflected in their stock price. It's basically valued higher now than at the time of the SoftBank investment pre-IPO. Of course, a lot of stuff needed to be sorted out for that to happen.
their moat is leaky because llm prices will be dropping forever and the only viable model will be a free model. Eventually everyone will catch up.
Plus there is the fact that "thinking models" can't really solve complex tasks / aren't really as good as they are believed to be.
I would wager most of their revenue is from the subscriptions - both consumer and business. That pricing is detached from the API pricing. The heavy emphasis on applications more recently is because they realize this as well.
I don't think the "no moat" argument makes sense. In a world where more and more content and interaction is created with and via LLMs, the data of your users chatting with your LLM is a super valuable dataset.
This is the best model out there, priced level or lesser than Claude and Gemini
They’re not letting the competition breathe
Gemini is close (if not better), so it just makes sense, no? o3-pro might be ahead of the pack tho
o3 does better especially if you use the api (not ChatGPT)
appreciate this, the faster we get to cheap commoditization, the better
O3 is really good. I haven’t had the same results with o4 unfortunately
It's one of the most unfortunate naming conventions
Wait, you have access to o4? All I see is o4-mini, a distill of o4. I would not expect that to beat o3/o3-pro.
Anthropic will need to follow suit with Opus soon. It is simply an order of magnitude too expensive for anything.
Have they ever decreased the price in the past? I don’t remember.
when the race to the bottom reaches the bottom, the foundation model companies will be bought by ... energy companies. You'll be paying for AI with your electricity bill
It'll be the opposite. Large tech companies are already running their own power plants.
Yes - it's common in traditional industries too. In my home town the aluminum refinery bought the power station to improve reliability (I should add: through upgrades, not by screwing over the rest of the power users).
You know, because LLMs can only be built by corporations... But because they're so easy to build, I see the price going down massively thanks to competition. Consumers benefit because all the companies are trying to outrun each other.
Easy doesn’t mean cheap.
They need lots of energy and customers don’t pay much, if they pay at all
Exactly,
The developers of AI models do have a moat, the cost of training the model in the first place.
It's the 90% of low-effort AI wrappers with little to no value-add that have no moat.
And then they all go out of business, since models cost a fortune to build, and their fan club is left staring at their computers, trying to remember how to do anything without getting it served on a silver platter.
With investors pouring in money, it's probably impossible to go out of business, at least for the big ones, until investors realise this is the wrong hill to die on.
1 reply →
I expect they don't go out of business: at worst they don't start their next training run quite as aggressively and instead let their new very good model be profitable for a minute
Many many companies are currently thrilled to pay the current model prices for no performance improvement for 2-3 years
We still have so many features to build on top of current capabilities
Maybe because they also are releasing o3-pro.
o3 is very much needed in VS Code's GitHub Copilot for Ask/Edit/Agent modes. It is sorely missing there.
Sure is - and o3 is missing from the OpenAI models that Azure is serving, which I suspect isn’t a coincidence - if OpenAI has some secret sauce that lets them undercut resellers this might shake up agreements for a bit.
Can we know for sure whether the price drop is accompanied by a change in the model, such as quantization?
On Twitter, some people say that certain models perform better at night, when there is less demand, which allows them to serve a non-quantized model.
Since the models are only available through the API and there is no test to check which version of a model is being served, it's hard to know what we're buying...
Curious that the number of uses for Plus users remained the same. I don't think they're actually doing anything material to lower the cost by a meaningful amount. It's just margin they've always had, and they cut it because Magistral is pretty incredible for being completely free.
It was only a matter of time considering Deepseek R1’s recent release. OpenAI’s competitor is an open-source product that offers similar quality at a tenth of the cost. Now they’re just trying to prevent customers from leaving.
Meanwhile Apple: Liquid Glass
always seemed to me that efficient caching strategies could greatly reduce costs… wonder if they cooked up something new
How are LLMs cached? Every prompt would be different so it's not clear how that would work. Unless you're talking about caching the model weights...
I've asked it a question not in its dataset three different ways, and I see the same three sentences in the response, word for word, which could imply it's caching the core answer. I hadn't seen this behavior before this last week.
1 reply →
This document explains the process very well. It’s a good read: https://platform.openai.com/docs/guides/prompt-caching
6 replies →
You would use a KV cache to cache a significant chunk of the inference work.
5 replies →
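A toy sketch of the prefix/KV-caching idea described above, assuming the common pattern where many requests share a long fixed prompt prefix. This is just the shape of the idea; real serving stacks (and OpenAI's prompt caching) cache per-layer key/value tensors on the GPU, not Python strings.

```python
# Toy sketch of prefix (KV) caching: the expensive work for a shared prompt
# prefix is done once and reused, so only the new suffix tokens pay full cost.
from functools import lru_cache

@lru_cache(maxsize=1024)
def encode_prefix(prefix: str) -> tuple:
    # Stand-in for the expensive part: running the model over the prefix and
    # keeping the per-layer key/value tensors around.
    print(f"computing KV for prefix of {len(prefix)} chars")
    return ("kv-tensors-for", prefix)

def generate(prefix: str, user_message: str) -> str:
    kv = encode_prefix(prefix)   # cache hit if this prefix was seen before
    # Only the suffix (user_message) needs fresh prefill work here.
    return f"<completion conditioned on {kv[0]} + {user_message!r}>"

SYSTEM = "You are a helpful assistant. " * 50   # long, shared instructions
print(generate(SYSTEM, "how to cook pasta"))
print(generate(SYSTEM, "how to cook rice"))     # reuses the cached prefix work
```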
A lot of the prompt is always the same: the instructions, the context, the codebase (if you are coding), etc.
> Every prompt would be different
No? Eg "how to cook pasta" is probably asked a lot.
Note that they have not actually dropped the price yet: https://x.com/OpenAIDevs/status/1932463601119637532
> We’ll post to @openaidevs once the new pricing is in full effect. In $10… 9… 8…
There is also speculation that they are only dropping the input price, not the output price (which includes the reasoning tokens).
I think that was a joke. New pricing is already in place:
Input: $2.00 / 1M tokens
Cached input: $0.50 / 1M tokens
Output: $8.00 / 1M tokens
https://openai.com/api/pricing/
Now cheaper than gpt-4o and same price as gpt-4.1 (!).
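As a quick sanity check of what the new list prices mean per call, here's a back-of-the-envelope sketch; the token counts are invented for illustration, and reasoning tokens are billed as output.

```python
# Rough cost of a single o3 call at the new list prices
# ($2 / 1M input tokens, $8 / 1M output tokens). Token counts are made up.
input_tokens = 20_000      # prompt + context
output_tokens = 3_000      # visible answer plus hidden reasoning tokens

cost = input_tokens / 1e6 * 2.00 + output_tokens / 1e6 * 8.00
print(f"~${cost:.3f} per call")   # ~$0.064
```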
> Now cheaper than gpt-4o and same price as gpt-4.1 (!).
This is where the naming choices get confusing. "Should" o3 cost more or less than GPT-4.1? Which is more capable? A generation 3 of tech intuitively feels less advanced than a 4.1 of a (similar) tech.
4 replies →
No, people had tested it after Altman's announcement and had confirmed that they were still being billed at the original price. And I checked the docs ~1h after and they still showed the original price.
The speculation of only input pricing being lowered was because yesterday they gave out vouchers for 1M free input tokens while output tokens were still billed.
thinking models produce a lot of internal output tokens making them more expensive than non-reasoning models for similar prompt and visible output lengths
It is slower though
Still 4x more expensive than Deepseek R1 tho.
Personally I've found these bigger models (o3/Claude 4 Opus) to be disappointing for coding.
Opus is really great, but only through Claude Code. If you used Cursor or RooCode, it would be normal to be disappointed.
This matches my experience, but I can't explain it. Do you know what's going on?
2 replies →
I got the opposite experience. Not with Opus (too expensive), but with Sonnet. I got things done way more efficiently when using Sonnet with Roo than with Claude Code.
3 replies →
I found them all disappointing in their own ways. At least the DeepSeek models actually listen to what I say instead of ignoring me and doing their own thing like a toddler.
...how? I'd understand a 20-30% price drop from infra improvements for a model as-is, but 80%?
I wonder if "we quantized it lol" would classify as false advertising for modern LLMs.
Presumably because the price was about 5x higher to begin with than any of the competitors at the same tier of performance? Perhaps it's better to get paid anything at all than to lose 100% of the customers.
Deepseek made a few major innovations allowing them to achieve major compute efficiency and then published them. My guess is that OpenAI just implemented these themselves.
Wouldn’t surprise me. And even with this price cut it’s still 4x more expensive than Deepseek R1 is.
fyi the price drop has been updated in Cursor:
https://x.com/cursor_ai/status/1932484008816050492
it used to take decades of erosion to make Google search a hot mess; now that everything's happening at light speed, it takes only days for AI models to decay into a hot mess again..
It's going to be a race to the bottom, they have no moat.
Especially now that they are second in the race (behind Anthropic) and a lot of free-to-download and free-to-use models are starting to be viable competitors.
Once new MacBooks and iPhones have enough memory onboard this is going to be a disaster for OpenAI and other providers.
I'm not sure they're scared of Anthropic - they're doing great work but afaict running into some scaling issues and really focused on winning over developers at the moment.
If I was OpenAI (or Anthropic for that matter) I would remain scared of Google, who is now awake and able to dump Gemini 2.5 pro on the market at costs that I'm not sure people without their own hardware can compete with, and with the infrastructure to handle everyone switching to them tomorrow.
2 replies →
What do you mean, Google is number 1
Third for coding, after Anthropic and Gemini, which was leading last I checked.
OpenAI are second in the race to Anthropic in some benchmarks (maybe?), but OpenAI still dwarfs Anthropic in distribution and popularity.
5 replies →
While mac unified ram inference is great for prosumers+ I really don't foresee Apple making 128GB+ options affordable enough to be attractive for inference for the general public. iPhone even less so considering the latest is only at 8GB. Meanwhile the best model sizes will just keep growing.
Third behind Anthropic/Google. People are too quick to discount mindshare though. For the vast majority of the world's population AI = LLM = ChatGPT, and that itself will keep OpenAI years ahead of the competition as long as they don't blunder away that audience.
LLM inference is a race to the bottom, but the service layers on top aren't. People always pay much more for convenience; those are the things OpenAI focuses on, and they are harder to replicate.
My understanding was that OpenAI couldn't make money at their previous price point, and I don't think operation and training costs have gone down sufficiently to make up for those shortcomings. So how are they going to make money by lowering the price by 80%?
I get that the point is to be the last man standing, poaching customers by lowering the price and perhaps attracting a few people who wouldn't have bought a subscription at the higher price. I just question how long investors can justify pouring money into OpenAI. OpenAI is also the poster child for modern AI, so if they fail the market will react badly.
Mostly I don't understand Silicon Valley venture capital, but dumping prices, making wild purchases with investor money, and mostly only leading on branding: why isn't this a sign that OpenAI is failing?
OpenAI's Adam Groth credits "engineers optimizing inferencing" for the price drop: https://twitter.com/TheRealAdamG/status/1932440328293806321
That seems likely to me, all of the LLM providers have been consistently finding new optimizations for the past couple of years.
There was an article on here a week or two ago on batch inference.
Do you not think that batch inference gives at least a bit of a moat whereby unit costs fall with more prompts per unit of time, especially if models get more complicated and larger in the future?
Batch inference is not exclusive to OpenAI.
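On the unit-cost point raised above, here's a back-of-the-envelope sketch of why batching pushes cost per request down; all numbers are made up for illustration, and the assumption that a forward pass is dominated by loading weights is a simplification.

```python
# Toy sketch: if a GPU forward pass costs roughly the same whether it serves 1
# or N concurrent requests (until compute saturates), cost per request falls
# with batch size. Numbers are invented for illustration.
cost_per_forward_pass = 1.0   # arbitrary units, dominated by weight loading

for batch_size in (1, 8, 32, 128):
    overhead = 0.02 * batch_size            # assumed small per-request compute cost
    unit_cost = (cost_per_forward_pass + overhead) / batch_size
    print(f"batch={batch_size:4d}  cost per request ~ {unit_cost:.3f}")
```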
For sure they are no longer clear winners, but they try to stay just barely on top of the others.
Right now the new Gemini surpassed their o3 (barely) in benchmarks for significantly less money, so they cut pricing to stay competitive.
I bet they didn't release o4 not because it isn't competitive, but because they are playing the Nvidia game: release a new product that is just enough better to convince people to buy it. So IMO they are holding the full o4 model to have something to release after the competition releases something better than their top horse.
groq is better
No.
I don't know if this is OpenAI's intention, but the little message "you've reached your usage limit!" is actively disincentivizing me from subscribing. For my purposes, the free model is more than good enough; the difference before and after is negligible. I honestly wouldn't pay a dollar.
That said, I'm absolutely willing to hear people out on "value-adds" I am missing out on; I'm not a knee-jerk hater. (For context, I work with large, complex & private databases/platforms, so it's not really possible for me to do anything but ask for scripting suggestions.)
Also, I am 100% expecting a sad day when I'll be forced to subscribe, unless I want to read dick pill ads shoehorned in to the answers (looking at you, YouTube). I do worry about getting dependent on this tool and watching it become enshittified.
> "you've reached your usage limit!"
Just switch to a competitors free offering. There are enough to cycle through not to be hindered by limits. I wonder how much money I have cost those companies by now?
How anyone believes there is any moat for anyone here is beyond me.
I expect the answer is <$1 as someone who shares a discord server with a friend where we egregiously ping the models
o3 is so good it's worth paying for a minute (just for plus) just to see what it's like
I've never used anything like it. I think new Claude is similarly capable
80%? So is this either the same Trump-style "art of the deal" of setting unreasonable pricing in the first place, or a sign they desperately need customers?
I have no moat and I must make these GPUs scream.
For those wondering
https://archive.is/20250610154009/https://openai.com/api/pri...
https://openai.com/api/pricing/
OpenAI dropped the price by so much that the server also went down.
Is the price drop really the reason for their recent outage?
Or is the price drop an attempt to cover up bad news about the outage with news about the price drop?
> Or is the price drop an attempt to cover up bad news about the outage with news about the price drop?
This makes no sense. No way a global outage will get less coverage than the price drop.
Also the earliest sign of price drop is this tweet 20 hrs ago (https://x.com/OpenAIDevs/status/1932248668469445002), which is earlier than the earliest outage reports 13hrs ago on https://downdetector.com/status/openai/
2 replies →
[dead]
Is this a reaction to the Apple paper showing that reasoning models don't really reason?
Why would that be?
If these kids could read, they would be very upset.
They could drop the price 100% and I still wouldn't use it, so long as they're retaining my data.
Sounds like you want their Zero Data Retention plan: https://platform.openai.com/docs/guides/your-data#zero-data-...
(It's "contact us" pricing, so I have no idea how much that would set you back. I'm guessing it's not cheap.)
it doesn't seem like this would supersede a court order
2 replies →