Comment by Imustaskforhelp

12 hours ago

I don't understand how Codex can blunder so badly. I imagine that even if they would be using vibe-coding, surely they must have some good engineers. So why is there such severe bugs?

One can argue that these products are the flagship products of their respective AI companies aside from the AI models themselves of course.

I imagine that this story will be picked up by the news left and right, some stories just feel this way and this one is like that (given 12 upvotes on HN in 7 minutes)

The only logical conclusion (from this incident) that I can have is: An (vibe-coded?) product is hard to maintain even for some of the best engineers and is bound to have severe bugs.

2. Proper testing and taking issues seriously is the key if you still wish to do this and there isn't much. This is a week old issue which I can only classify as severe.

I wish to keep an nuanced opinion about it but oh this is bad for openAI (not as bad as them accepting autonomous AI within drones and mass surveillance though)

My point is: AI has both uphills and downward valleys and cliffs. It might as well just accelerate you, which could be, towards your downfall as well. Its recommended to keep an eye while driving and not drive too fast.

AI companies might be like car companies which don't offer a brake pedal.

> I don't understand how Codex can blunder so badly.

because they trust the AI too much (and seem to be fin with acting knowingly negligent)

the problem is

- AI tends to produces very convincing looking code, even if fully wrong

- AI does mistakes of kinds no human would do, at least no human who is also able to write convincing looking code

- code reviews are hard, a lot of devs, including senior devs, put a lot of implicit trust into the co-worker behaving "sane and non malicious". But AIs behave sometimes not so sane and in a way (wrt. trying to be convincing). In the worst case in ways which if it where a human you might consider to be them trying malicious sabotage the product

Like a "dump" example from work:

- AI randomly removes a HTML element id while doing other changes in jsx/react

- the PR has a lot of changes, the id removal line looks innocent, like some on the fly cleanup

- human reviewers have the bad tendency to often not look too much at deleted lines, only if they need reference to how a new line was before (but it's only a deleted line and no new line)

- you don't expect humans to randomly without reason delete important properties of components when changing other things

- you maybe would still have found it, but it's a emergency fix for a production issue

- it happens to miss integration tests, but happens to still matter a lot for one specific important for complicated reasons not properly tested flow (similar people tend to not test logging too much, at best the presence of needed info but hardly ever the absence of noise)

> I don't understand how Codex can blunder so badly. I imagine that even if they would be using vibe-coding, surely they must have some good engineers. So why is there such severe bugs?

I'd say this is also partly a problem of working under intense pressure and the demand to work faster and faster - even faster now with "AI". All these companies are competing with each other very aggressively and are driving their employees like horses in order to win the "AI" race.

"Vibe coding" implies minimal to no human involvement. It doesn't matter how good of an engineer the person who typed the prompt was, they were not involved in writing or reviewing the code, so the end result will not reflect their skill. The whole point of vibe coding is making software engineers irrelevant.

People like to go on about how "good engineers review their AI code" but that's just not what's happening in reality. Not only is reviewing large amounts of AI generated code unpleasant and mentally taxing, it also negates most of the perceived productivity boost, so people are simply not doing it.

> Proper testing

There is no formal testing that would be expected to catch an issue like this. It can barely be classified as a bug, the logging is working as intended, just with negative side effects that weren't accounted for.

The only real way to proactively prevent an issue like this is for a human programmer to stop and think about this code as they're writing it and go "hmm, we're logging large amounts of data to disk at a fast pace here, this may be a bad idea". Without human involvement, this is just going to keep happening. All vibe coded software is bloated and unstable, I have yet to see a single counter-example.

> I don't understand how Codex can blunder so badly. I imagine that even if they would be using vibe-coding, surely they must have some good engineers. So why is there such severe bugs?

Because it was deemed not Hard Enough task for real engineer to look at, so AI was sent to do it with no supervision, just checking the effects.

Also overly excessive logging is probably useful to them in chasing some of the edge cases, the cost to users doesn't matter in the slightest to them

The truth of the matter is that any time that has been saved in writing the code must be spent on ensuring proper system design, reviewing the code, and most importantly of all, QA, which is an uncomfortable discussion for AI techbros who are peddling complete automation of the software profession.