Comment by stavros
5 days ago
It's gotten more and more shippable, especially with the latest generation (Codex 5.1, Sonnet 4.5, now Opus 4.5). My metric is "wtfs per line", and it's been decreasing rapidly.
My current preference is Codex 5.1 (Sonnet 4.5 a close second, though it got really dumb today for "some reason"). It's been good to the point where I've shipped multiple projects with it without a problem (e.g. https://pine.town, which I made without writing any code).
I feel it sometimes tries to be overly correct, like using BigInts when working with offsets in big files in JavaScript. My files are big, but not 53-bits-of-mantissa big, and no file APIs work with BigInts. This was from Gemini 3 thinking, btw.
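For context on the 53-bit point (a minimal sketch, not from the comment itself): a JavaScript number is an IEEE-754 double, so it represents integers exactly up to 2^53 - 1, which is roughly a 9-petabyte byte offset:

```javascript
// JS numbers are IEEE-754 doubles: integers are exact up to 2^53 - 1.
const maxSafe = Number.MAX_SAFE_INTEGER;
console.log(maxSafe); // 9007199254740991, ~9 PB as a byte offset

// A plain number is a safe offset for any realistic file size:
const offset = 4 * 1024 ** 4; // 4 TiB
console.log(Number.isSafeInteger(offset)); // true

// BigInt offsets mostly add friction, since they can't mix with numbers:
console.log(typeof 4n); // "bigint"
// `4n + offset` would throw a TypeError: explicit conversion needed everywhere.
```

So unless a single file can exceed ~9 PB, plain numbers are both correct and what the file APIs expect.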
I just whack-a-mole these things in AGENTS.md for a while until it codes more like me.
Coding LLMs were almost useless for me until my AGENTS.md crossed some threshold of completeness; now they are mostly useful. I now curate multiple different markdown files in a /docs folder that I add to the context as needed. Any time the LLM trips on something and we figure it out, I ask it to document its learnings in a markdown doc, and voilà, it can do it correctly from then on.
1 reply →
> https://pine.town
how many prompts did it take you to make this?
how did you make sure that each new prompt didn't break some previous functionality?
did you have a precise vision for it when you started or did you just go with whatever was being given to you?
Judging by the site, they don't have insightful answers to these questions. It's broken with weird artifacts, errors, and amateurish console printing in PROD.
https://i.ibb.co/xSCtRnFJ/Screenshot-2025-11-25-084709.png
https://i.ibb.co/7NTF7YPD/Screenshot-2025-11-25-084944.png
I definitely don't have insightful answers to these questions, just the ones I gave in the sibling comment an hour before yours. How could someone who uses LLMs be expected to know anything, or even be human?
Alas, I did not realize I was being held to the standard of having no bugs under any circumstance, and printing nothing to the console.
I have removed the amateurish log entries, I am pitiably sorry for any offense they may have caused. I will be sure to artisanally hand-write all my code from now on, to atone for the enormity of my sin.
It also doesn't seem to work right now.
1 reply →
> how many prompts did it take you to make this?
Probably hundreds, I'd say.
> how did you make sure that each new prompt didn't break some previous functionality?
For the backend, I reviewed the code and steered it to better solutions a few times (fewer than I thought I'd need to!). For the frontend, I only tested and steered, because I don't know much about React at all.
This was impossible with previous models, I was really surprised that Codex didn't seem to completely break down after a few iterations!
> did you have a precise vision
I had a fairly precise vision, but the LLM made some good contributions. The UI aesthetic is mostly the LLM, as I'm not very good at that. The UX and functionality are almost entirely me.
did you not run into this problem described by Ilya below?
https://www.youtube.com/watch?v=aR20FWCCjAs&list=PLd7-bHaQwn...
this has been my experience purely vibecoding. i am surprised it works well for others.
btw, the current production bug: how did you discover it, and why did it slip out? looks like the site wasn't working at all when you posted that comment?
4 replies →
It's not really any different in my experience
Stochastic parrot? Autocomplete on steroids? Fancy autocorrect? Bullshit generator? AI snake oil? Statistical mimicry?
You don't hear that anymore.
Feels like whole generation of skeptics evaporated.
I think the stochastic part is true and useless. It can be applied to anyone or anything. Yes, the models give you probabilities, but any algorithm gives you probabilities (only zero or one for deterministic ones). You can definitely view the human mind as a complex statistical model of the world.
Now, that being said, do I think they are as good as a skilled human on most things? No, I don't. My trust issues have increased after the GPT-5 presentation. The very first question was to showcase its "PhD-level" knowledge, and it gave a wrong answer. It just happened to be in a field I know enough about to notice, but most didn't.
So, while I think they can be considered as having some form of intelligence, I believe they have more limits than a lot of people seem to realise.
I certainly hold those opinions still, because the models still have yet to prove they are anything worth a person's time. I don't bother posting that because there's no way an AI hype person and I are ever going to convince each other, so what's the point?
The skeptics haven't evaporated, they just aren't bothering to try to talk to you any more because they don't think there's value in it.
15 replies →
> Feels like whole generation of skeptics evaporated.
https://www.youtube.com/watch?v=aR20FWCCjAs&list=PLd7-bHaQwn...
Ilya Sutskever this week.
3 replies →
Maybe your bubble flew away from those voices? I see them all the time, and am glad.
still haven't seen anything proving it wasn't autocomplete on steroids or statistical mimicry
It is all those things.
The Bitter Lesson is that, with enough VC-subsidised compute, those things are useful.
Those echoes have grown louder over the past year or so. The only way you've heard less of it is if you've buried your head in the sand.
It is all those things. It consistently fails to make truly novel discoveries, everything it does is derived from something it trained on from somewhere.
No point in arguing about it though with true believers, they will never change their minds.
Have you tried Gemini 3 yet? I haven't done any coding with it, but on other tasks I've been impressed compared to gpt 5 and Sonnet 4.5.
It's very good, but it feels kind of off the rails compared to Sonnet 4.5. At least with Cursor, it does strange things like putting its reasoning in comments that are about 15 lines long, deleting 90% of a file for no real reason (especially when context is reaching capacity), and repeating the same error I just told it not to make.
The computer science field is going to be an absolute shitshow within 5 years (it already kinda is). On one side you'll have ADHD dog attention span zoomers trying out all these nth party model apis and tools every 5 seconds (switching them like socks, insisting the latest one is better, but ultimately producing the same slop) and on the other side you'll have all these applied math gurus squeezing out the last bits of usable AI compute on the planet... and nothing else.
We used to joke that "the internet was a mistake", making fun of the bad parts... but LLMs take the fucking cake. No intelligent beings, no sentient robots, just unlimited amounts of slop.
The tech basically stopped evolving right around the point of being good enough for spam and slop, and it isn't going any further: no cures, no new laws of physics or math, nothing else being discovered by these things. All AI use in science I can see is based on finding patterns in data, not intelligent thought (as in novel ideas). What a bust.
Completely disagree. What I see agentic coding tools do in combination with LLMs is seriously mind-blowing. I don't care how much knowledge is compressed into an LLM; what is way more interesting is what it does when it misses some knowledge. I see it come up with a plan to create the knowledge by running an experiment (running a script, sometimes asking me to run a script or try something), evaluating the output, and then replanning based on the output. Full plan-do-check-act. Finding answers systematically to things you don't know is way more impressive than remembering lots of stuff.
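A rough sketch of that plan-do-check-act loop (the `llm` and `runScript` functions here are hypothetical stand-ins supplied by the caller, not any real agent API):

```javascript
// Hypothetical sketch of an agent's plan-do-check-act loop.
// `llm` and `runScript` are caller-supplied stand-ins.
function agentLoop(goal, llm, runScript, maxIterations = 5) {
  let context = `Goal: ${goal}`;
  for (let i = 0; i < maxIterations; i++) {
    const plan = llm(`Plan the next step given:\n${context}`); // Plan
    const output = runScript(plan);                            // Do
    const verdict = llm(`Goal achieved? Output:\n${output}`);  // Check
    if (verdict === "DONE") return output;
    // Act: feed the observed output back into the context and replan
    context += `\nTried: ${plan}\nObserved: ${output}`;
  }
  return null; // gave up after maxIterations
}
```

The interesting part is exactly what the comment describes: each iteration's observed output becomes new knowledge for the next planning step.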
I don't see a big difference from humans; we say many unreasonable things too, so validation is necessary. Whether you use the internet, books, or AI, it is your job to test their validity. Anything can be bullshit, whether written by a human or an AI.
In fact, I fear that humans optimize for attention and cater to the feed-ranking algorithm too much, while AI is at least trying to do a decent job. But with AI, it is the responsibility of the user to guide it; what the AI does depends on what the user does.
9 replies →
The worst part is when the AI spits out dogshit results: people show up at lightspeed in the comments to say "you're not using it right" / "try this other model, it's better".
Anecdotally, the people I see most excited about AI are the people who don't do any fucking work. I can create so much value with plain ol' for-loop-style automation in my niche, and we're still nowhere near the limit of what we can do with automation, that I don't give a fuck what AI can do. Bruh, in Windows 10 copy and fucking paste doesn't work for me anymore, but instead of fixing that they're adding AI.
4 replies →
The LLM only reflects the input it's fed. If the results are unintelligent, then so is the input.
It's been three years of amazing use cases and discoveries, and in those same years we got things like Ozempic. You can be skeptical of all the hyped things that are said that may be exaggerated without negating the good side.
3 replies →
Only a tiny bit, but I should. When you say GPT-5, do you mean 5.1? Codex or regular?
Sorry, yeah, 5.1 regular chatbot.
1 reply →
imo don't waste your time coding with Gemini 3. Perhaps worth it if it's something Claude isn't helping with, as Gemini 3's reasoning is supposedly very good.
Maybe the wtfs per line are decreasing because these models aren't saying anything interesting or original.
No, it's because they write correct code. Why would I want interesting code?
Oh, my bad. I still had someone's comment about the model writing PhD-level papers in my head and didn't realize you were talking about code.
Fully agree.
:D made my day