Comment by maccard

1 month ago

You’ve missed my point here - I agree that gen AI has changed everything and is useful, _but_ I disagree that it’s improved substantially - which is what the comment I replied to claimed.

Anecdotally I’ve seen no difference in model changes in the last year, but going from LLM to Claude code (where we told the LLMs they can use tools on our machines) was a game changer. The improvement there was the agent loop and the support for tools.

In 2023 I asked v0.dev to one shot me a website for a business I was working on and it did it in about 3 minutes. I feel like we’re still stuck there with the models.

3 comments

maccard

BeetleB 1 month ago

I've been coding with LLMs for less than a year. As I mentioned to someone in email a few days ago: In the first half, when an LLM solved a problem differently from me, I would probe why and more often than not overrule and instruct it to do it my way.

Now it's reversed. More often than not its method is better than mine (e.g. leveraging a better function/library than I would have).

In general, it's writing idiomatic mode much more often. It's been many months since I had to correct it and tell it to be idiomatic.

Macha 1 month ago

My experience in 2024 AI tools like copilot was if the code compiled first time it was an above average result and I’d need a lot of manual tweaking.

There were definitely languages where it worked better (JS), but if I told people here I had to spend a lot of time tweaking after it, at least half of them assumed I was being really anal about spacing or variable names, which was simply not the case.

It’s still the case for cheaper models (GPT-mini remains a waste of my timetime), but there’s mid level models like Minimax M2 that can produce working code and stuff like Sonnet can produce usable code.

I’m not sure the delta is enough for me that I’d pay for these tools on my own though…

tombert 1 month ago

In my experience it has gotten considerably better. When I get it to generate C, it often gets the pointer logic correct, which wasn't the case three years ago. Three years ago, ChatGPT would struggle with even fairly straightforward LaTeX, but now I can pretty easily get it to generate pretty elaborate LaTeX and I have even had good success generating LuaTeX. I've been able to fairly successfully have it generate TLA+ spec from existing code now, which didn't work even a year ago when I tried it.

Of course, sample size of one, so if you haven't gotten those results then fair enough, but I've at least observed it getting a lot better.