Comment by claytongulick

2 days ago

I keep seeing this: the "for now" comments, the claims about how much better it's getting with each model.

I don't see it in practice though.

The fundamental problem hasn't changed: these things are not reasoning. They aren't problem-solving.

They're pattern matching. That gives the illusion of usefulness for coding when your problem is very similar to ones already well represented in the training data, but it falls apart as soon as you need any sort of depth or novelty.

I haven't seen any research or theories on how to address this fundamental limitation.

The pattern matching thing turns out to be very useful for many classes of problems, such as translating speech into a structured JSON format, OCR, etc. But it isn't particularly useful for reasoning problems like math or coding (non-trivial problems, of course).
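To make the structured-JSON case concrete, here's a throwaway sketch of that kind of extraction against the Anthropic Messages API. The prompt, schema, transcript, and model id are all invented for illustration; this is just the shape of the use case:

```rust
// Toy sketch: turn a speech transcript into structured JSON via the
// Anthropic Messages API. Assumed Cargo deps: reqwest (with the "json"
// feature), tokio (with "full"), serde_json. Needs ANTHROPIC_API_KEY set.

use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let transcript = "uh, book me two seats to Lisbon next friday, morning if possible";
    let body = json!({
        "model": "claude-3-5-sonnet-20241022", // illustrative; use a current model id
        "max_tokens": 256,
        "messages": [{
            "role": "user",
            "content": format!(
                "Turn this transcript into JSON with keys intent, destination, \
                 seats, date_hint. Reply with only the JSON.\n\n{transcript}"
            )
        }]
    });
    let resp: serde_json::Value = reqwest::Client::new()
        .post("https://api.anthropic.com/v1/messages")
        .header("x-api-key", std::env::var("ANTHROPIC_API_KEY")?)
        .header("anthropic-version", "2023-06-01")
        .json(&body)
        .send()
        .await?
        .json()
        .await?;
    // The reply text sits in content[0].text; a real pipeline would validate
    // it against a schema rather than trusting the model's output shape.
    println!("{}", resp["content"][0]["text"]);
    Ok(())
}
```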

I'm pretty excited about the applications of AI overall and its potential to reduce human drudgery across many fields. I just think generating code in response to prompts is a poor choice of LLM application.

> I don't see it in practice though.

Have you actually tried the latest agentic coding models?

Yesterday I asked Claude to implement a working web-based email client from scratch in Rust that can talk to a JMAP-based mail server. It did. It took about 20 minutes. The first version had a few bugs - for example, it was polling for mail instead of streaming emails in. But after prompting it to fix the obvious ones, I now have a working email client.
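For anyone curious what the polling-vs-streaming bug means: JMAP (RFC 8620) gives servers a push channel over EventSource precisely so clients don't have to poll. A rough sketch of the two approaches - not the generated client's code - with a made-up accountId and trimmed error handling:

```rust
// Assumed Cargo deps: reqwest (features "json" + "stream"), tokio ("full"),
// futures-util, serde_json.

use futures_util::StreamExt;
use std::time::Duration;

// Polling: repeatedly run Email/query and diff the queryState string.
// Wasteful, and new mail only shows up on the next tick.
async fn poll_for_mail(client: &reqwest::Client, api_url: &str, token: &str) -> reqwest::Result<()> {
    let mut last_state = String::new();
    loop {
        let body = serde_json::json!({
            "using": ["urn:ietf:params:jmap:core", "urn:ietf:params:jmap:mail"],
            "methodCalls": [["Email/query", { "accountId": "a1" }, "0"]]
        });
        let resp: serde_json::Value = client
            .post(api_url)
            .bearer_auth(token)
            .json(&body)
            .send().await?
            .json().await?;
        let state = resp["methodResponses"][0][1]["queryState"]
            .as_str().unwrap_or("").to_string();
        if state != last_state {
            last_state = state;
            // Email/get the new messages here.
        }
        tokio::time::sleep(Duration::from_secs(30)).await;
    }
}

// Push: subscribe to the session's eventSourceUrl (RFC 8620 section 7.3)
// and react to StateChange events as the server streams them in.
async fn stream_mail(client: &reqwest::Client, event_source_url: &str, token: &str) -> reqwest::Result<()> {
    let resp = client.get(event_source_url).bearer_auth(token).send().await?;
    let mut stream = resp.bytes_stream();
    while let Some(chunk) = stream.next().await {
        // Sketch only: a real client needs a proper SSE parser (chunks can
        // split lines) and should resubscribe on disconnect.
        for line in String::from_utf8_lossy(&chunk?).lines() {
            if let Some(data) = line.strip_prefix("data: ") {
                println!("StateChange: {data}"); // refetch only what changed
            }
        }
    }
    Ok(())
}
```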

It's missing lots of important features - for example, it doesn't render HTML emails correctly - and the UI looks incredibly basic. But it wrote the whole thing in 2.5k lines of Rust from scratch, and it works.

This wasn't possible at all a couple of years ago. Back then I couldn't get ChatGPT to port a single source file from Rust to TypeScript without it running out of context space and introducing subtle bugs into my code. And it was rubbish at Rust - it would introduce borrow checker problems, get stuck, and repeatedly fail to get its own code to compile. Now Claude can write a whole web-based email client in Rust from scratch, no worries. I did need to manually point out some bugs in the program - Claude didn't test its email client on its own. There's room for improvement, for sure. But the progress is shocking.

I don't know how anyone who's actually pushed these models can claim they haven't improved much. They're light-years ahead of where they were a few years ago. Have you actually tried them?

  • Honestly, I really did try this for a while, mostly in response to comments like this one, and with some degree of excitement.

    I've been disappointed every time.

    I do use LLMs for summarization and as "a better Google", and I'm constantly confronted with how inaccurate they are.

    I haven't tried them on code in the past couple of months because, to be completely honest, I just don't care.

    I enjoy my craft. I enjoy puzzling and thinking through better ways of doing things, and I like being confronted with a tedious task because it pushes me toward finding more optimal approaches.

    I haven't seen any research that justifies the use of LLMs for code generation, even in the short term, and I've seen plenty that supports my concerns about the mid-to-long-term impact on quality and skills.

    So the TL;DR version is: nah.