Comment by brundolf

2 days ago

I find this type of problem is what current AI is best at: the actual logic isn't very hard, but it requires pulling together and assimilating a huge amount of fuzzy, known information from various sources.

They are, after all, information-digesters

Which also fits with how it performs at software engineering (in my experience): great at boilerplate code, tests, simple tutorials, and common puzzles, but bad at novel and complex things.

  • This is also why I buy the apocalyptic headlines about AI replacing white-collar labor: most white-collar employment is creating the same things (a CRUD app, a landing page, a business plan) with a few custom changes

    Not a lot of labor is actually engaged in creating novel things.

    The marketing plan for your small business is going to be the same as the marketing plan for every other small business with some changes based on your current situation. There’s no “novel” element in 95% of cases.

    • I don't know that most software engineers build toy CRUD apps all day. I have found the state-of-the-art models to be almost completely useless in a real, large codebase. I tried the latest Claude and Gemini, since the company provides them, but they couldn't even write tests that pass after over a day of trying.

      12 replies →

    • I agree, but the reason it won't be an apocalypse is the same reason economists get most things wrong: it's not an efficient market.

      Relatively speaking, we live in a bubble: there are still broad swaths of the economy that operate with pen and paper, and another broad swath that migrated off 1980s-era AS/400s only in the last few years. Even if we had ASI available literally today (and we don't), I'd give it 20-30 years until the guy who operates your corner market or the local auto repair shop has any use in the world for it.

      2 replies →

    • I wonder what the impact will be when replicating the same thing becomes machine readable with near 100% accuracy.

  • Definitely matches my experience as well. I've been working away on a very quirky, non-idiomatic 3D codebase, and LLMs are a mixed bag there. Y is down, there's no perspective distortion or Z buffer, there are no meshes; it's a weird place.

    It's still useful for saving me from writing 12 variations of x1 = sin(r2) - cos(r1) while implementing some geometric formula, but it's absolutely awful at understanding how those fit into a deeply atypical environment. I also have to put blinders on it: giving it too much context just throws it back into that typical 3D rut and has it trying to slip in perspective distortion again.
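
    Something like this toy sketch (invented names, nothing from the real codebase) captures the conventions that trip it up:

    ```typescript
    // Toy sketch of the odd conventions: world Y points down, matching
    // screen space, and "projection" just drops Z. No perspective divide,
    // no depth buffer, no meshes.
    type Vec3 = { x: number; y: number; z: number };
    type Vec2 = { x: number; y: number };

    function project(p: Vec3): Vec2 {
      // No divide-by-z and no Y flip: exactly the two "fixes" an LLM
      // keeps trying to slip back in.
      return { x: p.x, y: p.y };
    }

    // The kind of repetitive trig variation it IS genuinely good at churning out.
    function offsetX(r1: number, r2: number): number {
      return Math.sin(r2) - Math.cos(r1);
    }
    ```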

    • Yeah, I have the same experience. I've done some work on novel realtime text collaboration algorithms. For optimisation, I use some somewhat bespoke data structures (e.g. an order-statistic tree storing substring lengths, with internal run-length encoding in the leaf nodes).

      ChatGPT is pretty useless with this kind of code. I got it to help translate a run-length-encoded B-tree from Rust to TypeScript. Even with a reference, it still introduced a bunch of new bugs, some of them very subtle.
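
      For a flavour of the structure, here's a stripped-down sketch (invented names, far simpler than the real thing):

      ```typescript
      // Leaves hold run-length-encoded entries; each internal node caches
      // the total length beneath it, so finding the leaf containing a
      // global character offset is one O(log n) descent.
      type Run = { length: number; inserted: boolean };
      type Leaf = { kind: "leaf"; runs: Run[] };
      type Internal = { kind: "internal"; children: TreeNode[]; totalLength: number };
      type TreeNode = Leaf | Internal;

      function lengthOf(node: TreeNode): number {
        return node.kind === "leaf"
          ? node.runs.reduce((sum, r) => sum + r.length, 0)
          : node.totalLength;
      }

      // Order-statistic descent: locate the leaf containing `offset`.
      function findLeaf(node: TreeNode, offset: number): { leaf: Leaf; rest: number } {
        if (node.kind === "leaf") return { leaf: node, rest: offset };
        for (const child of node.children) {
          const len = lengthOf(child);
          if (offset < len) return findLeaf(child, offset);
          offset -= len;
        }
        throw new Error("offset out of range");
      }
      ```

      The subtle bugs were exactly in this kind of bookkeeping: splitting runs and keeping the cached lengths in sync after edits.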

      1 reply →

  • Yep. But it's wonderful at aggregating details from twelve different man pages to write a shell script I didn't even know was possible to write using the system utils.

    • Is it 'only' "aggregating details from twelve different man pages", or has it 'studied' (scraped) all (accessible) code on GitHub/GitLab/StackExchange/etc. and any other publicly available coding repositories on the web (and, in MS's case, the GitHub it owns)? Together with descriptions of what is right and what is wrong.

      I use it for code, and I only do fine-tuning. When I want something that has clearly never been done before, I 'talk' to it and train it on which method to use; to a human brain some suggestions/instructions are clearly obvious (use an Integer and not a Double, or use Color, not Weight). So I do 'teach' it as well when I use it.

      Now, I imagine that when 1 million people use LLMs to write code and fine-tune the code, we are inherently training the LLMs to write even better code.

      So it's not just "..different man pages.." but "the finest coding brains (excluding mine)" tweaking and training it.

How often are we truly writing actual novel programs that are complex in a way AI does not excel at?

    There are many types of complexity, and many things that are complex for a human coder are trivial for an AI with its skill set.

    • Depends on the field of development you do.

      CRUD backend app for a business in a common sector? It's mostly just connecting stuff together (though I would argue that an experienced dev with a good stack takes less time writing it directly than painstakingly explaining it to an LLM in an inexact human language).
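
      For illustration, a hypothetical endpoint (Express-style, all names invented); the whole job is wiring the request to the data layer and back:

      ```typescript
      import express from "express";
      import { randomUUID } from "node:crypto";

      // Hypothetical CRUD endpoints (invented names): plumbing between
      // HTTP and a data store, much the same in every app of this kind.
      const app = express();
      app.use(express.json());

      const db = new Map<string, { id: string; name: string }>(); // stand-in data layer

      app.post("/customers", (req, res) => {
        const customer = { id: randomUUID(), name: req.body.name };
        db.set(customer.id, customer);
        res.status(201).json(customer);
      });

      app.get("/customers/:id", (req, res) => {
        const customer = db.get(req.params.id);
        if (!customer) return res.status(404).json({ error: "not found" });
        res.json(customer);
      });

      app.listen(3000);
      ```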

      Some R&D stuff, or even debugging any kind of code? It's almost useless, as it would require deep reasoning, where these models absolutely break down.

      4 replies →

  • > novel and complex things

    a) What's an example?

    b) Is 90% (or more) of programming mundane, and not really novel?

    • If you'd like a creative waste of time, make it implement any novel algorithm that mixes the idea of X with Y. It will fail miserably, double down on the failure and hard troll you, run out of context and leave you questioning why you even pay for this thing. And it is not something that can be fixed with more specific training.

      8 replies →

I've been surprised that so much focus has been put on generative uses for LLMs and similar ML tools. It seems to me they have a much better chance of being useful when tasked with interpreting given information rather than generating something meant to appear new.

  • Yeah, the "generative" in "generative AI" gives a little bit of a false impression. I like Laurie Voss's take on this: https://seldo.com/posts/what-ive-learned-about-writing-ai-ap...

    > Is what you're doing taking a large amount of text and asking the LLM to convert it into a smaller amount of text? Then it's probably going to be great at it. If you're asking it to convert into a roughly equal amount of text it will be so-so. If you're asking it to create more text than you gave it, forget about it.

    • This quote sounds clever, but it's very different from my experience.

      I have been very pleased with responses to things like: "explain x", "summarize y", "make up a parody song about A to the tune of B", "create a single page app that does abc".

      The response is 1000x more text than the prompt.

      1 reply →

    • I've had coworkers tell me Copilot works well for refactoring code, which also makes sense in the same vein.

      It's like they wouldn't be so controversial if they hadn't decided to market it as "generative" or "AI"... I assume fundraising valuations would move in line with the level of controversy, though.

FWIW, I do a lot of talks about AI in the physical security domain and this is how I often describe AI, at least in terms of what is available today. Compared to humans, AI is not very smart, but it is tireless and able to recall data with essentially perfect accuracy.

It is easy to mistake the speed, accuracy, and scope of training data for "intelligence", but it's really more like a tireless 5th grader.

  • Something I have found quite amusing about LLMs is that they are computers that don't have perfect recall - unlike every other computer for the past 60+ years.

    That is finally starting to change now that they have reliable(ish) search tools and are getting better at using them.

“best where the actual logic isn’t very hard”?

Yeah, well, it's also one of the top scorers on the Math Olympiads.

  • My guess is that those questions are very typical and follow very normal patterns and use well established processes. Give it something weird and it'll continuously trip over itself.

    My current project is nothing too bizarre; it's a 3D renderer, well-trodden ground. But my project breaks a lot of core assumptions and common conventions, and so every LLM I try to introduce (Gemini 2.5 Pro, Claude 3.7 Thinking, o3) tangles itself up between what's actually in the codebase and the strong pull of what's in the training data.

    I tried layering on reminders and guidance in the prompting, but ultimately I just end up narrowing its view, limiting its insight, and removing even the context that this is a 3D renderer and not just pure geometry.

    • > Give it something weird and it'll continuously trip over itself.

      And so will almost all humans. It's weird how people refuse to ascribe any human-level intelligence to it until it starts to compete with the world's top elite.

      5 replies →

  • LLMs struggle with limited context windows, so as long as the problem can be solved within their small windows, they do great.

    Human neural networks are constantly being retrained, so their effective context window is huge. The LLM may be better at a complex, well-specified 200-line Python program, but the human brain is better at the 1M-line real-world application. It takes some study, though.

LLMs are like a knowledge aggregator. The reasoning models have the potential to be usefully creative, but I have yet to see evidence of it, such as inventing something scientifically novel.

Be that as it may, do not forget that in the pursuit of the most textually plausible output, gaps may be filled in for you.

The mistake, and it's a common one, is in using phrases like "the actual logic" to explain to ourselves what is happening.

It takes a lot of energy to compress the data, and a lot to actually extract something sensible, when you could often just optimize the single problem you have quite easily.