
Comment by gonzobonzo

1 month ago

> It's generally accepted as true that passing any one of them does not constitute fully general intelligence but the difficult part has been finding things that they cannot do.

It's pretty easy to find things they can't do. They lack a level of abstraction that even small mammals have, which is why they constantly fail at things like spatial awareness.

The difficult part is creating an intelligence test that they score badly on. But that's more of an issue with treating intelligence tests as if they're representative of general intelligence.

It's like having difficulty finding a math problem that Wolfram Alpha would do poorly on. If a human were able to solve all of these problems as well as Wolfram Alpha, they would be considered a genius. But Wolfram Alpha being able to solve those questions doesn't show that it has general intelligence, and trying to come up with more and more complicated math problems to test it with doesn't help us answer that question either.

yeah like ask them to use tailwindcss.

most LLMs actually fail that task, even in agent modes, and there's a really simple reason for that: tailwindcss changed its packages/syntax.

and this is basically the kind of test that should be focused on: change things and see if the LLM can find a solution on its own. (...it can't)
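
to make that concrete: in tailwindcss v4 the old `@tailwind` directives were replaced by a single import, and most configuration moved from `tailwind.config.js` into CSS, so a model trained mostly on v3 code keeps emitting the old syntax. a rough sketch of the before/after (simplified from the v4 upgrade notes; the `--color-brand` token is just a made-up example):

```css
/* v3 entry stylesheet */
@tailwind base;
@tailwind components;
@tailwind utilities;

/* v4 entry stylesheet: one import, config is CSS-first via @theme */
@import "tailwindcss";
@theme {
  /* hypothetical custom token, for illustration only */
  --color-brand: #1e40af;
}
```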

  • And if I take my regular ordinary commuter car off the paved road and onto the dirt, I get stuck in the mud. That doesn't mean the whole concept of cars is worthless; instead we paved roads all over the world. But for some reason with LLMs, the attitude is that their being unable to go offroad means everyone's totally deluded and we should give up on the whole idea.

    • I'm not against LLMs. I'm just not a fan of people who say we'll have AGI/the singularity soon. I basically dropped Google for searching things about code, because even when it fails to get stuff right, I can ask for the doc source and force it to give me a link or the exact example/wording from the docs.

      But using it correctly means that junior developers in particular face a much higher barrier to entry.

    • I don't think your analogy works for the tailwind situation, and there is no whole idea to give up on anyway. People will still be researching this hyper-complicated matrix-multiplication thing, i.e. LLMs, for a very long time.

      Personally, I see the tailwind example as an argument against one specific use case: LLM-assisted/driven coding, which I also believe is LLMs' best shot at being actually productive in a non-academic setting.

      If I have a super-nice RL-ed (or even RLHF-ed) coding model and weights that are working for me (in whatever sense the word "working" means), and changing some function names can actually f* it up badly, then that is very much not good. I hope I never have to work with a "programmer" who is super-reluctant to reorganize the code just to protect their pet LLM.

  • How do they do if you include the updated docs in the context?

    • You would need to remove the older docs first, and even then it will hallucinate. Forcing the LLM to open the doc webpage produces some hallucinations as well. The more context you provide, the worse it gets. And to be fair, most LLMs could migrate Bootstrap to tailwindcss v3 without too much trouble (of course it fails to change tags when building CSS classes from multiple strings, but that's fine). And I tried a lot of models. It just broke from one week to the next.
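
      (the "classes from multiple strings" failure is mostly down to how tailwind's scanner works: it reads your source as plain text and only picks up complete class strings, so anything concatenated at runtime is invisible to it. a toy sketch of that scanning behavior, not tailwind's actual implementation:)

      ```javascript
      // Toy sketch (NOT tailwind's real scanner) of why a text-scanning
      // class extractor misses class names built from multiple strings.
      const source = `
        const ok  = "text-red-600";            // complete literal: scanner finds it
        const bad = "text-" + color + "-600";  // built at runtime: only fragments
      `;

      // Stand-in scanner: collect double-quoted string literals verbatim.
      const literals = [...source.matchAll(/"([^"]*)"/g)].map((m) => m[1]);

      console.log(literals.includes("text-red-600"));  // true
      console.log(literals.includes("text-blue-600")); // false: never appears whole
      ```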
