
Comment by dogcomplex

14 days ago

Right. And any particular question people think AIs are bad at also has a comments section of people who have run better-crafted prompts that do the job just fine. The consensus is heading more towards "well damn, actually LLMs might be all we need" than "LLMs are just a stepping stone" - but either way, that's fine, cuz plenty of more advanced architectures are on their way (especially error-correction / consistency frameworks).

I don't believe there are any significant academic critiques doubting this. There are a lot of armchair hot takes, and perceptions that this stuff isn't improving up to their expectations, but those are pretty divorced from any rigorous analysis of the field, which is still improving at a staggeringly fast rate compared to any other field of research. Ain't no wall, folks.

“Crafting a better prompt” is often simply spinning an RNG again and again until you end up with an answer that happens to be good enough.
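In pseudocode, the whole "technique" amounts to this. A minimal sketch, assuming `generate` and `score` are hypothetical stand-ins for a model call and some quality check:

```python
import random

def generate(prompt: str, temperature: float = 0.8) -> str:
    # Hypothetical stand-in for an actual LLM call; randomness models sampling.
    return f"answer-{random.randint(0, 9999)} to: {prompt}"

def score(answer: str) -> float:
    # Hypothetical stand-in for a verifier, heuristic, or human judgment.
    return random.random()

def best_of_n(prompt: str, n: int = 10) -> str:
    # "Spin the RNG" n times, keep whichever sample happens to score best.
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=score)

print(best_of_n("Summarize the article."))
```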

In the real world, if you know the correct answer, you don't need to ask the question. A self-driving car that needs you to pay attention isn't self-driving.

Any system can give a canned response; the value of AI lies entirely in its ability to handle novelty without hand-holding. And none of these systems actually does that even vaguely well in practice; they just provide responses that are vaguely close to correct.

If I ask for a summary of an article and the summary gets anything wrong, that's a 0, because now I need to read the article to know what it actually said. Arguably the value here is negative.

  • Prompt crafting only matters when demonstrating the current edge of capabilities - next iteration, you can get away with a much more general/primitive prompt. Those prompts are just people countering the "gotcha" arguments levied against LLMs, showing that even now those tasks can be done with a good prompt. Any time it's a practical concern, though, just wait a little longer for the next model to smooth it out.

    You don't have to pay attention, that's the point. You can code without reading code now. Sure, you gotta tell it what the app looks like with each iteration - but again, that's temporary til the next model comes out with good enough vision to assess that itself. None of this is permanently planned to require human interaction - it's just early days, and these models are progressing through mediums one at a time.

    They're not canned responses either. They're bespoke mixtures of all the various elements of the current environment/context, translated into an answer. They certainly handle novelty - that's the whole point. They handle entire mediums - text and images - to expert levels. I think you're just being greedy for more, here.

    As for consistency and avoiding error? There are benchmarks for that. There are error-checking methods. Those are all steadily improving too, and are already reliably consistent on easier topics/mediums. It would be foolish to think that's innately impossible for AI on the remaining ones.
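    One concrete example of such a method is self-consistency voting: sample several answers independently and keep the majority. A minimal sketch, assuming `generate` is a hypothetical stand-in for a model call:

    ```python
    from collections import Counter
    import random

    def generate(prompt: str) -> str:
        # Hypothetical stand-in for an actual LLM call; sampling is noisy on purpose.
        return random.choice(["42", "42", "42", "41"])

    def self_consistent_answer(prompt: str, n: int = 15) -> str:
        # Majority vote across independent samples filters out one-off errors.
        votes = Counter(generate(prompt) for _ in range(n))
        return votes.most_common(1)[0][0]

    print(self_consistent_answer("What is 6 * 7?"))
    ```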