Comment by InsideOutSanta

7 hours ago

> I seriously don't know all this big hullabaloo about one shot prompting.

It's a relatively objective way of testing LLMs, and I think it's pretty representative of how strong models are overall.

The outcome of this test mirrors how GLM 5.2 and Opus 4.8 work for me: they're both similarly capable of fully executing a given task, but Opus tends to have a bit more "taste" in how it handles unstated details or implicit requirements.

> what you'll get is a series of assumptions made by the model

Yes, but that's why we use these models in the first place. We don't want to explicitly write down all the details because that would mean writing code. So we write a higher-level, human-language spec, and let the LLM fill in the blanks. The question is how good they are at doing that.
