Comment by vntok

6 hours ago

> Are LLMs that super reliable in their output already with all the guardrails around?

Well, what is your definition of "super reliable in the output", and is it a quantifiable/measurable target or just a feeling?

Is it "more than humans", "more than senior developers", "almost perfect", "perfect"?

> It might behave differently than specified and a human is required to validate every output carefully or else.

Sure, just like meatbag developers. All the security flaws AI finds today were introduced years or decades ago by humans and, as far as we know, haven't been found by humans in all that time.

It is a quantifiable thing, not a feeling.

Between ten thousand runs of:

```
const int MAX_COUNT = 10000;

printf("I'll count up to %d", MAX_COUNT);
for(int i=1; < MAX_COUNT; i++)
    printf("I'm now counting %d", i);
```

And of the following prompt:

```
You'll count to 10,000. At the start say "I'll count up to 10,000" and then
for each number say "I'm now counting <number>" and do not say anything else.
Do not miss numbers in between.
```

Which one is going to produce 100% correct results over 10,000 runs of each?

Now don't give me "these are different tools". We all know. I'm talking about reliability and predictability.

  • Well, for starters, the program you wrote is wrong (very unreliable) 100% of the time (very predictable)... so you just got your answer, I guess.

    In any case, most, if not nearly all, of the top-100 LLMs will answer your prompt with code that does what you intended the first program to do. Only they'll actually code it properly, of course.
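
    For reference, here is one way a "coded properly" version might look, assuming the intent was to count 1 through 10,000 inclusive. The original snippet drops the loop variable from its condition (`i=1; < MAX_COUNT` doesn't compile), would stop at 9,999 even if fixed with `<`, and omits newlines. The `count_up_to` helper name is just for illustration.

    ```c
    #include <stdio.h>

    enum { MAX_COUNT = 10000 };

    /* Counts from 1 up to and including max, printing one line per number.
       Returns how many "I'm now counting" lines were emitted, so the
       behavior is easy to verify. */
    static int count_up_to(int max, FILE *out)
    {
        int emitted = 0;
        fprintf(out, "I'll count up to %d\n", max);
        for (int i = 1; i <= max; i++) {   /* <= repairs the off-by-one */
            fprintf(out, "I'm now counting %d\n", i);
            emitted++;
        }
        return emitted;
    }

    int main(void)
    {
        count_up_to(MAX_COUNT, stdout);
        return 0;
    }
    ```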