Comment by sarchertech

8 hours ago

When I can look at a prompt and predict what the code it outputs will look like to some high degree of accuracy.

I mostly don’t think that is possible though because there’s too much ambiguity in natural language. So the answer is probably when AI is close enough to AGI that I can treat it like an actual trusted senior engineer that I’m delegating to.

Can you look at code today and predict what assembly a compiler will output to some high degree of accuracy? Do you avoid certain classes of compiler optimization so you can more accurately predict compiler output? I recall a time when many compilers would remove a bzero() call in situations where you were trying to zero out a buffer that held sensitive data - it’s why we have APIs like https://github.com/MicrosoftDocs/win32/blob/docs/desktop-src.... I ran into a huge performance regression because I didn’t have all the edge cases of named return value optimization in mind when I refactored some code.

There’s ambiguity in the x86 specification, such that you can execute a single instruction and get different results on Intel vs AMD. See the rcpss instruction, for example.

I get that LLMs are categorically different, and they’re absolutely not as reliable as compilers are, but compilers are also not as reliable as they seem. And even less predictable, IMO.

  • > Can you look at code today and predict what assembly a compiler will output to some high degree of accuracy?

    Yeah, a fair amount of the time. And when I can’t, I can predict what an unoptimized version of the assembly will look like.

    And I know that the optimized assembly has a very very high likelihood of being semantically identical. And I know enough of the edge cases where the differences matter to know when I need to actually verify what’s coming out of the compiler.

    Prompt instability (not even worrying about non-determinism) means that asking for the exact same thing in very slightly different ways, or with slightly different context, will give you wildly different outputs that are not even close to semantically identical.