
Comment by RHSeeger

2 days ago

I'm a bit confused by this because a given set of inputs can produce a different output, and different behaviors, each time it is run through the AI.

> By regenerable, I mean: if you delete a component, you can recreate it from stored intent (requirements, constraints, and decisions) with the same behavior and integration guarantees.

That statement just isn't true. And, as such, you need to keep track of the end result... _what_ was generated. The why is also important, but not sufficient.

Also, and unrelated, the "reject whitespace" part bothered me. It's perfectly acceptable to have whitespace in an email address (in a quoted local part, per RFC 5321).
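
For anyone curious, here's a minimal Python sketch of that point. The regex is a deliberate simplification of RFC 5321 syntax (the names `DOT_ATOM`, `QUOTED`, and `naive_reject_whitespace` are just illustrative), but it shows why a blanket "reject whitespace" check wrongly rejects a valid quoted local part:

    import re

    # Toy approximation of RFC 5321 address syntax, not a full validator:
    # a local part is either a dot-atom or a quoted string, and quoted
    # strings may legally contain spaces.
    DOT_ATOM = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~.-]+"
    QUOTED = r'"(?:[^"\\]|\\.)*"'
    DOMAIN = r"[A-Za-z0-9.-]+"
    ADDR = re.compile(rf"^(?:{DOT_ATOM}|{QUOTED})@{DOMAIN}$")

    def naive_reject_whitespace(addr: str) -> bool:
        # The kind of check being objected to: any space means invalid.
        return " " not in addr

    for addr in ['"john smith"@example.com', 'john.smith@example.com']:
        print(addr,
              "| RFC-ish valid:", bool(ADDR.match(addr)),
              "| naive check passes:", naive_reject_whitespace(addr))

The first address is syntactically valid but fails the naive check.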

> I'm a bit confused by this because a given set of inputs can produce a different output, and different behaviors, each time it is run through the AI.

How different the output is each time you generate something from an LLM is a property called 'prompt adherence'. It's not really a big deal in coding LLMs, but in image generation some of the newer models (Z Image Turbo, for example) give virtually the same output every time if the prompt doesn't change, to the point where some users claim it's actually a problem, because most of the time you want some variety in image gen. It should be possible to tune a coding LLM to give the same response every time.
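
For what it's worth, this is easy to see with an open-weight model, where you control decoding. A minimal sketch using Hugging Face transformers (the model name is just an example; swap in any causal LM): greedy decoding removes sampling randomness entirely, though bit-exactness across different hardware or driver versions is still not guaranteed because of floating-point effects.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Greedy decoding (do_sample=False): same weights + same input tokens
    # -> same output tokens on the same hardware/software stack.
    name = "Qwen/Qwen2.5-Coder-0.5B-Instruct"  # example; any causal LM works
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    inputs = tok("Write a function that reverses a string.", return_tensors="pt")
    out = model.generate(**inputs, do_sample=False, max_new_tokens=64)
    print(tok.decode(out[0], skip_special_tokens=True))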

  • Even if you have deterministic LLMs (which is absolutely something that can be done), you still need to pin a specific version to get that. That might work in the short term, but 10 years from now you're not going to want to be using a model from today.

    • > Even if you have deterministic LLMs (which is absolutely something that can be done),

      Note, when Fabrice Bellard made his LLM-based text compressor, he had to make sure it was deterministic. It would be terrible if it slightly corrupted files in different ways each time it decompressed (there's a toy sketch of this failure mode at the end of the thread).

    • You cannot pin a specific version, even today, if you are using a vendor LLM: the versions are transient, and vendors are constantly making micro-optimizations and tweaks.

  • If that is true, and a given history of prompts combined with a given model always gives the same code, then you have invented what's called a compiler: take human-readable text and convert it into machine code. Which means we have a much higher-level language than before, and your prompts become your code.

  • > How different the output is each time you generate something from an LLM is a property called 'prompt adherence'. It's not really a big deal in coding LLMs, (...)

    I strongly disagree. Nowadays most LLMs support updating context with chat history. This means the output of an LLM will be influenced by what prompts you have been feeding it. You can see glaring changes in what a coding agent does based on what topics you have researched.

    To take the example a step further, some LLMs even update their system prompts to include context such as where you are in the world at that precise moment and the time of year. Once I had ChatGPT generate a complete example project based around an event that was taking place in a city I happened to be cruising through at that moment.

  • > It's not really a big deal in coding LLMs

    Challenge. Given that you're not nailing down _EVERY_ detail in your descriptions (because that's not possible), the results can vary a fair amount, especially as the model changes over time. And anything else in the context matters: I've gotten different results from the exact same prompt 60 minutes apart after reverting the code, because the context still contained some failed attempts to get it to fix what it broke.
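
As for the compression point above: here is a toy Python sketch of why the model has to be deterministic. This is not Bellard's actual arithmetic-coding scheme, just an illustrative rank-based coder; encoder and decoder share a predictive model, and even tie-breaking nondeterminism in that model corrupts the round trip.

    import random

    ALPHABET = "abcdefgh "

    def model(context, jitter=0.0):
        # Toy model: rank symbols by how recently they appeared in context.
        # 'jitter' simulates a nondeterministic model (e.g., unstable ties).
        scores = {s: context.rfind(s) + random.uniform(0, jitter) for s in ALPHABET}
        return sorted(ALPHABET, key=lambda s: -scores[s])

    def compress(text, jitter=0.0):
        context, ranks = "", []
        for ch in text:
            ranks.append(model(context, jitter).index(ch))
            context += ch
        return ranks  # each symbol stored as its rank under the model

    def decompress(ranks, jitter=0.0):
        context = ""
        for r in ranks:
            context += model(context, jitter)[r]
        return context

    msg = "a bad cafe had a fed deaf cab"
    print(decompress(compress(msg)) == msg)        # True: deterministic model
    print(decompress(compress(msg), jitter=1e-9))  # garbled: model disagrees

With jitter at zero, the decoder reproduces the encoder's predictions exactly and the round trip is lossless; with even a 1e-9 jitter, tie-breaking differs between the two runs and the output silently corrupts, which is exactly the failure mode a deterministic LLM compressor has to rule out.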