← Back to context

Comment by colonCapitalDee

4 days ago

You can literally click "Show Code"

Yes. "Show Code", not "Show CPU cycles". There's a difference. Writing code is not the same as running code. It looks to you like it ran the code. But you have no proof that it did. I've seen many times LLM systems from companies that claimed that their LLMs would run code and return the output claiming that they ran some code and returned the output but the output was not what the shown code actually produced when run.

  • In my experience, models do not tend to write their own HTML output. They tend to output something like Markdown, or a modified version of it, and they wouldn't be able to write their own HTML that the browser would parse as such.

    • What, in your view, does sending one markup language instead of another markup language tell you about whether the back-end executed some code or only pretended to?

      The front-end display is a representation of what the back-end sends it. Saying "but the back-end doesn't send HTML" is as meaningless as saying that about literally any other SPA website that builds its display from API requests that respond with JSON.

  • Maybe the only way to be sure is to have it generate (not stable diffuse) an image with the value in there.