Comment by potatolicious

4 days ago

> "Wait, so you just tell the LLM the schema, and hope it replicates it verbatim with content filled into it?"

In the early stages of LLMs, yes ("get me all my calendar events for next week and output in JSON format" and pray the format it picks is sane), but nowadays there are specific model features that guarantee output constrained to the schema. The term of art here is "constrained decoding".
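Roughly what that looks like in practice, as a minimal sketch using the OpenAI Python SDK's structured-output helper (other providers and local stacks like outlines/guidance expose similar knobs; the exact method names, model string, and the `CalendarEvents` schema here are just illustrative):

```python
from pydantic import BaseModel
from openai import OpenAI

class CalendarEvent(BaseModel):
    title: str
    start: str   # ISO 8601 datetime
    end: str
    attendees: list[str]

class CalendarEvents(BaseModel):
    events: list[CalendarEvent]

client = OpenAI()

# The schema is enforced at decode time: the model can only emit tokens that
# keep the output valid against CalendarEvents, so you never get a "creative" format.
completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "Get all my calendar events for next week."}],
    response_format=CalendarEvents,
)
events = completion.choices[0].message.parsed
```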

The structuring is also a bit of a dark art - overall system performance can improve/degrade depending on the shape of the data structure you constrain to. Sometimes you want the LLM to output into an intermediate and more expressive data structure before converting to a less expressive final data structure that your deterministic piece expects.
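A hypothetical illustration of that intermediate-structure trick: let the model fill a richer draft schema (free-form reasoning, source quotes, a confidence score), then deterministically project it down to the narrow shape the rest of the system expects. All the names here are made up for the example:

```python
from pydantic import BaseModel

class ExtractedEventDraft(BaseModel):
    # Expressive intermediate: room for the model to "think" in structured form.
    reasoning: str           # why the model believes this is an event
    quoted_source_text: str  # the snippet it extracted from
    title: str
    start_iso: str
    end_iso: str
    confidence: float

class Event(BaseModel):
    # Narrow final shape the deterministic calendar code expects.
    title: str
    start_iso: str
    end_iso: str

def finalize(draft: ExtractedEventDraft, min_confidence: float = 0.7) -> Event | None:
    # Deterministic projection: drop the scaffolding, apply a confidence gate.
    if draft.confidence < min_confidence:
        return None
    return Event(title=draft.title, start_iso=draft.start_iso, end_iso=draft.end_iso)
```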

> "Are there "mediator" agents with some reliability AND some flexibility?"

Pretty much, and this is basically where "agentic" stuff is at the moment. What mediates the LLM's outputs? Is it some deterministic system? Is it a probabilistic system? Is it kind of both? Is it a machine? Is it a human?

Specifically with coding tools, it seems like the mediators are some mixture of sticklers (compilers, tests) and loosey-goosey components (other LLMs, or the same LLM).
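One way such a mixed loop can be wired up, as a rough sketch where `generate_patch`, `apply_patch`, and `llm_review` are hypothetical stand-ins for the LLM calls and the repo plumbing, and the "stickler" mediator is just the real test suite:

```python
import subprocess
from typing import Callable

def run_tests() -> tuple[bool, str]:
    # Deterministic "stickler" mediator: the test suite either passes or it doesn't.
    proc = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def agent_loop(
    task: str,
    generate_patch: Callable[[str, str], str],  # LLM call: (task, feedback) -> patch text
    apply_patch: Callable[[str], None],         # deterministic: write the patch to the repo
    llm_review: Callable[[str, str], str],      # loosey-goosey mediator: (task, patch) -> critique
    max_rounds: int = 5,
) -> bool:
    feedback = ""
    for _ in range(max_rounds):
        patch = generate_patch(task, feedback)
        apply_patch(patch)
        tests_pass, test_output = run_tests()
        critique = llm_review(task, patch)
        # Assumed convention: the reviewer starts its reply with "LGTM" when it approves.
        if tests_pass and critique.strip().upper().startswith("LGTM"):
            return True
        feedback = f"Test output:\n{test_output}\n\nReviewer notes:\n{critique}"
    return False
```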

This gets a bit wilder with multimodal models too: think about a workflow step like "The user asked me to make a web page that looks like [insert user input here], here is my work, including a screenshot of the rendered page. Hey mediator, does this look like what the user asked for? If not, give me specific feedback on what's wrong."

And then feed that back into codegen. There have been some surprisingly good results from using a multimodal LLM as the mediator.
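A sketch of that visual-check step, assuming an OpenAI-style multimodal chat API that accepts base64 images (the screenshot would come from rendering the generated page, e.g. in a headless browser; the model name and prompt wording are just placeholders):

```python
import base64
from openai import OpenAI

client = OpenAI()

def visual_review(user_request: str, screenshot_path: str) -> str:
    with open(screenshot_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": (
                    f"The user asked for a page that looks like: {user_request}\n"
                    "Here is a screenshot of the rendered result. "
                    "Does it match? If not, give specific, actionable feedback."
                )},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    # The returned critique gets fed back into the codegen step as extra context.
    return resp.choices[0].message.content
```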