Comment by biophysboy
4 days ago
Wait, so you just tell the LLM the schema, and hope it replicates it verbatim with content filled into it? I was under the impression that you say "hey, please tell me what to put in this box" repeatedly until your data model is done. That sort of surprises me!
This interface interests me the most because it sits between the reliability-flexibility tradeoff that people are constantly debating w/ the new AI tech. Are there "mediator" agents with some reliability AND some flexibility? I could see a loosey goosey LLM passing things off to Mr. Stickler agent leading to failure all the time. Is the mediator just humans?
> "Wait, so you just tell the LLM the schema, and hope it replicates it verbatim with content filled into it?"
In the early stages of LLMs yes ("get me all my calendar events for next week and output in JSON format" and pray the format it picks is sane), but nowadays there are specific model features that guarantee output constrained to the schema. The term of art here is "constrained decoding".
The structuring is also a bit of a dark art - overall system performance can improve/degrade depending on the shape of the data structure you constrain to. Sometimes you want the LLM to output into an intermediate and more expressive data structure before converting to a less expressive final data structure that your deterministic piece expects.
> "Are there "mediator" agents with some reliability AND some flexibility?"
Pretty much, and this is basically where "agentic" stuff is at the moment. What mediates the LLM's outputs? Is it some deterministic system? Is it a probabilistic system? Is it kind of both? Is it a machine? Is it a human?
Specifically with coding tools, there seems like the mediator(s) are some mixture of sticklers (compiles, tests) and loosey-goosey components (other LLMs, the same LLM).
This gets a bit wilder with multimodal models too: think about a workflow step like "The user asked me to make a web page that looks like [insert user input here], here is my work, including a screenshot of the rendered page. Hey mediator, does this look like what the user asked for? If not, give me specific feedback on what's wrong."
And then feed that back into codegen. There has been some surprisingly good results from the mediator being a multimodal LLM.