Comment by tlarkworthy

1 day ago

I use regex to force an XML schema and then use a normal XML parser to decode.

XML is better for code, and for the code parts in particular I enforce a CDATA section (<![CDATA[ ... ]]>) so the LLM is pretty free to write anything without escaping.

The OpenAI API lets you do regex-constrained structured output, and it's much better than JSON for code.
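To illustrate the escaping point, here is a minimal sketch comparing the same snippet carried as a JSON string and inside a CDATA section. The `code` value is a made-up example, not taken from the notebook.

```javascript
// Hypothetical model-generated code containing quotes, a backslash and a newline.
const code = 'const re = /\\d+"/;\nconsole.log("a < b && c");';

// As a JSON string value, every quote, backslash and newline has to be escaped.
const asJson = JSON.stringify({ code });

// Inside a CDATA section the text is carried verbatim; only "]]>" may not appear.
const asXml = "<code><![CDATA[" + code + "]]></code>";

console.log(asJson);
console.log(asXml);
console.log(asJson.includes(code)); // false: the JSON form had to be rewritten
console.log(asXml.includes(code)); // true: the XML form contains the code unchanged
```

The model has to produce the escaped form itself when the output is JSON, which is where mistakes tend to creep in; with CDATA it only has to avoid the closing sequence.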

Could you share some samples / pointers on how you do this?

  • Yeah, this upsert_cell tool does it

    https://observablehq.com/@tomlarkworthy/forking-agent#upsert...

    format: { type: "grammar", syntax: "regex", definition: cellsRegex },

    Where cellsRegex is

    cellsRegex = {
      const CELL_OPEN = String.raw`<cell>\s`;
      const INPUTS_BLOCK = String.raw`<inputs>.*<\/inputs>\s*`;
      const CODE_BLOCK = String.raw`<code><!\[CDATA\[[\s\S]*\]\]>\s*<\/code>\s*`;
      const CELL_CLOSE = String.raw`<\/cell>`;
      return "^(" + CELL_OPEN + INPUTS_BLOCK + CODE_BLOCK + CELL_CLOSE + ")*$";
    }

    And the extraction logic is here https://observablehq.com/@tomlarkworthy/robocoop-2#process

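    function process(content) {
      // Wrap the model output in a single root element so it parses as one XML document.
      // domParser is presumably a DOMParser instance defined elsewhere in the notebook.
      const doc = domParser.parseFromString(
        "<response>" + content + "</response>",
        "text/xml"
      );
      const cells = [...doc.querySelectorAll("cell")];
      return cells.map((cell) => {
        const inputsContent = cell.querySelector("inputs")?.textContent || "";
        return {
          // Comma-separated input names become a trimmed array; empty means no inputs.
          inputs:
            inputsContent.length > 0
              ? inputsContent.split(",").map((s) => s.trim())
              : [],
          // textContent returns the CDATA payload as plain text.
          code: (cell.querySelector("code")?.textContent || "").trim()
        };
      });
    }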

    BTW that agent is under development and not actually that good at programming. Its parent https://observablehq.com/@tomlarkworthy/robocoop-2 is actually very good at notebook programming.
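As a rough check of what the cellsRegex pattern quoted in the reply accepts, the sketch below rebuilds it as plain JavaScript (outside Observable's cell syntax) and tests it against two made-up outputs. The sample strings are hypothetical, and the regex engine behind the API may differ in details from JavaScript's, so treat this as an approximation.

```javascript
// Same pattern pieces as the cellsRegex cell, written as ordinary constants.
const CELL_OPEN = String.raw`<cell>\s`;
const INPUTS_BLOCK = String.raw`<inputs>.*<\/inputs>\s*`;
const CODE_BLOCK = String.raw`<code><!\[CDATA\[[\s\S]*\]\]>\s*<\/code>\s*`;
const CELL_CLOSE = String.raw`<\/cell>`;
const cellsRegex = "^(" + CELL_OPEN + INPUTS_BLOCK + CODE_BLOCK + CELL_CLOSE + ")*$";

const pattern = new RegExp(cellsRegex);

// Hypothetical model output: one cell with two inputs and unescaped code in CDATA.
const good = [
  "<cell>",
  "<inputs>a, b</inputs>",
  "<code><![CDATA[",
  'return a < b && b > 0 ? "yes" : "no";',
  "]]></code>",
  "</cell>",
].join("\n");

// Same shape, but the code is not wrapped in CDATA, so the pattern rejects it.
const bad = "<cell>\n<inputs>a</inputs>\n<code>return a;</code>\n</cell>";

console.log(pattern.test(good)); // true
console.log(pattern.test(bad)); // false
console.log(pattern.test("")); // true: the group may repeat zero times
```

A string that passes this check is the kind of input the process function above expects, since it wraps the cells in a single root element before handing them to the XML parser.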