Comment by padolsey

1 year ago

Running code would be a downstream (client) concern. There's the ability to get structured data from LLMs (usually called 'tool use' or 'function calling'), which is the first port of call. Then running it is usually an iterative agent<>agent task where fixes need to be made. FWIW LangChain seems to be what people use to link things together, but I find it overkill.* In terms of actually running the code, there are a bunch of tools popping up in different parts of the pipeline (Replit, agentrun, riza.io, etc.).
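(To make the 'tool use' bit concrete: you declare a function schema up front and the model replies with structured JSON arguments instead of prose. A rough sketch against the OpenAI chat completions HTTP endpoint; the model name and the `run_code` tool/schema are purely illustrative.)

```typescript
// Sketch: ask the model to answer via a declared "function" so the reply is
// structured JSON (code to run) rather than free-form text.
// Assumes OPENAI_API_KEY is set; run_code is an illustrative tool name.
async function requestCodeAsToolCall(
  prompt: string
): Promise<{ language: string; code: string } | null> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4o-mini", // any tool-capable model
      messages: [{ role: "user", content: prompt }],
      tools: [
        {
          type: "function",
          function: {
            name: "run_code",
            description: "Code the client should execute on the model's behalf",
            parameters: {
              type: "object",
              properties: {
                language: { type: "string" },
                code: { type: "string" },
              },
              required: ["language", "code"],
            },
          },
        },
      ],
    }),
  });
  const data = await res.json();
  const call = data.choices?.[0]?.message?.tool_calls?.[0];
  return call ? JSON.parse(call.function.arguments) : null; // arguments is a JSON string
}
```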

What we really need (from the end-user POV) is that kinda 'resting assumption' that the LLMs we talk to via chat clients are verifying any math they do. For actual programming, I like Replit, Cursor, ClaudeEngineer, Aider, Devin; there are a bunch of others. All of them now seem to include ongoing 'agentic' steps where they keep trying until they get the response they want, with you as the human in the chain, approving each step (usually).
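(The 'approving each step' part is usually nothing fancier than a confirm prompt gating each proposed action before it runs; a tiny sketch, names mine:)

```typescript
import * as readline from "node:readline/promises";

// Hypothetical gate: show the model's proposed step and only proceed on "y".
async function approveStep(description: string): Promise<boolean> {
  const rl = readline.createInterface({ input: process.stdin, output: process.stdout });
  const answer = await rl.question(`Model proposes:\n${description}\nRun this step? [y/N] `);
  rl.close();
  return answer.trim().toLowerCase() === "y";
}
```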

* I (messing around locally with my own tooling and chat client) just ask the LLM for what I want, delimited by a boundary I can easily check for, then grab whatever is inside it and run it in a worker or semi-sandboxed area. I'll halt the stream, then make another call to the LLM with the latest output so it can continue with a more-informed response.
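Roughly, that loop has this shape (a minimal sketch, not my actual code: the `<<<RUN>>>` markers, the `callLLM` wrapper, and the `node:vm` "semi-sandbox" are stand-ins, and vm is convenience isolation rather than a real security boundary):

```typescript
import * as vm from "node:vm";

// Pull out whatever sits between agreed-upon boundary markers. The
// <<<RUN>>>/<<<END>>> pair is just one easy-to-check convention.
function extractDelimited(reply: string): string | null {
  const match = reply.match(/<<<RUN>>>\n([\s\S]*?)<<<END>>>/);
  return match ? match[1] : null;
}

// Run the snippet in a fresh V8 context with a captured console.
// node:vm is isolation-of-convenience, NOT a security sandbox.
function runSemiSandboxed(code: string): string {
  const lines: string[] = [];
  const context = vm.createContext({
    console: { log: (...args: unknown[]) => lines.push(args.join(" ")) },
  });
  try {
    vm.runInContext(code, context, { timeout: 2000 });
  } catch (err) {
    lines.push(`Error: ${(err as Error).message}`);
  }
  return lines.join("\n");
}

// One turn: ask, extract, run, then hand the output back so the model can
// continue with a more-informed response. callLLM is whatever chat-completion
// wrapper you already have (hypothetical here).
async function askRunAndFollowUp(
  callLLM: (prompt: string) => Promise<string>,
  prompt: string
): Promise<string> {
  const reply = await callLLM(prompt);
  const code = extractDelimited(reply);
  if (!code) return reply; // nothing delimited; the answer stands as-is
  const output = runSemiSandboxed(code);
  return callLLM(`${prompt}\n\nYour code produced this output:\n${output}\n\nPlease continue.`);
}
```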