← Back to context

Comment by ozten

1 day ago

It is not a silly question. The various flavors of LLM have issues with reliability. In software we expect five 9s, LLMs aren't even a one 9. Early on it was reliability of them writing JSON output. Then instruction following. Then tool use. Now it's "computer use" and orchestration.

Creating models for this specific problem domain will have a better chance at reliability, which is not a solved problem.

Jules is the gemini coder that links to github. Half the time it doesn't create a pull request and forgets and assumes I'll do some testing or something. It's wild.