Comment by lmeyerov
17 hours ago
Curious what kinds of evals you focus on?
We're finding investigation work to be same-but-different compared to coding. The closest area to ours with a bigger evals community is probably AI SRE tasks.
Agreed wrt all of these things being contextual. The LLM needs to decide whether to trigger tools like self-planning and todo lists and, as the talk shows with examples, which kinds of strategies to use with them.