← Back to context

Comment by beering

9 hours ago

I’m a little confused as to the setup. It was asking each model to one-shot a script and then the scripts faced off? Were the models given a computer environment? Or a test server to iterate against?

Sounds incredibly simple to me. One-shot.

  • So nothing like real-world coding, where you’d be able to run and test the script before submitting?

    • One shot just means the user doesn’t have to iterate on it via the agent. The agent does what ever it needs to deliver the best outcome, including its own running and iteration until it’s happy with it. This could be a short or long process potentially depending on the task.