Comment by beering

9 hours ago

I’m a little confused about the setup. Was each model asked to one-shot a script, with the scripts then facing off against each other? Were the models given a computer environment, or a test server to iterate against?

3 comments

rpmisms 9 hours ago

Sounds incredibly simple to me. One-shot.

beering 8 hours ago

So nothing like real-world coding, where you’d be able to run and test the script before submitting?

procinct 8 hours ago

One-shot just means the user doesn’t have to iterate on it via the agent. The agent does whatever it needs to deliver the best outcome, including running and iterating on its own until it’s happy with the result. This could be a short or long process, depending on the task.