Comment by intervieweratg

1 year ago

Interesting, any reason to not use reasoning models? Is there anything 4o seems better at with respect to coding?

I typically use o1 or o3-mini. I see that they just released an agent mode, but honestly I think it depends on what you use it for, and I don’t think agent mode is going to be useful for me. My tasks are typically quite pedestrian: I don’t know how to use a certain regex format, I need a Python script to print a list of directories, etc.

My main issue (which is not really covered in the paper) is that it’s not clear which models are most aligned to my work; by this I mean not lazy, willing to put in the required work, not incentivized to cheat, etc. So I’ll use them for the very small tasks (like regex) or the very big ones (like planning), but I still don’t use them for the “medium” tasks that you’d give an intern. It’s not clear to me how they will operate totally unsupervised, and I think more benchmarking for that would be incredible.