Comment by pixlmint

3 hours ago

“Moving lines of code” is a very peculiar eval tbh. I’ve never used Gemma for agentic tasks, but did have it write code, including multi-turn, and I was very positively surprised how well it performed.

It wasn't so much an eval, I really just wanted a small change moved out to another branch.

GPT 5.4 mini couldn't do it. Not even on the second attempt, where it went from obviously wrong to a subtly wrong copy.

In the end I had to manually copy and paste the 10-20 lines over.

If it can't even do that job, I seriously doubt it's going to be adequate for implementing a plan, like people often seem to suggest it could do, in order to save output tokens of a better model.