Comment by Tostino

2 months ago

I don't know. Gemini 2.5 has been the only model that consistently avoids fundamental mistakes with my project over the year I've been working with it. Claude 3.7, 4.0, and 4.5 are not nearly as good. I gave up on ChatGPT a couple of years ago, so I have no idea how it performs now; it was bad when I quit using it.

Do you find that Gemini's results differ when you ask the same question multiple times? Of the models I tried, I found it had the least reproducible results.

  • Sometimes it will alternate between different design patterns for implementing the same feature across generations.

    If it gets the answer wrong and I notice it, often just regenerating will get past it rather than having to reformulate my prompt.

    So, I'd say yeah... it is consistent in the general direction and understanding, but not so much in the details. Adjusting the temperature does help with that, but I usually just leave it at the default (see the sketch after this list).
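
Since temperature came up, here is a minimal sketch of pinning it down to cut run-to-run variance. It assumes the google-genai Python SDK, a GEMINI_API_KEY in the environment, and the gemini-2.5-pro model id; none of those details come from this thread.

```python
# Minimal sketch: pin temperature to reduce run-to-run variance.
# Assumes the google-genai Python SDK and a GEMINI_API_KEY env var.
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-2.5-pro",  # model id is an assumption, not from the thread
    contents="Suggest a design pattern for a plugin loader.",
    config=types.GenerateContentConfig(
        temperature=0.0,  # near-greedy decoding; the default is higher
    ),
)
print(response.text)

# Even at temperature 0, outputs are not guaranteed to be byte-identical
# across runs (serving-side batching and hardware nondeterminism).
```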

I use all of them about equally, and I don't really want to argue the point. I've had this conversation with friends, and it really feels like it's becoming more about brand affiliation and preference. At the end of the day, they're random text generators: asking the same question with different seeds gives different results (sketch below), and they're all mostly good.
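
On the seed point, some APIs let you pin the sampling seed so repeated calls mostly agree. A sketch under the same SDK and environment assumptions as above; the seed field on the config object is also an assumption, and reproducibility is best-effort rather than guaranteed.

```python
# Sketch: pin the sampling seed so two identical calls should
# (best-effort) produce the same answer. Same assumptions as above.
from google import genai
from google.genai import types

client = genai.Client()
config = types.GenerateContentConfig(temperature=0.0, seed=1234)

# Two runs with an identical prompt, seed, and temperature should
# now mostly agree, instead of drifting between design patterns.
for run in range(2):
    response = client.models.generate_content(
        model="gemini-2.5-pro",
        contents="Name one design pattern for plugin loading.",
        config=config,
    )
    print(f"run {run}: {response.text!r}")
```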