Comment by anshumankmr
4 days ago
Yeah, God knows. I was really surprised to see Fchollet's benchmark being aced months ago, but whatever; their internal QA was perhaps lacking. I was asking for some fairly simple code, in Python using scikit-learn no less, for which I presume there must be plenty of training data. For some reason it changed the casing of the columns and didn't follow my instructions: it kept rewriting the function to "reduce bloat," along with other random things I didn't ask for.
Everyone games the benchmarks, but a lot is pointing towards both Meta and OpenAI going to even greater lengths than the others.
I am, however, wondering if this is o3-preview or o3. I had wildly fluctuating experiences with preview models in the past, esp. the GPT-4 Turbo previews, though GPT-4 Turbo/V/o were a lot more stable.