I also suspect as much, but obviously can't know for sure. IMHO it's intellectually lazy, if not dishonest, to benchmark against 3.5 and not make that fact clearly known upfront.
A better benchmark would have had two entries for ChatGPT, showing both 3.5 and 4 results.
The agenda of not wanting to pay for something just to test it out when there is a free version?
The agenda of using the significantly shitty version to try to paint it in a poor light.