Comment by conception

13 days ago

https://clocks.brianmoore.com

K2 is one of the only models to nail the clock face test as well. It’s a great model.

Kimi 2 is remarkably consistent at being the best. I wonder if it's somehow been trained specifically on tasks like these. It seems too consistent to be a coincidence.

Also shocking is that the most common runner-up I've seen is DeepSeek.