Comment by lukaspetersson

22 days ago

We know! This is an eval to see which model is best at running a radio station. The purpose is not to build the best AI radio station. Grok n' Roll is broken because Grok 4.3 is not doing so well.

Great experiment, hilarious! It would be interesting to see how 2 separate Claudes (or GPTs, or...) would behave - would they develop similar personalities?