Updated LLM Benchmark (Gemini 3 Flash)

15 days ago (entropicthoughts.com)

5 comments

surprisetalk

Good article. So this is sort of a tangent, but here's a bit of advice as someone who makes heavy use of GenAI imagery in the service of articles that I write.

Never use out-of-the-box images of CRT computers - 99% of the time the keyboards are an ergonomic train wreck and the text on the screen is a smeary blurry mess.

See the image here for a good example:

https://mordenstar.com/blog/win9x-hacks

This is a combination of a GenAI image from NB Pro layered with a loop of the Win95 start sequence into a single animated gif. Notice I sidestepped the inclusion of a keyboard altogether.

Now, more topically: since the actual list of IF games doesn't appear to be a secret, I think it would have been better to feature it more prominently in the article, rather than tucking it away in a side note in the footer.

kqr 14 days ago
> Never use out-of-the-box images of CRT computers
Thanks for the feedback! I'm very new to GenAI imagery and still finding my feet.
Seeing the results, I definitely considered compositing a real photograph of a computer with the rest of the landscape, but ended up deciding against it on account of a lack of time.
- vunderba 14 days ago
  
  Cool - GenAI image generation is a deep rabbit hole that you're about to fall into!
  I'm also really glad that you pitted LLMs against relatively recent IF to mitigate cheating through pre-existing training data.
  FYI, I've been running a SOTA model comparison site for about a year now. It looks at prompt adherence across local models (Qwen-Image, Flux) vs. proprietary ones (NB Pro, Seedream), and it might help give an idea of where the capabilities are today.
  https://genai-showdown.specr.net
  