Comment by simonw

4 hours ago

It went pretty wild with "Generate an SVG of a NORTH VIRGINIA OPOSSUM ON AN E-SCOOTER":

https://gist.github.com/simonw/95735fe5e76e6fdf1753e6dcce360...

throwaw12 4 hours ago

Compared to your test with GLM 5.1, this does look off:

https://xcancel.com/simonw/status/2041646779553476801

simonw 4 hours ago
Yeah GLM 5.1 did an outstanding job on the possum - better than Opus 4.7 or GPT-5.4 and I think better than Gemini 3.1 Pro too.
But GLM 5.1 is a 1.51TB model, the Qwen 3.6 I used here was 17GB - that's 1/88 the size.
- zamadatix 3 hours ago
  
  The point is the relative difference between the pelican test and the "other" test within each model, which suggests the pelican is being treated specially these days (it could be as simple as being common in recent data). It is not about the relative difference between models on the "other" case in isolation.
refulgentis 4 hours ago
Hoping this doesn't turn into a pelican-SVG back-and-forth: yesterday's GPT Image 2 thread ended up being three screenfuls of "I tried the prompt too" replies, and nothing on the model until you scroll past it. I appreciate the testing, and I know this sounds like fun police, but there's a pattern where well-known commenter + one-off vibe test + 1:1 sub-threads eats the whole discussion. It being fun makes it hard to push back on without looking picky.
- simonw 4 hours ago
  
  You can collapse the pelican thread with the little [-] toggle at the top.
  