Comment by thegeomaster

1 day ago

> Since then OpenAI has refreshed the weights behind the “gpt-4.1” alias a couple of times, and one of those updates fixed the em-dash miss.

I don't know where you are getting this information from... The only snapshot of gpt-4.1 is gpt-4.1-2025-04-14 (mid-April), and the gpt-4.1 alias still points to it [1].

Just to be sure, I re-ran my test specifying that particular snapshot and am still getting a 100% pass rate.

[1]: https://platform.openai.com/docs/models/gpt-4.1

Right, the 4.1 training checkpoint hasn’t moved. What has moved is the glue on top: decoder heuristics / safety filters / logit-bias rules that OpenAI can hot-swap without re-training the model. Those “serving-layer” tweaks are what stomped the obvious em-dash miss for short, clean prompts. So the April-14 weights are unchanged, but the pipeline that samples from those weights is stricter about “don’t output X” than it was on day one. By all means, keep trying to poke holes! I’ve got nothing to sell; just sharing insights and happy to stress-test them.