Comment by musebox35
8 days ago
I work on open source text-to-image finetuning of open source models like zimage/flux2 klein 4b and on inference-time latency optimization. The moment I read about the silent treatment, I went ahead and cancelled my subscription too, since I would never know whether the models they launch will silently corrupt my output. This is totally unacceptable. There is a big difference between silent and flagged if you are doing ML research, but not at frontier capability.
This goes to show that:
- All the interpretability / safety research they are doing can also be weaponized against customers (steering vectors, intent classification, ...) in the name of safety from malicious actors.
- If they deem it profitable, they might nerf the original model and its training data for ML research at bulk scale, and then they won't even have to announce it so long as the overall benchmark score stays high enough.
As the IPOs get closer, they can do whatever they want to assure investors that they have a moat that cannot be crossed by their own products. Considering this affects all ML researchers and students at universities and smaller-scale research labs, this is just "cutting the branch you are sitting on".
I think all this started post Opus 4.5; that's when Claude started wrecking my shit without extreme oversight. Codebases it was making positive contributions to before were slowly and constantly being eroded and wrecked. Give it tasks in isolation? Still does well, but the moment it sees the bigger picture, it goes to shit. I chalked it up to a bad model, but in retrospect this makes it seem like it may have been by design.
Constraint decay is an issue with all LLM-based agentic development, at least for now.
Humans can maintain a long- and medium-term memory of constraints that they consciously (or subconsciously!) apply to the code that they write. The current crop of AIs are all amnesiacs, like the protagonist in Memento, falling back on general instead of institutional knowledge.
For now, we are safe. We can rent out our meat brains for money for a little while longer.
Next year? Who knows...
> I would never know whether the models they launch will silently corrupt my output
You never knew to begin with; now you have an explicit reason to realize it. Any black box run entirely out of your control, where you can never verify the output, is subject to the same suspicion.
True enough, but that is true for all the products I buy. I do not expect to control every product I own. For some I prefer to have more control; for others I just need something that works out of the box. There is always an initial bias toward trust when you buy something, otherwise you would not spend your hard-earned money on it.
“Fool me once, shame on you. Fool me twice, shame on me. Fool me three times, shame on both of us.” -- S. King
> but that is true for all the products I buy
Some things are more obscure than others. It's easier to trust and verify Office SaaS than AI SaaS. The determinism and obviousness of most other activities make them less susceptible to hidden interference. AI run by someone else is the next level of black box for users compared to most other objects or services we usually interact with.