So, laundered data?
> with AI-generated content excluded from pre-training.
> without distillation from third-party models
sounds like zero unless they are lying.
> with AI-generated content excluded from pre-training.
Though this is largely impossible these days, unless they pre-trained on pre-AI era data.
That could be. Just use pre-training for language understanding and let the post-training on synthetic data do the heavy lifting.
"how many of those shapes are rectangles?" "sounds like zero unless they are squares"
Adding "unless" to a statement makes it vacuous if the latter clause is weaker than the first clause. I find it hard to believe that a company willing to violate licenses would have scruples about lying about it.
Not vacuous, but tautological. Which is different, because tautologies can actually be quite directly informative. Whereas vacuous truths tend to be oblique.
Also, “Microsoft is lying” is not a logically stronger statement, because they might be lying about something other than whether they distilled or trained on AI output.
> Adding "unless" to a statement makes it vacuous if the latter clause is weaker than the first clause
I think that's the point. "How do I say they're lying without outright saying they're lying?"
It's a common rhetorical trick.
“We trained it from the ground up on enterprise grade, clean and commercially licensed data, without distillation from third-party models.”
aka all of GitHub OSS
Not OSS only, likely also the enterprise private repos, with a lot of business secrets.
Yeah this is exactly what I was thinking.