Comment by embedding-shape
18 hours ago
No, I'm not sure how that'd make sense. Either you're making the correct (expected) calculations, or you're getting it wrong. Depending the type of wrong or how wrong, could go from "used #2 in attention instead of #1" so "blue" instead of "Blue" or whatever, to completely incoherent text and garbled output.
I accept errors are more likely to decrease "intelligence". But I don't see how increased load, through batching, is any more likely to increase than decrease errors.