← Back to context

Comment by embedding-shape

1 day ago

No, I'm not sure how that'd make sense. Either you're making the correct (expected) calculations, or you're getting it wrong. Depending the type of wrong or how wrong, could go from "used #2 in attention instead of #1" so "blue" instead of "Blue" or whatever, to completely incoherent text and garbled output.

I accept errors are more likely to decrease "intelligence". But I don't see how increased load, through batching, is any more likely to increase than decrease errors.