Comment by genewitch

15 hours ago

> there's a strong correlation between length of response and accuracy

I'd need to see real numbers. I can prompt a thinking model to generate hundreds of tokens and then return a three-word answer (however many tokens that is), or switch to a non-thinking model from the same family that just gives the same result. I don't necessarily doubt your experience; I just haven't had that experience tuning Stable Diffusion (SD), for example, which is also transformer-based.

I'm sure there's some mathematical reason why longer context means more accuracy, but is that intrinsic to transformer-based LLMs? That is, given your point that the scalers want shorter responses, do you think they are putting more effort into getting shorter responses with equivalent accuracy? Or are they trying to find some other architecture or approach to overcome the "limitations" of the current one?