Comment by lelanthran

5 hours ago

> > …but as a user, I would much rather wait an extra 200ms for my slow/expensive prompt to be accurate

> This is the opposite of the feedback I get. Users want instant responses.

I am skeptical that you are getting feedback that users prefer instant wrong results to 200ms-lag correct results.

Deeply skeptical!

Oh, I can absolutely believe it. Humans are deeply irrational, especially about things that operate in time frames too short for our conscious thought processes to kick in. Instant but confident-sounding (and confident-sounding because it's instant) will beat slower every time. You don't know which answer is correct, or whether you even like it, until long after you've decided to trust it.

  • > Instant but confident sounding (and confident sounding because it's instant) will beat slower every time.

    Sure, but I am skeptical that users are actually saying "I prefer wrong answers over lag", which is what the post I responded to implied.

    This is different from users saying "I prefer quick answers to laggy answers", which is what I presume they actually said.

    To actually settle this, the feedback must answer the question "Do you want wrong answers instantly, or correct answers with an added 0.2-second delay?" because, well, those are the only two options right now.

    • Dunno. Feels like stated vs revealed preferences to me. Of course everyone will _say_ they want the correct answers, but I can totally see users getting annoyed at slow responses, thinking that the developers should've traded accuracy for quicker responses. (Or not thinking that at all, just demanding quicker responses unconditionally.)

100% agree. Sounds like they're either asking the wrong questions, or quoting answers selectively to suit this argument.

I would be punching my phone if the stupid network caused a garbled prompt and the LLM sent me unrelated answers. Correctness should be foundational no matter what; then improve the latency as best as possible. We all understand that if the network is bad, latency cannot be guaranteed, but correctness should be.

Especially since 200ms is the rule of thumb for an interaction still feeling "instant" to users in UX terms, this is a rounding error in latency when I regularly wait actual minutes for an LLM to finish its bloody thinking and have to refresh through several "we're experiencing heavy load" errors.