
Comment by jodrellblank

2 days ago

> LMs don't do this. Instead, every question is immediately responded with extreme confidence with a paragraph or more of text.

Having just read a load of Quora answers like this, none of which covered the thing I was looking for, I'd say that is how humans on the internet behave and how people have to write books, blog posts, articles, and documentation. Without the "dance" to choose a path through a topic on the fly, the author has to take on the burden of providing all relevant context, choosing a path, explaining why, and guessing at any objections and questions and including those as well.

It's why "this could have been an email" is a bad shout. The summary could have been an email, but the part that decided on that being the summary would be pages of guessing at all the things that might have come up in the call and which ones to include or exclude.

This is a recent phenomenon. It seems most of the pages today are SEO-optimized LLM garbage with the aim of having you scroll past three pages of ads.

The internet really used to be efficient, and I could always find exactly what I wanted with an imprecise Google search ~15 years ago.

  • Don’t you get this today with AI Overviews summarizing everything on top of most Google results?

    • The AI Overviews are... extremely bad. For most of my queries, Google's AI Overview misrepresents its own citations, or, almost as bad, confidently asserts a falsehood or half-truth based on results that don't actually contain an answer to my search query.

      I had the same issue with Kagi, where I'd follow the citation and it would say the opposite of the summary.

      A human can make sense of search results with a little time and effort, but current AI models don't seem to be able to.


    • I find myself skipping the AI overview like I used to skip over "Sponsored" results back in the day, looking for a trustworthy domain name.

    • Those AI overviews are dumb and wrong so often I have cut them out of the results entirely. They're embarrassing, really.

    • It’s fine about 80% of the time, but the other 20% is a lot harder to answer because of lower quality results.

    • From a UX perspective, the AI Overview being a multi-paragraph summary makes sense, since it responds to a single query that isn't expected to have conversational context. Where it does not make sense is in conversation-based interfaces. Like, the most popular product is literally called "chat".

      "I ask a short and vague question and you response with a scrollbar-full of information based on some invalid assumptions" is not, by any reasonable definition, a "chat".

  • You'd think with the reputation of LLMs being trained on Twitter (pre-Musk radicalization) and Reddit, they'd be better at understanding normal conversation flow, since Twitter requires short responses, and on Reddit, while walls of text happen occasionally, they're not the typical cadence of the discussion.

    • Reddit and Twitter don't have human conversations. They have exchanges of confident assertions followed with rebuttals. In fact, both of our comments are perfect demonstrations of exactly that too. Fairly reflective of how LLMs behave — except nobody wants to "argue" with an LLM like Twitter and Reddit users want to.

      This is not how humans converse in human social settings. The medium is the message, as they say.

    • Twitter, Reddit, HN don't always have the consistency of conversation that two people talking do.

      Even here, I'm responding to you on a thread that I haven't been in on previously.

      There's also a lot more material out there in the format of Stack Exchange questions and answers, Quora posts, blog posts and such than there is for consistent back and forth interplay between two people.

      IRC chat logs might have been better...ish.

      The cadence for discussion is unique to the medium in which the discussion happens. What's more, the prompt may require further investigation and elaboration prior to a more complete response, while other times it may be something that requires storytelling and making it up as it goes.

Interesting. Like many people here, I've thought a great deal about what it means for LLMs to be trained on the whole available corpus of written text, but real-world conversation is a kind of dark matter of language as far as LLMs are concerned, isn't it? I imagine there is plenty of transcription in training data, but the total amount of language used in real conversation surely far exceeds any available written output, and it is qualitatively different in character.

This also makes me curious to what degree this phenomenon manifests when interacting with LLMs in languages other than English? Which languages have less tendency toward sycophantic confidence? More? Or does it exist at a layer abstracted from the particular language?

That's part of it, but I think another part is just the way the LLMs are tuned. They're capable of more conversational tones, but human feedback in post-training biases them toward more of a Quora / StackOverflow / Reddit Q&A writing style, because that's what gets the best ratings during the RLHF process.