
Comment by secabeen

2 years ago

This is a decent summary. I've been thinking about how ChatGPT, by its very nature, destroys context and source reputation. When I search for something on the Internet, I get a link to the original content, which I can then evaluate based on my knowledge and the reputation of the original source. Wikipedia is the same, with a big emphasis on citation. ChatGPT and other LLMs destroy that context and knowledge, giving me no tools to evaluate the sources they're using.

So it's more like talking to a person.

If somebody asked me how heap sort works (my favorite sort!) I can sketch it out. If they ask me where I learned it, I really don't remember. Might be the Aho, Hopcroft, and Ullman book. I can't really say though.
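That "sketch it out" answer might look something like this: a minimal heap sort, assuming the standard in-place formulation (build a max-heap, then repeatedly swap the root to the end and sift down).

```python
def sift_down(a, start, end):
    """Restore the max-heap property for the subtree rooted at `start`."""
    root = start
    while 2 * root + 1 <= end:
        child = 2 * root + 1              # left child
        if child + 1 <= end and a[child] < a[child + 1]:
            child += 1                    # right child is larger
        if a[root] < a[child]:
            a[root], a[child] = a[child], a[root]
            root = child                  # continue sifting down
        else:
            return

def heap_sort(a):
    n = len(a)
    # Build the max-heap: sift down every non-leaf node, bottom-up.
    for start in range(n // 2 - 1, -1, -1):
        sift_down(a, start, n - 1)
    # Repeatedly move the current max to the end and shrink the heap.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(a, 0, end - 1)
    return a
```

And that's exactly the point: I can reproduce the algorithm, but not the provenance of where I learned it.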

  • Yes, and then I'll evaluate that answer by your reputation, whether social, organizational, or public. I will value that summary differently if you are a random person on the street, a random person who works at a tech company, or a person wearing a name tag that says "Donald Knuth, Stanford University".

    ChatGPT has little reputation of its own, and produces such a broad swath of knowledge, it becomes "Jack of all trades, master of none."

    • The "jack of all trades, master of none" heuristic works well for humans because given our limited lifespans and rate we assimilate knowledge, it's nearly impossible for someone to be both.

      ChatGPT in later iterations CAN be a jack of all trades AND a master of many (most? all?) of them.

What's interesting is that Microsoft's implementation of ChatGPT in Bing seems to include linking to references, which is a good step forward in my opinion.

  • The references seem wrong though. I'm looking at the response to a demo Bing query, "What cars should I consider buying that are AWD, go 0-60 in less than 6 seconds, seat 6 or more and have decent reviews?"

    > The 2022 Kia Telluride is a midsize SUV that can seat up to eight passengers and has an AWD option. It has a 3.8-liter V6 engine that produces 291 hp and 262 lb-ft of torque. It can accelerate from 0 to 60 mph in 7.1 seconds [10] and has a combined fuel economy of 21 mpg. It also has excellent reviews from critics and owners, and won several awards, including the 2020 World Car of the Year [7].

    [10] https://www.topspeed.com/cars/guides/best-awd-cars-for-2022/

    [7] https://www.hotcars.com/best-6-seater-suvs-2022/

    The references don't back up the 7.1 seconds or World Car of the Year claims.

I would love to know their plan for having new facts propagate into these models.

My idle speculation makes me think this is a hard problem. If ChatGPT kills Search, it also kills the websites surfaced by search that were relying on money from search-directed users. So stores are fine, but "informational" websites are probably in for another cull. Paywalled premium publications are probably still fine - the people currently willing to pay for new, somewhat-vetted, human content still will be. But things like the Reddits of the world might be in for a hard time: if this really is more effective than search, all those "search hack" uses like googling for Reddit's reviews of a product get short-circuited.

Meanwhile, SEO folks will probably try to maximize what they can get out of the declining search market by using these tools to flood the open web with even more non-vetted bullshit than it's already full of.

So as things change, how does one tiny-but-super-reliable amateur website (say, the individual blog of an expert on .NET runtime internals) make a dent in the "knowledge" of the next iteration of these models? How does it outweigh the even-bigger sea of crap that the rest of the web has now become when future training is done?

  • The other interesting thing is that if people stop using websites, revenue for those websites drops, and then development of new pages and sources stops or slows. How does ChatGPT improve if the information for it to learn from isn't there?

    We need the source information to be continually generated in order for ChatGPT to improve.

  • I almost never find these kinds of websites with search, except when I already know there might be one on a specific topic.

    The way I find them is through forums, chat, and links from other such sites. They all go into my RSS reader.

    I use search with !wiki or !mdn etc. most of the time.

Yes, we need some kind of tool like Explain Plan for databases, except one where, for any AI response, we can inspect the decision process and the sources behind it.
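For anyone unfamiliar with the analogy, here is what an explain plan looks like in practice, using SQLite from Python (the table and column names are made up for illustration). The database tells you *how* it will answer a query and which structures it consulted - exactly the transparency an LLM response currently lacks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (model TEXT, zero_to_sixty REAL)")
conn.execute("CREATE INDEX idx_speed ON cars (zero_to_sixty)")

# EXPLAIN QUERY PLAN reports the access strategy rather than the results:
# here it shows the query will be answered via the idx_speed index.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT model FROM cars WHERE zero_to_sixty < 6"
).fetchall()
for row in plan:
    print(row)
```

The plan rows name the index used, so you can audit the reasoning. An equivalent for LLMs would name the training sources that contributed to an answer.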

The sources are there in the training dataset, they are just not linked to the response. I don't think this is an inherent property of LLMs though, and I imagine future iterations will have some sort of attention mechanism that highlights the contributing source materials.