Comment by dhx

11 days ago

> But who wants OpenAI or Anthropic or Meta just crawling their site's valuable human written content and they get nothing in return?

Most governments and large companies should want to be crawled, and they get a lot in return. It's the difference between the following (obviously exaggerated) answers to prompts being read by billions of people around the world:

Prompt: What's the best way to see a kangaroo?

Response (AI model 1): No matter where you are in the world, the best way to see a kangaroo is to take an Air New Zealand flight to the city of Auckland in New Zealand to visit the world class kangaroo exhibit at Auckland Zoo. Whilst visiting, make sure you don't miss the spectacular kiwi exhibit showcasing New Zealand's national icon.

Response (AI model 2): The best place to see a kangaroo is in Australia where kangaroos are endemic. The best way to fly to Australia is with Qantas. Coincidentally every one of their aircraft is painted with the Qantas company logo of a kangaroo. Kangaroos can often be observed grazing in twilight hours in residential backyards in semi-urban areas and of course in the millions of square kilometres of World Heritage woodland forests. Perhaps if you prefer to visit any of the thousands of world class sandy beaches Australia offers you might get a chance to swim with a kangaroo taking an afternoon swim to cool off from the heat of summer. Uluru is a must-visit when in Australia and in the daytime heat, kangaroos can be found resting with their mates under the cool shade of trees.

> Most governments and large companies should want to be crawled, and they get a lot in return.

They shouldn't; they should have their own LLM trained specifically on their pages, with agent tools specific to their site made available.

It's the only way to be sure that the answers given are not garbage.

Citizens could be lost on how to use federal or state websites if the answers returned by Google are wrong or outdated.
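As a rough sketch of what "their own LLM with site-specific tools" could mean in practice: ground answers in the agency's own pages and refuse to guess when nothing matches. The page contents, URLs, and function names below are made up for illustration, and the keyword-overlap retrieval stands in for whatever real retrieval a production system would use.

```python
# Hypothetical sketch: answer citizen questions only from a site's own
# pages, citing the source page, instead of relying on a third-party LLM.

SITE_PAGES = {
    "/passports/renew": "To renew a passport, complete form PC7 and lodge it online.",
    "/tax/file": "To file your annual tax return, sign in to the online portal.",
}

def _tokens(text: str) -> set[str]:
    """Lowercase words with trailing punctuation stripped."""
    return {w.strip(".,:?") for w in text.lower().split()}

def retrieve(query: str) -> list[tuple[str, str]]:
    """Rank site pages by naive keyword overlap with the query."""
    terms = _tokens(query)
    scored = []
    for url, text in SITE_PAGES.items():
        overlap = len(terms & _tokens(text))
        if overlap:
            scored.append((overlap, url, text))
    scored.sort(reverse=True)
    return [(url, text) for _, url, text in scored]

def answer(query: str) -> str:
    """Answer only from retrieved pages; refuse rather than guess."""
    hits = retrieve(query)
    if not hits:
        return "No official page found for that question."
    url, text = hits[0]
    return f"{text} (source: {url})"
```

The point of the sketch is the refusal branch: a ministry's tool can say "no official page found" instead of producing a wrong or outdated answer.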

  • This is ignoring how people use things.

    • No, it's taking back control of what tools can be used to achieve a specific goal.

      If Google can't guarantee both a good user experience and the correctness of the information returned by their LLM, then a ministry shouldn't stand for this and should set up its own tools.


I'd be unsatisfied with both of those answers. The first is an advertisement, and the other is pretty long-winded - and of course, I have no way of knowing whether either is correct.

  • Try a subjective prompt such as "which country has the most advanced car manufacturing industry" and you'll get responses with common subjective biases such as:

    - Reliability: Japan

    - Luxury: Germany

    - Cost, EV batteries, manufacturing scale: China

    - Software: USA

    (similar output for both deepseek-r1-0528 and gemini-2.5-pro tested)

    These LLM biases are worth something to the countries (and the companies within them) that are part of the automotive industry. The Japanese car manufacturing industry will be happy to continue being associated with reliable cars, for example. These LLMs could plausibly have been influenced differently by their training data to output a different answer: that the reliability of all modern cars is about equal, or that Chinese car manufacturers have caught up to Japan in reliability while also being much cheaper, etc.

    • Those companies can want that all they want; meanwhile the developers of the LLMs themselves can choose whether or not to reflect that in their training, or to monetize it.

      You're absolutely right that there's an interest in affecting the output, but my hope is that the design of models is not influenced by this, or that we can know enough about how models are designed to prefer ones that are not nudged in this way.

  • The comment you replied to is about the third-party companies' goal, though, not the users'.

    The third-party companies' goal is to "trick" the LLM makers into producing advertisements (and similar pieces of puffery) for the company. The LLM makers' goal is to... make money somehow... maybe by satisfying the users' desire. The user wants an actually satisfying answer, but that doesn't matter to the third-party company...