Comment by Pamar

6 days ago

Partially OT:

Yesterday I asked ChatGPT which was the Japanese Twin City for Venice (Italy). This was just a quick offhand question because I needed the answer for a post on IG, so not exactly a life or death situation.

Answer: Kagoshima. It also added that the "twin status" was officially established in 1965, and that Kagoshima was the starting point for the Jesuit missionary Alessandro Valignano in his attempt to proselytize the Japanese people (to Catholicism, and also to European culture).

I had never heard of Kagoshima, so I googled it. And discovered it is the twin city of Naples :/

So I then googled "Venice Japanese Twin City" and got: Hiroshima. I double-checked this, then went back to ChatGPT and wrote:

"Kagoshima is the Twin City for Naples."

This triggered a websearch and finally it wrote back:

"You are right, Kagoshima has been the Twin City of Naples since 1960."

Then it added "Regarding Venice instead, the twin city is Hiroshima, since 2023".

So yeah, a Library of Alexandria that you can count on as long as you have another couple of libraries to double-check whatever you get from it. Note also that this was a very straightforward question; there is nothing to "analyze" or "interpret" or "reason about". And yet the answer was completely wrong: the first date was incorrect even for Naples (the ceremony was actually in May 1960), and the extra bits about Alessandro Valignano are not reported anywhere else. Valignano was indeed a Jesuit and he visited Japan multiple times, but Kagoshima is never mentioned when you google him or check his Wikipedia page.

You may understand why I remain quite skeptical about any application I consider "more important than an IG title".

Claude 4 Opus:

> Venice, Italy does not appear to have a Japanese twin city or sister city. While several Japanese cities have earned the nickname "Venice of Japan" for their canal systems or waterfront architecture, there is no formal sister city relationship between Venice and any Japanese city that I could find in the available information

I think GPT-4o got it wrong in your case because it searched Bing, and then read only fragments of the page ( https://en.wikipedia.org/wiki/List_of_twin_towns_and_sister_... ) to save costs on processing "large" context.

  • I am Italian, and I have some interest in Japanese history/culture.

    So when I saw a completely unknown city, I googled it, because I was wondering what it actually had in common with Venice (I mean, a Japanese version of Venice would be a cool place to visit next time I go to Japan, no?).

    If I wanted to know, I dunno, "What is the Chinese Twin City for Buenos Aires" (to mention two countries I do not really know much about, and do not plan to visit), should I trust the answer? Or should I go looking it up somewhere else? Or maybe ask someone from Argentina?

    My point is that even as a "digital equivalent of the Library of Alexandria", LLMs seem to be extremely unreliable. Therefore - at least for now - I am wary of using them for work, or for any other area where I really care about the quality of the result.

If I want facts that I would expect the top 10 Google results to have, I turn search on. If I want a broader view of a well known area, I turn it off. Sometimes I do both and compare. I don’t rely on model training memory for facts that the internet wouldn’t have a lot of material for.

4o for quick. 4o plus search for facts. o4-mini-high plus search for "mini deep research", where it'll hit more pages, structure, and summarise.

And I still check the facts and sources, to be honest. But it's not valueless. I've searched an area for a year and then had deep research find things I hadn't.

What model?

People often say "I asked ChatGPT something and it was wrong", and then you ask them the model and they say "huh?"

The default model is 4.1-mini, which is much worse than 4.1 and much, much worse than o3 at many tasks.

  • Yup. The difference is particularly apparent with o3, which does bursts of web searches on its own whenever it feels it'll be helpful in solving a problem, and uses the results to inform its own next steps (as opposed to just picking out parts to quote in a reply).

    (It works surprisingly well, and feels mid-way between Perplexity's search and OpenAI's Deep Research.)

  • I asked "What version/model are you running atm?" (I have a free account on OpenAI; what I have seen so far will not justify a $20 monthly fee - IN MY CASE).

    Answer: "gpt-4-turbo".

    HTH.

    • Don't ask the model, just look at the model selection drop-down (wherever that may be in your UI)

    • >I have a free account on OpenAI; what I have seen so far will not justify a $20 monthly fee - IN MY CASE

      4.1-mini definitely is not worth $20/month. o3 probably is (and is available in the $20/month plan) for many people.