Comment by modeless
2 years ago
Someone on Twitter was also skeptical that the material is more dense than water. I happened to have a rubber duck handy so I cut a sample of material and put it in water. It sinks to the bottom.
Of course the ultimate skeptic would say one test doesn't prove that all rubber ducks are the same. I'm sure someone at some point in history has made a rubber duck out of material that is less dense than water. But I invite you to try it yourself and I expect you will see the same result unless your rubber duck is quite atypical.
Yes, the models will frequently give accurate answers if you ask them this question. That's kind of the point. Despite knowing that they know the answer, you still can't trust them to be correct.
Ah good show :). I was rather preoccupied with the question but didn't have one handy. Well, I do, but my kid would roast me slowly over coals if I so much as smudged it. Ah the joy of the Internet, I did not predict this morning that I would end the day preoccupied with the question of rubber duck density!
I guess for me the question of whether the model is lying or hallucinating comes down to whether it's correctly summarizing its source material. I find very conflicting material on the density of rubber, and most of the sources that Google surfaces claim a lower density than water. So it makes sense to me that the model would make that inference.
I'm splitting hairs though, I largely agree with your comment above and above that.
To illustrate my agreement: I like testing AIs with this kind of thing... a few months ago I asked GPT for advice on how to restart my gas-powered water heater. It told me the first step was to make sure the gas was off, then to light the pilot light. I then asked it how the pilot light was supposed to stay lit with the gas off, and it backpedaled. My guess is that because so many instructional materials about gas-powered devices emphasize starting by turning off the gas, the model weighted that as the first instruction.
Interesting, the above shows progress though. I realized I had asked GPT 3.5 back then, so I just re-asked 3.5 and then asked 4 for the first time. 3.5 was still wrong. 4 told me to initially turn off the gas to dissipate it, then to ensure gas was flowing to the pilot before sparking it.
But that said I am quite familiar with the AI being confidently wrong, so your point is taken, I only really responded because I was wondering if I was misunderstanding something quite fundamental about the question of density.