Comment by fallingsquirrel
1 year ago
I just tried this locally with llama3-8b and it handled it fine. Claude 3 Sonnet passes your test too, in case you don't have the hardware for a local model. You might want to consider moving on from ChatGPT since their models have been RLHF'd to hell in the name of "safety" and are falling behind in many ways as a result.
---
transcript:
$ ollama run llama3:8b
>>> tell me an interesting fact about etymology
Here's one:
Did you know that the word "quarantine" has a fascinating etymological history?
The word "quarantine" comes from^C
>>> tell me an interesting fact about etymology. just the fact, don't mention it's interesting.
The word "robot" originated from the Czechoslovakian word "robota," which means "forced labor" or "drudgery." This term was used in Karel Čapek's 1920 play "R.U.R." (Rossum's Universal Robots), where it referred to artificial workers.
Why is any of this "interesting"?
Is the response interesting because you don't know it? —How does it know this?
Is the response interesting because you do know it or might have offered it yourself? —Confirmation bias.
Is it interesting because a lot of training references contextualize it as "interesting?" —Begged question.
Is it contextually interesting? —What is the context? A robot refers to robots? How unexpected...
Is it interesting within the narrow confines of LLM adaptations to a scope of inputs?
Can there be any more damning claim against the general suitability of the technology as an oracle than different users using the same prompts and getting inexplicably contrary results?
If trivial prompt alignments result in appropriate vs inappropriate responses, this destroys confidence for every response.
What am I missing?
Pretty sure the point here was Llama3 respecting the instruction not to mention that this is interesting and not to add filler, rather than whether the output fact itself is interesting or not.
You are missing that this is precisely what we would expect a human to answer without further context (for instance without knowing how much you know about the topic).
A human would similarly pick something which isn't too nerdy but also not obvious, and the LLM did well here.
If the LLM can fail that is fine, because the task is inherently hard.
My comment about what's "interesting" or not was an attempt to cast out interesting responses as not offering a way forward to any qualitative evaluation of AI behavior. To be interesting is a quality of those who regard, not of the situation under regard.
Do you find it interesting that some LLMs routinely preface a requested "interesting fact" with a statement that the fact is interesting, and that this framing can't reliably be suppressed by including a sub-prompt requesting suppression?
I don't, because I have no idea why I should expect any prompt to produce any sort of response.
I spent a few days goofing around with Stable Diffusion and found it frustrating because it could render a response to some prompts that I found relevant and satisfying, but I couldn't get it to reliably render my intentions. I soon encountered obvious limits of its training set, and the community is adapting to these limits with networks of domain-specific accessory models.
This experience greatly tempered my expectations: I see AI as a magic paintbrush or story reader. I see no evidence of a thinking machine.
If we're going to establish an equivalence comparison between any AI and humans we need a theory for both.
I have yet to see a coherent theory of the AI, but I believe there is such a theory in a language I don't understand, just as there's a theory of Conway's Game of Life, which leads to continual fascination with the machine's behavior.
But I've been unable to find any theory of the human, nor do I expect any such theory, because to my eyes life looks like a realm of complexity incomparable to any game.
I do have an interest in seeing nerds struggle to explain AI, but am surprised that after several years no common vernacular from which a theory might be assembled has appeared.
An open-ended article about what AIs can't do seems hopelessly daft. It has already been formally established that there are things computation can never do. So to be interesting, a treatment of the limits of AI, which is a form of computation, had better start with a consideration of those limits. But this article does not, nor do any of the comments.
So whatever is going on with this discourse, it appears to me to have nothing to do with understanding of AIs.
The R.U.R. thing comes up basically because that specific example is widely cited as an example of interesting etymology.