Comment by donatj
1 year ago
I have been trying to generate some text recently using the ChatGPT API. No matter how I word “Include any interesting facts or anecdotes without commenting on the fact being interesting”, it ALWAYS starts out with “One interesting fact about” or similar phrasing.
I have honestly spent multiple hours trying to word the prompt so it will stop including introductory phrases and just state the fact directly. I have gone so far as to forbid the words “interesting”, “notable” and a few others in the prompt, and it just ignores me and uses them anyway.
I’ve tried all the different available models and 4 will once in a great while get it right. 3, 3 turbo, and even 4 turbo basically never work as I want.
Overall, it seems pretty bad at being told not to do certain things. For instance there are a number of topics I don’t want mentioned in its response, but telling it not to seems to only increase the likelihood that it will mention it, like you put the idea in its head.
I just tried this locally with llama3-8b and it handled it fine. Claude 3 Sonnet passes your test too, in case you don't have the hardware for a local model. You might want to consider moving on from ChatGPT since their models have been RLHF'd to hell in the name of "safety" and are falling behind in many ways as a result.
---
transcript:
$ ollama run llama3:8b
>>> tell me an interesting fact about etymology
Here's one:
Did you know that the word "quarantine" has a fascinating etymological history?
The word "quarantine" comes from^C
>>> tell me an interesting fact about etymology. just the fact, don't mention it's interesting.
The word "robot" originated from the Czechoslovakian word "robota," which means "forced labor" or "drudgery." This term was used in Karel Čapek's 1920 play "R.U.R." (Rossum's Universal Robots), where it referred to artificial workers.
Why is any of this "interesting"?
Is the response interesting because you don't know it? —How does it know this?
Is the response interesting because you do know it or might have offered it yourself? —Confirmation bias.
Is it interesting because a lot of training references contextualize it as "interesting"? —Begged question.
Is it contextually interesting? —What is the context? A robot refers to robots? How unexpected...
Is it interesting within the narrow confines of LLM adaptations to a scope of inputs?
Can there be any more damning claim against the general suitability of the technology as an oracle than different users using the same prompts and getting inexplicably contrary results?
If trivial prompt alignments result in appropriate vs inappropriate responses, this destroys confidence for every response.
What am I missing?
Pretty sure the point here was Llama3 respecting the command not to mention that the fact is interesting and not adding filler, rather than whether the output fact itself is interesting.
You are missing that this is precisely what we would expect a human to answer without further context (for instance, without knowing how much you know about the topic).
A human would similarly pick something which isn't too nerdy but also not obvious, and the LLM did well here.
If the LLM can fail that is fine, because the task is inherently hard.
The R.U.R. thing is basically because that specific example is commonly cited as an example of interesting etymology.
I often encounter fixation, and that would be my immediate thought: negative commands can often cause the LLM to fixate on a term or idea. My first thought would be to try positive examples and avoid a negative command entirely.
If you spent that much time I'm sure you tried this and other things, so maybe even that isn't enough. (Though I assume if you ask for a JSON/function call response with the API that you'd do fine...?)
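Something like this is what I mean by positive examples, using the openai Python client; the system line and the few-shot fact are placeholders, not the OP's actual prompt:

    # Sketch: steer the format with positive few-shot examples instead of a
    # negative command. The system line and the example fact are placeholders.
    from openai import OpenAI

    client = OpenAI()

    messages = [
        {"role": "system",
         "content": "You state facts plainly, with no framing or introductory phrases."},
        # Few-shot pair demonstrating the desired shape of the answer.
        {"role": "user", "content": "Tell me a fact about Roman roads."},
        {"role": "assistant",
         "content": "Many Roman roads were built in layers of gravel topped with fitted stone slabs."},
        {"role": "user", "content": "Tell me a fact about etymology."},
    ]

    response = client.chat.completions.create(model="gpt-4-turbo", messages=messages)
    print(response.choices[0].message.content)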
Not an expert, but I suspect it's following a higher-priority OpenAI "built in" prompt that asks it to always include an introductory phrase.
Hence we do need powerful, less censored LLMs if we want to integrate LLMs into applications more effectively.
No, it just seems that it becomes blind, so to speak, to the negation, and including the words you were negating makes it more likely to apply them in the positive. This is how ChatGPT has seemed to behave whenever I've tried to get it not to include something.
API-driven LLMs deliberately don't implement core features that would enable what you want, for example negative prompting.
You can negative-prompt any LLM with something like "always write the word interesting in your response" as the negative prompt, steering the output away from it.
You can also use techniques for modifying the logprobs of tokens, which is available in the GPT-4 API via logit_bias (but is hard to use). You can literally ban "interesting" from its vocabulary.
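Rough sketch of the logit-bias route with the OpenAI Python client and tiktoken; the word list below is illustrative and doesn't cover every capitalization or sub-word variant, which is part of why this is fiddly:

    # Sketch: ban "interesting" outright via the logit_bias parameter.
    # tiktoken looks up the token IDs; the word list is illustrative only.
    # Note: banning sub-word pieces can also suppress unrelated words.
    import tiktoken
    from openai import OpenAI

    client = OpenAI()
    enc = tiktoken.encoding_for_model("gpt-4")

    bias = {}
    for word in ["interesting", " interesting", "Interesting", " Interesting"]:
        for token_id in enc.encode(word):
            bias[str(token_id)] = -100  # -100 effectively forbids the token

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Tell me a fact about etymology."}],
        logit_bias=bias,
    )
    print(response.choices[0].message.content)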
You could even use representation steering techniques to do this using control vectors. See this library as an example: https://github.com/Hellisotherpeople/llm_steer-oobabooga
Have you tried a simple "No pretext or posttext, return the result in a code block"?
It's part of a larger prompt trying to get it to generate a couple paragraphs that include interesting facts. I want the facts in the context of the paragraphs.
I don't get what this means.
I have 7000-token prompts that simply conclude with "Provide the result adhering to <insert schema> with no pretext or posttext" and it has no problem following that.
Even if you want it to "think" before responding, you can embed the thinking inside the JSON.
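For example, something along these lines with JSON mode; the schema and field names are just an illustration of embedding the "thinking" inside the JSON, not my actual prompt:

    # Sketch: JSON-only response with the "thinking" embedded as a field.
    # The schema and field names here are illustrative.
    import json
    from openai import OpenAI

    client = OpenAI()

    prompt = (
        "Write two paragraphs about the history of the word 'robot', "
        "weaving in facts without labelling them as interesting. "
        "Respond only with JSON matching this schema, no pretext or posttext: "
        '{"thinking": "<scratchpad>", "paragraphs": ["...", "..."]}'
    )

    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # JSON mode
    )

    result = json.loads(response.choices[0].message.content)
    print("\n\n".join(result["paragraphs"]))  # drop the "thinking" field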
Have you tried feeding the output into another prompt that says something like "remove any mentions of the facts being interesting"?
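Rough sketch of that two-pass idea with the openai client; the model name and the cleanup wording are just placeholders:

    # Sketch: two-pass cleanup -- generate first, then rewrite in a second
    # call to strip any "interesting"-style framing. Wording is illustrative.
    from openai import OpenAI

    client = OpenAI()

    draft = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user",
                   "content": "Tell me a fact about etymology."}],
    ).choices[0].message.content

    cleaned = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user",
                   "content": "Rewrite the following, removing any remarks about "
                              "the facts being interesting or notable; keep "
                              "everything else unchanged:\n\n" + draft}],
    ).choices[0].message.content

    print(cleaned)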