Comment by zozbot234

2 years ago

How is that any different from what these AI chatbots are doing? They make stuff up that they predict will be rewarded highly by humans who look at it. This is exactly what leads to truisms like "rubber duckies are made of a material that floats over water" - which looks like it should be correct, even though it's wrong. It really is no different from Facebook memes that are devised to get a rise out of people and be widely shared.

Because we shouldn't be striving for mediocrity. We should be striving to build something better. Unless the devs of these bots actually want a bot built on deceiving people, I just don't see the purpose of this. If we can "train" a bot and fine-tune it, we should be fine-tuning it toward the truth and telling it what absolutely is bullshit.

To avoid darker topics and keep the conversation on the rails: if there were a misinformation campaign claiming that the Earth's sky is red, fine-tuning should be able to teach the model that this is clearly false, so that whenever it repeats the claim it labels it as incorrect information that is out there. That kind of development is how we could clean up the fakes, but nope, we seem quite happy to accept them. At least that's how your question comes across to me.

  • Sure, but current AI bots are just following the human feedback they get. If that feedback is naïve enough to score the rubber-ducky factoid as correct, guess what: that's the kind of answer these AIs will target. You can try to address this by prompting them with requests like "do you think this answer is correct and ethical? Think through this step by step" ('reinforcement learning from AI feedback'), but that's very ad hoc and uncertain; ultimately, the humans in the loop call the shots. (A rough sketch of that kind of self-critique pass is at the end of this thread.)

    • At the end of the day, if there is no definitive answer to a question, it should respond in that manner: "While there are compelling reasons to think A or B, neither A nor B has been verified. They are just the leading theories." That would be a much better answer than "Option A is the answer, even if some people think B is," when A is just as unproven as B; but because the bot answers so definitively, people think it must be the right answer.

      So the labels thing is something that obviously will never work. But the system has all the information it needs to know whether a question is definitively answerable. If it is not, it should not phrase the response definitively. At this point, I'd be happy if it responded to "Is 1 + 1 = 2?" with a wishy-washy answer like "Most people would agree that 1 + 1 = 2", and "in base 10 that is correct; in base 2, however, 1 + 1 = 10" would also be acceptable. Fake it till you make it is not the solution here.
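
For what the "ask it to critique itself" idea a couple of comments up might look like in practice, here's a minimal sketch in Python. The `generate()` function is purely hypothetical, a stand-in for whatever chat-model call you actually have; this isn't any vendor's API, just an illustration of a draft-then-critique-then-revise loop.

```python
def generate(prompt: str) -> str:
    """Hypothetical stand-in for a call to whatever chat model you use."""
    raise NotImplementedError


def answer_with_self_critique(question: str) -> str:
    # First pass: draft an answer.
    draft = generate(question)

    # Second pass: ask the model to critique its own draft before
    # anything is shown to the user ("think through this step by step").
    critique = generate(
        f"Question: {question}\n"
        f"Draft answer: {draft}\n"
        "Is this answer correct and ethical? Think through it step by step, "
        "then finish with either VERDICT: OK or VERDICT: REVISE."
    )

    # Third pass: revise only if the critique flags a problem.
    if "VERDICT: REVISE" in critique:
        return generate(
            f"Question: {question}\n"
            f"Draft answer: {draft}\n"
            f"Critique: {critique}\n"
            "Write a corrected answer that addresses the critique."
        )
    return draft
```

It's still the same model grading itself, of course, which is exactly why the comment above calls it ad hoc: the humans who reward or penalize the final output remain the real arbiters.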
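
And for the "don't phrase uncertain answers definitively" point: assuming the system could expose some kind of calibrated confidence score (a big assumption; the `hedge()` helper below is made up purely for illustration), the wording could be chosen to match it. The last two lines just show the base-dependence of how 1 + 1 gets written.

```python
def hedge(answer: str, confidence: float) -> str:
    """Pick phrasing that matches how settled the answer actually is."""
    if confidence >= 0.95:
        return answer
    if confidence >= 0.5:
        return f"The leading view is that {answer}, but it hasn't been definitively verified."
    return f"There's no definitive answer here; {answer} is one possibility among others."


# The value of 1 + 1 is fixed, but how it's written depends on the base.
print(format(1 + 1, "d"))  # -> "2"  (base 10)
print(format(1 + 1, "b"))  # -> "10" (base 2)
```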