Comment by billythemaniam
2 years ago
Let's talk about what ChatGPT (or fine-tuned GPT-3) actually is and what it is not. It is a zero-shot or few-shot model that is pretty good at a variety of NLP tasks. Playing tic tac toe or chess is not a traditional NLP task, so you shouldn't expect it to be good at that. But board games can be played entirely in a text format, so it's also not surprising that it can kinda play a board game.
If GPT-3 was listed on Huggingface, its main category listing would be a completion model. Those models tend to be good at generative NLP tasks like creating a Shakespeare sonnet about French fries. But they tend not to be as good at similarity tasks, used by semantic search engines, as models specifically trained for those tasks.
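To make the similarity-task distinction concrete, here's a minimal sketch of what a semantic search engine does under the hood: embed the query and the documents as vectors, then rank documents by cosine similarity to the query. The vectors below are tiny made-up toy values (real embedding models produce hundreds of dimensions), so treat this as illustrative only.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" -- purely hypothetical values.
query_vec = [0.9, 0.1, 0.2]
doc_vecs = {
    "red running shoes": [0.8, 0.2, 0.1],
    "blue winter coat": [0.1, 0.9, 0.3],
}

# Rank documents by similarity to the query vector.
best = max(doc_vecs, key=lambda d: cosine_similarity(query_vec, doc_vecs[d]))
print(best)  # "red running shoes" scores highest for this toy query
```

A model trained specifically for similarity produces embeddings where this ranking tracks meaning; a pure completion model's internal representations aren't optimized for that, which is the gap being described.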
That's a core problem with this. If people with expertise can't even tell us the clear boundaries of its truthfulness, how is anyone else going to come to rely on it for that purpose? I mean, you could say you defined a fuzzy boundary and that I shouldn't approach that boundary from the wrong direction (re: text games that use different tokens than the ones it was trained on), but how will I know when I'm too close to that boundary when I'm coming from the direction of things it's supposed to be good at?
It can't play tic tac toe, fine. But I know it gets concepts wrong on things I'm good at. I've seen it generate a lot of sentences that are correct on their own, but when you combine them into a bigger picture, it paints something fundamentally different from what's actually going on.
Moreover, I've had terrible results using it for creative writing, to the extent that it's on par with a lazy secondary school student who only knows a rudimentary outline of what they're writing about. For example, I asked it to generate a debate between Chomsky and Trump, and it gave me a basic debate format around a vague outline of their beliefs, in which they argue respectfully and blandly (neither of which Trump is known for).
It's entirely possible I haven't exercised it enough and that it requires more than the hours I put into it or it just doesn't work for anything I find interesting.
I agree that the state of the art isn't ready yet for general consumption. I think GPT-3 and similar models are good enough to help with a wide range of tasks with guardrails. To be clear, I don't mean guardrails around racist language, etc., which is a separate topic. Rather, guardrails around when to use the results, given their limitations and accuracy.
For example, let's say you have a website that sells clothes and you want to make the site's search engine better. Let's also say that a lot of work has been done to make the top 100 queries return relevant results. But the effort required to get the same relevance for the long tail of unique queries (think misspellings and unusual keywords) doesn't make sense. However, you still want to provide a good search experience, so you can turn to ML for that. Even if the model only has 60% accuracy, that's still a lot better than 0% accuracy. So applying ML to queries outside the top 100 should improve the overall search experience.
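A minimal sketch of that setup, assuming a hypothetical clothing-store catalog: head queries hit a hand-curated result table, and everything else falls back to a model. The `ml_search` function below is a naive substring-matching stand-in for a real ML model (e.g. embedding similarity over the catalog), just to show the routing.

```python
# Curated result lists for the top (head) queries -- hypothetical data.
CURATED_RESULTS = {
    "jeans": ["slim-fit jeans", "bootcut jeans"],
    "t shirt": ["plain t-shirt", "graphic t-shirt"],
}

def ml_search(query):
    """Stand-in for an ML model handling the long tail.

    A real system would use something like embedding similarity; here we
    just match any query word appearing as a substring of a catalog item,
    which happens to tolerate some misspellings.
    """
    catalog = ["slim-fit jeans", "plain t-shirt", "wool sweater"]
    return [item for item in catalog if any(w in item for w in query.split())]

def search(query):
    q = query.strip().lower()
    if q in CURATED_RESULTS:      # head query: use the curated relevance work
        return CURATED_RESULTS[q]
    return ml_search(q)           # long tail: fall back to the model

print(search("jeans"))         # curated path
print(search("wool swetaer"))  # long-tail path handled by the fallback
```

Even if the fallback is only right some of the time, it only ever handles queries that previously returned nothing useful, so the curated head experience is never degraded.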
ChatGPT/GPT-3 has increased the number of areas where ML can be used, but it still has plenty of limitations.