Comment by bonoboTP
1 year ago
This illustrates a different point. This is a variation on a well-known riddle that definitely comes up in the training corpus many times. In the original riddle, a father and his son are in a car accident: the father dies, and the boy is taken to the hospital, where the doctor says "I can't operate on this boy, he's my son." The idea is that people will be confused about how the boy can be the doctor's son if the boy's father just died, not realizing that women can be doctors too and so the doctor is the boy's mother. The original riddle is meant to highlight people's gender-stereotype assumptions.
Now, since the model was trained on this, it immediately recognizes the riddle and answers according to the much more common variant.
I agree that this is a limitation and a weakness. But it's important to understand that the model knows the original riddle well, so what this highlights is a problem with rote memorization/retrieval in LLMs. Tricky twists on well-known riddles that are in the corpus are a separate thing from answering novel questions. It can also be seen as a form of hypercorrection.
My codebases are riddled with these gotchas. For instance, I sometimes write Python for the Blender rendering engine. This requires highly non-idiomatic Python. Whenever something complex comes up, LLMs just degenerate to cookie-cutter, basic bitch Python code. There is simply no "there" there. They are very useful for helping you reason about unfamiliar codebases, though.
For me the best coding use case is getting up to speed in an unfamiliar library or usage. I describe the thing I want and get a good starting point, and often the cookie-cutter way is good enough. The pre-LLM alternative would be to search for tutorials, but they will talk about some slightly different problem with different goals, so you have to piece it together yourself. And the tutorial assumes you already know a bunch of things, like how to initialize stuff, and skips the boilerplate and so on.
Now sure, actually working through it will give a deeper understanding that might come in handy at a later point, but sometimes the thing is really a one-off and not an important point. Like, as an AI researcher I sometimes want to draft up a quick demo website, or throw together a quick Qt GUI prototype or a Blender script, or use some arcane optimization library, or write a SWIG or Cython wrapper around a C/C++ library to access it in Python, or figure out how to do stuff with Lustre or the XFS filesystem or whatever. Any number of small things where, sure, I could open the manual, do some trial and error, read Stack Overflow, read blogs and forums, OR I could just use an LLM, use my background knowledge to judge whether it looks reasonable, then verify it, use the key terms I've now picked up to google more effectively, etc. You can't just blindly copy-paste it; you have to think critically and remain in the driver's seat. But it's an effective tool if you know how and when to use it.
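To make the "then verify it" step concrete, here is a minimal sketch of what that can look like for a small utility. Everything in it is made up for illustration: `running_mean_suggested` stands in for whatever an LLM handed back, and `running_mean_reference` is a slow, obviously correct version you write yourself just to check against. It is not a general recipe, only one cheap way to stay in the driver's seat.

```python
import random


def running_mean_suggested(xs, k):
    """Hypothetical LLM-suggested helper: sliding-window mean via prefix sums."""
    prefix = [0.0]
    for x in xs:
        prefix.append(prefix[-1] + x)
    return [(prefix[i + k] - prefix[i]) / k for i in range(len(xs) - k + 1)]


def running_mean_reference(xs, k):
    """Slow but easy-to-trust version, used only as a check."""
    return [sum(xs[i:i + k]) / k for i in range(len(xs) - k + 1)]


def agrees_with_reference(trials=200, seed=0):
    """Compare the two implementations on random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, 30)
        k = rng.randint(1, n)
        xs = [rng.uniform(-100, 100) for _ in range(n)]
        got = running_mean_suggested(xs, k)
        want = running_mean_reference(xs, k)
        if len(got) != len(want):
            return False
        # Allow for small floating-point differences between the two methods.
        if any(abs(g - w) > 1e-9 for g, w in zip(got, want)):
            return False
    return True


if __name__ == "__main__":
    print("suggested code matches reference:", agrees_with_reference())
```

A check like this only covers the inputs you thought to generate, so it complements reading the code critically rather than replacing it.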