Comment by gwern
3 months ago
What people are discovering with the latest models is that often their errors are due to entirely reasonable choices and assumptions... which happen to be wrong in your specific case. They call a library you don't have installed, or something like that. Short of inventing either telepathy or spice which can allow LLMs to see the future, it will increasingly be the case that you cannot use the best models efficiently without giving them extensive context. Writing 'reports' where you dump in everything even tangentially relevant is the obvious way to do so, and so I would expect future LLMs to reward that even more than o1-preview/pro already do.
I get much better output from o1* models when I dump a lot of context + leave a detailed but tightly scoped prompt with minimal ambiguity (rough sketch at the end of this comment). Sometimes I even add - don’t assume, ask me if you are unsure. What I get back is usually very, very high quality, to the point that I feel my 95th-percentile coding skills are yielding diminishing returns. I find that I am more productive researching and thinking about the what, and leaving the how (implementation details) to the model - nudging it along.
One last thing, anecdotally - I find that it’s often better to start a new chat after implementing a chunky bit of functionality.
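Roughly what that workflow looks like in practice, as a minimal sketch (assuming the current OpenAI Python SDK; the model name, file names, and the task are placeholders, not anything specific to this thread):

```python
# Sketch of the "dump context + tightly scoped ask" pattern.
# Assumes `pip install openai` and OPENAI_API_KEY in the environment;
# "report.md", scraper.py/fetch_page(), and the model name are placeholders.
from openai import OpenAI

client = OpenAI()

# Everything even tangentially relevant: dependencies, versions, conventions, constraints.
context = open("report.md").read()
task = "Add retry-with-backoff to fetch_page() in scraper.py; keep the public signature unchanged."

prompt = f"""CONTEXT:
{context}

TASK (tightly scoped, minimal ambiguity):
{task}

Don't assume anything not stated above. If you are unsure, ask me before writing code."""

resp = client.chat.completions.create(
    model="o1-preview",  # swap in whichever reasoning model you actually have access to
    messages=[{"role": "user", "content": prompt}],
)
print(resp.choices[0].message.content)
```

The point is structural: one big block of context, one narrow ask, and an explicit invitation to ask rather than assume.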
Yes, I've tried out both: ordering it to ask me questions upfront, and sometimes restarting with an edited 'report' and a prototype implementation for a 'clean start'. It feels like it sometimes helps... but I have no numbers or rigorous evidence on that.
The economics of the deal over the long term are far more critical than the performance in the short term, right?
In that context, how is convincing intelligent people to pay OpenAI to help train their own replacements while agreeing not to compete with them anything but the biggest, dumbest, most successful nerd snipe in history?
Dumping more context just implies getting brain raped even harder. Y’all are the horses paying to work at the glue factory. “Pro” users paying extra for that privilege, no thank you!
Maximum likelihood training tinges, nay, corrupts, everything it touches. That’s before you pull apart the variously-typed maximum likelihood training processes that the artifice underwent.
Your model attempts to give you a roughly maximum-likelihood output (in terms of KL-ball-constrained preference distributions not too far from ordinary language), and it expects you to be the maximum-likelihood user, since its equilibration is intended for a world in which you, the user, are just like the people who ended up in the training corpus, and in which the prompt you gave would be a maximum-likelihood query. This implies that there are times when it’s better for the model to ignore the you-specific contingencies in your prompt and instead re-envision your question as a noisily worded version of a more normal one.
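To be concrete about the ‘KL-ball’ picture: it’s roughly the standard RLHF-style objective, sketched here in textbook notation rather than any particular lab’s actual loss, with $\pi_\theta$ the tuned model, $\pi_{\text{ref}}$ the pretrained/reference model, $r$ a learned preference reward, and $\beta$ the KL penalty weight:

$$\max_{\pi_\theta}\; \mathbb{E}_{x\sim\mathcal{D},\, y\sim\pi_\theta(\cdot\mid x)}\big[r(x,y)\big] \;-\; \beta\,\mathrm{KL}\!\left(\pi_\theta(\cdot\mid x)\,\middle\|\,\pi_{\text{ref}}(\cdot\mid x)\right)$$

The $\beta$-weighted KL term is what keeps outputs anchored near the corpus’s ‘maximum-likelihood user’, which is exactly the tension I’m pointing at.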
I think there are probably some ways to still use maximum likelihood but switch out the ‘what’ that is being assumed as likely - e.g. models that attenuate their dominant response strategies as needed by the user, and easy UX affordances for the user to better and more fluidly align the model with their own dispositional needs.
MLE is a basic statistical technique. Feel free to go REEEE when Amazon recommends products.
Exactly, it trivializes the actual structure of the real world problem setup. It’s ML’s spherical cow.
Alternatively, we can standardize the environment. It takes humans weeks to adapt to a new interface or starting point. Why would AI be different?