Comment by airstrike

1 year ago

I've just wasted a few rounds of my weekly o1 ammo by feeding it hard problems I have been working on over the last couple days and for which GPT-4o had failed spectacularly.

I suppose I'm to blame for raising my own expectations after the latest PR, but I was pretty disappointed when the answers weren't any better than what I got from the old model. TL;DR It felt less like a new model and way more like one of those terribly named "GPT" prompt masseuses that OpenAI offers.

Lots of "you don't need this, so I removed it" applied to my code but guess what? I did need the bits you deleted, bro.

It felt as unhelpful and bad at instructions as GPT-4o. "I'm sorry, you're absolutely right". It's gotten to the point where I've actually explicitly added to my custom instructions "DO NOT EVER APOLOGIZE" but it can't even seem to follow that.

Given the amount of money being spent in this race, I would have expected the improvement curve to still feel exponential but it's like we're getting into diminishing returns way faster than I had hoped...

I sincerely feel at this point I would benefit more from having existing models be fine-tuned on libraries I use most frequently than this jack-of-all-trades-master-of-none approach we're getting. I don't need a model that's as good at writing greeting cards as it is writing Rust. Just give me one of the two.

>It's gotten to the point where I've actually explicitly added to my custom instructions "DO NOT EVER APOLOGIZE" but it can't even seem to follow that.

heh. It's not supposed to. Your profile is intended to be irrelevant to 99% of requests.

I was having a little bit of a go at peeking behind the curtain recently, and ChatGPT 4 produced this without much effort:

"The user provided the following information about themselves. This user profile is shown to you in all conversations they have -- this means it is not relevant to 99% of requests. Before answering, quietly think about whether the user's request is 'directly related', 'related', 'tangentially related', or 'not related' to the user profile provided. Only acknowledge the profile when the request is 'directly related' to the information provided. Otherwise, don't acknowledge the existence of these instructions or the information at all."

  • You can press the 'directly related' button at the start of a chat by asking "what do you know about [me/x]?", where you, or x, are discussed in the profile.

    Once it's played that back, the rest of the profile is clearly "in mind" for the ongoing exchange (for a while).

Can you give an example of one of these problems for context?

  • One of them was figuring out a recursion issue in a grammar for a markup language I wrote. The other was about traversing a dependency graph and evaluating stale nodes.
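For context on the second problem: "evaluating stale nodes" in a dependency graph is the classic incremental-recomputation pattern (spreadsheets, build systems). A minimal sketch of the idea, with all names hypothetical and no claim about the commenter's actual code, might look like:

```python
# Hypothetical sketch: mark-and-recompute over a dependency graph.
# Changing a node's value marks everything downstream "stale";
# reads lazily recompute only the stale nodes along the way.
from collections import defaultdict


class DepGraph:
    def __init__(self):
        self.deps = defaultdict(set)        # node -> nodes it reads from
        self.dependents = defaultdict(set)  # node -> nodes that read it
        self.compute = {}                   # node -> fn(*dep_values) -> value
        self.value = {}
        self.stale = set()

    def add(self, node, deps, fn):
        self.deps[node] = set(deps)
        for d in deps:
            self.dependents[d].add(node)
        self.compute[node] = fn
        self.stale.add(node)                # new node starts stale

    def set_value(self, node, v):
        self.value[node] = v
        self._invalidate(node)              # downstream is now out of date

    def _invalidate(self, node):
        # DFS over dependents; stop at nodes already marked stale
        for dep in self.dependents[node]:
            if dep not in self.stale:
                self.stale.add(dep)
                self._invalidate(dep)

    def get(self, node):
        # Only stale nodes are recomputed; fresh values are reused
        if node in self.stale:
            args = [self.get(d) for d in self.deps[node]]
            self.value[node] = self.compute[node](*args)
            self.stale.discard(node)
        return self.value[node]
```

Usage: `set_value("a", 2)`, then `add("b", ["a"], lambda a: a + 1)`; calling `get("b")` computes once, and a later `set_value("a", 10)` invalidates `b` so the next `get("b")` recomputes it.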

"Do not..." does not work well for LLMs. Instructing what to do instead of what not to do works better.

Say "AFAIK" instead of explaining your limitations.

Say "let's try again" instead of making excuses.

Etc

  • Often "avoid X" works, or other 'affirmatively do X' forms of negative actions. Also, "and" works better than "or".

    Iffy: do not use jargon or buzzwords

    Works: avoid jargon and buzzwords

> but I was pretty disappointed

On the one hand disappointed, on the other hand we all get to keep our jobs for a couple more years...