Comment by Keeeeeeeks

1 day ago

https://marginlab.ai/ (no affiliation)

There are a number of projects working on evals that can check how 'smart' a model is, but the methodology is tricky.

One would want to run the exact same prompt, every day, at different times of the day. But if the eval prompt(s) are complex, the frontier lab could have a 'meta-cognitive' layer that looks for repetitive prompts, and either: a) feeds the model a pre-written output to give to the user, or b) dumbs down the output for that specific prompt.

Both cases defeat the purpose in different ways, and make a consistent gauge difficult. And it would make sense for them to do that since you're 'wasting' compute compared to the new prompts others are writing.

I think you could alter the prompt in subtle ways: a period becomes an ellipsis, extra commas, synonyms, occasional double-spaces, etc.

Enough that the prompt is different at a token-level, but not enough that the meaning changes.
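A minimal sketch of that idea, assuming pure stdlib Python. The specific perturbations (ellipsis swap, doubled space, a synonym) are just illustrative placeholders; a real harness would want a larger, vetted set that's verified not to shift meaning:

```python
import random

# Hypothetical meaning-preserving perturbations (illustrative only).
# Each is a no-op if the pattern isn't present in the prompt.
PERTURBATIONS = [
    lambda s: s.replace(".", "...", 1),       # first period -> ellipsis
    lambda s: s.replace(", ", ",  ", 1),      # occasional double space
    lambda s: s.replace("check", "verify"),   # synonym swap (hypothetical word pair)
]

def vary(prompt: str, n: int, seed: int = 0) -> list[str]:
    """Generate n token-level variants of a prompt."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n):
        p = prompt
        # Apply one or two randomly chosen perturbations.
        for f in rng.sample(PERTURBATIONS, k=rng.randint(1, 2)):
            p = f(p)
        variants.append(p)
    return variants
```

The seed makes each day's batch reproducible on your side while still looking like distinct token sequences on theirs.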

It would be very difficult for them to catch that, especially if the prompts were not made public.

Run the variations enough times per day, and you'd get some statistical significance.
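One way to formalize "run it enough times" is a two-sample test on the scored outputs from two days. The comment doesn't name a test; a stdlib-only permutation test on the difference of means is one simple choice, sketched here:

```python
import random
from statistics import mean

def perm_test(day_a: list[float], day_b: list[float],
              n_iter: int = 5000, seed: int = 0) -> float:
    """Two-sided permutation test: p-value for the difference in mean
    eval scores between two days' batches of variant prompts."""
    rng = random.Random(seed)
    observed = abs(mean(day_a) - mean(day_b))
    pooled = day_a + day_b
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        # Count shuffles whose mean gap is at least as extreme as observed.
        if abs(mean(pooled[:len(day_a)]) - mean(pooled[len(day_a):])) >= observed:
            hits += 1
    return hits / n_iter
```

A small p-value would suggest the model's quality genuinely shifted between the two days, rather than noise across variants.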

I guess the fuzzy part is judging the output.