Comment by aerhardt
1 day ago
I’m currently experimenting with prompts of ~300k tokens for a certain classification task, and I think I might be able to make it work. GPT-5 chokes, but Gemini 2.5 Pro is showing promise. Jury’s still out and I might change my tune in a couple of weeks.
It should also be said that what I say here is focused on things where these models have problems.
For example, I consider the model confused when it starts outputting stereotyped or clichéd responses, and I intentionally go after problems that I know the models struggle with (I already know they can program and do some maths, but I want to see what they can't do). But if you're using them for things they're made for, and which aren't confusing, such as people arguing with each other, then you're likely to succeed.
Prompts with lots of examples are reasonable, and I know they can get very long.
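Roughly the shape I have in mind, as a sketch only: the labels, field names, and call_model() below are placeholders, not my actual task or client.

    # Rough sketch of a many-shot classification prompt.
    # With thousands of examples this can easily reach hundreds of
    # thousands of tokens, which is where long-context models come in.
    LABELS = ["billing", "shipping", "refund", "other"]  # hypothetical classes

    def build_prompt(examples, item):
        # examples: list of (text, label) pairs used as in-context shots
        header = (
            "Classify each ticket into one of: " + ", ".join(LABELS) + ".\n"
            "Answer with the label only.\n\n"
        )
        shots = "\n\n".join(
            f"Ticket: {text}\nLabel: {label}" for text, label in examples
        )
        return header + shots + f"\n\nTicket: {item}\nLabel:"

    # prompt = build_prompt(training_pairs, new_ticket)
    # answer = call_model(prompt)  # whichever model client you're testing with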