Comment by irthomasthomas

1 day ago

You are experiencing the jagged skills frontier. All models have these weird skill gaps and prompt-phrasing sensitivity. This is the main problem an llm-consortium solves. It's expensive running multiple models in parallel on the same prompt, but the time saved is worth it for gnarly problems: it fills in the gaps between models to tame the jagged frontier.

My very first use of the llm-consortium was feeding it its own source code to look for bugs. It surfaced a serious bug that only one of the three models had spotted. Lots of problems are NP-ish, so parallel sampling works really well. Google's IMO gold and OpenAI's IOI gold both used parallel reasoning of some sort.
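To make the idea concrete, here's a minimal sketch of the fan-out-and-arbitrate pattern, not the actual llm-consortium API. The `model_*` functions are hypothetical stubs standing in for real LLM calls, and the majority-vote arbitration is deliberately naive (a real consortium would typically use a judge model to synthesize answers):

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stubs for real model calls; in practice each would hit
# a different LLM API with the same prompt.
def model_a(prompt): return "bug in line 42"
def model_b(prompt): return "no issues found"
def model_c(prompt): return "bug in line 42"

def consortium(prompt, models):
    # Fan the same prompt out to every model in parallel.
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        answers = list(pool.map(lambda m: m(prompt), models))
    # Naive arbitration: pick the most common answer.
    return Counter(answers).most_common(1)[0][0]

print(consortium("Review this diff for bugs", [model_a, model_b, model_c]))
# → "bug in line 42"
```

The point is the fan-out: one model missing something (like `model_b` here) doesn't sink the run, because the others cover the gap.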