Comment by KaoruAoiShiho
2 years ago
Pro benchmarks are here: https://storage.googleapis.com/deepmind-media/gemini/gemini_...
Sadly it's 3.5 quality, :(
Lol that's why it's hidden in a PDF.
They basically announced GPT 3.5, then. Big whoop; by the time Ultra is out, GPT-5 is probably also out.
Isn't having GPT 3.5 still a pretty big deal? Obviously they are behind but does anyone else offer that?
3.5 is still highly capable and Google investing a lot into making it multi modal combined with potential integration with their other products makes it quite valuable. Not everyone likes having to switch to ChatGPT for queries.
Yeah, right now the leaderboard is pretty much: GPT4 > GPT 3.5 > Claude > Llama2. If Google just released something (Gemini Pro) on par with GPT 3.5 and will release something (Gemini Ultra) on par with GPT 4 in Q1 of next year while actively working on Gemini V2, they are very much back in the game.
Obviously they are behind but does anyone else offer that?
Claude by Anthropic is out, offers more, and is being actively used.
I thought there were some open-source models in the 70-120B range that were GPT3.5 quality?
Yup, it's all a performance for the investors
+1. The investors are the customers of this release, not end users.
Table 2 indicates Pro is generally closer to 4 than 3.5 and Ultra is on par with 4.
If you think eval numbers mean a model is close to 4, then you clearly haven't been scarred by the legions of open-source models that claim 4-level evals but struggle to actually perform challenging work as soon as you start testing them.
Perhaps Gemini is different and Google has tapped into their own OpenAI-like secret sauce, but I'm not holding my breath
Ehhh not really, it even loses to 3.5 on 2/8 tests. For me it feels pretty lackluster: I use GPT-4 probably 100 or more times a day, and this would be a huge downgrade.
Pro is approximately in the middle between GPT 3.5 and GPT 4 on four measures (MMLU, BIG-Bench-Hard, Natural2Code, DROP), it is closer to 3.5 on two (MATH, HellaSwag), and closer to 4 on the remaining two (GSM8K, HumanEval). Two one way, two the other way, and four in the middle.
So it's a split almost right down the middle, if anything closer to 4, at least if you assume the benchmarks to be of equal significance.
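If you want to check the split yourself, here's a rough sketch of the bucketing I mean. The scores below are placeholders, not the real Table 2 numbers; plug in the actual values from the PDF:

    # Where does Pro sit between GPT-3.5 and GPT-4 on each benchmark?
    # Placeholder scores only -- substitute the real values from Table 2.
    scores = {
        # benchmark: (gpt35, gemini_pro, gpt4)
        "MMLU":           (70.0, 79.0, 86.0),
        "BIG-Bench-Hard": (66.0, 75.0, 83.0),
        # ... remaining six benchmarks ...
    }

    for name, (gpt35, pro, gpt4) in scores.items():
        # 0.0 = at GPT-3.5, 1.0 = at GPT-4
        position = (pro - gpt35) / (gpt4 - gpt35)
        bucket = ("closer to 3.5" if position < 1/3
                  else "middle" if position < 2/3
                  else "closer to 4")
        print(f"{name}: {position:.2f} ({bucket})")

That's all this comes down to: normalize Pro's score between the 3.5 and 4 scores on each benchmark and see which third it lands in.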