Comment by rawrawrawrr
6 months ago
Nice! A better benchmark would be to run this prompt with GPT-4o and Claude and compare the results against whatever the original model was compared to.
It appears that they have. [0]
[0] https://x.com/mattshumer_/status/1831767017507954808