Comment by bn-l
3 days ago
Hey, I hope you see this. The scoring needs to be 0-10, or at least some kind of range, rather than pass or fail. Flux one getting the same score for the surfer as Gemini pro 3 reduces the quality of the benchmark.
Hi bn-l, yeah, as mentioned above and in the Release Notes, we'll be adding a more nuanced numerical score in the next week.
I don't know if I'm going to get as granular as 1-10, only because the finer the scoring, the more potential for subjectivity. That's why it was initially set up as a "Minimum Passing Criteria Rule Set" along with a Pass/Fail grade.
A suggestion from a previous HN post was something along the lines of (0 Fail, 0.5 Technical Pass, 1.0 Proficient Pass).
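For illustration only, here's a rough sketch of how a three-tier scheme like that could be tallied per model. The tier values come from the suggestion above; the `Grade` enum, the `model_score` helper, and the sample grades are hypothetical, not the benchmark's actual code.

```python
from enum import Enum


class Grade(Enum):
    # Tier values taken from the suggested scheme; the names are assumptions.
    FAIL = 0.0
    TECHNICAL_PASS = 0.5
    PROFICIENT_PASS = 1.0


def model_score(grades):
    """Average the tier values across prompts; returns 0.0 if there are no grades."""
    if not grades:
        return 0.0
    return sum(g.value for g in grades) / len(grades)


# Hypothetical grades for one model across three prompts.
sample = [Grade.PROFICIENT_PASS, Grade.TECHNICAL_PASS, Grade.FAIL]
print(model_score(sample))
```

With only three tiers, two models that both clear the minimum criteria can still end up with different totals, without requiring a judgment call as fine as a 1-10 scale.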