Slacker News Slacker News logo featuring a lazy sloth with a folded newspaper hat
  • top
  • new
  • show
  • ask
  • jobs
Library

Comment by _peregrine_

2 months ago

Yeah I mean SQL is pretty nuanced - one of the things we want to improve in the benchmark is how we measure "success", in the sense that multiple correct SQL results can look structurally dissimilar while semantically answering the prompt.

There's some interesting takeaways we learned here after the first round: https://www.tinybird.co/blog-posts/we-graded-19-llms-on-sql-...

0 comments

_peregrine_

Reply

No comments yet

Contribute on Hacker News ↗

Slacker News

Product

  • API Reference
  • Hacker News RSS
  • Source on GitHub

Community

  • Support Ukraine
  • Equal Justice Initiative
  • GiveWell Charities