Comment by lern_too_spel
9 days ago
If you want that to get better, you need to produce a 3d model benchmark and popularize it. You can start with a pelican riding a bicycle with working bicycle.
9 days ago
If you want that to get better, you need to produce a 3d model benchmark and popularize it. You can start with a pelican riding a bicycle with working bicycle.
I am building pretty much the same product as OP, and have a pretty good harness to test LLMs. In fact I have run a tons of tests already. It’s currently aimed for my own internal tests, but making something that is easier to digest should be a breeze. If you are curious: https://grandpacad.com/evals
building a benchmark is a great idea, thanks, maybe I will have a couple of days to spend on this soon