
Comment by cadamsdotcom

4 months ago

Neat! LLM-as-judge use cases could benefit from this too.

Normally you’d ask the judge LLM to “rate this output out of 5” or whatever the best practice is this week.

Vectorizing the output you’re trying to judge, then scoring it on semantic similarity to a desired reference output, avoids so many of those challenges. Instead of asking a judge “how good was this output?” and getting a coarse rating out of 5, you get a more precise continuous similarity score, and you get it faster.
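A minimal sketch of the idea. The `embed` function here is a hypothetical stand-in (a bag-of-words vector) for whatever real embedding model you’d use; the point is just the shape of the technique: embed both texts, then score with cosine similarity instead of an LLM rating.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Hypothetical placeholder for a real embedding model
    # (an embeddings API, sentence-transformers, etc.).
    # A bag-of-words count vector is enough to show the idea.
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[tok] * b[tok] for tok in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def judge(output: str, reference: str) -> float:
    # Score = similarity to the desired output,
    # not a judge LLM's "rating out of 5".
    return cosine_similarity(embed(output), embed(reference))


reference = "The capital of France is Paris."
print(judge("Paris is the capital of France.", reference))
print(judge("I don't know.", reference))
```

With a real embedding model the scores are dense-vector cosine similarities, so paraphrases that share no surface tokens still score high, which is the whole appeal over the toy version above.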

No doubt obvious to folks in the space, but seemed like a huge insight to me.