Comment by pantsforbirds

4 months ago

Does DeepEval allow you to set up custom metrics without an LLM-as-a-judge base?

If my result is a JSON output and I want to weight the keys by some importance weighting, can I write a Python function/class that computes a weighted average of per-key scores and use that as a DeepEval metric?

I do have some annoyances with DSPy, but I think their approach to defining evals is decent.

You sure can! It only takes a few lines of code and a few simple rules, as shown here: https://docs.confident-ai.com/guides/guides-building-custom-... There's a rough sketch below.
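For the weighted-JSON case specifically, a minimal sketch might look like this. It follows the custom-metric pattern from the guide above (subclass BaseMetric, set self.score and self.success in measure); the key weights and the exact-match per-key scoring are placeholders you'd swap for your own logic:

    import json

    from deepeval.metrics import BaseMetric
    from deepeval.test_case import LLMTestCase


    class WeightedJsonMetric(BaseMetric):
        """Scores a JSON output key by key, weighted by importance."""

        def __init__(self, weights: dict[str, float], threshold: float = 0.5):
            self.weights = weights      # e.g. {"name": 0.7, "role": 0.3} -- illustrative
            self.threshold = threshold

        def measure(self, test_case: LLMTestCase) -> float:
            actual = json.loads(test_case.actual_output)
            expected = json.loads(test_case.expected_output)
            total = sum(self.weights.values())
            # Exact match per key; replace with whatever per-key scoring you need.
            weighted = sum(
                w * float(actual.get(k) == expected.get(k))
                for k, w in self.weights.items()
            )
            self.score = weighted / total
            self.success = self.score >= self.threshold
            return self.score

        async def a_measure(self, test_case: LLMTestCase) -> float:
            # No I/O here, so the async path can reuse the sync one.
            return self.measure(test_case)

        def is_successful(self) -> bool:
            return self.success

        @property
        def __name__(self):
            return "Weighted JSON Metric"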

If you're using DSPy, you can also call it directly inside a custom metric like the one in the link above. I can't say with 100% certainty that doing this within DeepEval is always an advantage, but eight times out of ten running your evals in our ecosystem brings more benefits than drawbacks. Let me know if you have trouble setting up!
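Usage would then look roughly like this, with evaluate() as the entry point (the test-case values and weights here are made up for illustration):

    from deepeval import evaluate
    from deepeval.test_case import LLMTestCase

    test_case = LLMTestCase(
        input="Extract the person's details.",  # hypothetical prompt
        actual_output='{"name": "Ada", "role": "engineer"}',
        expected_output='{"name": "Ada", "role": "scientist"}',
    )

    # Illustrative weights: name matters more than role here,
    # so this case scores 0.7 and passes the 0.5 threshold.
    evaluate(
        test_cases=[test_case],
        metrics=[WeightedJsonMetric(weights={"name": 0.7, "role": 0.3})],
    )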