Comment by Gakho

3 days ago

Congrats on launching. I've noticed that moving prompts between LLM providers without edits degrades performance. Have you noticed how developers handle these "translations"? I'm asking because your eval framework might have data on best practices.

Yeah, this is something we've heard as well. There's no particular feature for it right now, but we did ship an agent in local dev to help people improve their prompts.

  • I'm wondering because there seem to be a lot of frameworks/websites that support evals; even OpenAI has evals.

    Do you think that components like observability and evals will eventually be absorbed by either providers (like OpenAI) or an orchestration framework like Mastra? Orchestration seems like the natural home when using multiple providers, though even with a single provider handling many tasks I can see it belonging there.

    • I could be wrong, but I don't think OpenAI wants to be opinionated about that — except maybe the OpenAI solutions engineers :)