Comment by Gakho

3 days ago

Congrats on launching. I've noticed that moving prompts between LLM providers without edits degrades performance. Have you noticed how developers handle these "translations"? I'm asking because your eval framework might have data on best practices.

Yeah, this is something we've heard as well. There's no particular feature for it right now, but we did ship an agent in local dev to help people improve their prompts.

  • I'm wondering because there seem to be a lot of frameworks/websites that support evals; even OpenAI has evals.

    Do you think that components like observability and evals will eventually be absorbed by either providers (like OpenAI) or an orchestration framework like Mastra? Orchestration seems like the natural home when using multiple providers, though even with a single provider handling many tasks I can see it belonging there.

    • I could be wrong, but I don't think OpenAI wants to be opinionated about that — except maybe the OpenAI solutions engineers :)