Comment by TeMPOraL

2 days ago

Maybe the prompt you used was more Claude-friendly than Gemini-friendly?

I'm only half-joking. Different models process their prompts differently, sometimes markedly so; vendors document this, but hardly anyone pays attention to it - everyone seems to write prompts for an idealized model (or for whichever one they use the most), and then rates different LLMs on how well they respond.

Example: Anthropic documents both the huge impact of giving the LLM a role in its system prompt, and of structuring your prompt with XML tags. The latter is, AFAIK, Anthropic-specific. Using it improves response quality (I've tested this myself), and yet as far as I've seen, no BYOK tool offering multiple vendor support respects or leverages that.
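To make that concrete, here's a hypothetical sketch of the same extraction request phrased generically versus with the two Claude-specific techniques Anthropic documents (a role in the system prompt, and XML tags demarcating input sections). The function and tag names are illustrative, not from any particular tool:

```python
# Sketch: "generic" prompt vs. a Claude-friendly one using a role and XML tags.
# Tag names (<instructions>, <report>) are illustrative; Anthropic's docs
# recommend XML tags but don't mandate specific names.

def generic_prompt(report: str) -> dict:
    return {
        "system": "",
        "user": f"Extract the data points from this report:\n\n{report}",
    }

def claude_friendly_prompt(report: str) -> dict:
    # A role in the system prompt, plus XML tags separating the
    # instructions from the attached document.
    return {
        "system": "You are a meticulous data analyst.",
        "user": (
            "<instructions>\n"
            "Extract the data points contained in the report below and\n"
            "present them as structured data.\n"
            "</instructions>\n"
            f"<report>\n{report}\n</report>"
        ),
    }
```

The point is that the document content is unambiguously fenced off from the instructions, so the model can't confuse text inside the report for directions to follow.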

Maybe Gemini has some magic prompt features, too? I don't know, I'm in the EU, and Google hates us.

Possibly. But my Claude prompts work fine on ChatGPT, the only difference being ChatGPT isn't very good. I pay for both.

I would not pay for Gemini - which is presumably why they've added it for "free" for everyone.

My Anthropic prompts in the API are structured. I've got one amazing API prompt with 67 instructions that gives mind-blowing results (to the point that it has replaced a human), but for a simple question I don't find value in that. And, frankly, 'consumer'-facing AI chatbots shouldn't need prompting expertise for basic out-of-the-box stuff.

The prompt I used in this example was simply "Please extract the data points contained within this report and present as structured data"

> and yet as far as I've seen, no BYOK tool offering multiple vendor support respects or leverages that

When you say BYOK tool do you mean effectively a GUI front end on the API? I use typingmind for quickly throwing things at my API keys for testing, and I'm pretty sure you can have a persistent custom system prompt, though I think you'd need to input it for each vendor/model.

  • > When you say BYOK tool do you mean effectively a GUI front end on the API?

    Less that, and more focused tools like e.g. Aider (OSS Cursor from before Cursor was a thing).

    I use TypingMind almost exclusively for any and all LLM chatting, and I do maintain a bunch of Claude-optimized prompts that specifically exploit the "XML tags" feature (some of them I also run through Anthropic's prompt improver) -- but I don't expect the generic frontends to care about vendor-specific prompting tricks by default. Here, my only complaint is that I don't have control over how it injects attachments; inlined text attachments in particular are something Anthropic's docs recommend demarcating with XML tags, which TypingMind almost certainly doesn't do. I'd also love for the UI to recognize XML tags in output and perhaps offer some structuring or folding on the UI side, e.g. to auto-collapse specified tags, such as "<thinking>" or "<therapeuticAnalysis>" or whatever I told the LLM to use.

    (Oh, and another thing: Anthropic recently introduced a better form of PDF upload, in which the Anthropic side handles both OCR-ing the PDF and rendering it as images, feeding both to the model to exploit its multimodal capabilities. TypingMind, as far as I can tell, still can't take advantage of it, despite it boiling down to an explicit if/else on the model vendor.)

    So no - I first and foremost mean the more focused tools that generalize across LLMs. Taking Aider as an example: as far as I can tell, it has no special handling for Anthropic models, meaning it doesn't use XML tags to mark up the repo map structure, demarcate the file content or code snippets it sends, let the LLM demarcate diffs in its replies, etc. It does its own model-agnostic thing, which means that when I use Claude 3.5 Sonnet, I lose out on a performance boost the tool isn't taking advantage of.

    I singled out Aider, but there are plenty of tools and plugins out there that build on common LLM portability libraries and end up treating every LLM the same way. The LLM portability libraries, however, are not the place to solve this - by their nature, they target the lowest common denominator. The specialized tools should be doing it, IMO, and it's not even much work - it's a bunch of model-based if/elses. It might not look pretty, but it's not a maintenance burden.
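    For illustration, that "bunch of model-based if/elses" could be as small as this hypothetical sketch of how a tool might serialize a file into the prompt; the function name, tag names, and model-name prefix check are all assumptions, not taken from Aider or any real library:

    ```python
    # Sketch: branch on the model vendor when inlining file content into a
    # prompt. Names here are illustrative, not from any actual tool.

    def format_file_for_prompt(model: str, path: str, content: str) -> str:
        if model.startswith("claude"):
            # Anthropic-documented style: demarcate inline attachments
            # with XML tags so the model can tell content from instructions.
            return f'<document path="{path}">\n{content}\n</document>'
        else:
            # Lowest-common-denominator style: a plain fenced code block.
            return f"{path}:\n```\n{content}\n```"
    ```

    One branch per vendor quirk; the fallback stays model-agnostic, so nothing breaks for models without documented prompt features.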