Comment by spindump8930

17 hours ago

Remember that the same model on different inference platforms won't necessarily give exactly the same results, adding another axis of non-determinism to development. Quantization, custom model-serving silicon, batching, and other inference optimizations can all mean a model behaves differently on a third-party host than it does with the original provider :/
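A toy sketch of the quantization part (all numbers invented, not from any real model): when two logits are nearly tied, the small error from an int8 weight round-trip can flip which token is argmax, and under greedy decoding one flipped token means the whole continuation diverges.

```python
# Toy illustration: the "same" weights, served with symmetric per-tensor
# int8 quantization, can pick a different top token when logits are close.
W = [  # 4-dim hidden state -> 2-token vocab, one row per hidden unit
    [0.3024, 0.3079],
    [0.2512, 0.2409],
    [-0.5000, -0.4800],
    [0.3492, 0.3354],
]
hidden = [1.0, 1.0, 1.0, 1.0]

def quantize_int8(w):
    """Quantize to int8 (scale = max|w| / 127), then dequantize."""
    scale = max(abs(x) for row in w for x in row) / 127
    return [[round(x / scale) * scale for x in row] for row in w]

def logits(h, w):
    return [sum(h[i] * w[i][j] for i in range(len(h))) for j in range(len(w[0]))]

def argmax(xs):
    return max(range(len(xs)), key=xs.__getitem__)

full = logits(hidden, W)                  # original full-precision weights
quant = logits(hidden, quantize_int8(W))  # quantized serving stack

# Logits stay within ~0.004 of each other, but the winner flips:
print(argmax(full), argmax(quant))  # prints: 1 0
```

The two logit columns differ by only ~0.001 at full precision, which is smaller than the ~0.004 rounding error the int8 round-trip introduces, so the ranking flips; real serving stacks add batching and kernel differences on top of this.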

This paper isn't the exact same scenario, since it studies an auditable open-weight Llama model, but it shows the symptoms of this: https://arxiv.org/pdf/2410.20247

Anyone who has used GPT-x via OpenAI vs. via Microsoft has experienced this very clearly.