Comment by stijntonk
17 hours ago
No, for clients we use paid Vertex AI accounts. We often need to host workloads in an EU region, which rules out the “global” models (which probably have better capacity).
In the past, we used a wrapper that round-robined across multiple projects to get enough quota. Luckily, many of our workloads are workflow-style tasks, so we can simply keep retrying on 429s.
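For anyone curious, the rotate-and-retry idea can be sketched roughly like this. It is a minimal illustration, not our actual wrapper: `call_with_rotation`, `QuotaExceeded`, and the `call` callback are hypothetical names, and `call` is assumed to wrap whatever client you use and raise `QuotaExceeded` when the API answers with a 429.

```python
import itertools
import random
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class QuotaExceeded(Exception):
    """Hypothetical stand-in for an HTTP 429 from the model endpoint."""


def call_with_rotation(
    call: Callable[[str], T],
    projects: Sequence[str],
    max_attempts: int = 8,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try `call(project)` on each project in turn, backing off on 429s."""
    if not projects:
        raise ValueError("at least one project is required")

    # Round-robin: each attempt moves on to the next project in the list.
    rotation = itertools.cycle(projects)

    for attempt in range(max_attempts):
        project = next(rotation)
        try:
            return call(project)
        except QuotaExceeded:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the quota error
            # Exponential backoff with full jitter, capped at max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))

    raise ValueError("max_attempts must be at least 1")
```

This only makes sense for workflow-style jobs that can tolerate waiting; for interactive traffic the backoff would be far too slow. The `sleep` parameter is only there so the delay can be stubbed out in tests.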
Fun fact: for one of their services (Stitch, I think), I noticed that my paid key kept hitting quota while the free one worked fine. That blew my mind.
I've been seeing the same in my product: 429s in Vertex.
We avoid Google AI for the most part because it's so unreliable.