Show HN: OpenGem – Free, self-healing load-balanced proxy for Google Gemini API


Hi HN!

I love that the official Gemini CLI's authentication gives you standard Google Gemini API access, but a single free Google account's quota depletes fast.

I wanted to build and prototype freely, so I built OpenGem. It essentially turns your idle accounts into your own personal API provider, so quota is no longer the bottleneck when developing and scaling side projects.

GitHub: https://github.com/arifozgun/OpenGem

What it does: OpenGem acts as a standard drop-in replacement endpoint (POST /v1beta/models/{model}). Behind the scenes, it's a smart load balancer. You connect multiple idle/free Google accounts to the dashboard via standard OAuth, and OpenGem routes your traffic to the least-used account.
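To make the routing concrete, here is a minimal sketch of "least-used" account selection. The `Account` shape and function names are my own illustration, not OpenGem's actual internals:

```typescript
// Illustrative sketch of least-used routing. The Account shape and
// function names are hypothetical, not OpenGem's actual code.
interface Account {
  id: string;
  requestsThisHour: number;
  coolingDownUntil?: number; // epoch ms; the account is skipped until then
}

function pickLeastUsed(accounts: Account[], now: number = Date.now()): Account | undefined {
  return accounts
    .filter((a) => a.coolingDownUntil === undefined || a.coolingDownUntil <= now)
    .sort((a, b) => a.requestsThisHour - b.requestsThisHour)[0];
}
```

Filtering out cooling-down accounts before sorting is what lets the balancer and the 429 handling (below) compose cleanly.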

How it handles limits gracefully: If an account hits a genuine 429 quota limit, OpenGem detects it instantly, puts that specific account on a 60-minute cooldown, and seamlessly retries your request with the next available account. Accounts are completely self-healing: a background probe checks them every 30 seconds to see if they've recovered.
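The cooldown-and-failover flow above can be sketched roughly as follows; the names (`Upstream`, `requestWithFailover`) are illustrative, not OpenGem's actual code:

```typescript
// Hypothetical sketch of the 429 cooldown-and-failover flow.
interface Upstream {
  id: string;
  coolingDownUntil?: number; // epoch ms
}

const COOLDOWN_MS = 60 * 60 * 1000; // 60-minute cooldown after a true 429

async function requestWithFailover<T>(
  upstreams: Upstream[],
  send: (u: Upstream) => Promise<{ status: number; body: T }>,
): Promise<{ status: number; body: T }> {
  const now = Date.now();
  for (const upstream of upstreams) {
    if (upstream.coolingDownUntil !== undefined && upstream.coolingDownUntil > now) {
      continue; // still cooling down; a background probe would clear this
    }
    const res = await send(upstream);
    if (res.status === 429) {
      upstream.coolingDownUntil = Date.now() + COOLDOWN_MS;
      continue; // retry transparently with the next account
    }
    return res;
  }
  throw new Error("all upstream accounts are cooling down");
}
```

The caller never sees the 429; from the client's perspective, the request just succeeds on a different account.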

It doesn't just blindly retry, either. It uses an 8-category error classifier (50+ regex patterns) to distinguish between a brief rate-limit burst and actual quota exhaustion, and applies exponential backoff with jitter to prevent hammering the servers.
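For reference, exponential backoff with full jitter is typically a one-liner; the constants here are illustrative, not OpenGem's actual tuning:

```typescript
// Sketch of exponential backoff with "full jitter"; constants are
// illustrative, not OpenGem's actual tuning.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt); // 500, 1000, 2000, ... capped
  return Math.random() * ceiling; // jitter spreads concurrent retries apart
}
```

Randomizing over the full window (rather than adding a small jitter term) is what prevents a burst of simultaneous retries from hammering the servers in lockstep.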

Tech specs:

- 100% compatible with the official Google SDKs (@google/genai), LangChain, and standard SSE streaming.
- Full support for native "tools" (Function Calling) for agentic workflows.
- Raised internal payload limit to 50MB so you can throw huge documents at it.
- 3-model fallback chain (flash → pro → pro-3.1) if a specific model is overloaded.
- AES-256-GCM encryption for all sensitive configs and OAuth tokens at rest.
- Toggle between Firebase Firestore or a fully offline local JSON database.
- Completely open-source (MIT licensed) and written in TypeScript.

It's strictly for educational purposes and personal research, to remove the friction from prototyping.
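The model fallback chain can be sketched as a simple ordered loop; the model IDs below mirror the flash → pro → pro-3.1 chain but are placeholders, and the function is my illustration rather than OpenGem's implementation:

```typescript
// Hypothetical sketch of a 3-model fallback chain; model IDs are
// placeholders, not exact Gemini model names.
const FALLBACK_CHAIN = ["flash", "pro", "pro-3.1"];

async function generateWithFallback(
  call: (model: string) => Promise<{ overloaded: boolean; text?: string }>,
): Promise<string> {
  for (const model of FALLBACK_CHAIN) {
    const res = await call(model);
    if (!res.overloaded && res.text !== undefined) return res.text;
  }
  throw new Error("every model in the fallback chain was overloaded");
}
```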

I’m currently running it with my own side projects and it handles agent tasks flawlessly. I would love any feedback on the load balancing and self-healing logic, or just general thoughts!