Comment by PaulShin

9 months ago

Great question. We're building Markhub, an AI-powered collaboration OS, and our stack is a hybrid one, because we believe the "best" model depends entirely on the task.

1. For Heavy, Complex Tasks (Summarization, Code Gen, Creative Work): We don't self-host. The performance of top-tier models is still unmatched. We use Gemini-based models via Google's Vertex AI. The reliability and raw power for complex reasoning are worth the API cost for these critical features.
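To make that concrete, here is a minimal sketch of the request shape for a Vertex AI `generateContent` call, based on Google's public REST API; the project, model id, and prompt are illustrative, and the details may differ from what Markhub actually runs.

```python
# Sketch of a Vertex AI generateContent request for a heavy task like
# summarization. URL/payload shape follows Google's public REST API;
# project, model id, and prompt below are illustrative assumptions.
import json

def build_generate_request(project: str, location: str, model: str, prompt: str):
    """Build the URL and JSON body for a Vertex AI generateContent call."""
    url = (
        f"https://{location}-aiplatform.googleapis.com/v1/"
        f"projects/{project}/locations/{location}/"
        f"publishers/google/models/{model}:generateContent"
    )
    body = {"contents": [{"role": "user", "parts": [{"text": prompt}]}]}
    return url, json.dumps(body)

url, body = build_generate_request(
    "my-project", "us-central1", "gemini-1.5-pro", "Summarize this thread..."
)
```

In practice you would send this with an authenticated client (or use the `vertexai` SDK), but the payload structure is the part worth seeing.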

2. For Fast, Specific, Private Tasks (Our Self-Hosted Stack): For smaller, high-frequency tasks like classifying feedback types or extracting specific keywords from a conversation, we use a self-hosted stack for speed and cost-efficiency.

Models: We use fine-tuned versions of smaller open-source models like Llama 3 8B or Mistral 7B. They are incredibly fast and cost-effective for specific, repetitive tasks.

Runtime/Orchestration: We use LangChain for chaining prompts and managing workflows. For serving the model, we're running a simple FastAPI server in a Docker container.

Hardware: We run inference on a dedicated GPU instance (like an A10G on AWS/GCP). The cost is predictable and much lower than routing every small task through a large model.

My takeaway: The "go-to stack" in 2025 isn't one-size-fits-all. It's a pragmatic, hybrid approach: best-in-class cloud APIs for the heavy lifting, and fast, fine-tuned open-source models for everything else.