Comment by DrPhish

2 days ago

Do you need realtime results, or is an ongoing queue of article analysis good enough? Have you considered running your own hardware with a frontier MoE model like deepseek v3? It can be done for relatively low cost on CPU depending on your inference speed needs. Maybe a hybrid approach could at least reduce your API spend?

source: I run inference locally and built the server for around $6k. I get upwards of 10t/s on deepseek v3
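For concreteness, here's a minimal sketch of the hybrid idea, assuming a llama.cpp-style local server exposing an OpenAI-compatible endpoint on localhost:8080; the model names, the endpoint, and the realtime flag are placeholders, not your actual setup:

    from openai import OpenAI

    # Local box, e.g. llama-server loading a quantized DeepSeek V3 GGUF
    local = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-local")
    # Hosted API client; reads OPENAI_API_KEY from the environment
    hosted = OpenAI()

    def analyze(article: str, realtime: bool = False) -> str:
        # Queue-friendly analyses go to the local server; anything
        # latency-sensitive falls back to the hosted API.
        client, model = (hosted, "gpt-4o-mini") if realtime else (local, "deepseek-v3")
        resp = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": "Summarize and classify this article."},
                {"role": "user", "content": article},
            ],
        )
        return resp.choices[0].message.content

Even if only the backlog runs through the local server, that's spend you never send to the API.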

PS: thank you for running this service. I've been using it casually since launch and find it much better for my mental health than any other source of news I've tried in the past.

Thank you so much! Always glad to see long-time readers.

There was a period when I considered switching to an open-source model, but every time I was ready to switch, OpenAI released a smarter and often cheaper model that was just too good to pass up.

Eventually I decided that the potential savings are not worth it in the long term - it looks like LLMs will only get cheaper over time and the cost of inference should become negligible.

  • Thanks for the reply! This is perhaps not so much a Hacker News-type question since this place is very VC-focused, but have you considered publishing any papers on your system? I think it would make a fascinating and valuable bit of research.

    Or, even further off the deep end: have you considered open-sourcing any old versions of your prompts or pipeline? Say, one year after they are superseded in your production system?

    • I don't oppose these ideas, but it's a matter of priorities. There are just so many other features and improvements to implement that seem more valuable.