Comment by esquire_900
13 hours ago
Cost-wise it does not seem very effective. 0.5 tokens/sec (the optimized one) is 3600 tokens an hour, which costs about 200-300 watts for an active 3090 + system. Running 3600 tokens on OpenRouter at $0.40 per million tokens for Llama 3.1 (3.3 costs less) is about $0.00144. That money buys you about 2-3 watts (in the Netherlands).
Great achievement for privacy inference nonetheless.
I think we use different units. In my system there are 3600 seconds per hour, and watts measure power.
OP probably means watt-hours.
And 0.5 tokens/s should work out to 1800 tokens at the end of the hour. Not 3600 as stated.
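The corrected back-of-envelope math can be sketched as follows. The electricity price, average system draw, and API rate here are illustrative assumptions (roughly matching the figures in the thread), not measured values:

```python
# Back-of-envelope: local 3090 inference vs. a hosted API, per hour.
# All constants below are assumptions for illustration.

TOKENS_PER_SEC = 0.5
SYSTEM_DRAW_W = 250              # assumed average draw, 3090 + host
ELECTRICITY_USD_PER_KWH = 0.40   # assumed Dutch retail electricity price
API_USD_PER_MTOK = 0.40          # assumed API price per million tokens

tokens_per_hour = TOKENS_PER_SEC * 3600                       # 1800, not 3600
local_cost = SYSTEM_DRAW_W / 1000 * ELECTRICITY_USD_PER_KWH   # $/hour in power
api_cost = tokens_per_hour * API_USD_PER_MTOK / 1_000_000     # $/hour via API

print(f"{tokens_per_hour:.0f} tokens/hour")
print(f"local: ${local_cost:.4f}/h, API: ${api_cost:.5f}/h")
print(f"local power alone costs ~{local_cost / api_cost:.0f}x the API price")
```

Under these assumptions the electricity alone costs two orders of magnitude more than the hosted API, which is the thrust of the original comment even after fixing the token count and the watts/watt-hours mix-up.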
OpenRouter is heavily subsidized. This might be cheaper in the long run once these companies shift to taking profits.
Something to consider is that input tokens have a cost too. They are typically processed much faster than output tokens. If you have long conversations then input tokens will end up being a significant part of the cost.
It probably won't matter much here though.
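To sketch why input tokens can dominate in long conversations: each turn re-sends the whole history as input, so input tokens grow quadratically while output tokens grow linearly. The prices and message sizes below are assumptions for illustration only:

```python
# Illustrative cost model for a multi-turn chat. Assumed prices (not any
# provider's real rates): input cheaper per token than output.

INPUT_USD_PER_MTOK = 0.20    # assumed input price
OUTPUT_USD_PER_MTOK = 0.40   # assumed output price

def conversation_cost(turns, tokens_per_message):
    input_tokens = 0
    output_tokens = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_message   # user message joins the history
        input_tokens += history         # whole history is re-processed as input
        output_tokens += tokens_per_message
        history += tokens_per_message   # the reply joins the history too
    cost_in = input_tokens * INPUT_USD_PER_MTOK / 1e6
    cost_out = output_tokens * OUTPUT_USD_PER_MTOK / 1e6
    return cost_in, cost_out

cost_in, cost_out = conversation_cost(turns=50, tokens_per_message=300)
print(f"input: ${cost_in:.4f}, output: ${cost_out:.4f}")
```

With these numbers, a 50-turn chat spends far more on input than on output even though input is half the per-token price, which is the point made above; for a short once-a-day report it indeed won't matter much.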
> Cost wise it does not seem very effective.
Why is this so damn important? Isn't it more important to end up with the best result?
I (in Norway) use a homelab with Ollama to generate a report every morning. It's slow, but it runs between 5 and 6 am, when energy prices are at a low, and it doesn't matter if it takes 5 or 50 minutes.
> Why is this so damn important? Isn't it more important to end up with the best result?
You’re wondering why someone would prefer to get the same or better result in less time for less money?