Comment by esquire_900
4 hours ago
Cost-wise it does not seem very effective. 0.5 tokens/sec (the optimized one) is 3600 tokens an hour, which costs about 200-300 watts for an active 3090 + system. Running 3600 tokens on OpenRouter at $0.40 per million tokens for Llama 3.1 (3.3 costs less) is about $0.00144. That money buys you about 2-3 watts (in the Netherlands).
Great achievement for privacy inference nonetheless.
I think we use different units. In my system there are 3600 seconds per hour, and watts measure power.
OP probably means watt-hours.
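A quick sketch of the comparison with the units straightened out. All inputs are assumptions pulled from or implied by the comment: the 0.5 tokens/sec local rate (which gives 1800 tokens per hour, not 3600), a mid-point 250 W system draw, the $0.40 per million output tokens OpenRouter price, and a guessed Dutch electricity tariff:

```python
# Back-of-envelope local-vs-API cost comparison (all figures assumed).
TOKENS_PER_SEC = 0.5        # local 3090 output rate quoted in the comment
SYSTEM_WATTS = 250          # assumed mid-point of the quoted 200-300 W draw
API_PRICE_PER_MTOK = 0.40   # assumed OpenRouter output price, $/million tokens
ELEC_PRICE_PER_KWH = 0.40   # assumed Dutch tariff, $/kWh (varies a lot)

tokens_per_hour = TOKENS_PER_SEC * 3600                  # 1800 tokens
local_cost = SYSTEM_WATTS / 1000 * ELEC_PRICE_PER_KWH    # $ per hour of power
api_cost = tokens_per_hour / 1e6 * API_PRICE_PER_MTOK    # $ for the same tokens

print(f"local electricity for one hour: ${local_cost:.4f}")
print(f"API price for the same tokens:  ${api_cost:.6f}")
print(f"local is roughly {local_cost / api_cost:.0f}x more expensive")
```

The exact multiplier swings with the tariff and the model price, but the two-orders-of-magnitude gap is robust to reasonable values.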
Something to consider is that input tokens have a cost too. They are typically processed much faster than output tokens. If you have long conversations, then input tokens will end up being a significant part of the cost.
It probably won't matter much here though.
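A small sketch of why input tokens come to dominate in long conversations: each turn re-sends the full growing history as input. The per-token prices below are illustrative assumptions, not real OpenRouter rates:

```python
# How input tokens add up over a long chat (prices are assumed, not real rates).
INPUT_PRICE_PER_MTOK = 0.20   # assumed $/million input tokens
OUTPUT_PRICE_PER_MTOK = 0.40  # assumed $/million output tokens

def api_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request under the assumed prices."""
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1e6

# Early turn: small prompt, modest reply.
short = api_cost(input_tokens=500, output_tokens=500)
# Late turn: the whole conversation history is re-sent as input.
long_chat = api_cost(input_tokens=50_000, output_tokens=500)

print(f"early turn: ${short:.6f}")
print(f"late turn:  ${long_chat:.6f}")
```

Even with input priced at half the output rate, the late turn's cost is dominated by the re-sent history.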