Comment by storystarling
9 hours ago
Electricity is negligible but the dominant cost is the hardware depreciation itself. Also inference is typically memory bandwidth bound so you are limited by how fast you can move weights rather than raw compute efficiency.
Yes, because the margin is like 80% for Nvidia, and 80% again for the manufacturers like Samsung and TSMC. Once the fixed cost like R and D is amortized the same node technology and hardware capacity could be just few single digit percent of current.