Comment by GeekyBear

12 hours ago

In previous generations, throughout was excellent for an integrated GPU, but the time to first token was lacking.

4 comments

GeekyBear

So throughput was already good but TTFT was the metric that needed more improvement?

zamadatix 11 hours ago

To add to the sibling "good is relative" it also depends what you're running, not just your relative tolerances of what good is. E.g. in a MoE the decode speedup means the speed of prompt processing delay is more noticeable for the same size model in RAM.
convenwis 11 hours ago

Good is relative but first token was clearly the biggest limitation.
brookst 3 hours ago

Yeah TTFT was terrible. I don’t think it’s unreasonable to benchmark the most-improved metric.