Comment by Vampiero

5 months ago

250/s is still nothing compared to an actual NLP pipeline that takes a few ms per iteration, because you can parallelize that too.

I know it's hard to understand, but you can achieve a throughput that is a few orders of magnitude higher.
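
To put rough numbers on it, here is a back-of-the-envelope sketch. The 2 ms per-iteration latency and the 64 workers are assumptions picked for illustration, not measurements, and it assumes items are independent so the work scales linearly across workers.

```python
def throughput(latency_ms: float, workers: int) -> float:
    """Items per second, assuming independent items and linear scaling."""
    return workers * 1000.0 / latency_ms

baseline = 250.0                  # the 250/s figure under discussion
single = throughput(2.0, 1)       # one worker at an assumed 2 ms per iteration
parallel = throughput(2.0, 64)    # an assumed 64 parallel workers
ratio = parallel / baseline       # how far ahead the parallel pipeline is

print(single, parallel, ratio)
```

Under those assumptions a single worker already beats 250/s, and the gap grows with every worker you add. Real pipelines won't scale perfectly linearly, so treat this as an upper bound.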