vvladymyrov (4 years ago): Are you running inference on CPU or GPU?

ramoz (4 years ago): CPU. GPU doesn't work out well in our case: there is the data-transfer cost as well as a memory constraint, and the model blows up in memory on every inference call.

ramoz (4 years ago): 2x Gunicorn workers, with MKL mapped to half the physical cores for each... for some reason the model (TensorFlow) performs better with two workers on half the cores each than with one worker on all of them.
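The "two workers, half the physical cores each" split ramoz describes can be expressed as a Gunicorn config. The sketch below is an illustration, not the commenter's actual setup: the core count (8) is an assumption, the post_fork approach is one way to apply the MKL mapping (the thread doesn't say how they did it), and the env vars are the standard MKL/OpenMP thread controls.

    # gunicorn.conf.py -- minimal sketch of "2 workers, half the
    # physical cores each"; assumed values are marked below.
    import os

    workers = 2              # "x2 Gunicorn workers"
    PHYSICAL_CORES = 8       # assumption: adjust to the actual host
    threads_per_worker = PHYSICAL_CORES // workers  # half the cores each

    def post_fork(server, worker):
        # Cap MKL/OpenMP threads before the worker imports TensorFlow.
        # With the default preload_app = False, the app (and hence
        # TF/MKL) is loaded in the worker after this hook runs, so the
        # caps take effect for each worker's inference threads.
        n = str(threads_per_worker)
        os.environ["MKL_NUM_THREADS"] = n
        os.environ["OMP_NUM_THREADS"] = n

Run it with something like `gunicorn -c gunicorn.conf.py app:server`, where `app:server` is a hypothetical module path for the inference service.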