← Back to context

Comment by ramoz

4 years ago

CPU, GPU doesn't work out well in our case. There is the data transfer cost as well as memory constraint & the model blows up in memory for every inference call.

x2 Gunicorn workers, MKL mapped to half physical cores for each... for some reason the model (tensorflow) performs better on half vs 1 worker