vvladymyrov (4 years ago): Are you running inference on CPU or GPU?

ramoz (4 years ago): CPU. GPU doesn't work out well in our case: there is the data-transfer cost as well as a memory constraint, and the model blows up in memory on every inference call.

ramoz (4 years ago): 2x Gunicorn workers, with MKL mapped to half the physical cores for each... for some reason the model (TensorFlow) performs better with two workers on half the cores each than with one worker on all of them.
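The "two workers, half the physical cores each" split ramoz describes can be expressed as a Gunicorn config. The sketch below is an illustration, not the commenter's actual setup: the core count (8) is an assumption, the post_fork approach is one way to apply the MKL mapping (the thread doesn't say how they did it), and the env vars are the standard MKL/OpenMP thread controls.

    # gunicorn.conf.py -- minimal sketch of "2 workers, half the
    # physical cores each"; assumed values are marked below.
    import os

    workers = 2              # "x2 Gunicorn workers"
    PHYSICAL_CORES = 8       # assumption: adjust to the actual host
    threads_per_worker = PHYSICAL_CORES // workers  # half the cores each

    def post_fork(server, worker):
        # Cap MKL/OpenMP threads before the worker imports TensorFlow.
        # With the default preload_app = False, the app (and hence
        # TF/MKL) is loaded in the worker after this hook runs, so the
        # caps take effect for each worker's inference threads.
        n = str(threads_per_worker)
        os.environ["MKL_NUM_THREADS"] = n
        os.environ["OMP_NUM_THREADS"] = n

Run it with something like `gunicorn -c gunicorn.conf.py app:server`, where `app:server` is a hypothetical module path for the inference service.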