Comment by theanonymousone

11 hours ago

Wouldn't it be much more useful if the request accepted raw input (i.e. before feature extraction) rather than the feature vector?

2 comments

marcyb5st 9 hours ago

You can do that with ONNX. You can graft the preprocessing layers onto the actual model [1] and then serve that. Honestly, I thought ONNX (on CPU at least) was already low-level, highly optimized code.

@Author - if you see this, is it possible to add comparisons (i.e. "vanilla" inference latencies vs timber)?

[1] https://gist.github.com/msteiner-google/5f03534b0df58d32abcc... <-- A gist I put together in the past that goes from PyTorch to ONNX and grafts the preprocessing layers to the model, so you can pass the raw input.
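
To make the grafting idea concrete, here is a minimal, library-free sketch. It is not taken from the gist. The names `extract_features`, `model`, `graft`, and `predict_raw`, the request fields, and the weights are all hypothetical, and plain function composition stands in for the graph-level merge. With ONNX the equivalent step would join a preprocessing graph to the model graph (for example with `onnx.compose.merge_models`), but the principle is the same: the served artifact takes raw input and runs feature extraction internally.

```python
from typing import Callable, Dict, List

# Hypothetical raw request payload; the field names are illustrative only.
RawInput = Dict[str, float]
Features = List[float]


def extract_features(raw: RawInput) -> Features:
    """Preprocessing stage: turn a raw record into a fixed-order feature vector."""
    # Made-up transforms: scale age, and derive an income-to-debt ratio.
    return [raw["age"] / 100.0, raw["income"] / max(raw["debt"], 1.0)]


def model(features: Features) -> float:
    """Stand-in for the trained model: a linear scorer with made-up weights."""
    weights = [0.5, 0.25]
    return sum(w * x for w, x in zip(weights, features))


def graft(
    pre: Callable[[RawInput], Features],
    core: Callable[[Features], float],
) -> Callable[[RawInput], float]:
    """Compose preprocessing and model into one callable that accepts raw input."""

    def served(raw: RawInput) -> float:
        return core(pre(raw))

    return served


# The single artifact you would deploy: callers never see the feature vector.
predict_raw = graft(extract_features, model)

if __name__ == "__main__":
    request = {"age": 40.0, "income": 8.0, "debt": 2.0}
    print(predict_raw(request))
```

The upside of serving the composed callable is that training-time and serving-time preprocessing cannot drift apart, since both ship in the same artifact.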

kossisoroyce 9 hours ago

I'll check this out as soon as I am at my desk.