Comment by marcyb5st

21 days ago

I am a solution engineer, mostly on the traditional ML side of things, but I have good knowledge of K8s/GKE. The most fun I had last year was helping a customer serve their models at scale. They thought it was cost-prohibitive (500k inferences/second and a hard requirement of 7 ms at p99), so they were basically serving from a cache, which was lossy (the combinatorial explosion of features meant that full coverage would have needed exabytes of RAM) and prone to staleness. We focused on the serving first. After their data scientists trained a new PyTorch model (a small one, 50k parameters more or less), we exported it to ONNX (the model is small, so CPU inference is actually faster), grafted the preprocessing layers onto the model so that you never leave the ONNX C++ runtime (to avoid Python), and deployed it to GKE. An 8-core node using AMD Genoa CPUs managed to get 25k inferences per second. After a bit of fiddling with NUMA affinity, GKE DNS replication, Triton LRU caches and a few other things, we managed to hit 30k inferences per second. Scaled up to their full traffic, it would cost them a few thousand per month, which is less than their original cache approach.
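To make the scaling arithmetic above concrete, here is a rough back-of-the-envelope sketch. The throughput figures come from the comment; the function name `nodes_needed` is just illustrative, and the sketch assumes throughput scales linearly with node count and ignores headroom for failover or traffic spikes, which a real deployment would add.

```python
# Figures quoted in the comment above.
TARGET_QPS = 500_000           # required inferences per second
QPS_PER_NODE_BEFORE = 25_000   # one 8-core node, before tuning
QPS_PER_NODE_AFTER = 30_000    # same node, after tuning


def nodes_needed(target_qps: int, qps_per_node: int) -> int:
    """Smallest node count whose combined throughput covers target_qps.

    Assumes linear scaling across nodes (a simplification).
    """
    if target_qps < 0 or qps_per_node <= 0:
        raise ValueError("target_qps must be >= 0 and qps_per_node > 0")
    # Integer ceiling division, avoids floating-point rounding issues.
    return -(-target_qps // qps_per_node)


before = nodes_needed(TARGET_QPS, QPS_PER_NODE_BEFORE)
after = nodes_needed(TARGET_QPS, QPS_PER_NODE_AFTER)
print(f"before tuning: {before} nodes, after tuning: {after} nodes")
```

Under those assumptions the tuning takes the fleet from 20 nodes down to 17, which is why a handful of 8-core machines can plausibly beat a huge cache on cost.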

Now they are working on continuous learning so that they can roll out new models (it is a very adversarial line of business and the models get stale in O(hours)). For that part I only helped them design the thing, nothing hands-on. It was a super fun engagement TBH.

Are they paying you as well as your comment makes it sound? That was a ton of lingo and I'm used to lingo!

  • Yeah :) happy on the money front. I didn't mention this earlier, but I'm a Googler and my role is to make sure that big customers are as happy as possible on GCP. And Google still pays well (talking with my SWE friends my total comp lands around the middle of the pack, perhaps a bit fewer stock grants). I was a SWE (also at Google) before, so maybe they didn't change my comp too much in the new job family. I don't know as those things are mysterious.

    Also, not all projects are this fun. Sometimes it is solving the same problem over and over, or working with customers that aren't tech-savvy, and there is a bunch of politicking and "fluffy" stuff.