← Back to context

Comment by synthetic-echo

1 day ago

Just an anecdote and not necessarily generalizable, but I can at least give one example:

I'm in academia doing ML research where, for all intents and purposes, we work exclusively in Python. We had a massive CSV dataset which required sorting, filtering, and other data transformations. Without getting into details, we had to rerun the entire process when new data came in roughly every week. Even using every trick to speed up the Python code, it took around 3 days.

I got so annoyed by it that I decided to rewrite it in a compiled language. Since it had been a few years since I've written any C/C++, which was only for a single class in undergrad and I remember very little of, I decided to give Go a try.

I was able to learn enough of the language and write up a simple program to do the data processing in less than a few hours, which reduced the time it took from 3+ days to less than 2 hours.

I unfortunately haven't had a chance or a need to write any more Go since then. I'm sure other compiled, GC languages (e.g., Nim) would've been just as productive or performant, but I know that C/C++ would've taken me much longer to figure out and would've been much harder to read/understand for the others that work with me who pretty much only know Python. I'm fairly certain that if any of them needed to add to the program, they'd be able to do so without wasting more than a day to do so.

Did you try scipy/numpy or any python library with a compiled implementation before picking up Go?

  • Of course, but the dataset was mostly strings that needed to be cross-referenced with GIS data. Tried every library under the sun. The greatest speed up I got was using polars to process the mostly-string CSVs, but didn't help much. With that said, I think polars was also just released when we were working with that dataset and I'm sure there's been a lot of performance improvements since then.