Comment by hliyan
2 years ago
I've written this comment before: in 2007, there was a period where I used to run an entire day's worth of trade reconciliations for one of the US's primary stock exchanges on my laptop (I was the on-site engineer). It was a Perl script, and it completed in minutes. A decade later, I watched incredulously as a team tried to spin up a Hadoop cluster (or Spark -- I forget which) over several days, to run a workload an order of magnitude smaller.
> over several days, to run a work load an order of magnitude smaller
Here I sit, running a query on a fancy cloud-based tool we pay nontrivial amounts of money for, which takes ~15 minutes.
If I download the data set to a Linux box I can do the query in 3 seconds with grep and awk.
Oh but that is not The Way. So here I sit waiting ~15 minutes every time I need to fine tune and test the query.
Also, of course, the query is now written in the vendor's homegrown query language, which is missing a lot of functionality, so whenever I need to do a different transformation or pull the data apart a bit differently, I get to file a feature request and wait a few months for it to be implemented. On the Linux box I could just change my awk parameters a little (or throw perl in the pipeline for heavier lifting) and be done in a minute. But hey, at least I can put the ticket in a blocked state for a few months while waiting for the vendor.
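For the curious, the grep/awk version looks something like this (the file name and column layout here are made up, just to sketch the shape of it; adjust to taste):

    # hypothetical trades.csv: timestamp,symbol,side,qty,price
    grep ',AAPL,' trades.csv \
      | awk -F, '{ qty[$3] += $4; notional[$3] += $4 * $5 }
                 END { for (s in qty) print s, qty[s], notional[s] }'

Filter with grep, split on commas with awk, sum quantity and notional by side. When field splitting isn't enough, swap the awk body or pipe through perl -ne for the heavier lifting.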
Why are we doing this?
>Why are we doing this?
someone got promoted
Oh how true this is. At my current work we use _Kubernetes_ for absolutely no reason at all, other than that the guy in charge of infra wanted to learn it.
The result?
1. I don't have access to basic logs for debugging, because apparently the infra guy would have to give me access to the whole cluster.
2. Production ends up dying from time to time, because apparently they don't know how to set it up.
3. The boss likes him more, because he's using big-boy tools.
yeah but who was getting better stuff on their resume? didn't you get the memo about perl?
Just because your throw-away 40-line script worked from cron for five years without issue doesn't mean that a seven-node Hadoop cluster didn't come with benefits. You got to write in a language called "Pig"! So fun.
I still think it'd be easier to maintain the script that runs on a single computer than to maintain a Hadoop cluster.
The resume would look better if you used Python and Polars ;-)
s/he was obviously joking
maybe we should all start to add "evaluated a Hadoop cluster for X applications and saved the company 1M a year (in time, headcount, and uptime) by going with a 40-line Perl script"
I like this idea. And something similar for evaluating blockchains and sticking with a relational database instead.
> yeah but who was getting better stuff on their resume? didn't you get the memo about perl?
That is why Rust is so awesome. It still lets me get stuff on my resume, while still producing an executable that runs on my laptop with high performance.
I'd love to hear what the benefits are of using a framework for the wrong purpose.
Resume entries!
There was a time, about 10 years ago, when Hadoop/Spark was on just about every back-end job post out there.
I was in the field at the time and I agree. I thought it had to be what the big boys used. Then I realized that my job involves huge amounts of structured data and our MySQL instance handled everything quite well.
People should first try the simplest most obvious solution just to have a baseline before they jump into the fancy solutions.
I imagine your laptop had an SSD.
People who weren't developing around this time can't appreciate how game-changing SSDs were compared to spinning rust.
I/O was no longer the bottleneck once SSDs arrived.
Even today, people way underestimate the power of NVMe.