Comment by rainboOow9

3 years ago

I am only speaking for myself here, but I am really feeling the shift from "data engineering" to "data ops", whatever that label means.

In short, 5-10 years ago, writing mapreduce / spark jobs (or even debugging / optimizing hive jobs) was complex enough that it was often the job of the data engineer (and not the data analyst / scientist). And I don't just mean writing the data processing logic, but more importantly, configuring it properly so that the resource footprint was acceptable. That required a good understanding of the underlying framework: analyzing the job execution plan, tweaking the resource configuration, etc.
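To give a flavor of the tuning described above, here is a hypothetical `spark-submit` invocation with an explicitly sized resource footprint (the script name and all flag values are made up for illustration, not a recommendation):

```shell
# Hypothetical example: submitting a Spark job with an explicit
# resource footprint instead of relying on cluster defaults.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 20 \
  --executor-cores 4 \
  --executor-memory 8g \
  --conf spark.sql.shuffle.partitions=400 \
  etl_job.py
```

Getting numbers like these right usually meant reading the execution plan (e.g. `df.explain()` in Spark) and iterating, which is exactly the framework knowledge the comment is talking about.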

Now, writing distributed jobs is pretty trivial with most cloud providers, so it is now largely done by data analysts and scientists. And the data engineers have shifted to a more devops kind of work: doing the plumbing between the various cloud components and writing the IaC (infrastructure as code) needed to provide those cloud resources to other data users. In short, you can be a data engineer and have absolutely no clue about how distributed systems actually work, and it will not be an issue in your daily job.