Comment by antonvs

2 months ago

There might have been some misunderstanding there.

The point of map/reduce was that it could easily be parallelized across large numbers of machines to process very large amounts of data. Hadoop was the first open-source implementation of this.
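To make the parallelization point concrete, here is a minimal single-machine sketch in Python. It is not Hadoop's API: the names `map_chunk`, `merge_counts`, and `word_count` are made up for illustration, and a thread pool stands in for a cluster of machines. The idea is that each chunk is mapped independently, and the partial results are merged with an associative reduce, so the work can be split any way you like.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(lines):
    """Map step: count words in one chunk, independently of every other chunk."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial results (associative, so merge order is free)."""
    return left + right


def word_count(lines, workers=4, chunk_size=2):
    # Split the input into chunks; on a real cluster these would live on different machines.
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    # The thread pool is only a stand-in (an assumption of this sketch) for distributed workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(merge_counts, partials, Counter())


if __name__ == "__main__":
    print(word_count(["a b a", "b c", "a"]))
```

Because no chunk depends on any other, adding more workers needs no change to the map or reduce functions, which is the property that made the model attractive for very large inputs.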

The limitations on what it could do were well known from the start. No one who knew what they were doing proposed rewriting programs that way unless they were processing enough data to need to run distributed on a cluster, in which case it was often the best option.

Many of the limitations of pure map/reduce were overcome by adding steps to the basic map/reduce parallel pipeline. Apache Spark is one example: it still has map and reduce operations in its pipelines, but it has several other operations as well. Nothing better than map and reduce has been found for the purpose they serve in such pipelines.
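As a rough illustration of what "adding steps" looks like, here is a toy in-memory pipeline in Python. It does not use Spark itself: `Pipeline` is a hypothetical class written for this sketch, and its method names only loosely mirror the kind of operations such systems offer. Map and reduce are still at the core, with filter, flat-map, and per-key grouping around them.

```python
from collections import defaultdict
from functools import reduce


class Pipeline:
    """Hypothetical toy pipeline; runs eagerly on one machine, unlike a real cluster engine."""

    def __init__(self, items):
        self.items = list(items)

    def map(self, fn):
        return Pipeline(fn(x) for x in self.items)

    def filter(self, pred):
        return Pipeline(x for x in self.items if pred(x))

    def flat_map(self, fn):
        # Each input item may produce zero or more output items.
        return Pipeline(y for x in self.items for y in fn(x))

    def reduce_by_key(self, fn):
        # Group (key, value) pairs by key, then reduce each group's values.
        groups = defaultdict(list)
        for key, value in self.items:
            groups[key].append(value)
        return Pipeline((k, reduce(fn, vs)) for k, vs in groups.items())

    def collect(self):
        return self.items


lines = ["spark adds steps", "map and reduce remain", "spark keeps map"]
result = (
    Pipeline(lines)
    .flat_map(str.split)               # split lines into words
    .filter(lambda w: len(w) > 3)      # extra step: drop short words
    .map(lambda w: (w, 1))             # classic map step
    .reduce_by_key(lambda a, b: a + b) # classic reduce step, applied per key
    .collect()
)
```

The extra steps do not replace map and reduce; they sit around them, which matches the point that the core operations survived while the pipelines grew.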