
Comment by cratermoon

5 years ago

I think this paper shows that Amdahl's Law[1] is just as relevant when discussing distributed systems as it is when talking about multi-core single machines.

1. https://en.wikipedia.org/wiki/Amdahl%27s_law

It's related to Amdahl's law but not identical. As I understand it, Amdahl's law is talking about the situation where you have, say, 10 units of work, 5 of which can be parallelized and 5 of which can't and must be run serially.
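A quick sketch of that arithmetic (just the 10-units example above, nothing from the paper itself):

```python
def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's law: the serial fraction bounds the speedup,
    no matter how many workers you add."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)

# 10 units of work, 5 parallelizable and 5 strictly serial:
for n in (2, 10, 1000):
    print(n, round(amdahl_speedup(0.5, n), 2))
# -> 1.33, 1.82, 2.0: approaching the 2x ceiling set by the serial half
```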

The COST paper is talking about the situation where you have 10 units of work, but it takes 10 more units of work to distribute it across multiple computers.

Sometimes that's a win and sometimes it isn't, thus the phrase "parallelizing your overhead". If all you did was parallelize the additional overhead, then you didn't gain anything by using a distributed system.

IOW, the overhead of distribution in distributed systems can be very large. It's still there in parallel computing, but shared memory makes it less pronounced. (Although NUMA makes your single computer look like a distributed system again.)
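A minimal sketch of that difference, with made-up numbers rather than anything measured in the COST paper:

```python
def distributed_speedup(work, overhead, workers):
    """COST-style view: a fixed distribution overhead is paid up front,
    then the work itself parallelizes perfectly."""
    return work / (overhead + work / workers)

# 10 units of work plus 10 units of coordination overhead:
for n in (1, 4, 100):
    print(n, round(distributed_speedup(10, 10, n), 2))
# -> 0.5, 0.8, 0.99: the cluster never beats the single machine,
#    because all we really parallelized was the overhead.
```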

  • > it takes 10 more units of work to distribute it across multiple computers.

    That's just serialized work in a different form: in order for a computer to do a unit of work, it must be sent a unit of work. That's strictly serial -- a computer can't process a unit of work it doesn't have. So no matter how many units of work it takes to distribute the load, that's still 10 serial operations.

    Amdahl wasn't only seeing things as separate chunks, with the parallelized work in one part of the code and the serial work in a different part. The operations can be interleaved (a toy model of this follows below).
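A toy model of that dispatch bound (the send and compute times are invented, just to show the shape of the trade-off):

```python
def makespan(units, send_time, compute_time):
    """Unlimited workers, but one coordinator hands out units one at a
    time. The sends are strictly serial; compute overlaps (interleaves)
    with later sends, so the last unit finishes compute_time after the
    final send."""
    return units * send_time + compute_time

single_machine = 10 * 1.0                      # 10 units, 1.0 each, done locally
print(single_machine, makespan(10, 0.1, 1.0))  # 10.0 vs 2.0  -- cheap sends win
print(single_machine, makespan(10, 2.0, 1.0))  # 10.0 vs 21.0 -- serial dispatch alone
                                               #   costs more than the whole job
```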

I think more important is the actual cost of communication. Networks are slow: 100 Gbit/s is only about 12.5 GB/s, still a few times slower than the memory controllers in our phones.

https://en.m.wikipedia.org/wiki/Bulk_synchronous_parallel is a useful model for incorporating communication costs into the design, as opposed to PRAM, which ignores them.
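For what BSP actually charges you for, here is a minimal sketch of the standard per-superstep cost (w + h·g + l), with invented parameter values:

```python
def bsp_superstep_cost(w, h, g, l):
    """Standard BSP accounting: local compute (w), plus communication
    (h words per processor at g time per word), plus the barrier (l).
    PRAM effectively assumes g = l = 0."""
    return w + h * g + l

# Invented numbers: 1000 units of compute, 200 words exchanged,
# g = 4 (network far slower than local memory), l = 100 per barrier.
print(bsp_superstep_cost(w=1000, h=200, g=4, l=100))  # 1900
print(bsp_superstep_cost(w=1000, h=200, g=0, l=0))    # 1000 -- the PRAM-ish view
```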

  • BSP is a specific model for parallel computation, but it still operates on the fundamentals of Amdahl's Law, and the whole reason it exists is to find the optimal divide-and-conquer solution given a set of real-world limitations.

    One might even say that the cost of implementing your algorithm with BSP is higher than with PRAM, because of the extra layers. But BSP doesn't let you ignore the things a strict PRAM model does, so you have to incorporate those into the cost as well.

    Given Gustafson's law, if you have enough data and enough work to do, you can sort of ignore the un-parallelizable fraction by throwing enough processors at the computation.

    What starts to trip things up at this scale is synchronization (a rough sketch of both effects follows below).
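A rough sketch of that contrast, with invented numbers: Gustafson's scaled speedup keeps climbing with worker count, until a synchronization term that also grows with worker count starts eating into it.

```python
import math

def gustafson_speedup(p, n):
    """Gustafson's scaled speedup: grow the problem with the machine and
    the serial fraction (1 - p) of the run stops dominating."""
    return (1 - p) + p * n

def scaled_speedup_with_sync(p, n, barrier_cost):
    """Same scaled workload, but charge a synchronization term that grows
    with the number of workers (a log-tree barrier here, purely
    illustrative, not anything from the thread or the paper)."""
    parallel_run_time = 1.0 + barrier_cost * math.log2(n)  # normalized run time
    equivalent_serial_time = (1 - p) + p * n
    return equivalent_serial_time / parallel_run_time

for n in (8, 64, 1024):
    print(n,
          round(gustafson_speedup(0.9, n), 1),
          round(scaled_speedup_with_sync(0.9, n, barrier_cost=0.1), 1))
# The ideal scaled speedup keeps climbing; the synchronized version
# falls further and further behind as the barrier cost grows with n.
```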