Comment by maccard

1 day ago

I think you’re fixating on the very specific example. Imagine if instead of 2 + 2 it was multiplying arrays of large matrices. The compiler or runtime would be smart enough to figure out whether it’s worth dispatching the parallelism for you. Basically auto-vectorisation, but for parallelism.

Notably - in most cases, there is no way the compiler can know which of these scenarios is going to happen at compile time.

At runtime, the CPU can figure it out though, eh?

  • I mean, theoretically it's possible. A super basic example: if the data size is known at compile time, the loop could be auto-parallelised, e.g.

        int buf_size = 10000000;
        auto vec = make_large_array(buf_size);
        for (const auto& val : vec)
        {
            do_expensive_thing(val);
        }
    

    this could clearly be parallelised. In a hypothetical C++ that doesn't exist today, the compiler could see that the transformation is valid.

    If I replace it with:

        int buf_size = 10000000;
        cin >> buf_size;
        auto vec = make_large_array(buf_size);
        for (const auto& val : vec)
        {
            do_expensive_thing(val);
        }

    the compiler could generate some code that looks like:

        if (buf_size >= SOME_LARGE_THRESHOLD) { DO_IN_PARALLEL }
        else { DO_SERIAL }

    With some background logic for managing threads, etc. In a C++-style world where "control" is important it likely wouldn't fly, but if this were Python...

        arr_size = 10000000
        buf = [None] * arr_size
        for x in buf:
            do_expensive_thing(x)
    

    could be parallelised at compile time.