Comment by adolph

12 hours ago

My go-to for fast and easy parallelization is xargs -P.

  find a-bunch-of-files | xargs -P 10 do-something-with-a-file

       -P max-procs
       --max-procs=max-procs
              Run up to max-procs processes at a time; the default is 1.
              If max-procs is 0, xargs will run as many processes as
              possible at a time.
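-P only speeds things up if xargs actually launches more than one command. By default it packs as many inputs as fit onto a single command line, so with a short input list you can end up with one process and nothing running in parallel. Adding -n 1 forces one input per command. Here's a minimal sketch, assuming a POSIX shell with GNU or BSD xargs, where `sleep` stands in for the hypothetical do-something-with-a-file:

```shell
#!/bin/sh
# Four 1-second jobs with -P 4 should finish in about 1 second, not 4.
# "sleep" is a stand-in for the hypothetical do-something-with-a-file.
# -n 1 gives each input its own command; without it xargs may build a
# single "sleep 1 1 1 1" command, leaving nothing to run in parallel.
start=$(date +%s)
printf '1\n1\n1\n1\n' | xargs -P 4 -n 1 sleep
end=$(date +%s)
elapsed=$((end - start))
echo "elapsed: ${elapsed}s"
```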

Note that you should pair find -print0 with xargs -0 for safety, so filenames containing spaces or newlines don't get split into separate arguments.
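To see why, here's a minimal sketch, assuming a POSIX shell and a scratch directory from mktemp, that runs the same ls over two filenames containing spaces, with and without the NUL separators:

```shell
#!/bin/sh
# The same ls over two filenames containing spaces, with and without
# NUL separators. The scratch directory is illustrative.
old=$(pwd)
dir=$(mktemp -d)
touch "$dir/a file.md" "$dir/b file.md"
cd "$dir" || exit 1

# Unsafe: plain xargs splits on blanks, so "./a file.md" arrives as the two
# arguments "./a" and "file.md", neither of which exists.
unsafe=$(find . -type f | xargs ls 2>/dev/null | wc -l)

# Safe: find -print0 ends each name with a NUL byte and xargs -0 splits
# only on NUL, so every name reaches ls as one argument.
safe=$(find . -type f -print0 | xargs -0 ls | wc -l)

cd "$old" && rm -r "$dir"
echo "unsafe: $unsafe names listed, safe: $safe names listed"
```

The unsafe pipeline lists nothing because every fragment is a missing file, while the -print0/-0 pipeline finds both.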

  • Thanks! I've been using the -I{} do-something-to-file "{}" approach, which is also handy when the input is one param among others. -0 is much faster.

    Edit: Looks like when doing file-by-file, -I{} is still needed:

      # find tmp -type f | xargs -0 ls
      ls: cannot access 'tmp/b file.md'$'\n''tmp/a file.md'$'\n''tmp/c file.md'$'\n': No such file or directory

    • find -print0 will print the files with null bytes as separators

      xargs -0 will use a null byte as separator for each argument

      printf 'a\0b\0c\0' | xargs -0 -tI{} echo "file -> {}"
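Putting -0 together with -I{} and -P gives the file-by-file form with parallelism. A minimal sketch, assuming GNU or BSD xargs; the output goes through sort because parallel jobs can finish in any order:

```shell
#!/bin/sh
# NUL-separated input plus a {} placeholder, run in parallel.
# -I runs one command per input item, and -P 3 allows up to three of
# those commands at once. Names with spaces survive because of -0.
out=$(printf 'a file\0b file\0c file\0' \
  | xargs -0 -P 3 -I{} echo "file -> {}" | sort)
printf '%s\n' "$out"
```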