bash: `map` in series and parallel

To process a bunch of items this bash can process them in series or parallel.

Note this passes a function to use for each item in bash's version of a higher order function, so the same underlying function can be supplied to either map_series or map_parallel without modification.

Strategy to convert the series code to parallel:

  1. Maintain an index of the input line (index=$(( $index+1 )))
  2. Prepend index column in every line of output (sed "/^/$index\t/')
  3. Process each input item in the background (&)
  4. Wait for all background jobs to finish (wait)
  5. Sort by index column, then remove it (sort -nk1 | cut -f2-)
# Usage: blah | map_series <Function name>
function map_series() {
  local fn="$1"

  while read; do
    "$fn" $REPLY
  done
}

# Usage: blah | map_parallel <Function name>
function map_parallel() {
  local fn="$1"

  {
    # Pass through source index to sort by it at the end
    local index=0

    while read; do
      # Prepend `index` column
      index=$(( index+1 ))
      "$fn" $REPLY | sed -E "s/^/$index\t/" &
    done

    # Await all application jobs
    wait

    # Restore order from `index` column, then remove...
  } | sort --numeric-sort --key 1 | cut -f 2-
}

Example

Given a function that does some work that takes time and produces something on stout the parallel one finishes much faster than the serial but produces the same output (albeit all at the end due to the sort).

$ function DoItem() {
  printf "DoEcho: sleeping for $@... "
  sleep $1
  printf "done\n"
}

$ time seq 5 | map_serial DoItem
DoEcho: sleeping for 1... done     # ┐
DoEcho: sleeping for 2... done     # │
DoEcho: sleeping for 3... done     # ├ appear gradually, as done
DoEcho: sleeping for 4... done     # │
DoEcho: sleeping for 5... done     # ┘

real    0m15.037s                  # ⬅  Sum durations

$ time seq 5 | map_parallel DoItem
DoEcho: sleeping for 1... done     # ┐
DoEcho: sleeping for 2... done     # │
DoEcho: sleeping for 3... done     # ├ appear together, when all done
DoEcho: sleeping for 4... done     # │
DoEcho: sleeping for 5... done     # ┘

real    0m5.020s                   # ⬅  Max durations
Published on: 20 Aug 2022