bash: `map` in series and parallel
To process a bunch of items, this bash code can run them either in series or in parallel.
Note that it passes the name of a function to apply to each item (bash's version of a higher-order function), so the same underlying function can be supplied to either `map_series` or `map_parallel` without modification.
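In bash a "function argument" is just the function's name as a string, and `"$fn" args...` runs whatever function that string names. A minimal sketch of the mechanism, where `apply_each`, `shout` and `whisper` are hypothetical names used only for illustration:

```bash
# Hypothetical higher-order helper: runs the function named in $1 on each remaining argument
apply_each() {
  local fn="$1"
  shift
  local arg
  for arg in "$@"; do
    "$fn" "$arg"   # "$fn" expands to a function name, which bash then calls
  done
}

# Two hypothetical functions that can be passed to apply_each unchanged
shout() { printf '%s!\n' "$1"; }
whisper() { printf '(%s)\n' "$1"; }

apply_each shout a b    # prints "a!" then "b!"
apply_each whisper a b  # prints "(a)" then "(b)"
```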
Strategy to convert the series code to parallel:
- Maintain an index of the input line (`index=$(( index+1 ))`)
- Prepend the index column to every line of output (`sed "s/^/$index\t/"`)
- Process each input item in the background (`&`)
- Wait for all background jobs to finish (`wait`)
- Stable-sort by the index column, then remove it (`sort -s -nk1,1 | cut -f2-`)
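The sort-then-cut step can be checked in isolation on hand-written, already-indexed lines. This sketch assumes items 1 and 2 each emitted two lines and that the jobs' output arrived interleaved; `-s` keeps lines with equal indexes in their arrival order, and `-k1,1` limits the key to the index field. Without `-s`, GNU `sort` breaks ties by comparing whole lines, which would reorder each item's lines alphabetically (a, b, y, z here).

```bash
# Indexed lines as they might arrive from parallel jobs: index, tab, payload
printf '2\tz\n1\tb\n2\ty\n1\ta\n' |
  sort -s -n -k1,1 |   # stable numeric sort on the index field only
  cut -f2-             # drop the index column
# prints b, a, z, y (one per line)
```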
```bash
# Usage: blah | map_series <Function name>
function map_series() {
  local fn="$1"
  # -r keeps backslashes in the input literal
  while read -r; do
    # $REPLY is unquoted, so each word of the line becomes a separate argument
    "$fn" $REPLY
  done
}

# Usage: blah | map_parallel <Function name>
function map_parallel() {
  local fn="$1"
  {
    # Pass through source index to sort by it at the end
    local index=0
    while read -r; do
      # Prepend `index` column (`\t` in the replacement relies on GNU sed)
      index=$(( index+1 ))
      "$fn" $REPLY | sed -E "s/^/$index\t/" &
    done
    # Await all application jobs
    wait
    # Restore order from `index` column (stable, so each item's lines keep their order), then remove it
  } | sort --stable --numeric-sort --key 1,1 | cut -f 2-
}
```
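The core pattern of `map_parallel` (background jobs inside a brace group that is piped into `sort`) can also be exercised on its own. In this sketch three illustrative jobs start in index order but sleep for decreasing times, so they finish in reverse. Because the brace group is one element of a pipeline it runs in a subshell, the jobs are children of that subshell, and the `wait` inside the group covers them. The fractional `sleep` assumes GNU or BSD `sleep` (POSIX only requires whole seconds).

```bash
{
  for i in 1 2 3; do
    # Job i sleeps (4 - i) tenths of a second, so job 3 finishes first and job 1 last
    { sleep "0.$(( 4 - i ))"; printf '%s\tjob %s\n' "$i" "$i"; } &
  done
  # Wait for the three background jobs started in this group
  wait
} | sort -s -n -k1,1 | cut -f2-
# prints "job 1", "job 2", "job 3" despite the reverse completion order
```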
Example
Given a function that does some time-consuming work and writes something to stdout, the parallel version finishes much faster than the serial one but produces the same output (albeit all at the end, because `sort` must read all of its input before printing anything).
```console
$ function DoItem() {
    printf 'DoEcho: sleeping for %s... ' "$1"
    sleep "$1"
    printf 'done\n'
  }
$ time seq 5 | map_series DoItem
DoEcho: sleeping for 1... done  # ┐
DoEcho: sleeping for 2... done  # │
DoEcho: sleeping for 3... done  # ├ appear gradually, as done
DoEcho: sleeping for 4... done  # │
DoEcho: sleeping for 5... done  # ┘
real 0m15.037s                  # ⬅ Sum durations
$ time seq 5 | map_parallel DoItem
DoEcho: sleeping for 1... done  # ┐
DoEcho: sleeping for 2... done  # │
DoEcho: sleeping for 3... done  # ├ appear together, when all done
DoEcho: sleeping for 4... done  # │
DoEcho: sleeping for 5... done  # ┘
real 0m5.020s                   # ⬅ Max durations
```
Published on: 20 Aug 2022