171 lines
5.5 KiB
Text
171 lines
5.5 KiB
Text
namespace tf {
|
|
|
|
/** @page ParallelReduction Parallel Reduction
|
|
|
|
%Taskflow provides template function that constructs a task to perform parallel reduction over a range of items.
|
|
|
|
@tableofcontents
|
|
|
|
@section ParallelReductionInclude Include the Header
|
|
|
|
You need to include the header file, <tt>taskflow/algorithm/reduce.hpp</tt>,
|
|
for creating a parallel-reduction task.
|
|
|
|
@code{.cpp}
|
|
#include <taskflow/algorithm/reduce.hpp>
|
|
@endcode
|
|
|
|
@section A2ParallelReduction Create a Parallel-Reduction Task
|
|
|
|
The reduction task created by
|
|
tf::Taskflow::reduce(B first, E last, T& result, O bop, P&& part) performs
|
|
parallel reduction over a range of elements specified by <tt>[first, last)</tt> using the binary operator @c bop and stores the reduced result in @c result.
|
|
It represents the parallel execution of the following reduction loop:
|
|
|
|
@code{.cpp}
|
|
for(auto itr=first; itr<last; itr++) {
|
|
result = bop(result, *itr);
|
|
}
|
|
@endcode
|
|
|
|
At runtime, the reduction task spawns a subflow to perform parallel reduction.
|
|
The reduced result is stored in @c result that will be captured by reference in the reduction task.
|
|
It is your responsibility to ensure @c result remains alive during the parallel execution.
|
|
|
|
@code{.cpp}
|
|
int sum = 100;
|
|
std::vector<int> vec = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
|
|
|
|
tf::Task task = taskflow.reduce(vec.begin(), vec.end(), sum,
|
|
[] (int l, int r) { return l + r; } // binary reducer operator
|
|
);
|
|
executor.run(taskflow).wait();
|
|
|
|
assert(sum == 100 + 55);
|
|
@endcode
|
|
|
|
The order in which the binary operator is applied to pairs of elements is @em unspecified.
|
|
In other words, the elements of the range may be grouped and rearranged in arbitrary order.
|
|
The result and the argument types of the binary operator must be consistent with the input data type.
|
|
|
|
@section ParallelReductionCaptureIteratorsByReference Capture Iterators by Reference
|
|
|
|
You can pass iterators by reference using @std_ref
|
|
to marshal parameter update between dependent tasks.
|
|
This is especially useful when the range is unknown
|
|
at the time of creating a parallel-reduction task,
|
|
but needs initialization from another task.
|
|
|
|
@code{.cpp}
|
|
int sum = 100;
|
|
std::vector<int> vec;
|
|
std::vector<int>::iterator first, last;
|
|
|
|
tf::Task init = taskflow.emplace([&](){
|
|
vec = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
|
|
first = vec.begin();
|
|
last = vec.end();
|
|
});
|
|
|
|
tf::Task task = taskflow.reduce(std::ref(first), std::ref(last), sum,
|
|
[] (int l, int r) { return l + r; } // binary reducer operator
|
|
);
|
|
|
|
// wrong! must use std::ref, or first and last are captured by copy
|
|
// tf::Task task = taskflow.reduce(first, last, sum, [] (int l, int r) {
|
|
// return l + r; // binary reducer operator
|
|
// });
|
|
|
|
init.precede(task);
|
|
|
|
executor.run(taskflow).wait();
|
|
|
|
assert(sum == 100 + 55);
|
|
@endcode
|
|
|
|
In the above example, when @c init finishes, @c vec has been initialized to 10 elements with
|
|
@c first and @c last pointing to the data range of @c vec.
|
|
The reduction task will then work on this initialized range
|
|
as a result of passing iterators by reference.
|
|
|
|
@section A2ParallelTransformationReduction Create a Parallel-Transform-Reduction Task
|
|
|
|
It is common to transform each element into a new data type and
|
|
then perform reduction on the transformed elements.
|
|
%Taskflow provides a method,
|
|
tf::Taskflow::transform_reduce(B first, E last, T& result, BOP bop, UOP uop, P&& part),
|
|
that applies @c uop to transform each element in the specified range
|
|
and then perform parallel reduction over @c result and transformed elements.
|
|
It represents the parallel execution of the following reduction loop:
|
|
|
|
@code{.cpp}
|
|
for(auto itr=first; itr<last; itr++) {
|
|
result = bop(result, uop(*itr));
|
|
}
|
|
@endcode
|
|
|
|
The example below transforms each digit in a string to an integer number
|
|
and then sums up all integers in parallel.
|
|
|
|
@code{.cpp}
|
|
std::string str = "12345678";
|
|
int sum {0};
|
|
tf::Task task = taskflow.transform_reduce(str.begin(), str.end(), sum,
|
|
[] (int a, int b) { // binary reduction operator
|
|
return a + b;
|
|
},
|
|
[] (char c) -> int { // unary transformation operator
|
|
return c - '0';
|
|
}
|
|
);
|
|
executor.run(taskflow).wait();
|
|
assert(sum == 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8); // sum will be 36
|
|
@endcode
|
|
|
|
The order in which we apply the binary operator on the transformed elements is @em unspecified.
|
|
It is possible that the binary operator will take @em r-value in both arguments, for example,
|
|
<tt>bop(uop(*itr1), uop(*itr2))</tt>, due to the transformed temporaries.
|
|
When data passing is expensive,
|
|
you may define the result type @c T to be move-constructible.
|
|
|
|
@section ParallelReductionCfigureAPartitioner Configure a Partitioner
|
|
|
|
You can configure a partitioner for parallel-reduction tasks to run with different
|
|
scheduling methods, such as guided partitioning, dynamic partitioning, and static partitioning.
|
|
The following example creates two parallel-reduction tasks using two different
|
|
partitioners, one with the static partitioning algorithm and
|
|
another one with the guided partitioning algorithm:
|
|
|
|
@code{.cpp}
|
|
tf::StaticPartitioner static_partitioner;
|
|
tf::GuidedPartitioner guided_partitioner;
|
|
|
|
int sum1 = 100, sum2 = 100;
|
|
std::vector<int> vec = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
|
|
|
|
// create a parallel-reduction task with static partitioner
|
|
taskflow.reduce(vec.begin(), vec.end(), sum1,
|
|
[] (int l, int r) { return l + r; },
|
|
static_partitioner
|
|
);
|
|
|
|
// create a parallel-reduction task with guided partitioner
|
|
taskflow.reduce(vec.begin(), vec.end(), sum2,
|
|
[] (int l, int r) { return l + r; },
|
|
guided_partitioner
|
|
);
|
|
@endcode
|
|
|
|
@note
|
|
By default, parallel-reduction tasks use tf::DefaultPartitioner
|
|
if no partitioner is specified.
|
|
|
|
*/
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
|