namespace tf {

/** @page PartitioningAlgorithm Partitioning Algorithm

A partitioning algorithm allows applications to optimize parallel algorithms
using different scheduling methods, such as static partitioning, dynamic partitioning,
and guided partitioning.

@tableofcontents

@section DefineAPartitionerForParallelAlgorithms Define a Partitioner for Parallel Algorithms

A partitioner defines how to partition and distribute iterations to different workers
when running parallel algorithms in %Taskflow,
such as tf::Taskflow::for_each and tf::Taskflow::transform.
The following example shows how to create parallel-iteration tasks
with different execution policies:

@code{.cpp}
tf::Taskflow taskflow;
std::vector<int> data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

// create different partitioners
tf::GuidedPartitioner guided_partitioner;
tf::StaticPartitioner static_partitioner;
tf::RandomPartitioner random_partitioner;
tf::DynamicPartitioner dynamic_partitioner;

// create four parallel-iteration tasks from the four execution policies
taskflow.for_each(data.begin(), data.end(), [](int i){}, guided_partitioner);
taskflow.for_each(data.begin(), data.end(), [](int i){}, static_partitioner);
taskflow.for_each(data.begin(), data.end(), [](int i){}, random_partitioner);
taskflow.for_each(data.begin(), data.end(), [](int i){}, dynamic_partitioner);
@endcode

Each partitioner has a specific algorithm to partition iterations
into a set of @em chunks and distribute chunks to workers.
A chunk is the basic unit of work that will be run by a worker
during the execution of parallel iterations.
The following figure illustrates the scheduling diagram for three major partitioners,
tf::StaticPartitioner, tf::DynamicPartitioner, and tf::GuidedPartitioner:

@image html images/parallel_for_partition_algorithms.png

The choice of partitioning algorithm can significantly affect performance,
depending on the application.
For example, if every iteration of a parallel-iteration workload performs
a similar amount of work,
tf::StaticPartitioner may deliver the best performance.
On the other hand, if the work unit per iteration is irregular and unbalanced,
tf::GuidedPartitioner or tf::DynamicPartitioner can outperform tf::StaticPartitioner.

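The effect of workload balance can be illustrated with a small model that does not depend on %Taskflow.
The sketch below is only an approximation under stated assumptions: the helpers @c range_cost,
@c static_makespan, and @c dynamic_makespan are hypothetical names, each iteration has a known cost,
scheduling overhead is ignored, and a dynamic chunk always goes to the worker that becomes free first.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sum of the per-iteration costs in the half-open range [begin, end).
inline long range_cost(const std::vector<long>& cost, std::size_t begin, std::size_t end) {
  long sum = 0;
  for (std::size_t i = begin; i < end; ++i) sum += cost[i];
  return sum;
}

// Hypothetical model (not a Taskflow API): finish time of the slowest worker
// when the iterations are split into W contiguous, near-equal blocks.
inline long static_makespan(const std::vector<long>& cost, std::size_t W) {
  assert(W > 0);
  const std::size_t N = cost.size();
  long worst = 0;
  std::size_t begin = 0;
  for (std::size_t w = 0; w < W; ++w) {
    // The first N % W workers take one extra iteration.
    const std::size_t len = N / W + (w < N % W ? 1 : 0);
    worst = std::max(worst, range_cost(cost, begin, begin + len));
    begin += len;
  }
  return worst;
}

// Hypothetical model (not a Taskflow API): finish time of the slowest worker
// when each chunk is claimed by whichever worker becomes free first.
inline long dynamic_makespan(const std::vector<long>& cost, std::size_t W, std::size_t chunk) {
  assert(W > 0 && chunk > 0);
  std::vector<long> busy_until(W, 0);
  for (std::size_t begin = 0; begin < cost.size(); begin += chunk) {
    const std::size_t end = std::min(begin + chunk, cost.size());
    auto next_free = std::min_element(busy_until.begin(), busy_until.end());
    *next_free += range_cost(cost, begin, end);
  }
  return *std::max_element(busy_until.begin(), busy_until.end());
}
```

With 16 iterations whose costs grow from 1 to 16 and four workers, this model gives a makespan of 58
for the static split and 40 for dynamic chunks of size 1. When every iteration costs the same,
both schedules finish at the same time, and the static split avoids the bookkeeping of claiming chunks at run time.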
@note
By default, all parallel algorithms in %Taskflow use tf::DefaultPartitioner,
which is based on guided scheduling via tf::GuidedPartitioner.

@section DefineAStaticPartitioner Define a Static Partitioner

A static partitioner splits iterations into <tt>iter_size/chunk_size</tt> chunks
and distributes the chunks to workers in order.
If no chunk size is given (@c chunk_size is 0),
%Taskflow will partition iterations into chunks that are approximately equal in size.
The following code creates a static partitioner with chunk size equal to 100:

@code{.cpp}
tf::StaticPartitioner static_partitioner(100);
@endcode

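To make the two cases above concrete, the following sketch lists which index ranges a static schedule could produce.
It is an illustrative model, not %Taskflow's implementation: @c Chunk and @c static_schedule are hypothetical names,
and the round-robin assignment of fixed-size chunks is an assumption about what "in order" means.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Half-open range [begin, end) of iteration indices and the worker that runs it.
struct Chunk {
  std::size_t begin;
  std::size_t end;
  std::size_t worker;
};

// Hypothetical helper (not a Taskflow API): models a static schedule of
// N iterations over W workers.
inline std::vector<Chunk> static_schedule(std::size_t N, std::size_t W, std::size_t chunk_size) {
  assert(W > 0);
  std::vector<Chunk> chunks;
  if (chunk_size == 0) {
    // No chunk size given: one near-equal chunk per worker; the first
    // N % W workers take one extra iteration.
    std::size_t begin = 0;
    for (std::size_t w = 0; w < W && begin < N; ++w) {
      const std::size_t len = N / W + (w < N % W ? 1 : 0);
      chunks.push_back({begin, begin + len, w});
      begin += len;
    }
    return chunks;
  }
  // Fixed chunk size (assumed policy): chunks go to workers in round-robin
  // order, and the last chunk may be shorter than chunk_size.
  for (std::size_t begin = 0, k = 0; begin < N; begin += chunk_size, ++k) {
    chunks.push_back({begin, std::min(begin + chunk_size, N), k % W});
  }
  return chunks;
}
```

For example, <tt>static_schedule(10, 4, 0)</tt> yields four chunks of sizes 3, 3, 2, and 2,
while <tt>static_schedule(10, 2, 3)</tt> yields four chunks that alternate between worker 0 and worker 1,
the last one covering only index 9.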
@section DefineADynamicPartitioner Define a Dynamic Partitioner

A dynamic partitioner splits iterations into <tt>iter_size/chunk_size</tt> chunks
and distributes the chunks to workers without any specific order.
If no chunk size is given (@c chunk_size is 0),
%Taskflow will use 1 for the minimum size of a partition.
The following code creates a dynamic partitioner with chunk size equal to 2:

@code{.cpp}
tf::DynamicPartitioner dynamic_partitioner(2);
@endcode

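One common way to distribute chunks "without any specific order" is to let every worker claim the next chunk
from a shared atomic cursor. The sketch below shows that technique with plain standard-library threads.
It is a minimal illustration under that assumption, not %Taskflow's implementation,
and @c dynamic_for_each_index is a hypothetical name.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not a Taskflow API): W threads repeatedly claim the
// next chunk of at most chunk_size iterations from a shared atomic cursor.
template <typename F>
void dynamic_for_each_index(std::size_t N, std::size_t W, std::size_t chunk_size, F body) {
  assert(W > 0);
  if (chunk_size == 0) chunk_size = 1;  // mirror the documented minimum of 1
  std::atomic<std::size_t> next{0};
  auto worker = [&]() {
    for (;;) {
      // Claim the range [begin, begin + chunk_size); no two threads get the same range.
      const std::size_t begin = next.fetch_add(chunk_size, std::memory_order_relaxed);
      if (begin >= N) return;
      const std::size_t end = std::min(begin + chunk_size, N);
      for (std::size_t i = begin; i < end; ++i) body(i);
    }
  };
  std::vector<std::thread> threads;
  for (std::size_t w = 0; w < W; ++w) threads.emplace_back(worker);
  for (auto& t : threads) t.join();
}
```

Which worker runs which chunk depends on timing, but every index is visited exactly once.
A smaller chunk size balances irregular work better at the price of more contention on the shared cursor.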
@section DefineAGuidedPartitioner Define a Guided Partitioner

A guided partitioner decides the chunk size dynamically.
The size of a chunk is proportional to the number of unassigned iterations divided
by the number of threads,
and the size gradually decreases to the specified chunk size (default 1).
The last chunk may be smaller than the specified chunk size.
If no chunk size is given (@c chunk_size is 0),
%Taskflow will use 1 for the minimum size of a partition.
The following code creates a guided partitioner with chunk size equal to 10:

@code{.cpp}
tf::GuidedPartitioner guided_partitioner(10);
@endcode

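The shrinking chunk sizes described above can be traced with a short, single-threaded model.
This is a sketch under assumptions rather than %Taskflow's implementation: @c guided_chunk_sizes is a hypothetical name,
and the proportionality constant is taken to be 1, so each chunk is simply the remaining iterations divided by the number of workers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a Taskflow API): lists the chunk sizes a guided
// schedule would hand out for N iterations and W workers.
inline std::vector<std::size_t> guided_chunk_sizes(std::size_t N, std::size_t W, std::size_t chunk_size) {
  assert(W > 0);
  if (chunk_size == 0) chunk_size = 1;  // mirror the documented minimum of 1
  std::vector<std::size_t> sizes;
  std::size_t remaining = N;
  while (remaining > 0) {
    // A proportional share of what is left, never below the minimum chunk size,
    std::size_t take = std::max(remaining / W, chunk_size);
    // except for the final chunk, which takes whatever remains.
    take = std::min(take, remaining);
    sizes.push_back(take);
    remaining -= take;
  }
  return sizes;
}
```

For 100 iterations, 4 workers, and a chunk size of 10, this model produces the sizes
25, 18, 14, 10, 10, 10, 10, and 3: large chunks first to keep overhead low,
then chunks at the minimum size to even out the tail, and a short final chunk.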
In most situations, a guided partitioner achieves decent performance
thanks to its adaptive chunk sizes, especially for workloads with an irregular and
unbalanced amount of work per iteration.
As a result, the guided partitioner is used as the default partitioner for
our parallel algorithms.

@section DefineAClosureWrapperForAPartitioner Define a Closure Wrapper for a Partitioner

In addition to partition size, applications can specify a <em>closure wrapper</em>
for a partitioner.
A closure wrapper allows the application to wrap a partitioned task,
i.e., closure, with a custom function object that performs additional tasks.
For example:

@code{.cpp}
std::atomic<int> count = 0;
tf::Executor executor;
tf::Taskflow taskflow;
taskflow.for_each_index(0, 100, 1,
  [](int i){
    printf("%d\n", i);
  },
  tf::StaticPartitioner(0, [](auto&& closure){
    // do something before invoking the partitioned task
    // ...

    // invoke the partitioned task
    closure();

    // do something else after invoking the partitioned task
    // ...
  })
);
executor.run(taskflow).wait();
@endcode

Each partitioner uses a default closure wrapper (tf::DefaultClosureWrapper)
that does nothing but simply invokes the given closure to perform
the ordinary partitioned task.

@code{.cpp}
struct DefaultClosureWrapper {
  template <typename C>
  void operator()(C&& closure) const { std::forward<C>(closure)(); }
};
@endcode

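The relationship between a partitioned task and its wrapper can be shown without %Taskflow.
The sketch below is a simplified, sequential model in which each chunk of iterations is treated as one closure.
@c PassThroughWrapper and @c chunked_for_each_index are hypothetical names,
and how %Taskflow itself groups iterations into closures may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Stand-in for the default behavior described above: just invoke the closure.
struct PassThroughWrapper {
  template <typename C>
  void operator()(C&& closure) const { std::forward<C>(closure)(); }
};

// Hypothetical helper (not a Taskflow API): runs [first, last) sequentially in
// chunks of chunk_size and routes every chunk through `wrapper`.
template <typename B, typename Wr = PassThroughWrapper>
void chunked_for_each_index(int first, int last, int chunk_size, B body, Wr wrapper = Wr{}) {
  assert(chunk_size > 0);
  for (int begin = first; begin < last; begin += chunk_size) {
    const int end = std::min(begin + chunk_size, last);
    // The closure is the partitioned task: one chunk of iterations.
    auto closure = [&]() {
      for (int i = begin; i < end; ++i) body(i);
    };
    // The wrapper decides when (and around what extra work) the closure runs.
    wrapper(closure);
  }
}
```

In this model the wrapper runs once per chunk, not once per iteration:
splitting 100 iterations into chunks of 25 invokes the wrapper four times and the body 100 times.
That makes a wrapper a natural place for per-chunk setup and teardown such as timing or acquiring a resource.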
*/

}