namespace tf {

/** @page PartitioningAlgorithm Partitioning Algorithm

A partitioning algorithm allows applications to optimize parallel algorithms
using different scheduling methods, such as static partitioning, dynamic partitioning,
and guided partitioning.

@tableofcontents

@section DefineAPartitionerForParallelAlgorithms Define a Partitioner for Parallel Algorithms

A partitioner defines how to partition and distribute iterations to different workers 
when running parallel algorithms in %Taskflow,
such as tf::Taskflow::for_each and tf::Taskflow::transform.
The following example shows how to create parallel-iteration tasks
with different execution policies:

@code{.cpp}
std::vector<int> data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

// create different partitioners
tf::GuidedPartitioner guided_partitioner;
tf::StaticPartitioner static_partitioner;
tf::RandomPartitioner random_partitioner;
tf::DynamicPartitioner dynamic_partitioner;

// create four parallel-iteration tasks from the four execution policies
taskflow.for_each(data.begin(), data.end(), [](int i){}, guided_partitioner);
taskflow.for_each(data.begin(), data.end(), [](int i){}, static_partitioner);
taskflow.for_each(data.begin(), data.end(), [](int i){}, random_partitioner);
taskflow.for_each(data.begin(), data.end(), [](int i){}, dynamic_partitioner);
@endcode

Each partitioner has a specific algorithm to partition iterations
into a set of @em chunks and distribute chunks to workers.
A chunk is the basic unit of work that will be run by a worker
during the execution of parallel iterations.
The following figure illustrates the scheduling diagram for three major partitioners,
tf::StaticPartitioner, tf::DynamicPartitioner, and tf::GuidedPartitioner:

@image html images/parallel_for_partition_algorithms.png

Depending on applications, partitioning algorithms can impact the performance a lot. 
For example, if a parallel-iteration workload contains a regular work unit per iteration, 
tf::StaticPartitioner may deliver the best performance. 
On the other hand, if the work unit per iteration is irregular and unbalanced, 
tf::GuidedPartitioner or tf::DynamicPartitioner can outperform tf::StaticPartitioner.

@note
By default, all parallel algorithms in %Taskflow use tf::DefaultPartitioner,
which is based on guided scheduling via tf::GuidedPartitioner.

@section DefineAStaticPartitioner Define a Static Partitioner

Static partitioner splits iterations into <tt>iter_size/chunk_size</tt> chunks
and distribute chunks to workers in order.
If no chunk size is given (@c chunk_size is 0), 
%Taskflow will partition iterations into chunks that are approximately equal in size.
The following code creates a static partitioner with chunk size equal to 100:

@code{.cpp}
tf::StaticPartitioner static_partitioner(100);
@endcode

@section DefineADynamicPartitioner Define a Dynamic Partitioner

Dynamic partitioner splits iterations into <tt>iter_size/chunk_size</tt> chunks
and distribute chunks to workers without any specific order.
If no chunk size is given (@c chunk_size is 0),
%Taskflow will use 1 for the minimum size of a partition.
The following code creates a dynamic partitioner with chunk size equal to 2:

@code{.cpp}
tf::DynamicPartitioner dynamic_partitioner(2);
@endcode

@section DefineAGuidedPartitioner Define a Guided Partitioner

Guided partitioner dynamically decides the chunk size.
The size of a chunk is proportional to the number of unassigned iterations divided
by the number of the threads,
and the size will gradually decrease to the specified chunk size (default 1).
The last chunk may be smaller than the specified chunk size.
If no chunk size is given (@c chunk_size is 0),
%Taskflow will use 1 for the minimum size of a partition.
The following code creates a guided partitioner with chunk size equal to 10:

@code{.cpp}
tf::GuidedPartitioner guided_partitioner(10);
@endcode

In most situations, guided partitioner can achieve decent performance
due to adaptive parallelism, especially for those with irregular and
unbalanced workload per iteration.
As a result, guided partitioner is used as the default partitioner for
our parallel algorithms.

@section DefineAClosureWrapperForAPartitioner Define a Closure Wrapper for a Partitioner

In addition to partition size, applications can specify a <em>closure wrapper</em>
for a partitioner.
A closure wrapper allows the application to wrapper a partitioned task,
i.e., closure, with a custom function object that performs additional tasks.
For example:

@code{.cpp}
std::atomic<int> count = 0;
tf::Taskflow taskflow;
taskflow.for_each_index(0, 100, 1, 
  [](){                 
    printf("%d\n", i); 
  },
  tf::StaticPartitioner(0, [](auto&& closure){
    // do something before invoking the partitioned task
    // ...
    
    // invoke the partitioned task
    closure();

    // do something else after invoking the partitioned task
    // ...
  }
);
executor.run(taskflow).wait();
@endcode

Each partitioner uses a default closure wrapper (tf::DefaultClosureWrapper) 
that does nothing but simply invokes the given closure to perform 
the ordinary partitioned task.

@code{.cpp}
struct DefaultClosureWrapper {
  template <typename C>
  void operator()(C&& closure) const { std::forward<C>(closure)(); }
};
@endcode

*/

}