108 lines
14 KiB
XML
108 lines
14 KiB
XML
<?xml version='1.0' encoding='UTF-8' standalone='no'?>
|
|
<doxygen xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="compound.xsd" version="1.9.1" xml:lang="en-US">
|
|
<compounddef id="PartitioningAlgorithm" kind="page">
|
|
<compoundname>PartitioningAlgorithm</compoundname>
|
|
<title>Partitioning Algorithm</title>
|
|
<tableofcontents>
|
|
<tocsect>
|
|
<name>Define a Partitioner for Parallel Algorithms</name>
|
|
<reference>PartitioningAlgorithm_1DefineAPartitionerForParallelAlgorithms</reference>
|
|
</tocsect>
|
|
<tocsect>
|
|
<name>Define a Static Partitioner</name>
|
|
<reference>PartitioningAlgorithm_1DefineAStaticPartitioner</reference>
|
|
</tocsect>
|
|
<tocsect>
|
|
<name>Define a Dynamic Partitioner</name>
|
|
<reference>PartitioningAlgorithm_1DefineADynamicPartitioner</reference>
|
|
</tocsect>
|
|
<tocsect>
|
|
<name>Define a Guided Partitioner</name>
|
|
<reference>PartitioningAlgorithm_1DefineAGuidedPartitioner</reference>
|
|
</tocsect>
|
|
<tocsect>
|
|
<name>Define a Closure Wrapper for a Partitioner</name>
|
|
<reference>PartitioningAlgorithm_1DefineAClosureWrapperForAPartitioner</reference>
|
|
</tocsect>
|
|
</tableofcontents>
|
|
<briefdescription>
|
|
</briefdescription>
|
|
<detaileddescription>
|
|
<para>A partitioning algorithm allows applications to optimize parallel algorithms using different scheduling methods, such as static partitioning, dynamic partitioning, and guided partitioning.</para>
|
|
<sect1 id="PartitioningAlgorithm_1DefineAPartitionerForParallelAlgorithms">
|
|
<title>Define a Partitioner for Parallel Algorithms</title>
|
|
<para>A partitioner defines how to partition and distribute iterations to different workers when running parallel algorithms in Taskflow, such as <ref refid="classtf_1_1FlowBuilder_1aae3edfa278baa75b08414e083c14c836" kindref="member">tf::Taskflow::for_each</ref> and <ref refid="classtf_1_1FlowBuilder_1a97be7ceef6fa4276e3b074c10c13b826" kindref="member">tf::Taskflow::transform</ref>. The following example shows how to create parallel-iteration tasks with different execution policies:</para>
|
|
<para><programlisting filename=".cpp"><codeline><highlight class="normal"><ref refid="cpp/container/vector" kindref="compound" external="/home/thuang295/Code/taskflow/doxygen/cppreference-doxygen-web.tag.xml">std::vector<int></ref><sp/>data<sp/>=<sp/>{1,<sp/>2,<sp/>3,<sp/>4,<sp/>5,<sp/>6,<sp/>7,<sp/>8,<sp/>9,<sp/>10}</highlight></codeline>
|
|
<codeline><highlight class="normal"></highlight></codeline>
|
|
<codeline><highlight class="normal"></highlight><highlight class="comment">//<sp/>create<sp/>different<sp/>partitioners</highlight><highlight class="normal"></highlight></codeline>
|
|
<codeline><highlight class="normal"><ref refid="classtf_1_1GuidedPartitioner" kindref="compound">tf::GuidedPartitioner</ref><sp/>guided_partitioner;</highlight></codeline>
|
|
<codeline><highlight class="normal"><ref refid="classtf_1_1StaticPartitioner" kindref="compound">tf::StaticPartitioner</ref><sp/>static_partitioner;</highlight></codeline>
|
|
<codeline><highlight class="normal"><ref refid="classtf_1_1RandomPartitioner" kindref="compound">tf::RandomPartitioner</ref><sp/>random_partitioner;</highlight></codeline>
|
|
<codeline><highlight class="normal"><ref refid="classtf_1_1DynamicPartitioner" kindref="compound">tf::DynamicPartitioner</ref><sp/>dynamic_partitioner;</highlight></codeline>
|
|
<codeline><highlight class="normal"></highlight></codeline>
|
|
<codeline><highlight class="normal"></highlight><highlight class="comment">//<sp/>create<sp/>four<sp/>parallel-iteration<sp/>tasks<sp/>from<sp/>the<sp/>four<sp/>execution<sp/>policies</highlight><highlight class="normal"></highlight></codeline>
|
|
<codeline><highlight class="normal">taskflow.for_each(data.begin(),<sp/>data.end(),<sp/>[](</highlight><highlight class="keywordtype">int</highlight><highlight class="normal"><sp/>i){},<sp/>guided_partitioner);</highlight></codeline>
|
|
<codeline><highlight class="normal">taskflow.for_each(data.begin(),<sp/>data.end(),<sp/>[](</highlight><highlight class="keywordtype">int</highlight><highlight class="normal"><sp/>i){},<sp/>static_partitioner);</highlight></codeline>
|
|
<codeline><highlight class="normal">taskflow.for_each(data.begin(),<sp/>data.end(),<sp/>[](</highlight><highlight class="keywordtype">int</highlight><highlight class="normal"><sp/>i){},<sp/>random_partitioner);</highlight></codeline>
|
|
<codeline><highlight class="normal">taskflow.for_each(data.begin(),<sp/>data.end(),<sp/>[](</highlight><highlight class="keywordtype">int</highlight><highlight class="normal"><sp/>i){},<sp/>dynamic_partitioner);</highlight></codeline>
|
|
</programlisting></para>
|
|
<para>Each partitioner has a specific algorithm to partition iterations into a set of <emphasis>chunks</emphasis> and distribute chunks to workers. A chunk is the basic unit of work that will be run by a worker during the execution of parallel iterations. The following figure illustrates the scheduling diagram for three major partitioners, <ref refid="classtf_1_1StaticPartitioner" kindref="compound">tf::StaticPartitioner</ref>, <ref refid="classtf_1_1DynamicPartitioner" kindref="compound">tf::DynamicPartitioner</ref>, and <ref refid="classtf_1_1GuidedPartitioner" kindref="compound">tf::GuidedPartitioner</ref>:</para>
|
|
<para><image type="html" name="parallel_for_partition_algorithms.png"></image>
|
|
</para>
|
|
<para>Depending on applications, partitioning algorithms can impact the performance a lot. For example, if a parallel-iteration workload contains a regular work unit per iteration, <ref refid="classtf_1_1StaticPartitioner" kindref="compound">tf::StaticPartitioner</ref> may deliver the best performance. On the other hand, if the work unit per iteration is irregular and unbalanced, <ref refid="classtf_1_1GuidedPartitioner" kindref="compound">tf::GuidedPartitioner</ref> or <ref refid="classtf_1_1DynamicPartitioner" kindref="compound">tf::DynamicPartitioner</ref> can outperform <ref refid="classtf_1_1StaticPartitioner" kindref="compound">tf::StaticPartitioner</ref>.</para>
|
|
<para><simplesect kind="note"><para>By default, all parallel algorithms in Taskflow use <ref refid="namespacetf_1a66b72776c788898aee9e132b0ea9b405" kindref="member">tf::DefaultPartitioner</ref>, which is based on guided scheduling via <ref refid="classtf_1_1GuidedPartitioner" kindref="compound">tf::GuidedPartitioner</ref>.</para>
|
|
</simplesect>
|
|
</para>
|
|
</sect1>
|
|
<sect1 id="PartitioningAlgorithm_1DefineAStaticPartitioner">
|
|
<title>Define a Static Partitioner</title>
|
|
<para>Static partitioner splits iterations into <computeroutput>iter_size/chunk_size</computeroutput> chunks and distribute chunks to workers in order. If no chunk size is given (<computeroutput>chunk_size</computeroutput> is 0), Taskflow will partition iterations into chunks that are approximately equal in size. The following code creates a static partitioner with chunk size equal to 100:</para>
|
|
<para><programlisting filename=".cpp"><codeline><highlight class="normal"><ref refid="classtf_1_1StaticPartitioner" kindref="compound">tf::StaticPartitioner</ref><sp/>static_partitioner(100);</highlight></codeline>
|
|
</programlisting></para>
|
|
</sect1>
|
|
<sect1 id="PartitioningAlgorithm_1DefineADynamicPartitioner">
|
|
<title>Define a Dynamic Partitioner</title>
|
|
<para>Dynamic partitioner splits iterations into <computeroutput>iter_size/chunk_size</computeroutput> chunks and distribute chunks to workers without any specific order. If no chunk size is given (<computeroutput>chunk_size</computeroutput> is 0), Taskflow will use 1 for the minimum size of a partition. The following code creates a dynamic partitioner with chunk size equal to 2:</para>
|
|
<para><programlisting filename=".cpp"><codeline><highlight class="normal"><ref refid="classtf_1_1DynamicPartitioner" kindref="compound">tf::DynamicPartitioner</ref><sp/>dynamic_partitioner(2);</highlight></codeline>
|
|
</programlisting></para>
|
|
</sect1>
|
|
<sect1 id="PartitioningAlgorithm_1DefineAGuidedPartitioner">
|
|
<title>Define a Guided Partitioner</title>
|
|
<para>Guided partitioner dynamically decides the chunk size. The size of a chunk is proportional to the number of unassigned iterations divided by the number of the threads, and the size will gradually decrease to the specified chunk size (default 1). The last chunk may be smaller than the specified chunk size. If no chunk size is given (<computeroutput>chunk_size</computeroutput> is 0), Taskflow will use 1 for the minimum size of a partition. The following code creates a guided partitioner with chunk size equal to 10:</para>
|
|
<para><programlisting filename=".cpp"><codeline><highlight class="normal"><ref refid="classtf_1_1GuidedPartitioner" kindref="compound">tf::GuidedPartitioner</ref><sp/>guided_partitioner(10);</highlight></codeline>
|
|
</programlisting></para>
|
|
<para>In most situations, guided partitioner can achieve decent performance due to adaptive parallelism, especially for those with irregular and unbalanced workload per iteration. As a result, guided partitioner is used as the default partitioner for our parallel algorithms.</para>
|
|
</sect1>
|
|
<sect1 id="PartitioningAlgorithm_1DefineAClosureWrapperForAPartitioner">
|
|
<title>Define a Closure Wrapper for a Partitioner</title>
|
|
<para>In addition to partition size, applications can specify a <emphasis>closure wrapper</emphasis> for a partitioner. A closure wrapper allows the application to wrapper a partitioned task, i.e., closure, with a custom function object that performs additional tasks. For example:</para>
|
|
<para><programlisting filename=".cpp"><codeline><highlight class="normal"><ref refid="cpp/atomic/atomic" kindref="compound" external="/home/thuang295/Code/taskflow/doxygen/cppreference-doxygen-web.tag.xml">std::atomic<int></ref><sp/><ref refid="cpp/algorithm/count" kindref="compound" external="/home/thuang295/Code/taskflow/doxygen/cppreference-doxygen-web.tag.xml">count</ref><sp/>=<sp/>0;</highlight></codeline>
|
|
<codeline><highlight class="normal"><ref refid="classtf_1_1Taskflow" kindref="compound">tf::Taskflow</ref><sp/>taskflow;</highlight></codeline>
|
|
<codeline><highlight class="normal">taskflow.<ref refid="classtf_1_1FlowBuilder_1a3b132bd902331a11b04b4ad66cf8bf77" kindref="member">for_each_index</ref>(0,<sp/>100,<sp/>1,<sp/></highlight></codeline>
|
|
<codeline><highlight class="normal"><sp/><sp/>[](){<sp/><sp/><sp/><sp/><sp/><sp/><sp/><sp/><sp/><sp/><sp/><sp/><sp/><sp/><sp/><sp/><sp/></highlight></codeline>
|
|
<codeline><highlight class="normal"><sp/><sp/><sp/><sp/><ref refid="cpp/io/c/fprintf" kindref="compound" external="/home/thuang295/Code/taskflow/doxygen/cppreference-doxygen-web.tag.xml">printf</ref>(</highlight><highlight class="stringliteral">"%d\n"</highlight><highlight class="normal">,<sp/>i);<sp/></highlight></codeline>
|
|
<codeline><highlight class="normal"><sp/><sp/>},</highlight></codeline>
|
|
<codeline><highlight class="normal"><sp/><sp/><ref refid="classtf_1_1StaticPartitioner" kindref="compound">tf::StaticPartitioner</ref>(0,<sp/>[](</highlight><highlight class="keyword">auto</highlight><highlight class="normal">&&<sp/>closure){</highlight></codeline>
|
|
<codeline><highlight class="normal"><sp/><sp/><sp/><sp/></highlight><highlight class="comment">//<sp/>do<sp/>something<sp/>before<sp/>invoking<sp/>the<sp/>partitioned<sp/>task</highlight><highlight class="normal"></highlight></codeline>
|
|
<codeline><highlight class="normal"><sp/><sp/><sp/><sp/></highlight><highlight class="comment">//<sp/>...</highlight><highlight class="normal"></highlight></codeline>
|
|
<codeline><highlight class="normal"><sp/><sp/><sp/><sp/></highlight></codeline>
|
|
<codeline><highlight class="normal"><sp/><sp/><sp/><sp/></highlight><highlight class="comment">//<sp/>invoke<sp/>the<sp/>partitioned<sp/>task</highlight><highlight class="normal"></highlight></codeline>
|
|
<codeline><highlight class="normal"><sp/><sp/><sp/><sp/>closure();</highlight></codeline>
|
|
<codeline><highlight class="normal"></highlight></codeline>
|
|
<codeline><highlight class="normal"><sp/><sp/><sp/><sp/></highlight><highlight class="comment">//<sp/>do<sp/>something<sp/>else<sp/>after<sp/>invoking<sp/>the<sp/>partitioned<sp/>task</highlight><highlight class="normal"></highlight></codeline>
|
|
<codeline><highlight class="normal"><sp/><sp/><sp/><sp/></highlight><highlight class="comment">//<sp/>...</highlight><highlight class="normal"></highlight></codeline>
|
|
<codeline><highlight class="normal"><sp/><sp/>}</highlight></codeline>
|
|
<codeline><highlight class="normal">);</highlight></codeline>
|
|
<codeline><highlight class="normal">executor.run(taskflow).wait();</highlight></codeline>
|
|
</programlisting></para>
|
|
<para>Each partitioner uses a default closure wrapper (<ref refid="structtf_1_1DefaultClosureWrapper" kindref="compound">tf::DefaultClosureWrapper</ref>) that does nothing but simply invokes the given closure to perform the ordinary partitioned task.</para>
|
|
<para><programlisting filename=".cpp"><codeline><highlight class="keyword">struct<sp/></highlight><highlight class="normal">DefaultClosureWrapper<sp/>{</highlight></codeline>
|
|
<codeline><highlight class="normal"><sp/><sp/></highlight><highlight class="keyword">template</highlight><highlight class="normal"><sp/><</highlight><highlight class="keyword">typename</highlight><highlight class="normal"><sp/>C></highlight></codeline>
|
|
<codeline><highlight class="normal"><sp/><sp/></highlight><highlight class="keywordtype">void</highlight><highlight class="normal"><sp/>operator()(C&&<sp/>closure)</highlight><highlight class="keyword"><sp/>const<sp/></highlight><highlight class="normal">{<sp/>std::forward<C>(closure)();<sp/>}</highlight></codeline>
|
|
<codeline><highlight class="normal">};</highlight></codeline>
|
|
</programlisting> </para>
|
|
</sect1>
|
|
</detaileddescription>
|
|
<location file="doxygen/algorithms/partitioner.dox"/>
|
|
</compounddef>
|
|
</doxygen>
|