120 lines
11 KiB
HTML
120 lines
11 KiB
HTML
|
<!DOCTYPE html>
|
||
|
<html lang="en">
|
||
|
<head>
|
||
|
<meta charset="UTF-8" />
|
||
|
<title>CUDA Standard Algorithms » Execution Policy | Taskflow QuickStart</title>
|
||
|
<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Source+Sans+Pro:400,400i,600,600i%7CSource+Code+Pro:400,400i,600" />
|
||
|
<link rel="stylesheet" href="m-dark+documentation.compiled.css" />
|
||
|
<link rel="icon" href="favicon.ico" type="image/vnd.microsoft.icon" />
|
||
|
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||
|
<meta name="theme-color" content="#22272e" />
|
||
|
</head>
|
||
|
<body>
|
||
|
<header><nav id="navigation">
|
||
|
<div class="m-container">
|
||
|
<div class="m-row">
|
||
|
<span id="m-navbar-brand" class="m-col-t-8 m-col-m-none m-left-m">
|
||
|
<a href="https://taskflow.github.io"><img src="taskflow_logo.png" alt="" />Taskflow</a> <span class="m-breadcrumb">|</span> <a href="index.html" class="m-thin">QuickStart</a>
|
||
|
</span>
|
||
|
<div class="m-col-t-4 m-hide-m m-text-right m-nopadr">
|
||
|
<a href="#search" class="m-doc-search-icon" title="Search" onclick="return showSearch()"><svg style="height: 0.9rem;" viewBox="0 0 16 16">
|
||
|
<path id="m-doc-search-icon-path" d="m6 0c-3.31 0-6 2.69-6 6 0 3.31 2.69 6 6 6 1.49 0 2.85-0.541 3.89-1.44-0.0164 0.338 0.147 0.759 0.5 1.15l3.22 3.79c0.552 0.614 1.45 0.665 2 0.115 0.55-0.55 0.499-1.45-0.115-2l-3.79-3.22c-0.392-0.353-0.812-0.515-1.15-0.5 0.895-1.05 1.44-2.41 1.44-3.89 0-3.31-2.69-6-6-6zm0 1.56a4.44 4.44 0 0 1 4.44 4.44 4.44 4.44 0 0 1-4.44 4.44 4.44 4.44 0 0 1-4.44-4.44 4.44 4.44 0 0 1 4.44-4.44z"/>
|
||
|
</svg></a>
|
||
|
<a id="m-navbar-show" href="#navigation" title="Show navigation"></a>
|
||
|
<a id="m-navbar-hide" href="#" title="Hide navigation"></a>
|
||
|
</div>
|
||
|
<div id="m-navbar-collapse" class="m-col-t-12 m-show-m m-col-m-none m-right-m">
|
||
|
<div class="m-row">
|
||
|
<ol class="m-col-t-6 m-col-m-none">
|
||
|
<li><a href="pages.html">Handbook</a></li>
|
||
|
<li><a href="namespaces.html">Namespaces</a></li>
|
||
|
</ol>
|
||
|
<ol class="m-col-t-6 m-col-m-none" start="3">
|
||
|
<li><a href="annotated.html">Classes</a></li>
|
||
|
<li><a href="files.html">Files</a></li>
|
||
|
<li class="m-show-m"><a href="#search" class="m-doc-search-icon" title="Search" onclick="return showSearch()"><svg style="height: 0.9rem;" viewBox="0 0 16 16">
|
||
|
<use href="#m-doc-search-icon-path" />
|
||
|
</svg></a></li>
|
||
|
</ol>
|
||
|
</div>
|
||
|
</div>
|
||
|
</div>
|
||
|
</div>
|
||
|
</nav></header>
|
||
|
<main><article>
|
||
|
<div class="m-container m-container-inflatable">
|
||
|
<div class="m-row">
|
||
|
<div class="m-col-l-10 m-push-l-1">
|
||
|
<h1>
|
||
|
<span class="m-breadcrumb"><a href="cudaStandardAlgorithms.html">CUDA Standard Algorithms</a> »</span>
|
||
|
Execution Policy
|
||
|
</h1>
|
||
|
<nav class="m-block m-default">
|
||
|
<h3>Contents</h3>
|
||
|
<ul>
|
||
|
<li><a href="#CUDASTDExecutionPolicyIncludeTheHeader">Include the Header</a></li>
|
||
|
<li><a href="#CUDASTDParameterizePerformance">Parameterize Performance</a></li>
|
||
|
<li><a href="#CUDASTDDefineAnExecutionPolicy">Define an Execution Policy</a></li>
|
||
|
<li><a href="#CUDASTDAllocateMemoryBufferForAlgorithms">Allocate Memory Buffer for Algorithms</a></li>
|
||
|
</ul>
|
||
|
</nav>
|
||
|
<p>Taskflow provides standalone template methods for expressing common parallel algorithms on a GPU. Each of these methods is governed by an <em>execution policy object</em> to configure the kernel execution parameters.</p><section id="CUDASTDExecutionPolicyIncludeTheHeader"><h2><a href="#CUDASTDExecutionPolicyIncludeTheHeader">Include the Header</a></h2><p>You need to include the header file, <code>taskflow/cuda/cudaflow.hpp</code>, for creating a CUDA execution policy object.</p><pre class="m-code"><span class="cp">#include</span><span class="w"> </span><span class="cpf"><taskflow/cuda/cudaflow.hpp></span></pre></section><section id="CUDASTDParameterizePerformance"><h2><a href="#CUDASTDParameterizePerformance">Parameterize Performance</a></h2><p>Taskflow parameterizes most CUDA algorithms in terms of <em>the number of threads per block</em> and <em>units of work per thread</em>, which can be specified in the execution policy template type, <a href="classtf_1_1cudaExecutionPolicy.html" class="m-doc">tf::<wbr />cudaExecutionPolicy</a>. The design is inspired by <a href="https://moderngpu.github.io/">Modern GPU Programming</a> authored by Sean Baxter to achieve high-performance GPU computing.</p></section><section id="CUDASTDDefineAnExecutionPolicy"><h2><a href="#CUDASTDDefineAnExecutionPolicy">Define an Execution Policy</a></h2><p>The following example defines an execution policy object, <code>policy</code>, which configures (1) each block to invoke 512 threads and (2) each of these <code>512</code> threads to perform <code>11</code> units of work. Block size must be a power of two. It is always a good idea to specify an odd number in the second parameter to avoid bank conflicts.</p><pre class="m-code"><span class="n">tf</span><span class="o">::</span><span class="n">cudaExecutionPolicy</span><span class="o"><</span><span class="mi">512</span><span class="p">,</span><span class="w"> </span><span class="mi">11</span><span class="o">></span><span class="w"> </span><span class="n">policy</span><span class="p">;</span></pre><p>By default, the execution policy object is associated with the CUDA <em>default stream</em> (i.e., 0). Default stream can incur significant overhead due to the global synchronization. You can associate an execution policy with another stream as shown below:</p><pre class="m-code"><span class="c1">// create a RAII-styled stream object</span>
|
||
|
<span class="n">tf</span><span class="o">::</span><span class="n">cudaStream</span><span class="w"> </span><span class="n">stream1</span><span class="p">,</span><span class="w"> </span><span class="n">stream2</span><span class="p">;</span>
|
||
|
|
||
|
<span class="c1">// assign a stream to a policy at construction time</span>
|
||
|
<span class="n">tf</span><span class="o">::</span><span class="n">cudaExecutionPolicy</span><span class="o"><</span><span class="mi">512</span><span class="p">,</span><span class="w"> </span><span class="mi">11</span><span class="o">></span><span class="w"> </span><span class="n">policy</span><span class="p">(</span><span class="n">stream1</span><span class="p">);</span>
|
||
|
|
||
|
<span class="c1">// assign another stream to the policy</span>
|
||
|
<span class="n">policy</span><span class="p">.</span><span class="n">stream</span><span class="p">(</span><span class="n">stream2</span><span class="p">);</span></pre><p>All the CUDA standard algorithms in Taskflow are asynchronous with respect to the stream assigned to the execution policy. This enables high execution efficiency for large GPU workloads that call for many different algorithms. You can synchronize the stream the block until all tasks in the stream finish:</p><pre class="m-code"><span class="n">cudaStreamSynchronize</span><span class="p">(</span><span class="n">policy</span><span class="p">.</span><span class="n">stream</span><span class="p">());</span><span class="w"> </span></pre><p>The best-performing configurations for each algorithm, each GPU architecture, and each data type can vary significantly. You should experiment different configurations and find the optimal tuning parameters for your applications. A default policy is given in <a href="namespacetf.html#a0e267ab3e1baeb1962f3b3a374de9553" class="m-doc">tf::<wbr />cudaDefaultExecutionPolicy</a>.</p><pre class="m-code"><span class="n">tf</span><span class="o">::</span><span class="n">cudaDefaultExecutionPolicy</span><span class="w"> </span><span class="n">default_policy</span><span class="p">;</span></pre></section><section id="CUDASTDAllocateMemoryBufferForAlgorithms"><h2><a href="#CUDASTDAllocateMemoryBufferForAlgorithms">Allocate Memory Buffer for Algorithms</a></h2><p>A key difference between our CUDA standard algorithms and others (e.g., Thrust) is the <em>memory management</em>. Unlike CPU-parallel algorithms, many GPU-parallel algorithms require extra buffer to store the temporary results during the multi-phase computation, for instance, <a href="namespacetf.html#a8a872d2a0ac73a676713cb5be5aa688c" class="m-doc">tf::<wbr />cuda_reduce</a> and <a href="namespacetf.html#a06804cb1598e965febc7bd35fc0fbbb0" class="m-doc">tf::<wbr />cuda_sort</a>. We <em>DO NOT</em> allocate any memory during these algorithms call but ask you to provide the memory buffer required for each of such algorithms. This decision seems to complicate the code a little bit, but it gives applications freedom to optimize the memory; also, it makes all algorithm calls capturable to a CUDA graph to improve the execution efficiency.</p></section>
|
||
|
</div>
|
||
|
</div>
|
||
|
</div>
|
||
|
</article></main>
|
||
|
<div class="m-doc-search" id="search">
|
||
|
<a href="#!" onclick="return hideSearch()"></a>
|
||
|
<div class="m-container">
|
||
|
<div class="m-row">
|
||
|
<div class="m-col-m-8 m-push-m-2">
|
||
|
<div class="m-doc-search-header m-text m-small">
|
||
|
<div><span class="m-label m-default">Tab</span> / <span class="m-label m-default">T</span> to search, <span class="m-label m-default">Esc</span> to close</div>
|
||
|
<div id="search-symbolcount">…</div>
|
||
|
</div>
|
||
|
<div class="m-doc-search-content">
|
||
|
<form>
|
||
|
<input type="search" name="q" id="search-input" placeholder="Loading …" disabled="disabled" autofocus="autofocus" autocomplete="off" spellcheck="false" />
|
||
|
</form>
|
||
|
<noscript class="m-text m-danger m-text-center">Unlike everything else in the docs, the search functionality <em>requires</em> JavaScript.</noscript>
|
||
|
<div id="search-help" class="m-text m-dim m-text-center">
|
||
|
<p class="m-noindent">Search for symbols, directories, files, pages or
|
||
|
modules. You can omit any prefix from the symbol or file path; adding a
|
||
|
<code>:</code> or <code>/</code> suffix lists all members of given symbol or
|
||
|
directory.</p>
|
||
|
<p class="m-noindent">Use <span class="m-label m-dim">↓</span>
|
||
|
/ <span class="m-label m-dim">↑</span> to navigate through the list,
|
||
|
<span class="m-label m-dim">Enter</span> to go.
|
||
|
<span class="m-label m-dim">Tab</span> autocompletes common prefix, you can
|
||
|
copy a link to the result using <span class="m-label m-dim">⌘</span>
|
||
|
<span class="m-label m-dim">L</span> while <span class="m-label m-dim">⌘</span>
|
||
|
<span class="m-label m-dim">M</span> produces a Markdown link.</p>
|
||
|
</div>
|
||
|
<div id="search-notfound" class="m-text m-warning m-text-center">Sorry, nothing was found.</div>
|
||
|
<ul id="search-results"></ul>
|
||
|
</div>
|
||
|
</div>
|
||
|
</div>
|
||
|
</div>
|
||
|
</div>
|
||
|
<script src="search-v2.js"></script>
|
||
|
<script src="searchdata-v2.js" async="async"></script>
|
||
|
<footer><nav>
|
||
|
<div class="m-container">
|
||
|
<div class="m-row">
|
||
|
<div class="m-col-l-10 m-push-l-1">
|
||
|
<p>Taskflow handbook is part of the <a href="https://taskflow.github.io">Taskflow project</a>, copyright © <a href="https://tsung-wei-huang.github.io/">Dr. Tsung-Wei Huang</a>, 2018–2024.<br />Generated by <a href="https://doxygen.org/">Doxygen</a> 1.9.1 and <a href="https://mcss.mosra.cz/">m.css</a>.</p>
|
||
|
</div>
|
||
|
</div>
|
||
|
</div>
|
||
|
</nav></footer>
|
||
|
</body>
|
||
|
</html>
|