<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<title>tf::cudaFlowCapturer class | Taskflow QuickStart</title>
<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Source+Sans+Pro:400,400i,600,600i%7CSource+Code+Pro:400,400i,600" />
<link rel="stylesheet" href="m-dark+documentation.compiled.css" />
<link rel="icon" href="favicon.ico" type="image/vnd.microsoft.icon" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta name="theme-color" content="#22272e" />
</head>
<body>
<header><nav id="navigation">
<div class="m-container">
<div class="m-row">
<span id="m-navbar-brand" class="m-col-t-8 m-col-m-none m-left-m">
<a href="https://taskflow.github.io"><img src="taskflow_logo.png" alt="" />Taskflow</a> <span class="m-breadcrumb">|</span> <a href="index.html" class="m-thin">QuickStart</a>
</span>
<div class="m-col-t-4 m-hide-m m-text-right m-nopadr">
<a href="#search" class="m-doc-search-icon" title="Search" onclick="return showSearch()"><svg style="height: 0.9rem;" viewBox="0 0 16 16">
<path id="m-doc-search-icon-path" d="m6 0c-3.31 0-6 2.69-6 6 0 3.31 2.69 6 6 6 1.49 0 2.85-0.541 3.89-1.44-0.0164 0.338 0.147 0.759 0.5 1.15l3.22 3.79c0.552 0.614 1.45 0.665 2 0.115 0.55-0.55 0.499-1.45-0.115-2l-3.79-3.22c-0.392-0.353-0.812-0.515-1.15-0.5 0.895-1.05 1.44-2.41 1.44-3.89 0-3.31-2.69-6-6-6zm0 1.56a4.44 4.44 0 0 1 4.44 4.44 4.44 4.44 0 0 1-4.44 4.44 4.44 4.44 0 0 1-4.44-4.44 4.44 4.44 0 0 1 4.44-4.44z"/>
</svg></a>
<a id="m-navbar-show" href="#navigation" title="Show navigation"></a>
<a id="m-navbar-hide" href="#" title="Hide navigation"></a>
</div>
<div id="m-navbar-collapse" class="m-col-t-12 m-show-m m-col-m-none m-right-m">
<div class="m-row">
<ol class="m-col-t-6 m-col-m-none">
<li><a href="pages.html">Handbook</a></li>
<li><a href="namespaces.html">Namespaces</a></li>
</ol>
<ol class="m-col-t-6 m-col-m-none" start="3">
<li><a href="annotated.html">Classes</a></li>
<li><a href="files.html">Files</a></li>
<li class="m-show-m"><a href="#search" class="m-doc-search-icon" title="Search" onclick="return showSearch()"><svg style="height: 0.9rem;" viewBox="0 0 16 16">
<use href="#m-doc-search-icon-path" />
</svg></a></li>
</ol>
</div>
</div>
</div>
</div>
</nav></header>
<main><article>
<div class="m-container m-container-inflatable">
<div class="m-row">
<div class="m-col-l-10 m-push-l-1">
<h1>
<span class="m-breadcrumb"><a href="namespacetf.html">tf</a>::<wbr/></span>cudaFlowCapturer <span class="m-thin">class</span>
<div class="m-doc-include m-code m-inverted m-text-right"><span class="cp">#include</span> <a class="cpf" href="cuda__capturer_8hpp.html">&lt;taskflow/cuda/cuda_capturer.hpp&gt;</a></div>
</h1>
<p>class to create a cudaFlow graph using stream capture</p>
<nav class="m-block m-default">
<h3>Contents</h3>
<ul>
<li>
Reference
<ul>
<li><a href="#typeless-methods">Constructors, destructors, conversion operators</a></li>
<li><a href="#pub-methods">Public functions</a></li>
</ul>
</li>
</ul>
</nav>
<p>The usage of <a href="classtf_1_1cudaFlowCapturer.html" class="m-doc">tf::<wbr />cudaFlowCapturer</a> is similar to <a href="classtf_1_1cudaFlow.html" class="m-doc">tf::<wbr />cudaFlow</a>, except users can call the method <a href="#ad0d937ae0d77239f148b66a77e35db41" class="m-doc">tf::<wbr />cudaFlowCapturer::<wbr />on</a> to capture a sequence of asynchronous CUDA operations through the given stream. The following example creates a CUDA graph that captures two kernel tasks, <code>task_1</code> and <code>task_2</code>, where <code>task_1</code> runs before <code>task_2</code>.</p><pre class="m-code"><span class="n">taskflow</span><span class="p">.</span><span class="n">emplace</span><span class="p">([](</span><span class="n">tf</span><span class="o">::</span><span class="n">cudaFlowCapturer</span><span class="o">&amp;</span><span class="w"> </span><span class="n">capturer</span><span class="p">){</span>
<span class="w"> </span><span class="c1">// capture my_kernel_1 through the given stream managed by the capturer</span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">task_1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">capturer</span><span class="p">.</span><span class="n">on</span><span class="p">([</span><span class="o">&amp;</span><span class="p">](</span><span class="n">cudaStream_t</span><span class="w"> </span><span class="n">stream</span><span class="p">){</span>
<span class="w"> </span><span class="n">my_kernel_1</span><span class="o">&lt;&lt;&lt;</span><span class="n">grid_1</span><span class="p">,</span><span class="w"> </span><span class="n">block_1</span><span class="p">,</span><span class="w"> </span><span class="n">shm_size_1</span><span class="p">,</span><span class="w"> </span><span class="n">stream</span><span class="o">&gt;&gt;&gt;</span><span class="p">(</span><span class="n">my_parameters_1</span><span class="p">);</span>
<span class="w"> </span><span class="p">});</span>
<span class="w"> </span><span class="c1">// capture my_kernel_2 through the given stream managed by the capturer</span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">task_2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">capturer</span><span class="p">.</span><span class="n">on</span><span class="p">([</span><span class="o">&amp;</span><span class="p">](</span><span class="n">cudaStream_t</span><span class="w"> </span><span class="n">stream</span><span class="p">){</span>
<span class="w"> </span><span class="n">my_kernel_2</span><span class="o">&lt;&lt;&lt;</span><span class="n">grid_2</span><span class="p">,</span><span class="w"> </span><span class="n">block_2</span><span class="p">,</span><span class="w"> </span><span class="n">shm_size_2</span><span class="p">,</span><span class="w"> </span><span class="n">stream</span><span class="o">&gt;&gt;&gt;</span><span class="p">(</span><span class="n">my_parameters_2</span><span class="p">);</span>
<span class="w"> </span><span class="p">});</span>
<span class="w"> </span><span class="n">task_1</span><span class="p">.</span><span class="n">precede</span><span class="p">(</span><span class="n">task_2</span><span class="p">);</span>
<span class="p">});</span></pre><p>Similar to <a href="classtf_1_1cudaFlow.html" class="m-doc">tf::<wbr />cudaFlow</a>, a cudaFlowCapturer is a task (<a href="classtf_1_1Task.html" class="m-doc">tf::<wbr />Task</a>) created from <a href="classtf_1_1Taskflow.html" class="m-doc">tf::<wbr />Taskflow</a> and will be run by <em>one</em> worker thread in the executor. That is, the callable that describes a cudaFlowCapturer will be executed sequentially. Inside a cudaFlow capturer task, different GPU tasks (<a href="classtf_1_1cudaTask.html" class="m-doc">tf::<wbr />cudaTask</a>) may run in parallel depending on the selected optimization algorithm. By default, we use <a href="classtf_1_1cudaFlowRoundRobinOptimizer.html" class="m-doc">tf::<wbr />cudaFlowRoundRobinOptimizer</a> to transform a user-level graph into a native CUDA graph.</p><p>Please refer to <a href="GPUTaskingcudaFlowCapturer.html" class="m-doc">GPU Tasking (cudaFlowCapturer)</a> for details.</p>
<section id="typeless-methods">
<h2><a href="#typeless-methods">Constructors, destructors, conversion operators</a></h2>
<dl class="m-doc">
<dt>
<span class="m-doc-wrap-bumper"><a href="#a0ddccd6faa338047921269bfe964b774" class="m-doc">cudaFlowCapturer</a>(</span><span class="m-doc-wrap">) <span class="m-label m-flat m-info">defaulted</span></span>
</dt>
<dd>constructs a standalone <a href="classtf_1_1cudaFlowCapturer.html" class="m-doc">cudaFlowCapturer</a></dd>
<dt id="a8492d77263ab2a15cce21d4bfae5b331">
<span class="m-doc-wrap-bumper"><a href="#a8492d77263ab2a15cce21d4bfae5b331" class="m-doc-self">~cudaFlowCapturer</a>(</span><span class="m-doc-wrap">) <span class="m-label m-flat m-info">defaulted</span></span>
</dt>
<dd>destructs the <a href="classtf_1_1cudaFlowCapturer.html" class="m-doc">cudaFlowCapturer</a></dd>
<dt id="abeca6931972344a97c862c1f8d3ab9bb">
<span class="m-doc-wrap-bumper"><a href="#abeca6931972344a97c862c1f8d3ab9bb" class="m-doc-self">cudaFlowCapturer</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaFlowCapturer.html" class="m-doc">cudaFlowCapturer</a>&amp;&amp;) <span class="m-label m-flat m-info">defaulted</span></span>
</dt>
<dd>default move constructor</dd>
</dl>
</section>
<section id="pub-methods">
<h2><a href="#pub-methods">Public functions</a></h2>
<dl class="m-doc">
<dt id="a8e9d99a9bd07761156ab8445a07dbdec">
<span class="m-doc-wrap-bumper">auto <a href="#a8e9d99a9bd07761156ab8445a07dbdec" class="m-doc-self">operator=</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaFlowCapturer.html" class="m-doc">cudaFlowCapturer</a>&amp;&amp;) -&gt; <a href="classtf_1_1cudaFlowCapturer.html" class="m-doc">cudaFlowCapturer</a>&amp; <span class="m-label m-flat m-info">defaulted</span></span>
</dt>
<dd>default move assignment operator</dd>
<dt id="a3413a20a7c8229365e1ee9fb5af4af1e">
<span class="m-doc-wrap-bumper">auto <a href="#a3413a20a7c8229365e1ee9fb5af4af1e" class="m-doc-self">empty</a>(</span><span class="m-doc-wrap">) const -&gt; bool</span>
</dt>
<dd>queries the emptiness of the graph</dd>
<dt id="aeb826786f1580bae1335d94ffbeb7e02">
<span class="m-doc-wrap-bumper">auto <a href="#aeb826786f1580bae1335d94ffbeb7e02" class="m-doc-self">num_tasks</a>(</span><span class="m-doc-wrap">) const -&gt; size_t</span>
</dt>
<dd>queries the number of tasks</dd>
<dt id="a06f1176b6a5590832f0e09a049f8a622">
<span class="m-doc-wrap-bumper">void <a href="#a06f1176b6a5590832f0e09a049f8a622" class="m-doc-self">clear</a>(</span><span class="m-doc-wrap">)</span>
</dt>
<dd>clears this cudaFlow capturer</dd>
<dt id="a90d1265bcc27647906bed6e6876c9aa7">
<span class="m-doc-wrap-bumper">void <a href="#a90d1265bcc27647906bed6e6876c9aa7" class="m-doc-self">dump</a>(</span><span class="m-doc-wrap"><a href="http://en.cppreference.com/w/cpp/io/basic_ostream.html" class="m-doc-external">std::<wbr />ostream</a>&amp; os) const</span>
</dt>
<dd>dumps the cudaFlow graph into a DOT format through an output stream</dd>
<dt id="a979fe2a7bf2c361c050c0742108197c7">
<span class="m-doc-wrap-bumper">void <a href="#a979fe2a7bf2c361c050c0742108197c7" class="m-doc-self">dump_native_graph</a>(</span><span class="m-doc-wrap"><a href="http://en.cppreference.com/w/cpp/io/basic_ostream.html" class="m-doc-external">std::<wbr />ostream</a>&amp; os) const</span>
</dt>
<dd>dumps the native captured graph into a DOT format through an output stream</dd>
<dt>
<div class="m-doc-template">template&lt;typename C, std::enable_if_t&lt;std::is_invocable_r_v&lt;void, C, cudaStream_t&gt;, void&gt;* = nullptr&gt;</div>
<span class="m-doc-wrap-bumper">auto <a href="#ad0d937ae0d77239f148b66a77e35db41" class="m-doc">on</a>(</span><span class="m-doc-wrap">C&amp;&amp; callable) -&gt; <a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a></span>
</dt>
<dd>captures a sequence of CUDA operations from the given callable</dd>
<dt>
<div class="m-doc-template">template&lt;typename C, std::enable_if_t&lt;std::is_invocable_r_v&lt;void, C, cudaStream_t&gt;, void&gt;* = nullptr&gt;</div>
<span class="m-doc-wrap-bumper">void <a href="#a5215d459df3a0d7bccac1a1f2ce9d1ee" class="m-doc">on</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
C&amp;&amp; callable)</span>
</dt>
<dd>updates a capture task to capture another sequence of CUDA operations</dd>
<dt>
<span class="m-doc-wrap-bumper">auto <a href="#a593335760ea517cea597237137ef9333" class="m-doc">noop</a>(</span><span class="m-doc-wrap">) -&gt; <a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a></span>
</dt>
<dd>captures a no-operation task</dd>
<dt>
<span class="m-doc-wrap-bumper">void <a href="#a168a968d7f5833700fcc14a210ad39bc" class="m-doc">noop</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task)</span>
</dt>
<dd>updates a task to a no-operation task</dd>
<dt>
<span class="m-doc-wrap-bumper">auto <a href="#ae84d097cdae9e2e8ce108dea760483ed" class="m-doc">memcpy</a>(</span><span class="m-doc-wrap">void* dst,
const void* src,
size_t count) -&gt; <a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a></span>
</dt>
<dd>copies data between host and device asynchronously through a stream</dd>
<dt>
<span class="m-doc-wrap-bumper">void <a href="#a20db64e086bf8182b350eaf5d8807af9" class="m-doc">memcpy</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
void* dst,
const void* src,
size_t count)</span>
</dt>
<dd>updates a capture task to a memcpy operation</dd>
<dt>
<div class="m-doc-template">template&lt;typename T, std::enable_if_t&lt;!std::is_same_v&lt;T, void&gt;, void&gt;* = nullptr&gt;</div>
<span class="m-doc-wrap-bumper">auto <a href="#ab70f12050e78b588f5c23d874aa4e538" class="m-doc">copy</a>(</span><span class="m-doc-wrap">T* tgt,
const T* src,
size_t num) -&gt; <a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a></span>
</dt>
<dd>captures a copy task of typed data</dd>
<dt>
<div class="m-doc-template">template&lt;typename T, std::enable_if_t&lt;!std::is_same_v&lt;T, void&gt;, void&gt;* = nullptr&gt;</div>
<span class="m-doc-wrap-bumper">void <a href="#a605f9dfd1363e10d08cbdab29f59a52e" class="m-doc">copy</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
T* tgt,
const T* src,
size_t num)</span>
</dt>
<dd>updates a capture task to a copy operation</dd>
<dt>
<span class="m-doc-wrap-bumper">auto <a href="#a0d38965b380f940bf6cfc6667a281052" class="m-doc">memset</a>(</span><span class="m-doc-wrap">void* ptr,
int v,
size_t n) -&gt; <a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a></span>
</dt>
<dd>initializes or sets GPU memory to the given value byte by byte</dd>
<dt>
<span class="m-doc-wrap-bumper">void <a href="#a4a7c4dd81f5e00e8a4c733417bca3205" class="m-doc">memset</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
void* ptr,
int value,
size_t n)</span>
</dt>
<dd>updates a capture task to a memset operation</dd>
<dt>
<div class="m-doc-template">template&lt;typename F, typename... ArgsT&gt;</div>
<span class="m-doc-wrap-bumper">auto <a href="#a6f06c7f6954d8d67ad89f0eddfe285e9" class="m-doc">kernel</a>(</span><span class="m-doc-wrap">dim3 g,
dim3 b,
size_t s,
F f,
ArgsT &amp;&amp; ... args) -&gt; <a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a></span>
</dt>
<dd>captures a kernel</dd>
<dt>
<div class="m-doc-template">template&lt;typename F, typename... ArgsT&gt;</div>
<span class="m-doc-wrap-bumper">void <a href="#a850c7c028e1535db1deaecd819d82efb" class="m-doc">kernel</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
dim3 g,
dim3 b,
size_t s,
F f,
ArgsT &amp;&amp; ... args)</span>
</dt>
<dd>updates a capture task to a kernel operation</dd>
<dt>
<div class="m-doc-template">template&lt;typename C&gt;</div>
<span class="m-doc-wrap-bumper">auto <a href="#ac944c7d20056e0633ef84f1a25b52296" class="m-doc">single_task</a>(</span><span class="m-doc-wrap">C c) -&gt; <a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a></span>
</dt>
<dd>captures a kernel that runs the given callable with only one thread</dd>
<dt>
<div class="m-doc-template">template&lt;typename C&gt;</div>
<span class="m-doc-wrap-bumper">void <a href="#a2f7e439c336aa43781c3ef1ef0d71154" class="m-doc">single_task</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
C c)</span>
</dt>
<dd>updates a capture task to a single-threaded kernel</dd>
<dt>
<div class="m-doc-template">template&lt;typename I, typename C&gt;</div>
<span class="m-doc-wrap-bumper">auto <a href="#a0b2f1bcd59f0b42e0f823818348b4ae7" class="m-doc">for_each</a>(</span><span class="m-doc-wrap">I first,
I last,
C callable) -&gt; <a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a></span>
</dt>
<dd>captures a kernel that applies a callable to each dereferenced element of the data array</dd>
<dt>
<div class="m-doc-template">template&lt;typename I, typename C&gt;</div>
<span class="m-doc-wrap-bumper">void <a href="#a17471b99db619c5a6b4645b3dffebe20" class="m-doc">for_each</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
I first,
I last,
C callable)</span>
</dt>
<dd>updates a capture task to a for-each kernel task</dd>
<dt>
<div class="m-doc-template">template&lt;typename I, typename C&gt;</div>
<span class="m-doc-wrap-bumper">auto <a href="#aeb877f42ee3a627c40f1c9c84e31ba3c" class="m-doc">for_each_index</a>(</span><span class="m-doc-wrap">I first,
I last,
I step,
C callable) -&gt; <a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a></span>
</dt>
<dd>captures a kernel that applies a callable to each index in the range with the step size</dd>
<dt>
<div class="m-doc-template">template&lt;typename I, typename C&gt;</div>
<span class="m-doc-wrap-bumper">void <a href="#a05ca5fb4d005f1ff05fd1e4312fcd357" class="m-doc">for_each_index</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
I first,
I last,
I step,
C callable)</span>
</dt>
<dd>updates a capture task to a for-each-index kernel task</dd>
<dt>
<div class="m-doc-template">template&lt;typename I, typename O, typename C&gt;</div>
<span class="m-doc-wrap-bumper">auto <a href="#a99d9a86a7240ebf0767441e4ec2e14c4" class="m-doc">transform</a>(</span><span class="m-doc-wrap">I first,
I last,
O output,
C op) -&gt; <a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a></span>
</dt>
<dd>captures a kernel that transforms an input range to an output range</dd>
<dt>
<div class="m-doc-template">template&lt;typename I, typename O, typename C&gt;</div>
<span class="m-doc-wrap-bumper">void <a href="#afa62195f91702a6f5cbdad6fefb97e4c" class="m-doc">transform</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
I first,
I last,
O output,
C op)</span>
</dt>
<dd>updates a capture task to a transform kernel task</dd>
<dt>
<div class="m-doc-template">template&lt;typename I1, typename I2, typename O, typename C&gt;</div>
<span class="m-doc-wrap-bumper">auto <a href="#ac2f527e57e8fe447b9f13ba51e9b9c48" class="m-doc">transform</a>(</span><span class="m-doc-wrap">I1 first1,
I1 last1,
I2 first2,
O output,
C op) -&gt; <a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a></span>
</dt>
<dd>captures a kernel that transforms two input ranges to an output range</dd>
<dt>
<div class="m-doc-template">template&lt;typename I1, typename I2, typename O, typename C&gt;</div>
<span class="m-doc-wrap-bumper">void <a href="#a568dcdd226d7e466e2ee106fcdde5db9" class="m-doc">transform</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
I1 first1,
I1 last1,
I2 first2,
O output,
C op)</span>
</dt>
<dd>updates a capture task to a transform kernel task</dd>
<dt>
<div class="m-doc-template">template&lt;typename OPT, typename... ArgsT&gt;</div>
<span class="m-doc-wrap-bumper">auto <a href="#aa1d016b56c06cb28eabfebfdd7dbb24d" class="m-doc">make_optimizer</a>(</span><span class="m-doc-wrap">ArgsT &amp;&amp; ... args) -&gt; OPT&amp;</span>
</dt>
<dd>selects a different optimization algorithm</dd>
<dt id="a31f29772f4713848c1b0ff1a66a3dcc3">
<span class="m-doc-wrap-bumper">auto <a href="#a31f29772f4713848c1b0ff1a66a3dcc3" class="m-doc-self">capture</a>(</span><span class="m-doc-wrap">) -&gt; cudaGraph_t</span>
</dt>
<dd>captures the flow and turns it into a native CUDA graph (<code>cudaGraph_t</code>)</dd>
<dt>
<span class="m-doc-wrap-bumper">void <a href="#a952596fd7c46acee4c2459d8fe39da28" class="m-doc">run</a>(</span><span class="m-doc-wrap">cudaStream_t stream)</span>
</dt>
<dd>offloads the cudaFlowCapturer onto a GPU asynchronously via a stream</dd>
<dt id="a34be2e2d69ff66add60f5517e01bea83">
<span class="m-doc-wrap-bumper">auto <a href="#a34be2e2d69ff66add60f5517e01bea83" class="m-doc-self">native_graph</a>(</span><span class="m-doc-wrap">) -&gt; cudaGraph_t</span>
</dt>
<dd>acquires a reference to the underlying CUDA graph</dd>
<dt id="a3c03a7d269268a2a63e864fedb2fb8a6">
<span class="m-doc-wrap-bumper">auto <a href="#a3c03a7d269268a2a63e864fedb2fb8a6" class="m-doc-self">native_executable</a>(</span><span class="m-doc-wrap">) -&gt; cudaGraphExec_t</span>
</dt>
<dd>acquires a reference to the underlying CUDA graph executable</dd>
</dl>
</section>
<section>
<h2>Function documentation</h2>
<section class="m-doc-details" id="a0ddccd6faa338047921269bfe964b774"><div>
<h3>
<span class="m-doc-wrap-bumper"> tf::<wbr />cudaFlowCapturer::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a0ddccd6faa338047921269bfe964b774" class="m-doc-self">cudaFlowCapturer</a>(</span><span class="m-doc-wrap">) <span class="m-label m-info">defaulted</span></span></span>
</h3>
<p>constructs a standalone <a href="classtf_1_1cudaFlowCapturer.html" class="m-doc">cudaFlowCapturer</a></p>
<p>A standalone cudaFlow capturer does not go through any taskflow and can be run by the caller thread using <a href="#a952596fd7c46acee4c2459d8fe39da28" class="m-doc">tf::<wbr />cudaFlowCapturer::<wbr />run</a>.</p>
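<p>A minimal sketch of standalone usage; <code>my_kernel</code>, <code>grid</code>, <code>block</code>, and <code>my_parameters</code> are placeholders:</p><pre class="m-code">tf::cudaFlowCapturer capturer;  // standalone capturer, not attached to any taskflow

// capture a kernel launch through the stream managed by the capturer
capturer.on([&amp;](cudaStream_t stream){
  my_kernel&lt;&lt;&lt;grid, block, 0, stream&gt;&gt;&gt;(my_parameters);
});

cudaStream_t stream;
cudaStreamCreate(&amp;stream);
capturer.run(stream);           // offload asynchronously through the stream
cudaStreamSynchronize(stream);  // wait for the captured work to finish</pre>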
</div></section>
<section class="m-doc-details" id="ad0d937ae0d77239f148b66a77e35db41"><div>
<h3>
<div class="m-doc-template">
template&lt;typename C, std::enable_if_t&lt;std::is_invocable_r_v&lt;void, C, cudaStream_t&gt;, void&gt;* = nullptr&gt;
</div>
<span class="m-doc-wrap-bumper"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> tf::<wbr />cudaFlowCapturer::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#ad0d937ae0d77239f148b66a77e35db41" class="m-doc-self">on</a>(</span><span class="m-doc-wrap">C&amp;&amp; callable)</span></span>
</h3>
<p>captures a sequence of CUDA operations from the given callable</p>
<table class="m-table m-fullwidth m-flat">
<thead>
<tr><th colspan="2">Template parameters</th></tr>
</thead>
<tbody>
<tr>
<td style="width: 1%">C</td>
<td>callable type constructible with <code>std::function&lt;void(cudaStream_t)&gt;</code></td>
</tr>
</tbody>
<thead>
<tr><th colspan="2">Parameters</th></tr>
</thead>
<tbody>
<tr>
<td>callable</td>
<td>a callable to capture CUDA operations with the stream</td>
</tr>
</tbody>
</table>
<p>This method applies a stream created by the flow to capture a sequence of CUDA operations defined in the callable.</p>
</div></section>
<section class="m-doc-details" id="a5215d459df3a0d7bccac1a1f2ce9d1ee"><div>
<h3>
<div class="m-doc-template">
template&lt;typename C, std::enable_if_t&lt;std::is_invocable_r_v&lt;void, C, cudaStream_t&gt;, void&gt;* = nullptr&gt;
</div>
<span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlowCapturer::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a5215d459df3a0d7bccac1a1f2ce9d1ee" class="m-doc-self">on</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
C&amp;&amp; callable)</span></span>
</h3>
<p>updates a capture task to capture another sequence of CUDA operations</p>
<p>The method is similar to <a href="#ad0d937ae0d77239f148b66a77e35db41" class="m-doc">cudaFlowCapturer::<wbr />on</a> but operates on an existing task.</p>
</div></section>
<section class="m-doc-details" id="a593335760ea517cea597237137ef9333"><div>
<h3>
<span class="m-doc-wrap-bumper"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> tf::<wbr />cudaFlowCapturer::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a593335760ea517cea597237137ef9333" class="m-doc-self">noop</a>(</span><span class="m-doc-wrap">)</span></span>
</h3>
<p>captures a no-operation task</p>
<table class="m-table m-fullwidth m-flat">
<tfoot>
<tr>
<th style="width: 1%">Returns</th>
<td>a <a href="classtf_1_1cudaTask.html" class="m-doc">tf::<wbr />cudaTask</a> handle</td>
</tr>
</tfoot>
</table>
<p>An empty node performs no operation during execution, but can be used for transitive ordering. For example, a phased execution graph with 2 groups of <code>n</code> nodes with a barrier between them can be represented using an empty node and <code>2*n</code> dependency edges, rather than no empty node and <code>n^2</code> dependency edges.</p>
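<p>A sketch of the barrier pattern described above; <code>task_a</code> through <code>task_d</code> are placeholder tasks obtained from other capture methods:</p><pre class="m-code">auto barrier = capturer.noop();
// every task in the first group precedes the barrier ...
task_a.precede(barrier);
task_b.precede(barrier);
// ... and the barrier precedes every task in the second group
barrier.precede(task_c);
barrier.precede(task_d);</pre>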
</div></section>
<section class="m-doc-details" id="a168a968d7f5833700fcc14a210ad39bc"><div>
<h3>
<span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlowCapturer::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a168a968d7f5833700fcc14a210ad39bc" class="m-doc-self">noop</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task)</span></span>
</h3>
<p>updates a task to a no-operation task</p>
<p>The method is similar to <a href="#a593335760ea517cea597237137ef9333" class="m-doc">tf::<wbr />cudaFlowCapturer::<wbr />noop</a> but operates on an existing task.</p>
</div></section>
<section class="m-doc-details" id="ae84d097cdae9e2e8ce108dea760483ed"><div>
<h3>
<span class="m-doc-wrap-bumper"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> tf::<wbr />cudaFlowCapturer::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#ae84d097cdae9e2e8ce108dea760483ed" class="m-doc-self">memcpy</a>(</span><span class="m-doc-wrap">void* dst,
const void* src,
size_t count)</span></span>
</h3>
<p>copies data between host and device asynchronously through a stream</p>
<table class="m-table m-fullwidth m-flat">
<thead>
<tr><th colspan="2">Parameters</th></tr>
</thead>
<tbody>
<tr>
<td style="width: 1%">dst</td>
<td>destination memory address</td>
</tr>
<tr>
<td>src</td>
<td>source memory address</td>
</tr>
<tr>
<td>count</td>
<td>size in bytes to copy</td>
</tr>
</tbody>
</table>
<p>The method captures a <code>cudaMemcpyAsync</code> operation through an internal stream.</p>
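<p>A sketch that copies a host buffer to the device before a kernel reads it; <code>h_data</code>, <code>d_data</code>, <code>N</code>, and <code>my_kernel</code> are placeholders:</p><pre class="m-code">// copy N bytes from the host buffer to the device buffer
auto h2d = capturer.memcpy(d_data, h_data, N);
auto use = capturer.on([&amp;](cudaStream_t stream){
  my_kernel&lt;&lt;&lt;grid, block, 0, stream&gt;&gt;&gt;(d_data);
});
h2d.precede(use);  // the copy completes before the kernel runs</pre>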
</div></section>
<section class="m-doc-details" id="a20db64e086bf8182b350eaf5d8807af9"><div>
<h3>
<span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlowCapturer::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a20db64e086bf8182b350eaf5d8807af9" class="m-doc-self">memcpy</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
void* dst,
const void* src,
size_t count)</span></span>
</h3>
<p>updates a capture task to a memcpy operation</p>
<p>The method is similar to <a href="#ae84d097cdae9e2e8ce108dea760483ed" class="m-doc">cudaFlowCapturer::<wbr />memcpy</a> but operates on an existing task.</p>
</div></section>
<section class="m-doc-details" id="ab70f12050e78b588f5c23d874aa4e538"><div>
<h3>
<div class="m-doc-template">
template&lt;typename T, std::enable_if_t&lt;!std::is_same_v&lt;T, void&gt;, void&gt;* = nullptr&gt;
</div>
<span class="m-doc-wrap-bumper"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> tf::<wbr />cudaFlowCapturer::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#ab70f12050e78b588f5c23d874aa4e538" class="m-doc-self">copy</a>(</span><span class="m-doc-wrap">T* tgt,
const T* src,
size_t num)</span></span>
</h3>
<p>captures a copy task of typed data</p>
<table class="m-table m-fullwidth m-flat">
<thead>
<tr><th colspan="2">Template parameters</th></tr>
</thead>
<tbody>
<tr>
<td style="width: 1%">T</td>
<td>element type (non-void)</td>
</tr>
</tbody>
<thead>
<tr><th colspan="2">Parameters</th></tr>
</thead>
<tbody>
<tr>
<td>tgt</td>
<td>pointer to the target memory block</td>
</tr>
<tr>
<td>src</td>
<td>pointer to the source memory block</td>
</tr>
<tr>
<td>num</td>
<td>number of elements to copy</td>
</tr>
</tbody>
<tfoot>
<tr>
<th>Returns</th>
<td><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> handle</td>
</tr>
</tfoot>
</table>
<p>A copy task transfers <code>num*sizeof(T)</code> bytes of data from a source location to a target location. Direction can be arbitrary among CPUs and GPUs.</p>
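<p>A sketch of a typed copy, assuming <code>d_vec</code> and <code>h_vec</code> are <code>float*</code> buffers holding <code>num</code> elements:</p><pre class="m-code">// transfers num*sizeof(float) bytes from the host buffer h_vec to the device buffer d_vec
auto copy_task = capturer.copy(d_vec, h_vec, num);</pre>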
</div></section>
<section class="m-doc-details" id="a605f9dfd1363e10d08cbdab29f59a52e"><div>
<h3>
<div class="m-doc-template">
template&lt;typename T, std::enable_if_t&lt;!std::is_same_v&lt;T, void&gt;, void&gt;* = nullptr&gt;
</div>
<span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlowCapturer::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a605f9dfd1363e10d08cbdab29f59a52e" class="m-doc-self">copy</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
T* tgt,
const T* src,
size_t num)</span></span>
</h3>
<p>updates a capture task to a copy operation</p>
<p>The method is similar to <a href="#ab70f12050e78b588f5c23d874aa4e538" class="m-doc">cudaFlowCapturer::<wbr />copy</a> but operates on an existing task.</p>
</div></section>
<section class="m-doc-details" id="a0d38965b380f940bf6cfc6667a281052"><div>
<h3>
<span class="m-doc-wrap-bumper"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> tf::<wbr />cudaFlowCapturer::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a0d38965b380f940bf6cfc6667a281052" class="m-doc-self">memset</a>(</span><span class="m-doc-wrap">void* ptr,
int v,
size_t n)</span></span>
</h3>
<p>initializes or sets GPU memory to the given value byte by byte</p>
<table class="m-table m-fullwidth m-flat">
<thead>
<tr><th colspan="2">Parameters</th></tr>
</thead>
<tbody>
<tr>
<td style="width: 1%">ptr</td>
<td>pointer to GPU memory</td>
</tr>
<tr>
<td>v</td>
<td>value to set for each byte of the specified memory</td>
</tr>
<tr>
<td>n</td>
<td>size in bytes to set</td>
</tr>
</tbody>
</table>
<p>The method captures a <code>cudaMemsetAsync</code> operation through an internal stream to fill the first <code>n</code> bytes of the memory area pointed to by <code>ptr</code> with the constant byte value <code>v</code>.</p>
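<p>A sketch that zeroes a placeholder device buffer <code>d_data</code> of <code>N</code> bytes:</p><pre class="m-code">// fill the first N bytes of d_data with zeros
auto zero = capturer.memset(d_data, 0, N);</pre>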
</div></section>
<section class="m-doc-details" id="a4a7c4dd81f5e00e8a4c733417bca3205"><div>
<h3>
<span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlowCapturer::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a4a7c4dd81f5e00e8a4c733417bca3205" class="m-doc-self">memset</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
void* ptr,
int value,
size_t n)</span></span>
</h3>
<p>updates a capture task to a memset operation</p>
<p>The method is similar to <a href="#a0d38965b380f940bf6cfc6667a281052" class="m-doc">cudaFlowCapturer::<wbr />memset</a> but operates on an existing task.</p>
</div></section>
<section class="m-doc-details" id="a6f06c7f6954d8d67ad89f0eddfe285e9"><div>
<h3>
<div class="m-doc-template">
template&lt;typename F, typename... ArgsT&gt;
</div>
<span class="m-doc-wrap-bumper"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> tf::<wbr />cudaFlowCapturer::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a6f06c7f6954d8d67ad89f0eddfe285e9" class="m-doc-self">kernel</a>(</span><span class="m-doc-wrap">dim3 g,
dim3 b,
size_t s,
F f,
ArgsT &amp;&amp; ... args)</span></span>
</h3>
<p>captures a kernel</p>
<table class="m-table m-fullwidth m-flat">
<thead>
<tr><th colspan="2">Template parameters</th></tr>
</thead>
<tbody>
<tr>
<td style="width: 1%">F</td>
<td>kernel function type</td>
</tr>
<tr>
<td>ArgsT</td>
<td>kernel function parameters type</td>
</tr>
</tbody>
<thead>
<tr><th colspan="2">Parameters</th></tr>
</thead>
<tbody>
<tr>
<td>g</td>
<td>configured grid</td>
</tr>
<tr>
<td>b</td>
<td>configured block</td>
</tr>
<tr>
<td>s</td>
<td>configured shared memory size in bytes</td>
</tr>
<tr>
<td>f</td>
<td>kernel function</td>
</tr>
<tr>
<td>args</td>
<td>arguments to forward to the kernel function by copy</td>
</tr>
</tbody>
<tfoot>
<tr>
<th>Returns</th>
<td><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> handle</td>
</tr>
</tfoot>
</table>
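<p>A sketch of capturing a kernel launch without writing the launch syntax yourself; the kernel <code>saxpy</code> and its arguments are placeholders:</p><pre class="m-code">// equivalent to capturing saxpy&lt;&lt;&lt;grid, block, 0, stream&gt;&gt;&gt;(n, a, x, y)
// through tf::cudaFlowCapturer::on
auto task = capturer.kernel(grid, block, 0, saxpy, n, a, x, y);</pre>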
</div></section>
<section class="m-doc-details" id="a850c7c028e1535db1deaecd819d82efb"><div>
<h3>
<div class="m-doc-template">
template&lt;typename F, typename... ArgsT&gt;
</div>
<span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlowCapturer::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a850c7c028e1535db1deaecd819d82efb" class="m-doc-self">kernel</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
dim3 g,
dim3 b,
size_t s,
F f,
ArgsT &amp;&amp; ... args)</span></span>
</h3>
<p>updates a capture task to a kernel operation</p>
<p>The method is similar to <a href="#a6f06c7f6954d8d67ad89f0eddfe285e9" class="m-doc">cudaFlowCapturer::<wbr />kernel</a> but operates on an existing task.</p>
</div></section>
<section class="m-doc-details" id="ac944c7d20056e0633ef84f1a25b52296"><div>
<h3>
<div class="m-doc-template">
template&lt;typename C&gt;
</div>
<span class="m-doc-wrap-bumper"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> tf::<wbr />cudaFlowCapturer::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#ac944c7d20056e0633ef84f1a25b52296" class="m-doc-self">single_task</a>(</span><span class="m-doc-wrap">C c)</span></span>
</h3>
<p>captures a kernel that runs the given callable with only one thread</p>
<table class="m-table m-fullwidth m-flat">
<thead>
<tr><th colspan="2">Template parameters</th></tr>
</thead>
<tbody>
<tr>
<td style="width: 1%">C</td>
<td>callable type</td>
</tr>
</tbody>
<thead>
<tr><th colspan="2">Parameters</th></tr>
</thead>
<tbody>
<tr>
<td>c</td>
<td>callable to run by a single kernel thread</td>
</tr>
</tbody>
</table>
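<p>A sketch of a single-threaded kernel that sets one device value; <code>d_flag</code> is a placeholder device pointer:</p><pre class="m-code">auto task = capturer.single_task([d_flag] __device__ () {
  *d_flag = 1;  // executed by exactly one GPU thread
});</pre>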
</div></section>
<section class="m-doc-details" id="a2f7e439c336aa43781c3ef1ef0d71154"><div>
<h3>
<div class="m-doc-template">
template&lt;typename C&gt;
</div>
<span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlowCapturer::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a2f7e439c336aa43781c3ef1ef0d71154" class="m-doc-self">single_task</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
C c)</span></span>
</h3>
<p>updates a capture task to a single-threaded kernel</p>
<p>This method is similar to <a href="#ac944c7d20056e0633ef84f1a25b52296" class="m-doc">cudaFlowCapturer::<wbr />single_task</a> but operates on an existing task.</p>
</div></section>
<section class="m-doc-details" id="a0b2f1bcd59f0b42e0f823818348b4ae7"><div>
<h3>
<div class="m-doc-template">
template&lt;typename I, typename C&gt;
</div>
<span class="m-doc-wrap-bumper"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> tf::<wbr />cudaFlowCapturer::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a0b2f1bcd59f0b42e0f823818348b4ae7" class="m-doc-self">for_each</a>(</span><span class="m-doc-wrap">I first,
I last,
C callable)</span></span>
</h3>
<p>captures a kernel that applies a callable to each dereferenced element of the data array</p>
<table class="m-table m-fullwidth m-flat">
<thead>
<tr><th colspan="2">Template parameters</th></tr>
</thead>
<tbody>
<tr>
<td style="width: 1%">I</td>
<td>iterator type</td>
</tr>
<tr>
<td>C</td>
<td>callable type</td>
</tr>
</tbody>
<thead>
<tr><th colspan="2">Parameters</th></tr>
</thead>
<tbody>
<tr>
<td>first</td>
<td>iterator to the beginning</td>
</tr>
<tr>
<td>last</td>
<td>iterator to the end</td>
</tr>
<tr>
<td>callable</td>
<td>a callable object to apply to the dereferenced iterator</td>
</tr>
</tbody>
<tfoot>
<tr>
<th>Returns</th>
<td><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> handle</td>
</tr>
</tfoot>
</table>
<p>This method is equivalent to the parallel execution of the following loop on a GPU:</p><pre class="m-code"><span class="k">for</span><span class="p">(</span><span class="k">auto</span><span class="w"> </span><span class="n">itr</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">first</span><span class="p">;</span><span class="w"> </span><span class="n">itr</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">last</span><span class="p">;</span><span class="w"> </span><span class="n">itr</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">callable</span><span class="p">(</span><span class="o">*</span><span class="n">itr</span><span class="p">);</span>
<span class="p">}</span></pre>
</div></section>
<section class="m-doc-details" id="a17471b99db619c5a6b4645b3dffebe20"><div>
<h3>
<div class="m-doc-template">
template&lt;typename I, typename C&gt;
</div>
<span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlowCapturer::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a17471b99db619c5a6b4645b3dffebe20" class="m-doc-self">for_each</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
I first,
I last,
C callable)</span></span>
</h3>
<p>updates a capture task to a for-each kernel task</p>
<p>This method is similar to <a href="#a0b2f1bcd59f0b42e0f823818348b4ae7" class="m-doc">cudaFlowCapturer::<wbr />for_each</a> but operates on an existing task.</p>
</div></section>
<section class="m-doc-details" id="aeb877f42ee3a627c40f1c9c84e31ba3c"><div>
<h3>
<div class="m-doc-template">
template&lt;typename I, typename C&gt;
</div>
<span class="m-doc-wrap-bumper"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> tf::<wbr />cudaFlowCapturer::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#aeb877f42ee3a627c40f1c9c84e31ba3c" class="m-doc-self">for_each_index</a>(</span><span class="m-doc-wrap">I first,
I last,
I step,
C callable)</span></span>
</h3>
<p>captures a kernel that applies a callable to each index in the range with the step size</p>
<table class="m-table m-fullwidth m-flat">
<thead>
<tr><th colspan="2">Template parameters</th></tr>
</thead>
<tbody>
<tr>
<td style="width: 1%">I</td>
<td>index type</td>
</tr>
<tr>
<td>C</td>
<td>callable type</td>
</tr>
</tbody>
<thead>
<tr><th colspan="2">Parameters</th></tr>
</thead>
<tbody>
<tr>
<td>first</td>
<td>beginning index</td>
</tr>
<tr>
<td>last</td>
<td>last index</td>
</tr>
<tr>
<td>step</td>
<td>step size</td>
</tr>
<tr>
<td>callable</td>
<td>the callable to apply to each index in the range</td>
</tr>
</tbody>
<tfoot>
<tr>
<th>Returns</th>
<td><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> handle</td>
</tr>
</tfoot>
</table>
<p>This method is equivalent to the parallel execution of the following loop on a GPU:</p><pre class="m-code"><span class="c1">// step is positive [first, last)</span>
<span class="k">for</span><span class="p">(</span><span class="k">auto</span><span class="w"> </span><span class="n">i</span><span class="o">=</span><span class="n">first</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="o">&lt;</span><span class="n">last</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="o">+=</span><span class="n">step</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">callable</span><span class="p">(</span><span class="n">i</span><span class="p">);</span>
<span class="p">}</span>
<span class="c1">// step is negative [first, last)</span>
<span class="k">for</span><span class="p">(</span><span class="k">auto</span><span class="w"> </span><span class="n">i</span><span class="o">=</span><span class="n">first</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="o">&gt;</span><span class="n">last</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="o">+=</span><span class="n">step</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">callable</span><span class="p">(</span><span class="n">i</span><span class="p">);</span>
<span class="p">}</span></pre>
</div></section>
<section class="m-doc-details" id="a05ca5fb4d005f1ff05fd1e4312fcd357"><div>
<h3>
<div class="m-doc-template">
template&lt;typename I, typename C&gt;
</div>
<span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlowCapturer::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a05ca5fb4d005f1ff05fd1e4312fcd357" class="m-doc-self">for_each_index</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
I first,
I last,
I step,
C callable)</span></span>
</h3>
<p>updates a capture task to a for-each-index kernel task</p>
<p>This method is similar to <a href="#aeb877f42ee3a627c40f1c9c84e31ba3c" class="m-doc">cudaFlowCapturer::<wbr />for_each_index</a> but operates on an existing task.</p>
</div></section>
<section class="m-doc-details" id="a99d9a86a7240ebf0767441e4ec2e14c4"><div>
<h3>
<div class="m-doc-template">
template&lt;typename I, typename O, typename C&gt;
</div>
<span class="m-doc-wrap-bumper"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> tf::<wbr />cudaFlowCapturer::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a99d9a86a7240ebf0767441e4ec2e14c4" class="m-doc-self">transform</a>(</span><span class="m-doc-wrap">I first,
I last,
O output,
C op)</span></span>
</h3>
<p>captures a kernel that transforms an input range to an output range</p>
<table class="m-table m-fullwidth m-flat">
<thead>
<tr><th colspan="2">Template parameters</th></tr>
</thead>
<tbody>
<tr>
<td style="width: 1%">I</td>
<td>input iterator type</td>
</tr>
<tr>
<td>O</td>
<td>output iterator type</td>
</tr>
<tr>
<td>C</td>
<td>unary operator type</td>
</tr>
</tbody>
<thead>
<tr><th colspan="2">Parameters</th></tr>
</thead>
<tbody>
<tr>
<td>first</td>
<td>iterator to the beginning of the input range</td>
</tr>
<tr>
<td>last</td>
<td>iterator to the end of the input range</td>
</tr>
<tr>
<td>output</td>
<td>iterator to the beginning of the output range</td>
</tr>
<tr>
<td>op</td>
<td>unary operator to apply to transform each item in the range</td>
</tr>
</tbody>
<tfoot>
<tr>
<th>Returns</th>
<td><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> handle</td>
</tr>
</tfoot>
</table>
<p>This method is equivalent to the parallel execution of the following loop on a GPU:</p><pre class="m-code"><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">first</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">last</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="o">*</span><span class="n">output</span><span class="o">++</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">op</span><span class="p">(</span><span class="o">*</span><span class="n">first</span><span class="o">++</span><span class="p">);</span>
<span class="p">}</span></pre>
</div></section>
<section class="m-doc-details" id="afa62195f91702a6f5cbdad6fefb97e4c"><div>
<h3>
<div class="m-doc-template">
template&lt;typename I, typename O, typename C&gt;
</div>
<span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlowCapturer::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#afa62195f91702a6f5cbdad6fefb97e4c" class="m-doc-self">transform</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
I first,
I last,
O output,
C op)</span></span>
</h3>
<p>updates a capture task to a transform kernel task</p>
<p>This method is similar to <a href="#a99d9a86a7240ebf0767441e4ec2e14c4" class="m-doc">cudaFlowCapturer::<wbr />transform</a> but operates on an existing task.</p>
</div></section>
<section class="m-doc-details" id="ac2f527e57e8fe447b9f13ba51e9b9c48"><div>
<h3>
<div class="m-doc-template">
template&lt;typename I1, typename I2, typename O, typename C&gt;
</div>
<span class="m-doc-wrap-bumper"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> tf::<wbr />cudaFlowCapturer::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#ac2f527e57e8fe447b9f13ba51e9b9c48" class="m-doc-self">transform</a>(</span><span class="m-doc-wrap">I1 first1,
I1 last1,
I2 first2,
O output,
C op)</span></span>
</h3>
<p>captures a kernel that transforms two input ranges to an output range</p>
<table class="m-table m-fullwidth m-flat">
<thead>
<tr><th colspan="2">Template parameters</th></tr>
</thead>
<tbody>
<tr>
<td style="width: 1%">I1</td>
<td>first input iterator type</td>
</tr>
<tr>
<td>I2</td>
<td>second input iterator type</td>
</tr>
<tr>
<td>O</td>
<td>output iterator type</td>
</tr>
<tr>
<td>C</td>
<td>binary operator type</td>
</tr>
</tbody>
<thead>
<tr><th colspan="2">Parameters</th></tr>
</thead>
<tbody>
<tr>
<td>first1</td>
<td>iterator to the beginning of the input range</td>
</tr>
<tr>
<td>last1</td>
<td>iterator to the end of the input range</td>
</tr>
<tr>
<td>first2</td>
<td>iterator to the beginning of the second input range</td>
</tr>
<tr>
<td>output</td>
<td>iterator to the beginning of the output range</td>
</tr>
<tr>
<td>op</td>
<td>binary operator to apply to transform each pair of items in the two input ranges</td>
</tr>
</tbody>
<tfoot>
<tr>
<th>Returns</th>
<td><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> handle</td>
</tr>
</tfoot>
</table>
<p>This method is equivalent to the parallel execution of the following loop on a GPU:</p><pre class="m-code"><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">first1</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">last1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="o">*</span><span class="n">output</span><span class="o">++</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">op</span><span class="p">(</span><span class="o">*</span><span class="n">first1</span><span class="o">++</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="n">first2</span><span class="o">++</span><span class="p">);</span>
<span class="p">}</span></pre>
</div></section>
<section class="m-doc-details" id="a568dcdd226d7e466e2ee106fcdde5db9"><div>
<h3>
<div class="m-doc-template">
template&lt;typename I1, typename I2, typename O, typename C&gt;
</div>
<span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlowCapturer::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a568dcdd226d7e466e2ee106fcdde5db9" class="m-doc-self">transform</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
I1 first1,
I1 last1,
I2 first2,
O output,
C op)</span></span>
</h3>
<p>updates a capture task to a transform kernel task</p>
<p>This method is similar to <a href="#a99d9a86a7240ebf0767441e4ec2e14c4" class="m-doc">cudaFlowCapturer::<wbr />transform</a> but operates on an existing task.</p>
</div></section>
<section class="m-doc-details" id="aa1d016b56c06cb28eabfebfdd7dbb24d"><div>
<h3>
<div class="m-doc-template">
template&lt;typename OPT, typename... ArgsT&gt;
</div>
<span class="m-doc-wrap-bumper">OPT&amp; tf::<wbr />cudaFlowCapturer::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#aa1d016b56c06cb28eabfebfdd7dbb24d" class="m-doc-self">make_optimizer</a>(</span><span class="m-doc-wrap">ArgsT &amp;&amp; ... args)</span></span>
</h3>
<p>selects a different optimization algorithm</p>
<table class="m-table m-fullwidth m-flat">
<thead>
<tr><th colspan="2">Template parameters</th></tr>
</thead>
<tbody>
<tr>
<td style="width: 1%">OPT</td>
<td>optimizer type</td>
</tr>
<tr>
<td>ArgsT</td>
<td>arguments types</td>
</tr>
</tbody>
<thead>
<tr><th colspan="2">Parameters</th></tr>
</thead>
<tbody>
<tr>
<td>args</td>
<td>arguments to forward to construct the optimizer</td>
</tr>
</tbody>
<tfoot>
<tr>
<th>Returns</th>
<td>a reference to the optimizer</td>
</tr>
</tfoot>
</table>
<p>We currently support the following optimization algorithms to capture a user-described cudaFlow:</p><ul><li><a href="classtf_1_1cudaFlowSequentialOptimizer.html" class="m-doc">tf::<wbr />cudaFlowSequentialOptimizer</a></li><li><a href="classtf_1_1cudaFlowRoundRobinOptimizer.html" class="m-doc">tf::<wbr />cudaFlowRoundRobinOptimizer</a></li><li><a href="classtf_1_1cudaFlowLinearOptimizer.html" class="m-doc">tf::<wbr />cudaFlowLinearOptimizer</a></li></ul><p>By default, <a href="classtf_1_1cudaFlowCapturer.html" class="m-doc">tf::<wbr />cudaFlowCapturer</a> uses the round-robin optimization algorithm with four streams to transform a user-level graph into a native CUDA graph.</p>
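<p>A sketch of selecting an optimizer before the graph is captured; the stream count passed to the round-robin optimizer is an assumption based on its four-stream default:</p><pre class="m-code">// capture every task through a single stream
capturer.make_optimizer&lt;tf::cudaFlowSequentialOptimizer&gt;();

// or keep round-robin optimization but over eight streams (assumed constructor argument)
capturer.make_optimizer&lt;tf::cudaFlowRoundRobinOptimizer&gt;(8);</pre>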
</div></section>
<section class="m-doc-details" id="a952596fd7c46acee4c2459d8fe39da28"><div>
<h3>
<span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlowCapturer::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a952596fd7c46acee4c2459d8fe39da28" class="m-doc-self">run</a>(</span><span class="m-doc-wrap">cudaStream_t stream)</span></span>
</h3>
<p>offloads the cudaFlowCapturer onto a GPU asynchronously via a stream</p>
<table class="m-table m-fullwidth m-flat">
<thead>
<tr><th colspan="2">Parameters</th></tr>
</thead>
<tbody>
<tr>
<td style="width: 1%">stream</td>
<td>stream for performing this operation</td>
</tr>
</tbody>
</table>
<p>Offloads the present cudaFlowCapturer onto a GPU asynchronously via the given stream.</p><p>An offloaded cudaFlowCapturer forces the underlying graph to be instantiated. After the instantiation, you should not modify the graph topology, but you may still update node parameters.</p>
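<p>A minimal sketch of offloading and waiting on a plain CUDA stream:</p><pre class="m-code">cudaStream_t stream;
cudaStreamCreate(&amp;stream);

capturer.run(stream);           // instantiate (if needed) and launch asynchronously
cudaStreamSynchronize(stream);  // block until the captured graph has finished

cudaStreamDestroy(stream);</pre>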
</div></section>
</section>
</div>
</div>
</div>
</article></main>
<div class="m-doc-search" id="search">
<a href="#!" onclick="return hideSearch()"></a>
<div class="m-container">
<div class="m-row">
<div class="m-col-m-8 m-push-m-2">
<div class="m-doc-search-header m-text m-small">
<div><span class="m-label m-default">Tab</span> / <span class="m-label m-default">T</span> to search, <span class="m-label m-default">Esc</span> to close</div>
<div id="search-symbolcount">&hellip;</div>
</div>
<div class="m-doc-search-content">
<form>
<input type="search" name="q" id="search-input" placeholder="Loading &hellip;" disabled="disabled" autofocus="autofocus" autocomplete="off" spellcheck="false" />
</form>
<noscript class="m-text m-danger m-text-center">Unlike everything else in the docs, the search functionality <em>requires</em> JavaScript.</noscript>
<div id="search-help" class="m-text m-dim m-text-center">
<p class="m-noindent">Search for symbols, directories, files, pages or
modules. You can omit any prefix from the symbol or file path; adding a
<code>:</code> or <code>/</code> suffix lists all members of given symbol or
directory.</p>
<p class="m-noindent">Use <span class="m-label m-dim">&darr;</span>
/ <span class="m-label m-dim">&uarr;</span> to navigate through the list,
<span class="m-label m-dim">Enter</span> to go.
<span class="m-label m-dim">Tab</span> autocompletes common prefix, you can
copy a link to the result using <span class="m-label m-dim"></span>
<span class="m-label m-dim">L</span> while <span class="m-label m-dim"></span>
<span class="m-label m-dim">M</span> produces a Markdown link.</p>
</div>
<div id="search-notfound" class="m-text m-warning m-text-center">Sorry, nothing was found.</div>
<ul id="search-results"></ul>
</div>
</div>
</div>
</div>
</div>
<script src="search-v2.js"></script>
<script src="searchdata-v2.js" async="async"></script>
<footer><nav>
<div class="m-container">
<div class="m-row">
<div class="m-col-l-10 m-push-l-1">
<p>Taskflow handbook is part of the <a href="https://taskflow.github.io">Taskflow project</a>, copyright © <a href="https://tsung-wei-huang.github.io/">Dr. Tsung-Wei Huang</a>, 2018&ndash;2024.<br />Generated by <a href="https://doxygen.org/">Doxygen</a> 1.9.1 and <a href="https://mcss.mosra.cz/">m.css</a>.</p>
</div>
</div>
</div>
</nav></footer>
</body>
</html>