<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<title>tf::cudaFlow class | Taskflow QuickStart</title>
<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Source+Sans+Pro:400,400i,600,600i%7CSource+Code+Pro:400,400i,600" />
<link rel="stylesheet" href="m-dark+documentation.compiled.css" />
<link rel="icon" href="favicon.ico" type="image/vnd.microsoft.icon" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta name="theme-color" content="#22272e" />
</head>
<body>
<header><nav id="navigation">
<div class="m-container">
<div class="m-row">
<span id="m-navbar-brand" class="m-col-t-8 m-col-m-none m-left-m">
<a href="https://taskflow.github.io"><img src="taskflow_logo.png" alt="" />Taskflow</a> <span class="m-breadcrumb">|</span> <a href="index.html" class="m-thin">QuickStart</a>
</span>
<div class="m-col-t-4 m-hide-m m-text-right m-nopadr">
<a href="#search" class="m-doc-search-icon" title="Search" onclick="return showSearch()"><svg style="height: 0.9rem;" viewBox="0 0 16 16">
<path id="m-doc-search-icon-path" d="m6 0c-3.31 0-6 2.69-6 6 0 3.31 2.69 6 6 6 1.49 0 2.85-0.541 3.89-1.44-0.0164 0.338 0.147 0.759 0.5 1.15l3.22 3.79c0.552 0.614 1.45 0.665 2 0.115 0.55-0.55 0.499-1.45-0.115-2l-3.79-3.22c-0.392-0.353-0.812-0.515-1.15-0.5 0.895-1.05 1.44-2.41 1.44-3.89 0-3.31-2.69-6-6-6zm0 1.56a4.44 4.44 0 0 1 4.44 4.44 4.44 4.44 0 0 1-4.44 4.44 4.44 4.44 0 0 1-4.44-4.44 4.44 4.44 0 0 1 4.44-4.44z"/>
</svg></a>
<a id="m-navbar-show" href="#navigation" title="Show navigation"></a>
<a id="m-navbar-hide" href="#" title="Hide navigation"></a>
</div>
<div id="m-navbar-collapse" class="m-col-t-12 m-show-m m-col-m-none m-right-m">
<div class="m-row">
<ol class="m-col-t-6 m-col-m-none">
<li><a href="pages.html">Handbook</a></li>
<li><a href="namespaces.html">Namespaces</a></li>
</ol>
<ol class="m-col-t-6 m-col-m-none" start="3">
<li><a href="annotated.html">Classes</a></li>
<li><a href="files.html">Files</a></li>
<li class="m-show-m"><a href="#search" class="m-doc-search-icon" title="Search" onclick="return showSearch()"><svg style="height: 0.9rem;" viewBox="0 0 16 16">
<use href="#m-doc-search-icon-path" />
</svg></a></li>
</ol>
</div>
</div>
</div>
</div>
</nav></header>
<main><article>
<div class="m-container m-container-inflatable">
<div class="m-row">
<div class="m-col-l-10 m-push-l-1">
<h1>
<span class="m-breadcrumb"><a href="namespacetf.html">tf</a>::<wbr/></span>cudaFlow <span class="m-thin">class</span>
<div class="m-doc-include m-code m-inverted m-text-right"><span class="cp">#include</span> <a class="cpf" href="cudaflow_8hpp.html">&lt;taskflow/cuda/cudaflow.hpp&gt;</a></div>
</h1>
<p>class to create a cudaFlow task dependency graph</p>
<nav class="m-block m-default">
<h3>Contents</h3>
<ul>
<li>
Reference
<ul>
<li><a href="#typeless-methods">Constructors, destructors, conversion operators</a></li>
<li><a href="#pub-methods">Public functions</a></li>
</ul>
</li>
</ul>
</nav>
<p>A cudaFlow is a high-level interface over CUDA Graph to perform GPU operations using the task dependency graph model. The class provides a set of methods for creating and launching different tasks on one or multiple CUDA devices, such as kernel tasks, data transfer tasks, and memory operation tasks. The following example creates a cudaFlow of two kernel tasks, <code>task1</code> and <code>task2</code>, where <code>task1</code> runs before <code>task2</code>.</p><pre class="m-code"><span class="n">tf</span><span class="o">::</span><span class="n">Taskflow</span><span class="w"> </span><span class="n">taskflow</span><span class="p">;</span>
<span class="n">tf</span><span class="o">::</span><span class="n">Executor</span><span class="w"> </span><span class="n">executor</span><span class="p">;</span>

<span class="n">taskflow</span><span class="p">.</span><span class="n">emplace</span><span class="p">([</span><span class="o">&amp;</span><span class="p">](</span><span class="n">tf</span><span class="o">::</span><span class="n">cudaFlow</span><span class="o">&amp;</span><span class="w"> </span><span class="n">cf</span><span class="p">){</span>
<span class="w"> </span><span class="c1">// create two kernel tasks</span>
<span class="w"> </span><span class="n">tf</span><span class="o">::</span><span class="n">cudaTask</span><span class="w"> </span><span class="n">task1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cf</span><span class="p">.</span><span class="n">kernel</span><span class="p">(</span><span class="n">grid1</span><span class="p">,</span><span class="w"> </span><span class="n">block1</span><span class="p">,</span><span class="w"> </span><span class="n">shm_size1</span><span class="p">,</span><span class="w"> </span><span class="n">kernel1</span><span class="p">,</span><span class="w"> </span><span class="n">args1</span><span class="p">);</span>
<span class="w"> </span><span class="n">tf</span><span class="o">::</span><span class="n">cudaTask</span><span class="w"> </span><span class="n">task2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cf</span><span class="p">.</span><span class="n">kernel</span><span class="p">(</span><span class="n">grid2</span><span class="p">,</span><span class="w"> </span><span class="n">block2</span><span class="p">,</span><span class="w"> </span><span class="n">shm_size2</span><span class="p">,</span><span class="w"> </span><span class="n">kernel2</span><span class="p">,</span><span class="w"> </span><span class="n">args2</span><span class="p">);</span>

<span class="w"> </span><span class="c1">// kernel1 runs before kernel2</span>
<span class="w"> </span><span class="n">task1</span><span class="p">.</span><span class="n">precede</span><span class="p">(</span><span class="n">task2</span><span class="p">);</span>
<span class="p">});</span>

<span class="n">executor</span><span class="p">.</span><span class="n">run</span><span class="p">(</span><span class="n">taskflow</span><span class="p">).</span><span class="n">wait</span><span class="p">();</span></pre><p>A cudaFlow is a task (<a href="classtf_1_1Task.html" class="m-doc">tf::<wbr />Task</a>) created from <a href="classtf_1_1Taskflow.html" class="m-doc">tf::<wbr />Taskflow</a> and is run by <em>one</em> worker thread in the executor. That is, the callable that describes a cudaFlow is executed sequentially. Inside a cudaFlow task, different GPU tasks (<a href="classtf_1_1cudaTask.html" class="m-doc">tf::<wbr />cudaTask</a>) may run in parallel, as scheduled by the CUDA runtime.</p><p>Please refer to <a href="GPUTaskingcudaFlow.html" class="m-doc">GPU Tasking (cudaFlow)</a> for details.</p>
<section id="typeless-methods">
<h2><a href="#typeless-methods">Constructors, destructors, conversion operators</a></h2>
<dl class="m-doc">
<dt id="ad4c3e001db151486c8479151a2108d37">
<span class="m-doc-wrap-bumper"><a href="#ad4c3e001db151486c8479151a2108d37" class="m-doc-self">cudaFlow</a>(</span><span class="m-doc-wrap">)</span>
</dt>
<dd>constructs a cudaFlow</dd>
<dt id="a828c3ab275521672e4ec6c78d3a9ee62">
<span class="m-doc-wrap-bumper"><a href="#a828c3ab275521672e4ec6c78d3a9ee62" class="m-doc-self">~cudaFlow</a>(</span><span class="m-doc-wrap">) <span class="m-label m-flat m-info">defaulted</span></span>
</dt>
<dd>destroys the cudaFlow and its associated native CUDA graph and executable graph</dd>
<dt id="a677a4b510abee2ac665193389b20f725">
<span class="m-doc-wrap-bumper"><a href="#a677a4b510abee2ac665193389b20f725" class="m-doc-self">cudaFlow</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaFlow.html" class="m-doc">cudaFlow</a>&amp;&amp;) <span class="m-label m-flat m-info">defaulted</span></span>
</dt>
<dd>default move constructor</dd>
</dl>
</section>
<section id="pub-methods">
<h2><a href="#pub-methods">Public functions</a></h2>
<dl class="m-doc">
<dt id="a74beef874538193ac0df81a180faa742">
<span class="m-doc-wrap-bumper">auto <a href="#a74beef874538193ac0df81a180faa742" class="m-doc-self">operator=</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaFlow.html" class="m-doc">cudaFlow</a>&amp;&amp;) -&gt; <a href="classtf_1_1cudaFlow.html" class="m-doc">cudaFlow</a>&amp; <span class="m-label m-flat m-info">defaulted</span></span>
</dt>
<dd>default move assignment operator</dd>
<dt id="a1926f45a038d8faa9c1b1ee43fd29a93">
<span class="m-doc-wrap-bumper">auto <a href="#a1926f45a038d8faa9c1b1ee43fd29a93" class="m-doc-self">empty</a>(</span><span class="m-doc-wrap">) const -&gt; bool</span>
</dt>
<dd>queries the emptiness of the graph</dd>
<dt id="ae6560c27d249af7e4b8b921388f5e1e2">
<span class="m-doc-wrap-bumper">auto <a href="#ae6560c27d249af7e4b8b921388f5e1e2" class="m-doc-self">num_tasks</a>(</span><span class="m-doc-wrap">) const -&gt; size_t</span>
</dt>
<dd>queries the number of tasks</dd>
<dt id="aad726dfe21e9719d96c65530a56d9951">
<span class="m-doc-wrap-bumper">void <a href="#aad726dfe21e9719d96c65530a56d9951" class="m-doc-self">clear</a>(</span><span class="m-doc-wrap">)</span>
</dt>
<dd>clears the cudaFlow object</dd>
<dt id="a7f97b68fa7c889db49b26aa71a46a7cf">
<span class="m-doc-wrap-bumper">void <a href="#a7f97b68fa7c889db49b26aa71a46a7cf" class="m-doc-self">dump</a>(</span><span class="m-doc-wrap"><a href="http://en.cppreference.com/w/cpp/io/basic_ostream.html" class="m-doc-external">std::<wbr />ostream</a>&amp; os) const</span>
</dt>
<dd>dumps the cudaFlow graph into a DOT format through an output stream</dd>
<dt>
<span class="m-doc-wrap-bumper">void <a href="#a43507f21eb9cb77667ffe0ac7e6ae635" class="m-doc">dump_native_graph</a>(</span><span class="m-doc-wrap"><a href="http://en.cppreference.com/w/cpp/io/basic_ostream.html" class="m-doc-external">std::<wbr />ostream</a>&amp; os) const</span>
</dt>
<dd>dumps the native CUDA graph into a DOT format through an output stream</dd>
<dt>
<span class="m-doc-wrap-bumper">auto <a href="#a30b2e107cb2c90a37f467b28d1b42a74" class="m-doc">noop</a>(</span><span class="m-doc-wrap">) -&gt; <a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a></span>
</dt>
<dd>creates a no-operation task</dd>
<dt>
<div class="m-doc-template">template&lt;typename C&gt;</div>
<span class="m-doc-wrap-bumper">auto <a href="#a060e1c96111c2134ce0f896420a42cd0" class="m-doc">host</a>(</span><span class="m-doc-wrap">C&amp;&amp; callable) -&gt; <a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a></span>
</dt>
<dd>creates a host task that runs a callable on the host</dd>
<dt>
<div class="m-doc-template">template&lt;typename C&gt;</div>
<span class="m-doc-wrap-bumper">void <a href="#a02e4e5cf7d03b9d087d6fbf54eb86bbf" class="m-doc">host</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
C&amp;&amp; callable)</span>
</dt>
<dd>updates parameters of a host task</dd>
<dt>
<div class="m-doc-template">template&lt;typename F, typename... ArgsT&gt;</div>
<span class="m-doc-wrap-bumper">auto <a href="#a68f666503d13a7b80fb7399fb2f0c153" class="m-doc">kernel</a>(</span><span class="m-doc-wrap">dim3 g,
dim3 b,
size_t s,
F f,
ArgsT... args) -&gt; <a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a></span>
</dt>
<dd>creates a kernel task</dd>
<dt>
<div class="m-doc-template">template&lt;typename F, typename... ArgsT&gt;</div>
<span class="m-doc-wrap-bumper">void <a href="#a821117dd640807bb7ec114b46888dfb1" class="m-doc">kernel</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
dim3 g,
dim3 b,
size_t shm,
F f,
ArgsT... args)</span>
</dt>
<dd>updates parameters of a kernel task</dd>
<dt>
<span class="m-doc-wrap-bumper">auto <a href="#a079ca65da35301e5aafd45878a19e9d2" class="m-doc">memset</a>(</span><span class="m-doc-wrap">void* dst,
int v,
size_t count) -&gt; <a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a></span>
</dt>
<dd>creates a memset task that fills untyped data with a byte value</dd>
<dt>
<span class="m-doc-wrap-bumper">void <a href="#a082505f0fec89f65808421cdc737fb17" class="m-doc">memset</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
void* dst,
int ch,
size_t count)</span>
</dt>
<dd>updates parameters of a memset task</dd>
<dt>
<span class="m-doc-wrap-bumper">auto <a href="#ad37637606f0643f360e9eda1f9a6e559" class="m-doc">memcpy</a>(</span><span class="m-doc-wrap">void* tgt,
const void* src,
size_t bytes) -&gt; <a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a></span>
</dt>
<dd>creates a memcpy task that copies untyped data in bytes</dd>
<dt>
<span class="m-doc-wrap-bumper">void <a href="#acf9e6cfa65cbfcd1d33c88e64b487ce6" class="m-doc">memcpy</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
void* tgt,
const void* src,
size_t bytes)</span>
</dt>
<dd>updates parameters of a memcpy task</dd>
<dt>
<div class="m-doc-template">template&lt;typename T, std::enable_if_t&lt;is_pod_v&lt;T&gt; &amp;&amp; (sizeof(T)==1||sizeof(T)==2||sizeof(T)==4), void&gt;* = nullptr&gt;</div>
<span class="m-doc-wrap-bumper">auto <a href="#a40172fac4464f6d805f75921ea3c2a3b" class="m-doc">zero</a>(</span><span class="m-doc-wrap">T* dst,
size_t count) -&gt; <a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a></span>
</dt>
<dd>creates a memset task that sets a typed memory block to zero</dd>
<dt>
<div class="m-doc-template">template&lt;typename T, std::enable_if_t&lt;is_pod_v&lt;T&gt; &amp;&amp; (sizeof(T)==1||sizeof(T)==2||sizeof(T)==4), void&gt;* = nullptr&gt;</div>
<span class="m-doc-wrap-bumper">void <a href="#a78c2a73243809e3cbd1955cc1ffe6477" class="m-doc">zero</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
T* dst,
size_t count)</span>
</dt>
<dd>updates parameters of a memset task to a zero task</dd>
<dt>
<div class="m-doc-template">template&lt;typename T, std::enable_if_t&lt;is_pod_v&lt;T&gt; &amp;&amp; (sizeof(T)==1||sizeof(T)==2||sizeof(T)==4), void&gt;* = nullptr&gt;</div>
<span class="m-doc-wrap-bumper">auto <a href="#a21d4447bc834f4d3e1bb4772c850d090" class="m-doc">fill</a>(</span><span class="m-doc-wrap">T* dst,
T value,
size_t count) -&gt; <a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a></span>
</dt>
<dd>creates a memset task that fills a typed memory block with a value</dd>
<dt>
<div class="m-doc-template">template&lt;typename T, std::enable_if_t&lt;is_pod_v&lt;T&gt; &amp;&amp; (sizeof(T)==1||sizeof(T)==2||sizeof(T)==4), void&gt;* = nullptr&gt;</div>
<span class="m-doc-wrap-bumper">void <a href="#a39ed97c9142959c73d4c25c34d71bd5e" class="m-doc">fill</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
T* dst,
T value,
size_t count)</span>
</dt>
<dd>updates parameters of a memset task to a fill task</dd>
<dt>
<div class="m-doc-template">template&lt;typename T, std::enable_if_t&lt;!std::is_same_v&lt;T, void&gt;, void&gt;* = nullptr&gt;</div>
<span class="m-doc-wrap-bumper">auto <a href="#af03e04771b655f9e629eb4c22e19b19f" class="m-doc">copy</a>(</span><span class="m-doc-wrap">T* tgt,
const T* src,
size_t num) -&gt; <a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a></span>
</dt>
<dd>creates a memcpy task that copies typed data</dd>
<dt>
<div class="m-doc-template">template&lt;typename T, std::enable_if_t&lt;!std::is_same_v&lt;T, void&gt;, void&gt;* = nullptr&gt;</div>
<span class="m-doc-wrap-bumper">void <a href="#a6cf6ec1e85172fa99c16bf0beffc0562" class="m-doc">copy</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
T* tgt,
const T* src,
size_t num)</span>
</dt>
<dd>updates parameters of a memcpy task to a copy task</dd>
<dt>
<span class="m-doc-wrap-bumper">void <a href="#ae6810f7de27e5a347331aacfce67bea1" class="m-doc">run</a>(</span><span class="m-doc-wrap">cudaStream_t stream)</span>
</dt>
<dd>offloads the cudaFlow onto a GPU asynchronously via a stream</dd>
<dt id="acfbee67cff7dc7c6297c20c64f2e015c">
<span class="m-doc-wrap-bumper">auto <a href="#acfbee67cff7dc7c6297c20c64f2e015c" class="m-doc-self">native_graph</a>(</span><span class="m-doc-wrap">) -&gt; cudaGraph_t</span>
</dt>
<dd>acquires a reference to the underlying CUDA graph</dd>
<dt id="a5bfdaf621ab617ab5f0ca63466570256">
<span class="m-doc-wrap-bumper">auto <a href="#a5bfdaf621ab617ab5f0ca63466570256" class="m-doc-self">native_executable</a>(</span><span class="m-doc-wrap">) -&gt; cudaGraphExec_t</span>
</dt>
<dd>acquires a reference to the underlying CUDA graph executable</dd>
<dt>
<div class="m-doc-template">template&lt;typename C&gt;</div>
<span class="m-doc-wrap-bumper">auto <a href="#ac2906cb0002fc411a983d100a3d58d62" class="m-doc">single_task</a>(</span><span class="m-doc-wrap">C c) -&gt; <a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a></span>
</dt>
<dd>runs a callable with only a single kernel thread</dd>
<dt>
<div class="m-doc-template">template&lt;typename C&gt;</div>
<span class="m-doc-wrap-bumper">void <a href="#add2d364f38c72322d8e36bc0da0b98e4" class="m-doc">single_task</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
C c)</span>
</dt>
<dd>updates a single-threaded kernel task</dd>
<dt>
<div class="m-doc-template">template&lt;typename I, typename C&gt;</div>
<span class="m-doc-wrap-bumper">auto <a href="#a1a681f6223853b6445dcfdad07e4d0fd" class="m-doc">for_each</a>(</span><span class="m-doc-wrap">I first,
I last,
C callable) -&gt; <a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a></span>
</dt>
<dd>applies a callable to each dereferenced element of the data array</dd>
<dt>
<div class="m-doc-template">template&lt;typename I, typename C&gt;</div>
<span class="m-doc-wrap-bumper">void <a href="#af9cc7ee16602754929bb9118a9d7f0b2" class="m-doc">for_each</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
I first,
I last,
C callable)</span>
</dt>
<dd>updates parameters of a kernel task created from <a href="#a1a681f6223853b6445dcfdad07e4d0fd" class="m-doc">tf::<wbr />cudaFlow::<wbr />for_each</a></dd>
<dt>
<div class="m-doc-template">template&lt;typename I, typename C&gt;</div>
<span class="m-doc-wrap-bumper">auto <a href="#a34f1ea89e5651faa6e8af522a42556ac" class="m-doc">for_each_index</a>(</span><span class="m-doc-wrap">I first,
I last,
I step,
C callable) -&gt; <a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a></span>
</dt>
<dd>applies a callable to each index in the range with the given step size</dd>
<dt>
<div class="m-doc-template">template&lt;typename I, typename C&gt;</div>
<span class="m-doc-wrap-bumper">void <a href="#a3fa7f8e38b4da1fe0cbcfb265f9349a2" class="m-doc">for_each_index</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
I first,
I last,
I step,
C callable)</span>
</dt>
<dd>updates parameters of a kernel task created from <a href="#a34f1ea89e5651faa6e8af522a42556ac" class="m-doc">tf::<wbr />cudaFlow::<wbr />for_each_index</a></dd>
<dt>
<div class="m-doc-template">template&lt;typename I, typename O, typename C&gt;</div>
<span class="m-doc-wrap-bumper">auto <a href="#af89a9bda182272462a0eda2581536cd8" class="m-doc">transform</a>(</span><span class="m-doc-wrap">I first,
I last,
O output,
C op) -&gt; <a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a></span>
</dt>
<dd>applies a callable to a source range and stores the result in a target range</dd>
<dt>
<div class="m-doc-template">template&lt;typename I, typename O, typename C&gt;</div>
<span class="m-doc-wrap-bumper">void <a href="#a4a211b1f8562e10f9aae8b44fd6acdec" class="m-doc">transform</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
I first,
I last,
O output,
C c)</span>
</dt>
<dd>updates parameters of a kernel task created from <a href="#af89a9bda182272462a0eda2581536cd8" class="m-doc">tf::<wbr />cudaFlow::<wbr />transform</a></dd>
<dt>
<div class="m-doc-template">template&lt;typename I1, typename I2, typename O, typename C&gt;</div>
<span class="m-doc-wrap-bumper">auto <a href="#abab2bfdfc86ef3a764ece4743fdede76" class="m-doc">transform</a>(</span><span class="m-doc-wrap">I1 first1,
I1 last1,
I2 first2,
O output,
C op) -&gt; <a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a></span>
</dt>
<dd>creates a task to perform parallel transforms over two ranges of items</dd>
<dt>
<div class="m-doc-template">template&lt;typename I1, typename I2, typename O, typename C&gt;</div>
<span class="m-doc-wrap-bumper">void <a href="#a7c6ca7be2b6908e8f71570c54303ba9e" class="m-doc">transform</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
I1 first1,
I1 last1,
I2 first2,
O output,
C c)</span>
</dt>
<dd>updates parameters of a kernel task created from the two-range <a href="#abab2bfdfc86ef3a764ece4743fdede76" class="m-doc">tf::<wbr />cudaFlow::<wbr />transform</a></dd>
<dt>
<div class="m-doc-template">template&lt;typename C&gt;</div>
<span class="m-doc-wrap-bumper">auto <a href="#a89c389fff64a16e5dd8c60875d3b514d" class="m-doc">capture</a>(</span><span class="m-doc-wrap">C&amp;&amp; callable) -&gt; <a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a></span>
</dt>
<dd>constructs a subflow graph through <a href="classtf_1_1cudaFlowCapturer.html" class="m-doc">tf::<wbr />cudaFlowCapturer</a></dd>
<dt>
<div class="m-doc-template">template&lt;typename C&gt;</div>
<span class="m-doc-wrap-bumper">void <a href="#aa0f182dc0fa99bcc9118311925fddca5" class="m-doc">capture</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
C callable)</span>
</dt>
<dd>updates the captured child graph</dd>
</dl>
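<p>The sequential meaning of the parallel-algorithm methods listed above (<code>for_each</code>, <code>for_each_index</code>, and the two <code>transform</code> overloads) can be sketched as plain host loops. This is a hedged, host-only reference model, not Taskflow code: the <code>seq_*</code> names are hypothetical, and the sketch assumes half-open ranges <code>[first, last)</code> and, for <code>for_each_index</code>, a positive <code>step</code>. On the GPU, iterations are spread across threads, so a callable should not rely on the order in which elements are visited.</p>

```cpp
#include <cassert>
#include <vector>  // used by the usage example

// Hypothetical sequential reference for for_each(first, last, callable):
// calls callable(*it) for every iterator it in [first, last).
template <typename I, typename C>
void seq_for_each(I first, I last, C callable) {
  for (; first != last; ++first) callable(*first);
}

// Hypothetical sequential reference for for_each_index(first, last, step, callable),
// assuming a positive step: calls callable(i) for i = first, first + step, ... while i < last.
template <typename I, typename C>
void seq_for_each_index(I first, I last, I step, C callable) {
  assert(step > 0);
  for (I i = first; i < last; i += step) callable(i);
}

// Hypothetical sequential reference for transform(first, last, output, op):
// writes op(*first) for each source element into consecutive output positions.
template <typename I, typename O, typename C>
void seq_transform(I first, I last, O output, C op) {
  for (; first != last; ++first, ++output) *output = op(*first);
}

// Hypothetical sequential reference for transform(first1, last1, first2, output, op):
// combines elements pairwise; the second range must hold at least as many elements.
template <typename I1, typename I2, typename O, typename C>
void seq_transform(I1 first1, I1 last1, I2 first2, O output, C op) {
  for (; first1 != last1; ++first1, ++first2, ++output) *output = op(*first1, *first2);
}
```

<p>For example, <code>seq_for_each_index(0, 10, 3, f)</code> calls <code>f</code> with 0, 3, 6, and 9.</p>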
</section>
<section>
<h2>Function documentation</h2>
<section class="m-doc-details" id="a43507f21eb9cb77667ffe0ac7e6ae635"><div>
<h3>
<span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a43507f21eb9cb77667ffe0ac7e6ae635" class="m-doc-self">dump_native_graph</a>(</span><span class="m-doc-wrap"><a href="http://en.cppreference.com/w/cpp/io/basic_ostream.html" class="m-doc-external">std::<wbr />ostream</a>&amp; os) const</span></span>
</h3>
<p>dumps the native CUDA graph into a DOT format through an output stream</p>
<p>The native CUDA graph may be different from the upper-level cudaFlow graph when flow capture is involved.</p>
</div></section>
<section class="m-doc-details" id="a30b2e107cb2c90a37f467b28d1b42a74"><div>
<h3>
<span class="m-doc-wrap-bumper"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a30b2e107cb2c90a37f467b28d1b42a74" class="m-doc-self">noop</a>(</span><span class="m-doc-wrap">)</span></span>
</h3>
<p>creates a no-operation task</p>
<table class="m-table m-fullwidth m-flat">
<tfoot>
<tr>
<th style="width: 1%">Returns</th>
<td>a <a href="classtf_1_1cudaTask.html" class="m-doc">tf::<wbr />cudaTask</a> handle</td>
</tr>
</tfoot>
</table>
<p>An empty node performs no operation during execution but can be used for transitive ordering. For example, a phased execution graph with 2 groups of <code>n</code> nodes and a barrier between them can be represented with one empty node and <code>2*n</code> dependency edges, instead of <code>n^2</code> dependency edges without the empty node.</p>
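<p>The edge counts in this example can be checked with a small host-only model. The adjacency-list type <code>Graph</code> and the functions <code>num_edges</code>, <code>connect_all_to_all</code>, and <code>connect_through_barrier</code> are hypothetical helpers written for illustration; they do not use the Taskflow API. Phase A holds nodes <code>[0, n)</code> and phase B holds nodes <code>[n, 2n)</code>. In both graphs every node of A must finish before any node of B starts.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical adjacency-list graph (not a Taskflow type): g[u] holds the successors of node u.
using Graph = std::vector<std::vector<std::size_t>>;

// Counts the directed dependency edges in the graph.
std::size_t num_edges(const Graph& g) {
  std::size_t e = 0;
  for (const auto& succ : g) e += succ.size();
  return e;
}

// Orders phase A before phase B with one direct edge per (a, b) pair: n * n edges.
Graph connect_all_to_all(std::size_t n) {
  Graph g(2 * n);
  for (std::size_t a = 0; a < n; ++a)
    for (std::size_t b = n; b < 2 * n; ++b)
      g[a].push_back(b);
  return g;
}

// Same ordering, routed through one extra empty node (index 2n) acting as a barrier:
// n edges into the barrier plus n edges out of it, 2 * n edges in total.
Graph connect_through_barrier(std::size_t n) {
  Graph g(2 * n + 1);
  const std::size_t barrier = 2 * n;
  for (std::size_t a = 0; a < n; ++a) g[a].push_back(barrier);      // a -> barrier
  for (std::size_t b = n; b < 2 * n; ++b) g[barrier].push_back(b);  // barrier -> b
  return g;
}
```

<p>With <code>n = 100</code>, the direct construction needs 10000 edges while the barrier version needs 200.</p>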
</div></section>
<section class="m-doc-details" id="a060e1c96111c2134ce0f896420a42cd0"><div>
<h3>
<div class="m-doc-template">
template&lt;typename C&gt;
</div>
<span class="m-doc-wrap-bumper"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a060e1c96111c2134ce0f896420a42cd0" class="m-doc-self">host</a>(</span><span class="m-doc-wrap">C&amp;&amp; callable)</span></span>
</h3>
<p>creates a host task that runs a callable on the host</p>
<table class="m-table m-fullwidth m-flat">
<thead>
<tr><th colspan="2">Template parameters</th></tr>
</thead>
<tbody>
<tr>
<td style="width: 1%">C</td>
<td>callable type</td>
</tr>
</tbody>
<thead>
<tr><th colspan="2">Parameters</th></tr>
</thead>
<tbody>
<tr>
<td>callable</td>
<td>a callable object with neither arguments nor a return value (i.e., constructible from <code>std::function&lt;void()&gt;</code>)</td>
</tr>
</tbody>
<tfoot>
<tr>
<th>Returns</th>
<td>a <a href="classtf_1_1cudaTask.html" class="m-doc">tf::<wbr />cudaTask</a> handle</td>
</tr>
</tfoot>
</table>
<p>A host task can only execute CPU-specific functions and cannot make any CUDA calls (e.g., <code>cudaMalloc</code>).</p>
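<p>The requirement on <code>callable</code> can be written as a compile-time check. The trait <code>is_host_callable_v</code> below is a hypothetical helper, not part of Taskflow. It only tests whether <code>std::function&lt;void()&gt;</code> can be constructed from the callable type. This check also accepts callables that return a value (the result is discarded), so it is slightly looser than the wording above.</p>

```cpp
#include <functional>
#include <type_traits>

// Hypothetical trait (not a Taskflow API): true when a callable of type C can be
// stored in std::function<void()>, i.e., it can be invoked with no arguments.
template <typename C>
inline constexpr bool is_host_callable_v =
    std::is_constructible_v<std::function<void()>, C>;

// A callable suited to a host task: no arguments, CPU-only work, no CUDA calls.
inline auto host_work = [] { /* e.g., update a host-side counter or log a message */ };

// Not suitable: it expects an argument that a host task never supplies.
inline auto needs_argument = [](int) {};

static_assert(is_host_callable_v<decltype(host_work)>);
static_assert(!is_host_callable_v<decltype(needs_argument)>);
```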
</div></section>
<section class="m-doc-details" id="a02e4e5cf7d03b9d087d6fbf54eb86bbf"><div>
<h3>
<div class="m-doc-template">
template&lt;typename C&gt;
</div>
<span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a02e4e5cf7d03b9d087d6fbf54eb86bbf" class="m-doc-self">host</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
C&amp;&amp; callable)</span></span>
</h3>
<p>updates parameters of a host task</p>
<p>The method is similar to <a href="#a060e1c96111c2134ce0f896420a42cd0" class="m-doc">tf::<wbr />cudaFlow::<wbr />host</a> but operates on a task of type <a href="namespacetf.html#afebc56ae6d5765010d0dd13a5f04132eab9361011891280a44d85b967739cc6a5" class="m-doc">tf::<wbr />cudaTaskType::<wbr />HOST</a>.</p>
</div></section>
<section class="m-doc-details" id="a68f666503d13a7b80fb7399fb2f0c153"><div>
<h3>
<div class="m-doc-template">
template&lt;typename F, typename... ArgsT&gt;
</div>
<span class="m-doc-wrap-bumper"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a68f666503d13a7b80fb7399fb2f0c153" class="m-doc-self">kernel</a>(</span><span class="m-doc-wrap">dim3 g,
dim3 b,
size_t s,
F f,
ArgsT... args)</span></span>
</h3>
<p>creates a kernel task</p>
<table class="m-table m-fullwidth m-flat">
<thead>
<tr><th colspan="2">Template parameters</th></tr>
</thead>
<tbody>
<tr>
<td style="width: 1%">F</td>
<td>kernel function type</td>
</tr>
<tr>
<td>ArgsT</td>
<td>kernel function parameter types</td>
</tr>
</tbody>
<thead>
<tr><th colspan="2">Parameters</th></tr>
</thead>
<tbody>
<tr>
<td>g</td>
<td>configured grid</td>
</tr>
<tr>
<td>b</td>
<td>configured block</td>
</tr>
<tr>
<td>s</td>
<td>configured shared memory size in bytes</td>
</tr>
<tr>
<td>f</td>
<td>kernel function</td>
</tr>
<tr>
<td>args</td>
<td>arguments to forward to the kernel function by copy</td>
</tr>
</tbody>
<tfoot>
<tr>
<th>Returns</th>
<td>a <a href="classtf_1_1cudaTask.html" class="m-doc">tf::<wbr />cudaTask</a> handle</td>
</tr>
</tfoot>
</table>
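<p>Since <code>args</code> is taken by value (<code>ArgsT... args</code>), the task works from copies of the arguments rather than from references to the caller's variables. The host-only sketch below shows one common way a variadic argument pack is turned into an array holding one <code>void*</code> per kernel parameter, which is the layout CUDA expects in <code>cudaKernelNodeParams::kernelParams</code>. The helper <code>with_kernel_params</code> is hypothetical and does not call CUDA. It only demonstrates that each pointer refers to a by-value copy of the corresponding argument.</p>

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helper (not a Taskflow or CUDA API). The arguments arrive by value,
// so `args` are copies owned by this call. One void* per argument is collected into
// an array, matching the "array of pointers to each kernel parameter" layout used by
// cudaKernelNodeParams::kernelParams. The pointers are only valid during this call,
// so the array is handed to `consume` instead of being returned.
template <typename Consume, typename... ArgsT>
void with_kernel_params(Consume&& consume, ArgsT... args) {
  std::array<void*, sizeof...(ArgsT)> params{static_cast<void*>(&args)...};
  std::forward<Consume>(consume)(params.data(), params.size());
}
```

<p>If an argument is a pointer, only the address is copied; the kernel still reads and writes the memory it points to.</p>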
</div></section>
<section class="m-doc-details" id="a821117dd640807bb7ec114b46888dfb1"><div>
<h3>
<div class="m-doc-template">
template&lt;typename F, typename... ArgsT&gt;
</div>
<span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a821117dd640807bb7ec114b46888dfb1" class="m-doc-self">kernel</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
dim3 g,
dim3 b,
size_t shm,
F f,
ArgsT... args)</span></span>
</h3>
<p>updates parameters of a kernel task</p>
<p>The method is similar to <a href="#a68f666503d13a7b80fb7399fb2f0c153" class="m-doc">tf::<wbr />cudaFlow::<wbr />kernel</a> but operates on a task of type <a href="namespacetf.html#afebc56ae6d5765010d0dd13a5f04132ea35c10219c45ccfb5b07444fd7e17214c" class="m-doc">tf::<wbr />cudaTaskType::<wbr />KERNEL</a>. The kernel function itself must NOT change; only its launch configuration and arguments can be updated.</p>
</div></section>
<section class="m-doc-details" id="a079ca65da35301e5aafd45878a19e9d2"><div>
<h3>
<span class="m-doc-wrap-bumper"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a079ca65da35301e5aafd45878a19e9d2" class="m-doc-self">memset</a>(</span><span class="m-doc-wrap">void* dst,
int v,
size_t count)</span></span>
</h3>
<p>creates a memset task that fills untyped data with a byte value</p>
<table class="m-table m-fullwidth m-flat">
<thead>
<tr><th colspan="2">Parameters</th></tr>
</thead>
<tbody>
<tr>
<td style="width: 1%">dst</td>
<td>pointer to the destination device memory area</td>
</tr>
<tr>
<td>v</td>
<td>value to set for each byte of the specified memory</td>
</tr>
<tr>
<td>count</td>
<td>number of bytes to set</td>
</tr>
</tbody>
<tfoot>
<tr>
<th>Returns</th>
<td>a <a href="classtf_1_1cudaTask.html" class="m-doc">tf::<wbr />cudaTask</a> handle</td>
</tr>
</tfoot>
</table>
<p>A memset task fills the first <code>count</code> bytes of the device memory area pointed to by <code>dst</code> with the byte value <code>v</code>.</p>
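<p>Because <code>v</code> is written to every byte rather than to every element, a memset over multi-byte elements generally does not leave each element equal to <code>v</code>. The host-only sketch below illustrates this with <code>std::memset</code>, which has the same byte-wise semantics; the helper <code>memset_words</code> is hypothetical. Setting 32-bit words with <code>v = 1</code> yields <code>0x01010101</code> in every word, while <code>v = 0</code> yields zeros. The typed <code>zero</code> and <code>fill</code> methods listed above take an element value instead.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: allocates `count` 32-bit words and sets every *byte* to the
// low 8 bits of `v`, mirroring the byte-wise semantics of a memset task.
std::vector<std::uint32_t> memset_words(std::size_t count, int v) {
  std::vector<std::uint32_t> words(count);
  if (count > 0) {
    // The size argument is in bytes, so convert the element count first.
    std::memset(words.data(), v, count * sizeof(std::uint32_t));
  }
  return words;
}
```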
|
|
</div></section>
|
|
<section class="m-doc-details" id="a082505f0fec89f65808421cdc737fb17"><div>
|
|
<h3>
|
|
<span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a082505f0fec89f65808421cdc737fb17" class="m-doc-self">memset</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
|
|
void* dst,
|
|
int ch,
|
|
size_t count)</span></span>
|
|
</h3>
|
|
<p>updates parameters of a memset task</p>
|
|
<p>The method is similar to <a href="#a079ca65da35301e5aafd45878a19e9d2" class="m-doc">tf::<wbr />cudaFlow::<wbr />memset</a> but operates on a task of type <a href="namespacetf.html#afebc56ae6d5765010d0dd13a5f04132ea41d4dbfd78ceea21abb0ecb03c3cc921" class="m-doc">tf::<wbr />cudaTaskType::<wbr />MEMSET</a>. The source/destination memory may have different address values but must be allocated from the same contexts as the original source/destination memory.</p>
|
|
</div></section>
|
|
<section class="m-doc-details" id="ad37637606f0643f360e9eda1f9a6e559"><div>
|
|
<h3>
|
|
<span class="m-doc-wrap-bumper"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#ad37637606f0643f360e9eda1f9a6e559" class="m-doc-self">memcpy</a>(</span><span class="m-doc-wrap">void* tgt,
|
|
const void* src,
|
|
size_t bytes)</span></span>
|
|
</h3>
|
|
<p>creates a memcpy task that copies untyped data in bytes</p>
|
|
<table class="m-table m-fullwidth m-flat">
|
|
<thead>
|
|
<tr><th colspan="2">Parameters</th></tr>
|
|
</thead>
|
|
<tbody>
|
|
<tr>
|
|
<td style="width: 1%">tgt</td>
|
|
<td>pointer to the target memory block</td>
|
|
</tr>
|
|
<tr>
|
|
<td>src</td>
|
|
<td>pointer to the source memory block</td>
|
|
</tr>
|
|
<tr>
|
|
<td>bytes</td>
|
|
<td>number of bytes to copy</td>
|
|
</tr>
|
|
</tbody>
|
|
<tfoot>
|
|
<tr>
|
|
<th>Returns</th>
|
|
<td>a <a href="classtf_1_1cudaTask.html" class="m-doc">tf::<wbr />cudaTask</a> handle</td>
|
|
</tr>
|
|
</tfoot>
|
|
</table>
|
|
<p>A memcpy task transfers <code>bytes</code> bytes of data from a source location to a target location. The source and target may each reside in host (CPU) or device (GPU) memory, so any transfer direction is allowed.</p>
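<p>Because <code>bytes</code> counts bytes rather than elements, copying <code>n</code> values of type <code>T</code> through this untyped interface requires passing <code>n * sizeof(T)</code>. The sketch below is a host-only emulation built on <code>std::memcpy</code>; it performs no device transfer, and the helpers <code>copy_bytes_untyped</code> and <code>copy_doubles</code> are hypothetical.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper mirroring the untyped interface: the caller must
// convert an element count into a byte count before calling it.
void copy_bytes_untyped(void* tgt, const void* src, std::size_t bytes) {
  if (bytes == 0) return;
  assert(tgt != nullptr && src != nullptr);
  // std::memcpy requires the two ranges not to overlap
  std::memcpy(tgt, src, bytes);
}

// Copies n doubles; passing n instead of n * sizeof(double) would copy
// only the first n bytes.
void copy_doubles(double* tgt, const double* src, std::size_t n) {
  copy_bytes_untyped(tgt, src, n * sizeof(double));
}
```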
|
|
</div></section>
|
|
<section class="m-doc-details" id="acf9e6cfa65cbfcd1d33c88e64b487ce6"><div>
|
|
<h3>
|
|
<span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#acf9e6cfa65cbfcd1d33c88e64b487ce6" class="m-doc-self">memcpy</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
|
|
void* tgt,
|
|
const void* src,
|
|
size_t bytes)</span></span>
|
|
</h3>
|
|
<p>updates parameters of a memcpy task</p>
|
|
<p>The method is similar to <a href="#ad37637606f0643f360e9eda1f9a6e559" class="m-doc">tf::<wbr />cudaFlow::<wbr />memcpy</a> but operates on a task of type <a href="namespacetf.html#afebc56ae6d5765010d0dd13a5f04132eac5d10cc70cce96265c445f14e7f5aba4" class="m-doc">tf::<wbr />cudaTaskType::<wbr />MEMCPY</a>. The source/destination memory may have different address values but must be allocated from the same contexts as the original source/destination memory.</p>
|
|
</div></section>
|
|
<section class="m-doc-details" id="a40172fac4464f6d805f75921ea3c2a3b"><div>
|
|
<h3>
|
|
<div class="m-doc-template">
|
|
template<typename T, std::enable_if_t<is_pod_v<T> && (sizeof(T)==1||sizeof(T)==2||sizeof(T)==4), void>* = nullptr>
|
|
</div>
|
|
<span class="m-doc-wrap-bumper"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a40172fac4464f6d805f75921ea3c2a3b" class="m-doc-self">zero</a>(</span><span class="m-doc-wrap">T* dst,
|
|
size_t count)</span></span>
|
|
</h3>
|
|
<p>creates a memset task that sets a typed memory block to zero</p>
|
|
<table class="m-table m-fullwidth m-flat">
|
|
<thead>
|
|
<tr><th colspan="2">Template parameters</th></tr>
|
|
</thead>
|
|
<tbody>
|
|
<tr>
|
|
<td style="width: 1%">T</td>
|
|
<td>element type (size of <code>T</code> must be either 1, 2, or 4)</td>
|
|
</tr>
|
|
</tbody>
|
|
<thead>
|
|
<tr><th colspan="2">Parameters</th></tr>
|
|
</thead>
|
|
<tbody>
|
|
<tr>
|
|
<td>dst</td>
|
|
<td>pointer to the destination device memory area</td>
|
|
</tr>
|
|
<tr>
|
|
<td>count</td>
|
|
<td>number of elements</td>
|
|
</tr>
|
|
</tbody>
|
|
<tfoot>
|
|
<tr>
|
|
<th>Returns</th>
|
|
<td>a <a href="classtf_1_1cudaTask.html" class="m-doc">tf::<wbr />cudaTask</a> handle</td>
|
|
</tr>
|
|
</tfoot>
|
|
</table>
|
|
<p>A zero task zeroes the first <code>count</code> elements of type <code>T</code> in the device memory area pointed to by <code>dst</code>.</p>
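<p>Unlike memset, <code>count</code> here is an element count. A typed zero can be expressed on top of a byte-wise memset by scaling the count by <code>sizeof(T)</code>, since an all-zero byte pattern represents zero for integer types and for IEEE-754 floating-point types. The host-side sketch below mirrors the template constraint from the signature, substituting <code>std::is_trivially_copyable_v</code> for the document's <code>is_pod_v</code>; the name <code>host_zero</code> is hypothetical and this is not Taskflow's implementation.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical host-side analogue of a zero task. The constraint mirrors
// the documented one: the element size must be 1, 2, or 4 bytes.
template <typename T,
          std::enable_if_t<std::is_trivially_copyable_v<T> &&
                               (sizeof(T) == 1 || sizeof(T) == 2 || sizeof(T) == 4),
                           void>* = nullptr>
void host_zero(T* dst, std::size_t count) {
  if (count == 0) return;
  assert(dst != nullptr);
  // count is an element count; the underlying byte-wise memset needs bytes
  std::memset(dst, 0, count * sizeof(T));
}
```

<p>With this constraint, a call such as <code>host_zero</code> on a <code>double*</code> fails to compile because <code>sizeof(double)</code> is 8, matching the restriction stated in the template parameter table.</p>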
|
|
</div></section>
|
|
<section class="m-doc-details" id="a78c2a73243809e3cbd1955cc1ffe6477"><div>
|
|
<h3>
|
|
<div class="m-doc-template">
|
|
template<typename T, std::enable_if_t<is_pod_v<T> && (sizeof(T)==1||sizeof(T)==2||sizeof(T)==4), void>* = nullptr>
|
|
</div>
|
|
<span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a78c2a73243809e3cbd1955cc1ffe6477" class="m-doc-self">zero</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
|
|
T* dst,
|
|
size_t count)</span></span>
|
|
</h3>
|
|
<p>updates parameters of a memset task to a zero task</p>
|
|
<p>The method is similar to <a href="#a40172fac4464f6d805f75921ea3c2a3b" class="m-doc">tf::<wbr />cudaFlow::<wbr />zero</a> but operates on an existing task of type <a href="namespacetf.html#afebc56ae6d5765010d0dd13a5f04132ea41d4dbfd78ceea21abb0ecb03c3cc921" class="m-doc">tf::<wbr />cudaTaskType::<wbr />MEMSET</a>.</p><p>The destination memory may have a different address but must be allocated from the same context as the original destination memory.</p>
|
|
</div></section>
|
|
<section class="m-doc-details" id="a21d4447bc834f4d3e1bb4772c850d090"><div>
|
|
<h3>
|
|
<div class="m-doc-template">
|
|
template<typename T, std::enable_if_t<is_pod_v<T> && (sizeof(T)==1||sizeof(T)==2||sizeof(T)==4), void>* = nullptr>
|
|
</div>
|
|
<span class="m-doc-wrap-bumper"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a21d4447bc834f4d3e1bb4772c850d090" class="m-doc-self">fill</a>(</span><span class="m-doc-wrap">T* dst,
|
|
T value,
|
|
size_t count)</span></span>
|
|
</h3>
|
|
<p>creates a memset task that fills a typed memory block with a value</p>
|
|
<table class="m-table m-fullwidth m-flat">
|
|
<thead>
|
|
<tr><th colspan="2">Template parameters</th></tr>
|
|
</thead>
|
|
<tbody>
|
|
<tr>
|
|
<td style="width: 1%">T</td>
|
|
<td>element type (size of <code>T</code> must be either 1, 2, or 4)</td>
|
|
</tr>
|
|
</tbody>
|
|
<thead>
|
|
<tr><th colspan="2">Parameters</th></tr>
|
|
</thead>
|
|
<tbody>
|
|
<tr>
|
|
<td>dst</td>
|
|
<td>pointer to the destination device memory area</td>
|
|
</tr>
|
|
<tr>
|
|
<td>value</td>
|
|
<td>value to fill for each element of type <code>T</code></td>
|
|
</tr>
|
|
<tr>
|
|
<td>count</td>
|
|
<td>number of elements</td>
|
|
</tr>
|
|
</tbody>
|
|
<tfoot>
|
|
<tr>
|
|
<th>Returns</th>
|
|
<td>a <a href="classtf_1_1cudaTask.html" class="m-doc">tf::<wbr />cudaTask</a> handle</td>
|
|
</tr>
|
|
</tfoot>
|
|
</table>
|
|
<p>A fill task fills the first <code>count</code> elements of type <code>T</code> in the device memory area pointed to by <code>dst</code> with <code>value</code>. Unlike a memset task, the fill value is interpreted as a value of type <code>T</code> rather than as a single byte.</p>
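<p>The 1-, 2-, or 4-byte restriction on <code>T</code> plausibly reflects the CUDA runtime's memset node parameters (<code>cudaMemsetParams</code>), whose element size must be 1, 2, or 4 bytes. The host-side sketch below is an assumption about how a typed value could be carried in such a memset: the object representation of <code>value</code> is packed into a 32-bit word and written element by element. The names <code>fillable_v</code>, <code>memset_pattern</code>, and <code>host_fill</code> are hypothetical, and <code>std::is_trivially_copyable_v</code> stands in for the document's <code>is_pod_v</code>.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical trait mirroring the documented constraint on T.
template <typename T>
inline constexpr bool fillable_v =
    std::is_trivially_copyable_v<T> &&
    (sizeof(T) == 1 || sizeof(T) == 2 || sizeof(T) == 4);

// Hypothetical helper: packs the bytes of `value` into a 32-bit word, the
// way a memset with a 1-, 2-, or 4-byte element size carries its value.
template <typename T, std::enable_if_t<fillable_v<T>, void>* = nullptr>
unsigned int memset_pattern(T value) {
  static_assert(sizeof(unsigned int) >= 4, "need a 32-bit value slot");
  unsigned int bits = 0;
  std::memcpy(&bits, &value, sizeof(T));
  return bits;
}

// Hypothetical host-side analogue of a fill task: copies the first
// sizeof(T) bytes of the packed pattern into each of `count` elements.
template <typename T, std::enable_if_t<fillable_v<T>, void>* = nullptr>
void host_fill(T* dst, T value, std::size_t count) {
  assert(dst != nullptr || count == 0);
  const unsigned int bits = memset_pattern(value);
  for (std::size_t i = 0; i < count; ++i) {
    std::memcpy(dst + i, &bits, sizeof(T));
  }
}
```

<p>In contrast to the byte-wise memset, <code>host_fill</code> on an <code>int</code> array with value 1 leaves exactly 1 in every element, and a <code>float</code> fill with <code>1.5f</code> leaves <code>1.5f</code>.</p>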
|
|
</div></section>
|
|
<section class="m-doc-details" id="a39ed97c9142959c73d4c25c34d71bd5e"><div>
|
|
<h3>
|
|
<div class="m-doc-template">
|
|
template<typename T, std::enable_if_t<is_pod_v<T> && (sizeof(T)==1||sizeof(T)==2||sizeof(T)==4), void>* = nullptr>
|
|
</div>
|
|
<span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a39ed97c9142959c73d4c25c34d71bd5e" class="m-doc-self">fill</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
|
|
T* dst,
|
|
T value,
|
|
size_t count)</span></span>
|
|
</h3>
|
|
<p>updates parameters of a memset task to a fill task</p>
|
|
<p>The method is similar to <a href="#a21d4447bc834f4d3e1bb4772c850d090" class="m-doc">tf::<wbr />cudaFlow::<wbr />fill</a> but operates on an existing task of type <a href="namespacetf.html#afebc56ae6d5765010d0dd13a5f04132ea41d4dbfd78ceea21abb0ecb03c3cc921" class="m-doc">tf::<wbr />cudaTaskType::<wbr />MEMSET</a>.</p><p>The destination memory may have a different address but must be allocated from the same context as the original destination memory.</p>
|
|
</div></section>
|
|
<section class="m-doc-details" id="af03e04771b655f9e629eb4c22e19b19f"><div>
|
|
<h3>
|
|
<div class="m-doc-template">
|
|
template<typename T, std::enable_if_t<!std::is_same_v<T, void>, void>* = nullptr>
|
|
</div>
|
|
<span class="m-doc-wrap-bumper"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#af03e04771b655f9e629eb4c22e19b19f" class="m-doc-self">copy</a>(</span><span class="m-doc-wrap">T* tgt,
|
|
const T* src,
|
|
size_t num)</span></span>
|
|
</h3>
|
|
<p>creates a memcpy task that copies typed data</p>
|
|
<table class="m-table m-fullwidth m-flat">
|
|
<thead>
|
|
<tr><th colspan="2">Template parameters</th></tr>
|
|
</thead>
|
|
<tbody>
|
|
<tr>
|
|
<td style="width: 1%">T</td>
|
|
<td>element type (non-void)</td>
|
|
</tr>
|
|
</tbody>
|
|
<thead>
|
|
<tr><th colspan="2">Parameters</th></tr>
|
|
</thead>
|
|
<tbody>
|
|
<tr>
|
|
<td>tgt</td>
|
|
<td>pointer to the target memory block</td>
|
|
</tr>
|
|
<tr>
|
|
<td>src</td>
|
|
<td>pointer to the source memory block</td>
|
|
</tr>
|
|
<tr>
|
|
<td>num</td>
|
|
<td>number of elements to copy</td>
|
|
</tr>
|
|
</tbody>
|
|
<tfoot>
|
|
<tr>
|
|
<th>Returns</th>
|
|
<td>a <a href="classtf_1_1cudaTask.html" class="m-doc">tf::<wbr />cudaTask</a> handle</td>
|
|
</tr>
|
|
</tfoot>
|
|
</table>
|
|
<p>A copy task transfers <code>num*sizeof(T)</code> bytes of data from a source location to a target location. The source and target may each reside in host (CPU) or device (GPU) memory, so any transfer direction is allowed.</p>
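<p>The typed interface performs the element-to-byte conversion for the caller. The host-only sketch below shows that relationship as a thin wrapper over <code>std::memcpy</code>, keeping the documented non-<code>void</code> constraint; the name <code>host_copy</code> is hypothetical, and its trivially-copyable check is an addition required by <code>std::memcpy</code> on the host, not a constraint stated in the signature above.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical typed wrapper over an untyped byte copy, mirroring the
// documented constraint that T must not be void.
template <typename T, std::enable_if_t<!std::is_same_v<T, void>, void>* = nullptr>
void host_copy(T* tgt, const T* src, std::size_t num) {
  // Added for this host sketch: byte-wise copies need trivially copyable T.
  static_assert(std::is_trivially_copyable_v<T>,
                "a byte-wise copy requires a trivially copyable type");
  if (num == 0) return;
  assert(tgt != nullptr && src != nullptr);
  std::memcpy(tgt, src, num * sizeof(T));  // num elements -> num*sizeof(T) bytes
}
```

<p>Note that, unlike <code>zero</code> and <code>fill</code>, no element-size restriction applies here, so 8-byte types such as <code>long long</code> are accepted.</p>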
|
|
</div></section>
|
|
<section class="m-doc-details" id="a6cf6ec1e85172fa99c16bf0beffc0562"><div>
|
|
<h3>
|
|
<div class="m-doc-template">
|
|
template<typename T, std::enable_if_t<!std::is_same_v<T, void>, void>* = nullptr>
|
|
</div>
|
|
<span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a6cf6ec1e85172fa99c16bf0beffc0562" class="m-doc-self">copy</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
|
|
T* tgt,
|
|
const T* src,
|
|
size_t num)</span></span>
|
|
</h3>
|
|
<p>updates parameters of a memcpy task to a copy task</p>
|
|
<p>The method is similar to <a href="#af03e04771b655f9e629eb4c22e19b19f" class="m-doc">tf::<wbr />cudaFlow::<wbr />copy</a> but operates on a task of type <a href="namespacetf.html#afebc56ae6d5765010d0dd13a5f04132eac5d10cc70cce96265c445f14e7f5aba4" class="m-doc">tf::<wbr />cudaTaskType::<wbr />MEMCPY</a>. The source/destination memory may have different address values but must be allocated from the same contexts as the original source/destination memory.</p>
|
|
</div></section>
|
|
<section class="m-doc-details" id="ae6810f7de27e5a347331aacfce67bea1"><div>
|
|
<h3>
|
|
<span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#ae6810f7de27e5a347331aacfce67bea1" class="m-doc-self">run</a>(</span><span class="m-doc-wrap">cudaStream_t stream)</span></span>
|
|
</h3>
|
|
<p>offloads the cudaFlow onto a GPU asynchronously via a stream</p>
|
|
<table class="m-table m-fullwidth m-flat">
|
|
<thead>
|
|
<tr><th colspan="2">Parameters</th></tr>
|
|
</thead>
|
|
<tbody>
|
|
<tr>
|
|
<td style="width: 1%">stream</td>
|
|
<td>stream for performing this operation</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>Offloads the present cudaFlow onto a GPU asynchronously via the given stream.</p><p>Offloading a cudaFlow forces its underlying graph to be instantiated. After instantiation, you should not modify the graph topology; you may only update the parameters of existing nodes.</p>
|
|
</div></section>
|
|
<section class="m-doc-details" id="ac2906cb0002fc411a983d100a3d58d62"><div>
|
|
<h3>
|
|
<div class="m-doc-template">
|
|
template<typename C>
|
|
</div>
|
|
<span class="m-doc-wrap-bumper"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#ac2906cb0002fc411a983d100a3d58d62" class="m-doc-self">single_task</a>(</span><span class="m-doc-wrap">C c)</span></span>
|
|
</h3>
|
|
<p>runs a callable with only a single kernel thread</p>
|
|
<table class="m-table m-fullwidth m-flat">
|
|
<thead>
|
|
<tr><th colspan="2">Template parameters</th></tr>
|
|
</thead>
|
|
<tbody>
|
|
<tr>
|
|
<td style="width: 1%">C</td>
|
|
<td>callable type</td>
|
|
</tr>
|
|
</tbody>
|
|
<thead>
|
|
<tr><th colspan="2">Parameters</th></tr>
|
|
</thead>
|
|
<tbody>
|
|
<tr>
|
|
<td>c</td>
|
|
<td>callable to run by a single kernel thread</td>
|
|
</tr>
|
|
</tbody>
|
|
<tfoot>
|
|
<tr>
|
|
<th>Returns</th>
|
|
<td>a <a href="classtf_1_1cudaTask.html" class="m-doc">tf::<wbr />cudaTask</a> handle</td>
|
|
</tr>
|
|
</tfoot>
|
|
</table>
|
|
</div></section>
|
|
<section class="m-doc-details" id="add2d364f38c72322d8e36bc0da0b98e4"><div>
|
|
<h3>
|
|
<div class="m-doc-template">
|
|
template<typename C>
|
|
</div>
|
|
<span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#add2d364f38c72322d8e36bc0da0b98e4" class="m-doc-self">single_task</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
|
|
C c)</span></span>
|
|
</h3>
|
|
<p>updates a single-threaded kernel task</p>
|
|
<p>This method is similar to <a href="#ac2906cb0002fc411a983d100a3d58d62" class="m-doc">cudaFlow::<wbr />single_task</a> but operates on an existing task.</p>
|
|
</div></section>
|
|
<section class="m-doc-details" id="a1a681f6223853b6445dcfdad07e4d0fd"><div>
|
|
<h3>
|
|
<div class="m-doc-template">
|
|
template<typename I, typename C>
|
|
</div>
|
|
<span class="m-doc-wrap-bumper"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a1a681f6223853b6445dcfdad07e4d0fd" class="m-doc-self">for_each</a>(</span><span class="m-doc-wrap">I first,
|
|
I last,
|
|
C callable)</span></span>
|
|
</h3>
|
|
<p>applies a callable to each dereferenced element of the data array</p>
|
|
<table class="m-table m-fullwidth m-flat">
|
|
<thead>
|
|
<tr><th colspan="2">Template parameters</th></tr>
|
|
</thead>
|
|
<tbody>
|
|
<tr>
|
|
<td style="width: 1%">I</td>
|
|
<td>iterator type</td>
|
|
</tr>
|
|
<tr>
|
|
<td>C</td>
|
|
<td>callable type</td>
|
|
</tr>
|
|
</tbody>
|
|
<thead>
|
|
<tr><th colspan="2">Parameters</th></tr>
|
|
</thead>
|
|
<tbody>
|
|
<tr>
|
|
<td>first</td>
|
|
<td>iterator to the beginning (inclusive)</td>
|
|
</tr>
|
|
<tr>
|
|
<td>last</td>
|
|
<td>iterator to the end (exclusive)</td>
|
|
</tr>
|
|
<tr>
|
|
<td>callable</td>
|
|
<td>a callable object to apply to the dereferenced iterator</td>
|
|
</tr>
|
|
</tbody>
|
|
<tfoot>
|
|
<tr>
|
|
<th>Returns</th>
|
|
<td>a <a href="classtf_1_1cudaTask.html" class="m-doc">tf::<wbr />cudaTask</a> handle</td>
|
|
</tr>
|
|
</tfoot>
|
|
</table>
|
|
<p>This method is equivalent to the parallel execution of the following loop on a GPU:</p><pre class="m-code"><span class="k">for</span><span class="p">(</span><span class="k">auto</span><span class="w"> </span><span class="n">itr</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">first</span><span class="p">;</span><span class="w"> </span><span class="n">itr</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">last</span><span class="p">;</span><span class="w"> </span><span class="n">itr</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
|
|
<span class="w"> </span><span class="n">callable</span><span class="p">(</span><span class="o">*</span><span class="n">itr</span><span class="p">);</span>
|
|
<span class="p">}</span></pre>
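<p>A common way to execute such a loop on a GPU is a grid-stride loop, in which thread <code>tid</code> handles elements <code>tid</code>, <code>tid + total_threads</code>, and so on. The host-side simulation below runs that schedule sequentially to show that every element in <code>[first, last)</code> is visited exactly once, regardless of the launch shape. The grid-stride schedule, the block and grid sizes, and the name <code>simulate_for_each</code> are assumptions for illustration, not a description of Taskflow's kernel.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical simulation of a grid-stride loop over [first, last):
// "thread" tid of a grid with num_blocks * block_size threads handles
// positions tid, tid + stride, tid + 2 * stride, ...
template <typename I, typename C>
void simulate_for_each(I first, I last, C callable,
                       std::size_t num_blocks, std::size_t block_size) {
  assert(num_blocks > 0 && block_size > 0);
  using Diff = typename std::iterator_traits<I>::difference_type;
  const auto n = static_cast<std::size_t>(std::distance(first, last));
  const std::size_t stride = num_blocks * block_size;
  for (std::size_t tid = 0; tid < stride; ++tid) {   // each simulated thread
    for (std::size_t k = tid; k < n; k += stride) {  // its share of elements
      callable(*std::next(first, static_cast<Diff>(k)));
    }
  }
}
```

<p>Because each element is touched by exactly one simulated thread, the result does not depend on the number of threads, which is the property that lets the loop above run in parallel.</p>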
|
|
</div></section>
|
|
<section class="m-doc-details" id="af9cc7ee16602754929bb9118a9d7f0b2"><div>
|
|
<h3>
|
|
<div class="m-doc-template">
|
|
template<typename I, typename C>
|
|
</div>
|
|
<span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#af9cc7ee16602754929bb9118a9d7f0b2" class="m-doc-self">for_each</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
|
|
I first,
|
|
I last,
|
|
C callable)</span></span>
|
|
</h3>
|
|
<p>updates parameters of a kernel task created from <a href="#a1a681f6223853b6445dcfdad07e4d0fd" class="m-doc">tf::<wbr />cudaFlow::<wbr />for_each</a></p>
|
|
<p>The types of the iterators and the callable must be the same as those used to create the task with <a href="#a1a681f6223853b6445dcfdad07e4d0fd" class="m-doc">tf::<wbr />cudaFlow::<wbr />for_each</a>.</p>
|
|
</div></section>
|
|
<section class="m-doc-details" id="a34f1ea89e5651faa6e8af522a42556ac"><div>
|
|
<h3>
|
|
<div class="m-doc-template">
|
|
template<typename I, typename C>
|
|
</div>
|
|
<span class="m-doc-wrap-bumper"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a34f1ea89e5651faa6e8af522a42556ac" class="m-doc-self">for_each_index</a>(</span><span class="m-doc-wrap">I first,
|
|
I last,
|
|
I step,
|
|
C callable)</span></span>
|
|
</h3>
|
|
<p>applies a callable to each index in the range with the step size</p>
|
|
<table class="m-table m-fullwidth m-flat">
|
|
<thead>
|
|
<tr><th colspan="2">Template parameters</th></tr>
|
|
</thead>
|
|
<tbody>
|
|
<tr>
|
|
<td style="width: 1%">I</td>
|
|
<td>index type</td>
|
|
</tr>
|
|
<tr>
|
|
<td>C</td>
|
|
<td>callable type</td>
|
|
</tr>
|
|
</tbody>
|
|
<thead>
|
|
<tr><th colspan="2">Parameters</th></tr>
|
|
</thead>
|
|
<tbody>
|
|
<tr>
|
|
<td>first</td>
|
|
<td>beginning index (inclusive)</td>
|
|
</tr>
|
|
<tr>
|
|
<td>last</td>
|
|
<td>ending index (exclusive)</td>
|
|
</tr>
|
|
<tr>
|
|
<td>step</td>
|
|
<td>step size</td>
|
|
</tr>
|
|
<tr>
|
|
<td>callable</td>
|
|
<td>the callable to apply to each index in the range</td>
|
|
</tr>
|
|
</tbody>
|
|
<tfoot>
|
|
<tr>
|
|
<th>Returns</th>
|
|
<td>a <a href="classtf_1_1cudaTask.html" class="m-doc">tf::<wbr />cudaTask</a> handle</td>
|
|
</tr>
|
|
</tfoot>
|
|
</table>
|
|
<p>This method is equivalent to the parallel execution of the following loop on a GPU:</p><pre class="m-code"><span class="c1">// step is positive [first, last)</span>
|
|
<span class="k">for</span><span class="p">(</span><span class="k">auto</span><span class="w"> </span><span class="n">i</span><span class="o">=</span><span class="n">first</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="o"><</span><span class="n">last</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="o">+=</span><span class="n">step</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
|
|
<span class="w"> </span><span class="n">callable</span><span class="p">(</span><span class="n">i</span><span class="p">);</span>
|
|
<span class="p">}</span>
|
|
|
|
<span class="c1">// step is negative [first, last)</span>
|
|
<span class="k">for</span><span class="p">(</span><span class="k">auto</span><span class="w"> </span><span class="n">i</span><span class="o">=</span><span class="n">first</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="o">></span><span class="n">last</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="o">+=</span><span class="n">step</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
|
|
<span class="w"> </span><span class="n">callable</span><span class="p">(</span><span class="n">i</span><span class="p">);</span>
|
|
<span class="p">}</span></pre>
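<p>To run either loop in parallel, the number of iterations can be computed up front and iteration <code>k</code> can evaluate the index <code>first + k*step</code>. The sketch below shows that computation for both signs of <code>step</code> and emulates the loop sequentially. The helper names <code>index_count</code> and <code>emulate_for_each_index</code> are hypothetical, and the formula is an illustration rather than code taken from Taskflow.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical helper: number of iterations of the loops above, i.e.
// ceil((last - first) / step) in the direction of step, or 0 for an empty
// range. step must be non-zero; overflow near the limits of I is ignored.
template <typename I>
std::size_t index_count(I first, I last, I step) {
  static_assert(std::is_integral_v<I>, "sketch assumes an integral index type");
  assert(step != 0);
  if ((step > 0 && first >= last) || (step < 0 && first <= last)) return 0;
  const I sign = step > 0 ? I(1) : I(-1);
  return static_cast<std::size_t>((last - first + step - sign) / step);
}

// Hypothetical sequential emulation: iteration k, which a GPU thread could
// handle independently, maps to the index first + k * step.
template <typename I, typename C>
void emulate_for_each_index(I first, I last, I step, C callable) {
  const std::size_t n = index_count(first, last, step);
  for (std::size_t k = 0; k < n; ++k) {
    callable(static_cast<I>(first + static_cast<I>(k) * step));
  }
}
```

<p>For example, <code>first = 10</code>, <code>last = 0</code>, <code>step = -3</code> gives four iterations visiting 10, 7, 4, and 1; <code>last</code> itself is never visited in either direction.</p>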
|
|
</div></section>
|
|
<section class="m-doc-details" id="a3fa7f8e38b4da1fe0cbcfb265f9349a2"><div>
|
|
<h3>
|
|
<div class="m-doc-template">
|
|
template<typename I, typename C>
|
|
</div>
|
|
<span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a3fa7f8e38b4da1fe0cbcfb265f9349a2" class="m-doc-self">for_each_index</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
|
|
I first,
|
|
I last,
|
|
I step,
|
|
C callable)</span></span>
|
|
</h3>
|
|
<p>updates parameters of a kernel task created from <a href="#a34f1ea89e5651faa6e8af522a42556ac" class="m-doc">tf::<wbr />cudaFlow::<wbr />for_each_index</a></p>
|
|
<p>The types of the indices and the callable must be the same as those used to create the task with <a href="#a34f1ea89e5651faa6e8af522a42556ac" class="m-doc">tf::<wbr />cudaFlow::<wbr />for_each_index</a>.</p>
|
|
</div></section>
|
|
<section class="m-doc-details" id="af89a9bda182272462a0eda2581536cd8"><div>
|
|
<h3>
|
|
<div class="m-doc-template">
|
|
template<typename I, typename O, typename C>
|
|
</div>
|
|
<span class="m-doc-wrap-bumper"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#af89a9bda182272462a0eda2581536cd8" class="m-doc-self">transform</a>(</span><span class="m-doc-wrap">I first,
|
|
I last,
|
|
O output,
|
|
C op)</span></span>
|
|
</h3>
|
|
<p>applies a callable to a source range and stores the result in a target range</p>
|
|
<table class="m-table m-fullwidth m-flat">
|
|
<thead>
|
|
<tr><th colspan="2">Template parameters</th></tr>
|
|
</thead>
|
|
<tbody>
|
|
<tr>
|
|
<td style="width: 1%">I</td>
|
|
<td>input iterator type</td>
|
|
</tr>
|
|
<tr>
|
|
<td>O</td>
|
|
<td>output iterator type</td>
|
|
</tr>
|
|
<tr>
|
|
<td>C</td>
|
|
<td>unary operator type</td>
|
|
</tr>
|
|
</tbody>
|
|
<thead>
|
|
<tr><th colspan="2">Parameters</th></tr>
|
|
</thead>
|
|
<tbody>
|
|
<tr>
|
|
<td>first</td>
|
|
<td>iterator to the beginning of the input range</td>
|
|
</tr>
|
|
<tr>
|
|
<td>last</td>
|
|
<td>iterator to the end of the input range</td>
|
|
</tr>
|
|
<tr>
|
|
<td>output</td>
|
|
<td>iterator to the beginning of the output range</td>
|
|
</tr>
|
|
<tr>
|
|
<td>op</td>
|
|
<td>the operator to apply to transform each element in the range</td>
|
|
</tr>
|
|
</tbody>
|
|
<tfoot>
|
|
<tr>
|
|
<th>Returns</th>
|
|
<td>a <a href="classtf_1_1cudaTask.html" class="m-doc">tf::<wbr />cudaTask</a> handle</td>
|
|
</tr>
|
|
</tfoot>
|
|
</table>
|
|
<p>This method is equivalent to the parallel execution of the following loop on a GPU:</p><pre class="m-code"><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">first</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">last</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
|
|
<span class="w"> </span><span class="o">*</span><span class="n">output</span><span class="o">++</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">callable</span><span class="p">(</span><span class="o">*</span><span class="n">first</span><span class="o">++</span><span class="p">);</span>
|
|
<span class="p">}</span></pre>
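<p>Each output element depends only on the input element at the same position, so the iterations have no dependencies on one another, which is what allows the loop to run in parallel. A consequence (general reasoning; this page does not state aliasing rules) is that an in-place transform with <code>output == first</code> gives the same result in any execution order, whereas partially overlapping ranges do not. The host sketch below, with the hypothetical name <code>transform_reverse_order</code>, processes elements back to front to illustrate both cases.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical helper: runs the transform loop in reverse order. With exact
// aliasing (output == first) the result matches the front-to-back loop,
// because iteration k reads and writes only position k.
template <typename I, typename O, typename C>
void transform_reverse_order(I first, I last, O output, C op) {
  auto n = std::distance(first, last);
  assert(n >= 0);
  while (n > 0) {
    --n;
    *std::next(output, n) = op(*std::next(first, n));
  }
}
```

<p>Squaring <code>{1, 2, 3, 4}</code> in place yields <code>{1, 4, 9, 16}</code> in either order. Shifting with <code>output = first + 1</code> over the first three elements and an identity operator yields <code>{1, 1, 2, 3}</code> in reverse order but <code>{1, 1, 1, 1}</code> front to back, so such overlapping ranges are best avoided when the order of execution is unspecified.</p>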
|
|
</div></section>
|
|
<section class="m-doc-details" id="a4a211b1f8562e10f9aae8b44fd6acdec"><div>
|
|
<h3>
|
|
<div class="m-doc-template">
|
|
template<typename I, typename O, typename C>
|
|
</div>
|
|
<span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a4a211b1f8562e10f9aae8b44fd6acdec" class="m-doc-self">transform</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
|
|
I first,
|
|
I last,
|
|
O output,
|
|
C c)</span></span>
|
|
</h3>
|
|
<p>updates parameters of a kernel task created from <a href="#af89a9bda182272462a0eda2581536cd8" class="m-doc">tf::<wbr />cudaFlow::<wbr />transform</a></p>
|
|
<p>The types of the iterators and the callable must be the same as those used to create the task with <a href="#af89a9bda182272462a0eda2581536cd8" class="m-doc">tf::<wbr />cudaFlow::<wbr />transform</a>.</p>
|
|
</div></section>
|
|
<section class="m-doc-details" id="abab2bfdfc86ef3a764ece4743fdede76"><div>
|
|
<h3>
|
|
<div class="m-doc-template">
|
|
template<typename I1, typename I2, typename O, typename C>
|
|
</div>
|
|
<span class="m-doc-wrap-bumper"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#abab2bfdfc86ef3a764ece4743fdede76" class="m-doc-self">transform</a>(</span><span class="m-doc-wrap">I1 first1,
|
|
I1 last1,
|
|
I2 first2,
|
|
O output,
|
|
C op)</span></span>
|
|
</h3>
|
|
<p>creates a task to perform parallel transforms over two ranges of items</p>
|
|
<table class="m-table m-fullwidth m-flat">
|
|
<thead>
|
|
<tr><th colspan="2">Template parameters</th></tr>
|
|
</thead>
|
|
<tbody>
|
|
<tr>
|
|
<td style="width: 1%">I1</td>
|
|
<td>first input iterator type</td>
|
|
</tr>
|
|
<tr>
|
|
<td>I2</td>
|
|
<td>second input iterator type</td>
|
|
</tr>
|
|
<tr>
|
|
<td>O</td>
|
|
<td>output iterator type</td>
|
|
</tr>
|
|
<tr>
|
|
<td>C</td>
|
|
<td>binary operator type</td>
|
|
</tr>
|
|
</tbody>
|
|
<thead>
|
|
<tr><th colspan="2">Parameters</th></tr>
|
|
</thead>
|
|
<tbody>
|
|
<tr>
|
|
<td>first1</td>
|
|
<td>iterator to the beginning of the first input range</td>
|
|
</tr>
|
|
<tr>
|
|
<td>last1</td>
|
|
<td>iterator to the end of the first input range</td>
|
|
</tr>
|
|
<tr>
|
|
<td>first2</td>
|
|
<td>iterator to the beginning of the second input range</td>
|
|
</tr>
|
|
<tr>
|
|
<td>output</td>
|
|
<td>iterator to the beginning of the output range</td>
|
|
</tr>
|
|
<tr>
|
|
<td>op</td>
|
|
<td>binary operator to apply to transform each pair of items in the two input ranges</td>
|
|
</tr>
|
|
</tbody>
|
|
<tfoot>
|
|
<tr>
|
|
<th>Returns</th>
|
|
<td>a <a href="classtf_1_1cudaTask.html" class="m-doc">tf::<wbr />cudaTask</a> handle</td>
|
|
</tr>
|
|
</tfoot>
|
|
</table>
|
|
<p>This method is equivalent to the parallel execution of the following loop on a GPU:</p><pre class="m-code"><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">first1</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">last1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
|
|
<span class="w"> </span><span class="o">*</span><span class="n">output</span><span class="o">++</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">op</span><span class="p">(</span><span class="o">*</span><span class="n">first1</span><span class="o">++</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="n">first2</span><span class="o">++</span><span class="p">);</span>
|
|
<span class="p">}</span></pre>
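<p>Only <code>first2</code> is passed for the second input, so nothing in the signature bounds that range: the caller must guarantee it holds at least as many elements as <code>[first1, last1)</code>. The host reference below restates the loop with that precondition made explicit through an extra <code>size2</code> argument. Both the name <code>reference_transform2</code> and the <code>size2</code> parameter are hypothetical additions for illustration and are not part of the tf::cudaFlow API.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical host reference for the two-range transform. size2 is the
// number of elements available from first2; it exists only so the sketch
// can check the precondition that the API leaves to the caller.
template <typename I1, typename I2, typename O, typename C>
void reference_transform2(I1 first1, I1 last1, I2 first2,
                          std::size_t size2, O output, C op) {
  const auto n = std::distance(first1, last1);
  assert(n >= 0 && static_cast<std::size_t>(n) <= size2);
  (void)n;
  (void)size2;  // unused when assertions are disabled
  while (first1 != last1) {
    *output++ = op(*first1++, *first2++);
  }
}
```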
|
|
</div></section>
|
|
<section class="m-doc-details" id="a7c6ca7be2b6908e8f71570c54303ba9e"><div>
|
|
<h3>
|
|
<div class="m-doc-template">
|
|
template<typename I1, typename I2, typename O, typename C>
|
|
</div>
|
|
<span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a7c6ca7be2b6908e8f71570c54303ba9e" class="m-doc-self">transform</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
|
|
I1 first1,
|
|
I1 last1,
|
|
I2 first2,
|
|
O output,
|
|
C c)</span></span>
|
|
</h3>
|
|
<p>updates parameters of a kernel task created from <a href="#abab2bfdfc86ef3a764ece4743fdede76" class="m-doc">tf::<wbr />cudaFlow::<wbr />transform</a></p>
|
|
<p>The types of the iterators and the callable must be the same as those used to create the task with <a href="#abab2bfdfc86ef3a764ece4743fdede76" class="m-doc">tf::<wbr />cudaFlow::<wbr />transform</a>.</p>
|
|
</div></section>
|
|
<section class="m-doc-details" id="a89c389fff64a16e5dd8c60875d3b514d"><div>
|
|
<h3>
|
|
<div class="m-doc-template">
|
|
template<typename C>
|
|
</div>
|
|
<span class="m-doc-wrap-bumper"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a89c389fff64a16e5dd8c60875d3b514d" class="m-doc-self">capture</a>(</span><span class="m-doc-wrap">C&& callable)</span></span>
|
|
</h3>
|
|
<p>constructs a subflow graph through <a href="classtf_1_1cudaFlowCapturer.html" class="m-doc">tf::<wbr />cudaFlowCapturer</a></p>
|
|
<table class="m-table m-fullwidth m-flat">
|
|
<thead>
|
|
<tr><th colspan="2">Template parameters</th></tr>
|
|
</thead>
|
|
<tbody>
|
|
<tr>
|
|
<td style="width: 1%">C</td>
|
|
<td>callable type constructible from <code>std::function<void(tf::cudaFlowCapturer&)></code></td>
|
|
</tr>
|
|
</tbody>
|
|
<thead>
|
|
<tr><th colspan="2">Parameters</th></tr>
|
|
</thead>
|
|
<tbody>
|
|
<tr>
|
|
<td>callable</td>
|
|
<td>the callable to construct a capture flow</td>
|
|
</tr>
|
|
</tbody>
|
|
<tfoot>
|
|
<tr>
|
|
<th>Returns</th>
|
|
<td>a <a href="classtf_1_1cudaTask.html" class="m-doc">tf::<wbr />cudaTask</a> handle</td>
|
|
</tr>
|
|
</tfoot>
|
|
</table>
|
|
<p>A captured subflow forms a subgraph of the cudaFlow and can be used to capture custom (or third-party) kernels that cannot be constructed directly through the cudaFlow interface.</p><p>Example usage:</p><pre class="m-code"><span class="n">taskflow</span><span class="p">.</span><span class="n">emplace</span><span class="p">([</span><span class="o">&</span><span class="p">](</span><span class="n">tf</span><span class="o">::</span><span class="n">cudaFlow</span><span class="o">&</span><span class="w"> </span><span class="n">cf</span><span class="p">){</span>
|
|
|
|
<span class="w"> </span><span class="n">tf</span><span class="o">::</span><span class="n">cudaTask</span><span class="w"> </span><span class="n">my_kernel</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cf</span><span class="p">.</span><span class="n">kernel</span><span class="p">(</span><span class="n">my_arguments</span><span class="p">);</span>
|
|
|
|
<span class="w"> </span><span class="c1">// create a flow capturer to capture custom kernels</span>
|
|
<span class="w"> </span><span class="n">tf</span><span class="o">::</span><span class="n">cudaTask</span><span class="w"> </span><span class="n">my_subflow</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cf</span><span class="p">.</span><span class="n">capture</span><span class="p">([</span><span class="o">&</span><span class="p">](</span><span class="n">tf</span><span class="o">::</span><span class="n">cudaFlowCapturer</span><span class="o">&</span><span class="w"> </span><span class="n">capturer</span><span class="p">){</span>
|
|
<span class="w"> </span><span class="n">capturer</span><span class="p">.</span><span class="n">on</span><span class="p">([</span><span class="o">&</span><span class="p">](</span><span class="n">cudaStream_t</span><span class="w"> </span><span class="n">stream</span><span class="p">){</span>
|
|
<span class="w"> </span><span class="n">invoke_custom_kernel_with_stream</span><span class="p">(</span><span class="n">stream</span><span class="p">,</span><span class="w"> </span><span class="n">custom_arguments</span><span class="p">);</span>
|
|
<span class="w"> </span><span class="p">});</span>
|
|
<span class="w"> </span><span class="p">});</span>
|
|
|
|
<span class="w"> </span><span class="n">my_kernel</span><span class="p">.</span><span class="n">precede</span><span class="p">(</span><span class="n">my_subflow</span><span class="p">);</span>
|
|
<span class="p">});</span></pre>
|
|
</div></section>
|
|
<section class="m-doc-details" id="aa0f182dc0fa99bcc9118311925fddca5"><div>
|
|
<h3>
|
|
<div class="m-doc-template">
|
|
template<typename C>
|
|
</div>
|
|
<span class="m-doc-wrap-bumper">void tf::<wbr />cudaFlow::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#aa0f182dc0fa99bcc9118311925fddca5" class="m-doc-self">capture</a>(</span><span class="m-doc-wrap"><a href="classtf_1_1cudaTask.html" class="m-doc">cudaTask</a> task,
|
|
C callable)</span></span>
|
|
</h3>
|
|
<p>updates the captured child graph</p>
|
|
<p>The method is similar to <a href="#a89c389fff64a16e5dd8c60875d3b514d" class="m-doc">tf::<wbr />cudaFlow::<wbr />capture</a> but operates on a task of type <a href="namespacetf.html#afebc56ae6d5765010d0dd13a5f04132ea46be697979903d784a70aeec45eb14ad" class="m-doc">tf::<wbr />cudaTaskType::<wbr />SUBFLOW</a>. The new captured graph must be topologically identical to the original captured graph.</p>
</div></section>
</section>
</div>
</div>
</div>
</article></main>
<div class="m-doc-search" id="search">
<a href="#!" onclick="return hideSearch()"></a>
<div class="m-container">
<div class="m-row">
<div class="m-col-m-8 m-push-m-2">
<div class="m-doc-search-header m-text m-small">
<div><span class="m-label m-default">Tab</span> / <span class="m-label m-default">T</span> to search, <span class="m-label m-default">Esc</span> to close</div>
<div id="search-symbolcount">…</div>
</div>
<div class="m-doc-search-content">
<form>
<input type="search" name="q" id="search-input" placeholder="Loading …" disabled="disabled" autofocus="autofocus" autocomplete="off" spellcheck="false" />
</form>
<noscript class="m-text m-danger m-text-center">Unlike everything else in the docs, the search functionality <em>requires</em> JavaScript.</noscript>
<div id="search-help" class="m-text m-dim m-text-center">
<p class="m-noindent">Search for symbols, directories, files, pages or
modules. You can omit any prefix from the symbol or file path; adding a
<code>:</code> or <code>/</code> suffix lists all members of the given symbol or
directory.</p>
<p class="m-noindent">Use <span class="m-label m-dim">↓</span>
/ <span class="m-label m-dim">↑</span> to navigate through the list,
<span class="m-label m-dim">Enter</span> to go.
<span class="m-label m-dim">Tab</span> autocompletes common prefix, you can
|
|
copy a link to the result using <span class="m-label m-dim">⌘</span>
<span class="m-label m-dim">L</span> while <span class="m-label m-dim">⌘</span>
<span class="m-label m-dim">M</span> produces a Markdown link.</p>
</div>
<div id="search-notfound" class="m-text m-warning m-text-center">Sorry, nothing was found.</div>
<ul id="search-results"></ul>
</div>
</div>
</div>
</div>
</div>
<script src="search-v2.js"></script>
<script src="searchdata-v2.js" async="async"></script>
<footer><nav>
<div class="m-container">
<div class="m-row">
<div class="m-col-l-10 m-push-l-1">
<p>Taskflow handbook is part of the <a href="https://taskflow.github.io">Taskflow project</a>, copyright © <a href="https://tsung-wei-huang.github.io/">Dr. Tsung-Wei Huang</a>, 2018–2024.<br />Generated by <a href="https://doxygen.org/">Doxygen</a> 1.9.1 and <a href="https://mcss.mosra.cz/">m.css</a>.</p>
</div>
</div>
</div>
</nav></footer>
</body>
</html>