mesytec-mnode/external/taskflow-3.8.0/docs/xml/release-3-0-0.xml
2025-01-04 01:25:05 +01:00

315 lines
16 KiB
XML

<?xml version='1.0' encoding='UTF-8' standalone='no'?>
<doxygen xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="compound.xsd" version="1.9.1" xml:lang="en-US">
<compounddef id="release-3-0-0" kind="page">
<compoundname>release-3-0-0</compoundname>
<title>Release 3.0.0 (2021/01/01)</title>
<tableofcontents>
<tocsect>
<name>Download</name>
<reference>release-3-0-0_1release-3-0-0_download</reference>
</tocsect>
<tocsect>
<name>System Requirements</name>
<reference>release-3-0-0_1release-3-0-0_system_requirements</reference>
</tocsect>
<tocsect>
<name>Working Items</name>
<reference>release-3-0-0_1release-3-0-0_working_items</reference>
</tocsect>
<tocsect>
<name>New Features</name>
<reference>release-3-0-0_1release-3-0-0_new_features</reference>
<tableofcontents>
<tocsect>
<name>Taskflow Core</name>
<reference>release-3-0-0_1release-3-0-0_taskflow_core</reference>
</tocsect>
<tocsect>
<name>cudaFlow</name>
<reference>release-3-0-0_1release-3-0-0_cudaflow</reference>
</tocsect>
<tocsect>
<name>Utilities</name>
<reference>release-3-0-0_1release-3-0-0_utilities</reference>
</tocsect>
<tocsect>
<name>Taskflow Profiler (TFProf)</name>
<reference>release-3-0-0_1release-3-0-0_profiler</reference>
</tocsect>
</tableofcontents>
</tocsect>
<tocsect>
<name>New Algorithms</name>
<reference>release-3-0-0_1release-3-0-0_new_algorithms</reference>
<tableofcontents>
<tocsect>
<name>CPU Algorithms</name>
<reference>release-3-0-0_1release-3-0-0_cpu_algorithms</reference>
</tocsect>
<tocsect>
<name>GPU Algorithms</name>
<reference>release-3-0-0_1release-3-0-0_gpu_algorithms</reference>
</tocsect>
</tableofcontents>
</tocsect>
<tocsect>
<name>Bug Fixes</name>
<reference>release-3-0-0_1release-3-0-0_bug_fixes</reference>
</tocsect>
<tocsect>
<name>Breaking Changes</name>
<reference>release-3-0-0_1release-3-0-0_breaking_changes</reference>
</tocsect>
<tocsect>
<name>Deprecated and Removed Items</name>
<reference>release-3-0-0_1release-3-0-0_deprecated_items</reference>
</tocsect>
<tocsect>
<name>Documentation</name>
<reference>release-3-0-0_1release-3-0-0_documentation</reference>
</tocsect>
<tocsect>
<name>Miscellaneous Items</name>
<reference>release-3-0-0_1release-3-0-0_miscellaneous_items</reference>
</tocsect>
</tableofcontents>
<briefdescription>
</briefdescription>
<detaileddescription>
<para>Taskflow 3.0.0 is the 1st release in the 3.x line! This release includes several new changes such as CPU-GPU tasking, algorithm collection, enhanced web-based profiler, documentation, and unit tests.</para>
<para><simplesect kind="note"><para>Starting from v3, we have migrated the codebase to the <ulink url="https://en.wikipedia.org/wiki/C%2B%2B17">C++17</ulink> standard to largely improve the expressivity and efficiency of the codebase.</para>
</simplesect>
</para>
<sect1 id="release-3-0-0_1release-3-0-0_download">
<title>Download</title>
<para>Taskflow 3.0.0 can be downloaded from <ulink url="https://github.com/taskflow/taskflow/releases/tag/v3.0.0">here</ulink>.</para>
</sect1>
<sect1 id="release-3-0-0_1release-3-0-0_system_requirements">
<title>System Requirements</title>
<para>To use Taskflow v3.0.0, you need a compiler that supports C++17:</para>
<para><itemizedlist>
<listitem><para>GNU C++ Compiler at least v7.0 with -std=c++17 </para>
</listitem>
<listitem><para>Clang C++ Compiler at least v6.0 with -std=c++17 </para>
</listitem>
<listitem><para>Microsoft Visual Studio at least v19.27 with /std:c++17 </para>
</listitem>
<listitem><para>AppleClang Xcode Version at least v12.0 with -std=c++17 </para>
</listitem>
<listitem><para>Nvidia CUDA Toolkit and Compiler (nvcc) at least v11.1 with -std=c++17 </para>
</listitem>
<listitem><para>Intel C++ Compiler at least v19.0.1 with -std=c++17</para>
</listitem>
</itemizedlist>
Taskflow works on Linux, Windows, and Mac OS X.</para>
</sect1>
<sect1 id="release-3-0-0_1release-3-0-0_working_items">
<title>Working Items</title>
<para><itemizedlist>
<listitem><para>enhancing the taskflow profiler (<ulink url="https://github.com/taskflow/tfprof">TFProf</ulink>) </para>
</listitem>
<listitem><para>adding methods for updating <ref refid="classtf_1_1cudaFlow" kindref="compound">tf::cudaFlow</ref> (with unit tests) </para>
</listitem>
<listitem><para>adding support for <ulink url="https://docs.nvidia.com/cuda/cublas/index.html">cuBLAS</ulink> </para>
</listitem>
<listitem><para>adding support for <ulink url="https://developer.nvidia.com/cudnn">cuDNN</ulink> </para>
</listitem>
<listitem><para>adding support for SYCL (ComputeCpp and DPC++)</para>
</listitem>
</itemizedlist>
</para>
</sect1>
<sect1 id="release-3-0-0_1release-3-0-0_new_features">
<title>New Features</title>
<sect2 id="release-3-0-0_1release-3-0-0_taskflow_core">
<title>Taskflow Core</title>
<para><itemizedlist>
<listitem><para>replaced all non-standard libraries with C++17 STL (e.g., <ulink url="https://en.cppreference.com/w/cpp/utility/optional">std::optional</ulink>, <ulink url="https://en.cppreference.com/w/cpp/utility/variant">std::variant</ulink>) </para>
</listitem>
<listitem><para>added <ref refid="classtf_1_1WorkerView" kindref="compound">tf::WorkerView</ref> for users to observe the running works of tasks </para>
</listitem>
<listitem><para>added asynchronous tasking (see <ref refid="AsyncTasking" kindref="compound">Asynchronous Tasking</ref>) </para>
</listitem>
<listitem><para>modified <ref refid="classtf_1_1ObserverInterface_1a8225fcacb03089677a1efc4b16b734cc" kindref="member">tf::ObserverInterface::on_entry</ref> and <ref refid="classtf_1_1ObserverInterface_1aa22f5378154653f08d9a58326bda4754" kindref="member">tf::ObserverInterface::on_exit</ref> to take <ref refid="classtf_1_1WorkerView" kindref="compound">tf::WorkerView</ref> </para>
</listitem>
<listitem><para>added a custom graph interface to support dynamic polymorphism for tf::cudaGraph </para>
</listitem>
<listitem><para>supported separate compilations between Taskflow and CUDA (see <ref refid="CompileTaskflowWithCUDA" kindref="compound">Compile Taskflow with CUDA</ref>) </para>
</listitem>
<listitem><para>added <ref refid="classtf_1_1Semaphore" kindref="compound">tf::Semaphore</ref> and tf::CriticalSection to limit the maximum concurrency </para>
</listitem>
<listitem><para>added <ref refid="classtf_1_1Future" kindref="compound">tf::Future</ref> to support cancellation of submitted tasks (see <ref refid="RequestCancellation" kindref="compound">Request Cancellation</ref>)</para>
</listitem>
</itemizedlist>
</para>
</sect2>
<sect2 id="release-3-0-0_1release-3-0-0_cudaflow">
<title>cudaFlow</title>
<para><itemizedlist>
<listitem><para>added <ref refid="classtf_1_1cudaFlowCapturer" kindref="compound">tf::cudaFlowCapturer</ref> for building a cudaFlow through stream capture (see <ref refid="GPUTaskingcudaFlowCapturer" kindref="compound">GPU Tasking (cudaFlowCapturer)</ref>) </para>
</listitem>
<listitem><para>added tf::cudaFlowCapturerBase for creating custom capturers </para>
</listitem>
<listitem><para>added <ref refid="classtf_1_1cudaFlow_1a89c389fff64a16e5dd8c60875d3b514d" kindref="member">tf::cudaFlow::capture</ref> for capturing a cudaFlow within a parent cudaFlow </para>
</listitem>
<listitem><para>added tf::Taskflow::emplace_on to place a cudaFlow on a GPU </para>
</listitem>
<listitem><para>added <ref refid="classtf_1_1cudaFlow_1a7f97b68fa7c889db49b26aa71a46a7cf" kindref="member">tf::cudaFlow::dump</ref> and <ref refid="classtf_1_1cudaFlowCapturer_1a90d1265bcc27647906bed6e6876c9aa7" kindref="member">tf::cudaFlowCapturer::dump</ref> to visualize cudaFlow </para>
</listitem>
<listitem><para>added tf::cudaFlow::offload and update methods to run and update a cudaFlow explicitly </para>
</listitem>
<listitem><para>supported standalone cudaFlow </para>
</listitem>
<listitem><para>supported standalone cudaFlowCapturer </para>
</listitem>
<listitem><para>added tf::cublasFlowCapturer to support <ulink url="https://docs.nvidia.com/cuda/cublas/index.html">cuBLAS</ulink> (see LinearAlgebracublasFlowCapturer)</para>
</listitem>
</itemizedlist>
</para>
</sect2>
<sect2 id="release-3-0-0_1release-3-0-0_utilities">
<title>Utilities</title>
<para><itemizedlist>
<listitem><para>added utility functions to grab the cuda device properties (see <ref refid="cuda__device_8hpp" kindref="compound">cuda_device.hpp</ref>) </para>
</listitem>
<listitem><para>added utility functions to control cuda memory (see <ref refid="cuda__memory_8hpp" kindref="compound">cuda_memory.hpp</ref>) </para>
</listitem>
<listitem><para>added utility functions for common mathematics operations </para>
</listitem>
<listitem><para>added serializer and deserializer libraries to support tfprof </para>
</listitem>
<listitem><para>added per-thread pool for CUDA streams to improve performance</para>
</listitem>
</itemizedlist>
</para>
</sect2>
<sect2 id="release-3-0-0_1release-3-0-0_profiler">
<title>Taskflow Profiler (TFProf)</title>
<para><itemizedlist>
<listitem><para>added visualization for asynchronous tasks </para>
</listitem>
<listitem><para>added server-based profiler to support large profiling data (see <ref refid="Profiler" kindref="compound">Profile Taskflow Programs</ref>)</para>
</listitem>
</itemizedlist>
</para>
</sect2>
</sect1>
<sect1 id="release-3-0-0_1release-3-0-0_new_algorithms">
<title>New Algorithms</title>
<sect2 id="release-3-0-0_1release-3-0-0_cpu_algorithms">
<title>CPU Algorithms</title>
<para><itemizedlist>
<listitem><para>added parallel sort (see <ref refid="ParallelSort" kindref="compound">Parallel Sort</ref>)</para>
</listitem>
</itemizedlist>
</para>
</sect2>
<sect2 id="release-3-0-0_1release-3-0-0_gpu_algorithms">
<title>GPU Algorithms</title>
<para><itemizedlist>
<listitem><para>added single task (see <ref refid="SingleTaskCUDA" kindref="compound">Single Task</ref>) </para>
</listitem>
<listitem><para>added parallel iterations (see <ref refid="ForEachCUDA" kindref="compound">Parallel Iterations</ref>) </para>
</listitem>
<listitem><para>added parallel transforms </para>
</listitem>
<listitem><para>added parallel reduction</para>
</listitem>
</itemizedlist>
</para>
</sect2>
</sect1>
<sect1 id="release-3-0-0_1release-3-0-0_bug_fixes">
<title>Bug Fixes</title>
<para><itemizedlist>
<listitem><para>fixed the bug in stream capturing (need to use <computeroutput>ThreadLocal</computeroutput> mode) </para>
</listitem>
<listitem><para>fixed the bug in reporting wrong worker ids when compiling a shared library due to the use of <computeroutput>thread_local</computeroutput> (now with C++17 <computeroutput>inline</computeroutput> variable)</para>
</listitem>
</itemizedlist>
</para>
</sect1>
<sect1 id="release-3-0-0_1release-3-0-0_breaking_changes">
<title>Breaking Changes</title>
<para><itemizedlist>
<listitem><para>changed the returned values of asynchronous tasks to be <ulink url="https://en.cppreference.com/w/cpp/utility/optional">std::optional</ulink> in order to support cancellation (see <ref refid="AsyncTasking" kindref="compound">Asynchronous Tasking</ref> and <ref refid="RequestCancellation" kindref="compound">Request Cancellation</ref>)</para>
</listitem>
</itemizedlist>
</para>
</sect1>
<sect1 id="release-3-0-0_1release-3-0-0_deprecated_items">
<title>Deprecated and Removed Items</title>
<para><itemizedlist>
<listitem><para>removed tf::cudaFlow::device; users may call tf::Taskflow::emplace_on to associate a cudaflow with a GPU device </para>
</listitem>
<listitem><para>removed tf::cudaFlow::join, use tf::cudaFlow::offload instead </para>
</listitem>
<listitem><para>removed the legacy tf::Framework </para>
</listitem>
<listitem><para>removed external mutable use of <ref refid="classtf_1_1TaskView" kindref="compound">tf::TaskView</ref></para>
</listitem>
</itemizedlist>
</para>
</sect1>
<sect1 id="release-3-0-0_1release-3-0-0_documentation">
<title>Documentation</title>
<para><itemizedlist>
<listitem><para>added <ref refid="CompileTaskflowWithCUDA" kindref="compound">Compile Taskflow with CUDA</ref> </para>
</listitem>
<listitem><para>added <ref refid="BenchmarkTaskflow" kindref="compound">Benchmark Taskflow</ref> </para>
</listitem>
<listitem><para>added <ref refid="LimitTheMaximumConcurrency" kindref="compound">Limit the Maximum Concurrency</ref> </para>
</listitem>
<listitem><para>added <ref refid="AsyncTasking" kindref="compound">Asynchronous Tasking</ref> </para>
</listitem>
<listitem><para>added <ref refid="GPUTaskingcudaFlowCapturer" kindref="compound">GPU Tasking (cudaFlowCapturer)</ref> </para>
</listitem>
<listitem><para>added <ref refid="RequestCancellation" kindref="compound">Request Cancellation</ref> </para>
</listitem>
<listitem><para>added <ref refid="Profiler" kindref="compound">Profile Taskflow Programs</ref> </para>
</listitem>
<listitem><para>added <ref refid="cudaFlowAlgorithms" kindref="compound">cudaFlow Algorithms</ref><itemizedlist>
<listitem><para><ref refid="SingleTaskCUDA" kindref="compound">Single Task</ref> to run a kernel function in just a single thread</para>
</listitem><listitem><para><ref refid="ForEachCUDA" kindref="compound">Parallel Iterations</ref> to perform parallel iterations over a range of items</para>
</listitem><listitem><para><ref refid="ParallelTransformsCUDA" kindref="compound">Parallel Transforms</ref> to perform parallel transforms over a range of items </para>
</listitem></itemizedlist>
</para>
</listitem>
<listitem><para>added <ref refid="Governance" kindref="compound">Governance</ref><itemizedlist>
<listitem><para><ref refid="rules" kindref="compound">Rules</ref></para>
</listitem><listitem><para><ref refid="team" kindref="compound">Team</ref></para>
</listitem><listitem><para><ref refid="codeofconduct" kindref="compound">Code of Conduct</ref> </para>
</listitem></itemizedlist>
</para>
</listitem>
<listitem><para>added <ref refid="Contributing" kindref="compound">Contributing</ref><itemizedlist>
<listitem><para><ref refid="guidelines" kindref="compound">Guidelines</ref></para>
</listitem><listitem><para><ref refid="contributors" kindref="compound">Contributors</ref> </para>
</listitem></itemizedlist>
</para>
</listitem>
<listitem><para>revised <ref refid="ConditionalTasking" kindref="compound">Conditional Tasking</ref> </para>
</listitem>
<listitem><para>revised documentation pages for files</para>
</listitem>
</itemizedlist>
</para>
</sect1>
<sect1 id="release-3-0-0_1release-3-0-0_miscellaneous_items">
<title>Miscellaneous Items</title>
<para>We have presented Taskflow in the following C++ venues with recorded videos:<itemizedlist>
<listitem><para><ulink url="https://www.youtube.com/watch?v=MX15huP5DsM">2020 CppCon Taskflow Talk</ulink></para>
</listitem><listitem><para><ulink url="https://www.youtube.com/watch?v=u8Mc_WgGwVY">2020 MUC++ Taskflow Talk</ulink></para>
</listitem></itemizedlist>
</para>
<para>We have published Taskflow in the following conferences and journals:<itemizedlist>
<listitem><para>Tsung-Wei Huang, "<ulink url="iccad20.pdf">A General-purpose Parallel and Heterogeneous Task Programming System for VLSI CAD</ulink>," <emphasis>IEEE/ACM International Conference on Computer-aided Design (ICCAD)</emphasis>, CA, 2020</para>
</listitem><listitem><para>Chun-Xun Lin, Tsung-Wei Huang, and Martin Wong, "<ulink url="icpads20.pdf">An Efficient Work-Stealing Scheduler for Task Dependency Graph</ulink>," <emphasis>IEEE International Conference on Parallel and Distributed Systems (ICPADS)</emphasis>, Hong Kong, 2020</para>
</listitem><listitem><para>Tsung-Wei Huang, Dian-Lun Lin, Yibo Lin, and Chun-Xun Lin, "Cpp-Taskflow: A General-purpose Parallel Task Programming System at Scale," <emphasis>IEEE Transactions on Computer-aided Design of Integrated Circuits and Systems (TCAD)</emphasis>, to appear, 2020 </para>
</listitem></itemizedlist>
</para>
</sect1>
</detaileddescription>
<location file="doxygen/releases/release-3.0.0.dox"/>
</compounddef>
</doxygen>