namespace tf { /** @page release-3-0-0 Release 3.0.0 (2021/01/01) %Taskflow 3.0.0 is the 1st release in the 3.x line! This release includes several new changes such as CPU-GPU tasking, algorithm collection, enhanced web-based profiler, documentation, and unit tests. @note Starting from v3, we have migrated the codebase to the @CPP17 standard to largely improve the expressivity and efficiency of the codebase. @tableofcontents @section release-3-0-0_download Download %Taskflow 3.0.0 can be downloaded from here. @section release-3-0-0_system_requirements System Requirements To use %Taskflow v3.0.0, you need a compiler that supports C++17: @li GNU C++ Compiler at least v7.0 with -std=c++17 @li Clang C++ Compiler at least v6.0 with -std=c++17 @li Microsoft Visual Studio at least v19.27 with /std:c++17 @li AppleClang Xcode Version at least v12.0 with -std=c++17 @li Nvidia CUDA Toolkit and Compiler (nvcc) at least v11.1 with -std=c++17 @li Intel C++ Compiler at least v19.0.1 with -std=c++17 %Taskflow works on Linux, Windows, and Mac OS X. @section release-3-0-0_working_items Working Items @li enhancing the taskflow profiler (TFProf) @li adding methods for updating tf::cudaFlow (with unit tests) @li adding support for @cuBLAS @li adding support for @cuDNN @li adding support for SYCL (ComputeCpp and DPC++) @section release-3-0-0_new_features New Features @subsection release-3-0-0_taskflow_core Taskflow Core @li replaced all non-standard libraries with C++17 STL (e.g., @std_optional, @std_variant) @li added tf::WorkerView for users to observe the running works of tasks @li added asynchronous tasking (see @ref AsyncTasking) @li modified tf::ObserverInterface::on_entry and tf::ObserverInterface::on_exit to take tf::WorkerView @li added a custom graph interface to support dynamic polymorphism for tf::cudaGraph @li supported separate compilations between %Taskflow and CUDA (see @ref CompileTaskflowWithCUDA) @li added tf::Semaphore and tf::CriticalSection to limit the maximum concurrency @li added tf::Future to support cancellation of submitted tasks (see @ref RequestCancellation) @subsection release-3-0-0_cudaflow cudaFlow @li added tf::cudaFlowCapturer for building a %cudaFlow through stream capture (see @ref GPUTaskingcudaFlowCapturer) @li added tf::cudaFlowCapturerBase for creating custom capturers @li added tf::cudaFlow::capture for capturing a %cudaFlow within a parent %cudaFlow @li added tf::Taskflow::emplace_on to place a %cudaFlow on a GPU @li added tf::cudaFlow::dump and tf::cudaFlowCapturer::dump to visualize %cudaFlow @li added tf::cudaFlow::offload and update methods to run and update a %cudaFlow explicitly @li supported standalone %cudaFlow @li supported standalone %cudaFlowCapturer @li added tf::cublasFlowCapturer to support @cuBLAS (see LinearAlgebracublasFlowCapturer) @subsection release-3-0-0_utilities Utilities @li added utility functions to grab the cuda device properties (see cuda_device.hpp) @li added utility functions to control cuda memory (see cuda_memory.hpp) @li added utility functions for common mathematics operations @li added serializer and deserializer libraries to support tfprof @li added per-thread pool for CUDA streams to improve performance @subsection release-3-0-0_profiler Taskflow Profiler (TFProf) @li added visualization for asynchronous tasks @li added server-based profiler to support large profiling data (see @ref Profiler) @section release-3-0-0_new_algorithms New Algorithms @subsection release-3-0-0_cpu_algorithms CPU Algorithms @li added parallel sort (see @ref ParallelSort) @subsection release-3-0-0_gpu_algorithms GPU Algorithms @li added single task (see @ref SingleTaskCUDA) @li added parallel iterations (see @ref ForEachCUDA) @li added parallel transforms @li added parallel reduction @section release-3-0-0_bug_fixes Bug Fixes @li fixed the bug in stream capturing (need to use @c ThreadLocal mode) @li fixed the bug in reporting wrong worker ids when compiling a shared library due to the use of @c thread_local (now with C++17 @c inline variable) @section release-3-0-0_breaking_changes Breaking Changes @li changed the returned values of asynchronous tasks to be @std_optional in order to support cancellation (see @ref AsyncTasking and @ref RequestCancellation) @section release-3-0-0_deprecated_items Deprecated and Removed Items @li removed tf::cudaFlow::device; users may call tf::Taskflow::emplace_on to associate a cudaflow with a GPU device @li removed tf::cudaFlow::join, use tf::cudaFlow::offload instead @li removed the legacy tf::Framework @li removed external mutable use of tf::TaskView @section release-3-0-0_documentation Documentation @li added @ref CompileTaskflowWithCUDA @li added @ref BenchmarkTaskflow @li added @ref LimitTheMaximumConcurrency @li added @ref AsyncTasking @li added @ref GPUTaskingcudaFlowCapturer @li added @ref RequestCancellation @li added @ref Profiler @li added @ref cudaFlowAlgorithms + @ref SingleTaskCUDA to run a kernel function in just a single thread + @ref ForEachCUDA to perform parallel iterations over a range of items + @ref ParallelTransformsCUDA to perform parallel transforms over a range of items @li added @ref Governance + @ref rules + @ref team + @ref codeofconduct @li added @ref Contributing + @ref guidelines + @ref contributors @li revised @ref ConditionalTasking @li revised documentation pages for files @section release-3-0-0_miscellaneous_items Miscellaneous Items We have presented %Taskflow in the following C++ venues with recorded videos: + @CppCon20Talk + @MUCpp20Talk We have published %Taskflow in the following conferences and journals: + Tsung-Wei Huang, "[A General-purpose Parallel and Heterogeneous Task Programming System for VLSI CAD](iccad20.pdf)," IEEE/ACM International Conference on Computer-aided Design (ICCAD), CA, 2020 + Chun-Xun Lin, Tsung-Wei Huang, and Martin Wong, "[An Efficient Work-Stealing Scheduler for Task Dependency Graph](icpads20.pdf)," IEEE International Conference on Parallel and Distributed Systems (ICPADS), Hong Kong, 2020 + Tsung-Wei Huang, Dian-Lun Lin, Yibo Lin, and Chun-Xun Lin, "Cpp-Taskflow: A General-purpose Parallel %Task Programming System at Scale," IEEE Transactions on Computer-aided Design of Integrated Circuits and Systems (TCAD), to appear, 2020 */ }