Release 3.2.0 (2021/07/29)

Taskflow 3.2.0 is the 3rd release in the 3.x line! This release includes several new changes such as CPU-GPU tasking, algorithm collection, enhanced web-based profiler, documentation, and unit tests.

Download

Taskflow 3.2.0 can be downloaded from here.

System Requirements

To use Taskflow v3.2.0, you need a compiler that supports C++17:

- GNU C++ Compiler at least v8.4 with -std=c++17
- Clang C++ Compiler at least v6.0 with -std=c++17
- Microsoft Visual Studio at least v19.27 with /std:c++17
- AppleClang Xcode Version at least v12.0 with -std=c++17
- Nvidia CUDA Toolkit and Compiler (nvcc) at least v11.1 with -std=c++17
- Intel C++ Compiler at least v19.0.1 with -std=c++17
- Intel DPC++ Clang Compiler at least v13.0.0 with -std=c++17 and SYCL20

Taskflow works on Linux, Windows, and Mac OS X.
Working Items

- enhancing support for SYCL with Intel DPC++
- enhancing parallel CPU and GPU algorithms
- designing pipeline interface and its scheduling algorithms

New Features

Taskflow Core

- added tf::SmallVector to optimize the dependency storage in a graph
- added move constructor and move assignment operator for tf::Taskflow
  - tf::Taskflow::Taskflow(Taskflow&&)
  - tf::Taskflow::operator=(Taskflow&&)
- added moved run in tf::Executor to automatically manage a taskflow's lifetime
  - tf::Executor::run(Taskflow&&)
  - tf::Executor::run(Taskflow&&, C&&)
  - tf::Executor::run_n(Taskflow&&, size_t)
  - tf::Executor::run_n(Taskflow&&, size_t, C&&)
  - tf::Executor::run_until(Taskflow&&, P&&)
  - tf::Executor::run_until(Taskflow&&, P&&, C&&)

cudaFlow

- improved the execution flow of tf::cudaFlowCapturer when updates involve
- New algorithms in tf::cudaFlow and tf::cudaFlowCapturer:
  - added tf::cudaFlow::reduce
  - added tf::cudaFlow::transform_reduce
  - added tf::cudaFlow::uninitialized_reduce
  - added tf::cudaFlow::transform_uninitialized_reduce
  - added tf::cudaFlow::inclusive_scan
  - added tf::cudaFlow::exclusive_scan
  - added tf::cudaFlow::transform_inclusive_scan
  - added tf::cudaFlow::transform_exclusive_scan
  - added tf::cudaFlow::merge
  - added tf::cudaFlow::merge_by_key
  - added tf::cudaFlow::sort
  - added tf::cudaFlow::sort_by_key
  - added tf::cudaFlow::find_if
  - added tf::cudaFlow::min_element
  - added tf::cudaFlow::max_element
  - added tf::cudaFlowCapturer::reduce
  - added tf::cudaFlowCapturer::transform_reduce
  - added tf::cudaFlowCapturer::uninitialized_reduce
  - added tf::cudaFlowCapturer::transform_uninitialized_reduce
  - added tf::cudaFlowCapturer::inclusive_scan
  - added tf::cudaFlowCapturer::exclusive_scan
  - added tf::cudaFlowCapturer::transform_inclusive_scan
  - added tf::cudaFlowCapturer::transform_exclusive_scan
  - added tf::cudaFlowCapturer::merge
  - added tf::cudaFlowCapturer::merge_by_key
  - added tf::cudaFlowCapturer::sort
  - added tf::cudaFlowCapturer::sort_by_key
  - added tf::cudaFlowCapturer::find_if
  - added tf::cudaFlowCapturer::min_element
  - added tf::cudaFlowCapturer::max_element
  - added tf::cudaLinearCapturing

syclFlow

CUDA Standard Parallel Algorithms

- added tf::cuda_for_each
- added tf::cuda_for_each_index
- added tf::cuda_transform
- added tf::cuda_reduce
- added tf::cuda_uninitialized_reduce
- added tf::cuda_transform_reduce
- added tf::cuda_transform_uninitialized_reduce
- added tf::cuda_inclusive_scan
- added tf::cuda_exclusive_scan
- added tf::cuda_transform_inclusive_scan
- added tf::cuda_transform_exclusive_scan
- added tf::cuda_merge
- added tf::cuda_merge_by_key
- added tf::cuda_sort
- added tf::cuda_sort_by_key
- added tf::cuda_find_if
- added tf::cuda_min_element
- added tf::cuda_max_element

Utilities

- added CUDA meta programming
- added SYCL meta programming

Taskflow Profiler (TFProf)

Bug Fixes

- fixed compilation errors in constructing tf::cudaRoundRobinCapturing
- fixed compilation errors of TLS worker pointer in tf::Executor
- fixed compilation errors of nvcc v11.3 in auto template deduction
  - std::scoped_lock
  - tf::Serializer and tf::Deserializer
- fixed memory leak when moving a tf::Taskflow

Breaking Changes

There are no breaking changes in this release.

Deprecated and Removed Items

- removed tf::cudaFlow::kernel_on method
- removed explicit partitions in parallel iterations and reductions
- removed tf::cudaFlowCapturerBase
- removed tf::cublasFlowCapturer
- renamed update and rebind methods in tf::cudaFlow and tf::cudaFlowCapturer to overloads

Documentation

- revised Static Tasking
  - Move a Taskflow
- revised Executor
  - Execute a Taskflow with Transferred Ownership
- added cudaFlow Algorithms
- added CUDA Standard Algorithms
  - Execution Policy
  - Parallel Reduction
  - Parallel Scan
  - Parallel Merge
  - Parallel Find

Miscellaneous Items

We have published tf::cudaFlow in the following conference:

- Dian-Lun Lin and Tsung-Wei Huang, "Efficient GPU Computation using Task Graph Parallelism," European Conference on Parallel and Distributed Computing (EuroPar), 2021