namespace tf {

/** @page release-3-6-0 Release 3.6.0 (2023/05/07)

%Taskflow 3.6.0 is the 7th release in the 3.x line!
This release introduces several changes, such as dynamic task graph parallelism,
improved parallel algorithms, a modified GPU tasking interface, and new documentation,
examples, and unit tests.

@tableofcontents

@section release-3-6-0_download Download

%Taskflow 3.6.0 can be downloaded from <a href="https://github.com/taskflow/taskflow/releases/tag/v3.6.0">here</a>.

@section release-3-6-0_system_requirements System Requirements

To use %Taskflow v3.6.0, you need a compiler that supports C++17:

@li GNU C++ Compiler at least v8.4 with -std=c++17
@li Clang C++ Compiler at least v6.0 with -std=c++17
@li Microsoft Visual Studio at least v19.27 with /std:c++17
@li AppleClang Xcode Version at least v12.0 with -std=c++17
@li Nvidia CUDA Toolkit and Compiler (nvcc) at least v11.1 with -std=c++17
@li Intel C++ Compiler at least v19.0.1 with -std=c++17
@li Intel DPC++ Clang Compiler at least v13.0.0 with -std=c++17 and SYCL20

%Taskflow works on Linux, Windows, and Mac OS X.
@section release-3-6-0_summary Release Summary

This release contains several changes that largely enhance the programmability
of GPU tasking and standard parallel algorithms.
More importantly, we have introduced a new dependent asynchronous tasking
model that offers great flexibility for expressing dynamic task graph parallelism.

@section release-3-6-0_new_features New Features

@subsection release-3-6-0_taskflow_core Taskflow Core

+ Added new async methods to support dynamic task graph creation (see the first sketch after this list)
  + tf::Executor::dependent_async(F&& func, Tasks&&... tasks)
  + tf::Executor::dependent_async(F&& func, I first, I last)
  + tf::Executor::silent_dependent_async(F&& func, Tasks&&... tasks)
  + tf::Executor::silent_dependent_async(F&& func, I first, I last)
+ Added new async and join methods to tf::Runtime
  + tf::Runtime::async
  + tf::Runtime::silent_async
  + tf::Runtime::corun_all
+ Added a new partitioner interface to optimize parallel algorithms (see the second sketch after this list)
  + tf::GuidedPartitioner
  + tf::StaticPartitioner
  + tf::DynamicPartitioner
  + tf::RandomPartitioner
+ Added parallel-scan algorithms to %Taskflow
  + tf::Taskflow::inclusive_scan(B first, E last, D d_first, BOP bop)
  + tf::Taskflow::inclusive_scan(B first, E last, D d_first, BOP bop, T init)
  + tf::Taskflow::transform_inclusive_scan(B first, E last, D d_first, BOP bop, UOP uop)
  + tf::Taskflow::transform_inclusive_scan(B first, E last, D d_first, BOP bop, UOP uop, T init)
  + tf::Taskflow::exclusive_scan(B first, E last, D d_first, T init, BOP bop)
  + tf::Taskflow::transform_exclusive_scan(B first, E last, D d_first, T init, BOP bop, UOP uop)
+ Added parallel-find algorithms to %Taskflow
  + tf::Taskflow::find_if(B first, E last, T& result, UOP predicate, P&& part)
  + tf::Taskflow::find_if_not(B first, E last, T& result, UOP predicate, P&& part)
  + tf::Taskflow::min_element(B first, E last, T& result, C comp, P&& part)
  + tf::Taskflow::max_element(B first, E last, T& result, C comp, P&& part)
+ Modified tf::Subflow to be a derived class of tf::Runtime
+ Extended parallel algorithms to support different partitioning algorithms
  + tf::Taskflow::for_each_index(B first, E last, S step, C callable, P&& part)
  + tf::Taskflow::for_each(B first, E last, C callable, P&& part)
  + tf::Taskflow::transform(B first1, E last1, O d_first, C c, P&& part)
  + tf::Taskflow::transform(B1 first1, E1 last1, B2 first2, O d_first, C c, P&& part)
  + tf::Taskflow::reduce(B first, E last, T& result, O bop, P&& part)
  + tf::Taskflow::transform_reduce(B first, E last, T& result, BOP bop, UOP uop, P&& part)
+ Improved the performance of tf::Taskflow::sort for plain-old-data (POD) types
+ Extended the task-parallel pipeline to handle token dependencies
  + @ref TaskParallelPipelineWithTokenDependencies
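
For reference, below is a minimal sketch of the dependent-async interface
(task and variable names are illustrative):

@code{.cpp}
tf::Executor executor;

// build a dynamic task graph: A runs before B and C, which both run before D
tf::AsyncTask A = executor.silent_dependent_async([](){ std::cout << "A\n"; });
tf::AsyncTask B = executor.silent_dependent_async([](){ std::cout << "B\n"; }, A);
tf::AsyncTask C = executor.silent_dependent_async([](){ std::cout << "C\n"; }, A);
auto [D, fu_D]  = executor.dependent_async([](){ std::cout << "D\n"; }, B, C);

// wait for D (and thus the whole graph) to finish
fu_D.get();
@endcode

The second sketch shows how a partitioner plugs into an existing parallel algorithm
(the data and callable are illustrative; see @ref PartitioningAlgorithm for details):

@code{.cpp}
tf::Executor executor;
tf::Taskflow taskflow;
std::vector<int> data(100000);

// run a parallel iteration using a static partitioning algorithm
tf::StaticPartitioner partitioner;
taskflow.for_each(data.begin(), data.end(), [](int& i){ i = 1; }, partitioner);

executor.run(taskflow).wait();
@endcode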
@subsection release-3-6-0_cudaflow cudaFlow

+ Removed algorithms that require a buffer from tf::cudaFlow due to update limitations
+ Removed support for a dedicated %cudaFlow task in %Taskflow
+ All usage of tf::cudaFlow and tf::cudaFlowCapturer is now standalone

@subsection release-3-6-0_utilities Utilities

+ Added all_same templates to check if a parameter pack has the same type
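
A minimal sketch of how such a same-type check is typically used; the exact spelling
of the trait is an assumption based on common conventions and may differ from the
internal utility header:

@code{.cpp}
// assumption: a variadic trait tf::all_same<T, Ts...> with a companion
// tf::all_same_v variable template; verify against taskflow/utility/traits.hpp
static_assert(tf::all_same_v<tf::AsyncTask, tf::AsyncTask, tf::AsyncTask>);
static_assert(!tf::all_same_v<tf::AsyncTask, int>);
@endcode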
@subsection release-3-6-0_profiler Taskflow Profiler (TFProf)

+ Removed %cudaFlow and %syclFlow tasks

@section release-3-6-0_bug_fixes Bug Fixes

+ Fixed the compilation error caused by `MAX_PRIORITY` clashing with `winspool.h` ([#459](https://github.com/taskflow/taskflow/pull/459))
+ Fixed the compilation error caused by tf::TaskView::for_each_successor and tf::TaskView::for_each_dependent
+ Fixed the infinite-loop bug when corunning a module task from tf::Runtime

If you encounter any potential bugs, please submit an issue at @IssueTracker.

@section release-3-6-0_breaking_changes Breaking Changes

+ Dropped support for cancelling asynchronous tasks

@code{.cpp}
// previous - no longer supported
tf::Future<int> fu = executor.async([](){
  return 1;
});
fu.cancel();
std::optional<int> res = fu.get();  // res may be std::nullopt or 1

// now - use std::future instead
std::future<int> fu = executor.async([](){
  return 1;
});
int res = fu.get();
@endcode
+ Dropped in-place support for running tf::cudaFlow from a dedicated task

@code{.cpp}
// previous - no longer supported
taskflow.emplace([](tf::cudaFlow& cf){
  cf.offload();
});

// now - users fully control tf::cudaFlow for maximum flexibility
taskflow.emplace([](){
  tf::cudaFlow cf;

  // offload the cudaflow asynchronously through a stream
  tf::cudaStream stream;
  cf.run(stream);

  // wait for the cudaflow to complete
  stream.synchronize();
});
@endcode

+ Dropped in-place support for running tf::cudaFlowCapturer from a dedicated task

@code{.cpp}
// previous - no longer supported
taskflow.emplace([](tf::cudaFlowCapturer& cf){
  cf.offload();
});

// now - users fully control tf::cudaFlowCapturer for maximum flexibility
taskflow.emplace([](){
  tf::cudaFlowCapturer cf;

  // offload the capturer asynchronously through a stream
  tf::cudaStream stream;
  cf.run(stream);

  // wait for the capturer to complete
  stream.synchronize();
});
@endcode
+ Dropped in-place support for running tf::syclFlow from a dedicated task
  + SYCL can just be used out of the box together with %Taskflow
+ Moved all buffer query methods of CUDA standard algorithms inside execution policy
  + tf::cudaExecutionPolicy<NT, VT>::reduce_bufsz
  + tf::cudaExecutionPolicy<NT, VT>::scan_bufsz
  + tf::cudaExecutionPolicy<NT, VT>::merge_bufsz
  + tf::cudaExecutionPolicy<NT, VT>::min_element_bufsz
  + tf::cudaExecutionPolicy<NT, VT>::max_element_bufsz

@code{.cpp}
// previous - no longer supported
tf::cuda_reduce_buffer_size<tf::cudaDefaultExecutionPolicy, int>(N);

// now (and similarly for other parallel algorithms)
tf::cudaDefaultExecutionPolicy policy(stream);
policy.reduce_bufsz<int>(N);
@endcode

+ Renamed tf::Executor::run_and_wait to tf::Executor::corun for expressiveness
+ Renamed tf::Executor::loop_until to tf::Executor::corun_until for expressiveness
+ Renamed tf::Runtime::run_and_wait to tf::Runtime::corun for expressiveness (see the sketch below)
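
For illustration, a minimal sketch of the renamed co-run interface
(the taskflows and tasks below are illustrative):

@code{.cpp}
tf::Executor executor;
tf::Taskflow A, B;

A.emplace([](){ std::cout << "task in taskflow A\n"; });

// a task in taskflow B co-runs taskflow A through the runtime
// (previously tf::Runtime::run_and_wait)
B.emplace([&](tf::Runtime& rt){
  rt.corun(A);
});

executor.run(B).wait();
@endcode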
+ Disabled argument support for all asynchronous tasking features
  + users are now responsible for wrapping arguments into the callable

@code{.cpp}
// previous - async allows passing arguments to the callable
executor.async([](int i){ std::cout << i << std::endl; }, 4);

// now - users are responsible for wrapping the arguments into a callable
executor.async([i=4](){ std::cout << i << std::endl; });
@endcode

+ Replaced `named_async` with an overload of `async` that takes the name string as the first argument

@code{.cpp}
// previous - explicitly calling named_async to assign a name to an async task
executor.named_async("name", [](){});

// now - overloaded async that takes the name as the first argument
executor.async("name", [](){});
@endcode
@section release-3-6-0_documentation Documentation

+ Revised @ref RequestCancellation to remove support for cancelling async tasks
+ Revised @ref AsyncTasking to include asynchronous tasking from tf::Runtime
  + @ref LaunchAsynchronousTasksFromARuntime
+ Revised %Taskflow algorithms to include execution policy
  + @ref PartitioningAlgorithm
  + @ref ParallelIterations
  + @ref ParallelTransforms
  + @ref ParallelReduction
+ Revised CUDA standard algorithms to correct the use of buffer query methods
  + @ref CUDASTDReduce
  + @ref CUDASTDFind
  + @ref CUDASTDMerge
  + @ref CUDASTDScan
+ Added @ref TaskParallelPipelineWithTokenDependencies
+ Added @ref ParallelScan
+ Added @ref DependentAsyncTasking

@section release-3-6-0_miscellaneous_items Miscellaneous Items

We have published %Taskflow in the following venues:

+ Dian-Lun Lin, Yanqing Zhang, Haoxing Ren, Shih-Hsin Wang, Brucek Khailany, and Tsung-Wei Huang, "[GenFuzz: GPU-accelerated Hardware Fuzzing using Genetic Algorithm with Multiple Inputs](https://tsung-wei-huang.github.io/papers/2023-dac.pdf)," <em>ACM/IEEE Design Automation Conference (DAC)</em>, San Francisco, CA, 2023
+ Tsung-Wei Huang, "[qTask: Task-parallel Quantum Circuit Simulation with Incrementality](https://tsung-wei-huang.github.io/papers/ipdps23.pdf)," <em>IEEE International Parallel and Distributed Processing Symposium (IPDPS)</em>, St. Petersburg, Florida, 2023
+ Elmir Dzaka, Dian-Lun Lin, and Tsung-Wei Huang, "[Parallel And-Inverter Graph Simulation Using a Task-graph Computing System](https://tsung-wei-huang.github.io/papers/pdco-23.pdf)," <em>IEEE International Parallel and Distributed Processing Symposium Workshop (IPDPSW)</em>, St. Petersburg, Florida, 2023

Please do not hesitate to contact @twhuang if you intend to collaborate with us
on using %Taskflow in your scientific computing projects.

*/

}