tf::cudaFlowRoundRobinOptimizer

Class to capture a CUDA graph using a round-robin algorithm.

Base class: tf::cudaFlowOptimizerBase
Header: taskflow/cuda/cuda_optimizer.hpp
Friend: class cudaFlowCapturer

Constructors

cudaFlowRoundRobinOptimizer() = default
    Constructs a round-robin optimizer with 4 streams by default.

cudaFlowRoundRobinOptimizer(size_t num_streams)
    Constructs a round-robin optimizer with the given number of streams.

Member functions

size_t num_streams() const
    Queries the number of streams used by the optimizer.

void num_streams(size_t n)
    Sets the number of streams used by the optimizer.

cudaGraph_t _optimize(cudaFlowGraph& graph)

void _reset(std::vector<std::vector<cudaFlowNode*>>& graph)

Data members

size_t _num_streams {4}

Detailed description

A round-robin capturing algorithm levelizes the user-described graph and assigns streams to nodes in a round-robin order, level by level. The algorithm is based on the following paper published in Euro-Par 2021:

Dian-Lun Lin and Tsung-Wei Huang, "Efficient GPU Computation using Task Graph Parallelism," European Conference on Parallel and Distributed Computing (Euro-Par), 2021

The round-robin optimization algorithm is best suited for large cudaFlow graphs that comprise hundreds or thousands of GPU operations (e.g., kernels and memory copies), many of which can run in parallel. You can configure the number of streams the optimizer uses to adjust the maximum kernel concurrency in the captured CUDA graph.
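The levelize-then-assign idea described above can be illustrated with a small host-only sketch. This is not Taskflow's implementation: the Dag alias and the functions levelize and assign_streams are hypothetical names introduced for illustration, the graph is a plain adjacency list rather than a cudaFlowGraph, and restarting the stream counter at 0 on every level is an assumption. No CUDA calls are made; the code only computes which stream index each node would receive.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical stand-in for a GPU task graph: successors[u] lists the nodes
// that depend on node u. This is not Taskflow's cudaFlowGraph.
using Dag = std::vector<std::vector<std::size_t>>;

// Groups nodes into levels. A node's level is the length of the longest
// path from any source node, so all of its predecessors sit in earlier levels.
std::vector<std::vector<std::size_t>> levelize(const Dag& successors) {
  const std::size_t n = successors.size();
  std::vector<std::size_t> in_degree(n, 0);
  std::vector<std::size_t> level(n, 0);
  for (const auto& outs : successors) {
    for (std::size_t v : outs) ++in_degree[v];
  }

  // Kahn-style traversal: a node becomes ready once all predecessors are done.
  std::queue<std::size_t> ready;
  for (std::size_t u = 0; u < n; ++u) {
    if (in_degree[u] == 0) ready.push(u);
  }

  std::vector<std::vector<std::size_t>> levels;
  while (!ready.empty()) {
    const std::size_t u = ready.front();
    ready.pop();
    if (level[u] >= levels.size()) levels.resize(level[u] + 1);
    levels[level[u]].push_back(u);
    for (std::size_t v : successors[u]) {
      level[v] = std::max(level[v], level[u] + 1);
      if (--in_degree[v] == 0) ready.push(v);
    }
  }
  return levels;
}

// Assigns a stream index to every node, cycling through the streams within
// each level. Restarting at stream 0 per level is an assumption of this sketch.
std::vector<std::size_t> assign_streams(
    const std::vector<std::vector<std::size_t>>& levels,
    std::size_t num_nodes,
    std::size_t num_streams) {
  assert(num_streams > 0);
  std::vector<std::size_t> stream_of(num_nodes, 0);
  for (const auto& nodes : levels) {
    std::size_t next = 0;
    for (std::size_t u : nodes) {
      stream_of[u] = next;
      next = (next + 1) % num_streams;
    }
  }
  return stream_of;
}
```

In this sketch, at most num_streams distinct stream indices are used on any level, which is why the stream count bounds how many operations of one level can overlap. A larger value allows more concurrency on wide levels, while a value of 1 serializes every level.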

List of all members

tf::cudaFlowRoundRobinOptimizer::_levelize
tf::cudaFlowRoundRobinOptimizer::_num_streams
tf::cudaFlowRoundRobinOptimizer::_optimize
tf::cudaFlowRoundRobinOptimizer::_reset
tf::cudaFlowRoundRobinOptimizer::_toposort
tf::cudaFlowRoundRobinOptimizer::cudaFlowCapturer (friend)
tf::cudaFlowRoundRobinOptimizer::cudaFlowRoundRobinOptimizer (two overloads)
tf::cudaFlowRoundRobinOptimizer::num_streams (two overloads)