Modern C++ Parallel Task Programming

Taskflow helps you quickly write parallel and heterogeneous task programs with both high performance and high productivity. Compared with many existing task programming libraries, it is faster, more expressive, needs fewer lines of code, and is easier to drop into an existing project. The source code is available in our Project GitHub.

Start Your First Taskflow Program

The following program (simple.cpp) creates four tasks A, B, C, and D, where A runs before B and C, and D runs after B and C. When A finishes, B and C can run in parallel.

    #include <taskflow/taskflow.hpp>  // Taskflow is header-only

    int main(){

      tf::Executor executor;
      tf::Taskflow taskflow;

      auto [A, B, C, D] = taskflow.emplace(  // create four tasks
        [] () { std::cout << "TaskA\n"; },
        [] () { std::cout << "TaskB\n"; },
        [] () { std::cout << "TaskC\n"; },
        [] () { std::cout << "TaskD\n"; }
      );

      A.precede(B, C);  // A runs before B and C
      D.succeed(B, C);  // D runs after  B and C

      executor.run(taskflow).wait();

      return 0;
    }

Taskflow is header-only, so there is nothing to install. To compile the program, clone the Taskflow project and tell the compiler to include the headers under taskflow/.

    ~$ git clone https://github.com/taskflow/taskflow.git  # clone it only once
    ~$ g++ -std=c++20 simple.cpp -I taskflow/ -O2 -pthread -o simple
    ~$ ./simple
    TaskA
    TaskC
    TaskB
    TaskD

Because B and C can run in parallel, the order of TaskB and TaskC may differ from run to run.

Taskflow comes with a built-in profiler, Taskflow Profiler, that lets you profile and visualize taskflow programs in an easy-to-use web-based interface.

    # run the program with the environment variable TF_ENABLE_PROFILER enabled
    ~$ TF_ENABLE_PROFILER=simple.json ./simple
    ~$ cat simple.json
    [
    {"executor":"0","data":[{"worker":0,"level":0,"data":[{"span":[172,186],"name":"0_0","type":"static"},{"span":[187,189],"name":"0_1","type":"static"}]},{"worker":2,"level":0,"data":[{"span":[93,164],"name":"2_0","type":"static"},{"span":[170,179],"name":"2_1","type":"static"}]}]}
    ]
    # paste the profiling json data to https://taskflow.github.io/tfprof/

Create a Subflow Graph

Taskflow supports recursive tasking: a task can create a subflow graph while it executes, which enables recursive parallelism. The following program spawns a task dependency graph parented at task B.

    tf::Task A = taskflow.emplace([](){}).name("A");
    tf::Task C = taskflow.emplace([](){}).name("C");
    tf::Task D = taskflow.emplace([](){}).name("D");

    tf::Task B = taskflow.emplace([] (tf::Subflow& subflow) {  // subflow task B
      tf::Task B1 = subflow.emplace([](){}).name("B1");
      tf::Task B2 = subflow.emplace([](){}).name("B2");
      tf::Task B3 = subflow.emplace([](){}).name("B3");
      B3.succeed(B1, B2);  // B3 runs after B1 and B2
    }).name("B");

    A.precede(B, C);  // A runs before B and C
    D.succeed(B, C);  // D runs after  B and C

Integrate Control Flow into a Task Graph

Taskflow supports conditional tasking, which lets you make rapid control-flow decisions across dependent tasks and implement cycles and conditions in an end-to-end task graph.
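As a rough mental model of that branching rule, the sketch below walks a tiny graph using only the standard library. It is not Taskflow's scheduler: `Node`, `run_graph`, and `feedback_loop_trace` are hypothetical names, every node (not only the condition task) selects a single successor by index, and the random condition is replaced by a scripted list of results so the outcome is reproducible.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a task graph node; not a Taskflow type.
struct Node {
  std::string name;
  std::function<int()> work;            // returns the index of the successor to follow
  std::vector<std::size_t> successors;  // indices into the node table
};

// Walks the graph from `start`, following the successor selected by each
// node's return value, and records the visiting order. The walk ends at the
// first node that has no successors.
std::vector<std::string> run_graph(const std::vector<Node>& nodes,
                                   std::size_t start) {
  std::vector<std::string> trace;
  std::size_t current = start;
  while (true) {
    const Node& node = nodes[current];
    trace.push_back(node.name);
    const int branch = node.work();
    if (node.successors.empty()) break;
    assert(branch >= 0 &&
           static_cast<std::size_t>(branch) < node.successors.size());
    current = node.successors[static_cast<std::size_t>(branch)];
  }
  return trace;
}

// Builds the init -> cond -> {0: cond, 1: stop} loop with a scripted
// sequence of condition results in place of a random number.
std::vector<std::string> feedback_loop_trace(std::vector<int> results) {
  std::size_t next = 0;
  std::vector<Node> nodes = {
    {"init", [] { return 0; }, {1}},
    {"cond", [&] { return results.at(next++); }, {1, 2}},
    {"stop", [] { return 0; }, {}},
  };
  return run_graph(nodes, 0);
}
```

Calling `feedback_loop_trace({0, 0, 1})` visits init, then cond three times (two results of 0 loop back, the final 1 exits), then stop. In Taskflow itself, the same loop is expressed as follows.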

    tf::Task init = taskflow.emplace([](){}).name("init");
    tf::Task stop = taskflow.emplace([](){}).name("stop");

    // creates a condition task that returns a random binary
    tf::Task cond = taskflow.emplace([](){ return std::rand() % 2; }).name("cond");

    // creates a feedback loop {0: cond, 1: stop}
    init.precede(cond);
    cond.precede(cond, stop);  // moves on to 'cond' on returning 0, or 'stop' on 1

Offload Tasks to a GPU

Taskflow supports GPU tasking, which lets you accelerate a wide range of scientific computing applications by harnessing CPU-GPU collaborative computing with CUDA.

    __global__ void saxpy(int n, float a, float *x, float *y) {
      int i = blockIdx.x*blockDim.x + threadIdx.x;
      if (i < n) {
        y[i] = a*x[i] + y[i];
      }
    }

    // assumes hx, hy (host containers) and dx, dy (device pointers) each hold N floats
    tf::Task cudaflow = taskflow.emplace([&](tf::cudaFlow& cf) {
      tf::cudaTask h2d_x = cf.copy(dx, hx.data(), N).name("h2d_x");
      tf::cudaTask h2d_y = cf.copy(dy, hy.data(), N).name("h2d_y");
      tf::cudaTask d2h_x = cf.copy(hx.data(), dx, N).name("d2h_x");
      tf::cudaTask d2h_y = cf.copy(hy.data(), dy, N).name("d2h_y");
      // the task variable is named 'kernel' so that it does not hide the saxpy function
      tf::cudaTask kernel = cf.kernel((N+255)/256, 256, 0, saxpy, N, 2.0f, dx, dy)
                              .name("saxpy");  // parameters to the saxpy kernel
      kernel.succeed(h2d_x, h2d_y)
            .precede(d2h_x, d2h_y);
    }).name("cudaFlow");

Compose Task Graphs

Taskflow is composable. You can create large parallel graphs by composing modular, reusable blocks that are easier to optimize individually.

    tf::Taskflow f1, f2;

    // create taskflow f1 of two tasks
    tf::Task f1A = f1.emplace([]() { std::cout << "Task f1A\n"; }).name("f1A");
    tf::Task f1B = f1.emplace([]() { std::cout << "Task f1B\n"; }).name("f1B");

    // create taskflow f2 with one module task composed of f1
    tf::Task f2A = f2.emplace([]() { std::cout << "Task f2A\n"; }).name("f2A");
    tf::Task f2B = f2.emplace([]() { std::cout << "Task f2B\n"; }).name("f2B");
    tf::Task f2C = f2.emplace([]() { std::cout << "Task f2C\n"; }).name("f2C");
    tf::Task f1_module_task = f2.composed_of(f1).name("module");

    f1_module_task.succeed(f2A, f2B)
                  .precede(f2C);

Launch Asynchronous Tasks

Taskflow supports asynchronous tasking. You can launch tasks asynchronously to explore task graph parallelism dynamically.
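One way to picture the bookkeeping behind asynchronous tasks with dependencies is a counter per task that records how many predecessors are still unfinished; a task becomes ready when its counter reaches zero. The sketch below is a single-threaded, standard-library-only model of that idea. `MiniGraph` and `diamond_order` are hypothetical names, and nothing here reflects how Taskflow's executor is actually implemented or scheduled.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical single-threaded model of dependent tasks; not Taskflow's executor.
class MiniGraph {
 public:
  // Adds a task that may run only after every task in `deps` has finished.
  // Returns a handle that later tasks can list as a dependency.
  std::size_t add(std::string name, std::vector<std::size_t> deps = {}) {
    const std::size_t id = names_.size();
    for (std::size_t d : deps) {
      assert(d < id);  // a task may only depend on tasks created earlier
      successors_[d].push_back(id);
    }
    names_.push_back(std::move(name));
    pending_.push_back(deps.size());
    successors_.emplace_back();
    return id;
  }

  // "Runs" each task once its pending-dependency counter reaches zero and
  // returns the names in the order the tasks ran.
  std::vector<std::string> run() const {
    std::vector<std::size_t> pending = pending_;
    std::queue<std::size_t> ready;
    for (std::size_t id = 0; id < pending.size(); ++id) {
      if (pending[id] == 0) ready.push(id);
    }
    std::vector<std::string> order;
    while (!ready.empty()) {
      const std::size_t id = ready.front();
      ready.pop();
      order.push_back(names_[id]);
      for (std::size_t next : successors_[id]) {
        if (--pending[next] == 0) ready.push(next);
      }
    }
    return order;
  }

 private:
  std::vector<std::string> names_;
  std::vector<std::size_t> pending_;
  std::vector<std::vector<std::size_t>> successors_;
};

// Builds a diamond: A first, then B and C, then D.
std::vector<std::string> diamond_order() {
  MiniGraph g;
  const std::size_t A = g.add("A");
  const std::size_t B = g.add("B", {A});
  const std::size_t C = g.add("C", {A});
  g.add("D", {B, C});
  return g.run();
}
```

In this sequential model the diamond always runs as A, B, C, D; a parallel executor is free to run B and C in either order or at the same time. The Taskflow calls for launching asynchronous tasks, with and without dependencies, follow.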

    tf::Executor executor;

    // create asynchronous tasks directly from an executor
    std::future<int> future = executor.async([](){
      std::cout << "async task returns 1\n";
      return 1;
    });
    executor.silent_async([](){ std::cout << "async task does not return\n"; });

    // create asynchronous tasks with dynamic dependencies
    tf::AsyncTask A = executor.silent_dependent_async([](){ printf("A\n"); });
    tf::AsyncTask B = executor.silent_dependent_async([](){ printf("B\n"); }, A);
    tf::AsyncTask C = executor.silent_dependent_async([](){ printf("C\n"); }, A);
    tf::AsyncTask D = executor.silent_dependent_async([](){ printf("D\n"); }, B, C);

    executor.wait_for_all();

Run a Taskflow through an Executor

The executor provides several thread-safe methods to run a taskflow. You can run a taskflow once, multiple times, or until a stopping criterion is met. These methods are non-blocking and return a tf::Future<void> that lets you query the execution status.

    // runs the taskflow once
    tf::Future<void> run_once = executor.run(taskflow);

    // wait on this run to finish
    run_once.get();

    // runs the taskflow four times
    executor.run_n(taskflow, 4);

    // runs the taskflow five times
    // ('mutable' is required because the lambda modifies its captured counter)
    executor.run_until(taskflow, [counter=5]() mutable { return --counter == 0; });

    // blocks the executor until all submitted taskflows complete
    executor.wait_for_all();

Leverage Standard Parallel Algorithms

Taskflow defines algorithms that let you quickly express common parallel patterns using standard C++ syntax, such as parallel iterations, parallel reductions, and parallel sort.

    // standard parallel CPU algorithms
    tf::Task task1 = taskflow.for_each(  // assign each element to 100 in parallel
      first, last, [] (auto& i) { i = 100; }
    );
    tf::Task task2 = taskflow.reduce(    // reduce a range of items in parallel
      first, last, init, [] (auto a, auto b) { return a + b; }
    );
    tf::Task task3 = taskflow.sort(      // sort a range of items in parallel
      first, last, [] (auto a, auto b) { return a < b; }
    );

Additionally, Taskflow provides composable graph building blocks that let you efficiently implement common parallel algorithms, such as a parallel pipeline.
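To make the token and line vocabulary concrete, here is a sequential, standard-library-only model of a three-stage serial pipeline. It is a sketch under stated assumptions rather than a description of tf::Pipeline: `simulate_pipeline` is a hypothetical name, the mapping of token t to line t % num_lines is an assumption, the data written by the first stage is made up, and a real pipeline overlaps stages across tokens whereas this model processes one token at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Sequential model of a three-stage serial pipeline (hypothetical; this is
// not tf::Pipeline). Assumption: token t is handled on line t % num_lines.
std::vector<std::string> simulate_pipeline(std::size_t num_lines,
                                           std::size_t stop_token) {
  assert(num_lines > 0);
  std::vector<int> buffer(num_lines, 0);  // one data slot per line
  std::vector<std::string> log;

  // The loop ends at the token where the first stage would stop the pipeline.
  for (std::size_t token = 0; token != stop_token; ++token) {
    const std::size_t line = token % num_lines;
    const std::string where =
        "token " + std::to_string(token) + " line " + std::to_string(line);

    // Stage 1: generate data for this token (made up for the illustration).
    buffer[line] = static_cast<int>(token) * 10;

    // Stage 2: read the slot, then transform it.
    log.push_back("stage 2: " + where + " value " + std::to_string(buffer[line]));
    buffer[line] += 1;

    // Stage 3: read the transformed slot.
    log.push_back("stage 3: " + where + " value " + std::to_string(buffer[line]));
  }
  return log;
}
```

With four lines and a stop at token 5, `simulate_pipeline(4, 5)` logs two entries for each of the tokens 0 through 4, and token 4 reuses line 0. The Taskflow pipeline that propagates five tokens through three serial stages is written as follows.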

    // create a pipeline to propagate five tokens through three serial stages
    tf::Pipeline pl(num_lines,
      tf::Pipe{tf::PipeType::SERIAL, [](tf::Pipeflow& pf) {
        if(pf.token() == 5) {
          pf.stop();
        }
      }},
      tf::Pipe{tf::PipeType::SERIAL, [](tf::Pipeflow& pf) {
        printf("stage 2: input buffer[%zu] = %d\n", pf.line(), buffer[pf.line()]);
      }},
      tf::Pipe{tf::PipeType::SERIAL, [](tf::Pipeflow& pf) {
        printf("stage 3: input buffer[%zu] = %d\n", pf.line(), buffer[pf.line()]);
      }}
    );
    taskflow.composed_of(pl);
    executor.run(taskflow).wait();

Visualize Taskflow Graphs

You can dump a taskflow graph to DOT format and visualize it with any of a number of free GraphViz tools, such as GraphViz Online.

    tf::Taskflow taskflow;

    tf::Task A = taskflow.emplace([] () {}).name("A");
    tf::Task B = taskflow.emplace([] () {}).name("B");
    tf::Task C = taskflow.emplace([] () {}).name("C");
    tf::Task D = taskflow.emplace([] () {}).name("D");
    tf::Task E = taskflow.emplace([] () {}).name("E");
    A.precede(B, C, E);
    C.precede(D);
    B.precede(D, E);

    // dump the graph in DOT format to std::cout
    taskflow.dump(std::cout);

Supported Compilers

To use Taskflow, you only need a compiler that supports C++17:

    - GNU C++ Compiler at least v8.4 with -std=c++17
    - Clang C++ Compiler at least v6.0 with -std=c++17
    - Microsoft Visual Studio at least v19.27 with /std:c++17
    - AppleClang Xcode Version at least v12.0 with -std=c++17
    - Nvidia CUDA Toolkit and Compiler (nvcc) at least v11.1 with -std=c++17
    - Intel C++ Compiler at least v19.0.1 with -std=c++17
    - Intel DPC++ Clang Compiler at least v13.0.0 with -std=c++17 and SYCL 2020

Taskflow works on Linux, Windows, and Mac OS X.

Although Taskflow primarily targets C++17, you can compile with -std=c++20 to achieve better performance through new C++20 features.

Get Involved

Visit our Project Website and showcase presentation to learn more about Taskflow.
To get involved:

    - See release notes at Release Notes
    - Read the step-by-step tutorial at Cookbook
    - Submit an issue at issue tracker
    - Learn more about our technical details at References
    - Watch our 2020 CppCon Taskflow Talk and 2020 MUC++ Taskflow Talk

We are committed to supporting trustworthy development for both academic and industrial research projects in parallel and heterogeneous computing. If you are using Taskflow, please cite the following paper, published in IEEE TPDS in 2022:

    Tsung-Wei Huang, Dian-Lun Lin, Chun-Xun Lin, and Yibo Lin, "Taskflow: A Lightweight Parallel and Heterogeneous Task Graph Computing System," IEEE Transactions on Parallel and Distributed Systems (TPDS), vol. 33, no. 6, pp. 1303-1320, June 2022

More importantly, we appreciate all Taskflow Contributors and the following organizations for sponsoring the Taskflow project!

License

Taskflow is open-source under the permissive MIT license. You are completely free to use, modify, and redistribute any work built on top of Taskflow. The source code is available in Project GitHub and is actively maintained by Dr. Tsung-Wei Huang and his research group at the University of Wisconsin at Madison.