Taskflow Processing Pipeline

Formulate the Taskflow Processing Pipeline Problem
Create a Taskflow Processing Pipeline
Define Taskflows
Define the Pipes
Define the Task Graph
Submit the Task Graph

We study a taskflow processing pipeline that propagates a sequence of tokens through linearly dependent taskflows. The pipeline embeds a taskflow in each pipe to run a parallel algorithm using task graph parallelism.

Formulate the Taskflow Processing Pipeline Problem

Many complex and irregular pipeline applications require each pipe to run a parallel algorithm using task graph parallelism. We can formulate such applications as scheduling a sequence of tokens through linearly dependent taskflows. The following example illustrates the pipeline propagation of three scheduling tokens through three linearly dependent taskflows:

Each pipe (stage) in the pipeline embeds a taskflow to perform a stage-specific parallel algorithm on an input scheduling token. Parallelism exists both inside and outside the three taskflows, combining task graph parallelism with pipeline parallelism.

Create a Taskflow Processing Pipeline

Using the example from the previous section, we create a pipeline of three serial pipes, each running a taskflow, on a sequence of five scheduling tokens.
The overall implementation is shown below:

#include <taskflow/taskflow.hpp>
#include <taskflow/algorithm/pipeline.hpp>

// taskflow on the first pipe
void make_taskflow1(tf::Taskflow& tf) {
  auto [A1, B1, C1, D1] = tf.emplace(
    [](){ printf("A1\n"); },
    [](){ printf("B1\n"); },
    [](){ printf("C1\n"); },
    [](){ printf("D1\n"); }
  );
  A1.precede(B1, C1);
  D1.succeed(B1, C1);
}

// taskflow on the second pipe
void make_taskflow2(tf::Taskflow& tf) {
  auto [A2, B2, C2, D2] = tf.emplace(
    [](){ printf("A2\n"); },
    [](){ printf("B2\n"); },
    [](){ printf("C2\n"); },
    [](){ printf("D2\n"); }
  );
  tf.linearize({A2, B2, C2, D2});
}

// taskflow on the third pipe
void make_taskflow3(tf::Taskflow& tf) {
  auto [A3, B3, C3, D3] = tf.emplace(
    [](){ printf("A3\n"); },
    [](){ printf("B3\n"); },
    [](){ printf("C3\n"); },
    [](){ printf("D3\n"); }
  );
  A3.precede(B3, C3, D3);
}

int main() {

  tf::Taskflow taskflow("taskflow processing pipeline");
  tf::Executor executor;

  const size_t num_lines = 2;
  const size_t num_pipes = 3;

  // define the taskflow storage
  // we use the pipe dimension because we create three 'serial' pipes
  std::array<tf::Taskflow, num_pipes> taskflows;

  // create three different taskflows for the three pipes
  make_taskflow1(taskflows[0]);
  make_taskflow2(taskflows[1]);
  make_taskflow3(taskflows[2]);

  // the pipeline consists of three serial pipes
  // and up to two concurrent scheduling tokens
  tf::Pipeline pl(num_lines,

    // first pipe runs taskflow1
    tf::Pipe{tf::PipeType::SERIAL, [&](tf::Pipeflow& pf) {
      if(pf.token() == 5) {
        pf.stop();
        return;
      }
      printf("begin token %zu\n", pf.token());
      executor.corun(taskflows[pf.pipe()]);
    }},

    // second pipe runs taskflow2
    tf::Pipe{tf::PipeType::SERIAL, [&](tf::Pipeflow& pf) {
      executor.corun(taskflows[pf.pipe()]);
    }},

    // third pipe runs taskflow3
    tf::Pipe{tf::PipeType::SERIAL, [&](tf::Pipeflow& pf) {
      executor.corun(taskflows[pf.pipe()]);
    }}
  );

  // build the pipeline graph using composition
  tf::Task init = taskflow.emplace([](){ std::cout << "ready\n"; })
                          .name("starting pipeline");
  tf::Task task = taskflow.composed_of(pl)
                          .name("pipeline");
  tf::Task stop = taskflow.emplace([](){ std::cout << "stopped\n"; })
                          .name("pipeline stopped");
  // create task dependency
  init.precede(task);
  task.precede(stop);

  // dump the pipeline graph structure (with composition)
  taskflow.dump(std::cout);

  // run the pipeline
  executor.run(taskflow).wait();

  return 0;
}

Define Taskflows

First, we define three taskflows for the three pipes in the pipeline:

// taskflow on the first pipe
void make_taskflow1(tf::Taskflow& tf) {
  auto [A1, B1, C1, D1] = tf.emplace(
    [](){ printf("A1\n"); },
    [](){ printf("B1\n"); },
    [](){ printf("C1\n"); },
    [](){ printf("D1\n"); }
  );
  A1.precede(B1, C1);
  D1.succeed(B1, C1);
}

// taskflow on the second pipe
void make_taskflow2(tf::Taskflow& tf) {
  auto [A2, B2, C2, D2] = tf.emplace(
    [](){ printf("A2\n"); },
    [](){ printf("B2\n"); },
    [](){ printf("C2\n"); },
    [](){ printf("D2\n"); }
  );
  tf.linearize({A2, B2, C2, D2});
}

// taskflow on the third pipe
void make_taskflow3(tf::Taskflow& tf) {
  auto [A3, B3, C3, D3] = tf.emplace(
    [](){ printf("A3\n"); },
    [](){ printf("B3\n"); },
    [](){ printf("C3\n"); },
    [](){ printf("D3\n"); }
  );
  A3.precede(B3, C3, D3);
}

As each taskflow corresponds to a pipe in the pipeline, we create a linear array to store the three taskflows:

std::array<tf::Taskflow, num_pipes> taskflows;
make_taskflow1(taskflows[0]);
make_taskflow2(taskflows[1]);
make_taskflow3(taskflows[2]);

Since the three pipes are serial, at most one scheduling token runs a given taskflow at any time, so a one-dimensional array of size equal to the number of pipes suffices. If a pipe were parallel, we would need a two-dimensional array (indexed by line and pipe), as multiple taskflows at that stage could run simultaneously across parallel lines.

Define the Pipes

The pipe definition is straightforward. Each pipe runs the corresponding taskflow, which can be indexed in taskflows with the pipe's identifier, tf::Pipeflow::pipe().
The first pipe stops the pipeline scheduling when it has processed five scheduling tokens:

// first pipe runs taskflow1
tf::Pipe{tf::PipeType::SERIAL, [&](tf::Pipeflow& pf) {
  if(pf.token() == 5) {
    pf.stop();
    return;
  }
  printf("begin token %zu\n", pf.token());
  executor.corun(taskflows[pf.pipe()]);
}},

// second pipe runs taskflow2
tf::Pipe{tf::PipeType::SERIAL, [&](tf::Pipeflow& pf) {
  executor.corun(taskflows[pf.pipe()]);
}},

// third pipe runs taskflow3
tf::Pipe{tf::PipeType::SERIAL, [&](tf::Pipeflow& pf) {
  executor.corun(taskflows[pf.pipe()]);
}}

At each pipe, we use tf::Executor::corun to execute the corresponding taskflow and wait until the execution completes. This is important: rather than blocking on executor.run(taskflows[pf.pipe()]).wait(), the caller thread, which is the worker that invokes the pipe callable, participates in the work-stealing loop of the scheduler and thus avoids deadlock.

Define the Task Graph

To build up the taskflow for the pipeline, we create a module task with the defined pipeline structure and connect it with two tasks that output helper messages before and after the pipeline:

tf::Task init = taskflow.emplace([](){ std::cout << "ready\n"; })
                        .name("starting pipeline");
tf::Task task = taskflow.composed_of(pl)
                        .name("pipeline");
tf::Task stop = taskflow.emplace([](){ std::cout << "stopped\n"; })
                        .name("pipeline stopped");
init.precede(task);
task.precede(stop);

Submit the Task Graph

Finally, we submit the taskflow to the executor and run it once:

executor.run(taskflow).wait();

One possible output is shown below:

ready
begin token 0
A1
C1
B1
D1
begin token 1
A2
B2
A1
C1
B1
D1
C2
D2
A3
D3
C3
B3
begin token 2
A2
B2
C2
D2
A1
C1
B1
D1
A3
D3
C3
B3
A2
B2
C2
D2
begin token 3
A3
D3
C3
B3
A1
C1
B1
D1
begin token 4
A2
A1
C1
B1
D1
B2
C2
D2
A3
D3
C3
B3
A2
B2
C2
D2
A3
D3
C3
B3
stopped