Data-parallel Pipeline

- Include the Header
- Create a Data Pipeline Module Task
- Understand Internal Data Storage
- Learn More about Taskflow Pipeline

Taskflow provides another variant, tf::DataPipeline, on top of tf::Pipeline (see Task-parallel Pipeline) to help you implement data-parallel pipeline algorithms while leaving data management to Taskflow. We recommend finishing Task-parallel Pipeline before learning tf::DataPipeline.

Include the Header

You need to include the header file, taskflow/algorithm/data_pipeline.hpp, for implementing data-parallel pipeline algorithms.

#include <taskflow/algorithm/data_pipeline.hpp>

Create a Data Pipeline Module Task

Similar to creating a task-parallel pipeline (tf::Pipeline), there are three steps to create a data-parallel pipeline application:

1. Define the pipeline structure (e.g., pipe type, pipe callable, stopping rule, line count)
2. Define the data storage and layout, if needed for the application
3. Define the pipeline taskflow graph using composition

The following example creates a data-parallel pipeline that generates a total of five dataflow tokens, going from void to int at the first stage, from int to std::string at the second stage, and from std::string to void at the final stage. Data storage between stages is automatically managed by tf::DataPipeline.

#include <taskflow/taskflow.hpp>
#include <taskflow/algorithm/data_pipeline.hpp>

int main() {

  // dataflow => void -> int -> std::string -> void
  tf::Taskflow taskflow("pipeline");
  tf::Executor executor;

  const size_t num_lines = 4;

  // create a pipeline graph
  tf::DataPipeline pl(num_lines,
    tf::make_data_pipe<void, int>(tf::PipeType::SERIAL, [&](tf::Pipeflow& pf) -> int {
      if(pf.token() == 5) {
        pf.stop();
        return 0;
      }
      else {
        printf("first pipe returns %lu\n", pf.token());
        return pf.token();
      }
    }),

    tf::make_data_pipe<int, std::string>(tf::PipeType::SERIAL, [](int& input) {
      printf("second pipe returns a string of %d\n", input + 100);
      return std::to_string(input + 100);
    }),

    tf::make_data_pipe<std::string, void>(tf::PipeType::SERIAL, [](std::string& input) {
      printf("third pipe receives the input string %s\n", input.c_str());
    })
  );

  // build the pipeline graph using composition
  taskflow.composed_of(pl).name("pipeline");

  // dump the pipeline graph structure (with composition)
  taskflow.dump(std::cout);

  // run the pipeline
  executor.run(taskflow).wait();

  return 0;
}

The interface of tf::DataPipeline is very similar to tf::Pipeline, except that the library transparently manages the dataflow between pipes. To create a stage in a data-parallel pipeline, you should always use the helper function tf::make_data_pipe:

tf::make_data_pipe<int, std::string>(
  tf::PipeType::SERIAL,
  [](int& input) {
    return std::to_string(input + 100);
  }
);

The helper function starts with a pair of input and output types in its template arguments. Both types are always decayed to their original form using std::decay (e.g., const int& becomes int) for storage purposes. In terms of function arguments, the first argument specifies the direction of this data pipe, which can be either tf::PipeType::SERIAL or tf::PipeType::PARALLEL, and the second argument is a callable for the pipeline scheduler to invoke. The callable must take the input data type as its first argument and return a value of the output data type.

Additionally, the callable can take a tf::Pipeflow reference as its second argument, which allows you to query the runtime information of a stage task, such as its line number and token number.

tf::make_data_pipe<int, std::string>(
  tf::PipeType::SERIAL,
  [](int& input, tf::Pipeflow& pf) {
    printf("token=%lu, line=%lu\n", pf.token(), pf.line());
    return std::to_string(input + 100);
  }
)

By default, tf::DataPipeline passes the data by reference to your callable, which you can take by copy or by reference depending on application needs.

For the first pipe, the input type should always be void and the callable must take a tf::Pipeflow reference in its argument. In this example, we stop the pipeline after processing five tokens.

tf::make_data_pipe<void, int>(tf::PipeType::SERIAL, [](tf::Pipeflow& pf) -> int {
  if(pf.token() == 5) {
    pf.stop();
    return 0;  // returns a dummy value
  }
  else {
    return pf.token();
  }
}),

Similarly, the output type of the last pipe should be void, as no more data will go out of the final pipe.

tf::make_data_pipe<std::string, void>(tf::PipeType::SERIAL, [](std::string& input) {
  std::cout << input << std::endl;
})

Finally, you need to compose the pipeline graph by creating a module task (i.e., tf::Taskflow::composed_of).

// build the pipeline graph using composition
taskflow.composed_of(pl).name("pipeline");

// dump the pipeline graph structure (with composition)
taskflow.dump(std::cout);

// run the pipeline
executor.run(taskflow).wait();

Understand Internal Data Storage

By default, tf::DataPipeline uses std::variant to store a type-safe union of all input and output data types extracted from the given data pipes. To avoid false sharing, each line keeps its variant aligned to the cacheline size. When a pipe callable is invoked, the input data is retrieved from the variant by reference using std::get; when the callable returns, the output data is stored back into the variant using the assignment operator.
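The sketch below models this storage scheme for the example above (void -> int -> std::string -> void). It is a simplified illustration for understanding, not Taskflow's actual implementation; the Line struct, the 64-byte alignment constant, and the buffer layout are assumptions made for exposition.

#include <string>
#include <variant>
#include <vector>

// Illustrative model of per-line storage (not Taskflow internals):
// one type-safe union of all non-void input/output types, padded to a
// cacheline boundary so that parallel lines do not falsely share.
// Requires C++17 for over-aligned allocation in std::vector.
struct alignas(64) Line {
  std::variant<int, std::string> data;
};

int main() {
  std::vector<Line> buffer(4);  // one slot per line (num_lines == 4)

  // first pipe produced an int: stored via the assignment operator
  buffer[0].data = 3;

  // second pipe reads its input by reference using std::get ...
  int& input = std::get<int>(buffer[0].data);
  std::string output = std::to_string(input + 100);

  // ... and its return value replaces the variant's content
  buffer[0].data = std::move(output);

  return 0;
}

Keeping one cacheline-aligned variant per line means tokens flowing through different lines never write to the same cacheline, which is why the alignment matters for throughput.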
Learn More about Taskflow Pipeline

Visit the following pages to learn more about pipeline:

- Task-parallel Pipeline
- Task-parallel Scalable Pipeline
- Text Processing Pipeline
- Graph Processing Pipeline
- Taskflow Processing Pipeline