Benchmark Taskflow

Compile and Run Benchmarks

To build the benchmark code, set the CMake option TF_BUILD_BENCHMARKS to ON as follows:

    # under /taskflow/build
    ~$ cmake ../ -DTF_BUILD_BENCHMARKS=ON
    ~$ make

After you successfully build the benchmark code, you can find all benchmark instances in the benchmarks/ folder. You can run the executable of each instance in the corresponding folder.

    ~$ cd benchmarks && ls
    bench_black_scholes bench_binary_tree bench_graph_traversal ...
    ~$ ./bench_graph_traversal
    |V|+|E|     Runtime
          2       0.197
        842       0.198
       3284       0.488
       7288       0.774
     ......      ......
     619802      75.135
     664771      77.436
     711200      83.957

You can display the help message by passing the option --help.

    ~$ ./bench_graph_traversal --help
    Graph Traversal
    Usage: ./bench_graph_traversal [OPTIONS]

    Options:
      -h,--help                 Print this help message and exit
      -t,--num_threads UINT     number of threads (default=1)
      -r,--num_rounds UINT      number of rounds (default=1)
      -m,--model TEXT           model name tbb|omp|tf (default=tf)

We currently implement the following instances, which are commonly used by the parallel computing community to evaluate system performance.
Instance                       Description
bench_binary_tree              traverses a complete binary tree
bench_black_scholes            computes option pricing with the Black-Scholes model
bench_graph_traversal          traverses a randomly generated directed acyclic graph
bench_linear_chain             traverses a linear chain of tasks
bench_mandelbrot               exploits imbalanced workloads in a Mandelbrot set
bench_matrix_multiplication    multiplies two 2D matrices
bench_mnist                    trains a neural network-based image classifier on the MNIST dataset
bench_parallel_sort            sorts a range of items
bench_reduce_sum               sums a range of items using reduction
bench_wavefront                propagates computations in a 2D grid
bench_linear_pipeline          schedules a pipeline on a linear chain of pipes
bench_graph_pipeline           schedules a pipeline on a graph of pipes
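If you want to post-process the numbers a benchmark prints, the two-column output shown for bench_graph_traversal can be turned into (problem size, runtime) pairs. The following is a minimal sketch, not part of Taskflow: the function name parse_benchmark_output is hypothetical, and it assumes every data row holds an integer size followed by a floating-point runtime, as in the graph traversal output. Other instances may label or lay out their columns differently.

```python
def parse_benchmark_output(text):
    """Return a list of (problem_size, runtime) pairs from benchmark output.

    Assumption: each data row is "<integer size> <float runtime>", optionally
    followed by a "#" annotation. Every other line is skipped.
    """
    rows = []
    for line in text.splitlines():
        # Drop trailing annotations such as "# average of 10 runs".
        line = line.split("#", 1)[0].strip()
        fields = line.split()
        if len(fields) != 2:
            continue  # blank line or unexpected layout
        try:
            size, runtime = int(fields[0]), float(fields[1])
        except ValueError:
            continue  # header row or an elision row made of dots
        rows.append((size, runtime))
    return rows


# Sample text that mirrors the shape of the graph traversal output.
sample = """\
|V|+|E|     Runtime
      2       0.197
    842       0.198
 ......      ......
 711200      83.957
"""

for size, runtime in parse_benchmark_output(sample):
    print(size, runtime)
```

Skipping rows that fail to parse, rather than raising, keeps the sketch tolerant of the header line and of any extra text an instance may print.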

Configure Run Options

We implement consistent options for each benchmark instance. Common options are:

Option    Value      Function
-h        none       display the help message
-t        integer    configure the number of threads to run
-r        integer    configure the number of rounds to run
-m        string     select the model to run: tbb, omp, or tf
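Because every instance accepts the same options, a small helper can assemble the command line for any of them. The sketch below is an illustration under assumptions rather than a Taskflow utility: build_command and VALID_MODELS are hypothetical names, and the helper only uses the -m, -t, and -r options listed above.

```python
VALID_MODELS = ("tf", "tbb", "omp")  # model names accepted by the -m option


def build_command(executable, model="tf", num_threads=1, num_rounds=1):
    """Build the argument list for one benchmark run.

    The defaults mirror those in the help message (tf, 1 thread, 1 round).
    """
    if model not in VALID_MODELS:
        raise ValueError(f"unknown model {model!r}; expected one of {VALID_MODELS}")
    if num_threads < 1 or num_rounds < 1:
        raise ValueError("num_threads and num_rounds must be positive")
    return [executable, "-m", model, "-t", str(num_threads), "-r", str(num_rounds)]


cmd = build_command("./bench_graph_traversal", model="tbb", num_threads=4, num_rounds=10)
print(" ".join(cmd))
```

The returned list can be handed to Python's subprocess.run, for example with capture_output=True and text=True, when the script is started from the benchmarks/ folder.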

You can configure the benchmarking environment by passing different options.

Specify the Run Model

In addition to a Taskflow-based implementation of each benchmark instance, we have implemented two baseline models using state-of-the-art parallel programming libraries, OpenMP and Intel TBB, to measure and evaluate the performance of Taskflow. You can select an implementation by passing the option -m.

    ~$ ./bench_graph_traversal -m tf     # run the Taskflow implementation (default)
    ~$ ./bench_graph_traversal -m tbb    # run the TBB implementation
    ~$ ./bench_graph_traversal -m omp    # run the OpenMP implementation

Specify the Number of Threads

You can configure the number of threads used to run a benchmark instance by passing the option -t. The default value is one.

    # run the Taskflow implementation using 4 threads
    ~$ ./bench_graph_traversal -m tf -t 4

Depending on your environment, you may need to use taskset to set the CPU affinity of the running process. This lets the OS scheduler keep the process on the same CPU(s) for as long as practical, which helps performance.

    # pin the process to 4 CPUs: CPU 0, CPU 1, CPU 2, and CPU 3
    ~$ taskset -c 0-3 ./bench_graph_traversal -t 4

Specify the Number of Rounds

Each benchmark instance evaluates the runtime of the implementation at different problem sizes. Each problem size corresponds to one iteration. You can configure the number of rounds per iteration by passing the option -r; the reported runtime is the average over those rounds. The default value is one.

    # measure the runtime as an average of 10 runs
    ~$ ./bench_graph_traversal -r 10
    |V|+|E|     Runtime
          2       0.109    # the runtime value 0.109 is an average of 10 runs
        842       0.298
     ......      ......
     619802      73.135
     664771      74.436
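To compare the three models across several thread counts, the options above can be combined into a sweep. The following sketch only generates the command lines and does not launch anything. The name sweep_commands and the pin parameter are hypothetical. When pin is enabled, it assumes a Linux machine with taskset available and CPUs numbered contiguously from 0, so a run with N threads is pinned to CPUs 0 through N-1.

```python
from itertools import product


def sweep_commands(executable, models, thread_counts, num_rounds=1, pin=False):
    """Return one argument list per (model, thread count) combination."""
    commands = []
    for model, threads in product(models, thread_counts):
        cmd = [executable, "-m", model, "-t", str(threads), "-r", str(num_rounds)]
        if pin:
            # Assumption: CPUs 0..threads-1 exist and may be used for pinning.
            cpu_list = "0" if threads == 1 else f"0-{threads - 1}"
            cmd = ["taskset", "-c", cpu_list] + cmd
        commands.append(cmd)
    return commands


commands = sweep_commands(
    "./bench_graph_traversal", ["tf", "tbb", "omp"], [1, 2, 4], num_rounds=10, pin=True
)
for cmd in commands:
    print(" ".join(cmd))
```

Generating the commands separately from running them makes it easy to inspect the sweep first. Each list can then be passed to subprocess.run from the benchmarks/ folder.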