namespace tf {

/** @page BenchmarkTaskflow Benchmark Taskflow

@tableofcontents

@section CompileAndRunBenchmarks Compile and Run Benchmarks

To build the benchmark code,
set the CMake option @c TF_BUILD_BENCHMARKS to @c ON as follows:

@code{.shell-session}
# under /taskflow/build
~$ cmake ../ -DTF_BUILD_BENCHMARKS=ON
~$ make
@endcode

After the build succeeds, you can find all benchmark
instances in the @c benchmarks/ folder.
Run an instance by invoking its executable from that folder.

@code{.shell-session}
~$ cd benchmarks && ls
bench_black_scholes bench_binary_tree bench_graph_traversal ...
~$ ./bench_graph_traversal
|V|+|E|     Runtime
2           0.197
842         0.198
3284        0.488
7288        0.774
...         ...
...         ...
619802      75.135
664771      77.436
711200      83.957
@endcode

You can display the help message by giving the option @c --help.

@code{.shell-session}
~$ ./bench_graph_traversal --help
Graph Traversal
Usage: ./bench_graph_traversal [OPTIONS]

Options:
  -h,--help                   Print this help message and exit
  -t,--num_threads UINT       number of threads (default=1)
  -r,--num_rounds UINT        number of rounds (default=1)
  -m,--model TEXT             model name tbb|omp|tf (default=tf)
@endcode

We currently implement the following instances, which are commonly used by
the parallel computing community to evaluate system performance.

| Instance | Description |
| :-: | :-: |
| bench_binary_tree | traverses a complete binary tree |
| bench_black_scholes | computes option pricing with the Black-Scholes model |
| bench_graph_traversal | traverses a randomly generated directed acyclic graph |
| bench_linear_chain | traverses a linear chain of tasks |
| bench_mandelbrot | exploits imbalanced workloads in a Mandelbrot set |
| bench_matrix_multiplication | multiplies two 2D matrices |
| bench_mnist | trains a neural network-based image classifier on the MNIST dataset |
| bench_parallel_sort | sorts a range of items |
| bench_reduce_sum | sums a range of items using reduction |
| bench_wavefront | propagates computations in a 2D grid |
| bench_linear_pipeline | schedules a pipeline on a linear chain of pipes |
| bench_graph_pipeline | schedules a pipeline on a graph of pipes |


@section ConfigureRunOptions Configure Run Options

Every benchmark instance accepts the same set of common options:

| option | value | function |
| :-: | :-: | :-: |
| @c -h | none | display the help message |
| @c -t | integer | configure the number of threads to run |
| @c -r | integer | configure the number of rounds to run |
| @c -m | string | select the model to run: tbb, omp, or tf |

You can configure the benchmarking environment by passing different options.

@subsection SpecifyTheRunModel Specify the Run Model

In addition to a %Taskflow-based implementation for each benchmark instance,
we have implemented two baseline models using state-of-the-art parallel
programming libraries, @OpenMP and @TBB,
to measure and evaluate the performance of %Taskflow.
You can select an implementation by passing the option @c -m.

@code{.shell-session}
~$ ./bench_graph_traversal -m tf   # run the Taskflow implementation (default)
~$ ./bench_graph_traversal -m tbb  # run the TBB implementation
~$ ./bench_graph_traversal -m omp  # run the OpenMP implementation
@endcode
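
To compare the three implementations on one instance, the runs above can be
scripted. The following is a minimal sketch and not part of the %Taskflow
repository: the helper name @c run_all_models and the one-file-per-model
output layout are hypothetical, and it assumes the binary was built with all
three models available.

```shell
# Hypothetical helper: run one benchmark binary once per model and
# save each model's output table to <outdir>/<model>.txt.
# usage:   run_all_models <benchmark-binary> <outdir> [extra options...]
# example: run_all_models ./bench_graph_traversal results -t 4
run_all_models() {
  bench=$1
  outdir=$2
  shift 2
  mkdir -p "$outdir" || return 1
  for model in tf tbb omp; do
    # -m selects the implementation; extra options (e.g. -t 4) pass through
    "$bench" -m "$model" "$@" > "$outdir/$model.txt" || return 1
  done
}
```

Passing the same extra options to every model keeps the thread and round
counts identical across the runs being compared.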

@subsection SpecifyTheNumberOfThreads Specify the Number of Threads

You can configure the number of threads to run a benchmark instance
by passing the option @c -t.
The default value is one.

@code{.shell-session}
# run the Taskflow implementation using 4 threads
~$ ./bench_graph_traversal -m tf -t 4
@endcode

Depending on your environment, you may need to use @c taskset to set the CPU
affinity of the running process.
This lets the OS scheduler keep the process on the same CPU(s) for as long as
practical, which benefits performance.

@code{.shell-session}
# pin the process to four CPUs: CPU 0, CPU 1, CPU 2, and CPU 3
~$ taskset -c 0-3 ./bench_graph_traversal -t 4
@endcode
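
A scaling study usually repeats the pinned run at several thread counts. The
sketch below is one possible way to script that and is not shipped with
%Taskflow: the helper name @c sweep_threads and the @c TASKSET override
variable are hypothetical, and it assumes that CPUs are numbered contiguously
from 0 and that @c taskset is installed.

```shell
# Hypothetical helper: run a benchmark with 1, 2, 4, ... threads (up to
# <max-threads>), pinning each run to as many CPUs as it has threads.
# usage:   sweep_threads <benchmark-binary> <max-threads> [extra options...]
# example: sweep_threads ./bench_graph_traversal 8 -m tf
sweep_threads() {
  bench=$1
  max=$2
  shift 2
  t=1
  while [ "$t" -le "$max" ]; do
    echo "# threads=$t"
    # CPU list 0-(t-1): one CPU per thread; TASKSET may name another launcher
    "${TASKSET:-taskset}" -c "0-$((t - 1))" "$bench" -t "$t" "$@" || return 1
    t=$((t * 2))
  done
}
```

Doubling the thread count keeps the sweep short. Replace the update of @c t
with an increment if every thread count is of interest.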

@subsection SpecifyTheNumberOfRounds Specify the Number of Rounds

Each benchmark instance evaluates the runtime of the implementation
at different problem sizes.
Each problem size corresponds to one iteration.
You can configure the number of rounds per iteration by passing the option
@c -r; the reported runtime is the average over those rounds.

@code{.shell-session}
# report the runtime of each problem size as an average over 10 rounds
~$ ./bench_graph_traversal -r 10
|V|+|E|     Runtime
2           0.109    # the runtime value 0.109 is an average of 10 runs
842         0.298
...         ...
619802      73.135
664771      74.436
@endcode
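
Averaged tables saved from two runs can be compared row by row. The following
sketch is not part of %Taskflow: the helper name @c runtime_ratio is
hypothetical, and it assumes each file holds the two-column output shown
above (a header row, then one problem size and one runtime per line).

```shell
# Hypothetical helper: for every problem size present in both tables, print
# the size and the ratio baseline-runtime / candidate-runtime.
# A ratio above 1 means the candidate run was faster at that size.
# usage: runtime_ratio <baseline.txt> <candidate.txt>
runtime_ratio() {
  awk '
    FNR == 1          { next }                 # skip the header row of each file
    $1 !~ /^[0-9]+$/  { next }                 # skip rows that are not data
    NR == FNR         { base[$1] = $2; next }  # first file: runtime per size
    ($1 in base) && $2 > 0 { printf "%s %.3f\n", $1, base[$1] / $2 }
  ' "$1" "$2"
}
```

Both runs should use the same @c -r and @c -t values so that the ratio
reflects the implementations and not the run configuration.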
*/
}