mesytec-mnode/external/taskflow-3.8.0/doxygen/cookbook/dependent_async_tasking.dox

namespace tf {

/** @page DependentAsyncTasking Asynchronous Tasking with Dependencies

This chapters discusses how to create a task graph dynamically
using asynchronous tasks,
which is extremely beneficial for workloads that want to
(1) explore task graph parallelism out of dynamic control flow
or
(2) overlap task graph creation time with individual task execution time.
We recommend that you first read @ref AsyncTasking before digesting this chapter.

@tableofcontents

@section CreateADynamicTaskGraph Create a Dynamic Task Graph

When the construct-and-run model of a task graph is not possible in your application,
you can use tf::Executor::dependent_async and tf::Executor::silent_dependent_async
to create a task graph dynamically.
This type of parallelism is also known as <i>on-the-fly</i> task graph parallelism,
which offers great flexibility for expressing dynamic task graph parallelism.
The example below dynamically creates a task graph of
four dependent async tasks, @c A, @c B, @c C, and @c D, where @c A runs before @c B and @c C
and @c D runs after @c B and @c C:

@dotfile images/simple.dot

@code{.cpp}
tf::Executor executor;
tf::AsyncTask A = executor.silent_dependent_async([](){ printf("A\n"); });
tf::AsyncTask B = executor.silent_dependent_async([](){ printf("B\n"); }, A);
tf::AsyncTask C = executor.silent_dependent_async([](){ printf("C\n"); }, A);
auto [D, fuD] = executor.dependent_async([](){ printf("D\n"); }, B, C);
fuD.get();  // wait for D to finish, which in turn means A, B, C have finished
@endcode

Both tf::Executor::dependent_async and tf::Executor::silent_dependent_async
create a task of type tf::AsyncTask to run the given function asynchronously.
Additionally, tf::Executor::dependent_async returns a @std_future
that eventually holds the result of the execution.
When returning from both calls, the executor has scheduled a worker
to run the task whenever its dependencies are met.
That is, task execution happens @em simultaneously
with the creation of the task graph, which is different from constructing a %Taskflow
and running it from an executor, illustrated in the figure below:

@image html images/dependent_async_execution_diagram.png


Since this model only allows relating a dependency from the current task
to a previously created task,
you need a correct topological order of graph expression.
In our example, there are only two possible topological orderings,
either @c ABCD or @c ACBD.
The code below shows another feasible order of expressing this
dynamic task graph parallelism:

@code{.cpp}
tf::Executor executor;
tf::AsyncTask A = executor.silent_dependent_async([](){ printf("A\n"); });
tf::AsyncTask C = executor.silent_dependent_async([](){ printf("C\n"); }, A);
tf::AsyncTask B = executor.silent_dependent_async([](){ printf("B\n"); }, A);
auto [D, fuD] = executor.dependent_async([](){ printf("D\n"); }, B, C);
fuD.get();  // wait for D to finish, which in turn means A, B, C have finished
@endcode

In addition to using @std_future to synchronize the execution,
you can use tf::Executor::wait_for_all to wait for all scheduled tasks
to finish:

@code{.cpp}
tf::Executor executor;
tf::AsyncTask A = executor.silent_dependent_async([](){ printf("A\n"); });
tf::AsyncTask B = executor.silent_dependent_async([](){ printf("B\n"); }, A);
tf::AsyncTask C = executor.silent_dependent_async([](){ printf("C\n"); }, A);
tf::AsyncTask D = executor.silent_dependent_async([](){ printf("D\n"); }, B, C);
executor.wait_for_all();
@endcode

@section SpecifyARagneOfDependentAsyncTasks Specify a Range of Dependent Async Tasks

Both tf::Executor::dependent_async(F&& func, Tasks&&... tasks) and
tf::Executor::silent_dependent_async(F&& func, Tasks&&... tasks)
accept an arbitrary number of tasks in the dependency list.
If the number of dependent tasks is unknown at programming time,
such as those relying on runtime variables,
you can use the following two overloads
to specify dependent tasks in an iterable range <tt>[first, last)</tt>:

  + tf::Executor::dependent_async(F&& func, I first, I last)
  + tf::Executor::silent_dependent_async(F&& func, I first, I last)

The code below creates an asynchronous task that depends on
@c N previously created asynchronous tasks stored in a vector,
where @c N is a runtime variable:

@code{.cpp}
tf::Executor executor;
std::vector<tf::AsyncTask> dependents;
for(size_t i=0; i<N; i++) {  // N is a runtime variable
  dependents.push_back(executor.silent_dependent_async([](){}));
}
executor.silent_dependent_async([](){}, dependents.begin(), dependents.end());
executor.wait_for_all();
@endcode

@section UnderstandTheLifeTimeOfADependentAsyncTask Understand the Lifetime of a Dependent Async Task

A tf::AsyncTask is a lightweight handle that retains @em shared ownership
of a dependent async task created by an executor.
This shared ownership ensures that the async task remains alive when
adding it to the dependency list of another async task,
thus avoiding the classical [ABA problem](https://en.wikipedia.org/wiki/ABA_problem).

@code{.cpp}
// main thread retains shared ownership of async task A
tf::AsyncTask A = executor.silent_dependent_async([](){});

// task A remains alive (i.e., at least one ref count by the main thread)
// when being added to the dependency list of async task B
tf::AsyncTask B = executor.silent_dependent_async([](){}, A);
@endcode

Currently, tf::AsyncTask is implemented based on the logic of C++ smart pointer
std::shared_ptr and is considered cheap to copy or move as long as only
a handful of objects own it.
When a worker completes an async task, it will remove the task from the executor,
decrementing the number of shared owners by one.
If that counter reaches zero, the task is destroyed.

@section CreateADynamicTaskGraphByMultipleThreads Create a Dynamic Task Graph by Multiple Threads

You can use multiple threads to create a dynamic task graph as long as
the order of simultaneously creating tasks is topologically correct.
The example below uses creates a dynamic task graph
using three threads (including the main thread),
where task @c A runs before task @c B and task @c C:

@code{.cpp}
tf::Executor executor;

// main thread creates a dependent async task A
tf::AsyncTask A = executor.silent_dependent_async([](){});

// spawn a new thread to create an async task B that runs after A
std::thread t1([&](){
  tf::AsyncTask B = executor.silent_dependent_async([](){}, A);
});

// spawn a new thread to create an async task C that runs after A
std::thread t2([&](){
  tf::AsyncTask C = executor.silent_dependent_async([](){}, A);
});

executor.wait_for_all();
t1.join();
t2.join();
@endcode

Regardless of @c t1 runs before or after @c t2,
the resulting topological order is always correct with the graph definition,
either @c ABC or @c ACB.

@section QueryTheComppletionStatusOfDependentAsyncTasks Query the Completion Status of Dependent Async Tasks

When you create a dependent async task, you can query its completion status by tf::AsyncTask::is_done,
which returns @c true upon completion or @c false otherwise.
A completed dependent async task indicates that a worker has executed its associated callable.

@code{.cpp}
// create a dependent async task that returns 100
auto [task, fu] = executor.dependent_async([](){ return 100; });

// loops until the dependent async task completes
while(!task.is_done());
assert(fu.get() == 100);
@endcode

tf::AsyncTask::is_done is useful when you need to wait on the result of a dependent async task
before moving onto the next program instruction.
Often, tf::AsyncTask is used together with tf::Executor::corun_until to keep a worker awake
in its work-stealing loop to avoid deadlock (see @ref ExecuteATaskflowFromAnInternalWorker
for more details).
For instance, the code below implements the famous Fibonacci sequence using recursive
asynchronous tasking:

@code{.cpp}
tf::Executor executor;
std::function<int(int)> fibonacci;

// calculate the Fibonacci sequence: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89
fibonacci = [&](int N){
  if (N < 2) {
    return N;
  }
  auto [t1, fu1] = executor.dependent_async(std::bind(fibonacci, N-1));
  auto [t2, fu2] = executor.dependent_async(std::bind(fibonacci, N-2));
  executor.corun_until([&](){ return t1.is_done() && t2.is_done(); });
  return fu1.get() + fu2.get();
};

auto [task, fib11] = executor.dependent_async(std::bind(fibonacci, 11));
assert(fib11 == 89);  // the 11-th Fibonacci number is 89
@endcode

*/

}