mesytec-mnode/external/taskflow-3.8.0/docs/matrix_multiplication.html
2025-01-04 01:25:05 +01:00

140 lines
19 KiB
HTML

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<title>Learning from Examples &raquo; Matrix Multiplication | Taskflow QuickStart</title>
<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Source+Sans+Pro:400,400i,600,600i%7CSource+Code+Pro:400,400i,600" />
<link rel="stylesheet" href="m-dark+documentation.compiled.css" />
<link rel="icon" href="favicon.ico" type="image/vnd.microsoft.icon" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta name="theme-color" content="#22272e" />
</head>
<body>
<header><nav id="navigation">
<div class="m-container">
<div class="m-row">
<span id="m-navbar-brand" class="m-col-t-8 m-col-m-none m-left-m">
<a href="https://taskflow.github.io"><img src="taskflow_logo.png" alt="" />Taskflow</a> <span class="m-breadcrumb">|</span> <a href="index.html" class="m-thin">QuickStart</a>
</span>
<div class="m-col-t-4 m-hide-m m-text-right m-nopadr">
<a href="#search" class="m-doc-search-icon" title="Search" onclick="return showSearch()"><svg style="height: 0.9rem;" viewBox="0 0 16 16">
<path id="m-doc-search-icon-path" d="m6 0c-3.31 0-6 2.69-6 6 0 3.31 2.69 6 6 6 1.49 0 2.85-0.541 3.89-1.44-0.0164 0.338 0.147 0.759 0.5 1.15l3.22 3.79c0.552 0.614 1.45 0.665 2 0.115 0.55-0.55 0.499-1.45-0.115-2l-3.79-3.22c-0.392-0.353-0.812-0.515-1.15-0.5 0.895-1.05 1.44-2.41 1.44-3.89 0-3.31-2.69-6-6-6zm0 1.56a4.44 4.44 0 0 1 4.44 4.44 4.44 4.44 0 0 1-4.44 4.44 4.44 4.44 0 0 1-4.44-4.44 4.44 4.44 0 0 1 4.44-4.44z"/>
</svg></a>
<a id="m-navbar-show" href="#navigation" title="Show navigation"></a>
<a id="m-navbar-hide" href="#" title="Hide navigation"></a>
</div>
<div id="m-navbar-collapse" class="m-col-t-12 m-show-m m-col-m-none m-right-m">
<div class="m-row">
<ol class="m-col-t-6 m-col-m-none">
<li><a href="pages.html">Handbook</a></li>
<li><a href="namespaces.html">Namespaces</a></li>
</ol>
<ol class="m-col-t-6 m-col-m-none" start="3">
<li><a href="annotated.html">Classes</a></li>
<li><a href="files.html">Files</a></li>
<li class="m-show-m"><a href="#search" class="m-doc-search-icon" title="Search" onclick="return showSearch()"><svg style="height: 0.9rem;" viewBox="0 0 16 16">
<use href="#m-doc-search-icon-path" />
</svg></a></li>
</ol>
</div>
</div>
</div>
</div>
</nav></header>
<main><article>
<div class="m-container m-container-inflatable">
<div class="m-row">
<div class="m-col-l-10 m-push-l-1">
<h1>
<span class="m-breadcrumb"><a href="Examples.html">Learning from Examples</a> &raquo;</span>
Matrix Multiplication
</h1>
<nav class="m-block m-default">
<h3>Contents</h3>
<ul>
<li><a href="#MatrixMultiplicationProblem">Problem Formulation</a></li>
<li><a href="#MatrixMultiplicationParallelPattern">Parallel Patterns</a></li>
<li><a href="#MatrixMultiplicationBenchmarking">Benchmarking</a></li>
</ul>
</nav>
<p>We study the classic problem, <em>2D matrix multiplication</em>. We will start with a short introduction about the problem and then discuss how to solve it parallel CPUs.</p><section id="MatrixMultiplicationProblem"><h2><a href="#MatrixMultiplicationProblem">Problem Formulation</a></h2><p>We are multiplying two matrices, <code>A</code> (<code>MxK</code>) and <code>B</code> (<code>KxN</code>). The numbers of columns of <code>A</code> must match the number of rows of <code>B</code>. The output matrix <code>C</code> has the shape of (MxN) where <code>M</code> is the rows of <code>A</code> and <code>N</code> the columns of <code>B</code>. The following example multiplies a <code>3x3</code> matrix with a <code>3x2</code> matrix to derive a <code>3x2</code> matrix.</p><img class="m-image" src="matrix_multiplication_1.png" alt="Image" style="width: 50%;" /><p>As a general view, for each element of <code>C</code> we iterate a complete row of <code>A</code> and a complete column of <code>B</code>, multiplying each element and summing them.</p><img class="m-image" src="matrix_multiplication_2.png" alt="Image" style="width: 50%;" /><p>We can implement matrix multiplication using three nested loops.</p><pre class="m-code"><span class="k">for</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">m</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="n">m</span><span class="o">&lt;</span><span class="n">M</span><span class="p">;</span><span class="w"> </span><span class="n">m</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">n</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="n">n</span><span class="o">&lt;</span><span class="n">N</span><span class="p">;</span><span class="w"> </span><span class="n">n</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">C</span><span class="p">[</span><span class="n">m</span><span class="p">][</span><span class="n">n</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="k">for</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">k</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="n">k</span><span class="o">&lt;</span><span class="n">K</span><span class="p">;</span><span class="w"> </span><span class="n">k</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">C</span><span class="p">[</span><span class="n">m</span><span class="p">][</span><span class="n">n</span><span class="p">]</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">A</span><span class="p">[</span><span class="n">m</span><span class="p">][</span><span class="n">k</span><span class="p">]</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">B</span><span class="p">[</span><span class="n">k</span><span class="p">][</span><span class="n">n</span><span class="p">];</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span></pre></section><section id="MatrixMultiplicationParallelPattern"><h2><a href="#MatrixMultiplicationParallelPattern">Parallel Patterns</a></h2><p>At a fine-grained level, computing each element of <code>C</code> is independent of each other. Similarly, computing each row of <code>C</code> or each column of <code>C</code> is also independent of one another. With task parallelism, we prefer <em>coarse-grained</em> model to have each task perform rather large computation to amortize the overhead of creating and scheduling tasks. In this case, we avoid intensive tasks each working on only a single element. by creating a task per row of <code>C</code> to multiply a row of <code>A</code> by every column of <code>B</code>.</p><pre class="m-code"><span class="c1">// C = A * B</span>
<span class="c1">// A is a MxK matrix, B is a KxN matrix, and C is a MxN matrix</span>
<span class="kt">void</span><span class="w"> </span><span class="nf">matrix_multiplication</span><span class="p">(</span><span class="kt">int</span><span class="o">**</span><span class="w"> </span><span class="n">A</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="o">**</span><span class="w"> </span><span class="n">B</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="o">**</span><span class="w"> </span><span class="n">C</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">M</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">K</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">N</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">tf</span><span class="o">::</span><span class="n">Taskflow</span><span class="w"> </span><span class="n">taskflow</span><span class="p">;</span>
<span class="w"> </span><span class="n">tf</span><span class="o">::</span><span class="n">Executor</span><span class="w"> </span><span class="n">executor</span><span class="p">;</span>
<span class="w"> </span><span class="k">for</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">m</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="n">m</span><span class="o">&lt;</span><span class="n">M</span><span class="p">;</span><span class="w"> </span><span class="o">++</span><span class="n">m</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">taskflow</span><span class="p">.</span><span class="n">emplace</span><span class="p">([</span><span class="n">m</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="p">]</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">n</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="n">n</span><span class="o">&lt;</span><span class="n">N</span><span class="p">;</span><span class="w"> </span><span class="n">n</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">k</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="n">k</span><span class="o">&lt;</span><span class="n">K</span><span class="p">;</span><span class="w"> </span><span class="n">k</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">C</span><span class="p">[</span><span class="n">m</span><span class="p">][</span><span class="n">n</span><span class="p">]</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">A</span><span class="p">[</span><span class="n">m</span><span class="p">][</span><span class="n">k</span><span class="p">]</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">B</span><span class="p">[</span><span class="n">k</span><span class="p">][</span><span class="n">n</span><span class="p">];</span><span class="w"> </span><span class="c1">// inner product</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">});</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">executor</span><span class="p">.</span><span class="n">run</span><span class="p">(</span><span class="n">taskflow</span><span class="p">).</span><span class="n">wait</span><span class="p">();</span>
<span class="p">}</span></pre><p>Instead of creating tasks one-by-one over a loop, you can leverage <a href="classtf_1_1FlowBuilder.html#a3b132bd902331a11b04b4ad66cf8bf77" class="m-doc">Taskflow::<wbr />for_each_index</a> to create a <em>parallel-for</em> task. A parallel-for task spawns a subflow to perform parallel iterations over the given range.</p><pre class="m-code"><span class="c1">// perform parallel iterations on the range [0, M) with the step size of 1</span>
<span class="n">tf</span><span class="o">::</span><span class="n">Task</span><span class="w"> </span><span class="n">task</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">taskflow</span><span class="p">.</span><span class="n">for_each_index</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">M</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="p">[</span><span class="o">&amp;</span><span class="p">]</span><span class="w"> </span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">m</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">n</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="n">n</span><span class="o">&lt;</span><span class="n">N</span><span class="p">;</span><span class="w"> </span><span class="n">n</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">k</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="n">k</span><span class="o">&lt;</span><span class="n">K</span><span class="p">;</span><span class="w"> </span><span class="n">k</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">C</span><span class="p">[</span><span class="n">m</span><span class="p">][</span><span class="n">n</span><span class="p">]</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">A</span><span class="p">[</span><span class="n">m</span><span class="p">][</span><span class="n">k</span><span class="p">]</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">B</span><span class="p">[</span><span class="n">k</span><span class="p">][</span><span class="n">n</span><span class="p">];</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span>
<span class="p">});</span><span class="w"> </span></pre><p>Please visit <a href="ParallelIterations.html" class="m-doc">Parallel Iterations</a> for more details.</p></section><section id="MatrixMultiplicationBenchmarking"><h2><a href="#MatrixMultiplicationBenchmarking">Benchmarking</a></h2><p>Based on the discussion above, we compare the runtime of computing various matrix sizes of <code>A</code>, <code>B</code>, and <code>C</code> between a sequential CPU and parallel CPUs on a machine of 12 Intel i7-8700 CPUs at 3.2 GHz.</p><table class="m-table"><thead><tr><th>A</th><th>B</th><th>C</th><th>CPU Sequential</th><th>CPU Parallel</th></tr></thead><tbody><tr><td>10x10</td><td>10x10</td><td>10x10</td><td>0.142 ms</td><td>0.414 ms</td></tr><tr><td>100x100</td><td>100x100</td><td>100x100</td><td>1.641 ms</td><td>0.733 ms</td></tr><tr><td>1000x1000</td><td>1000x1000</td><td>1000x1000</td><td>1532 ms</td><td>504 ms</td></tr><tr><td>2000x2000</td><td>2000x2000</td><td>2000x2000</td><td>25688 ms</td><td>4387 ms</td></tr><tr><td>3000x3000</td><td>3000x3000</td><td>3000x3000</td><td>104838 ms</td><td>16170 ms</td></tr><tr><td>4000x4000</td><td>4000x4000</td><td>4000x4000</td><td>250133 ms</td><td>39646 ms</td></tr></tbody></table><p>The speed-up of parallel execution becomes clean as we increase the problem size. For example, at <code>4000x4000</code>, the parallel runtime is 6.3 times faster than the sequential runtime.</p></section>
</div>
</div>
</div>
</article></main>
<div class="m-doc-search" id="search">
<a href="#!" onclick="return hideSearch()"></a>
<div class="m-container">
<div class="m-row">
<div class="m-col-m-8 m-push-m-2">
<div class="m-doc-search-header m-text m-small">
<div><span class="m-label m-default">Tab</span> / <span class="m-label m-default">T</span> to search, <span class="m-label m-default">Esc</span> to close</div>
<div id="search-symbolcount">&hellip;</div>
</div>
<div class="m-doc-search-content">
<form>
<input type="search" name="q" id="search-input" placeholder="Loading &hellip;" disabled="disabled" autofocus="autofocus" autocomplete="off" spellcheck="false" />
</form>
<noscript class="m-text m-danger m-text-center">Unlike everything else in the docs, the search functionality <em>requires</em> JavaScript.</noscript>
<div id="search-help" class="m-text m-dim m-text-center">
<p class="m-noindent">Search for symbols, directories, files, pages or
modules. You can omit any prefix from the symbol or file path; adding a
<code>:</code> or <code>/</code> suffix lists all members of given symbol or
directory.</p>
<p class="m-noindent">Use <span class="m-label m-dim">&darr;</span>
/ <span class="m-label m-dim">&uarr;</span> to navigate through the list,
<span class="m-label m-dim">Enter</span> to go.
<span class="m-label m-dim">Tab</span> autocompletes common prefix, you can
copy a link to the result using <span class="m-label m-dim"></span>
<span class="m-label m-dim">L</span> while <span class="m-label m-dim"></span>
<span class="m-label m-dim">M</span> produces a Markdown link.</p>
</div>
<div id="search-notfound" class="m-text m-warning m-text-center">Sorry, nothing was found.</div>
<ul id="search-results"></ul>
</div>
</div>
</div>
</div>
</div>
<script src="search-v2.js"></script>
<script src="searchdata-v2.js" async="async"></script>
<footer><nav>
<div class="m-container">
<div class="m-row">
<div class="m-col-l-10 m-push-l-1">
<p>Taskflow handbook is part of the <a href="https://taskflow.github.io">Taskflow project</a>, copyright © <a href="https://tsung-wei-huang.github.io/">Dr. Tsung-Wei Huang</a>, 2018&ndash;2024.<br />Generated by <a href="https://doxygen.org/">Doxygen</a> 1.9.1 and <a href="https://mcss.mosra.cz/">m.css</a>.</p>
</div>
</div>
</div>
</nav></footer>
</body>
</html>