Low Level Bits 🇺🇦 https://lowlevelbits.org/ Alex Denisov Fri, 02 May 2025 02:00:00 +0100 Different ways to build LLVM/MLIR tools https://lowlevelbits.org/different-ways-to-build-llvm/mlir-tools/ Fri, 02 May 2025 Alex Denisov <div id="wrap" class="text-center"> <div style="display: inline-block;" class="content-upgrade"> <div style="margin: 6px;"> This is a mirror of the Substack article <br/> <a href="https://lowlevelbits.com/p/different-ways-to-build-llvmmlir"> Different ways to build LLVM/MLIR tools </a><br/> The most recent version is there. </div> </div> </div> <p>The LLVM and MLIR frameworks are typically used to build compilers for various use cases, but I’m using the word “tools” here to cover a broader set of possibilities (compilers, language plugins, analyzers, etc.).</p> <p>If you want to build such a tool, then you obviously need to somehow “connect” your code to the LLVM or MLIR libraries.</p> <p>In this article, I’m not going to cover how to do the build itself (I believe there are plenty of great resources out there already). Instead, I focus on the various ways to actually obtain those LLVM libraries and on the features each option brings with it.</p> <p>I’m also considering the simplest integration: CMake and C++, no fancy build systems, no fancy languages.
Different build systems and languages would require different considerations.</p> <p>Effectively, this article is organized as a table with different ways to get LLVM/MLIR on one axis, and various available features on the other.</p> <p>The actual table is at the very end.</p> <h2>Features</h2> <p>Here is a non-exhaustive list of different features that I consider important.<br /> If you believe something is missing, please leave a comment.</p> <h3>(Fast) Build Times</h3> <p>Obviously, everyone wants to have fast build times. There are two slightly different angles to this story: if you decide to build LLVM from scratch, it will obviously take a long time. But even if you don’t build LLVM from scratch, you may still have to wait way too long due to static linking.</p> <p>Also, building LLVM/MLIR from scratch without caching is going to be a huge bottleneck on CI.</p> <h3>Debugging Experience</h3> <p>Once in a while things go south, so you need to debug not only your code, but also look into what’s “wrong” inside of LLVM.<br /> What I mean here is not just having debug info and assertions enabled, but also facilities like <code>-debug-only=</code>.<br /> One example from MLIR is debugging long conversion pipelines/pattern matching, when things don’t quite work the way you’d expect.</p> <h3>Testing Infrastructure</h3> <p>Both LLVM and MLIR heavily rely on integration testing using <a href="https://llvm.org/docs/CommandGuide/lit.html">lit</a> and <a href="https://llvm.org/docs/CommandGuide/FileCheck.html">filecheck</a>.<br /> Unfortunately, neither of them is part of the “official distribution”. While the official lit can be installed as a separate Python package, for filecheck your best bet is a third-party solution; these are actually pretty good starting points if you don’t need very advanced filecheck features (e.g.
<a href="https://github.com/mull-project/filecheck.py">mull-project/filecheck.py</a> or <a href="https://github.com/AntonLydike/filecheck">AntonLydike/filecheck</a>).</p> <h3>Bleeding Edge</h3> <p>This is also an important factor. As a starting point, you can just use whatever is available from your default OS package manager (e.g. apt or Homebrew), but at some point you may need to pick something much newer due to bugfixes or new features.</p> <h3>Dynamic Linking</h3> <p>This is more of a niche feature, but it is very important if you are working on any kind of plugin, or if you don’t want to deal with long static linking times during development.</p> <h2>Different LLVM distributions</h2> <p>Here I’m considering more or less cross-platform solutions, so I’m not covering the Debian/Ubuntu-specific <a href="https://apt.llvm.org">repo</a>. That leaves us with three options: (semi-)official versions from an OS package manager, precompiled binaries (submitted by volunteers), and BYOB: the “bring your own build” story.</p> <h3>(Semi-)official OS packages</h3> <p>These are the packages maintained by the OS maintainers and not necessarily by the LLVM maintainers. These packages are the easiest way to start: just call <code>apt/brew install llvm</code> and you are done.</p> <p>The packages come with dynamic libraries, which enable both fast build times and plugin support.
The packages usually contain everything that is needed for testing, but they of course lack the debugging story.</p> <p>The other inconvenience might be the age of the package: depending on the OS and its stability guarantees, the package might be way too old for your use case.<br /> For LLVM it’s probably fine, but it gets trickier for MLIR as the APIs are less stable across recent versions.</p> <h3>Precompiled packages</h3> <p>These packages are available as release artifacts, for example <a href="https://github.com/llvm/llvm-project/releases/tag/llvmorg-20.1.4">20.1.4</a> or <a href="https://github.com/llvm/llvm-project/releases/tag/llvmorg-18.1.8">18.1.8</a>.</p> <p>On the one hand, this is the most convenient way to get those binaries: the most recent binaries appear there just a few days after the official release.<br /> On the other hand, some packages are prepared by volunteers, so some releases might be missing the build for your specific OS/version, and the presence of e.g. the <a href="https://github.com/llvm/llvm-project/releases/download/llvmorg-20.1.4/LLVM-20.1.4-Linux-X64.tar.xz">LLVM-20.1.4-Linux-X64.tar.xz</a> build doesn’t guarantee compatibility with e.g.
Ubuntu 20.04 due to the “old” glibc.</p> <p>Just as with the official OS packages, the debugging story is not there: the packages are built in release mode.</p> <p>In general, these packages are kind of “best effort”: if it works, great; if not, well, you are out of luck.</p> <h3>Build your own LLVM</h3> <p>This is obviously the most flexible approach: you can build any version/commit on any supported OS, you get the debugging facilities if you wish, all the testing infrastructure is there, and it’s your choice whether to use dynamic or static linking.</p> <p>But of course the price is the long build times, especially if you want to get more than just LLVM (e.g., MLIR or clang libraries).</p> <h2>Summary</h2> <p>In conclusion, the exact option depends on your use case.<br /> To start with, you can pick the official package available on your OS and then decide whether you need more.</p> <p>If you need the newest version, then the precompiled packages from the LLVM releases page are your best bet, especially when it comes to CI integration.</p> <p>However, at some point you may consider building your own version of the LLVM/MLIR libraries for local development, while still sticking to the precompiled packages for CI checks.</p> <p>To wrap it up, here is a table that sums it all up.</p> <p><a href="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F670d014b-2b4f-4576-982b-a7066b2d4dcd_3268x1716.png"><img alt="" src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F670d014b-2b4f-4576-982b-a7066b2d4dcd_3268x1716.png" /></a></p> Building LLVM plugins with Bazel https://lowlevelbits.org/building-llvm-plugins-with-bazel/ Tue, 01 Apr 2025 Alex Denisov <div id="wrap"
class="text-center"> <div style="display: inline-block;" class="content-upgrade"> <div style="margin: 6px;"> This is a mirror of the Substack article <br/> <a href="https://lowlevelbits.com/p/building-llvm-plugin-with-bazel"> Building LLVM plugins with Bazel </a><br/> The most recent version is there. </div> </div> </div> <p>One of the premises of <a href="https://bazel.build">Bazel</a> is to provide reproducible, hermetic builds, thus you shouldn’t depend on whatever is installed on the host OS, and all the dependencies are typically managed by Bazel directly.</p> <p>However, if you want to build plugins for LLVM (or any other project really), then you should link against the specific versions installed on the user’s system.</p> <p>As I’m working on <a href="https://github.com/mull-project/mull">such a plugin</a>, it’s been a long-standing “dream” of mine to migrate to Bazel for the many benefits it provides. Over time, the existing build system (CMake) has grown its capabilities and I have certain requirements for how the builds should work. Namely:</p> <ul> <li>the plugin must work on different OS versions (Ubuntu 20.xx-24.xx, macOS)</li> <li>the plugin must support different versions of LLVM, which are different on each OS (e.g., LLVM 12 on Ubuntu 20.04, LLVM 16, 17, 18 on Ubuntu 24.04, etc.)</li> <li>the plugin must link against the system libraries due to the ABI requirements</li> <li>the build system should support multiple versions at the same time</li> </ul> <p>None of these are necessarily hard or impossible with Bazel, but the devil is always in the details.</p> <p>What follows is my take on solving this problem.</p> <p>The source code is available here: <a href="https://github.com/AlexDenisov/bazel-llvm-plugin">https://github.com/AlexDenisov/bazel-llvm-plugin</a>.</p> <blockquote> <p>Following <a href="https://meta.wikimedia.org/wiki/Cunningham%27s_Law">Cunningham&rsquo;s Law</a>, I claim that there is no better way to do it.</p> </blockquote> <h3
id="detecting-available-llvm-versions">Detecting available LLVM versions</h3> <p>Third-party dependencies in Bazel typically come in the form of external repositories, thus all supported LLVM versions must be defined in MODULE.bazel upfront. However, what happens if a version is not supported or not installed on the host OS? In this case, these repositories must be defined dynamically.</p> <p>To do so, we first need to define a custom dynamic repository which checks which versions are installed on the host OS and stores this information in a global variable available for later use by different parts of the build system:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#75715e"># available_llvm_versions.bzl</span> </span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">_is_macos</span>(ctx): </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">return</span> ctx<span style="color:#f92672">.</span>os<span style="color:#f92672">.</span>name<span style="color:#f92672">.</span>find(<span style="color:#e6db74">&#34;mac&#34;</span>) <span style="color:#f92672">!=</span> <span style="color:#f92672">-</span><span style="color:#ae81ff">1</span> </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">llvm_path</span>(ctx, version): </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">if</span> _is_macos(ctx): </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">return</span> <span style="color:#e6db74">&#34;/opt/homebrew/opt/llvm@&#34;</span> <span style="color:#f92672">+</span> version </span></span><span style="display:flex;"><span> <span
style="color:#66d9ef">return</span> <span style="color:#e6db74">&#34;/usr/lib/llvm-&#34;</span> <span style="color:#f92672">+</span> version </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">_is_supported</span>(repository_ctx, version): </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">return</span> repository_ctx<span style="color:#f92672">.</span>path(llvm_path(repository_ctx, version))<span style="color:#f92672">.</span>exists </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">_llvm_versions_repo_impl</span>(repository_ctx): </span></span><span style="display:flex;"><span> available_versions <span style="color:#f92672">=</span> [] </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">for</span> version <span style="color:#f92672">in</span> repository_ctx<span style="color:#f92672">.</span>attr<span style="color:#f92672">.</span>versions: </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">if</span> _is_supported(repository_ctx, version): </span></span><span style="display:flex;"><span> available_versions<span style="color:#f92672">.</span>append(version) </span></span><span style="display:flex;"><span> repository_ctx<span style="color:#f92672">.</span>file(<span style="color:#e6db74">&#34;llvm_versions.bzl&#34;</span>, </span></span><span style="display:flex;"><span> content <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;AVAILABLE_LLVM_VERSIONS = &#34;</span> <span style="color:#f92672">+</span> str(available_versions), </span></span><span style="display:flex;"><span> ) </span></span><span style="display:flex;"><span> repository_ctx<span style="color:#f92672">.</span>file( </span></span><span style="display:flex;"><span> <span 
style="color:#e6db74">&#34;BUILD&#34;</span>, </span></span><span style="display:flex;"><span> content <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;&#34;</span>, </span></span><span style="display:flex;"><span> ) </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span>available_llvm_versions_repo <span style="color:#f92672">=</span> repository_rule( </span></span><span style="display:flex;"><span> local <span style="color:#f92672">=</span> <span style="color:#66d9ef">True</span>, </span></span><span style="display:flex;"><span> implementation <span style="color:#f92672">=</span> _llvm_versions_repo_impl, </span></span><span style="display:flex;"><span> attrs <span style="color:#f92672">=</span> { </span></span><span style="display:flex;"><span> <span style="color:#e6db74">&#34;versions&#34;</span>: attr<span style="color:#f92672">.</span>string_list(), </span></span><span style="display:flex;"><span> }, </span></span><span style="display:flex;"><span>) </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">_available_llvm_versions_impl</span>(module_ctx): </span></span><span style="display:flex;"><span> versions <span style="color:#f92672">=</span> [] </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">for</span> mod <span style="color:#f92672">in</span> module_ctx<span style="color:#f92672">.</span>modules: </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">for</span> data <span style="color:#f92672">in</span> mod<span style="color:#f92672">.</span>tags<span style="color:#f92672">.</span>detect_available: </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">for</span> version <span style="color:#f92672">in</span> data<span style="color:#f92672">.</span>versions: </span></span><span style="display:flex;"><span> 
versions<span style="color:#f92672">.</span>append(version) </span></span><span style="display:flex;"><span> available_llvm_versions_repo(name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;available_llvm_versions&#34;</span>, versions <span style="color:#f92672">=</span> versions) </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span>available_llvm_versions <span style="color:#f92672">=</span> module_extension( </span></span><span style="display:flex;"><span> implementation <span style="color:#f92672">=</span> _available_llvm_versions_impl, </span></span><span style="display:flex;"><span> tag_classes <span style="color:#f92672">=</span> { </span></span><span style="display:flex;"><span> <span style="color:#e6db74">&#34;detect_available&#34;</span>: tag_class(attrs <span style="color:#f92672">=</span> {<span style="color:#e6db74">&#34;versions&#34;</span>: attr<span style="color:#f92672">.</span>string_list(allow_empty <span style="color:#f92672">=</span> <span style="color:#66d9ef">False</span>)}), </span></span><span style="display:flex;"><span> }, </span></span><span style="display:flex;"><span>) </span></span></code></pre></div><p>Which must be defined in MODULE.bazel:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#75715e"># MODULE.bazel</span> </span></span><span style="display:flex;"><span>SUPPORTED_LLVM_VERSIONS <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;17&#34;</span>, <span style="color:#e6db74">&#34;18&#34;</span>] </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span>available_llvm_versions <span style="color:#f92672">=</span> use_extension(<span style="color:#e6db74">&#34;//:bazel/available_llvm_versions.bzl&#34;</span>, <span 
style="color:#e6db74">&#34;available_llvm_versions&#34;</span>) </span></span><span style="display:flex;"><span>available_llvm_versions<span style="color:#f92672">.</span>detect_available(versions <span style="color:#f92672">=</span> SUPPORTED_LLVM_VERSIONS) </span></span><span style="display:flex;"><span>use_repo(available_llvm_versions, <span style="color:#e6db74">&#34;available_llvm_versions&#34;</span>) </span></span></code></pre></div><h3 id="defining-llvm-repositories">Defining LLVM repositories</h3> <p>Now that we know which versions are installed on the host system, we can define the LLVM repositories, which will expose <code>libLLVM.so</code> and all the needed headers.</p> <p>This part requires a dynamic module extension which will either define a real repository, or will define a “fake” empty repo. This is needed so that all the repositories can later be declared in MODULE.bazel safely.</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#75715e"># llvm_repos.bzl</span> </span></span><span style="display:flex;"><span>load(<span style="color:#e6db74">&#34;@available_llvm_versions//:llvm_versions.bzl&#34;</span>, <span style="color:#e6db74">&#34;AVAILABLE_LLVM_VERSIONS&#34;</span>) </span></span><span style="display:flex;"><span><span style="color:#75715e"># `modules` comes from bazel_skylib (assumed to be declared as a bazel_dep in MODULE.bazel).</span> </span></span><span style="display:flex;"><span>load(<span style="color:#e6db74">&#34;@bazel_skylib//lib:modules.bzl&#34;</span>, <span style="color:#e6db74">&#34;modules&#34;</span>) </span></span><span style="display:flex;"><span>load(<span style="color:#e6db74">&#34;@bazel_tools//tools/build_defs/repo:local.bzl&#34;</span>, <span style="color:#e6db74">&#34;new_local_repository&#34;</span>) </span></span><span style="display:flex;"><span><span style="color:#75715e"># `llvm_path` is the helper defined in available_llvm_versions.bzl above.</span> </span></span><span style="display:flex;"><span>load(<span style="color:#e6db74">&#34;//:bazel/available_llvm_versions.bzl&#34;</span>, <span style="color:#e6db74">&#34;llvm_path&#34;</span>) </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">_empty_repo_impl</span>(repository_ctx): </span></span><span style="display:flex;"><span> repository_ctx<span style="color:#f92672">.</span>file( </span></span><span style="display:flex;"><span> <span
style="color:#e6db74">&#34;BUILD&#34;</span>, </span></span><span style="display:flex;"><span> content <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;&#34;</span>, </span></span><span style="display:flex;"><span> ) </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span>empty_repo <span style="color:#f92672">=</span> repository_rule( </span></span><span style="display:flex;"><span> local <span style="color:#f92672">=</span> <span style="color:#66d9ef">True</span>, </span></span><span style="display:flex;"><span> implementation <span style="color:#f92672">=</span> _empty_repo_impl, </span></span><span style="display:flex;"><span>) </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">_llvm_repos_extension</span>(module_ctx): </span></span><span style="display:flex;"><span> <span style="color:#e6db74">&#34;&#34;&#34;Module extension to dynamically declare local LLVM repositories.&#34;&#34;&#34;</span> </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">for</span> mod <span style="color:#f92672">in</span> module_ctx<span style="color:#f92672">.</span>modules: </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">for</span> data <span style="color:#f92672">in</span> mod<span style="color:#f92672">.</span>tags<span style="color:#f92672">.</span>configure: </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">for</span> version <span style="color:#f92672">in</span> data<span style="color:#f92672">.</span>versions: </span></span><span style="display:flex;"><span> llvm_repo_name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;llvm_&#34;</span> <span style="color:#f92672">+</span> version </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">if</span> version <span 
style="color:#f92672">not</span> <span style="color:#f92672">in</span> AVAILABLE_LLVM_VERSIONS: </span></span><span style="display:flex;"><span> empty_repo(name <span style="color:#f92672">=</span> llvm_repo_name) </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">continue</span> </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span> path <span style="color:#f92672">=</span> llvm_path(module_ctx, version) </span></span><span style="display:flex;"><span> new_local_repository( </span></span><span style="display:flex;"><span> name <span style="color:#f92672">=</span> llvm_repo_name, </span></span><span style="display:flex;"><span> path <span style="color:#f92672">=</span> path, </span></span><span style="display:flex;"><span> build_file <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;:third_party/LLVM/llvm.BUILD&#34;</span> </span></span><span style="display:flex;"><span> ) </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">return</span> modules<span style="color:#f92672">.</span>use_all_repos(module_ctx) </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span>llvm_repos <span style="color:#f92672">=</span> module_extension( </span></span><span style="display:flex;"><span> implementation <span style="color:#f92672">=</span> _llvm_repos_extension, </span></span><span style="display:flex;"><span> tag_classes <span style="color:#f92672">=</span> {<span style="color:#e6db74">&#34;configure&#34;</span>: tag_class(attrs <span style="color:#f92672">=</span> {<span style="color:#e6db74">&#34;versions&#34;</span>: attr<span style="color:#f92672">.</span>string_list()})}, </span></span><span style="display:flex;"><span>) </span></span></code></pre></div><p>Now we can tell Bazel that these repos are available for consumption:</p> <div class="highlight"><pre tabindex="0"
style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#75715e"># MODULE.bazel</span> </span></span><span style="display:flex;"><span>SUPPORTED_LLVM_VERSIONS <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;17&#34;</span>, <span style="color:#e6db74">&#34;18&#34;</span>] </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span>available_llvm_versions <span style="color:#f92672">=</span> use_extension(<span style="color:#e6db74">&#34;//:bazel/available_llvm_versions.bzl&#34;</span>, <span style="color:#e6db74">&#34;available_llvm_versions&#34;</span>) </span></span><span style="display:flex;"><span>available_llvm_versions<span style="color:#f92672">.</span>detect_available(versions <span style="color:#f92672">=</span> SUPPORTED_LLVM_VERSIONS) </span></span><span style="display:flex;"><span>use_repo(available_llvm_versions, <span style="color:#e6db74">&#34;available_llvm_versions&#34;</span>) </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span>llvm_repos <span style="color:#f92672">=</span> use_extension(<span style="color:#e6db74">&#34;:bazel/llvm_repos.bzl&#34;</span>, <span style="color:#e6db74">&#34;llvm_repos&#34;</span>) </span></span><span style="display:flex;"><span>llvm_repos<span style="color:#f92672">.</span>configure(versions <span style="color:#f92672">=</span> SUPPORTED_LLVM_VERSIONS) </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span>[use_repo(llvm_repos, <span style="color:#e6db74">&#34;llvm_</span><span style="color:#e6db74">%s</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">%</span> v) <span style="color:#66d9ef">for</span> v <span style="color:#f92672">in</span> SUPPORTED_LLVM_VERSIONS] </span></span></code></pre></div><h3 
id="defining-plugin-targets">Defining plugin targets</h3> <p>Now, the rest is rather trivial. We can define all the plugin libraries depending on the LLVM versions available on the host OS:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#75715e"># src/BUILD</span> </span></span><span style="display:flex;"><span>load(<span style="color:#e6db74">&#34;@available_llvm_versions//:llvm_versions.bzl&#34;</span>, <span style="color:#e6db74">&#34;AVAILABLE_LLVM_VERSIONS&#34;</span>) </span></span><span style="display:flex;"><span>load(<span style="color:#e6db74">&#34;@rules_cc//cc:defs.bzl&#34;</span>, <span style="color:#e6db74">&#34;cc_binary&#34;</span>) </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span>[ </span></span><span style="display:flex;"><span> cc_binary( </span></span><span style="display:flex;"><span> name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;llvm_plugin_</span><span style="color:#e6db74">%s</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">%</span> llvm_version, </span></span><span style="display:flex;"><span> srcs <span style="color:#f92672">=</span> [ </span></span><span style="display:flex;"><span> <span style="color:#e6db74">&#34;plugin.cpp&#34;</span>, </span></span><span style="display:flex;"><span> ], </span></span><span style="display:flex;"><span> linkshared <span style="color:#f92672">=</span> <span style="color:#66d9ef">True</span>, </span></span><span style="display:flex;"><span> visibility <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;//visibility:public&#34;</span>], </span></span><span style="display:flex;"><span> deps <span style="color:#f92672">=</span> [ </span></span><span style="display:flex;"><span> <span 
style="color:#e6db74">&#34;@llvm_</span><span style="color:#e6db74">%s</span><span style="color:#e6db74">//:libllvm&#34;</span> <span style="color:#f92672">%</span> llvm_version, </span></span><span style="display:flex;"><span> ], </span></span><span style="display:flex;"><span> ) </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">for</span> llvm_version <span style="color:#f92672">in</span> AVAILABLE_LLVM_VERSIONS </span></span><span style="display:flex;"><span>] </span></span></code></pre></div><h3 id="defining-test-targets">Defining test targets</h3> <p>Obviously, we must have tests for the plugin. This is also relatively trivial: we need to define a test case for each available LLVM version as well, thus producing NxM tests, where N is the number of tests and M is the number of LLVM versions.</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#75715e"># tests/BUILD</span> </span></span><span style="display:flex;"><span>load(<span style="color:#e6db74">&#34;@available_llvm_versions//:llvm_versions.bzl&#34;</span>, <span style="color:#e6db74">&#34;AVAILABLE_LLVM_VERSIONS&#34;</span>) </span></span><span style="display:flex;"><span>load(<span style="color:#e6db74">&#34;@bazel_itertools//lib:itertools.bzl&#34;</span>, <span style="color:#e6db74">&#34;itertools&#34;</span>) </span></span><span style="display:flex;"><span>load(<span style="color:#e6db74">&#34;@pypi//:requirements.bzl&#34;</span>, <span style="color:#e6db74">&#34;requirement&#34;</span>) </span></span><span style="display:flex;"><span>load(<span style="color:#e6db74">&#34;@rules_python//python:defs.bzl&#34;</span>, <span style="color:#e6db74">&#34;py_test&#34;</span>) </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span>[ </span></span><span
style="display:flex;"><span> py_test( </span></span><span style="display:flex;"><span> name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">%s</span><span style="color:#e6db74">_</span><span style="color:#e6db74">%s</span><span style="color:#e6db74">_test&#34;</span> <span style="color:#f92672">%</span> (test, llvm_version), </span></span><span style="display:flex;"><span> srcs <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;lit_runner.py&#34;</span>], </span></span><span style="display:flex;"><span> args <span style="color:#f92672">=</span> [ <span style="color:#e6db74">&#34;-v&#34;</span>, test], </span></span><span style="display:flex;"><span> data <span style="color:#f92672">=</span> [ </span></span><span style="display:flex;"><span> requirement(<span style="color:#e6db74">&#34;lit&#34;</span>), </span></span><span style="display:flex;"><span> <span style="color:#e6db74">&#34;:lit.cfg.py&#34;</span>, </span></span><span style="display:flex;"><span> <span style="color:#e6db74">&#34;@llvm_</span><span style="color:#e6db74">%s</span><span style="color:#e6db74">//:clang&#34;</span> <span style="color:#f92672">%</span> llvm_version, </span></span><span style="display:flex;"><span> <span style="color:#e6db74">&#34;@llvm_</span><span style="color:#e6db74">%s</span><span style="color:#e6db74">//:FileCheck&#34;</span> <span style="color:#f92672">%</span> llvm_version, </span></span><span style="display:flex;"><span> <span style="color:#e6db74">&#34;//src:llvm_plugin_</span><span style="color:#e6db74">%s</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">%</span> llvm_version, </span></span><span style="display:flex;"><span> test, </span></span><span style="display:flex;"><span> ], </span></span><span style="display:flex;"><span> main <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;lit_runner.py&#34;</span>, </span></span><span 
style="display:flex;"><span> deps <span style="color:#f92672">=</span> [requirement(<span style="color:#e6db74">&#34;lit&#34;</span>), <span style="color:#e6db74">&#34;@rules_python//python/runfiles&#34;</span>], </span></span><span style="display:flex;"><span> ) </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">for</span> (test, llvm_version) <span style="color:#f92672">in</span> itertools<span style="color:#f92672">.</span>product( </span></span><span style="display:flex;"><span> glob([<span style="color:#e6db74">&#34;*.c&#34;</span>]), </span></span><span style="display:flex;"><span> AVAILABLE_LLVM_VERSIONS, </span></span><span style="display:flex;"><span> ) </span></span><span style="display:flex;"><span>] </span></span></code></pre></div><h3 id="conclusion">Conclusion</h3> <p>With all the little pieces above, the builds are now completely transparent and smooth for the end user:</p> <p><img src="https://lowlevelbits.org/img/llvm-plugins-bazel/plugin-build.png" alt="Build &amp; test the plugin"></p> <p>The full working example can be found here: <a href="https://github.com/AlexDenisov/bazel-llvm-plugin">https://github.com/AlexDenisov/bazel-llvm-plugin</a></p> Compiling Ruby. Part 5: exceptions https://lowlevelbits.org/compiling-ruby-part-5/ Fri, 22 Dec 2023 Alex Denisov <h3 id="call-stack-stack-frames-and-program-counter">Call Stack, Stack Frames, and Program Counter</h3> <p>During program execution, a machine maintains a pointer to the instruction being executed. It&rsquo;s called the <a href="https://en.wikipedia.org/wiki/Program_counter">Program Counter</a> (or <code>Instruction Pointer</code>).</p> <p>When you call a method (or send a message if we are speaking of Ruby), the program counter is set to the first instruction of the called function (<code>callee</code>).
The program somehow needs to know how to get back to the call site once the &ldquo;child&rdquo; method has completed its execution.</p> <p>This information is typically maintained using the concept of a <a href="https://en.wikipedia.org/wiki/Call_stack">Call Stack</a>.</p> <p>Consider the following program and its call stack on the right.</p> <p><img src="https://lowlevelbits.org/img/compiling-ruby-5/call-stack.png" alt="Call Stack"></p> <p>The call stack consists of <a href="https://en.wikipedia.org/wiki/Call_stack#Structure">Stack Frames</a>. Whenever a function is called, a new stack frame is created and <code>push</code>ed onto the stack. When the called function returns, the stack frame is <code>pop</code>ped.</p> <p>At every point, the call stack represents the actual <a href="https://en.wikipedia.org/wiki/Stack_trace">Stack Trace</a>.</p> <p>The very top of the call stack represents the scope of the whole file, followed by the stack frame of the <code>first</code> function, followed by the <code>second</code> function, and so forth. In Ruby, the top function/file scope is referred to simply as <code>top</code>.</p> <p>Now, imagine that we want to pass some information from the <code>second</code> function to the <code>top</code>. 
Some error or something <em>exceptional</em> happened, and this specific program state needs some special handling.</p> <p>There are several limited ways to handle such a case: either we return some special value up the stack (thus, each function on the call stack should be aware of this), or we use some global variable to communicate with the callers (e.g., <code>errno</code> in C), which again &ldquo;pollutes&rdquo; the business logic through the call stack.</p> <p>One way to handle this problem more elegantly is to use a dedicated language construct - exceptions.</p> <p>Instead of polluting the whole call stack, we can <code>throw</code>/<code>raise</code> an exception and then add special handling at the <code>top</code>, like in this picture:</p> <p><img src="https://lowlevelbits.org/img/compiling-ruby-5/exception-example.png" alt="Simple Exception"></p> <h3 id="stack-unwinding">Stack Unwinding</h3> <p>Now, the question is: How do we implement this feature? To answer it, let&rsquo;s understand what needs to happen!</p> <p>The program was in some specific state before it called the <code>first</code> function at the top. Now, the program is in another specific state around the <code>raise &quot;error&quot;</code> line in the <code>second</code> function.</p> <p>We need to restore the state somehow as it was right before the <code>first</code> call and continue execution right after the <code>rescue</code> in <code>top</code> (by changing the program counter accordingly).</p> <p>Conceptually, we can save the machine state before calling the <code>first</code> method and restore it later. 
The problem is that storing the state of the whole machine is too expensive and adds overhead by saving more than needed.</p> <p>Instead, we can put the responsibility for maintaining the program on the actual program developers.</p> <p>Most languages provide useful features for dealing with this:</p> <ul> <li>Ruby has explicit <code>ensure</code> blocks</li> <li>Java has explicit <code>finally</code> statements</li> <li>C++ has RAII and implicit destructors</li> <li>(C has <code>setjmp</code>/<code>longjmp</code>, but we are only talking about useful features)</li> </ul> <p>Here is how it works in the case of Ruby.</p> <p>Whenever an exception is thrown, the program climbs up through the call stack and executes code from those <a href="https://en.wikipedia.org/wiki/Finalizer#Connection_with_finally">finalizers</a> until it reaches the exception handler.</p> <p>This process is called <code>Stack Unwinding</code>.</p> <p><em>I&rsquo;m not a native speaker, but I&rsquo;d say it should be called &ldquo;Stack Winding&rdquo;, but oh well</em></p> <p>Here is an updated example with explicit state restoration during the stack unwinding.</p> <p>Without executing code from the <code>ensure</code> block, the hypothetical lock would never be released, thus breaking the program in terrible ways.</p> <p><img src="https://lowlevelbits.org/img/compiling-ruby-5/stack-unwinding.png" alt="Stack Unwinding"></p> <h3 id="exceptions-in-ruby">Exceptions in Ruby</h3> <p>Now, I can talk about different kinds of exceptions in Ruby. 
From my perspective, there are three different kinds:</p> <ul> <li>actual <code>raise</code>d exceptions</li> <li><code>break</code> statements</li> <li><code>return</code> statements</li> </ul> <p>Both <code>break</code> and <code>return</code> statements have special meaning when used in the context of <code>Proc</code>s.</p> <p>Let me elaborate on all three with examples.</p> <h4 id="normal-exceptions">Normal Exceptions</h4> <p>Actual exceptions climb up the stack, calling finalizers until an exception handler is found.</p> <p>These are the normal exceptions you are all familiar with.</p> <p><img src="https://lowlevelbits.org/img/compiling-ruby-5/exception-example.png" alt="Simple Exception"></p> <h4 id="returns-from-a-block"><code>return</code>s from a block</h4> <p><code>return</code> statements behave differently depending on the lexical scope they are part of.</p> <p>Here is a little puzzle for you.</p> <p>What will be printed on the screen?</p> <p><img src="https://lowlevelbits.org/img/compiling-ruby-5/return_blk-puzzle.png" alt="Return from a block puzzle"></p> <p><code>return</code> is called from within a block. You may expect the <code>x * 4</code> to be returned from the block, but it&rsquo;s returned from the enclosing function (lexical scope).</p> <p><img src="https://lowlevelbits.org/img/compiling-ruby-5/return_blk-example.png" alt="Return from a block call stack"></p> <p>As you can see, <code>return x * 4</code> would return from <code>f</code> instead of from the block.</p> <p>The code prints</p> <pre tabindex="0"><code>2: 8 </code></pre><p>instead of</p> <pre tabindex="0"><code>1: 8 2: 42 </code></pre><h4 id="breaks"><code>break</code>s</h4> <p><em>Almost</em> like <code>return</code>s, <code>break</code>s allow returning from the enclosing function, but in a slightly different way.</p> <p>This is the most complex example here. Let me write down the steps explicitly. 
You may want to open the picture in a separate tab to read it.</p> <p><img src="https://lowlevelbits.org/img/compiling-ruby-5/break-example.png" alt="Break example"></p> <ol> <li><code>top</code> calls the <code>loop</code> function and passes the block to it. The block is just another function under the hood; it&rsquo;s presented separately here as the <code>__anonymous_block</code>.</li> <li>Runtime creates a new stack frame for <code>loop</code> and puts it on the call stack.</li> <li><code>loop</code> calls the passed block (<code>__anonymous_block</code>).</li> <li>Runtime creates a new stack frame for <code>__anonymous_block</code> and puts it on the stack.</li> <li>The <code>__anonymous_block</code> increments <code>i</code>, checks for equality, and returns to <code>loop</code>, nothing special.</li> <li>Runtime removes the <code>__anonymous_block</code> stack frame from the call stack.</li> <li><code>loop</code>&rsquo;s stack frame is kept on the call stack, and the next iteration of <code>while true</code> calls the <code>__anonymous_block</code> again.</li> <li>Runtime creates a new stack frame for <code>__anonymous_block</code> and puts it on the stack.</li> <li>The <code>__anonymous_block</code> increments <code>i</code>, checks for equality, and invokes <code>break</code>.</li> <li>The <code>break</code> initiates stack unwinding and returns from the enclosing function (<code>loop</code>). 
See the dashed line.</li> <li><code>loop</code> returns, thus bypassing the endless loop <code>while true</code>.</li> </ol> <p>The <code>break</code> construct is effectively equivalent to the following code:</p> <p><img src="https://lowlevelbits.org/img/compiling-ruby-5/break-exception.png" alt="Break implemented using exception"></p> <h3 id="implementation">Implementation</h3> <p>All the language constructs described above (exceptions, <code>return</code>s and <code>break</code>s within a block) behave similarly: they unwind the stack (calling the finalizers on the way up) and stop at some well-defined point.</p> <p>They are implemented slightly differently in the original mruby runtime. However, I implemented them all as exceptions, with <code>return</code>s and <code>break</code>s being special exceptions: they need to carry a value and store information on where to stop the unwinding process.</p> <p>The implementation from the LLVM perspective is covered in my recent talk at LLVM Social Berlin: <a href="https://www.youtube.com/watch?v=gH5-lITYrMg">Stack unwinding, landing pads, and other catches</a>.</p> <p>Here, I&rsquo;ll mainly focus on the details from the mruby runtime perspective.</p> <p>Consider the following example:</p> <p><img src="https://lowlevelbits.org/img/compiling-ruby-5/landing-pads.png" alt="Landing pads"></p> <p>The blocks following <code>rescue</code> and <code>ensure</code> are called <em><strong>Landing Pads</strong></em>.</p> <p>This example has two kinds of landing pads: catch (<code>rescue</code>) and cleanup (<code>ensure</code>). Catches are &ldquo;conditional&rdquo; landing pads: they will be executed only if the exception type matches their type. 
Note the last <code>rescue</code>: it doesn&rsquo;t have any type attached, so it will just catch any exception.</p> <p>Conversely, cleanups are &ldquo;unconditional&rdquo; - they will always run, but they will also forward the exception up to the next function on the call stack.</p> <p>Another important detail in this example is the second <code>rescue</code>: it uses a function argument as its type. That is, the landing pad type is only known at run time, and it could be anything.</p> <p><em>In C++, for example, all the <code>catch</code> types must be known upfront, and the compiler emits special Runtime Type Information (RTTI). Again, IMO, it should be Compile Time Type Information, but it&rsquo;s C++&hellip;</em></p> <p>For this reason, the Ruby VM always enters each landing pad. For catches, it first checks (at run time!) if the exception type matches the landing pad&rsquo;s type, and if so, the exception is marked as caught, and the landing pad&rsquo;s execution continues.</p> <p>If the exception type doesn&rsquo;t match - the exception is immediately re-thrown so the next landing pad can try to catch it.</p> <h3 id="mlir">MLIR</h3> <p>I&rsquo;d love to describe how I modeled exceptions at the MLIR level, but it will take more time to do it for several reasons:</p> <ul> <li>my original approach to constructing SSA right away didn&rsquo;t work due to the way exceptions work (namely, some registers must spill on the stack), so the dialects have changed a bit, and I need to clean them up</li> <li>the way I model them currently is more of a hack and only works because I have certain conventions, so it&rsquo;s not a solid model yet</li> <li>I added JIT support (for <code>Kernel.eval</code>) and need to do some tweaking there to make exceptions work during just-in-time evaluation</li> </ul> <p>I&rsquo;ll write down all the low-level details at some point, but I don&rsquo;t have an ETA, so I&rsquo;ll stop here.</p> <hr> <p><strong>Thank you so much for reaching 
this far!</strong></p> <p>The following articles will focus on JIT compilation and debug information.</p> <p><a href="https://lowlevelbits.org/subscribe/">Don&rsquo;t miss those details!</a></p> Compiling Ruby. Part 4: progress update https://lowlevelbits.org/compiling-ruby-part-4/ Thu, 30 Nov 2023 [email protected] (Alex Denisov) https://lowlevelbits.org/compiling-ruby-part-4/ <p>It&rsquo;s been a while since I wrote the last blog post. One of the reasons is that I had to change a lot of things in the implementation due to the exception support.</p> <p>I&rsquo;m writing a short progress update on where we are and what&rsquo;s coming next.</p> <h3 id="what-happened">What Happened</h3> <p>During this year, I gave two short talks related to this project:</p> <ul> <li><a href="https://www.youtube.com/watch?v=NfMX-dFMSr0">a high-level overview of the project</a> (EuroLLVM dev meeting)</li> <li><a href="https://www.youtube.com/watch?v=gH5-lITYrMg">intro to exception handling in LLVM</a> (LLVM Social Berlin)</li> </ul> <p>The state as of EuroLLVM (May 2023) was as follows:</p> <ul> <li>compiler supported <strong>104</strong> out of <strong>107</strong> bytecode operations</li> <li>it could compile <strong>~150</strong> out of <strong>~180</strong> files</li> <li>it could compile <strong>~15KLoC</strong> out of <strong>~20KLoC</strong></li> <li><strong>~72%</strong> of tests were passing (1033 out of 1416 it could compile)</li> </ul> <h3 id="current-status">Current Status</h3> <p>The three missing opcodes were all about exception handling, and this is what (so far) took the most time to implement. I have some drafts on the details, and I plan to publish them before the end of the year.</p> <p>With the proper exception handling in place, things are finally starting to take the right shape. 
There is still much work to do, but it&rsquo;s more predictable now.</p> <p>Some new stats:</p> <ul> <li>all bytecode operations are implemented 🎉</li> <li>all the ruby code in the repo is now compiled (stdlib, gems, tests) 🎉</li> <li><strong>~95%</strong> of the tests are passing (1378 out of 1450) 🎉</li> </ul> <h3 id="next-steps">Next Steps</h3> <p>The test suite now drives the next steps:</p> <ul> <li>the majority of the failing tests (42 out of 71) are due to the missing fibers implementation</li> <li>the second biggest group is various proc/methods metadata for runtime reflection</li> <li>the next big part is related to JIT/runtime evaluation (i.e., when you can execute arbitrary Ruby code not known/visible at compile time)</li> <li>and there is a long tail of more minor things</li> </ul> <p>Besides that, I need to figure out a better build system for all of it. Currently, it&rsquo;s a mess glued together by CMake scripts and CMake templates. It works perfectly for development and testing, but I&rsquo;d hate to use such a system as an end user.</p> <p>Ideally, I want a one-click solution that would take Ruby files as input and produce a native executable.</p> <p>What is the state of the art when it comes to build systems/orchestration of compilation? Please let me know if you have any pointers 🙌</p> <hr> <p><strong>Thank you so much for reaching this far!</strong></p> <p>The next article is about exceptions - <a href="https://lowlevelbits.org/compiling-ruby-part-5/">Exceptions</a></p> Compiling Ruby. Part 3: MLIR and compilation https://lowlevelbits.org/compiling-ruby-part-3/ Fri, 06 Jan 2023 [email protected] (Alex Denisov) https://lowlevelbits.org/compiling-ruby-part-3/ <p>Now that we have a decent understanding of how RiteVM works, we can tackle the compilation. The question I had around two years ago - how do I even do this?</p> <p><strong>A note of warning: so far, this is the longest article on this blog. 
And I&rsquo;m afraid the most cryptic one.</strong></p> <p>The topics covered here:</p> <ul> <li>MLIR</li> <li>Control-Flow Graphs (CFG)</li> <li>Static Single Assignment (SSA)</li> <li>Dataflow Analysis</li> </ul> <h3 id="compilation">Compilation</h3> <p>mruby is written in C, so the logic behind each opcode is implemented in C. To compile a Ruby program from bytecode, we can emit an equivalent C program that uses mruby C API.</p> <p>Some opcodes have direct API counterparts, e.g., <code>OP_LOADI</code> is equivalent to <code>mrb_value mrb_fixnum_value(mrb_int i);</code>. Yet, most opcodes are inlined in the giant dispatch loop in <code>vm.c</code>. However, we can extract these implementations into separate functions and call them from C.</p> <p>Consider the following Ruby program:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-ruby" data-lang="ruby"><span style="display:flex;"><span>puts <span style="color:#ae81ff">42</span> </span></span></code></pre></div><p>and its bytecode:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-asm" data-lang="asm"><span style="display:flex;"><span><span style="color:#a6e22e">OP_LOADSELF</span> <span style="color:#66d9ef">R1</span> </span></span><span style="display:flex;"><span><span style="color:#a6e22e">OP_LOADI</span> <span style="color:#66d9ef">R2</span> <span style="color:#ae81ff">42</span> </span></span><span style="display:flex;"><span><span style="color:#a6e22e">OP_SEND</span> <span style="color:#66d9ef">R1</span> :<span style="color:#66d9ef">puts</span> <span style="color:#ae81ff">1</span> </span></span><span style="display:flex;"><span><span style="color:#a6e22e">OP_RETURN</span> <span style="color:#66d9ef">R1</span> </span></span><span style="display:flex;"><span><span style="color:#a6e22e">OP_STOP</span> 
</span></span></code></pre></div><p>An equivalent C program looks like this:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-c" data-lang="c"><span style="display:flex;"><span>mrb_state <span style="color:#f92672">*</span>mrb <span style="color:#f92672">=</span> <span style="color:#a6e22e">mrb_open</span>(); </span></span><span style="display:flex;"><span>mrb_value receiver <span style="color:#f92672">=</span> <span style="color:#a6e22e">fs_load_self</span>(); </span></span><span style="display:flex;"><span>mrb_value number <span style="color:#f92672">=</span> <span style="color:#a6e22e">mrb_fixnum_value</span>(<span style="color:#ae81ff">42</span>); </span></span><span style="display:flex;"><span><span style="color:#a6e22e">mrb_funcall</span>(mrb, receiver, <span style="color:#e6db74">&#34;puts&#34;</span>, <span style="color:#ae81ff">1</span>, number); </span></span><span style="display:flex;"><span><span style="color:#a6e22e">mrb_close</span>(mrb); </span></span></code></pre></div><p><em><code>fs_load_self</code> is a custom runtime function as <code>OP_LOADSELF</code> doesn&rsquo;t have a C API counterpart.</em></p> <p><em><code>OP_RETURN</code> is ignored in this small example.</em></p> <p>To compile a Ruby program from its bytecode, we &ldquo;just&rdquo; need to generate the equivalent C program. In fact, this is how I started two years ago. It worked well and had some nice debugging capabilities - in the end, it&rsquo;s just a C program.</p> <p>Yet, at some point, the implementation became daunting. As I was generating a C program, it was pretty hard to do some custom analysis or optimizations on the C code. 
I started adding my own auxiliary data structures (really, just arrays of hashmaps of hashmaps of pairs and tuples) before I generated the C code.</p> <p>I realized I was about to invent my own intermediate representation of questionable quality.</p> <p>I needed a better solution.</p> <h3 id="mlir">MLIR</h3> <p>I remember watching the <a href="https://www.youtube.com/watch?v=qzljG6DKgic">MLIR talk</a> by Tatiana Shpeisman and Chris Lattner live at EuroLLVM in Brussels. It went over my head back then, as there was a lot of talk about machine learning, tensors, heterogeneous accelerators, and some other dark magic.</p> <p>Yet, I also remember some mentions of custom intermediate representations. So I decided to give it a try and dig into it more. It turned out to be great.</p> <p>One of the key features of MLIR is the ability to define custom intermediate representations called <em>dialects</em>. MLIR provides an infrastructure to mix and match different dialects and run analyses or transformations against them. Further, the dialects can be lowered to machine code (e.g., for CPU or GPU).</p> <p>Here is a slide from my <a href="https://www.youtube.com/watch?v=Cl5SgDxvZ8w">LLVM Social talk</a> to illustrate the idea:</p> <p><img src="https://lowlevelbits.org/img/compiling-ruby-3/what-is-mlir.png" alt="What is MLIR?"></p> <h3 id="mlir-rite-dialect">MLIR Rite Dialect</h3> <p>I need to define a custom dialect to make MLIR work for my use case. 
I called it &ldquo;Rite.&rdquo; The dialect needs an operation of each RiteVM opcode and some RiteVM types.</p> <p>Here is the minimum required to compile the code sample from above (<code>puts 42</code>).</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">Rite_Dialect</span> : Dialect { </span></span><span style="display:flex;"><span> let name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;rite&#34;</span>; </span></span><span style="display:flex;"><span> let summary <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;A one-to-one mapping from mruby RITE VM bytecode to MLIR&#34;</span>; </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span> let cppNamespace <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;rite&#34;</span>; </span></span><span style="display:flex;"><span>} </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span><span style="color:#66d9ef">class</span> <span style="color:#a6e22e">RiteType</span><span style="color:#f92672">&lt;</span>string name<span style="color:#f92672">&gt;</span> : TypeDef<span style="color:#f92672">&lt;</span>Rite_Dialect, name<span style="color:#f92672">&gt;</span> { </span></span><span style="display:flex;"><span> let summary <span style="color:#f92672">=</span> name; </span></span><span style="display:flex;"><span> let mnemonic <span style="color:#f92672">=</span> name; </span></span><span style="display:flex;"><span>} </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">ValueType</span> : RiteType<span style="color:#f92672">&lt;</span><span 
style="color:#e6db74">&#34;value&#34;</span><span style="color:#f92672">&gt;</span> {} </span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">StateType</span> : RiteType<span style="color:#f92672">&lt;</span><span style="color:#e6db74">&#34;state&#34;</span><span style="color:#f92672">&gt;</span> {} </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span><span style="color:#66d9ef">class</span> <span style="color:#a6e22e">Rite_Op</span><span style="color:#f92672">&lt;</span>string mnemonic, list<span style="color:#f92672">&lt;</span>Trait<span style="color:#f92672">&gt;</span> traits <span style="color:#f92672">=</span> []<span style="color:#f92672">&gt;</span> : </span></span><span style="display:flex;"><span> Op<span style="color:#f92672">&lt;</span>Rite_Dialect, mnemonic, traits<span style="color:#f92672">&gt;</span>; </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span><span style="color:#f92672">//</span> OPCODE(LOADSELF, B) <span style="color:#f92672">/*</span> R(a) <span style="color:#f92672">=</span> self <span style="color:#f92672">*/</span> </span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">LoadSelfOp</span> : Rite_Op<span style="color:#f92672">&lt;</span><span style="color:#e6db74">&#34;OP_LOADSELF&#34;</span><span style="color:#f92672">&gt;</span> { </span></span><span style="display:flex;"><span> let summary <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;OP_LOADSELF&#34;</span>; </span></span><span style="display:flex;"><span> let results <span style="color:#f92672">=</span> (outs ValueType); </span></span><span style="display:flex;"><span>} </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span><span style="color:#f92672">//</span> OPCODE(LOADI, BB) <span 
style="color:#f92672">/*</span> R(a) <span style="color:#f92672">=</span> mrb_int(b) <span style="color:#f92672">*/</span> </span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">LoadIOp</span> : Rite_Op<span style="color:#f92672">&lt;</span><span style="color:#e6db74">&#34;OP_LOADI&#34;</span><span style="color:#f92672">&gt;</span> { </span></span><span style="display:flex;"><span> let summary <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;OP_LOADI&#34;</span>; </span></span><span style="display:flex;"><span> let arguments <span style="color:#f92672">=</span> (ins SI64Attr:<span style="color:#960050;background-color:#1e0010">$</span>value); </span></span><span style="display:flex;"><span> let results <span style="color:#f92672">=</span> (outs ValueType); </span></span><span style="display:flex;"><span>} </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span><span style="color:#f92672">//</span> OPCODE(SEND, BBB) <span style="color:#f92672">/*</span> R(a) <span style="color:#f92672">=</span> call(R(a),Syms(b),R(a<span style="color:#f92672">+</span><span style="color:#ae81ff">1</span>),<span style="color:#f92672">...</span>,R(a<span style="color:#f92672">+</span>c)) <span style="color:#f92672">*/</span> </span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">SendOp</span> : Rite_Op<span style="color:#f92672">&lt;</span><span style="color:#e6db74">&#34;OP_SEND&#34;</span><span style="color:#f92672">&gt;</span> { </span></span><span style="display:flex;"><span> let summary <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;OP_SEND&#34;</span>; </span></span><span style="display:flex;"><span> let arguments <span style="color:#f92672">=</span> (ins ValueType:<span style="color:#960050;background-color:#1e0010">$</span>receiver, StringAttr:<span 
style="color:#960050;background-color:#1e0010">$</span>symbol, UI32Attr:<span style="color:#960050;background-color:#1e0010">$</span>argc, Variadic<span style="color:#f92672">&lt;</span>ValueType<span style="color:#f92672">&gt;</span>:<span style="color:#960050;background-color:#1e0010">$</span>argv); </span></span><span style="display:flex;"><span> let results <span style="color:#f92672">=</span> (outs ValueType); </span></span><span style="display:flex;"><span>} </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span><span style="color:#f92672">//</span> OPCODE(RETURN, B) <span style="color:#f92672">/*</span> <span style="color:#66d9ef">return</span> R(a) (normal) <span style="color:#f92672">*/</span> </span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">ReturnOp</span> : Rite_Op<span style="color:#f92672">&lt;</span><span style="color:#e6db74">&#34;OP_RETURN&#34;</span>, [Terminator]<span style="color:#f92672">&gt;</span> { </span></span><span style="display:flex;"><span> let summary <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;OP_RETURN&#34;</span>; </span></span><span style="display:flex;"><span> let arguments <span style="color:#f92672">=</span> (ins ValueType:<span style="color:#960050;background-color:#1e0010">$</span>src); </span></span><span style="display:flex;"><span> let results <span style="color:#f92672">=</span> (outs ValueType); </span></span><span style="display:flex;"><span>} </span></span></code></pre></div><p>It defines the dialect, the types needed, and the operations. Some entities come from the MLIR&rsquo;s predefined dialects (<code>StringAttr</code>, <code>UI32Attr</code>, <code>Variadic&lt;...&gt;</code>, <code>Terminator</code>). We define the rest.</p> <p>Each operation may take zero or more arguments, but it also may produce zero or more results. 
Unlike a &ldquo;typical&rdquo; programming language, MLIR dialects define a graph (as <code>ins</code> and <code>outs</code> hint at). The dialects also have some other properties, but one step at a time.</p> <p>With the dialect in place, I can generate an &ldquo;MLIR program&rdquo; which is roughly equivalent to the C program above:</p> <p><em>Note: I omit some details for brevity.</em></p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-mlir" data-lang="mlir"><span style="display:flex;"><span>module <span style="color:#960050;background-color:#1e0010">@</span><span style="color:#e6db74">&#34;test.rb&#34;</span> { </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">func</span> <span style="color:#a6e22e">@top</span>(%arg0: !rite.state, %arg1: !rite.value) -&gt; !rite.value { </span></span><span style="display:flex;"><span> %0 = rite.OP_LOADSELF() : () -&gt; !rite.value </span></span><span style="display:flex;"><span> %1 = rite.OP_LOADI() {value = <span style="color:#ae81ff">42</span> : si64} : () -&gt; !rite.value </span></span><span style="display:flex;"><span> %2 = rite.OP_SEND(%0, %1) {argc = <span style="color:#ae81ff">1</span> : ui32, symbol = <span style="color:#e6db74">&#34;puts&#34;</span>} : (!rite.value, !rite.value) -&gt; !rite.value </span></span><span style="display:flex;"><span> %3 = rite.OP_RETURN(%2) : (!rite.value) -&gt; !rite.value </span></span><span style="display:flex;"><span> } </span></span><span style="display:flex;"><span>} </span></span></code></pre></div><p>Here, I generated an MLIR module containing a function (<code>top</code>) with four operations corresponding to each bytecode operation.</p> <p>Let&rsquo;s take a detailed look at one operation:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-mlir" 
data-lang="mlir"><span style="display:flex;"><span>%2 = rite.OP_SEND(%0, %1) {argc = <span style="color:#ae81ff">1</span> : ui32, symbol = <span style="color:#e6db74">&#34;puts&#34;</span>} : (!rite.value, !rite.value) -&gt; !rite.value </span></span></code></pre></div><p>This piece defines a value named <code>%2</code> as the result of an operation that takes two other values (<code>%0</code> and <code>%1</code>). In MLIR, constants are defined as &ldquo;attributes,&rdquo; which are <code>argc = 1 : ui32</code> and <code>symbol = &quot;puts&quot;</code> in this case. What follows is the operation signature <code>(!rite.value, !rite.value) -&gt; !rite.value</code>. The operation returns <code>!rite.value</code> and takes two operands: <code>%0</code> is the receiver, and <code>%1</code> is part of the <code>Variadic&lt;ValueType&gt;:$argv</code>.</p> <p>MLIR takes the declarative dialect definition and generates C++ code out of it. The C++ code serves as a programmatic API to generate the MLIR module.</p> <p>Once the module is generated, I can analyze and transform it. The next step is directly converting the Rite Dialect into LLVM Dialect and lowering it into LLVM IR.</p> <p>From there on, I can emit an object file (machine code) and link it with the mruby runtime.</p> <h3 id="static-single-assignment-ssa">Static Single Assignment (SSA)</h3> <p>In the previous article, I mentioned that the virtual stack is essential, yet here in both C and MLIR programs, I use &ldquo;local variables&rdquo; instead of the stack. 
What&rsquo;s going on here?</p> <p>The answer is simple - MLIR uses a Static Single-Assignment form for all its representations.</p> <p>As a reminder, SSA means that each variable can only be defined once.</p> <p><em>Pedantic note: the &ldquo;variables&rdquo; should be referred to as &ldquo;values&rdquo; as they cannot vary.</em></p> <p>Here is an &ldquo;invalid&rdquo; SSA form:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-c" data-lang="c"><span style="display:flex;"><span><span style="color:#66d9ef">int</span> x <span style="color:#f92672">=</span> <span style="color:#ae81ff">42</span>; </span></span><span style="display:flex;"><span>x <span style="color:#f92672">=</span> <span style="color:#ae81ff">55</span>; <span style="color:#75715e">// redefinition not allowed in SSA </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#a6e22e">print</span>(x); </span></span></code></pre></div><p>And here is the same code in the SSA form:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-c" data-lang="c"><span style="display:flex;"><span><span style="color:#66d9ef">int</span> x <span style="color:#f92672">=</span> <span style="color:#ae81ff">42</span>; </span></span><span style="display:flex;"><span><span style="color:#66d9ef">int</span> x1 <span style="color:#f92672">=</span> <span style="color:#ae81ff">55</span>; <span style="color:#75715e">// &#34;redefinition&#34; generates a new value </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#a6e22e">print</span>(x1); </span></span></code></pre></div><p>We must convert the registers into SSA values to satisfy the MLIR requirement to be in SSA form.</p> <p>At first glance, the problem is trivial. 
We can maintain a map of definitions for each register at each point in time. For example, for the following bytecode:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-asm" data-lang="asm"><span style="display:flex;"><span><span style="color:#a6e22e">OP_LOADSELF</span> <span style="color:#66d9ef">R1</span> <span style="color:#75715e">// #1 </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#a6e22e">OP_LOADI</span> <span style="color:#66d9ef">R2</span> <span style="color:#ae81ff">10</span> <span style="color:#75715e">// #2 </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#a6e22e">OP_LOADI</span> <span style="color:#66d9ef">R3</span> <span style="color:#ae81ff">20</span> <span style="color:#75715e">// #3 </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#a6e22e">OP_LOADI</span> <span style="color:#66d9ef">R3</span> <span style="color:#ae81ff">30</span> <span style="color:#75715e">// #4 </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#a6e22e">OP_ADD</span> <span style="color:#66d9ef">R2</span> <span style="color:#66d9ef">R3</span> <span style="color:#75715e">// #5 </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#a6e22e">OP_RETURN</span> <span style="color:#66d9ef">R2</span> <span style="color:#75715e">// #6 </span></span></span></code></pre></div><p>The map, as seen right before each step executes, changes as follows:</p> <pre tabindex="0"><code>Step #1: { empty } Step #2: { R1 defined by #1 } Step #3: { R1 defined by #1 R2 defined by #2 } Step #4: { R1 defined by #1 R2 defined by #2 R3 defined by #3 } Step #5: { R1 defined by #1 R2 defined by #2 R3 defined by #4 // R3 redefined at #4 } Step #6: { R1 defined by 
#1 R2 defined by #5 // OP_ADD stores the result in the first operand R3 defined by #4 } </code></pre><p>With this map, we know precisely where a register was defined whenever an operation uses that register.</p> <p>So the MLIR version looks like this:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-mlir" data-lang="mlir"><span style="display:flex;"><span><span style="color:#75715e">// OP_LOADSELF R1 </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>%0 = rite.OP_LOADSELF() : () -&gt; !rite.value </span></span><span style="display:flex;"><span><span style="color:#75715e">// OP_LOADI R2 10 </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>%1 = rite.OP_LOADI() {value = <span style="color:#ae81ff">10</span> : si64} : () -&gt; !rite.value </span></span><span style="display:flex;"><span><span style="color:#75715e">// OP_LOADI R3 20 </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>%2 = rite.OP_LOADI() {value = <span style="color:#ae81ff">20</span> : si64} : () -&gt; !rite.value </span></span><span style="display:flex;"><span><span style="color:#75715e">// OP_LOADI R3 30 </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>%3 = rite.OP_LOADI() {value = <span style="color:#ae81ff">30</span> : si64} : () -&gt; !rite.value </span></span><span style="display:flex;"><span><span style="color:#75715e">// OP_ADD R2 R3 </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>%4 = rite.OP_ADD(%1, %3) : (!rite.value, !rite.value) -&gt; !rite.value </span></span><span style="display:flex;"><span><span style="color:#75715e">// OP_RETURN R2 </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>%5 = rite.OP_RETURN(%4) : (!rite.value) -&gt; !rite.value 
</span></span></code></pre></div><p><em>Side note: <code>%0</code> and <code>%2</code> are never used and can be eliminated (if <code>OP_LOADSELF</code>/<code>OP_LOADI</code> don&rsquo;t have side effects).</em></p> <p>This solution works well until the code has branching such as <code>if</code>/<code>else</code>, loops, or exceptions.</p> <p>Consider the following non-SSA example:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-c" data-lang="c"><span style="display:flex;"><span>x <span style="color:#f92672">=</span> <span style="color:#ae81ff">10</span>; </span></span><span style="display:flex;"><span><span style="color:#66d9ef">if</span> (something) { </span></span><span style="display:flex;"><span> x <span style="color:#f92672">=</span> <span style="color:#ae81ff">20</span>; </span></span><span style="display:flex;"><span>} <span style="color:#66d9ef">else</span> { </span></span><span style="display:flex;"><span> x <span style="color:#f92672">=</span> <span style="color:#ae81ff">30</span>; </span></span><span style="display:flex;"><span>} </span></span><span style="display:flex;"><span><span style="color:#a6e22e">print</span>(x); <span style="color:#75715e">// Where is x defined? 
</span></span></span></code></pre></div><p>Classical SSA solves this problem with artificial <code>phi</code>-nodes:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-c" data-lang="c"><span style="display:flex;"><span>x1 <span style="color:#f92672">=</span> <span style="color:#ae81ff">10</span>; </span></span><span style="display:flex;"><span><span style="color:#66d9ef">if</span> (something) { </span></span><span style="display:flex;"><span> x2 <span style="color:#f92672">=</span> <span style="color:#ae81ff">20</span>; </span></span><span style="display:flex;"><span>} <span style="color:#66d9ef">else</span> { </span></span><span style="display:flex;"><span> x3 <span style="color:#f92672">=</span> <span style="color:#ae81ff">30</span>; </span></span><span style="display:flex;"><span>} </span></span><span style="display:flex;"><span>x4 <span style="color:#f92672">=</span> <span style="color:#a6e22e">phi</span>(x2, x3); <span style="color:#75715e">// Will magically resolve to the right x depending on where it comes from </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#a6e22e">print</span>(x4); </span></span></code></pre></div><p>MLIR approaches this differently and elegantly - via &ldquo;block arguments.&rdquo;</p> <p>But first, let&rsquo;s talk about Control-Flow Graphs.</p> <h3 id="control-flow-graph-cfg">Control-Flow Graph (CFG)</h3> <p>A control-flow graph is a form of intermediate representation that maintains the program in the form of a graph where operations are connected to each other based on the execution (or control) flow.</p> <p>Consider the following bytecode (the number on the left is an operation address):</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-asm" data-lang="asm"><span 
style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">001:</span> <span style="color:#a6e22e">OP_LOADT</span> <span style="color:#66d9ef">R1</span> <span style="color:#75715e">// puts &#34;true&#34; in R1 </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#960050;background-color:#1e0010">002:</span> <span style="color:#a6e22e">OP_LOADI</span> <span style="color:#66d9ef">R2</span> <span style="color:#ae81ff">42</span> </span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">003:</span> <span style="color:#a6e22e">OP_JMPIF</span> <span style="color:#66d9ef">R1</span> <span style="color:#ae81ff">006</span> <span style="color:#75715e">// jump to 006 if R1 contains &#34;true&#34; </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> <span style="color:#75715e">// otherwise implicitly falls through to 004 </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#960050;background-color:#1e0010">004:</span> <span style="color:#a6e22e">OP_LOADI</span> <span style="color:#66d9ef">R3</span> <span style="color:#ae81ff">20</span> </span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">005:</span> <span style="color:#a6e22e">OP_JMP</span> <span style="color:#ae81ff">007</span> <span style="color:#75715e">// jump to 007 unconditionally </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#960050;background-color:#1e0010">006:</span> <span style="color:#a6e22e">OP_LOADI</span> <span style="color:#66d9ef">R3</span> <span style="color:#ae81ff">30</span> </span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">007:</span> <span style="color:#a6e22e">OP_ADD</span> <span style="color:#66d9ef">R2</span> <span 
style="color:#66d9ef">R3</span> <span style="color:#75715e">// R3 may be either 20 or 30, depending on the branching </span></span></span></code></pre></div><p>The same program in the form of a graph:</p> <p><img src="https://lowlevelbits.org/img/compiling-ruby-3/naive-cfg.png" alt="CFG without basic blocks"></p> <p>This CFG can be further optimized: we can merge all the subsequent nodes unless the node has more than one incoming or more than one outgoing edge.</p> <p>The merged nodes are called basic blocks:</p> <p><img src="https://lowlevelbits.org/img/compiling-ruby-3/complete-cfg.png" alt="CFG with basic blocks"></p> <p>Some more terms for completeness:</p> <ul> <li>the &ldquo;first&rdquo; basic block where the execution of a function starts is called &ldquo;entry.&rdquo;</li> <li>similarly, the &ldquo;last&rdquo; basic block is called &ldquo;exit.&rdquo;</li> <li>preceding (incoming, previous) basic blocks are called predecessors. The entry block doesn&rsquo;t have predecessors.</li> <li>succeeding (outgoing, next) basic blocks are called successors. Exit blocks don&rsquo;t have successors.</li> <li>the last operation in a basic block is called a terminator</li> </ul> <p>Based on the last picture:</p> <ul> <li><code>B1</code>: entry block</li> <li><code>B4</code>: single exit block. 
There could be several exit blocks, yet we can always add one &ldquo;empty&rdquo; block as a successor for the exit blocks to have only one exit block.</li> <li><code>B1</code>: predecessors: [], successors: [<code>B2</code>, <code>B3</code>], terminator: <code>OP_JMPIF</code></li> <li><code>B2</code>: predecessors: [<code>B1</code>], successors: [<code>B4</code>], terminator: <code>OP_JMP</code></li> <li><code>B3</code>: predecessors: [<code>B1</code>], successors: [<code>B4</code>], terminator: <code>OP_LOADI</code></li> <li><code>B4</code>: predecessors: [<code>B2</code>, <code>B3</code>], successors: [], terminator: <code>OP_ADD</code></li> </ul> <h3 id="cfgs-in-mlir">CFGs in MLIR</h3> <p>Now we can take a look at CFGs from the MLIR perspective. If you are familiar with CFGs in LLVM, then the important difference is that in MLIR, all the basic blocks may have arguments. Function arguments are, in fact, the block arguments from the entry block. For example, this is a more accurate representation of a function:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-mlir" data-lang="mlir"><span style="display:flex;"><span><span style="color:#66d9ef">func</span> <span style="color:#a6e22e">@top</span>() -&gt; !rite.value { </span></span><span style="display:flex;"><span>^bb0(%arg0: !rite.state, %arg1: !rite.value): </span></span><span style="display:flex;"><span> %0 = rite.OP_LOADSELF() : () -&gt; !rite.value </span></span><span style="display:flex;"><span> %1 = rite.OP_LOADI() {value = <span style="color:#ae81ff">42</span> : si64} : () -&gt; !rite.value </span></span><span style="display:flex;"><span> %2 = rite.OP_SEND(%0, %1) {argc = <span style="color:#ae81ff">1</span> : ui32, symbol = <span style="color:#e6db74">&#34;puts&#34;</span>} : (!rite.value, !rite.value) -&gt; !rite.value </span></span><span style="display:flex;"><span> %3 = rite.OP_RETURN(%2) : 
(!rite.value) -&gt; !rite.value </span></span><span style="display:flex;"><span>} </span></span></code></pre></div><p><em>Note, <code>^bbX</code> represents the basic blocks.</em></p> <p>To convert the following bytecode:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-asm" data-lang="asm"><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">001:</span> <span style="color:#a6e22e">OP_LOADT</span> <span style="color:#66d9ef">R1</span> <span style="color:#75715e">// puts &#34;true&#34; in R1 </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#960050;background-color:#1e0010">002:</span> <span style="color:#a6e22e">OP_LOADI</span> <span style="color:#66d9ef">R2</span> <span style="color:#ae81ff">42</span> </span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">003:</span> <span style="color:#a6e22e">OP_JMPIF</span> <span style="color:#66d9ef">R1</span> <span style="color:#ae81ff">006</span> <span style="color:#75715e">// jump to 006 if R1 contains &#34;true&#34; </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> <span style="color:#75715e">// otherwise implicitly falls through to 004 </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#960050;background-color:#1e0010">004:</span> <span style="color:#a6e22e">OP_LOADI</span> <span style="color:#66d9ef">R3</span> <span style="color:#ae81ff">20</span> </span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">005:</span> <span style="color:#a6e22e">OP_JMP</span> <span style="color:#ae81ff">007</span> <span style="color:#75715e">// jump to 007 unconditionally </span></span></span><span style="display:flex;"><span><span 
style="color:#75715e"></span><span style="color:#960050;background-color:#1e0010">006:</span> <span style="color:#a6e22e">OP_LOADI</span> <span style="color:#66d9ef">R3</span> <span style="color:#ae81ff">30</span> </span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">007:</span> <span style="color:#a6e22e">OP_ADD</span> <span style="color:#66d9ef">R2</span> <span style="color:#66d9ef">R3</span> <span style="color:#75715e">// R3 may be either 20 or 30, depending on the branching </span></span></span></code></pre></div><p>we need to take several steps:</p> <ul> <li>add an address attribute to all addressable operations (they could be jump targets)</li> <li>add &ldquo;targets&rdquo; attribute to all the jumps, including implicit fallthrough jumps</li> <li>add an explicit jump in place of the implicit jumps</li> <li>add the successor blocks for all jump instructions</li> <li>put all the operations in a single, entry basic block</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-mlir" data-lang="mlir"><span style="display:flex;"><span><span style="color:#66d9ef">func</span> <span style="color:#a6e22e">@top</span>(%arg0: !rite.state, %arg1: !rite.value) -&gt; !rite.value { </span></span><span style="display:flex;"><span> %0 = rite.PhonyValue() : () -&gt; !rite.value </span></span><span style="display:flex;"><span> %1 = rite.OP_LOADT() { address = <span style="color:#ae81ff">001</span> } : () -&gt; !rite.value </span></span><span style="display:flex;"><span> %2 = rite.OP_LOADI() { address = <span style="color:#ae81ff">002</span>, value = <span style="color:#ae81ff">42</span> } : () -&gt; !rite.value </span></span><span style="display:flex;"><span> rite.OP_JMPIF(%0)[^bb1, ^bb1] { address = <span style="color:#ae81ff">003</span>, targets = [<span style="color:#ae81ff">006</span>, <span 
style="color:#ae81ff">004</span>] } </span></span><span style="display:flex;"><span> %3 = rite.OP_LOADI() { address = <span style="color:#ae81ff">004</span>, value = <span style="color:#ae81ff">20</span> } : () -&gt; !rite.value </span></span><span style="display:flex;"><span> rite.OP_JMP()[^bb1] { address = <span style="color:#ae81ff">005</span>, targets = [<span style="color:#ae81ff">007</span>] } </span></span><span style="display:flex;"><span> %4 = rite.OP_LOADI() { address = <span style="color:#ae81ff">006</span>, value = <span style="color:#ae81ff">30</span> } : () -&gt; !rite.value </span></span><span style="display:flex;"><span> rite.FallthroughJump()[^bb1] </span></span><span style="display:flex;"><span> %5 = rite.OP_ADD(%0, %0) { address = <span style="color:#ae81ff">007</span> } : () -&gt; !rite.value </span></span><span style="display:flex;"><span>^bb1: </span></span><span style="display:flex;"><span>} </span></span></code></pre></div><p><em>Note: I&rsquo;m omitting some details from the textual representation for brevity.</em></p> <p>Notice, here, I added a &ldquo;phony value&rdquo; as a placeholder for SSA values as we cannot yet construct the proper SSA. 
We will remove them in the next section.</p> <p>Additionally, I added a phony basic block to serve as a placeholder successor for the jump targets.</p> <p>Now, the last steps are:</p> <ul> <li>split the entry basic block by cutting it right before each jump target operation</li> <li>rewire the jumps to point to the right target basic blocks</li> <li>delete the phony basic block used as a placeholder</li> </ul> <p>The final CFG looks like this:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-mlir" data-lang="mlir"><span style="display:flex;"><span><span style="color:#66d9ef">func</span> <span style="color:#a6e22e">@top</span>(%arg0: !rite.state, %arg1: !rite.value) -&gt; !rite.value { </span></span><span style="display:flex;"><span> %0 = rite.PhonyValue() : () -&gt; !rite.value </span></span><span style="display:flex;"><span> %1 = rite.OP_LOADT() { address = <span style="color:#ae81ff">001</span> } : () -&gt; !rite.value </span></span><span style="display:flex;"><span> %2 = rite.OP_LOADI() { address = <span style="color:#ae81ff">002</span>, value = <span style="color:#ae81ff">42</span> } : () -&gt; !rite.value </span></span><span style="display:flex;"><span> rite.OP_JMPIF(%0)[^bb1, ^bb2] { address = <span style="color:#ae81ff">003</span>, targets = [<span style="color:#ae81ff">006</span>, <span style="color:#ae81ff">004</span>] } </span></span><span style="display:flex;"><span>^bb1: <span style="color:#75715e">// pred: ^bb0 </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> %3 = rite.OP_LOADI() { address = <span style="color:#ae81ff">004</span>, value = <span style="color:#ae81ff">20</span> } : () -&gt; !rite.value </span></span><span style="display:flex;"><span> rite.OP_JMP()[^bb3] { address = <span style="color:#ae81ff">005</span>, targets = [<span style="color:#ae81ff">007</span>] } </span></span><span 
style="display:flex;"><span>^bb2: <span style="color:#75715e">// pred: ^bb0 </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> %4 = rite.OP_LOADI() { address = <span style="color:#ae81ff">006</span>, value = <span style="color:#ae81ff">30</span> } : () -&gt; !rite.value </span></span><span style="display:flex;"><span> rite.FallthroughJump()[^bb3] </span></span><span style="display:flex;"><span>^bb3: <span style="color:#75715e">// pred: ^bb1, ^bb2 </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> %5 = rite.OP_ADD(%0, %0) { address = <span style="color:#ae81ff">007</span> } : () -&gt; !rite.value </span></span><span style="display:flex;"><span>} </span></span></code></pre></div><p>It corresponds to the last picture above, except that we now have an explicit <code>rite.FallthroughJump()</code>.</p> <p>With the CFG in place, we can solve the SSA problem and eliminate the <code>rite.PhonyValue()</code> placeholder.</p> <h3 id="ssa-in-mlir">SSA in MLIR</h3> <p>As a reminder, here is the CFG of the &ldquo;problematic&rdquo; program:</p> <p><img src="https://lowlevelbits.org/img/compiling-ruby-3/complete-cfg.png" alt="CFG with basic blocks"></p> <p>In the MLIR form, we no longer have registers from the virtual stack. We only have values such as <code>%2</code>, <code>%3</code>, <code>%4</code>, and so on. The tricky part is the <code>007: OP_ADD R2 R3</code> operation: where is <code>R3</code> coming from? Is it <code>%3</code> or <code>%4</code>?</p> <p>To answer this question, we can use <a href="https://en.wikipedia.org/wiki/Data-flow_analysis">dataflow analysis</a>.</p> <p>Dataflow analysis is used to derive specific facts about the program. The analysis is an iterative process: first, collect the base facts for each basic block, then, for each basic block, update the facts by combining them with the facts from its successors or predecessors. 
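</p> <p>A minimal Python sketch of such an iterative analysis follows. It is an illustration under stated assumptions, not the implementation used in this project: the <code>BLOCKS</code> table and the helper names <code>local_facts</code> and <code>required_registers</code> are hypothetical, and the facts it derives (which registers a basic block needs from its predecessors) follow the usual backward, liveness-style formulation. The blocks mirror the final CFG shown above.</p>

```python
# Hypothetical model of the final CFG above: every block lists its operations
# as (registers defined, registers used) pairs, followed by its successors.
BLOCKS = {
    "bb0": {"ops": [(["R1"], []), (["R2"], []), ([], ["R1"])], "succ": ["bb1", "bb2"]},
    "bb1": {"ops": [(["R3"], []), ([], [])], "succ": ["bb3"]},
    "bb2": {"ops": [(["R3"], []), ([], [])], "succ": ["bb3"]},
    "bb3": {"ops": [(["R2"], ["R2", "R3"])], "succ": []},
}


def local_facts(ops):
    """Base facts of one block: registers it defines and registers it uses before defining them."""
    defined, used_first = set(), set()
    for defs, uses in ops:
        used_first |= {reg for reg in uses if reg not in defined}
        defined |= set(defs)
    return defined, used_first


def required_registers(blocks):
    """Registers each block needs from its predecessors, computed as a fixed point."""
    base = {name: local_facts(block["ops"]) for name, block in blocks.items()}
    required = {name: set(base[name][1]) for name in blocks}
    changed = True
    while changed:  # keep going until an iteration adds no new fact
        changed = False
        for name, block in blocks.items():
            defined = base[name][0]
            for succ in block["succ"]:
                # Whatever a successor needs and this block does not define
                # must in turn come from this block's own predecessors.
                missing = required[succ] - defined
                if not missing.issubset(required[name]):
                    required[name] |= missing
                    changed = True
    return required


if __name__ == "__main__":
    for name, regs in required_registers(BLOCKS).items():
        print(name, sorted(regs))
```

<p>For this CFG, the sketch reports that <code>bb3</code> needs <code>R2</code> and <code>R3</code>, that <code>bb1</code> and <code>bb2</code> each need <code>R2</code>, and that the entry block needs nothing.</p> <p>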
As the facts updated for a basic block may affect the facts of its successors/predecessors, the process should run iteratively until no new facts are derived.</p> <p>A critical requirement for the facts is that they are monotonic: once a fact is known, it cannot &ldquo;disappear.&rdquo; This way, the iterative process eventually stops as, in the worst case, the analysis will derive &ldquo;all&rdquo; the facts about the program and won&rsquo;t be able to derive any more.</p> <p>My favorite resource about dataflow analysis is Adrian Sampson&rsquo;s lectures on the subject - <a href="https://www.cs.cornell.edu/courses/cs6120/2020fa/lesson/4/">The Data Flow Framework</a>. I highly recommend it.</p> <p>In our case, the facts we need to derive are: which values/registers are required for each operation.</p> <p>Here is the algorithm in brief:</p> <ul> <li>at every point in time, there is a map of the values defined so far</li> <li>if an operation is using a value that is not defined, then this value is <code>required</code></li> <li>the required values become the block arguments and must be coming from the predecessors</li> <li>the terminators of the &ldquo;required&rdquo; predecessors now use the values required by the successors</li> <li>at the next iteration, the block arguments define the previously required values</li> </ul> <p>The process runs iteratively until no new required values appear.</p> <p>An important detail for the entry basic block is that, as it doesn&rsquo;t have a predecessor, all the required values must come from the virtual stack.</p> <p>Let&rsquo;s look at the example bytecode once again:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-asm" data-lang="asm"><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">001:</span> <span style="color:#a6e22e">OP_LOADT</span> <span style="color:#66d9ef">R1</span> 
</span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">002:</span> <span style="color:#a6e22e">OP_LOADI</span> <span style="color:#66d9ef">R2</span> <span style="color:#ae81ff">42</span> </span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">003:</span> <span style="color:#a6e22e">OP_JMPIF</span> <span style="color:#66d9ef">R1</span> <span style="color:#ae81ff">006</span> </span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">004:</span> <span style="color:#a6e22e">OP_LOADI</span> <span style="color:#66d9ef">R3</span> <span style="color:#ae81ff">20</span> </span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">005:</span> <span style="color:#a6e22e">OP_JMP</span> <span style="color:#ae81ff">007</span> </span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">006:</span> <span style="color:#a6e22e">OP_LOADI</span> <span style="color:#66d9ef">R3</span> <span style="color:#ae81ff">30</span> </span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">007:</span> <span style="color:#a6e22e">OP_ADD</span> <span style="color:#66d9ef">R2</span> <span style="color:#66d9ef">R3</span> </span></span></code></pre></div><p>This is the initial state for the dataflow analysis. The comment line above each operation lists the registers defined at that point in time. 
The comment beside each operation describes the operation itself:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-mlir" data-lang="mlir"><span style="display:flex;"><span><span style="color:#66d9ef">func</span> <span style="color:#a6e22e">@top</span>(%arg0: !rite.state, %arg1: !rite.value) -&gt; !rite.value { </span></span><span style="display:flex;"><span> <span style="color:#75715e">// defined: [] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> %0 = rite.PhonyValue() : () -&gt; !rite.value <span style="color:#75715e">// defines: [], uses: [] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> <span style="color:#75715e">// defined: [] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> %1 = rite.OP_LOADT() : () -&gt; !rite.value <span style="color:#75715e">// defines: [R1], uses: [] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> <span style="color:#75715e">// defined: [R1] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> %2 = rite.OP_LOADI(<span style="color:#ae81ff">42</span>) : () -&gt; !rite.value <span style="color:#75715e">// defines: [R2], uses: [] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> <span style="color:#75715e">// defined: [R1, R2] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> rite.OP_JMPIF(%0)[^bb1, ^bb2] <span style="color:#75715e">// defines: [], uses: [R1] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> </span></span><span style="display:flex;"><span>^bb1: <span style="color:#75715e">// pred: ^bb0 // defines: [], uses: [] </span></span></span><span style="display:flex;"><span><span 
style="color:#75715e"></span> <span style="color:#75715e">// defined: [] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> %3 = rite.OP_LOADI(<span style="color:#ae81ff">20</span>) : () -&gt; !rite.value <span style="color:#75715e">// defines: [R3], uses: [] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> <span style="color:#75715e">// defined: [R3] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> rite.OP_JMP()[^bb3] <span style="color:#75715e">// defines: [], uses: [] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> </span></span><span style="display:flex;"><span>^bb2: <span style="color:#75715e">// pred: ^bb0 // defines: [], uses: [] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> <span style="color:#75715e">// defined: [] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> %4 = rite.OP_LOADI(<span style="color:#ae81ff">30</span>) : () -&gt; !rite.value <span style="color:#75715e">// defines: [R3], uses: [] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> <span style="color:#75715e">// defined: [R3] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> rite.FallthroughJump()[^bb3] <span style="color:#75715e">// defines: [], uses: [] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> </span></span><span style="display:flex;"><span>^bb3: <span style="color:#75715e">// pred: ^bb1, ^bb2 // defines: [], uses: [] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> <span style="color:#75715e">// defined: [] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> %5 = rite.OP_ADD(%0, %0) : () -&gt; !rite.value <span 
style="color:#75715e">// defines: [R2], uses: [R2, R3] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>} </span></span></code></pre></div><p>The last operation uses values that are not defined. Therefore <code>R2</code> and <code>R3</code> are required and must come from the predecessors.</p> <p>Update predecessors and rerun the analysis.</p> <p><em>Note: I am using %RX_Y names to distinguish them from the original numerical value names. X is the register number, and Y is the basic block number.</em></p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-mlir" data-lang="mlir"><span style="display:flex;"><span><span style="color:#66d9ef">func</span> <span style="color:#a6e22e">@top</span>(%arg0: !rite.state, %arg1: !rite.value) -&gt; !rite.value { </span></span><span style="display:flex;"><span> <span style="color:#75715e">// defined: [] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> %0 = rite.PhonyValue() : () -&gt; !rite.value <span style="color:#75715e">// defines: [], uses: [] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> <span style="color:#75715e">// defined: [] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> %1 = rite.OP_LOADT() : () -&gt; !rite.value <span style="color:#75715e">// defines: [R1], uses: [] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> <span style="color:#75715e">// defined: [R1] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> %2 = rite.OP_LOADI(<span style="color:#ae81ff">42</span>) : () -&gt; !rite.value <span style="color:#75715e">// defines: [R2], uses: [] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> <span style="color:#75715e">// 
defined: [R1, R2] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> rite.OP_JMPIF(%0)[^bb1, ^bb2] <span style="color:#75715e">// defines: [], uses: [R1] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> </span></span><span style="display:flex;"><span>^bb1: <span style="color:#75715e">// pred: ^bb0 // defines: [], uses: [] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> <span style="color:#75715e">// defined: [] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> %3 = rite.OP_LOADI(<span style="color:#ae81ff">20</span>) : () -&gt; !rite.value <span style="color:#75715e">// defines: [R3], uses: [] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> <span style="color:#75715e">// defined: [R3] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> rite.OP_JMP(%0, %0)[^bb3] <span style="color:#75715e">// defines: [], uses: [R2, R3] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> </span></span><span style="display:flex;"><span>^bb2: <span style="color:#75715e">// pred: ^bb0 // defines: [], uses: [] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> <span style="color:#75715e">// defined: [] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> %4 = rite.OP_LOADI(<span style="color:#ae81ff">30</span>) : () -&gt; !rite.value <span style="color:#75715e">// defines: [R3], uses: [] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> <span style="color:#75715e">// defined: [R3] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> rite.FallthroughJump(%0, %0)[^bb3] <span style="color:#75715e">// defines: [], uses: [R2, R3] </span></span></span><span 
style="display:flex;"><span><span style="color:#75715e"></span> </span></span><span style="display:flex;"><span>^bb3(%R2_3, %R3_3): <span style="color:#75715e">// pred: ^bb1, ^bb2 // defines: [R2, R3], uses: [] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> <span style="color:#75715e">// defined: [R2, R3] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> %5 = rite.OP_ADD(%0, %0) : () -&gt; !rite.value <span style="color:#75715e">// defines: [R2], uses: [R2, R3] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>} </span></span></code></pre></div><p>Basic block <code>^bb3</code> now has two block arguments. The terminators from its predecessors (<code>^bb1</code> and <code>^bb2</code>) now use an undefined value, <code>R2</code>. <code>R2</code> is now required. We must add it as a block argument and propagate it to the predecessors&rsquo; terminators.</p> <p>Rerun the analysis:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-mlir" data-lang="mlir"><span style="display:flex;"><span><span style="color:#66d9ef">func</span> <span style="color:#a6e22e">@top</span>(%arg0: !rite.state, %arg1: !rite.value) -&gt; !rite.value { </span></span><span style="display:flex;"><span> <span style="color:#75715e">// defined: [] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> %0 = rite.PhonyValue() : () -&gt; !rite.value <span style="color:#75715e">// defines: [], uses: [] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> <span style="color:#75715e">// defined: [] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> %1 = rite.OP_LOADT() : () -&gt; !rite.value <span style="color:#75715e">// defines: [R1], uses: [] 
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> <span style="color:#75715e">// defined: [R1] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> %2 = rite.OP_LOADI(<span style="color:#ae81ff">42</span>) : () -&gt; !rite.value <span style="color:#75715e">// defines: [R2], uses: [] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> <span style="color:#75715e">// defined: [R1, R2] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> rite.OP_JMPIF(%0, %0, %0)[^bb1, ^bb2] <span style="color:#75715e">// defines: [], uses: [R1, R2, R2] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> </span></span><span style="display:flex;"><span>^bb1(%R2_1): <span style="color:#75715e">// pred: ^bb0 // defines: [R2], uses: [] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> <span style="color:#75715e">// defined: [R2] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> %3 = rite.OP_LOADI(<span style="color:#ae81ff">20</span>) : () -&gt; !rite.value <span style="color:#75715e">// defines: [R3], uses: [] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> <span style="color:#75715e">// defined: [R2, R3] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> rite.OP_JMP(%0, %0)[^bb3] <span style="color:#75715e">// defines: [], uses: [R2, R3] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> </span></span><span style="display:flex;"><span>^bb2(%R2_2): <span style="color:#75715e">// pred: ^bb0 // defines: [R2], uses: [] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> <span style="color:#75715e">// defined: [R2] </span></span></span><span 
style="display:flex;"><span><span style="color:#75715e"></span> %4 = rite.OP_LOADI(<span style="color:#ae81ff">30</span>) : () -&gt; !rite.value <span style="color:#75715e">// defines: [R3], uses: [] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> <span style="color:#75715e">// defined: [R2, R3] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> rite.FallthroughJump(%0, %0)[^bb3] <span style="color:#75715e">// defines: [], uses: [R2, R3] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> </span></span><span style="display:flex;"><span>^bb3(%R2_3, %R3_3): <span style="color:#75715e">// pred: ^bb1, ^bb2 // defines: [R2, R3], uses: [] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> <span style="color:#75715e">// defined: [R2, R3] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> %5 = rite.OP_ADD(%0, %0) : () -&gt; !rite.value <span style="color:#75715e">// defines: [R2], uses: [R2, R3] </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>} </span></span></code></pre></div><p>We can run the analysis one more time, but it won&rsquo;t change anything, so that would conclude the analysis, and we should have all the information we need to replace the phony value with the correct values.</p> <p>Additionally, now we can replace our custom jump operations with the builtin ones from MLIR, so the final function looks like this:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-mlir" data-lang="mlir"><span style="display:flex;"><span><span style="color:#66d9ef">func</span> <span style="color:#a6e22e">@top</span>(%arg0: !rite.state, %arg1: !rite.value) -&gt; !rite.value { </span></span><span style="display:flex;"><span> %1 = 
rite.OP_LOADT() : () -&gt; !rite.value </span></span><span style="display:flex;"><span> %2 = rite.OP_LOADI(<span style="color:#ae81ff">42</span>) : () -&gt; !rite.value </span></span><span style="display:flex;"><span> cond_br %1, ^bb1(%2), ^bb2(%2) </span></span><span style="display:flex;"><span>^bb1(%R2_1): <span style="color:#75715e">// pred: ^bb0 </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> %3 = rite.OP_LOADI(<span style="color:#ae81ff">20</span>) : () -&gt; !rite.value </span></span><span style="display:flex;"><span> br ^bb3(%R2_1, %3) </span></span><span style="display:flex;"><span>^bb2(%R2_2): <span style="color:#75715e">// pred: ^bb0 </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> %4 = rite.OP_LOADI(<span style="color:#ae81ff">30</span>) : () -&gt; !rite.value </span></span><span style="display:flex;"><span> br ^bb3(%R2_2, %4) </span></span><span style="display:flex;"><span>^bb3(%R2_3, %R3_3): <span style="color:#75715e">// pred: ^bb1, ^bb2 </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> %5 = rite.OP_ADD(%R2_3, %R3_3) : () -&gt; !rite.value </span></span><span style="display:flex;"><span>} </span></span></code></pre></div><p>Now, onto drawing the rest of the fu**ing owl.</p> <hr> <p><strong>Thank you so much for reaching this far!</strong></p> <p>The next article gives a short <a href="https://lowlevelbits.org/compiling-ruby-part-4/">progress update</a>.</p> Compiling Ruby. Part 2: RiteVM https://lowlevelbits.org/compiling-ruby-part-2/ Wed, 04 Jan 2023 [email protected] (Alex Denisov) https://lowlevelbits.org/compiling-ruby-part-2/ <p>mruby (so-called &ldquo;embedded&rdquo; Ruby) is a relatively small Ruby implementation.</p> <p>mruby is based on a register-based virtual machine. In the previous article, I mentioned the difference between stack- and register-based VMs, but what is a Virtual Machine? 
As obvious as it gets, a Virtual Machine is a piece of software that mimics specific behavior(s) of a Real Machine.</p> <p>Depending on the kind of virtual machine, the capabilities may vary. A VM can mimic a typical computer&rsquo;s complete behavior, allowing us to run any software we&rsquo;d run on a regular machine (think VirtualBox or VMware). Or it can implement the behavior of an imaginary, artificial machine that doesn&rsquo;t have a counterpart in the real physical world (think JVM or CLR).</p> <p>The mruby RiteVM is of the latter kind. It defines a set of &ldquo;CPU&rdquo; operations and provides a runtime to run them. The operations are referred to as bytecode. The bytecode consists of an operation kind (opcode) and its corresponding metadata (registers, flags, etc.).</p> <h3 id="bytecode">Bytecode</h3> <p>Here is a tiny snippet of various RiteVM operations (coming from <code>mruby/ops.h</code>):</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-c" data-lang="c"><span style="display:flex;"><span><span style="color:#a6e22e">OPCODE</span>(NOP, Z) <span style="color:#75715e">/* no operation */</span> </span></span><span style="display:flex;"><span><span style="color:#a6e22e">OPCODE</span>(MOVE, BB) <span style="color:#75715e">/* R(a) = R(b) */</span> </span></span><span style="display:flex;"><span><span style="color:#a6e22e">OPCODE</span>(ADD, B) <span style="color:#75715e">/* R(a) = R(a)+R(a+1) */</span> </span></span><span style="display:flex;"><span><span style="color:#a6e22e">OPCODE</span>(ENTER, W) <span style="color:#75715e">/* arg setup according to flags (23=m5:o5:r1:m5:k5:d1:b1) */</span> </span></span><span style="display:flex;"><span><span style="color:#a6e22e">OPCODE</span>(JMP, S) <span style="color:#75715e">/* pc+=a */</span> </span></span></code></pre></div><p>All the opcodes follow the same form:</p> <div class="highlight"><pre tabindex="0" 
style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-c" data-lang="c"><span style="display:flex;"><span><span style="color:#a6e22e">OPCODE</span>(name, operands) <span style="color:#75715e">/* comment */</span> </span></span></code></pre></div><p>The <code>name</code> is self-explanatory. The <code>comment</code> describes (or hints at) an operation&rsquo;s semantics. The <code>operands</code> field is trickier, as it is directly tied to the bytecode encoding.</p> <p>Each letter in <code>operands</code> describes the size of one operand. <code>Z</code> means that the operand&rsquo;s size is zero bytes (i.e., there is no operand). <code>B</code>, <code>S</code>, and <code>W</code> all mean one operand, but their sizes are 1, 2, and 3 bytes, respectively. These definitions can be mixed and matched as needed, but in practice, only the following multi-operand combinations are used (from <code>mruby/ops.h</code>):</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-c" data-lang="c"><span style="display:flex;"><span><span style="color:#75715e">/* operand types: </span></span></span><span style="display:flex;"><span><span style="color:#75715e"> + BB: 8+8bit </span></span></span><span style="display:flex;"><span><span style="color:#75715e"> + BBB: 8+8+8bit </span></span></span><span style="display:flex;"><span><span style="color:#75715e"> + BS: 8+16bit </span></span></span><span style="display:flex;"><span><span style="color:#75715e"> + BSS: 8+16+16bit </span></span></span><span style="display:flex;"><span><span style="color:#75715e">*/</span> </span></span></code></pre></div><p>since an operation can have at most three operands.</p> <p>The operands are called <code>a</code>, <code>b</code>, and <code>c</code>. 
The following bytecode string will be decoded differently depending on the operand definition (the leading <code>42</code> is the byte that identifies the opcode):</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-c" data-lang="c"><span style="display:flex;"><span><span style="color:#ae81ff">42</span> <span style="color:#ae81ff">1</span> <span style="color:#ae81ff">2</span> <span style="color:#ae81ff">3</span> </span></span></code></pre></div><ul> <li><code>BBB</code> -&gt; <code>a = 1, b = 2, c = 3</code></li> <li><code>B</code> -&gt; <code>a = 1, b = undefined, c = undefined</code>, <code>2</code> is treated as the next opcode</li> <li><code>BS</code> -&gt; <code>a = 1, b = 2 &lt;&lt; 8 | 3, c = undefined</code></li> <li><code>W</code> -&gt; <code>a = 1 &lt;&lt; 16 | 2 &lt;&lt; 8 | 3, b = undefined, c = undefined</code></li> <li>and so on.</li> </ul> <p>Now the comments from the snippet above make more sense:</p> <ul> <li><code>NOP</code> does nothing with all its zero operands</li> <li><code>MOVE</code> copies the value from register <code>b</code> to register <code>a</code></li> <li><code>ADD</code> adds the values in registers <code>a</code> and <code>a+1</code> and stores the result in register <code>a</code></li> <li><code>ENTER</code> maps the operand <code>a</code> to the flags needed for its logic</li> <li><code>JMP</code> advances the program counter by the offset <code>a</code></li> </ul> <p>With all this information, we now understand <em>what</em> the operations do. The next question is <em>how</em> they do it.</p> <h3 id="bytecode-execution">Bytecode Execution</h3> <p>The bytecode doesn&rsquo;t live in a vacuum. Each bytecode sequence is part of a method. 
Consider the following example:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-ruby" data-lang="ruby"><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">sum</span>(a, b) </span></span><span style="display:flex;"><span> a <span style="color:#f92672">+</span> b </span></span><span style="display:flex;"><span><span style="color:#66d9ef">end</span> </span></span><span style="display:flex;"><span>puts sum(<span style="color:#ae81ff">10</span>, <span style="color:#ae81ff">32</span>) </span></span></code></pre></div><p>We can look into its bytecode:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-asm" data-lang="asm"><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">&gt;</span> <span style="color:#a6e22e">mruby</span> --<span style="color:#66d9ef">verbose</span> <span style="color:#66d9ef">sum.rb</span> </span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">&lt;</span><span style="color:#a6e22e">skipped</span><span style="color:#960050;background-color:#1e0010">&gt;</span> </span></span><span style="display:flex;"><span><span style="color:#a6e22e">irep</span> <span style="color:#ae81ff">0x600001390000</span> <span style="color:#66d9ef">nregs</span><span style="color:#960050;background-color:#1e0010">=</span><span style="color:#ae81ff">6</span> <span style="color:#66d9ef">nlocals</span><span style="color:#960050;background-color:#1e0010">=</span><span style="color:#ae81ff">1</span> <span style="color:#66d9ef">pools</span><span style="color:#960050;background-color:#1e0010">=</span><span style="color:#ae81ff">0</span> <span style="color:#66d9ef">syms</span><span style="color:#960050;background-color:#1e0010">=</span><span 
style="color:#ae81ff">2</span> <span style="color:#66d9ef">reps</span><span style="color:#960050;background-color:#1e0010">=</span><span style="color:#ae81ff">1</span> <span style="color:#66d9ef">ilen</span><span style="color:#960050;background-color:#1e0010">=</span><span style="color:#ae81ff">25</span> </span></span><span style="display:flex;"><span>file: <span style="color:#a6e22e">sum.rb</span> </span></span><span style="display:flex;"><span> <span style="color:#960050;background-color:#1e0010">1</span> <span style="color:#960050;background-color:#1e0010">000</span> <span style="color:#a6e22e">TCLASS</span> <span style="color:#66d9ef">R1</span> </span></span><span style="display:flex;"><span> <span style="color:#960050;background-color:#1e0010">1</span> <span style="color:#960050;background-color:#1e0010">002</span> <span style="color:#a6e22e">METHOD</span> <span style="color:#66d9ef">R2</span> <span style="color:#66d9ef">I</span>(<span style="color:#ae81ff">0</span>:<span style="color:#ae81ff">0x600001390050</span>) </span></span><span style="display:flex;"><span> <span style="color:#960050;background-color:#1e0010">1</span> <span style="color:#960050;background-color:#1e0010">005</span> <span style="color:#a6e22e">DEF</span> <span style="color:#66d9ef">R1</span> :<span style="color:#66d9ef">sum</span> </span></span><span style="display:flex;"><span> <span style="color:#960050;background-color:#1e0010">4</span> <span style="color:#960050;background-color:#1e0010">008</span> <span style="color:#a6e22e">LOADI</span> <span style="color:#66d9ef">R3</span> <span style="color:#ae81ff">10</span> </span></span><span style="display:flex;"><span> <span style="color:#960050;background-color:#1e0010">4</span> <span style="color:#960050;background-color:#1e0010">011</span> <span style="color:#a6e22e">LOADI</span> <span style="color:#66d9ef">R4</span> <span style="color:#ae81ff">32</span> </span></span><span style="display:flex;"><span> <span 
style="color:#960050;background-color:#1e0010">4</span> <span style="color:#960050;background-color:#1e0010">014</span> <span style="color:#a6e22e">SSEND</span> <span style="color:#66d9ef">R2</span> :<span style="color:#66d9ef">sum</span> <span style="color:#66d9ef">n</span><span style="color:#960050;background-color:#1e0010">=</span><span style="color:#ae81ff">2</span> (<span style="color:#ae81ff">0x02</span>) </span></span><span style="display:flex;"><span> <span style="color:#960050;background-color:#1e0010">4</span> <span style="color:#960050;background-color:#1e0010">018</span> <span style="color:#a6e22e">SSEND</span> <span style="color:#66d9ef">R1</span> :<span style="color:#66d9ef">puts</span> <span style="color:#66d9ef">n</span><span style="color:#960050;background-color:#1e0010">=</span><span style="color:#ae81ff">1</span> (<span style="color:#ae81ff">0x01</span>) </span></span><span style="display:flex;"><span> <span style="color:#960050;background-color:#1e0010">4</span> <span style="color:#960050;background-color:#1e0010">022</span> <span style="color:#a6e22e">RETURN</span> <span style="color:#66d9ef">R1</span> </span></span><span style="display:flex;"><span> <span style="color:#960050;background-color:#1e0010">4</span> <span style="color:#960050;background-color:#1e0010">024</span> <span style="color:#a6e22e">STOP</span> </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span><span style="color:#a6e22e">irep</span> <span style="color:#ae81ff">0x600001390050</span> <span style="color:#66d9ef">nregs</span><span style="color:#960050;background-color:#1e0010">=</span><span style="color:#ae81ff">7</span> <span style="color:#66d9ef">nlocals</span><span style="color:#960050;background-color:#1e0010">=</span><span style="color:#ae81ff">4</span> <span style="color:#66d9ef">pools</span><span style="color:#960050;background-color:#1e0010">=</span><span style="color:#ae81ff">0</span> <span 
style="color:#66d9ef">syms</span><span style="color:#960050;background-color:#1e0010">=</span><span style="color:#ae81ff">0</span> <span style="color:#66d9ef">reps</span><span style="color:#960050;background-color:#1e0010">=</span><span style="color:#ae81ff">0</span> <span style="color:#66d9ef">ilen</span><span style="color:#960050;background-color:#1e0010">=</span><span style="color:#ae81ff">14</span> </span></span><span style="display:flex;"><span><span style="color:#a6e22e">local</span> <span style="color:#66d9ef">variable</span> <span style="color:#66d9ef">names</span>: </span></span><span style="display:flex;"><span> R1:<span style="color:#a6e22e">a</span> </span></span><span style="display:flex;"><span> R2:<span style="color:#a6e22e">b</span> </span></span><span style="display:flex;"><span> R3:<span style="color:#960050;background-color:#1e0010">&amp;</span> </span></span><span style="display:flex;"><span>file: <span style="color:#a6e22e">sum.rb</span> </span></span><span style="display:flex;"><span> <span style="color:#960050;background-color:#1e0010">1</span> <span style="color:#960050;background-color:#1e0010">000</span> <span style="color:#a6e22e">ENTER</span> <span style="color:#ae81ff">2</span>:<span style="color:#ae81ff">0</span>:<span style="color:#ae81ff">0</span>:<span style="color:#ae81ff">0</span>:<span style="color:#ae81ff">0</span>:<span style="color:#ae81ff">0</span>:<span style="color:#ae81ff">0</span> (<span style="color:#ae81ff">0x80000</span>) </span></span><span style="display:flex;"><span> <span style="color:#960050;background-color:#1e0010">2</span> <span style="color:#960050;background-color:#1e0010">004</span> <span style="color:#a6e22e">MOVE</span> <span style="color:#66d9ef">R4</span> <span style="color:#66d9ef">R1</span> <span style="color:#75715e">; R1:a </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> <span style="color:#960050;background-color:#1e0010">2</span> <span 
style="color:#960050;background-color:#1e0010">007</span> <span style="color:#a6e22e">MOVE</span> <span style="color:#66d9ef">R5</span> <span style="color:#66d9ef">R2</span> <span style="color:#75715e">; R2:b </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> <span style="color:#960050;background-color:#1e0010">2</span> <span style="color:#960050;background-color:#1e0010">010</span> <span style="color:#a6e22e">ADD</span> <span style="color:#66d9ef">R4</span> <span style="color:#66d9ef">R5</span> </span></span><span style="display:flex;"><span> <span style="color:#960050;background-color:#1e0010">2</span> <span style="color:#960050;background-color:#1e0010">012</span> <span style="color:#a6e22e">RETURN</span> <span style="color:#66d9ef">R4</span> </span></span></code></pre></div><p>The bytecode sequence is part of the <code>mrb_irep</code> struct, which is in turn part of the <code>RProc</code> struct, which corresponds to a Ruby method (procedure?) object.</p> <p>The distinction is necessary because <code>RProc</code> is a higher-level abstraction over executable code, which may be either RiteVM bytecode or a C function. Additionally, there is a distinction between a <code>lambda</code>, a <code>block</code>, and a <code>method</code>. Here, we will focus only on the bytecode parts and ignore all the lambda/block/method shenanigans.</p> <p>In the <a href="https://lowlevelbits.org/compiling-ruby-part-1/#ruby-and-its-many-virtual-machines">previous article</a>, I briefly described the dispatch loop and how a VM interacts with the virtual stack. That description is not exhaustive, but it is accurate and captures the essential details.</p> <p>Execution of each <code>RProc</code> requires a virtual stack to operate on the data, but it also requires some additional metadata. This metadata lives in the so-called <code>mrb_callinfo</code> struct. 
This concept is known as a <code>stack frame</code> or <code>activation record</code>. The virtual stack is stored separately, but each <code>mrb_callinfo</code> points into it, so it is part of the frame in a loose sense. The virtual stack is essential as it is the only way to communicate between different operations and different <code>RProc</code>s.</p> <p>Here is what happens during bytecode execution:</p> <ol> <li><code>mrb_callinfo</code> is created from an <code>RProc</code> and is pushed onto the &ldquo;call info&rdquo; stack, or simply the call stack. The new <code>mrb_callinfo</code> points to a new location of the shared virtual stack (see the first picture below).</li> <li>Each operation in <code>RProc</code>&rsquo;s <code>mrb_irep</code> is executed in the context of the top <code>mrb_callinfo</code> on the call stack. The virtual stack and state of the VM are updated accordingly.</li> <li>When any &ldquo;sendable&rdquo; (<code>OP_SEND</code>, <code>OP_SSEND</code>, <code>OP_SENDBV</code>, etc.) operation is encountered, we go back to step 1 for the callee.</li> <li>When any &ldquo;returnable&rdquo; (<code>OP_RETURN</code>, <code>OP_RETURN_BLK</code>) operation is encountered, its operand is put into the &ldquo;return register&rdquo; (for consumption by the caller), and the call stack is popped, effectively removing the <code>mrb_callinfo</code> created at step 1.</li> </ol> <p>Here is how it looks in memory:</p> <p><img src="https://lowlevelbits.org/img/compiling-ruby-2/bytecode-execution-high-level.png" alt="Bytecode Execution Highlevel View"></p> <p><code>mrb_state</code> (the state of the whole VM) has a stack of <code>mrb_context</code>s (more on them in a later article). Each <code>mrb_context</code> maintains a stack of <code>mrb_callinfo</code>s (the call stack). 
Each <code>mrb_context</code> owns a virtual stack, which is shared among several <code>mrb_callinfo</code>.</p> <p>This way, the caller prepares the stack for the callee.</p> <p>As a reminder, here is the bytecode from the example above:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-asm" data-lang="asm"><span style="display:flex;"><span>top: </span></span><span style="display:flex;"><span><span style="color:#a6e22e">TCLASS</span> <span style="color:#66d9ef">R1</span> </span></span><span style="display:flex;"><span><span style="color:#a6e22e">METHOD</span> <span style="color:#66d9ef">R2</span> <span style="color:#66d9ef">I</span>(<span style="color:#ae81ff">0</span>:<span style="color:#ae81ff">0x600001390050</span>) </span></span><span style="display:flex;"><span><span style="color:#a6e22e">DEF</span> <span style="color:#66d9ef">R1</span> :<span style="color:#66d9ef">sum</span> </span></span><span style="display:flex;"><span><span style="color:#a6e22e">LOADI</span> <span style="color:#66d9ef">R3</span> <span style="color:#ae81ff">10</span> </span></span><span style="display:flex;"><span><span style="color:#a6e22e">LOADI</span> <span style="color:#66d9ef">R4</span> <span style="color:#ae81ff">32</span> </span></span><span style="display:flex;"><span><span style="color:#a6e22e">SSEND</span> <span style="color:#66d9ef">R2</span> :<span style="color:#66d9ef">sum</span> <span style="color:#66d9ef">n</span><span style="color:#960050;background-color:#1e0010">=</span><span style="color:#ae81ff">2</span> (<span style="color:#ae81ff">0x02</span>) </span></span><span style="display:flex;"><span><span style="color:#a6e22e">SSEND</span> <span style="color:#66d9ef">R1</span> :<span style="color:#66d9ef">puts</span> <span style="color:#66d9ef">n</span><span style="color:#960050;background-color:#1e0010">=</span><span style="color:#ae81ff">1</span> (<span 
style="color:#ae81ff">0x01</span>) </span></span><span style="display:flex;"><span><span style="color:#a6e22e">RETURN</span> <span style="color:#66d9ef">R1</span> </span></span><span style="display:flex;"><span><span style="color:#a6e22e">STOP</span> </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span>sum: </span></span><span style="display:flex;"><span><span style="color:#a6e22e">ENTER</span> <span style="color:#ae81ff">2</span>:<span style="color:#ae81ff">0</span>:<span style="color:#ae81ff">0</span>:<span style="color:#ae81ff">0</span>:<span style="color:#ae81ff">0</span>:<span style="color:#ae81ff">0</span>:<span style="color:#ae81ff">0</span> (<span style="color:#ae81ff">0x80000</span>) </span></span><span style="display:flex;"><span><span style="color:#a6e22e">MOVE</span> <span style="color:#66d9ef">R4</span> <span style="color:#66d9ef">R1</span> <span style="color:#75715e">; R1:a </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#a6e22e">MOVE</span> <span style="color:#66d9ef">R5</span> <span style="color:#66d9ef">R2</span> <span style="color:#75715e">; R2:b </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#a6e22e">ADD</span> <span style="color:#66d9ef">R4</span> <span style="color:#66d9ef">R5</span> </span></span><span style="display:flex;"><span><span style="color:#a6e22e">RETURN</span> <span style="color:#66d9ef">R4</span> </span></span></code></pre></div><p>This is how the shared stack looks from the perspective of both the top-level method <code>top</code> and the method <code>sum</code>: by the time the first <code>SSEND</code> operation (&ldquo;send to self&rdquo;) is executed, all the values are ready for consumption by the callee.</p> <p><img src="https://lowlevelbits.org/img/compiling-ruby-2/bytecode-execution.png" style=" display: block; margin-left: 0; margin-right: auto; width: 80%; 
height: auto;" /></p> <hr> <p>Hopefully, now you better understand how RiteVM uses bytecode, and we are one step closer to the actual fun part - compilation!</p> <p>The following article covers MLIR and the way I modeled dialects - <a href="https://lowlevelbits.org/compiling-ruby-part-3/">MLIR and compilation</a></p> Compiling Ruby. Part 1: Compilers vs. Interpreters https://lowlevelbits.org/compiling-ruby-part-1/ Fri, 02 Dec 2022 [email protected] (Alex Denisov) https://lowlevelbits.org/compiling-ruby-part-1/ <p>With the (hopefully) convincing <a href="https://lowlevelbits.org/compiling-ruby-part-0">motivation</a> out of the way, we can get to the technical details.</p> <h3 id="compiling-interpreter-interpreting-compiler">Compiling Interpreter, Interpreting Compiler</h3> <p>As mentioned in the motivation, I want to build an ahead-of-time compiler for Ruby. I want it to be compatible with the existing Ruby implementation so that it fits naturally into the existing system.</p> <p>So the first question I had to answer is - how do I even do it?</p> <h4 id="compilers-vs-interpreters">Compilers vs. Interpreters</h4> <p>The execution models of compiled and interpreted languages differ slightly:</p> <ul> <li> <p>a compiler takes the source program and outputs another program that can be run on another machine even when the compiler is not present on that target machine</p> </li> <li> <p>an interpreter also takes the source program as an input, but it does not output another program; it runs the program right away</p> </li> </ul> <p>Unlike the compiler, the interpreter must be present on the machine where you want to run the program. 
To build the compiler, I have to somehow combine the interpreter with the program it runs.</p> <p>Let&rsquo;s look at a high-level schematic view of a typical compiler and interpreter.</p> <p><img src="https://lowlevelbits.org/img/compiling-ruby-1/compiler-vs-interpreter.png" alt="Execution model of an interpreted and compiled program"></p> <p>The compiler is a straightforward one-way process: the source code is parsed, then the machine code is generated, and the executable is produced. The executable also depends on a runtime. The runtime can be either embedded into the executable or be an external entity, but it is usually a bit of both.</p> <p>The interpreter is more complex in this regard. It contains everything in one place: parser, runtime, and a virtual machine. Also, note the two-way arrows <code>Parser &lt;-&gt; VM</code> and <code>Runtime &lt;-&gt; VM</code>. The reason is that Ruby is a dynamic language. During regular program execution, a program can read more code from the disk or network and execute it, hence the interconnection between these components.</p> <h4 id="parser--vm--runtime">Parser + VM + Runtime</h4> <p>Arguably, the triple <code>VM</code> + <code>Parser</code> + <code>Runtime</code> can be called &ldquo;a runtime,&rdquo; but I prefer to have some separation of concerns. Here is where I draw the boundaries:</p> <ul> <li>Parser: only does the parsing of the source code and converts it into a form suitable for execution via the Virtual Machine (&ldquo;bytecode&rdquo;)</li> <li>Virtual Machine: the primary &ldquo;computational device,&rdquo; it operates on the bytecode and actually &ldquo;runs&rdquo; the program</li> <li>Runtime: machinery required by the parser and VM (e.g., VM state manipulation, resource management, etc.)</li> </ul> <p>A na&iuml;ve approach to building the compiler is to tear the interpreter apart: replace VM and runtime with codegen and embed the runtime into the resulting executable. 
However, the runtime extraction won&rsquo;t work due to the dynamism mentioned above - the resulting executable should be able to parse and run any arbitrary Ruby code.</p> <p><em>Side note: an alternative approach is to build a JIT compiler and embed the whole compiler into the executable, but it adds more complexity than I am ready to deal with.</em></p> <p>In the end, the solution is simpler - the compiler and the final executable include the whole interpreter. So the final &ldquo;compiling interpreter&rdquo; (or &ldquo;interpreting compiler&rdquo;) looks like this:</p> <p><img src="https://lowlevelbits.org/img/compiling-ruby-1/compiler.png" alt="Compiling interpreter"></p> <h3 id="ruby-and-its-many-virtual-machines">Ruby and its many Virtual Machines</h3> <p>Now it&rsquo;s time to discuss the <code>Virtual Machine</code> component.</p> <p>The most widely used Ruby implementation is CRuby, also known as MRI (as in &ldquo;<a href="https://en.wikipedia.org/wiki/Yukihiro_Matsumoto">Matz</a>&rsquo; Ruby Interpreter&rdquo;). It is an interpreter built on top of a custom virtual machine (YARV).</p> <p>Another widely used implementation is <a href="https://mruby.org">mruby</a> (so-called &ldquo;embedded&rdquo; Ruby). It is also an interpreter and built on top of another custom VM (RiteVM).</p> <p>YARV and RiteVM are rather lightweight virtual machines. Unlike full-fledged system or process-level VMs (e.g., VirtualBox, JVM, CLR, etc.), they only provide a &ldquo;computational device&rdquo; - there is no resource control, sandboxing, etc.</p> <h4 id="stack-vs-registers">Stack vs. Registers</h4> <p>The &ldquo;computational device&rdquo; executes certain operations on certain data. The operations are encoded in the form of a &ldquo;bytecode.&rdquo; And the data is stored on a &ldquo;virtual stack&rdquo;. Though, the stack is accessed differently.</p> <p>YARV accesses the stack implicitly (this is also known as a &ldquo;stack-based VM&rdquo;). 
RiteVM accesses the stack explicitly via registers (you got it, &ldquo;register-based VM&rdquo;).</p> <p>To illustrate the bytecode and the difference between YARV and RiteVM, consider the following artificial examples.</p> <p>Stack-based bytecode:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-asm" data-lang="asm"><span style="display:flex;"><span><span style="color:#a6e22e">load</span> <span style="color:#ae81ff">10</span> </span></span><span style="display:flex;"><span><span style="color:#a6e22e">load</span> <span style="color:#ae81ff">32</span> </span></span><span style="display:flex;"><span><span style="color:#a6e22e">plus</span> </span></span><span style="display:flex;"><span><span style="color:#a6e22e">print</span> </span></span></code></pre></div><p>Register-based bytecode:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-asm" data-lang="asm"><span style="display:flex;"><span><span style="color:#a6e22e">load</span> <span style="color:#66d9ef">R1</span> <span style="color:#ae81ff">10</span> </span></span><span style="display:flex;"><span><span style="color:#a6e22e">load</span> <span style="color:#66d9ef">R2</span> <span style="color:#ae81ff">32</span> </span></span><span style="display:flex;"><span><span style="color:#a6e22e">plus</span> <span style="color:#66d9ef">R1</span> <span style="color:#66d9ef">R1</span> <span style="color:#66d9ef">R2</span> </span></span><span style="display:flex;"><span><span style="color:#a6e22e">print</span> <span style="color:#66d9ef">R1</span> </span></span></code></pre></div><p>The stack-based version uses the stack implicitly, while another version specifies the storage explicitly.</p> <p>Let&rsquo;s &ldquo;run&rdquo; both examples to see them in action.</p> <p><img 
src="https://lowlevelbits.org/img/compiling-ruby-1/vm-execution.png" alt="Comparison of stack and register-based VMs"></p> <p>At every step, the VM does something according to the currently running instruction/opcode (underscored lines) and updates the virtual stack.</p> <p>Stack-based VM only reads/writes data from/to the place where an arrow points to - this is the top of the virtual stack.</p> <p>Register-based VM does the same but has random access to the virtual stack.</p> <p>While the underlying machinery is very similar, there are good reasons for picking one or the other form of a VM. Yet, these reasons are out of the scope of this series. Please, consult elsewhere if you want to learn more. The topic of VMs is huge but fascinating.</p> <h4 id="dispatch-loop">Dispatch loop</h4> <p>Let&rsquo;s consider how the VM works and deals with the bytecode. YARV and RiteVM use the so-called &ldquo;dispatch loop,&rdquo; which is effectively a for-loop + a huge <code>switch</code>-statement. 
Typical pseudocode looks like this:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-c" data-lang="c"><span style="display:flex;"><span><span style="color:#75715e">// Iterate through each opcode in the bytecode stream </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">for</span> (opcode in bytecode) { </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">switch</span> (opcode) { </span></span><span style="display:flex;"><span> <span style="color:#75715e">// Take a corresponding action for each separate opcode </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> <span style="color:#66d9ef">case</span> OP_CODE_1: <span style="color:#75715e">/* do something */</span>; </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">case</span> OP_CODE_2: <span style="color:#75715e">/* do something */</span>; </span></span><span style="display:flex;"><span> <span style="color:#75715e">// ... more opcodes </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> <span style="color:#66d9ef">case</span> OP_CODE_N: <span style="color:#75715e">/* do something */</span>; </span></span><span style="display:flex;"><span> } </span></span><span style="display:flex;"><span>} </span></span></code></pre></div><p>And then, the bodies for the actual opcodes may look as follows. 
Stack-based VM:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-c" data-lang="c"><span style="display:flex;"><span><span style="color:#75715e">/* </span></span></span><span style="display:flex;"><span><span style="color:#75715e">Example program: </span></span></span><span style="display:flex;"><span><span style="color:#75715e"> load 10 </span></span></span><span style="display:flex;"><span><span style="color:#75715e"> load 32 </span></span></span><span style="display:flex;"><span><span style="color:#75715e"> plus </span></span></span><span style="display:flex;"><span><span style="color:#75715e"> print </span></span></span><span style="display:flex;"><span><span style="color:#75715e">*/</span> </span></span><span style="display:flex;"><span><span style="color:#66d9ef">case</span> OP_LOAD: </span></span><span style="display:flex;"><span> val <span style="color:#f92672">=</span> pool[<span style="color:#ae81ff">0</span>] <span style="color:#75715e">// pool is some abstract additional storage </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> stack.<span style="color:#a6e22e">push</span>(val) </span></span><span style="display:flex;"><span><span style="color:#66d9ef">case</span> OP_PLUS: </span></span><span style="display:flex;"><span> lhs <span style="color:#f92672">=</span> stack.<span style="color:#a6e22e">pop</span>() </span></span><span style="display:flex;"><span> rhs <span style="color:#f92672">=</span> stack.<span style="color:#a6e22e">pop</span>() </span></span><span style="display:flex;"><span> res <span style="color:#f92672">=</span> lhs <span style="color:#f92672">+</span> rhs </span></span><span style="display:flex;"><span> stack.<span style="color:#a6e22e">push</span>(res) </span></span><span style="display:flex;"><span><span style="color:#66d9ef">case</span> OP_PRINT: </span></span><span 
style="display:flex;"><span> val <span style="color:#f92672">=</span> stack.<span style="color:#a6e22e">pop</span>() </span></span><span style="display:flex;"><span> <span style="color:#a6e22e">print</span>(val) </span></span></code></pre></div><p>And the register-based version for completeness:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-c" data-lang="c"><span style="display:flex;"><span><span style="color:#75715e">/* </span></span></span><span style="display:flex;"><span><span style="color:#75715e">Example program: </span></span></span><span style="display:flex;"><span><span style="color:#75715e"> load R1 10 </span></span></span><span style="display:flex;"><span><span style="color:#75715e"> load R2 32 </span></span></span><span style="display:flex;"><span><span style="color:#75715e"> plus R1 R1 R2 </span></span></span><span style="display:flex;"><span><span style="color:#75715e"> print R1 </span></span></span><span style="display:flex;"><span><span style="color:#75715e">*/</span> </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span><span style="color:#75715e">// md is some additional opcode metadata </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">case</span> OP_LOAD: </span></span><span style="display:flex;"><span> registers[md.reg1] <span style="color:#f92672">=</span> pool[<span style="color:#ae81ff">0</span>] </span></span><span style="display:flex;"><span><span style="color:#66d9ef">case</span> OP_PLUS: </span></span><span style="display:flex;"><span> lhs <span style="color:#f92672">=</span> registers[md.reg1] </span></span><span style="display:flex;"><span> rhs <span style="color:#f92672">=</span> registers[md.reg2] </span></span><span style="display:flex;"><span> res <span style="color:#f92672">=</span> lhs <span 
style="color:#f92672">+</span> rhs </span></span><span style="display:flex;"><span> registers[md.reg1] <span style="color:#f92672">=</span> res </span></span><span style="display:flex;"><span><span style="color:#66d9ef">case</span> OP_PRINT: </span></span><span style="display:flex;"><span> val <span style="color:#f92672">=</span> registers[md.reg1] </span></span><span style="display:flex;"><span> <span style="color:#a6e22e">print</span>(val) </span></span></code></pre></div><p>In this case, if we know the values behind <code>pool[0]</code> and the actual values of <code>md.regN</code>, then we compile the example program to something like this:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-c" data-lang="c"><span style="display:flex;"><span><span style="color:#75715e">/* </span></span></span><span style="display:flex;"><span><span style="color:#75715e"> load R1 10 </span></span></span><span style="display:flex;"><span><span style="color:#75715e"> load R2 32 </span></span></span><span style="display:flex;"><span><span style="color:#75715e"> plus R1 R1 R2 </span></span></span><span style="display:flex;"><span><span style="color:#75715e"> print R1 </span></span></span><span style="display:flex;"><span><span style="color:#75715e">*/</span> </span></span><span style="display:flex;"><span>R1 <span style="color:#f92672">=</span> <span style="color:#ae81ff">10</span> </span></span><span style="display:flex;"><span>R2 <span style="color:#f92672">=</span> <span style="color:#ae81ff">32</span> </span></span><span style="display:flex;"><span>R1 <span style="color:#f92672">=</span> R1 <span style="color:#f92672">+</span> R2 </span></span><span style="display:flex;"><span><span style="color:#a6e22e">print</span>(R1) </span></span></code></pre></div><p>and avoid the whole dispatch loop, but I digress :)</p> <hr> <p>In the following article, we look into mruby&rsquo;s 
implementation and virtual machine in more detail - <a href="https://lowlevelbits.org/compiling-ruby-part-2/">Compiling Ruby. Part 2: RiteVM</a>.</p> Compiling Ruby. Part 0: Motivation https://lowlevelbits.org/compiling-ruby-part-0/ Fri, 02 Dec 2022 [email protected] (Alex Denisov) https://lowlevelbits.org/compiling-ruby-part-0/ <p>For the last couple of years, I&rsquo;ve been working on a fun side project called <a href="https://dragonruby.org/toolkit/game">DragonRuby Game Toolkit</a>, or GTK for short.</p> <p>GTK is a professional-grade 2D game engine. Among the many incredible features:</p> <ul> <li>you can build games in Ruby</li> <li>it targets many (like, many!) platforms (Windows, Linux, macOS, iOS, Android, WASM, Nintendo Switch, Xbox, PlayStation, Oculus VR, Steam Deck)</li> <li>super lightweight (~3.5 megabytes)</li> <li><a href="https://dragonruby.org/toolkit/game">and many more really</a></li> </ul> <p>GTK is built on top of a slightly customized mruby runtime and allows you to write games purely in Ruby. It comes with all the batteries included, but if you need more in a specific case, you can always fall back to C via the C extensions mechanism.</p> <p>From a user perspective, the end product (the game) looks like this:</p> <p><img src="https://lowlevelbits.org/img/compiling-ruby-0/end-product.png" alt="End Product"></p> <p>While the engine itself is pretty fast, what annoys me personally (from the aesthetic point of view) is that we cannot fully optimize the C extensions as they are compiled separately from the rest of the engine.</p> <p>Looking at the picture, we have four components of the game:</p> <ul> <li>the engine&rsquo;s runtime (Ruby)</li> <li>the engine&rsquo;s runtime (C)</li> <li>the game code (Ruby)</li> <li>the game code (C)</li> </ul> <p>Suppose we want to optimize all the C code together. 
In that case, we&rsquo;d have to ship the runtime in some &lsquo;common denominator&rsquo; form (e.g., <a href="https://lowlevelbits.org/bitcode-demystified/">LLVM Bitcode</a>), then compile the C extension into the same form, optimize it all together, and then link it into an executable.</p> <p>This is doable, but while I was thinking about this problem, I found an even bigger (and much more interesting) &lsquo;problem&rsquo; - what about all that Ruby code? Can we also compile it to some common form and then optimize it with the rest of the C code out there?</p> <p><img src="https://lowlevelbits.org/img/compiling-ruby-0/optimizations.png" alt="Optimizations"></p> <p>The answer is - definitely yes! We just need to build a compiler that would do that job.</p> <p>At the time of writing, the compiler is far from being done, but it works reasonably well, and I can successfully compile and run more than half of the mruby test suite.</p> <p>As a sneak peek, here is the output from the test suite:</p> <pre tabindex="0"><code>/opt/DragonRuby/FireStorm/cmake-build-llvm-14-asan/tests/MrbTests/firestorm_mrbtest mrbtest - Embeddable Ruby Test 
............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................?......................................................................................................................... Skip: File.expand_path (with ENV) Total: 934 OK: 933 KO: 0 Crash: 0 Warning: 0 Skip: 1 Time: 0.45 seconds Process finished with exit code 0 </code></pre><hr> <p>I hope this motivation gives you enough information on why someone would do what I am doing!</p> <p>Let&rsquo;s take a look at the approach I am taking to solve this problem - <a href="https://lowlevelbits.org/compiling-ruby-part-1/">Compilers vs. Interpreters</a></p> How to learn compilers: LLVM Edition https://lowlevelbits.org/how-to-learn-compilers-llvm-edition/ Thu, 04 Nov 2021 [email protected] (Alex Denisov) https://lowlevelbits.org/how-to-learn-compilers-llvm-edition/ <div id="wrap" class="text-center"> <div style="display: inline-block;" class="content-upgrade"> <div style="margin: 6px;"> This is a mirror of the Substack article <br/> <a href="https://lowlevelbits.com/p/how-to-learn-compilers-llvm-edition"> How to learn Compilers (LLVM Edition) </a><br/> The most recent version is there. </div> </div> </div> <p>Compilers and Programming Languages is a huge topic. 
You cannot just take a learning path and finish it at some point. There are many different areas, each of which is endless.</p> <p>Here, I want to share some links that would help to learn compilers. The list could not be exhaustive - everyone is busy, and no one has time to read the <a href="https://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools">Dragon Book</a>.</p> <p>The main criteria behind each link:</p> <ul> <li>I can personally recommend the material as I went through it</li> <li>each entry should be relatively short and can be consumed in a reasonable time</li> </ul> <p>I&rsquo;m a big fan of learning through practicing. Thus the main focus is on LLVM, as you can go and do something cool with real-world programs!</p> <p>The list consists of four groups: general theory, front-end, middle-end, and back-end.</p> <p>At the first run, you can take the first item from each group, and it should put you on solid ground.</p> <h3 id="disclaimer">Disclaimer</h3> <p>There are a lot of excellent resources out there! Some of them are not on the list because of my subjective judgment, and the others are not here because I&rsquo;ve never seen them!</p> <p><strong>Please, share your favorite resource either via <a href="mailto:[email protected]">email</a> or on <a href="https://twitter.com/1101_debian/status/1456346324794806274">Twitter</a>.</strong></p> <h3 id="general-theory--introduction">General Theory / Introduction</h3> <ul> <li> <p><a href="http://aosabook.org/en/llvm.html">AOSA book: LLVM</a>. This is a chapter from the <a href="http://aosabook.org/en/index.html">Architecture of Open Source Applications</a> book. It is written by Chris Lattner and covers high-level LLVM design.</p> </li> <li> <p><a href="https://online.stanford.edu/courses/soe-ycscs1-compilers">Compilers</a>. The course is taught by Alex Aiken. In this course, you build a compiler for a real programming language from scratch. 
 It covers the whole compilation pipeline: parsing, type-checking, optimizations, code generation. Besides practical parts, it also dives into the theory.</p> </li> <li> <p><a href="https://online.stanford.edu/courses/soe-ycsautomata-automata-theory">Automata Theory</a>. The course is taught by Jeffrey Ullman. This one is pretty heavy on theory. It starts with relatively simple topics like state machines and finite automata (deterministic and otherwise). It gradually moves on to more complex things like Turing machines, computational complexity, the famous P vs. NP problem, etc.</p> </li> </ul> <p>or</p> <ul> <li><a href="https://ocw.mit.edu/courses/mathematics/18-404j-theory-of-computation-fall-2020/">Theory of Computation</a>. This course is taught by Michael Sipser. It is similar to the one above but delivered in a different style. It goes into more detail on specific topics.</li> </ul> <h3 id="front-end">Front-end</h3> <p>The compiler front-end is where the interaction with the actual source code happens. The compiler parses the source code into an Abstract Syntax Tree (AST), does semantic analysis and type-checking, and converts it into the intermediate representation (IR).</p> <p>The Compilers course mentioned above covers the general parts. Here are some links specific to Clang:</p> <ul> <li> <p><a href="https://jonasdevlieghere.com/understanding-the-clang-ast/">Understanding the Clang AST</a>. This article is written by Jonas Devlieghere. It goes into detail and touches on the implementation details of Clang&rsquo;s AST. It also has a lot of excellent links to dive deeper into the subject.</p> </li> <li> <p><a href="https://github.com/banach-space/clang-tutor/">clang-tutor</a>. This repository is maintained by Andrzej Warzyński. 
 It contains several Clang plugins covering various topics, from simple AST traversals to more involved subjects such as automatic refactoring and obfuscation.</p> </li> </ul> <h3 id="middle-end">Middle-end</h3> <p>The middle-end is a place where various optimizations happen. Typically, the middle-ends use some intermediate representation. The intermediate representation of LLVM is usually referred to as LLVM IR or LLVM Bitcode. In a nutshell, it is a human-readable assembly language for a pseudo-machine (i.e., the IR does not target any specific CPU). The LLVM IR maintains certain properties: it is in a Static Single Assignment (SSA) form organized as a Control-Flow Graph (CFG).</p> <ul> <li> <p><a href="https://www.youtube.com/watch?v=m8G_S5LwlTo">LLVM IR Tutorial - Phis, GEPs and other things, oh my!</a>. This is a great talk by Vince Bridgers and Felipe de Azevedo Piovezan.</p> </li> <li> <p><a href="https://www.youtube.com/watch?v=J5xExRGaIIY">Introduction to LLVM</a>. A one-hour-long talk/tutorial from the LLVM Developers&rsquo; Meeting given by Eric Christopher and Johannes Doerfert. Another great tutorial that builds on top of the previous video.</p> </li> <li> <p><a href="https://www.cs.cornell.edu/courses/cs6120/2020fa/self-guided/">CS 6120: Advanced Compilers</a>. The course is taught by Adrian Sampson. The title says &ldquo;advanced,&rdquo; but it covers what one would expect in a modern production-grade compiler: SSA, CFG, optimizations, various analyses.</p> </li> <li> <p><a href="https://lowlevelbits.org/bitcode-demystified/">Bitcode Demystified</a>(🔌). This one is from me. It gives a high-level description of what LLVM Bitcode is.</p> </li> <li> <p><a href="https://github.com/banach-space/llvm-tutor">llvm-tutor</a>. This one is also from Andrzej Warzyński. 
 It covers LLVM plugins (so-called passes) that allow one to analyze and transform programs in the LLVM IR form.</p> </li> </ul> <h3 id="back-end">Back-end</h3> <p>The last phase of the compilation is the back-end. This phase aims to convert the intermediate representation into machine code (zeros and ones). The zeros and ones can later be run on the CPU. Therefore, to understand the back-end, you need to understand machine code and how CPUs work.</p> <ul> <li> <p><a href="https://www.coursera.org/learn/build-a-computer">Build a Modern Computer from First Principles: From Nand to Tetris</a>. Taught by Shimon Schocken and Noam Nisan. This course starts backward: first, you build the logic gates (and, or, xor, etc.), then use the logic gates to construct an Arithmetic Logic Unit (ALU), and then use the ALU to build the CPU. Then you learn how to control the CPU with zeros and ones (machine code), and eventually, you develop your own assembler to convert human-readable assembly into machine code.</p> </li> <li> <p><a href="https://lowlevelbits.org/parsing-mach-o-files/">Parsing Mach-O files</a>(🔌). This is a short article written by me. It shows how to parse object files on macOS (Mach-O). If you are on Linux or Windows, search for similar articles on <code>ELF</code> and <code>PE/COFF</code> files, respectively.</p> </li> <li> <p><a href="https://book.easyperf.net/perf_book">Performance Analysis and Tuning on Modern CPUs</a>. A book by Denis Bakhvalov. While it is about performance, it gives an excellent introduction to how CPUs work.</p> </li> </ul> <h3 id="bonus-points">Bonus points</h3> <p>Here are some more LLVM-related channels I recommend looking at:</p> <ul> <li> <p><a href="https://www.youtube.com/channel/UCv2_41bSAa5Y_8BacJUZfjQ">LLVM&rsquo;s YouTube channel</a>. Here you can find a lot of talks from developer meetings.</p> </li> <li> <p><a href="https://llvmweekly.org">LLVM Weekly</a>. A weekly newsletter run by Alex Bradbury. 
This is the single newsletter I am aware of that doesn&rsquo;t have ads!</p> </li> <li> <p><a href="https://blog.llvm.org">LLVM Blog</a>. This is, well, LLVM&rsquo;s blog.</p> </li> <li> <p><a href="https://llvm.org/docs/tutorial/">LLVM Tutorials</a>. Good starting points, even if you know nothing about compilers.</p> </li> <li> <p><a href="https://blog.regehr.org/archives/category/compilers">Embedded in academia</a>. John Regehr&rsquo;s blog has lots of goodies when it comes to LLVM and compilers!</p> </li> </ul> <h3 id="strings-attached">Strings attached</h3> <p>As I mentioned in the beginning, Compilers is a huge field! If you go through the material above, you will learn a lot, but you will still have a few knowledge gaps in the whole compilation pipeline (I certainly do). But the good thing is - you&rsquo;d know what the gaps are and how to address them!</p> <p>Good luck!</p> LLVM meets Code Property Graphs https://lowlevelbits.org/llvm-meets-code-property-graphs/ Tue, 23 Feb 2021 [email protected] (Alex Denisov) https://lowlevelbits.org/llvm-meets-code-property-graphs/ <p><em>This is a cross-post from LLVM&rsquo;s blog post <a href="https://blog.llvm.org/posts/2021-02-23-llvm-meets-code-property-graphs/">LLVM meets Code Property Graphs</a></em></p> <p>The code property graph (CPG) is a data structure designed to mine large codebases for instances of programming patterns via a domain-specific query language. It was first introduced in the proceedings of the IEEE Security and Privacy conference in 2014 (<a href="https://ieeexplore.ieee.org/abstract/document/6956589">publication</a>, <a href="https://www.sec.cs.tu-bs.de/pubs/2014-ieeesp.pdf">PDF</a>) in the context of vulnerability discovery in C system code and the Linux kernel in particular. 
 The core ideas of the approach are the following:</p> <ul> <li>the CPG combines several program representations into one</li> <li>the CPG is stored in a graph database</li> <li>the graph database comes with a DSL that allows traversing and querying the CPG</li> </ul> <p>Currently, the CPG infrastructure is supported by several tools:</p> <ul> <li><a href="https://ocular.shiftleft.io">Ocular</a> - a proprietary code analysis tool supporting Java, Scala, C#, Go, Python, and JavaScript</li> <li><a href="https://joern.io">Joern</a> - an open-source counterpart of Ocular supporting C and C++</li> <li><a href="https://plume-oss.github.io/plume-docs/">Plume</a> - an open-source tool supporting Java Bytecode</li> </ul> <p>This article presents <a href="https://www.shiftleft.io">ShiftLeft</a>&rsquo;s open-source implementation of <a href="https://github.com/ShiftLeftSecurity/llvm2cpg">llvm2cpg</a> - a standalone tool that brings LLVM Bitcode support to Joern. But before we dive into details, let us say a few more words about CPG and Joern.</p> <h2 id="code-property-graph">Code Property Graph</h2> <p>The core idea of the CPG is that different classic program representations are merged into a property graph, a single data structure that holds information about the program&rsquo;s syntax, control flow, and intra-procedural data flow.</p> <p>Graphically speaking, the following piece of code:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-c" data-lang="c"><span style="display:flex;"><span><span style="color:#66d9ef">void</span> <span style="color:#a6e22e">foo</span>() { </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">int</span> x <span style="color:#f92672">=</span> <span style="color:#a6e22e">source</span>(); </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">if</span> (x <span style="color:#f92672">&lt;</span> MAX) { 
</span></span><span style="display:flex;"><span> <span style="color:#66d9ef">int</span> y <span style="color:#f92672">=</span> <span style="color:#ae81ff">2</span> <span style="color:#f92672">*</span> x; </span></span><span style="display:flex;"><span> <span style="color:#a6e22e">sink</span>(y); </span></span><span style="display:flex;"><span> } </span></span><span style="display:flex;"><span>} </span></span></code></pre></div><p>combines these three different representations:</p> <p><img src="https://lowlevelbits.org/img/cpg/different-representations.png" alt="Different program representations"></p> <p>into a single representation - Code Property Graph:</p> <p><img src="https://lowlevelbits.org/img/cpg/cpg.png" alt="Code Property Graph"></p> <h2 id="joern">Joern</h2> <p>The property graph is stored in a graph database and made accessible via a domain-specific language (DSL) to identify programming patterns based on a DSL for graph traversals. The query language allows a seamless transition between the original code representations, making it possible to combine aspects of the code from different views these representations offer.</p> <p>One of the primary interfaces to the code property graphs is a tool called <a href="https://joern.io">Joern</a>. It provides the mentioned DSL and allows to query the CPG to discover specific properties of a program. 
Here are some examples of the Joern&rsquo;s DSL:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-scala" data-lang="scala"><span style="display:flex;"><span>joern<span style="color:#f92672">&gt;</span> cpg<span style="color:#f92672">.</span>typeDecl<span style="color:#f92672">.</span>name<span style="color:#f92672">.</span>p </span></span><span style="display:flex;"><span><span style="color:#a6e22e">List</span><span style="color:#f92672">[</span><span style="color:#66d9ef">String</span><span style="color:#f92672">]</span> <span style="color:#66d9ef">=</span> <span style="color:#a6e22e">List</span><span style="color:#f92672">(</span><span style="color:#e6db74">&#34;ANY&#34;</span><span style="color:#f92672">,</span> <span style="color:#e6db74">&#34;int&#34;</span><span style="color:#f92672">,</span> <span style="color:#e6db74">&#34;void&#34;</span><span style="color:#f92672">)</span> </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span>joern<span style="color:#f92672">&gt;</span> cpg<span style="color:#f92672">.</span>method<span style="color:#f92672">.</span>name<span style="color:#f92672">.</span>p </span></span><span style="display:flex;"><span><span style="color:#a6e22e">List</span><span style="color:#f92672">[</span><span style="color:#66d9ef">String</span><span style="color:#f92672">]</span> <span style="color:#66d9ef">=</span> <span style="color:#a6e22e">List</span><span style="color:#f92672">(</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">&#34;foo&#34;</span><span style="color:#f92672">,</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">&#34;&lt;operator&gt;.multiplication&#34;</span><span style="color:#f92672">,</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">&#34;source&#34;</span><span 
style="color:#f92672">,</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">&#34;&lt;operator&gt;.lessThan&#34;</span><span style="color:#f92672">,</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">&#34;&lt;operator&gt;.assignment&#34;</span><span style="color:#f92672">,</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">&#34;sink&#34;</span> </span></span><span style="display:flex;"><span><span style="color:#f92672">)</span> </span></span><span style="display:flex;"><span>joern<span style="color:#f92672">&gt;</span> cpg<span style="color:#f92672">.</span>method<span style="color:#f92672">(</span><span style="color:#e6db74">&#34;foo&#34;</span><span style="color:#f92672">).</span>ast<span style="color:#f92672">.</span>isControlStructure<span style="color:#f92672">.</span>code<span style="color:#f92672">.</span>p </span></span><span style="display:flex;"><span><span style="color:#a6e22e">List</span><span style="color:#f92672">[</span><span style="color:#66d9ef">String</span><span style="color:#f92672">]</span> <span style="color:#66d9ef">=</span> <span style="color:#a6e22e">List</span><span style="color:#f92672">(</span><span style="color:#e6db74">&#34;if (x &lt; MAX)&#34;</span><span style="color:#f92672">)</span> </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span>joern<span style="color:#f92672">&gt;</span> cpg<span style="color:#f92672">.</span>method<span style="color:#f92672">(</span><span style="color:#e6db74">&#34;foo&#34;</span><span style="color:#f92672">).</span>ast<span style="color:#f92672">.</span>isCall<span style="color:#f92672">.</span>map<span style="color:#f92672">(</span>c <span style="color:#66d9ef">=&gt;</span> c<span style="color:#f92672">.</span>file<span style="color:#f92672">.</span>name<span style="color:#f92672">.</span>head <span style="color:#f92672">+</span> <span 
style="color:#e6db74">&#34;:&#34;</span> <span style="color:#f92672">+</span> c<span style="color:#f92672">.</span>lineNumber<span style="color:#f92672">.</span>get <span style="color:#f92672">+</span> <span style="color:#e6db74">&#34; &#34;</span> <span style="color:#f92672">+</span> c<span style="color:#f92672">.</span>name <span style="color:#f92672">+</span> <span style="color:#e6db74">&#34;: &#34;</span> <span style="color:#f92672">+</span> c<span style="color:#f92672">.</span>code<span style="color:#f92672">).</span>p </span></span><span style="display:flex;"><span><span style="color:#a6e22e">List</span><span style="color:#f92672">[</span><span style="color:#66d9ef">String</span><span style="color:#f92672">]</span> <span style="color:#66d9ef">=</span> <span style="color:#a6e22e">List</span><span style="color:#f92672">(</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">&#34;main.c:2 &lt;operator&gt;.assignment: x = source()&#34;</span><span style="color:#f92672">,</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">&#34;main.c:2 source: source()&#34;</span><span style="color:#f92672">,</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">&#34;main.c:3 &lt;operator&gt;.lessThan: x &lt; MAX&#34;</span><span style="color:#f92672">,</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">&#34;main.c:4 &lt;operator&gt;.assignment: y = 2 * x&#34;</span><span style="color:#f92672">,</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">&#34;main.c:4 &lt;operator&gt;.multiplication: 2 * x&#34;</span><span style="color:#f92672">,</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">&#34;main.c:5 sink: sink(y)&#34;</span> </span></span><span style="display:flex;"><span><span style="color:#f92672">)</span> </span></span></code></pre></div><p>Besides the DSL, Joern comes with a data-flow 
tracker enabling more sophisticated queries, such as &ldquo;is there a user-controlled malloc in the program?&rdquo;</p> <p>The DSL is much more powerful than in the example, but that is beyond the scope of this article. Please refer to the <a href="https://docs.joern.io/home">documentation</a> to learn more.</p> <h2 id="llvm-and-cpg">LLVM and CPG</h2> <p>This part is split into two smaller parts: the first one covers a few implementation details, the second one shows an example of how to use <code>llvm2cpg</code>. If you are not interested in the implementation - scroll down :)</p> <h3 id="implementation-details">Implementation Details</h3> <p>When we decided to add LLVM support for CPG, one of the first questions was: how do we map the bitcode representation onto the CPG?</p> <p>We took a simple approach - let&rsquo;s pretend the SSA representation is just a flat source program. In other words, the following bitcode</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-llvm" data-lang="llvm"><span style="display:flex;"><span><span style="color:#66d9ef">define</span> <span style="color:#66d9ef">i32</span> @sum(<span style="color:#66d9ef">i32</span> %a, <span style="color:#66d9ef">i32</span> %b) { </span></span><span style="display:flex;"><span> %r = <span style="color:#66d9ef">add</span> <span style="color:#66d9ef">nsw</span> <span style="color:#66d9ef">i32</span> %a, %b </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">ret</span> <span style="color:#66d9ef">i32</span> %r </span></span><span style="display:flex;"><span>} </span></span></code></pre></div><p>can be seen as a C program:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-c" data-lang="c"><span style="display:flex;"><span>i32 <span style="color:#a6e22e">sum</span>(i32 a, i32 b) { 
</span></span><span style="display:flex;"><span> i32 r <span style="color:#f92672">=</span> <span style="color:#a6e22e">add</span>(a, b); </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">return</span> r; </span></span><span style="display:flex;"><span>} </span></span></code></pre></div><p>From a high-level perspective, the approach is simple, but there are a few details we had to work out.</p> <h4 id="instruction-semantics">Instruction semantics</h4> <p>We can map some of the LLVM instructions back onto the internal CPG operations. Here are some examples:</p> <ul> <li><code>add</code>, <code>fadd</code> -&gt; <code>&lt;operator&gt;.addition</code></li> <li><code>bitcast</code> -&gt; <code>&lt;operator&gt;.cast</code></li> <li><code>fcmp eq</code>, <code>icmp eq</code> -&gt; <code>&lt;operator&gt;.equals</code></li> <li><code>urem</code>, <code>srem</code>, <code>frem</code> -&gt; <code>&lt;operator&gt;.modulo</code></li> <li><code>getelementptr</code> -&gt; a combination of <code>&lt;operator&gt;.pointerShift</code>, <code>&lt;operator&gt;.indexAccess</code>, and <code>&lt;operator&gt;.memberAccess</code> depending on the underlying types of the GEP operand</li> </ul> <p>Most of these <code>&lt;operator&gt;.*</code>s have special semantics, which play a crucial role in the Joern and Ocular built-in data-flow trackers.</p> <p>Unfortunately, not every LLVM instruction has a corresponding operator in the CPG. In those cases, we had to fall back to function calls. For example:</p> <ul> <li><code>select i1 %cond, i32 %v1, i32 %v2</code> turns into <code>select(cond, v1, v2)</code></li> <li><code>atomicrmw add i32* %ptr, i32 1</code> turns into <code>atomicrmwAdd(ptr, 1)</code> (same for any other <code>atomicrmw</code> operator)</li> <li><code>fneg float %val</code> turns into <code>fneg(val)</code></li> </ul> <p>The only instruction we could not map to the CPG is the <code>phi</code>: CPG doesn&rsquo;t have a Phi node concept. 
We had to eliminate <code>phi</code> instructions using <code>reg2mem</code> machinery.</p> <h4 id="redundancy">Redundancy</h4> <p>For a small C program</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-c" data-lang="c"><span style="display:flex;"><span><span style="color:#66d9ef">int</span> <span style="color:#a6e22e">sum</span>(<span style="color:#66d9ef">int</span> a, <span style="color:#66d9ef">int</span> b) { </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">return</span> a <span style="color:#f92672">+</span> b; </span></span><span style="display:flex;"><span>} </span></span></code></pre></div><p>Clang emits a lot of redundant instructions by default</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-llvm" data-lang="llvm"><span style="display:flex;"><span><span style="color:#66d9ef">define</span> <span style="color:#66d9ef">i32</span> @sum(<span style="color:#66d9ef">i32</span> %0, <span style="color:#66d9ef">i32</span> %1) { </span></span><span style="display:flex;"><span> %3 = <span style="color:#66d9ef">alloca</span> <span style="color:#66d9ef">i32</span>, <span style="color:#66d9ef">align</span> <span style="color:#ae81ff">4</span> </span></span><span style="display:flex;"><span> %4 = <span style="color:#66d9ef">alloca</span> <span style="color:#66d9ef">i32</span>, <span style="color:#66d9ef">align</span> <span style="color:#ae81ff">4</span> </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">store</span> <span style="color:#66d9ef">i32</span> %0, <span style="color:#66d9ef">i32</span>* %3, <span style="color:#66d9ef">align</span> <span style="color:#ae81ff">4</span> </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">store</span> <span style="color:#66d9ef">i32</span> 
%1, <span style="color:#66d9ef">i32</span>* %4, <span style="color:#66d9ef">align</span> <span style="color:#ae81ff">4</span> </span></span><span style="display:flex;"><span> %5 = <span style="color:#66d9ef">load</span> <span style="color:#66d9ef">i32</span>, <span style="color:#66d9ef">i32</span>* %3, <span style="color:#66d9ef">align</span> <span style="color:#ae81ff">4</span> </span></span><span style="display:flex;"><span> %6 = <span style="color:#66d9ef">load</span> <span style="color:#66d9ef">i32</span>, <span style="color:#66d9ef">i32</span>* %4, <span style="color:#66d9ef">align</span> <span style="color:#ae81ff">4</span> </span></span><span style="display:flex;"><span> %7 = <span style="color:#66d9ef">add</span> <span style="color:#66d9ef">nsw</span> <span style="color:#66d9ef">i32</span> %5, %6 </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">ret</span> <span style="color:#66d9ef">i32</span> %7 </span></span><span style="display:flex;"><span>} </span></span></code></pre></div><p>instead of a more concise version</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-llvm" data-lang="llvm"><span style="display:flex;"><span><span style="color:#66d9ef">define</span> <span style="color:#66d9ef">i32</span> @sum(<span style="color:#66d9ef">i32</span> %0, <span style="color:#66d9ef">i32</span> %1) { </span></span><span style="display:flex;"><span> %3 = <span style="color:#66d9ef">add</span> <span style="color:#66d9ef">nsw</span> <span style="color:#66d9ef">i32</span> %1, %0 </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">ret</span> <span style="color:#66d9ef">i32</span> %3 </span></span><span style="display:flex;"><span>} </span></span></code></pre></div><p>In general, this is not a problem, but it adds more complexity for the data-flow tracker and needlessly increases the graph&rsquo;s size. 
One of the considerations was to run optimizations before emitting CPG for the bitcode. Still, in the end, we decided to offload this work to an end-user: if you want fewer instructions, then apply the optimizations manually before emitting the CPG.</p> <h4 id="type-equality">Type Equality</h4> <p>The other issue is related to the way LLVM handles types. If two modules in the same context use the same struct with the same name, LLVM renames the other struct to prevent name collisions. For example</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-llvm" data-lang="llvm"><span style="display:flex;"><span><span style="color:#75715e">; Module1 </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>%struct.Point = <span style="color:#66d9ef">type</span> { <span style="color:#66d9ef">i32</span>, <span style="color:#66d9ef">i32</span> } </span></span></code></pre></div><p>and</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-llvm" data-lang="llvm"><span style="display:flex;"><span><span style="color:#75715e">; Module 2 </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>%struct.Point = <span style="color:#66d9ef">type</span> { <span style="color:#66d9ef">i32</span>, <span style="color:#66d9ef">i32</span> } </span></span></code></pre></div><p>when loaded into the same context yield two types</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-llvm" data-lang="llvm"><span style="display:flex;"><span>%struct.Point = <span style="color:#66d9ef">type</span> { <span style="color:#66d9ef">i32</span>, <span style="color:#66d9ef">i32</span> } </span></span><span 
style="display:flex;"><span>%struct.Point.1 = <span style="color:#66d9ef">type</span> { <span style="color:#66d9ef">i32</span>, <span style="color:#66d9ef">i32</span> } </span></span></code></pre></div><p>We wanted to deduplicate these types for a better user experience and only emit <code>Point</code> in the final graph.</p> <p>The obvious solution was to consider two structs with &ldquo;similar&rdquo; names and the same layout to be the same. However, we could not rely on <code>llvm::StructType::isLayoutIdentical</code> because, despite the name, it produces misleading results.</p> <p>According to <code>llvm::StructType::isLayoutIdentical</code>, the structs <code>Point</code> and <code>Pair</code> have identical layouts, but <code>PointWrap</code> and <code>PairWrap</code> do not.</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-llvm" data-lang="llvm"><span style="display:flex;"><span><span style="color:#75715e">; these two have identical layout </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>%Point = <span style="color:#66d9ef">type</span> { <span style="color:#66d9ef">i32</span>, <span style="color:#66d9ef">i32</span> } </span></span><span style="display:flex;"><span>%Pair = <span style="color:#66d9ef">type</span> { <span style="color:#66d9ef">i32</span>, <span style="color:#66d9ef">i32</span> } </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span><span style="color:#75715e">; these two DO NOT have identical layout </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>%PointWrap = <span style="color:#66d9ef">type</span> { %Point } </span></span><span style="display:flex;"><span>%PairWrap = <span style="color:#66d9ef">type</span> { %Pair } </span></span></code></pre></div><p>This happens because 
<code>llvm::StructType::isLayoutIdentical</code> compares the element types by pointer: the layout is considered identical only if the corresponding elements are the very same type objects. <code>%Point</code> and <code>%Pair</code> are two distinct named types, so their wrappers do not match. It also meant we could not use this approach to compare types from different LLVM contexts. We had to roll our own solution based on <a href="https://lowlevelbits.org/type-equality-in-llvm/">Tree Automata</a> to solve this issue.</p> <hr> <p>There are a few more details, but the article is getting longer than it needs to be. So let&rsquo;s look at how to use <code>llvm2cpg</code> with Joern.</p> <h3 id="example">Example</h3> <p>Once you have <a href="https://docs.joern.io/installation">Joern</a> and <a href="http://github.com/ShiftLeftSecurity/llvm2cpg/releases/latest">llvm2cpg</a> installed, the usage is straightforward:</p> <ol> <li>Convert a program into LLVM Bitcode</li> <li>Emit CPG</li> <li>Load the CPG into Joern and start the analysis</li> </ol> <p>Here are the steps codified:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>$ cat main.c </span></span><span style="display:flex;"><span>extern int MAX; </span></span><span style="display:flex;"><span>extern int source<span style="color:#f92672">()</span>; </span></span><span style="display:flex;"><span>extern void sink<span style="color:#f92672">(</span>int<span style="color:#f92672">)</span>; </span></span><span style="display:flex;"><span>void foo<span style="color:#f92672">()</span> <span style="color:#f92672">{</span> </span></span><span style="display:flex;"><span> int x <span style="color:#f92672">=</span> source<span style="color:#f92672">()</span>; </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">if</span> <span style="color:#f92672">(</span>x &lt; MAX<span style="color:#f92672">)</span> <span style="color:#f92672">{</span> 
</span></span><span style="display:flex;"><span> int y <span style="color:#f92672">=</span> <span style="color:#ae81ff">2</span> * x; </span></span><span style="display:flex;"><span> sink<span style="color:#f92672">(</span>y<span style="color:#f92672">)</span>; </span></span><span style="display:flex;"><span> <span style="color:#f92672">}</span> </span></span><span style="display:flex;"><span><span style="color:#f92672">}</span> </span></span><span style="display:flex;"><span>$ clang -S -emit-llvm -g -O1 main.c -o main.ll </span></span><span style="display:flex;"><span>$ llvm2cpg -output<span style="color:#f92672">=</span>/tmp/cpg.bin.zip main.ll </span></span></code></pre></div><p>Now you get the CPG saved at <code>/tmp/cpg.bin.zip</code> which you can load into Joern and find if there is a flow from the <code>source</code> function to the <code>sink</code>:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>$ joern </span></span><span style="display:flex;"><span>joern&gt; importCpg<span style="color:#f92672">(</span><span style="color:#e6db74">&#34;/tmp/cpg.bin.zip&#34;</span><span style="color:#f92672">)</span> </span></span><span style="display:flex;"><span>joern&gt; run.ossdataflow </span></span><span style="display:flex;"><span>joern&gt; def source <span style="color:#f92672">=</span> cpg.call<span style="color:#f92672">(</span><span style="color:#e6db74">&#34;source&#34;</span><span style="color:#f92672">)</span> </span></span><span style="display:flex;"><span>joern&gt; def sink <span style="color:#f92672">=</span> cpg.call<span style="color:#f92672">(</span><span style="color:#e6db74">&#34;sink&#34;</span><span style="color:#f92672">)</span>.argument </span></span><span style="display:flex;"><span>joern&gt; sink.reachableByFlows<span style="color:#f92672">(</span>source<span 
style="color:#f92672">)</span>.p </span></span><span style="display:flex;"><span>List<span style="color:#f92672">[</span>String<span style="color:#f92672">]</span> <span style="color:#f92672">=</span> List<span style="color:#f92672">(</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">&#34;&#34;&#34;_____________________________________________________ </span></span></span><span style="display:flex;"><span><span style="color:#e6db74">| tracked | lineNumber| method| file | </span></span></span><span style="display:flex;"><span><span style="color:#e6db74">|====================================================| </span></span></span><span style="display:flex;"><span><span style="color:#e6db74">| source | 5 | foo | main.c | </span></span></span><span style="display:flex;"><span><span style="color:#e6db74">| &lt;operator&gt;.assignment | 5 | foo | main.c | </span></span></span><span style="display:flex;"><span><span style="color:#e6db74">| &lt;operator&gt;.lessThan | 6 | foo | main.c | </span></span></span><span style="display:flex;"><span><span style="color:#e6db74">| &lt;operator&gt;.shiftLeft | 7 | foo | main.c | </span></span></span><span style="display:flex;"><span><span style="color:#e6db74">| &lt;operator&gt;.shiftLeft | 7 | foo | main.c | </span></span></span><span style="display:flex;"><span><span style="color:#e6db74">| &lt;operator&gt;.assignment | 7 | foo | main.c | </span></span></span><span style="display:flex;"><span><span style="color:#e6db74">| sink | 8 | foo | main.c | </span></span></span><span style="display:flex;"><span><span style="color:#e6db74">&#34;&#34;&#34;</span> </span></span><span style="display:flex;"><span><span style="color:#f92672">)</span> </span></span></code></pre></div><p>Which indeed exists!</p> <h2 id="conclusion">Conclusion</h2> <p>To conclude, let us outline some of the advantages and constraints implied by LLVM Bitcode:</p> <ul> <li>the &ldquo;surface&rdquo; of the LLVM language is smaller than 
that of C and C++</li> <li>many high-level details do not exist at the IR level</li> <li>the program must be compiled, thus limiting the range of programs that one can analyze with Joern</li> </ul> <p><a href="https://docs.joern.io/llvm2cpg/hello-llvm">Here</a> you can find more tutorials and information.</p> <p>If you have any questions, feel free to ping <a href="https://twitter.com/fabsx00">Fabs</a> or <a href="https://twitter.com/1101_debian">Alex</a> on Twitter, or, better yet, come over to the <a href="https://gitter.im/joern-code-analyzer/community">Joern chat</a>.</p> Exploring LLVM Bitcode interactively https://lowlevelbits.org/exploring-llvm-bitcode-interactively/ Fri, 28 Feb 2020 [email protected] (Alex Denisov) https://lowlevelbits.org/exploring-llvm-bitcode-interactively/ <p>While working on <a href="https://ocular.shiftleft.io">a tool for software analysis</a>, I find myself looking into the bitcode quite often. It works OK when there is one small file, but it&rsquo;s incredibly annoying when it comes to real-world projects which have tens and hundreds of files.</p> <p>To simplify my life, I built a tool that converts LLVM Bitcode into the GraphML format: <a href="https://github.com/ShiftLeftSecurity/llvm2graphml">llvm2graphml</a>.</p> <h2 id="what-is-graphml">What is GraphML</h2> <p>GraphML is an XML-based file format for storing graphs. The beautiful part is that it is supported by many tools: you can use Neo4J, Cassandra, or TinkerPop to mine data or things like yEd or Gephi to visualize it.</p> <p>My use-case is graph databases.</p> <h2 id="what-is-graph-database">What is a Graph Database</h2> <p>A simple way to understand a graph database is to think of SQLite, but for property graphs. 
And a property graph is simply a graph where each vertex (or node) and edge may have several key-value properties.</p> <p>The classical example: there are a number of people in the graph and they have some relationships, e.g.: &lsquo;Alice -&gt; knows -&gt; Bob&rsquo;, &lsquo;Bob -&gt; friends-with -&gt; Eve&rsquo;, etc. In this case, we can express a query like &ldquo;Find friends of people whom Alice knows&rdquo; in a query language:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-groovy" data-lang="groovy"><span style="display:flex;"><span>graph<span style="color:#f92672">.</span><span style="color:#a6e22e">vertex</span><span style="color:#f92672">(</span><span style="color:#e6db74">&#39;person&#39;</span><span style="color:#f92672">).</span><span style="color:#a6e22e">has</span><span style="color:#f92672">(</span><span style="color:#e6db74">&#39;name&#39;</span><span style="color:#f92672">,</span> <span style="color:#e6db74">&#39;Alice&#39;</span><span style="color:#f92672">).</span><span style="color:#a6e22e">edge</span><span style="color:#f92672">(</span><span style="color:#e6db74">&#39;knows&#39;</span><span style="color:#f92672">).</span><span style="color:#a6e22e">edge</span><span style="color:#f92672">(</span><span style="color:#e6db74">&#39;friends-with&#39;</span><span style="color:#f92672">)</span> </span></span></code></pre></div><p>Each step narrows down the search space:</p> <ul> <li>from a graph get all the vertices labeled &lsquo;person&rsquo;</li> <li>among those select the ones that have the property &lsquo;name&rsquo; with the value &lsquo;Alice&rsquo;</li> <li>from those vertices follow the edges labeled &lsquo;knows&rsquo;</li> <li>and from what&rsquo;s left pick all the nodes reachable through the edges labeled &lsquo;friends-with&rsquo;</li> </ul> <p><em>Note: this is an imaginary, simplified query language, but 
you get the idea.</em></p> <h2 id="llvm2graphml">llvm2graphml</h2> <p>Let me walk you through an example of how to use <code>llvm2graphml</code>. To follow along you need to install <code>llvm2graphml</code> itself (<a href="https://github.com/ShiftLeftSecurity/llvm2graphml/releases">prebuilt packages</a> are available for macOS and Ubuntu) and <a href="https://www.apache.org/dyn/closer.lua/tinkerpop/3.4.6/apache-tinkerpop-gremlin-console-3.4.6-bin.zip">Gremlin Console</a> from the <a href="http://tinkerpop.apache.org">Apache TinkerPop</a> project.</p> <p>There are essentially three steps:</p> <ol> <li>Create <code>main.ll</code> file with the following content:</li> </ol> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-ll" data-lang="ll"><span style="display:flex;"><span><span style="color:#75715e">; main.ll </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">define</span> <span style="color:#66d9ef">i32</span> @increment(<span style="color:#66d9ef">i32</span> %x) { </span></span><span style="display:flex;"><span> %result = <span style="color:#66d9ef">add</span> <span style="color:#66d9ef">i32</span> %x, <span style="color:#ae81ff">1</span> </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">ret</span> <span style="color:#66d9ef">i32</span> %result </span></span><span style="display:flex;"><span>} </span></span></code></pre></div><p>2. 
Run <code>llvm2graphml</code> to emit the GraphML file:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sh" data-lang="sh"><span style="display:flex;"><span>&gt; llvm2graphml --output-dir<span style="color:#f92672">=</span>/tmp main.ll </span></span><span style="display:flex;"><span><span style="color:#f92672">[</span>info<span style="color:#f92672">]</span> More details: /tmp/llvm2graphml-38dfea.log </span></span><span style="display:flex;"><span><span style="color:#f92672">[</span>info<span style="color:#f92672">]</span> Loading main.ll </span></span><span style="display:flex;"><span><span style="color:#f92672">[</span>info<span style="color:#f92672">]</span> Saved result into /tmp/llvm.graphml.xml </span></span><span style="display:flex;"><span><span style="color:#f92672">[</span>info<span style="color:#f92672">]</span> Shutting down </span></span></code></pre></div><p>3. Create the database from the GraphML file</p> <p>Start console:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sh" data-lang="sh"><span style="display:flex;"><span>&gt; gremlin-console/bin/gremlin.sh </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span> <span style="color:#ae81ff">\,</span>,,/ </span></span><span style="display:flex;"><span> <span style="color:#f92672">(</span>o o<span style="color:#f92672">)</span> </span></span><span style="display:flex;"><span>-----oOOo-<span style="color:#f92672">(</span>3<span style="color:#f92672">)</span>-oOOo----- </span></span><span style="display:flex;"><span>plugin activated: tinkerpop.server </span></span><span style="display:flex;"><span>plugin activated: tinkerpop.utilities </span></span><span style="display:flex;"><span>plugin activated: tinkerpop.tinkergraph </span></span><span 
style="display:flex;"><span>gremlin&gt; </span></span></code></pre></div><p>Create the database:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-groovy" data-lang="groovy"><span style="display:flex;"><span>gremlin<span style="color:#f92672">&gt;</span> graph <span style="color:#f92672">=</span> TinkerGraph<span style="color:#f92672">.</span><span style="color:#a6e22e">open</span><span style="color:#f92672">()</span> </span></span><span style="display:flex;"><span>gremlin<span style="color:#f92672">&gt;</span> g <span style="color:#f92672">=</span> graph<span style="color:#f92672">.</span><span style="color:#a6e22e">traversal</span><span style="color:#f92672">()</span> </span></span><span style="display:flex;"><span>gremlin<span style="color:#f92672">&gt;</span> g<span style="color:#f92672">.</span><span style="color:#a6e22e">io</span><span style="color:#f92672">(</span><span style="color:#e6db74">&#34;/tmp/llvm.graphml.xml&#34;</span><span style="color:#f92672">).</span><span style="color:#a6e22e">read</span><span style="color:#f92672">()</span> </span></span><span style="display:flex;"><span>gremlin<span style="color:#f92672">&gt;</span> g </span></span><span style="display:flex;"><span><span style="color:#f92672">==&gt;</span>graphtraversalsource<span style="color:#f92672">[</span>tinkergraph<span style="color:#f92672">[</span>vertices:<span style="color:#ae81ff">12</span> edges:<span style="color:#ae81ff">27</span><span style="color:#f92672">],</span> standard<span style="color:#f92672">]</span> </span></span></code></pre></div><p>Now go and run some queries!</p> <h2 id="example-queries">Example queries</h2> <p>List all modules:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-groovy" data-lang="groovy"><span style="display:flex;"><span>gremlin<span 
style="color:#f92672">&gt;</span> g<span style="color:#f92672">.</span><span style="color:#a6e22e">V</span><span style="color:#f92672">().</span><span style="color:#a6e22e">hasLabel</span><span style="color:#f92672">(</span><span style="color:#e6db74">&#39;module&#39;</span><span style="color:#f92672">).</span><span style="color:#a6e22e">valueMap</span><span style="color:#f92672">().</span><span style="color:#a6e22e">unfold</span><span style="color:#f92672">()</span> </span></span><span style="display:flex;"><span><span style="color:#f92672">==&gt;</span>moduleIdentifier<span style="color:#f92672">=[</span>main<span style="color:#f92672">.</span><span style="color:#a6e22e">ll</span><span style="color:#f92672">]</span> </span></span></code></pre></div><p>List all functions:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-groovy" data-lang="groovy"><span style="display:flex;"><span>gremlin<span style="color:#f92672">&gt;</span> g<span style="color:#f92672">.</span><span style="color:#a6e22e">V</span><span style="color:#f92672">().</span><span style="color:#a6e22e">hasLabel</span><span style="color:#f92672">(</span><span style="color:#e6db74">&#39;function&#39;</span><span style="color:#f92672">).</span><span style="color:#a6e22e">valueMap</span><span style="color:#f92672">().</span><span style="color:#a6e22e">unfold</span><span style="color:#f92672">()</span> </span></span><span style="display:flex;"><span><span style="color:#f92672">==&gt;</span>argSize<span style="color:#f92672">=[</span><span style="color:#ae81ff">1</span><span style="color:#f92672">]</span> </span></span><span style="display:flex;"><span><span style="color:#f92672">==&gt;</span>basicBlockCount<span style="color:#f92672">=[</span><span style="color:#ae81ff">1</span><span style="color:#f92672">]</span> </span></span><span style="display:flex;"><span><span 
style="color:#f92672">==&gt;</span>name<span style="color:#f92672">=[</span>increment<span style="color:#f92672">]</span> </span></span><span style="display:flex;"><span><span style="color:#f92672">==&gt;</span>isDeclaration<span style="color:#f92672">=[</span><span style="color:#66d9ef">false</span><span style="color:#f92672">]</span> </span></span><span style="display:flex;"><span><span style="color:#f92672">==&gt;</span>isVarArg<span style="color:#f92672">=[</span><span style="color:#66d9ef">false</span><span style="color:#f92672">]</span> </span></span><span style="display:flex;"><span><span style="color:#f92672">==&gt;</span>isIntrinsic<span style="color:#f92672">=[</span><span style="color:#66d9ef">false</span><span style="color:#f92672">]</span> </span></span><span style="display:flex;"><span><span style="color:#f92672">==&gt;</span>numOperands<span style="color:#f92672">=[</span><span style="color:#ae81ff">0</span><span style="color:#f92672">]</span> </span></span><span style="display:flex;"><span><span style="color:#f92672">==&gt;</span>instructionCount<span style="color:#f92672">=[</span><span style="color:#ae81ff">2</span><span style="color:#f92672">]</span> </span></span></code></pre></div><p>Count all the instructions:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-groovy" data-lang="groovy"><span style="display:flex;"><span>gremlin<span style="color:#f92672">&gt;</span> g<span style="color:#f92672">.</span><span style="color:#a6e22e">V</span><span style="color:#f92672">().</span><span style="color:#a6e22e">hasLabel</span><span style="color:#f92672">(</span><span style="color:#e6db74">&#39;instruction&#39;</span><span style="color:#f92672">).</span><span style="color:#a6e22e">groupCount</span><span style="color:#f92672">().</span><span style="color:#a6e22e">by</span><span style="color:#f92672">(</span><span 
style="color:#e6db74">&#39;opcode&#39;</span><span style="color:#f92672">).</span><span style="color:#a6e22e">unfold</span><span style="color:#f92672">()</span> </span></span><span style="display:flex;"><span><span style="color:#f92672">==&gt;</span>ret<span style="color:#f92672">=</span><span style="color:#ae81ff">1</span> </span></span><span style="display:flex;"><span><span style="color:#f92672">==&gt;</span>add<span style="color:#f92672">=</span><span style="color:#ae81ff">1</span> </span></span></code></pre></div><p>Explore the types:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-groovy" data-lang="groovy"><span style="display:flex;"><span>gremlin<span style="color:#f92672">&gt;</span> g<span style="color:#f92672">.</span><span style="color:#a6e22e">V</span><span style="color:#f92672">().</span><span style="color:#a6e22e">hasLabel</span><span style="color:#f92672">(</span><span style="color:#e6db74">&#39;type&#39;</span><span style="color:#f92672">).</span><span style="color:#a6e22e">valueMap</span><span style="color:#f92672">(</span><span style="color:#e6db74">&#39;typeID&#39;</span><span style="color:#f92672">).</span><span style="color:#a6e22e">unfold</span><span style="color:#f92672">()</span> </span></span><span style="display:flex;"><span><span style="color:#f92672">==&gt;</span>typeID<span style="color:#f92672">=[</span>label<span style="color:#f92672">]</span> </span></span><span style="display:flex;"><span><span style="color:#f92672">==&gt;</span>typeID<span style="color:#f92672">=[</span>pointer<span style="color:#f92672">]</span> </span></span><span style="display:flex;"><span><span style="color:#f92672">==&gt;</span>typeID<span style="color:#f92672">=[</span>function<span style="color:#f92672">]</span> </span></span><span style="display:flex;"><span><span style="color:#f92672">==&gt;</span>typeID<span 
style="color:#f92672">=[</span>integer<span style="color:#f92672">]</span> </span></span><span style="display:flex;"><span><span style="color:#f92672">==&gt;</span>typeID<span style="color:#f92672">=[</span><span style="color:#66d9ef">void</span><span style="color:#f92672">]</span> </span></span></code></pre></div><p>Find a function with an argument called &lsquo;x&rsquo;:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-groovy" data-lang="groovy"><span style="display:flex;"><span>gremlin<span style="color:#f92672">&gt;</span> g<span style="color:#f92672">.</span><span style="color:#a6e22e">V</span><span style="color:#f92672">().</span><span style="color:#a6e22e">has</span><span style="color:#f92672">(</span><span style="color:#e6db74">&#39;argument&#39;</span><span style="color:#f92672">,</span> <span style="color:#e6db74">&#39;name&#39;</span><span style="color:#f92672">,</span> <span style="color:#e6db74">&#39;x&#39;</span><span style="color:#f92672">).</span><span style="color:#a6e22e">out</span><span style="color:#f92672">(</span><span style="color:#e6db74">&#39;function&#39;</span><span style="color:#f92672">).</span><span style="color:#a6e22e">valueMap</span><span style="color:#f92672">(</span><span style="color:#e6db74">&#39;name&#39;</span><span style="color:#f92672">)</span> </span></span><span style="display:flex;"><span><span style="color:#f92672">==&gt;[</span>name:<span style="color:#f92672">[</span>increment<span style="color:#f92672">]]</span> </span></span></code></pre></div><p>Et cetera, et cetera, et cetera&hellip;</p> <h2 id="some-numbers">Some numbers</h2> <p>These are just some numbers mined from the <code>libLLVMCore.a</code>.</p> <h4 id="how-many">How many</h4> <table class="table"> <tr><td>Number of functions</td> <td>71 019</td></tr> <tr><td>Number of basic blocks</td> <td>172 621</td></tr> <tr><td>Number of instructions</td> <td>1 
212 322</td></tr> <tr><td>Number of types</td> <td>122 220</td></tr> </table> <h4 id="top-10-instructions">Top 10 instructions:</h4> <table class="table"> <tr><td>call</td> <td>290 495</td></tr> <tr><td>load</td> <td>214 769</td></tr> <tr><td>store</td> <td>167 640</td></tr> <tr><td>alloca</td> <td>154 922</td></tr> <tr><td>br</td> <td>96 848</td></tr> <tr><td>getelementptr</td> <td>78 622</td></tr> <tr><td>ret</td> <td>67 729</td></tr> <tr><td>bitcast</td> <td>62 760</td></tr> <tr><td>icmp</td> <td>20 624</td></tr> <tr><td>phi</td> <td>9 716</td></tr> </table> <h4 id="top-10-biggest-functions">Top 10 biggest functions:</h4> <table class="table"> <tr><td>llvm::UpgradeIntrinsicCall(llvm::CallInst*, llvm::Function*)</td> <td>14033</td></tr> <tr><td>llvm::Intrinsic::getAttributes(llvm::LLVMContext&amp;, unsigned int)</td> <td>8420</td></tr> <tr><td>ShouldUpgradeX86Intrinsic(llvm::Function*, llvm::StringRef)</td> <td>3635</td></tr> <tr><td>llvm::LLVMContextImpl::~LLVMContextImpl()</td> <td>2181</td></tr> <tr><td>UpgradeIntrinsicFunction1(llvm::Function*, llvm::Function*&amp;)</td> <td>2006</td></tr> <tr><td>(anonymous namespace)::Verifier::visitIntrinsicCall(unsigned int, llvm::CallBase&amp;)</td> <td>1887</td></tr> <tr><td>(anonymous namespace)::AssemblyWriter::printInstruction(llvm::Instruction const&amp;)</td> <td>1869</td></tr> <tr><td>llvm::ConstantFoldBinaryInstruction(unsigned int, llvm::Constant*, llvm::Constant*)</td> <td>1244</td></tr> <tr><td>upgradeAVX512MaskToSelect(llvm::StringRef, llvm::IRBuilder&lt;llvm::ConstantFolder, llvm::IRBuilderDefaultInserter&gt;&amp;, llvm::CallInst&amp;, llvm::Value*&amp;)</td> <td>1073</td></tr> <tr><td>llvm::ConstantFoldGetElementPtr(llvm::Type*, llvm::Constant*, bool, llvm::Optional&lt;unsigned int&gt;, llvm::ArrayRef&lt;llvm::Value*&gt;)</td> <td>1055</td></tr> </table> <h2 id="resources">Resources</h2> <p>Here are some links if you want to learn more about Gremlin queries and what&rsquo;s possible:</p> <ul> <li><a 
href="http://tinkerpop.apache.org/docs/3.4.6/tutorials/getting-started/">Getting Started with TinkerPop</a></li> <li><a href="http://tinkerpop.apache.org/docs/3.4.6/reference/#graph-traversal-steps">Available Graph Traversals</a></li> </ul> <h2 id="next-steps">Next steps</h2> <p>The project is in its very early days, and many features are missing. To name a few: specific properties on instructions and values, def-use chains and other connections, complex constants (such as vectors of structs), and many more.</p> <p>With that being said, <a href="https://github.com/ShiftLeftSecurity/llvm2graphml">contributions are welcome</a>!</p> Type Equality in LLVM https://lowlevelbits.org/type-equality-in-llvm/ Tue, 28 Jan 2020 [email protected] (Alex Denisov) https://lowlevelbits.org/type-equality-in-llvm/ <p>Some months ago, I joined <a href="https://www.shiftleft.io">ShiftLeft Security</a> to work on the LLVM support for the custom code analysis platform <a href="https://www.shiftleft.io/ocular.html">Ocular</a>. During these months, we have faced and overcome several challenges.</p> <p>Here I want to share one of them: Type Equality in LLVM.</p> <h2 id="intro">Intro</h2> <p>LLVM&rsquo;s type system is a complicated topic. It attempts to solve problems that are not so obvious when you look at them from a high level. Recently, I had a chance to dive deeper into the subject and discovered that while the current implementation makes some things more straightforward, some other parts are counter-intuitive and may not meet your expectations.</p> <p>In this article, I want to describe some limitations of the LLVM type system and share how we solved one particular problem: detecting equivalent types in LLVM. 
The article is organized as follows: I start with a recap of the LLVM type system, followed by the problem statement, then describe how we attempted to solve the issue using existing LLVM features, and finally conclude with the solution we came up with.</p> <h2 id="llvm-type-system-recap">LLVM Type System recap</h2> <p>It is highly recommended to read this post from Chris Lattner explaining some of the considerations that were taken into account when the type system was revised around LLVM 3.0: <a href="http://blog.llvm.org/2011/11/llvm-30-type-system-rewrite.html">LLVM 3.0 Type System Rewrite</a>.</p> <p>Just a few random words on the current type system (if you didn&rsquo;t read the linked article):</p> <ul> <li>types belong to an <code>LLVMContext</code></li> <li>instances of each type are allocated on the heap (conceptually, <code>llvm::Type *type = new llvm::Type;</code>)</li> <li>type comparison is done via pointer comparison</li> <li>types in LLVM fall into three groups: primitive types (integers, floats, etc.), derived types (structs, arrays, pointers, etc.), and forward-declared types (opaque structs)</li> </ul> <h2 id="problem-statement">Problem Statement</h2> <p>Consider the following example:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-c" data-lang="c"><span style="display:flex;"><span><span style="color:#75715e">// Point.h </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">struct</span> Point { </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">int</span> x; </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">int</span> y; </span></span><span style="display:flex;"><span>}; </span></span></code></pre></div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code 
class="language-c" data-lang="c"><span style="display:flex;"><span><span style="color:#75715e">// foo.c </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#75715e">#include</span> <span style="color:#75715e">&#34;Point.h&#34;</span><span style="color:#75715e"> </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> </span></span><span style="display:flex;"><span><span style="color:#75715e">// use struct Point </span></span></span></code></pre></div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-c" data-lang="c"><span style="display:flex;"><span><span style="color:#75715e">// bar.c </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#75715e">#include</span> <span style="color:#75715e">&#34;Point.h&#34;</span><span style="color:#75715e"> </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> </span></span><span style="display:flex;"><span><span style="color:#75715e">// use struct Point </span></span></span></code></pre></div><p>When <code>foo.c</code> and <code>bar.c</code> are compiled down to LLVM IR (<code>foo.ll</code> and <code>bar.ll</code>), they both have <code>struct Point</code> defined as follows:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-llvm" data-lang="llvm"><span style="display:flex;"><span>%struct.Point = <span style="color:#66d9ef">type</span> { <span style="color:#66d9ef">i32</span>, <span style="color:#66d9ef">i32</span> } </span></span></code></pre></div><p>However, when both IR files are loaded into one context, one of the types is renamed to prevent a name collision, so they end up being defined as</p> <div class="highlight"><pre tabindex="0" 
style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-llvm" data-lang="llvm"><span style="display:flex;"><span>%struct.Point = <span style="color:#66d9ef">type</span> { <span style="color:#66d9ef">i32</span>, <span style="color:#66d9ef">i32</span> } </span></span><span style="display:flex;"><span>%struct.Point.0 = <span style="color:#66d9ef">type</span> { <span style="color:#66d9ef">i32</span>, <span style="color:#66d9ef">i32</span> } </span></span></code></pre></div><p>We want to deduplicate such types.</p> <h2 id="our-failed-attempts">Our (failed) attempts</h2> <p>We made several attempts to solve the problem using simple heuristics and built-in LLVM features.</p> <p>It went wrong in many ways.</p> <h3 id="types-with-the-same-name-are-the-same-type-false">&lsquo;Types with the same name are the same type&rsquo; (false)</h3> <p>This is a very simple heuristic:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-llvm" data-lang="llvm"><span style="display:flex;"><span>%struct.Point = <span style="color:#66d9ef">type</span> { <span style="color:#66d9ef">i32</span>, <span style="color:#66d9ef">i32</span> } </span></span><span style="display:flex;"><span>%struct.Point.0 = <span style="color:#66d9ef">type</span> { <span style="color:#66d9ef">i32</span>, <span style="color:#66d9ef">i32</span> } </span></span></code></pre></div><p>If we strip the numeric suffix that is added by LLVM, then the types have the same name, and therefore they are the same. This is a good idea, but it does not work. 
This is perfectly valid LLVM IR:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-llvm" data-lang="llvm"><span style="display:flex;"><span>%struct.Point = <span style="color:#66d9ef">type</span> { <span style="color:#66d9ef">i32</span>, <span style="color:#66d9ef">i32</span> } </span></span><span style="display:flex;"><span>%struct.Point.0 = <span style="color:#66d9ef">type</span> { <span style="color:#66d9ef">float</span>, <span style="color:#66d9ef">float</span>, <span style="color:#66d9ef">float</span> } </span></span></code></pre></div><p>for which our heuristic gives the wrong answer: the names match once the suffix is stripped, but the types are clearly different.</p> <h3 id="primitive-types-equality">Primitive Types Equality</h3> <p>In LLVM, types belong to the <code>LLVMContext</code>. Primitive types such as <code>int32</code>, <code>float</code>, or <code>double</code> are pre-allocated and then reused. In the context of an <code>LLVMContext</code> (pun intended), there is only one instance of each primitive type. With this design, it is easy to check if types are the same: simply compare the pointers.</p> <p>However, this approach cannot work if you want to compare types from different contexts. 
According to LLVM, <code>int32</code> from one <code>LLVMContext</code> differs from <code>int32</code> from another <code>LLVMContext</code>, even though they are the same type according to intuition.</p> <h3 id="struct-types-equality">Struct Types Equality</h3> <p>This situation gets even more complicated when it comes to identified (named) structs.</p> <p>Consider the same example I gave initially.</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-c" data-lang="c"><span style="display:flex;"><span><span style="color:#75715e">// Point.h </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">struct</span> Point { </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">int</span> x; </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">int</span> y; </span></span><span style="display:flex;"><span>}; </span></span></code></pre></div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-c" data-lang="c"><span style="display:flex;"><span><span style="color:#75715e">// foo.c </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#75715e">#include</span> <span style="color:#75715e">&#34;Point.h&#34;</span><span style="color:#75715e"> </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> </span></span><span style="display:flex;"><span><span style="color:#75715e">// use struct Point </span></span></span></code></pre></div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-c" data-lang="c"><span style="display:flex;"><span><span style="color:#75715e">// bar.c 
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#75715e">#include</span> <span style="color:#75715e">&#34;Point.h&#34;</span><span style="color:#75715e"> </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> </span></span><span style="display:flex;"><span><span style="color:#75715e">// use struct Point </span></span></span></code></pre></div><p>So far so good, but as mentioned previously, LLVM keeps both types and renames one of them to prevent name collisions:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-ll" data-lang="ll"><span style="display:flex;"><span>%struct.Point = <span style="color:#66d9ef">type</span> { <span style="color:#66d9ef">i32</span>, <span style="color:#66d9ef">i32</span> } </span></span><span style="display:flex;"><span>%struct.Point.0 = <span style="color:#66d9ef">type</span> { <span style="color:#66d9ef">i32</span>, <span style="color:#66d9ef">i32</span> } </span></span></code></pre></div><p>Even though these are the same types from a user perspective, they are very different from LLVM&rsquo;s point of view. Therefore, we cannot use pointer comparison: the two types are distinct objects that live at different addresses. In this case, the best we can do is to compare the layout of the types and consider them equal if the layouts are identical.</p> <p>The good part is that LLVM has a function for that: <a href="https://llvm.org/doxygen/classllvm_1_1StructType.html#ab45c5514ecd4390e8702c69b19705742">llvm::StructType::isLayoutIdentical</a>.</p> <p>The bad part is that this function is broken. 
Consider the following example:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-ll" data-lang="ll"><span style="display:flex;"><span>%struct.Point = <span style="color:#66d9ef">type</span> { <span style="color:#66d9ef">i32</span>, <span style="color:#66d9ef">i32</span> } </span></span><span style="display:flex;"><span>%struct.Point.0 = <span style="color:#66d9ef">type</span> { <span style="color:#66d9ef">i32</span>, <span style="color:#66d9ef">i32</span> } </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span>%struct.wrapper = <span style="color:#66d9ef">type</span> { %struct.Point } </span></span><span style="display:flex;"><span>%struct.wrapper.0 = <span style="color:#66d9ef">type</span> { %struct.Point.0 } </span></span></code></pre></div><p>According to LLVM, the layouts of <code>struct.Point</code> and <code>struct.Point.0</code> are identical, while the layouts of <code>struct.wrapper</code> and <code>struct.wrapper.0</code> are not: <code>isLayoutIdentical</code> returns <code>true</code> only when all the type elements of the struct are equal. And this equality is checked via pointer comparison.</p> <h3 id="irlinkerllvm-link"><code>IRLinker</code>/<code>llvm-link</code></h3> <p>LLVM has a class that merges two modules into one: <code>IRLinker</code>. LLVM also comes with a CLI tool <code>llvm-link</code>, which does the same. 
The <code>IRLinker</code> works, but it is far from ideal for our purposes: it drops important information.</p> <p>The following IR, after running through <code>IRLinker</code>,</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-llvm" data-lang="llvm"><span style="display:flex;"><span>%struct.Point = <span style="color:#66d9ef">type</span> { <span style="color:#66d9ef">i32</span>, <span style="color:#66d9ef">i32</span> } </span></span><span style="display:flex;"><span>%struct.Tuple = <span style="color:#66d9ef">type</span> { <span style="color:#66d9ef">i32</span>, <span style="color:#66d9ef">i32</span> } </span></span></code></pre></div><p>becomes</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-llvm" data-lang="llvm"><span style="display:flex;"><span>%struct.Point = <span style="color:#66d9ef">type</span> { <span style="color:#66d9ef">i32</span>, <span style="color:#66d9ef">i32</span> } </span></span></code></pre></div><p>dropping the other struct since both have the same layout. We don&rsquo;t want to lose this information.</p> <p>Moreover, <code>IRLinker</code> does another kind of magic that may introduce types that never existed at the source code level. 
This is what I&rsquo;ve seen after running <code>llvm-link</code> on the XNU kernel bitcode:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-llvm" data-lang="llvm"><span style="display:flex;"><span>%struct.tree_desc_s = <span style="color:#66d9ef">type</span> { </span></span><span style="display:flex;"><span> %struct.ct_data_s*, </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">i32</span>, </span></span><span style="display:flex;"><span> %struct.mach_msg_body_t* </span></span><span style="display:flex;"><span>} </span></span><span style="display:flex;"><span>%struct.tree_desc_s.79312 = <span style="color:#66d9ef">type</span> { </span></span><span style="display:flex;"><span> %struct.ct_data_s*, </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">i32</span>, </span></span><span style="display:flex;"><span> %struct.static_tree_desc_s* </span></span><span style="display:flex;"><span>} </span></span></code></pre></div><p>Notice the different types of the third element: <code>struct.mach_msg_body_t*</code> vs <code>struct.static_tree_desc_s*</code>, even though there is only one definition of <code>tree_desc_s</code> at the source code level:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-c" data-lang="c"><span style="display:flex;"><span><span style="color:#66d9ef">struct</span> tree_desc_s { </span></span><span style="display:flex;"><span> ct_data <span style="color:#f92672">*</span>dyn_tree; </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">int</span> max_code; </span></span><span style="display:flex;"><span> static_tree_desc <span style="color:#f92672">*</span>stat_desc; </span></span><span style="display:flex;"><span>}; </span></span></code></pre></div><p>So the 
<code>IRLinker</code> did something odd, at which point I gave up trying to understand how it works and what it does.</p> <h2 id="our-solution-to-this-problem">Our solution to this problem</h2> <p>I could not find any other solution to the problem, so we decided to roll our own.</p> <h3 id="a-bit-of-background">A bit of background</h3> <p>Our implementation is inspired by <a href="https://en.wikipedia.org/wiki/Tree_automaton">Tree Automata</a> and <a href="https://en.wikipedia.org/wiki/Ranked_alphabet">Ranked Alphabets</a>.</p> <p>Here is a short description: a ranked alphabet consists of a finite set of symbols <code>F</code>, and a function <code>Arity(f)</code>, where <code>f</code> belongs to the set <code>F</code>. <code>Arity</code> tells how many arguments a symbol <code>f</code> takes. Symbols can be constant, unary, binary, ternary, or n-ary.</p> <p>Here is an example of the notation: <code>a</code>, <code>b</code>, <code>f(,)</code>, <code>g()</code>, <code>h(,,,,)</code>. <code>a</code> and <code>b</code> are constants, <code>f(,)</code> is binary, <code>g()</code> is unary, and <code>h(,,,,)</code> is n-ary (here, n = 5). The arities of these symbols are 0, 0, 2, 1, and 5, respectively.</p> <p>Given the alphabet <code>a</code>, <code>b</code>, <code>f(,)</code>, <code>g()</code> we can construct a number of trees:</p> <ul> <li>f(a, b)</li> <li>g(b)</li> <li>g(f(b, b))</li> <li>f(g(a), f(f(a, a), b))</li> <li>f(g(a), g(f(a, a)))</li> </ul> <p>etc.</p> <p>If we know the arity of each symbol, then we can omit parentheses and commas and write the tree as a string. 
The string is produced by a depth-first (pre-order) traversal of the tree. Here are the same examples as above in string notation:</p> <ul> <li>fab</li> <li>gb</li> <li>gfbb</li> <li>fgaffaab</li> <li>fgagfaa</li> </ul> <p>Here is a more comprehensive example:</p> <p><img src="https://lowlevelbits.org/img/llvm-type-equality/tree-automata.png" style=" display: block; margin-left: auto; margin-right: auto; width: 100%; height: auto;" /></p> <p>The arrows show the depth-first order.</p> <p>We can map our type equivalence problem onto the ranked alphabet/tree automaton concepts.</p> <h3 id="type-equality">Type Equality</h3> <p>We consider each type to be a symbol, and its arity is the number of properties we want to compare. Then, we build a tree for the type and convert it to the string representation. If two types have the same string representation, then they are equal.</p> <p>Some examples:</p> <ul> <li><code>i32</code>, <code>i64</code>, <code>i156</code>: symbol <code>I</code>, arity is 1 since we only care about bitwidth (e.g., 32, 64, 156)</li> <li><code>float</code>: symbol <code>F</code>, arity is 0, all <code>float</code> types are the same</li> <li><code>[16 x i32]</code>: symbol <code>A</code>, arity is 2, we care only about the length of the array and its element type</li> <li><code>i8*</code>: symbol <code>P</code>, arity is 1, we care only about the pointee type</li> <li><code>{ i32, [16 x i8], i8* }</code>: symbol <code>S</code>, arity is the number of elements + 2: besides the element types, we store the struct ID and the number of elements.</li> </ul> <p>If we care about more or fewer values, then we can simply change the arity for a given symbol. 
Examples of types represented as a tree:</p> <ul> <li><code>i32</code> -&gt; <code>I(32)</code> -&gt; <code>I32</code></li> <li><code>i177</code> -&gt; <code>I(177)</code> -&gt; <code>I177</code></li> <li><code>[16 x i8*]</code> -&gt; <code>A(16, P(I(8)))</code> -&gt; <code>A16PI8</code></li> <li><code>{ i32, i8*, float }</code> -&gt; <code>S(3, S0, I(32), P(I(8)), F)</code> -&gt; <code>S3S0I32PI8F</code></li> </ul> <p><em>Note: the values in <code>S</code> are the number of elements (3), struct ID (<code>S0</code>), and all its contained types defined recursively.</em></p> <p>Same types, but represented graphically:</p> <p><img src="https://lowlevelbits.org/img/llvm-type-equality/tree-automata-types.png" style=" display: block; margin-left: auto; margin-right: auto; width: 100%; height: auto;" /></p> <h3 id="structural-equality">Structural Equality</h3> <p>Above, I mentioned the <code>struct ID</code>. We need it to define the structural equality for recursive types. Consider the following example:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-llvm" data-lang="llvm"><span style="display:flex;"><span>%list = <span style="color:#66d9ef">type</span> { %list*, <span style="color:#66d9ef">i32</span> } </span></span><span style="display:flex;"><span>%node = <span style="color:#66d9ef">type</span> { %node*, <span style="color:#66d9ef">i32</span> } </span></span><span style="display:flex;"><span>%root = <span style="color:#66d9ef">type</span> { %node*, <span style="color:#66d9ef">i32</span> } </span></span></code></pre></div><p>All of the above structs have the same layout: a pointer + an integer. But we do not consider them all to be equal. 
By our definition of equality, the following holds:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-c" data-lang="c"><span style="display:flex;"><span>list <span style="color:#f92672">==</span> node </span></span><span style="display:flex;"><span>root <span style="color:#f92672">!=</span> node </span></span><span style="display:flex;"><span>root <span style="color:#f92672">!=</span> list </span></span></code></pre></div><p>The reasoning is simple: <code>list</code> and <code>node</code> have the same layout and the same structure (recursive), while <code>root</code> has a different structure.</p> <p>Here is a graphical representation to highlight the idea. If we discard the struct names, then it&rsquo;s clear the first two are equal while the third one is distinct.</p> <p><img src="https://lowlevelbits.org/img/llvm-type-equality/recursive-structs.png" style=" display: block; margin-left: auto; margin-right: auto; width: 100%; height: auto;" /></p> <p>To take the structure into account and to make the equality hold, we do not use the names of the structures. Instead, before building the tree, we assign them symbolic names or IDs. So both <code>list</code> and <code>node</code> are encoded as follows: <code>S(2, S0, P(S(2, S0, x, x)), I(32))</code> where <code>S0</code> is the struct ID. 
To terminate the recursion, we do not re-emit the element types of a structure that has already been emitted; we emit placeholder symbols <code>x</code> instead (otherwise we would not respect the arity of the struct).</p> <p>The <code>root</code> is encoded as follows: <code>S(2, S0, P(S(2, S1, P(S(2, S1, x, x)), I(32))), I(32))</code>. Note the extra level of nesting and the two struct IDs, <code>S0</code> and <code>S1</code>.</p> <p>Given these two encodings, the comparison above holds.</p> <h3 id="opaque-struct-equality">Opaque Struct Equality</h3> <p>Comparing opaque structs is as easy as the comparison of infinities. It&rsquo;s totally up to us how we define this property.</p> <p>The right and sound approach is to say that an opaque struct is equal only to itself, but we need to do better than this.</p> <p>For opaque structs, we also use symbolic names. But different opaque structs get the same symbolic name as long as they have the same canonical name.</p> <p>Example:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-llvm" data-lang="llvm"><span style="display:flex;"><span>%struct.A = <span style="color:#66d9ef">type</span> <span style="color:#66d9ef">opaque</span> </span></span><span style="display:flex;"><span>%struct.A.0 = <span style="color:#66d9ef">type</span> <span style="color:#66d9ef">opaque</span> </span></span><span style="display:flex;"><span>%struct.B = <span style="color:#66d9ef">type</span> <span style="color:#66d9ef">opaque</span> </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span>%foo = <span style="color:#66d9ef">type</span> { %struct.A* } </span></span><span style="display:flex;"><span>%bar = <span style="color:#66d9ef">type</span> { %struct.A.0* } </span></span><span style="display:flex;"><span>%buzz = <span style="color:#66d9ef">type</span> { %struct.B* } </span></span></code></pre></div><p>Here, the canonical 
names for the opaque structs are <code>A</code> (<code>%struct.A</code>, <code>%struct.A.0</code>) and <code>B</code> (<code>%struct.B</code>). Therefore, we treat <code>%struct.A</code> and <code>%struct.A.0</code> as equal, while <code>%struct.B</code> is not equal to either of the <code>A</code>s. This holds even though all three structs may refer to the same type or to completely different types.</p> <h3 id="letters-symbols-and-ids">Letters, symbols, and IDs</h3> <p>While IMO letters and symbols are easier for a human being to work with, I implemented all the encodings as vectors of numbers. It is then easy to get a hash of such a vector and add some memoization for better performance, even though I didn&rsquo;t spend any time measuring and looking for bottlenecks.</p> <h2 id="conclusion">Conclusion</h2> <p>To conclude, I&rsquo;d say that one should not rely on the built-in capabilities of LLVM to compare types. In fact, <code>IRLinker</code> uses a very different algorithm.</p> <p>The algorithm I described has drawbacks, and I probably missed some edge cases. But anyway, I would love to get some feedback on it, and I hope it may help someone who gets into a similar situation.</p> Building an LLVM-based tool. Lessons learned https://lowlevelbits.org/building-an-llvm-based-tool.-lessons-learned/ Thu, 04 Jul 2019 [email protected] (Alex Denisov) https://lowlevelbits.org/building-an-llvm-based-tool.-lessons-learned/ <p>This article is a text version of my recent EuroLLVM talk called <a href="https://www.youtube.com/watch?v=Yvj4G9B6pcU">Building an LLVM-based tool: lessons learned</a>.</p> <h3 id="intro">Intro</h3> <p>For the last three years, I have been working on a tool for mutation testing: <a href="http://github.com/mull-project/mull">Mull</a>. It is based on LLVM and targets C and C++ primarily. 
What makes it interesting?</p> <ul> <li>it works on Linux, macOS, and FreeBSD</li> <li>it supports any version of LLVM starting from 3.9</li> <li>it is fast because of JIT and parallelization</li> <li>packaging and distribution are done in one click</li> </ul> <p>Keep reading if you want to know how it works and how to apply it to your project.</p> <h3 id="the-build-system">The Build System</h3> <h4 id="llvm-config">llvm-config</h4> <p>The best-known way to connect LLVM as a library is to use <code>llvm-config</code>. The simplest <code>llvm-config</code>-based build system looks like this:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sh" data-lang="sh"><span style="display:flex;"><span>&gt; clang -c <span style="color:#e6db74">`</span>llvm-config --cxxflags<span style="color:#e6db74">`</span> foo.cpp -o foo.o </span></span><span style="display:flex;"><span>&gt; clang -c <span style="color:#e6db74">`</span>llvm-config --cxxflags<span style="color:#e6db74">`</span> bar.cpp -o bar.o </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span>&gt; clang <span style="color:#e6db74">`</span>llvm-config --ldflags<span style="color:#e6db74">`</span> <span style="color:#e6db74">`</span>llvm-config --libs core support<span style="color:#e6db74">`</span> bar.o foo.o -o foobar.bin </span></span></code></pre></div><p>It works quite well in the very beginning, but there are some issues with it.</p> <ol> <li> <p>The compiler flags: <code>llvm-config --cxxflags</code> gives you the flags LLVM was compiled with, which are not necessarily the flags you want for your project. 
Let&rsquo;s look at an example:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sh" data-lang="sh"><span style="display:flex;"><span>-I/opt/llvm/6.0.0/include </span></span><span style="display:flex;"><span>-Werror<span style="color:#f92672">=</span>unguarded-availability-new </span></span><span style="display:flex;"><span>-O3 -DNDEBUG </span></span><span style="display:flex;"><span>... </span></span></code></pre></div><p>The first flag is correct, and you need it. The second one is specific to Clang: it may not work with gcc, and it may not work with an older version of Clang itself. The rest (<code>-O3 -DNDEBUG</code>) will force you to compile your project in release mode. It&rsquo;s fine, but not always desirable.</p> </li> <li> <p>The linker flags. <code>llvm-config --ldflags</code> does the right job. It tells the linker where to look for the libraries and tweaks some other linker settings. <code>llvm-config --libs &lt;components&gt;</code> also does the right job. It prints the set of libraries you need to link against to use the specified components (you can see the whole list of components via <code>llvm-config --components</code>). However, there is a weird edge case. 
If, on your system, you have installed several versions of LLVM, and they come with a dynamic library, e.g.:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sh" data-lang="sh"><span style="display:flex;"><span>/usr/lib/llvm-4.0/lib/libLLVM.dylib </span></span><span style="display:flex;"><span>/usr/lib/llvm-6.0/lib/libLLVM.dylib </span></span></code></pre></div><p>Then, you may get a runtime error after successful linking:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sh" data-lang="sh"><span style="display:flex;"><span>&gt; clang foo.o bar.o -lLLVMSupport -o foobar.bin </span></span><span style="display:flex;"><span>&gt; ./foobar.bin </span></span><span style="display:flex;"><span>LLVM ERROR: inconsistency in registered CommandLine options </span></span></code></pre></div><p>To prevent this from happening, you should instead link against the dynamic library:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sh" data-lang="sh"><span style="display:flex;"><span>&gt; clang foo.o bar.o -lLLVM -o foobar.bin </span></span><span style="display:flex;"><span>&gt; ./foobar.bin </span></span><span style="display:flex;"><span>Yay! We are good to go now! </span></span></code></pre></div><p>To handle this case properly, you need to check the presence of the libLLVM.dylib on your system somehow. Alternatively, use CMake (see the next part).</p> </li> <li> <p>The linking order. As I said, <code>llvm-config --libs</code> does the right job, but it only applies to the LLVM libraries. If you also want to use Clang libraries with llvm-config, then you are in trouble: the libraries should be placed in the right order. It may work, or may not. 
The problem arises only on Linux. Either you manually re-order the Clang libraries until it compiles, or you wrap the libraries list into the <code>--start-group</code>/<code>--end-group</code>. That&rsquo;s a reasonable solution, but it does not work on macOS. Before migrating to CMake we ended up with something like this:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sh" data-lang="sh"><span style="display:flex;"><span><span style="color:#66d9ef">if</span> macOS </span></span><span style="display:flex;"><span>LDFLAGS<span style="color:#f92672">=</span>-lLLVM -lclangEdit </span></span><span style="display:flex;"><span><span style="color:#66d9ef">else</span> </span></span><span style="display:flex;"><span>LDFLAGS<span style="color:#f92672">=</span>-Wl,--start-group -lLLVM -lclangEdit -Wl,--end-group </span></span><span style="display:flex;"><span>endif </span></span><span style="display:flex;"><span>clang foo.o bar.o $LDFLAGS -o foobar.bin </span></span></code></pre></div></li> </ol> <p>Quite frankly, <code>llvm-config</code> is rather a suboptimal solution for the long run&hellip;</p> <h4 id="cmake">CMake</h4> <p>LLVM itself uses CMake as its primary build system. 
LLVM engineers put an enormous amount of work into making it very friendly to the LLVM users.</p> <p><em><strong>Note</strong></em>: I assume that you understand CMake, otherwise I suggest you build the mental model through this short article: <a href="https://lowlevelbits.org/bottom-up-cmake-introduction/">Bottom-up CMake introduction</a>.</p> <p>Adding LLVM and Clang as a dependency through CMake is reasonably straightforward:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cmake" data-lang="cmake"><span style="display:flex;"><span>find_package(<span style="color:#e6db74">LLVM</span> <span style="color:#e6db74">REQUIRED</span> <span style="color:#e6db74">CONFIG</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">PATHS</span> <span style="color:#f92672">${</span>search_paths<span style="color:#f92672">}</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">NO_DEFAULT_PATH</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>find_package(<span style="color:#e6db74">Clang</span> <span style="color:#e6db74">REQUIRED</span> <span style="color:#e6db74">CONFIG</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">PATHS</span> <span style="color:#f92672">${</span>search_paths<span style="color:#f92672">}</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">NO_DEFAULT_PATH</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span></code></pre></div><p>Please, note the <code>${search_paths}</code> and the <code>NO_DEFAULT_PATH</code>.</p> <p>This is the <code>${search_paths}</code> in our case:</p> <div class="highlight"><pre tabindex="0" 
style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cmake" data-lang="cmake"><span style="display:flex;"><span>set (<span style="color:#e6db74">search_paths</span> </span></span><span style="display:flex;"><span> <span style="color:#f92672">${</span>PATH_TO_LLVM<span style="color:#f92672">}</span> </span></span><span style="display:flex;"><span> <span style="color:#f92672">${</span>PATH_TO_LLVM<span style="color:#f92672">}</span><span style="color:#e6db74">/lib/cmake</span> </span></span><span style="display:flex;"><span> <span style="color:#f92672">${</span>PATH_TO_LLVM<span style="color:#f92672">}</span><span style="color:#e6db74">/lib/cmake/llvm</span> </span></span><span style="display:flex;"><span> <span style="color:#f92672">${</span>PATH_TO_LLVM<span style="color:#f92672">}</span><span style="color:#e6db74">/lib/cmake/clang</span> </span></span><span style="display:flex;"><span> <span style="color:#f92672">${</span>PATH_TO_LLVM<span style="color:#f92672">}</span><span style="color:#e6db74">/share/clang/cmake/</span> </span></span><span style="display:flex;"><span> <span style="color:#f92672">${</span>PATH_TO_LLVM<span style="color:#f92672">}</span><span style="color:#e6db74">/share/llvm/cmake/</span> </span></span><span style="display:flex;"><span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span></code></pre></div><p>The <code>PATH_TO_LLVM</code> is provided to CMake externally by the user.</p> <p><strong>Bold statement:</strong> You should not rely on the &lsquo;use whatever is installed on the machine,&rsquo; but explicitly provide the path to the LLVM installation.</p> <p><strong>Bold statement:</strong> For development, you should not use LLVM/Clang provided by your Linux distro, but instead, install it manually using <a href="http://releases.llvm.org">official precompiled binaries</a>.</p> <p>You can ignore the above statements if you only use LLVM libraries. 
If you also need Clang libraries, then you may get into trouble. On Ubuntu, some versions of Clang shipped with broken CMake support:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sh" data-lang="sh"><span style="display:flex;"><span>CMake Error at /usr/share/llvm-6.0/cmake/ClangConfig.cmake:18 <span style="color:#f92672">(</span>include<span style="color:#f92672">)</span>: </span></span><span style="display:flex;"><span> include could not find load file: </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span> /usr/lib/cmake/clang/ClangTargets.cmake </span></span><span style="display:flex;"><span>Call Stack <span style="color:#f92672">(</span>most recent call first<span style="color:#f92672">)</span>: </span></span><span style="display:flex;"><span> CMakeLists.txt:8 <span style="color:#f92672">(</span>find_package<span style="color:#f92672">)</span> </span></span></code></pre></div><p>Search on the Internets for &ldquo;CMake cannot find ClangConfig&rdquo; to see how many projects and users suffered from this.</p> <p>Once <code>find_package</code> succeeds, you get the <code>LLVM_INCLUDE_DIRS</code> variable and a bunch of LLVM targets you can use:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cmake" data-lang="cmake"><span style="display:flex;"><span>target_include_directories(<span style="color:#e6db74">mull</span> <span style="color:#e6db74">PUBLIC</span> <span style="color:#f92672">${</span>LLVM_INCLUDE_DIRS<span style="color:#f92672">}</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>target_link_libraries(<span style="color:#e6db74">mull</span> <span 
style="color:#e6db74">LLVMSupport</span> <span style="color:#e6db74">clangTooling</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span></code></pre></div><p>Except there is the</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sh" data-lang="sh"><span style="display:flex;"><span>LLVM ERROR: inconsistency in registered CommandLine options </span></span></code></pre></div><p>runtime error. To handle it with CMake, consider using the following snippet:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cmake" data-lang="cmake"><span style="display:flex;"><span>if (<span style="color:#e6db74">LLVM</span> <span style="color:#e6db74">IN_LIST</span> <span style="color:#e6db74">LLVM_AVAILABLE_LIBS</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> target_link_libraries(<span style="color:#e6db74">mull</span> <span style="color:#e6db74">LLVM</span> <span style="color:#e6db74">clangTooling</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>else()<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> target_link_libraries(<span style="color:#e6db74">mull</span> <span style="color:#e6db74">LLVMSupport</span> <span style="color:#e6db74">clangTooling</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>endif()<span style="color:#960050;background-color:#1e0010"> 
</span></span></span></code></pre></div><p>That should do the trick.</p> <h3 id="supporting-multiple-llvm-versions">Supporting multiple LLVM versions</h3> <p>There are at least two ways to support several versions of LLVM. You can add a bunch of <code>#ifdef</code>s to the source code. This is how Klee does it, and it seems to work pretty well for them.</p> <p>Example #1:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cpp" data-lang="cpp"><span style="display:flex;"><span><span style="color:#75715e">#if LLVM_VERSION_CODE &gt;= LLVM_VERSION(4, 0) </span></span></span><span style="display:flex;"><span><span style="color:#75715e">#include</span> <span style="color:#75715e">&lt;llvm/Bitcode/BitcodeReader.h&gt;</span><span style="color:#75715e"> </span></span></span><span style="display:flex;"><span><span style="color:#75715e">#else </span></span></span><span style="display:flex;"><span><span style="color:#75715e">#include</span> <span style="color:#75715e">&lt;llvm/Bitcode/ReaderWriter.h&gt;</span><span style="color:#75715e"> </span></span></span><span style="display:flex;"><span><span style="color:#75715e">#endif </span></span></span></code></pre></div><p>Example #2:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cpp" data-lang="cpp"><span style="display:flex;"><span><span style="color:#75715e">#if LLVM_VERSION_CODE &gt;= LLVM_VERSION(5, 0) </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> assert(ii<span style="color:#f92672">-&gt;</span>getNumOperands() <span style="color:#f92672">==</span> <span style="color:#ae81ff">3</span> <span style="color:#f92672">&amp;&amp;</span> <span style="color:#e6db74">&#34;wrong number of arguments&#34;</span>); </span></span><span style="display:flex;"><span><span 
style="color:#75715e">#else </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> assert(ii<span style="color:#f92672">-&gt;</span>getNumOperands() <span style="color:#f92672">==</span> <span style="color:#ae81ff">2</span> <span style="color:#f92672">&amp;&amp;</span> <span style="color:#e6db74">&#34;wrong number of arguments&#34;</span>); </span></span><span style="display:flex;"><span><span style="color:#75715e">#endif </span></span></span></code></pre></div><p>The other way, the one Mull uses, is to provide a façade library. Mull has several libraries with the same interface, but with slightly different implementations. They are simply pairs of a header and <code>.cpp</code> file:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sh" data-lang="sh"><span style="display:flex;"><span>&gt; tree LLVMCompatibility/ </span></span><span style="display:flex;"><span>LLVMCompatibility/ </span></span><span style="display:flex;"><span>├── 3.9.x </span></span><span style="display:flex;"><span>│   ├── CMakeLists.txt </span></span><span style="display:flex;"><span>│   ├── LLVMCompatibility.cpp </span></span><span style="display:flex;"><span>│   └── LLVMCompatibility.h </span></span><span style="display:flex;"><span>├── 4.x.x </span></span><span style="display:flex;"><span>│   ├── CMakeLists.txt </span></span><span style="display:flex;"><span>│   ├── LLVMCompatibility.cpp </span></span><span style="display:flex;"><span>│   └── LLVMCompatibility.h </span></span><span style="display:flex;"><span>... 
</span></span><span style="display:flex;"><span>├── 8.x.x </span></span><span style="display:flex;"><span>│   ├── CMakeLists.txt </span></span><span style="display:flex;"><span>│   ├── LLVMCompatibility.cpp </span></span><span style="display:flex;"><span>│   └── LLVMCompatibility.h </span></span></code></pre></div><p>Then, we can use CMake to decide which version to use:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cmake" data-lang="cmake"><span style="display:flex;"><span>set (<span style="color:#e6db74">llvm_patch_version</span> <span style="color:#e6db74">&#34;${LLVM_VERSION_MAJOR}.${LLVM_VERSION_MINOR}.${LLVM_VERSION_PATCH}&#34;</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>set (<span style="color:#e6db74">llvm_minor_version</span> <span style="color:#e6db74">&#34;${LLVM_VERSION_MAJOR}.${LLVM_VERSION_MINOR}.x&#34;</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>set (<span style="color:#e6db74">llvm_major_version</span> <span style="color:#e6db74">&#34;${LLVM_VERSION_MAJOR}.x.x&#34;</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>set (<span style="color:#e6db74">full_llvm_version</span> <span style="color:#f92672">${</span>llvm_patch_version<span style="color:#f92672">}</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"> </span></span></span><span 
style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>if (<span style="color:#e6db74">EXISTS</span> <span style="color:#f92672">${</span>CMAKE_CURRENT_LIST_DIR<span style="color:#f92672">}</span><span style="color:#e6db74">/LLVMCompatibility/</span><span style="color:#f92672">${</span>llvm_patch_version<span style="color:#f92672">}</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> set (<span style="color:#e6db74">LLVM_COMPATIBILITY_DIR</span> <span style="color:#f92672">${</span>llvm_patch_version<span style="color:#f92672">}</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>elseif(<span style="color:#e6db74">EXISTS</span> <span style="color:#f92672">${</span>CMAKE_CURRENT_LIST_DIR<span style="color:#f92672">}</span><span style="color:#e6db74">/LLVMCompatibility/</span><span style="color:#f92672">${</span>llvm_minor_version<span style="color:#f92672">}</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> set (<span style="color:#e6db74">LLVM_COMPATIBILITY_DIR</span> <span style="color:#f92672">${</span>llvm_minor_version<span style="color:#f92672">}</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>elseif(<span style="color:#e6db74">EXISTS</span> <span style="color:#f92672">${</span>CMAKE_CURRENT_LIST_DIR<span 
style="color:#f92672">}</span><span style="color:#e6db74">/LLVMCompatibility/</span><span style="color:#f92672">${</span>llvm_major_version<span style="color:#f92672">}</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> set (<span style="color:#e6db74">LLVM_COMPATIBILITY_DIR</span> <span style="color:#f92672">${</span>llvm_major_version<span style="color:#f92672">}</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>else()<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> message(<span style="color:#e6db74">FATAL_ERROR</span> <span style="color:#e6db74">&#34;LLVM-${full_llvm_version} is not supported&#34;</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>endif()<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>add_subdirectory(<span style="color:#e6db74">LLVMCompatibility/</span><span style="color:#f92672">${</span>LLVM_COMPATIBILITY_DIR<span style="color:#f92672">}</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span></code></pre></div><p>What happens here: CMake is looking for a directory with the compatibility layer for the given LLVM version in a special order. 
For example, for version 8.0.1 it will do the following:</p> <ol> <li>Use <code>LLVMCompatibility/8.0.1</code> if it exists</li> <li>Use <code>LLVMCompatibility/8.0.x</code> if it exists</li> <li>Use <code>LLVMCompatibility/8.x.x</code> if it exists</li> <li>Give up and fail</li> </ol> <p>As soon as it finds the right folder, it will include it in the build process. So far we have used only <code>&lt;number&gt;.x.x</code>, but the idea is that we can provide a particular library for any version of LLVM if we need to. Here is what two header files look like:</p> <p><img src="https://lowlevelbits.org/img/llvm-lessons-learned/llvm-compat.png" style=" display: block; margin-left: 0; margin-right: auto;" /></p> <p>Then, in the source code we simply use the compatibility layer instead of a bunch of <code>ifdef</code>s:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cpp" data-lang="cpp"><span style="display:flex;"><span><span style="color:#66d9ef">auto</span> module <span style="color:#f92672">=</span> llvm_compat<span style="color:#f92672">::</span>parseBitcode(buffer.getMemBufferRef(), </span></span><span style="display:flex;"><span> context); </span></span></code></pre></div><h3 id="sources-vs-binaries">Sources VS Binaries</h3> <p>So far I only covered builds against precompiled binary versions of LLVM. However, there are reasons you should also build against the source code. Look at the table:</p> <p><img src="https://lowlevelbits.org/img/llvm-lessons-learned/sources-binaries.png" style=" display: block; margin-left: 0; margin-right: auto;" /></p> <p>Builds against precompiled versions are much faster, but you give up the ability to debug LLVM itself, which is needed when you hit a bug or some weird behavior. Another significant drawback: asserts. 
They are disabled in the release builds you get from the <a href="http://releases.llvm.org">http://releases.llvm.org</a>. In fact, we did violate some of the LLVM constraints but didn&rsquo;t realize it until somebody tried to build Mull against the source code.</p> <p>You can easily teach CMake to build against source code and against precompiled libraries at the same time.</p> <p>Here is the trick:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cmake" data-lang="cmake"><span style="display:flex;"><span>if (<span style="color:#e6db74">EXISTS</span> <span style="color:#f92672">${</span>PATH_TO_LLVM<span style="color:#f92672">}</span><span style="color:#e6db74">/CMakeLists.txt</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> add_subdirectory(<span style="color:#f92672">${</span>PATH_TO_LLVM<span style="color:#f92672">}</span> <span style="color:#e6db74">llvm-build-dir</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> <span style="color:#75715e"># LLVM_INCLUDE_DIRS ??? </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> <span style="color:#75715e"># LLVM_VERSION ??? </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>else()<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> <span style="color:#960050;background-color:#1e0010">... 
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>endif()<span style="color:#960050;background-color:#1e0010"> </span></span></span></code></pre></div><p>If the <code>PATH_TO_LLVM</code> contains <code>CMakeLists.txt</code>, then we are building against the source code. Otherwise, the behavior is the same as written in the previous paragraphs.</p> <p>However, <code>LLVM_INCLUDE_DIRS</code> and <code>LLVM_VERSION</code> are not available in this case. We can fix that with these tricks:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cmake" data-lang="cmake"><span style="display:flex;"><span>get_target_property(<span style="color:#e6db74">LLVM_INCLUDE_DIRS</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">LLVMSupport</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">INCLUDE_DIRECTORIES</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span></code></pre></div><p>It will fill in the <code>LLVM_INCLUDE_DIRS</code> with the right header search paths.</p> <p>The <code>LLVM_VERSION</code> is a bit less trivial: we need to parse the <code>CMakeLists.txt</code>:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cmake" data-lang="cmake"><span style="display:flex;"><span>macro(<span style="color:#e6db74">get_llvm_version_component</span> <span style="color:#e6db74">input</span> <span style="color:#e6db74">component</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> string(<span style="color:#e6db74">REGEX</span> <span style="color:#e6db74">MATCH</span> <span 
style="color:#e6db74">&#34;${component} ([0-9]+)&#34;</span> <span style="color:#e6db74">match</span> <span style="color:#f92672">${</span>input<span style="color:#f92672">}</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> if (<span style="color:#e6db74">NOT</span> <span style="color:#e6db74">match</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> message(<span style="color:#e6db74">FATAL_ERROR</span> <span style="color:#e6db74">&#34;Cannot find LLVM version component &#39;${component}&#39;&#34;</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> endif()<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> set (<span style="color:#f92672">${</span>component<span style="color:#f92672">}</span> <span style="color:#f92672">${</span>CMAKE_MATCH_1<span style="color:#f92672">}</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>endmacro()<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>file(<span style="color:#e6db74">READ</span> <span style="color:#f92672">${</span>PATH_TO_LLVM<span style="color:#f92672">}</span><span style="color:#e6db74">/CMakeLists.txt</span> <span style="color:#e6db74">LLVM_CMAKELISTS</span>)<span 
style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>get_llvm_version_component(<span style="color:#e6db74">&#34;${LLVM_CMAKELISTS}&#34;</span> <span style="color:#e6db74">LLVM_VERSION_MAJOR</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>get_llvm_version_component(<span style="color:#e6db74">&#34;${LLVM_CMAKELISTS}&#34;</span> <span style="color:#e6db74">LLVM_VERSION_MINOR</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>get_llvm_version_component(<span style="color:#e6db74">&#34;${LLVM_CMAKELISTS}&#34;</span> <span style="color:#e6db74">LLVM_VERSION_PATCH</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>set (<span style="color:#e6db74">LLVM_VERSION</span> <span style="color:#f92672">${</span>LLVM_VERSION_MAJOR<span style="color:#f92672">}</span><span style="color:#e6db74">.</span><span style="color:#f92672">${</span>LLVM_VERSION_MINOR<span style="color:#f92672">}</span><span style="color:#e6db74">.</span><span style="color:#f92672">${</span>LLVM_VERSION_PATCH<span style="color:#f92672">}</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span></code></pre></div><p>The macro will extract all the information we need from this piece of text (<code>llvm/CMakeLists.txt</code>):</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cmake" data-lang="cmake"><span style="display:flex;"><span>if(<span style="color:#e6db74">NOT</span> <span 
style="color:#e6db74">DEFINED</span> <span style="color:#e6db74">LLVM_VERSION_MAJOR</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> set(<span style="color:#e6db74">LLVM_VERSION_MAJOR</span> <span style="color:#e6db74">6</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>endif()<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>if(<span style="color:#e6db74">NOT</span> <span style="color:#e6db74">DEFINED</span> <span style="color:#e6db74">LLVM_VERSION_MINOR</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> set(<span style="color:#e6db74">LLVM_VERSION_MINOR</span> <span style="color:#e6db74">0</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>endif()<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>if(<span style="color:#e6db74">NOT</span> <span style="color:#e6db74">DEFINED</span> <span style="color:#e6db74">LLVM_VERSION_PATCH</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> set(<span style="color:#e6db74">LLVM_VERSION_PATCH</span> <span style="color:#e6db74">1</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span 
style="color:#960050;background-color:#1e0010"></span>endif()<span style="color:#960050;background-color:#1e0010"> </span></span></span></code></pre></div><p>That&rsquo;s it. We are ready to build against LLVM&rsquo;s source code.</p> <h3 id="parallelization">Parallelization</h3> <p><strong>Bold statement:</strong> Avoid using LLVM Passes for better parallelization (explanation follows).</p> <p>Any LLVM-based tool is an excellent candidate for fair parallelization: if you have 20 tasks and 4 cores, then you can run 5 tasks on each core and then merge the results. However, LLVM is not very friendly when it comes to parallelization: lots of classes are not thread-safe.</p> <p>Let&rsquo;s consider this picture:</p> <p><img src="https://lowlevelbits.org/img/llvm-lessons-learned/LLVM-parallelization.png" style=" display: block; margin-left: 0; margin-right: auto;" /></p> <p>There are three phases: loading, analysis, and transformation:</p> <ol> <li>We load two modules (#1, #2) within Thread 1, and the third module (#3) within Thread 2. What&rsquo;s important is that each thread should have its own <code>LLVMContext</code>!</li> <li>The next phase is the analysis. At this point we only read information from LLVM IR, so we can distribute all 8 functions (F1-F8) evenly across the two threads: Thread 1 analyzes F1-F4, and Thread 2 deals with F5-F8.</li> <li>Transformation. It is essential to ensure that any transformation of a module does not escape the module&rsquo;s thread boundaries: even such &lsquo;minor&rsquo; changes as renaming an instruction are not thread-safe.</li> </ol> <p><em>Note: of course you can put lots of locks there, but what&rsquo;s the point of parallelization then?</em></p> <p>Now I can tell you why <strong>you should avoid LLVM Passes</strong>: this approach incentivizes you to merge analysis and transformation into one phase, and therefore lose the ability to parallelize efficiently. 
(There are other issues with LLVM Passes, but that&rsquo;s a different topic.)</p> <p>Also, LLVM&rsquo;s <code>PassManager</code>s are not (yet?) parallelization-friendly.</p> <p>My advice here is to start with separate analysis &amp; transformation phases. It&rsquo;s easier to implement and easier to test. You can wrap these phases into an LLVM pass later if needed.</p> <p>And of course, you should always measure the performance. Here is one of our measurements:</p> <p><img src="https://lowlevelbits.org/img/llvm-lessons-learned/analysis-transform.png" style=" display: block; margin-left: 0; margin-right: auto;" /></p> <p>You may get the opposite results.</p> <h3 id="getting-bitcode">Getting Bitcode</h3> <p>Once every 2-3 months, there is a question on the mailing lists: &ldquo;How do I compile my program to bitcode?&rdquo; Clearly, there is a demand for that.</p> <p>The most common answer I&rsquo;ve seen is <a href="https://github.com/travitch/whole-program-llvm">whole-program-llvm</a>. It&rsquo;s a great tool, and I can also recommend using it, but keep in mind that it produces one large bitcode file as output. Therefore, you cannot split the work across modules and get the benefits of your multicore machine.</p> <p>There are a few other ways to get the bitcode:</p> <ol> <li><code>-emit-llvm</code>: passing this flag to the compiler will give you an LLVM Bitcode/IR file as an output. It will break the linking phase of your build system, though.</li> <li><code>-flto</code>: with this flag all the intermediate object files will, in fact, be LLVM Bitcode files. The program will compile just fine. It won&rsquo;t work, though, if you don&rsquo;t have any intermediate object files in the pipeline (e.g. <code>clang foo.c bar.c -o foobar</code>).</li> <li><code>-fembed-bitcode</code>: this should be your choice! 
Clang will compile your program just fine, but it will also include a special section into the binary containing all the Bitcode files (<a href="https://lowlevelbits.org/bitcode-demystified/">Learn More</a>). You can extract the Bitcode from the binary programmatically using my fork of the awesome <a href="https://github.com/AlexDenisov/LibEBC">LibEBC</a> tool.</li> </ol> <h3 id="multi-os-support">Multi-OS Support</h3> <p>For more straightforward support of several operating systems, I highly recommend these two tools: <a href="https://www.vagrantup.com">Vagrant</a> and <a href="https://www.ansible.com">Ansible</a>.</p> <p>Vagrant allows you to manage virtual machines easily:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-ruby" data-lang="ruby"><span style="display:flex;"><span>config<span style="color:#f92672">.</span>vm<span style="color:#f92672">.</span>define <span style="color:#e6db74">&#34;debian&#34;</span> <span style="color:#66d9ef">do</span> <span style="color:#f92672">|</span>cfg<span style="color:#f92672">|</span> </span></span><span style="display:flex;"><span> cfg<span style="color:#f92672">.</span>vm<span style="color:#f92672">.</span>box <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;debian/stretch64&#34;</span> </span></span><span style="display:flex;"><span> cfg<span style="color:#f92672">.</span>vm<span style="color:#f92672">.</span>provision <span style="color:#e6db74">&#34;ansible&#34;</span> <span style="color:#66d9ef">do</span> <span style="color:#f92672">|</span>ansible<span style="color:#f92672">|</span> </span></span><span style="display:flex;"><span> ansible<span style="color:#f92672">.</span>verbose <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;v&#34;</span> </span></span><span style="display:flex;"><span> ansible<span style="color:#f92672">.</span>playbook <span 
style="color:#f92672">=</span> <span style="color:#e6db74">&#34;debian-playbook.yaml&#34;</span> </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">end</span> </span></span><span style="display:flex;"><span><span style="color:#66d9ef">end</span> </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span>config<span style="color:#f92672">.</span>vm<span style="color:#f92672">.</span>define <span style="color:#e6db74">&#34;ubuntu&#34;</span> <span style="color:#66d9ef">do</span> <span style="color:#f92672">|</span>cfg<span style="color:#f92672">|</span> </span></span><span style="display:flex;"><span> cfg<span style="color:#f92672">.</span>vm<span style="color:#f92672">.</span>box <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;ubuntu/xenial64&#34;</span> </span></span><span style="display:flex;"><span> cfg<span style="color:#f92672">.</span>vm<span style="color:#f92672">.</span>provision <span style="color:#e6db74">&#34;ansible&#34;</span> <span style="color:#66d9ef">do</span> <span style="color:#f92672">|</span>ansible<span style="color:#f92672">|</span> </span></span><span style="display:flex;"><span> ansible<span style="color:#f92672">.</span>verbose <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;v&#34;</span> </span></span><span style="display:flex;"><span> ansible<span style="color:#f92672">.</span>playbook <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;ubuntu-playbook.yaml&#34;</span> </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">end</span> </span></span><span style="display:flex;"><span><span style="color:#66d9ef">end</span> </span></span></code></pre></div><p>With this config you can create a VM ready for use:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sh" data-lang="sh"><span 
style="display:flex;"><span>vagrant up debian </span></span><span style="display:flex;"><span>vagrant up ubuntu </span></span></code></pre></div><p>Vagrant also allows you to provision the machine using various provisioners: from old-school shell scripts to modern tools such as Chef and Ansible.</p> <p>I prefer Ansible as it is the most straightforward tool, in my opinion. Basically, an Ansible playbook is a shell script on steroids. Here is what a part of it looks like:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">packages</span>: </span></span><span style="display:flex;"><span> - <span style="color:#ae81ff">fish</span> </span></span><span style="display:flex;"><span> - <span style="color:#ae81ff">vim</span> </span></span><span style="display:flex;"><span> - <span style="color:#ae81ff">wget</span> </span></span><span style="display:flex;"><span> - <span style="color:#ae81ff">git</span> </span></span><span style="display:flex;"><span> - <span style="color:#ae81ff">cmake</span> </span></span><span style="display:flex;"><span> - <span style="color:#ae81ff">ninja-build</span> </span></span><span style="display:flex;"><span> - <span style="color:#ae81ff">libz-dev</span> </span></span><span style="display:flex;"><span> - <span style="color:#ae81ff">libsqlite3-dev</span> </span></span><span style="display:flex;"><span> - <span style="color:#ae81ff">ncurses-dev</span> </span></span><span style="display:flex;"><span> - <span style="color:#ae81ff">libstdc++-6-dev</span> </span></span><span style="display:flex;"><span> - <span style="color:#ae81ff">pkg-config</span> </span></span><span style="display:flex;"><span> - <span style="color:#ae81ff">libxml2-dev</span> </span></span><span style="display:flex;"><span> - <span style="color:#ae81ff">uuid-dev</span> </span></span><span 
style="display:flex;"><span> - <span style="color:#ae81ff">liblzma-dev</span> </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span><span style="color:#f92672">tasks</span>: </span></span><span style="display:flex;"><span> - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Install Required Packages</span> </span></span><span style="display:flex;"><span> <span style="color:#f92672">apt</span>: </span></span><span style="display:flex;"><span> <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;{{ packages }}&#34;</span> </span></span><span style="display:flex;"><span> <span style="color:#f92672">state</span>: <span style="color:#ae81ff">present</span> </span></span><span style="display:flex;"><span> <span style="color:#f92672">become</span>: <span style="color:#66d9ef">true</span> </span></span></code></pre></div><p>This small snippet will make sure that all the <code>packages</code> are installed (<code>present</code>) in the VM. You can use Ansible to automate lots of things. In our case, we automate the following processes:</p> <ul> <li>install packages</li> <li>download LLVM</li> <li>build &amp; run Mull&rsquo;s unit tests</li> <li>create an OS-dependent package (<code>pkg</code>, <code>deb</code>, <code>rpm</code>, <code>sh</code>)</li> <li>run integration tests</li> </ul> <p>Another great thing about Ansible: you can run it locally, not necessarily in the VM. We use this feature on CI, executing each of the steps above for every pull request.</p> <p>It saves me lots of time and simplifies the release process. 
Here is the whole release script:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sh" data-lang="sh"><span style="display:flex;"><span>mkdir -p packages </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span><span style="color:#66d9ef">function</span> prepare_package <span style="color:#f92672">()</span> <span style="color:#f92672">{</span> </span></span><span style="display:flex;"><span> printf <span style="color:#e6db74">&#34;Preparing package for </span>$1<span style="color:#e6db74">... &#34;</span> </span></span><span style="display:flex;"><span> export LLVM_VERSION<span style="color:#f92672">=</span>$2 </span></span><span style="display:flex;"><span> vagrant up $1 --provision 2&gt; ./packages/$1.err.log &gt; ./packages/$1.out.log </span></span><span style="display:flex;"><span> vagrant destroy -f $1 2&gt;&gt; ./packages/$1.err.log &gt;&gt; ./packages/$1.out.log </span></span><span style="display:flex;"><span> printf <span style="color:#e6db74">&#34;Done.\n&#34;</span> </span></span><span style="display:flex;"><span><span style="color:#f92672">}</span> </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span>prepare_package debian 6.0.0 </span></span><span style="display:flex;"><span>prepare_package freebsd 8.0.0 </span></span><span style="display:flex;"><span>prepare_package ubuntu 8.0.0 </span></span></code></pre></div><p>In the end, I have packages ready in the <code>packages</code> folder for Debian, FreeBSD, and Ubuntu. 
Doing so for macOS is not as straightforward, but we will get there soon as well.</p> <h3 id="summary">Summary</h3> <p>Just reiterating all those bold statements one more time:</p> <ul> <li><strong>don&rsquo;t use</strong> <code>llvm-config</code> as part of the build system</li> <li><strong>don&rsquo;t use</strong> LLVM/Clang from your distro for development</li> <li><strong>don&rsquo;t use</strong> LLVM passes</li> <li><strong>don&rsquo;t use</strong> <code>whole-program-llvm</code></li> <li><strong>use</strong> Vagrant &amp; Ansible for multi-OS support</li> <li><strong>use</strong> different versions of LLVM for development</li> </ul> <p>There is another big topic: <strong>Testing</strong>, but I will leave it for the next article.</p> Bottom-up CMake introduction https://lowlevelbits.org/bottom-up-cmake-introduction/ Fri, 24 May 2019 [email protected] (Alex Denisov) https://lowlevelbits.org/bottom-up-cmake-introduction/ <p>If you want to learn CMake, but do not have time to go through all the resources on the internet, then this article is for you. I will cover the essentials you&rsquo;ll need to start:</p> <ul> <li>targets</li> <li>commands</li> <li>variables</li> <li>functions</li> <li>macros</li> </ul> <p>In the next few minutes, we will reimplement some of CMake&rsquo;s built-in functionality using CMake itself.</p> <p><strong>Disclaimer:</strong> there are several very inaccurate statements about CMake in this article. Most of them are here on purpose: the goal is to build an intuition of how CMake works, not to be 100% correct.</p> <h3 id="what-is-cmake">What is CMake?</h3> <p>CMake is not a build system, as many think of it. CMake is a <em>build system generator</em>. Basically, you can see it as a compiler that compiles CMake scripts into Makefiles. 
It can also target several other build systems, including <a href="https://ninja-build.org">Ninja</a>, as well as Xcode, Eclipse, and Visual Studio projects.</p> <p><img src="https://lowlevelbits.org/img/bottom-up-cmake/cmake-compiler.png" style="width: 100%; height: auto;" /></p> <p>The typical workflow is as follows: you create a <code>CMakeLists.txt</code> (it can be empty), you generate the build system, and you build something. Here is the code:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>&gt; touch CMakeLists.txt </span></span><span style="display:flex;"><span>&gt; cmake . </span></span><span style="display:flex;"><span>&gt; make help </span></span></code></pre></div><p>This is the bare minimum you need to start. Now let&rsquo;s learn some CMake concepts.</p> <h3 id="targets">Targets</h3> <p>All the work in CMake is organized around <strong>targets</strong>. 
A <strong>target</strong> is something you can <em>build</em> or <em>call</em>.</p> <h4 id="target-calls">Target calls</h4> <p>Create a <code>CMakeLists.txt</code> with the following content:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cmake" data-lang="cmake"><span style="display:flex;"><span>add_custom_target(<span style="color:#e6db74">hello-target</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">COMMAND</span> <span style="color:#e6db74">cmake</span> <span style="color:#e6db74">-E</span> <span style="color:#e6db74">echo</span> <span style="color:#e6db74">&#34;Hello, CMake World&#34;</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span></code></pre></div><p>And run the following commands:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>&gt; cmake . 
</span></span><span style="display:flex;"><span>&lt; truncated &gt; </span></span><span style="display:flex;"><span>-- Configuring <span style="color:#66d9ef">done</span> </span></span><span style="display:flex;"><span>-- Generating <span style="color:#66d9ef">done</span> </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span>&gt; make hello-target </span></span><span style="display:flex;"><span>Scanning dependencies of target hello-target </span></span><span style="display:flex;"><span>Hello, CMake World </span></span><span style="display:flex;"><span>Built target hello-target </span></span></code></pre></div><p>Here, <code>make</code> tells us that it has built the target <code>hello-target</code>, even though it just called the <code>echo</code> command and did not produce any artifacts.</p> <h4 id="build-targets">Build targets</h4> <p>Let&rsquo;s fix that and actually build a simple program. Create the following files:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-c" data-lang="c"><span style="display:flex;"><span><span style="color:#75715e">// main.c </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span> </span></span><span style="display:flex;"><span><span style="color:#66d9ef">extern</span> <span style="color:#66d9ef">void</span> <span style="color:#a6e22e">hello_world</span>(); </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span><span style="color:#66d9ef">int</span> <span style="color:#a6e22e">main</span>() { </span></span><span style="display:flex;"><span> <span style="color:#a6e22e">hello_world</span>(); </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">return</span> <span style="color:#ae81ff">0</span>; </span></span><span style="display:flex;"><span>} </span></span></code></pre></div><div 
class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-c" data-lang="c"><span style="display:flex;"><span><span style="color:#75715e">// hello.c </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">extern</span> <span style="color:#66d9ef">int</span> <span style="color:#a6e22e">printf</span>(<span style="color:#66d9ef">const</span> <span style="color:#66d9ef">char</span> <span style="color:#f92672">*</span>, ...); </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span><span style="color:#66d9ef">void</span> <span style="color:#a6e22e">hello_world</span>() { </span></span><span style="display:flex;"><span> <span style="color:#a6e22e">printf</span>(<span style="color:#e6db74">&#34;Hello, CMake world</span><span style="color:#ae81ff">\n</span><span style="color:#e6db74">&#34;</span>); </span></span><span style="display:flex;"><span>} </span></span></code></pre></div><p>Replace the custom target COMMAND:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cmake" data-lang="cmake"><span style="display:flex;"><span>add_custom_target(<span style="color:#e6db74">hello-target</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">COMMAND</span> <span style="color:#e6db74">gcc</span> <span style="color:#e6db74">main.c</span> <span style="color:#e6db74">hello.c</span> <span style="color:#e6db74">-o</span> <span style="color:#e6db74">hello</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span></code></pre></div><p>And re-run <code>make</code>:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" 
data-lang="bash"><span style="display:flex;"><span>&gt; make hello-target </span></span><span style="display:flex;"><span>&lt; truncated &gt; </span></span><span style="display:flex;"><span>-- Configuring <span style="color:#66d9ef">done</span> </span></span><span style="display:flex;"><span>-- Generating <span style="color:#66d9ef">done</span> </span></span><span style="display:flex;"><span>Built target hello-target </span></span></code></pre></div><p>As you can see, <code>make</code> detected the change and reconfigured CMake. If everything is right, you should have the <code>hello</code> executable:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>&gt; ./hello </span></span><span style="display:flex;"><span>Hello, CMake world </span></span></code></pre></div><p>Great success!!!</p> <h3 id="commands">Commands</h3> <p>In fact, you can describe the whole build process using the custom target as we did above. The problem is, however, that the command will re-run every time you run <code>make hello-target</code>: the <code>hello</code> program will be re-compiled completely even when nothing has changed.</p> <p>Let&rsquo;s use separate <strong>commands</strong> to solve this problem. 
The new version of <code>CMakeLists.txt</code>:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cmake" data-lang="cmake"><span style="display:flex;"><span>add_custom_command(<span style="color:#e6db74">OUTPUT</span> <span style="color:#e6db74">hello.o</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">COMMAND</span> <span style="color:#e6db74">gcc</span> <span style="color:#e6db74">-c</span> <span style="color:#e6db74">hello.c</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">DEPENDS</span> <span style="color:#e6db74">hello.c</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>add_custom_command(<span style="color:#e6db74">OUTPUT</span> <span style="color:#e6db74">main.o</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">COMMAND</span> <span style="color:#e6db74">gcc</span> <span style="color:#e6db74">-c</span> <span style="color:#e6db74">main.c</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">DEPENDS</span> <span style="color:#e6db74">main.c</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>add_custom_target(<span style="color:#e6db74">hello-target</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">COMMAND</span> <span style="color:#e6db74">gcc</span> <span style="color:#e6db74">main.o</span> <span style="color:#e6db74">hello.o</span> <span style="color:#e6db74">-o</span> <span style="color:#e6db74">hello</span> </span></span><span 
style="display:flex;"><span> <span style="color:#e6db74">DEPENDS</span> <span style="color:#e6db74">main.o</span> <span style="color:#e6db74">hello.o</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span></code></pre></div><p>The important point here is the <code>DEPENDS</code>. This construct describes the build process in the form of a (directed acyclic) graph: A depends on B, B depends on C and D, and so forth. Then, a change of D or C means that B is changed, which means that A is also changed, and therefore, all the changed items should be re-created.</p> <p>Now try the following: build the program, make a small change to one of the files, and re-run the build twice. You should see something like this:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>&gt; make hello-target </span></span><span style="display:flex;"><span><span style="color:#f92672">[</span> 50%<span style="color:#f92672">]</span> Generating hello.o </span></span><span style="display:flex;"><span><span style="color:#f92672">[</span>100%<span style="color:#f92672">]</span> Generating main.o </span></span><span style="display:flex;"><span><span style="color:#f92672">[</span>100%<span style="color:#f92672">]</span> Built target hello-target </span></span><span style="display:flex;"><span><span style="color:#75715e"># Add small change to hello.c</span> </span></span><span style="display:flex;"><span>&gt; make hello-target </span></span><span style="display:flex;"><span><span style="color:#f92672">[</span> 50%<span style="color:#f92672">]</span> Generating hello.o </span></span><span style="display:flex;"><span><span style="color:#f92672">[</span>100%<span style="color:#f92672">]</span> Built target hello-target </span></span><span style="display:flex;"><span>&gt; make hello-target </span></span><span 
style="display:flex;"><span><span style="color:#f92672">[</span>100%<span style="color:#f92672">]</span> Built target hello-target </span></span></code></pre></div><p>We&rsquo;ve just got <strong>incremental compilation</strong>, yay!</p> <h3 id="variables">Variables</h3> <p>It is time to do some refactoring: I&rsquo;m more of a <code>clang</code> person than a <code>gcc</code> one, and therefore I want an easier way to change the compiler. Let&rsquo;s extract it into a separate variable.</p> <p>Defining a variable is as easy as calling <code>set (FOO bar)</code>, which defines a variable <code>FOO</code> with the value <code>bar</code>. The usage is also straightforward: <code>${FOO}</code> becomes <code>bar</code> when executed.</p> <p>Here is what a better version of <code>CMakeLists.txt</code> looks like:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cmake" data-lang="cmake"><span style="display:flex;"><span>set (<span style="color:#e6db74">C_COMPILER</span> <span style="color:#e6db74">gcc</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>add_custom_command(<span style="color:#e6db74">OUTPUT</span> <span style="color:#e6db74">hello.o</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">COMMAND</span> <span style="color:#f92672">${</span>C_COMPILER<span style="color:#f92672">}</span> <span style="color:#e6db74">-c</span> <span style="color:#e6db74">hello.c</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">DEPENDS</span> <span style="color:#e6db74">hello.c</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span 
style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>add_custom_command(<span style="color:#e6db74">OUTPUT</span> <span style="color:#e6db74">main.o</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">COMMAND</span> <span style="color:#f92672">${</span>C_COMPILER<span style="color:#f92672">}</span> <span style="color:#e6db74">-c</span> <span style="color:#e6db74">main.c</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">DEPENDS</span> <span style="color:#e6db74">main.c</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>add_custom_target(<span style="color:#e6db74">hello-target</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">COMMAND</span> <span style="color:#f92672">${</span>C_COMPILER<span style="color:#f92672">}</span> <span style="color:#e6db74">main.o</span> <span style="color:#e6db74">hello.o</span> <span style="color:#e6db74">-o</span> <span style="color:#e6db74">hello</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">DEPENDS</span> <span style="color:#e6db74">main.o</span> <span style="color:#e6db74">hello.o</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span></code></pre></div><p>The variable definition can be recursive. 
Try to add the following code to the CMake script:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cmake" data-lang="cmake"><span style="display:flex;"><span>set (<span style="color:#e6db74">NUMBERS</span> <span style="color:#e6db74">1</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>message(<span style="color:#f92672">${</span>NUMBERS<span style="color:#f92672">}</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>set (<span style="color:#e6db74">NUMBERS</span> <span style="color:#f92672">${</span>NUMBERS<span style="color:#f92672">}</span> <span style="color:#e6db74">2</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>message(<span style="color:#f92672">${</span>NUMBERS<span style="color:#f92672">}</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>set (<span style="color:#e6db74">NUMBERS</span> <span style="color:#f92672">${</span>NUMBERS<span style="color:#f92672">}</span> <span style="color:#e6db74">3</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>message(<span style="color:#f92672">${</span>NUMBERS<span style="color:#f92672">}</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>set (<span 
style="color:#e6db74">NUMBERS</span> <span style="color:#e6db74">0</span> <span style="color:#f92672">${</span>NUMBERS<span style="color:#f92672">}</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>message(<span style="color:#f92672">${</span>NUMBERS<span style="color:#f92672">}</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span></code></pre></div><p>Re-run <code>cmake .</code> to see this in action. Each <code>set</code> builds a longer list, and <code>message</code> prints all of its arguments concatenated without separators:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>&gt; cmake . </span></span><span style="display:flex;"><span><span style="color:#ae81ff">1</span> </span></span><span style="display:flex;"><span><span style="color:#ae81ff">12</span> </span></span><span style="display:flex;"><span><span style="color:#ae81ff">123</span> </span></span><span style="display:flex;"><span><span style="color:#ae81ff">0123</span> </span></span></code></pre></div><h3 id="functions">Functions</h3> <p>In CMake, everything is a function!</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cmake" data-lang="cmake"><span style="display:flex;"><span>set (<span style="color:#e6db74">FOO</span> <span style="color:#e6db74">bar</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span></code></pre></div><p>Here, <code>set</code> is a function called with two arguments.</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cmake" data-lang="cmake"><span style="display:flex;"><span>if(<span style="color:#e6db74">CMAKE_SYSTEM_NAME</span> <span 
style="color:#e6db74">STREQUAL</span> <span style="color:#e6db74">Linux</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> <span style="color:#75715e"># ... </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>else()<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> <span style="color:#75715e"># ... </span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>endif()<span style="color:#960050;background-color:#1e0010"> </span></span></span></code></pre></div><p><code>if</code>, <code>else</code>, and <code>endif</code> are functions.</p> <p><code>add_custom_target</code> and <code>add_custom_command</code> are also functions.</p> <p>Let&rsquo;s create our own function and hide all the intricacies of our CMake script:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cmake" data-lang="cmake"><span style="display:flex;"><span>set (<span style="color:#e6db74">C_COMPILER</span> <span style="color:#e6db74">gcc</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>function(<span style="color:#e6db74">create_executable</span> <span style="color:#e6db74">name</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> add_custom_command(<span style="color:#e6db74">OUTPUT</span> <span style="color:#e6db74">hello.o</span> 
</span></span><span style="display:flex;"><span> <span style="color:#e6db74">COMMAND</span> <span style="color:#f92672">${</span>C_COMPILER<span style="color:#f92672">}</span> <span style="color:#e6db74">-c</span> <span style="color:#e6db74">hello.c</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">DEPENDS</span> <span style="color:#e6db74">hello.c</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> add_custom_command(<span style="color:#e6db74">OUTPUT</span> <span style="color:#e6db74">main.o</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">COMMAND</span> <span style="color:#f92672">${</span>C_COMPILER<span style="color:#f92672">}</span> <span style="color:#e6db74">-c</span> <span style="color:#e6db74">main.c</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">DEPENDS</span> <span style="color:#e6db74">main.c</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> add_custom_target(<span style="color:#f92672">${</span>name<span style="color:#f92672">}</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">COMMAND</span> <span style="color:#f92672">${</span>C_COMPILER<span style="color:#f92672">}</span> <span style="color:#e6db74">main.o</span> <span style="color:#e6db74">hello.o</span> <span style="color:#e6db74">-o</span> <span style="color:#e6db74">hello</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">DEPENDS</span> <span style="color:#e6db74">main.o</span> <span style="color:#e6db74">hello.o</span>)<span 
style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>endfunction()<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>create_executable(<span style="color:#e6db74">hello-target</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span></code></pre></div><p><em>Fun fact: <code>function</code> and <code>endfunction</code> are also functions.</em></p> <p>The function is now reusable, but quite useless since the source files are hardcoded. Let&rsquo;s go a bit deeper and fix this issue.</p> <h3 id="macros">Macros</h3> <p>In CMake, everything is a function! Except for macros.</p> <p>Macros are like functions, with one exception: they are inlined wherever they are called, so they do not get a variable scope of their own. 
We can extract compilation into the macro:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cmake" data-lang="cmake"><span style="display:flex;"><span>macro(<span style="color:#e6db74">compile</span> <span style="color:#e6db74">source_file</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> get_filename_component(<span style="color:#e6db74">output_file</span> <span style="color:#f92672">${</span>source_file<span style="color:#f92672">}</span> <span style="color:#e6db74">NAME_WE</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> set (<span style="color:#e6db74">output_file</span> <span style="color:#f92672">${</span>output_file<span style="color:#f92672">}</span><span style="color:#e6db74">.o</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> add_custom_command(<span style="color:#e6db74">OUTPUT</span> <span style="color:#f92672">${</span>output_file<span style="color:#f92672">}</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">COMMAND</span> <span style="color:#f92672">${</span>C_COMPILER<span style="color:#f92672">}</span> <span style="color:#e6db74">-c</span> <span style="color:#f92672">${</span>source_file<span style="color:#f92672">}</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">DEPENDS</span> <span style="color:#f92672">${</span>source_file<span style="color:#f92672">}</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span 
style="color:#960050;background-color:#1e0010"></span>endmacro()<span style="color:#960050;background-color:#1e0010"> </span></span></span></code></pre></div><p>The macro uses <code>get_filename_component</code> to cut the extension from the input source file and constructs the output file name: <code>main.c -&gt; main.o</code>.</p> <p>Now we can use this macro:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cmake" data-lang="cmake"><span style="display:flex;"><span>function(<span style="color:#e6db74">create_executable</span> <span style="color:#e6db74">name</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> compile(<span style="color:#e6db74">hello.c</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> set (<span style="color:#e6db74">output_files</span> <span style="color:#f92672">${</span>output_file<span style="color:#f92672">}</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> compile(<span style="color:#e6db74">main.c</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> set (<span style="color:#e6db74">output_files</span> <span style="color:#f92672">${</span>output_files<span style="color:#f92672">}</span> <span style="color:#f92672">${</span>output_file<span style="color:#f92672">}</span>)<span style="color:#960050;background-color:#1e0010"> 
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> add_custom_target(<span style="color:#f92672">${</span>name<span style="color:#f92672">}</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">COMMAND</span> <span style="color:#f92672">${</span>C_COMPILER<span style="color:#f92672">}</span> <span style="color:#f92672">${</span>output_files<span style="color:#f92672">}</span> <span style="color:#e6db74">-o</span> <span style="color:#e6db74">hello</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">DEPENDS</span> <span style="color:#f92672">${</span>output_files<span style="color:#f92672">}</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>endfunction()<span style="color:#960050;background-color:#1e0010"> </span></span></span></code></pre></div><p>The code looks a bit cleaner now, but there is at least one part that may look confusing: <code>set (output_files ${output_file})</code>. 
Since the body of a macro is inlined, we can rewrite this function like this (just for illustration):</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cmake" data-lang="cmake"><span style="display:flex;"><span>function(<span style="color:#e6db74">create_executable</span> <span style="color:#e6db74">name</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> get_filename_component(<span style="color:#e6db74">output_file</span> <span style="color:#e6db74">hello.c</span> <span style="color:#e6db74">NAME_WE</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> set (<span style="color:#e6db74">output_file</span> <span style="color:#f92672">${</span>output_file<span style="color:#f92672">}</span><span style="color:#e6db74">.o</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> add_custom_command(<span style="color:#e6db74">OUTPUT</span> <span style="color:#f92672">${</span>output_file<span style="color:#f92672">}</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">COMMAND</span> <span style="color:#f92672">${</span>C_COMPILER<span style="color:#f92672">}</span> <span style="color:#e6db74">-c</span> <span style="color:#e6db74">hello.c</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">DEPENDS</span> <span style="color:#e6db74">hello.c</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> set (<span 
style="color:#e6db74">output_files</span> <span style="color:#f92672">${</span>output_file<span style="color:#f92672">}</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> get_filename_component(<span style="color:#e6db74">output_file</span> <span style="color:#e6db74">main.c</span> <span style="color:#e6db74">NAME_WE</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> set (<span style="color:#e6db74">output_file</span> <span style="color:#f92672">${</span>output_file<span style="color:#f92672">}</span><span style="color:#e6db74">.o</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> add_custom_command(<span style="color:#e6db74">OUTPUT</span> <span style="color:#f92672">${</span>output_file<span style="color:#f92672">}</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">COMMAND</span> <span style="color:#f92672">${</span>C_COMPILER<span style="color:#f92672">}</span> <span style="color:#e6db74">-c</span> <span style="color:#e6db74">main.c</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">DEPENDS</span> <span style="color:#e6db74">main.c</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> set (<span style="color:#e6db74">output_files</span> <span style="color:#f92672">${</span>output_files<span style="color:#f92672">}</span> <span style="color:#f92672">${</span>output_file<span 
style="color:#f92672">}</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> add_custom_target(<span style="color:#f92672">${</span>name<span style="color:#f92672">}</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">COMMAND</span> <span style="color:#f92672">${</span>C_COMPILER<span style="color:#f92672">}</span> <span style="color:#f92672">${</span>output_files<span style="color:#f92672">}</span> <span style="color:#e6db74">-o</span> <span style="color:#e6db74">hello</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">DEPENDS</span> <span style="color:#f92672">${</span>output_files<span style="color:#f92672">}</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>endfunction()<span style="color:#960050;background-color:#1e0010"> </span></span></span></code></pre></div><p>So basically, we reuse the variable <code>output_file</code>. We can use it to construct the list of object files for the custom target. 
I hope it is clearer now.</p> <h3 id="loops">Loops</h3> <p>It obviously follows (c) that we can use a loop to handle a variable number of source files passed to this function:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cmake" data-lang="cmake"><span style="display:flex;"><span>function(<span style="color:#e6db74">create_executable</span> <span style="color:#e6db74">name</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> foreach(<span style="color:#e6db74">file</span> <span style="color:#f92672">${</span>ARGN<span style="color:#f92672">}</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> compile(<span style="color:#f92672">${</span>file<span style="color:#f92672">}</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> set (<span style="color:#e6db74">output_files</span> <span style="color:#f92672">${</span>output_files<span style="color:#f92672">}</span> <span style="color:#f92672">${</span>output_file<span style="color:#f92672">}</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> endforeach()<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> add_custom_target(<span style="color:#f92672">${</span>name<span style="color:#f92672">}</span> 
</span></span><span style="display:flex;"><span> <span style="color:#e6db74">COMMAND</span> <span style="color:#f92672">${</span>C_COMPILER<span style="color:#f92672">}</span> <span style="color:#f92672">${</span>output_files<span style="color:#f92672">}</span> <span style="color:#e6db74">-o</span> <span style="color:#e6db74">hello</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">DEPENDS</span> <span style="color:#f92672">${</span>output_files<span style="color:#f92672">}</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>endfunction()<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>create_executable(<span style="color:#e6db74">hello-target</span> <span style="color:#e6db74">main.c</span> <span style="color:#e6db74">hello.c</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span></code></pre></div><p>Here we iterate over the source files passed after <code>name</code> (<code>main.c</code>, <code>hello.c</code>), which CMake stores in the <code>ARGN</code> variable, and accumulate all the intermediate files in <code>output_files</code>.</p> <h3 id="final-touches">Final touches</h3> <p>I changed three more things in the final version:</p> <ul> <li>a new variable <code>C_FLAGS</code> stores additional compile flags one may need</li> <li>the name of the executable is passed as a separate argument</li> <li>the linking phase is extracted into a separate command</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cmake" data-lang="cmake"><span style="display:flex;"><span>set (<span 
style="color:#e6db74">C_COMPILER</span> <span style="color:#e6db74">gcc</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>set (<span style="color:#e6db74">C_FLAGS</span> <span style="color:#e6db74">-g</span> <span style="color:#e6db74">-O0</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>macro(<span style="color:#e6db74">compile</span> <span style="color:#e6db74">source_file</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> get_filename_component(<span style="color:#e6db74">output_file</span> <span style="color:#f92672">${</span>source_file<span style="color:#f92672">}</span> <span style="color:#e6db74">NAME_WE</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> set (<span style="color:#e6db74">output_file</span> <span style="color:#f92672">${</span>output_file<span style="color:#f92672">}</span><span style="color:#e6db74">.o</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> add_custom_command(<span style="color:#e6db74">OUTPUT</span> <span style="color:#f92672">${</span>output_file<span style="color:#f92672">}</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">COMMAND</span> <span style="color:#f92672">${</span>C_COMPILER<span style="color:#f92672">}</span> <span 
style="color:#f92672">${</span>C_FLAGS<span style="color:#f92672">}</span> <span style="color:#e6db74">-c</span> <span style="color:#f92672">${</span>source_file<span style="color:#f92672">}</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">DEPENDS</span> <span style="color:#f92672">${</span>source_file<span style="color:#f92672">}</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>endmacro()<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>function(<span style="color:#e6db74">create_executable</span> <span style="color:#e6db74">name</span> <span style="color:#e6db74">exe</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> foreach(<span style="color:#e6db74">file</span> <span style="color:#f92672">${</span>ARGN<span style="color:#f92672">}</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> compile(<span style="color:#f92672">${</span>file<span style="color:#f92672">}</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> set (<span style="color:#e6db74">output_files</span> <span style="color:#f92672">${</span>output_files<span style="color:#f92672">}</span> <span style="color:#f92672">${</span>output_file<span style="color:#f92672">}</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span 
style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> endforeach()<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> add_custom_command(<span style="color:#e6db74">OUTPUT</span> <span style="color:#f92672">${</span>exe<span style="color:#f92672">}</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">COMMAND</span> <span style="color:#f92672">${</span>C_COMPILER<span style="color:#f92672">}</span> <span style="color:#f92672">${</span>output_files<span style="color:#f92672">}</span> <span style="color:#e6db74">-o</span> <span style="color:#f92672">${</span>exe<span style="color:#f92672">}</span> </span></span><span style="display:flex;"><span> <span style="color:#e6db74">DEPENDS</span> <span style="color:#f92672">${</span>output_files<span style="color:#f92672">}</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> add_custom_target(<span style="color:#f92672">${</span>name<span style="color:#f92672">}</span> <span style="color:#e6db74">DEPENDS</span> <span style="color:#f92672">${</span>exe<span style="color:#f92672">}</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>endfunction()<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span 
style="color:#960050;background-color:#1e0010"></span>create_executable(<span style="color:#e6db74">hello-target</span> <span style="color:#e6db74">hello</span> <span style="color:#e6db74">main.c</span> <span style="color:#e6db74">hello.c</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span></code></pre></div><p>Give it another try:</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>&gt; cmake . </span></span><span style="display:flex;"><span>&gt; make hello-target </span></span><span style="display:flex;"><span>&gt; ./hello </span></span><span style="display:flex;"><span>Hello, CMake world! </span></span></code></pre></div><h3 id="conclusion">Conclusion</h3> <p>We&rsquo;ve just replicated a (limited) version of CMake&rsquo;s <a href="https://cmake.org/cmake/help/latest/command/add_executable.html"><code>add_executable</code></a> functionality.</p> <p>Here is the version you would use if you didn&rsquo;t know how to build the thing on your own (note that <code>CMAKE_C_FLAGS</code> is a single space-separated string rather than a list, hence the quotes):</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cmake" data-lang="cmake"><span style="display:flex;"><span>set (<span style="color:#e6db74">CMAKE_C_COMPILER</span> <span style="color:#e6db74">gcc</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>set (<span style="color:#e6db74">CMAKE_C_FLAGS</span> <span style="color:#e6db74">&#34;-g -O0&#34;</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span 
style="color:#960050;background-color:#1e0010"></span>add_executable(<span style="color:#e6db74">hello</span> <span style="color:#e6db74">main.c</span> <span style="color:#e6db74">hello.c</span>)<span style="color:#960050;background-color:#1e0010"> </span></span></span></code></pre></div><h3 id="whats-next">What&rsquo;s next?</h3> <p>Go and learn about other CMake <a href="https://cmake.org/cmake/help/latest/manual/cmake-commands.7.html">functions</a> (that are confusingly called <em>commands</em>), you are ready now!</p> <p>I would highly recommend learning about the following concepts:</p> <ul> <li><a href="https://cmake.org/cmake/help/latest/command/add_subdirectory.html">add_subdirectory</a></li> <li><a href="https://cmake.org/cmake/help/latest/command/include.html">include</a></li> <li><a href="https://cmake.org/cmake/help/latest/manual/cmake-properties.7.html">properties</a> <ul> <li><a href="https://cmake.org/cmake/help/latest/command/get_property.html">get_property</a></li> <li><a href="https://cmake.org/cmake/help/latest/command/set_property.html">set_property</a></li> </ul> </li> <li><a href="https://cmake.org/cmake/help/latest/index.html">All the rest</a></li> </ul>
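<p>To give a taste of properties, here is a minimal sketch. It assumes the same <code>main.c</code> and <code>hello.c</code> as above; the variable name <code>hello_options</code> and the choice of the <code>COMPILE_OPTIONS</code> property are mine, picked purely for illustration:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cmake" data-lang="cmake">cmake_minimum_required(VERSION 3.10)
project(hello C)

add_executable(hello main.c hello.c)

# Attach compile options to this one target.
# COMPILE_OPTIONS is a regular CMake list, so the flags are separate arguments.
set_property(TARGET hello PROPERTY COMPILE_OPTIONS -g -O0)

# Read the property back into a variable (hello_options is a hypothetical name)
# and print it at configure time.
get_property(hello_options TARGET hello PROPERTY COMPILE_OPTIONS)
message(STATUS "hello is compiled with: ${hello_options}")
</code></pre></div>
<p>Unlike the global <code>CMAKE_C_FLAGS</code> variable, a property is attached to a single target, so other targets in the same project are not affected.</p>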