<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.3.1">Jekyll</generator><link href="/feed.xml" rel="self" type="application/atom+xml" /><link href="/" rel="alternate" type="text/html" /><updated>2026-01-13T14:51:04+00:00</updated><id>/feed.xml</id><title type="html">rfdavid</title><subtitle>AI | Machine Learning | Graph Neural Networks </subtitle><entry><title type="html">Exploring Kùzu Graph Database Management System code</title><link href="/exploring-kuzu-graph-database/" rel="alternate" type="text/html" title="Exploring Kùzu Graph Database Management System code" /><published>2023-02-22T00:00:00+00:00</published><updated>2023-02-22T00:00:00+00:00</updated><id>/exploring-kuzu-graph-database</id><content type="html" xml:base="/exploring-kuzu-graph-database/">&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://kuzudb.com&quot;&gt;Kùzu&lt;/a&gt; is a Graph Database Management System born
out of several years of research at the University of Waterloo.
Kùzu is highly optimized to handle complex, join-heavy
analytical workloads on very large databases. In that respect, it is similar to what
&lt;a href=&quot;https://duckdb.org/&quot;&gt;DuckDB&lt;/a&gt; is doing for SQL. It is extremely useful when you
need to model data from different sources as a graph and store it in one
place for fast extraction in analytics. Kùzu integrates with PyTorch
Geometric, making it easy to extract graph data and feed it into your PyG models
to perform a GNN task.
This article contains the notes I took when I started exploring how the Kùzu database
works. I took a ‘depth-limited search’ approach, exploring the code by first
going to the CLI and running a simple query. I used LLDB to step through it and learn
more about the overall design of the database.&lt;/p&gt;

&lt;h2 id=&quot;starting-from-the-embedded-shell&quot;&gt;Starting from the embedded shell&lt;/h2&gt;

&lt;p&gt;Starting from the CLI tool, the goal is to trace what happens
internally, from initialization to the execution of a MATCH query.&lt;/p&gt;

&lt;p&gt;Kùzu uses the &lt;a href=&quot;https://github.com/Taywee/args&quot;&gt;args library&lt;/a&gt; (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;#include &quot;args.hxx&quot;&lt;/code&gt;) to parse command-line arguments.
For instance, the database path (the -i parameter) and the buffer pool size in MB are
retrieved with:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;&lt;span class=&quot;k&quot;&gt;auto&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;databasePath&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;args&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inputDirFlag&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;uint64_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bpSizeInMB&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;args&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bpSizeInMBFlag&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

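&lt;p&gt;The buffer pool size defaults to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-1u&lt;/code&gt;, an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unsigned int&lt;/code&gt; with all bits set:
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;uint64_t bpSizeInBytes = -1u;&lt;/code&gt;.&lt;/p&gt;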
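
&lt;p&gt;Since &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-1u&lt;/code&gt; has type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unsigned int&lt;/code&gt;, storing it in a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;uint64_t&lt;/code&gt; gives 4294967295 (UINT_MAX) on platforms with a 32-bit &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unsigned int&lt;/code&gt;, not the all-ones 64-bit value. The snippet below is a minimal sketch of that behavior, assuming the value is used as an “unset” sentinel that is later compared against &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-1u&lt;/code&gt;. The names &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;kUnsetBufferPoolSize&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;isBufferPoolSizeUnset&lt;/code&gt; are hypothetical.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;#include &amp;lt;climits&amp;gt;
#include &amp;lt;cstdint&amp;gt;

// Hypothetical sentinel mirroring &quot;uint64_t bpSizeInBytes = -1u;&quot;.
// -1u is an unsigned int, so the stored value is UINT_MAX widened to 64 bits.
constexpr std::uint64_t kUnsetBufferPoolSize = -1u;

// Hypothetical helper: true when no buffer pool size was given.
// The comparison widens -1u the same way, so the sentinel round-trips.
constexpr bool isBufferPoolSizeUnset(std::uint64_t bpSizeInBytes) {
    return bpSizeInBytes == -1u;
}

static_assert(kUnsetBufferPoolSize == UINT_MAX, &quot;-1u widens to UINT_MAX&quot;);
static_assert(isBufferPoolSizeUnset(kUnsetBufferPoolSize), &quot;sentinel matches itself&quot;);

// Assumes unsigned int is narrower than 64 bits (true on mainstream platforms).
static_assert(UINT_MAX &amp;lt; UINT64_MAX, &quot;unsigned int is narrower than uint64_t&quot;);
// A plain -1 converts to UINT64_MAX instead, which is NOT the -1u sentinel.
static_assert(!isBufferPoolSizeUnset(static_cast&amp;lt;std::uint64_t&amp;gt;(-1)), &quot;-1 differs from -1u&quot;);&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Because the initialization and the comparison go through the same widening conversion, spelling the sentinel as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-1u&lt;/code&gt; on both sides keeps it consistent. Mixing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-1&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-1u&lt;/code&gt; would not.&lt;/p&gt;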

&lt;h3 id=&quot;systemconfig&quot;&gt;SystemConfig&lt;/h3&gt;

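&lt;p&gt;In tools/shell/shell_runner.cpp, the configuration is created with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SystemConfig systemConfig(bpSizeInBytes);&lt;/code&gt;.&lt;/p&gt;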

&lt;p&gt;The SystemConfig constructor (in database.cpp) computes the following values:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;systemMemSize&lt;/strong&gt;: the total physical memory of the system, obtained by multiplying
                the number of pages of physical memory by the size of a page in bytes. Both
                values are retrieved with sysconf() from unistd.h.&lt;/li&gt;
&lt;/ul&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;&lt;span class=&quot;n&quot;&gt;database&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cpp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;
   &lt;span class=&quot;mi&quot;&gt;24&lt;/span&gt;           &lt;span class=&quot;k&quot;&gt;auto&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;systemMemSize&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;25&lt;/span&gt;               &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;uint64_t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sysconf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_SC_PHYS_PAGES&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;uint64_t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sysconf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_SC_PAGESIZE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lldb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;p&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;systemMemSize&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;unsigned&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;9&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;34359738368&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;_SC_PHYS_PAGES : the number of pages of physical memory
_SC_PAGESIZE : size of a page in bytes
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
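
&lt;p&gt;For reference, the same calculation can be written as a small standalone function. This is a minimal sketch assuming Linux or macOS, where &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;_SC_PHYS_PAGES&lt;/code&gt; is available even though POSIX does not require it. The function name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;totalPhysicalMemoryBytes&lt;/code&gt; and the choice to return 0 on failure are hypothetical.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;#include &amp;lt;cstdint&amp;gt;
#include &amp;lt;unistd.h&amp;gt;

// Total physical memory = number of physical pages * page size in bytes.
std::uint64_t totalPhysicalMemoryBytes() {
    long pages = sysconf(_SC_PHYS_PAGES);
    long pageSize = sysconf(_SC_PAGESIZE);
    // sysconf returns -1 on error or when the value is indeterminate.
    if (pages &amp;lt;= 0 || pageSize &amp;lt;= 0) {
        return 0;
    }
    // Multiply in 64 bits so the product cannot overflow a 32-bit long.
    return static_cast&amp;lt;std::uint64_t&amp;gt;(pages) * static_cast&amp;lt;std::uint64_t&amp;gt;(pageSize);
}&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;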

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;bufferPoolSize&lt;/strong&gt;: the smaller of the system memory size and UINTPTR_MAX, multiplied
by the default buffer pool ratio (DEFAULT_BUFFER_POOL_RATIO). UINTPTR_MAX is the largest value a uintptr_t can hold, so
on a 32-bit build it caps the pool at the addressable range. StorageConfig
is defined in include/common/configs.h and holds many of the default values used throughout the system.&lt;/li&gt;
&lt;/ul&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;26&lt;/span&gt;           &lt;span class=&quot;n&quot;&gt;bufferPoolSize&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;uint64_t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;StorageConfig&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DEFAULT_BUFFER_POOL_RATIO&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
   &lt;span class=&quot;mi&quot;&gt;27&lt;/span&gt;                                       &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;double_t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;min&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;systemMemSize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;uint64_t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;UINTPTR_MAX&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;defaultPageBufferPoolSize and largePageBufferPoolSize&lt;/strong&gt;: bufferPoolSize
multiplied by the ratios defined for default pages (DEFAULT_PAGES_BUFFER_RATIO) and large pages (LARGE_PAGES_BUFFER_RATIO), respectively.&lt;/li&gt;
&lt;/ul&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;   &lt;span class=&quot;mi&quot;&gt;29&lt;/span&gt;       &lt;span class=&quot;n&quot;&gt;defaultPageBufferPoolSize&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;30&lt;/span&gt;           &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;uint64_t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;double_t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bufferPoolSize&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StorageConfig&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DEFAULT_PAGES_BUFFER_RATIO&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
   &lt;span class=&quot;mi&quot;&gt;31&lt;/span&gt;       &lt;span class=&quot;n&quot;&gt;largePageBufferPoolSize&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
   &lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;           &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;uint64_t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;double_t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bufferPoolSize&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StorageConfig&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;LARGE_PAGES_BUFFER_RATIO&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;&lt;span class=&quot;n&quot;&gt;include&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;common&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;configs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;h&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;StorageConfig&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// The default ratio of system memory allocated to buffer pools (including default and large).&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;constexpr&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;double&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DEFAULT_BUFFER_POOL_RATIO&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// The default ratio of buffer allocated to default and large pages.&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;constexpr&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;double&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DEFAULT_PAGES_BUFFER_RATIO&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.75&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;constexpr&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;double&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;LARGE_PAGES_BUFFER_RATIO&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1.0&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DEFAULT_PAGES_BUFFER_RATIO&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;omitted&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;

&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lldb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;p&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;largePageBufferPoolSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1024&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1024&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1024&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;unsigned&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;28&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lldb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;p&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;defaultPageBufferPoolSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1024&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1024&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1024&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;unsigned&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;29&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;19&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
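
&lt;p&gt;These numbers follow directly from the ratios. A quick way to check is to replay the arithmetic for the 34359738368-byte (32 GiB) machine shown earlier. The sketch below copies the constants from the StorageConfig excerpt; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BufferPoolSizes&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;computeBufferPoolSizes&lt;/code&gt; are hypothetical names.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;#include &amp;lt;algorithm&amp;gt;
#include &amp;lt;cstdint&amp;gt;

// Ratios copied from the StorageConfig excerpt.
constexpr double kDefaultBufferPoolRatio = 0.8;
constexpr double kDefaultPagesBufferRatio = 0.75;
constexpr double kLargePagesBufferRatio = 1.0 - kDefaultPagesBufferRatio;

struct BufferPoolSizes {
    std::uint64_t defaultPages;
    std::uint64_t largePages;
};

// Replays the SystemConfig arithmetic; casts to uint64_t truncate toward zero.
BufferPoolSizes computeBufferPoolSizes(std::uint64_t systemMemSize) {
    auto capped = std::min(systemMemSize, static_cast&amp;lt;std::uint64_t&amp;gt;(UINTPTR_MAX));
    auto bufferPoolSize =
        static_cast&amp;lt;std::uint64_t&amp;gt;(kDefaultBufferPoolRatio * static_cast&amp;lt;double&amp;gt;(capped));
    return {
        static_cast&amp;lt;std::uint64_t&amp;gt;(static_cast&amp;lt;double&amp;gt;(bufferPoolSize) * kDefaultPagesBufferRatio),
        static_cast&amp;lt;std::uint64_t&amp;gt;(static_cast&amp;lt;double&amp;gt;(bufferPoolSize) * kLargePagesBufferRatio)};
}&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;For 34359738368 bytes, bufferPoolSize is 27487790694, which splits into 20615843020 bytes for default pages (19 GiB after integer division) and 6871947673 bytes for large pages (6 GiB). This matches the GiB values above and the byte values lldb prints for systemConfig further down.&lt;/p&gt;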

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;maxNumThreads&lt;/strong&gt;: the number of concurrent threads supported by the available
hardware. This number is only a hint and might not be accurate.&lt;/li&gt;
&lt;/ul&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lldb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;p&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;maxNumThreads&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;uint64_t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;30&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;12&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
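
&lt;p&gt;The “only a hint” caveat matches how the C++ standard library describes &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;std::thread::hardware_concurrency()&lt;/code&gt;, which is the likely source of this value. The excerpts above do not show the call, so treat that as an assumption. That function may also return 0 when the value cannot be determined, so a caller that needs at least one worker can clamp it. The helper below and its fallback of 1 are hypothetical, not necessarily what Kùzu does.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;#include &amp;lt;cstdint&amp;gt;
#include &amp;lt;thread&amp;gt;

// Hypothetical helper: hardware_concurrency() is only a hint and may return 0.
std::uint64_t defaultMaxNumThreads() {
    unsigned int hinted = std::thread::hardware_concurrency();
    return hinted == 0 ? 1 : hinted;
}&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;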

&lt;h3 id=&quot;embedded-shell&quot;&gt;Embedded Shell&lt;/h3&gt;

&lt;p&gt;Next, shell_runner.cpp creates an instance of EmbeddedShell (tools/shell/embedded_shell.cpp):&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;&lt;span class=&quot;n&quot;&gt;tools&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shell&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shell_runner&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cpp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;33&lt;/span&gt;           &lt;span class=&quot;k&quot;&gt;auto&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;shell&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;EmbeddedShell&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;databasePath&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;systemConfig&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;tools/shell/embedded_shell.cpp:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;   &lt;span class=&quot;mi&quot;&gt;201&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;EmbeddedShell&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;EmbeddedShell&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;databasePath&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SystemConfig&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;systemConfig&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;202&lt;/span&gt;      &lt;span class=&quot;n&quot;&gt;linenoiseHistoryLoad&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;HISTORY_PATH&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
   &lt;span class=&quot;mi&quot;&gt;203&lt;/span&gt;      &lt;span class=&quot;n&quot;&gt;linenoiseSetCompletionCallback&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;completion&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
   &lt;span class=&quot;mi&quot;&gt;204&lt;/span&gt;      &lt;span class=&quot;n&quot;&gt;linenoiseSetHighlightCallback&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;highlight&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
   &lt;span class=&quot;mi&quot;&gt;205&lt;/span&gt;      &lt;span class=&quot;n&quot;&gt;database&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;make_unique&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Database&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;databasePath&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;systemConfig&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
   &lt;span class=&quot;mi&quot;&gt;206&lt;/span&gt;      &lt;span class=&quot;n&quot;&gt;conn&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;make_unique&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Connection&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;database&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;());&lt;/span&gt;
   &lt;span class=&quot;mi&quot;&gt;207&lt;/span&gt;      &lt;span class=&quot;n&quot;&gt;updateTableNames&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
   &lt;span class=&quot;mi&quot;&gt;208&lt;/span&gt;  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The embedded shell is initialized with the databasePath passed on the command line and
the systemConfig computed earlier:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lldb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;p&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;systemConfig&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kuzu&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SystemConfig&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;31&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;defaultPageBufferPoolSize&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;20615843020&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;largePageBufferPoolSize&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;6871947673&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;maxNumThreads&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;12&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/antirez/linenoise&quot;&gt;linenoise&lt;/a&gt; is a lightweight line-editing
library that provides single- and multi-line editing modes, history handling, completion,
and hints as you type, among other features.
It is used in Redis, MongoDB and Android. The library is embedded in the
codebase (tools/shell/linenoise.cpp). I won’t get into the details of
the linenoise configuration.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;205&lt;/span&gt;      &lt;span class=&quot;n&quot;&gt;database&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;make_unique&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Database&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;databasePath&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;systemConfig&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;206&lt;/span&gt;      &lt;span class=&quot;n&quot;&gt;conn&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;make_unique&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Connection&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;database&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;());&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;database and conn are both declared as private members in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;embedded_shell.h&lt;/code&gt;:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;&lt;span class=&quot;nl&quot;&gt;private:&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unique_ptr&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Database&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;database&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unique_ptr&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Connection&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;conn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
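
&lt;p&gt;Because the connection is constructed from a raw pointer (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;database.get()&lt;/code&gt;), the database must outlive it. The declaration order handles that: C++ destroys non-static members in reverse declaration order, so &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;conn&lt;/code&gt; is destroyed before &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;database&lt;/code&gt;. Here is a minimal sketch with hypothetical stand-in types (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FakeDatabase&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FakeConnection&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FakeShell&lt;/code&gt;). It assumes the connection keeps that pointer without owning it, and it uses default member initializers instead of the assignments in the real constructor.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;#include &amp;lt;memory&amp;gt;
#include &amp;lt;string&amp;gt;
#include &amp;lt;vector&amp;gt;

// Records the order in which the stand-in objects are destroyed.
std::vector&amp;lt;std::string&amp;gt; destructionLog;

struct FakeDatabase {
    ~FakeDatabase() { destructionLog.push_back(&quot;database&quot;); }
};

// Assumed to behave like Connection: it is given a raw pointer and does not own it.
struct FakeConnection {
    explicit FakeConnection(FakeDatabase* db) : database{db} {}
    ~FakeConnection() { destructionLog.push_back(&quot;connection&quot;); }
    FakeDatabase* database;
};

struct FakeShell {
    // Same declaration order as embedded_shell.h: database first, then conn.
    // Members are initialized in this order and destroyed in reverse order.
    std::unique_ptr&amp;lt;FakeDatabase&amp;gt; database = std::make_unique&amp;lt;FakeDatabase&amp;gt;();
    std::unique_ptr&amp;lt;FakeConnection&amp;gt; conn =
        std::make_unique&amp;lt;FakeConnection&amp;gt;(database.get());
};&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;If the two members were declared in the opposite order, the database would be destroyed first and the connection would hold a dangling pointer while it is being torn down.&lt;/p&gt;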

&lt;p&gt;Lines 205 and 206 create the database and open a connection to it,
respectively. Before getting into the connection in the next section, I’ll take a
look at &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;updateTableNames()&lt;/code&gt;, since it is the first place where we deal with the catalog, which holds
the database schema.&lt;/p&gt;

&lt;h3 id=&quot;updatetablenames&quot;&gt;updateTableNames()&lt;/h3&gt;

&lt;p&gt;There are two types of tables: node tables and relationship (rel) tables. updateTableNames() fetches
the schemas of both from database-&amp;gt;catalog and stores their table names. In my database, I
have the “person” and “animal” node tables and the “hasOwner” and “knows” rel
tables:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;&lt;span class=&quot;n&quot;&gt;tools&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shell&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;embedded_shell&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cpp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;

   &lt;span class=&quot;mi&quot;&gt;67&lt;/span&gt;   &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;EmbeddedShell&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;updateTableNames&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
   &lt;span class=&quot;mi&quot;&gt;68&lt;/span&gt;       &lt;span class=&quot;n&quot;&gt;nodeTableNames&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;clear&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
   &lt;span class=&quot;mi&quot;&gt;69&lt;/span&gt;       &lt;span class=&quot;n&quot;&gt;relTableNames&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;clear&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;70&lt;/span&gt;       &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;auto&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tableSchema&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;database&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;catalog&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;getReadOnlyVersion&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;getNodeTableSchemas&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
   &lt;span class=&quot;mi&quot;&gt;71&lt;/span&gt;           &lt;span class=&quot;n&quot;&gt;nodeTableNames&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;push_back&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tableSchema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;second&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tableName&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
   &lt;span class=&quot;mi&quot;&gt;72&lt;/span&gt;       &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
   &lt;span class=&quot;mi&quot;&gt;73&lt;/span&gt;       &lt;span class=&quot;nf&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;auto&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tableSchema&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;database&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;catalog&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;getReadOnlyVersion&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;getRelTableSchemas&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
   &lt;span class=&quot;mi&quot;&gt;74&lt;/span&gt;           &lt;span class=&quot;n&quot;&gt;relTableNames&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;push_back&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tableSchema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;second&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tableName&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
   &lt;span class=&quot;mi&quot;&gt;75&lt;/span&gt;       &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
   &lt;span class=&quot;mi&quot;&gt;76&lt;/span&gt;   &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
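
&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tableSchema.second-&amp;gt;tableName&lt;/code&gt; access suggests the catalog returns an associative container of (table ID, schema pointer) pairs. Under that assumption, the loop pattern can be sketched in isolation. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TableSchema&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SchemaMap&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;collectTableNames&lt;/code&gt; are hypothetical. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;std::map&lt;/code&gt; is used only to make the iteration order deterministic; Kùzu’s actual container may be unordered.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;#include &amp;lt;cstdint&amp;gt;
#include &amp;lt;map&amp;gt;
#include &amp;lt;memory&amp;gt;
#include &amp;lt;string&amp;gt;
#include &amp;lt;vector&amp;gt;

// Hypothetical stand-in for a catalog schema; only the name is used here.
struct TableSchema {
    std::string tableName;
};

// Assumed shape: table ID -&amp;gt; owned schema, hence the &quot;.second-&amp;gt;&quot; access.
using SchemaMap = std::map&amp;lt;std::uint64_t, std::unique_ptr&amp;lt;TableSchema&amp;gt;&amp;gt;;

std::vector&amp;lt;std::string&amp;gt; collectTableNames(const SchemaMap&amp;amp; schemas) {
    std::vector&amp;lt;std::string&amp;gt; names;
    names.reserve(schemas.size());
    for (const auto&amp;amp; tableSchema : schemas) {
        names.push_back(tableSchema.second-&amp;gt;tableName);
    }
    return names;
}&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;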

&lt;p&gt;lldb output:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lldb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;p&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nodeTableNames&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;basic_string&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;char_traits&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;allocator&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;allocator&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;basic_string&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;char_traits&lt;/span&gt;&lt;span 
class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;allocator&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;41&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;person&quot;&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;animal&quot;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lldb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;p&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;relTableNames&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;basic_string&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;char_traits&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;allocator&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;allocator&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;basic_string&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;char_traits&lt;/span&gt;&lt;span 
class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;allocator&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;42&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;hasOwner&quot;&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;knows&quot;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;h3 id=&quot;connection-srcmainconnectioncpp&quot;&gt;Connection (src/main/connection.cpp)&lt;/h3&gt;

&lt;p&gt;A Connection is used to interact with a Database instance. Each Connection is thread-safe, and multiple connections can connect to the same Database instance in a multi-threaded environment.
The API descriptions below were extracted from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;src/include/main/connection.h&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;Creates a connection to the database.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;&lt;span class=&quot;n&quot;&gt;KUZU_API&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;explicit&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Connection&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Database&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;database&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Destroys the connection.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;&lt;span class=&quot;n&quot;&gt;KUZU_API&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;~&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Connection&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Manually starts a new read-only transaction in the current connection.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;&lt;span class=&quot;n&quot;&gt;KUZU_API&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;beginReadOnlyTransaction&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Manually starts a new write transaction in the current connection.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;&lt;span class=&quot;n&quot;&gt;KUZU_API&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;beginWriteTransaction&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Manually commits the current transaction.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;&lt;span class=&quot;n&quot;&gt;KUZU_API&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;commit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Manually rolls back the current transaction.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;&lt;span class=&quot;n&quot;&gt;KUZU_API&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;rollback&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Sets the maximum number of threads to use for execution in the current connection.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;&lt;span class=&quot;n&quot;&gt;KUZU_API&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;setMaxNumThreadForExec&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;uint64_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;numThreads&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Returns the maximum number of threads to use for execution in the current connection.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;&lt;span class=&quot;n&quot;&gt;KUZU_API&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;uint64_t&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;getMaxNumThreadForExec&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Executes the given query and returns the result.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;&lt;span class=&quot;n&quot;&gt;KUZU_API&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unique_ptr&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;QueryResult&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Prepares the given query and returns the prepared statement.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;&lt;span class=&quot;n&quot;&gt;KUZU_API&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unique_ptr&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PreparedStatement&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;prepare&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Executes the given prepared statement, binding each (parameter name, value) pair passed in args, and returns the result.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;&lt;span class=&quot;n&quot;&gt;KUZU_API&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;template&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;typename&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;...&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Args&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;&amp;gt;&lt;/span&gt;
&lt;span class=&quot;kr&quot;&gt;inline&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unique_ptr&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;QueryResult&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;execute&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;PreparedStatement&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;preparedStatement&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pair&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Args&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;...&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;args&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unordered_map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shared_ptr&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;common&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;inputParameters&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;executeWithParams&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;preparedStatement&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;inputParameters&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;args&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;...);&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Executes the given prepared statement with inputParams and returns the result.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;&lt;span class=&quot;n&quot;&gt;KUZU_API&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unique_ptr&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;QueryResult&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;executeWithParams&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PreparedStatement&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;preparedStatement&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unordered_map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shared_ptr&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;common&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;inputParams&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Returns all node table names as a single string.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;&lt;span class=&quot;n&quot;&gt;KUZU_API&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;string&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;getNodeTableNames&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Returns all rel table names as a single string.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;&lt;span class=&quot;n&quot;&gt;KUZU_API&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;string&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;getRelTableNames&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Returns the property names of the given node table.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;&lt;span class=&quot;n&quot;&gt;KUZU_API&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;string&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;getNodePropertyNames&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tableName&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Returns the property names of the given rel table.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;&lt;span class=&quot;n&quot;&gt;KUZU_API&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;string&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;getRelPropertyNames&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;relTableName&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

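&lt;p&gt;If you are wondering what &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;KUZU_API&lt;/code&gt; is: it is a macro that marks declarations as part of Kùzu’s exported public API. It annotates types as well as functions. For example, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DataTypeID&lt;/code&gt; enum is declared with it in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;src/include/common/types/types.h&lt;/code&gt;:&lt;/p&gt;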

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;&lt;span class=&quot;n&quot;&gt;KUZU_API&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;enum&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypeID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;uint8_t&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;ANY&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;NODE&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;REL&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;11&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;// physical types&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;// fixed size types&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;BOOL&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;22&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;INT64&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;23&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;DOUBLE&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;24&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;DATE&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;25&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;TIMESTAMP&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;26&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;INTERVAL&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;27&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;INTERNAL_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;40&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;// variable size types&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;LIST&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;52&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
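
&lt;p&gt;To make the role of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;KUZU_API&lt;/code&gt; more concrete, the sketch below shows how an export macro of this kind is commonly defined with compiler-specific attributes. It applies the macro to two small helpers over a copy of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DataTypeID&lt;/code&gt; values listed above. The macro &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXAMPLE_API&lt;/code&gt;, the namespace &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sketch&lt;/code&gt;, and the helpers &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;isVariableSizeType&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dataTypeIDName&lt;/code&gt; are hypothetical names for illustration and are not part of Kùzu. Real export macros usually also switch to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;__declspec(dllimport)&lt;/code&gt; when a library is consumed rather than built; that detail is omitted here.&lt;/p&gt;

```cpp
// Hypothetical export macro in the spirit of KUZU_API; not Kùzu's real definition.
#if defined(_WIN32)
#define EXAMPLE_API __declspec(dllexport)
#else
#define EXAMPLE_API __attribute__((visibility("default")))
#endif

namespace sketch {

// Values copied from the DataTypeID enum above. unsigned char stands in
// for uint8_t so the snippet compiles without any #include.
enum DataTypeID : unsigned char {
    ANY = 0,
    NODE = 10,
    REL = 11,

    // fixed size types
    BOOL = 22,
    INT64 = 23,
    DOUBLE = 24,
    DATE = 25,
    TIMESTAMP = 26,
    INTERVAL = 27,

    INTERNAL_ID = 40,

    // variable size types
    STRING = 50,
    LIST = 52,
};

// Hypothetical helper: true only for the IDs grouped under
// "variable size types" in the original enum.
EXAMPLE_API constexpr bool isVariableSizeType(DataTypeID id) {
    return id == STRING || id == LIST;
}

// Hypothetical helper: maps an ID back to its enumerator name.
EXAMPLE_API constexpr const char* dataTypeIDName(DataTypeID id) {
    switch (id) {
    case ANY: return "ANY";
    case NODE: return "NODE";
    case REL: return "REL";
    case BOOL: return "BOOL";
    case INT64: return "INT64";
    case DOUBLE: return "DOUBLE";
    case DATE: return "DATE";
    case TIMESTAMP: return "TIMESTAMP";
    case INTERVAL: return "INTERVAL";
    case INTERNAL_ID: return "INTERNAL_ID";
    case STRING: return "STRING";
    case LIST: return "LIST";
    }
    return "UNKNOWN"; // only reached for values outside the enumerators
}

} // namespace sketch
```

&lt;p&gt;Both helpers are &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;constexpr&lt;/code&gt;, so a check such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;static_assert(sketch::isVariableSizeType(sketch::STRING))&lt;/code&gt; is evaluated at compile time. The split between fixed-size and variable-size IDs simply mirrors the comments in the original enum.&lt;/p&gt;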

&lt;h2 id=&quot;starting-from-c-api&quot;&gt;Starting from C++ API&lt;/h2&gt;

&lt;p&gt;Next, I explore the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;COPY&lt;/code&gt; command through the C++ API, using the existing
example in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;examples/cpp/main.cpp&lt;/code&gt;. To compile it, add
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;add_subdirectory(examples/cpp)&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CMakeLists.txt&lt;/code&gt; and run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;make test&lt;/code&gt;
or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;make debug&lt;/code&gt;. The example binary is then built under
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;build/debug/examples/cpp&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;build/release/examples/cpp&lt;/code&gt;, depending on the
make target you used.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;examples/cpp/main.cpp&lt;/code&gt;:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;&lt;span class=&quot;cp&quot;&gt;#include&lt;/span&gt; &lt;span class=&quot;cpf&quot;&gt;&amp;lt;iostream&amp;gt;&lt;/span&gt;&lt;span class=&quot;cp&quot;&gt;
&lt;/span&gt;
&lt;span class=&quot;cp&quot;&gt;#include&lt;/span&gt; &lt;span class=&quot;cpf&quot;&gt;&quot;main/kuzu.h&quot;&lt;/span&gt;&lt;span class=&quot;cp&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;namespace&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kuzu&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;auto&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;database&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;make_unique&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Database&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;/tmp/db&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;auto&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;connection&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;make_unique&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Connection&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;database&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;());&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;connection&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;CREATE NODE TABLE tableOfTypes (id INT64, int64Column INT64, doubleColumn DOUBLE, booleanColumn BOOLEAN, dateColumn DATE, timestampColumn TIMESTAMP, stringColumn STRING, PRIMARY KEY (id));&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;connection&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;COPY tableOfTypes FROM &lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;/Users/rfdavid/Devel/waterloo/kuzu/dataset/copy-test/node/csv/types_50k.csv&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt; (HEADER=true);&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;This example creates a node table named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tableOfTypes&lt;/code&gt; (taken from the copy-test schema)
and uses the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;COPY&lt;/code&gt; command to import 50k rows from the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;types_50k.csv&lt;/code&gt; file.&lt;/p&gt;

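&lt;p&gt;I start debugging by setting a breakpoint at line 11 of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main.cpp&lt;/code&gt;, so execution stops just before the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;COPY&lt;/code&gt; query runs:&lt;/p&gt;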

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lldb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cpp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;11&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Breakpoint&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;where&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;example&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cpp&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;main&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;136&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;at&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cpp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;11&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;address&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mh&quot;&gt;0x0000000100003c80&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lldb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Process&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;59055&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;launched&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Users&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rfdavid&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Devel&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;waterloo&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;kuzu&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;build&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;debug&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;examples&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cpp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;example&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cpp&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;&apos;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;arm64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Process&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;59055&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stopped&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;kr&quot;&gt;thread&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;#&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;queue&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;com&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;apple&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;kr&quot;&gt;thread&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stop&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;reason&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;breakpoint&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1.1&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;frame&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;#&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mh&quot;&gt;0x0000000100003c80&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;example&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cpp&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;main&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;at&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cpp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;11&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;
   &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;   	&lt;span class=&quot;err&quot;&gt;#&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;include&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;iostream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;
   &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;
   &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;   	&lt;span class=&quot;err&quot;&gt;#&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;include&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;main/kuzu.h&quot;&lt;/span&gt;
   &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;   	&lt;span class=&quot;k&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;namespace&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kuzu&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
   &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;
   &lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt;   	&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
   &lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt;   	    &lt;span class=&quot;k&quot;&gt;auto&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;database&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;make_unique&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Database&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;/tmp/db&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
   &lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;   	    &lt;span class=&quot;k&quot;&gt;auto&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;connection&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;make_unique&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Connection&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;database&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;());&lt;/span&gt;
   &lt;span class=&quot;mi&quot;&gt;9&lt;/span&gt;
   &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;  	    &lt;span class=&quot;n&quot;&gt;connection&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;CREATE NODE TABLE tableOfTypes (id INT64, int64Column INT64, doubleColumn DOUBLE, booleanColumn BOOLEAN, dateColumn DATE, timestampColumn TIMESTAMP, stringColumn STRING, PRIMARY KEY (id));&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;11&lt;/span&gt;  	    &lt;span class=&quot;n&quot;&gt;connection&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;COPY tableOfTypes FROM &lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;/Users/rfdavid/Devel/waterloo/kuzu/dataset/copy-test/node/csv/types_50k.csv&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt; (HEADER=true);&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
   &lt;span class=&quot;mi&quot;&gt;12&lt;/span&gt;  	&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Target&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;example&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cpp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stopped&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Inside &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Connection::query&lt;/code&gt;, a mutex lock is acquired; then a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;preparedStatement&lt;/code&gt; is
created and executed through &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;executeAndAutoCommitIfNecessaryNoLock&lt;/code&gt;.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c--&quot; data-lang=&quot;c++&quot;&gt;   &lt;span class=&quot;mi&quot;&gt;76&lt;/span&gt;  	&lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unique_ptr&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;QueryResult&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Connection&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
   &lt;span class=&quot;mi&quot;&gt;77&lt;/span&gt;  	    &lt;span class=&quot;n&quot;&gt;lock_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lck&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mtx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;78&lt;/span&gt;  	    &lt;span class=&quot;k&quot;&gt;auto&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;preparedStatement&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;prepareNoLock&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
   &lt;span class=&quot;mi&quot;&gt;79&lt;/span&gt;  	    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;executeAndAutoCommitIfNecessaryNoLock&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;preparedStatement&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;());&lt;/span&gt;
   &lt;span class=&quot;mi&quot;&gt;80&lt;/span&gt;  	&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;A prepared statement is a query that is compiled once, so it can be executed repeatedly (possibly with different
parameters) without repeating the parsing and planning work. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;prepareNoLock&lt;/code&gt; goes through the following steps:
parsing, binding, planning and optimizing. It then returns a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PreparedStatement&lt;/code&gt;
object to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Connection::query&lt;/code&gt;.&lt;/p&gt;</content><author><name>Rui F. David</name></author><category term="software" /><category term="engineering" /><summary type="html">Introduction</summary></entry><entry><title type="html">Influence Functions in Machine Learning</title><link href="/influence-functions/" rel="alternate" type="text/html" title="Influence Functions in Machine Learning" /><published>2022-08-31T13:00:00+00:00</published><updated>2022-08-31T13:00:00+00:00</updated><id>/influence-functions</id><content type="html" xml:base="/influence-functions/">&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;

&lt;p&gt;As machine learning models grow more complex, their predictions become harder for humans
to interpret, and the models are usually treated as black boxes. To address this issue, the growing field of explainability tries to understand 
why those models make certain predictions. In recent years, the work by &lt;a class=&quot;citation&quot; href=&quot;#pmlr-v70-koh17a&quot;&gt;[1]&lt;/a&gt; has attracted a lot of attention in many fields.
It uses influence functions &lt;a class=&quot;citation&quot; href=&quot;#10.2307/2285666&quot;&gt;[2]&lt;/a&gt; to identify
the training points most responsible for a given prediction.&lt;/p&gt;

&lt;h2 id=&quot;robust-statistics&quot;&gt;Robust Statistics&lt;/h2&gt;

&lt;p&gt;Statistical methods rely, explicitly or implicitly, on assumptions about
the data and the problem at hand, usually about the
probability distribution of the dataset. The most widely used framework
assumes that the observed data follow a normal (Gaussian) distribution.
This &lt;em&gt;classical&lt;/em&gt; approach has been used for regression, analysis of
variance and multivariate analysis. However, real-life data is noisy and contains 
atypical observations, called outliers. These observations deviate from the
general pattern of the data, and classical estimates such as the sample mean and sample
variance can be strongly distorted by them, resulting in a poor fit to the data.
Robust statistics provides estimators that still fit the bulk of the data well 
when outliers are present &lt;a class=&quot;citation&quot; href=&quot;#maronna2006robust&quot;&gt;[3]&lt;/a&gt;.&lt;/p&gt;
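&lt;p&gt;The following tiny illustration uses only Python’s standard &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;statistics&lt;/code&gt; module and a made-up sample. Replacing a single value with a gross outlier drags the sample mean far away, while the median, a classical robust estimator, does not move.&lt;/p&gt;

```python
import statistics

# Made-up sample; the second version replaces one value with a gross outlier
clean = [2.0, 3.0, 3.0, 4.0, 5.0]
contaminated = clean[:-1] + [500.0]

for name, data in [("clean", clean), ("contaminated", contaminated)]:
    # The mean follows the outlier, while the median stays with the bulk of the data
    print(name, "mean =", statistics.mean(data), "median =", statistics.median(data))
```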

&lt;h3 id=&quot;influence-functions&quot;&gt;Influence Functions&lt;/h3&gt;

&lt;p&gt;The influence function (IF) was first introduced in “The Influence Curve and Its Role in
Robust Estimation” &lt;a class=&quot;citation&quot; href=&quot;#10.2307/2285666&quot;&gt;[2]&lt;/a&gt;. It measures how an estimator changes when the data distribution is perturbed
by an infinitesimal amount at a single point. The very interesting work by &lt;a class=&quot;citation&quot; href=&quot;#pmlr-v70-koh17a&quot;&gt;[1]&lt;/a&gt; brought
this methodology into machine learning.&lt;/p&gt;
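&lt;p&gt;Hampel’s influence function of an estimator \(T\) at a distribution \(F\) is commonly written as \(IF(x; T, F) = \lim_{\epsilon \to 0} [T((1-\epsilon)F + \epsilon\delta_x) - T(F)]/\epsilon\), where \(\delta_x\) is a point mass at \(x\). The sketch below approximates this limit with a small finite \(\epsilon\) for the mean of a made-up sample. For the mean, the exact answer is \(x - \bar{x}\), so the influence grows without bound as \(x\) moves away, which is why the mean is not robust. The function names are illustrative.&lt;/p&gt;

```python
import numpy as np

def mean_functional(values, weights):
    # T(F) = E_F[X] for a discrete distribution F with the given point weights
    return np.sum(weights * values)

def influence_fd(values, x, eps=1e-6):
    """Finite-difference influence: [T((1 - eps) F + eps * delta_x) - T(F)] / eps."""
    n = len(values)
    base = mean_functional(values, np.full(n, 1.0 / n))
    # Contaminated distribution: original points scaled by (1 - eps), plus mass eps at x
    mixed_values = np.append(values, x)
    mixed_weights = np.append(np.full(n, (1.0 - eps) / n), eps)
    return (mean_functional(mixed_values, mixed_weights) - base) / eps

sample = np.array([2.0, 3.0, 3.0, 4.0, 5.0])   # sample mean is 3.4
for x in [3.4, 10.0, 100.0]:
    print(x, influence_fd(sample, x))          # tracks x - 3.4: unbounded in x
```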

&lt;h3 id=&quot;influence-functions-in-machine-learning&quot;&gt;Influence Functions in Machine Learning&lt;/h3&gt;

&lt;p&gt;Consider an image classification task where the goal is to predict the label for
a given image. We want to measure the impact of a particular training image on
the prediction for a test image. A naive approach is to remove the training image and retrain the model,
but doing this for every training point is prohibitively expensive. Instead, influence
functions upweight that training point by an infinitesimal amount and estimate
the resulting change in the loss without retraining the model.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/upweight-a-training-point.jpg&quot; alt=&quot;medium&quot; title=&quot;Upweighting a training point&quot; /&gt;
&lt;em&gt;Figure 1: The fish image is upweighted by an infinitesimal amount so the model
tries harder to fit that particular sample. Image by the author.&lt;/em&gt;&lt;/p&gt;

&lt;h3 id=&quot;change-in-parameters&quot;&gt;Change in Parameters&lt;/h3&gt;

&lt;p&gt;The empirical risk minimizer of the training objective is defined as
follows:&lt;/p&gt;

\[\begin{equation}
  \hat\theta = \underset{\theta}{\arg\min} \frac{1}{n} \sum_{i=1}^{n} \mathcal{L}(z_i, \theta)
\end{equation}\]

&lt;p&gt;where \(z_i\) is the \(i\)-th point of the training sample. First, we need to understand how 
the parameters change when a particular training point \(z\) is upweighted by an infinitesimal 
amount \(\epsilon\). This change is \(\hat\theta_{\epsilon,z} - \hat\theta\). Here \(\hat\theta\) denotes the original parameters
learned on the full training data, and \(\hat\theta_{\epsilon,z}\) denotes the new set of parameters after upweighting:&lt;/p&gt;

\[\begin{equation}
  \hat\theta_{\epsilon,z} = \underset{\theta}{\arg\min} \frac{1}{n}\sum_{i=1}^{n}\mathcal{L}(z_i,\theta) + \epsilon \mathcal{L}(z,\theta)
\end{equation}\]

&lt;p&gt;As we want to measure the rate of change of the parameters after perturbing the
point, the derivation made by &lt;a class=&quot;citation&quot; href=&quot;#cook1982influence&quot;&gt;[4]&lt;/a&gt; yields the following:&lt;/p&gt;

\[\begin{equation}
  I(z) = \frac{d\hat\theta_{\epsilon,z}}{d\epsilon} \bigg|_{\epsilon=0} = -H_{\hat\theta}^{-1}\nabla_{\theta} \mathcal{L}(z,\hat\theta)
\end{equation}\]

&lt;p&gt;where \(H_{\hat\theta} = \frac{1}{n}\sum_{i=1}^n \nabla_{\theta}^2 \mathcal{L}(z_i,\hat\theta)\) is the Hessian of the empirical risk.
It is assumed to be positive definite (symmetric with all eigenvalues positive), so its inverse exists.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Equation \(3\) gives the influence of a single training
point \(z\) on the parameters \(\hat\theta\).&lt;/strong&gt; Removing \(z\) is the same as upweighting it by \(\epsilon = -\frac{1}{n}\).
Therefore \(-\frac{1}{n} I(z)\) approximates the change in parameters obtained by removing \(z\) and retraining the model.&lt;/p&gt;
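&lt;p&gt;To make Equation \(3\) concrete, here is a minimal NumPy sketch on an L2-regularized least-squares (ridge) problem. Ridge is used because both its Hessian and its leave-one-out solution have closed forms. The per-point loss \(\mathcal{L}(z, \theta) = \frac{1}{2}(x^T\theta - y)^2 + \frac{\lambda}{2}\lVert\theta\rVert^2\), the synthetic data and the helper names are assumptions for illustration. The sketch compares \(-\frac{1}{n} I(z)\) with the parameter change obtained by actually removing \(z\) and refitting.&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 200, 3, 0.1
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

def fit(X, y):
    # Exact minimizer of (1/m) * sum_i [0.5*(x_i^T theta - y_i)^2 + 0.5*lam*||theta||^2]
    m = len(y)
    return np.linalg.solve(X.T @ X + m * lam * np.eye(d), X.T @ y)

def grad_loss(x, y_i, theta):
    # Gradient of the per-point loss L(z, theta) with respect to theta
    return x * (x @ theta - y_i) + lam * theta

theta_hat = fit(X, y)
H = X.T @ X / n + lam * np.eye(d)   # Hessian of the empirical risk (constant for this loss)

j = 0                                                          # training point to "remove"
I_z = -np.linalg.solve(H, grad_loss(X[j], y[j], theta_hat))    # Equation 3
predicted = -I_z / n                                           # estimate of theta_{-j} - theta_hat

keep = np.arange(n) != j
actual = fit(X[keep], y[keep]) - theta_hat                     # exact leave-one-out change
print("predicted:", predicted)
print("actual:   ", actual)
```

&lt;p&gt;The regularizer is part of each per-point loss here. As a result, refitting on the remaining \(n-1\) points with the same \(\lambda\) minimizes exactly the \(\epsilon = -\frac{1}{n}\) objective, and the two vectors agree closely for this well-conditioned problem.&lt;/p&gt;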

&lt;h3 id=&quot;change-in-the-loss-function&quot;&gt;Change in the Loss Function&lt;/h3&gt;

&lt;p&gt;To measure the change in the loss at a particular test
point, we apply the chain rule to Equation \(3\), which gives:&lt;/p&gt;

\[\begin{equation}
  I(z, z_{test}) =  \frac{d \mathcal{L}(z_{test},\hat\theta_{\epsilon, z})}{d\epsilon} \bigg|_{\epsilon=0} = -\nabla_\theta \mathcal{L}(z_{test},\hat\theta)^T H_{\hat\theta}^{-1} \nabla_\theta \mathcal{L}(z,\hat\theta)
\end{equation}\]

&lt;p&gt;Consequently, \(-\frac{1}{n} I(z, z_{test})\) approximates &lt;strong&gt;the change in the test loss at \(z_{test}\) when \(z\) is removed from training&lt;/strong&gt;.
This relies on the assumption that the empirical risk is twice differentiable and strictly &lt;label class=&quot;tooltip&quot;&gt;convex&lt;input type=&quot;checkbox&quot; /&gt;&lt;span&gt;a continuous function whose value at the midpoint of every interval in its domain does not exceed the arithmetic mean of its values at the ends of the interval. For example, the squared loss and the logistic loss are convex in the parameters of a linear model.&lt;/span&gt;&lt;/label&gt; in 
the parameters \(\theta\). Some loss functions are not differentiable everywhere, such as the
&lt;label class=&quot;tooltip&quot;&gt;hinge loss&lt;input type=&quot;checkbox&quot; /&gt;&lt;span&gt;\(\max(0, 1 - y f(x))\), used by support vector machines; it is not differentiable where \(y f(x) = 1\).&lt;/span&gt;&lt;/label&gt;. One of the contributions of 
Koh and Liang’s work is to replace such losses with a smooth approximation that is differentiable at the margin.&lt;/p&gt;
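&lt;p&gt;The same kind of ridge sketch can check Equation \(4\). The data are synthetic, the names are illustrative, and the same regularized loss is used for the test point. The sketch computes \(s_{test} = H_{\hat\theta}^{-1}\nabla_\theta \mathcal{L}(z_{test},\hat\theta)\) once, which Koh and Liang also do, and then scores every training point with a single dot product. Finally, it compares the prediction \(-\frac{1}{n}I(z, z_{test})\) with the true change in test loss from leave-one-out retraining.&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 200, 3, 0.1
w_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)
x_test = rng.normal(size=d)
y_test = x_test @ w_true

def fit(X, y):
    # Ridge minimizer of the average regularized loss over the given points
    m = len(y)
    return np.linalg.solve(X.T @ X + m * lam * np.eye(d), X.T @ y)

def loss(x, y_i, theta):
    return 0.5 * (x @ theta - y_i) ** 2 + 0.5 * lam * theta @ theta

def grad_loss(x, y_i, theta):
    return x * (x @ theta - y_i) + lam * theta

theta_hat = fit(X, y)
H = X.T @ X / n + lam * np.eye(d)
s_test = np.linalg.solve(H, grad_loss(x_test, y_test, theta_hat))   # H^{-1} grad L(z_test)

# Equation 4 for every training point: I(z_i, z_test) = -s_test^T grad L(z_i)
influence = np.array([-s_test @ grad_loss(X[i], y[i], theta_hat) for i in range(n)])
predicted = -influence / n        # predicted change in test loss if z_i is removed

base = loss(x_test, y_test, theta_hat)
actual = np.array([
    loss(x_test, y_test, fit(np.delete(X, i, axis=0), np.delete(y, i))) - base
    for i in range(n)
])
print("correlation:", np.corrcoef(predicted, actual)[0, 1])
```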

&lt;h2 id=&quot;influence-functions-on-groups&quot;&gt;Influence Functions on Groups&lt;/h2&gt;

&lt;p&gt;As seen above, influence functions measure the impact of a single training point 
on a single test point. They are based on a first-order 
&lt;label class=&quot;tooltip&quot;&gt;Taylor approximation&lt;input type=&quot;checkbox&quot; /&gt;&lt;span&gt;an approximation of a function near a point by a polynomial built from its derivatives at that point; the first-order version keeps only the linear term.&lt;/span&gt;&lt;/label&gt;, which is fairly accurate
for small changes. To study the effect of removing a large group \(\mathcal{U}\) of training
points, &lt;a class=&quot;citation&quot; href=&quot;#NEURIPS2019_a78482ce&quot;&gt;[5]&lt;/a&gt; analyze when influence functions remain useful in this setting.
The first-order group influence on a test point is
the sum of the influences of the individual points in the group:&lt;/p&gt;

\[\sum_{z_i \in \mathcal{U}} I(z_i, z_{test}) = -\nabla_\theta \mathcal{L}(z_{test},\hat\theta)^T H_{\hat\theta}^{-1} \sum_{z_i \in \mathcal{U}} \nabla_\theta \mathcal{L}(z_i,\hat\theta)\]

&lt;p&gt;Given a group \(\mathcal{U}\) with first-order group
influence \(I(\mathcal{U})^{(1)}\), &lt;a class=&quot;citation&quot; href=&quot;#pmlr-v119-basu20b&quot;&gt;[6]&lt;/a&gt; proposes a second-order group influence
function that captures informative cross-dependencies among the samples in the group:&lt;/p&gt;

\[I(\mathcal{U})^{(2)} =  I(\mathcal{U})^{(1)} + I(\mathcal{U})^{&apos;}\]

&lt;p&gt;Here, the first-order group influence function \(I(\mathcal{U})^{(1)}\) is
defined as:&lt;/p&gt;

\[I(\mathcal{U})^{(1)} = \frac{\partial \theta_{\mathcal{U}}^{\epsilon}}{\partial \epsilon} \bigg|_{\epsilon=0}\]

&lt;p&gt;and the second-order term \(I(\mathcal{U})^{&apos;}\) as:&lt;/p&gt;

\[I(\mathcal{U})^{&apos;} = \frac{\partial^2 \theta_{\mathcal{U}}^{\epsilon}}{\partial \epsilon^2} \bigg|_{\epsilon=0}\]

&lt;p&gt;Empirically, this technique was shown to improve the selection
of the most influential group for a test sample across different group sizes
and types. The second-order term captures information that the first-order approximation misses when the changes to the
underlying model are relatively large.&lt;/p&gt;
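&lt;p&gt;Below is a minimal sketch of the group setting on the same kind of ridge problem. The data are synthetic, and the group \(\mathcal{U}\) of the first 60 points is an arbitrary, hypothetical choice. Upweighting every point of \(\mathcal{U}\) by \(\epsilon\) gives a quadratic objective whose minimizer \(\theta_{\mathcal{U}}^{\epsilon}\) solves a linear system. Its first and second derivatives at \(\epsilon = 0\) therefore have closed forms, and removing the group corresponds to \(\epsilon = -\frac{1}{n}\). The first-order estimate equals \(-\frac{1}{n}\) times the sum of the individual influences \(I(z_i)\). The second-order estimate adds the Taylor term \(\frac{\epsilon^2}{2}\frac{\partial^2 \theta_{\mathcal{U}}^{\epsilon}}{\partial \epsilon^2}\). This illustrates the idea behind second-order group influence; it is not the exact estimator of &lt;a class=&quot;citation&quot; href=&quot;#pmlr-v119-basu20b&quot;&gt;[6]&lt;/a&gt;.&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 200, 3, 0.1
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)
U = np.arange(60)          # hypothetical group: the first 60 training points
eps = -1.0 / n             # removing U = upweighting each of its points by -1/n

# (1/n) sum_i L(z_i) + eps * sum_{z in U} L(z) is quadratic in theta, so its
# minimizer solves (A + eps * B) theta = c + eps * b
A = X.T @ X / n + lam * np.eye(d)                 # Hessian of the empirical risk
c = X.T @ y / n
B = X[U].T @ X[U] + len(U) * lam * np.eye(d)      # sum of per-point Hessians in U
b = X[U].T @ y[U]

def theta_at(e):
    return np.linalg.solve(A + e * B, c + e * b)

theta0 = theta_at(0.0)
d1 = np.linalg.solve(A, b - B @ theta0)   # first derivative at eps = 0 (sum of I(z_i) over U)
d2 = -2.0 * np.linalg.solve(A, B @ d1)    # second derivative at eps = 0

actual = theta_at(eps) - theta0           # exact effect of removing the whole group
first = eps * d1                          # first-order group estimate
second = first + 0.5 * eps ** 2 * d2      # plus the second-order Taylor term

err1 = np.linalg.norm(first - actual)
err2 = np.linalg.norm(second - actual)
print(f"first-order error: {err1:.2e}, second-order error: {err2:.2e}")
```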

&lt;h2 id=&quot;the-calculation-bottleneck&quot;&gt;The Calculation Bottleneck&lt;/h2&gt;

&lt;p&gt;Computing the inverse Hessian is expensive and becomes infeasible for a network with 
many parameters. With \(p\) parameters, the Hessian has \(p^2\) entries, and inverting it takes \(O(p^3)\) operations.
In NumPy, it can be calculated using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;numpy.linalg.inv&lt;/code&gt;.
As a side note, NumPy is mostly written in C, and its high-level functions are
Python bindings; nevertheless, inversion remains an expensive operation. In the
PyTorch framework, you can compute the Hessian using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;torch.autograd.functional.hessian&lt;/code&gt; 
and then invert it with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;torch.linalg.inv&lt;/code&gt;. This is a bit tricky in practice, so it is worth expanding
a little here. The module &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;torch.nn&lt;/code&gt;
contains different classes that provide useful methods for models that inherit from
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nn.Module&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Functional&lt;/em&gt; modules
take NN modules and turn them into purely functional, stateless functions, so you can explicitly pass the
parameters to them.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;torch.autograd.functional&lt;/code&gt; requires the parameters to be passed as explicit inputs to the
function being differentiated (see the long discussion &lt;a href=&quot;https://github.com/pytorch/pytorch/issues/49171&quot;&gt;here&lt;/a&gt;).&lt;/p&gt;
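&lt;p&gt;The following minimal sketch shows both steps. It assumes PyTorch 2.x, where &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;torch.func.functional_call&lt;/code&gt; is available, and uses a made-up logistic-regression model and data. The module’s parameters are flattened into one explicit input vector, and the module is called statelessly through &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;functional_call&lt;/code&gt;. The resulting scalar risk is then passed to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;torch.autograd.functional.hessian&lt;/code&gt;. A small L2 term, a hypothetical choice, keeps the Hessian positive definite so that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;torch.linalg.inv&lt;/code&gt; is well defined. This approach is only practical for tiny models, since the Hessian has \(p^2\) entries.&lt;/p&gt;

```python
import torch
import torch.nn.functional as F
from torch.autograd.functional import hessian
from torch.func import functional_call

torch.manual_seed(0)
X = torch.randn(100, 3, dtype=torch.float64)
y = (X[:, 0] > 0).to(torch.float64)
model = torch.nn.Linear(3, 1).to(torch.float64)
lam = 0.01  # hypothetical L2 strength; keeps the Hessian positive definite

# Record parameter names and shapes so a flat vector can be mapped back to them
names = [name for name, _ in model.named_parameters()]
shapes = [p.shape for _, p in model.named_parameters()]
flat0 = torch.cat([p.detach().reshape(-1) for p in model.parameters()])

def empirical_risk(flat):
    # Unflatten the explicit parameter vector into a {name: tensor} dict
    params, offset = {}, 0
    for name, shape in zip(names, shapes):
        size = shape.numel()
        params[name] = flat[offset:offset + size].view(shape)
        offset += size
    # Run the module statelessly with the supplied parameters
    logits = functional_call(model, params, (X,)).squeeze(-1)
    return F.binary_cross_entropy_with_logits(logits, y) + 0.5 * lam * (flat * flat).sum()

H = hessian(empirical_risk, flat0)   # shape (4, 4): 3 weights + 1 bias
H_inv = torch.linalg.inv(H)
print(H)
```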

&lt;h3 id=&quot;conjugate-gradients&quot;&gt;Conjugate Gradients&lt;/h3&gt;

&lt;p&gt;Conjugate gradient &lt;a class=&quot;citation&quot; href=&quot;#Shewchuk94&quot;&gt;[7]&lt;/a&gt; is an iterative method for solving large systems of linear
equations of the form \(Ax = b\) when \(A\) is symmetric and positive definite.
In &lt;a class=&quot;citation&quot; href=&quot;#10.5555/3104322.3104416&quot;&gt;[8]&lt;/a&gt;, conjugate gradient is used inside a second-order (Hessian-free) optimization
method that only needs matrix-vector products with the curvature matrix. Applied to influence functions, this approach never inverts the Hessian explicitly.
Instead, it computes the inverse-Hessian-vector product as the solution of a quadratic minimization problem:&lt;/p&gt;

\[H^{-1} v = \underset{t}{\arg\min} \left( \frac{1}{2} t^T H t - v^T t \right)\]
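&lt;p&gt;Here is a minimal NumPy sketch of conjugate gradient for \(Ht = v\), which is the minimizer above. It only touches \(H\) through a function &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hvp&lt;/code&gt; that returns Hessian-vector products. That restriction is what makes the approach usable when \(H\) is too large to store; in a real model, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hvp&lt;/code&gt; would be computed with automatic differentiation. The small explicit matrix, the tolerance and the iteration cap are assumptions for the demo.&lt;/p&gt;

```python
import numpy as np

def conjugate_gradient(hvp, v, tol=1e-10, max_iter=100):
    """Solve H t = v for symmetric positive-definite H, using only products H @ p."""
    t = np.zeros_like(v)
    r = v - hvp(t)            # residual
    p = r.copy()              # search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Hp = hvp(p)
        alpha = rs_old / (p @ Hp)          # exact line search along p
        t = t + alpha * p
        r = r - alpha * Hp
        rs_new = r @ r
        if tol > np.sqrt(rs_new):          # residual small enough: stop
            break
        p = r + (rs_new / rs_old) * p      # next H-conjugate direction
        rs_old = rs_new
    return t

rng = np.random.default_rng(0)
M = rng.normal(size=(50, 5))
H = M.T @ M / 50 + 0.1 * np.eye(5)         # small SPD stand-in for a Hessian
v = rng.normal(size=5)
t = conjugate_gradient(lambda p: H @ p, v)
print(t, np.linalg.solve(H, v))
```

&lt;p&gt;In exact arithmetic, CG converges in at most \(d\) iterations for a \(d\)-dimensional system. Truncated variants stop earlier to trade accuracy for speed.&lt;/p&gt;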

&lt;h3 id=&quot;linear-time-stochastic-second-order-algorithm-lissa&quot;&gt;Linear Time Stochastic Second-Order Algorithm (LiSSA)&lt;/h3&gt;

&lt;p&gt;The main idea of LiSSA &lt;a class=&quot;citation&quot; href=&quot;#JMLR:v18:16-491&quot;&gt;[9]&lt;/a&gt; is to use a Taylor expansion (&lt;a href=&quot;https://en.wikipedia.org/wiki/Neumann_series&quot;&gt;Neumann series&lt;/a&gt;) to 
construct a natural estimator of the inverse Hessian. The series below converges when all eigenvalues of \(H\) lie in the interval \((0, 2)\); in practice, the loss is scaled so that this condition holds:&lt;/p&gt;

\[H^{-1} = \sum^{\infty}_{i=0} (I - H)^i\]

&lt;p&gt;Truncating the series after \(j\) terms gives an estimator \(H_{j}^{-1}\) with \(\lim_{j \to \infty} H_{j}^{-1} = H^{-1}\), which can be written recursively:&lt;/p&gt;

\[H_{j}^{-1} = \sum^{j}_{i=0} (I - H)^i = I + (I - H) H^{-1}_{j-1}\]
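&lt;p&gt;Here is a minimal NumPy sketch of this recursion applied to a vector: \(h_j = v + (I - H)h_{j-1}\) with \(h_0 = v\), so that \(h_j = H_j^{-1}v\). The Hessian is divided by a constant &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;scale&lt;/code&gt; so the series converges, and the result is divided by the same constant at the end. In LiSSA, each step uses a cheap stochastic Hessian estimate (here, a random minibatch) instead of the full \(H\). The ridge Hessian, the scale, the batch size and the step count are assumptions for illustration. Practical implementations typically also add damping and average several independent runs.&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, lam = 500, 4, 0.1
X = rng.normal(size=(n, d))
H = X.T @ X / n + lam * np.eye(d)     # full Hessian of a ridge problem
v = rng.normal(size=d)

def lissa(v, hessian_estimate, scale, num_steps):
    """Estimate H^{-1} v via h_j = v + (I - H/scale) h_{j-1}, starting from h_0 = v."""
    h = v.copy()
    for _ in range(num_steps):
        Hs = hessian_estimate() / scale   # exact or stochastic Hessian, rescaled
        h = v + h - Hs @ h                # h_j = v + (I - Hs) h_{j-1}
    return h / scale                      # (H/scale)^{-1} v / scale = H^{-1} v

def minibatch_hessian(batch=100):
    # Hypothetical stochastic estimate: ridge Hessian on a random minibatch
    idx = rng.choice(n, size=batch, replace=False)
    Xb = X[idx]
    return Xb.T @ Xb / batch + lam * np.eye(d)

exact_est = lissa(v, lambda: H, scale=5.0, num_steps=300)
stochastic_est = lissa(v, minibatch_hessian, scale=5.0, num_steps=300)
print(exact_est, stochastic_est, np.linalg.solve(H, v))
```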

&lt;h3 id=&quot;fastif&quot;&gt;FastIF&lt;/h3&gt;

&lt;p&gt;To improve scalability and reduce computational cost, FastIF &lt;a class=&quot;citation&quot; href=&quot;#guo-etal-2021-fastif&quot;&gt;[10]&lt;/a&gt; presents a set of modifications that speed up influence computation. 
One of them uses k-nearest neighbors (kNN) to narrow the search space down to a subset of candidate training points. 
This is inexpensive in this context because kNN is a &lt;label class=&quot;tooltip&quot;&gt;lazy learner&lt;input type=&quot;checkbox&quot; /&gt;&lt;span&gt;it doesn’t learn a discriminative function from the training data, but only stores the dataset.&lt;/span&gt;&lt;/label&gt; algorithm.&lt;/p&gt;
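&lt;p&gt;Here is a minimal NumPy sketch of the kNN pre-selection idea. Instead of scoring all \(n\) training points, only the \(k\) points closest to the test point are scored with Equation \(4\). Raw input features stand in for the model representations that FastIF would use. The ridge model, \(k = 50\) and all names are hypothetical, and a real implementation would likely use an efficient nearest-neighbour index rather than the brute-force distances shown here.&lt;/p&gt;

```python
import numpy as np

def knn_candidates(train_emb, test_emb, k):
    # Brute-force Euclidean kNN: indices of the k closest training points (unordered)
    dists = np.linalg.norm(train_emb - test_emb, axis=1)
    return np.argpartition(dists, k)[:k]

rng = np.random.default_rng(0)
n, d, lam = 1000, 5, 0.1
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)
theta = np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ y)   # ridge fit
H = X.T @ X / n + lam * np.eye(d)

def grad_loss(x, y_i):
    return x * (x @ theta - y_i) + lam * theta

x_test = rng.normal(size=d)
y_test = x_test @ w_true
s_test = np.linalg.solve(H, grad_loss(x_test, y_test))   # computed once per test point

cand = knn_candidates(X, x_test, k=50)
# Equation 4 evaluated only on the k candidates instead of all n points
scores = {int(i): float(-s_test @ grad_loss(X[i], y[i])) for i in cand}
# Most negative influence: upweighting these points lowers the test loss the most
most_helpful = sorted(scores, key=scores.get)[:5]
print(most_helpful)
```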

&lt;h2 id=&quot;the-problem-with-influence-functions&quot;&gt;The Problem with Influence Functions&lt;/h2&gt;

&lt;p&gt;Influence functions are an approximation and do not always produce correct
values. In some settings, the quality of their estimates degrades significantly.
They are known to work well with convex loss functions, but in
non-convex settings the estimates may not behave as expected. The work
‘Influence Functions in Deep Learning are Fragile’ &lt;a class=&quot;citation&quot; href=&quot;#basu2021influence&quot;&gt;[11]&lt;/a&gt; uses extensive experiments to examine the conditions under which influence estimation can be applied to deep
networks. In short, there are a few obstacles:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Estimates for deeper architectures are often erroneous, possibly due to poor
inverse-Hessian estimation. Weight-decay regularization can help.&lt;/li&gt;
  &lt;li&gt;Wide networks perform poorly. When the width of a network increases, the
correlation between the true difference in the loss and the influence
estimate decreases substantially.&lt;/li&gt;
  &lt;li&gt;Scaling influence functions is challenging. ImageNet contains about 1.2 million
training images, which makes it difficult to evaluate whether influence
functions are effective: it is computationally prohibitive to retrain the 
model many times, each time leaving one training point out.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;libraries&quot;&gt;Libraries&lt;/h2&gt;

&lt;p&gt;There are several implementations available in Python with PyTorch and
TensorFlow. A few others are built on R and Matlab.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/kohpangwei/influence-release&quot;&gt;Influence Functions&lt;/a&gt;&lt;br /&gt;
The official version of &lt;a class=&quot;citation&quot; href=&quot;#pmlr-v70-koh17a&quot;&gt;[1]&lt;/a&gt; built on TensorFlow.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/nimarb/pytorch_influence_functions&quot;&gt;Influence Functions for PyTorch&lt;/a&gt;&lt;br /&gt;
PyTorch implementation. It uses stochastic estimation to calculate the
influence.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/alstonlo/torch-influence&quot;&gt;Torch Influence&lt;/a&gt;&lt;br /&gt;
A recent implementation (Jul/2022) of influence functions on PyTorch, providing
three different ways to calculate the inverse hessian: direct computation and
inversion with torch.autograd, truncated conjugate gradients and LiSSA.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/salesforce/fast-influence-functions&quot;&gt;Fast Influence Functions&lt;/a&gt;&lt;br /&gt;
A modified influence function computation using k-Nearest Neighbors (kNN),
implemented in PyTorch.&lt;/p&gt;

&lt;h3 id=&quot;other-implementations&quot;&gt;Other implementations&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/nayopu/influence_function_with_lissa&quot;&gt;Influence Function with LiSSA&lt;/a&gt;&lt;br /&gt;
A simple implementation with LiSSA on TensorFlow.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/jrepifano/influence-pytorch&quot;&gt;Influence Pytorch&lt;/a&gt;&lt;br /&gt;
A single-file implementation applied to a random classification problem.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/zedyang/46927-Project&quot;&gt;IF notebook&lt;/a&gt;&lt;br /&gt;
Python notebook with IF applied to other algorithms (Trees, &lt;label class=&quot;tooltip&quot;&gt;Ridge Regression&lt;input type=&quot;checkbox&quot; /&gt;&lt;span&gt;Method to estimate the coefficients of multiple regression models where the independent variables are highly correlated.&lt;/span&gt;&lt;/label&gt;).&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/Benqalu/influence-functions-pytorch&quot;&gt;Influence Functions Pytorch&lt;/a&gt;&lt;br /&gt;
Another implementation of influence functions.&lt;/p&gt;

&lt;h2 id=&quot;applications&quot;&gt;Applications&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Explainability:&lt;/strong&gt; This is the use case explored so far:
measuring the impact of a training point to explain the prediction for a given test point.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Adversarial Attacks:&lt;/strong&gt; Real-world data is noisy, which can be problematic for machine learning.
Adversarial machine learning methods feed a model deceptive inputs to
change the predictions of a classifier. Influence functions
can help by identifying how to modify a training point to increase the
loss on a target test point.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Label mismatch:&lt;/strong&gt; Toy datasets are good for experimentation, but
real data might contain many mislabeled examples. The idea is to compute
the self-influence \(I(z_{i}, z_{i})\), which estimates how much the loss on a training point would change if that point were removed.
The points with the largest magnitude are inspected first. Email spam filtering is a good example, since labels often come from users marking
emails as spam or not.&lt;/li&gt;
&lt;/ul&gt;
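&lt;p&gt;Here is a minimal sketch of the label-mismatch idea on a synthetic ridge problem, in which the targets of five points (a hypothetical choice) are deliberately corrupted. It computes the self-influence \(I(z_i, z_i) = -\nabla_\theta \mathcal{L}(z_i,\hat\theta)^T H_{\hat\theta}^{-1} \nabla_\theta \mathcal{L}(z_i,\hat\theta)\) for every training point and sorts by it. The corrupted points should appear at the top of the list to inspect.&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 500, 3, 0.01
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)
corrupted = np.arange(5)        # hypothetical: the first five targets are "mislabeled"
y[corrupted] += 20.0

theta = np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ y)   # ridge fit
H = X.T @ X / n + lam * np.eye(d)

# Row i holds grad L(z_i, theta) for L = 0.5*(x^T theta - y)^2 + 0.5*lam*||theta||^2
G = X * (X @ theta - y)[:, None] + lam * theta
# Self-influence I(z_i, z_i) = -g_i^T H^{-1} g_i, one value per training point
self_influence = -np.einsum("ij,ij->i", G, np.linalg.solve(H, G.T).T)

ranking = np.argsort(self_influence)   # most negative (largest magnitude) first
print("points to inspect first:", sorted(ranking[:5].tolist()))
```

&lt;p&gt;For a positive-definite Hessian, the self-influence is never positive, so sorting in ascending order puts the largest magnitudes first.&lt;/p&gt;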

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;The very interesting work by &lt;a class=&quot;citation&quot; href=&quot;#pmlr-v70-koh17a&quot;&gt;[1]&lt;/a&gt; brought influence
functions to machine learning, although the technique itself was
introduced more than 40 years earlier by &lt;a class=&quot;citation&quot; href=&quot;#10.2307/2285666&quot;&gt;[2]&lt;/a&gt;. 
One of its main contributions is showing how to apply influence functions to non-differentiable losses (e.g.,
hinge loss). In addition, the paper relies on existing ideas such as conjugate gradients and the LiSSA
algorithm to overcome the computational cost. Subsequent work studied influence functions on groups &lt;a class=&quot;citation&quot; href=&quot;#NEURIPS2019_a78482ce&quot;&gt;[5]&lt;/a&gt;,
&lt;a class=&quot;citation&quot; href=&quot;#pmlr-v119-basu20b&quot;&gt;[6]&lt;/a&gt;; the latter uses second-order influence
functions to capture information that first-order estimates miss when the group size is relatively large.
I believe this is a powerful technique that will continue to inspire new ideas in
many different areas. One example is pruning, where a single-shot pruning
technique is based on connection sensitivity &lt;a class=&quot;citation&quot; href=&quot;#lee2018snip&quot;&gt;[12]&lt;/a&gt;, exploring
the idea of perturbing weights in a network. Another example comes from
graphs: the popular Jumping Knowledge Networks (JK-Nets) framework &lt;a class=&quot;citation&quot; href=&quot;#JKNets&quot;&gt;[13]&lt;/a&gt; uses an influence
analysis to measure how much the input features of one node affect the final representation of
another node.&lt;/p&gt;

&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;

&lt;ol class=&quot;bibliography&quot;&gt;&lt;li&gt;&lt;span id=&quot;pmlr-v70-koh17a&quot;&gt;[1]P. W. Koh and P. Liang, “Understanding Black-box Predictions via Influence Functions,” in &lt;i&gt;Proceedings of the 34th International Conference on Machine Learning&lt;/i&gt;, 2017, vol. 70, pp. 1885–1894.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span id=&quot;10.2307/2285666&quot;&gt;[2]F. R. Hampel, “The Influence Curve and Its Role in Robust Estimation,” &lt;i&gt;Journal of the American Statistical Association&lt;/i&gt;, vol. 69, no. 346, pp. 383–393, 1974, Accessed: Jul. 27, 2022. [Online].&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span id=&quot;maronna2006robust&quot;&gt;[3]R. A. Maronna, D. R. Martin, and V. J. Yohai, &lt;i&gt;Robust Statistics: Theory and Methods&lt;/i&gt;. Wiley, 2006.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span id=&quot;cook1982influence&quot;&gt;[4]R. D. Cook and S. Weisberg, &lt;i&gt;Residuals and Influence in Regression&lt;/i&gt;. New York: Chapman and Hall, 1982.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span id=&quot;NEURIPS2019_a78482ce&quot;&gt;[5]P. W. W. Koh, K.-S. Ang, H. Teo, and P. S. Liang, “On the Accuracy of Influence Functions for Measuring Group Effects,” in &lt;i&gt;Advances in Neural Information Processing Systems&lt;/i&gt;, 2019, vol. 32.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span id=&quot;pmlr-v119-basu20b&quot;&gt;[6]S. Basu, X. You, and S. Feizi, “On Second-Order Group Influence Functions for Black-Box Predictions,” in &lt;i&gt;Proceedings of the 37th International Conference on Machine Learning&lt;/i&gt;, 2020, vol. 119, pp. 715–724.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span id=&quot;Shewchuk94&quot;&gt;[7]J. R. Shewchuk, “An Introduction to the Conjugate Gradient Method Without the Agonizing Pain,” Aug. 1994.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span id=&quot;10.5555/3104322.3104416&quot;&gt;[8]J. Martens, “Deep Learning via Hessian-Free Optimization,” in &lt;i&gt;Proceedings of the 27th International Conference on International Conference on Machine Learning&lt;/i&gt;, Madison, WI, USA, 2010, pp. 735–742.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span id=&quot;JMLR:v18:16-491&quot;&gt;[9]N. Agarwal, B. Bullins, and E. Hazan, “Second-Order Stochastic Optimization for Machine Learning in Linear Time,” &lt;i&gt;Journal of Machine Learning Research&lt;/i&gt;, vol. 18, no. 116, pp. 1–40, 2017.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span id=&quot;guo-etal-2021-fastif&quot;&gt;[10]H. Guo, N. Rajani, P. Hase, M. Bansal, and C. Xiong, “FastIF: Scalable Influence Functions for Efficient Model Interpretation and Debugging,” in &lt;i&gt;Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing&lt;/i&gt;, Online and Punta Cana, Dominican Republic, Nov. 2021, pp. 10333–10350.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span id=&quot;basu2021influence&quot;&gt;[11]S. Basu, P. Pope, and S. Feizi, “Influence Functions in Deep Learning Are Fragile,” 2021.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span id=&quot;lee2018snip&quot;&gt;[12]N. Lee, T. Ajanthan, and P. Torr, “SNIP: Single-shot Network Pruning based on Connection Sensitivity,” 2019.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span id=&quot;JKNets&quot;&gt;[13]K. Xu, C. Li, Y. Tian, T. Sonobe, K.-ichi Kawarabayashi, and S. Jegelka, “Representation Learning on Graphs with Jumping Knowledge Networks,” in &lt;i&gt;ICML&lt;/i&gt;, 2018, pp. 5449–5458.&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;</content><author><name>Rui F. David</name></author><category term="paper" /><summary type="html">Introduction</summary></entry><entry><title type="html">Paper review - Design Space for Graph Neural Networks</title><link href="/design-space-for-gnn/" rel="alternate" type="text/html" title="Paper review - Design Space for Graph Neural Networks" /><published>2021-12-20T15:27:31+00:00</published><updated>2021-12-20T15:27:31+00:00</updated><id>/design-space-for-gnn</id><content type="html" xml:base="/design-space-for-gnn/">&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://arxiv.org/pdf/2011.08843.pdf&quot;&gt;Design Space for Graph Neural Networks&lt;/a&gt; &lt;a class=&quot;citation&quot; href=&quot;#you2020design&quot;&gt;[1]&lt;/a&gt;
was published at NeurIPS 2020. The authors are Jiaxuan You, Zhitao (Rex) Ying and Jure Leskovec
from Stanford. There is also a very good video from the author &lt;a href=&quot;https://www.youtube.com/watch?v=8OhnwzT9ypg&quot;&gt;available on
YouTube&lt;/a&gt;.
The code is also available on &lt;a href=&quot;https://github.com/snap-stanford/graphgym&quot;&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Instead of evaluating a specific architecture of GNNs such as GCN, GIN or GAT,
the paper explores the design space in a more general way. For example, is
batch normalization helpful in GNNs? The paper answers this question
empirically through a large number of controlled experiments.&lt;/p&gt;

&lt;p&gt;The paper takes a systematic approach to studying a general GNN design space across
many different tasks, presenting three key innovations:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;General GNN design space&lt;/li&gt;
  &lt;li&gt;GNN task space with a similarity metric&lt;/li&gt;
  &lt;li&gt;Design space evaluation&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;general-gnn-design-space&quot;&gt;General GNN design space&lt;/h3&gt;

&lt;p&gt;The design space has three parts: intra-layer design, inter-layer design,
and learning configuration. Combining all the options listed below yields 314,928
different designs.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/gnn-design-space.png&quot; alt=&quot;medium&quot; title=&quot;GNN design space&quot; /&gt;
&lt;em&gt;Figure 1: General design space divided into intra-layer, inter-layer and
learning configuration. Image extracted from &lt;a class=&quot;citation&quot; href=&quot;#you2020design&quot;&gt;[1]&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Intra-layer&lt;/strong&gt; design defines the modules applied, in sequence, inside each message-passing layer:&lt;/p&gt;

\[h_v^{(k+1)} = \mathrm{AGG}\Big(\Big\{\mathrm{ACT}\big(\mathrm{DROPOUT}(\mathrm{BN}(W^{(k)} h_u^{(k)} + b^{(k)}))\big),\ u \in \mathcal{N}(v)\Big\}\Big)\]

&lt;p&gt;The options explored for each module are:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Aggregation                  &lt;/th&gt;
      &lt;th&gt;Activation                           &lt;/th&gt;
      &lt;th&gt;Dropout                 &lt;/th&gt;
      &lt;th&gt;Batch Normalization&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Mean, Max, Sum&lt;/td&gt;
      &lt;td&gt;ReLU, PReLU, Swish&lt;/td&gt;
      &lt;td&gt;False, 0.3, 0.6&lt;/td&gt;
      &lt;td&gt;True, False&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
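&lt;p&gt;To make the intra-layer equation concrete, here is a minimal NumPy sketch of one such layer. It is my own illustration, not the GraphGym implementation: batch normalization has no learnable scale or shift, the PReLU slope is fixed instead of learned, dropout is always in training mode, and embeddings are stored as rows, so W h_u becomes &lt;code&gt;h @ W&lt;/code&gt;. The function name &lt;code&gt;gnn_layer&lt;/code&gt; and its arguments are hypothetical.&lt;/p&gt;

```python
import numpy as np


def gnn_layer(h, neighbors, W, b, agg="sum", act="prelu", dropout=0.0,
              batchnorm=True, prelu_slope=0.25, rng=None):
    """One intra-layer step: h_v = AGG({ACT(DROPOUT(BN(W h_u + b))) : u in N(v)}).

    h: (num_nodes, d_in) embeddings at layer k.
    neighbors: list where neighbors[v] holds the indices of N(v).
    W: (d_in, d_out) weights, b: (d_out,) bias.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    m = h @ W + b  # linear transform of every node u

    if batchnorm:
        # Batch norm over the node dimension (no learnable scale/shift here).
        m = (m - m.mean(axis=0)) / np.sqrt(m.var(axis=0) + 1e-5)

    if dropout > 0:
        # Inverted dropout (training mode): zero entries, rescale the rest.
        keep = rng.random(m.shape) >= dropout
        m = m * keep / (1.0 - dropout)

    if act == "relu":
        m = np.maximum(m, 0.0)
    elif act == "prelu":
        m = np.where(m > 0, m, prelu_slope * m)  # slope is learnable in practice
    elif act == "swish":
        m = m / (1.0 + np.exp(-m))  # x * sigmoid(x)
    else:
        raise ValueError(f"unknown activation: {act}")

    reducers = {"sum": np.sum, "mean": np.mean, "max": np.max}
    reduce = reducers[agg]
    out = np.zeros_like(m)
    for v, nbrs in enumerate(neighbors):
        if nbrs:  # isolated nodes keep a zero vector
            out[v] = reduce(m[nbrs], axis=0)
    return out


h = np.array([[1.0, -1.0], [2.0, 0.0], [0.0, 3.0]])
neighbors = [[1], [0, 2], [1]]  # path graph 0 - 1 - 2
out = gnn_layer(h, neighbors, W=np.eye(2), b=np.zeros(2),
                agg="sum", act="relu", batchnorm=False)
print(out)  # rows: [2, 0], [1, 3], [2, 0]
```

&lt;p&gt;On the path graph 0 - 1 - 2 with identity weights, node 1 sums the activated messages of nodes 0 and 2, giving [1, 3]. Changing &lt;code&gt;agg&lt;/code&gt;, &lt;code&gt;act&lt;/code&gt;, &lt;code&gt;dropout&lt;/code&gt; and &lt;code&gt;batchnorm&lt;/code&gt; walks through the options in the table above.&lt;/p&gt;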

&lt;p&gt;&lt;strong&gt;Inter-layer&lt;/strong&gt; design defines how the layers are organized and connected:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Layer connectivity                  &lt;/th&gt;
      &lt;th&gt;Pre-process layers       &lt;/th&gt;
      &lt;th&gt;Message passing layers    &lt;/th&gt;
      &lt;th&gt;Post-process layers&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Stack, Skip-Sum, Skip-Cat&lt;/td&gt;
      &lt;td&gt;1, 2, 3&lt;/td&gt;
      &lt;td&gt;2, 4, 6, 8&lt;/td&gt;
      &lt;td&gt;1, 2, 3&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
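&lt;p&gt;The three layer-connectivity options differ only in how a layer’s output is combined with its input. The sketch below is a hypothetical NumPy illustration (the &lt;code&gt;message_passing&lt;/code&gt; layer is a simple mean aggregation I made up, and pre- and post-process layers are omitted): Stack passes the output on, Skip-Sum adds the input back as a residual, and Skip-Cat concatenates them, so the embedding width grows with every layer.&lt;/p&gt;

```python
import numpy as np


def message_passing(h, adj, W):
    """Hypothetical message-passing layer: ReLU(mean over neighbors of h @ W)."""
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0  # avoid division by zero for isolated nodes
    return np.maximum(adj @ h @ W / deg, 0.0)


def mp_stage(h, adj, num_layers, connectivity="skipsum", dim=4, seed=0):
    """Apply num_layers message-passing layers with the given connectivity."""
    rng = np.random.default_rng(seed)
    for _ in range(num_layers):
        W = rng.normal(size=(h.shape[1], dim)) / np.sqrt(h.shape[1])
        out = message_passing(h, adj, W)
        if connectivity == "stack":
            h = out                               # plain stacking
        elif connectivity == "skipsum":
            h = out + h                           # residual: widths must match
        elif connectivity == "skipcat":
            h = np.concatenate([out, h], axis=1)  # dense-style: width grows
        else:
            raise ValueError(f"unknown connectivity: {connectivity}")
    return h


adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # path graph
h0 = np.ones((3, 4))  # e.g. the output of the pre-process layers
for conn in ["stack", "skipsum", "skipcat"]:
    print(conn, mp_stage(h0, adj, num_layers=2, connectivity=conn).shape)
# stack (3, 4), skipsum (3, 4), skipcat (3, 12)
```

&lt;p&gt;Skip-Sum only works when every layer keeps the same hidden width, which is one reason pre-process layers are useful: they map the raw features to that width before message passing starts.&lt;/p&gt;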

&lt;p&gt;&lt;strong&gt;Learning configuration&lt;/strong&gt; covers the training settings:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Batch size                  &lt;/th&gt;
      &lt;th&gt;Learning rate                           &lt;/th&gt;
      &lt;th&gt;Optimizer                &lt;/th&gt;
      &lt;th&gt;Training epochs&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;16, 32, 64&lt;/td&gt;
      &lt;td&gt;0.1, 0.01, 0.001&lt;/td&gt;
      &lt;td&gt;SGD, Adam&lt;/td&gt;
      &lt;td&gt;100, 200, 400&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
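&lt;p&gt;The count of 314,928 designs is simply the product of the number of options per dimension in the three tables: (3·3·3·2) · (3·3·4·3) · (3·3·2·3) = 54 · 108 · 54. A short Python check follows; the dictionary keys are my own naming, not GraphGym’s configuration keys.&lt;/p&gt;

```python
from itertools import product
from math import prod

# Options per design dimension, copied from the three tables above.
design_space = {
    # Intra-layer design
    "aggregation": ["mean", "max", "sum"],
    "activation": ["relu", "prelu", "swish"],
    "dropout": [False, 0.3, 0.6],
    "batchnorm": [True, False],
    # Inter-layer design
    "connectivity": ["stack", "skipsum", "skipcat"],
    "pre_process_layers": [1, 2, 3],
    "message_passing_layers": [2, 4, 6, 8],
    "post_process_layers": [1, 2, 3],
    # Learning configuration
    "batch_size": [16, 32, 64],
    "learning_rate": [0.1, 0.01, 0.001],
    "optimizer": ["sgd", "adam"],
    "epochs": [100, 200, 400],
}


def iter_designs(space):
    """Yield every design as a dict, one option per dimension."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


num_designs = prod(len(options) for options in design_space.values())
print(num_designs)  # 314928

NUM_TASKS = 32  # synthetic and real-world tasks collected in the paper
print(num_designs * NUM_TASKS)  # 10077696

first = next(iter_designs(design_space))
print(first["aggregation"], first["batchnorm"])  # mean True
```

&lt;p&gt;Multiplying by the 32 tasks gives roughly 10.1 million design-task combinations, which is why exhaustive evaluation is out of reach and the paper resorts to a controlled random search.&lt;/p&gt;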

&lt;p&gt;I believe some of the properties selected above (e.g., learning rate, epochs) should not be labelled as
architecture. The &lt;a href=&quot;https://www.youtube.com/watch?v=5ke9ZEvXJEk&quot;&gt;talk by Ameet
Talwalkar&lt;/a&gt; addresses the
difference between hyperparameter search and neural architecture search (NAS) well.
Hyperparameter search assumes a fixed neural network backbone
and tunes a set of properties on top of it.
Some of these properties are architectural and others non-architectural:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Architectural&lt;/strong&gt;: nodes per layer, number of layers, activation function&lt;br /&gt;
&lt;strong&gt;Non-architectural&lt;/strong&gt;: regularization, learning rate, batch size&lt;/p&gt;

&lt;p&gt;In NAS, you ignore the non-architectural parameters and, on the architectural side, also
search over layer operations and network connections.
Hyperparameter search spans the entire space of settings used to build and train your network, whereas neural architecture search
is restricted to a defined architectural design space.&lt;/p&gt;

&lt;h3 id=&quot;gnn-task-space-with-a-similarity-metric&quot;&gt;GNN task space with a similarity metric&lt;/h3&gt;

&lt;p&gt;The paper developed a technique to measure and quantify the GNN task space in 
conjunction with the design space.
This is the most interesting idea from this paper, in my opinion, and could
spawn other promising ideas. 
They collect 32 synthetic and real-world GNN tasks/datasets and use Kendall
rank correlation &lt;a class=&quot;citation&quot; href=&quot;#abdi2007kendall&quot;&gt;[2]&lt;/a&gt; to compare a new task with the
already evaluated ones: a fixed set of anchor designs is ranked by performance on each task,
and two tasks are considered similar when these rankings agree. The finding is very interesting:
similar tasks perform well using similar configurations, while dissimilar tasks favor different ones.
This suggests that the best configuration found for a known task can be transferred to a new
task/dataset.&lt;/p&gt;
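&lt;p&gt;A minimal sketch of the task-similarity idea, assuming each task has been evaluated with the same set of anchor designs: Kendall’s tau counts the pairs of designs ordered the same way on both tasks (concordant) minus the pairs ordered differently (discordant), divided by the number of pairs. The scores below are made-up numbers, and this is the simple tau-a variant without tie correction; &lt;code&gt;scipy.stats.kendalltau&lt;/code&gt; is a library alternative.&lt;/p&gt;

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall rank correlation (tau-a) of two equally long score lists."""
    def sign(v):
        return (v > 0) - (0 > v)

    pairs = list(combinations(range(len(x)), 2))
    agreement = sum(sign(x[i] - x[j]) * sign(y[i] - y[j]) for i, j in pairs)
    return agreement / len(pairs)


# Hypothetical accuracies of the same four anchor designs on two tasks.
task_a = [0.90, 0.80, 0.70, 0.60]
task_new = [0.85, 0.80, 0.60, 0.70]
print(kendall_tau(task_a, task_new))  # 5 concordant, 1 discordant pair: 4/6
```

&lt;p&gt;Only the ordering of the anchor designs matters, not the raw scores, so tasks with very different accuracy ranges can still be compared.&lt;/p&gt;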

&lt;p&gt;The example below demonstrates two different tasks, A and B. A controlled random
search is applied to find the best-performing design for each task. In this
example, task A performed better using sum aggregation, whereas task B
performed better using max aggregation. The question is whether the best
design from a known task can be reused for a new task, based on how similar
the two tasks are.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/task-transfer.png&quot; alt=&quot;medium&quot; title=&quot;Task similarity example&quot; /&gt;
&lt;em&gt;Table 1: Image extracted from &lt;a class=&quot;citation&quot; href=&quot;#you2020design&quot;&gt;[1]&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When a new target task is introduced (ogbg-molhiv in the example), its similarity
to each known task is computed. Task A has a correlation of 0.47 with it, while task B has a negative
correlation of -0.61. When the best designs of A and B are applied to the new
task, the design from task A, which is positively correlated with the target task,
performs significantly better.&lt;/p&gt;
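&lt;p&gt;The transfer step itself then reduces to picking the known task with the highest similarity to the target. The sketch below uses the correlations and aggregation choices from the example above; the other design dimensions are omitted, and the fallback when no task is positively correlated (search from scratch instead of transferring) is my own assumption rather than something the paper prescribes.&lt;/p&gt;

```python
def transfer_design(similarity, best_design, min_similarity=0.0):
    """Return (source task, design) from the most similar known task.

    Returns (None, None) when no known task exceeds min_similarity,
    i.e. a hypothetical signal to run a fresh search instead.
    """
    source = max(similarity, key=similarity.get)
    if similarity[source] > min_similarity:
        return source, best_design[source]
    return None, None


# Kendall correlations of ogbg-molhiv with the two known tasks (from the example).
similarity = {"task_A": 0.47, "task_B": -0.61}
best_design = {"task_A": {"aggregation": "sum"}, "task_B": {"aggregation": "max"}}
print(transfer_design(similarity, best_design))  # ('task_A', {'aggregation': 'sum'})
```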

&lt;h3 id=&quot;design-space-evaluation&quot;&gt;Design space evaluation&lt;/h3&gt;

&lt;p&gt;Evaluating the design space across all 32 tasks leads to over 10 million
possible combinations (314,928 × 32 = 10,077,696). A controlled random search is proposed to explore this
space: it randomly samples 96 setups out of the 10M possibilities and varies only
the design dimension under study. For example, consider batch normalization as the
dimension under study. A sample of 96 setups (a design plus a task) is drawn at random from the
space. Each setup is evaluated with batch normalization set to True and then, keeping all
other parameters fixed, with batch normalization set to False. The two variants are
ranked by performance within each setup, and the distribution of these ranks over the
96 setups shows whether batch normalization is generally helpful or not.&lt;/p&gt;
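&lt;p&gt;The controlled random search can be sketched in a few lines of Python. This is a hypothetical illustration, not the GraphGym code: the design space is reduced to a few dimensions, the task names are placeholders, and &lt;code&gt;fake_evaluate&lt;/code&gt; is a made-up stand-in for training a GNN, with batch normalization given a built-in advantage so the effect is visible. Ties between options are broken arbitrarily here.&lt;/p&gt;

```python
import random
from collections import Counter


def controlled_random_search(space, dimension, evaluate, num_setups=96, seed=0):
    """Rank each option of `dimension` within randomly sampled setups.

    Returns {option: [rank in each setup]}, where rank 1 is the best score.
    """
    rng = random.Random(seed)
    ranks = {option: [] for option in space[dimension]}
    for _ in range(num_setups):
        # Sample every other dimension once and keep it fixed for this setup.
        base = {k: rng.choice(v) for k, v in space.items() if k != dimension}
        scores = {opt: evaluate({**base, dimension: opt}) for opt in space[dimension]}
        ordered = sorted(scores, key=scores.get, reverse=True)  # best first
        for rank, opt in enumerate(ordered, start=1):
            ranks[opt].append(rank)
    return ranks


def fake_evaluate(config):
    """Made-up score: deterministic noise plus a bonus when batchnorm is on."""
    noise = random.Random(str(sorted(config.items()))).gauss(0.0, 0.01)
    return 0.7 + (0.05 if config["batchnorm"] else 0.0) + noise


space = {
    "task": [f"task_{i}" for i in range(32)],  # placeholder task names
    "aggregation": ["mean", "max", "sum"],
    "activation": ["relu", "prelu", "swish"],
    "message_passing_layers": [2, 4, 6, 8],
    "batchnorm": [True, False],
}
ranks = controlled_random_search(space, "batchnorm", fake_evaluate)
for option, r in ranks.items():
    print(option, "average rank:", sum(r) / len(r), "rank counts:", dict(Counter(r)))
```

&lt;p&gt;With the built-in advantage, batch normalization set to True should land at rank 1 in almost every setup, producing the kind of skewed rank distribution summarized in the results below. Plugging a real training loop into &lt;code&gt;evaluate&lt;/code&gt; would turn this into the actual experiment, at a much higher cost.&lt;/p&gt;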

&lt;h2 id=&quot;experiments-and-results&quot;&gt;Experiments and Results&lt;/h2&gt;

&lt;p&gt;The paper shows a nice visualization of the experiments using violin plots.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/design-space-results.png&quot; alt=&quot;&quot; title=&quot;GNN design space results&quot; /&gt;
&lt;em&gt;Figure 3: Boxplot of the results. Image extracted from &lt;a class=&quot;citation&quot; href=&quot;#you2020design&quot;&gt;[1]&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Each plot shows the rank distribution for one design dimension. For example, the first plot
is the distribution for batch normalization: across the randomly sampled setups, the variant
with batch normalization set to True ranked better (lower is better), indicating that in most
cases the GNN performs better when batch normalization is used.
The main findings of the paper are:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Dropout on node features is not effective.&lt;/li&gt;
  &lt;li&gt;PReLU stands out as the choice of activation.&lt;/li&gt;
  &lt;li&gt;Sum aggregation is the most expressive.&lt;/li&gt;
  &lt;li&gt;There is no definitive conclusion on the number of message passing layers,
pre-processing layers or post-processing layers.&lt;/li&gt;
  &lt;li&gt;Skip connections are generally favorable.&lt;/li&gt;
  &lt;li&gt;A batch size of 32 is a safer choice, as is a learning rate of 0.01.&lt;/li&gt;
  &lt;li&gt;Adam resulted in better performance than SGD.&lt;/li&gt;
  &lt;li&gt;More epochs of training lead to better performance.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;

&lt;ol class=&quot;bibliography&quot;&gt;&lt;li&gt;&lt;span id=&quot;you2020design&quot;&gt;[1]J. You, R. Ying, and J. Leskovec, “Design Space for Graph Neural Networks,” 2020.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span id=&quot;abdi2007kendall&quot;&gt;[2]H. Abdi, “The Kendall Rank Correlation Coefficient,” in &lt;i&gt;Encyclopedia of Measurement and Statistics&lt;/i&gt;, 2007.&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;</content><author><name>Rui F. David</name></author><category term="paper" /><summary type="html">Introduction</summary></entry><entry><title type="html">What is this about</title><link href="/first-post/" rel="alternate" type="text/html" title="What is this about" /><published>2021-10-16T17:27:31+00:00</published><updated>2021-10-16T17:27:31+00:00</updated><id>/first-post</id><content type="html" xml:base="/first-post/">&lt;p&gt;I have lately dedicated a good amount of my time to input but little to
generating output. Writing about what you learn is an efficient way to ask
yourself if you really know what you are supposed to know. Furthermore, it 
is very challenging to write clearly and concisely, and I hope I can use this 
blog to improve my brain’s model parameters to write better. I will mainly use
the posts as annotations, probably editing and adding more information as I learn.
The main topic is machine learning on graphs, to which I have been dedicating
most of my time.
Feel free to contact me at rui.david ontariotechu.net.&lt;/p&gt;</content><author><name></name></author><category term="jekyll" /><category term="update" /><summary type="html">I have lately dedicated a good amount of my time to input but little to generating output. Writing about what you learn is an efficient way to ask yourself if you really know what you are supposed to know. Furthermore, it is very challenging to write clearly and concisely, and I hope I can use this blog to improve my brain’s model parameters to write better. I will mainly use the posts as annotations, probably editing and adding more information as I learn. The main topic is machine learning focused on graphs, where I have been dedicating most of my time. Feel free to contact me at rui.david ontariotechu.net.</summary></entry></feed>