<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>James H Wade</title>
<link>https://jameshwade.com/</link>
<atom:link href="https://jameshwade.com/index.xml" rel="self" type="application/rss+xml"/>
<description></description>
<generator>quarto-1.8.27</generator>
<lastBuildDate>Sat, 21 Feb 2026 00:00:00 GMT</lastBuildDate>
<item>
  <title>Python in Your Browser with Pyodide</title>
  <dc:creator>James H Wade</dc:creator>
  <link>https://jameshwade.com/posts/2026-02-21_pyodide.html</link>
  <description><![CDATA[ 





<p>Pyodide compiles CPython to WebAssembly. Like <a href="../2023-08-13_webr.qmd">webR for R</a>, it runs Python entirely in your browser: no server, no virtual environment, nothing to install.</p>
<p>NumPy, pandas, matplotlib, and scipy are pre-installed. Other packages can be installed with <code>micropip</code>. Variables defined in one cell are available in the next.</p>
<p>Load time is a bit slower than webR: expect 5–10 seconds to initialize, because CPython is a larger runtime. Once initialized, subsequent cells are fast.</p>
<section id="numpy" class="level2">
<h2 class="anchored" data-anchor-id="numpy">NumPy</h2>
<p>NumPy is available immediately. The vectorization that makes it fast in a normal Python environment carries over to the WASM build:</p>
<div id="qpyodide-insertion-location-1"></div>
<noscript>Please enable JavaScript to experience the dynamic code cell content on this page.</noscript>
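<p>Try increasing <code>n</code>. The estimate gets more accurate but takes longer, with the error shrinking roughly as 1/√n. At a million samples the standard error of a unit-circle estimate of π is about 0.0016, so you’re typically accurate to 2–3 decimal places.</p>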
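<p>As a static reference, here is a minimal sketch of a vectorized Monte Carlo estimate of π. The function name <code>estimate_pi</code> and the fixed seed are assumptions for illustration, not necessarily what the interactive cell contains.</p>

```python
import math

import numpy as np


def estimate_pi(n, seed=0):
    """Estimate pi from n uniform random points in the square [-1, 1] x [-1, 1]."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n)
    y = rng.uniform(-1.0, 1.0, size=n)
    # Boolean mask of the points that fall inside the unit circle
    inside = (x * x + y * y) <= 1.0
    # The circle covers pi/4 of the square, so scale the hit rate by 4
    return 4.0 * inside.mean()


n = 1_000_000
estimate = estimate_pi(n)
print(f"n={n:,}  estimate={estimate:.5f}  error={abs(estimate - math.pi):.5f}")
```

<p>There is no Python loop here: the comparison and the mean run over whole arrays, which is the vectorization the section refers to.</p>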
</section>
<section id="matplotlib" class="level2">
<h2 class="anchored" data-anchor-id="matplotlib">Matplotlib</h2>
<p>Plots render inline. The same Monte Carlo simulation, visualized: points inside the unit circle versus outside.</p>
<div id="qpyodide-insertion-location-2"></div>
<noscript>Please enable JavaScript to experience the dynamic code cell content on this page.</noscript>
</section>
<section id="pandas" class="level2">
<h2 class="anchored" data-anchor-id="pandas">Pandas</h2>
<p>A random walk built as a pandas DataFrame, with a distance-from-origin column:</p>
<div id="qpyodide-insertion-location-3"></div>
<noscript>Please enable JavaScript to experience the dynamic code cell content on this page.</noscript>
<div id="qpyodide-insertion-location-4"></div>
<noscript>Please enable JavaScript to experience the dynamic code cell content on this page.</noscript>
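<p>A minimal sketch of such a DataFrame, assuming a two-dimensional walk with unit steps along each axis. The column names, the helper <code>random_walk</code>, and the seed are illustrative choices rather than the post’s original cell.</p>

```python
import numpy as np
import pandas as pd


def random_walk(n_steps, seed=0):
    """Return a 2D random walk as a DataFrame with a distance-from-origin column."""
    rng = np.random.default_rng(seed)
    # Each step moves -1 or +1 along x and along y
    steps = rng.choice([-1, 1], size=(n_steps, 2))
    # Cumulative sums of the steps give the position after each step
    walk = pd.DataFrame(steps.cumsum(axis=0), columns=["x", "y"])
    walk.insert(0, "step", np.arange(1, n_steps + 1))
    # Euclidean distance from the origin after each step
    walk["distance"] = np.hypot(walk["x"], walk["y"])
    return walk


walk = random_walk(1_000)
print(walk.head())
print(walk["distance"].describe())
```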
</section>
<section id="the-central-limit-theorem-in-python" class="level2">
<h2 class="anchored" data-anchor-id="the-central-limit-theorem-in-python">The central limit theorem in Python</h2>
<p>The same CLT demo from the <a href="../2023-08-13_webr.qmd">R post</a>: exponential population, sample means converging to normal as <code>n_obs</code> grows. Change the value and re-run:</p>
<div id="qpyodide-insertion-location-5"></div>
<noscript>Please enable JavaScript to experience the dynamic code cell content on this page.</noscript>
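<p>A minimal numeric sketch of the same idea, without the plot. It assumes an Exponential(1) population. The helper names and <code>n_samples</code> are hypothetical; only <code>n_obs</code> comes from the text above. As <code>n_obs</code> grows, the standard deviation of the sample means should approach 1/√<code>n_obs</code> and the skewness should shrink toward 0, the value for a normal distribution.</p>

```python
import numpy as np


def sample_means(n_obs, n_samples=5_000, seed=0):
    """Means of n_samples samples, each of n_obs draws from Exponential(1)."""
    rng = np.random.default_rng(seed)
    draws = rng.exponential(scale=1.0, size=(n_samples, n_obs))
    return draws.mean(axis=1)


def skewness(values):
    """Sample skewness: near 0 for a symmetric distribution such as the normal."""
    centered = values - values.mean()
    return (centered ** 3).mean() / values.std() ** 3


for n_obs in (2, 10, 50):
    means = sample_means(n_obs)
    print(f"n_obs={n_obs:3d}  mean={means.mean():.3f}  "
          f"sd={means.std():.3f}  skew={skewness(means):.3f}")
```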
</section>
<section id="installing-packages-with-micropip" class="level2">
<h2 class="anchored" data-anchor-id="installing-packages-with-micropip">Installing packages with micropip</h2>
<p>Packages not bundled with Pyodide can be installed with <code>micropip</code>. Pure-Python packages generally work; packages with C extensions need a WASM build:</p>
<div id="qpyodide-insertion-location-6"></div>
<noscript>Please enable JavaScript to experience the dynamic code cell content on this page.</noscript>
<p><code>micropip.install</code> is asynchronous, so <code>await</code> is required. It downloads from PyPI and installs into the in-browser environment.</p>
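<p>A hedged sketch of that pattern, written so it also runs harmlessly outside the browser. The wrapper <code>install_if_pyodide</code> is hypothetical, and <code>snowballstemmer</code> is only an example of a pure-Python package. The check relies on <code>sys.platform</code> being <code>"emscripten"</code> under Pyodide.</p>

```python
import asyncio
import sys


async def install_if_pyodide(package):
    """Install a package with micropip when running under Pyodide.

    Returns True if an install ran, False on a regular CPython interpreter.
    """
    if sys.platform != "emscripten":
        # micropip only exists inside Pyodide, so skip elsewhere
        return False
    import micropip

    await micropip.install(package)
    return True


# In a Pyodide cell you would write: await install_if_pyodide("snowballstemmer")
# Outside the browser, drive the coroutine with asyncio instead.
if sys.platform != "emscripten":
    print(asyncio.run(install_if_pyodide("snowballstemmer")))
```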
</section>
<section id="limitations" class="level2">
<h2 class="anchored" data-anchor-id="limitations">Limitations</h2>
<p>A few things don’t work in a WASM environment:</p>
<ul>
<li><strong>Threading</strong>: <code>threading</code> and <code>multiprocessing</code> are unavailable or limited.</li>
<li><strong>File I/O</strong>: no access to your local filesystem. Use <code>io.StringIO</code> / <code>io.BytesIO</code> for in-memory file handling.</li>
<li><strong>Network requests</strong>: <code>requests</code> won’t work out of the box. Use <code>pyodide.http.open_url</code>, <code>pyodide.http.pyfetch</code>, or JavaScript’s <code>fetch</code> via <code>from js import fetch</code>.</li>
<li><strong>C-extension packages</strong>: packages that rely on compiled C extensions (like <code>lightgbm</code>, <code>xgboost</code>) only work if a WASM build exists.</li>
</ul>
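<p>For the file I/O point, a small sketch of treating an <code>io.StringIO</code> buffer as a file, using only the standard library. The column names and values are made up for illustration.</p>

```python
import csv
import io

# Write CSV text into an in-memory buffer instead of a file on disk
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["step", "distance"])
writer.writerows([[1, 1.0], [2, 1.41], [3, 2.24]])

# Rewind, then read it back as if it were an open file
buffer.seek(0)
rows = list(csv.DictReader(buffer))
print(len(rows), rows[1]["distance"])
```

<p>Anything that accepts a file-like object, such as <code>pandas.read_csv</code>, can take the buffer the same way.</p>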
<p>For most data science and statistics work, the pre-installed stack (NumPy, pandas, matplotlib, scipy) covers the common cases. Scikit-learn is available as well, but that’s a topic for a separate post.</p>


</section>

 ]]></description>
  <category>Python</category>
  <category>WebAssembly</category>
  <guid>https://jameshwade.com/posts/2026-02-21_pyodide.html</guid>
  <pubDate>Sat, 21 Feb 2026 00:00:00 GMT</pubDate>
  <media:content url="https://pyodide.org/en/stable/_static/pyodide-logo.png" medium="image" type="image/png"/>
</item>
<item>
  <title>Turning Shiny Apps into MCP Apps with shinymcp</title>
  <dc:creator>James H Wade</dc:creator>
  <link>https://jameshwade.com/posts/2026-02-21_shinymcp.html</link>
  <description><![CDATA[ 





<p>Shiny apps live on a server. You visit a URL, you click around, you leave. What if the app could live inside the conversation you’re already having with an AI assistant?</p>
<p>That’s what <a href="https://modelcontextprotocol.io/">MCP Apps</a> enable, and <a href="https://github.com/JamesHWade/shinymcp">shinymcp</a> is how you build them from R.</p>
<section id="whats-an-mcp-app" class="level2">
<h2 class="anchored" data-anchor-id="whats-an-mcp-app">What’s an MCP App?</h2>
<p>The <a href="https://modelcontextprotocol.io/">Model Context Protocol</a> is an open standard for connecting AI assistants to external tools and data. MCP servers expose tools that an AI model can call: search a database, run a computation, fetch a file. MCP Apps extend this idea to include a UI. Instead of the model calling a function and getting text back, the user sees an interactive interface rendered directly in the chat.</p>
<p>In practice, a Shiny-style dashboard can appear inline in Claude Desktop. The user changes a dropdown, the tool fires, the output updates, all inside the conversation. No separate browser tab, no URL to share, no deployment to manage.</p>
</section>
<section id="quick-start" class="level2">
<h2 class="anchored" data-anchor-id="quick-start">Quick start</h2>
<p>Install shinymcp from GitHub:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># install.packages("pak")</span></span>
<span id="cb1-2">pak<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pak</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"JamesHWade/shinymcp"</span>)</span></code></pre></div></div>
</div>
<p>An MCP App has two parts: <strong>UI components</strong> that render in the chat, and <strong>tools</strong> that run R code when inputs change. Here’s a minimal dataset explorer:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(shinymcp)</span>
<span id="cb2-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(bslib)</span>
<span id="cb2-3"></span>
<span id="cb2-4">ui <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">page_sidebar</span>(</span>
<span id="cb2-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">theme =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">bs_theme</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">preset =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"shiny"</span>),</span>
<span id="cb2-6">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Dataset Explorer"</span>,</span>
<span id="cb2-7">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sidebar =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sidebar</span>(</span>
<span id="cb2-8">    shiny<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">selectInput</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dataset"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Choose dataset"</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"mtcars"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"iris"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"pressure"</span>))</span>
<span id="cb2-9">  ),</span>
<span id="cb2-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">card</span>(</span>
<span id="cb2-11">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">card_header</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Summary"</span>),</span>
<span id="cb2-12">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mcp_text</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"summary"</span>)</span>
<span id="cb2-13">  )</span>
<span id="cb2-14">)</span>
<span id="cb2-15"></span>
<span id="cb2-16">tools <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb2-17">  ellmer<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tool</span>(</span>
<span id="cb2-18">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fun =</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">dataset =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"mtcars"</span>) {</span>
<span id="cb2-19">      data <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">get</span>(dataset, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">envir =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">asNamespace</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"datasets"</span>))</span>
<span id="cb2-20">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">capture.output</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>(data)), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">collapse =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb2-21">    },</span>
<span id="cb2-22">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"get_summary"</span>,</span>
<span id="cb2-23">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Get summary statistics for the selected dataset"</span>,</span>
<span id="cb2-24">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">arguments =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb2-25">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">dataset =</span> ellmer<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Dataset name"</span>)</span>
<span id="cb2-26">    )</span>
<span id="cb2-27">  )</span>
<span id="cb2-28">)</span>
<span id="cb2-29"></span>
<span id="cb2-30">app <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mcp_app</span>(ui, tools, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dataset-explorer"</span>)</span>
<span id="cb2-31"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">serve</span>(app)</span></code></pre></div></div>
</div>
<p>Save this as <code>app.R</code>, then register it in your Claude Desktop config:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode json code-with-copy"><code class="sourceCode json"><span id="cb3-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb3-2">  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"mcpServers"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb3-3">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"dataset-explorer"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb3-4">      <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"command"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Rscript"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb3-5">      <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"args"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">[</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"/path/to/app.R"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">]</span></span>
<span id="cb3-6">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb3-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb3-8"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span></code></pre></div></div>
<p>Restart Claude Desktop and invoke the tool. An interactive UI appears inline in the conversation. Changing the dropdown calls the tool and updates the output without a page reload.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://raw.githubusercontent.com/JamesHWade/shinymcp/main/man/figures/demo.gif" class="img-fluid figure-img"></p>
<figcaption>shinymcp demo showing an interactive dashboard inside Claude Desktop</figcaption>
</figure>
</div>
</section>
<section id="the-core-idea-flatten-your-reactive-graph" class="level2">
<h2 class="anchored" data-anchor-id="the-core-idea-flatten-your-reactive-graph">The core idea: flatten your reactive graph</h2>
<p>If you’ve built Shiny apps, you think in reactive expressions: inputs feed into reactives, which feed into outputs. In an MCP App, you flatten that graph into tool functions.</p>
<p>Each connected group of inputs, reactives, and outputs becomes a single tool. The tool takes input values as arguments and returns a named list of outputs. Here’s what the translation looks like:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># --- Shiny server ---</span></span>
<span id="cb4-2">server <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(input, output, session) {</span>
<span id="cb4-3">  filtered <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">reactive</span>({</span>
<span id="cb4-4">    penguins[penguins<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>species <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> input<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>species, ]</span>
<span id="cb4-5">  })</span>
<span id="cb4-6">  output<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>scatter <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">renderPlot</span>({</span>
<span id="cb4-7">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filtered</span>(), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(bill_length_mm, bill_depth_mm)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>()</span>
<span id="cb4-8">  })</span>
<span id="cb4-9">  output<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>stats <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">renderPrint</span>({</span>
<span id="cb4-10">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filtered</span>())</span>
<span id="cb4-11">  })</span>
<span id="cb4-12">}</span>
<span id="cb4-13"></span>
<span id="cb4-14"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># --- Equivalent MCP App tool ---</span></span>
<span id="cb4-15">ellmer<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tool</span>(</span>
<span id="cb4-16">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fun =</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">species =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Adelie"</span>) {</span>
<span id="cb4-17">    filtered <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> penguins[penguins<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>species <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> species, ]</span>
<span id="cb4-18"></span>
<span id="cb4-19">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Render plot to base64 PNG</span></span>
<span id="cb4-20">    p <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> ggplot2<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(filtered, ggplot2<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(bill_length_mm, bill_depth_mm)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-21">      ggplot2<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>()</span>
<span id="cb4-22">    tmp <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tempfile</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fileext =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">".png"</span>)</span>
<span id="cb4-23">    ggplot2<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggsave</span>(tmp, p, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">width =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">height =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">dpi =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">144</span>)</span>
<span id="cb4-24">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">on.exit</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unlink</span>(tmp))</span>
<span id="cb4-25"></span>
<span id="cb4-26">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb4-27">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">scatter =</span> base64enc<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">base64encode</span>(tmp),</span>
<span id="cb4-28">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">stats =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">capture.output</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>(filtered)), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">collapse =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb4-29">    )</span>
<span id="cb4-30">  },</span>
<span id="cb4-31">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"explore"</span>,</span>
<span id="cb4-32">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Filter and visualize penguins"</span>,</span>
<span id="cb4-33">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">arguments =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb4-34">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">species =</span> ellmer<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Penguin species"</span>)</span>
<span id="cb4-35">  )</span>
<span id="cb4-36">)</span></code></pre></div></div>
</div>
<p>The return keys (<code>scatter</code>, <code>stats</code>) must match the output IDs in the UI (<code>mcp_plot("scatter")</code>, <code>mcp_text("stats")</code>). The bridge routes each value to the right element.</p>
</section>
<section id="how-the-bridge-works" class="level2">
<h2 class="anchored" data-anchor-id="how-the-bridge-works">How the bridge works</h2>
<p>MCP Apps render inside sandboxed iframes in the AI chat interface. A lightweight JavaScript bridge (no npm dependencies) handles the communication:</p>
<ol type="1">
<li>User changes an input</li>
<li>The bridge detects which form elements are inputs (by matching tool argument names to element <code>id</code> attributes) and collects their values</li>
<li>Bridge sends a <code>tools/call</code> request to the host via <code>postMessage</code></li>
<li>Host proxies the call to the MCP server (your R process)</li>
<li>R tool function runs and returns results</li>
<li>Bridge updates the output elements</li>
</ol>
<p>The input auto-detection is the key convenience. If your <code>selectInput</code> has <code>id = "species"</code> and your tool has an argument called <code>species</code>, the bridge wires them together automatically. For edge cases where ids don’t match argument names, <code>mcp_input()</code> lets you explicitly mark an element.</p>
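<p>The matching and request steps can be sketched in a few lines. This is an illustration in Python, not shinymcp’s bridge, which is JavaScript. The helper names and the page state are hypothetical, and the message shape assumes the standard MCP JSON-RPC <code>tools/call</code> request.</p>

```python
import json


def build_tool_call(tool_name, argument_names, element_values, request_id=1):
    """Build a JSON-RPC tools/call request from form element values.

    element_values maps element ids to their current values; only ids that
    match one of the tool's argument names are treated as inputs.
    """
    arguments = {
        name: element_values[name]
        for name in argument_names
        if name in element_values
    }
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def route_outputs(result, output_ids):
    """Keep only the returned values whose keys match an output element id."""
    return {key: value for key, value in result.items() if key in output_ids}


# Hypothetical page state: one input that matches, one unrelated element
page = {"species": "Gentoo", "search-box": "ignored"}
request = build_tool_call("explore", ["species"], page)
print(json.dumps(request, indent=2))
```

<p>Only <code>species</code> ends up in the request because it is the only element id that matches a tool argument. The same name-matching idea applies on the way back, where returned keys are paired with output ids.</p>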
</section>
<section id="automatic-conversion" class="level2">
<h2 class="anchored" data-anchor-id="automatic-conversion">Automatic conversion</h2>
<p>If you have an existing Shiny app you want to convert, shinymcp includes a parse-analyze-generate pipeline:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">convert_app</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"path/to/my-shiny-app"</span>)</span></code></pre></div></div>
</div>
<p>This parses the UI and server code, maps the reactive dependency graph into tool groups, and writes an MCP App scaffold with tools, components, and a server entrypoint. The generated tool bodies contain placeholders that you fill in with the computation logic.</p>
<p>For complex apps with dynamic UI, modules, or file uploads, shinymcp also ships a <a href="https://github.com/JamesHWade/deputy">deputy</a> skill that guides an AI agent through the conversion process.</p>
</section>
<section id="output-components" class="level2">
<h2 class="anchored" data-anchor-id="output-components">Output components</h2>
<p>shinymcp provides output components that correspond to standard Shiny outputs:</p>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Shiny</th>
<th>shinymcp</th>
<th>What the tool returns</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><code>textOutput()</code></td>
<td><code>mcp_text()</code></td>
<td>Plain text string</td>
</tr>
<tr class="even">
<td><code>plotOutput()</code></td>
<td><code>mcp_plot()</code></td>
<td>Base64-encoded PNG</td>
</tr>
<tr class="odd">
<td><code>tableOutput()</code></td>
<td><code>mcp_table()</code></td>
<td>HTML table string</td>
</tr>
<tr class="even">
<td><code>htmlOutput()</code></td>
<td><code>mcp_html()</code></td>
<td>Raw HTML</td>
</tr>
</tbody>
</table>
<p>For inputs, you use the standard <code>shiny</code> and <code>bslib</code> inputs you already know: <code>selectInput</code>, <code>numericInput</code>, <code>checkboxInput</code>, etc. The bridge auto-detects them.</p>
</section>
<section id="why-this-matters" class="level2">
<h2 class="anchored" data-anchor-id="why-this-matters">Why this matters</h2>
<p>The interesting part isn’t the technology. It’s the interaction pattern. When a Shiny app lives inside a chat, the AI can see and respond to what the user is doing in the app. The model has context about both the conversation and the interactive exploration.</p>
<p>I’m still early in figuring out what this enables. If you build something with it, I’d like to hear about it.</p>
</section>
<section id="resources" class="level2">
<h2 class="anchored" data-anchor-id="resources">Resources</h2>
<ul>
<li><a href="https://github.com/JamesHWade/shinymcp">shinymcp on GitHub</a></li>
<li><a href="https://jameshwade.github.io/shinymcp/">shinymcp documentation</a></li>
<li><a href="https://ellmer.tidyverse.org/">ellmer</a>, the LLM framework shinymcp builds on</li>
<li><a href="https://modelcontextprotocol.io/">MCP specification</a></li>
<li><a href="https://rstudio.github.io/bslib/">bslib</a>, Bootstrap layout and theming for the UI</li>
</ul>


</section>

 ]]></description>
  <category>Shiny</category>
  <category>MCP</category>
  <category>AI</category>
  <category>R</category>
  <category>ellmer</category>
  <guid>https://jameshwade.com/posts/2026-02-21_shinymcp.html</guid>
  <pubDate>Sat, 21 Feb 2026 00:00:00 GMT</pubDate>
  <media:content url="https://raw.githubusercontent.com/JamesHWade/shinymcp/main/man/figures/logo.png" medium="image" type="image/png"/>
</item>
<item>
  <title>Disposable Shiny Apps</title>
  <dc:creator>James H Wade</dc:creator>
  <link>https://jameshwade.com/posts/2025-11-03_disposable-shiny-apps.html</link>
  <description><![CDATA[ 





<p><em>This post is based on my <a href="https://youtube.com/playlist?list=PL9HYL-VRX0oTixlfDPCS5RW_F1pccERRe">posit::conf(2025) talk</a> of the same name. The talk’s companion repo is <a href="https://github.com/JamesHWade/posit-conf-2025">on GitHub</a>. Credit to Barret Schloerke, whose <a href="https://youtu.be/RcwvG7dtMqU?si=JuptknKHyHuRjR3a">talk on single-file Shiny apps</a> planted the seed for this one.</em></p>
<hr>
<p>I build Shiny apps. A lot of them. I maintain a two-year-old app with a hundred-plus users. It has tests, CI/CD, and a backlog. I know the pain of longevity. But I also build apps for single meetings. They live for an hour and die gracefully.</p>
<p>This post is about the second kind.</p>
<section id="the-moment-everything-stalls" class="level2">
<h2 class="anchored" data-anchor-id="the-moment-everything-stalls">The moment everything stalls</h2>
<p>You know the scene. You’ve crafted your narrative. You’ve polished the slides. You tell the story. And then a stakeholder asks:</p>
<blockquote class="blockquote">
<p>“That’s interesting, but what if we looked at the data <em>this</em> way?”</p>
</blockquote>
<p>Your answer: <em>“I’ll have to get back to you on that.”</em></p>
<p>The conversation stops. You had their attention and you couldn’t keep it, because a static slide deck can’t bend to the question.</p>
</section>
<section id="two-traps" class="level2">
<h2 class="anchored" data-anchor-id="two-traps">Two traps</h2>
<p>We don’t build more apps because we fall into one of two traps.</p>
<p><strong>Trap 1: “Don’t Build.”</strong> It’s too much effort for just one meeting. The development hill looks too steep, so we default to slides.</p>
<p><strong>Trap 2: “Over-Build.”</strong> If I build it, it must be production-grade. We over-engineer a complex solution for a simple, one-off need.</p>
<p>The consequence of both traps is the same: death by PowerPoint.</p>
</section>
<section id="a-third-option" class="level2">
<h2 class="anchored" data-anchor-id="a-third-option">A third option</h2>
<p>What if building an app for a single meeting was as easy as making a slide deck? That’s the disposable app.</p>
<p>A disposable app:</p>
<ul>
<li>Gets its value from <strong>immediate impact</strong>, not longevity</li>
<li>Is a <strong>communication artifact</strong>, not a production system</li>
<li>Is designed to be <strong>thrown away</strong></li>
</ul>
<p>Think of it as a whiteboard sketch, not a monument. Valuable in the moment, erasable afterward.</p>
</section>
<section id="how-to-build-one" class="level2">
<h2 class="anchored" data-anchor-id="how-to-build-one">How to build one</h2>
<p>The development hill isn’t steep anymore. With AI-assisted coding, you can go from a question to a working Shiny app in about fifteen minutes.</p>
<section id="step-1-start-with-one-question" class="level3">
<h3 class="anchored" data-anchor-id="step-1-start-with-one-question">Step 1: Start with one question</h3>
<p>A disposable app should do one thing well. Don’t scope a dashboard — scope a question. For my talk, I used a question from my own house: twin toddlers in potty training, a four-year-old, four cats, and general chaos.</p>
<p>The question: <strong>How can I track and optimize twin potty training success?</strong></p>
<p>The stakeholders (frazzled parents) needed real-time insights and encouragement.</p>
</section>
<section id="step-2-write-a-clear-prompt" class="level3">
<h3 class="anchored" data-anchor-id="step-2-write-a-clear-prompt">Step 2: Write a clear prompt</h3>
<p>Describe what you want in plain English. Be specific about features and tools. Here’s the actual prompt I used in Positron Assistant:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode markdown code-with-copy"><code class="sourceCode markdown"><span id="cb1-1">@shiny your task is to build a fun and engaging shiny app all about potty training.</span>
<span id="cb1-2"></span>
<span id="cb1-3"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">- </span>The purpose of the app is to help my wife and me know what to do next</span>
<span id="cb1-4">  when it comes to potty training _twins_ (yikes!).</span>
<span id="cb1-5"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">- </span>We are following _Oh Crap! Potty Training_ by Jamie Glowacki.</span>
<span id="cb1-6"></span>
<span id="cb1-7">The app should consist of:</span>
<span id="cb1-8"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  - </span>a page_sidebar where the left sidebar is a shinychat that we can use</span>
<span id="cb1-9">    as our Jamie Glowacki stand-in in times of need. We should use</span>
<span id="cb1-10">    <span class="in" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">`claude-sonnet-4`</span> via ellmer as our guide.</span>
<span id="cb1-11"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  - </span>a main page with trackers and graphs to track pees, poops, and</span>
<span id="cb1-12">    parental sanity. Remember that it matters if we made it to the potty</span>
<span id="cb1-13">    or if it was an accident.</span>
<span id="cb1-14"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  - </span>For bonus points, you can include a prediction for the next "event".</span>
<span id="cb1-15"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  - </span>You should give this a theme suitable for the task at hand.</span>
<span id="cb1-16">    Use <span class="in" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">`_brand.yml`</span> to help you out.</span>
<span id="cb1-17"></span>
<span id="cb1-18">It is important to remember to treat the twins separately. This should</span>
<span id="cb1-19">be easy, they are boy-girl twins named Henry and Penelope.</span></code></pre></div></div>
<p>The key elements: a clear objective, specific features, UI preferences (<code>bslib</code>, sidebar layout), and enough context for the model to make reasonable choices.</p>
</section>
<section id="step-3-let-ai-generate-the-code" class="level3">
<h3 class="anchored" data-anchor-id="step-3-let-ai-generate-the-code">Step 3: Let AI generate the code</h3>
<p>In minutes, the model creates a complete UI structure with <code>bslib</code> components, realistic data simulation, interactive visualizations, and <code>shinychat</code> integration. You need to watch it work. Not to micromanage, but to keep it on the rails.</p>
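<p>For a sense of scale, here is a hand-written sketch of the kind of skeleton involved. It is not the app the model generated. It assumes only <code>shiny</code> and <code>bslib</code>, leaves out the <code>shinychat</code> sidebar, and uses a hypothetical <code>simulate_events()</code> helper with made-up columns in place of real tracking data.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(shiny)
library(bslib)

# Hypothetical stand-in for real tracking data: one row per logged event.
simulate_events &lt;- function(n = 40, seed = 1) {
  set.seed(seed)
  data.frame(
    child   = sample(c("Henry", "Penelope"), n, replace = TRUE),
    outcome = sample(c("potty", "accident"), n, replace = TRUE, prob = c(0.6, 0.4)),
    day     = sample(1:7, n, replace = TRUE)
  )
}

ui &lt;- page_sidebar(
  title = "Potty tracker (sketch)",
  sidebar = sidebar(
    selectInput("child", "Child", choices = c("Henry", "Penelope"))
  ),
  card(
    card_header("Events per day"),
    plotOutput("by_day")
  )
)

server &lt;- function(input, output, session) {
  # Swapping in real data later means replacing this one line.
  events &lt;- simulate_events()

  output$by_day &lt;- renderPlot({
    one_child &lt;- events[events$child == input$child, ]
    # Fix the factor levels so empty days and outcomes still get a bar slot.
    counts &lt;- table(
      factor(one_child$outcome, levels = c("potty", "accident")),
      factor(one_child$day, levels = 1:7)
    )
    barplot(counts, legend.text = TRUE, xlab = "Day", ylab = "Events")
  })
}

app &lt;- shinyApp(ui, server)
# Print `app` (or call shiny::runApp(app)) to launch it.</code></pre></div>
<p>Everything lives in one file and the data comes from a single function, which is what makes the later “swap in your actual data” step a one-line change.</p>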
</section>
<section id="step-4-tweak-and-ship" class="level3">
<h3 class="anchored" data-anchor-id="step-4-tweak-and-ship">Step 4: Tweak and ship</h3>
<p>Fix the inevitable broken pieces. Ask for minor layout adjustments. Swap in your actual data. A typical time breakdown:</p>
<ul>
<li>Generation: 2–5 minutes</li>
<li>Review: 2–5 minutes</li>
<li>Tweaks: 5–10 minutes</li>
<li><strong>Total: about 15 minutes</strong></li>
</ul>
<p>Ship it and start getting feedback immediately.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://raw.githubusercontent.com/JamesHWade/posit-conf-2025/main/media/potty_tracker.gif" class="img-fluid figure-img"></p>
<figcaption>The Twin Potty Training Command Center — a disposable app built in fifteen minutes</figcaption>
</figure>
</div>
</section>
</section>
<section id="overcoming-the-objections" class="level2">
<h2 class="anchored" data-anchor-id="overcoming-the-objections">Overcoming the objections</h2>
<section id="who-will-support-it" class="level3">
<h3 class="anchored" data-anchor-id="who-will-support-it">“Who will support it?”</h3>
<p>Nobody. That’s the point. You don’t need a maintenance crew for a whiteboard drawing. You just erase it. This app took fifteen minutes to build. If we need it again, we’ll build it again, probably better, because now we know more.</p>
</section>
<section id="wont-this-create-app-sprawl" class="level3">
<h3 class="anchored" data-anchor-id="wont-this-create-app-sprawl">“Won’t this create app sprawl?”</h3>
<p>Yes, potentially. (Frankly, I might be the wrong person to ask. I’ve been accused of being an app hoarder.) The key distinction: these are communication artifacts, not production systems. You must commit to throwing them away. If an app proves valuable enough to keep, that’s a signal to build a real version with tests, CI/CD, and a maintenance plan. But most won’t need that. Most were valuable for one meeting and one meeting only.</p>
</section>
<section id="isnt-this-more-work-than-slides" class="level3">
<h3 class="anchored" data-anchor-id="isnt-this-more-work-than-slides">“Isn’t this more work than slides?”</h3>
<p>Consider the actual time budget:</p>
<p><strong>A slide deck.</strong> Initial creation: 3 hours. Tweaking chart alignment: 2 hours. Handling update requests: 2 hours. “Can you change the colors?”: 1 hour. Total: 8 hours. Impact: low. Interactivity: zero.</p>
<p><strong>A disposable app.</strong> Prompt and generation: 15 minutes. Integrating data: 30 minutes. Refine and test: 15 minutes. Total: 1 hour. Impact: high. Interactivity: infinite.</p>
<p>The ratio isn’t close. And fun work goes by faster.</p>
</section>
</section>
<section id="the-mindset-shift" class="level2">
<h2 class="anchored" data-anchor-id="the-mindset-shift">The mindset shift</h2>
<p>The tools are here. The only thing left is to change how we think.</p>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Old mindset</th>
<th>Disposable mindset</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Perfection</td>
<td>Utility</td>
</tr>
<tr class="even">
<td>Production-ready</td>
<td>Good enough is perfect</td>
</tr>
<tr class="odd">
<td>Full test coverage</td>
<td>Happy path only</td>
</tr>
<tr class="even">
<td>Edge case handling</td>
<td>Fast to build</td>
</tr>
</tbody>
</table>
<p>This isn’t about lowering standards. Production apps should be tested, maintained, and properly engineered. But not every app needs to be a production app. A napkin sketch isn’t a failure of architecture. It’s a different tool for a different job.</p>
</section>
<section id="the-challenge" class="level2">
<h2 class="anchored" data-anchor-id="the-challenge">The challenge</h2>
<p>The next time you have a presentation, I challenge you:</p>
<p><strong>Open Positron before you open PowerPoint.</strong></p>
<p>Ask yourself whether your audience would rather look at your data or play with it. If the answer is play, build a disposable app. It’ll take less time than you think, and the conversation it starts will be worth more than any slide deck.</p>
<p>And when the meeting’s over, delete it.</p>


</section>

 ]]></description>
  <category>Shiny</category>
  <category>AI</category>
  <category>posit::conf</category>
  <category>R</category>
  <guid>https://jameshwade.com/posts/2025-11-03_disposable-shiny-apps.html</guid>
  <pubDate>Mon, 03 Nov 2025 00:00:00 GMT</pubDate>
  <media:content url="https://raw.githubusercontent.com/JamesHWade/posit-conf-2025/main/media/potty_tracker.gif" medium="image" type="image/gif"/>
</item>
<item>
  <title>Disposable Shiny Apps: Annotated Talk Notes</title>
  <dc:creator>James H Wade</dc:creator>
  <link>https://jameshwade.com/posts/2025-09-18_disposable-shiny-apps-annotated.html</link>
  <description><![CDATA[ 





<p>I gave this talk at posit::conf(2025) in Atlanta on September 18, 2025. The argument: with AI-assisted coding, the barrier to building a Shiny app for a meeting is now low enough that you should be choosing apps over slides more often — and then deleting them when you’re done.</p>
<p>There’s <a href="https://jameshwade.com/posts/2025-11-03_disposable-shiny-apps.html">a prose write-up of the same ideas</a> if you prefer that format. This post is the annotated-slides version: what each section was trying to do, notes on the talk itself, and things I’d do differently. The format is inspired by <a href="https://simonwillison.net/2023/Aug/6/annotated-presentations/">Simon Willison’s annotated presentations</a>. The slides and demo code are <a href="https://github.com/JamesHWade/posit-conf-2025">on GitHub</a>.</p>
<p>Credit to Barret Schloerke, whose <a href="https://youtu.be/RcwvG7dtMqU?si=JuptknKHyHuRjR3a">talk on single-file Shiny apps</a> planted the seed for this one.</p>
<div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden; margin-bottom: 1.5rem;">
  <iframe style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;" src="https://www.youtube.com/embed/smnrmTtoiOM" title="Disposable Shiny Apps | posit::conf(2025)" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen="">
  </iframe>
</div>
<ul>
<li>The opening provocation</li>
<li>Who I am and why I can say this</li>
<li>The moment everything stalls</li>
<li>Interactive vs.&nbsp;passive</li>
<li>The two traps</li>
<li>The disposable mindset</li>
<li>How to build one</li>
<li>Overcoming objections</li>
<li>The challenge</li>
</ul>
<section id="opening" class="level2">
<h2 class="anchored" data-anchor-id="opening">The opening provocation</h2>
<p>The talk opens with two lines in large text against a dark background:</p>
<blockquote class="blockquote">
<p>You should build <strong>more</strong> Shiny apps. And then <strong>throw them away.</strong></p>
</blockquote>
<p>The juxtaposition is deliberate. Most conference talks about Shiny apps are about building better, longer-lived, more robust ones. This one inverts that. The phrase “throw them away” is meant to sound almost irresponsible, because getting people interested in an idea often requires making them feel slightly uncomfortable first.</p>
<p>It landed. The room got quiet in a way that felt like engagement, and I’ll take that.</p>
</section>
<section id="who-i-am" class="level2">
<h2 class="anchored" data-anchor-id="who-i-am">Who I am and why I can say this</h2>
<p>I maintain a two-year-old production app with 100-plus users, tests, CI/CD, and a backlog. I know the pain of longevity. But I also build apps for single meetings — apps that live for an hour and die gracefully.</p>
<p>This credibility slide mattered. If I only built disposable apps, you might reasonably dismiss the argument as naivety about engineering. The production side of my work is what earns the right to advocate for the disposable side.</p>
<p>In hindsight, this section ran a little long. The audience didn’t need much convincing that I’ve built software before. A single sentence would have done it.</p>
</section>
<section id="frustration" class="level2">
<h2 class="anchored" data-anchor-id="frustration">The moment everything stalls</h2>
<p>The scenario: you’ve crafted a narrative, polished the slides, and you’re telling the story. A stakeholder asks:</p>
<blockquote class="blockquote">
<p>“That’s interesting, but what if we looked at the data <em>this</em> way?”</p>
</blockquote>
<p>Your answer: “I’ll have to get back to you on that.”</p>
<p>The conversation stops. You had their attention and couldn’t keep it, because a static slide deck can’t bend to the question.</p>
<p>Most people in a data science audience have been in this room. The scenario resonated — I could see it on faces. This is the problem worth solving.</p>
</section>
<section id="interactive-vs-passive" class="level2">
<h2 class="anchored" data-anchor-id="interactive-vs-passive">Interactive vs.&nbsp;passive</h2>
<div class="quarto-layout-panel" data-layout-ncol="2">
<div class="quarto-layout-row">
<div class="quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://raw.githubusercontent.com/JamesHWade/posit-conf-2025/main/media/fine-art-gallery.png" class="img-fluid figure-img"></p>
<figcaption>“Please step back from the data”</figcaption>
</figure>
</div>
</div>
<div class="quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://raw.githubusercontent.com/JamesHWade/posit-conf-2025/main/media/childrens-museum.png" class="img-fluid figure-img"></p>
<figcaption>“Come play with the data”</figcaption>
</figure>
</div>
</div>
</div>
</div>
<p>This was my favorite slide in the deck. A fine art gallery and a children’s museum side by side, with no explanation. Two completely different philosophies about how people should relate to what’s on display.</p>
<p>A slide deck is a fine art gallery. You can look at the data, but please don’t touch. A Shiny app is a children’s museum. The data is there to be played with.</p>
<p>The question I posed to the audience: “What do you hope to achieve with your presentation?” If the answer involves people <em>understanding</em> the data — not just seeing it — you probably want the children’s museum.</p>
<p>The images did the work here. I barely needed to speak.</p>
</section>
<section id="the-two-traps" class="level2">
<h2 class="anchored" data-anchor-id="the-two-traps">The two traps</h2>
<p>Why don’t more people build apps? Two failure modes:</p>
<p><strong>Trap 1: “Don’t Build.”</strong> The development hill looks too steep for a single meeting. We default to slides because they feel like the path of least resistance.</p>
<p><strong>Trap 2: “Over-Build.”</strong> If we do decide to build, engineering instincts kick in. We add a database, set up CI/CD, write tests. We engineer a production system for a problem that needed a fifteen-minute solution.</p>
<p>Both traps lead to the same outcome: something static when you could have had something interactive.</p>
<p>The punchline: “The consequence of both traps? Death by PowerPoint.” That’s a bit of a crowd-pleasing line, and it worked, but I think the “over-build” trap is the more interesting and less-discussed one. Most talks about app development implicitly encourage it.</p>
</section>
<section id="the-disposable-mindset" class="level2">
<h2 class="anchored" data-anchor-id="the-disposable-mindset">The disposable mindset</h2>
<p>The definition I put on screen:</p>
<ul>
<li>Its value is in its <strong>immediate impact</strong>, not its longevity.</li>
<li>It’s a <strong>communication artifact</strong>, not a production system.</li>
<li>It’s designed to be <strong>thrown away</strong>.</li>
</ul>
<p>The analogy that holds up best: a whiteboard sketch. People gather around it, point at it, annotate it. At the end of the meeting, someone erases it. That’s not a failure — that’s what whiteboards are for. A disposable Shiny app is the same thing, with interactivity.</p>
<p>This section could have been shorter. The concept isn’t complicated. Say it once, cleanly, and move on.</p>
</section>
<section id="how-to-build-one" class="level2">
<h2 class="anchored" data-anchor-id="how-to-build-one">How to build one</h2>
<p>The practical core of the talk. I walked through building an actual app live: the Twin Potty Training Command Center. (Twin toddlers in potty training plus a four-year-old plus four cats equals a legitimately chaotic data problem that needed a Shiny solution.)</p>
<section id="step-1-start-with-one-question" class="level3">
<h3 class="anchored" data-anchor-id="step-1-start-with-one-question">Step 1: Start with one question</h3>
<p>Scope a question, not a dashboard. The question I used: “How can I track and optimize twin potty training success for Henry and Penelope?” Specific enough to build from. Small enough to finish in fifteen minutes.</p>
</section>
<section id="step-2-write-a-clear-prompt" class="level3">
<h3 class="anchored" data-anchor-id="step-2-write-a-clear-prompt">Step 2: Write a clear prompt</h3>
<p>Here’s the actual prompt I used in Positron Assistant:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode markdown code-with-copy"><code class="sourceCode markdown"><span id="cb1-1">@shiny your task is to build a fun and engaging shiny app all about potty training.</span>
<span id="cb1-2"></span>
<span id="cb1-3"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">- </span>The purpose of the app is to help my wife and me know what to do next</span>
<span id="cb1-4">  when it comes to potty training _twins_ (yikes!).</span>
<span id="cb1-5"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">- </span>We are following _Oh Crap! Potty Training_ by Jamie Glowacki.</span>
<span id="cb1-6"></span>
<span id="cb1-7">The app should consist of:</span>
<span id="cb1-8"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  - </span>a page_sidebar where the left sidebar is a shinychat that we can use</span>
<span id="cb1-9">    as our Jamie Glowacki stand-in in times of need. We should use</span>
<span id="cb1-10">    <span class="in" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">`claude-sonnet-4`</span> via ellmer as our guide.</span>
<span id="cb1-11"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  - </span>a main page with trackers and graphs to track pees, poops, and</span>
<span id="cb1-12">    parental sanity. Remember that it matters if we made it to the potty</span>
<span id="cb1-13">    or if it was an accident.</span>
<span id="cb1-14"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  - </span>For bonus points, you can include a prediction for the next "event".</span>
<span id="cb1-15"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  - </span>You should give this a theme suitable for the task at hand.</span>
<span id="cb1-16">    Use <span class="in" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">`_brand.yml`</span> to help you out.</span>
<span id="cb1-17"></span>
<span id="cb1-18">It is important to remember to treat the twins separately. This should</span>
<span id="cb1-19">be easy, they are boy-girl twins named Henry and Penelope.</span></code></pre></div></div>
<p>The structure that matters: clear objective, specific features, tool preferences (bslib, ellmer, shinychat), and enough context for the model to make reasonable choices without guessing. The <code>_brand.yml</code> call-out is worth noting — it’s a good way to get consistent theming without specifying every color yourself.</p>
</section>
<section id="step-3-let-the-model-generate-the-code" class="level3">
<h3 class="anchored" data-anchor-id="step-3-let-the-model-generate-the-code">Step 3: Let the model generate the code</h3>
<p>Minutes later: complete UI with bslib components, data simulation, visualizations, and shinychat integration. The important thing is to watch it work — not to micromanage, but to redirect when it goes off track.</p>
</section>
<section id="step-4-tweak-and-ship" class="level3">
<h3 class="anchored" data-anchor-id="step-4-tweak-and-ship">Step 4: Tweak and ship</h3>
<p>Fix what’s broken, adjust the layout, swap in real data.</p>
<ul>
<li>Generation: 2–5 minutes</li>
<li>Review: 2–5 minutes</li>
<li>Tweaks: 5–10 minutes</li>
<li>Total: about 15 minutes</li>
</ul>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://raw.githubusercontent.com/JamesHWade/posit-conf-2025/main/media/potty_tracker.gif" class="img-fluid figure-img"></p>
<figcaption>The Twin Potty Training Command Center — built live at posit::conf(2025)</figcaption>
</figure>
</div>
<p>The demo worked. There’s always one moment in a live demo where something doesn’t cooperate, and this one was no exception, but the app ran, the chat worked, and the audience got the idea. The GIF above is from the demo.</p>
</section>
</section>
<section id="overcoming-objections" class="level2">
<h2 class="anchored" data-anchor-id="overcoming-objections">Overcoming objections</h2>
<p>Three objections come up reliably when I describe this approach.</p>
<p><strong>“Who will support it?”</strong> Nobody. That’s the point. You don’t maintain a whiteboard drawing — you erase it. If you need the app again, you rebuild it. You probably understand the problem better now anyway, so the rebuild will be faster and better.</p>
<p><strong>“Won’t this create app sprawl?”</strong> Yes, potentially. I’ve been accused of being an app hoarder, so I might be the wrong person to ask. The key distinction is communication artifacts vs.&nbsp;production systems. The commitment is that you delete them. If an app proves valuable enough to outlive the meeting, that’s a signal to build a proper version — with tests and CI/CD and a maintenance plan. Most apps won’t earn that.</p>
<p><strong>“Isn’t this more work than slides?”</strong> The accounting:</p>
<ul>
<li>Slide deck: ~8 hours of work, low impact, zero interactivity.</li>
<li>Disposable app: ~1 hour, high impact, full interactivity.</li>
</ul>
<p>The ratio isn’t close. And data scientists generally find coding more enjoyable than slide alignment, which matters more than it sounds.</p>
</section>
<section id="the-challenge" class="level2">
<h2 class="anchored" data-anchor-id="the-challenge">The challenge</h2>
<p>The closing:</p>
<blockquote class="blockquote">
<p>The next time you have a presentation — open Positron <em>before</em> you open PowerPoint.</p>
</blockquote>
<p>Not every presentation needs an app. But more of them could use one than currently do. Ask the question before defaulting to slides.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://raw.githubusercontent.com/JamesHWade/posit-conf-2025/main/media/app-in-trash.png" class="img-fluid figure-img"></p>
<figcaption>The Twin Potty Training Command Center, making its way to the trash</figcaption>
</figure>
</div>
<p>And when the meeting’s over: delete it.</p>


</section>

 ]]></description>
  <category>Shiny</category>
  <category>AI</category>
  <category>posit::conf</category>
  <category>R</category>
  <guid>https://jameshwade.com/posts/2025-09-18_disposable-shiny-apps-annotated.html</guid>
  <pubDate>Thu, 18 Sep 2025 00:00:00 GMT</pubDate>
  <media:content url="https://raw.githubusercontent.com/JamesHWade/posit-conf-2025/main/media/childrens-museum.png" medium="image" type="image/png"/>
</item>
<item>
  <title>R in Your Browser with WebR</title>
  <dc:creator>James H Wade</dc:creator>
  <link>https://jameshwade.com/posts/2023-08-13_webr.html</link>
  <description><![CDATA[ 

<script src="https://cdn.jsdelivr.net/npm/monaco-editor@0.43.0/min/vs/loader.js"></script>
<script type="module" id="webr-monaco-editor-init">

  // Configure the Monaco Editor's loader
  require.config({
    paths: {
      'vs': 'https://cdn.jsdelivr.net/npm/monaco-editor@0.43.0/min/vs'
    }
  });
</script>




<p>WebR compiles R to WebAssembly. R runs in your browser: no server, no RStudio, no installation. Nothing leaves your machine.</p>
<p>For technical writing where the code is the point, this removes the barrier. Readers can run it, change it, see what happens. The page is the environment.</p>
<p>A few things to know:</p>
<ul>
<li>First load takes a few seconds while webR initializes.</li>
<li><code>ggplot2</code> is pre-loaded for the examples below, which adds a few seconds at startup.</li>
<li>Not every CRAN package has a WASM build, but the most common ones do.</li>
</ul>
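<p>If you want a package that isn’t pre-loaded, webR’s support package provides <code>webr::install()</code>, which fetches WebAssembly builds of packages. A minimal sketch, assuming you run it in one of the cells on this page and that the package named (<code>dplyr</code> is just an example) has a WASM build:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Works inside a webR session only; webr::install() does not exist in ordinary R.
webr::install("dplyr")

# Once installed, load it like any other package.
library(dplyr)</code></pre></div>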
<section id="base-r-runs-immediately" class="level2">
<h2 class="anchored" data-anchor-id="base-r-runs-immediately">Base R runs immediately</h2>
<p>The Collatz conjecture says this sequence reaches 1 from any positive integer. It remains unproven, but it holds for every starting value checked so far. Change 27 to any other positive integer:</p>
<button class="btn btn-default btn-webr" disabled="" type="button" id="webr-run-button-1">Loading
  webR...</button>
<div id="webr-editor-1"></div>
<div id="webr-code-output-1" aria-live="assertive">
  <pre style="visibility: hidden"></pre>
</div>
<script type="module">
  // Retrieve webR code cell information
  const runButton = document.getElementById("webr-run-button-1");
  const outputDiv = document.getElementById("webr-code-output-1");
  const editorDiv = document.getElementById("webr-editor-1");

  // Add a light grey outline around the code editor
  editorDiv.style.border = "1px solid #eee";

  // Load the Monaco Editor and create an instance
  let editor;
  require(['vs/editor/editor.main'], function () {
    editor = monaco.editor.create(editorDiv, {
      value: `# The Collatz sequence — runs until you reach 1
collatz <- function(n) {
  steps <- c(n)
  while (n != 1) {
    n <- if (n %% 2 == 0) n / 2 else 3 * n + 1
    steps <- c(steps, n)
  }
  steps
}

seq <- collatz(27)
cat("Steps to reach 1:", length(seq) - 1, "\\n")
plot(seq, type = "l", col = "#1b6b4a", lwd = 2,
     main = "Collatz sequence starting at 27",
     xlab = "Step", ylab = "Value", bty = "l")`,
      language: 'r',
      theme: 'vs-light',
      automaticLayout: true,           // TODO: Could be problematic for slide decks
      scrollBeyondLastLine: false,
      minimap: {
        enabled: false
      },
      fontSize: '17.5rem',               // Bootstrap is 1 rem
      renderLineHighlight: "none",     // Disable current line highlighting
      hideCursorInOverviewRuler: true  // Remove cursor indicator in right hand side scroll bar
    });

    // Dynamically modify the height of the editor window if new lines are added.
    let ignoreEvent = false;
    const updateHeight = () => {
      const contentHeight = editor.getContentHeight();
      // We're avoiding a width change
      //editorDiv.style.width = `${width}px`;
      editorDiv.style.height = `${contentHeight}px`;
      try {
        ignoreEvent = true;

        // The key to resizing is this call
        editor.layout();
      } finally {
        ignoreEvent = false;
      }
    };

    // Helper function to check if selected text is empty
    function isEmptyCodeText(selectedCodeText) {
      return (selectedCodeText === null || selectedCodeText === undefined || selectedCodeText === "");
    }

    // Registry of keyboard shortcuts that should be re-added to each editor window
    // when focus changes.
    const addWebRKeyboardShortCutCommands = () => {
      // Add a keydown event listener for Shift+Enter to run all code in cell
      editor.addCommand(monaco.KeyMod.Shift | monaco.KeyCode.Enter, () => {

        // Retrieve all text inside the editor
        executeCode(editor.getValue());
      });

      // Add a keydown event listener for CMD/Ctrl+Enter to run selected code
      editor.addCommand(monaco.KeyMod.CtrlCmd | monaco.KeyCode.Enter, () => {

        // Get the selected text from the editor
        const selectedText = editor.getModel().getValueInRange(editor.getSelection());
        // Check if no code is selected
        if (isEmptyCodeText(selectedText)) {
          // Obtain the current cursor position
          let currentPosition = editor.getPosition();
          // Retrieve the current line content
          let currentLine = editor.getModel().getLineContent(currentPosition.lineNumber);

          // Propose a new position to move the cursor to
          let newPosition = new monaco.Position(currentPosition.lineNumber + 1, 1);

          // Check if the new position is beyond the last line of the editor
          if (newPosition.lineNumber > editor.getModel().getLineCount()) {
            // Add a new line at the end of the editor
            editor.executeEdits("addNewLine", [{
            range: new monaco.Range(newPosition.lineNumber, 1, newPosition.lineNumber, 1),
            text: "\n", 
            forceMoveMarkers: true,
            }]);
          }
          
          // Run the entire line of code.
          executeCode(currentLine);

          // Move cursor to new position
          editor.setPosition(newPosition);
        } else {
          // Code to run when Ctrl+Enter is pressed with selected code
          executeCode(selectedText);
        }
      });
    }

    // Register an on focus event handler for when a code cell is selected to update
    // what keyboard shortcut commands should work.
    // This is a workaround to fix a regression that happened with multiple
    // editor windows since Monaco 0.32.0 
    // https://github.com/microsoft/monaco-editor/issues/2947
    editor.onDidFocusEditorText(addWebRKeyboardShortCutCommands);

    // Register an on change event for when new code is added to the editor window
    editor.onDidContentSizeChange(updateHeight);

    // Manually re-update height to account for the content we inserted into the call
    updateHeight();
  });

  // Function to execute the code (accepts code as an argument)
  async function executeCode(codeToRun) {
    // Disable run button for code cell active
    runButton.disabled = true;

    // Create a canvas variable for graphics
    let canvas = undefined;

    // Run everything inside try so the run button is re-enabled even if
    // initialization or evaluation of the code throws
    try {

      // Initialize webR
      await globalThis.webR.init();

      // Setup a webR canvas by making a namespace call into the {webr} package
      await webR.evalRVoid("webr::canvas(width=504, height=360)");

      // Capture output data from evaluating the code
      const result = await webRCodeShelter.captureR(codeToRun, {
        withAutoprint: true,
        captureStreams: true,
        captureConditions: false
        // env: webR.objs.emptyEnv, // maintain a global environment for webR v0.2.0
      });

      // Stop creating images
      await webR.evalRVoid("dev.off()");

      // Merge output streams of STDOUT and STDErr (messages and errors are combined.)
      const out = result.output.filter(
        evt => evt.type == "stdout" || evt.type == "stderr"
      ).map((evt) => evt.data).join("\n");

      // Clean the state
      const msgs = await webR.flush();

      // Output each image stored
      msgs.forEach(msg => {
        // Determine if old canvas can be used or a new canvas is required.
        if (msg.type === 'canvas'){
          // Add image to the current canvas
          if (msg.data.event === 'canvasImage') {
            canvas.getContext('2d').drawImage(msg.data.image, 0, 0);
          } else if (msg.data.event === 'canvasNewPage') {
            // Generate a new canvas element
            canvas = document.createElement("canvas");
            canvas.setAttribute("width", 2 * 504);
            canvas.setAttribute("height", 2 * 360);
            canvas.style.width = "700px";
            canvas.style.display = "block";
            canvas.style.margin = "auto";
          }
        }
      });

      // Nullify the outputDiv of content
      outputDiv.innerHTML = "";

      // Design an output object for messages
      const pre = document.createElement("pre");
      if (/\S/.test(out)) {
        // Display results as text
        const code = document.createElement("code");
        code.innerText = out;
        pre.appendChild(code);
      } else {
        // If nothing is present, hide the element.
        pre.style.visibility = "hidden";
      }
      outputDiv.appendChild(pre);

      // Place the graphics on the canvas
      if (canvas) {
        const p = document.createElement("p");
        p.appendChild(canvas);
        outputDiv.appendChild(p);
      }
    } finally {
      // Clean up the remaining code
      webRCodeShelter.purge();
      runButton.disabled = false;
    }
  }

  // Add a click event listener to the run button
  runButton.onclick = function () {
    executeCode(editor.getValue());
  };
</script>
<p>Old Faithful has a distinctly bimodal eruption pattern. A density estimate shows the two clusters better than a histogram:</p>
<button class="btn btn-default btn-webr" disabled="" type="button" id="webr-run-button-2">Loading
  webR...</button>
<div id="webr-editor-2"></div>
<div id="webr-code-output-2" aria-live="assertive">
  <pre style="visibility: hidden"></pre>
</div>
<script type="module">
  // Retrieve webR code cell information
  const runButton = document.getElementById("webr-run-button-2");
  const outputDiv = document.getElementById("webr-code-output-2");
  const editorDiv = document.getElementById("webr-editor-2");

  // Add a light grey outline around the code editor
  editorDiv.style.border = "1px solid #eee";

  // Load the Monaco Editor and create an instance
  let editor;
  require(['vs/editor/editor.main'], function () {
    editor = monaco.editor.create(editorDiv, {
      value: `plot(density(faithful$eruptions),
     main = "Old Faithful eruption duration",
     xlab = "Duration (minutes)",
     col = "#1b6b4a", lwd = 2, bty = "l")
rug(faithful$eruptions, col = "#c2783980")`,
      language: 'r',
      theme: 'vs-light',
      automaticLayout: true,           // TODO: Could be problematic for slide decks
      scrollBeyondLastLine: false,
      minimap: {
        enabled: false
      },
      fontSize: 17.5,                  // In pixels; Monaco expects a number here
      renderLineHighlight: "none",     // Disable current line highlighting
      hideCursorInOverviewRuler: true  // Remove cursor indicator in right hand side scroll bar
    });

    // Dynamically modify the height of the editor window if new lines are added.
    let ignoreEvent = false;
    const updateHeight = () => {
      const contentHeight = editor.getContentHeight();
      // We're avoiding a width change
      //editorDiv.style.width = `${width}px`;
      editorDiv.style.height = `${contentHeight}px`;
      try {
        ignoreEvent = true;

        // The key to resizing is this call
        editor.layout();
      } finally {
        ignoreEvent = false;
      }
    };

    // Helper function to check if selected text is empty
    function isEmptyCodeText(selectedCodeText) {
      return (selectedCodeText === null || selectedCodeText === undefined || selectedCodeText === "");
    }

    // Registry of keyboard shortcuts that should be re-added to each editor window
    // when focus changes.
    const addWebRKeyboardShortCutCommands = () => {
      // Add a keydown event listener for Shift+Enter to run all code in cell
      editor.addCommand(monaco.KeyMod.Shift | monaco.KeyCode.Enter, () => {

        // Retrieve all text inside the editor
        executeCode(editor.getValue());
      });

      // Add a keydown event listener for CMD/Ctrl+Enter to run selected code
      editor.addCommand(monaco.KeyMod.CtrlCmd | monaco.KeyCode.Enter, () => {

        // Get the selected text from the editor
        const selectedText = editor.getModel().getValueInRange(editor.getSelection());
        // Check if no code is selected
        if (isEmptyCodeText(selectedText)) {
          // Obtain the current cursor position
          let currentPosition = editor.getPosition();
          // Retrieve the current line content
          let currentLine = editor.getModel().getLineContent(currentPosition.lineNumber);

          // Propose a new position to move the cursor to
          let newPosition = new monaco.Position(currentPosition.lineNumber + 1, 1);

          // Check if the new position is beyond the last line of the editor
          if (newPosition.lineNumber > editor.getModel().getLineCount()) {
            // Add a new line at the end of the editor
            editor.executeEdits("addNewLine", [{
            range: new monaco.Range(newPosition.lineNumber, 1, newPosition.lineNumber, 1),
            text: "\n", 
            forceMoveMarkers: true,
            }]);
          }
          
          // Run the entire line of code.
          executeCode(currentLine);

          // Move cursor to new position
          editor.setPosition(newPosition);
        } else {
          // Code to run when Ctrl+Enter is pressed with selected code
          executeCode(selectedText);
        }
      });
    }

    // Register an on focus event handler for when a code cell is selected to update
    // what keyboard shortcut commands should work.
    // This is a workaround to fix a regression that happened with multiple
    // editor windows since Monaco 0.32.0 
    // https://github.com/microsoft/monaco-editor/issues/2947
    editor.onDidFocusEditorText(addWebRKeyboardShortCutCommands);

    // Register an on change event for when new code is added to the editor window
    editor.onDidContentSizeChange(updateHeight);

    // Manually re-update height to account for the content we inserted into the call
    updateHeight();
  });

  // Function to execute the code (accepts code as an argument)
  async function executeCode(codeToRun) {
    // Disable run button for code cell active
    runButton.disabled = true;

    // Create a canvas variable for graphics
    let canvas = undefined;

    // Run everything inside try so the run button is re-enabled even if
    // initialization or evaluation of the code throws
    try {

      // Initialize webR
      await globalThis.webR.init();

      // Setup a webR canvas by making a namespace call into the {webr} package
      await webR.evalRVoid("webr::canvas(width=504, height=360)");

      // Capture output data from evaluating the code
      const result = await webRCodeShelter.captureR(codeToRun, {
        withAutoprint: true,
        captureStreams: true,
        captureConditions: false
        // env: webR.objs.emptyEnv, // maintain a global environment for webR v0.2.0
      });

      // Stop creating images
      await webR.evalRVoid("dev.off()");

      // Merge output streams of STDOUT and STDErr (messages and errors are combined.)
      const out = result.output.filter(
        evt => evt.type == "stdout" || evt.type == "stderr"
      ).map((evt) => evt.data).join("\n");

      // Clean the state
      const msgs = await webR.flush();

      // Output each image stored
      msgs.forEach(msg => {
        // Determine if old canvas can be used or a new canvas is required.
        if (msg.type === 'canvas'){
          // Add image to the current canvas
          if (msg.data.event === 'canvasImage') {
            canvas.getContext('2d').drawImage(msg.data.image, 0, 0);
          } else if (msg.data.event === 'canvasNewPage') {
            // Generate a new canvas element
            canvas = document.createElement("canvas");
            canvas.setAttribute("width", 2 * 504);
            canvas.setAttribute("height", 2 * 360);
            canvas.style.width = "700px";
            canvas.style.display = "block";
            canvas.style.margin = "auto";
          }
        }
      });

      // Nullify the outputDiv of content
      outputDiv.innerHTML = "";

      // Design an output object for messages
      const pre = document.createElement("pre");
      if (/\S/.test(out)) {
        // Display results as text
        const code = document.createElement("code");
        code.innerText = out;
        pre.appendChild(code);
      } else {
        // If nothing is present, hide the element.
        pre.style.visibility = "hidden";
      }
      outputDiv.appendChild(pre);

      // Place the graphics on the canvas
      if (canvas) {
        const p = document.createElement("p");
        p.appendChild(canvas);
        outputDiv.appendChild(p);
      }
    } finally {
      // Clean up the remaining code
      webRCodeShelter.purge();
      runButton.disabled = false;
    }
  }

  // Add a click event listener to the run button
  runButton.onclick = function () {
    executeCode(editor.getValue());
  };
</script>
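<p>For comparison, here is a minimal sketch that overlays the same density estimate on a histogram. The bin count (<code>breaks = 20</code>) and the plot title are arbitrary choices for illustration, not part of the cell above. Changing the bin count reshapes the bars, while the density curve does not depend on it:</p>
<pre><code># Kernel density estimate with R's default bandwidth, as in the cell above
dens &lt;- density(faithful$eruptions)

# Histogram on the density scale so the bars and the curve share a y-axis
hist(faithful$eruptions,
     breaks = 20,                 # arbitrary bin count, chosen for illustration
     freq = FALSE,
     col = "#1b6b4a40", border = "#1b6b4a",
     main = "Histogram vs. density estimate",
     xlab = "Duration (minutes)")

# Draw the density estimate on top of the histogram
lines(dens, col = "#c27839", lwd = 2)</code></pre>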
</section>
<section id="the-central-limit-theorem-interactively" class="level2">
<h2 class="anchored" data-anchor-id="the-central-limit-theorem-interactively">The central limit theorem, interactively</h2>
<p>Draw <code>n_obs</code> samples from an exponential distribution (right-skewed, decidedly non-normal), compute the sample mean, repeat 2000 times. The orange curve is the normal approximation.</p>
<p>Set <code>n_obs &lt;- 1</code>, then <code>n_obs &lt;- 5</code>, then <code>n_obs &lt;- 30</code>. As <code>n_obs</code> grows, the distribution of sample means loses its right skew and approaches a normal distribution.</p>
<button class="btn btn-default btn-webr" disabled="" type="button" id="webr-run-button-3">Loading
  webR...</button>
<div id="webr-editor-3"></div>
<div id="webr-code-output-3" aria-live="assertive">
  <pre style="visibility: hidden"></pre>
</div>
<script type="module">
  // Retrieve webR code cell information
  const runButton = document.getElementById("webr-run-button-3");
  const outputDiv = document.getElementById("webr-code-output-3");
  const editorDiv = document.getElementById("webr-editor-3");

  // Add a light grey outline around the code editor
  editorDiv.style.border = "1px solid #eee";

  // Load the Monaco Editor and create an instance
  let editor;
  require(['vs/editor/editor.main'], function () {
    editor = monaco.editor.create(editorDiv, {
      value: `n_obs <- 5  # try 1, 5, 10, 30

sample_means <- replicate(2000, mean(rexp(n_obs, rate = 0.5)))

hist(sample_means,
     breaks = 60,
     col = "#1b6b4a40",
     border = "#1b6b4a",
     main = paste("n =", n_obs, "— sample means from exponential population"),
     xlab = "Sample mean",
     prob = TRUE)

# Normal approximation
x <- seq(min(sample_means), max(sample_means), length.out = 300)
lines(x, dnorm(x, mean(sample_means), sd(sample_means)),
      col = "#c27839", lwd = 2)`,
      language: 'r',
      theme: 'vs-light',
      automaticLayout: true,           // TODO: Could be problematic for slide decks
      scrollBeyondLastLine: false,
      minimap: {
        enabled: false
      },
      fontSize: 17.5,                  // In pixels; Monaco expects a number here
      renderLineHighlight: "none",     // Disable current line highlighting
      hideCursorInOverviewRuler: true  // Remove cursor indicator in right hand side scroll bar
    });

    // Dynamically modify the height of the editor window if new lines are added.
    let ignoreEvent = false;
    const updateHeight = () => {
      const contentHeight = editor.getContentHeight();
      // We're avoiding a width change
      //editorDiv.style.width = `${width}px`;
      editorDiv.style.height = `${contentHeight}px`;
      try {
        ignoreEvent = true;

        // The key to resizing is this call
        editor.layout();
      } finally {
        ignoreEvent = false;
      }
    };

    // Helper function to check if selected text is empty
    function isEmptyCodeText(selectedCodeText) {
      return (selectedCodeText === null || selectedCodeText === undefined || selectedCodeText === "");
    }

    // Registry of keyboard shortcuts that should be re-added to each editor window
    // when focus changes.
    const addWebRKeyboardShortCutCommands = () => {
      // Add a keydown event listener for Shift+Enter to run all code in cell
      editor.addCommand(monaco.KeyMod.Shift | monaco.KeyCode.Enter, () => {

        // Retrieve all text inside the editor
        executeCode(editor.getValue());
      });

      // Add a keydown event listener for CMD/Ctrl+Enter to run selected code
      editor.addCommand(monaco.KeyMod.CtrlCmd | monaco.KeyCode.Enter, () => {

        // Get the selected text from the editor
        const selectedText = editor.getModel().getValueInRange(editor.getSelection());
        // Check if no code is selected
        if (isEmptyCodeText(selectedText)) {
          // Obtain the current cursor position
          let currentPosition = editor.getPosition();
          // Retrieve the current line content
          let currentLine = editor.getModel().getLineContent(currentPosition.lineNumber);

          // Propose a new position to move the cursor to
          let newPosition = new monaco.Position(currentPosition.lineNumber + 1, 1);

          // Check if the new position is beyond the last line of the editor
          if (newPosition.lineNumber > editor.getModel().getLineCount()) {
            // Add a new line at the end of the editor
            editor.executeEdits("addNewLine", [{
            range: new monaco.Range(newPosition.lineNumber, 1, newPosition.lineNumber, 1),
            text: "\n", 
            forceMoveMarkers: true,
            }]);
          }
          
          // Run the entire line of code.
          executeCode(currentLine);

          // Move cursor to new position
          editor.setPosition(newPosition);
        } else {
          // Code to run when Ctrl+Enter is pressed with selected code
          executeCode(selectedText);
        }
      });
    }

    // Register an on focus event handler for when a code cell is selected to update
    // what keyboard shortcut commands should work.
    // This is a workaround to fix a regression that happened with multiple
    // editor windows since Monaco 0.32.0 
    // https://github.com/microsoft/monaco-editor/issues/2947
    editor.onDidFocusEditorText(addWebRKeyboardShortCutCommands);

    // Register an on change event for when new code is added to the editor window
    editor.onDidContentSizeChange(updateHeight);

    // Manually re-update height to account for the content we inserted into the call
    updateHeight();
  });

  // Function to execute the code (accepts code as an argument)
  async function executeCode(codeToRun) {
    // Disable run button for code cell active
    runButton.disabled = true;

    // Create a canvas variable for graphics
    let canvas = undefined;

    // Run everything inside try so the run button is re-enabled even if
    // initialization or evaluation of the code throws
    try {

      // Initialize webR
      await globalThis.webR.init();

      // Setup a webR canvas by making a namespace call into the {webr} package
      await webR.evalRVoid("webr::canvas(width=504, height=360)");

      // Capture output data from evaluating the code
      const result = await webRCodeShelter.captureR(codeToRun, {
        withAutoprint: true,
        captureStreams: true,
        captureConditions: false
        // env: webR.objs.emptyEnv, // maintain a global environment for webR v0.2.0
      });

      // Stop creating images
      await webR.evalRVoid("dev.off()");

      // Merge output streams of STDOUT and STDErr (messages and errors are combined.)
      const out = result.output.filter(
        evt => evt.type == "stdout" || evt.type == "stderr"
      ).map((evt) => evt.data).join("\n");

      // Clean the state
      const msgs = await webR.flush();

      // Output each image stored
      msgs.forEach(msg => {
        // Determine if old canvas can be used or a new canvas is required.
        if (msg.type === 'canvas'){
          // Add image to the current canvas
          if (msg.data.event === 'canvasImage') {
            canvas.getContext('2d').drawImage(msg.data.image, 0, 0);
          } else if (msg.data.event === 'canvasNewPage') {
            // Generate a new canvas element
            canvas = document.createElement("canvas");
            canvas.setAttribute("width", 2 * 504);
            canvas.setAttribute("height", 2 * 360);
            canvas.style.width = "700px";
            canvas.style.display = "block";
            canvas.style.margin = "auto";
          }
        }
      });

      // Nullify the outputDiv of content
      outputDiv.innerHTML = "";

      // Design an output object for messages
      const pre = document.createElement("pre");
      if (/\S/.test(out)) {
        // Display results as text
        const code = document.createElement("code");
        code.innerText = out;
        pre.appendChild(code);
      } else {
        // If nothing is present, hide the element.
        pre.style.visibility = "hidden";
      }
      outputDiv.appendChild(pre);

      // Place the graphics on the canvas
      if (canvas) {
        const p = document.createElement("p");
        p.appendChild(canvas);
        outputDiv.appendChild(p);
      }
    } finally {
      // Clean up the remaining code
      webRCodeShelter.purge();
      runButton.disabled = false;
    }
  }

  // Add a click event listener to the run button
  runButton.onclick = function () {
    executeCode(editor.getValue());
  };
</script>
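<p>The normal approximation can also be checked against theory. An exponential distribution with rate λ has mean 1/λ and standard deviation 1/λ, so the mean of <code>n_obs</code> independent draws has mean 1/λ and standard deviation 1/(λ·sqrt(<code>n_obs</code>)). A minimal sketch, assuming the same <code>rate = 0.5</code> as the cell above, with a fixed seed added here only for reproducibility:</p>
<pre><code>rate  &lt;- 0.5   # same rate as the simulation above
n_obs &lt;- 30

set.seed(1)    # assumption: fixed seed so reruns give the same numbers
sample_means &lt;- replicate(2000, mean(rexp(n_obs, rate = rate)))

# Theoretical mean and standard deviation of the sample mean
theory_mean &lt;- 1 / rate
theory_sd   &lt;- (1 / rate) / sqrt(n_obs)

# Compare theory with the simulated sample means
rbind(theory    = c(mean = theory_mean, sd = theory_sd),
      simulated = c(mean = mean(sample_means), sd = sd(sample_means)))</code></pre>
<p>The simulated row should land close to the theoretical one, and the standard deviation shrinks as <code>n_obs</code> increases.</p>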
</section>
<section id="ggplot2" class="level2">
<h2 class="anchored" data-anchor-id="ggplot2">ggplot2</h2>
<p><code>ggplot2</code> loaded in the background while you were reading. Same <code>faithful</code> data, with 2D density contours:</p>
<button class="btn btn-default btn-webr" disabled="" type="button" id="webr-run-button-4">Loading
  webR...</button>
<div id="webr-editor-4"></div>
<div id="webr-code-output-4" aria-live="assertive">
  <pre style="visibility: hidden"></pre>
</div>
<script type="module">
  // Retrieve webR code cell information
  const runButton = document.getElementById("webr-run-button-4");
  const outputDiv = document.getElementById("webr-code-output-4");
  const editorDiv = document.getElementById("webr-editor-4");

  // Add a light grey outline around the code editor
  editorDiv.style.border = "1px solid #eee";

  // Load the Monaco Editor and create an instance
  let editor;
  require(['vs/editor/editor.main'], function () {
    editor = monaco.editor.create(editorDiv, {
      value: `library(ggplot2)

ggplot(faithful, aes(x = eruptions, y = waiting)) +
  geom_point(alpha = 0.4, color = "#1b6b4a", size = 2) +
  geom_density_2d(color = "#c27839", linewidth = 0.6) +
  theme_minimal(base_size = 13) +
  labs(
    title = "Old Faithful: eruption duration vs. waiting time",
    x = "Eruption duration (min)",
    y = "Waiting time (min)"
  )`,
      language: 'r',
      theme: 'vs-light',
      automaticLayout: true,           // TODO: Could be problematic for slide decks
      scrollBeyondLastLine: false,
      minimap: {
        enabled: false
      },
      fontSize: 17.5,                  // In pixels; Monaco expects a number here
      renderLineHighlight: "none",     // Disable current line highlighting
      hideCursorInOverviewRuler: true  // Remove cursor indicator in right hand side scroll bar
    });

    // Dynamically modify the height of the editor window if new lines are added.
    let ignoreEvent = false;
    const updateHeight = () => {
      const contentHeight = editor.getContentHeight();
      // We're avoiding a width change
      //editorDiv.style.width = `${width}px`;
      editorDiv.style.height = `${contentHeight}px`;
      try {
        ignoreEvent = true;

        // The key to resizing is this call
        editor.layout();
      } finally {
        ignoreEvent = false;
      }
    };

    // Helper function to check if selected text is empty
    function isEmptyCodeText(selectedCodeText) {
      return (selectedCodeText === null || selectedCodeText === undefined || selectedCodeText === "");
    }

    // Registry of keyboard shortcuts that should be re-added to each editor window
    // when focus changes.
    const addWebRKeyboardShortCutCommands = () => {
      // Add a keydown event listener for Shift+Enter to run all code in cell
      editor.addCommand(monaco.KeyMod.Shift | monaco.KeyCode.Enter, () => {

        // Retrieve all text inside the editor
        executeCode(editor.getValue());
      });

      // Add a keydown event listener for CMD/Ctrl+Enter to run selected code
      editor.addCommand(monaco.KeyMod.CtrlCmd | monaco.KeyCode.Enter, () => {

        // Get the selected text from the editor
        const selectedText = editor.getModel().getValueInRange(editor.getSelection());
        // Check if no code is selected
        if (isEmptyCodeText(selectedText)) {
          // Obtain the current cursor position
          let currentPosition = editor.getPosition();
          // Retrieve the current line content
          let currentLine = editor.getModel().getLineContent(currentPosition.lineNumber);

          // Propose a new position to move the cursor to
          let newPosition = new monaco.Position(currentPosition.lineNumber + 1, 1);

          // Check if the new position is beyond the last line of the editor
          if (newPosition.lineNumber > editor.getModel().getLineCount()) {
            // Add a new line at the end of the editor
            editor.executeEdits("addNewLine", [{
            range: new monaco.Range(newPosition.lineNumber, 1, newPosition.lineNumber, 1),
            text: "\n", 
            forceMoveMarkers: true,
            }]);
          }
          
          // Run the entire line of code.
          executeCode(currentLine);

          // Move cursor to new position
          editor.setPosition(newPosition);
        } else {
          // Code to run when Ctrl+Enter is pressed with selected code
          executeCode(selectedText);
        }
      });
    }

    // Register an on focus event handler for when a code cell is selected to update
    // what keyboard shortcut commands should work.
    // This is a workaround to fix a regression that happened with multiple
    // editor windows since Monaco 0.32.0 
    // https://github.com/microsoft/monaco-editor/issues/2947
    editor.onDidFocusEditorText(addWebRKeyboardShortCutCommands);

    // Register an on change event for when new code is added to the editor window
    editor.onDidContentSizeChange(updateHeight);

    // Manually re-update height to account for the content we inserted into the call
    updateHeight();
  });

  // Function to execute the code (accepts code as an argument)
  async function executeCode(codeToRun) {
    // Disable run button for code cell active
    runButton.disabled = true;

    // Create a canvas variable for graphics
    let canvas = undefined;

    // Run everything inside try so the run button is re-enabled even if
    // initialization or evaluation of the code throws
    try {

      // Initialize webR
      await globalThis.webR.init();

      // Setup a webR canvas by making a namespace call into the {webr} package
      await webR.evalRVoid("webr::canvas(width=504, height=360)");

      // Capture output data from evaluating the code
      const result = await webRCodeShelter.captureR(codeToRun, {
        withAutoprint: true,
        captureStreams: true,
        captureConditions: false
        // env: webR.objs.emptyEnv, // maintain a global environment for webR v0.2.0
      });

      // Stop creating images
      await webR.evalRVoid("dev.off()");

      // Merge output streams of STDOUT and STDErr (messages and errors are combined.)
      const out = result.output.filter(
        evt => evt.type == "stdout" || evt.type == "stderr"
      ).map((evt) => evt.data).join("\n");

      // Clean the state
      const msgs = await webR.flush();

      // Output each image stored
      msgs.forEach(msg => {
        // Determine if old canvas can be used or a new canvas is required.
        if (msg.type === 'canvas'){
          // Add image to the current canvas
          if (msg.data.event === 'canvasImage') {
            canvas.getContext('2d').drawImage(msg.data.image, 0, 0);
          } else if (msg.data.event === 'canvasNewPage') {
            // Generate a new canvas element
            canvas = document.createElement("canvas");
            canvas.setAttribute("width", 2 * 504);
            canvas.setAttribute("height", 2 * 360);
            canvas.style.width = "700px";
            canvas.style.display = "block";
            canvas.style.margin = "auto";
          }
        }
      });

      // Nullify the outputDiv of content
      outputDiv.innerHTML = "";

      // Design an output object for messages
      const pre = document.createElement("pre");
      if (/\S/.test(out)) {
        // Display results as text
        const code = document.createElement("code");
        code.innerText = out;
        pre.appendChild(code);
      } else {
        // If nothing is present, hide the element.
        pre.style.visibility = "hidden";
      }
      outputDiv.appendChild(pre);

      // Place the graphics on the canvas
      if (canvas) {
        const p = document.createElement("p");
        p.appendChild(canvas);
        outputDiv.appendChild(p);
      }
    } finally {
      // Clean up the remaining code
      webRCodeShelter.purge();
      runButton.disabled = false;
    }
  }

  // Add a click event listener to the run button
  runButton.onclick = function () {
    executeCode(editor.getValue());
  };
</script>
<p>The two clusters are real. Old Faithful has two modes: short eruptions (~2 min) with short waits (~55 min), and long eruptions (~4.5 min) with long waits (~80 min).</p>
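<p>One rough way to put numbers on the two modes is to split the data at a fixed eruption duration and summarize each group. This is a sketch: the 3-minute cutoff is an assumption chosen by eye from the gap in the plots, not a fitted boundary, and group means will differ somewhat from the modal values quoted above:</p>
<pre><code># Label each eruption as short or long using an assumed 3-minute cutoff
cluster &lt;- ifelse(faithful$eruptions &lt; 3, "short", "long")

# How many eruptions fall in each group
table(cluster)

# Mean eruption duration and mean waiting time per group
tapply(faithful$eruptions, cluster, mean)
tapply(faithful$waiting, cluster, mean)</code></pre>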
</section>
<section id="installing-packages-from-cran" class="level2">
<h2 class="anchored" data-anchor-id="installing-packages-from-cran">Installing packages from CRAN</h2>
<p>Any package with a WASM binary can be pre-loaded via the <code>packages</code> key in the document frontmatter. Both <code>ggplot2</code> and <code>palmerpenguins</code> are loaded that way here, which is why this cell runs without an <code>install.packages()</code> call:</p>
<button class="btn btn-default btn-webr" disabled="" type="button" id="webr-run-button-5">Loading
  webR...</button>
<div id="webr-editor-5"></div>
<div id="webr-code-output-5" aria-live="assertive">
  <pre style="visibility: hidden"></pre>
</div>
<script type="module">
  // Retrieve webR code cell information
  const runButton = document.getElementById("webr-run-button-5");
  const outputDiv = document.getElementById("webr-code-output-5");
  const editorDiv = document.getElementById("webr-editor-5");

  // Add a light grey outline around the code editor
  editorDiv.style.border = "1px solid #eee";

  // Load the Monaco Editor and create an instance
  let editor;
  require(['vs/editor/editor.main'], function () {
    editor = monaco.editor.create(editorDiv, {
      value: `library(palmerpenguins)

ggplot(penguins, aes(x = bill_length_mm, y = flipper_length_mm,
                     color = species)) +
  geom_point(alpha = 0.7, size = 2) +
  scale_color_manual(values = c("#1b6b4a", "#c27839", "#5856d6")) +
  theme_minimal(base_size = 13) +
  labs(title = "Palmer penguins",
       x = "Bill length (mm)", y = "Flipper length (mm)")`,
      language: 'r',
      theme: 'vs-light',
      automaticLayout: true,           // TODO: Could be problematic for slide decks
      scrollBeyondLastLine: false,
      minimap: {
        enabled: false
      },
      fontSize: 17.5,                  // In pixels; Monaco expects a number here
      renderLineHighlight: "none",     // Disable current line highlighting
      hideCursorInOverviewRuler: true  // Remove cursor indicator in right hand side scroll bar
    });

    // Dynamically modify the height of the editor window if new lines are added.
    let ignoreEvent = false;
    const updateHeight = () => {
      const contentHeight = editor.getContentHeight();
      // We're avoiding a width change
      //editorDiv.style.width = `${width}px`;
      editorDiv.style.height = `${contentHeight}px`;
      try {
        ignoreEvent = true;

        // The key to resizing is this call
        editor.layout();
      } finally {
        ignoreEvent = false;
      }
    };

    // Helper function to check if selected text is empty
    function isEmptyCodeText(selectedCodeText) {
      return (selectedCodeText === null || selectedCodeText === undefined || selectedCodeText === "");
    }

    // Registry of keyboard shortcuts that should be re-added to each editor window
    // when focus changes.
    const addWebRKeyboardShortCutCommands = () => {
      // Add a keydown event listener for Shift+Enter to run all code in cell
      editor.addCommand(monaco.KeyMod.Shift | monaco.KeyCode.Enter, () => {

        // Retrieve all text inside the editor
        executeCode(editor.getValue());
      });

      // Add a keydown event listener for CMD/Ctrl+Enter to run selected code
      editor.addCommand(monaco.KeyMod.CtrlCmd | monaco.KeyCode.Enter, () => {

        // Get the selected text from the editor
        const selectedText = editor.getModel().getValueInRange(editor.getSelection());
        // Check if no code is selected
        if (isEmptyCodeText(selectedText)) {
          // Obtain the current cursor position
          let currentPosition = editor.getPosition();
          // Retrieve the current line content
          let currentLine = editor.getModel().getLineContent(currentPosition.lineNumber);

          // Propose a new position to move the cursor to
          let newPosition = new monaco.Position(currentPosition.lineNumber + 1, 1);

          // Check if the new position is beyond the last line of the editor
          if (newPosition.lineNumber > editor.getModel().getLineCount()) {
            // Add a new line at the end of the editor
            editor.executeEdits("addNewLine", [{
            range: new monaco.Range(newPosition.lineNumber, 1, newPosition.lineNumber, 1),
            text: "\n", 
            forceMoveMarkers: true,
            }]);
          }
          
          // Run the entire line of code.
          executeCode(currentLine);

          // Move cursor to new position
          editor.setPosition(newPosition);
        } else {
          // Code to run when Ctrl+Enter is pressed with selected code
          executeCode(selectedText);
        }
      });
    }

    // Register an on focus event handler for when a code cell is selected to update
    // what keyboard shortcut commands should work.
    // This is a workaround to fix a regression that happened with multiple
    // editor windows since Monaco 0.32.0 
    // https://github.com/microsoft/monaco-editor/issues/2947
    editor.onDidFocusEditorText(addWebRKeyboardShortCutCommands);

    // Register an on change event for when new code is added to the editor window
    editor.onDidContentSizeChange(updateHeight);

    // Manually re-update height to account for the content we inserted into the call
    updateHeight();
  });

  // Function to execute the code (accepts code as an argument)
  async function executeCode(codeToRun) {
    // Disable the run button while this code cell is executing
    runButton.disabled = true;

    // Create a canvas variable for graphics
    let canvas = undefined;

    // Initialize webR
    await globalThis.webR.init();

    // Setup a webR canvas by making a namespace call into the {webr} package
    await webR.evalRVoid("webr::canvas(width=504, height=360)");

    // Capture output data from evaluating the code
    const result = await webRCodeShelter.captureR(codeToRun, {
      withAutoprint: true,
      captureStreams: true,
      captureConditions: false//,
      // env: webR.objs.emptyEnv, // maintain a global environment for webR v0.2.0
    });

    // Start attempting to parse the result data
    try {

      // Stop creating images
      await webR.evalRVoid("dev.off()");

      // Merge the STDOUT and STDERR streams (messages and errors are combined)
      const out = result.output.filter(
        evt => evt.type == "stdout" || evt.type == "stderr"
      ).map((evt) => evt.data).join("\n");

      // Clean the state
      const msgs = await webR.flush();

      // Output each image stored
      msgs.forEach(msg => {
        // Determine if old canvas can be used or a new canvas is required.
        if (msg.type === 'canvas'){
          // Add image to the current canvas
          if (msg.data.event === 'canvasImage') {
            canvas.getContext('2d').drawImage(msg.data.image, 0, 0);
          } else if (msg.data.event === 'canvasNewPage') {
            // Generate a new canvas element
            canvas = document.createElement("canvas");
            canvas.setAttribute("width", 2 * 504);
            canvas.setAttribute("height", 2 * 360);
            canvas.style.width = "700px";
            canvas.style.display = "block";
            canvas.style.margin = "auto";
          }
        }
      });

      // Nullify the outputDiv of content
      outputDiv.innerHTML = "";

      // Design an output object for messages
      const pre = document.createElement("pre");
      if (/\S/.test(out)) {
        // Display results as text
        const code = document.createElement("code");
        code.innerText = out;
        pre.appendChild(code);
      } else {
        // If nothing is present, hide the element.
        pre.style.visibility = "hidden";
      }
      outputDiv.appendChild(pre);

      // Place the graphics on the canvas
      if (canvas) {
        const p = document.createElement("p");
        p.appendChild(canvas);
        outputDiv.appendChild(p);
      }
    } finally {
      // Clean up the remaining code
      webRCodeShelter.purge();
      runButton.disabled = false;
    }
  }

  // Add a click event listener to the run button
  runButton.onclick = function () {
    executeCode(editor.getValue());
  };
</script>
<p>The WASM CRAN mirror covers most of the tidyverse and common packages. Three ways to install:</p>
<ul>
<li><strong>Frontmatter <code>packages</code> key</strong> (used above): packages load before the first cell runs. Best for dependencies your post relies on from the start.</li>
<li><strong><code>webr::install("pkg")</code> in a cell</strong>: installs at runtime inside a code cell. Works for packages you want readers to load on demand.</li>
<li><strong><code>install.packages()</code></strong>: works after calling <code>webr::shim_install()</code>, which patches the function to use the WASM mirror. Useful if you want familiar CRAN-style syntax.</li>
</ul>
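<p>As a rough sketch, the two runtime options look like this inside a cell. It only runs in a webR session, and <code>palmerpenguins</code> is just an example package name; substitute whatever your post needs.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r"># Runtime install: fetch the WASM binary from inside a cell
webr::install("palmerpenguins")
library(palmerpenguins)

# CRAN-style syntax: patch install.packages() to use the WASM mirror first
webr::shim_install()
install.packages("palmerpenguins")</code></pre></div>
</div>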
<p>webR can’t compile from source, so packages need a WASM binary. Check <a href="https://repo.r-wasm.org/">repo.r-wasm.org</a> to see what’s available.</p>
</section>
<section id="where-webr-fits" class="level2">
<h2 class="anchored" data-anchor-id="where-webr-fits">Where webR fits</h2>
<p>The best case is when the code itself is what you’re teaching. Parameter exploration earns the startup time: the CLT cell above lets readers change <code>n_obs</code> from 1 to 30 and watch the histogram shift. That experience is the lesson in a way that a static image isn’t.</p>
<p>Algorithm exploration is another good fit. Code that behaves differently on different inputs — recursive sequences, graph traversals, sorting algorithms — invites readers to poke at it. The Collatz sequence starting at 27 takes 111 steps. Starting at 837799 takes 524. Readers can find out without leaving the page.</p>
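<p>A minimal sketch of the kind of cell this describes, assuming a simple step counter (the name <code>collatz_steps()</code> is illustrative, not from a package):</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r"># Count how many Collatz steps it takes for n to reach 1
collatz_steps &lt;- function(n) {
  steps &lt;- 0
  while (n != 1) {
    # Halve even numbers; map odd numbers to 3n + 1
    n &lt;- if (n %% 2 == 0) n / 2 else 3 * n + 1
    steps &lt;- steps + 1
  }
  steps
}

collatz_steps(27)      # 111
collatz_steps(837799)  # 524</code></pre></div>
</div>
<p>Readers can swap in any starting value and rerun the cell.</p>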
<p>Simulation-based statistical intuition also works well. Bootstrap confidence intervals, permutation tests, birthday problem probabilities: hard to explain with formulas, obvious after running 2000 simulations. WebR is fast enough for this kind of thing that the wait doesn’t become the story.</p>
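<p>For instance, a percentile bootstrap interval for a mean fits in a few lines of base R. This is a sketch on made-up data; the sample size, the distribution, and the 2000 resamples are all arbitrary choices.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r">set.seed(42)

# Made-up skewed sample; replace with your own data
x &lt;- rexp(50, rate = 1)

# Resample with replacement 2000 times, recording the mean each time
boot_means &lt;- replicate(2000, mean(sample(x, replace = TRUE)))

# Percentile 95% confidence interval for the mean
quantile(boot_means, c(0.025, 0.975))</code></pre></div>
</div>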
</section>
<section id="when-to-skip-it" class="level2">
<h2 class="anchored" data-anchor-id="when-to-skip-it">When to skip it</h2>
<p>WebR adds 5–10 seconds of startup and loads the full R runtime into the browser. That cost is worth it only if readers will actually interact with the code.</p>
<p>Skip it when the post is about results. If you’re presenting a plot and explaining what it means, render it statically. The reader gets the information faster, the page works without JavaScript, and there’s nothing to wait for.</p>
<p>Skip it for large data. Everything runs in the browser with no server behind it, which means everything loads into browser memory. A 50 MB CSV will be slow on a laptop and painful on mobile. If you need real data, use a small representative sample or a synthetic dataset.</p>
<p>Skip it for heavy computation. WebR runs in a single browser thread without the optimized BLAS/LAPACK libraries that native R uses. A simulation that takes one second in R might take ten in webR. The CLT demo above works fine. Fitting a random forest on 100,000 rows does not.</p>
<p>Skip it for benchmarking. WebR timing results don’t represent native R performance. If your point involves timing, either note this explicitly or skip webR for that post.</p>


</section>

 ]]></description>
  <category>R</category>
  <category>WebAssembly</category>
  <guid>https://jameshwade.com/posts/2023-08-13_webr.html</guid>
  <pubDate>Sun, 13 Aug 2023 00:00:00 GMT</pubDate>
  <media:content url="https://avatars.githubusercontent.com/u/112946187?s=400&amp;v=4" medium="image"/>
</item>
<item>
  <title>Teaching ChatGPT What It Doesn’t Know</title>
  <dc:creator>James H Wade</dc:creator>
  <link>https://jameshwade.com/posts/2023-03-10_vectorstores.html</link>
  <description><![CDATA[ 





<p>Large language models like GPT-3 and ChatGPT don’t need much of an introduction. They are enormously powerful, with benchmarks being surpassed <a href="https://arxiv.org/pdf/2104.14337.pdf">nearly as quickly as they are created</a>. Despite this unprecedented performance, these models struggle to give accurate answers when the appropriate response requires context more recent than their training data. Vector databases built from data sources outside the training corpus can address this gap by supplying the missing context to the model.</p>
<p>Vector databases can be used in semantic search with <strong>embeddings</strong> created from source text. By creating embeddings of text data and storing them in a database, we can quickly search for related documents and even perform advanced operations like similarity searches or clustering. This can be especially helpful when working with text data that is more context-dependent or domain-specific, such as scientific or technical documentation.</p>
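<p>To make the idea concrete, here is a toy sketch of semantic search over a data frame. The three-dimensional vectors are invented for illustration (real embeddings have hundreds or thousands of dimensions), and <code>cosine_similarity()</code> is a hypothetical helper, not a <code>{gpttools}</code> function.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r"># Toy "vector store": one row per text chunk, with a list column of embeddings
docs &lt;- data.frame(
  text = c("filter rows with dplyr", "plot with ggplot2", "bake bread")
)
docs$embedding &lt;- list(
  c(0.9, 0.1, 0.0),
  c(0.2, 0.9, 0.1),
  c(0.0, 0.1, 0.9)
)

# Cosine similarity: dot product divided by the product of the vector norms
cosine_similarity &lt;- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Invented embedding for the user's query
query &lt;- c(0.8, 0.2, 0.1)

# Score every chunk against the query and rank the most similar first
docs$similarity &lt;- vapply(docs$embedding, cosine_similarity, numeric(1), b = query)
docs[order(docs$similarity, decreasing = TRUE), c("text", "similarity")]</code></pre></div>
</div>
<p>With these made-up vectors the dplyr chunk ranks first. In a real pipeline the top-ranked chunks would be passed to the model as context.</p>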
<p><code>{gpttools}</code> provides a set of tools for working with GPT-3 and other OpenAI models, including the ability to generate embeddings, perform similarity searches, and build vector databases. This package also has convenience functions to aid in scraping web pages to collect text data, generate embeddings, and store those embeddings in a vector database for future use.</p>
<p>To demonstrate the power of vector databases, we’ll use <code>{gpttools}</code> to build a vector database from <a href="https://r4ds.hadley.nz/"><em>R for Data Science</em></a>. The approach uses semantic search to find the most relevant text from the book and then uses ChatGPT to generate a response based on that text via the recently released ChatGPT API.</p>
<p>Popular Python packages such as <a href="https://gpt-index.readthedocs.io/en/latest/index.html"><code>llama-index</code></a> and <a href="https://langchain.readthedocs.io/en/latest/index.html"><code>langchain</code></a> provide utility functions that create vector stores for semantic search in a few lines of Python code. <code>{gpttools}</code> aims to provide similar functionality in R, using data frames as the data structure for the vector store.</p>
<section id="scraping-text-from-r4ds" class="level2">
<h2 class="anchored" data-anchor-id="scraping-text-from-r4ds">Scraping Text from R4DS</h2>
<p>The first step is to scrape the text of the R4DS book with the <code>crawl()</code> function, which stores the result in a data frame. <code>crawl()</code> uses the <code>{rvest}</code> package to scrape the text from the online book and <code>{tokenizers}</code> to split the text into chunks for subsequent processing.</p>
<p>The code to scrape the data is relatively simple but is unlikely to work on all sites. In some internal testing, it worked quite well on <code>{pkgdown}</code> and similar documentation sites.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(gpttools)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">crawl</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"https://r4ds.hadley.nz/"</span>)</span></code></pre></div></div>
</div>
<p>Under the hood there are a few things going on. Here is the annotated function and associated functions:</p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-1-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-1" aria-controls="tabset-1-1" aria-selected="true" href="">crawl</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-2" aria-controls="tabset-1-2" aria-selected="false" href="">recursive_hyperlinks</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-3-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-3" aria-controls="tabset-1-3" aria-selected="false" href="">get_hyperlinks</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-4-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-4" aria-controls="tabset-1-4" aria-selected="false" href="">scrape_url</a></li></ul>
<div class="tab-content">
<div id="tabset-1-1" class="tab-pane active" aria-labelledby="tabset-1-1-tab">
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1">crawl <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(url,</span>
<span id="cb2-2">                  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">index_create =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>,</span>
<span id="cb2-3">                  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">aggressive =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>,</span>
<span id="cb2-4">                  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">overwrite =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>,</span>
<span id="cb2-5">                  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">num_cores =</span> parallel<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">detectCores</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) {</span>
<span id="cb2-6">  local_domain <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> urltools<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">url_parse</span>(url)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>domain</span>
<span id="cb2-7">  withr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">local_options</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb2-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cli.progress_show_after =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,</span>
<span id="cb2-9">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cli.progress_clear =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span></span>
<span id="cb2-10">  ))</span>
<span id="cb2-11">  future<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plan</span>(future<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>multisession, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">workers =</span> num_cores)</span>
<span id="cb2-12">  scraped_data_dir <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb2-13">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">file.path</span>(tools<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">R_user_dir</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gpttools"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">which =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data"</span>), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"text"</span>)</span>
<span id="cb2-14">  scraped_text_file <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb2-15">    glue<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glue</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"{scraped_data_dir}/{local_domain}.parquet"</span>)</span>
<span id="cb2-16"></span>
<span id="cb2-17">  <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">file.exists</span>(scraped_text_file) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;&amp;</span> rlang<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is_false</span>(overwrite)) {</span>
<span id="cb2-18">    cli<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cli_abort</span>(</span>
<span id="cb2-19">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb2-20">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"!"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Scraped data already exists for this domain."</span>,</span>
<span id="cb2-21">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"i"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Use {.code crawl(&lt;url&gt;, overwrite = TRUE)} to overwrite."</span></span>
<span id="cb2-22">      )</span>
<span id="cb2-23">    )</span>
<span id="cb2-24">  }</span>
<span id="cb2-25"></span>
<span id="cb2-26">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cli_rule</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Crawling {.url {url}}"</span>)</span>
<span id="cb2-27">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cli_inform</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb2-28">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"i"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"This may take a while."</span>,</span>
<span id="cb2-29">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"i"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Gathering links to scrape"</span></span>
<span id="cb2-30">  ))</span>
<span id="cb2-31">  links <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb2-32">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">recursive_hyperlinks</span>(local_domain, url, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">aggressive =</span> aggressive) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-33">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unique</span>()</span>
<span id="cb2-34">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cli_inform</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"i"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Scraping validated links"</span>))</span>
<span id="cb2-35">  scraped_data <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb2-36">    purrr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map</span>(links, \(x) {</span>
<span id="cb2-37">      <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">identical</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">check_url</span>(x), <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span>L)) {</span>
<span id="cb2-38">        tibble<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb2-39">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">link    =</span> x,</span>
<span id="cb2-40">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">text    =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scrape_url</span>(x), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">collapse =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">" "</span>),</span>
<span id="cb2-41">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n_words =</span> tokenizers<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count_words</span>(text)</span>
<span id="cb2-42">        )</span>
<span id="cb2-43">      } <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> {</span>
<span id="cb2-44">        cli<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cli_inform</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb2-45">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"!"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Skipped {url}"</span>,</span>
<span id="cb2-46">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"i"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Status code: {status}"</span></span>
<span id="cb2-47">        ))</span>
<span id="cb2-48">      }</span>
<span id="cb2-49">    }) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-50">    dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">bind_rows</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-51">    dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">distinct</span>()</span>
<span id="cb2-52">  arrow<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">write_parquet</span>(</span>
<span id="cb2-53">    scraped_data,</span>
<span id="cb2-54">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glue</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"text/{local_domain}.parquet"</span>)</span>
<span id="cb2-55">  )</span>
<span id="cb2-56">}</span></code></pre></div></div>
</div>
</div>
<div id="tabset-1-2" class="tab-pane" aria-labelledby="tabset-1-2-tab">
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">recursive_hyperlinks <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(local_domain,</span>
<span id="cb3-2">                                 url,</span>
<span id="cb3-3">                                 <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">checked_urls =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span>,</span>
<span id="cb3-4">                                 <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">aggressive =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>) {</span>
<span id="cb3-5">  links <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> url[<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span>(url <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%in%</span> checked_urls)]</span>
<span id="cb3-6">  <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">length</span>(links) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) {</span>
<span id="cb3-7">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">return</span>(checked_urls)</span>
<span id="cb3-8">  }</span>
<span id="cb3-9"></span>
<span id="cb3-10">  <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (aggressive) {</span>
<span id="cb3-11">    domain_pattern <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glue</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"^https?://(?:.*</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\\</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">.)?{local_domain}/?"</span>)</span>
<span id="cb3-12">  } <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> {</span>
<span id="cb3-13">    domain_pattern <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glue</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"^https?://{local_domain}/?"</span>)</span>
<span id="cb3-14">  }</span>
<span id="cb3-15"></span>
<span id="cb3-16">  checked_urls <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(checked_urls, links)</span>
<span id="cb3-17">  cli<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cli_inform</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"i"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Total urls: {length(checked_urls)}"</span>))</span>
<span id="cb3-18">  links_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> furrr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">future_map</span>(links, get_hyperlinks) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-19">    dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">bind_rows</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-20">    dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span>stringr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_detect</span>(link, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"^</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\\</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">.$|mailto:|^</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\\</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">.</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\\</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">.|</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\\</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">#|^</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\\</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">_$"</span>))</span>
<span id="cb3-21"></span>
<span id="cb3-22">  new_links <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb3-23">    purrr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pmap</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.list</span>(links_df), \(parent, link) {</span>
<span id="cb3-24">      clean_link <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span></span>
<span id="cb3-25">      <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (stringr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_detect</span>(link, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"^https?://"</span>, local_domain))) {</span>
<span id="cb3-26">        clean_link <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> link</span>
<span id="cb3-27">      } <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (stringr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_detect</span>(link, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"^/[^/]|^/+$|^</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\\</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">./|^[[:alnum:]]"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;&amp;</span></span>
<span id="cb3-28">        <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span>stringr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_detect</span>(link, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"^https?://"</span>)) {</span>
<span id="cb3-29">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (stringr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_detect</span>(link, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"^</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\\</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">./"</span>)) {</span>
<span id="cb3-30">          clean_link <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> stringr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_replace</span>(link, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"^</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\\</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">./"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"/"</span>)</span>
<span id="cb3-31">        } <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (stringr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_detect</span>(link, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"^[[:alnum:]]"</span>)) {</span>
<span id="cb3-32">          clean_link <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> glue<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glue</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"/"</span>, link)</span>
<span id="cb3-33">        } <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> {</span>
<span id="cb3-34">          clean_link <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> link</span>
<span id="cb3-35">        }</span>
<span id="cb3-36">        clean_link <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> glue<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glue</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"{parent}{clean_link}"</span>)</span>
<span id="cb3-37">      }</span>
<span id="cb3-38">    }) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-39">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unlist</span>()</span>
<span id="cb3-40"></span>
<span id="cb3-41">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">recursive_hyperlinks</span>(local_domain, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unique</span>(new_links), checked_urls)</span>
<span id="cb3-42">}</span></code></pre></div></div>
</div>
</div>
<div id="tabset-1-3" class="tab-pane" aria-labelledby="tabset-1-3-tab">
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">get_hyperlinks <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(url) {</span>
<span id="cb4-2">  rlang<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">check_installed</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"rvest"</span>)</span>
<span id="cb4-3">  status <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> httr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">GET</span>(url) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> httr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">status_code</span>()</span>
<span id="cb4-4">  <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">identical</span>(status, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span>L)) {</span>
<span id="cb4-5">    tibble<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb4-6">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">parent =</span> url,</span>
<span id="cb4-7">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">link =</span> rvest<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_html</span>(url) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-8">        rvest<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">html_nodes</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"a[href]"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-9">        rvest<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">html_attr</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"href"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-10">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unique</span>()</span>
<span id="cb4-11">    )</span>
<span id="cb4-12">  } <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> {</span>
<span id="cb4-13">    cli<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cli_warn</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb4-14">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"!"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"URL not valid."</span>,</span>
<span id="cb4-15">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"i"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Tried to scrape {url}"</span>,</span>
<span id="cb4-16">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"i"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Status code: {status}"</span></span>
<span id="cb4-17">    ))</span>
<span id="cb4-18">  }</span>
<span id="cb4-19">}</span></code></pre></div></div>
</div>
</div>
<div id="tabset-1-4" class="tab-pane" aria-labelledby="tabset-1-4-tab">
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">scrape_url <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(url) {</span>
<span id="cb5-2">  rlang<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">check_installed</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"rvest"</span>)</span>
<span id="cb5-3">  exclude_tags <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"style"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"script"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"head"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"meta"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"link"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"button"</span>)</span>
<span id="cb5-4">  text <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> rvest<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_html</span>(url) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-5">    rvest<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">html_nodes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xpath =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"//body//*[not(self::"</span>,</span>
<span id="cb5-6">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste</span>(exclude_tags, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">collapse =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">" or self::"</span>),</span>
<span id="cb5-7">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">")]"</span>,</span>
<span id="cb5-8">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sep =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span></span>
<span id="cb5-9">    )) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-10">    rvest<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">html_text2</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-11">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">remove_new_lines</span>()</span>
<span id="cb5-12">  <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"You need to enable JavaScript to run this app."</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%in%</span> text) {</span>
<span id="cb5-13">    cli<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cli_warn</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Unable to parse page {url}. JavaScript is required."</span>)</span>
<span id="cb5-14">    <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span></span>
<span id="cb5-15">  } <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> {</span>
<span id="cb5-16">    text</span>
<span id="cb5-17">  }</span>
<span id="cb5-18">}</span></code></pre></div></div>
</div>
</div>
</div>
</div>
<p><code>crawl()</code> takes a single argument, <code>url</code>, a character string giving the URL to scrape. It scrapes every hyperlink it finds within the same domain, collects the scraped text in a tibble, and saves that tibble as a parquet file using the <code>{arrow}</code> package. The file goes in a directory called “text”, and its filename includes the local domain of the URL.</p>
<p>The function begins by parsing the input URL with <code>urltools::url_parse()</code> and taking the domain component of the result as the local domain.</p>
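<p>As a minimal sketch of that first step, assuming the <code>{urltools}</code> package is installed, the domain can be pulled out of a URL like this. The example URL is only an illustration, not necessarily one the crawler visits.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Illustrative URL (an assumption for this sketch)
url &lt;- "https://r4ds.hadley.nz/data-visualize.html"

# url_parse() returns a data frame with one row per URL and columns
# such as scheme, domain, port, path, parameter, and fragment
parsed &lt;- urltools::url_parse(url)

# The domain column is what the crawler treats as the local domain
local_domain &lt;- parsed$domain
local_domain</code></pre></div>
</div>
<p>For this URL, <code>local_domain</code> should come back as the bare host name, without the scheme or path.</p>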
<p>It then calls <code>recursive_hyperlinks()</code> to recursively collect the hyperlinks reachable from that URL. Links are validated along the way: only URLs that return a status code of <code>200</code> (i.e., the webpage is accessible) are kept.</p>
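<p>To get a feel for the same-domain check, the subdomain-tolerant pattern built inside <code>recursive_hyperlinks()</code> can be tried on a few links. This is only a sketch: it uses base R’s <code>grepl()</code> instead of the package’s own helpers, and the links are made up for illustration.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r">local_domain &lt;- "r4ds.hadley.nz"

# Same shape as the pattern in recursive_hyperlinks():
# an optional subdomain followed by the local domain
domain_pattern &lt;- paste0("^https?://(?:.*\\.)?", local_domain, "/?")

# Hypothetical links of the kind a crawl might encounter
links &lt;- c(
  "https://r4ds.hadley.nz/intro.html",
  "http://www.r4ds.hadley.nz/",
  "https://example.com/r4ds",
  "mailto:someone@example.com"
)

# perl = TRUE so the non-capturing group (?:...) is supported
same_domain &lt;- grepl(domain_pattern, links, perl = TRUE)
links[same_domain]</code></pre></div>
</div>
<p>Only the first two links match; the external site and the <code>mailto:</code> address are dropped.</p>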
<p>Finally, the function loops over the links, scrapes the text from each webpage, and builds a tibble with three columns: <code>link</code> (the URL), <code>text</code> (the scraped text), and <code>n_words</code> (the number of words in the scraped text).</p>
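<p>The shape of that tibble can be sketched with placeholder data. The URLs and text below are hypothetical, and the word-counting rule (splitting on whitespace) is an assumption; the package may count words differently.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Placeholder results; in crawl() the text comes from scrape_url()
links &lt;- c(
  "https://r4ds.hadley.nz/page-one.html",
  "https://r4ds.hadley.nz/page-two.html"
)
text &lt;- c(
  "alpha beta gamma delta",
  "  one two three four five six "
)

scraped &lt;- tibble::tibble(
  link = links,
  text = text,
  # Assumed rule: a word is any run of non-whitespace characters
  n_words = lengths(strsplit(trimws(text), "\\s+"))
)
scraped</code></pre></div>
</div>
<p>Here <code>n_words</code> is 4 for the first row and 6 for the second. A tibble like this can then be passed to <code>arrow::write_parquet()</code>.</p>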
</section>
<section id="generating-embeddings" class="level2">
<h2 class="anchored" data-anchor-id="generating-embeddings">Generating Embeddings</h2>
<p>After scraping the text data from the R4DS book, the next step is to generate embeddings for each chunk of text. This can be done using the <code>create_index()</code> function provided by the <code>{gpttools}</code> package. <code>create_index()</code> takes a <code>domain</code> argument, a character string naming the domain of the scraped data, and an optional <code>overwrite</code> argument (default <code>FALSE</code>) that controls whether an existing index for that domain is replaced.</p>
<p>Here’s the code to generate embeddings for the R4DS text data:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Specify the domain of the scraped data</span></span>
<span id="cb6-2">domain <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"r4ds.hadley.nz"</span></span>
<span id="cb6-3"></span>
<span id="cb6-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create embeddings for each chunk of text in the scraped data</span></span>
<span id="cb6-5">index <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">create_index</span>(domain)</span></code></pre></div></div>
</div>
<p><code>create_index()</code> prepares the scraped data for indexing with <code>prepare_scraped_files()</code>, which splits the text into chunks and calculates the number of tokens in each chunk. It then calls <code>add_embeddings()</code> to generate an embedding for each chunk using the OpenAI API. Because this step consumes API tokens, the function asks for confirmation before it proceeds. The resulting tibble, with embeddings, is stored as a parquet file, again using the <code>{arrow}</code> package.</p>
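<p>The chunking idea can be illustrated with a small base R sketch. <code>chunk_words()</code> is a hypothetical helper written for this example, not the function <code>{gpttools}</code> uses. It splits on whitespace into fixed-size groups of words and does not count tokens.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical helper: split text into chunks of at most `chunk_size` words
chunk_words &lt;- function(text, chunk_size = 100L) {
  words &lt;- strsplit(trimws(text), "\\s+")[[1]]
  # Assign each word to a chunk: words 1..chunk_size go to chunk 1, and so on
  chunk_id &lt;- ceiling(seq_along(words) / chunk_size)
  # Paste the words of each chunk back together into one string
  unname(vapply(split(words, chunk_id), paste, character(1), collapse = " "))
}

chunks &lt;- chunk_words("one two three four five", chunk_size = 2L)
chunks</code></pre></div>
</div>
<p>With a chunk size of two, the five words end up in three chunks, the last holding the single leftover word. In the real pipeline each chunk is then sent to the embeddings endpoint.</p>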
<p>Here’s the code for <code>create_index()</code> and its helper functions:</p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-2-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-1" aria-controls="tabset-2-1" aria-selected="true" href="">create_index</a></li><li class="nav-item"><a class="nav-link" id="tabset-2-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-2" aria-controls="tabset-2-2" aria-selected="false" href="">prepare_scraped_files</a></li><li class="nav-item"><a class="nav-link" id="tabset-2-3-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-3" aria-controls="tabset-2-3" aria-selected="false" href="">add_embeddings</a></li><li class="nav-item"><a class="nav-link" id="tabset-2-4-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-4" aria-controls="tabset-2-4" aria-selected="false" href="">create_openai_embedding</a></li></ul>
<div class="tab-content">
<div id="tabset-2-1" class="tab-pane active" aria-labelledby="tabset-2-1-tab">
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1">create_index <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(domain, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">overwrite =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>) {</span>
<span id="cb7-2">  index_dir <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb7-3">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">file.path</span>(tools<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">R_user_dir</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gpttools"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">which =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data"</span>), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"index"</span>)</span>
<span id="cb7-4">  index_file <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb7-5">    glue<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glue</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"{index_dir}/{domain}.parquet"</span>)</span>
<span id="cb7-6"></span>
<span id="cb7-7">  <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">file.exists</span>(index_file) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;&amp;</span> rlang<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is_false</span>(overwrite)) {</span>
<span id="cb7-8">    cli<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cli_abort</span>(</span>
<span id="cb7-9">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb7-10">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"!"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Index already exists for this domain."</span>,</span>
<span id="cb7-11">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"i"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Use {.code overwrite = TRUE} to overwrite index."</span></span>
<span id="cb7-12">      )</span>
<span id="cb7-13">    )</span>
<span id="cb7-14">  }</span>
<span id="cb7-15">  cli<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cli_inform</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb7-16">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"!"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"You are about to create embeddings for {domain}."</span>,</span>
<span id="cb7-17">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"i"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"This will use many tokens. Only proceed if you understand the cost."</span>,</span>
<span id="cb7-18">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"i"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Read more about embeddings at {.url</span></span>
<span id="cb7-19"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      https://platform.openai.com/docs/guides/embeddings}."</span></span>
<span id="cb7-20">  ))</span>
<span id="cb7-21">  ask_user <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> usethis<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ui_yeah</span>(</span>
<span id="cb7-22">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Would you like to continue with creating embeddings?"</span></span>
<span id="cb7-23">  )</span>
<span id="cb7-24">  <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (rlang<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is_true</span>(ask_user)) {</span>
<span id="cb7-25">    index <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb7-26">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">prepare_scraped_files</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">domain =</span> domain) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb7-27">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_embeddings</span>()</span>
<span id="cb7-28">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (rlang<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is_false</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dir.exists</span>(index_dir))) {</span>
<span id="cb7-29">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dir.create</span>(index_dir, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">recursive =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb7-30">    }</span>
<span id="cb7-31">    arrow<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">write_parquet</span>(</span>
<span id="cb7-32">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x    =</span> index,</span>
<span id="cb7-33">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sink =</span> index_file</span>
<span id="cb7-34">    )</span>
<span id="cb7-35">  } <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> {</span>
<span id="cb7-36">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cli_inform</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"No index was created for {domain}"</span>)</span>
<span id="cb7-37">  }</span>
<span id="cb7-38">}</span></code></pre></div></div>
</div>
</div>
<div id="tabset-2-2" class="tab-pane" aria-labelledby="tabset-2-2-tab">
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">prepare_scraped_files <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(domain) {</span>
<span id="cb8-2">  scraped_dir <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> tools<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">R_user_dir</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gpttools"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">which =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data"</span>)</span>
<span id="cb8-3">  arrow<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_parquet</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glue</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"{scraped_dir}/text/{domain}.parquet"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-4">    dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb8-5">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">chunks =</span> purrr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map</span>(text, \(x) {</span>
<span id="cb8-6">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chunk_with_overlap</span>(x,</span>
<span id="cb8-7">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">chunk_size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">500</span>,</span>
<span id="cb8-8">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">overlap_size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>,</span>
<span id="cb8-9">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">doc_id =</span> domain,</span>
<span id="cb8-10">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lowercase =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>,</span>
<span id="cb8-11">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">strip_punct =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>,</span>
<span id="cb8-12">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">strip_numeric =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>,</span>
<span id="cb8-13">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">stopwords =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span></span>
<span id="cb8-14">        )</span>
<span id="cb8-15">      })</span>
<span id="cb8-16">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-17">    tidyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unnest</span>(chunks) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-18">    tidyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unnest</span>(chunks) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-19">    dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rename</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">original_text =</span> text) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-20">    dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n_tokens =</span> tokenizers<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count_characters</span>(chunks) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%/%</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>)</span>
<span id="cb8-21">}</span></code></pre></div></div>
</div>
</div>
<div id="tabset-2-3" class="tab-pane" aria-labelledby="tabset-2-3-tab">
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1">add_embeddings <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(index) {</span>
<span id="cb9-2">  index <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb9-3">    dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb9-4">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">embeddings =</span> purrr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map</span>(</span>
<span id="cb9-5">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.x =</span> chunks,</span>
<span id="cb9-6">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.f =</span> create_openai_embedding,</span>
<span id="cb9-7">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.progress =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Create Embeddings"</span></span>
<span id="cb9-8">      )</span>
<span id="cb9-9">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb9-10">    tidyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unnest</span>(embeddings)</span>
<span id="cb9-11">}</span></code></pre></div></div>
</div>
</div>
<div id="tabset-2-4" class="tab-pane" aria-labelledby="tabset-2-4-tab">
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1">create_openai_embedding <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb10-2">  <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(input_text,</span>
<span id="cb10-3">           <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"text-embedding-ada-002"</span>,</span>
<span id="cb10-4">           <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">openai_api_key =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Sys.getenv</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"OPENAI_API_KEY"</span>)) {</span>
<span id="cb10-5">    body <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb10-6">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> model,</span>
<span id="cb10-7">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">input =</span> input_text</span>
<span id="cb10-8">    )</span>
<span id="cb10-9">    embedding <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">query_openai_api</span>(body, openai_api_key, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">task =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"embeddings"</span>)</span>
<span id="cb10-10">    embedding<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>usage<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>total_tokens</span>
<span id="cb10-11">    tibble<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb10-12">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">usage =</span> embedding<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>usage<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>total_tokens,</span>
<span id="cb10-13">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">embedding =</span> embedding<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>data<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>embedding</span>
<span id="cb10-14">    )</span>
<span id="cb10-15">  }</span></code></pre></div></div>
</div>
</div>
</div>
</div>
<p>The resulting tibble contains the following columns:</p>
<ul>
<li><code>link</code>: URL of the webpage</li>
<li><code>original_text</code>: scraped text</li>
<li><code>n_words</code>: number of words in the scraped text</li>
<li><code>chunks</code>: text split into chunks</li>
<li><code>n_tokens</code>: approximate token count for each chunk (characters divided by 4)</li>
<li><code>usage</code>: number of tokens used for creating the embedding</li>
<li><code>embedding</code>: embedding for each chunk of text</li>
</ul>
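<p>To make the <code>chunks</code> and <code>n_tokens</code> columns more concrete, here is a minimal base R sketch of overlapping chunks and the characters-divided-by-four token estimate. <code>chunk_words()</code> is a hypothetical helper written for this illustration, not the <code>chunk_with_overlap()</code> implementation used above, and <code>nchar()</code> stands in for <code>tokenizers::count_characters()</code>.</p>

```r
# Hypothetical helper for illustration only; not the gpttools implementation.
# Splits text into word-based chunks where consecutive chunks share
# `overlap_size` words.
chunk_words <- function(text, chunk_size = 500, overlap_size = 50) {
  stopifnot(chunk_size > overlap_size)
  words <- strsplit(text, "\\s+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) == 0) {
    return(character(0))
  }
  step <- chunk_size - overlap_size
  starts <- seq(1, length(words), by = step)
  # Drop a trailing start whose words are already covered by the previous chunk
  starts <- starts[starts == 1 | starts + overlap_size <= length(words)]
  vapply(
    starts,
    function(s) {
      end <- min(s + chunk_size - 1, length(words))
      paste(words[s:end], collapse = " ")
    },
    character(1)
  )
}

# Twelve toy "words", chunks of five words with a two-word overlap
example_text <- paste(paste0("w", 1:12), collapse = " ")
example_chunks <- chunk_words(example_text, chunk_size = 5, overlap_size = 2)

# Rough token estimate: about four characters per token
n_tokens <- nchar(example_chunks) %/% 4
```

<p>The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.</p>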
</section>
<section id="querying-with-embeddings" class="level2">
<h2 class="anchored" data-anchor-id="querying-with-embeddings">Querying with Embeddings</h2>
<p>With an embedding for each chunk of text, the next step is to find the chunks most similar to a query. The <code>query_index()</code> function from the <code>{gpttools}</code> package does this: it embeds the query, ranks the indexed chunks by cosine similarity, and passes the top <code>k</code> chunks to the chat model as context. The function is fairly involved, so it’s worth reading the code in the tabs below to see how it works.</p>
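<p>Before the full source, a stripped-down sketch of the retrieval step may help. It ranks a toy index by cosine similarity and joins the best matches into a context string. Everything here is an assumption made for illustration: the three-dimensional vectors are made up rather than real embeddings, and <code>cosine_similarity()</code> and <code>top_matches()</code> are hypothetical base R stand-ins, not the package functions.</p>

```r
# Cosine similarity between two numeric vectors of equal length
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Toy index: three chunks with made-up 3-dimensional "embeddings"
toy_index <- data.frame(
  chunks = c("pipes in R", "fitting a linear model", "reading parquet files"),
  stringsAsFactors = FALSE
)
toy_index$embedding <- list(c(1, 0, 0), c(0, 1, 0), c(0.8, 0.6, 0))

# Hypothetical stand-in: score every chunk against the query, keep the best k
top_matches <- function(index, query_embedding, k = 2) {
  index$similarity <- vapply(
    index$embedding,
    function(x) cosine_similarity(query_embedding, x),
    numeric(1)
  )
  index <- index[order(index$similarity, decreasing = TRUE), ]
  head(index, k)
}

matches <- top_matches(toy_index, query_embedding = c(1, 0.1, 0), k = 2)

# The matched chunks are joined into one string that becomes the prompt context
context <- paste(matches$chunks, collapse = "\n\n")
```

<p>A linear scan like this is fine for a few thousand chunks. Larger indexes usually call for an approximate nearest-neighbor search instead.</p>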
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-3-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-3-1" aria-controls="tabset-3-1" aria-selected="true" href="">query_index</a></li><li class="nav-item"><a class="nav-link" id="tabset-3-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-3-2" aria-controls="tabset-3-2" aria-selected="false" href="">get_top_matches</a></li><li class="nav-item"><a class="nav-link" id="tabset-3-3-tab" data-bs-toggle="tab" data-bs-target="#tabset-3-3" aria-controls="tabset-3-3" aria-selected="false" href="">openai_create_chat_completion (from <code>{gptstudio}</code>)</a></li><li class="nav-item"><a class="nav-link" id="tabset-3-4-tab" data-bs-toggle="tab" data-bs-target="#tabset-3-4" aria-controls="tabset-3-4" aria-selected="false" href="">query_openai_api</a></li></ul>
<div class="tab-content">
<div id="tabset-3-1" class="tab-pane active" aria-labelledby="tabset-3-1-tab">
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1">query_index <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(index, query, history, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">task =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Context Only"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">k =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>) {</span>
<span id="cb11-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">arg_match</span>(task, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Context Only"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Permissive Chat"</span>))</span>
<span id="cb11-3"></span>
<span id="cb11-4">  query_embedding <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">create_openai_embedding</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">input_text =</span> query) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb11-5">    dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pull</span>(embedding) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb11-6">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unlist</span>()</span>
<span id="cb11-7"></span>
<span id="cb11-8">  full_context <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">get_top_matches</span>(index, query_embedding, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">k =</span> k)</span>
<span id="cb11-9"></span>
<span id="cb11-10">  context <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb11-11">    full_context <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb11-12">    dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pull</span>(chunks) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb11-13">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">collapse =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb11-14"></span>
<span id="cb11-15">  instructions <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb11-16">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">switch</span>(task,</span>
<span id="cb11-17">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Context Only"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span></span>
<span id="cb11-18">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb11-19">          <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb11-20">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">role =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"system"</span>,</span>
<span id="cb11-21">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">content =</span></span>
<span id="cb11-22">              <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glue</span>(</span>
<span id="cb11-23">                <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"You are a helpful chat bot that answers questions based on the</span></span>
<span id="cb11-24"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">                context provided by the user. If the user does not provide</span></span>
<span id="cb11-25"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">                context, say </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\"</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">I am not able to answer that question. Maybe</span></span>
<span id="cb11-26"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">                try rephrasing your question in a different way.</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\"\n\n</span></span>
<span id="cb11-27"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">                Context: {context}"</span></span>
<span id="cb11-28">              )</span>
<span id="cb11-29">          ),</span>
<span id="cb11-30">          <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb11-31">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">role =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"user"</span>,</span>
<span id="cb11-32">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">content =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glue</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"{query}"</span>)</span>
<span id="cb11-33">          )</span>
<span id="cb11-34">        ),</span>
<span id="cb11-35">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Permissive Chat"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span></span>
<span id="cb11-36">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb11-37">          <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb11-38">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">role =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"system"</span>,</span>
<span id="cb11-39">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">content =</span></span>
<span id="cb11-40">              <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glue</span>(</span>
<span id="cb11-41">                <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"You are a helpful chat bot that answers questions based on the</span></span>
<span id="cb11-42"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">                context provided by the user. If the user does not provide</span></span>
<span id="cb11-43"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">                context, say </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\"</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">I am not able to answer that question with the</span></span>
<span id="cb11-44"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">                context you gave me, but here is my best answer. Maybe</span></span>
<span id="cb11-45"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">                try rephrasing your question in a different way.</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\"\n\n</span></span>
<span id="cb11-46"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">                Context: {context}"</span></span>
<span id="cb11-47">              )</span>
<span id="cb11-48">          ),</span>
<span id="cb11-49">          <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb11-50">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">role =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"user"</span>,</span>
<span id="cb11-51">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">content =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glue</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"{query}"</span>)</span>
<span id="cb11-52">          )</span>
<span id="cb11-53">        )</span>
<span id="cb11-54">    )</span>
<span id="cb11-55"></span>
<span id="cb11-56">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cli_inform</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Embedding..."</span>)</span>
<span id="cb11-57"></span>
<span id="cb11-58">  history <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb11-59">    purrr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map</span>(history, \(x) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (x<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>role <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"system"</span>) <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> x) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb11-60">    purrr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">compact</span>()</span>
<span id="cb11-61"></span>
<span id="cb11-62">  prompt <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(history, instructions)</span>
<span id="cb11-63">  answer <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> gptstudio<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">openai_create_chat_completion</span>(prompt)</span>
<span id="cb11-64">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(prompt, full_context, answer)</span>
<span id="cb11-65">}</span></code></pre></div></div>
</div>
</div>
<div id="tabset-3-2" class="tab-pane" aria-labelledby="tabset-3-2-tab">
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1">get_top_matches <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(index, query_embedding, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">k =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>) {</span>
<span id="cb12-2">  index <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb12-3">    dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">similarity =</span> purrr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_dbl</span>(embedding, \(x) {</span>
<span id="cb12-4">      lsa<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cosine</span>(query_embedding, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unlist</span>(x))</span>
<span id="cb12-5">    })) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb12-6">    dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">arrange</span>(dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">desc</span>(similarity)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb12-7">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">head</span>(k)</span>
<span id="cb12-8">}</span></code></pre></div></div>
</div>
</div>
<div id="tabset-3-3" class="tab-pane" aria-labelledby="tabset-3-3-tab">
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1">openai_create_chat_completion <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb13-2">  <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">prompt =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"&lt;|endoftext|&gt;"</span>,</span>
<span id="cb13-3">           <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gpt-3.5-turbo"</span>,</span>
<span id="cb13-4">           <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">openai_api_key =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Sys.getenv</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"OPENAI_API_KEY"</span>),</span>
<span id="cb13-5">           <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">task =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"chat/completions"</span>) {</span>
<span id="cb13-6">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">assert_that</span>(</span>
<span id="cb13-7">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.string</span>(model),</span>
<span id="cb13-8">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.string</span>(openai_api_key)</span>
<span id="cb13-9">    )</span>
<span id="cb13-10"></span>
<span id="cb13-11">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.string</span>(prompt)) {</span>
<span id="cb13-12">      prompt <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb13-13">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb13-14">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">role    =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"user"</span>,</span>
<span id="cb13-15">          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">content =</span> prompt</span>
<span id="cb13-16">        )</span>
<span id="cb13-17">      )</span>
<span id="cb13-18">    }</span>
<span id="cb13-19"></span>
<span id="cb13-20">    body <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb13-21">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> model,</span>
<span id="cb13-22">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">messages =</span> prompt</span>
<span id="cb13-23">    )</span>
<span id="cb13-24"></span>
<span id="cb13-25">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">query_openai_api</span>(body, openai_api_key, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">task =</span> task)</span>
<span id="cb13-26">  }</span></code></pre></div></div>
</div>
</div>
<div id="tabset-3-4" class="tab-pane" aria-labelledby="tabset-3-4-tab">
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1">query_openai_api <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(body, openai_api_key, task) {</span>
<span id="cb14-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">arg_match</span>(task, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"completions"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"chat/completions"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"edits"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"embeddings"</span>))</span>
<span id="cb14-3"></span>
<span id="cb14-4">  base_url <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glue</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"https://api.openai.com/v1/{task}"</span>)</span>
<span id="cb14-5"></span>
<span id="cb14-6">  headers <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb14-7">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Authorization"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glue</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Bearer {openai_api_key}"</span>),</span>
<span id="cb14-8">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Content-Type"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"application/json"</span></span>
<span id="cb14-9">  )</span>
<span id="cb14-10"></span>
<span id="cb14-11">  response <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb14-12">    httr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">RETRY</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"POST"</span>,</span>
<span id="cb14-13">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">url =</span> base_url,</span>
<span id="cb14-14">      httr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_headers</span>(headers), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">body =</span> body,</span>
<span id="cb14-15">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">encode =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"json"</span>,</span>
<span id="cb14-16">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">quiet =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span></span>
<span id="cb14-17">    )</span>
<span id="cb14-18"></span>
<span id="cb14-19">  parsed <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> response <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb14-20">    httr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">content</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">as =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"text"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">encoding =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"UTF-8"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb14-21">    jsonlite<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fromJSON</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">flatten =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb14-22"></span>
<span id="cb14-23">  <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (httr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">http_error</span>(response)) {</span>
<span id="cb14-24">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cli_alert_warning</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb14-25">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"x"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glue</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"OpenAI API request failed [{httr::status_code(response)}]."</span>),</span>
<span id="cb14-26">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"i"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glue</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Error message: {parsed$error$message}"</span>)</span>
<span id="cb14-27">    ))</span>
<span id="cb14-28">  }</span>
<span id="cb14-29">  parsed</span>
<span id="cb14-30">}</span></code></pre></div></div>
</div>
</div>
</div>
</div>
<p>The <code>query_index()</code> function takes in four arguments: <code>index</code>, <code>query</code>, <code>task</code>, and <code>k</code>.</p>
<ul>
<li><code>index</code>: The index containing the embeddings to be queried.</li>
<li><code>query</code>: The query string to search for similar chunks of text.</li>
<li><code>task</code>: The type of task to perform based on the context of the query. {gpttools} provides a few pre-defined tasks, such as “conservative q&amp;a” or “extract key libraries and tools”, which can be passed to this argument. These queries were taken from a repo created by OpenAI and Pinecone, which can be found <a href="https://github.com/pinecone-io/examples/tree/master/integrations/openai/beyond_search_webinar">here</a>.</li>
<li><code>k</code>: The number of similar chunks of text to return.</li>
</ul>
<p>The function generates an embedding for the query string with <code>create_openai_embedding()</code>. It then uses <code>get_top_matches()</code> to find the chunks of text in the index that are most similar to the query by cosine similarity, returning the top <code>k</code> matches.</p>
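<p>To make the ranking step concrete, here is a minimal base R sketch of a cosine-similarity top-<code>k</code> search. It is an illustration under stated assumptions, not the <code>{gpttools}</code> implementation: <code>toy_index</code>, <code>cosine_similarity()</code>, and <code>top_k_matches()</code> are hypothetical names, and the embeddings are made-up 3-dimensional vectors rather than real 1536-dimensional ones.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Toy stand-in for an index: three chunks with made-up 3-dimensional embeddings.
# (Real OpenAI embeddings in this post have 1536 dimensions.)
toy_index &lt;- data.frame(chunk = c("a", "b", "c"), stringsAsFactors = FALSE)
toy_index$embedding &lt;- list(c(1, 0, 0), c(0, 1, 0), c(1, 1, 0))

# Cosine similarity between two numeric vectors of equal length
cosine_similarity &lt;- function(x, y) {
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

# Hypothetical helper: score every chunk against the query embedding,
# sort by similarity (highest first), and keep the top k rows
top_k_matches &lt;- function(index, query_embedding, k = 2) {
  index$similarity &lt;- vapply(
    index$embedding,
    function(e) cosine_similarity(e, query_embedding),
    numeric(1)
  )
  head(index[order(index$similarity, decreasing = TRUE), ], k)
}

matches &lt;- top_k_matches(toy_index, query_embedding = c(1, 0.1, 0), k = 2)
matches$chunk</code></pre></div>
<p>With this toy query, chunk <code>a</code> points in almost the same direction as the query and ranks first, followed by chunk <code>c</code>.</p>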
<p>The next step formats the instructions based on the <code>task</code> argument, the <code>query</code>, and the <code>context</code> (the text of the returned chunks). For example, if the task is “conservative q&amp;a”, the function will return a string asking the model to answer a question based on the context of the returned chunks of text. If the task is “extract key libraries and tools”, the function will return a string asking the model to list the libraries and tools present in the returned chunks of text.</p>
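<p>A rough sketch of how instructions might be assembled for each task is shown here. The function name <code>format_instructions()</code> and the exact prompt wording are assumptions made for illustration; the actual templates in <code>{gpttools}</code> may differ.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical helper; the prompt wording is illustrative only and is
# not taken from {gpttools}.
format_instructions &lt;- function(query, context, task = "conservative q&amp;a") {
  # Join the retrieved chunks into a single block of context
  context_text &lt;- paste(context, collapse = "\n\n---\n\n")

  switch(task,
    "conservative q&amp;a" = paste0(
      "Answer the question based on the context below. ",
      "If the answer is not in the context, say \"I don't know.\"\n\n",
      "Context:\n", context_text, "\n\n",
      "Question: ", query, "\nAnswer:"
    ),
    "extract key libraries and tools" = paste0(
      "Extract the key libraries and tools mentioned in the context below.\n\n",
      "Context:\n", context_text, "\n\nLibraries and tools:"
    ),
    stop("Unknown task: ", task)
  )
}

instructions &lt;- format_instructions(
  query = "How do I reshape data?",
  context = c("tidyr provides pivot_longer().", "pivot_wider() does the reverse.")
)
cat(instructions)</code></pre></div>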
<p>The prompt is passed to OpenAI’s <code>gpt-3.5-turbo</code> model (GPT-3.5, the model behind ChatGPT) to generate a response based on the formatted instructions. The response is returned as a list containing the instructions, the top <code>k</code> matches, and the response from ChatGPT.</p>
</section>
<section id="analysis-of-embeddings-with-tidymodels-and-umap" class="level2">
<h2 class="anchored" data-anchor-id="analysis-of-embeddings-with-tidymodels-and-umap">Analysis of Embeddings with tidymodels and UMAP</h2>
<p>The next step is to perform dimension reduction with UMAP as a naive exploration of the embeddings. The <code>embedding</code> column in the index contains a list of 1536-dimensional vectors. We can use the <code>recipes</code> package to normalize the vectors and then use <code>step_umap()</code> from the <code>embed</code> package to reduce the dimensionality to 2. We can then plot the results.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(recipes)</span>
<span id="cb15-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ggplot2)</span>
<span id="cb15-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(purrr)</span>
<span id="cb15-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyr)</span>
<span id="cb15-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(embed)</span>
<span id="cb15-6"></span>
<span id="cb15-7">index_wide <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span>  </span>
<span id="cb15-8">  index <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">embedding =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map</span>(embedding, \(x) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.data.frame</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">t</span>(x)))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unnest</span>(embedding)</span>
<span id="cb15-11"></span>
<span id="cb15-12"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">123</span>)</span>
<span id="cb15-13"></span>
<span id="cb15-14">umap_spec <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">recipe</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> ., <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> index_wide) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-15">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">step_normalize</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">starts_with</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"V"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-16">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">step_umap</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">starts_with</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"V"</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">num_comp =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb15-17"></span>
<span id="cb15-18">umap_estimates <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">prep</span>(umap_spec, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">training =</span> index_wide)</span>
<span id="cb15-19">umap_data <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">bake</span>(umap_estimates, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">new_data =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span>)</span>
<span id="cb15-20"></span>
<span id="cb15-21">umap_data <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb15-22">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb15-23">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> UMAP1, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> UMAP2, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> link),</span>
<span id="cb15-24">             <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">show.legend =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb15-25">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Dimensionality Reduction with UMAP"</span>,</span>
<span id="cb15-26">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">subtitle =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"UMAP of 1536-dimensional vectors | Colored by Source Text"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb15-27">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_minimal</span>()</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://jameshwade.com/posts/2023-03-10_vectorstores_files/figure-html/unnamed-chunk-16-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
</section>
<section id="retreiver-plugin-as-a-shiny-app" class="level2">
<h2 class="anchored" data-anchor-id="retreiver-plugin-as-a-shiny-app">Retriever Plugin as a Shiny App</h2>
<p>To make the index easy to use, <code>{gpttools}</code> now has a Shiny app that you can run as a plugin in RStudio. To launch it, open the command palette (Cmd/Ctrl + Shift + P) and type “gpttools”. You can then select the “gpttools: ChatGPT with Retrieval” option. This will open a Shiny app in your viewer pane. If you have multiple indices, you can select which one to use in the dropdown menu. You can also specify whether you want answers that use only the context of the index (“Context Only”) or answers that use both the context of the index and the full ChatGPT model (“Permissive Chat”). “Context Only” answers are much less likely to “hallucinate.” “Permissive Chat” answers are more likely to hallucinate, but they are also the better choice if the index lacks relevant information.</p>
<video src="https://user-images.githubusercontent.com/6314313/227738408-0c4c97e9-3601-4977-b8a8-ac655a185656.mov" data-canonical-src="https://user-images.githubusercontent.com/6314313/227738408-0c4c97e9-3601-4977-b8a8-ac655a185656.mov" controls="controls" muted="muted" class="d-block rounded-bottom-2 width-fit" style="max-height:400px; max-width: 100%; width: 100%;">
</video>
</section>
<section id="photo-credit" class="level2">
<h2 class="anchored" data-anchor-id="photo-credit">Photo Credit</h2>
<p>Thumbnail Photo by <a href="https://unsplash.com/@richardworks?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Richard Burlton</a> on <a href="https://unsplash.com/s/photos/retriever?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Unsplash</a></p>


</section>

 ]]></description>
  <category>ChatGPT</category>
  <category>LLM</category>
  <category>NLP</category>
  <category>Web Scraping</category>
  <category>R</category>
  <category>OpenAI</category>
  <guid>https://jameshwade.com/posts/2023-03-10_vectorstores.html</guid>
  <pubDate>Sat, 25 Mar 2023 00:00:00 GMT</pubDate>
  <media:content url="https://jameshwade.com/posts/images/retriever.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Exploring {dm} Alone</title>
  <dc:creator>James H Wade</dc:creator>
  <link>https://jameshwade.com/posts/2023-01-25_dm-alone.html</link>
  <description><![CDATA[ 





<p>Working with relational data tables doesn’t sound like the most exciting topic, but it’s one I could always handle better in my data science projects. Kirill Müller drew in quite an audience for his overview of the <code>{dm}</code> package at <a href="https://www.rstudio.com/conference/2022/talks/dm-analyze-build-deploy-relational/">his rstudio::conf talk in 2022</a>. <code>{dm}</code> is designed to bridge the gap between individual data frames and relational databases, making it a powerful tool for anyone working with large or complex datasets.</p>
<p><code>{dm}</code> provides a consistent set of verbs for consuming, creating, and deploying relational data models. It makes working with data a lot easier by capturing a relational data model constructed from local data frames or “lazy tables” connected to an RDBMS (Relational Database Management System). With <code>{dm}</code> you can use <code>{dplyr}</code> data manipulation verbs, along with additional methods for constructing and verifying relational data models, including key selection, key creation, and rigorous constraint checking.</p>
<p>One of the most powerful features of <code>{dm}</code> is its ability to scale from datasets that fit in memory to databases with billions of rows. This means that even if your dataset is too large to fit in memory, you can still use <code>{dm}</code> to work with it efficiently.</p>
<section id="creating-dm-from-dataframes" class="level2">
<h2 class="anchored" data-anchor-id="creating-dm-from-dataframes">Creating <code>dm</code> from Dataframes</h2>
<p>In this tutorial, we will use the <code>{alone}</code> package, featured in week four of #TidyTuesday. Our first step is to convert the data in the <code>{alone}</code> package into a <code>dm</code> object.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(dm)</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>
Attaching package: 'dm'</code></pre>
</div>
<div class="cell-output cell-output-stderr">
<pre><code>The following object is masked from 'package:stats':

    filter</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(alone)</span>
<span id="cb4-2"></span>
<span id="cb4-3">alone_no_keys <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dm</span>(episodes, loadouts, seasons, survivalists)</span>
<span id="cb4-4">alone_no_keys</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>── Metadata ────────────────────────────────────────────────────────────────────
Tables: `episodes`, `loadouts`, `seasons`, `survivalists`
Columns: 41
Primary keys: 0
Foreign keys: 0</code></pre>
</div>
</div>
</section>
<section id="primary-keys" class="level2">
<h2 class="anchored" data-anchor-id="primary-keys">Primary Keys</h2>
<p>In practice, we should always inspect our data to ensure we are joining data in a sensible manner, but <code>dm_enum_pk_candidates()</code> can suggest a primary key for us.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dm_enum_pk_candidates</span>(</span>
<span id="cb6-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">dm =</span> alone_no_keys,</span>
<span id="cb6-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">table =</span> episodes</span>
<span id="cb6-4">)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 11 × 3
   columns                candidate why                                         
   &lt;keys&gt;                 &lt;lgl&gt;     &lt;chr&gt;                                       
 1 episode_number_overall TRUE      ""                                          
 2 title                  TRUE      ""                                          
 3 version                FALSE     "has duplicate values: US (98)"             
 4 season                 FALSE     "has duplicate values: 2 (13), 1 (11), 6 (1…
 5 episode                FALSE     "has duplicate values: 1 (9), 2 (9), 3 (9),…
 6 air_date               FALSE     "has duplicate values: 2016-07-14 (2)"      
 7 viewers                FALSE     "has 15 missing values, and duplicate value…
 8 quote                  FALSE     "has duplicate values: In nature there are …
 9 author                 FALSE     "has duplicate values: John Muir (4), Ameli…
10 imdb_rating            FALSE     "has duplicate values: 7.7 (16), 7.6 (9), 7…
11 n_ratings              FALSE     "has 5 missing values, and duplicate values…</code></pre>
</div>
</div>
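<p>Judging from the <code>why</code> column above, a column is rejected when it has duplicate or missing values. That rule can be approximated in base R as sketched here. This is a simplified illustration on a made-up table, not how <code>{dm}</code> implements the check; <code>is_pk_candidate()</code> and <code>toy_episodes</code> are hypothetical names.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical helper; a simplified approximation, not the {dm} implementation.
# A column qualifies if it has no missing values and no duplicates.
is_pk_candidate &lt;- function(x) {
  !anyNA(x) &amp;&amp; anyDuplicated(x) == 0
}

# Made-up table loosely shaped like `episodes`
toy_episodes &lt;- data.frame(
  episode_number_overall = 1:4,
  season = c(1, 1, 2, 2),
  viewers = c(1.5, NA, 1.7, 1.9)
)

# One logical per column: TRUE where the column could serve as a primary key
candidates &lt;- vapply(toy_episodes, is_pk_candidate, logical(1))
candidates</code></pre></div>
<p>In the toy table only <code>episode_number_overall</code> passes: <code>season</code> repeats values and <code>viewers</code> has a missing value.</p>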
<p><code>episode_number_overall</code> and <code>title</code> are the two candidates for the primary key. We can look at the other tables as well.</p>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-1-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-1" aria-controls="tabset-1-1" aria-selected="true" href="">loadouts</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-2" aria-controls="tabset-1-2" aria-selected="false" href="">seasons</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-3-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-3" aria-controls="tabset-1-3" aria-selected="false" href="">survivalists</a></li></ul>
<div class="tab-content">
<div id="tabset-1-1" class="tab-pane active" aria-labelledby="tabset-1-1-tab">
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dm_enum_pk_candidates</span>(</span>
<span id="cb8-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">dm =</span> alone_no_keys,</span>
<span id="cb8-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">table =</span> loadouts</span>
<span id="cb8-4">)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 6 × 3
  columns       candidate why                                                   
  &lt;keys&gt;        &lt;lgl&gt;     &lt;chr&gt;                                                 
1 version       FALSE     has duplicate values: US (940)                        
2 season        FALSE     has duplicate values: 4 (140), 1 (100), 2 (100), 3 (1…
3 name          FALSE     has duplicate values: Brad Richardson (20), Britt Aha…
4 item_number   FALSE     has duplicate values: 1 (94), 2 (94), 3 (94), 4 (94),…
5 item_detailed FALSE     has duplicate values: Ferro rod (66), Sleeping bag (6…
6 item          FALSE     has duplicate values: Pot (92), Fishing gear (90), Sl…</code></pre>
</div>
</div>
</div>
<div id="tabset-1-2" class="tab-pane" aria-labelledby="tabset-1-2-tab">
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dm_enum_pk_candidates</span>(</span>
<span id="cb10-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">dm =</span> alone_no_keys,</span>
<span id="cb10-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">table =</span> seasons</span>
<span id="cb10-4">)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 8 × 3
  columns       candidate why                                                   
  &lt;keys&gt;        &lt;lgl&gt;     &lt;chr&gt;                                                 
1 season        TRUE      ""                                                    
2 version       FALSE     "has duplicate values: US (9)"                        
3 location      FALSE     "has duplicate values: Quatsino (3), Great Slave Lake…
4 country       FALSE     "has duplicate values: Canada (7)"                    
5 n_survivors   FALSE     "has duplicate values: 10 (8)"                        
6 lat           FALSE     "has duplicate values: 50.72444 (3), 61.50028 (2)"    
7 lon           FALSE     "has duplicate values: -127.4981 (3), -114.0011 (2)"  
8 date_drop_off FALSE     "has 6 missing values"                                </code></pre>
</div>
</div>
</div>
<div id="tabset-1-3" class="tab-pane" aria-labelledby="tabset-1-3-tab">
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dm_enum_pk_candidates</span>(</span>
<span id="cb12-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">dm =</span> alone_no_keys,</span>
<span id="cb12-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">table =</span> survivalists</span>
<span id="cb12-4">)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 16 × 3
   columns             candidate why                                            
   &lt;keys&gt;              &lt;lgl&gt;     &lt;chr&gt;                                          
 1 season              FALSE     has duplicate values: 4 (14), 1 (10), 2 (10), …
 2 name                FALSE     has duplicate values: Brad Richardson (2), Bri…
 3 age                 FALSE     has duplicate values: 31 (7), 40 (6), 44 (6), …
 4 gender              FALSE     has duplicate values: Male (74), Female (20)   
 5 city                FALSE     has duplicate values: Fox (3), Fox Lake (3), S…
 6 state               FALSE     has duplicate values: Alaska (11), Maine (6), …
 7 country             FALSE     has duplicate values: United States (79), Cana…
 8 result              FALSE     has duplicate values: 1 (10), 2 (10), 3 (10), …
 9 days_lasted         FALSE     has duplicate values: 8 (4), 1 (3), 2 (3), 4 (…
10 medically_evacuated FALSE     has duplicate values: FALSE (69), TRUE (25)    
11 reason_tapped_out   FALSE     has 10 missing values, and duplicate values: S…
12 reason_category     FALSE     has 10 missing values, and duplicate values: M…
13 team                FALSE     has 80 missing values, and duplicate values: B…
14 day_linked_up       FALSE     has 86 missing values, and duplicate values: 9…
15 profession          FALSE     has duplicate values: Carpenter (4), Blacksmit…
16 url                 FALSE     has duplicate values: alex-and-logan-ribar (2)…</code></pre>
</div>
</div>
</div>
</div>
</div>
<p>No single column in <code>loadouts</code> or <code>survivalists</code> can serve as a primary key, but a column tuple (i.e., a combination of columns) can.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1">alone_only_pks <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb14-2">  alone_no_keys <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb14-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dm_add_pk</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">table =</span> episodes, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">columns =</span> episode_number_overall) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb14-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dm_add_pk</span>(loadouts, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(version, season, name, item_number)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb14-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dm_add_pk</span>(seasons, season) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb14-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dm_add_pk</span>(survivalists, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(season, name))</span>
<span id="cb14-7"></span>
<span id="cb14-8">alone_only_pks</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>── Metadata ────────────────────────────────────────────────────────────────────
Tables: `episodes`, `loadouts`, `seasons`, `survivalists`
Columns: 41
Primary keys: 4
Foreign keys: 0</code></pre>
</div>
</div>
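<p>As a rough illustration of what makes a column tuple a valid key, the check below tests that no combination of values repeats and that none are missing. It is a minimal base R sketch, not how <code>{dm}</code> implements its checks; the data frame <code>survivalists_toy</code> and the helper <code>is_unique_key()</code> are made up for this example.</p>

```r
# Toy stand-in for the survivalists table (values are invented for illustration)
survivalists_toy <- data.frame(
  season = c(1, 1, 2, 2),
  name = c("Ana", "Ben", "Ana", "Cy")
)

# Hypothetical helper: TRUE when the chosen columns contain no missing values
# and no combination of their values appears more than once
is_unique_key <- function(data, cols) {
  key <- data[, cols, drop = FALSE]
  !any(is.na(key)) && anyDuplicated(key) == 0
}

is_unique_key(survivalists_toy, "season")             # season alone repeats
is_unique_key(survivalists_toy, "name")               # name alone repeats
is_unique_key(survivalists_toy, c("season", "name"))  # the pair is unique
```

<p>In the toy data neither column is unique on its own, but the pair is, which mirrors the situation with <code>survivalists</code>.</p>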
</section>
<section id="foreign-keys" class="level2">
<h2 class="anchored" data-anchor-id="foreign-keys">Foreign Keys</h2>
<p>To create the relationships between tables, we need to identify foreign keys. We can use the same approach as we did with primary keys, this time with <code>dm_enum_fk_candidates()</code>.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dm_enum_fk_candidates</span>(</span>
<span id="cb16-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">dm =</span> alone_only_pks,</span>
<span id="cb16-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">table =</span> episodes,</span>
<span id="cb16-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ref_table =</span> seasons</span>
<span id="cb16-5">)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 11 × 3
   columns                candidate why                                         
   &lt;keys&gt;                 &lt;lgl&gt;     &lt;chr&gt;                                       
 1 season                 TRUE      ""                                          
 2 version                FALSE     "Can't combine `value1` &lt;character&gt; and `va…
 3 episode_number_overall FALSE     "values of `episodes$episode_number_overall…
 4 episode                FALSE     "values of `episodes$episode` not in `seaso…
 5 title                  FALSE     "Can't combine `value1` &lt;character&gt; and `va…
 6 air_date               FALSE     "Can't combine `value1` &lt;character&gt; and `va…
 7 viewers                FALSE     "values of `episodes$viewers` not in `seaso…
 8 quote                  FALSE     "Can't combine `value1` &lt;character&gt; and `va…
 9 author                 FALSE     "Can't combine `value1` &lt;character&gt; and `va…
10 imdb_rating            FALSE     "values of `episodes$imdb_rating` not in `s…
11 n_ratings              FALSE     "values of `episodes$n_ratings` not in `sea…</code></pre>
</div>
</div>
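<p>The idea behind a foreign key candidate can be sketched in a few lines of base R: every value in the child column must also occur in the referenced key. This is only an illustration of the concept under that assumption, not the check <code>{dm}</code> runs internally; <code>seasons_toy</code>, <code>episodes_toy</code>, and <code>is_fk_candidate()</code> are hypothetical names.</p>

```r
# Toy parent and child tables (values are invented for illustration)
seasons_toy <- data.frame(season = 1:3)
episodes_toy <- data.frame(
  season = c(1, 1, 2, 3),
  episode = c(1, 2, 1, 1)
)

# Hypothetical helper: TRUE when every child value occurs in the parent key
is_fk_candidate <- function(child_values, parent_key) {
  all(child_values %in% parent_key)
}

# Every episode belongs to a known season
is_fk_candidate(episodes_toy$season, seasons_toy$season)

# A season number with no match in the parent table breaks the relationship
is_fk_candidate(c(episodes_toy$season, 9), seasons_toy$season)
```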
<p><code>dm_add_fk()</code> works like <code>dm_add_pk()</code>, with one extra argument: <code>ref_table</code>, the table whose primary key the foreign key points to.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1">alone_with_keys <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb18-2">  alone_only_pks <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb18-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dm_add_fk</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">table =</span> episodes, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">columns =</span> season, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ref_table =</span> seasons) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb18-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dm_add_fk</span>(loadouts, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(name, season), survivalists) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb18-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dm_add_fk</span>(loadouts, season, seasons) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb18-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dm_add_fk</span>(survivalists, season, seasons)</span>
<span id="cb18-7"></span>
<span id="cb18-8">alone_with_keys</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>── Metadata ────────────────────────────────────────────────────────────────────
Tables: `episodes`, `loadouts`, `seasons`, `survivalists`
Columns: 41
Primary keys: 4
Foreign keys: 4</code></pre>
</div>
</div>
</section>
<section id="visualizing-relationships" class="level2">
<h2 class="anchored" data-anchor-id="visualizing-relationships">Visualizing Relationships</h2>
<p>Two powerful features of <code>{dm}</code> are relational table visualization and integrity checks. <code>dm_draw()</code> draws the tables, the relationships between them, and the keys that define those relationships.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb20-1">rlang<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">check_installed</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"DiagrammeR"</span>)</span>
<span id="cb20-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dm_draw</span>(alone_with_keys)</span></code></pre></div></div>
<div class="cell-output-display">
<div class="grViz html-widget html-fill-item-overflow-hidden html-fill-item" id="htmlwidget-b9ffb443254b7bb1a00b" style="width:100%;height:464px;"></div>
<script type="application/json" data-for="htmlwidget-b9ffb443254b7bb1a00b">{"x":{"diagram":"#data_model\ndigraph {\ngraph [rankdir=LR tooltip=\"Data Model\" ]\n\nnode [margin=0 fontcolor = \"#444444\" ]\n\nedge [color = \"#555555\", arrowsize = 1, ]\n\npack=true\npackmode= \"node\"\n\n  \"episodes\" [id = \"episodes\", label = <<TABLE ALIGN=\"LEFT\" BORDER=\"1\" CELLBORDER=\"0\" CELLSPACING=\"0\" COLOR=\"#555555\">\n    <TR>\n      <TD COLSPAN=\"1\" BGCOLOR=\"#EFEBDD\" BORDER=\"0\"><FONT COLOR=\"#000000\">episodes<\/FONT>\n<\/TD>\n    <\/TR>\n    <TR>\n      <TD ALIGN=\"LEFT\" BGCOLOR=\"#FFFFFF\" PORT=\"season\">season<\/TD>\n    <\/TR>\n    <TR>\n      <TD ALIGN=\"LEFT\" BGCOLOR=\"#FFFFFF\" PORT=\"episode_number_overall\"><U>episode_number_overall<\/U><\/TD>\n    <\/TR>\n  <\/TABLE>>, shape = \"plaintext\"] \n\n  \"loadouts\" [id = \"loadouts\", label = <<TABLE ALIGN=\"LEFT\" BORDER=\"1\" CELLBORDER=\"0\" CELLSPACING=\"0\" COLOR=\"#555555\">\n    <TR>\n      <TD COLSPAN=\"1\" BGCOLOR=\"#EFEBDD\" BORDER=\"0\"><FONT COLOR=\"#000000\">loadouts<\/FONT>\n<\/TD>\n    <\/TR>\n    <TR>\n      <TD ALIGN=\"LEFT\" BGCOLOR=\"#FFFFFF\" PORT=\"season\">season<\/TD>\n    <\/TR>\n    <TR>\n      <TD ALIGN=\"LEFT\" BGCOLOR=\"#FFFFFF\" PORT=\"version, season, name, item_number\"><U>version, season, name, item_number<\/U><\/TD>\n    <\/TR>\n    <TR>\n      <TD ALIGN=\"LEFT\" BGCOLOR=\"#FFFFFF\" PORT=\"name, season\">name, season<\/TD>\n    <\/TR>\n  <\/TABLE>>, shape = \"plaintext\"] \n\n  \"seasons\" [id = \"seasons\", label = <<TABLE ALIGN=\"LEFT\" BORDER=\"1\" CELLBORDER=\"0\" CELLSPACING=\"0\" COLOR=\"#555555\">\n    <TR>\n      <TD COLSPAN=\"1\" BGCOLOR=\"#EFEBDD\" BORDER=\"0\"><FONT COLOR=\"#000000\">seasons<\/FONT>\n<\/TD>\n    <\/TR>\n    <TR>\n      <TD ALIGN=\"LEFT\" BGCOLOR=\"#FFFFFF\" PORT=\"season\"><U>season<\/U><\/TD>\n    <\/TR>\n  <\/TABLE>>, shape = \"plaintext\"] \n\n  \"survivalists\" [id = \"survivalists\", label = <<TABLE ALIGN=\"LEFT\" BORDER=\"1\" 
CELLBORDER=\"0\" CELLSPACING=\"0\" COLOR=\"#555555\">\n    <TR>\n      <TD COLSPAN=\"1\" BGCOLOR=\"#EFEBDD\" BORDER=\"0\"><FONT COLOR=\"#000000\">survivalists<\/FONT>\n<\/TD>\n    <\/TR>\n    <TR>\n      <TD ALIGN=\"LEFT\" BGCOLOR=\"#FFFFFF\" PORT=\"season\">season<\/TD>\n    <\/TR>\n    <TR>\n      <TD ALIGN=\"LEFT\" BGCOLOR=\"#FFFFFF\" PORT=\"season, name\"><U>season, name<\/U><\/TD>\n    <\/TR>\n  <\/TABLE>>, shape = \"plaintext\"] \n\n\"episodes\":\"season\"->\"seasons\":\"season\" [id=\"episodes_1\"]\n\"loadouts\":\"season\"->\"seasons\":\"season\" [id=\"loadouts_1\"]\n\"survivalists\":\"season\"->\"seasons\":\"season\" [id=\"survivalists_1\"]\n\"loadouts\":\"name, season\"->\"survivalists\":\"season, name\" [id=\"loadouts_2\"]\n}","config":{"engine":null,"options":null}},"evals":[],"jsHooks":[]}</script>
</div>
</div>
</section>
<section id="integrity-checks" class="level2">
<h2 class="anchored" data-anchor-id="integrity-checks">Integrity Checks</h2>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb21-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dm_examine_constraints</span>(alone_no_keys)</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>ℹ No constraints defined.</code></pre>
</div>
</div>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb23-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dm_examine_constraints</span>(alone_only_pks)</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>ℹ All constraints satisfied.</code></pre>
</div>
</div>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb25-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dm_examine_constraints</span>(alone_with_keys)</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>! Unsatisfied constraints:</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>• Table `loadouts`: foreign key `name`, `season` into table `survivalists`: Can't combine `value1` &lt;character&gt; and `value1` &lt;double&gt;.</code></pre>
</div>
</div>
<p>The last check flags the foreign key from <code>loadouts</code> into <code>survivalists</code>: <code>dm</code> could not compare a character column with a double column.</p>
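<p>A likely cause of the type error is column order. The foreign key was declared as <code>c(name, season)</code>, while the primary key of <code>survivalists</code> is <code>c(season, name)</code>, so the first foreign key column (character) would be compared with the first primary key column (double). The sketch below declares both keys in the same order. It uses made-up toy tables rather than the Alone data, so treat it as an assumption about the fix, not a verified result for this dataset.</p>

```r
library(dm)

# Toy tables (values are invented for illustration)
survivalists_toy <- data.frame(
  season = c(1, 1, 2),
  name = c("Ana", "Ben", "Cy")
)
loadouts_toy <- data.frame(
  season = c(1, 1, 2),
  name = c("Ana", "Ben", "Cy"),
  item = c("Axe", "Saw", "Pot")
)

toy_dm <-
  dm(survivalists = survivalists_toy, loadouts = loadouts_toy) |>
  dm_add_pk(survivalists, c(season, name)) |>
  # Foreign key columns listed in the same order as the primary key columns
  dm_add_fk(loadouts, c(season, name), survivalists)

toy_constraints <- dm_examine_constraints(toy_dm)
toy_constraints
```

<p>For the real data, the equivalent change would be to write <code>dm_add_fk(loadouts, c(season, name), survivalists)</code> when building <code>alone_with_keys</code> and then rerun <code>dm_examine_constraints()</code>.</p>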
</section>
<section id="table-flattening" class="level2">
<h2 class="anchored" data-anchor-id="table-flattening">Table Flattening</h2>
<p>In my own projects, the power of a well-organized tidy data structure is most evident when I join tidy tables to answer a particular question about the project. Producing the joined table can be the most valuable step in the project. From these joins, I can usually build the visuals and summaries that become the project's most visible artifacts.</p>
<p><code>dm_flatten_to_tbl()</code> uses a table of our choosing as a starting point and produces a wide table that brings in information from our other tables. Importantly, columns with the same name but no relationship (i.e., they are not linked as primary and foreign keys) are disambiguated. This could get unwieldy for more complicated data structures, but four tables are manageable.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb28" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb28-1">flat_survivors <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dm_flatten_to_tbl</span>(alone_with_keys, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.start =</span> survivalists)</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>Renaming ambiguous columns: %&gt;%
  dm_rename(survivalists, country.survivalists = country) %&gt;%
  dm_rename(seasons, country.seasons = country)</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb30" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb30-1">flat_survivors</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 94 × 23
   season name     age gender city  state count…¹ result days_…² medic…³ reaso…⁴
    &lt;dbl&gt; &lt;chr&gt;  &lt;dbl&gt; &lt;chr&gt;  &lt;chr&gt; &lt;chr&gt; &lt;chr&gt;    &lt;dbl&gt;   &lt;dbl&gt; &lt;lgl&gt;   &lt;chr&gt;  
 1      1 Alan …    40 Male   Blai… Geor… United…      1      56 FALSE   &lt;NA&gt;   
 2      1 Sam L…    22 Male   Linc… Nebr… United…      2      55 FALSE   Lost t…
 3      1 Mitch…    34 Male   Bell… Mass… United…      3      43 FALSE   Realiz…
 4      1 Lucas…    32 Male   Quas… Iowa  United…      4      39 FALSE   Felt c…
 5      1 Dusti…    37 Male   Pitt… Penn… United…      5       8 FALSE   Fear o…
 6      1 Brant…    44 Male   Albe… Nort… United…      6       6 FALSE   Consum…
 7      1 Wayne…    46 Male   Sain… New … Canada       7       4 FALSE   Fear o…
 8      1 Joe R…    24 Male   Wind… Onta… Canada       8       4 FALSE   Loss o…
 9      1 Chris…    41 Male   Umat… Flor… United…      9       1 FALSE   Fear o…
10      1 Josh …    31 Male   Jack… Ohio  United…     10       0 FALSE   Fear o…
# … with 84 more rows, 12 more variables: reason_category &lt;chr&gt;, team &lt;chr&gt;,
#   day_linked_up &lt;dbl&gt;, profession &lt;chr&gt;, url &lt;chr&gt;, version &lt;chr&gt;,
#   location &lt;chr&gt;, country.seasons &lt;chr&gt;, n_survivors &lt;dbl&gt;, lat &lt;dbl&gt;,
#   lon &lt;dbl&gt;, date_drop_off &lt;chr&gt;, and abbreviated variable names
#   ¹​country.survivalists, ²​days_lasted, ³​medically_evacuated,
#   ⁴​reason_tapped_out</code></pre>
</div>
</div>
<p>The renaming of ambiguous columns is important in this case since <code>seasons$country</code> refers to the location of the show and <code>survivalists$country</code> refers to the nationality of the survivalist.</p>
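<p>To see what this disambiguation amounts to, here is a rough base R analogue: a left join on the key column, with suffixes applied to any other column names the two tables share. This only approximates the behavior shown above and is not how <code>dm_flatten_to_tbl()</code> is implemented; the toy tables are made up.</p>

```r
# Toy tables that share a key (season) and an unrelated column name (country)
survivalists_toy <- data.frame(
  season = c(1, 1, 2),
  name = c("Ana", "Ben", "Cy"),
  country = c("United States", "Canada", "United States")
)
seasons_toy <- data.frame(
  season = c(1, 2),
  country = c("Canada", "Canada")
)

# Left join on the key; the suffixes keep the two country columns apart
flat_toy <- merge(
  survivalists_toy, seasons_toy,
  by = "season", all.x = TRUE,
  suffixes = c(".survivalists", ".seasons")
)

names(flat_toy)
```

<p>The result keeps one row per survivalist and carries both <code>country.survivalists</code> and <code>country.seasons</code>, so the two meanings of "country" stay distinct.</p>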
</section>
<section id="a-simple-plot" class="level2">
<h2 class="anchored" data-anchor-id="a-simple-plot">A simple plot</h2>
<p>Using the flattened data, we can make a simple plot of days lasted versus age, colored by country.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb32" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb32-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ggplot2)</span>
<span id="cb32-2">flat_survivors <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb32-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> age, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> days_lasted, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> country.survivalists)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb32-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.7</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb32-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_minimal</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb32-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(</span>
<span id="cb32-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Survivalist Age"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Days Alone"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Nationality"</span>,</span>
<span id="cb32-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Days on Alone vs Survivalist Age"</span>,</span>
<span id="cb32-9">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">subtitle =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Color Indicates Nationality"</span></span>
<span id="cb32-10">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb32-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">plot.title.position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"plot"</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://jameshwade.com/posts/2023-01-25_dm-alone_files/figure-html/unnamed-chunk-14-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>In a future post, I’d like to explore going from files to <code>dm</code> to a database.</p>


</section>

 ]]></description>
  <category>database</category>
  <category>RDBMS</category>
  <category>data management</category>
  <category>R</category>
  <category>dm</category>
  <category>TidyTuesday</category>
  <guid>https://jameshwade.com/posts/2023-01-25_dm-alone.html</guid>
  <pubDate>Wed, 25 Jan 2023 00:00:00 GMT</pubDate>
  <media:content url="https://jameshwade.com/posts/images/alonehex.png" medium="image" type="image/png" height="144" width="144"/>
</item>
<item>
  <title>MLOps: Moving from Posit Connect to Azure</title>
  <dc:creator>James H Wade</dc:creator>
  <link>https://jameshwade.com/posts/2022-01-22_mlops-azure.html</link>
  <description><![CDATA[ 





<p>If you’re like me, the decisions about deployment locations and “the cloud” are out of your control at work. Whether you use AWS, GCP, Azure, or another, you are stuck with the cloud you’ve been given. The purpose of this article is to demonstrate a model deployment using Posit’s open source tools for MLOps and using Azure as the deployment infrastructure. This is the second article in a series on MLOps. See the first one that <a href="../posts/2022-12-27_mlops-the-whole-game.html">uses Posit Connect for deployment</a>.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="https://vetiver.rstudio.com/"><img src="https://jameshwade.com/posts/images/vetiver-mlops.png" class="img-fluid figure-img" alt="During the MLOps cycle, we collect data, understand and clean the data, train and evaluate a model, deploy the model, and monitor the deployed model. Monitoring can then lead back to collecting more data. There are many great tools available to understand and clean data (like pandas and the tidyverse) and to build models (like tidymodels and scikit-learn). Use the vetiver framework to deploy and monitor your models."></a></p>
<figcaption>Source: MLOps Team at Posit | An overview of MLOps with Vetiver and friends</figcaption>
</figure>
</div>
<section id="model-building" class="level2">
<h2 class="anchored" data-anchor-id="model-building">Model Building</h2>
<p>We covered model building in <a href="../posts/2022-12-27_mlops-the-whole-game.html">part one</a>, but here is that code again to save you from searching for it.</p>
<div class="cell">
<details class="code-fold">
<summary>Show code from part one</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(gt)</span>
<span id="cb1-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidymodels)</span>
<span id="cb1-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(pins)</span>
<span id="cb1-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(vetiver)</span>
<span id="cb1-6"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(palmerpenguins)</span>
<span id="cb1-7"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(plumber)</span>
<span id="cb1-8"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(conflicted)</span>
<span id="cb1-9"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tidymodels_prefer</span>()</span>
<span id="cb1-10"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">conflict_prefer</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"penguins"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"palmerpenguins"</span>)</span>
<span id="cb1-11">penguins_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb1-12">  penguins <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">drop_na</span>(sex) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>year, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>island)</span>
<span id="cb1-15"></span>
<span id="cb1-16"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1234</span>)</span>
<span id="cb1-17">penguin_split <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">initial_split</span>(penguins_df, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">strata =</span> sex)</span>
<span id="cb1-18">penguin_train <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">training</span>(penguin_split)</span>
<span id="cb1-19">penguin_test <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">testing</span>(penguin_split)</span>
<span id="cb1-20"></span>
<span id="cb1-21">penguin_rec <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb1-22">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">recipe</span>(sex <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> ., <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> penguin_train) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-23">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">step_YeoJohnson</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">all_numeric_predictors</span>()) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-24">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">step_normalize</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">all_numeric_predictors</span>()) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-25">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">step_dummy</span>(species)</span>
<span id="cb1-26"></span>
<span id="cb1-27">glm_spec <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb1-28">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">logistic_reg</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-29">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set_engine</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"glm"</span>)</span>
<span id="cb1-30"></span>
<span id="cb1-31">tree_spec <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb1-32">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rand_forest</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">min_n =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tune</span>()) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-33">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set_engine</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ranger"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-34">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set_mode</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"classification"</span>)</span>
<span id="cb1-35"></span>
<span id="cb1-36">mlp_brulee_spec <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb1-37">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mlp</span>(</span>
<span id="cb1-38">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">hidden_units =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tune</span>(), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">epochs =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tune</span>(),</span>
<span id="cb1-39">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">penalty =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tune</span>(), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">learn_rate =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tune</span>()</span>
<span id="cb1-40">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb1-41">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set_engine</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"brulee"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb1-42">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set_mode</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"classification"</span>)</span>
<span id="cb1-43"></span>
<span id="cb1-44"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1234</span>)</span>
<span id="cb1-45">penguin_folds <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vfold_cv</span>(penguin_train)</span>
<span id="cb1-46"></span>
<span id="cb1-47">bayes_control <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb1-48">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">control_bayes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">no_improve =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>L, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">time_limit =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">save_pred =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb1-49"></span>
<span id="cb1-50"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1234</span>)</span>
<span id="cb1-51">workflow_set <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb1-52">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">workflow_set</span>(</span>
<span id="cb1-53">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">preproc =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(penguin_rec),</span>
<span id="cb1-54">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">models =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb1-55">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">glm =</span> glm_spec,</span>
<span id="cb1-56">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">tree =</span> tree_spec,</span>
<span id="cb1-57">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">torch =</span> mlp_brulee_spec</span>
<span id="cb1-58">    )</span>
<span id="cb1-59">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-60">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">workflow_map</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"tune_bayes"</span>,</span>
<span id="cb1-61">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">iter =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>L,</span>
<span id="cb1-62">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">resamples =</span> penguin_folds,</span>
<span id="cb1-63">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">control =</span> bayes_control</span>
<span id="cb1-64">  )</span></code></pre></div></div>
</details>
</div>
</section>
<section id="model-deployment-on-azure" class="level2">
<h2 class="anchored" data-anchor-id="model-deployment-on-azure">Model Deployment on Azure</h2>
<p>The <a href="https://rstudio.github.io/vetiver-r/"><code>{vetiver}</code></a> package provides a set of tools for building, deploying, and managing machine learning models in production. It lets users create, version, and deploy models to hosting platforms such as Posit Connect or a cloud service like Azure. Part one showed a Connect deployment; this post uses an Azure storage container as the board.</p>
<p>The <code>vetiver_model()</code> function creates an object that stores a machine learning model together with its metadata, such as the model’s name, type, and parameters. The <code>vetiver_pin_write()</code> and <code>vetiver_pin_read()</code> functions write <code>vetiver_model</code> objects to, and read them from, a pins board.</p>
<section id="create-an-pins-board-in-an-azure-storage-container" class="level3">
<h3 class="anchored" data-anchor-id="create-an-pins-board-in-an-azure-storage-container">Create a Pins Board in an Azure Storage Container</h3>
<p>To access an Azure storage container, we can use the <a href="https://github.com/Azure/AzureStor"><code>{AzureStor}</code></a> package. If you are using Azure, you are most likely using it in a corporate environment, which often comes with company-specific policies. If these are new to you, your best bet is to find someone who is already familiar with the cloud environment at your organization. This example uses SAS (Shared Access Signature) key authentication, which grants users or applications limited access to Azure storage resources such as containers. SAS keys are generated by Azure Storage and let you access those resources without sharing the storage account’s access keys.</p>
<p>To use SAS keys for accessing Azure storage containers, you will need to create a SAS key and use it to authenticate your requests to the storage API. You can learn more about SAS keys and how to generate them from <a href="https://learn.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/create-sas-tokens?">Microsoft Learn</a>.</p>
<p>Below is an example of how to access an Azure storage container and create or connect to a board in it. Once connected, <code>pins::pin_list()</code> will list any pins already stored there. The code assumes that you have already set the <code>AZURE_CONTAINER_ENDPOINT</code> and <code>AZURE_SAS_KEY</code> environment variables and have installed the <code>{AzureStor}</code> and <code>{pins}</code> packages.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(AzureStor)</span>
<span id="cb2-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(pins)</span>
<span id="cb2-3"></span>
<span id="cb2-4">container <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb2-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">storage_container</span>(</span>
<span id="cb2-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">endpoint =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Sys.getenv</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"AZURE_CONTAINER_ENDPOINT"</span>),</span>
<span id="cb2-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sas =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Sys.getenv</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"AZURE_SAS_KEY"</span>)</span>
<span id="cb2-8">  )</span>
<span id="cb2-9"></span>
<span id="cb2-10">model_board <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> pins<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">board_azure</span>(container)</span></code></pre></div></div>
</div>
<p>The <code>storage_container()</code> function from the <code>{AzureStor}</code> package creates a storage container object, which represents a container in an Azure storage account. The <code>endpoint</code> argument specifies the endpoint URL for the storage container, and the <code>sas</code> argument specifies the SAS key used to authenticate requests to the container.</p>
<p>The <code>Sys.getenv()</code> function retrieves the values of the <code>AZURE_CONTAINER_ENDPOINT</code> and <code>AZURE_SAS_KEY</code> environment variables, which should hold the endpoint URL and the SAS key for the Azure storage container, respectively. This assumes you have already set both in something like a <code>.Renviron</code> file.</p>
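<p>Because <code>Sys.getenv()</code> returns an empty string for a variable that is not set, a missing value can surface later as a confusing authentication error. The following is a minimal sketch, using only base R, of one way to check both variables before connecting. The helper name <code>missing_env_vars()</code> is hypothetical and not part of any package.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical helper: return the names of any variables that are unset or empty.
missing_env_vars &lt;- function(vars) {
  values &lt;- Sys.getenv(vars, unset = "")
  vars[!nzchar(values)]
}

required &lt;- c("AZURE_CONTAINER_ENDPOINT", "AZURE_SAS_KEY")
not_set &lt;- missing_env_vars(required)

if (length(not_set) &gt; 0) {
  message("Not set: ", paste(not_set, collapse = ", "))
} else {
  message("All Azure environment variables are set.")
}</code></pre></div></div>
</div>
<p>If a variable is reported as not set, add it to your <code>.Renviron</code> file and restart R so the new value is picked up.</p>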
<p>The <code>board_azure()</code> function from the <code>{pins}</code> package creates a pins board object backed by the Azure storage container.</p>
</section>
<section id="create-vetiver-model" class="level3">
<h3 class="anchored" data-anchor-id="create-vetiver-model">Create Vetiver Model</h3>
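<p>To deploy our model with <code>{vetiver}</code>, we first need a trained workflow. The code below selects the best configuration for the chosen model, finalizes the workflow, fits it with <code>last_fit()</code>, and extracts the trained workflow as <code>final_fit_to_deploy</code>.</p>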
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidymodels)</span>
<span id="cb3-2"></span>
<span id="cb3-3">best_model_id <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"recipe_glm"</span></span>
<span id="cb3-4"></span>
<span id="cb3-5">best_fit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb3-6">  workflow_set <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_workflow_set_result</span>(best_model_id) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select_best</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">metric =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"accuracy"</span>)</span>
<span id="cb3-9"></span>
<span id="cb3-10">final_fit_to_deploy <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb3-11">  workflow_set <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_workflow</span>(best_model_id) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">finalize_workflow</span>(best_fit) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">last_fit</span>(penguin_split) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-15">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_workflow</span>()</span></code></pre></div></div>
</div>
<p>We can do that with <code>tune::extract_workflow()</code>. The trained workflow is what we will deploy as a <code>vetiver_model</code>. That means we need to convert it from a workflow to a vetiver model with <code>vetiver_model()</code>.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(vetiver)</span>
<span id="cb4-2">v <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vetiver_model</span>(final_fit_to_deploy, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model_name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"penguins_model"</span>)</span>
<span id="cb4-3"></span>
<span id="cb4-4">v</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>
── penguins_model ─ &lt;bundled_workflow&gt; model for deployment 
A glm classification modeling workflow using 5 features</code></pre>
</div>
</div>
</section>
<section id="pin-model-to-board" class="level3">
<h3 class="anchored" data-anchor-id="pin-model-to-board">Pin Model to Board</h3>
<p>Once the <code>model_board</code> connection is made, it’s as easy as <code>vetiver_pin_write()</code> to “pin” our model to the model board and <code>vetiver_pin_read()</code> to access it.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">model_board <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vetiver_pin_write</span>(v)</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>Creating new version '20230122T144640Z-a875f'
Writing to pin 'penguins_model'

Create a Model Card for your published model
• Model Cards provide a framework for transparent, responsible reporting
• Use the vetiver `.Rmd` template as a place to start</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">model_board <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vetiver_pin_read</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"penguins_model"</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>
── penguins_model ─ &lt;bundled_workflow&gt; model for deployment 
A glm classification modeling workflow using 5 features</code></pre>
</div>
</div>
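<p>Each call to <code>vetiver_pin_write()</code> creates a new version, so it can help to check what a board holds before reading from it. The following is a minimal sketch that uses a temporary local board from <code>pins::board_temp()</code> as a stand-in for the Azure board, so it runs without cloud credentials. The names <code>demo_board</code> and <code>demo_pin</code> and the <code>mtcars</code> payload are illustrative only. The same <code>pin_list()</code> and <code>pin_versions()</code> calls should apply to <code>model_board</code>.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(pins)

# Stand-in for model_board: a temporary, versioned local board.
demo_board &lt;- board_temp(versioned = TRUE)

# Write a small pin so there is something to list (illustrative data only).
pin_write(demo_board, head(mtcars), name = "demo_pin", type = "rds")

# Names of all pins stored on the board.
pin_list(demo_board)

# One row per stored version of the pin, including its version string.
pin_versions(demo_board, "demo_pin")</code></pre></div></div>
</div>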
</section>
<section id="create-model-api" class="level3">
<h3 class="anchored" data-anchor-id="create-model-api">Create Model API</h3>
<p>Our next step is to use the <code>{vetiver}</code> and <a href="https://www.rplumber.io/"><code>{plumber}</code></a> packages to create an API for our vetiver model, which can then be used to make predictions via HTTP requests. <code>pr()</code> creates a new plumber router, and <code>vetiver_api(v)</code> adds a <code>POST</code> endpoint that returns predictions from the trained vetiver model. <code>vetiver_write_plumber()</code> creates a plumber file (<code>plumber.R</code> by default, <code>azure_plumber.R</code> here) that reads a specific version of the model we pinned to our model board with <code>vetiver_pin_write()</code>.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(plumber)</span>
<span id="cb10-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pr</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb10-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vetiver_api</span>(v)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># Plumber router with 2 endpoints, 4 filters, and 1 sub-router.
# Use `pr_run()` on this object to start the API.
├──[queryString]
├──[body]
├──[cookieParser]
├──[sharedSecret]
├──/logo
│  │ # Plumber static router serving from directory: /Library/Frameworks/R.framework/Versions/4.2/Resources/library/vetiver
├──/ping (GET)
└──/predict (POST)</code></pre>
</div>
</div>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vetiver_write_plumber</span>(</span>
<span id="cb12-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">board =</span> model_board,</span>
<span id="cb12-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"penguins_model"</span>,</span>
<span id="cb12-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">file =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"azure_plumber.R"</span></span>
<span id="cb12-5">)</span></code></pre></div></div>
</div>
<p>Here is an example of the <code>azure_plumber.R</code> file generated by <code>vetiver_write_plumber()</code>.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Generated by the vetiver package; edit with care</span></span>
<span id="cb13-2"></span>
<span id="cb13-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(pins)</span>
<span id="cb13-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(plumber)</span>
<span id="cb13-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(rapidoc)</span>
<span id="cb13-6"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(vetiver)</span>
<span id="cb13-7"></span>
<span id="cb13-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Packages needed to generate model predictions</span></span>
<span id="cb13-9"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (<span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>) {</span>
<span id="cb13-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(parsnip)</span>
<span id="cb13-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(recipes)</span>
<span id="cb13-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(stats)</span>
<span id="cb13-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(workflows)</span>
<span id="cb13-14">}</span>
<span id="cb13-15">b <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">board_azure</span>(AzureStor<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">storage_container</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"https://penguinstore.blob.core.windows.net/penguincontainer"</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">path =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>)</span>
<span id="cb13-16">v <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vetiver_pin_read</span>(b, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"penguins_model"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">version =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"20221222T172651Z-50d8c"</span>)</span>
<span id="cb13-17"></span>
<span id="cb13-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#* @plumber</span></span>
<span id="cb13-19"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(pr) {</span>
<span id="cb13-20">  pr <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vetiver_api</span>(v)</span>
<span id="cb13-21">}</span></code></pre></div></div>
</div>
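<p>Before containerizing the API, it can be useful to confirm that it responds locally. The following is a minimal sketch, assuming the <code>v</code> object from above and a free port 8000. The helper names <code>serve_locally()</code> and <code>predict_locally()</code> are hypothetical, as is <code>new_penguins</code>, which stands for a data frame with the same predictor columns the model was trained on. <code>pr_run()</code> blocks the R session, so the prediction helper is meant to be called from a second session.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(plumber)
library(vetiver)

# Session 1: serve the vetiver model locally (blocks until interrupted).
serve_locally &lt;- function(v, port = 8000) {
  pr() |&gt;
    vetiver_api(v) |&gt;
    pr_run(port = port)
}

# Session 2: send new data to the running /predict endpoint.
predict_locally &lt;- function(new_data, port = 8000) {
  endpoint &lt;- vetiver_endpoint(paste0("http://127.0.0.1:", port, "/predict"))
  predict(endpoint, new_data)
}

# Usage (not run):
# serve_locally(v)                # in session 1
# predict_locally(new_penguins)   # in session 2; new_penguins is hypothetical</code></pre></div></div>
</div>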
</section>
<section id="deploying-elsewhere-with-docker" class="level3">
<h3 class="anchored" data-anchor-id="deploying-elsewhere-with-docker">Deploying Elsewhere with Docker</h3>
<p>If Posit Connect is not the right place for our model, <code>vetiver_write_docker()</code> creates a <code>Dockerfile</code> and an <code>{renv}</code> lockfile. Deployment is much more complicated when not using Posit Connect. If this is your first time creating a deployment, I recommend you connect with <a href="mailto:jhwade@dow.com?subject=Request%20for%20Help%20with%20Vetiver%20Deployment&amp;body=Hi,%20I%20was%20reading%20your%20post%20on%20deploying%20outside%20of%20Posit%20Connect.%20I'd%20like%20some%20help.%20For%20my%20project...">me</a> or someone else with experience in Azure deployments.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vetiver_write_docker</span>(</span>
<span id="cb14-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">vetiver_model =</span> v,</span>
<span id="cb14-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">path =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"azure"</span>,</span>
<span id="cb14-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lockfile =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"azure/vetiver_renv.lock"</span></span>
<span id="cb14-5">)</span></code></pre></div></div>
</div>
<p>Here is an example of the generated Dockerfile.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode dockerfile code-with-copy"><code class="sourceCode dockerfile"><span id="cb15-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Generated by the vetiver package; edit with care</span></span>
<span id="cb15-2"></span>
<span id="cb15-3"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">FROM</span> rocker/r-ver:4.2.2</span>
<span id="cb15-4"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">ENV</span> RENV_CONFIG_REPOS_OVERRIDE packagemanager.rstudio.com/cran/latest</span>
<span id="cb15-5"></span>
<span id="cb15-6"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">RUN</span> <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">apt-get</span> update <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-qq</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">&amp;&amp;</span> <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">apt-get</span> install <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-y</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--no-install-recommends</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb15-7">  libcurl4-openssl-dev <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb15-8">  libicu-dev <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb15-9">  libsodium-dev <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb15-10">  libssl-dev <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb15-11">  make <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb15-12">  zlib1g-dev <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb15-13">  <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">&amp;&amp;</span> <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">apt-get</span> clean</span>
<span id="cb15-14"></span>
<span id="cb15-15"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">COPY</span> azure/vetiver_renv.lock renv.lock</span>
<span id="cb15-16"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">RUN</span> <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">Rscript</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-e</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"install.packages('renv')"</span></span>
<span id="cb15-17"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">RUN</span> <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">Rscript</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-e</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"renv::restore()"</span></span>
<span id="cb15-18"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">COPY</span> plumber.R /opt/ml/plumber.R</span>
<span id="cb15-19"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">EXPOSE</span> 8000</span>
<span id="cb15-20"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">ENTRYPOINT</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"R"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"-e"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"pr &lt;- plumber::plumb('/opt/ml/plumber.R'); pr$run(host = '0.0.0.0', port = 8000)"</span>]</span></code></pre></div></div>
<p>To deploy our API in Azure using that Dockerfile, we need to:</p>
<ol type="1">
<li>Build a Docker image of the API using the Dockerfile. We need to have <a href="https://docs.docker.com/get-docker/">Docker installed</a> on the system we use to build the image. Build the image by running the following command in the directory where the Dockerfile is located:</li>
</ol>
<div class="cell">
<div class="code-with-filename">
<div class="code-with-filename-file">
<pre><strong>Terminal</strong></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb16-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">docker</span> build <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-t</span> penguin-image .</span></code></pre></div></div>
</div>
</div>
<ol start="2" type="1">
<li>Push the Docker image to a container registry. A container registry is a service that stores Docker images and makes them available for deployment. Azure’s registry is called Azure Container Registry (ACR). We need an ACR instance in Azure, so create one if you don’t already have one. Before we can push the image, we need to log in to the registry with the <code>az acr login</code> command from the <a href="https://learn.microsoft.com/en-us/cli/azure/install-azure-cli">Azure CLI</a>. We then tag the image with the registry address and push it with <code>docker push</code>. For example, to push the image to ACR, you can use the following commands:</li>
</ol>
<div class="cell">
<div class="code-with-filename">
<div class="code-with-filename-file">
<pre><strong>Terminal</strong></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb17-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">az</span> acr login <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--name</span> vetiverdeploy</span>
<span id="cb17-2"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">docker</span> tag penguin-image:latest vetiverdeploy.azurecr.io/penguin-image</span>
<span id="cb17-3"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">docker</span> push vetiverdeploy.azurecr.io/penguin-image</span></code></pre></div></div>
</div>
</div>
<p>Here, <code>vetiverdeploy</code> is the name of our ACR and <code>penguin-image</code> is the name of our Docker image. The <code>latest</code> tag indicates that this is the latest version of the image. For more information on how to push a Docker image to ACR, you can refer to the official Microsoft documentation: <a href="https://docs.microsoft.com/en-us/azure/container-registry/container-registry-tasks-quick-task"><strong>Push and pull Docker images with Azure Container Registry Tasks</strong></a>. To break down these commands a bit further:</p>
<ul>
<li><p><code>az acr login --name vetiverdeploy</code> logs in to the Azure Container Registry with the specified name (in this case, <code>vetiverdeploy</code>). This is necessary in order to push images to the registry.</p></li>
<li><p><code>docker tag penguin-image:latest vetiverdeploy.azurecr.io/penguin-image</code> tags the local image with the registry’s address so Docker knows where to push it. The image name is <code>penguin-image</code>, and the registry’s login server is <code>vetiverdeploy.azurecr.io</code>. Because the target name has no explicit tag, Docker applies the default <code>latest</code> tag.</p></li>
<li><p><code>docker push vetiverdeploy.azurecr.io/penguin-image</code> pushes the Docker image to the specified registry URL. In this case, the image will be pushed to the <code>vetiverdeploy</code> ACR.</p></li>
</ul>
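<p>The push above assumes the <code>vetiverdeploy</code> registry already exists. If it does not, the following is a minimal sketch of creating one with the Azure CLI. The resource group name <code>penguin-rg</code> and the <code>Basic</code> SKU are assumptions for illustration, not values from the original deployment; <code>eastus</code> is taken from the endpoint URL used later in this post. To avoid creating billable resources by accident, the script only prints the commands unless <code>DRY_RUN=0</code> is set.</p>

```shell
#!/usr/bin/env sh
# Sketch: create a resource group and an ACR instance before pushing.
# RESOURCE_GROUP is a hypothetical name; ACR_NAME matches the registry used above.
RESOURCE_GROUP="${RESOURCE_GROUP:-penguin-rg}"
LOCATION="${LOCATION:-eastus}"
ACR_NAME="${ACR_NAME:-vetiverdeploy}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

create_registry() {
  # The resource group holds the registry and, later, the container instance.
  run az group create --name "$RESOURCE_GROUP" --location "$LOCATION"
  # Basic is the smallest SKU (an assumption; pick what fits your workload).
  run az acr create --resource-group "$RESOURCE_GROUP" --name "$ACR_NAME" --sku Basic
}

create_registry
```

<p>Run it once as-is to review the commands, then again with <code>DRY_RUN=0</code> after <code>az login</code> to create the resources.</p>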
<ol start="3" type="1">
<li>We now need to create an Azure Container Instance (ACI) that uses the Docker image we built and pushed above. This can be done either using the <a href="https://learn.microsoft.com/en-us/azure/container-instances/container-instances-quickstart">Azure CLI</a> or in the <a href="https://learn.microsoft.com/en-us/azure/container-instances/container-instances-quickstart">Azure Portal</a>.</li>
</ol>
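<p>For the CLI route, a minimal sketch might look like the following. The resource group <code>penguin-rg</code>, the container name <code>penguin-api</code>, and the CPU and memory sizes are assumptions for illustration; the DNS label <code>penguin</code> and port <code>8000</code> are inferred from the endpoint URL and the Dockerfile in this post. Registry credentials are expected in the hypothetical environment variables <code>ACR_USERNAME</code> and <code>ACR_PASSWORD</code>. As a precaution, the script only prints the command unless <code>DRY_RUN=0</code> is set.</p>

```shell
#!/usr/bin/env sh
# Sketch: create a container instance from the image pushed to ACR.
# RESOURCE_GROUP and ACI_NAME are hypothetical names.
RESOURCE_GROUP="${RESOURCE_GROUP:-penguin-rg}"
ACI_NAME="${ACI_NAME:-penguin-api}"
ACR_SERVER="${ACR_SERVER:-vetiverdeploy.azurecr.io}"
IMAGE="${IMAGE:-$ACR_SERVER/penguin-image:latest}"
DNS_LABEL="${DNS_LABEL:-penguin}"
PORT="${PORT:-8000}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
# Note: the dry-run output includes the password argument if one is set.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

create_container() {
  run az container create \
    --resource-group "$RESOURCE_GROUP" \
    --name "$ACI_NAME" \
    --image "$IMAGE" \
    --os-type Linux --cpu 1 --memory 1.5 \
    --ip-address Public \
    --dns-name-label "$DNS_LABEL" \
    --ports "$PORT" \
    --registry-login-server "$ACR_SERVER" \
    --registry-username "${ACR_USERNAME:-}" \
    --registry-password "${ACR_PASSWORD:-}"
}

create_container
```

<p>The DNS label and region together determine the public hostname, which is why the label here matches the <code>penguin.eastus.azurecontainer.io</code> address used below.</p>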
<p>With the ACI build complete, we have successfully deployed our API!</p>
<div class="callout callout-style-default callout-warning callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Warning</span>Azure can be frustrating at first
</div>
</div>
<div class="callout-body-container callout-body">
<p>These instructions are unlikely to be good enough to deploy a model without some familiarity with Azure. Please comment on this post or find someone with Azure experience for help.</p>
</div>
</div>
</section>
<section id="using-the-api-to-make-predictions" class="level3">
<h3 class="anchored" data-anchor-id="using-the-api-to-make-predictions">Using the API to Make Predictions</h3>
<p>The API is deployed at <code>http://penguin.eastus.azurecontainer.io</code>, and the prediction endpoint is <code>http://penguin.eastus.azurecontainer.io:8000/predict</code>.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1">endpoint <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb18-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vetiver_endpoint</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"http://penguin.eastus.azurecontainer.io:8000/predict"</span>)</span>
<span id="cb18-3">endpoint</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>
── A model API endpoint for prediction: 
http://penguin.eastus.azurecontainer.io:8000/predict</code></pre>
</div>
</div>
<p>We can make predictions with the endpoint using <code>predict</code>.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb20-1">new_data <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb20-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">species =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Adelie"</span>,</span>
<span id="cb20-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">bill_length_mm =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">40.5</span>,</span>
<span id="cb20-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">bill_depth_mm =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">18.9</span>,</span>
<span id="cb20-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">flipper_length_mm =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">180</span>,</span>
<span id="cb20-6">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">body_mass_g =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3950</span></span>
<span id="cb20-7">)</span>
<span id="cb20-8"></span>
<span id="cb20-9"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">predict</span>(endpoint, new_data)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 1 × 1
  .pred_class
  &lt;chr&gt;      
1 male       </code></pre>
</div>
</div>
<p>You can also use <code>{httr}</code> to call the API. In most cases, it is easier for R users to use <code>predict</code> rather than <code>httr::POST</code>. However, were this model written in another language, making predictions using <code>{httr}</code> would likely be the best approach.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb22-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(httr)</span>
<span id="cb22-2">url <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"http://penguin.eastus.azurecontainer.io:8000/predict"</span></span>
<span id="cb22-3">json_data <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> jsonlite<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">toJSON</span>(new_data)</span>
<span id="cb22-4">response <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">POST</span>(url, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">body =</span> json_data)</span>
<span id="cb22-5">response</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>Response [http://penguin.eastus.azurecontainer.io:8000/predict]
  Date: 2023-01-22 14:46
  Status: 200
  Content-Type: application/json
  Size: 24 B</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb24" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb24-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">content</span>(response)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[[1]]
[[1]]$.pred_class
[1] "male"</code></pre>
</div>
</div>
<p>Avoiding a language-specific approach altogether, you can use <code>curl</code> in a terminal to make API calls.</p>
<div class="cell">
<div class="code-with-filename">
<div class="code-with-filename-file">
<pre><strong>Terminal</strong></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb26" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb26-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">curl</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-X</span> POST <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"http://penguin.eastus.azurecontainer.io:8000/predict"</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb26-2">-H <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Accept: application/json"</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb26-3">-H <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Content-Type: application/json"</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb26-4">-d <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'[{"species":"Adelie","bill_length_mm":0.5,"bill_depth_mm":0.5,"flipper_length_mm":0,"body_mass_g":0}]'</span></span></code></pre></div></div>
</div>
</div>
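<p>The payload above uses placeholder values. To compare the result with the R calls, you can send the same observation as <code>new_data</code>. The sketch below builds that JSON with a small helper, <code>make_payload</code>, which is a hypothetical function written for this example. It assumes the endpoint is still reachable, so by default it only prints the payload; set <code>DRY_RUN=0</code> to send the request.</p>

```shell
#!/usr/bin/env sh
# Sketch: POST the same penguin used in the R example to the prediction endpoint.
PREDICT_URL="${PREDICT_URL:-http://penguin.eastus.azurecontainer.io:8000/predict}"
DRY_RUN="${DRY_RUN:-1}"

# Build a one-row JSON array with the fields the model expects.
# Arguments: species, bill length, bill depth, flipper length, body mass.
make_payload() {
  printf '[{"species":"%s","bill_length_mm":%s,"bill_depth_mm":%s,"flipper_length_mm":%s,"body_mass_g":%s}]' \
    "$1" "$2" "$3" "$4" "$5"
}

PAYLOAD="$(make_payload Adelie 40.5 18.9 180 3950)"

if [ "$DRY_RUN" = "1" ]; then
  # Dry run: show what would be sent.
  echo "$PAYLOAD"
else
  curl -s -X POST "$PREDICT_URL" \
    -H "Accept: application/json" \
    -H "Content-Type: application/json" \
    -d "$PAYLOAD"
fi
```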
</section>
</section>
<section id="model-monitoring" class="level2">
<h2 class="anchored" data-anchor-id="model-monitoring">Model Monitoring</h2>
<p>After deployment, we need to monitor model performance. The <a href="https://vetiver.rstudio.com/get-started/monitor.html">MLOps with vetiver monitoring page</a> describes this well:</p>
<blockquote class="blockquote">
<p>Machine learning can break quietly; a model can continue returning predictions without error, even if it is performing poorly. Often these quiet performance problems are discussed as types of model drift; data drift can occur when the statistical distribution of an input feature changes, or concept drift occurs when there is change in the relationship between the input features and the outcome.</p>
<p>Without monitoring for degradation, this silent failure can continue undiagnosed. The vetiver framework offers functions to fluently compute, store, and plot model metrics. These functions are particularly suited to monitoring your model using multiple performance metrics over time. Effective model monitoring is not “one size fits all”, but instead depends on choosing appropriate metrics and time aggregation for a given application.</p>
</blockquote>
<p>As a baseline for model performance, we can start by using our training set to create original metrics for the model. We also simulate a <code>date_obs</code> column. In a real example, you should use the date the data was collected.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb27" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb27-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1234</span>)</span>
<span id="cb27-2">penguin_train_by_date <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb27-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">training</span>(penguin_split) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb27-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb27-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">date_obs =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Sys.Date</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb27-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ungroup</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb27-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">arrange</span>(date_obs)</span>
<span id="cb27-8"></span>
<span id="cb27-9">original_metrics <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb27-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">augment</span>(v, penguin_train_by_date) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb27-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vetiver_compute_metrics</span>(</span>
<span id="cb27-12">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">date_var =</span> date_obs,</span>
<span id="cb27-13">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">period =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"day"</span>,</span>
<span id="cb27-14">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">truth =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sex"</span>,</span>
<span id="cb27-15">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">estimate =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">".pred_class"</span></span>
<span id="cb27-16">  )</span>
<span id="cb27-17"></span>
<span id="cb27-18"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vetiver_plot_metrics</span>(original_metrics)</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://jameshwade.com/posts/2022-01-22_mlops-azure_files/figure-html/unnamed-chunk-16-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>We can pin the model performance metrics, just as we did with the model.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb28" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb28-1">model_board <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb28-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pin_write</span>(original_metrics, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"penguin_metrics"</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>Guessing `type = 'rds'`
Creating new version '20230122T144645Z-dd69c'
Writing to pin 'penguin_metrics'</code></pre>
</div>
</div>
<section id="performance-over-time" class="level3">
<h3 class="anchored" data-anchor-id="performance-over-time">Performance over Time</h3>
<p>To simulate the model going “live”, let’s use the test set to add more predictions.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb30" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb30-1">penguin_test_by_date <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb30-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">testing</span>(penguin_split) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb30-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb30-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">date_obs =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Sys.Date</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb30-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ungroup</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb30-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">arrange</span>(date_obs)</span>
<span id="cb30-7"></span>
<span id="cb30-8">v <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb30-9">  model_board <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb30-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vetiver_pin_read</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"penguins_model"</span>)</span>
<span id="cb30-11"></span>
<span id="cb30-12">new_metrics <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb30-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">augment</span>(v, penguin_test_by_date) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb30-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vetiver_compute_metrics</span>(</span>
<span id="cb30-15">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">date_var =</span> date_obs,</span>
<span id="cb30-16">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">period =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"day"</span>,</span>
<span id="cb30-17">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">truth =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sex"</span>,</span>
<span id="cb30-18">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">estimate =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">".pred_class"</span></span>
<span id="cb30-19">  )</span>
<span id="cb30-20"></span>
<span id="cb30-21">model_board <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb30-22">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vetiver_pin_metrics</span>(new_metrics, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"penguin_metrics"</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>Creating new version '20230122T144648Z-4f9b0'
Writing to pin 'penguin_metrics'</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 20 × 5
   .index        .n .metric  .estimator .estimate
   &lt;date&gt;     &lt;int&gt; &lt;chr&gt;    &lt;chr&gt;          &lt;dbl&gt;
 1 2023-01-12    32 accuracy binary         0.844
 2 2023-01-12    32 kap      binary         0.688
 3 2023-01-13    45 accuracy binary         0.911
 4 2023-01-13    45 kap      binary         0.82 
 5 2023-01-14    29 accuracy binary         0.966
 6 2023-01-14    29 kap      binary         0.931
 7 2023-01-15    34 accuracy binary         0.912
 8 2023-01-15    34 kap      binary         0.820
 9 2023-01-16    44 accuracy binary         0.886
10 2023-01-16    44 kap      binary         0.759
11 2023-01-17    31 accuracy binary         0.903
12 2023-01-17    31 kap      binary         0.807
13 2023-01-18    34 accuracy binary         0.941
14 2023-01-18    34 kap      binary         0.881
15 2023-01-19    25 accuracy binary         0.92 
16 2023-01-19    25 kap      binary         0.840
17 2023-01-20    31 accuracy binary         0.839
18 2023-01-20    31 kap      binary         0.686
19 2023-01-21    28 accuracy binary         0.964
20 2023-01-21    28 kap      binary         0.924</code></pre>
</div>
</div>
<p>Now that we’ve updated the model metrics, we can plot model performance over time, again using the <code>vetiver_plot_metrics()</code> function.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb33" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb33-1">monitoring_metrics <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb33-2">  model_board <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pin_read</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"penguin_metrics"</span>)</span>
<span id="cb33-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vetiver_plot_metrics</span>(monitoring_metrics)</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://jameshwade.com/posts/2022-01-22_mlops-azure_files/figure-html/unnamed-chunk-19-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>


</section>
</section>

 ]]></description>
  <category>mlops</category>
  <category>vetiver</category>
  <category>pins</category>
  <category>deployment</category>
  <category>R</category>
  <category>cloud</category>
  <category>Azure</category>
  <guid>https://jameshwade.com/posts/2022-01-22_mlops-azure.html</guid>
  <pubDate>Sun, 22 Jan 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Bayesian Optimization with Tidymodels</title>
  <dc:creator>James H Wade</dc:creator>
  <link>https://jameshwade.com/posts/2023-01-01_tidymodels-bo-for-torch.html</link>
  <description><![CDATA[ 





<p>Hyperparameter optimization is a key part of the machine learning workflow. Knowing what hyperparameters to choose or even which ones to change can be a bit overwhelming, especially when you have a lot of them. Iterative hyperparameter optimization is a common approach to this problem, but it can be time-consuming and expensive. Bayesian optimization is a method that can help with this problem. For a deeper dive into Bayesian optimization and iterative optimization overall, Julia Silge and Max Kuhn’s <a href="https://www.tmwr.org"><em>Tidy Modeling with R</em></a> has a <a href="https://www.tmwr.org/iterative-search.html">great chapter on this topic</a>.</p>
<p>In <a href="../posts/2022-12-27_mlops-the-whole-game.html">the whole game</a>, I used Bayesian optimization for hyperparameter tuning but did not provide much explanation or justification. In this post, we’ll use Bayesian optimization to tune the hyperparameters of a neural net with <code>{torch}</code> and <code>{brulee}</code> using the tidymodels framework.</p>
<p>Silge and Kuhn share some advice on how to approach hyperparameter optimization and model screening in general:</p>
<blockquote class="blockquote">
<p>A good strategy is to spend some initial effort trying a variety of modeling approaches, determine what works best, then invest additional time tweaking/optimizing a small set of models. <cite> Julia Silge and Max Kuhn, <a href="https://www.tmwr.org"><em>Tidy Modeling with R</em></a> </cite></p>
</blockquote>
<div class="cell">
<div class="code-with-filename">
<div class="code-with-filename-file">
<pre><strong>Load Packages &amp; Set Preferences</strong></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidymodels)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(brulee)</span>
<span id="cb1-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(modeldata)</span>
<span id="cb1-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span>
<span id="cb1-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(skimr)</span>
<span id="cb1-6"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tidymodels_prefer</span>()</span>
<span id="cb1-7"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_set</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_minimal</span>())</span>
<span id="cb1-8"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1234</span>)</span></code></pre></div></div>
</div>
</div>
<section id="the-data" class="level2">
<h2 class="anchored" data-anchor-id="the-data">The Data</h2>
<p>The <code>{modeldata}</code> package has a number of datasets that are useful for modeling. We’ll use the <code>ad_data</code> dataset, which is a clinical study of a few hundred patients with cognitive impairment. The goal of the study was to predict if early stages of cognitive impairment could be distinguished from normal cognition. We can look at the data documentation with <code>?modeldata::ad_data</code>.</p>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center collapsed" data-bs-toggle="collapse" data-bs-target=".callout-1-contents" aria-controls="callout-1" aria-expanded="false" aria-label="Toggle callout">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Expand To Learn About <code>ad_data</code>
</div>
<div class="callout-btn-toggle d-inline-block border-0 py-1 ps-1 pe-0 float-end"><i class="callout-toggle"></i></div>
</div>
<div id="callout-1" class="callout-1-contents callout-collapse collapse">
<div class="callout-body-container callout-body">
<div class="cell">
<details class="code-fold">
<summary>Show Code to Print Help Document</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#' Capture help documents contents</span></span>
<span id="cb2-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#'</span></span>
<span id="cb2-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#' Allows you to capture the contents of a help file to print to the console or</span></span>
<span id="cb2-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#' include in a Quarto / RMarkdown document.</span></span>
<span id="cb2-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#'</span></span>
<span id="cb2-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#' based on code by Noam Ross</span></span>
<span id="cb2-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#'  http://www.noamross.net/archives/2013-06-18-helpconsoleexample/</span></span>
<span id="cb2-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#' Stéphane Laurent</span></span>
<span id="cb2-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#'  https://stackoverflow.com/questions/60468080/</span></span>
<span id="cb2-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#'   print-an-r-help-file-vignette-as-output-into-an-r-html-notebook</span></span>
<span id="cb2-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#' Michael Sumner (mdsumner)</span></span>
<span id="cb2-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#'  https://stackoverflow.com/questions/7495685/</span></span>
<span id="cb2-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#'   how-to-access-the-help-documentation-rd-source-files-in-r</span></span>
<span id="cb2-14"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#' and David Fong</span></span>
<span id="cb2-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#'  https://stackoverflow.com/questions/60468080/print-an-r-help-file-vignette-</span></span>
<span id="cb2-16"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#'  as-output-into-an-r-html-notebook/62241456#62241456</span></span>
<span id="cb2-17"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#'</span></span>
<span id="cb2-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#' @param topic - the command for which help is required</span></span>
<span id="cb2-19"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#' @param package - the package name with the required topic</span></span>
<span id="cb2-20"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#' @param format - output format</span></span>
<span id="cb2-21"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#' @param before - place code before the output e.g. "&lt;blockquote&gt;"</span></span>
<span id="cb2-22"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#' @param after - place code after the output e.g. "&lt;/blockquote&gt;"</span></span>
<span id="cb2-23">help_console <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(topic, package,</span>
<span id="cb2-24">                         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">format =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"text"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"html"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"latex"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Rd"</span>),</span>
<span id="cb2-25">                         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">before =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">after =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span>) {</span>
<span id="cb2-26">  format <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">match.arg</span>(format)</span>
<span id="cb2-27">  <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.character</span>(topic)) topic <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">deparse</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">substitute</span>(topic))</span>
<span id="cb2-28">  db <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> tools<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Rd_db</span>(package)</span>
<span id="cb2-29">  helpfile <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> db[<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(topic, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">".Rd"</span>)][[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]</span>
<span id="cb2-30">  hs <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">capture.output</span>(</span>
<span id="cb2-31">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">switch</span>(format,</span>
<span id="cb2-32">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">text =</span> tools<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Rd2txt</span>(helpfile,</span>
<span id="cb2-33">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">stages =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"render"</span>,</span>
<span id="cb2-34">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">outputEncoding =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"UTF-8"</span></span>
<span id="cb2-35">      ),</span>
<span id="cb2-36">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">html =</span> tools<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Rd2HTML</span>(helpfile, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">package =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">stages =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"render"</span>),</span>
<span id="cb2-37">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">latex =</span> tools<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Rd2latex</span>(helpfile),</span>
<span id="cb2-38">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Rd =</span> tools<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">prepare_Rd</span>(helpfile)</span>
<span id="cb2-39">    )</span>
<span id="cb2-40">  )</span>
<span id="cb2-41">  <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (format <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"html"</span>) {</span>
<span id="cb2-42">    i <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">grep</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"&lt;body&gt;"</span>, hs)</span>
<span id="cb2-43">    j <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">grep</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"&lt;/body&gt;"</span>, hs)</span>
<span id="cb2-44">    hs <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> hs[(i <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>(j <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)]</span>
<span id="cb2-45">  }</span>
<span id="cb2-46">  hs <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(before, hs, after)</span>
<span id="cb2-47">  hs <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cat</span>(hs, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sep =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb2-48">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">invisible</span>(hs)</span>
<span id="cb2-49">}</span>
<span id="cb2-50"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">help_console</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ad_data"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">format =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"text"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">package =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"modeldata"</span>)</span></code></pre></div></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>_A_l_z_h_e_i_m_e_r'_s _d_i_s_e_a_s_e _d_a_t_a

_D_e_s_c_r_i_p_t_i_o_n:

     Alzheimer's disease data

_D_e_t_a_i_l_s:

     Craig-Schapiro et al. (2011) describe a clinical study of 333
     patients, including some with mild (but well-characterized)
     cognitive impairment as well as healthy individuals. CSF samples
     were taken from all subjects. The goal of the study was to
     determine if subjects in the early states of impairment could be
     differentiated from cognitively healthy individuals. Data
     collected on each subject included:

        • Demographic characteristics such as age and gender

        • Apolipoprotein E genotype

        • Protein measurements of Abeta, Tau, and a phosphorylated
          version of Tau (called pTau)

        • Protein measurements of 124 exploratory biomarkers, and

        • Clinical dementia scores

     For these analyses, we have converted the scores to two classes:
     impaired and healthy. The goal of this analysis is to create
     classification models using the demographic and assay data to
     predict which patients have early stages of disease.

_V_a_l_u_e:

 ad_data: a tibble

_S_o_u_r_c_e:

     Kuhn, M., Johnson, K. (2013) _Applied Predictive Modeling_,
     Springer.

     Craig-Schapiro R, Kuhn M, Xiong C, Pickering EH, Liu J, Misko TP,
     et al. (2011) Multiplexed Immunoassay Panel Identifies Novel CSF
     Biomarkers for Alzheimer's Disease Diagnosis and Prognosis. PLoS
     ONE 6(4): e18850.

_E_x_a_m_p_l_e_s:

     data(ad_data)
     str(ad_data)
     </code></pre>
</div>
</div>
<p>Summarizing data with <code>{skimr}</code> gives a quick feel for the data overall. Remove <code>|&gt; summary()</code> from the code chunk below to get per-variable statistics as well. I did not include that output here because there are so many variables.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">skim</span>(ad_data) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>()</span></code></pre></div></div>
<div class="cell-output-display">
<table class="caption-top table table-sm table-striped small">
<caption>Data summary</caption>
<tbody>
<tr class="odd">
<td style="text-align: left;">Name</td>
<td style="text-align: left;">ad_data</td>
</tr>
<tr class="even">
<td style="text-align: left;">Number of rows</td>
<td style="text-align: left;">333</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Number of columns</td>
<td style="text-align: left;">131</td>
</tr>
<tr class="even">
<td style="text-align: left;">_______________________</td>
<td style="text-align: left;"></td>
</tr>
<tr class="odd">
<td style="text-align: left;">Column type frequency:</td>
<td style="text-align: left;"></td>
</tr>
<tr class="even">
<td style="text-align: left;">factor</td>
<td style="text-align: left;">2</td>
</tr>
<tr class="odd">
<td style="text-align: left;">numeric</td>
<td style="text-align: left;">129</td>
</tr>
<tr class="even">
<td style="text-align: left;">________________________</td>
<td style="text-align: left;"></td>
</tr>
<tr class="odd">
<td style="text-align: left;">Group variables</td>
<td style="text-align: left;">None</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
<section id="data-splitting" class="level3">
<h3 class="anchored" data-anchor-id="data-splitting">Data Splitting</h3>
<p>We’ll use a deep neural net to predict the <code>Class</code> variable. But first, we want to split the data into a training and a testing set. We’ll use the <code>initial_split()</code> function from the <code>{rsample}</code> package to do this. The default split is 75% training and 25% testing, which is a good place to start. For the sampling, we’ll use the <code>Class</code> variable as the strata. This ensures that the training and testing sets have approximately the same proportion of each class.</p>
<div class="cell">
<div class="code-with-filename">
<div class="code-with-filename-file">
<pre><strong>Initial Split</strong></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">ad_split <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">initial_split</span>(ad_data, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">strata =</span> Class)</span>
<span id="cb5-2">ad_train <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">training</span>(ad_split)</span>
<span id="cb5-3">ad_test <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">testing</span>(ad_split)</span></code></pre></div></div>
</div>
</div>
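<p>A quick way to confirm that stratification did its job is to compare the class proportions in the two sets. The following is a minimal sketch rather than part of the original analysis: the seed value and the <code>class_props</code> name are assumptions for illustration, and the split is recreated here so the snippet runs on its own.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r">library(dplyr)
library(rsample)

data(ad_data, package = "modeldata")

# Assumption: any seed works; it only makes the split reproducible
set.seed(2023)
ad_split &lt;- initial_split(ad_data, strata = Class)
ad_train &lt;- training(ad_split)
ad_test  &lt;- testing(ad_split)

# Stack the two sets, label each row by its origin,
# then compute the share of each class within each set
class_props &lt;-
  bind_rows(train = ad_train, test = ad_test, .id = "set") |&gt;
  count(set, Class) |&gt;
  group_by(set) |&gt;
  mutate(prop = n / sum(n)) |&gt;
  ungroup()

class_props</code></pre></div>
</div>
<p>If the proportions in the <code>train</code> and <code>test</code> rows are close to each other, the stratified split worked as intended.</p>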
</section>
<section id="cross-validation" class="level3">
<h3 class="anchored" data-anchor-id="cross-validation">Cross Validation</h3>
<p>We’ll use v-fold cross validation to evaluate the model, here with three folds (<code>v = 3</code>). The <code>vfold_cv()</code> function from the <code>{rsample}</code> package creates the folds. We’ll use the <code>Class</code> variable as the strata again so that each fold has approximately the same proportion of each class.</p>
<div class="cell">
<div class="code-with-filename">
<div class="code-with-filename-file">
<pre><strong>Cross Validation</strong></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">ad_folds <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vfold_cv</span>(ad_train, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">v =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">strata =</span> Class)</span></code></pre></div></div>
</div>
</div>
</section>
</section>
<section id="build-the-model" class="level2">
<h2 class="anchored" data-anchor-id="build-the-model">Build the Model</h2>
<section id="data-preprocessing-with-recipes" class="level3">
<h3 class="anchored" data-anchor-id="data-preprocessing-with-recipes">Data Preprocessing with <code>{recipes}</code></h3>
<p>We’ll use the <code>{recipes}</code> package to preprocess the data. The recipe includes a few standard pre-processing steps, following the advice in the <a href="https://recipes.tidymodels.org/articles/Ordering.html#recommended-preprocessing-outline"><code>recipes</code> documentation for order of steps</a>:</p>
<ul>
<li><code>step_YeoJohnson()</code> to transform the numeric variables</li>
<li><code>step_normalize()</code> to normalize the numeric variables</li>
<li><code>step_dummy()</code> to create dummy variables for the categorical variables</li>
<li><code>step_nzv()</code> to remove near-zero variance variables</li>
</ul>
<div class="cell">
<div class="code-with-filename">
<div class="code-with-filename-file">
<pre><strong>Specify Recipe</strong></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1">ad_rec <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb7-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">recipe</span>(Class <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> ., <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> ad_train) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb7-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">step_YeoJohnson</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">all_numeric_predictors</span>()) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb7-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">step_normalize</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">all_numeric_predictors</span>()) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb7-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">step_dummy</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">all_nominal_predictors</span>()) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb7-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">step_nzv</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">all_numeric_predictors</span>())</span></code></pre></div></div>
</div>
</div>
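<p>To see what the recipe actually produces, it can be estimated with <code>prep()</code> and applied to the training data with <code>bake(new_data = NULL)</code>. This is a sketch for inspection only, not a step the tuning workflow requires. The seed and the <code>ad_baked</code> name are assumptions, and the split and recipe are recreated so the snippet is self-contained.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r">library(recipes)
library(rsample)

data(ad_data, package = "modeldata")

# Assumption: any seed works; it only makes the split reproducible
set.seed(2023)
ad_train &lt;- training(initial_split(ad_data, strata = Class))

# Same steps, in the same order, as the recipe above
ad_rec &lt;-
  recipe(Class ~ ., data = ad_train) |&gt;
  step_YeoJohnson(all_numeric_predictors()) |&gt;
  step_normalize(all_numeric_predictors()) |&gt;
  step_dummy(all_nominal_predictors()) |&gt;
  step_nzv(all_numeric_predictors())

# prep() estimates the transformations from the training data;
# bake(new_data = NULL) returns the processed training set
ad_baked &lt;- ad_rec |&gt; prep() |&gt; bake(new_data = NULL)

dim(ad_baked)</code></pre></div>
</div>
<p>Comparing <code>dim(ad_baked)</code> with <code>dim(ad_train)</code> shows how many columns the dummy-variable and near-zero variance steps added or removed.</p>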
</section>
<section id="model-specification" class="level3">
<h3 class="anchored" data-anchor-id="model-specification">Model Specification</h3>
<p>We will use two models to demonstrate hyperparameter tuning: logistic regression and a multilayer perceptron. Model specification is beyond the scope of this post, but you can read more about it in the <a href="https://www.tidymodels.org/learn/">tidymodels documentation</a> or in <a href="https://www.tmwr.org/">Tidy Modeling with R</a>. For now, we’ll just specify the models.</p>
<div class="cell">
<div class="code-with-filename">
<div class="code-with-filename-file">
<pre><strong>Specify Models</strong></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">logistic_reg_brulee_spec <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb8-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">logistic_reg</span>(</span>
<span id="cb8-3">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">penalty =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tune</span>()</span>
<span id="cb8-4">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set_engine</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"brulee"</span>)</span>
<span id="cb8-6"></span>
<span id="cb8-7">mlp_brulee_spec <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb8-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mlp</span>(</span>
<span id="cb8-9">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">hidden_units =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>,</span>
<span id="cb8-10">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">dropout      =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tune</span>(),</span>
<span id="cb8-11">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">epochs       =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tune</span>(),</span>
<span id="cb8-12">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">learn_rate   =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tune</span>(),</span>
<span id="cb8-13">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">activation   =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"elu"</span></span>
<span id="cb8-14">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-15">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set_engine</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"brulee"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-16">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set_mode</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"classification"</span>)</span></code></pre></div></div>
</div>
</div>
</section>
<section id="model-tuning" class="level3">
<h3 class="anchored" data-anchor-id="model-tuning">Model Tuning</h3>
<p>Model tuning is where Bayesian optimization comes into play. The <code>{tune}</code> package is quite handy for this. In particular, <code>tune::tune_bayes()</code> and <code>tune::control_bayes()</code> are the functions we’ll use. The <code>tune_bayes()</code> function takes a model specification, the recipe, and the cross validation folds, which carry the data. The <code>control_bayes()</code> function takes a few parameters that control the Bayesian optimization:</p>
<ul>
<li><code>no_improve</code> controls how many iterations of Bayesian optimization to run without improvement before stopping.</li>
<li><code>time_limit</code> controls how long to run Bayesian optimization, in minutes.</li>
<li><code>save_pred</code> controls whether to save the predictions from each iteration of Bayesian optimization. This is useful for ensembling.</li>
<li><code>save_workflow</code> controls whether the workflow should be appended to the results.</li>
<li><code>verbose</code> and <code>verbose_iter</code> control whether to print the results of each iteration of Bayesian optimization.</li>
<li><code>allow_par</code> and <code>parallel_over</code> control whether to run tuning in parallel. This only works for some engines, and I don’t think it works for <code>brulee</code> or <code>keras</code> yet.</li>
</ul>
<div class="cell">
<div class="code-with-filename">
<div class="code-with-filename-file">
<pre><strong>Tuning Control Settings</strong></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1">bayes_control <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb9-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">control_bayes</span>(</span>
<span id="cb9-3">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">no_improve    =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">30</span>L,</span>
<span id="cb9-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">time_limit    =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>,</span>
<span id="cb9-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">verbose_iter  =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>,</span>
<span id="cb9-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">save_pred     =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>,</span>
<span id="cb9-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">save_workflow =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span></span>
<span id="cb9-8">  )</span>
<span id="cb9-9"></span>
<span id="cb9-10">grid_control <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb9-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">control_grid</span>(</span>
<span id="cb9-12">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">allow_par     =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>,</span>
<span id="cb9-13">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">save_pred     =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>,</span>
<span id="cb9-14">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">save_workflow =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>,</span>
<span id="cb9-15">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">parallel_over =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span></span>
<span id="cb9-16">  )</span></code></pre></div></div>
</div>
</div>
<p>The basic intuition behind Bayesian optimization is that it uses a surrogate model to approximate how model performance depends on the hyperparameters. The surrogate model is a probabilistic model that is updated with each iteration of Bayesian optimization and is used to choose the next set of hyperparameters to evaluate. This process is repeated until the results stop improving for <code>no_improve</code> iterations or the time limit is reached. For <code>tune_bayes()</code>, the surrogate model is a Gaussian process model.</p>
<p>It’s a good idea to adjust the range of hyperparameter values before we start to fit our model, and the <code>{dials}</code> package can help.</p>
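<p>As a minimal sketch of what adjusting a range looks like, each <code>{dials}</code> parameter function accepts a <code>range</code> argument. The specific bounds and the <code>narrow_*</code> names below are illustrative assumptions, not recommendations. Note that <code>penalty()</code> takes its range on the log-10 scale. The default ranges are printed right after this example for comparison.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r">library(dials)

# Assumption: illustrative bounds only
# penalty() is specified on the log10 scale, so this spans 1e-5 to 1
narrow_penalty &lt;- penalty(range = c(-5, 0))

# epochs() is on the original scale
narrow_epochs &lt;- epochs(range = c(50L, 200L))

# Inspect the bounds that will be used
range_get(narrow_penalty)
range_get(narrow_epochs)</code></pre></div>
</div>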
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1">dials<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">penalty</span>()</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>Amount of Regularization (quantitative)
Transformer: log-10 [1e-100, Inf]
Range (transformed scale): [-10, 0]</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1">dials<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">activation</span>()</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>Activation Function  (qualitative)
5 possible value include:
'linear', 'softmax', 'relu', 'elu' and 'tanh' </code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1">dials<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">epochs</span>()</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># Epochs (quantitative)
Range: [10, 1000]</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1">dials<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dropout</span>()</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>Dropout Rate (quantitative)
Range: [0, 1)</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1">dials<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hidden_units</span>()</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># Hidden Units (quantitative)
Range: [1, 10]</code></pre>
</div>
</div>
<p>The default range for <code>epochs</code> is a bit large, so let’s shrink it to <code>c(10, 200)</code>. Let’s also narrow the range for dropout from <code>c(0, 1)</code> to <code>c(0.1, 0.9)</code>.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb20-1">mlp_brulee_params <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb20-2">  mlp_brulee_spec <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb20-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_parameter_set_dials</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb20-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">update</span>(</span>
<span id="cb20-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">epochs  =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">epochs</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span>)),</span>
<span id="cb20-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">dropout =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dropout</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.9</span>))</span>
<span id="cb20-7">  )</span></code></pre></div></div>
</div>
<p>We can also use the <code>grid_regular()</code> function to create a grid of hyperparameter values to serve as a starting point for Bayesian optimization.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb21-1">mlp_brulee_start <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb21-2">  mlp_brulee_params <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb21-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">grid_regular</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">levels =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)</span></code></pre></div></div>
</div>
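<p>To see what a regular grid looks like without the rest of the pipeline, here is a small standalone sketch that builds one directly from two {dials} parameter objects. It only covers <code>epochs</code> and <code>dropout</code> with the ranges set above, so it is smaller than the grid used in this post, and the name <code>small_grid</code> is just for illustration.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r">library(dials)

# Three evenly spaced levels per parameter, fully crossed
small_grid &lt;-
  grid_regular(
    epochs(c(10, 200)),
    dropout(c(0.1, 0.9)),
    levels = 3
  )

small_grid
nrow(small_grid) # 3 levels x 3 levels = 9 combinations</code></pre></div>
</div>
<p>Because the grid is fully crossed, its size grows quickly with each added hyperparameter, which is one reason to keep <code>levels</code> small when the grid is only a starting point.</p>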
</section>
</section>
<section id="model-workflow" class="level2">
<h2 class="anchored" data-anchor-id="model-workflow">Model Workflow</h2>
<p>The <code>{workflows}</code> package creates a workflow for each model. The workflow bundles the recipe and the model specification, and the cross-validation folds are supplied when we tune. We’ll use the <code>workflow()</code> function to create the workflow. The <code>tune_bayes()</code> function will then be used to tune the model with the splits and control parameters we created above.</p>
<section id="logistic-regression" class="level3">
<h3 class="anchored" data-anchor-id="logistic-regression">Logistic Regression</h3>
<p>We start with a logistic regression model. We only have one hyperparameter to tune, so we’ll use the <code>tune_grid()</code> function instead of <code>tune_bayes()</code>.</p>
<div class="cell">
<div class="code-with-filename">
<div class="code-with-filename-file">
<pre><strong>Create Logistic Regression Workflow</strong></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb22-1">logistic_reg_brulee_wf <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb22-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">workflow</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb22-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_recipe</span>(ad_rec) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb22-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_model</span>(logistic_reg_brulee_spec) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb22-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tune_grid</span>(</span>
<span id="cb22-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">resamples =</span> ad_folds,</span>
<span id="cb22-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">control   =</span> grid_control</span>
<span id="cb22-8">  )</span></code></pre></div></div>
</div>
</div>
<p>We can use <code>autoplot()</code> to visualize the results of tuning.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb23-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">autoplot</span>(logistic_reg_brulee_wf)</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://jameshwade.com/posts/2023-01-01_tidymodels-bo-for-torch_files/figure-html/unnamed-chunk-13-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>We can also use <code>collect_metrics()</code> to collect the results of tuning.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb24" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb24-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">collect_metrics</span>(logistic_reg_brulee_wf, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">summarize =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 20 × 7
    penalty .metric  .estimator  mean     n std_err .config              
      &lt;dbl&gt; &lt;chr&gt;    &lt;chr&gt;      &lt;dbl&gt; &lt;int&gt;   &lt;dbl&gt; &lt;chr&gt;                
 1 1.11e- 2 accuracy binary     0.831     3 0.0257  Preprocessor1_Model01
 2 1.11e- 2 roc_auc  binary     0.880     3 0.00871 Preprocessor1_Model01
 3 4.58e- 8 accuracy binary     0.835     3 0.00419 Preprocessor1_Model02
 4 4.58e- 8 roc_auc  binary     0.878     3 0.00766 Preprocessor1_Model02
 5 7.60e- 9 accuracy binary     0.843     3 0.0121  Preprocessor1_Model03
 6 7.60e- 9 roc_auc  binary     0.874     3 0.00235 Preprocessor1_Model03
 7 3.85e- 3 accuracy binary     0.855     3 0.0183  Preprocessor1_Model04
 8 3.85e- 3 roc_auc  binary     0.895     3 0.00220 Preprocessor1_Model04
 9 1.76e- 5 accuracy binary     0.839     3 0.0115  Preprocessor1_Model05
10 1.76e- 5 roc_auc  binary     0.891     3 0.00872 Preprocessor1_Model05
11 6.88e- 7 accuracy binary     0.827     3 0.0187  Preprocessor1_Model06
12 6.88e- 7 roc_auc  binary     0.877     3 0.0192  Preprocessor1_Model06
13 3.57e- 6 accuracy binary     0.868     3 0.0182  Preprocessor1_Model07
14 3.57e- 6 roc_auc  binary     0.877     3 0.00511 Preprocessor1_Model07
15 2.11e-10 accuracy binary     0.831     3 0.0130  Preprocessor1_Model08
16 2.11e-10 roc_auc  binary     0.880     3 0.00217 Preprocessor1_Model08
17 8.03e- 1 accuracy binary     0.827     3 0.00706 Preprocessor1_Model09
18 8.03e- 1 roc_auc  binary     0.877     3 0.0115  Preprocessor1_Model09
19 9.42e- 4 accuracy binary     0.823     3 0.0250  Preprocessor1_Model10
20 9.42e- 4 roc_auc  binary     0.882     3 0.00516 Preprocessor1_Model10</code></pre>
</div>
</div>
<p>From the tuning results we can select the best model with the <code>tune::select_best()</code> function and the <code>roc_auc</code> metric. This metric measures the ability of the model to distinguish between two classes. It is the area under the curve obtained by plotting the true positive rate against the false positive rate.</p>
<p>Once we’ve identified the best model, we can extract the workflow from the tuning results using the <code>extract_workflow</code> function. We then use the <code>finalize_workflow</code> function to set the tuned hyperparameters to their best values, and the <code>last_fit</code> function to fit the model on the training portion of <code>ad_split</code> and evaluate it on the test portion.</p>
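<p>As a rough illustration of what the area under the ROC curve measures, the base R sketch below computes it for a tiny made-up example using the rank-based formula, which is equivalent to the share of (event, non-event) pairs in which the event receives the higher score. The <code>truth</code> and <code>score</code> vectors are invented, and in the post itself the metric is computed by the tidymodels functions rather than by hand.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r"># Made-up labels (1 = event) and predicted probabilities of the event
truth &lt;- c(1, 1, 1, 0, 0, 0)
score &lt;- c(0.9, 0.8, 0.4, 0.6, 0.3, 0.2)

n_pos &lt;- sum(truth == 1)
n_neg &lt;- sum(truth == 0)

# Rank-based AUC: sum the ranks of the events, subtract the smallest
# possible rank sum, and divide by the number of event/non-event pairs
ranks &lt;- rank(score)
auc &lt;- (sum(ranks[truth == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
auc</code></pre></div>
</div>
<p>Here the events outrank the non-events in 8 of the 9 possible pairs, so the value is 8/9. A model that ranks no better than chance would land near 0.5.</p>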
<p>We use the <code>collect_metrics</code> function to gather metrics for the best model. This is an important step, as it allows us to evaluate the performance of the model and determine whether it is accurate and reliable.</p>
<p>Finally, we use the <code>collect_predictions</code> function to generate predictions on the test set, and use these predictions to create an ROC curve.</p>
<div class="cell">
<div class="code-with-filename">
<div class="code-with-filename-file">
<pre><strong>Select Best Logistic Regression Model and Evaluate</strong></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb26" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb26-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># select the best model from the workflow</span></span>
<span id="cb26-2">best_logistic_reg_id <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb26-3">  logistic_reg_brulee_wf <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb26-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select_best</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">metric =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"roc_auc"</span>)</span>
<span id="cb26-5"></span>
<span id="cb26-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># extract the best model from the workflow</span></span>
<span id="cb26-7">best_logistic_reg <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb26-8">  logistic_reg_brulee_wf <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb26-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_workflow</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb26-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">finalize_workflow</span>(best_logistic_reg_id) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb26-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">last_fit</span>(ad_split)</span>
<span id="cb26-12"></span>
<span id="cb26-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># collect the metrics for the best model</span></span>
<span id="cb26-14">best_logistic_reg <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb26-15">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">collect_metrics</span>()</span></code></pre></div></div>
</div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 2 × 4
  .metric  .estimator .estimate .config             
  &lt;chr&gt;    &lt;chr&gt;          &lt;dbl&gt; &lt;chr&gt;               
1 accuracy binary         0.833 Preprocessor1_Model1
2 roc_auc  binary         0.865 Preprocessor1_Model1</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb28" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb28-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># plot results of test set fit</span></span>
<span id="cb28-2">best_logistic_reg <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb28-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">collect_predictions</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb28-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">roc_curve</span>(Class, .pred_Impaired) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb28-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">autoplot</span>()</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://jameshwade.com/posts/2023-01-01_tidymodels-bo-for-torch_files/figure-html/unnamed-chunk-15-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
</section>
<section id="multilayer-perceptron" class="level3">
<h3 class="anchored" data-anchor-id="multilayer-perceptron">Multilayer Perceptron</h3>
<p>We start by fitting the workflow with the grid of hyperparameter values we created above. This will give us a starting point for Bayesian optimization.</p>
<div class="cell">
<div class="code-with-filename">
<div class="code-with-filename-file">
<pre><strong>Create MLP Workflow and Perform Grid Tuning</strong></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb29-1">mlp_brulee_wf <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb29-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">workflow</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb29-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_recipe</span>(ad_rec) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb29-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_model</span>(mlp_brulee_spec)</span>
<span id="cb29-5"></span>
<span id="cb29-6">mlp_brulee_tune_grid <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb29-7">  mlp_brulee_wf <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb29-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tune_grid</span>(</span>
<span id="cb29-9">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">resamples =</span> ad_folds,</span>
<span id="cb29-10">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">grid      =</span> mlp_brulee_start,</span>
<span id="cb29-11">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">control   =</span> grid_control</span>
<span id="cb29-12">  )</span></code></pre></div></div>
</div>
</div>
<p>As above, <code>autoplot()</code> is a quick way to visualize results from our initial grid tuning.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb30" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb30-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">autoplot</span>(mlp_brulee_tune_grid)</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://jameshwade.com/posts/2023-01-01_tidymodels-bo-for-torch_files/figure-html/unnamed-chunk-17-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>We can also repeat the best model selection and evaluation, as we did for logistic regression. Our expectation should be that Bayesian optimization results in better predictions than a simple <code>tune_grid()</code> approach.</p>
<div class="cell">
<div class="code-with-filename">
<div class="code-with-filename-file">
<pre><strong>Visualize and Evaluate Initial Tuning</strong></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb31" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb31-1">best_mlp_id_no_bayes <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb31-2">  mlp_brulee_tune_grid <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb31-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select_best</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">metric =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"roc_auc"</span>)</span>
<span id="cb31-4"></span>
<span id="cb31-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># extract the best model from the workflow</span></span>
<span id="cb31-6">best_mlp_no_bayes <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb31-7">  mlp_brulee_tune_grid <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb31-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_workflow</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb31-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">finalize_workflow</span>(best_mlp_id_no_bayes) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb31-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">last_fit</span>(ad_split)</span>
<span id="cb31-11"></span>
<span id="cb31-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># collect the metrics for the best model</span></span>
<span id="cb31-13">best_mlp_no_bayes <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb31-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">collect_metrics</span>()</span></code></pre></div></div>
</div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 2 × 4
  .metric  .estimator .estimate .config             
  &lt;chr&gt;    &lt;chr&gt;          &lt;dbl&gt; &lt;chr&gt;               
1 accuracy binary         0.833 Preprocessor1_Model1
2 roc_auc  binary         0.852 Preprocessor1_Model1</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb33" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb33-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># plot results of test set fit</span></span>
<span id="cb33-2">best_mlp_no_bayes <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb33-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">collect_predictions</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb33-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">roc_curve</span>(Class, .pred_Impaired) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb33-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">autoplot</span>()</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://jameshwade.com/posts/2023-01-01_tidymodels-bo-for-torch_files/figure-html/unnamed-chunk-18-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>We now take the initial MLP tune results, which cover the initial grid of hyperparameter values, and pass them to <code>tune_bayes()</code> along with the folds we created above and the <code>bayes_control</code> settings we created earlier.</p>
<div class="cell">
<div class="code-with-filename">
<div class="code-with-filename-file">
<pre><strong>Bayesian Optimization of MLP</strong></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb34" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb34-1">mlp_brulee_bo <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb34-2">  mlp_brulee_wf <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb34-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tune_bayes</span>(</span>
<span id="cb34-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">resamples =</span> ad_folds,</span>
<span id="cb34-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">iter      =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>L,</span>
<span id="cb34-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">control   =</span> bayes_control,</span>
<span id="cb34-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">initial   =</span> mlp_brulee_tune_grid</span>
<span id="cb34-8">  )</span></code></pre></div></div>
</div>
<div class="cell-output cell-output-stderr">
<pre><code>Optimizing roc_auc using the expected improvement</code></pre>
</div>
<div class="cell-output cell-output-stderr">
<pre><code>! No improvement for 30 iterations; returning current results.</code></pre>
</div>
</div>
<p>Yet again, we can use <code>autoplot()</code> to visualize the results of tuning. From this, it appears that learning rate has a big impact on model performance, but number of epochs and dropout rate are less important. Digging into why is beyond the scope of this post, but it’s important to recognize that not all hyperparameters are created equal.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb37" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb37-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">autoplot</span>(mlp_brulee_bo)</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://jameshwade.com/posts/2023-01-01_tidymodels-bo-for-torch_files/figure-html/unnamed-chunk-20-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>And yet again, we can use <code>collect_metrics()</code> to collect the results of tuning.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb38" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb38-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">collect_metrics</span>(mlp_brulee_bo, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">summarize =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 142 × 10
   dropout epochs learn_rate .metric  .estim…¹  mean     n std_err .config .iter
     &lt;dbl&gt;  &lt;int&gt;      &lt;dbl&gt; &lt;chr&gt;    &lt;chr&gt;    &lt;dbl&gt; &lt;int&gt;   &lt;dbl&gt; &lt;chr&gt;   &lt;int&gt;
 1     0.1     10      0.001 accuracy binary   0.426     3  0.0897 Prepro…     0
 2     0.1     10      0.001 roc_auc  binary   0.491     3  0.0212 Prepro…     0
 3     0.5     10      0.001 accuracy binary   0.563     3  0.111  Prepro…     0
 4     0.5     10      0.001 roc_auc  binary   0.542     3  0.0360 Prepro…     0
 5     0.9     10      0.001 accuracy binary   0.466     3  0.0362 Prepro…     0
 6     0.9     10      0.001 roc_auc  binary   0.554     3  0.0576 Prepro…     0
 7     0.1    105      0.001 accuracy binary   0.357     3  0.0274 Prepro…     0
 8     0.1    105      0.001 roc_auc  binary   0.459     3  0.0476 Prepro…     0
 9     0.5    105      0.001 accuracy binary   0.610     3  0.0619 Prepro…     0
10     0.5    105      0.001 roc_auc  binary   0.550     3  0.0707 Prepro…     0
# … with 132 more rows, and abbreviated variable name ¹​.estimator</code></pre>
</div>
</div>
<p>For our moment of truth, we can select the best model and evaluate it on the test set.</p>
<div class="cell">
<div class="code-with-filename">
<div class="code-with-filename-file">
<pre><strong>Finalize Model and Evaluate</strong></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb40" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb40-1">best_mlp_id <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb40-2">  mlp_brulee_bo <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb40-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select_best</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">metric =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"roc_auc"</span>)</span>
<span id="cb40-4"></span>
<span id="cb40-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># extract the best model from the workflow</span></span>
<span id="cb40-6">best_mlp <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb40-7">  mlp_brulee_bo <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb40-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_workflow</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb40-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">finalize_workflow</span>(best_mlp_id) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb40-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">last_fit</span>(ad_split)</span>
<span id="cb40-11"></span>
<span id="cb40-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># collect the metrics for the best model</span></span>
<span id="cb40-13">best_mlp <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb40-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">collect_metrics</span>()</span></code></pre></div></div>
</div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 2 × 4
  .metric  .estimator .estimate .config             
  &lt;chr&gt;    &lt;chr&gt;          &lt;dbl&gt; &lt;chr&gt;               
1 accuracy binary         0.893 Preprocessor1_Model1
2 roc_auc  binary         0.890 Preprocessor1_Model1</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb42" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb42-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># plot results of test set fit</span></span>
<span id="cb42-2">best_mlp <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb42-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">collect_predictions</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb42-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">roc_curve</span>(Class, .pred_Impaired) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb42-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">autoplot</span>()</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://jameshwade.com/posts/2023-01-01_tidymodels-bo-for-torch_files/figure-html/unnamed-chunk-22-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>Based on these results, Bayesian optimization added little value in this case. Nonetheless, it is a handy tool to have in your toolbox, and I hope you find this example useful.</p>


</section>
</section>

 ]]></description>
  <category>machine learning</category>
  <category>modeling</category>
  <category>tune</category>
  <category>deep learning</category>
  <category>torch</category>
  <category>R</category>
  <guid>https://jameshwade.com/posts/2023-01-01_tidymodels-bo-for-torch.html</guid>
  <pubDate>Sun, 01 Jan 2023 00:00:00 GMT</pubDate>
  <media:content url="https://jameshwade.com/posts/images/brulee.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>MLOps: The Whole Game</title>
  <dc:creator>James H Wade</dc:creator>
  <link>https://jameshwade.com/posts/2022-12-27_mlops-the-whole-game.html</link>
  <description><![CDATA[ 





<blockquote class="blockquote">
<p>If we actually know what we’re doing we call it engineering. So, this basket of pain, we call something ops.</p>
<p><cite>Peter Wang, <a href="https://www.anaconda.com/podcast/human-in-the-loop">Numerically Speaking Podcast</a> with Vicky Boykis on ML &amp; MLOps</cite></p>
</blockquote>
<p>MLOps can be overwhelming, but that is not the way it has to be. Posit’s MLOps team has made some fantastic advancements over the past year or so, and I hope to demonstrate how easy model deployment can be using Posit’s open source tools for MLOps. This includes <code>{pins}</code>, <code>{vetiver}</code>, and the <code>{tidymodels}</code> bundle of packages along with the <code>{tidyverse}</code>. The motivation for this post came in part from the <a href="https://www.anaconda.com/podcast/human-in-the-loop">Numerically Speaking podcast</a> episode I quote above, and much of the model building is taken from Julia Silge’s blog post written to help R users <a href="https://juliasilge.com/blog/palmer-penguins/">get started with tidymodels</a>. I also found inspiration from the <a href="https://vetiver.rstudio.com/"><code>{vetiver}</code> documentation page</a> and the recently revamped <a href="https://solutions.posit.co/gallery/bike_predict/">Solutions Engineering Page from Posit</a>.</p>
<p>The post covers most of the steps in the MLOps process, but it’s more of a sampler than exhaustive coverage. Think of this as the <a href="https://r-pkgs.org/whole-game.html">“whole game”</a> of MLOps with R.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="https://vetiver.rstudio.com/"><img src="https://jameshwade.com/posts/images/vetiver-mlops.png" class="img-fluid quarto-figure quarto-figure-center figure-img" alt="During the MLOps cycle, we collect data, understand and clean the data, train and evaluate a model, deploy the model, and monitor the deployed model. Monitoring can then lead back to collecting more data. There are many great tools available to understand clean data (like pandas and the tidyverse) and to build models (like tidymodels and scikit-learn). Use the vetiver framework to deploy and monitor your models." width="700"></a></p>
</figure>
</div>
<figcaption>Source: MLOps Team at Posit | An overview of MLOps with Vetiver and friends</figcaption>
</figure>
</div>
<section id="model-building" class="level2">
<h2 class="anchored" data-anchor-id="model-building">Model Building</h2>
<section id="load-packages-and-set-options" class="level3">
<h3 class="anchored" data-anchor-id="load-packages-and-set-options">Load Packages and Set Options</h3>
<p>Let’s start with the packages that we’ll use throughout. <code>{tidyverse}</code> and <code>{tidymodels}</code> are there, of course. <code>{pins}</code>, <code>{plumber}</code>, and <code>{vetiver}</code> complete the rest of the Posit set for MLOps, and I use <code>{gt}</code> for tables.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidymodels)</span>
<span id="cb1-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(pins)</span>
<span id="cb1-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(vetiver)</span>
<span id="cb1-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(plumber)</span>
<span id="cb1-6"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(palmerpenguins)</span>
<span id="cb1-7"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(gt)</span>
<span id="cb1-8"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(conflicted)</span>
<span id="cb1-9"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tidymodels_prefer</span>()</span>
<span id="cb1-10"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">conflict_prefer</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"penguins"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"palmerpenguins"</span>)</span>
<span id="cb1-11"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_set</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_bw</span>())</span>
<span id="cb1-12"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">options</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">tidymodels.dark =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span></code></pre></div></div>
</div>
</section>
<section id="data-exploration" class="level3">
<h3 class="anchored" data-anchor-id="data-exploration">Data Exploration</h3>
<p>For this example, we’ll use the <code>palmerpenguins</code> dataset to demonstrate the overall approach. There is a <code>{palmerpenguins}</code> package that contains this data set, and it is also included in the <code>{modeldata}</code> package, a part of <code>{tidymodels}</code>. We’ll use the data from <code>{palmerpenguins}</code>.</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1">penguins <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">head</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glimpse</span>()</span></code></pre></div></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>Rows: 4
Columns: 8
$ species           &lt;fct&gt; Adelie, Adelie, Adelie, Adelie
$ island            &lt;fct&gt; Torgersen, Torgersen, Torgersen, Torgersen
$ bill_length_mm    &lt;dbl&gt; 39.1, 39.5, 40.3, NA
$ bill_depth_mm     &lt;dbl&gt; 18.7, 17.4, 18.0, NA
$ flipper_length_mm &lt;int&gt; 181, 186, 195, NA
$ body_mass_g       &lt;int&gt; 3750, 3800, 3250, NA
$ sex               &lt;fct&gt; male, female, female, NA
$ year              &lt;int&gt; 2007, 2007, 2007, 2007</code></pre>
</div>
</div>
<p>As Julia’s <a href="https://juliasilge.com/blog/palmer-penguins/">post points out</a>, differentiating the species with a classification model is quite easy. A trickier model is one that predicts the penguin sex. Let’s look at a plot of <code>flipper_length_mm</code> versus <code>bill_length_mm</code> for each of the species. The color indicates <code>sex</code> and the point size indicates <code>body_mass_g</code>.</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">penguins <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb4-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.na</span>(sex)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb4-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb4-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> flipper_length_mm,</span>
<span id="cb4-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> bill_length_mm,</span>
<span id="cb4-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> sex,</span>
<span id="cb4-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> body_mass_g</span>
<span id="cb4-8">  )) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>species)</span></code></pre></div></div>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://jameshwade.com/posts/2022-12-27_mlops-the-whole-game_files/figure-html/unnamed-chunk-2-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
</section>
<section id="model-splitting-bootstrapping" class="level3">
<h3 class="anchored" data-anchor-id="model-splitting-bootstrapping">Model Splitting &amp; Bootstrapping</h3>
<p>Let’s do a little data cleaning before we move on to modeling. This will include removing any rows with missing <code>sex</code> assignments and removing the <code>year</code> and <code>island</code> columns.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">penguins_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb5-2">  penguins <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">drop_na</span>(sex) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>year, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>island)</span></code></pre></div></div>
</div>
<p>The <code>{tidymodels}</code> ecosystem has convenience functions for data splitting that help us do the “right” thing during model building. The default split between training and testing sets is 75:25, and <code>strata = sex</code> keeps the proportion of each sex similar in both sets.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1234</span>)</span>
<span id="cb6-2">penguin_split <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">initial_split</span>(penguins_df, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">strata =</span> sex)</span>
<span id="cb6-3">penguin_train <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">training</span>(penguin_split)</span>
<span id="cb6-4">penguin_test <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">testing</span>(penguin_split)</span></code></pre></div></div>
</div>
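<p>As a minimal sketch of what the split does, the snippet below makes the default proportion explicit with <code>prop = 0.75</code> and checks how the rows are distributed. The names <code>penguins_complete</code> and <code>split_example</code> are illustrative and are not used elsewhere in this post.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(rsample)

# Illustrative copy of the data with missing sex values removed
penguins_complete &lt;- palmerpenguins::penguins
penguins_complete &lt;- penguins_complete[!is.na(penguins_complete$sex), ]

set.seed(1234)
# prop = 0.75 is the default, written out here for clarity
split_example &lt;- initial_split(penguins_complete, prop = 0.75, strata = sex)

# share of rows that ended up in the training set
train_share &lt;- nrow(training(split_example)) / nrow(penguins_complete)
train_share

# stratification keeps the sex balance similar in both sets
table(training(split_example)$sex)
table(testing(split_example)$sex)</code></pre></div>
</div>
<p>Changing <code>prop</code> is the usual way to trade training data for a larger test set.</p>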
</section>
<section id="preprocess-with-recipes" class="level3">
<h3 class="anchored" data-anchor-id="preprocess-with-recipes">Preprocess with <code>{recipes}</code></h3>
<p>For preprocessing of the data, let’s use <code>{recipes}</code>.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1">penguin_rec <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb7-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">recipe</span>(sex <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> ., <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> penguin_train) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb7-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">step_YeoJohnson</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">all_numeric_predictors</span>()) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb7-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">step_normalize</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">all_numeric_predictors</span>()) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb7-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">step_dummy</span>(species)</span></code></pre></div></div>
</div>
<p>The <code>penguin_rec</code> recipe is a process for preparing data for modeling. It consists of four steps:</p>
<ol type="1">
<li><p>The <code>recipe()</code> function creates a recipe object, which is a sequence of steps for preprocessing data. The first argument to the function is a formula that specifies the outcome variable (<code>sex</code>) and the predictor variables (<code>.</code>, which stands for all other variables in the data). The <code>data</code> argument specifies the data frame to use for the recipe.</p></li>
<li><p>The <code>step_YeoJohnson()</code> function applies a Yeo-Johnson transformation to all numeric predictors in the data. This transformation is a type of power transformation that can help normalize data by making it more symmetric and reducing the influence of outliers.</p></li>
<li><p>The <code>step_normalize()</code> function normalizes all numeric predictors in the data. Normalization centers and scales each predictor so that it has a mean of 0 and a standard deviation of 1.</p></li>
<li><p>The <code>step_dummy()</code> function creates dummy variables for the <code>species</code> variable. Dummy variables are binary variables that are used to represent categorical variables in a regression model.</p></li>
</ol>
<p>Overall, this recipe applies several preprocessing steps to the data in order to prepare it for modeling. The transformed and normalized data, along with the dummy variables, can then be used to build a predictive model.</p>
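<p>A recipe on its own only describes the steps. To see their effect, one option is to estimate the recipe with <code>prep()</code> and apply it with <code>bake()</code>. The sketch below rebuilds the cleaned data and the same steps under illustrative names (<code>penguins_example</code>, <code>example_rec</code>, and so on) so that it runs on its own. For simplicity it uses all complete rows rather than the training split.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(recipes)

# Illustrative rebuild of the cleaned data (names are hypothetical)
penguins_example &lt;- palmerpenguins::penguins |&gt;
  tidyr::drop_na(sex) |&gt;
  dplyr::select(-year, -island)

example_rec &lt;-
  recipe(sex ~ ., data = penguins_example) |&gt;
  step_YeoJohnson(all_numeric_predictors()) |&gt;
  step_normalize(all_numeric_predictors()) |&gt;
  step_dummy(species)

# prep() estimates the transformation parameters from the data
example_prepped &lt;- prep(example_rec)

# bake() with new_data = NULL returns the processed version of that data
example_baked &lt;- bake(example_prepped, new_data = NULL)

# the species factor is replaced by dummy columns
names(example_baked)

# numeric predictors are centered and scaled
mean(example_baked$bill_length_mm)
sd(example_baked$bill_length_mm)</code></pre></div>
</div>
<p>In a workflow, <code>prep()</code> and <code>bake()</code> are called for you, so this is mainly useful for inspecting what the model will see.</p>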
</section>
<section id="specify-the-model" class="level3">
<h3 class="anchored" data-anchor-id="specify-the-model">Specify the Model</h3>
<p>We’ll evaluate three modeling approaches. In the code below, <code>glm_spec</code>, <code>tree_spec</code>, and <code>mlp_brulee_spec</code> are specifications for three different machine learning models: a logistic regression model, a random forest model, and a multi-layer perceptron (MLP) model. The intent with model selection was to demonstrate the use of very different models rather than pick an ideal set of models to screen.</p>
<p>The <code>logistic_reg()</code> function creates a specification for a logistic regression model, and the <code>set_engine('glm')</code> function sets the engine for the model to be <code>'glm'</code>, which stands for generalized linear model.</p>
<p>The <code>rand_forest()</code> function creates a specification for a random forest model, and the <code>set_engine('ranger')</code> function sets the engine for the model to be <code>'ranger'</code>, which is an implementation of random forests using the <code>{ranger}</code> package. The <code>set_mode('classification')</code> function sets the mode of the model to be classification. <code>set_mode()</code> is not needed for logistic regression as that model is only used for classification. (Yes, the name is a bad one for what it does.)</p>
<p>The <code>mlp()</code> function creates a specification for an MLP model, and the <code>set_engine('brulee')</code> function sets the engine for the model to be <code>'brulee'</code>, which uses <code>{torch}</code> to specify and fit the neural network. The <code>tune()</code> function indicates that the hyperparameters for the model (<code>hidden_units</code>, <code>epochs</code>, <code>penalty</code>, and <code>learn_rate</code>) should be tuned.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">glm_spec <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb8-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">logistic_reg</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set_engine</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"glm"</span>)</span>
<span id="cb8-4"></span>
<span id="cb8-5">tree_spec <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb8-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rand_forest</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">min_n =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tune</span>()) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set_engine</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ranger"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set_mode</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"classification"</span>)</span>
<span id="cb8-9"></span>
<span id="cb8-10">mlp_brulee_spec <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb8-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mlp</span>(</span>
<span id="cb8-12">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">hidden_units =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tune</span>(), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">epochs =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tune</span>(),</span>
<span id="cb8-13">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">penalty =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tune</span>(), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">learn_rate =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tune</span>()</span>
<span id="cb8-14">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb8-15">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set_engine</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"brulee"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb8-16">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set_mode</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"classification"</span>)</span></code></pre></div></div>
</div>
</section>
<section id="create-the-workflow-set-and-fit-models" class="level3">
<h3 class="anchored" data-anchor-id="create-the-workflow-set-and-fit-models">Create the Workflow Set and Fit Models</h3>
<p>Before we fit the models specified above, let’s use cross-validation for more robust model evaluation and set the parameters for hyperparameter tuning.</p>
<p>The <code>set.seed()</code> function sets the seed for the random number generator, which helps make the code reproducible.</p>
<p>The <code>vfold_cv()</code> function creates a v-fold cross-validation object, which is used to evaluate the performance of a model on different subsets of the data. The <code>penguin_folds</code> object stores the folds that will be used for cross-validation.</p>
<p>The <code>control_bayes()</code> function creates an object to store the settings for Bayesian optimization. Bayesian optimization is a method for finding the optimal set of hyperparameters for a machine learning model. The <code>no_improve</code> argument specifies the number of consecutive iterations with no improvement in the objective function before the optimization process is terminated. The <code>time_limit</code> argument specifies the maximum amount of time, in minutes, that the optimization process can run. The <code>save_pred</code> argument specifies whether to save the predictions made during the optimization process.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1234</span>)</span>
<span id="cb9-2">penguin_folds <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vfold_cv</span>(penguin_train)</span>
<span id="cb9-3"></span>
<span id="cb9-4">bayes_control <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb9-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">control_bayes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">no_improve =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>L, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">time_limit =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">save_pred =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span></code></pre></div></div>
</div>
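<p>To make the idea of a fold concrete, here is a small sketch that uses the built-in <code>mtcars</code> data rather than the penguins. Each split holds an analysis set for fitting and an assessment set for evaluation. The object names and the choice of <code>v = 4</code> are illustrative only.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(rsample)

set.seed(1234)
# four folds over the 32 rows of mtcars (illustrative data, not the penguins)
folds_example &lt;- vfold_cv(mtcars, v = 4)

# each element of the splits column is one analysis/assessment partition
first_split &lt;- folds_example$splits[[1]]

n_analysis &lt;- nrow(analysis(first_split))     # rows used to fit the model
n_assessment &lt;- nrow(assessment(first_split)) # rows held out for evaluation

c(analysis = n_analysis, assessment = n_assessment)</code></pre></div>
</div>
<p>Every row lands in the assessment set of exactly one fold, so each observation is used for evaluation once.</p>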
<p>A workflow set combines the recipes and models that we want to fit to our training data. The <code>{workflowsets}</code> package is an extension of the <code>{workflows}</code> package that allows us to evaluate multiple preprocessing and modeling approaches all together. The <code>workflow_set()</code> function creates a workflow set object, which consists of a list of preprocessing recipes in the <code>preproc</code> argument and a list of modeling specifications in the <code>models</code> argument.</p>
<p>The <code>workflow_map()</code> function applies a function to each element of the workflow set. In this case, we use the <code>tune_bayes</code> function, which performs Bayesian optimization using the <code>{tune}</code> package. The <code>iter</code> argument specifies the maximum number of iterations for each model, the <code>resamples</code> argument specifies the cross-validation folds to use, and the <code>control</code> argument specifies the settings for Bayesian optimization that we defined above.</p>
<p>Overall, this code creates a workflow set consisting of three models (a logistic regression model, a random forest model, and an MLP model) with preprocessing steps applied to the data, and then performs Bayesian optimization to tune the hyperparameters of the models using cross-validation.<sup>1</sup></p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1">workflow_set <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb10-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">workflow_set</span>(</span>
<span id="cb10-3">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">preproc =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(penguin_rec),</span>
<span id="cb10-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">models =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb10-5">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">glm =</span> glm_spec,</span>
<span id="cb10-6">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">tree =</span> tree_spec,</span>
<span id="cb10-7">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">torch =</span> mlp_brulee_spec</span>
<span id="cb10-8">    )</span>
<span id="cb10-9">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb10-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">workflow_map</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"tune_bayes"</span>,</span>
<span id="cb10-11">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">iter =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>L,</span>
<span id="cb10-12">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">resamples =</span> penguin_folds,</span>
<span id="cb10-13">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">control =</span> bayes_control</span>
<span id="cb10-14">  )</span></code></pre></div></div>
</div>
<p>We can now use <code>rank_results()</code> to rank the models in the workflow set by our chosen metric, the area under the receiver operating characteristic curve (ROC AUC). ROC AUC is a measure of a model’s ability to distinguish between positive and negative classes. A higher ROC AUC indicates a better-performing model, with a maximum value of 1. Using the rank table, we can select the workflow ID for the best performing model.</p>
<p>Throughout many tidymodels packages, <code>autoplot()</code> is a handy method to rapidly visualize steps in a model workflow. These methods are specified by the package authors, and some <code>autoplot()</code> methods have options to customize the output. They return <code>ggplot</code> objects, so customizing their appearance is easy.</p>
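<p>For intuition, ROC AUC equals the fraction of positive-negative pairs in which the positive case receives the higher predicted score, with ties counted as one half. The base R sketch below computes it that way for a handful of made-up labels and scores. These numbers are purely illustrative and do not come from the penguin models.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># made-up labels (1 = positive class) and predicted scores
truth &lt;- c(1, 1, 1, 0, 0, 0)
score &lt;- c(0.9, 0.8, 0.4, 0.6, 0.3, 0.2)

pos &lt;- score[truth == 1]
neg &lt;- score[truth == 0]

# compare every positive score with every negative score;
# a correctly ordered pair counts 1, a tie counts 0.5
pair_wins &lt;- outer(pos, neg, FUN = function(p, n) (p &gt; n) + 0.5 * (p == n))

# 8 of the 9 pairs are ordered correctly, so the AUC is 8/9
auc_example &lt;- mean(pair_wins)
auc_example</code></pre></div>
</div>
<p>A model that ranks every positive above every negative scores 1, while random guessing lands near 0.5.</p>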
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rank_results</span>(workflow_set,</span>
<span id="cb11-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">rank_metric =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"roc_auc"</span>,</span>
<span id="cb11-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">select_best =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span></span>
<span id="cb11-4">) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb11-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gt</span>()</span></code></pre></div></div>
<div class="cell-output-display">
<div id="ofujbkyrlk" style="padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;">
<style>html {
  font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Helvetica Neue', 'Fira Sans', 'Droid Sans', Arial, sans-serif;
}

#ofujbkyrlk .gt_table {
  display: table;
  border-collapse: collapse;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 16px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: auto;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

#ofujbkyrlk .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#ofujbkyrlk .gt_caption {
  padding-top: 4px;
  padding-bottom: 4px;
}

#ofujbkyrlk .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

#ofujbkyrlk .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 0;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

#ofujbkyrlk .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#ofujbkyrlk .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#ofujbkyrlk .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

#ofujbkyrlk .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

#ofujbkyrlk .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

#ofujbkyrlk .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

#ofujbkyrlk .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 5px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

#ofujbkyrlk .gt_group_heading {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  text-align: left;
}

#ofujbkyrlk .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

#ofujbkyrlk .gt_from_md > :first-child {
  margin-top: 0;
}

#ofujbkyrlk .gt_from_md > :last-child {
  margin-bottom: 0;
}

#ofujbkyrlk .gt_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

#ofujbkyrlk .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
}

#ofujbkyrlk .gt_stub_row_group {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
  vertical-align: top;
}

#ofujbkyrlk .gt_row_group_first td {
  border-top-width: 2px;
}

#ofujbkyrlk .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#ofujbkyrlk .gt_first_summary_row {
  border-top-style: solid;
  border-top-color: #D3D3D3;
}

#ofujbkyrlk .gt_first_summary_row.thick {
  border-top-width: 2px;
}

#ofujbkyrlk .gt_last_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#ofujbkyrlk .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#ofujbkyrlk .gt_first_grand_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

#ofujbkyrlk .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

#ofujbkyrlk .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#ofujbkyrlk .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#ofujbkyrlk .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding-left: 4px;
  padding-right: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#ofujbkyrlk .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#ofujbkyrlk .gt_sourcenote {
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#ofujbkyrlk .gt_left {
  text-align: left;
}

#ofujbkyrlk .gt_center {
  text-align: center;
}

#ofujbkyrlk .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

#ofujbkyrlk .gt_font_normal {
  font-weight: normal;
}

#ofujbkyrlk .gt_font_bold {
  font-weight: bold;
}

#ofujbkyrlk .gt_font_italic {
  font-style: italic;
}

#ofujbkyrlk .gt_super {
  font-size: 65%;
}

#ofujbkyrlk .gt_footnote_marks {
  font-style: italic;
  font-weight: normal;
  font-size: 75%;
  vertical-align: 0.4em;
}

#ofujbkyrlk .gt_asterisk {
  font-size: 100%;
  vertical-align: 0;
}

#ofujbkyrlk .gt_indent_1 {
  text-indent: 5px;
}

#ofujbkyrlk .gt_indent_2 {
  text-indent: 10px;
}

#ofujbkyrlk .gt_indent_3 {
  text-indent: 15px;
}

#ofujbkyrlk .gt_indent_4 {
  text-indent: 20px;
}

#ofujbkyrlk .gt_indent_5 {
  text-indent: 25px;
}
</style>

<table class="gt_table caption-top table table-sm table-striped small">
<thead class="gt_col_headings">
<tr class="header">
<th id="wflow_id" class="gt_col_heading gt_columns_bottom_border gt_left" data-quarto-table-cell-role="th" scope="col">wflow_id</th>
<th id=".config" class="gt_col_heading gt_columns_bottom_border gt_left" data-quarto-table-cell-role="th" scope="col">.config</th>
<th id=".metric" class="gt_col_heading gt_columns_bottom_border gt_left" data-quarto-table-cell-role="th" scope="col">.metric</th>
<th id="mean" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">mean</th>
<th id="std_err" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">std_err</th>
<th id="n" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">n</th>
<th id="preprocessor" class="gt_col_heading gt_columns_bottom_border gt_left" data-quarto-table-cell-role="th" scope="col">preprocessor</th>
<th id="model" class="gt_col_heading gt_columns_bottom_border gt_left" data-quarto-table-cell-role="th" scope="col">model</th>
<th id="rank" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">rank</th>
</tr>
</thead>
<tbody class="gt_table_body">
<tr class="odd">
<td class="gt_row gt_left" headers="wflow_id">recipe_torch</td>
<td class="gt_row gt_left" headers=".config">Preprocessor1_Model3</td>
<td class="gt_row gt_left" headers=".metric">accuracy</td>
<td class="gt_row gt_right" headers="mean">0.8953333</td>
<td class="gt_row gt_right" headers="std_err">0.02186491</td>
<td class="gt_row gt_right" headers="n">10</td>
<td class="gt_row gt_left" headers="preprocessor">recipe</td>
<td class="gt_row gt_left" headers="model">mlp</td>
<td class="gt_row gt_right" headers="rank">1</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="wflow_id">recipe_torch</td>
<td class="gt_row gt_left" headers=".config">Preprocessor1_Model3</td>
<td class="gt_row gt_left" headers=".metric">roc_auc</td>
<td class="gt_row gt_right" headers="mean">0.9656795</td>
<td class="gt_row gt_right" headers="std_err">0.01237363</td>
<td class="gt_row gt_right" headers="n">10</td>
<td class="gt_row gt_left" headers="preprocessor">recipe</td>
<td class="gt_row gt_left" headers="model">mlp</td>
<td class="gt_row gt_right" headers="rank">1</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="wflow_id">recipe_tree</td>
<td class="gt_row gt_left" headers=".config">Preprocessor1_Model4</td>
<td class="gt_row gt_left" headers=".metric">accuracy</td>
<td class="gt_row gt_right" headers="mean">0.8916667</td>
<td class="gt_row gt_right" headers="std_err">0.02309000</td>
<td class="gt_row gt_right" headers="n">10</td>
<td class="gt_row gt_left" headers="preprocessor">recipe</td>
<td class="gt_row gt_left" headers="model">rand_forest</td>
<td class="gt_row gt_right" headers="rank">2</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="wflow_id">recipe_tree</td>
<td class="gt_row gt_left" headers=".config">Preprocessor1_Model4</td>
<td class="gt_row gt_left" headers=".metric">roc_auc</td>
<td class="gt_row gt_right" headers="mean">0.9656541</td>
<td class="gt_row gt_right" headers="std_err">0.01552447</td>
<td class="gt_row gt_right" headers="n">10</td>
<td class="gt_row gt_left" headers="preprocessor">recipe</td>
<td class="gt_row gt_left" headers="model">rand_forest</td>
<td class="gt_row gt_right" headers="rank">2</td>
</tr>
<tr class="odd">
<td class="gt_row gt_left" headers="wflow_id">recipe_glm</td>
<td class="gt_row gt_left" headers=".config">Preprocessor1_Model1</td>
<td class="gt_row gt_left" headers=".metric">accuracy</td>
<td class="gt_row gt_right" headers="mean">0.8956667</td>
<td class="gt_row gt_right" headers="std_err">0.02540560</td>
<td class="gt_row gt_right" headers="n">10</td>
<td class="gt_row gt_left" headers="preprocessor">recipe</td>
<td class="gt_row gt_left" headers="model">logistic_reg</td>
<td class="gt_row gt_right" headers="rank">3</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers="wflow_id">recipe_glm</td>
<td class="gt_row gt_left" headers=".config">Preprocessor1_Model1</td>
<td class="gt_row gt_left" headers=".metric">roc_auc</td>
<td class="gt_row gt_right" headers="mean">0.9639505</td>
<td class="gt_row gt_right" headers="std_err">0.01352438</td>
<td class="gt_row gt_right" headers="n">10</td>
<td class="gt_row gt_left" headers="preprocessor">recipe</td>
<td class="gt_row gt_left" headers="model">logistic_reg</td>
<td class="gt_row gt_right" headers="rank">3</td>
</tr>
</tbody>
</table>

</div>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1">workflow_set <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">autoplot</span>()</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://jameshwade.com/posts/2022-12-27_mlops-the-whole-game_files/figure-html/unnamed-chunk-10-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>In this case, <code>autoplot()</code> compares the results from each of our workflows, showing both <code>accuracy</code> and <code>roc_auc</code>. The three models perform comparably on both metrics, so we choose logistic regression for its lower model complexity.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1">best_model_id <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"recipe_glm"</span></span></code></pre></div></div>
</div>
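<p>To make the ranking metric concrete, here is a minimal sketch of computing ROC AUC and accuracy directly with <code>{yardstick}</code>. It uses the <code>two_class_example</code> data set that ships with yardstick rather than the penguins data, so the numbers are unrelated to the table above.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(yardstick)

# two_class_example ships with yardstick: a factor `truth`, class
# probabilities `Class1` and `Class2`, and a hard prediction `predicted`
data(two_class_example)

# Area under the ROC curve, treating the first factor level (Class1) as the event
roc_auc(two_class_example, truth, Class1)

# Accuracy uses the hard class predictions instead of the probabilities
accuracy(two_class_example, truth, predicted)</code></pre></div></div>
</div>
<p>Because ROC AUC is computed from predicted probabilities, it does not depend on a single classification threshold the way accuracy does.</p>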
</section>
<section id="finalize-model" class="level3">
<h3 class="anchored" data-anchor-id="finalize-model">Finalize Model</h3>
<p>Now that we have compared our models on <code>roc_auc</code> and selected one, we can finalize the workflow and fit the model with the full training set (i.e., not just the resampling folds).</p>
<p>In the code below, <code>best_fit</code> holds the best hyperparameters for the workflow ID we selected above. We get it by combining <code>workflowsets::extract_workflow_set_result()</code> and <code>tune::select_best()</code>, which returns a tibble of hyperparameters for the best-fit model.</p>
<p>We can then use <code>finalize_workflow()</code> to apply the hyperparameters in <code>best_fit</code> to the extracted workflow, which gives us <code>final_workflow</code>. Finally, <code>last_fit()</code> refits the model on the entire training set instead of the folds and evaluates it on the test set.</p>
<p>The <code>collect_metrics()</code> and <code>collect_predictions()</code> functions are convenience functions for checking model performance. We can again use <code>autoplot()</code> to visualize model results, in this case the ROC curve.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1">best_fit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb14-2">  workflow_set <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb14-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_workflow_set_result</span>(best_model_id) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb14-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select_best</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">metric =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"accuracy"</span>)</span>
<span id="cb14-5"></span>
<span id="cb14-6">final_workflow <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb14-7">  workflow_set <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb14-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_workflow</span>(best_model_id) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb14-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">finalize_workflow</span>(best_fit)</span>
<span id="cb14-10"></span>
<span id="cb14-11">final_fit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb14-12">  final_workflow <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb14-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">last_fit</span>(penguin_split)</span>
<span id="cb14-14"></span>
<span id="cb14-15">final_fit <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb14-16">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">collect_metrics</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb14-17">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gt</span>()</span></code></pre></div></div>
<div class="cell-output-display">
<div id="seybfbewmz" style="padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;">
<style>html {
  font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Helvetica Neue', 'Fira Sans', 'Droid Sans', Arial, sans-serif;
}

#seybfbewmz .gt_table {
  display: table;
  border-collapse: collapse;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 16px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: auto;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

#seybfbewmz .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#seybfbewmz .gt_caption {
  padding-top: 4px;
  padding-bottom: 4px;
}

#seybfbewmz .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

#seybfbewmz .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 0;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

#seybfbewmz .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#seybfbewmz .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#seybfbewmz .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

#seybfbewmz .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

#seybfbewmz .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

#seybfbewmz .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

#seybfbewmz .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 5px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

#seybfbewmz .gt_group_heading {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  text-align: left;
}

#seybfbewmz .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

#seybfbewmz .gt_from_md > :first-child {
  margin-top: 0;
}

#seybfbewmz .gt_from_md > :last-child {
  margin-bottom: 0;
}

#seybfbewmz .gt_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

#seybfbewmz .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
}

#seybfbewmz .gt_stub_row_group {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
  vertical-align: top;
}

#seybfbewmz .gt_row_group_first td {
  border-top-width: 2px;
}

#seybfbewmz .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#seybfbewmz .gt_first_summary_row {
  border-top-style: solid;
  border-top-color: #D3D3D3;
}

#seybfbewmz .gt_first_summary_row.thick {
  border-top-width: 2px;
}

#seybfbewmz .gt_last_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#seybfbewmz .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#seybfbewmz .gt_first_grand_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

#seybfbewmz .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

#seybfbewmz .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#seybfbewmz .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#seybfbewmz .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding-left: 4px;
  padding-right: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#seybfbewmz .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#seybfbewmz .gt_sourcenote {
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#seybfbewmz .gt_left {
  text-align: left;
}

#seybfbewmz .gt_center {
  text-align: center;
}

#seybfbewmz .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

#seybfbewmz .gt_font_normal {
  font-weight: normal;
}

#seybfbewmz .gt_font_bold {
  font-weight: bold;
}

#seybfbewmz .gt_font_italic {
  font-style: italic;
}

#seybfbewmz .gt_super {
  font-size: 65%;
}

#seybfbewmz .gt_footnote_marks {
  font-style: italic;
  font-weight: normal;
  font-size: 75%;
  vertical-align: 0.4em;
}

#seybfbewmz .gt_asterisk {
  font-size: 100%;
  vertical-align: 0;
}

#seybfbewmz .gt_indent_1 {
  text-indent: 5px;
}

#seybfbewmz .gt_indent_2 {
  text-indent: 10px;
}

#seybfbewmz .gt_indent_3 {
  text-indent: 15px;
}

#seybfbewmz .gt_indent_4 {
  text-indent: 20px;
}

#seybfbewmz .gt_indent_5 {
  text-indent: 25px;
}
</style>

<table class="gt_table caption-top table table-sm table-striped small">
<thead class="gt_col_headings">
<tr class="header">
<th id=".metric" class="gt_col_heading gt_columns_bottom_border gt_left" data-quarto-table-cell-role="th" scope="col">.metric</th>
<th id=".estimator" class="gt_col_heading gt_columns_bottom_border gt_left" data-quarto-table-cell-role="th" scope="col">.estimator</th>
<th id=".estimate" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" scope="col">.estimate</th>
<th id=".config" class="gt_col_heading gt_columns_bottom_border gt_left" data-quarto-table-cell-role="th" scope="col">.config</th>
</tr>
</thead>
<tbody class="gt_table_body">
<tr class="odd">
<td class="gt_row gt_left" headers=".metric">accuracy</td>
<td class="gt_row gt_left" headers=".estimator">binary</td>
<td class="gt_row gt_right" headers=".estimate">0.9047619</td>
<td class="gt_row gt_left" headers=".config">Preprocessor1_Model1</td>
</tr>
<tr class="even">
<td class="gt_row gt_left" headers=".metric">roc_auc</td>
<td class="gt_row gt_left" headers=".estimator">binary</td>
<td class="gt_row gt_right" headers=".estimate">0.9705215</td>
<td class="gt_row gt_left" headers=".config">Preprocessor1_Model1</td>
</tr>
</tbody>
</table>

</div>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1">final_fit <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">collect_predictions</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">roc_curve</span>(sex, .pred_female) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">autoplot</span>()</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://jameshwade.com/posts/2022-12-27_mlops-the-whole-game_files/figure-html/unnamed-chunk-12-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
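<p>For intuition about what <code>last_fit()</code> does, the sketch below compares it with the same two steps done by hand: fit on the training portion, then score the test portion. It is a self-contained illustration, not the workflow from this post. It assumes the <code>two_class_dat</code> data set from <code>{modeldata}</code> and a plain logistic regression, and every <code>demo_*</code> name is made up for the example.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(tidymodels)

# Illustrative data: two numeric predictors (A, B) and a two-level outcome (Class)
data(two_class_dat, package = "modeldata")

set.seed(123)
demo_split &lt;- initial_split(two_class_dat, strata = Class)

demo_workflow &lt;-
  workflow() |&gt;
  add_formula(Class ~ A + B) |&gt;
  add_model(logistic_reg())

# last_fit(): fit on the training portion, evaluate on the test portion
demo_last_fit &lt;- last_fit(demo_workflow, demo_split)
collect_metrics(demo_last_fit)

# The same two steps done by hand
demo_manual_fit &lt;- fit(demo_workflow, data = training(demo_split))

demo_manual_auc &lt;-
  augment(demo_manual_fit, new_data = testing(demo_split)) |&gt;
  roc_auc(truth = Class, .pred_Class1)

demo_manual_auc</code></pre></div></div>
</div>
<p>The <code>roc_auc</code> row from <code>collect_metrics()</code> should match the manual estimate, since both come from the same model fit on the same training rows.</p>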
</section>
</section>
<section id="model-deployment" class="level2">
<h2 class="anchored" data-anchor-id="model-deployment">Model Deployment</h2>
<p>The <a href="https://rstudio.github.io/vetiver-r/"><code>{vetiver}</code></a> package provides a set of tools for building, deploying, and managing machine learning models in production. It allows users to easily create, version, and deploy machine learning models to various hosting platforms, such as Posit Connect or a cloud hosting service like Azure.</p>
<p>The <code>vetiver_model()</code> function creates an object that stores a machine learning model together with its metadata, such as the model’s name, type, and parameters. The <code>vetiver_pin_write()</code> and <code>vetiver_pin_read()</code> functions write <code>vetiver_model</code> objects to, and read them from, a board, such as one hosted on a server.</p>
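<p>As a minimal sketch of that round trip, the example below wraps a throwaway linear model, writes it to a temporary local board created with <code>pins::board_temp()</code>, and reads it back. The model, the <code>cars_demo</code> name, and the <code>demo_board</code> object are illustrative assumptions and are separate from the penguins model deployed in this post.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(vetiver)
library(pins)

# A throwaway model on a built-in data set, purely for illustration
cars_lm &lt;- lm(mpg ~ wt + hp, data = mtcars)

# Wrap the model and its metadata in a vetiver_model object
v_demo &lt;- vetiver_model(cars_lm, model_name = "cars_demo")

# board_temp() creates a temporary local board, so nothing leaves your machine
demo_board &lt;- board_temp(versioned = TRUE)

# Write the model to the board, then read it back by name
vetiver_pin_write(demo_board, v_demo)
v_back &lt;- vetiver_pin_read(demo_board, "cars_demo")

v_back</code></pre></div></div>
</div>
<p>Swapping the temporary board for a remote one changes where the model is stored, while the write and read calls stay the same.</p>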
<section id="create-vetiver-model" class="level3">
<h3 class="anchored" data-anchor-id="create-vetiver-model">Create Vetiver Model</h3>
<p>To deploy our model with <code>{vetiver}</code>, we start with <code>final_fit</code> from above and extract the trained workflow with <code>tune::extract_workflow()</code>. The trained workflow is what we will deploy, so we convert it from a workflow to a <code>vetiver_model</code> with <code>vetiver_model()</code>.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1">final_fit_to_deploy <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> final_fit <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_workflow</span>()</span>
<span id="cb16-2"></span>
<span id="cb16-3">v <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vetiver_model</span>(final_fit_to_deploy, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model_name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"penguins_model"</span>)</span>
<span id="cb16-4"></span>
<span id="cb16-5">v</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>
── penguins_model ─ &lt;bundled_workflow&gt; model for deployment 
A glm classification modeling workflow using 5 features</code></pre>
</div>
</div>
</section>
<section id="pin-model-to-board" class="level3">
<h3 class="anchored" data-anchor-id="pin-model-to-board">Pin Model to Board</h3>
<p>The <a href="https://pins.rstudio.com/"><code>{pins}</code></a> package stores and manages data sets in a local or remote repository. It lets users “pin” data sets to a “board” so they can be easily accessed and shared with others: users can create a board, add data sets, and retrieve them later. The <code>board_rsconnect()</code> function connects to a board on Posit Connect (formerly RStudio Connect), a server where a <code>vetiver_model</code> can be stored and accessed. The code below first uses <code>board_local()</code> to run the same workflow against a local board. In both cases we specify <code>versioned = TRUE</code> so that we can version control our vetiver models.</p>
<p>Once the <code>model_board</code> connection is made, it’s as easy as <code>vetiver_pin_write()</code> to “pin” our model to the model board and <code>vetiver_pin_read()</code> to access it. In this case, we must specify the username of the author of the pin, which is <code>james</code>.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1">model_board <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">board_local</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">versioned =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb18-2">model_board <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vetiver_pin_write</span>(v)</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>Creating new version '20221228T000205Z-9c561'
Writing to pin 'penguins_model'

Create a Model Card for your published model
• Model Cards provide a framework for transparent, responsible reporting
• Use the vetiver `.Rmd` template as a place to start</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb20-1">model_board <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vetiver_pin_read</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"penguins_model"</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>
── penguins_model ─ &lt;bundled_workflow&gt; model for deployment 
A glm classification modeling workflow using 5 features</code></pre>
</div>
</div>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb22-1">model_board <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">board_rsconnect</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">versioned =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb22-2">model_board <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vetiver_pin_write</span>(v)</span>
<span id="cb22-3">model_board <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vetiver_pin_read</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"penguins_model"</span>)</span></code></pre></div></div>
</div>
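<p>To see what <code>versioned = TRUE</code> provides, the following sketch writes two versions of a pin and lists them with <code>pin_versions()</code>. It assumes the <code>{pins}</code> package is installed and uses a temporary board from <code>board_temp()</code> with a made-up pin name, <code>"demo"</code>, so it does not touch the boards used in this post.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r">library(pins)

# Temporary, versioned board (illustrative only; not the board used in this post)
demo_board &lt;- board_temp(versioned = TRUE)

# Write two different objects under the same hypothetical pin name
pin_write(demo_board, head(mtcars), "demo", type = "rds")
pin_write(demo_board, tail(mtcars), "demo", type = "rds")

# One row per stored version, with its creation time and hash
versions &lt;- pin_versions(demo_board, "demo")
versions</code></pre></div>
<p>A specific version can then be requested through the <code>version</code> argument of <code>pin_read()</code> or <code>vetiver_pin_read()</code>.</p>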
</section>
<section id="create-model-api" class="level3">
<h3 class="anchored" data-anchor-id="create-model-api">Create Model API</h3>
<p>Our next step is to use the <code>{vetiver}</code> and <a href="https://www.rplumber.io/"><code>{plumber}</code></a> packages to create an API for our vetiver model, which can then be accessed via HTTP requests to make predictions. <code>pr()</code> creates a new plumber router, and <code>vetiver_api(v)</code> adds a <code>POST</code> endpoint that makes predictions from a trained vetiver model. <code>vetiver_write_plumber()</code> creates a <code>plumber.R</code> file that specifies the version of the model we pinned to our model board with <code>vetiver_pin_write()</code>.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb23-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pr</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb23-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vetiver_api</span>(v)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># Plumber router with 2 endpoints, 4 filters, and 1 sub-router.
# Use `pr_run()` on this object to start the API.
├──[queryString]
├──[body]
├──[cookieParser]
├──[sharedSecret]
├──/logo
│  │ # Plumber static router serving from directory: /Library/Frameworks/R.framework/Versions/4.2/Resources/library/vetiver
├──/ping (GET)
└──/predict (POST)</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb25-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vetiver_write_plumber</span>(model_board, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"penguins_model"</span>)</span></code></pre></div></div>
</div>
<p>Here is an example of the <code>plumber.R</code> file generated by <code>vetiver_write_plumber()</code>.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb26" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb26-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Generated by the vetiver package; edit with care</span></span>
<span id="cb26-2"></span>
<span id="cb26-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(pins)</span>
<span id="cb26-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(plumber)</span>
<span id="cb26-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(rapidoc)</span>
<span id="cb26-6"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(vetiver)</span>
<span id="cb26-7"></span>
<span id="cb26-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Packages needed to generate model predictions</span></span>
<span id="cb26-9"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (<span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>) {</span>
<span id="cb26-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(parsnip)</span>
<span id="cb26-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(recipes)</span>
<span id="cb26-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(stats)</span>
<span id="cb26-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(workflows)</span>
<span id="cb26-14">}</span>
<span id="cb26-15">b <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">board_rsconnect</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"envvar"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">server =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"https://connect.mycompany.com"</span>)</span>
<span id="cb26-16">v <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vetiver_pin_read</span>(b, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"penguins_model"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">version =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"6926"</span>)</span>
<span id="cb26-17"></span>
<span id="cb26-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#* @plumber</span></span>
<span id="cb26-19"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(pr) {</span>
<span id="cb26-20">  pr <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vetiver_api</span>(v)</span>
<span id="cb26-21">}</span></code></pre></div></div>
</div>
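<p>Before deploying anywhere, it can help to run the API on your own machine. The printed router output above points to <code>pr_run()</code> for this. The sketch below is a minimal example, assuming the <code>{plumber}</code> and <code>{vetiver}</code> packages are installed; <code>run_local_api()</code> and the port <code>8080</code> are hypothetical choices for illustration, not part of either package.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical helper: serve a vetiver model on localhost for a quick check
run_local_api &lt;- function(v, port = 8080) {
  plumber::pr() |&gt;
    vetiver::vetiver_api(v) |&gt;    # adds the /ping and /predict routes
    plumber::pr_run(port = port)  # blocks until the session is interrupted
}</code></pre></div>
<p>Calling <code>run_local_api(v)</code> should serve the prediction endpoint at <code>http://127.0.0.1:8080/predict</code> until the R session is interrupted.</p>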
</section>
<section id="deploy-api-to-posit-connect" class="level3">
<h3 class="anchored" data-anchor-id="deploy-api-to-posit-connect">Deploy API to Posit Connect</h3>
<p>This model can be hosted in a variety of locations. One of the easiest to use is Posit Connect, and <code>vetiver_deploy_rsconnect()</code> handles that deployment for us. All we need to specify is the name of the pinned vetiver model and the model board.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb27" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb27-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vetiver_deploy_rsconnect</span>(</span>
<span id="cb27-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">board =</span> model_board,</span>
<span id="cb27-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"penguins_model"</span>,</span>
<span id="cb27-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">account =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"james"</span></span>
<span id="cb27-5">)</span></code></pre></div></div>
</div>
</section>
<section id="deploying-elsewhere" class="level3">
<h3 class="anchored" data-anchor-id="deploying-elsewhere">Deploying Elsewhere</h3>
<p>If Posit Connect is not the right place for our model, <code>vetiver_write_docker()</code> creates a <code>Dockerfile</code> and an <code>renv</code> lockfile (<code>vetiver_renv.lock</code> in the example below).</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb28" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb28-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vetiver_write_docker</span>(v)</span></code></pre></div></div>
</div>
<p>Here is an example of the dockerfile that is generated.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode dockerfile code-with-copy"><code class="sourceCode dockerfile"><span id="cb29-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Generated by the vetiver package; edit with care</span></span>
<span id="cb29-2"></span>
<span id="cb29-3"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">FROM</span> rocker/r-ver:4.2.1</span>
<span id="cb29-4"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">ENV</span> RENV_CONFIG_REPOS_OVERRIDE https://packagemanager.rstudio.com/cran/latest</span>
<span id="cb29-5"></span>
<span id="cb29-6"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">RUN</span> <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">apt-get</span> update <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-qq</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">&amp;&amp;</span> <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">apt-get</span> install <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-y</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--no-install-recommends</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb29-7">libcurl4-openssl-dev <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb29-8">libicu-dev <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb29-9">libsodium-dev <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb29-10">libssl-dev <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb29-11">make <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb29-12">zlib1g-dev <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb29-13"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">&amp;&amp;</span> <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">apt-get</span> clean</span>
<span id="cb29-14"></span>
<span id="cb29-15"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">COPY</span> vetiver_renv.lock renv.lock</span>
<span id="cb29-16"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">RUN</span> <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">Rscript</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-e</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"install.packages('renv')"</span></span>
<span id="cb29-17"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">RUN</span> <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">Rscript</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-e</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"renv::restore()"</span></span>
<span id="cb29-18"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">COPY</span> plumber.R /opt/ml/plumber.R</span>
<span id="cb29-19"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">EXPOSE</span> 8000</span>
<span id="cb29-20"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">ENTRYPOINT</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"R"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"-e"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"pr &lt;- plumber::plumb('/opt/ml/plumber.R'); pr$run(host = '0.0.0.0', port = 8000)"</span>]</span></code></pre></div></div>
</section>
<section id="using-the-api-to-make-predictions" class="level3">
<h3 class="anchored" data-anchor-id="using-the-api-to-make-predictions">Using the API to Make Predictions</h3>
<p>The API deployment URL is <code>https://connect.mycompany.com/penguins</code>, and the prediction endpoint is <code>https://connect.mycompany.com/penguins/predict</code>.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb30" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb30-1">endpoint <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb30-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vetiver_endpoint</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"https://connect.mycompany.com/penguins/predict"</span>)</span></code></pre></div></div>
</div>
<p>We can make predictions with the endpoint using <code>predict</code>.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb31" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb31-1">new_data <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb31-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">species =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Adelie"</span>,</span>
<span id="cb31-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">bill_length_mm =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">40.5</span>,</span>
<span id="cb31-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">bill_depth_mm =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">18.9</span>,</span>
<span id="cb31-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">flipper_length_mm =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">180</span>,</span>
<span id="cb31-6">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">body_mass_g =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3950</span></span>
<span id="cb31-7">)</span>
<span id="cb31-8"></span>
<span id="cb31-9"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">predict</span>(endpoint, new_data)</span></code></pre></div></div>
</div>
<p>We can also use <code>{httr}</code> to call the API. In most cases, it is easier for R users to use <code>predict</code> rather than <code>httr::POST</code>. However, were this model written in another language, making predictions using <code>{httr}</code> would likely be the best approach.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb32" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb32-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(httr)</span>
<span id="cb32-2">url <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"https://connect.mycompany.com/penguins/predict"</span></span>
<span id="cb32-3">json_data <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> jsonlite<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">toJSON</span>(new_data)</span>
<span id="cb32-4">response <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">POST</span>(url, <span class="at" style="color: #657422;
background-color: null;
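font-style: inherit;">body =</span> json_data, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">content_type_json</span>())</span>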
<span id="cb32-5">response</span>
<span id="cb32-6"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">content</span>(response)</span></code></pre></div></div>
</div>
<p>Avoiding a language-specific approach altogether, we can use <code>curl</code> in a terminal to make API calls.</p>
<div class="code-with-filename">
<div class="code-with-filename-file">
<pre><strong>Terminal</strong></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb33" data-filename="Terminal" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb33-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#| file</span></span>
<span id="cb33-2"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">curl</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-X</span> POST <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"https://connect.mycompany.com/penguins/predict"</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb33-3"> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-H</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Accept: application/json"</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb33-4"> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-H</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Content-Type: application/json"</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb33-5"> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-d</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'[{"species":"Adelie","bill_length_mm":0.5,"bill_depth_mm":0.5,"flipper_length_mm":0,"body_mass_g":0}]'</span></span></code></pre></div></div>
</div>
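<p>The JSON array passed to <code>-d</code> has the same shape that <code>jsonlite::toJSON()</code> produces from a data frame: one object per row, keyed by column name. A minimal sketch, assuming <code>{jsonlite}</code> is installed and reusing the example penguin values from above in a plain data frame:</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Example observation (same illustrative values as the tibble above)
new_data &lt;- data.frame(
  species = "Adelie",
  bill_length_mm = 40.5,
  bill_depth_mm = 18.9,
  flipper_length_mm = 180,
  body_mass_g = 3950
)

# Serialize to a JSON array with one object per row
payload &lt;- jsonlite::toJSON(new_data)
payload

# Round-trip check: parsing the payload recovers the original columns
jsonlite::fromJSON(payload)</code></pre></div>
<p>Printing <code>payload</code> gives a string that can be pasted into the <code>-d</code> argument of the <code>curl</code> call in place of the placeholder values.</p>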
</section>
</section>
<section id="model-monitoring" class="level2">
<h2 class="anchored" data-anchor-id="model-monitoring">Model Monitoring</h2>
<p>After deployment, we need to monitor model performance. The <a href="https://vetiver.rstudio.com/get-started/monitor.html">MLOps with vetiver monitoring page</a> describes this well:</p>
<blockquote class="blockquote">
<p>Machine learning can break quietly; a model can continue returning predictions without error, even if it is performing poorly. Often these quiet performance problems are discussed as types of model drift; data drift can occur when the statistical distribution of an input feature changes, or concept drift occurs when there is change in the relationship between the input features and the outcome.</p>
<p>Without monitoring for degradation, this silent failure can continue undiagnosed. The vetiver framework offers functions to fluently compute, store, and plot model metrics. These functions are particularly suited to monitoring your model using multiple performance metrics over time. Effective model monitoring is not “one size fits all”, but instead depends on choosing appropriate metrics and time aggregation for a given application.</p>
</blockquote>
<p>As a baseline for model performance, we can start by using our training set to create original metrics for the model. We also simulate a <code>date_obs</code> column. In a real example, we should use the date the data was collected.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb34" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb34-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1234</span>)</span>
<span id="cb34-2">penguin_train_by_date <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb34-3">  penguin_train <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb34-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb34-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">date_obs =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Sys.Date</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb34-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ungroup</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb34-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">arrange</span>(date_obs)</span>
<span id="cb34-8"></span>
<span id="cb34-9">original_metrics <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb34-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">augment</span>(v, penguin_train_by_date) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb34-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vetiver_compute_metrics</span>(</span>
<span id="cb34-12">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">date_var =</span> date_obs,</span>
<span id="cb34-13">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">period =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"day"</span>,</span>
<span id="cb34-14">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">truth =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sex"</span>,</span>
<span id="cb34-15">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">estimate =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">".pred_class"</span></span>
<span id="cb34-16">  )</span>
<span id="cb34-17"></span>
<span id="cb34-18"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vetiver_plot_metrics</span>(original_metrics)</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://jameshwade.com/posts/2022-12-27_mlops-the-whole-game_files/figure-html/unnamed-chunk-23-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>We can pin the model performance metrics, just as we did with the model.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb35" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb35-1">model_board <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb35-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pin_write</span>(original_metrics, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"penguin_metrics"</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>Guessing `type = 'rds'`
Creating new version '20221228T000206Z-3e18d'
Writing to pin 'penguin_metrics'</code></pre>
</div>
</div>
<p>To simulate the model going “live”, let’s use the test set to add more predictions.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb37" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb37-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1234</span>)</span>
<span id="cb37-2">penguin_test_by_date <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb37-3">  penguin_test <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb37-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb37-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">date_obs =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Sys.Date</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb37-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ungroup</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb37-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">arrange</span>(date_obs)</span>
<span id="cb37-8"></span>
<span id="cb37-9">v <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb37-10">  model_board <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb37-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vetiver_pin_read</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"penguins_model"</span>)</span>
<span id="cb37-12"></span>
<span id="cb37-13">new_metrics <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb37-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">augment</span>(v, penguin_test_by_date) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb37-15">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vetiver_compute_metrics</span>(</span>
<span id="cb37-16">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">date_var =</span> date_obs,</span>
<span id="cb37-17">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">period =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"day"</span>,</span>
<span id="cb37-18">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">truth =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sex"</span>,</span>
<span id="cb37-19">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">estimate =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">".pred_class"</span></span>
<span id="cb37-20">  )</span>
<span id="cb37-21"></span>
<span id="cb37-22">model_board <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb37-23">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vetiver_pin_metrics</span>(new_metrics, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"penguin_metrics"</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>Creating new version '20221228T000206Z-7c132'
Writing to pin 'penguin_metrics'</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 20 × 5
   .index        .n .metric  .estimator .estimate
   &lt;date&gt;     &lt;int&gt; &lt;chr&gt;    &lt;chr&gt;          &lt;dbl&gt;
 1 2022-12-17    32 accuracy binary         0.844
 2 2022-12-17    32 kap      binary         0.688
 3 2022-12-18    45 accuracy binary         0.911
 4 2022-12-18    45 kap      binary         0.82 
 5 2022-12-19    29 accuracy binary         0.966
 6 2022-12-19    29 kap      binary         0.931
 7 2022-12-20    34 accuracy binary         0.912
 8 2022-12-20    34 kap      binary         0.820
 9 2022-12-21    44 accuracy binary         0.886
10 2022-12-21    44 kap      binary         0.759
11 2022-12-22    31 accuracy binary         0.903
12 2022-12-22    31 kap      binary         0.807
13 2022-12-23    34 accuracy binary         0.941
14 2022-12-23    34 kap      binary         0.881
15 2022-12-24    30 accuracy binary         0.867
16 2022-12-24    30 kap      binary         0.724
17 2022-12-25    30 accuracy binary         0.933
18 2022-12-25    30 kap      binary         0.867
19 2022-12-26    24 accuracy binary         0.917
20 2022-12-26    24 kap      binary         0.833</code></pre>
</div>
</div>
<p>Now that we’ve updated the model metrics, we can plot model performance over time, again using the <code>vetiver_plot_metrics()</code> function.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb40" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb40-1">monitoring_metrics <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb40-2">  model_board <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pin_read</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"penguin_metrics"</span>)</span>
<span id="cb40-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vetiver_plot_metrics</span>(monitoring_metrics)</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://jameshwade.com/posts/2022-12-27_mlops-the-whole-game_files/figure-html/unnamed-chunk-26-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>For ease of compute, I don’t actually re-calculate the workflow_set in this document. There’s a hidden chunk that reads a pinned version of the workflow_set result. For the curious, that’s <code>model_board &lt;- board_local()</code> and <code>workflow_set &lt;- model_board |&gt; pin_read("workflow_set")</code>.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>mlops</category>
  <category>modeling</category>
  <category>vetiver</category>
  <category>pins</category>
  <category>deployment</category>
  <category>R</category>
  <guid>https://jameshwade.com/posts/2022-12-27_mlops-the-whole-game.html</guid>
  <pubDate>Tue, 27 Dec 2022 00:00:00 GMT</pubDate>
  <media:content url="https://jameshwade.com/posts/images/penguins.png" medium="image" type="image/png" height="86" width="144"/>
</item>
<item>
  <title>How to Teach Tech</title>
  <dc:creator>James H Wade</dc:creator>
  <link>https://jameshwade.com/posts/2022-02-01_how-to-teach-tech.html</link>
  <description><![CDATA[ 





<p>Much of my inspiration for this project comes from <a href="https://third-bit.com/">Greg Wilson</a>, founder of the <a href="https://carpentries.org/">Carpentries</a>. These notes are based on his talks about how to teach tech. The first is a talk he gave in April 2019 for RStudio,<sup>1</sup> and the second is an update to that talk from July 2021 on his personal YouTube channel.<sup>2</sup></p>
<section id="lesson-1-design-curriculum-for-learning-stages" class="level2">
<h2 class="anchored" data-anchor-id="lesson-1-design-curriculum-for-learning-stages">Lesson 1: Design Curriculum for Learning Stages</h2>
<p>There are three stages of learners: <strong>novices</strong>, <strong>competent practitioners</strong>, and <strong>experts</strong>. The curriculum and teaching style should be distinct for each learning stage.</p>
<section id="novice" class="level3">
<h3 class="anchored" data-anchor-id="novice">Novice</h3>
<p>Novices can follow a set of instructions, but they get stuck if they deviate even slightly from the instructions. They may ask nonsensical questions (e.g., “What color is this database?”), and they cannot identify relevant details for the topic. Novices may not know they are novices and may misidentify themselves as competent or even expert. They lack a <strong>mental model</strong> of the problem. Your job as the instructor is to <strong>guide</strong> a novice to gain a mental model. You <em>push</em> knowledge to them.</p>
<p>A mental model is an incomplete and likely inaccurate framing of a topic, but it lets a novice grasp central learning concepts. A <strong>concept map</strong> is a great way to build and share mental models. A concept map <strong>connects concepts with labeled connections</strong>.</p>
</section>
<section id="competent" class="level3">
<h3 class="anchored" data-anchor-id="competent">Competent</h3>
<p>Competent practitioners have a connected mental model. To solve a problem, they reason through a series of intermediate steps and usually come to the correct conclusion, albeit slowly. As an instructor, <strong>mentor</strong> competent practitioners. Give them problems to expand their knowledge, but allow them to learn at their own pace. Do not let a competent practitioner be stuck for too long. Momentum is important for learning.</p>
<section id="avoid-mixing-guiding-mentoring" class="level4">
<h4 class="anchored" data-anchor-id="avoid-mixing-guiding-mentoring">Avoid Mixing Guiding &amp; Mentoring</h4>
<p>A common mistake when teaching is to mentor a novice learner and guide a competent practitioner. Pushing knowledge to guide a competent practitioner will frustrate them because you tell them what they already know. Asking novices to tackle a problem on their own will frustrate them because they do not know what to explore.</p>
</section>
</section>
<section id="expert" class="level3">
<h3 class="anchored" data-anchor-id="expert">Expert</h3>
<p>Experts are able to see solutions at a glance. They bring multiple points of view to a problem, and they excel at debugging. Their fluency comes from thinking back and forth between causes and effects. Experts may <strong>struggle to explain their thinking</strong>. To them, problems are obvious, and they have forgotten the experience of a novice. Experts may be bad teachers for novices.</p>
<p>To teach an expert, ask them to <strong>reflect</strong> upon their work and <strong>give feedback</strong> on their thinking. Your aim is for them to learn to reflect more effectively on why they do what they do.</p>
</section>
</section>
<section id="lesson-2-a-lesson-is-a-user-interface-for-knowledge" class="level2">
<h2 class="anchored" data-anchor-id="lesson-2-a-lesson-is-a-user-interface-for-knowledge">Lesson 2: A Lesson is a User Interface for Knowledge</h2>
<p>Much like designing a new user interface, your first step in creating learning content is to <strong>create personas</strong> of learners. These are <em>fictional</em> characters that capture key properties of your target audience. A persona consists of:</p>
<ol type="1">
<li>General background: Who are they? What do they do when not learning?</li>
<li>Relevant experience: What have they done before? This is better than a list of prerequisites.</li>
<li>Perceived needs: What do they <em>think</em> they want to learn?</li>
<li>Special considerations: How are they different from you as the instructor? What constraints do they have on learning?</li>
</ol>
</section>
<section id="lesson-3-use-formative-assessments-as-unit-tests-for-learning" class="level2">
<h2 class="anchored" data-anchor-id="lesson-3-use-formative-assessments-as-unit-tests-for-learning">Lesson 3: Use Formative Assessments as Unit Tests for Learning</h2>
<p>A formative assessment checks for retention of key learning concepts. If the learner gets the assessment wrong, you gain insight into the learner’s mental model. Your assessment must have <strong>diagnostic power</strong>. Build formative assessments <em>before</em> you create your curriculum. You need to understand the mental model of your learners. If you do this well, you can dynamically adjust the lessons based on the prior knowledge of the learner.</p>
</section>
<section id="lesson-4-manage-cognitive-load" class="level2">
<h2 class="anchored" data-anchor-id="lesson-4-manage-cognitive-load">Lesson 4: Manage Cognitive Load</h2>
<p>Do not overload your students with too much information at once. This does <em>not</em> mean that you cannot convey complex ideas or use complex figures. You must introduce components of a complex idea or image gradually. A practical example of this is to use slide builds. You want new linguistic and audio information at the same time. This is one reason why <a href="https://andymatuschak.org/books/">books do not work</a>.</p>
<p>Short term memory is the bottleneck of learning. We often overestimate our short term memory capacity. Modern estimates of short term memory capacity are 4 ± 1 chunks of knowledge. Short term memory capacity determines how you convert concept maps to lessons. Only once a concept map is decomposed into digestible chunks can you create the lesson. Count the concepts. Remember not to spoon-feed learners. Use your personas and learning stages to guide you.</p>
</section>
<section id="lesson-5-active-learning-beats-passive-learning-every-time" class="level2">
<h2 class="anchored" data-anchor-id="lesson-5-active-learning-beats-passive-learning-every-time">Lesson 5: Active Learning Beats Passive Learning Every Time</h2>
<p>Active learning results in better learning outcomes. However, most learners will prefer passive learning as that is how most were taught throughout their childhood education. For strategies on how to incorporate active learning into your lessons, explore <a href="https://www.learningscientists.org/">The Learning Scientists</a>.</p>
</section>
<section id="lesson-6-learner-are-not-robots" class="level2">
<h2 class="anchored" data-anchor-id="lesson-6-learner-are-not-robots">Lesson 6: Learners Are Not Robots</h2>
<p>For most learners, the most important factor for success is <strong>intrinsic motivation</strong>. “I’m learning this because I want to.” You can increase the intrinsic motivation of your students by increasing self-efficacy, utility, and community. Formative assessments increase self-efficacy by giving learners some control over the pace of learning. Utility requires that learners can apply new concepts soon after learning them. A community of learners who want to learn will build upon each other. Demotivators are unpredictability, unfairness, and indifference. Being an ally to your learners can make a big difference in student motivation. Valerie Aurora covers <a href="https://frameshiftconsulting.com/ally-skills-workshop/">how to be a good ally</a>.</p>


<script type="module" src="https://js.withorbit.com/orbit-web-component.js"></script>


<orbit-reviewarea color="purple"> <orbit-prompt question="What do novices lack when learning new concepts?" answer="mental model"></orbit-prompt> <orbit-prompt cloze="As a teacher, your role is to {guide} novices and {mentor} competent practitioners."></orbit-prompt> <orbit-prompt question="What makes experts bad teachers?" answer="struggle to explain thinking"></orbit-prompt> <orbit-prompt question="What feeling does guiding a competent practitioner evoke?" answer="frustration"></orbit-prompt> <orbit-prompt question="When instructing an expert in a subject, what do you want them to do?" answer="**reflect** on their thinking"></orbit-prompt> <orbit-prompt question="What feature must a formative assessment have in order to guide your instruction?" answer="diagnostic power"></orbit-prompt> <orbit-prompt question="What is the most important factor in whether you learn something?" answer="intrinsic motivation"></orbit-prompt> <orbit-prompt question="Great teachers can influence the *intrinsic motivation* of their students by increasing what three things?" answer="self-efficacy, utility, community"></orbit-prompt> <orbit-prompt question="How many elements should you select from a concept map to put into a lesson?" answer="No more than 7"></orbit-prompt> <orbit-prompt question="What do you do after building a concept map to create teaching curricula?" answer="isolate digestible elements"></orbit-prompt> <orbit-prompt question="What is the bottleneck to learning?" answer="short term memory"></orbit-prompt> </orbit-reviewarea>


</section>
<section id="miscellaneous-tips-and-tricks" class="level2">
<h2 class="anchored" data-anchor-id="miscellaneous-tips-and-tricks">Miscellaneous Tips and Tricks</h2>
<section id="what-can-you-do-to-help-disparate-learners" class="level3">
<h3 class="anchored" data-anchor-id="what-can-you-do-to-help-disparate-learners">What Can You Do to Help Disparate Learners?</h3>
<p>This is a common problem that is difficult to address. Here’s advice from Wilson on what you can try:</p>
<ol type="1">
<li>Avoid this if you can. Can you have different sessions based on prior knowledge?</li>
<li>Split the room.</li>
<li>Use advanced learners to teach less advanced learners. <strong>Note</strong>: This can backfire in corporate settings since everyone expects to be taught.</li>
<li>Use pair programming or another type of pairing. Pairs will be more homogeneous than individuals. People will realize that they are not alone in their struggles. Mismatched pairs can have their own student-teacher scenario where the “teacher” will also learn.</li>
<li>Synchronous self-paced work. You can learn at your own pace, but you have instructors and helpers in the room. This is experimental but an area of interest to Wilson.</li>
</ol>
</section>
<section id="pre-assessments" class="level3">
<h3 class="anchored" data-anchor-id="pre-assessments">Pre-Assessments</h3>
<section id="the-perils-of-pre-assessment" class="level4">
<h4 class="anchored" data-anchor-id="the-perils-of-pre-assessment">The Perils of Pre-Assessment</h4>
<p>Pre-assessments can scare novices away. Learners can feel that they are not “ready” for the curriculum. This selects for self-confidence more than prior knowledge and can disadvantage certain groups.</p>
</section>
<section id="the-problem-of-false-beginners" class="level4">
<h4 class="anchored" data-anchor-id="the-problem-of-false-beginners">The Problem of False Beginners</h4>
<p>Pre-assessments estimate learning pace poorly. For instance, if you offer a course in building visualizations with R to a group of experienced Python programmers, the assessment results will show very little prior knowledge about R. However, if you mix those programming experts with complete novices, they will learn at a drastically different pace.</p>
</section>
</section>
<section id="avoid-a-deficit-model-for-teaching" class="level3">
<h3 class="anchored" data-anchor-id="avoid-a-deficit-model-for-teaching">Avoid a Deficit Model for Teaching</h3>
<p>Do not use a deficit model for teaching. Do not require the people who already have a hard path to do more work to keep up. It is the duty of the privileged to do the extra work to level the playing field.</p>
</section>
<section id="online-interaction-is-the-future" class="level3">
<h3 class="anchored" data-anchor-id="online-interaction-is-the-future">Online Interaction is the Future</h3>
<p>Wilson believes that online courses are a dead end. They are at best a refresher for those who already know the material. Instead, try to use the web as a medium for real-time interactions. The big divide between learning types is not online versus in-person. The divide is interactive versus recorded. Building places for interactive learning is the future.</p>
</section>
<section id="other-advice-from-wilson" class="level3">
<h3 class="anchored" data-anchor-id="other-advice-from-wilson">Other Advice From Wilson</h3>
<ol type="1">
<li>Be kind: all else is details.</li>
<li>Remember that you are not your learners…</li>
<li>…that most people would rather fail than change…</li>
<li>…and that ninety percent of magic consists of knowing one extra thing.</li>
<li>Never teach alone.</li>
<li>Never hesitate to sacrifice truth for clarity.</li>
<li>Make every mistake a lesson.</li>
<li>Remember that no lesson survives first contact with learners…</li>
<li>…that every lesson is too short for the teacher and too long for the learner…</li>
<li>…and that nobody will be more excited about the lesson than you are.</li>
</ol>
</section>
<section id="where-to-go-next" class="level3">
<h3 class="anchored" data-anchor-id="where-to-go-next">Where to Go Next</h3>
<ul>
<li><a href="https://www.amazon.com/Small-Teaching-Everyday-Lessons-Learning/dp/1118944496">Small Teaching by James Lang</a> - What should you do if you know what the right thing to do is, but you don’t have the time or the budget?</li>
<li><a href="https://www.amazon.com/Teaching-What-You-Dont-Know-ebook/dp/B003N18V04/">Teaching What You Don’t Know by Therese Huston</a> - How do you teach if you are only a page ahead of your learners?</li>
<li><a href="https://www.amazon.com/How-Learning-Happens-Educational-Psychology-ebook/dp/B084RNK2Z9/">How Learning Happens by Kirschner &amp; Hendrick</a> - An opinionated introduction to the current state of learning. Wilson doesn’t recommend reading this cover to cover but as a reference in how to get started in a new area.</li>
<li><a href="https://www.amazon.com/Discussion-Book-Great-People-Talking/dp/1119049717">The Discussion Book by Brookfield &amp; Preskill</a> - A catalog of 50 different techniques to get people sharing information and making decisions.</li>
<li><a href="https://teachtogether.tech/en/index.html">Teaching Tech Together by Greg Wilson</a> - Wilson’s collected lessons on teaching.</li>
</ul>


</section>
</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p><a href="https://www.rstudio.com/resources/webinars/what-every-data-scientist-should-know-about-education/">What Every Data Scientist Should Know About Education - RStudio</a>↩︎</p></li>
<li id="fn2"><p><a href="https://youtu.be/ewXvFQByRqY">What Everyone in Tech Should Know About Teaching and Learning</a>↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>learning</category>
  <guid>https://jameshwade.com/posts/2022-02-01_how-to-teach-tech.html</guid>
  <pubDate>Tue, 01 Feb 2022 00:00:00 GMT</pubDate>
  <media:content url="https://jameshwade.com/posts/images/classroom.png" medium="image" type="image/png" height="102" width="144"/>
</item>
<item>
  <title>Notes on learning</title>
  <dc:creator>James H Wade</dc:creator>
  <link>https://jameshwade.com/posts/2022-01-30_begining.html</link>
  <description><![CDATA[ 





<p>I have an idea for lowering the barrier to learning new R packages. I have built a few private packages, but nothing of mine fits the bill for what I want to try. There are uncountable resources for getting started with R, and I do not have much to add there. If you defied the odds and found this before you found <a href="https://stat545.com/">STAT 545</a>, I highly recommend that course. My idea is to integrate three fantastic tools for thought:</p>
<ul>
<li>Spaced repetition with <a href="https://withorbit.com/">Orbit</a> created by <a href="https://andymatuschak.org/">Andy Matuschak</a></li>
<li>Building teaching curriculum for data science <a href="https://www.rstudio.com/resources/webinars/what-every-data-scientist-should-know-about-education/">with personas, concept maps, and thoughtful assessments</a> as taught by Greg Wilson.</li>
<li>Documentation following the <a href="https://diataxis.fr/">diataxis framework</a> popularized by <a href="https://youtu.be/t4vKPhjcMZg">Daniele Procida</a></li>
</ul>
<p>I plan to start with <a href="https://shiny.rstudio.com/">shiny</a> and work my way up to <a href="https://golemverse.org/">golem</a>. Both packages have fantastic documentation already, and I will learn from and be inspired by that content as I go. I have maintained a minimal website for a few years, but I’ve never had something to show off there. I plan to create in the open so you can learn from my progress if not from the learning content I create.</p>



 ]]></description>
  <category>learning</category>
  <guid>https://jameshwade.com/posts/2022-01-30_begining.html</guid>
  <pubDate>Sun, 30 Jan 2022 00:00:00 GMT</pubDate>
  <media:content url="https://jameshwade.com/posts/images/manistee.jpg" medium="image" type="image/jpeg"/>
</item>
</channel>
</rss>
