<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Prismadic]]></title><description><![CDATA[AI research. We're building a future where intelligence creates wealth for everyone.]]></description><link>https://prismadic.substack.com</link><image><url>https://substackcdn.com/image/fetch/$s_!zGlI!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cf52411-e693-4f7a-87df-bea8917d97e4_1024x1024.png</url><title>Prismadic</title><link>https://prismadic.substack.com</link></image><generator>Substack</generator><lastBuildDate>Fri, 03 Apr 2026 18:37:41 GMT</lastBuildDate><atom:link href="https://prismadic.substack.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Dylan]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[prismadic@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[prismadic@substack.com]]></itunes:email><itunes:name><![CDATA[Dylan]]></itunes:name></itunes:owner><itunes:author><![CDATA[Dylan]]></itunes:author><googleplay:owner><![CDATA[prismadic@substack.com]]></googleplay:owner><googleplay:email><![CDATA[prismadic@substack.com]]></googleplay:email><googleplay:author><![CDATA[Dylan]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Build a system for the future of AI research in 1 hour (Part 2, Control Plane & Memory)]]></title><description><![CDATA[In part 1 we built a cluster and enabled virtualized GPUs. 
In part 2 we will use this to power our systemic memory layer which LLMs can use to power the future of computing.]]></description><link>https://prismadic.substack.com/p/build-a-system-for-the-future-of</link><guid isPermaLink="false">https://prismadic.substack.com/p/build-a-system-for-the-future-of</guid><dc:creator><![CDATA[Dylan]]></dc:creator><pubDate>Thu, 14 Dec 2023 00:44:02 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!o_TL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2853650-441f-4498-b8c6-bb8d1e24e2d1.heic" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!o_TL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2853650-441f-4498-b8c6-bb8d1e24e2d1.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!o_TL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2853650-441f-4498-b8c6-bb8d1e24e2d1.heic 424w, https://substackcdn.com/image/fetch/$s_!o_TL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2853650-441f-4498-b8c6-bb8d1e24e2d1.heic 848w, https://substackcdn.com/image/fetch/$s_!o_TL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2853650-441f-4498-b8c6-bb8d1e24e2d1.heic 1272w, https://substackcdn.com/image/fetch/$s_!o_TL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2853650-441f-4498-b8c6-bb8d1e24e2d1.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!o_TL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2853650-441f-4498-b8c6-bb8d1e24e2d1.heic" width="462" height="462" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c2853650-441f-4498-b8c6-bb8d1e24e2d1.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:462,&quot;bytes&quot;:267927,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!o_TL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2853650-441f-4498-b8c6-bb8d1e24e2d1.heic 424w, https://substackcdn.com/image/fetch/$s_!o_TL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2853650-441f-4498-b8c6-bb8d1e24e2d1.heic 848w, https://substackcdn.com/image/fetch/$s_!o_TL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2853650-441f-4498-b8c6-bb8d1e24e2d1.heic 1272w, https://substackcdn.com/image/fetch/$s_!o_TL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2853650-441f-4498-b8c6-bb8d1e24e2d1.heic 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">In a future where data is so central to how we build, systems will organize this data as a function of how it is acquired.</figcaption></figure></div><p>Since Part 1, we&#8217;ve established that we want to be modern AI engineers on a linux box which we&#8217;ve turned into a headless cluster able to serve any virtualized needs we can think up.</p><p>There is no way to navigate this potentially complex environment without going back to square one methodologically (i.e. entering commands for every little thing). 
It will require you to accept some things and adapt to others, and your mileage may vary.</p><ol><li><p>The future of computing shouldn&#8217;t <em>typically</em> be limited by the interfaces of today.</p></li><li><p>AI research is costly in time before it&#8217;s particularly costly in terms of money. Though one can lead to another, time is what we&#8217;re looking to reinvest.</p></li><li><p>The pace of progress is due to work ethic among many people, all around the world, in a field defined by the bleeding edge. <strong>Operating systems of today are not built for this.</strong></p></li></ol><p>Agree? &#129763; Then you&#8217;re still reading the right series! &#129309;</p><p>Let&#8217;s get back to building the future, keeping in mind that these issues apply to everyone hoping to build something with AI at a pace which dignifies their creativity and likely technical aptitude.</p><div><hr></div><h2>&#128002; Rancher Control Plane UI</h2><p>First, we need to install <code>helm</code>. It doesn&#8217;t get its own section because it&#8217;s not a concept that needs explaining on its own; the Kubernetes manifest packages that <code>helm</code> calls <em>charts</em> are what you will fall asleep at your workstation over.</p><p>Let&#8217;s grab it, then install Rancher&#8217;s few dependencies.</p><pre><code><code>$ curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
$ chmod 700 get_helm.sh
$ ./get_helm.sh</code></code></pre><pre><code>$ helm repo add rancher-stable https://releases.rancher.com/server-charts/stable</code></pre><p>Create a namespace for Rancher in your local cluster, separate from the system namespaces:</p><pre><code>$ k3s kubectl create ns cattle-system</code></pre><p>Install cert-manager, which Rancher relies on for TLS/HTTPS: apply its CRDs first, then install the chart.</p><pre><code>$ k3s kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.2/cert-manager.crds.yaml
$ helm repo add jetstack https://charts.jetstack.io
$ helm repo update
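# Optional, not part of the original steps: list the cert-manager chart to confirm the jetstack repo was added.
$ helm search repo jetstack/cert-manager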
$ helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace</code></pre><p>Finally, it&#8217;s time to install Rancher.</p><pre><code>$ helm install rancher rancher-stable/rancher \
  --namespace cattle-system \
  --set hostname=YOUR_SSLIP_HOSTNAME \
  --set bootstrapPassword=admin</code></pre><p><code>YOUR_SSLIP_HOSTNAME</code> needs to be replaced with the following:</p><p><code>&lt;your host&#8217;s local IP (not your external IP)&gt;.sslip.io</code></p><p>unless you want to host this externally, in which case enter the domain you will use to reach your host remotely through your own networking setup. If you do, be sure to change the bootstrap password to something substantially more secure.</p><p>Now, check Rancher&#8217;s deployment progress (and probably wait a bit).</p><pre><code>$ k3s kubectl -n cattle-system rollout status deploy/rancher</code></pre><p>When the command reports that the <code>rancher</code> deployment has successfully rolled out, you should see your new control plane interface at the hostname you gave it, from a different machine on the local network.</p><p>Once you&#8217;ve logged in, you&#8217;ll be greeted with a less customized version of this:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XNMW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa661fb9b-e544-45c1-b120-eaf6dedc0019.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XNMW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa661fb9b-e544-45c1-b120-eaf6dedc0019.heic 424w, https://substackcdn.com/image/fetch/$s_!XNMW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa661fb9b-e544-45c1-b120-eaf6dedc0019.heic 848w, 
https://substackcdn.com/image/fetch/$s_!XNMW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa661fb9b-e544-45c1-b120-eaf6dedc0019.heic 1272w, https://substackcdn.com/image/fetch/$s_!XNMW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa661fb9b-e544-45c1-b120-eaf6dedc0019.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XNMW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa661fb9b-e544-45c1-b120-eaf6dedc0019.heic" width="1456" height="772" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a661fb9b-e544-45c1-b120-eaf6dedc0019.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:772,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:81291,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XNMW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa661fb9b-e544-45c1-b120-eaf6dedc0019.heic 424w, https://substackcdn.com/image/fetch/$s_!XNMW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa661fb9b-e544-45c1-b120-eaf6dedc0019.heic 848w, 
https://substackcdn.com/image/fetch/$s_!XNMW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa661fb9b-e544-45c1-b120-eaf6dedc0019.heic 1272w, https://substackcdn.com/image/fetch/$s_!XNMW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa661fb9b-e544-45c1-b120-eaf6dedc0019.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">This is your new operating system&#8217;s UI. 
It is only an engineering dashboard for Kubernetes if you do not know what you want already. We do.</figcaption></figure></div><p>Congrats! This is an enviable development cluster UX already, but it needs workload context.</p><div><hr></div><h2>&#128170; GPU Workloads in Kubernetes</h2><p><strong>Kubernetes</strong> uses manifests made of YAML (Yaml Ain&#8217;t Markup Language)<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>.</p><p>For this reason, we must first install the Kubernetes <code>ResourceDefinitions</code> and <code>ResourceClasses</code> for CUDA-enabled workloads which require driver-level GPU access<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>.</p><p>As I said before, we no longer need the CLI.</p><div class="image-gallery-embed" data-attrs="{&quot;gallery&quot;:{&quot;images&quot;:[{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/002e4ddd-4ea8-4958-b31b-573d2dd34159_848x834.png&quot;},{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/253488eb-857c-4993-a800-79f30c2cdbcf_588x412.png&quot;},{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d44d95f4-6295-4307-b394-49e1487be71f_1504x1136.png&quot;}],&quot;caption&quot;:&quot;Add the two Nvidia helm chart repositories we will be using by following nav steps in this gallery.&quot;,&quot;alt&quot;:&quot;&quot;,&quot;staticGalleryImage&quot;:{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/817df139-6bdb-40ee-b902-7577b1676a51_1456x474.png&quot;}},&quot;isEditorNode&quot;:true}"></div><p>1&#65039;&#8419; For the first repository, enter these 
details:</p><p><strong>name</strong>: <code>nvdp</code></p><p><strong>url</strong>: <code>https://nvidia.github.io/k8s-device-plugin</code></p><p>2&#65039;&#8419; And for the second, enter these:</p><p><strong>name</strong>: <code>nvgfd</code></p><p><strong>url</strong>: <code>https://nvidia.github.io/gpu-feature-discovery</code></p><p>Once you&#8217;re done adding the repositories, you will be able to find them in the Chart search.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Aciq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6a0de78-37ad-4f2e-968f-0272ffee5ba3.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Aciq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6a0de78-37ad-4f2e-968f-0272ffee5ba3.heic 424w, https://substackcdn.com/image/fetch/$s_!Aciq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6a0de78-37ad-4f2e-968f-0272ffee5ba3.heic 848w, https://substackcdn.com/image/fetch/$s_!Aciq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6a0de78-37ad-4f2e-968f-0272ffee5ba3.heic 1272w, https://substackcdn.com/image/fetch/$s_!Aciq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6a0de78-37ad-4f2e-968f-0272ffee5ba3.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Aciq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6a0de78-37ad-4f2e-968f-0272ffee5ba3.heic" width="1456" 
height="395" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f6a0de78-37ad-4f2e-968f-0272ffee5ba3.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:395,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:44300,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Aciq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6a0de78-37ad-4f2e-968f-0272ffee5ba3.heic 424w, https://substackcdn.com/image/fetch/$s_!Aciq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6a0de78-37ad-4f2e-968f-0272ffee5ba3.heic 848w, https://substackcdn.com/image/fetch/$s_!Aciq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6a0de78-37ad-4f2e-968f-0272ffee5ba3.heic 1272w, https://substackcdn.com/image/fetch/$s_!Aciq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6a0de78-37ad-4f2e-968f-0272ffee5ba3.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!15_w!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f9a4b6e-c129-4765-a3dc-10394de32970.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!15_w!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f9a4b6e-c129-4765-a3dc-10394de32970.heic 424w, https://substackcdn.com/image/fetch/$s_!15_w!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f9a4b6e-c129-4765-a3dc-10394de32970.heic 848w, 
https://substackcdn.com/image/fetch/$s_!15_w!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f9a4b6e-c129-4765-a3dc-10394de32970.heic 1272w, https://substackcdn.com/image/fetch/$s_!15_w!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f9a4b6e-c129-4765-a3dc-10394de32970.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!15_w!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f9a4b6e-c129-4765-a3dc-10394de32970.heic" width="1456" height="379" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0f9a4b6e-c129-4765-a3dc-10394de32970.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:379,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:41302,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!15_w!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f9a4b6e-c129-4765-a3dc-10394de32970.heic 424w, https://substackcdn.com/image/fetch/$s_!15_w!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f9a4b6e-c129-4765-a3dc-10394de32970.heic 848w, 
https://substackcdn.com/image/fetch/$s_!15_w!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f9a4b6e-c129-4765-a3dc-10394de32970.heic 1272w, https://substackcdn.com/image/fetch/$s_!15_w!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f9a4b6e-c129-4765-a3dc-10394de32970.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Go ahead and install both of these with the registered defaults. </p><p>Hard to tell what they&#8217;ve done when done. 
Here are the two namespaces the <strong>Nvidia</strong> <code>helm</code> charts made.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QqZv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf15134b-4e7d-4789-996d-066cfa76397a.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QqZv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf15134b-4e7d-4789-996d-066cfa76397a.heic 424w, https://substackcdn.com/image/fetch/$s_!QqZv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf15134b-4e7d-4789-996d-066cfa76397a.heic 848w, https://substackcdn.com/image/fetch/$s_!QqZv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf15134b-4e7d-4789-996d-066cfa76397a.heic 1272w, https://substackcdn.com/image/fetch/$s_!QqZv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf15134b-4e7d-4789-996d-066cfa76397a.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QqZv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf15134b-4e7d-4789-996d-066cfa76397a.heic" width="580" height="366" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/df15134b-4e7d-4789-996d-066cfa76397a.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:366,&quot;width&quot;:580,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:19314,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QqZv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf15134b-4e7d-4789-996d-066cfa76397a.heic 424w, https://substackcdn.com/image/fetch/$s_!QqZv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf15134b-4e7d-4789-996d-066cfa76397a.heic 848w, https://substackcdn.com/image/fetch/$s_!QqZv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf15134b-4e7d-4789-996d-066cfa76397a.heic 1272w, https://substackcdn.com/image/fetch/$s_!QqZv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf15134b-4e7d-4789-996d-066cfa76397a.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 
7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">and in those namespaces are resources that thankfully, we don&#8217;t need to pay a lot of attention to for our purposes</figcaption></figure></div><p><br>Now that it&#8217;s possible to talk to our GPUs through the <strong>Kubernetes</strong> cluster control plane: it&#8217;s time to use them, and in a way we are familiar with.</p><p>Back to <strong>Jupyter</strong> Notebooks (nothing is lost in our goal to engineer a better system, it&#8217;s just better)!</p><div><hr></div><h2>&#128221; One to Jupyterhub</h2><p>We&#8217;re going to play on these words: <a href="https://z2jh.jupyter.org/en/stable/">Zero to Jupyterhub</a> because well, we&#8217;ve skipped ahead a bit.</p><p>Similar to how you added the <strong>Nvidia</strong> repositories, let&#8217;s configure the <code>jupyterhub</code> one. </p><p><strong>name</strong>: <code>jupyterhub</code></p><p><strong>url</strong>: <code>https://hub.jupyter.org/helm-chart/</code></p><p>And just like before, go ahead and install &#8212; <strong>WAIT</strong>. 
Don&#8217;t install just yet. </p><p>First, do two things, each adapted to your own requirements:</p><ol><li><p>Create a namespace.</p></li><li><p>Edit your <strong>JupyterHub</strong> <code>values.yaml</code>.</p></li></ol><p>Creating a namespace is easy. I&#8217;ll let you do that. </p><p>Your <strong>JupyterHub</strong> configuration changes significantly under two YAML keys: <code>hub</code> and <code>singleuser</code>.</p><p>First, use the <code>config</code> block under <code>hub</code> to <a href="https://z2jh.jupyter.org/en/stable/jupyterhub/customizing/user-management.html#admin-users">set up authentication</a>. The dummy authenticator used here accepts any username with one shared password, so treat it as a testing setup. Easy enough; leave the hub application&#8217;s own image settings alone and have faith that Kubernetes will know what to do with them.</p><pre><code>hub:
  ...
  config:
    JupyterHub:
      admin_access: true
      authenticator_class: dummy
    Authenticator:
      admin_users:
        - &lt;your account name&gt;
    DummyAuthenticator:
      password: &lt;desired password&gt;</code></pre><p>Next is the image that is pulled and then used as your <strong>JupyterLab</strong> environment. As you may know, <strong>CUDA</strong> must be installed anywhere <code>torch</code> and the like will be loaded. We&#8217;re going to use a pre-built image that has already done that work by extending the single-user Jupyter image that the Helm chart uses by default.</p><p>In essence, it&#8217;s a copy of the official single-user Jupyter image with CUDA pre-installed.</p><pre><code>singleuser:
  ...
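  # Hedged sketch, not part of the original post: request one GPU per user pod.
  # Assumes the cluster advertises the nvidia.com/gpu resource (for example via
  # the NVIDIA device plugin set up in Part 1). Skip this if it does not, and
  # merge it into any extraResource block your values.yaml already contains.
  extraResource:
    limits:
      nvidia.com/gpu: "1"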
  image:
    name: cogstacksystems/jupyter-singleuser-gpu
    pullPolicy: IfNotPresent
    pullSecrets: []
    tag: latest</code></pre><p>There! When it installs, it will use this image to spin up new <strong>JupyterLab</strong> environments.</p><p>One last thing: we&#8217;re going to make sure the <code>proxy-public</code> service the chart creates does not clash with the Rancher UI. This should be done through the UI.</p><p>Navigate to <code>Service Discovery</code> &#8594; <code>Services</code> &#8594; scroll down &#8594; click <code>proxy-public</code> &#8594; swing over to the dropdown and click <code>Edit Config</code>, where you will change the listening port to&#8230; something else. We&#8217;re going to use <code>8888</code>.</p><p>You can now use the hostname/IP of the cluster to reach JupyterLab on this port. </p><p>In a production environment, this would be handled by an ingress with a custom domain configuration, so that you could deploy any number of services, each on its own subdomain. </p><div><hr></div><h2>Milvus</h2><p>We&#8217;re going to use a Kubernetes operator to construct our Milvus memory &#8220;subsystem&#8221;<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a> which contains our environment&#8217;s virtual context.</p><p>I must beg the reader&#8217;s pardon: you need to return to the command line once more. 
But do so through Rancher nonetheless and enter in this command.</p><div class="image-gallery-embed" data-attrs="{&quot;gallery&quot;:{&quot;images&quot;:[{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9d7efb07-8e46-4ace-bb0d-fc154b33492e_460x454.png&quot;},{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/be6c77da-e348-40a7-959b-40467ed9040b_674x356.png&quot;}],&quot;caption&quot;:&quot;&quot;,&quot;alt&quot;:&quot;&quot;,&quot;staticGalleryImage&quot;:{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/116cf12d-5189-4738-82f4-6a73d0fa088a_1456x720.png&quot;}},&quot;isEditorNode&quot;:true}"></div><pre><code><code>$ helm install milvus-operator \
  -n milvus-operator --create-namespace \
  --wait --wait-for-jobs \
  https://github.com/zilliztech/milvus-operator/releases/download/v0.8.4/milvus-operator-0.8.4.tgz</code></code></pre><p>Check that the operator installed correctly:</p><pre><code>$ kubectl get pods -n milvus-operator</code></pre><p>Among the resources the operator creates is a CRD (custom resource definition) for Milvus clusters. This should take just a moment. Once it exists, creating a new cluster to store information is as easy as saving the following as <code>llm_mem.yml</code>:</p><pre><code><code>apiVersion: milvus.io/v1beta1
kind: Milvus
metadata:
  name: llm-mem # feel free to edit this
  labels:
    app: milvus
spec:
  mode: cluster
  dependencies: {}
  components: {}
  config: {}</code></code></pre><p>Then apply it with <code>kubectl</code>, the same way we do the others:</p><pre><code>$ kubectl apply -f llm_mem.yml</code></pre><p>To validate that, we query the new <code>CRD</code> type in our <code>kubectl</code> command.</p><pre><code><code>$ kubectl get milvus &amp;&amp; kubectl get pods</code></code></pre><p><strong>NAME      MODE      STATUS    UPDATED   AGE</strong></p><p>llm-mem   cluster      Healthy      True              6m13s</p><p><code>&#8230;</code></p><p>Excellent! From there you should be able to see your <code>milvus</code> cluster booting up. This may also take a moment.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ozBe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe58cc4e-5185-42e1-9582-35e58c86b698_2204x1086.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ozBe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe58cc4e-5185-42e1-9582-35e58c86b698_2204x1086.jpeg 424w, https://substackcdn.com/image/fetch/$s_!ozBe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe58cc4e-5185-42e1-9582-35e58c86b698_2204x1086.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ozBe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe58cc4e-5185-42e1-9582-35e58c86b698_2204x1086.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!ozBe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe58cc4e-5185-42e1-9582-35e58c86b698_2204x1086.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!ozBe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe58cc4e-5185-42e1-9582-35e58c86b698_2204x1086.jpeg" width="1456" height="717" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/be58cc4e-5185-42e1-9582-35e58c86b698_2204x1086.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:717,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:&quot;Image&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image" title="Image" srcset="https://substackcdn.com/image/fetch/$s_!ozBe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe58cc4e-5185-42e1-9582-35e58c86b698_2204x1086.jpeg 424w, https://substackcdn.com/image/fetch/$s_!ozBe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe58cc4e-5185-42e1-9582-35e58c86b698_2204x1086.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ozBe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe58cc4e-5185-42e1-9582-35e58c86b698_2204x1086.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!ozBe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe58cc4e-5185-42e1-9582-35e58c86b698_2204x1086.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">You are here, in the center.</figcaption></figure></div><p>This diagram<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a> is particularly oversimplified. Those &#8220;parse&#8221; arrows are doing a lot of heavy lifting here. 
Thankfully, we have a short answer in the form of <a href="https://github.com/prismadic/hygiene">hygiene</a>, our tool to convert API responses and payloads into compressed natural language structures (bullet pointed lists and the like, while saving ~15% tokens on average).</p><p>We also don&#8217;t have a traditional computing interface <em>to</em> parse any of this information in a contained, secure way that is transparent and easy to debug. </p><p>Instead, we have a clustered computing environment capable of that.</p><div><hr></div><h2>In Part 3, we will build a custom inference API for designing your own way around problems with data with LLMs&#9889;&#65039;</h2><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>I parenthesize the acronym because it&#8217;s declarative. You cannot &#8220;run&#8221; yaml files. They merely describe your desired state in the context of virtual workloads and related hassle like networking, storage, etc. </p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>Do these steps in order, they are synchronous.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>It&#8217;s not a subsystem at all. 
It&#8217;s just another operator, but this fits into the notion that LLMs can power a diverse range of interfaces and we won&#8217;t know until we treat each interaction as a thread with meta-characteristics important to future needs (not a big stretch seeing as most chatbots store history for a reason).</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>https://memgpt.ai</p></div></div>]]></content:encoded></item><item><title><![CDATA[Build a system for the future of AI research in an hour (Part 1, Dependencies)]]></title><description><![CDATA[Imagine a blank computing environment powered by LLMs. Let's build a cluster that closely resembles what that might look/feel like in the hopes of improvising from established patterns.]]></description><link>https://prismadic.substack.com/p/engineer-a-system-for-the-future</link><guid isPermaLink="false">https://prismadic.substack.com/p/engineer-a-system-for-the-future</guid><dc:creator><![CDATA[Dylan]]></dc:creator><pubDate>Wed, 13 Dec 2023 16:01:04 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!o281!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd902396d-10a1-48de-837b-b01863193cab.heic" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!o281!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd902396d-10a1-48de-837b-b01863193cab.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!o281!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd902396d-10a1-48de-837b-b01863193cab.heic 424w, https://substackcdn.com/image/fetch/$s_!o281!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd902396d-10a1-48de-837b-b01863193cab.heic 848w, https://substackcdn.com/image/fetch/$s_!o281!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd902396d-10a1-48de-837b-b01863193cab.heic 1272w, https://substackcdn.com/image/fetch/$s_!o281!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd902396d-10a1-48de-837b-b01863193cab.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!o281!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd902396d-10a1-48de-837b-b01863193cab.heic" width="1024" height="1024" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d902396d-10a1-48de-837b-b01863193cab.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:188757,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!o281!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd902396d-10a1-48de-837b-b01863193cab.heic 424w, https://substackcdn.com/image/fetch/$s_!o281!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd902396d-10a1-48de-837b-b01863193cab.heic 848w, https://substackcdn.com/image/fetch/$s_!o281!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd902396d-10a1-48de-837b-b01863193cab.heic 1272w, https://substackcdn.com/image/fetch/$s_!o281!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd902396d-10a1-48de-837b-b01863193cab.heic 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">the goal is to create a new computing environment from an existing host which houses many LLMs regardless of their role(s) in research</figcaption></figure></div><p>The LLM space moves fast. As of writing this, most of the good, viral research is about one of two things: </p><ol><li><p>Emulating aspects of <strong>OpenAI&#8217;s</strong> <strong>ChatGPT</strong> architecture/performance</p></li><li><p>More efficient model paradigms</p></li></ol><p>Increasingly, peers I&#8217;ve spoken to place an emphasis on the sheer amount of testing and toying around with different solutions that is required. Not just A/B&#8217;ing models but the ability to control the underlying resources, fine-tuning from several versions of datasets at once, library version control, quantization comparison, the list goes on but it includes modularizing all of this. This doesn&#8217;t even address the time required to read the aforementioned research.</p><p>As a data scientist-turned-engineer, my role is characterized by a focus on availability, monitoring, and observability in the most efficient and performant way(s). People in more sales-heavy parts of the tech industry refer to this as &#8220;AIOps&#8221; (as &#8220;ML&#8221; becomes less used every day). If we dig a little deeper into the patterns we&#8217;re able to see culturally, it&#8217;s very likely <a href="https://twitter.com/karpathy/status/1707437820045062561">Karpathy is onto something when he suggests LLMs may be a new computing paradigm on the kernel-level</a> (as he usually is). Let&#8217;s adopt this philosophy and think of what we are building as the bare minimum for quality research. This architecture will define a fluid LLM-based computing platform which you can use to create, host, understand, interpret any down-stream use case given what we understand about LLMs today (including the momentum of change in their research).</p><p>In this article, we&#8217;re going to look over what about implementing large language models is so time-consuming in early stages, and how to tick the important boxes we&#8217;d usually reserve for production deployments in a way that meets the pace of the experimentation phase. </p><p>In short, it&#8217;s time to move away from doing everything in a Jupyter Notebook &#128556; Sorry, but it&#8217;s slowing you down! 
</p><div><hr></div><h2>&#9879;&#65039;Toolkit</h2><p>To do that, we&#8217;re building a production-grade environment and cutting just enough corners that the deployment covers only our initial needs.</p><p>We&#8217;re getting help from several important architectural &amp; design choices.</p><ul><li><p><a href="https://github.com/substratusai/vllm-docker/blob/main/k8s-deployment.yaml">kubernetes</a></p><ul><li><p>declarative orchestration engine for clustered, containerized infrastructure</p></li><li><p>and we&#8217;ll use <code>k3s</code>, a lightweight version of Kubernetes that preserves production concepts</p></li></ul></li><li><p><a href="https://github.com/NVIDIA/nvidia-container-toolkit">nvidia-container-toolkit</a></p><ul><li><p>GPU access for Docker containers</p></li></ul></li><li><p><a href="https://github.com/vllm-project/vllm">vLLM</a></p><ul><li><p><a href="https://github.com/substratusai/vllm-docker/pull/5">the recent ability to use quantization flags in prebuilt images with CUDA 12.x installed on an Ubuntu host</a> makes this very straightforward</p></li></ul></li><li><p><a href="https://github.com/milvus-io/milvus">milvus</a></p><ul><li><p>OSS, performant, scalable vector DB with great team(s) behind the product</p></li></ul></li><li><p><a href="https://github.com/Prismadic/magnet">magnet</a> &amp; <a href="https://github.com/Prismadic/hygiene">hygiene</a></p><ul><li><p>tools we made at <a href="https://prismadic.ai/developers">Prismadic</a> to help with LLM-specific tasks &amp; routines</p></li></ul></li><li><p><a href="https://haystack.deepset.ai/overview/quick-start">haystack</a></p><ul><li><p>similar to <em>langchain</em>, but includes well thought-out design patterns for scalability.</p></li></ul></li><li><p>GPUs with CUDA compute capability 8.0 or higher</p><ul><li><p>The T4 found on AWS g4dn instances, as an example, only reaches compute capability 7.5. 
g5.xlarge is the most affordable type of instance that has GPUs capable of performing modern LLM research tasks available on AWS in regular quantities.</p></li></ul></li></ul><p>These pieces put together create an extremely configurable and adaptable environment to test <em>n</em> models for most downstream tasks quickly. </p><div><hr></div><h3>&#9881;&#65039; Configuring your host for virtualizing GPU workloads </h3><p>Assuming you have <strong>CUDA</strong> enabled on an Ubuntu 22 machine, let&#8217;s install some necessary components you might not have already as an ML researcher so that we can begin turning this computer into a zombie that exists just to serve LLMs.</p><pre><code>$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
$ echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list &gt; /dev/null
$ sudo apt-get install apt-transport-https ca-certificates curl gnupg lsb-release -y
$ sudo apt-get update
$ sudo apt-get install docker-ce docker-ce-cli containerd.io -y
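# Optional sanity check, an addition rather than one of the original steps:
# run Docker's public hello-world image to confirm the daemon can pull and
# start containers. sudo is still needed here because the docker group
# membership is only added in the next command.
$ sudo docker run --rm hello-world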
$ sudo usermod -aG docker $USER</code></pre><p>There, we&#8217;re just installing the container tooling and then adding ourselves to the <code>docker</code> group (log out and back in for the group change to take effect), so moving on!</p><p>Next, install <code>nvidia-container-toolkit</code>:</p><pre><code>$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt-get update &amp;&amp; sudo apt-get install -y nvidia-container-toolkit
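# Optional sketch, not one of the original steps: register the NVIDIA runtime
# in Docker's daemon config before the restart below. This assumes the
# installed toolkit is recent enough to ship the nvidia-ctk CLI; if the
# command is missing, skip it (the --gpus flag used later relies only on the
# toolkit being installed).
$ sudo nvidia-ctk runtime configure --runtime=docker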
$ sudo systemctl restart docker</code></pre><p>(It&#8217;s always handy to do a quick <code>nvidia-smi</code> at times like this to make sure your drivers and <strong>CUDA</strong> libs are working as expected.)</p><p>When this has finished installing, Docker will be able to use NVIDIA&#8217;s container runtime (a thin wrapper around the default <code>runc</code>) to expose the host GPUs inside <strong>docker</strong> containers. You can think of it as the container counterpart of hardware passthrough for VMs on bare metal.</p><h4>&#129514; Test your GPU with <code>vLLM</code></h4><p>Minor changes to existing tools go a long way to pave a path of least resistance.</p><p><strong>vLLM</strong> has good documentation for creating your own image, but this requires that your <strong>CUDA</strong> version matches that of the <strong>PyTorch</strong> used in the image to build the vllm package from <code>pip</code>, which can present deployment hurdles on hosts where other development already depends on an existing <strong>CUDA</strong> installation. 
<code>conda</code> addresses some of this, but it&#8217;s not great for clusters like Kubernetes, which puts us back to square one.</p><p>Instead of building their image, we&#8217;re going to <a href="https://github.com/substratusai/vllm-docker">grab a pre-built one</a> that we contributed to before writing this article, to prepare for the quantization requirement of our deployment.</p><p>Since this is already addressed, it&#8217;s as simple as:</p><pre><code>$ sudo docker run -d -p 8080:8080 --gpus=all -e MODEL=TheBloke/Mistral-7B-Instruct-v0.1-AWQ -e QUANTIZATION=awq -e DTYPE=half ghcr.io/substratusai/vllm</code></pre><p><code>docker</code> will then:</p><ul><li><p>publish port <code>8080</code> to the application</p></li><li><p>expose to the container all of your GPUs which are discoverable by <strong>CUDA</strong></p><ul><li><p>the prerequisite for distributed inference!</p></li></ul></li><li><p>tell the server to load a quantized model from <strong>TheBloke</strong> on <strong>huggingface</strong></p></li><li><p>use the image&#8217;s new <code>QUANTIZATION</code> and <code>DTYPE</code> flags to serve a non-standard model</p></li></ul><p>(You may find some of this useful if you&#8217;d like to build a <code>vllm</code> image yourself instead, but there&#8217;s really no advantage to this, as you&#8217;ll find out later when we deploy the Python API.)</p><p>You can stop this container now with <code>sudo docker container stop &lt;id&gt;</code>, as we&#8217;re going to deploy this as a pod with a <strong>Kubernetes</strong> manifest, which will enable reliable availability with little effort. The image is useful to k8s too though, so don&#8217;t remove it once pulled. </p><div><hr></div><h3>&#9784;&#65039; Kubernetes</h3><p><strong>Kubernetes</strong> eludes the research space and it&#8217;s no wonder why: it&#8217;s a lot of difficult abstraction. This article is about building rather than philosophy, though. 
Besides, for some reason it garners pretty intense reactions (and I&#8217;m not totally sure why&#8230;)</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7EqY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f6a3161-31ae-4265-842d-8ad88e7a0bf6.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7EqY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f6a3161-31ae-4265-842d-8ad88e7a0bf6.heic 424w, https://substackcdn.com/image/fetch/$s_!7EqY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f6a3161-31ae-4265-842d-8ad88e7a0bf6.heic 848w, https://substackcdn.com/image/fetch/$s_!7EqY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f6a3161-31ae-4265-842d-8ad88e7a0bf6.heic 1272w, https://substackcdn.com/image/fetch/$s_!7EqY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f6a3161-31ae-4265-842d-8ad88e7a0bf6.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7EqY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f6a3161-31ae-4265-842d-8ad88e7a0bf6.heic" width="432" height="453.744966442953" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9f6a3161-31ae-4265-842d-8ad88e7a0bf6.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1252,&quot;width&quot;:1192,&quot;resizeWidth&quot;:432,&quot;bytes&quot;:117253,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7EqY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f6a3161-31ae-4265-842d-8ad88e7a0bf6.heic 424w, https://substackcdn.com/image/fetch/$s_!7EqY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f6a3161-31ae-4265-842d-8ad88e7a0bf6.heic 848w, https://substackcdn.com/image/fetch/$s_!7EqY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f6a3161-31ae-4265-842d-8ad88e7a0bf6.heic 1272w, https://substackcdn.com/image/fetch/$s_!7EqY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f6a3161-31ae-4265-842d-8ad88e7a0bf6.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1Qf6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71679de7-35ab-44b7-8b0b-d57b2fd79102.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1Qf6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71679de7-35ab-44b7-8b0b-d57b2fd79102.heic 424w, https://substackcdn.com/image/fetch/$s_!1Qf6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71679de7-35ab-44b7-8b0b-d57b2fd79102.heic 848w, https://substackcdn.com/image/fetch/$s_!1Qf6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71679de7-35ab-44b7-8b0b-d57b2fd79102.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!1Qf6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71679de7-35ab-44b7-8b0b-d57b2fd79102.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1Qf6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71679de7-35ab-44b7-8b0b-d57b2fd79102.heic" width="434" height="144.1762711864407" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/71679de7-35ab-44b7-8b0b-d57b2fd79102.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:392,&quot;width&quot;:1180,&quot;resizeWidth&quot;:434,&quot;bytes&quot;:28137,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1Qf6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71679de7-35ab-44b7-8b0b-d57b2fd79102.heic 424w, https://substackcdn.com/image/fetch/$s_!1Qf6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71679de7-35ab-44b7-8b0b-d57b2fd79102.heic 848w, https://substackcdn.com/image/fetch/$s_!1Qf6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71679de7-35ab-44b7-8b0b-d57b2fd79102.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!1Qf6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71679de7-35ab-44b7-8b0b-d57b2fd79102.heic 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>In any case, let&#8217;s install <code>k3s</code> &amp; <code>kubectl</code> so that we aren&#8217;t left behind.</p><pre><code>$ curl -sfL https://get.k3s.io | sh -
$ curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
$ chmod +x kubectl
$ sudo mv kubectl /usr/local/bin/</code></pre><p>Running <code>k3s kubectl cluster-info</code> should now print a healthy <strong>Kubernetes</strong> cluster info dump.</p><p>Next, to quickly deploy a quantized version of the <code>mistral-7b-instruct</code> model in your new <strong>Kubernetes</strong> cluster, create a file called <code>vllm.yaml</code> and paste in the following:</p><pre><code>apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm
  labels:
    app: vllm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm
  template:
    metadata:
      labels:
        app: vllm
    spec:
      containers:
      - name: vllm
        image: ghcr.io/substratusai/vllm:latest
        ports:
        - containerPort: 8080
        env:
        - name: MODEL
          value: "TheBloke/Mistral-7B-Instruct-v0.1-AWQ"
        - name: QUANTIZATION
          value: "awq"
        - name: DTYPE
          value: "half"
        volumeMounts:
        - mountPath: /dev/shm
          name: dshm
        readinessProbe:
          httpGet:
            path: /docs
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
        resources:
          limits:
            nvidia.com/gpu: "1" 
      volumes:
      - name: dshm
        emptyDir:
          medium: Memory
          sizeLimit: 1Gi</code></pre><p>Note: we set <code>resources.limits.nvidia.com/gpu</code> to &#8220;1&#8221; because each model is assigned to a single GPU and the VRAM we have allotted to it (and 14GB is required for the model alone, before any inference or embeddings from a vector DB are in memory!)</p><p>Now, it&#8217;s as simple as:</p><pre><code>$ k3s kubectl apply -f vllm.yaml</code></pre><p>&#127881; You should now have a highly scalable, virtualized, containerized deployment for serving not just quantized models, but almost any model on dedicated GPU compute. (<strong>Nvidia</strong> has great documentation on time-slicing a GPU across different workloads, but that is outside the scope of this series &amp; indeed probably beyond the needs of many individuals.)</p><p>Feel free to validate this with <code>k3s kubectl get deployments</code>.</p><p><code>vllm</code> takes care of queuing, cloning the <strong>OpenAI</strong> API, and pretty much everything else you need to get started, so you can worry less about the compute-heavy parts of the work. In that spirit, we aren&#8217;t going to worry about it much either (but their documentation is great, and the next tool we&#8217;re about to set up pairs nicely with <code>vllm</code> specifically!)</p><p>Okay, now remove it with:</p><pre><code>$ k3s kubectl delete -f vllm.yaml</code></pre><div><hr></div><h3>Because in part &#9996;&#65039; of this series, we&#8217;re going to create the &#8220;Control Panel&#8221; of our new LLM-chauvinist OS concept.</h3><p>(It&#8217;s literally a control plane for the cluster with some tweaks.)</p>]]></content:encoded></item></channel></rss>