DEV Community: Clelia (Astra) Bertelli. The latest articles on DEV Community by Clelia (Astra) Bertelli (@astrabert). https://dev.to/astrabert Build Code-RAGent, an agent for your codebase Clelia (Astra) Bertelli Tue, 29 Apr 2025 20:20:28 +0000 https://dev.to/astrabert/build-code-ragent-an-agent-for-your-codebase-5dh8 <h2> Introduction </h2> <p>Recently, I've been hooked on automating data ingestion into vector databases, and I came up with <a href="https://github.com/AstraBert/ingest-anything" rel="noopener noreferrer"><code>ingest-anything</code></a>, which I talked about in my <a href="https://dev.to/astrabert/ingest-almost-any-non-pdf-document-in-a-vector-database-effortlessly-547c">last post</a>.<br> After <a href="https://chonkie.ai" rel="noopener noreferrer">Chonkie</a> released <a href="https://chonkie.mintlify.app/chunkers/code-chunker" rel="noopener noreferrer"><code>CodeChunker</code></a>, I decided to include code ingestion within <code>ingest-anything</code>, and you can read about it in the LinkedIn post where I announced the new release:</p> <p><a href="https://www.linkedin.com/feed/update/urn:li:activity:7322236934569775104/" rel="noopener noreferrer"><img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn5raouf12c67d3jb7cfr.png" alt="image" width="800" height="991"></a></p> <p>The only thing left to do was to build something that could showcase the power of code ingestion
within a vector database, and it immediately clicked in my mind: "Why don't I ingest my entire codebase of solved Go exercises from <a href="https://exercism.org" rel="noopener noreferrer">Exercism</a>?"<br> That's how I created <a href="https://github.com/AstraBert/code-ragent" rel="noopener noreferrer">Code-RAGent</a>, your friendly coding assistant based on your personal codebases and grounded in web search. It is built on top of GPT-4.1 and powered by <a href="https://openai.com" rel="noopener noreferrer">OpenAI</a>, <a href="https://linkup.so" rel="noopener noreferrer">LinkUp</a>, <a href="https://www.llamaindex.ai" rel="noopener noreferrer">LlamaIndex</a>, <a href="https://qdrant.tech" rel="noopener noreferrer">Qdrant</a>, <a href="https://fastapi.tiangolo.com" rel="noopener noreferrer">FastAPI</a> and <a href="https://streamlit.io" rel="noopener noreferrer">Streamlit</a>. <br> This project aims to provide a reproducible and adaptable agent that people can customize based on their needs, and building it was composed of three phases:</p> <ul> <li>Environment setup</li> <li>Data preparation and ingestion</li> <li>Agent workflow design</li> </ul> <h2> Environment Setup </h2> <p>I personally like setting up my environment using <a href="https://docs.conda.io/projects/conda/en/latest/index.html" rel="noopener noreferrer">conda</a>, also because it's easily dockerizable, so we'll follow this path:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>conda create -y -n code-ragent python=3.11  # you don't necessarily need to specify 3.11, it's for reproducibility purposes
conda activate code-ragent
</code></pre> </div> <p>Now let's install all the needed packages within our
environment:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>python3 -m pip install ingest-anything streamlit
</code></pre> </div> <p><code>ingest-anything</code> already wraps all the packages we need to get our Code-RAGent up and running; we just need to add <code>streamlit</code>, which we'll use to create the frontend.</p> <p>Let's also get a local Qdrant instance, as our vector database, using Docker:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant:latest
</code></pre> </div> <h2> Data ingestion </h2> <p>The starting data, as I said earlier, will be my <a href="https://github.com/AstraBert/learning-go" rel="noopener noreferrer">learning-go</a> repository, which contains solved Go exercises from Exercism. We can get the repository by cloning it:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>git clone https://github.com/AstraBert/learning-go
</code></pre> </div> <p>Now we can collect all the Go files it contains in our Python script, as follows:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight python"><code>import os

files = []
for root, _, fls in os.walk("./learning-go"):
    for f in fls:
        if f.endswith(".go"):
            files.append(os.path.join(root, f))
</code></pre> </div> <p>Now let's ingest all the files with <code>ingest-anything</code>:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight python"><code>from ingest_anything.ingestion import IngestCode
from qdrant_client import QdrantClient, AsyncQdrantClient

client = QdrantClient("http://localhost:6333")
aclient = AsyncQdrantClient("http://localhost:6333")

ingestor = IngestCode(
    qdrant_client=client,
    async_qdrant_client=aclient,
    collection_name="go-code",
    hybrid_search=True,
)

vector_index = ingestor.ingest(
    files=files,
    embedding_model="Shuu12121/CodeSearch-ModernBERT-Owl",
    language="go",
)
</code></pre> </div> <p>And this is it: the collection <code>go-code</code> is now set up and available for search within Qdrant, so we can now get our hands on agent workflow design.</p> <h2> Agent workflow design </h2> <p>This is a visualization of the Code-RAGent workflow:</p> <p><a href="https://github.com/AstraBert/code-ragent" rel="noopener noreferrer"><img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5npo17mpuz5lh112fint.png" alt="workflow" width="800" height="450"></a><br> We won't go into the details of the code here, just the high-level concepts, but you can find everything in <a href="https://github.com/AstraBert/code-ragent/blob/main/agent.py" rel="noopener noreferrer">the GitHub repo</a>.</p> <h3> 1. 
Tools </h3> <p>We need three main tools:</p> <ul> <li> <strong><code>vector_search_tool</code></strong>, which searches the vector database using a <a href="https://docs.llamaindex.ai/en/stable/module_guides/deploying/query_engine/" rel="noopener noreferrer">LlamaIndex Query Engine</a> that first produces a hypothetical document embedding (<a href="https://medium.aiplanet.com/advanced-rag-improving-retrieval-using-hypothetical-document-embeddings-hyde-1421a8ec075a" rel="noopener noreferrer">HyDE</a>) and then matches it against the database using hybrid retrieval, producing a final summarized response.</li> <li> <strong><code>web_search_tool</code></strong>, which grounds solutions in web search: we exploit <a href="https://linkup.so" rel="noopener noreferrer">Linkup</a>, and we format the search results so that the tool always produces a code explanation and, when necessary, a code snippet.</li> <li> <strong><code>evaluate_response</code></strong>, which assigns correctness, faithfulness and relevancy scores to the agent's final response, based on the original user query and on the retrieved context (either from the web or from vector search). For this purpose, we use <a href="https://docs.llamaindex.ai/en/stable/optimizing/evaluation/evaluation/" rel="noopener noreferrer">LlamaIndex evaluators</a>.</li> </ul> <h3> 2. 
Designing and serving the agent </h3> <p>We use a simple and straightforward <a href="https://docs.llamaindex.ai/en/stable/examples/agent/agent_workflow_basic/" rel="noopener noreferrer">Function Calling Agent</a> within the Agent Workflow module in LlamaIndex, and we give the agent access to all the tools designed in point (1).</p> <p>Now it's just a matter of deploying the agent on an API endpoint, making it available to the frontend portion of our application: we do this via <a href="https://fastapi.tiangolo.com" rel="noopener noreferrer">FastAPI</a>, serving the agent under the <code>/chat</code> POST endpoint. </p> <h3> 3. User Interface </h3> <p>The UI, written with Streamlit, can be set up like this:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight python"><code>import streamlit as st
import requests as rq
from pydantic import BaseModel


class ApiInput(BaseModel):
    prompt: str


def get_chat(prompt: str):
    response = rq.post("http://backend:8000/chat/", json=ApiInput(prompt=prompt).model_dump())
    actual_res = response.json()["response"]
    actual_proc = response.json()["proces"]
    return actual_res, actual_proc


st.title("Code RAGent💻")

if "messages" not in st.session_state:
    st.session_state.messages = []

for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

if prompt := st.chat_input("What is up?"):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)
    with st.chat_message("assistant"):
        stream, proc = get_chat(
            prompt=st.session_state.messages[-1]["content"],
        )
        response = st.write(stream)
        st.session_state.messages.append({"role": "assistant", "content": stream})
        with st.expander("See Agentic Process"):
            st.write(proc)
</code></pre> </div> <p>And it will result in something like this:</p> <p><a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fug1tnjfwi5h2ezglr7fk.png" class="article-body-image-wrapper"><img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fug1tnjfwi5h2ezglr7fk.png" alt="UI" width="800" height="923"></a></p> <p>Clean and simple!</p> <h2> Conclusion </h2> <p>To wrap this article up, let me 
just highlight three main points that have the potential to make Code-RAGent a very good codebase assistant:</p> <ul> <li>The codebase is ingested with a dedicated pipeline, using code-aware chunking as well as a dense embedding model fine-tuned for code retrieval</li> <li>The agent can fall back on web search whenever the information you ask for is outside the scope of your ingested codebase</li> <li>It evaluates the responses it produces</li> </ul> <p>That being said, this is just a tutorial-ready agentic system, far from perfect, so if you have any feedback or suggestions, just let me know! ✨</p> ai python vectordatabase rag Ingest (almost) any non-PDF document in a vector database, effortlessly Clelia (Astra) Bertelli Fri, 25 Apr 2025 17:32:51 +0000 https://dev.to/astrabert/ingest-almost-any-non-pdf-document-in-a-vector-database-effortlessly-547c <p>One of my recent areas of focus has been the development of a universal, zero-effort way of converting text-based documents (and even images) into PDF files, so that they could fit into my RAG pipelines, which are optimized for that format. In the end, after almost 30 "This is the last <code>git commit</code>" moments, I came up with <a href="https://github.com/AstraBert/PdfItDown" rel="noopener noreferrer">PdfItDown</a>, a Python package capable of transforming the most commonly used file formats into PDF, and it can do so with single or multiple files (and even entire folders!). <br> After that, though, I wasn't satisfied: converting files to PDF is fine, but the converted files are still unplugged from the main <em>ingest-into-DB</em> pipeline, which might still take a lot of effort to design and optimize. 
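<p>To make the "single or multiple files (and even entire folders)" behavior concrete, here is a minimal, self-contained sketch of the kind of batch collection such a converter has to do. This is illustrative pure Python, not PdfItDown's actual API: the <code>SUPPORTED</code> set and the function name are hypothetical.</p>

```python
from pathlib import Path

# Hypothetical subset of text-based formats a PDF converter might accept
SUPPORTED = {".docx", ".md", ".csv", ".json", ".xml", ".html", ".txt"}

def collect_convertible(files_or_dir):
    """Accept a single path, a list of paths, or a directory,
    and return the files whose extension is supported."""
    if isinstance(files_or_dir, (str, Path)):
        p = Path(files_or_dir)
        paths = list(p.rglob("*")) if p.is_dir() else [p]
    else:
        paths = [Path(f) for f in files_or_dir]
    return sorted(p for p in paths if p.suffix.lower() in SUPPORTED)
```

<p>The same single-path/list/directory dispatch is what lets a tool like this feel "zero-effort" from the caller's side.</p>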
Then came the idea: why not create a standardized, simple yet powerful, fully automated procedure to go from a non-PDF file to vector data loaded into a database?<br> The tools were already there:</p> <ul> <li>PdfItDown can handle file transformation</li> <li> <a href="https://www.llamaindex.ai">LlamaIndex</a> has the readers to turn PDFs into text</li> <li> <a href="https://docs.chonkie.ai" rel="noopener noreferrer">Chonkie</a> offers a versatile and mighty chunking toolbox</li> <li> <a href="https://sbert.net" rel="noopener noreferrer">Sentence Transformers</a> is a widely used embeddings library that can provide text encoders</li> <li> <a href="https://qdrant.tech" rel="noopener noreferrer">Qdrant</a> is an easy-to-set-up, high-performing and scalable vector database that offers numerous features (including hybrid search and metadata filtering).</li> </ul> <blockquote> <p><em>What's even better? 
All these tools are open source!🎉</em></p> </blockquote> <p>So it was just a matter of combining them - and that's how <a href="https://github.com/AstraBert/ingest-anything" rel="noopener noreferrer">ingest-anything</a> came to life:</p> <p><a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj9r2un44yzcrnykpszc5.png" class="article-body-image-wrapper"><img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj9r2un44yzcrnykpszc5.png" alt="ingest-anything workflow" width="800" height="436"></a></p> <p>Simple, elegant and all-in-one!</p> <p>Let's see how we can use it to ingest files:</p> <ol> <li>We install it: </li> </ol> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>pip install ingest-anything
# or, if you prefer a faster installation
uv pip install ingest-anything
</code></pre> </div> <ol start="2"> <li>We set up a local Qdrant instance with Docker: </li> </ol> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant:latest
</code></pre> </div> <ol start="3"> <li>We initialize the ingestor: </li> </ol> <div class="highlight js-code-highlight"> <pre class="highlight python"><code>from ingest_anything.ingestion import IngestAnything, QdrantClient, AsyncQdrantClient

ingestor = IngestAnything(
    qdrant_client=QdrantClient("http://localhost:6333"),
    async_qdrant_client=AsyncQdrantClient("http://localhost:6333"),
    collection_name="flowers",
    hybrid_search=True,
)
</code></pre> </div> <ol start="4"> <li>We ingest our files...: </li> </ol> <div class="highlight js-code-highlight"> <pre class="highlight python"><code>ingestor.ingest(
    chunker="late",
    files_or_dir=[
        'tests/data/test.docx',
        'tests/data/test0.png',
        'tests/data/test1.csv',
        'tests/data/test2.json',
        'tests/data/test3.md',
        'tests/data/test4.xml',
        'tests/data/test5.zip',
    ],
    embedding_model="sentence-transformers/all-MiniLM-L6-v2",
)
</code></pre> </div> <ol start="5"> <li>...or an entire directory! </li> </ol> <div class="highlight js-code-highlight"> <pre class="highlight python"><code># with a directory
ingestor.ingest(
    chunker="token",
    files_or_dir="tests/data",
    tokenizer="gpt2",
    embedding_model="sentence-transformers/all-MiniLM-L6-v2",
)
</code></pre> </div> <p>And we're done! 
In three lines of code we've ingested all our non-PDF files (from a list or from a directory) into a Qdrant collection, which we can now query for RAG purposes!</p> <p>As you can see, you can act on several levels: you can customize the embedding model, the chunking method (check out the <a href="https://docs.chonkie.ai/chunkers/overview" rel="noopener noreferrer">Chonkie docs</a> for this), the tokenizer (when necessary) and a lot of chunking parameters (which you can set or leave as defaults), and you can also turn hybrid search on and off, optionally choosing the sparse model among the ones available through FastEmbed.</p> <p>So what are you waiting for? Grab your PC and try this out: I can guarantee that speedrunning effortlessly through document ingestion into a vector DB is highly satisfying (and addictive)!</p> python ai opensource vectordatabase 1minDocker #14 - Deploy an AI app with Docker on cloud Clelia (Astra) Bertelli Thu, 06 Mar 2025 22:59:09 +0000 https://dev.to/astrabert/1mindocker-14-deploy-an-ai-app-with-docker-on-cloud-h27 <p>In the <a href="https://dev.to/astrabert/1mindocker-13-push-build-and-dockerize-with-github-actions-52gb">last article</a> we dove into the world of continuous integration with GitHub Actions. Now it's time to take a step forward and talk about deploying a Docker application and making it available to everyone. 
To do this, we could exploit a local server, but local servers are usually costly to set up, initialize and maintain (in the long run): cloud solutions are, on the other hand, simpler and faster to boot and set up, especially the one we are going to use for this tutorial, <a href="https://www.linode.com/" rel="noopener noreferrer"><strong>Linode</strong></a>.</p> <h2> Step 1: your Linode instance </h2> <p>Setting up a Linode instance couldn't be easier: you just need to sign up or log in to <a href="https://www.linode.com/" rel="noopener noreferrer">Linode</a>.</p> <p>Once you land on your dashboard, you just need to click on <code>Create</code> (the green button in the top left corner) -&gt; <code>Linode</code>. You will then be prompted to select the settings of your instance (operating system, region, name, root password and, optionally, an SSH key). The setup is extremely intuitive and, for our application, I'd suggest:</p> <ul> <li>Choose Ubuntu 22.04 as the OS</li> <li>Choose a 2 GB RAM / 1 vCPU hardware plan</li> <li>Choose the region closest to you</li> <li>Choose a strong password for your root user</li> </ul> <p>Once your instance is booted and running, you can connect to it from your terminal. Whether you are on Windows, macOS or Linux (although I prefer the last one), you can simply use the SSH protocol and authenticate with the root password. 
To do so, you need to get the public IP address of your Linode (which will also be useful later) - you can comfortably find it in your dashboard.<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>ssh root@&lt;PUBLIC-IP-ADDRESS&gt; </code></pre> </div> <p>You'll be prompted to input the password and, after that, you'll finally be inside your Linode's terminal!</p> <h2> Step 2: preparing your Linode for the application </h2> <p>Since we want to deploy our application with Docker, we need to install it within our Linode virtual machine.</p> <p>If you followed my advice and created an Ubuntu 22.04 machine, you can simply run these commands, which you can also find on the official <a href="proxy.php?url=https://docs.docker.com/engine/install/ubuntu/" rel="noopener noreferrer">Docker installation page</a>:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code><span class="c"># Add Docker's official GPG key:</span> <span class="nb">sudo </span>apt-get update <span class="nb">sudo </span>apt-get <span class="nb">install </span>ca-certificates curl <span class="nb">sudo install</span> <span class="nt">-m</span> 0755 <span class="nt">-d</span> /etc/apt/keyrings <span class="nb">sudo </span>curl <span class="nt">-fsSL</span> https://download.docker.com/linux/ubuntu/gpg <span class="nt">-o</span> /etc/apt/keyrings/docker.asc <span class="nb">sudo chmod </span>a+r /etc/apt/keyrings/docker.asc <span class="c"># Add the repository to Apt sources:</span> <span class="nb">echo</span> <span class="se">\</span> <span class="s2">"deb [arch=</span><span class="si">$(</span>dpkg <span class="nt">--print-architecture</span><span class="si">)</span><span class="s2"> signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu </span><span class="se">\</span><span class="s2"> </span><span class="si">$(</span><span class="nb">.</span> /etc/os-release <span class="o">&amp;&amp;</span> <span
class="nb">echo</span> <span class="s2">"</span><span class="k">${</span><span class="nv">UBUNTU_CODENAME</span><span class="k">:-</span><span class="nv">$VERSION_CODENAME</span><span class="k">}</span><span class="s2">"</span><span class="si">)</span><span class="s2"> stable"</span> | <span class="se">\</span> <span class="nb">sudo tee</span> /etc/apt/sources.list.d/docker.list <span class="o">&gt;</span> /dev/null <span class="nb">sudo </span>apt-get update </code></pre> </div> <p>After this, run:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code><span class="nb">sudo </span>apt-get <span class="nb">install </span>docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin </code></pre> </div> <p>Test the successful installation with:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code><span class="nb">sudo </span>docker run hello-world </code></pre> </div> <p>And BAM! You installed Docker🐋</p> <h2> Step 3: Get the application </h2> <p>For this tutorial, I already prepared an application for you: it's called <strong>SciNewsBot</strong> and it's a BlueSky bot which publishes daily science news from trusted publishers. </p> <p>SciNewsBot uses <a href="proxy.php?url=https://mistral.ai/" rel="noopener noreferrer">Mistral AI</a> to summarize the titles and content of news from Google News publishers labelled as trustworthy by Media Bias/Fact Check into effective, catchy headlines. The news items span four domains (Science, Environment, Energy and Technology), and are scraped and published 4 times a day, with a pause of 3 hours in between and a pause of 12 hours from the last news report of one day to the first of the following day. 
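</p> <p>That publishing cadence can be sketched as a tiny pure function returning the pause (in seconds) to observe after each of the day's posting rounds. This is an illustrative sketch only, not SciNewsBot's actual code, and <code>publish_round</code> is a hypothetical placeholder:</p>

```python
def round_pauses(posts_per_day: int = 4,
                 between: int = 3 * 3600,
                 overnight: int = 12 * 3600) -> list:
    """Pauses after each posting round: 3 hours between
    rounds, then 12 hours after the last round of the day."""
    return [between] * (posts_per_day - 1) + [overnight]

# The bot's main loop would then look roughly like:
# while True:
#     for pause in round_pauses():
#         publish_round()  # hypothetical: scrape, summarize with Mistral, post to BlueSky
#         time.sleep(pause)
```

<p>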
You can see the bot working on <a href="proxy.php?url=https://bsky.app/profile/sci-news-bot.bsky.social" rel="noopener noreferrer">this page</a>.</p> <p>So, from within your Linode instance (which you connected to via SSH in the previous steps), clone the application from GitHub:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>git clone https://github.com/AstraBert/SciNewsBot.git <span class="nb">cd </span>SciNewsBot/ </code></pre> </div> <p>Now the only things you need to do are:</p> <ol> <li>Get a <a href="proxy.php?url=https://console.mistral.ai/api-keys" rel="noopener noreferrer">Mistral AI API key</a> (you can create one for free)</li> <li>Create a BlueSky user for your bot, which you can do <a href="proxy.php?url=https://bsky.app/" rel="noopener noreferrer">here</a> </li> <li>Edit your <code>.env.example</code> file, filling in the Mistral API key, the BlueSky username and password </li> <li>Rename the <code>.env.example</code> file to <code>.env</code> with the following command: </li> </ol> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code><span class="nb">mv</span> .env.example .env </code></pre> </div> <h2> Step 4: Deploy! </h2> <p>Now we're just one step away from deployment, and that step is launching our application through Docker. 
Let's take a look at the <code>compose.yaml</code> file that we have in the SciNewsBot folder:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight yaml"><code><span class="na">name</span><span class="pi">:</span> <span class="s">news-sci-bot</span> <span class="na">services</span><span class="pi">:</span> <span class="na">bot</span><span class="pi">:</span> <span class="na">build</span><span class="pi">:</span> <span class="na">context</span><span class="pi">:</span> <span class="s">./docker/</span> <span class="na">dockerfile</span><span class="pi">:</span> <span class="s">Dockerfile</span> <span class="na">secrets</span><span class="pi">:</span> <span class="pi">-</span> <span class="s">mistral_key</span> <span class="pi">-</span> <span class="s">bsky_usr</span> <span class="pi">-</span> <span class="s">bsky_psw</span> <span class="na">networks</span><span class="pi">:</span> <span class="pi">-</span> <span class="s">mynet</span> <span class="na">secrets</span><span class="pi">:</span> <span class="na">mistral_key</span><span class="pi">:</span> <span class="na">environment</span><span class="pi">:</span> <span class="s">mistral_api_key</span> <span class="na">bsky_usr</span><span class="pi">:</span> <span class="na">environment</span><span class="pi">:</span> <span class="s">bsky_username</span> <span class="na">bsky_psw</span><span class="pi">:</span> <span class="na">environment</span><span class="pi">:</span> <span class="s">bsky_password</span> <span class="na">networks</span><span class="pi">:</span> <span class="na">mynet</span><span class="pi">:</span> <span class="na">driver</span><span class="pi">:</span> <span class="s">bridge</span> </code></pre> </div> <p>This file creates a container from the <code>docker</code> subfolder we have, mounting within it three environment-derived secrets, i.e. the Mistral AI API key, the BlueSky username and the password. 
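</p> <p>Inside the container, the scripts can read each mounted secret from <code>/run/secrets</code>. A minimal sketch of such a reader in Python (the helper name is hypothetical; the actual SciNewsBot code may differ):</p>

```python
import os

def read_secret(name: str, base: str = "/run/secrets") -> str:
    """Return the contents of a Docker secret file,
    e.g. read_secret('mistral_key') inside the container."""
    with open(os.path.join(base, name), encoding="utf-8") as f:
        return f.read().strip()
```

<p>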
It then attaches the container to a network named <code>mynet</code>.</p> <p>Each of these secrets is accessible through the path <code>/run/secrets/&lt;secret_name&gt;</code>, and that's how we access them in our Python scripts (we read these files).</p> <p>Now, to deploy we just need to run:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>docker compose up <span class="nt">-d</span> </code></pre> </div> <blockquote> <p><em>Don't forget to put the <code>-d</code> option, otherwise you will kill the container execution once you exit from the Linode terminal: the <code>-d</code> option detaches the container execution from the main terminal, allowing you to close it without stopping the container.</em></p> </blockquote> <p>Congrats! You just deployed your first Docker application in the cloud!🎉</p> <p>We will stop here for this article, but in the next (and last) article we will see more complex deployment use cases and wrap up the 1minDocker journey: stay tuned and have fun!🥰</p> devops cloud ai tutorial 1minDocker #13 - Push, build and dockerize with GitHub Actions Clelia (Astra) Bertelli Thu, 23 Jan 2025 23:24:11 +0000 https://dev.to/astrabert/1mindocker-13-push-build-and-dockerize-with-github-actions-52gb https://dev.to/astrabert/1mindocker-13-push-build-and-dockerize-with-github-actions-52gb <p>In the <a href="proxy.php?url=https://dev.to/astrabert/1mindocker-12-what-is-cicd-2ap6">last article</a> we talked about CI/CD: but how do we put a CI/CD pipeline into practice in a hassle-free and very simple framework?</p> <p>Well, say no more! <strong>GitHub Actions</strong> has all that you need to set up a perfect environment to commit -&gt; build -&gt; dockerize your app. 
</p> <p>Let's dive in!</p> <blockquote> <p><em>Find the tutorial repo for this blog post <a href="proxy.php?url=https://github.com/AstraBert/hello-world-github-docker" rel="noopener noreferrer">here</a></em></p> </blockquote> <h2> Setting up - GitHub </h2> <p>We do not take anything for granted, so let's assume you don't have a GitHub account and you just want to start from scratch:</p> <ul> <li>Head over to <a href="proxy.php?url=https://github.com/signup" rel="noopener noreferrer">GitHub Signup</a> and register there with your email and password. You will also be asked to create a username</li> <li>Activating two-factor authentication (<a href="proxy.php?url=https://docs.github.com/en/authentication/securing-your-account-with-two-factor-authentication-2fa/configuring-two-factor-authentication" rel="noopener noreferrer">2FA</a>) is optional, but recommended</li> <li>Once you are signed in and set up, it's time to create your first repository!</li> </ul> <p>To create a repository, you generally have to click on a <code>New</code> or a <code>Create new repository</code> green button: you will be prompted to choose the visibility (whether the repo is public or private) and the name of the repository. I suggest that, for this tutorial, you create a <strong>public repository called <code>hello-world-github-docker</code></strong>. </p> <h2> Setting up - Application </h2> <p>Now let's build an application: we'll be using Python, a versatile programming language that you can use for lots of things, from building apps to data analysis, from creating websites to machine learning. </p> <p>Let's first of all clone the GitHub repository (i.e. 
make a local copy of it) with <code>git</code> (see how to install it <a href="proxy.php?url=https://git-scm.com/book/en/v2/Getting-Started-Installing-Git" rel="noopener noreferrer">here</a> if you don't have it)<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>git clone https://github.com/username/hello-world-github-docker <span class="c"># Remember to change 'username' to your actual username!</span> <span class="nb">cd </span>hello-world-github-docker/ </code></pre> </div> <p>And now, let's create and start editing our <code>app.py</code> file (<code>.py</code> is the extension that Python scripts have):<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code><span class="nb">touch </span>app.py code app.py </code></pre> </div> <p>The <code>touch</code> command creates the file, whereas the <code>code</code> command opens the file within Visual Studio Code (an IDE, <em>Integrated Development Environment</em>) to modify it. You can obviously use different IDEs: no IDE is better than another one, as long as it works for you.</p> <p>Generally, a "hello world" application prints "Hello world!" 
on the terminal, like this:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight python"><code><span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">Hello world!</span><span class="sh">"</span><span class="p">)</span> </code></pre> </div> <p>But we want to do something more: we don't like vanilla white text on our terminal, we want some 🌈color🌈.</p> <p>To do this, we simply need to install the <a href="proxy.php?url=https://pypi.org/project/termcolor/" rel="noopener noreferrer"><code>termcolor</code></a> package:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>pip <span class="nb">install </span>termcolor </code></pre> </div> <p>Now we just import and use it in our script:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight python"><code><span class="kn">from</span> <span class="n">termcolor</span> <span class="kn">import</span> <span class="n">cprint</span> <span class="c1"># cprint stands for "colored print", and allows us to print colored text </span> <span class="nf">cprint</span><span class="p">(</span><span class="sh">"</span><span class="s">Hello world!</span><span class="sh">"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="sh">"</span><span class="s">red</span><span class="sh">"</span><span class="p">)</span> </code></pre> </div> <p>Now the printed text will be red, but let's add some more spice, and let the program choose a random color to use in printing the string to the terminal:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight python"><code><span class="kn">from</span> <span class="n">termcolor</span> <span class="kn">import</span> <span class="n">cprint</span> <span class="kn">import</span> <span class="n">random</span> <span class="c1"># random is the library that allows us to extract random items from a list </span><span class="n">colors</span> <span
class="o">=</span> <span class="p">[</span><span class="sh">"</span><span class="s">red</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">green</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">blue</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">magenta</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">yellow</span><span class="sh">"</span><span class="p">]</span> <span class="n">color</span> <span class="o">=</span> <span class="n">random</span><span class="p">.</span><span class="nf">choice</span><span class="p">(</span><span class="n">colors</span><span class="p">)</span> <span class="nf">cprint</span><span class="p">(</span><span class="sh">"</span><span class="s">Hello world!</span><span class="sh">"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="n">color</span><span class="p">)</span> </code></pre> </div> <p>Now our script will randomly choose one of the colors and print the "Hello world!" in that color :)</p> <blockquote> <p><em><strong>NOTE</strong>: <code>random</code> is a built-in library in Python, which means it comes with the language: that's why we did not need to install it.</em></p> </blockquote> <h2> Setting up - Docker </h2> <p>Ok, now we have our application: how can we dockerize it?</p> <p>Our first option could be to write down a Dockerfile, build an image from that and push it to Docker Hub. That's a viable solution, but we would need to re-build the image on our local machine and push it to Docker Hub every time we make a change to the app. While our "Hello world" app is relatively small and that would not be a burden for our computer, we would surely want to avoid this with bigger applications. 
</p> <p>Another solution could be, as we said, writing the Dockerfile, and then uploading our application to GitHub and letting the platform take care of building and pushing the image to <code>ghcr</code>, i.e. <em>GitHub Container Registry</em>, a registry where the Docker images built on GitHub (also known as <em>packages</em>) are stored. </p> <p>Since our goal is to exploit GitHub Actions, let's do that!</p> <p>The first thing we have to do is to create a <code>requirements.txt</code> file that lists all the necessary dependencies to install inside our Docker image. In our case, we only need <code>termcolor</code>, so we can just do it like this:</p> <ul> <li>Create and open the file for editing: </li> </ul> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code><span class="nb">touch </span>requirements.txt code requirements.txt </code></pre> </div> <ul> <li>Write the <code>termcolor</code> package in the file: </li> </ul> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code>termcolor </code></pre> </div> <p>Now let's build our <code>Dockerfile</code>. 
We want our image to contain <code>python</code>, and we want it also to install the needed dependencies, so let's define it like this:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight docker"><code><span class="c"># Define your python version</span> <span class="k">ARG</span><span class="s"> PY_VERSION="3.11.9-slim-bookworm"</span> <span class="c"># Base image</span> <span class="k">FROM</span><span class="s"> python:${PY_VERSION}</span> <span class="c"># Define your working directory</span> <span class="k">WORKDIR</span><span class="s"> /app/</span> <span class="c"># Copy your local file system into the working directory</span> <span class="k">COPY</span><span class="s"> ./ /app/</span> <span class="c"># Install the necessary dependencies</span> <span class="k">RUN </span>pip cache purge <span class="k">RUN </span>pip <span class="nb">install</span> <span class="nt">--no-cache-dir</span> <span class="nt">-r</span> requirements.txt <span class="c"># Run the application as an entrypoint</span> <span class="k">ENTRYPOINT</span><span class="s"> python3 app.py</span> </code></pre> </div> <p>Now our local folder structure will look like this:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code>. |__ app.py |__ requirements.txt |__ Dockerfile </code></pre> </div> <p>The only thing that we need is to configure a workflow file that will trigger GitHub Actions and tell them to build and push the image. 
For this, we can use the pre-built template offered by GitHub.</p> <p>First of all, we need to create the file:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code><span class="c"># The file is placed in a special directory, .github/workflows</span> <span class="nb">mkdir</span> <span class="nt">-p</span> .github/workflows <span class="nb">touch</span> .github/workflows/docker-publish.yml code .github/workflows/docker-publish.yml </code></pre> </div> <p>As you can see, the workflow file is in YAML format, a human-readable data serialization format that allows you to specify all the steps you want in the build. Copy and paste the text below into the file you are now editing:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight yaml"><code><span class="na">name</span><span class="pi">:</span> <span class="s">Docker</span> <span class="c1"># This workflow uses actions that are not certified by GitHub.</span> <span class="c1"># They are provided by a third-party and are governed by</span> <span class="c1"># separate terms of service, privacy policy, and support</span> <span class="c1"># documentation.</span> <span class="na">on</span><span class="pi">:</span> <span class="na">schedule</span><span class="pi">:</span> <span class="pi">-</span> <span class="na">cron</span><span class="pi">:</span> <span class="s1">'</span><span class="s">40</span><span class="nv"> </span><span class="s">8</span><span class="nv"> </span><span class="s">*</span><span class="nv"> </span><span class="s">*</span><span class="nv"> </span><span class="s">*'</span> <span class="na">push</span><span class="pi">:</span> <span class="na">branches</span><span class="pi">:</span> <span class="pi">[</span> <span class="s2">"</span><span class="s">main"</span> <span class="pi">]</span> <span class="c1"># Publish semver tags as releases.</span> <span class="na">tags</span><span class="pi">:</span> <span class="pi">[</span> <span class="s1">'</span><span
class="s">v*.*.*'</span> <span class="pi">]</span> <span class="na">pull_request</span><span class="pi">:</span> <span class="na">branches</span><span class="pi">:</span> <span class="pi">[</span> <span class="s2">"</span><span class="s">main"</span> <span class="pi">]</span> <span class="na">env</span><span class="pi">:</span> <span class="c1"># Use docker.io for Docker Hub if empty</span> <span class="na">REGISTRY</span><span class="pi">:</span> <span class="s">ghcr.io</span> <span class="c1"># github.repository as &lt;account&gt;/&lt;repo&gt;</span> <span class="na">IMAGE_NAME</span><span class="pi">:</span> <span class="s">${{ github.repository }}</span> <span class="na">jobs</span><span class="pi">:</span> <span class="na">build</span><span class="pi">:</span> <span class="na">runs-on</span><span class="pi">:</span> <span class="s">ubuntu-latest</span> <span class="na">permissions</span><span class="pi">:</span> <span class="na">contents</span><span class="pi">:</span> <span class="s">read</span> <span class="na">packages</span><span class="pi">:</span> <span class="s">write</span> <span class="c1"># This is used to complete the identity challenge</span> <span class="c1"># with sigstore/fulcio when running outside of PRs.</span> <span class="na">id-token</span><span class="pi">:</span> <span class="s">write</span> <span class="na">steps</span><span class="pi">:</span> <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Checkout repository</span> <span class="na">uses</span><span class="pi">:</span> <span class="s">actions/checkout@v4</span> <span class="c1"># Install the cosign tool except on PR</span> <span class="c1"># https://github.com/sigstore/cosign-installer</span> <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Install cosign</span> <span class="na">if</span><span class="pi">:</span> <span class="s">github.event_name != 'pull_request'</span> <span 
class="na">uses</span><span class="pi">:</span> <span class="s">sigstore/cosign-installer@59acb6260d9c0ba8f4a2f9d9b48431a222b68e20</span> <span class="c1">#v3.5.0</span> <span class="na">with</span><span class="pi">:</span> <span class="na">cosign-release</span><span class="pi">:</span> <span class="s1">'</span><span class="s">v2.2.4'</span> <span class="c1"># Set up BuildKit Docker container builder to be able to build</span> <span class="c1"># multi-platform images and export cache</span> <span class="c1"># https://github.com/docker/setup-buildx-action</span> <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Set up Docker Buildx</span> <span class="na">uses</span><span class="pi">:</span> <span class="s">docker/setup-buildx-action@f95db51fddba0c2d1ec667646a06c2ce06100226</span> <span class="c1"># v3.0.0</span> <span class="c1"># Login against a Docker registry except on PR</span> <span class="c1"># https://github.com/docker/login-action</span> <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Log into registry ${{ env.REGISTRY }}</span> <span class="na">if</span><span class="pi">:</span> <span class="s">github.event_name != 'pull_request'</span> <span class="na">uses</span><span class="pi">:</span> <span class="s">docker/login-action@343f7c4344506bcbf9b4de18042ae17996df046d</span> <span class="c1"># v3.0.0</span> <span class="na">with</span><span class="pi">:</span> <span class="na">registry</span><span class="pi">:</span> <span class="s">${{ env.REGISTRY }}</span> <span class="na">username</span><span class="pi">:</span> <span class="s">${{ github.actor }}</span> <span class="na">password</span><span class="pi">:</span> <span class="s">${{ secrets.GITHUB_TOKEN }}</span> <span class="c1"># Extract metadata (tags, labels) for Docker</span> <span class="c1"># https://github.com/docker/metadata-action</span> <span class="pi">-</span> <span class="na">name</span><span 
class="pi">:</span> <span class="s">Extract Docker metadata</span> <span class="na">id</span><span class="pi">:</span> <span class="s">meta</span> <span class="na">uses</span><span class="pi">:</span> <span class="s">docker/metadata-action@96383f45573cb7f253c731d3b3ab81c87ef81934</span> <span class="c1"># v5.0.0</span> <span class="na">with</span><span class="pi">:</span> <span class="na">images</span><span class="pi">:</span> <span class="s">${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}</span> <span class="c1"># Build and push Docker image with Buildx (don't push on PR)</span> <span class="c1"># https://github.com/docker/build-push-action</span> <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Build and push Docker image</span> <span class="na">id</span><span class="pi">:</span> <span class="s">build-and-push</span> <span class="na">uses</span><span class="pi">:</span> <span class="s">docker/build-push-action@0565240e2d4ab88bba5387d719585280857ece09</span> <span class="c1"># v5.0.0</span> <span class="na">with</span><span class="pi">:</span> <span class="na">context</span><span class="pi">:</span> <span class="s">.</span> <span class="na">push</span><span class="pi">:</span> <span class="s">${{ github.event_name != 'pull_request' }}</span> <span class="na">tags</span><span class="pi">:</span> <span class="s">${{ steps.meta.outputs.tags }}</span> <span class="na">labels</span><span class="pi">:</span> <span class="s">${{ steps.meta.outputs.labels }}</span> <span class="na">cache-from</span><span class="pi">:</span> <span class="s">type=gha</span> <span class="na">cache-to</span><span class="pi">:</span> <span class="s">type=gha,mode=max</span> <span class="c1"># Sign the resulting Docker image digest except on PRs.</span> <span class="c1"># This will only write to the public Rekor transparency log when the Docker</span> <span class="c1"># repository is public to avoid leaking data. 
If you would like to publish</span> <span class="c1"># transparency data even for private images, pass --force to cosign below.</span> <span class="c1"># https://github.com/sigstore/cosign</span> <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Sign the published Docker image</span> <span class="na">if</span><span class="pi">:</span> <span class="s">${{ github.event_name != 'pull_request' }}</span> <span class="na">env</span><span class="pi">:</span> <span class="c1"># https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions#using-an-intermediate-environment-variable</span> <span class="na">TAGS</span><span class="pi">:</span> <span class="s">${{ steps.meta.outputs.tags }}</span> <span class="na">DIGEST</span><span class="pi">:</span> <span class="s">${{ steps.build-and-push.outputs.digest }}</span> <span class="c1"># This step uses the identity token to provision an ephemeral certificate</span> <span class="c1"># against the sigstore community Fulcio instance.</span> <span class="na">run</span><span class="pi">:</span> <span class="s">echo "${TAGS}" | xargs -I {} cosign sign --yes {}@${DIGEST}</span> </code></pre> </div> <p>It's now time to test this workflow!</p> <h2> Testing - First push </h2> <p>Let's first of all add, commit and push all the local changes to the online repository:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>git add <span class="nb">.</span> git commit <span class="nt">-m</span> <span class="s2">"first commit"</span> git branch <span class="nt">-M</span> main git push <span class="nt">-u</span> origin main </code></pre> </div> <p>You will be prompted to enter your GitHub username and password. As a password, you should use a GitHub access token, which you can create <a href="proxy.php?url=https://github.com/settings/tokens" rel="noopener noreferrer">here</a>. 
</p> <p>Once the change is pushed, you will see, on your online GitHub repo, a yellow dot: it means that the workflow we triggered with our push is now running and will build and push the image. If the workflow run is successful, you will see a green tick at the end of it, whereas if it fails, you will see a red cross. </p> <p>Make sure that, once the package is created, it is <strong>public</strong>: if not, change its visibility to public, otherwise you won't be able to download it. </p> <h2> Testing - Trying our app </h2> <p>As we said, the Docker image will be published as a package under <code>ghcr.io</code>; we can then pull and run the image like this:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>docker pull ghcr.io/username/hello-world-github-docker:main docker run <span class="nt">-t</span> ghcr.io/username/hello-world-github-docker:main </code></pre> </div> <p>If Docker says that it can't pull the image because you are not logged in to the GitHub Container Registry, you can simply run:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code>docker login ghcr.io -u username -p GITHUB-ACCESS-TOKEN </code></pre> </div> <p>And this should do the magic!</p> <p>When we run our application, we should see a colored "Hello world!" printed on the terminal.</p> <h2> Modifying our application </h2> <p>Now, let's say that we want to let the user choose the color they want "Hello world" to be printed with. 
We can modify our <code>app.py</code> like this:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight python"><code><span class="kn">from</span> <span class="n">termcolor</span> <span class="kn">import</span> <span class="n">cprint</span> <span class="n">colors</span> <span class="o">=</span> <span class="p">[</span><span class="sh">"</span><span class="s">red</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">green</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">blue</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">magenta</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">yellow</span><span class="sh">"</span><span class="p">]</span> <span class="c1"># Tell the user the instructions </span><span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Hello user! 
What color would you like </span><span class="sh">'</span><span class="s">Hello world</span><span class="sh">'</span><span class="s"> to be printed with?</span><span class="se">\n</span><span class="s">Choose among: </span><span class="si">{</span><span class="sh">'</span><span class="s">, </span><span class="sh">'</span><span class="p">.</span><span class="nf">join</span><span class="p">(</span><span class="n">colors</span><span class="p">)</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span> <span class="c1"># Take the input from the user </span><span class="n">color</span> <span class="o">=</span> <span class="nf">input</span><span class="p">(</span><span class="sh">"</span><span class="s">--&gt;</span><span class="sh">"</span><span class="p">)</span> <span class="c1"># Check if the input is in the available colors: if not, tell the user that it is not available </span><span class="k">if</span> <span class="n">color</span><span class="p">.</span><span class="nf">lower</span><span class="p">()</span> <span class="ow">in</span> <span class="n">colors</span><span class="p">:</span> <span class="nf">cprint</span><span class="p">(</span><span class="sh">"</span><span class="s">Hello world!</span><span class="sh">"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="n">color</span><span class="p">.</span><span class="nf">lower</span><span class="p">()</span><span class="p">)</span> <span class="k">else</span><span class="p">:</span> <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">ERROR! 
The color you chose is not among the available colors :(</span><span class="sh">"</span><span class="p">)</span> </code></pre> </div> <p>To modify the Docker image, it is now sufficient to add, commit and push the local changes to the online repo:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>git add <span class="nb">.</span> git commit <span class="nt">-m</span> <span class="s2">"user defined color"</span> git push origin main </code></pre> </div> <p>If the workflow run is successful again, we should be able to pull and run the new image with updated features:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>docker pull ghcr.io/username/hello-world-github-docker:main docker run <span class="nt">-it</span> ghcr.io/username/hello-world-github-docker:main </code></pre> </div> <p>We will stop here for this article, but in the next one we will see, hands-on, how to use Docker for on-cloud deployments: stay tuned and have fun!🥰</p> docker devops beginners tutorial 1minDocker #12 - What is CI/CD? Clelia (Astra) Bertelli Wed, 22 Jan 2025 20:52:33 +0000 https://dev.to/astrabert/1mindocker-12-what-is-cicd-2ap6 https://dev.to/astrabert/1mindocker-12-what-is-cicd-2ap6 <p>Hello world and Happy 2025!🐳✨</p> <p>It's time we resume our 1minDocker series, starting our last learning block, consisting of 4 articles, which will introduce us to <strong>CI/CD practices with Docker</strong>.</p> <p>In order to get started with this last learning block, though, we need to understand what CI/CD is: let's dive in!</p> <h2> What does CI/CD mean? </h2> <p>CI/CD is short for <strong>Continuous Integration/Continuous Delivery</strong> or, sometimes, <strong>Continuous Deployment</strong>. Let's break it down:</p> <ul> <li>Continuous <em>Integration</em>: the CI piece of CI/CD means that everything, even a small modification, gets integrated into the main code, constantly. 
This is a key feature when you need to fix small bugs and/or technicalities that would otherwise require you to package a new, standalone patch release, which would then have to be manually integrated user-side through updates. With continuous integration, small and big changes/fixes are immediately available. </li> <li>Continuous <em>Delivery</em>: the CD part is the consequence of continuous integration. Every time we integrate a new modification of our source code, we trigger a certain number of steps that, in the end, lead to the delivery/deployment of our code into production. This happens constantly, and allows companies with wise source control and orchestration to ship new features into production within <strong>minutes</strong> of their request. Obviously, this might be an edge case applicable to huge tech companies with big developer teams working 24/7 on their products, but continuous delivery also ensures high speed for smaller companies, which can deploy new features within hours instead of days or weeks.</li> </ul> <h2> What are the main steps of CI/CD? 
</h2> <p>As this image shows:</p> <p><a href="proxy.php?url=https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.blackduck.com%2Fglossary%2Fwhat-is-cicd%2F_jcr_content%2Froot%2Fsynopsyscontainer%2Fcolumn_1946395452_co%2FcolRight%2Fimage_copy.coreimg.svg%2F1727199377195%2Fcicd.svg" class="article-body-image-wrapper"><img src="proxy.php?url=https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.blackduck.com%2Fglossary%2Fwhat-is-cicd%2F_jcr_content%2Froot%2Fsynopsyscontainer%2Fcolumn_1946395452_co%2FcolRight%2Fimage_copy.coreimg.svg%2F1727199377195%2Fcicd.svg" alt="CI/CD" width="1000" height="500"></a></p> <p><em>Image from <a href="proxy.php?url=https://www.blackduck.com/glossary/what-is-cicd.html" rel="noopener noreferrer"><strong>Black Duck</strong>: What is CI/CD</a></em></p> <p>The CI/CD pipeline contains several really important steps, inserted in an "infinite" loop (that's the main idea behind the <em>continuous</em> thing):</p> <ul> <li>We start with the <strong>code</strong>: developers all around the world write their code in their comfortable IDE on their computers, and then, once they are done, push the changes they made into a version control system. 
The most widespread is <a href="proxy.php?url=https://git-scm.com/" rel="noopener noreferrer">Git</a>, and the most used services on this end are <a href="proxy.php?url=https://github.com" rel="noopener noreferrer">GitHub</a> and <a href="proxy.php?url=https://about.gitlab.com/" rel="noopener noreferrer">GitLab</a> </li> <li>The code, once in the code management system, <strong>needs to be built</strong>: builds are generally automated and check whether there are errors at this stage that would result in breakage and/or other issues further down the road.</li> <li>After the build is complete and the code is deemed clean on this end, we can proceed with <strong>testing its actual capabilities</strong>. There are several possible tests; most of them depend on the use case that your code is tackling. A widespread strategy is <strong>integration testing</strong>, which verifies that the code is compatible with the requirements and the standards of its use case.</li> <li>If all the tests pass, we can finally <strong>release</strong> our code out in the wild</li> <li>Following the release there's often the <strong>deployment</strong>: the code is pushed into production and is finally ready to be used</li> <li> <strong>Operating</strong> and <strong>Monitoring</strong> are then the last two phases before starting to modify the code again: we see how users interact with it, how well our products perform and we collect feedback. With this information, we can start fixing bugs, creating new features and making our products shine :)</li> </ul> <h2> Why CI/CD and Docker? </h2> <p>As you can see, CI/CD requires several environments: a build environment, a test environment and a deployment environment. Docker is a perfect solution, as we can manage everything through different containers, just using simple commands like <code>docker build</code> and <code>docker run</code>. 
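</p> <p>As a concrete sketch of how the loop above can be automated around Docker, here is what a minimal GitHub Actions workflow covering the build, test and release steps could look like (the repository path and the test script name are illustrative, not from this series):<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight yaml"><code># .github/workflows/ci.yaml -- hypothetical example
name: ci
on:
  push:
    branches: ["main"]
jobs:
  build-test-release:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4
      # Build: fail fast if the image cannot be assembled
      - run: docker build -t ghcr.io/username/my-app:main .
      # Test: run the test suite inside the freshly built image
      - run: docker run --rm ghcr.io/username/my-app:main ./run-tests.sh
      # Release: log in and publish the image so it can be deployed
      - run: echo "${{ secrets.GITHUB_TOKEN }}" | docker login ghcr.io -u ${{ github.actor }} --password-stdin
      - run: docker push ghcr.io/username/my-app:main
</code></pre> </div> <p>Every push to <code>main</code> then walks the code through build, test and release automatically, which is exactly the <em>continuous</em> part of CI/CD.</p> <p>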
Docker is also perfect for deployment, as it does not require complex environment set-up from scratch: you can build a portable image with all the dependencies and simply deploy it by building from the <code>Dockerfile</code> and running it. </p> <p>Docker handles secrets, takes care of data transfer from the local context to the container and manages networks.</p> <p>With Docker, you can even manage all of this with <strong>one command</strong>: you simply need to create a <code>compose.yaml</code> file and define your services, their sources (whether pre-existing images or on-the-fly builds) and their specs (secrets, volumes, networks). After that, you simply need <code>docker compose up</code>. It's really simple, isn't it?🐳</p> <h2> Sources </h2> <ul> <li><a href="proxy.php?url=https://youtu.be/M4CXOocovZ4?feature=shared" rel="noopener noreferrer"><strong>Akamai Developer</strong>: CI/CD Explained | How DevOps Use Pipelines for Automation</a></li> <li><a href="proxy.php?url=https://youtu.be/OPwU3UWCxhw?feature=shared" rel="noopener noreferrer"><strong>Be a Better Dev</strong>: The IDEAL and Practical CI/CD Pipeline - Concepts Overview</a></li> </ul> docker devops beginners tutorial 1minDocker #11 - Advanced compose example Clelia (Astra) Bertelli Sun, 29 Dec 2024 00:07:11 +0000 https://dev.to/astrabert/1mindocker-11-advanced-compose-example-a4b https://dev.to/astrabert/1mindocker-11-advanced-compose-example-a4b <p>In the <a href="proxy.php?url=https://dev.to/astrabert/1mindocker-10-the-compose-file-4ihf">last article</a>, we introduced the <code>compose</code> file reference, and we went through numerous top-level elements and their attributes. In this article, we will present an advanced example using <code>docker compose</code> to deploy a multi-container application on the cloud. 
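</p> <p>As a quick refresher before diving into the advanced example, a minimal <code>compose</code> file has this shape (the service and image names below are purely illustrative):<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight yaml"><code># compose.yaml -- minimal illustrative example
services:
  app:
    build: .          # build the image on the fly from the local Dockerfile
    ports:
      - "8000:8000"
  db:
    image: postgres   # pull a pre-existing image from the hub
</code></pre> </div> <p>The file we will work with below follows the same structure, just with more services and more elements per service.</p> <p>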
</p> <p>We will refer, for this tutorial, to a slightly modified version of the <code>compose.yaml</code> file proposed for ElasticSearch-Logstash-Kibana by the <a href="proxy.php?url=https://github.com/docker/awesome-compose" rel="noopener noreferrer">awesome-compose</a> repository by Docker on GitHub. </p> <h2> Background </h2> <p>Why would we build a multi-container environment with ElasticSearch, Logstash and Kibana? Well, the idea is that this stack (also known as the <strong>ELK stack</strong>) can help us monitor logs and data sent by third party services (Logstash), store and index them for fast search/retrieval (ElasticSearch) and visualize statistics in real time (Kibana). </p> <p>This is optimal when we have real-time data flows (such as with IoT devices, social media or web traffic), when we want to track and analyze logs from big servers, and/or when we want to power search in data-heavy applications, such as e-commerce ones.</p> <p>If you want to know more, please refer to:</p> <ul> <li><a href="proxy.php?url=https://www.elastic.co/docs" rel="noopener noreferrer">ElasticSearch docs</a></li> <li><a href="proxy.php?url=https://www.elastic.co/guide/en/logstash/current/introduction.html" rel="noopener noreferrer">Logstash reference</a></li> <li><a href="proxy.php?url=https://www.elastic.co/guide/en/kibana/current/index.html" rel="noopener noreferrer">Kibana guide</a></li> </ul> <p>As you can see, these three services are all provided by <a href="proxy.php?url=https://www.elastic.co/" rel="noopener noreferrer">Elastic</a>.</p> <h2> Set-up </h2> <h3> Getting the needed files </h3> <p>First of all, we clone the <code>awesome-compose</code> repository:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>git clone https://github.com/docker/awesome-compose.git </code></pre> </div> <p>And we head over to our folder of interest:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code><span class="nb">cd 
</span>awesome-compose/elasticsearch-logstash-kibana/ </code></pre> </div> <blockquote> <p><em><strong>TIP</strong>💡: you can use all the folders in this repository to experiment with different <code>compose</code> settings</em></p> </blockquote> <p>And we can take a look at the structure of the repository:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code><span class="c"># You might need to install tree before using it</span> tree <span class="nb">.</span> <span class="nt">-L</span> 2 </code></pre> </div> <p>We will get this structure:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code>. |__ logstash/ | |__nginx.log | |__pipeline/ |__compose.yaml </code></pre> </div> <p>The <code>logstash</code> folder contains a <code>pipeline</code> subfolder and a <code>nginx.log</code> file: their exact content is not important for our purposes, but we have to keep in mind that they are there. </p> <h3> Adding some modifications </h3> <p>To showcase more of the <code>compose</code> file elements, we introduce a <code>.env</code> file, which we can create in this way:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code><span class="nb">touch</span> .env </code></pre> </div> <p>And then modify it with our favorite text editor (for me it's VSCode):<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>code .env </code></pre> </div> <p>In the <code>.env</code> file, let's create the following keys and values:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code><span class="nv">JAVA_OPTS</span><span class="o">=</span><span class="s2">"-Xms512m -Xmx512m"</span> <span class="nv">DISCOVERY_SEED_HOSTS</span><span class="o">=</span><span class="s2">"logstash"</span> <span class="nv">API_TOKEN</span><span class="o">=</span><span class="s2">"super-secret-token"</span> </code></pre> </div> <p>We will use them in the 
<code>compose</code> file (see below).</p> <h2> The <code>compose</code> file </h2> <p>Let's take a look at the compose file, which is slightly different from the one proposed by <code>awesome-compose</code>:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight yaml"><code><span class="na">services</span><span class="pi">:</span> <span class="na">elasticsearch</span><span class="pi">:</span> <span class="na">image</span><span class="pi">:</span> <span class="s">elasticsearch:7.16.1</span> <span class="na">container_name</span><span class="pi">:</span> <span class="s">es</span> <span class="na">environment</span><span class="pi">:</span> <span class="na">discovery.type</span><span class="pi">:</span> <span class="s">single-node</span> <span class="na">ES_JAVA_OPTS</span><span class="pi">:</span> <span class="s">$JAVA_OPTS</span> <span class="c1"># as we mentioned in the last article, compose can access the env variables we set in our .env file</span> <span class="na">ports</span><span class="pi">:</span> <span class="pi">-</span> <span class="s2">"</span><span class="s">9200:9200"</span> <span class="pi">-</span> <span class="s2">"</span><span class="s">9300:9300"</span> <span class="na">healthcheck</span><span class="pi">:</span> <span class="na">test</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">CMD-SHELL"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">curl</span><span class="nv"> </span><span class="s">--silent</span><span class="nv"> </span><span class="s">--fail</span><span class="nv"> </span><span class="s">localhost:9200/_cluster/health</span><span class="nv"> </span><span class="s">||</span><span class="nv"> </span><span class="s">exit</span><span class="nv"> </span><span class="s">1"</span><span class="pi">]</span> <span class="na">interval</span><span class="pi">:</span> <span class="s">10s</span> <span class="na">timeout</span><span class="pi">:</span> <span 
class="s">10s</span> <span class="na">retries</span><span class="pi">:</span> <span class="m">3</span> <span class="na">networks</span><span class="pi">:</span> <span class="pi">-</span> <span class="s">elastic</span> <span class="na">logstash</span><span class="pi">:</span> <span class="na">image</span><span class="pi">:</span> <span class="s">logstash:7.16.1</span> <span class="na">container_name</span><span class="pi">:</span> <span class="s">log</span> <span class="na">environment</span><span class="pi">:</span> <span class="na">discovery.seed_hosts</span><span class="pi">:</span> <span class="s">$DISCOVERY_SEED_HOSTS</span> <span class="na">LS_JAVA_OPTS</span><span class="pi">:</span> <span class="s">$JAVA_OPTS</span> <span class="na">secrets</span><span class="pi">:</span> <span class="pi">-</span> <span class="s">api_token</span> <span class="na">volumes</span><span class="pi">:</span> <span class="pi">-</span> <span class="s">./logstash/pipeline/logstash-nginx.config:/usr/share/logstash/pipeline/logstash-nginx.config</span> <span class="pi">-</span> <span class="s">./logstash/nginx.log:/home/nginx.log</span> <span class="na">ports</span><span class="pi">:</span> <span class="pi">-</span> <span class="s2">"</span><span class="s">5000:5000/tcp"</span> <span class="pi">-</span> <span class="s2">"</span><span class="s">5000:5000/udp"</span> <span class="pi">-</span> <span class="s2">"</span><span class="s">5044:5044"</span> <span class="pi">-</span> <span class="s2">"</span><span class="s">9600:9600"</span> <span class="na">depends_on</span><span class="pi">:</span> <span class="pi">-</span> <span class="s">elasticsearch</span> <span class="na">networks</span><span class="pi">:</span> <span class="pi">-</span> <span class="s">elastic</span> <span class="na">command</span><span class="pi">:</span> <span class="s">logstash -f /usr/share/logstash/pipeline/logstash-nginx.config</span> <span class="na">kibana</span><span class="pi">:</span> <span 
class="na">image</span><span class="pi">:</span> <span class="s">kibana:7.16.1</span> <span class="na">container_name</span><span class="pi">:</span> <span class="s">kib</span> <span class="na">volumes</span><span class="pi">:</span> <span class="pi">-</span> <span class="s">kibana-cache:/etc/logs/cache</span> <span class="na">ports</span><span class="pi">:</span> <span class="pi">-</span> <span class="s2">"</span><span class="s">5601:5601"</span> <span class="na">depends_on</span><span class="pi">:</span> <span class="pi">-</span> <span class="s">elasticsearch</span> <span class="na">networks</span><span class="pi">:</span> <span class="pi">-</span> <span class="s">elastic</span> <span class="na">networks</span><span class="pi">:</span> <span class="na">elastic</span><span class="pi">:</span> <span class="na">driver</span><span class="pi">:</span> <span class="s">bridge</span> <span class="na">secrets</span><span class="pi">:</span> <span class="na">api_token</span><span class="pi">:</span> <span class="na">environment</span><span class="pi">:</span> <span class="s2">"</span><span class="s">API_TOKEN"</span> <span class="na">volumes</span><span class="pi">:</span> <span class="na">kibana-cache</span><span class="pi">:</span> <span class="na">external</span><span class="pi">:</span> <span class="kc">true</span> <span class="na">name</span><span class="pi">:</span> <span class="s">kibana_cache_volume</span> </code></pre> </div> <p>We will now break this <code>compose</code> file down service by service. </p> <h3> ElasticSearch </h3> <p>ElasticSearch is based on the Docker image <strong>elasticsearch:7.16.1</strong>, and the container that is launched within the service is named <strong>es</strong>. </p> <p>It accesses two environment variables: <strong>ES_JAVA_OPTS</strong>, that is read from the <code>.env</code> file, and <strong>discovery.type</strong>, which is set in-line. 
</p> <p>It is connected to two ports: <strong>9200 and 9300</strong>, accessible to the local device host on the same port addresses.</p> <p>There is a <strong>healthcheck</strong>, which is performed a maximum of three times with 10s intervals and a timeout (maximum time of execution of the test command) of 10s. </p> <p>The service is bound to the <strong>elastic</strong> network, whose driver is the most common: <strong>bridge</strong> (simply connects all the containers together, <em>bridging</em> among them). </p> <p>This container does not depend on anything, so it is <strong>the first to be started</strong>. </p> <h3> Logstash </h3> <p>Logstash is based on the Docker image <strong>logstash:7.16.1</strong>, and the container that is launched within the service is named <strong>log</strong>. </p> <p>It accesses two environment variables: <strong>LS_JAVA_OPTS</strong>, which is read from the <code>.env</code> file, and <strong>discovery.seed_hosts</strong>, which is also read from the <code>.env</code> file.</p> <p>It also has access to a secret, <strong>api_token</strong>, which is likewise read from the environment file, as the <strong>API_TOKEN</strong> variable. </p> <p>Inside the container, two volumes are mounted from the local file system (the <code>logstash-nginx.config</code> file from the <code>pipeline</code> subfolder and the <code>nginx.log</code> file) into the container's file system. </p> <p>It is connected to three ports: <strong>5000 (both TCP and UDP), 5044 and 9600</strong>, accessible to the local device host on the same port addresses.</p> <p>There is <strong>no healthcheck</strong>, but the service depends on ElasticSearch so it is the second one to be started: as soon as the container is started, the command <strong>logstash -f /usr/share/logstash/pipeline/logstash-nginx.config</strong> is executed, overriding any <code>CMD</code> entries in the original Dockerfile from the logstash image. 
</p> <p>The service is bound to the <strong>elastic</strong> network.</p> <h3> Kibana </h3> <p>Kibana is based on the Docker image <strong>kibana:7.16.1</strong>, and the container that is launched within the service is named <strong>kib</strong>. </p> <p>It does not access environment variables or secrets.</p> <p>Inside the container, there is one volume mounted, <strong>kibana-cache</strong>, which maps to a volume whose life cycle is externally managed, and which is named <strong>kibana_cache_volume</strong>.</p> <p>It is connected to one port: <strong>5601</strong>, accessible to the local device host on the same port address.</p> <p>There is <strong>no healthcheck</strong>, but the service depends on ElasticSearch so it is the second one to be started, along with Logstash.</p> <p>The service is bound to the <strong>elastic</strong> network.</p> <h2> Launch and stop the service </h2> <p>Now, we can just launch our multi-container application with:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>docker compose up </code></pre> </div> <p>Or, if we do not want the services' logs to be displayed on (and potentially swamp) our terminal:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>docker compose up <span class="nt">--detach</span> </code></pre> </div> <p>If we want to see what is currently running, we can use:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>docker compose ps </code></pre> </div> <p>And if we want to stop the services and take down the <code>compose</code> app, we can simply run:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>docker compose stop </code></pre> </div> <p>Or, to stop and remove all the containers:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>docker compose down </code></pre> </div> <p>We will stop here for this article, and in the next one we will explore continuous-integration/continuous-deployment solutions for Docker images. 
See you in 2025!🥰</p> devops docker tutorial beginners 1minDocker #10 - The Compose File Clelia (Astra) Bertelli Wed, 18 Dec 2024 11:37:54 +0000 https://dev.to/astrabert/1mindocker-10-the-compose-file-4ihf https://dev.to/astrabert/1mindocker-10-the-compose-file-4ihf <p>In the <a href="proxy.php?url=https://dev.to/astrabert/1mindocker-9-introduction-to-compose-11di">last article</a> we introduced <code>compose</code>, a popular Docker plugin to build multi-container applications and to manage complex environments in an easy and sharable way.</p> <p>In this post we will focus on the <code>compose</code> file, i.e. the YAML file that contains the instructions that <code>docker compose</code> reads and runs when it is launched.</p> <p>As we saw for the Dockerfile, the compose file also has keywords: these keywords are named <em>elements</em>, and the most important of them are known as <em>top-level</em> elements. We will learn about them in the following paragraphs.</p> <h2> <code>name</code> and <code>version</code> </h2> <p>The <code>version</code> top-level element is obsolete, and it is kept only for backward compatibility with older versions of Compose, where the program actually validated the YAML file against a precise schema known as the <em>Specification</em>. Newer versions of <code>compose</code> no longer validate their input file against a versioned schema: if they encounter a malformed/unknown field, <code>compose</code> simply throws an error.</p> <p>The <code>name</code> top-level element is set to give a name to the project you are launching with <code>compose</code>, and overrides the default one. 
</p> <p>For example:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight yaml"><code><span class="na">name</span><span class="pi">:</span> <span class="s">new_app</span> <span class="na">services</span><span class="pi">:</span> <span class="na">app</span><span class="pi">:</span> <span class="na">image</span><span class="pi">:</span> <span class="s">foo/bar</span> <span class="na">command</span><span class="pi">:</span> <span class="s">echo "I'm running ${COMPOSE_PROJECT_NAME}"</span> </code></pre> </div> <h2> <code>services</code> </h2> <p>The <code>services</code> element defines the various containers your <code>compose</code> project will run, with several potential configurations and specifications.</p> <p>There are numerous elements linked to the <code>services</code> one; we will go through the most used (excluding the ones referenced in the next sections):</p> <h3> image </h3> <p>Specifies the image that the container is running on: if the image has not already been pulled locally, it is pulled from the hub on the fly when the service is started.<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight yaml"><code><span class="na">services</span><span class="pi">:</span> <span class="na">app</span><span class="pi">:</span> <span class="na">image</span><span class="pi">:</span> <span class="s">node:18-alpine</span> <span class="s">...</span> <span class="na">db</span><span class="pi">:</span> <span class="na">image</span><span class="pi">:</span> <span class="s">postgres</span> <span class="s">...</span> </code></pre> </div> <h3> build </h3> <p>If you want your container to run on a custom image you configured through a Dockerfile, you can use the <code>build</code> element, which will build the image on the fly based on the context provided (you can also specify the Dockerfile name):<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight yaml"><code><span class="na">services</span><span 
class="pi">:</span> <span class="na">app</span><span class="pi">:</span> <span class="na">build</span><span class="pi">:</span> <span class="s">.</span> <span class="na">dockerfile</span><span class="pi">:</span> <span class="s2">"</span><span class="s">Dockerfile.node"</span> <span class="s">...</span> <span class="na">db</span><span class="pi">:</span> <span class="na">image</span><span class="pi">:</span> <span class="s">postgres</span> <span class="s">...</span> </code></pre> </div> <h3> env_file and environment </h3> <p><code>compose</code> by default can read the environment variables you set in a <code>.env</code> file that is placed in the same directory in which the <code>compose.yml</code> file is situated. Nevertheless, you can specify your environment file through the <code>env_file</code> element:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight yaml"><code><span class="na">env_file</span><span class="pi">:</span> <span class="s2">"</span><span class="s">./envs_config/.raw.env"</span> </code></pre> </div> <p>You can also specify the format of the <code>env_file</code> (more on Docker docs <a href="proxy.php?url=https://docs.docker.com/reference/compose-file/services/#format" rel="noopener noreferrer">here</a>) and whether it is required or not (more on Docker docs <a href="proxy.php?url=https://docs.docker.com/reference/compose-file/services/#required" rel="noopener noreferrer">here</a>).</p> <p>For example, you could write:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight yaml"><code><span class="na">env_file</span><span class="pi">:</span> <span class="pi">-</span> <span class="na">path</span><span class="pi">:</span> <span class="s">./default.env</span> <span class="na">required</span><span class="pi">:</span> <span class="kc">true</span> <span class="c1"># default</span> <span class="na">format</span><span class="pi">:</span> <span class="s">raw</span> <span class="pi">-</span> <span 
class="na">path</span><span class="pi">:</span> <span class="s">./override.env</span> <span class="na">required</span><span class="pi">:</span> <span class="kc">false</span> </code></pre> </div> <p>The <code>environment</code> element works as an env file, but from inside the compose file:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight yaml"><code><span class="na">environment</span><span class="pi">:</span> <span class="na">PG_USR</span><span class="pi">:</span> <span class="s">user</span> <span class="na">PG_PSW</span><span class="pi">:</span> <span class="s">password</span> <span class="na">PG_DATABASE</span><span class="pi">:</span> <span class="s">postgres</span> </code></pre> </div> <p>If the <code>environment</code> element is used in the <code>compose</code> file, it has priority over the same variables used in the env file.</p> <h3> depends_on </h3> <p>The <code>depends_on</code> element is useful when it comes to setting the order in which several services are started.</p> <p>For example:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight yaml"><code><span class="na">services</span><span class="pi">:</span> <span class="na">app</span><span class="pi">:</span> <span class="na">build</span><span class="pi">:</span> <span class="s">.</span> <span class="na">depends_on</span><span class="pi">:</span> <span class="pi">-</span> <span class="s">db</span> <span class="pi">-</span> <span class="s">redis</span> <span class="na">redis</span><span class="pi">:</span> <span class="na">image</span><span class="pi">:</span> <span class="s">redis</span> <span class="na">db</span><span class="pi">:</span> <span class="na">image</span><span class="pi">:</span> <span class="s">postgres</span> </code></pre> </div> <p>In this case, the <code>app</code> container is started only after the <code>db</code> and <code>redis</code> ones are ready. 
</p> <p>This can be accompanied by <code>conditions</code> on how to actually control the starting of a container:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight yaml"><code><span class="na">services</span><span class="pi">:</span> <span class="na">app</span><span class="pi">:</span> <span class="na">build</span><span class="pi">:</span> <span class="s">.</span> <span class="na">depends_on</span><span class="pi">:</span> <span class="na">db</span><span class="pi">:</span> <span class="na">condition</span><span class="pi">:</span> <span class="s">service_healthy</span> <span class="na">restart</span><span class="pi">:</span> <span class="kc">true</span> <span class="na">redis</span><span class="pi">:</span> <span class="na">condition</span><span class="pi">:</span> <span class="s">service_started</span> <span class="na">redis</span><span class="pi">:</span> <span class="na">image</span><span class="pi">:</span> <span class="s">redis</span> <span class="na">db</span><span class="pi">:</span> <span class="na">image</span><span class="pi">:</span> <span class="s">postgres</span> </code></pre> </div> <p>In this case, the <code>app</code> container is started only if the <code>db</code> container passes its health check (see below) and when the <code>redis</code> container is started (no need for a health check).</p> <h3> command and entrypoint </h3> <p>The <code>command</code> element specifies a command that overrides the default command (the <code>CMD</code> instruction) from the service's Docker image.</p> <p>For example:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight yaml"><code><span class="na">command</span><span class="pi">:</span> <span class="s">bundle exec thin -p </span><span class="m">3000</span> </code></pre> </div> <p><code>entrypoint</code>, on the other hand, overrides the <code>ENTRYPOINT</code> set for the service's Docker image:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight 
yaml"><code><span class="na">entrypoint</span><span class="pi">:</span> <span class="s">bash /app/post_create_command.sh</span> </code></pre> </div> <h3> ports </h3> <p>Ports associated with the service and exposed from the container:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight yaml"><code><span class="na">services</span><span class="pi">:</span> <span class="na">semantic_db</span><span class="pi">:</span> <span class="na">image</span><span class="pi">:</span> <span class="s">qdrant/qdrant:latest</span> <span class="na">volumes</span><span class="pi">:</span> <span class="pi">-</span> <span class="s2">"</span><span class="s">./qdrant_storage:/qdrant/storage"</span> <span class="na">ports</span><span class="pi">:</span> <span class="pi">-</span> <span class="s2">"</span><span class="s">6333:6333"</span> <span class="pi">-</span> <span class="s2">"</span><span class="s">6334:6334"</span> </code></pre> </div> <p>In this case, the <code>semantic_db</code> container will expose ports 6333 and 6334, which will be accessible to the user on their <code>localhost</code> under the same port numbers.</p> <h3> restart </h3> <p>It can happen that a container fails to start or terminates its execution abruptly/prematurely. 
<code>restart</code> takes care of this, defining the policy to apply when a container terminates:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight yaml"><code><span class="na">restart</span><span class="pi">:</span> <span class="s2">"</span><span class="s">no"</span> <span class="c1"># no restarting whatsoever </span> <span class="na">restart</span><span class="pi">:</span> <span class="s">always</span> <span class="c1"># always restart upon termination</span> <span class="na">restart</span><span class="pi">:</span> <span class="s">on-failure</span> <span class="c1"># restart only if the container produced an error</span> <span class="na">restart</span><span class="pi">:</span> <span class="s">on-failure:3</span> <span class="c1"># restart max 3 times on failure</span> <span class="na">restart</span><span class="pi">:</span> <span class="s">unless-stopped</span> <span class="c1"># restart only if the container wasn't stopped or removed externally</span> </code></pre> </div> <h3> healthcheck </h3> <p>The <code>healthcheck</code> element tests that the service it is associated with is working correctly. 
It generally uses this syntax:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight yaml"><code><span class="na">healthcheck</span><span class="pi">:</span> <span class="na">test</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">CMD"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">curl"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">-f"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">http://localhost"</span><span class="pi">]</span> <span class="na">interval</span><span class="pi">:</span> <span class="s">1m10s</span> <span class="na">timeout</span><span class="pi">:</span> <span class="s">30s</span> <span class="na">retries</span><span class="pi">:</span> <span class="m">5</span> <span class="na">start_period</span><span class="pi">:</span> <span class="s">30s</span> <span class="na">start_interval</span><span class="pi">:</span> <span class="s">5s</span> </code></pre> </div> <ul> <li> <strong>test</strong> is the command to be executed during the check. <code>CMD</code> runs the command directly, while <code>CMD-SHELL</code> runs it in the container's default shell (generally <code>/bin/sh</code> on Linux).</li> <li> <strong>interval</strong> is the time between two consecutive health checks.</li> <li> <strong>timeout</strong> is the maximum duration of a health check before it is considered failed</li> <li> <strong>retries</strong> sets the maximum number of consecutive failures before the container is considered unhealthy </li> <li> <strong>start_period</strong> is the "protected time" in which health checks occur during the start of a container that needs bootstrap. 
If these health checks fail, they do not count toward the maximum number of retries, whereas if they pass the container is considered started</li> <li> <strong>start_interval</strong> works as interval but for health checks during the start time </li> </ul> <h3> container_name </h3> <p>Set the name of the container, in order to make it easier to detect it when you run <code>docker ps -a</code>:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight yaml"><code><span class="na">services</span><span class="pi">:</span> <span class="na">app</span><span class="pi">:</span> <span class="na">image</span><span class="pi">:</span> <span class="s">node:alpine-18</span> <span class="na">container_name</span><span class="pi">:</span> <span class="s2">"</span><span class="s">reactjs_app"</span> <span class="s">...</span> </code></pre> </div> <h2> <code>volumes</code> </h2> <p><code>volumes</code> is a top-level element that ensures that data from the local file system are injected into the container. <code>volumes</code> is both a top level element and an attribute for <code>services</code> elements, and to ensure that a service has access to the volume you need to explicitly specify it inside the service specification itself.<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight yaml"><code><span class="na">services</span><span class="pi">:</span> <span class="na">app</span><span class="pi">:</span> <span class="na">image</span><span class="pi">:</span> <span class="s">python:3.11.9-slim-bookworm</span> <span class="na">volumes</span><span class="pi">:</span> <span class="pi">-</span> <span class="s">app-data:/app/data/</span> <span class="na">volumes</span><span class="pi">:</span> <span class="na">app-data</span><span class="pi">:</span> </code></pre> </div> <p>If you don't need to mount anything inside the <code>/app/data</code> path, you can simply leave the <code>app-data</code> field under <code>volumes</code> blank. 
Otherwise, you simply have to specify the path to your data in the local file system.</p> <p>A volume, like a network (see below), can have a <code>driver</code> (whose options are specified through <code>driver_opts</code>) and can be managed outside the container (<code>external: true</code> is specified).</p> <p>Let's see a complex example:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight yaml"><code><span class="na">services</span><span class="pi">:</span> <span class="na">app</span><span class="pi">:</span> <span class="na">image</span><span class="pi">:</span> <span class="s">python:3.11.9-slim-bookworm</span> <span class="na">volumes</span><span class="pi">:</span> <span class="pi">-</span> <span class="s">app-data:/app/data/</span> <span class="pi">-</span> <span class="s">app-cache:/etc/logs/cache</span> <span class="na">volumes</span><span class="pi">:</span> <span class="na">app-data</span><span class="pi">:</span> <span class="na">driver</span><span class="pi">:</span> <span class="s">local</span> <span class="na">driver_opts</span><span class="pi">:</span> <span class="na">type</span><span class="pi">:</span> <span class="s">none</span> <span class="na">device</span><span class="pi">:</span> <span class="s">/data/db_data</span> <span class="na">o</span><span class="pi">:</span> <span class="s">bind</span> <span class="na">app-cache</span><span class="pi">:</span> <span class="na">external</span><span class="pi">:</span> <span class="kc">true</span> <span class="na">name</span><span class="pi">:</span> <span class="s">appcache_vol</span> </code></pre> </div> <h2> <code>networks</code> </h2> <p><code>networks</code> are a very important element for a <code>compose</code> file, because they allow the different services to communicate with each other, instead of isolating them. 
<code>compose</code> by default sets a single network for your app, but this is not always optimal: we may want some networks to be accessible only to specific services, and that's why we should specify the networks attached to each of our services.</p> <p>For example:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight yaml"><code><span class="na">services</span><span class="pi">:</span> <span class="na">frontend</span><span class="pi">:</span> <span class="na">image</span><span class="pi">:</span> <span class="s">user/webapp</span> <span class="na">networks</span><span class="pi">:</span> <span class="pi">-</span> <span class="s">foo</span> <span class="pi">-</span> <span class="s">bar</span> <span class="na">networks</span><span class="pi">:</span> <span class="na">foo</span><span class="pi">:</span> <span class="na">bar</span><span class="pi">:</span> </code></pre> </div> <p>Networks can obviously be configured, so let's see two important elements in their configuration:</p> <ul> <li> <strong>driver</strong>: this attribute provides information about the driver used to build the network and provide its core functionalities: <code>bridge</code> is the default one (ensure communication among containers in your app), but you can also use <code>host</code> (exploit directly the host networking, removing network isolation between the container and the Docker host), <code>overlay</code> (allow connectivity across different Docker daemons, networking across nodes for Swarm services), <code>ipvlan</code> (gives user control over IPv4 or IPv6 addressing and may be used for underlay network integration), <code>macvlan</code> (assigns your MAC address to a container, making it a visible device in your network) or <code>none</code> (completely isolate the container's network). 
Drivers can be configured through <strong>driver_opts</strong> </li> <li> <strong>internal</strong>/<strong>external</strong>: specified if the network is managed outside or inside the application. By default, every network is internal and, if set external, <code>compose</code> throws an error if it is not able to connect to it.</li> </ul> <p>Let's see a complete example:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight yaml"><code><span class="na">services</span><span class="pi">:</span> <span class="na">proxy</span><span class="pi">:</span> <span class="na">build</span><span class="pi">:</span> <span class="s">./proxy</span> <span class="na">networks</span><span class="pi">:</span> <span class="pi">-</span> <span class="s">frontend</span> <span class="pi">-</span> <span class="s">outside</span> <span class="na">app</span><span class="pi">:</span> <span class="na">build</span><span class="pi">:</span> <span class="s">./app</span> <span class="na">networks</span><span class="pi">:</span> <span class="pi">-</span> <span class="s">frontend</span> <span class="pi">-</span> <span class="s">backend</span> <span class="na">db</span><span class="pi">:</span> <span class="na">image</span><span class="pi">:</span> <span class="s">postgres</span> <span class="na">networks</span><span class="pi">:</span> <span class="pi">-</span> <span class="s">backend</span> <span class="na">networks</span><span class="pi">:</span> <span class="na">frontend</span><span class="pi">:</span> <span class="na">driver</span><span class="pi">:</span> <span class="s">bridge</span> <span class="na">driver_opts</span><span class="pi">:</span> <span class="na">com.docker.network.bridge.host_binding_ipv4</span><span class="pi">:</span> <span class="s2">"</span><span class="s">127.0.0.1"</span> <span class="na">backend</span><span class="pi">:</span> <span class="na">driver</span><span class="pi">:</span> <span class="s">bridge</span> <span class="na">outside</span><span 
class="pi">:</span> <span class="na">external</span><span class="pi">:</span> <span class="kc">true</span> </code></pre> </div> <h2> <code>configs</code> </h2> <p><code>configs</code> are specific configurations that can be accessed by services (if explicitly declared under the <code>configs</code> attribute) and that modify a Docker image without having to build it from scratch.</p> <p>Configs are by default owned by the user who is running the services and generally have world-readable permissions (that can be overridden by the services if they are configured to do so).</p> <p>Configs have the following attributes:</p> <ul> <li> <strong>file</strong>: the configuration file for the container (provided as a path referring to the local file system)</li> <li> <strong>environment</strong>: the configuration is set as an environment variable</li> <li> <strong>content</strong>: configuration is passed in-line inside the <code>compose</code> file</li> <li> <strong>external</strong>: the config was already created and its lifecycle is externally managed</li> <li> <strong>name</strong>: the name of the configuration (by default is <code>&lt;project_name&gt;_config_key</code>)</li> </ul> <p>Let's see an example:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight yaml"><code><span class="na">services</span><span class="pi">:</span> <span class="na">app</span><span class="pi">:</span> <span class="na">image</span><span class="pi">:</span> <span class="s">foo/bar</span> <span class="na">configs</span><span class="pi">:</span> <span class="pi">-</span> <span class="s">app_config</span> <span class="na">http_server</span><span class="pi">:</span> <span class="na">build</span><span class="pi">:</span> <span class="s">./http_server/</span> <span class="na">configs</span><span class="pi">:</span> <span class="pi">-</span> <span class="s">http_config</span> <span class="na">db</span><span class="pi">:</span> <span class="na">image</span><span 
class="pi">:</span> <span class="s">postgres</span> <span class="na">configs</span><span class="pi">:</span> <span class="pi">-</span> <span class="s">db_config</span> <span class="na">configs</span><span class="pi">:</span> <span class="na">http_config</span><span class="pi">:</span> <span class="na">file</span><span class="pi">:</span> <span class="s">./httpd.conf</span> <span class="na">app_config</span><span class="pi">:</span> <span class="na">content</span><span class="pi">:</span> <span class="pi">|</span> <span class="s">debug=${DEBUG}</span> <span class="s">spring.application.admin.enabled=${DEBUG}</span> <span class="s">spring.application.name=${COMPOSE_PROJECT_NAME}</span> <span class="na">db_config</span><span class="pi">:</span> <span class="na">external</span><span class="pi">:</span> <span class="kc">true</span> </code></pre> </div> <h2> <code>secrets</code> </h2> <p>A secret can be specified as a file or an environment variable and, to be accessed by a service, it has to be listed under the service's <code>secrets</code> attribute. 
Here is an example:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight yaml"><code><span class="na">services</span><span class="pi">:</span> <span class="na">frontend</span><span class="pi">:</span> <span class="na">image</span><span class="pi">:</span> <span class="s">example/webapp</span> <span class="na">secrets</span><span class="pi">:</span> <span class="pi">-</span> <span class="s">server-certificate</span> <span class="na">db</span><span class="pi">:</span> <span class="na">image</span><span class="pi">:</span> <span class="s">postgres</span> <span class="na">secrets</span><span class="pi">:</span> <span class="pi">-</span> <span class="s">postgres_psw</span> <span class="pi">-</span> <span class="s">postgres_user</span> <span class="pi">-</span> <span class="s">postgres_db</span> <span class="na">secrets</span><span class="pi">:</span> <span class="na">server-certificate</span><span class="pi">:</span> <span class="na">file</span><span class="pi">:</span> <span class="s">./server.cert</span> <span class="na">postgres_psw</span><span class="pi">:</span> <span class="na">environment</span><span class="pi">:</span> <span class="s2">"</span><span class="s">POSTGRES_PSW"</span> <span class="na">postgres_user</span><span class="pi">:</span> <span class="na">environment</span><span class="pi">:</span> <span class="s2">"</span><span class="s">POSTGRES_USER"</span> <span class="na">postgres_db</span><span class="pi">:</span> <span class="na">environment</span><span class="pi">:</span> <span class="s2">"</span><span class="s">POSTGRES_DB"</span> </code></pre> </div> <p>We will stop here for this article, but in the next one we will explore an advanced <code>compose</code> example, in which we will see the nuances of this powerful plugin🥰</p> <blockquote> <p><em>The content for this article is mainly based on <a href="proxy.php?url=https://docs.docker.com/reference/compose-file/" rel="noopener noreferrer"><code>docker compose</code> file 
reference documentation</a>: make sure to visit it to learn more!</em></p> </blockquote> docker devops tutorial beginners 1minDocker #9 - Introduction to Compose Clelia (Astra) Bertelli Thu, 12 Dec 2024 19:23:52 +0000 https://dev.to/astrabert/1mindocker-9-introduction-to-compose-11di <p>In <a href="proxy.php?url=https://dev.to/astrabert/1mindocker-8-advanced-concepts-for-buildx-2olc">the last article</a> we finished our deep dive on <code>docker buildx</code>, a popular plugin aimed at easing and automating the building process.</p> <p>From this article on, we will talk about another plugin, <code>docker compose</code>, which has numerous fields of application and a high potential for deployment and development automation.</p> <h2> What is <code>compose</code>? </h2> <p><code>compose</code> is a technology that allows users to run one or more containers in an easy and reproducible way. </p> <p>Through <code>compose</code> you can control the entire tech stack and environment needed for your application, using simple but elegant YAML code in an input file.</p> <p><code>compose</code> also provides a very simple and intuitive CLI that, with a few commands, lets you run, inspect, interact with and stop the containers you defined as services inside your YAML input file.</p> <h2> Why <code>compose</code>? </h2> <p>You might choose <code>compose</code> for a variety of reasons:</p> <ul> <li>It's <strong>simple</strong>: it only needs a few keywords to work correctly, it leverages intuitive CLI commands and does not require the complex configuration that is needed when running containers directly with <code>docker run</code> </li> <li>It's <strong>compact</strong>: everything you need (images, volumes, networks...) 
is in one file</li> <li>It's <strong>the easiest way to set up a working environment</strong>: imagine managing multiple databases, switching among various backend and frontend stacks and juggling several different API services: this would be very difficult to implement natively but, with <code>compose</code>, you can easily combine several different Docker images and just run them all together as a perfectly harmonious orchestra </li> <li>It's easily <strong>sharable</strong>: you don't have to transfer entire codebases or deal with conflicts and local machine versioning problems when giving your compose YAML file to people from your team or from other teams. This enhances <strong>reproducibility</strong> and fosters <strong>collaboration</strong> </li> </ul> <h2> Getting started </h2> <p>Getting started with <code>compose</code> is simple. We just need to have it installed (see <a href="proxy.php?url=https://dev.to/astrabert/1mindocker-2-get-docker-kh">the second article of this series</a>) and we can then proceed with creating our first <code>compose</code> YAML file, which we will call <code>compose.yaml</code> (the name the Docker docs suggest over <code>compose.yml</code> and <code>docker-compose.yaml</code>, which can still be used). </p> <p>Let's say we want to build a React.js application interfaced with a Postgres database, whose status we also want to monitor through Adminer. 
We can exploit the <code>node:18-alpine</code> image to build an environment where we can install and run our local application mounted as a volume, the <code>postgres</code> image to get a PostgreSQL DB instance up and running on port 5432 and the <code>adminer</code> image to start Adminer on port 8080.</p> <p>Let's see how the compose file will look like:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight yaml"><code><span class="na">services</span><span class="pi">:</span> <span class="na">db</span><span class="pi">:</span> <span class="na">image</span><span class="pi">:</span> <span class="s">postgres</span> <span class="na">restart</span><span class="pi">:</span> <span class="s">always</span> <span class="na">ports</span><span class="pi">:</span> <span class="pi">-</span> <span class="s2">"</span><span class="s">5432:5432"</span> <span class="na">environment</span><span class="pi">:</span> <span class="na">POSTGRES_DB</span><span class="pi">:</span> <span class="s">$PG_DB</span> <span class="na">POSTGRES_USER</span><span class="pi">:</span> <span class="s">$PG_USER</span> <span class="na">POSTGRES_PASSWORD</span><span class="pi">:</span> <span class="s">$PG_PASSWORD</span> <span class="na">volumes</span><span class="pi">:</span> <span class="pi">-</span> <span class="s">pgdata:/var/lib/postgresql/data</span> <span class="na">app</span><span class="pi">:</span> <span class="na">image</span><span class="pi">:</span> <span class="s">node:18-alpine</span> <span class="na">restart</span><span class="pi">:</span> <span class="s">always</span> <span class="na">ports</span><span class="pi">:</span> <span class="pi">-</span> <span class="s2">"</span><span class="s">3000:3000"</span> <span class="na">volumes</span><span class="pi">:</span> <span class="pi">-</span> <span class="s2">"</span><span class="s">appsrc:/app/src"</span> <span class="pi">-</span> <span class="s2">"</span><span class="s">apppublic:/app/public"</span> <span 
class="pi">-</span> <span class="s2">"</span><span class="s">./package.json:/app/"</span> <span class="pi">-</span> <span class="s2">"</span><span class="s">./.env:/app/"</span> <span class="na">entrypoint</span><span class="pi">:</span> <span class="s2">"</span><span class="s">cd</span><span class="nv"> </span><span class="s">/app</span><span class="nv"> </span><span class="s">&amp;&amp;</span><span class="nv"> </span><span class="s">npm</span><span class="nv"> </span><span class="s">install</span><span class="nv"> </span><span class="s">&amp;&amp;</span><span class="nv"> </span><span class="s">npm</span><span class="nv"> </span><span class="s">start"</span> <span class="na">adminer</span><span class="pi">:</span> <span class="na">image</span><span class="pi">:</span> <span class="s">adminer</span> <span class="na">restart</span><span class="pi">:</span> <span class="s">always</span> <span class="na">ports</span><span class="pi">:</span> <span class="pi">-</span> <span class="s2">"</span><span class="s">8080:8080"</span> <span class="na">volumes</span><span class="pi">:</span> <span class="na">pgdata</span><span class="pi">:</span> <span class="na">appsrc</span><span class="pi">:</span> <span class="s2">"</span><span class="s">./src/"</span> <span class="na">apppublic</span><span class="pi">:</span> <span class="s2">"</span><span class="s">./public/"</span> </code></pre> </div> <blockquote> <p><em>Notice that we use PG_DB, PG_USER and PG_PASSWORD as environment variables: this means that you should have set them in a <code>.env</code> file</em></p> </blockquote> <p>In this case, all three of our services are available at once: the <code>app</code> (exposed on port 3000), which is injected from the local file system into the container and built on the fly every time the service is started, and the <code>db</code> (exposed on port 5432), which is accessible with its user, password and database name through <code>adminer</code> (exposed on port 8080).</p> <p>To start 
everything, we just need to go to the directory in which our <code>compose</code> file is stored and run:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>docker compose up </code></pre> </div> <p>And, if we want to stop them, we can simply run:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>docker compose down </code></pre> </div> <p>We will stop here for this article, but in the next one we will dive into <code>compose</code> files and how to get the best out of them!🥰</p> devops docker beginners tutorial 1MinDocker #8 - Advanced concepts for buildx Clelia (Astra) Bertelli Sun, 01 Dec 2024 19:56:00 +0000 https://dev.to/astrabert/1mindocker-8-advanced-concepts-for-buildx-2olc <p>In the <a href="proxy.php?url=https://dev.to/astrabert/1mindocker-7-superpower-your-builds-with-buildx-123m">last article</a>, we started using <code>buildx</code> to add more building capacity to our Docker core.</p> <p>In this article, we will dive deep into <code>buildx</code>'s subcommands.</p> <h2> <code>docker buildx bake</code> </h2> <p><code>bake</code> is a high-level command for <code>buildx</code>. <br> It is able to automate the build of multiple images at once, taking as reference a JSON, compose or HCL (HashiCorp Configuration Language) file.</p> <p>At a small scale, <code>bake</code> is no different from <code>build</code>: if we only have one image to build, there is no performance gap, and:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code>docker build . 
-t user/name:tag </code></pre> </div> <p>Is the same as building the following HCL file:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight hcl"><code><span class="nx">target</span> <span class="s2">"image"</span> <span class="p">{</span> <span class="nx">dockerfile</span> <span class="p">=</span> <span class="s2">"Dockerfile"</span> <span class="nx">tags</span> <span class="p">=</span> <span class="p">[</span><span class="s2">"user/image:tag"</span><span class="p">]</span> <span class="p">}</span> </code></pre> </div> <p>And then running:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code>docker buildx bake image </code></pre> </div> <p>Things change when we have multiple images to build together.</p> <h4> The bake file </h4> <p>Let's nevertheless take a step back and ask ourselves: how do we write a bake file? We will explore the HCL format, because it is the easiest and most intuitive to use.<br> The file structure resembles that of a JSON file, and has the following three main keywords:</p> <ul> <li> <code>target</code>: objects specified under this key are images that should be built. Target objects generally contain information on the context in which we are building the Docker image and on the tags to assign.</li> <li> <code>group</code>: a list of targets is put under this keyword, so that every time we want to build all the images together we can just bake the group name instead of calling the targets one by one</li> <li> <code>variable</code>: works as an ARG or an ENV in a Dockerfile. 
It sets a variable that can be used downstream in the HCL file</li> </ul> <p>Let's look at an example:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight hcl"><code><span class="nx">group</span> <span class="s2">"all"</span> <span class="p">{</span> <span class="nx">targets</span> <span class="p">=</span> <span class="p">[</span><span class="s2">"backend"</span><span class="p">,</span> <span class="s2">"db"</span><span class="p">]</span> <span class="p">}</span> <span class="nx">variable</span> <span class="s2">"PYTHON_TAG"</span> <span class="p">{</span> <span class="nx">default</span> <span class="p">=</span> <span class="s2">"3.11.9-slim-bookworm"</span> <span class="p">}</span> <span class="nx">target</span> <span class="s2">"backend"</span> <span class="p">{</span> <span class="nx">dockerfile</span> <span class="p">=</span> <span class="s2">"Dockerfile.backend"</span> <span class="nx">tags</span> <span class="p">=</span> <span class="p">[</span><span class="s2">"user/python-backend:prod"</span><span class="p">,</span> <span class="s2">"user/python-backend:latest"</span><span class="p">]</span> <span class="nx">args</span> <span class="p">=</span> <span class="p">{</span> <span class="nx">PYTHON_VERSION</span> <span class="p">=</span> <span class="nx">PYTHON_TAG</span> <span class="p">}</span> <span class="p">}</span> <span class="nx">target</span> <span class="s2">"db"</span> <span class="p">{</span> <span class="nx">dockerfile</span> <span class="p">=</span> <span class="s2">"Dockerfile.postgres"</span> <span class="nx">tags</span> <span class="p">=</span> <span class="p">[</span><span class="s2">"user/postgres-db:prod"</span><span class="p">,</span> <span class="s2">"user/postgres-db:latest"</span><span class="p">]</span> <span class="nx">no-cache</span> <span class="p">=</span> <span class="kc">true</span> <span class="nx">platforms</span> <span class="p">=</span> <span
class="p">[</span><span class="s2">"linux/amd64"</span><span class="p">,</span> <span class="s2">"linux/arm64"</span><span class="p">]</span> <span class="p">}</span> </code></pre> </div> <p>Now, we could run:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>docker build <span class="nb">.</span> <span class="se">\</span> <span class="nt">-f</span> Dockerfile.backend <span class="se">\</span> <span class="nt">-t</span> user/python-backend:prod <span class="se">\</span> <span class="nt">-t</span> user/python-backend:latest docker build <span class="nb">.</span> <span class="se">\</span> <span class="nt">-f</span> Dockerfile.postgres <span class="se">\</span> <span class="nt">-t</span> user/postgres-db:prod <span class="se">\</span> <span class="nt">-t</span> user/postgres-db:latest </code></pre> </div> <p>Or we could also run:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>docker buildx bake backend db </code></pre> </div> <p>But the easiest way to do this is to leverage the group of targets that we specified:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>docker buildx bake all </code></pre> </div> <p>Bake will take care of the two builds at the same time.</p> <h2> <code>docker buildx create</code> </h2> <p>The <code>create</code> subcommand will create a new build environment instance. 
You can append some context to it as a node:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>docker buildx create node-0 </code></pre> </div> <p>This will produce an environment with a name which will be returned on your terminal (let's say <code>happy_euclid</code>).</p> <p>You can use this name to append a new node to the environment.<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>docker buildx create <span class="nt">--name</span> happy_euclid <span class="nt">--append</span> node-1 </code></pre> </div> <p>You can use the <code>--node</code> flag with the node name to modify or create a node.</p> <p><code>create</code> should be provided with a daemon configuration file through the <code>--buildkitd-config</code> flag (if not, it defaults to the <code>buildkitd.default.toml</code> file contained in the config directory of <code>buildx</code>). You can find an example of a complete configuration file in <a href="proxy.php?url=https://github.com/moby/buildkit/blob/master/docs/buildkitd.toml.md" rel="noopener noreferrer"><code>buildkit</code> official documentation</a> on GitHub.</p> <p>If you nevertheless want to specify some BuildKit configuration flags for your builder instances, overriding the ones in the config file, you can do it by adding the <code>--buildkitd-flags</code> option:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>docker buildx create node-0 <span class="nt">--buildkitd-config</span> ./buildkitdconfig.local.toml <span class="nt">--buildkitd-flags</span> <span class="s1">'--debug --debugaddr 0.0.0.0:6666'</span> </code></pre> </div> <p>You should also specify the driver (see <a href="proxy.php?url=https://dev.to/astrabert/1mindocker-7-superpower-your-builds-with-buildx-123m">last article</a>) for your builder instances with the <code>--driver</code> option: the default one is <code>docker</code> (your local Docker), but you can also choose 
<code>docker-container</code> (runs locally but based on a Docker image), <code>kubernetes</code> (a Kubernetes pod) and <code>remote</code> (a remote environment to which you're connected).</p> <p>If you want to specify the platform(s) for which a builder is intended, you can do that passing the <code>--platform</code> option (like <code>--platform linux/amd64</code> or <code>--platform darwin/amd64,linux/arm64</code>).</p> <p>Deleting a node is also very simple: you just add the <code>--leave</code> flag followed by the name of the node you want to eliminate (specifying the name of the builder and the name of the node):<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>docker buildx create <span class="nt">--name</span> kitty_builds <span class="nt">--node</span> kitty0 <span class="nt">--leave</span> </code></pre> </div> <h2> <code>docker buildx build</code> </h2> <p>The <code>build</code> subcommand, as one might expect, has lots of options. Let's focus on the most important ones:</p> <h4> <code>--build-arg</code> </h4> <p>This option passes arguments for the build as in the following example:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>docker buildx build <span class="nt">--build-arg</span> <span class="nv">HTTP_PROXY</span><span class="o">=</span>http://10.20.30.2:1234 <span class="nt">--build-arg</span> <span class="nv">FTP_PROXY</span><span class="o">=</span>http://40.50.60.5:4567 <span class="nb">.</span> </code></pre> </div> <p>Arguments here are passed only at build-time (so not exposed while running the image) and can only modify non-persistent arguments in a Dockerfile set with the <code>ARG</code> keyword.</p> <h4> <code>--build-context</code> </h4> <p>This option sets additional building context for our build operation. 
For example you can specify an additional Docker image or stage that can be accessed through the Dockerfile using the <code>FROM</code> keyword or the <code>--from</code> flag:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>docker buildx build <span class="nt">--build-context</span> <span class="nv">myimage</span><span class="o">=</span>docker-image://myimage@sha256:0123456789 <span class="nb">.</span> </code></pre> </div> <p>The argument can also be a local directory or a remote Git repository:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>docker buildx build <span class="nt">--build-context</span> <span class="nv">project</span><span class="o">=</span>path/to/project/source <span class="nb">.</span> docker buildx build <span class="nt">--build-context</span> <span class="nv">gitproject</span><span class="o">=</span>https://github.com/myuser/project.git <span class="nb">.</span> </code></pre> </div> <p>You can access all these values in the Dockerfile:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight docker"><code><span class="k">FROM</span><span class="s"> myimage</span> <span class="k">COPY</span><span class="s"> --from=project node_modules/* /app/node_modules/</span> <span class="k">COPY</span><span class="s"> --from=gitproject src/* /app/src/</span> </code></pre> </div> <h4> <code>--cache-from</code> </h4> <p>With <code>--cache-from</code>, you can import a previously created cache for your build from a local folder, a GitHub Actions cache, a Docker registry cache or an S3 bucket.</p> <p>Here are some examples of how to use this option:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code><span class="c">## IMAGE REGISTRY - 1</span> docker buildx build <span class="nt">--cache-from</span><span class="o">=</span>user/image:cache <span class="nb">.</span> <span class="c">## IMAGE REGISTRY - 2</span> docker buildx build <span
class="nt">--cache-from</span><span class="o">=</span>user/image <span class="nb">.</span> <span class="c">## IMAGE REGISTRY - 3</span> docker buildx build <span class="nt">--cache-from</span><span class="o">=</span><span class="nb">type</span><span class="o">=</span>registry,ref<span class="o">=</span>ghcr.io/user/image <span class="nb">.</span> <span class="c">## LOCAL</span> docker buildx build <span class="nt">--cache-from</span><span class="o">=</span><span class="nb">type</span><span class="o">=</span><span class="nb">local</span>,src<span class="o">=</span>path/to/cache <span class="nb">.</span> <span class="c">## GITHUB ACTIONS CACHE</span> docker buildx build <span class="nt">--cache-from</span><span class="o">=</span><span class="nb">type</span><span class="o">=</span>gha <span class="nb">.</span> <span class="c">## S3 BUCKET</span> docker buildx build <span class="nt">--cache-from</span><span class="o">=</span><span class="nb">type</span><span class="o">=</span>s3,region<span class="o">=</span>eu-west-1,bucket<span class="o">=</span>mybucket <span class="nb">.</span> </code></pre> </div> <h4> <code>-f,--file</code> </h4> <p>Specify the Dockerfile for your build.</p> <h4> <code>--load</code> </h4> <p>Load the image resulting from the build into the local Docker image store. This flag is the same as setting <code>--output=type=docker</code>.</p> <h4> <code>--push</code> </h4> <p>Push the image resulting from the build directly to a registry.
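</p> <p>As a quick illustration (the registry and image name below are placeholders, not taken from the article), a one-shot build that goes straight to a registry could look like this:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code># hypothetical example: build the image and push it directly to a registry
docker buildx build -t ghcr.io/youruser/yourimage:1.0 --push .
</code></pre> </div> <p>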
This flag is the same as setting: <code>--output=type=registry</code></p> <h4> <code>--platform</code> </h4> <p>Specify the target platform for which you are building the image.</p> <p>The platform specification should follow the <code>os/arch</code> or <code>os/arch/variant</code> syntax and can also be a list of comma-separated platforms, but only if you are not using <code>docker</code> as a driver.</p> <p>You can also configure the platform as <code>local</code>, which makes <code>buildx</code> pick the local platform on which BuildKit is running.</p> <h4> <code>--secret</code> </h4> <p>You can expose a secret during a build: you pass it on the command line from a file (<code>type=file</code>) or from an environment variable (<code>type=env</code>), and mount it inside your Dockerfile. </p> <p>If you're using a file-based secret, you should specify the file origin:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>docker buildx build <span class="nt">--secret</span> <span class="nb">type</span><span class="o">=</span>file,id<span class="o">=</span>hf_token,src<span class="o">=</span><span class="nv">$HOME</span>/.gitcredentials/HF_TOKEN <span class="nb">.</span> </code></pre> </div> <p>And you can use it inside your Dockerfile like this (the secret is mounted as a file at the <code>target</code> path, so we read it from there):<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight docker"><code>FROM python:3.11.9-slim-bookworm
RUN pip install huggingface_hub
RUN --mount=type=secret,id=hf_token,target=/root/.gitcredentials/HF_TOKEN \
    huggingface-cli login --token $(cat /root/.gitcredentials/HF_TOKEN)
</code></pre> </div> <p>Using <code>type=env</code> instead loads the secret from an
environment variable. </p> <p>You can set it like this:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code><span class="nb">export </span><span class="nv">SECRET_TOKEN</span><span class="o">=</span>token docker buildx build <span class="nt">--secret</span> <span class="nb">id</span><span class="o">=</span>SECRET_TOKEN <span class="nb">.</span> </code></pre> </div> <p>As long as the ID matches the name of the environment variable, you don't have to specify <code>type=env</code>.</p> <p>You can import it in your Dockerfile with:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight docker"><code><span class="c"># syntax=docker/dockerfile:1</span> <span class="k">FROM</span><span class="s"> node:alpine</span> <span class="k">RUN </span><span class="nt">--mount</span><span class="o">=</span><span class="nb">type</span><span class="o">=</span><span class="nb">bind</span>,target<span class="o">=</span><span class="nb">.</span> <span class="se">\ </span> <span class="nt">--mount</span><span class="o">=</span><span class="nb">type</span><span class="o">=</span>secret,id<span class="o">=</span>SECRET_TOKEN,env<span class="o">=</span>SECRET_TOKEN <span class="se">\ </span> yarn run <span class="nb">test</span> </code></pre> </div> <p>You can also use the <code>src</code>/<code>source</code> key, but then you need to specify <code>type=env</code>; otherwise <code>buildx</code> will look for a file with the name you passed to <code>src</code>/<code>source</code>:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code><span class="nb">export </span><span class="nv">API_KEY</span><span class="o">=</span>sk-your-supersecret-key-api docker buildx build <span class="nt">--secret</span> <span class="nb">type</span><span class="o">=</span><span class="nb">env</span>,id<span class="o">=</span>api,src<span class="o">=</span>API_KEY <span class="nb">.</span> </code></pre> </div> <p>This might
be useful when you don't want your secret's ID to match the name of the environment variable.</p> <h4> <code>-t, --tag</code> </h4> <p>Used to specify the name and tag of an image for the build.</p> <h2> Minor commands </h2> <ul> <li> <code>docker buildx imagetools</code>: it helps manage registry-based images, creating new ones from lists of manifests and/or inspecting existing manifests, for instance to check multi-platform attestations.</li> <li> <code>docker buildx use</code>: Changes the current builder instance to the specified one.</li> <li> <code>docker buildx rm</code>: Removes the specified builder instance(s).</li> <li> <code>docker buildx prune</code>: Removes data from a builder cache, giving you precise control over what gets deleted.</li> <li> <code>docker buildx stop</code>: Stops the specified builder instance (it can be restarted later); the exact behavior is driver-dependent.</li> </ul> <p>We will stop here for this article, but in the next one we will dive into <code>compose</code>, another popular Docker plugin🥰</p> <blockquote> <p><em>The content for this article is mainly based on the <a href="proxy.php?url=https://docs.docker.com/reference/cli/docker/buildx/" rel="noopener noreferrer"><code>docker buildx</code> command documentation</a>: make sure to visit it to get to know more!</em></p> </blockquote> docker devops tutorial beginners 1MinDocker #7 - Superpower your builds with buildx Clelia (Astra) Bertelli Fri, 22 Nov 2024 20:15:36 +0000 https://dev.to/astrabert/1mindocker-7-superpower-your-builds-with-buildx-123m https://dev.to/astrabert/1mindocker-7-superpower-your-builds-with-buildx-123m <p>In the <a href="proxy.php?url=https://dev.to/astrabert/1mindocker-6-building-further-39al">last article</a> we talked about the possibility of expanding our build capacity with multi-staged builds and if-else statements: in this article, we'll see how to superpower our builds with <code>buildx</code>, a popular Docker plugin that is
intended to replace the legacy <code>docker build</code> command. </p> <h3> Getting <code>buildx</code> </h3> <p>If you correctly installed Docker Desktop for Windows or macOS (see our <a href="proxy.php?url=https://dev.to/astrabert/1mindocker-2-get-docker-kh">second article</a>), <code>buildx</code> should already be included. </p> <p>If you are on Linux and running <code>docker buildx --version</code> returns an error because the plugin wasn't installed, you should follow the instructions you can find in <a href="proxy.php?url=https://dev.to/astrabert/1mindocker-2-get-docker-kh">1minDocker #2</a> and/or on <a href="proxy.php?url=https://docs.docker.com/engine/install/" rel="noopener noreferrer">Docker official documentation</a>.</p> <p>Once you have <code>buildx</code>, you can set it as the default builder by running:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>docker buildx <span class="nb">install</span> </code></pre> </div> <p>This will dismiss the legacy builder (<code>docker build</code>) and default to the plugin's one (<code>docker buildx build</code>).</p> <h3> What <code>buildx</code> can do that <code>build</code> can't </h3> <p><code>buildx</code> has multiple features that the legacy builder does not provide:</p> <h4> 1.
Drivers </h4> <p>You can choose the environment where the build runs: this environment is called <em>driver</em> and by default is set to the same as the normal builder (the <code>docker</code> driver), but it can also exploit <a href="proxy.php?url=https://docs.docker.com/build/builders/drivers/docker-container/" rel="noopener noreferrer"><code>docker-container</code></a> (a containerized environment for the build), <a href="proxy.php?url=https://docs.docker.com/build/builders/drivers/kubernetes/" rel="noopener noreferrer"><code>kubernetes</code></a> (that connects local environments to Kubernetes clusters) or <a href="proxy.php?url=https://docs.docker.com/build/builders/drivers/remote/" rel="noopener noreferrer"><code>remote</code></a> (allows access to an externally managed building environment).</p> <h4> 2. Isolated builder instances </h4> <p>You can create multiple isolated builder instances, assigning them to different nodes through <code>buildx create</code> (and there are a handful of commands to manage those instances). There is also the possibility to give your builder instances a default template with the <code>buildx context</code> command.</p> <h4> 3. Multi-platform builds </h4> <p>You can specify the platform for which you're building through the <code>--platform</code> flag (for example <code>linux/amd64</code>, <code>linux/arm64</code> or <code>darwin/amd64</code>). When you're backed by <code>docker-container</code> or <code>kubernetes</code>, you can actually do a multi-platform build at once, using different strategies:</p> <ul> <li>Specifying stages in the Dockerfile that can cross-compile for different platforms</li> <li>Using different builder instances that compile for different architectures</li> <li>Using kernel emulation through QEMU (the easiest solution)</li> </ul> <p>As for kernel emulation, if it is enabled on your node, the builder automatically recognizes the secondary available architectures and builds for them as well.
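</p> <p>Before choosing a strategy, it can help to check which platforms a given builder can already target by inspecting it (the builder name below is just an example):<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code># hypothetical example: start the builder and show its capabilities
docker buildx inspect mybuild --bootstrap
# the "Platforms:" line of the output lists every architecture this builder can target
</code></pre> </div> <p>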
QEMU can be installed with Docker as simply as running this command:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>docker run <span class="nt">--privileged</span> <span class="nt">--rm</span> tonistiigi/binfmt <span class="nt">--install</span> all </code></pre> </div> <p>And the builder instances will be able to use it. </p> <p>You can also encounter more complicated cases where QEMU is not sufficient. In those cases you can either build on multiple nodes, like in this example:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>docker buildx create <span class="nt">--use</span> <span class="nt">--name</span> mybuild node-amd64 docker buildx create <span class="nt">--append</span> <span class="nt">--name</span> mybuild node-arm64 docker buildx build <span class="nt">--platform</span> linux/amd64,linux/arm64 <span class="nb">.</span> </code></pre> </div> <p>Or you can use cross-compilation stages in your Dockerfile:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight docker"><code><span class="c"># syntax=docker/dockerfile:1</span> <span class="k">FROM</span><span class="w"> </span><span class="s">--platform=$BUILDPLATFORM golang:alpine</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="s">build</span> <span class="k">ARG</span><span class="s"> TARGETPLATFORM</span> <span class="k">ARG</span><span class="s"> BUILDPLATFORM</span> <span class="k">RUN </span><span class="nb">echo</span> <span class="s2">"I am running on </span><span class="nv">$BUILDPLATFORM</span><span class="s2">, building for </span><span class="nv">$TARGETPLATFORM</span><span class="s2">"</span> <span class="o">&gt;</span> /log <span class="k">FROM</span><span class="s"> alpine</span> <span class="k">COPY</span><span class="s"> --from=build /log /log</span> </code></pre> </div> <p>We will stop here for this article, but in the next one we will go through common
<code>buildx</code> commands and how they work🥰.</p> <blockquote> <p><em>The content for this article is mainly based on <a href="proxy.php?url=https://github.com/docker/buildx" rel="noopener noreferrer"><code>docker/buildx</code></a> GitHub repo: make sure to visit them and give them a star!⭐</em></p> </blockquote> docker devops beginners tutorial 1MinDocker #6 - Building further Clelia (Astra) Bertelli Tue, 12 Nov 2024 02:35:10 +0000 https://dev.to/astrabert/1mindocker-6-building-further-39al https://dev.to/astrabert/1mindocker-6-building-further-39al <p>In <a href="proxy.php?url=https://dev.to/astrabert/1mindocker-5-build-and-push-a-docker-image-1kpm">the last article</a> we saw how to build an image from scratch and we introduced several keywords to work with Dockerfiles. </p> <p>We will now try to understand how to take our building capacity to the next level, adding more complexity and more layers to our images.</p> <h3> Case study </h3> <p>Imagine that we want to build an image to run our data analysis pipelines written in python and R.</p> <p>To manage python and R dependencies separately we can wrap them inside <a href="proxy.php?url=https://docs.conda.io/projects/conda/en/latest/index.html" rel="noopener noreferrer">conda</a> environments.</p> <p>Conda is a great tool for environment management, but is often outpaced by <a href="proxy.php?url=https://mamba.readthedocs.io/en/latest/installation/mamba-installation.html" rel="noopener noreferrer">mamba</a> in some operations such as environment creation and installation.</p> <p>We will then use conda to organize and run the environments, while mamba will create them and install what's needed.</p> <p>Let's say we need the following packages for python data analysis:</p> <ul> <li><a href="proxy.php?url=https://pandas.pydata.org/" rel="noopener noreferrer">pandas</a></li> <li><a href="proxy.php?url=https://pola.rs/" rel="noopener noreferrer">polars</a></li> <li><a href="proxy.php?url=https://numpy.org/" 
rel="noopener noreferrer">numpy</a></li> <li><a href="proxy.php?url=https://scikit-learn.org/stable/" rel="noopener noreferrer">scikit-learn</a></li> <li><a href="proxy.php?url=https://scipy.org" rel="noopener noreferrer">scipy</a></li> <li><a href="proxy.php?url=https://matplotlib.org/" rel="noopener noreferrer">matplotlib</a></li> <li><a href="proxy.php?url=https://seaborn.pydata.org/" rel="noopener noreferrer">seaborn</a></li> <li><a href="proxy.php?url=https://plotly.com/python/" rel="noopener noreferrer">plotly</a></li> </ul> <p>And we need the following for our R data analysis:</p> <ul> <li><a href="proxy.php?url=https://dplyr.tidyverse.org/" rel="noopener noreferrer">dplyr</a></li> <li><a href="proxy.php?url=https://ggplot2.tidyverse.org/" rel="noopener noreferrer">ggplot2</a></li> <li><a href="proxy.php?url=https://tidyr.tidyverse.org/" rel="noopener noreferrer">tidyr</a></li> <li><a href="proxy.php?url=https://topepo.github.io/caret/" rel="noopener noreferrer">caret</a></li> <li><a href="proxy.php?url=https://purrr.tidyverse.org/" rel="noopener noreferrer">purrr</a></li> <li><a href="proxy.php?url=https://lubridate.tidyverse.org/" rel="noopener noreferrer">lubridate</a></li> </ul> <p>We store the environment creation and the installation of everything in this file called <code>conda_deps_1.sh</code> (find all the code for this article <a href="proxy.php?url=https://github.com/AstraBert/1minDocker/tree/master/code_snippets/build_an_image_2" rel="noopener noreferrer">here</a>); note that <code>micromamba create</code> needs the <code>-n</code> flag to name the new environment:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>eval "$(conda shell.bash hook)"

micromamba create \
    -n python_deps \
    -y \
    -c conda-forge \
    -c bioconda \
    python=3.10

conda activate python_deps

micromamba install \
    -y \
    -c bioconda \
    -c conda-forge \
    -c anaconda \
    -c plotly \
    pandas polars numpy scikit-learn scipy matplotlib seaborn plotly

conda deactivate

micromamba create \
    -n R \
    -y \
    -c conda-forge \
    r-base

conda activate R

micromamba install \
    -y \
    -c conda-forge \
    -c r \
    r-dplyr r-lubridate r-tidyr r-purrr r-ggplot2 r-caret

conda deactivate
</code></pre> </div> <p>From these premises, we will build our data science Docker image. </p> <h3> Building on top of the building </h3> <p>We are very lucky with mamba and conda, because they both provide a Docker image for their smaller, lightweight versions, <a href="proxy.php?url=https://hub.docker.com/r/mambaorg/micromamba" rel="noopener noreferrer">micromamba</a> and <a href="proxy.php?url=https://hub.docker.com/r/conda/miniconda3/" rel="noopener noreferrer">miniconda</a>. </p> <p>We then want to combine micromamba with miniconda, but how? We can exploit a feature in Docker builds, which is basically the same as "building on top of a building": we start with an image as base, we copy the most important things from there to our actual image and then we continue building on top of it.
</p> <p>The syntax may be as follows:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight docker"><code><span class="k">FROM</span><span class="w"> </span><span class="s">author/image1:tag</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="s">base</span> <span class="k">FROM</span><span class="s"> author/image2:tag</span> <span class="k">COPY</span><span class="s"> --from=base /usr/local/bin/* /usr/local/bin/</span> </code></pre> </div> <p>This means that, from <code>image1</code> (aliased <code>base</code>), we take only the files stored under <code>/usr/local/bin</code> and place them in <code>image2</code>. </p> <p>In our case, it would be:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight docker"><code><span class="k">ARG</span><span class="s"> CONDA_VER=latest</span> <span class="k">ARG</span><span class="s"> MAMBA_VER=latest</span> <span class="k">FROM</span><span class="w"> </span><span class="s">mambaorg/micromamba:${MAMBA_VER}</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="s">mambabase</span> <span class="k">FROM</span><span class="s"> conda/miniconda3:${CONDA_VER} </span> <span class="k">COPY</span><span class="s"> --from=mambabase /usr/bin/micromamba /usr/bin/</span> </code></pre> </div> <p>We copied <code>micromamba</code> from its original location into our image.</p> <h3> Install environments </h3> <p>We can now copy <code>conda_deps_1.sh</code> into our build and execute it:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight docker"><code><span class="k">WORKDIR</span><span class="s"> /data_science/</span> <span class="k">RUN </span><span class="nb">mkdir</span> <span class="nt">-p</span> /data_science/installations/ <span class="k">COPY</span><span class="s"> ./conda_deps_1.sh /data_science/installations/</span> <span class="k">RUN </span>bash /data_science/installations/conda_deps_1.sh
</code></pre> </div> <p>But let's say we also want to provide our image with an environment for AI development that we only want to add to our build if the user specifies it at build time.</p> <p>In this case, we can use <code>if...else</code> conditional statements in our Dockerfile!</p> <p>We will create another file, <code>conda_deps_2.sh</code>, with a Python environment for AI development in which we will put some base packages such as:</p> <ul> <li><a href="proxy.php?url=https://huggingface.co/docs/transformers/en/index" rel="noopener noreferrer">transformers</a></li> <li><a href="proxy.php?url=https://pytorch.org/" rel="noopener noreferrer">pytorch</a></li> <li><a href="proxy.php?url=https://www.tensorflow.org/learn" rel="noopener noreferrer">tensorflow</a></li> <li> <a href="proxy.php?url=https://www.langchain.com/" rel="noopener noreferrer">langchain</a>, langchain-community, langchain-core</li> <li> <a href="proxy.php?url=https://gradio.app" rel="noopener noreferrer">gradio</a> </li> </ul> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>eval "$(conda shell.bash hook)"

micromamba create \
    -n python_ai \
    -y \
    -c conda-forge \
    -c bioconda \
    python=3.11

conda activate python_ai

micromamba install \
    -y \
    -c conda-forge \
    -c pytorch \
    transformers pytorch tensorflow langchain langchain-core langchain-community gradio

conda deactivate
</code></pre> </div> <p>Now we just add a condition to our
Dockerfile:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight docker"><code><span class="k">ARG</span><span class="s"> BUILD_AI="False"</span> <span class="k">RUN if</span> <span class="o">[</span> <span class="s2">"</span><span class="nv">$BUILD_AI</span><span class="s2">"</span> <span class="o">=</span> <span class="s2">"True"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then </span>bash /data_science/installations/conda_deps_2.sh<span class="p">;</span> <span class="se">\ </span> <span class="k">elif</span> <span class="o">[</span> <span class="s2">"</span><span class="nv">$BUILD_AI</span><span class="s2">"</span> <span class="o">=</span> <span class="s2">"False"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then </span><span class="nb">echo</span> <span class="s2">"No AI environment will be built"</span><span class="p">;</span> <span class="se">\ </span> <span class="k">else </span><span class="nb">echo</span> <span class="s2">"BUILD_AI should be either True or False: you passed an invalid value, thus no AI environment will be built"</span><span class="p">;</span> <span class="k">fi</span> </code></pre> </div> <h3> Building and its options </h3> <p>Now let's take a look at the complete Dockerfile:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight docker"><code><span class="k">ARG</span><span class="s"> CONDA_VER=latest</span> <span class="k">ARG</span><span class="s"> MAMBA_VER=latest</span> <span class="k">FROM</span><span class="w"> </span><span class="s">mambaorg/micromamba:${MAMBA_VER}</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="s">mambabase</span> <span class="k">FROM</span><span class="s"> conda/miniconda3:${CONDA_VER} </span> <span class="k">COPY</span><span class="s"> --from=mambabase /usr/bin/micromamba /usr/bin/</span> <span class="k">WORKDIR</span><span class="s"> /data_science/</span> <span class="k">RUN </span><span
class="nb">mkdir</span> <span class="nt">-p</span> /data_science/installations/ <span class="k">COPY</span><span class="s"> ./conda_deps_?.sh /data_science/installations/</span> <span class="k">RUN </span>bash /data_science/installations/conda_deps_1.sh <span class="k">ARG</span><span class="s"> BUILD_AI="False"</span> <span class="k">RUN if</span> <span class="o">[</span> <span class="s2">"</span><span class="nv">$BUILD_AI</span><span class="s2">"</span> <span class="o">=</span> <span class="s2">"True"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then </span>bash /data_science/installations/conda_deps_2.sh<span class="p">;</span> <span class="se">\ </span> <span class="k">elif</span> <span class="o">[</span> <span class="s2">"</span><span class="nv">$BUILD_AI</span><span class="s2">"</span> <span class="o">=</span> <span class="s2">"False"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then </span><span class="nb">echo</span> <span class="s2">"No AI environment will be built"</span><span class="p">;</span> <span class="se">\ </span> <span class="k">else </span><span class="nb">echo</span> <span class="s2">"BUILD_AI should be either True or False: you passed an invalid value, thus no AI environment will be built"</span><span class="p">;</span> <span class="k">fi</span> <span class="k">CMD</span><span class="s"> ["/bin/bash"]</span> </code></pre> </div> <p>We can build our image tweaking the <code>--build-arg</code> values as we please:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code><span class="c"># BUILD THE IMAGE AS-IS</span> docker build <span class="nb">.</span> <span class="se">\</span> <span class="nt">-t</span> YOUR-USERNAME/data-science:latest-noai <span class="c"># BUILD THE IMAGE WITH AI ENV</span> docker build <span class="nb">.</span> <span class="se">\</span> <span class="nt">--build-arg</span> <span class="nv">BUILD_AI</span><span class="o">=</span><span class="s2">"True"</span>
<span class="se">\</span> <span class="nt">-t</span> YOUR-USERNAME/data-science:latest-ai <span class="c"># BUILD THE IMAGE WITH A DIFFERENT VERSION OF MICROMAMBA</span> docker build <span class="nb">.</span> <span class="se">\</span> <span class="nt">--build-arg</span> <span class="nv">MAMBA_VER</span><span class="o">=</span><span class="s2">"cuda12.1.1-ubuntu22.04"</span> <span class="se">\</span> <span class="nt">-t</span> YOUR-USERNAME/data-science:mamba-versioned </code></pre> </div> <p>Then you can proceed and push the image to Docker Hub or to another registry as we saw in the last article.</p> <p>You can now run your image interactively, also loading your pipelines as a volume, and activate all the environments as you please:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>docker run <span class="se">\</span> <span class="nt">-i</span> <span class="se">\</span> <span class="nt">-t</span> <span class="se">\</span> <span class="nt">-v</span> /home/user/datascience/pipelines/:/app/pipelines/ <span class="se">\</span> YOUR-USERNAME/data-science:latest-noai <span class="se">\</span> <span class="s2">"/bin/bash"</span> <span class="c"># execute the following commands inside the container</span> <span class="nb">source </span>activate python_deps conda deactivate <span class="nb">source </span>activate R conda deactivate </code></pre> </div> <p>We will stop here for this article, but in the next one we will dive into how to use the <code>buildx</code> plugin!🥰 </p> docker devops beginners tutorial 1minDocker #5 - Build and push a Docker image Clelia (Astra) Bertelli Mon, 04 Nov 2024 21:22:11 +0000 https://dev.to/astrabert/1mindocker-5-build-and-push-a-docker-image-1kpm https://dev.to/astrabert/1mindocker-5-build-and-push-a-docker-image-1kpm <p>In the <a href="proxy.php?url=https://dev.to/astrabert/1mindocker-4-docker-cli-essentials-33pl">last article</a> we saw how we can pull an image, run it inside a container, list images
and containers and remove them: now it's time to build, so we'll create our first simple Docker image.</p> <h3> The Dockerfile </h3> <p>As we already said in our <a href="proxy.php?url=https://dev.to/astrabert/1mindocker-3-fundamental-concepts-55ph">conceptual introduction to Docker</a>, a Dockerfile is a sort of recipe: it contains all the instructions to collect the ingredients (the <em>image</em>) that will make the cake (the <em>container</em>). </p> <p>But what exactly can a Dockerfile contain? We will see, in our example (that you can find <a href="proxy.php?url=https://github.com/AstraBert/1minDocker/tree/master/code_snippets/build_an_image_1" rel="noopener noreferrer">here</a>), the following base keywords:</p> <ul> <li> <code>FROM</code>: this keyword is fundamental. It specifies the base image from which we mount our environment</li> <li> <code>RUN</code>: with this keyword you can specify a command (like <code>RUN python3 -m pip install --no-cache-dir -r requirements.txt</code>) that will be executed during <em>build time</em> (only once) and will be stored in an image layer</li> <li> <code>WORKDIR</code>: you can specify the working directory that will be the base for your Docker image (for example <code>WORKDIR /app/</code>)</li> <li> <code>COPY</code> or <code>ADD</code>: These two keywords are very similar. Both copy local paths into a destination directory inside the image (like <code>COPY src/ /app/</code> or <code>ADD . /app/</code>), but <code>ADD</code> can additionally fetch remote URLs and automatically extract local tar archives</li> <li> <code>EXPOSE</code>: it declares the port on which the container listens (<code>EXPOSE 3000</code>)</li> <li> <code>ENTRYPOINT</code>: this keyword specifies the default executable that should be run when the image is launched in a container (<code>ENTRYPOINT ["npm", "start"]</code>).
It should be specified only once, at the end of your Dockerfile (otherwise the last <code>ENTRYPOINT</code> instance will override the previous ones). Although the <code>ENTRYPOINT</code> executable cannot be overridden by positional arguments provided through the CLI when we run the container (only by the dedicated <code>--entrypoint</code> flag), its arguments can be changed from the CLI upon container start.</li> <li> <code>CMD</code>: similar to <code>ENTRYPOINT</code>, this key word specifies a command that runs every time the image is started inside a container. Differently from <code>ENTRYPOINT</code>, though, it can be completely overridden and is generally used as a set of extra arguments for <code>ENTRYPOINT</code>, like here: </li> </ul> <div class="highlight js-code-highlight"> <pre class="highlight docker"><code><span class="k">ENTRYPOINT</span><span class="s"> [ "streamlit", "run" ]</span> <span class="k">CMD</span><span class="s"> [ "scripts/app.py" ]</span> </code></pre> </div> <p>In this case, every time we start the container we will run a Streamlit app, but we can choose the path of the app by providing it to the container from the <code>docker run</code> command line.</p> <ul> <li> <code>ARG</code>: this key word is used to set build arguments, which are local variables that can be overridden by others specified at build time with the <code>docker build</code> CLI. They're especially useful if you use a value more than once in your Dockerfile and don't want to repeat it: </li> </ul> <div class="highlight js-code-highlight"> <pre class="highlight docker"><code><span class="k">ARG</span><span class="s"> NODE_VERSION="20"</span> <span class="k">ARG</span><span class="s"> ALPINE_VERSION="3.20"</span> <span class="k">FROM</span><span class="s"> node:${NODE_VERSION}-alpine${ALPINE_VERSION}</span> </code></pre> </div> <p>This can be easily overridden by:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code>docker build . 
--build-arg NODE_VERSION="18" </code></pre> </div> <ul> <li> <code>ENV</code>: this key word, as the name suggests, sets an <em>environment</em> variable. Differently from build arguments, environment variables cannot be overridden from the <code>docker build</code> CLI, and they persist in the final image: they are useful when we want a variable to be accessible to all image build stages and to the running container.</li> </ul> <h3> Let's build a Dockerfile </h3> <p>To build a Dockerfile, we need to know what application we are going to ship through the image we're about to set up.</p> <p>In this tutorial, we will build a very simple Python application with <a href="proxy.php?url=https://gradio.app" rel="noopener noreferrer">Gradio</a>, a popular framework to build elegant and beautiful frontends for AI/ML Python apps.</p> <p>Our folder will look like this:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code>build_an_image_1/ |__ app.py |__ Dockerfile </code></pre> </div> <p>To fill up <code>app.py</code>, we will use a template that <a href="proxy.php?url=https://huggingface.com" rel="noopener noreferrer">Hugging Face</a> itself provides for Gradio ChatBot Spaces:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight python"><code><span class="kn">import</span> <span class="n">gradio</span> <span class="k">as</span> <span class="n">gr</span> <span class="k">def</span> <span class="nf">respond</span><span class="p">(</span> <span class="n">message</span><span class="p">,</span> <span class="n">history</span><span class="p">):</span> <span class="n">message_back</span> <span class="o">=</span> <span class="sa">f</span><span class="sh">"</span><span class="s">Your message is: </span><span class="si">{</span><span class="n">message</span><span class="si">}</span><span class="sh">"</span> <span class="n">response</span> <span class="o">=</span> <span class="sh">""</span> <span class="k">for</span> <span class="n">m</span> <span class="ow">in</span> <span class="n">message_back</span><span class="p">:</span> <span
class="n">response</span> <span class="o">+=</span> <span class="n">m</span> <span class="k">yield</span> <span class="n">response</span> <span class="n">demo</span> <span class="o">=</span> <span class="n">gr</span><span class="p">.</span><span class="nc">ChatInterface</span><span class="p">(</span> <span class="n">respond</span><span class="p">,</span> <span class="n">title</span><span class="o">=</span><span class="sh">"</span><span class="s">Echo Bot</span><span class="sh">"</span><span class="p">,</span> <span class="p">)</span> <span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="sh">"</span><span class="s">__main__</span><span class="sh">"</span><span class="p">:</span> <span class="n">demo</span><span class="p">.</span><span class="nf">launch</span><span class="p">(</span><span class="n">server_name</span><span class="o">=</span><span class="sh">"</span><span class="s">0.0.0.0</span><span class="sh">"</span><span class="p">,</span> <span class="n">server_port</span><span class="o">=</span><span class="mi">7860</span><span class="p">)</span> </code></pre> </div> <p>This is a simple bot that echoes every message we send. <br> We will just copy this code into our main script, <code>app.py</code>.</p> <p>Now we're ready to build our Docker image, starting with modifying our Dockerfile.</p> <h4> 1. 
The base image </h4> <p>For our environment we need Python 3, so we will need to find a suitable base image for that.</p> <p>Luckily, Python itself provides official images (Debian-based by default, with lighter variants based on Alpine, a minimal Linux distro), so we will just use <code>python:3.11.9</code>.</p> <p>We then just need to specify:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight docker"><code><span class="k">ARG</span><span class="s"> PYTHON_VERSION="3.11.9"</span> <span class="k">FROM</span><span class="s"> python:${PYTHON_VERSION}</span> </code></pre> </div> <p>At the very beginning of our Dockerfile.</p> <p>As we said, if we want a different Python version, we just need to run:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>docker build <span class="nb">.</span> <span class="nt">--build-arg</span> <span class="nv">PYTHON_VERSION</span><span class="o">=</span><span class="s2">"3.10.14"</span> </code></pre> </div> <h4> 2. Get the needed dependencies </h4> <p>Our app depends exclusively on <code>gradio</code>, so we can do a quick <code>pip install</code> for that!</p> <p>We also set the version (5.4.0) as an ARG and ENV:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight docker"><code><span class="k">ARG</span><span class="s"> GRADIO_V="5.4.0"</span> <span class="k">ENV</span><span class="s"> GRADIO_VERSION=${GRADIO_V}</span> <span class="k">RUN </span>python3 <span class="nt">-m</span> pip cache purge <span class="k">RUN </span>python3 <span class="nt">-m</span> pip <span class="nb">install </span><span class="nv">gradio</span><span class="o">==</span><span class="k">${</span><span class="nv">GRADIO_VERSION</span><span class="k">}</span> </code></pre> </div> <p>You cannot change <code>GRADIO_VERSION</code> directly, but you can pass <code>GRADIO_V</code> as a build argument, and the <code>ENV</code> value will be updated accordingly!<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>docker build 
<span class="nb">.</span> <span class="nt">--build-arg</span> <span class="nv">GRADIO_V</span><span class="o">=</span><span class="s2">"5.1.0"</span> </code></pre> </div> <h4> 3. Start the application </h4> <p>We need to start the application, something that we would normally do as <code>python3 app.py</code>.</p> <p>But our <code>app.py</code> file is locally stored, not available to the Docker image, so we need to copy it into our Docker working directory:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight docker"><code><span class="k">WORKDIR</span><span class="s"> /app/</span> <span class="k">COPY</span><span class="s"> ./app.py /app/</span> </code></pre> </div> <p>Since our application runs on <a href="proxy.php?url=http://0.0.0.0:7860" rel="noopener noreferrer">http://0.0.0.0:7860</a>, we need to expose port 7860:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight docker"><code><span class="k">EXPOSE</span><span class="s"> 7860</span> </code></pre> </div> <p>Keep in mind that <code>EXPOSE</code> only documents the port: to reach the app from the host you will still need to publish it with <code>-p 7860:7860</code> when running the container.</p> <p>Now we can make our application run:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight docker"><code><span class="k">ENTRYPOINT</span><span class="s"> ["python3"]</span> <span class="k">CMD</span><span class="s"> ["/app/app.py"]</span> </code></pre> </div> <p>We will not be able to change the base executable (<code>python3</code>) at runtime, short of using the <code>--entrypoint</code> flag, but we will be able to override the <code>CMD</code> instance by specifying another path at runtime (for example if we mount a volume while running the container).</p> <h4> 4. 
Full Dockerfile </h4> <p>Our full Dockerfile will look like this:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight docker"><code><span class="k">ARG</span><span class="s"> PYTHON_VERSION="3.11.9"</span> <span class="k">FROM</span><span class="s"> python:${PYTHON_VERSION}</span> <span class="k">WORKDIR</span><span class="s"> /app/</span> <span class="k">COPY</span><span class="s"> ./app.py /app/</span> <span class="k">ARG</span><span class="s"> GRADIO_V="5.4.0"</span> <span class="k">ENV</span><span class="s"> GRADIO_VERSION=${GRADIO_V}</span> <span class="k">RUN </span>python3 <span class="nt">-m</span> pip cache purge <span class="k">RUN </span>python3 <span class="nt">-m</span> pip <span class="nb">install </span><span class="nv">gradio</span><span class="o">==</span><span class="k">${</span><span class="nv">GRADIO_VERSION</span><span class="k">}</span> <span class="k">EXPOSE</span><span class="s"> 7860</span> <span class="k">ENTRYPOINT</span><span class="s"> ["python3"]</span> <span class="k">CMD</span><span class="s"> ["/app/app.py"]</span> </code></pre> </div> <p>Now we just need to build the image!</p> <h3> Build and push the image </h3> <p>When we build the image, we need to specify the <em>context</em>, meaning the directory whose contents are sent to the Docker daemon for the build (by default, the one containing our Dockerfile). 
For starters, we will also use the <code>-t</code> flag, which specifies the <em>name</em> and <em>tag</em> of our image:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>docker build <span class="nb">.</span> <span class="nt">-t</span> YOUR-USERNAME/gradio-echo-bot:0.0.0 <span class="nt">-t</span> YOUR-USERNAME/gradio-echo-bot:latest </code></pre> </div> <p>As you can see, you can specify multiple tags.</p> <p>This build, once launched, will take some minutes to complete, and then you will have your images locally!</p> <p>If you want to make these images available to everyone, you need to log in to your Docker account:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>docker login <span class="nt">-u</span> YOUR-USERNAME <span class="nt">--password-stdin</span> </code></pre> </div> <p>With <code>--password-stdin</code>, Docker reads the password from standard input instead of an interactive prompt: you can type it and close the input with Ctrl+D, or pipe it in from a file. </p> <p>You won't use your Docker password here, but an <a href="proxy.php?url=https://docs.docker.com/security/for-developers/access-tokens/#create-an-access-token" rel="noopener noreferrer">access token</a> (follow the link for a guide on how to obtain it). </p> <p>Now let's push our image to the Docker Hub registry:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>docker push YOUR-USERNAME/gradio-echo-bot:0.0.0 docker push YOUR-USERNAME/gradio-echo-bot:latest </code></pre> </div> <p>The push generally takes some time, but after that our image will be live on Docker Hub: we published our first Docker image!🎉</p> <p>We will stop here for this article, but in the next one we will dive into more advanced build concepts🥰 </p> docker devops beginners tutorial
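<p>To close the loop, here is a short sketch of how you might run the freshly built image locally (the image name and tags are the ones assumed in the build step above; <code>other_app.py</code> is a hypothetical replacement script). Since <code>EXPOSE</code> only documents port 7860, we publish it explicitly with <code>-p</code>, and the second command shows how the <code>CMD</code> path can be overridden at run time while the <code>ENTRYPOINT</code> (<code>python3</code>) stays fixed:</p>

```shell
# Run the image we built and tagged above; -p actually publishes the port
# that EXPOSE 7860 merely documents, so the bot becomes reachable from the
# host at http://localhost:7860
docker run \
  --rm \
  -p 7860:7860 \
  YOUR-USERNAME/gradio-echo-bot:latest

# Override the CMD path (but not the python3 ENTRYPOINT) with a different
# script, here a hypothetical other_app.py mounted as a volume
docker run \
  --rm \
  -p 7860:7860 \
  -v "$(pwd)/other_app.py":/app/other_app.py \
  YOUR-USERNAME/gradio-echo-bot:latest \
  /app/other_app.py
```

<p>Note that <code>--rm</code> simply cleans up the container once it stops; it is optional and not part of the original build instructions.</p>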