[{"categories":["posts"],"content":" This blog was originally published on Spectro Cloud’s offical blog site. Click here for a direct link to the original blog. Like many organizations, we here at Spectro Cloud are exploring how Artificial Intelligence (AI) can help our teams be more productive and improve the experience we give our customers. One clear barrier emerges time and again: agentic development can be challenging to debug and analyze due to the ‘black box’ nature of this technology. In fact, as an industry we struggle to audit the decision-making process of agentic workflows. But that doesn’t mean we shrug and give up. We hold ourselves to high expectations and standards here at Spectro Cloud! For us to release an AI capability in our product, we must first have a deep understanding of how the AI is behaving and attempt to answer critical questions such as “Why did it choose that tool? Why did it take that action? What information did it have at that point in time?” and so on. Only through the understanding of behavioral questions can we produce agentic solutions that provide value — and reliability. To help fellow builders and those pursuing agentic workflows, this blog shares how we’re working to improve our understanding of agentic workflows through observability (O11y), using an example AI application to debug an incorrect output. Open-source FTW Commercial platforms exist that provide you with built-in capabilities to ease the development of agentic workflows, which may include observability out of the box. However, not all agentic development platforms offer the same amount of information, and you sometimes have to stitch things together yourself to understand what’s really happening. One open-source solution we stumbled upon early in our journey to building an understanding of our agentic workflows was Arize Phoenix. Phoenix is an observability platform focusing on helping builders answer questions related to their AI applications. 
Getting started is easy: with a single Docker command, you can stand up the observability platform. docker run --rm -p 6006:6006 --name phoenix arizephoenix/phoenix:latest Once the container is ready, you can access the Phoenix dashboard and view traces. Below is an image of a freshly initialized Phoenix dashboard hosted on localhost port 6006. Enabling tracing Depending on your agentic framework (assuming you are using a framework), Phoenix supports many integrations that let you get started in seconds with a few one-liners. Check out its integrations page to view all supported integrations. If you are not using a framework, you can use the SDK to focus on tracing calls to the LLM. You can also add custom functions to define the start and stop of tracing spans. We prefer the latter, as we can inject richer context into the span data by managing the span lifecycle ourselves, but that’s a bit more advanced. Using the Smolagents framework as an example, the steps to get started are: Install the required Python dependencies. pip install arize-phoenix-otel \u0026\u0026 \\ pip install openinference-instrumentation-smolagents smolagents Point to where Phoenix is listening, such as localhost:6006. The code snippet below would go into your LLM application. import os os.environ[\"PHOENIX_COLLECTOR_ENDPOINT\"] = \"http://localhost:6006\" Register the instrumentation in your Smolagents LLM application. from phoenix.otel import register tracer_provider = register( project_name=\"my-llm-app\", # Default is 'default' auto_instrument=True ) That wraps up the steps to get started with Smolagents. There are many more customization options available and advanced features you can enable. The main takeaway from the example is that enabling tracing can be done very quickly, especially if your framework has first-class integration support. Making sense of a trace Curious readers may wonder at this point what a trace looks like. 
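The custom span-lifecycle approach mentioned above (managing spans ourselves rather than relying purely on auto-instrumentation) can be illustrated with a toy, standard-library-only sketch. Note that the ToyTracer class and its attribute names are hypothetical stand-ins for illustration, not the Phoenix or OpenTelemetry API:

```python
import time
from contextlib import contextmanager

# Toy tracer that records spans in memory. Real code would use the
# OpenTelemetry tracer provider returned by phoenix.otel.register().
class ToyTracer:
    def __init__(self):
        self.spans = []

    @contextmanager
    def span(self, name, **attributes):
        # Opening the span ourselves lets us attach arbitrary context
        # (tool name, model, prompt version) before any work happens.
        record = {"name": name, "attributes": dict(attributes)}
        start = time.monotonic()
        try:
            yield record
        finally:
            record["duration_s"] = time.monotonic() - start
            self.spans.append(record)

tracer = ToyTracer()
with tracer.span("llm-call", model="gpt-4o-mini", tool="search") as s:
    s["attributes"]["tokens"] = 42  # enrich the span mid-flight

print(tracer.spans[0]["name"])  # llm-call
```

The payoff of this pattern is the `finally` block: the span is recorded with its duration even if the wrapped LLM call raises, which is exactly the failure case you most want visible in a trace.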
What information can we glean from a trace? To answer this question, we’ll use an example applicati","date":"2025-09-04","objectID":"/posts/using-observability-to-trace-agentic-ai-workflow-decisions/:0:0","tags":["observability","agentic-ai","ai"],"title":"Using observability to trace agentic AI workflow decisions","uri":"/posts/using-observability-to-trace-agentic-ai-workflow-decisions/"},{"categories":["Posts"],"content":"Learn how to build custom applications for Tidbyt, a smart display that shows you the information you care about.","date":"2024-10-22","objectID":"/posts/tidbyt/","tags":["tidbyt","projects","starlark"],"title":"Tidbyt Application Development Fundamentals","uri":"/posts/tidbyt/"},{"categories":["Posts"],"content":"I recently purchased a Tidbyt second-generation smart display. I was immediately intrigued by the idea of creating custom applications that could present fun messages or images to those who walk by my office or interact with me on a Zoom call. The first idea that came to mind was to create a clock that rotated through different Spectro Cloud graphics. Why Spectro Cloud? That’s where I work, so having the logo displayed on my Tidbyt would be a fun way to surprise my coworkers during Zoom calls. Below is a preview of the final product. The clock rotates through different Spectro Cloud graphics every 60 seconds and supports a 24-hour format. I want to share the lessons I learned along the way. This post will help you create custom Tidbyt applications and share them with the world. There are so many possibilities with Tidbyt, and the official application store already has a wide variety of applications to choose from. Think of this article as a supplemental guide to the official documentation. Don’t have a Tidbyt? No problem You don’t need a physical Tidbyt to start building applications. You can use the Pixlet CLI to start a local development server and preview your applications in a web browser. Where to Start? 
The first step is to install Pixlet, the Tidbyt development tool. The Installing Pixlet guide provides detailed instructions on how to install Pixlet. Once you have Pixlet installed, create a new project by issuing the following command: mkdir my-starter-app \u0026\u0026 cd my-starter-app \u0026\u0026 pixlet create my-starter-app You will be prompted for an application name, description, and summary. After you provide the information, Pixlet will create a new directory with the project files. Name (what do you want to call your app?): Sup World Summary (what's the short and sweet of what this app does?): Demo app Description (what's the long form of what this app does?): Demo app. Author (your name or your Github handle): Karl Cardenas App created at: /Users/karlcardenas/projects/tidbyt/my-starter-app/sup_world.star To start the app, run: pixlet serve sup_world.star For docs, head to: https://tidbyt.dev Pixlet will generate two new files, sup_world.star and manifest.yaml. The sup_world.star file is where you will write your application code, and the manifest.yaml file is where you will define the application metadata. Below is the content of the sup_world.star and manifest.yaml files. --- id: sup-world name: Sup World summary: Demo app desc: Demo App. author: Karl Cardenas \"\"\" Applet: Sup World Summary: Demo app Description: Demo App. Author: Karl Cardenas \"\"\" load(\"render.star\", \"render\") load(\"schema.star\", \"schema\") DEFAULT_WHO = \"world\" def main(config): who = config.str(\"who\", DEFAULT_WHO) message = \"Hello, {}!\".format(who) return render.Root( child = render.Text(message), ) def get_schema(): return schema.Schema( version = \"1\", fields = [ schema.Text( id = \"who\", name = \"Who?\", desc = \"Who to say hello to.\", icon = \"user\", ), ], ) The next step is to start the application by running the following command: pixlet serve sup_world.star Pixlet will start a local development server on http://localhost:8080. 
You can preview the application in a web browser by navigating to http://localhost:8080. Press Ctrl + C to stop the development server. If you are on a Mac, you can press Cmd + C. You will start and stop the development server multiple times throughout the development process. Core Concepts Now that you have an application available and understand how to start the development server, let’s explore the core concepts of Tidbyt applications. Programming Language Tidbyt applications are written in Starlark, which was designed to manage and maintain configuration for the Bazel build system. The syntax may look familiar if you have experience with Python. Starlark Implementation Tidbyt uses the Go implementation of Starlark. The best resource I found for understanding Starlark’s capabilities is the Starlark Language Sp","date":"2024-10-22","objectID":"/posts/tidbyt/:0:0","tags":["tidbyt","projects","starlark"],"title":"Tidbyt Application Development Fundamentals","uri":"/posts/tidbyt/"},{"categories":["projects"],"content":" js-to-HTMX This is a project to showcase the power of HTMX. I created this small sample application for an internal engineering presentation at Spectro Cloud. The application is available in two versions: a React version and another version using HTMX and Go. The application queries a public API endpoint for the latest prices on Bitcoin, Ethereum, and the stablecoin USDC, ensuring you’re always up to date. It then presents the prices and updates them every 10 seconds. Additionally, a news section is displayed, which queries a public news API for the latest news. Architecture The architecture of the application is different for each application version. The React version uses a client-side rendering, Single Page Application (SPA) approach. In contrast, the HTMX version uses server-side rendering. 
HTMX favors server-side rendering primarily because it leverages existing web functionality, browser features, and hypermedia controls, so it doesn’t need custom JavaScript to achieve desired UX behaviors. However, using a server-side rendering architecture is not a requirement. The application can use HTMX to render the page if the API endpoint returns HTML. Get Started 🚀 Head over to the Start Applications section in the repository for instructions on how to start the applications with Docker. Repository GitHub ","date":"2024-08-15","objectID":"/projects/js-to-htmx/:0:0","tags":["projects","JavaScript","Go","HTMX","web development"],"title":"js-to-HTMX","uri":"/projects/js-to-htmx/"},{"categories":["projects"],"content":" MyWhoop MyWhoop is a tool intended to help you take ownership of your Whoop data. You can use MyWhoop to download all your Whoop data. You can also use MyWhoop as a server to automatically download your new Whoop data daily. MyWhoop is designed to be deployed on your own machine or server. MyWhoop supports the following features: 🔐 Login: A simple interface to log into the Whoop developer portal and save an authentication token locally. The token is required to interact with the Whoop API. 🗄️ Server: Automatically download your Whoop data daily and save it to a local file or export it to a remote location. 📬 Notifications: Receive notifications when new data is available or when an error occurs. 💾 Data Export: Export your Whoop data to a remote location like an S3 bucket. 🗂️ Extensions: Data exporters and notification services can be extended to support additional use cases. Check out the Extensions section to learn more. 📦 No Dependencies: MyWhoop is available as a stand-alone binary or Docker image. No additional software is required to get started. Get Started 🚀 Please check out the Getting Started guide to get started with MyWhoop. 
Repository GitHub ","date":"2024-07-28","objectID":"/projects/mywhoop/:0:0","tags":["projects","Whoop","Go","CLI","HTMX"],"title":"MyWhoop","uri":"/projects/mywhoop/"},{"categories":["posts"],"content":"This blog was originally published on Spectro Cloud’s offical blog site. Click here for a direct link to the original blog. Rapid innovation — great for the product, a challenge for docs! Palette is a powerful, versatile product and it changes fast. We issue major releases multiple times per year and there are always minor changes to things like our supported packs and environments. This is a good thing: it means we’re keeping pace with the needs of customers like you, and with the innovation of the cloud-native ecosystem. But it creates a challenge for our documentation , too. Users depend on documentation to navigate the many features of Palette, and to get up to speed with what’s changed in each new version. At the time of this writing, our docs code base has over 435 markdown pages containing product information, and it continues to grow. We love a challenge, and our docs team is always looking at ways to improve not only the depth and quality of the content on our docs site, but also how easy it is to consume. For example, you may have already read about our Mendable chatbot integration in a previous blog. Setting out on the content versioning mission In late July 2023, we started the groundwork for a much more fundamental innovation: enabling content versioning on our documentation site. Before that, we had to manage all of our documentation data together in a single git branch, clarifying what content applied to what product version in various parts of the documentation. This was difficult for us to manage, but more importantly it made it challenging for our customers to identify what information pertained to specific product versions. 
This matters, because we give Palette customers using dedicated or self-hosted instances the freedom to stick with older versions of the product until they’re ready to upgrade. Our goal was to implement a way for users to select the Palette version they’re using, and have the docs site show only the information relevant to that version. Versioning is an ‘advanced level’ practice — but for us we knew it was the right approach. In this blog, we’re going to show you how we implemented content versioning with Docusaurus, using git as the source of truth for all versioned content. We’ll also share our solution and explain it in detail. If you’re a Palette customer, we hope you’ll find this an interesting behind-the-scenes tour of a feature that’s valuable to you. And if you or your team are looking at content versioning in your own documentation, we hope this article helps you consider using git as the source of truth for all versioned content versus the default Docusaurus behavior of duplicating folders. To enhance the value of this article to the open source community, and to facilitate your understanding of the concepts, we have included a public GitHub repository with all the content code. This repository serves as a practical tool for you to explore and try out our solution. Simply head over to GitHub, click on the green ‘Use this template’ button, and embark on the technical adventure we are about to share. PS: Review the README in the repository for additional information, such as the FAQ. The foundation: docs as code At Spectro Cloud, we follow a Docs as Code philosophy. This means we use the same tooling to create and maintain our documentation as you would use to create, test, deploy, and maintain an application. By following engineering best practices, we are able to deliver complex documentation reliably. We already used the best open-source documentation framework, Docusaurus, with our codebase in a git repository. 
Our vision for enabling content versioning was to use different git branches for each minor product release. Like any engineering effort, we started our versioning project by defining our requirements, both for our users and for our technical authors: User requirements It’s easy to find documentation that applies to a specific version of Palette. I can ea","date":"2024-04-10","objectID":"/posts/when-docs-and-a-dinosaur-git-along/:0:0","tags":["devops","docs-as-code","docusaurus","automation"],"title":"When docs and a dinosaur Git along: enabling versioning in Docusaurus","uri":"/posts/when-docs-and-a-dinosaur-git-along/"},{"categories":["posts"],"content":"Innovating the docs experience Like most in the cloud-native community, we’re passionate about great documentation. We’ve felt the joy of discovering a clear, thoughtful guide that helps us get up and running with a new project in our homelabs… and we’ve certainly felt the pain that bad technical writing can cause. So you can understand why our docs team here at Spectro Cloud is always working to improve the experience we offer our users (that’s you). Of course, that includes expanding and refining our content itself — but also innovating how we make that content accessible. SpectroMate is an open-source project I created while working at Spectro Cloud. SpectroMate is an API server with extended functionality designed for Slack integration in the form of a bot. You can use SpectroMate to handle slash commands and message actions. This article was originally published in Spectro Cloud’s blog. Click on the Article Link to read the original source. Ready for an AI docs chatbot? Given all the buzz about ChatGPT (not least the inspirational work that our open source team has been doing with LocalAI), of course we’ve been investigating how we can apply AI innovation and LLMs to the docs experience. 
To that end, we’ve partnered with Mendable.ai, who offer state-of-the-art AI chat search technology, which we used to create a chatbot for our product documentation. Once we implement the chatbot on our docs site (it’s in testing right now), you’ll be able to ask it a product question and get an accurate answer pulled from our knowledge base. Having an intelligent chatbot for product documentation is a great feature, but we knew it wasn’t enough. Some of the heaviest power users of our docs, and the ones helping train and refine the chatbot, are our internal architects and engineers. We noticed that many of the technical questions and discussions they have happen in Slack channels. Instead of navigating out of Slack and visiting our docs site, wouldn’t it be great if our team members could find the information they need, right there in the chat? And this is where SpectroMate comes in. SpectroMate: bringing Mendable (and more) to Slack SpectroMate is a new application that we’ve built to bring the Mendable chat experience into Slack. Essentially, it’s an API server with extended functionality designed for Slack integration. It comes with out-of-the-box support for Mendable.ai chatbots, but can easily be extended. We’ve been using SpectroMate internally for several weeks, and it’s running great. Users on our internal company Slack can ask a docs question to the Mendable chatbot just by using the /docs ask command in any Slack channel. Already SpectroMate has helped us improve and fine-tune the Mendable model based on internal usage feedback, all from the comfort of Slack. This gives us a higher degree of confidence in our language model before opening it up to the general public. SpectroMate is open source! We ♥️ OSS. So we’re excited to announce we’ve made SpectroMate open source. If you want to kick the tires, you can use Terraform to deploy it to our Palette Dev Engine environment in under five minutes by following the Getting Started guide. 
Just be sure to fill in our form to request access to Palette first! SpectroMate is available as a standalone binary, or you can use the Docker image to deploy the application to any computing environment. If you prefer to deploy SpectroMate to your own Kubernetes environment, you can do so by following the Kubernetes deployment instructions. The out-of-the-box support for Mendable will be helpful if you want to follow in our footsteps and expose your documentation as an AI chatbot to internal users or community Slack workspaces. You can customize SpectroMate by adding new API endpoints, new slash commands, or new functionality for your Slack workspace. Fork the GitHub project and start playing! Of course, we also welcome contributions. SpectroMate is written in Go and is designed to be simple to","date":"2023-06-15","objectID":"/posts/spectromate/:0:0","tags":["open-source","open-source","api-server","go"],"title":"SpectroMate - An Open-Source Slack integration with Mendable","uri":"/posts/spectromate/"},{"categories":["projects","api-server"],"content":" SpectroMate is an open-source project I created while working at Spectro Cloud. SpectroMate is an API server with extended functionality designed for Slack integration in the form of a bot. You can use SpectroMate to handle slash commands and message actions. You can also use SpectroMate to handle non-Slack-related events by creating API endpoints for other purposes. SpectroMate comes with out-of-the-box support for Mendable. You can use your Mendable-trained model to answer documentation-related questions by using the /docs ask slash command. To learn more about SpectroMate, check out the GitHub repository or check out the release blog. 
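To make the slash-command handling concrete: Slack delivers slash commands as form-encoded POST bodies carrying fields such as `command` and `text`. The routing function below is an illustrative, standard-library-only sketch of that flow, not SpectroMate's actual Go implementation:

```python
from urllib.parse import parse_qs

# Slack sends slash commands as an application/x-www-form-urlencoded POST.
# "command" and "text" are fields from Slack's payload; the routing logic
# is a hypothetical sketch of how a bot like this dispatches them.
def route_slash_command(body: str) -> str:
    form = {k: v[0] for k, v in parse_qs(body).items()}
    command = form.get("command", "")
    text = form.get("text", "")
    if command == "/docs" and text.startswith("ask"):
        question = text[len("ask"):].strip()
        return f"Querying the docs bot: {question}"
    return f"Unknown command: {command} {text}".strip()

# Example payload (abbreviated) as Slack would send it:
body = "command=%2Fdocs&text=ask+how+do+I+upgrade%3F&user_id=U123"
print(route_slash_command(body))  # Querying the docs bot: how do I upgrade?
```

A real handler would also verify Slack's request signature before trusting the payload; that step is omitted here for brevity.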
Source Github Source ","date":"2023-04-24","objectID":"/projects/spectromate/:0:0","tags":["open-source","slack","spectroCloud","go","ai"],"title":"SpectroMate","uri":"/projects/spectromate/"},{"categories":["presentation"],"content":"In this episode of Consul office hours, I provide an overview of the Chaos engineering tutorial I authored. ","date":"2022-02-09","objectID":"/presentations/chaos-engineering/:0:0","tags":["presentation","consul","office-hours","hashicorp"],"title":"HashiCorp Consul Office Hours","uri":"/presentations/chaos-engineering/"},{"categories":["projects","CLI"],"content":"SawyerBrink is an open-source risk management platform that is deployed through a Terraform module. API Architecture Frontend Architecture ","date":"2021-12-24","objectID":"/projects/sawyerbrink/:0:0","tags":["projects","CLI","AWS","Lambda"],"title":"SawyerBrink Open-Source Risk Management SaaS","uri":"/projects/sawyerbrink/"},{"categories":["posts"],"content":"You have assumed the leadership of a team that is operating in a cloud environment. It’s a new beginning, you are excited about the future (hopefully), the team members, and most of all, the thrill of a new challenge. After the excitement settles down you start asking questions to better understand the work and the team. Among the list of questions you have, you should include questions pertaining to cloud cost and cost optimization. This article was originally published on Medium. Link to the Medium article can be found here. In this article, you will find a set of questions that are beneficial for you and your team to further explore. These are questions I have found beneficial in the past and I believe they will be beneficial to you too. Without further ado, let’s dive into it. Q: Do we have any budget alarms established? This is a simple question but the answer will reveal a lot of information about the team, the organization, and the emphasis placed on cost management. 
The ideal answer is “yes”, and depending on the maturity of the team and the organization, you might find out that there are several layers of budget alarms. Sometimes, these budgets are for different services and environments. If you are in the “yes” camp, give the team kudos 👏 For those of you who find yourself in the “no” camp ⛺️ , don’t despair. Yes, there is a lot of work to do here, but it’s also an opportunity to stand out and raise the standard. All major cloud providers offer the ability to set budget alarms. There are many ways to use budget alarms, but the primary reason you want to use them is for cost awareness and to change behavior. Yes, behavior change. You want you and your team to take a moment and ask “what impact will this change have on the budget?” The alarms help remind the team to act more responsibly from a financial perspective. If there is no set budget, then review the billing information for the past three to six months and identify a baseline/average. Budgets are not a one-and-done kind of deal. Budgets are a moving target and should be reviewed often. You want to aim for a goal, but the reality is that accurately forecasting cost is difficult and often subject to change due to many external factors. As you and your team develop a good understanding of what the major cost drivers are, you can then start having conversations on how to reduce the expenses. But it all starts with measuring. As the saying goes, “What gets measured, gets done.” Q: Is there a tagging policy in place? Tagging is important for teams to more accurately understand their cost, but it’s critical at the organizational level to understand where financial resources are being allocated. Let’s break this down further, starting at the team level. By tagging resources, you and your team can more accurately understand expense reports generated by the cloud provider. Let’s use a real-world example. 
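The tag-based cost breakdown that a cloud provider's billing console performs can be sketched as a small, standard-library-only computation. The resources, tag keys, and dollar amounts below are made up for illustration:

```python
from collections import defaultdict

# Hypothetical billing line items: (resource, monthly cost in USD, tags).
line_items = [
    ("vm-1", 120.0, {"team": "web", "tier": "frontend"}),
    ("vm-2", 340.0, {"team": "web", "tier": "backend"}),
    ("vm-3", 95.0,  {"team": "data", "tier": "backend"}),
]

def cost_by_tag(items, tag_key):
    # Sum cost per tag value, mirroring what a cost explorer does when
    # you group a report by a cost-allocation tag. Untagged resources
    # fall into an "untagged" bucket, which is itself a useful signal.
    totals = defaultdict(float)
    for _, cost, tags in items:
        totals[tags.get(tag_key, "untagged")] += cost
    return dict(totals)

print(cost_by_tag(line_items, "team"))  # {'web': 460.0, 'data': 95.0}
```

Grouping the same line items by a different key (`"tier"` instead of `"team"`) answers a different organizational question, which is why a tagging policy usually mandates several tag dimensions at once.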
Assume you and your team have a fleet of virtual machines (VMs). A VM can belong to a different part of the application architecture. Without tagging, how would you identify which part of the architecture is costing X amount of dollars in a given month? Take this example a bit further: assume it’s a multi-tenant environment. If various teams are consuming VMs without tagging, it would be very difficult to understand the cost of each team. You could use a naming convention to identify different teams or parts of your application architecture, but that will not help you break down the cost of VMs at the end of the month when you are reviewing the bill. Cloud service costs can be broken down by tags. The ability to break down service costs by tags is why tagging is important from a cost management perspective. The two screenshots below illustrate this. The bottom image showcases how the EC2 cost for the month of August is narrowed down to the resources with the tag 12345. Screenshot of AWS Cost Explorer The cost of 12345 in relationship to EC2 utilizatio","date":"2021-11-06","objectID":"/posts/cost-advice/:0:0","tags":["costs","cloud","engineering","leadership"],"title":"Cloud Cost Questions for Engineering Managers","uri":"/posts/cost-advice/"},{"categories":["presentation"],"content":"I had the opportunity to discuss the Education Engineer role in depth with my co-worker, Kaitlin Carter. If you want to learn more about Education Engineers, check out this podcast episode of HashiCast. HashiCast · Episode 33 - Karl Cardenas \u0026 Kaitlin Carter, HashiCorp Link to HashiCast. ","date":"2021-09-01","objectID":"/presentations/hashicast/:0:0","tags":["presentation","interview","podcast"],"title":"HashiCast","uri":"/presentations/hashicast/"},{"categories":["projects","architecture"],"content":" A deployment pattern for running HashiCorp Waypoint as a shared service on AWS. 
Source Github Source ","date":"2021-07-22","objectID":"/projects/waypoint/:0:0","tags":["projects","hashiCorp","waypoint","architecture","pattern"],"title":"Waypoint Deployment Pattern","uri":"/projects/waypoint/"},{"categories":["posts"],"content":"Two approaches to injecting variability into your Nomad batch job template without having to modify the template in the future. This article was originally published in HashiCorp’s blog while I was an employee. Click on the Article Link to learn how to inject variability into Nomad jobs. ","date":"2021-07-21","objectID":"/posts/nomad-duplicate/:0:0","tags":["nomad","automation"],"title":"Running Duplicate Batch Jobs in HashiCorp Nomad","uri":"/posts/nomad-duplicate/"},{"categories":["posts"],"content":" If you find yourself authoring several AWS lambdas for a serverless application architecture, you might have encountered this error: An error occurred: mySweetLambda– Code storage limit exceeded. (Service: AWSLambda; Status Code: 400; Error Code: CodeStorageExceededException; Request ID: 05d3ae68-a7c2-a3e8-948e-41c2739638af). The first time I encountered this error I wasn’t quite sure what was happening, but after some quick web searches I learned that AWS has a limit on Lambda storage that maxes out at 75Gb. Additionally, I also learned that AWS retains all the previous versions of all my lambdas. That’s all fine, I should probably go do some “spring cleaning” and remove the unused versions. AWS does expose the functionality to remove former versions through the console. However, in my scenario I had over 500+ versions for some of my older lambdas. Clicking through 500+ versions is not how I want to spend my time. So what options are available? Surely there has to be some better options out there. TLDR: Visit https://github.com/karl-cardenas-coding/go-lambda-cleanup for simple and easy to use solution. Delete Function Automation is your friend! 
There are a handful of open source solutions available that I stumbled upon early on in my cleanup journey. clear-lambda-storage (python) aws-lambda-es-cleanup (CloudFormation) SAR-Lambda-Janitor (CloudFormation) Serverless framework prune plugin (plugin) While these options are all great and incredibly useful, something didn’t feel right about this approach. I didn’t want to install Python on my system (local/CI-CD) unless I absolutely needed it. I also didn’t like the idea of having to deploy infrastructure just for the purpose of cleaning up old Lambda versions. Ideally, I just want to run a single command that cleans up old Lambdas, without having to install a language runtime or set up additional infrastructure for the sole purpose of cleaning up Lambdas. Is there such a solution available? The answer is go-lambda-cleanup! go-lambda-cleanup Go-lambda-cleanup is distributed as a single binary, and it’s open source. The tool is available for Windows, Mac (including M1 support), and Linux. No complicated install process, no need for additional infrastructure. Simply issue the command glc clean and the tool will start the cleanup process. Getting Started Download the binary and install the go-lambda-cleanup binary in a directory that is in your system’s PATH. /usr/local/bin is the recommended path for UNIX/Linux environments. Confirm glc is installed correctly by issuing the version command. If you encounter an error, this is a good indicator that the binary is not found in the system path. $ glc version go-lambda-cleanup v1.0.8 To start cleaning, you must have valid AWS credentials. Go-lambda-cleanup uses the default AWS Go SDK credentials provider to find AWS credentials. The glc clean command will by default retain all $LATEST versions and remove the rest. glc clean -r us-east-1 If you want to retain $LATEST and the last two versions but remove the remaining versions, simply use the -c flag. 
glc clean -r us-east-1 -c 2 There is also a dry run option available if you want to get a preview of an actual execution. glc clean -r us-east-1 -d Closing I encourage you to go check out go-lambda-cleanup if you have been wanting a simpler solution for AWS Lambda cleanup. More details can be found in the project README. Feel free to open up issues if you encounter/observe something odd with the tool! ","date":"2021-04-02","objectID":"/posts/glc/:0:0","tags":["costs","cloud","automation","AWS","Lambda"],"title":"Go Lambda Cleanup","uri":"/posts/glc/"},{"categories":["projects","CLI"],"content":" A Golang-based CLI for removing unused versions of AWS Lambdas. This project was created to solve the challenge of multiple outdated Lambda versions remaining in the AWS account. The CLI tool can be used to remove outdated Lambda versions and reduce Lambda storage utilization. To learn more about go-lambda-cleanup, check out the blog post. Source Github Source ","date":"2021-03-07","objectID":"/projects/go-lambda-cleanup/:0:0","tags":["projects","CLI","AWS","Lambda"],"title":"Go Lambda Cleanup","uri":"/projects/go-lambda-cleanup/"},{"categories":["projects","docs-as-code"],"content":" This is a template for a Docusaurus site that uses Git for versioning. Check out the blog post When Docs and a Dinosaur Git Along to understand the motivation behind this template and why we at Spectro Cloud chose to use Git for versioning. 
Source Github Source ","date":"2021-03-07","objectID":"/projects/docusarus-versioning-git/:0:0","tags":["projects","docs-as-code","git","docusarus"],"title":"When Docs and a Dinosaur Git Along: Enabling Versioning in Docusaurus","uri":"/projects/docusarus-versioning-git/"},{"categories":["posts"],"content":"Chances are most of us have unique situations for wanting to interact with DynamoDB locally, maybe it’s to develop and test different data models, perhaps it’s to develop programmatic functions to interact with the database, perhaps you want to reduce development expenses, or perhaps you’re just doing research. Regardless of your reasons, I want to help you by showing you how to leverage DynamoDB locally. We will use the following tools. Medium Link Localstack Terraform Go AWS CLI noSQL Workbench for DynamoDB We will walk through setting up the local environment, generating data, uploading data, interacting with the noSQL Workbench, and some neat tips to keep in mind. So with that being said, let’s dive into it! Note: If you get lost, simply visit https://github.com/karl-cardenas-coding/dynamodb-local-example to view the end solution. Also, feel free to fork this template project and use it as a starting point. Setting up the environment First things first, ensure that you have Terraform (\u003e v0.12.0), noSQL Workbench, and localstack (\u003e v0.11.3) installed and working on your system. If you need help installing these resources, check out the three links below. Due to the abundance of getting-started resources available, I will skip ahead and assume you have them installed. Install Terraform Install localstack Install noSQL Workbench for DynamoDB (Alternative) If you don’t want to use localstack, DynamoDB offers a Docker image; you may use this option as well. Localstack First things first, fire up localstack. If you installed it through pip then it’s as easy as issuing the command localstack start.
Or if you used the localstack Docker image, then it’s as simple as docker run localstack/localstack . If everything starts up correctly then you should be seeing something similar to the screenshot below. Note: localstack has plenty of parameters to pass in during startup. We are taking the defaults, which start the majority of the mocked AWS services, but there are plenty of other options worth checking out. *WSL2 output through pip installation* Terraform Terraform is a great solution to automate the deployment of the local DynamoDB environment, along with any other AWS resources required to get the desired test environment created. In this example, we’re actually going to use Terraform to seed the database (more on that later). However, first we need to set up Terraform to leverage localstack. All that is needed to leverage Terraform with localstack is to modify the aws provider block. provider \"aws\" { access_key = \"mock_access_key\" region = \"us-east-1\" s3_force_path_style = true secret_key = \"mock_secret_key\" skip_credentials_validation = true skip_metadata_api_check = true skip_requesting_account_id = true endpoints { dynamodb = \"http://localhost:4566\" } } # When using the DynamoDB local docker image # provider \"aws\" { # access_key = \"mock_access_key\" # region = \"us-east-1\" # secret_key = \"mock_secret_key\" # skip_credentials_validation = true # skip_metadata_api_check = true # skip_requesting_account_id = true # endpoints { # dynamodb = \"http://localhost:8000\" # } # } If you review the code snippet above you will probably notice how on line 10 we are specifying a code block for endpoints. This is where we essentially point Terraform to localhost and the port that localstack is listening on, for the respective mocked AWS service. I have also added the DynamoDB docker image configuration for those of you who took that approach, just remember to ensure that the container port specified is correct.
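The same "point everything at localhost" idea applies to any other client you run against the local table, not just Terraform. A small illustrative Python helper makes the choice explicit (the helper name and mapping are mine; the ports assume the defaults shown above):

```python
# Default local endpoints for the two approaches discussed above.
LOCAL_ENDPOINTS = {
    "localstack": "http://localhost:4566",      # localstack edge port
    "dynamodb-local": "http://localhost:8000",  # DynamoDB local Docker image
}

def dynamodb_endpoint(backend):
    """Return the endpoint URL to hand to a client, e.g. boto3's
    `endpoint_url` argument or the AWS CLI's `--endpoint-url` flag."""
    if backend not in LOCAL_ENDPOINTS:
        raise ValueError(f"unknown local backend: {backend!r}")
    return LOCAL_ENDPOINTS[backend]

# e.g. boto3.client("dynamodb", region_name="us-east-1",
#                   endpoint_url=dynamodb_endpoint("localstack"))
```

Whichever backend you pick, the important part is that every tool in your workflow targets the same port.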
In the example project I provided, take a peek at the main.tf file. There, a customer order table is being deployed. In addition, I have a local secondary index and a global secondary index. Let’s deploy this Terraform configuration. terraform init \u0026\u0026 terraform plan -out=\"myplan\" **What Terraform plan should reveal** If you get a similar output to the picture above, go ahead and issue the command below terraform apply -auto-","date":"2020-08-20","objectID":"/posts/dynamodb-local/:0:0","tags":["dynamoDB","cloud","database","AWS","architecture"],"title":"How to use AWS DynamoDB locally…","uri":"/posts/dynamodb-local/"},{"categories":["posts"],"content":"It’s tough to find an organization that is not leveraging a public cloud platform. With all the SaaS, PaaS, and IaaS providers out there, chances are you’ve used cloud services in some manner. As public cloud utilization continues to ramp up in many industries (e.g., insurance, financial, business), organizations are encountering a new challenge: how to correctly consume these new platforms and technologies. It’s a workforce challenge more so than it is a technical challenge. This article was originally published in the State Farm Engineering Blog. Link to the original blog can be found here. It’s a workforce challenge because, at the end of the day, it comes down to technical education and experience. Organizations with a workforce skilled in cloud technologies will have a smoother cloud consumption experience compared to one lacking a workforce with cloud experience. Every organization is competing for talent - we all want that DevSecOps engineer and/or Cloud Architect that can do it all (CI/CD, Programming, Security, Architecting, Data, Test, etc.) - but truth be told, those individuals are hard to come by, let alone retain. Ideally, placing these individuals among inexperienced teams would be the best approach.
They could act as mentors and guide colleagues through successful application deployments. However, this option is not really scalable for large enterprises that are starting out on their cloud journey. So what is the alternative? The Community Support model! What is the Community Support model? At State Farm® our public cloud team embraces and continuously invests in the Community Support model. Our belief is that in order to succeed as an organization and at scale, we have to come together as a community and help one another. A big contributor to this mentality is our organization’s shared values of helping those in need. State Farm is only three years into its public cloud journey and has already observed positive results from this model. At State Farm, we define the Community Support model as the following: The promotion of - and reliance on - knowledge sharing for both failures and successes, so that teams may learn from the past in order to succeed at scale in technical endeavors. There is a lot packed into this definition, but the main takeaway is that individuals and teams need to help each other by leveraging their expertise and sharing their failures and successes. This not only aids in preventing unnecessary duplication and resource waste (time and money), but it also prevents teams downstream that might find themselves in a similar position from having to start over. Other teams can leverage the lessons learned and reapply architectures and reusable code configurations (more on this later). This is not a novel concept, as history can attest. The Community Support model provides great results time and again. Groups of individuals are a much more potent force when united. Some of you might be asking: “Where is your Cloud Center of Excellence (CCoE)?” State Farm has a public cloud platform enablement team that fulfills most of the traditional CCoE duties and responsibilities required to support a platform.
However, we don’t believe that reliance on one team to solve all of the technical challenges platform consumers might experience is the best approach. Instead, we leverage the platform team for the following (not limited to): Platform (Cloud Service Providers) ownership and support Tackle platform-level challenges (technical and non-technical) Enable shared services Provide technical patterns Improve process and reduce friction encountered by product teams Mitigate risk Foster a technical community Provide guidance and technical expertise (limited by team bandwidth) The last bullet and the text in bold “limited by team bandwidth” are why State Farm is an adopter of the Community Support model. We believe in empowering teams to identify the best technical solution for their needs. By empowering teams to cr
","date":"2020-08-09","objectID":"/presentations/commit2020/:0:0","tags":["presentation","gitlab","commit","community model"],"title":"Gitlab Commit 2020","uri":"/presentations/commit2020/"},{"categories":["posts"],"content":"Chances are most of us have stumbled into someone’s digital portfolio. This can range from book authors to IT professionals; it’s pretty universal, as everyone can benefit from it. Is it needed? Probably not, but it does help quite a bit in showcasing your skills, and who knows, perhaps it could be the difference between you getting an interview and/or a phone call. Let’s say you decided you want a digital portfolio; you’re probably wondering how to go about getting started. Does it cost money? Is it affordable? Do I have to be a programmer? Do I need to maintain a server? What if I told you that you can do it all for free, with no programming skills, and virtually zero maintenance? All you need is some patience, a little bit of guidance, and some spare time. Sounds pretty neat, right? Well, if that interests you, let’s dive into what it takes to make a digital portfolio. Medium Link Note: If you get lost or want to look at the final example code, check out the code repository Getting Started In order to make this happen we will use the following pieces of technology. A domain name (optional) Your favorite static site framework (we will use Hugo) Netlify Github or Gitlab A text editor It helps if you are familiar with git beforehand and have git configured. Here is a guide for setting up git with Github, and here is one for Gitlab. There are also numerous videos on YouTube that walk you through how to get started if you are a visual learner. Next we need to make a decision: do we want a domain name, such as example.com, or do we want to leverage a generic name that might not be as human friendly? I recommend getting a domain name; plus, domain names are not too expensive, about $10–12/yr.
You can sacrifice two trips to Starbucks this week, and you’ll already have it paid off ☕️ If you want help in acquiring a domain name read the next section, otherwise skip to the “Install Hugo” section. PS: If you already have a domain, be sure to change the preferred name servers (NS) to Netlify’s name servers. This is demonstrated in the next section. Getting a Domain Name What’s awesome these days is that you can buy a domain name from various providers, and better yet, delegate the management of the domain name to another provider. If you haven’t already, log into Netlify and/or create an account (free). Next, click on domains. This will take you to a landing page. On this landing page there is a search bar; type the domain name you would like to acquire, or transfer if you already own one. After you have come up with a domain name, type it in and Netlify will verify if it already exists. If the domain name is available you will be presented with the option to purchase. Keep in mind that you have to add a payment method before you can purchase the domain. Go ahead and purchase the domain and take the default settings in the following screens. We will revisit the domains page later for HTTPS. For those of you that already have a domain and want to transfer it, follow the same process and update the name servers. Below is an example of my personal domain that I purchased through Google Domains. As you can see below, Netlify has four different name servers. These values need to be added in Google Domains, or wherever your domain is registered. *Example of a domain registered through an external provider* *Example of the name servers being updated in the external provider* Once you have completed this step it might take a few minutes to update. I’ve seen it take anywhere from 30 seconds to 10 minutes. You can verify this through G Suite Toolbox; if everything was done correctly you should see the Netlify name servers in the results field.
Install Hugo At this point we are ready to start playing with Hugo. If you’re not familiar with Hugo, don’t sweat it, just know that it’s a static site framework that does a lot of heavy lifting for us. To install Hugo, visit https://gohugo.io/getting-started/installing/ . Installing Hugo can be a bit overwhelming if you don’t c","date":"2020-07-30","objectID":"/posts/create-a-portfolio/:0:0","tags":["self-development","portfolio","hugo","netlify"],"title":"How to Create a Digital Portfolio That Is Free and Serverless","uri":"/posts/create-a-portfolio/"},{"categories":["posts"],"content":"HashiCorp has added two new tools to Terraform. As of Terraform v0.12.20 there are two new functions available for consumers: try() and can(). Along with these two functions there is an experimental feature available, variable_validation . In this article we’re going to look into how these new functions are used and how they work. This article was originally published on Medium. Link to the Medium article can be found here. All code snippets can be found at https://github.com/karl-cardenas-coding/terraform-functions Note: Variable validation is an experimental feature as of v0.12.20; use with caution as it is not recommended for production usage at this time. The can() and try() functions can only catch and handle dynamic errors resulting from access to data that isn’t known until runtime. They will not catch errors relating to expressions that can be proven to be invalid for any input, such as a malformed resource reference. Can() The can() function attempts to execute the code provided inside of it and returns a boolean value. The main purpose behind the can() function is input validation according to the official documentation. Let’s put it to the test. To enable variable_validation add the following code block to your Terraform configuration.
terraform { experiments = [variable_validation] } variable \"os\" { default = \"linux\" validation { # The condition here identifies if the variable contains the string \"linux\" OR \"windows\". condition = can(regex(\"linux|windows\", var.os)) error_message = \"ERROR: Operating System must be Windows OR Linux.\" } } Shout out to @d-henn for the regex example In the example above we have a variable named “os”, short for “operating system”. This variable is also leveraging the new validation functionality. So let’s break things down here. The validation block has two components: condition (required) error_message (required) (does NOT support interpolation) The syntax for can() is can(expression), where the expression contains the logic to test along with the value or variable being tested. In the example above, the variable had the value “linux” hard-coded as a default value. Let’s change that to another value, say “z/OS”, and see how it behaves on a terraform plan or a terraform apply. Error: Invalid value for variable on can.tf line 12: 12: variable \"os\" { ERROR: Operating System must be Windows OR Linux. This was checked by the validation rule at can.tf: 15, 3-13. Pretty neat! The error message is pretty descriptive due to our ability to author it. Terraform also returns the file name and location in the file for where the incorrect variable value is can.tf:15,3–13. Fun fact, variable validation is opinionated as it expects proper English grammar 👵🏻. Error: Invalid validation error message on can.tf line 18, in variable \"os\": 18: error_message = \"ERROR: Operating System must be Windows OR Linux\" Validation error message must be at least one full English sentence starting with an uppercase letter and ending with a period or question mark. Input validation can also be used without can(). In the code snippet below you can see that the value of the variable “word-length” is evaluated to see if it is greater than 1.
The variable is tied to the random pet provider and will dictate how many pets are in the word string generated. ### Test scenario for \"can\" variable \"word-length\" { validation { # The condition here identifies if the integer is greater than 1 condition = var.word-length \u003e 1 error_message = \"The variable is not greater than 1. Word length has to be at a minimum \u003e 1.\" } } variable \"os\" { validation { # The condition here identifies if the variable contains the string \"linux\" OR \"windows\". condition = can(regex(\"linux|windows\", var.os)) error_message = \"ERROR: Operating System must be Windows OR Linux.\" } } resource \"random_pet\" \"pet\" { length = var.word-length keepers = { pet-name = timestamp() } } output \"pet\" { value = \"${random_pet.pet.id}\" } HashiCorp does call out in their documentation that can() should not be used for error handling, or any context ou","date":"2020-03-07","objectID":"/posts/try-can/:0:0","tags":["terraform","automation"],"title":"Using Terraform’s try(), can(), and input validation","uri":"/posts/try-can/"},{"categories":["presentation"],"content":"I had the opportunity to present at the 2020 HashiCorp Employee Exchange (HEX) Sales Organization’s KeyNote. Thanks for having me, HashiCorp. It was a pleasure to present to an ambitious organization with a passionate workforce! ","date":"2020-02-25","objectID":"/presentations/hex/:0:0","tags":["presentation","interview","hashicorp"],"title":"HashiCorp Employee Exchange - SKO KeyNote","uri":"/presentations/hex/"},{"categories":["about"],"content":"Hello 👋. My name is Karl Cardenas, and I am a technical leader with a passion for technology and software. To better help you understand who I am, I would like to summarize my strengths into three buckets: leadership, learning agility, and developing others. I am the person who will take charge when there is a lack of leadership. I wasn’t always this person.
It took a lot of personal sacrifices and overcoming challenging situations for me to become this leader. The life-threatening conditions and hostile environments the military exposed me to as a team leader helped shape me into the leader I am today. I lead by example and wouldn’t ask anyone to do something I would not be willing to do. What sets me apart from others is my unwavering commitment to learning and my eagerness to mentor others. I am always ready to dive into something completely foreign and get out of my comfort zone, driven by my passion for learning and the desire to be a well-rounded IT leader. Mentoring is not just something I do. It’s something I am deeply passionate about and thoroughly enjoy. There’s nothing more rewarding than helping others unlock their best potential. I welcome the opportunity to mentor others and find it truly fulfilling to see someone I have developed and coached become a leader in their space. Creating communities is a passion of mine. I enjoy sharing my knowledge and getting others up to speed, which I consider a core leadership attribute. Sharing knowledge and being a technical role model drives me daily. As stated earlier, I lead by example and strive to be a technical leader who promotes knowledge sharing and self-improvement. Philosophy I like to keep things simple; I use the following equation as my north star in life. (learning + hard work + helping others) = success Listen, learn, lead Laugh, and make things fun Embrace continuous learning Put in the hard work to become a valued asset and inspire others Share knowledge back to help others When in charge, be in charge! A bad decision is better than no decision.
Look after your team and those close to you Interests Programming Open-source Reading Exercise Finance Real Estate Investor Blockchain technologies ","date":"2020-02-09","objectID":"/about/:0:0","tags":["about"],"title":"About","uri":"/about/"},{"categories":["projects","CLI"],"content":" Disaster CLI A Golang-based CLI tool for determining natural catastrophes near you, or at a specified location. Earth Observatory Natural Event Tracker (EONET) is the source for all data. Source Github source ","date":"2020-02-09","objectID":"/projects/disaster-cli/:0:0","tags":["projects","CLI"],"title":"Disaster CLI","uri":"/projects/disaster-cli/"},{"categories":["posts"],"content":"Photo by Jukan Tateisi on Unsplash There is no easy button! We have all been there, feeling stuck, unsure of how to move to the next level. This can apply to your current job role, or career as a whole. And unfortunately, this feeling of being trapped is also not a one-time incident; sometimes we can feel this way several times throughout our careers. You already know this but sometimes it helps to hear it again. To get to the next step, there is no easy button! This article was originally published on LinkedIn. Link to the article can be found here. In the past year I’ve been asked by fellow IT professionals for guidance on advancing in the IT field. Everyone’s situation is different but they all share common pain points. In an effort to help others beyond my immediate reach, I decided to write down the advice I provide to those that seek my input. However, these are only one man’s words and I recommend always seeking multiple perspectives to see which applies best to your situation! If you’re looking for a “(insert number) to success guide”, then you’ve come to the wrong place. There is no such guide; life doesn’t work that way, although there are times we all wish it did. Fortune and misfortune are forces too powerful to ignore, and they are out of our control.
However, there are steps we can take to gain more control of our career. I use the following equation as a guiding principle. (learning + hard work + helping others) = success Embrace continuous learning Our capability of learning and applying logical reasoning is what truly sets us apart from other species and lifeforms. This ~3 lb, pink muscle we call the brain is primed for learning and solving complex problems. It would be a disservice to deny our brain one of its core capabilities! During our generation, we see industries disrupted by technological changes, changes that require employees to adjust and learn new skills. To make matters more complex, employees are being asked to do more and more every day. New methodologies such as product management, agile practices, and DevSecOps are requiring constant upskilling from all workforces regardless of the organization or industry. I intentionally bring this up because some view reskilling/upskilling (learning) as an annoyance, burden, inconvenience, and so on. We are all creatures of habit; change is not as easy as we would like it to be. If you feel this way about learning then I am afraid I have bad news for you, it’s not going to change anytime soon. Organizations are battling day and night for new business, and are constantly trying to one-up one another. A lot of these new business solutions are powered by advances in technology that IT professionals need to develop and master. The latter is why it’s important to keep learning and to constantly add new skills to our mental tool belt. No matter how smart, driven, or talented you may think your coworkers or peers are, not everyone is going to take their learning the extra mile and go beyond what their current job description says to do. It takes that next-level IT professional to take matters into their own hands and learn new skills in their spare time to remain well rounded. The learning is not only for today’s needs but also for tomorrow’s challenges.
Organizations are aggressively looking for individuals that can master new skills. Take cloud technologies as an example. In order to design and develop an application that is cloud native, many skills are required: familiarity with a cloud vendor and its services, infrastructure as code, security, CI/CD, programming language(s), architecture, disaster recovery, cost optimization, etc. The list can go on and on, but the point I am trying to make is that no one will be able to master all these skills; it will take a team effort to successfully develop an application. If you are able to speak with confidence and have some experience with at least one or two of the items I outlined above, chances are you would have a hirin","date":"2020-02-03","objectID":"/posts/moving-on-up/:0:0","tags":["career","self-development","engineering","leadership"],"title":"Stuck in your career? Try this for a change!","uri":"/posts/moving-on-up/"},{"categories":["posts"],"content":"Sentinel is HashiCorp’s framework for implementation of Policy as Code (PaC). It integrates with Infrastructure as Code (IaC), and allows teams/organizations to be proactive from a compliance/risk standpoint. Sentinel allows for granular, logic-based policy decisions that read information from external sources to derive a decision. In plain English: based on the logic written (policies) and the information provided, Sentinel can act as a decision maker. This is pretty handy when you want to prevent users from executing specific actions, or ensure that certain steps/actions are conducted. For example, an employee attempting to deploy a bad-practice network rule that allows everyone on the internet inbound access! It’s important to call out that Sentinel is a dynamic programming language, with types and the ability to work with rule constructs based on boolean logic. This article was originally published on Medium. Link to the Medium article can be found here.
First things first, implementation of Sentinel is only available to HashiCorp enterprise customers, hence why the only documentation available is from HashiCorp. It can be used with the following HashiCorp products: Terraform, Vault, Consul, and Nomad. However, non-enterprise users can still get their hands on Sentinel by leveraging the Sentinel Simulator. To better understand Sentinel, let’s observe a traditional workflow, using Terraform as the example. In a traditional Terraform workflow the operator/user would draft up a configuration template that specifies which infrastructure resources are to be deployed. The next step would be to issue the command terraform plan to ensure the desired resources and infrastructure changes align with our desired intention. If everything checks out in the output provided by terraform plan then the final step is to execute terraform apply. The terraform apply command is what would actually deploy our infrastructure. This example workflow applies to both Terraform Enterprise and Terraform Cloud (free tier). To learn more about the difference between the two, check out this link. Now, let’s take the same workflow and add Sentinel to the mix. Sentinel comes into play after terraform plan and before terraform apply (see image below). Assume we have a Sentinel policy (more about policies later) that prevents users from deploying infrastructure resources to a public cloud environment without at least one tag. The terraform plan output would be evaluated against that Sentinel policy. If the resources contain at a minimum one tag, as defined in the policy, then the user is allowed to execute terraform apply. Otherwise the plan is rejected and the user is forced to make the specified changes so that the plan passes the policy check. In simple terms, Sentinel prevents users from conducting actions deemed “unapproved” by the policy authors.
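Sentinel policies are written in Sentinel's own language, but the decision in the tagging example can be sketched in plain Python to show the shape of the evaluation (illustrative pseudocode, not Sentinel syntax; the resource fields shown are simplified, not the real plan format):

```python
def evaluate_tag_policy(planned_resources):
    """Mimic the example policy: every resource in the plan must carry
    at least one tag, or the run is rejected before `terraform apply`."""
    violations = [
        r["address"] for r in planned_resources
        if not r.get("tags")  # missing or empty tag map
    ]
    # Return pass/fail plus the offending resources so the user knows what to fix.
    return (len(violations) == 0, violations)

plan = [
    {"address": "aws_instance.web", "tags": {"owner": "team-a"}},
    {"address": "aws_s3_bucket.logs", "tags": {}},
]
evaluate_tag_policy(plan)  # returns (False, ["aws_s3_bucket.logs"])
```

The key design point is that the policy sits between plan and apply: it only reads the planned changes and emits a pass/fail decision, never modifying the plan itself.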
Traditionally, policy authors are platform administrators and/or information security. Again, Sentinel is only available for Terraform Enterprise and Terraform Cloud Paid Tier! Enforcement Levels Enforcement levels come in three different flavors: Advisory, Soft Mandatory, and Hard Mandatory. The various enforcement levels allow administrators/policy authors to decouple policy behavior from the policy logic. This means that we can set a policy to act as a warning or notification; in this scenario, a failing policy still allows a run to be applied. A soft mandatory requires the policy to pass; otherwise, an administrator/elevated user is required to manually override the failed policy evaluation. If an override is provided then the apply is allowed to execute. The strictest level, hard mandatory, requires a policy to pass, no exceptions. The only way around this policy is to remove it from the targeted workspace(s) or global scope. Policies can be applied to selected workspaces and/or they can be applied globally to all workspaces. Note: Vault Enterprise has a different behavior Sentinel Policies So what is a Senti
Transcript ","date":"2019-09-11","objectID":"/presentations/hashiconf2019/:0:0","tags":["presentation","hashicorp"],"title":"HashiConf 2019","uri":"/presentations/hashiconf2019/"},{"categories":["posts"],"content":"The explosion of public cloud platforms has made the accessibility and consumption of IT infrastructure an uncomplicated experience. The traditional IT infrastructure found in vast and expensive corporate data centers can now be consumed by anyone with an internet connection. As organizations/businesses start consuming public cloud platforms and their infrastructure, you often hear the expression, infrastructure as code (IaC). This article was originally published on Medium. Link to the Medium article can be found here. If you have ever wondered about the what, the why, and the how in regard to IaC, then you have come to the right place. Static/Dynamic Infrastructure Before we dive into the nuts and bolts of IaC it helps to first understand how IT infrastructure works. Let’s start with static infrastructure: think server racks, mainframes, routers, switches, firewalls, and pretty much any equipment you expect to find in a traditional data center. In this static infrastructure environment, when you need more capacity you simply add more capacity through physical provisioning, either through horizontal and/or vertical scaling. The need for physical provisioning and waiting for the compute capacity to become available is what makes this environment static. Adding new equipment, increasing capacity, or enabling new functionality can take several weeks (10+ weeks), from the day the order is placed to the day the equipment is ready for usage by product teams. In addition, maintaining this equipment requires time and effort from various IT professionals, both from a hardware and software perspective. It’s not uncommon for physical server racks to be split up virtually through various platform technologies (VM, Kubernetes, Pivotal, etc).
It’s uncommon for this environment to have APIs available for the underlying infrastructure. Let’s change gears now and look at dynamic infrastructure. All public cloud platforms provide dynamic infrastructure: IT resources that are ephemeral in nature or, stated simply, infrastructure that is “on-demand”. This type of infrastructure is simply consumed by requesting it through the user interface console or APIs (more on this later). This on-demand infrastructure comes with a consumption pricing model, pay for every second of usage. Unlike static infrastructure environments, the consumer does not have to worry about having sufficient compute capacity and placing orders for physical servers. This is handled by the cloud provider on behalf of the consumer. In addition, all the infrastructure available on the platform has APIs available which allows for automation capabilities. What is IaC? Infrastructure as Code is nothing more than replacing the traditional manual provisioning of infrastructure through admin consoles/GUI with a programming-based approach (think scripting). Instead of clicking on buttons and navigating through various screens to deploy/enable infrastructure, those actions are now achieved through a codified approach. IaC is heavily leveraged in dynamic infrastructure environments such as public cloud platforms due to the ability to provision and/or deprovision a large number of resources quickly through APIs. Without IaC this could be a tedious and arduous process. It’s important to note that IaC is not a new concept and it’s something that infrastructure analysts have done for many years through scripting and chaining commands together. What is different today in regard to IaC is the code aspect of it. How does IaC work? The modern IaC approach leverages declarative programming vs. the traditional scripting approach of the past. 
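The declarative approach the IaC post describes can be sketched in a few lines of Terraform (a hedged illustration — the instance type and tag are placeholders, and the AMI ID is borrowed from a later example in this collection): you declare the end state you want, and the tool works out the API calls needed to reach it.

```hcl
# Declarative IaC: describe *what* you want, not *how* to create it.
# Values are placeholders for illustration only.
resource "aws_instance" "web" {
  ami           = "ami-0b898040803850657" # Amazon Linux 2 (example)
  instance_type = "t2.micro"

  tags = {
    Name = "demo-web"
  }
}
```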
Declarative programming is easy to get into as you are simply telling the computer “what to do” by filling out values for a given required input parameter. The computer will figure out the rest. Traditional scripting, or more accurately “imperative programming”, is associated with general programming. In the imperative programming approach ","date":"2019-08-22","objectID":"/posts/infrastructure-as-code/:0:0","tags":["terraform","devops","engineering"],"title":"Infrastructure as Code (IaC) — What is it?","uri":"/posts/infrastructure-as-code/"},{"categories":["posts"],"content":"We have all been there, the moment terraform apply crashes because someone made a manual change and removed a resource that terraform is expecting to be available. You try to do a terraform refresh but with no luck! What do you do at this point? Sometimes the only option is to make modifications to the terraform state file. This article will walk you through how to make state file modifications, both the right and the wrong way, so that you can educate others in the future on how to make statefile changes properly. This article was originally published on Medium. Link to the Medium article can be found here. The wrong way One could easily open up the terraform.tfstate file and manually do “JSON surgery” but this is not a recommended action, mainly because of the high chance of human error and potentially wrecking your state file. That being said, allow me to show you how. If your state file is stored locally (bad practice), then all you need to do is simply make a backup of the terraform.tfstate, open up your favorite text editor, and begin to make changes in terraform.tfstate. However, if your state file is stored remotely, say an S3 bucket, then there are a couple of steps we need to take. Run terraform init Comment out the backend logic Run terraform init and answer yes to copy your statefile locally Open up the state file, check your .gitignore file if you are unable to see it! 
Begin statefile surgery Save your changes Add your remote backend logic/config Run terraform init and answer yes to copy your local config to the remote site The terraform state file is in a JSON format (see below). As you can tell, all terraform defined resources fall under the resources array block. So if we wanted to remove the aws_instance resource, we would have to remove the entire { } that the resource falls under. \"version\": 4, \"terraform_version\": \"0.12.3\", \"serial\": 6, \"lineage\": \"4e32218a-6f16-1e51-2523-b3875f604783\", \"outputs\": {}, \"resources\": [ { \u003c------- This is where the aws_instance resource starts \"mode\": \"managed\", \"type\": \"aws_instance\", \"name\": \"web\", \"provider\": \"provider.aws\", \"instances\": [ { \"schema_version\": 1, \"attributes\": { \"ami\": \"ami-0b898040803850657\", \"arn\": \"arn:aws:ec2:us-east-1:140040602879:instance/i-0cd2055ad2783a11b\", \"associate_public_ip_address\": true, \"availability_zone\": \"us-east-1c\", \"cpu_core_count\": 1, \"cpu_threads_per_core\": 1, \"credit_specification\": [ { \"cpu_credits\": \"standard\" } ], Modules are also found under the main resources array block. Modules look like the following (see below): If you have a module that creates several resources, expect to find the module block for each resource. Yes, this means that you can expect to find many entries with the same name of the module you created. So if we named a module module \"buckets\" {} and it creates two aws_s3_bucket resources then you can expect to find the module entry twice, unless you used the count parameter, in that case you would only find one entry. { \"module\": \"module.buckets\", \"mode\": \"managed\", \"type\": \"aws_s3_bucket\", \"name\": \"demo-mod-1\", \"provider\": \"provider.aws\", \"instances\": [ { \"schema_version\": 0, \"attributes\": { At this point, if you are removing resources simply ensure you remove the proper resource code blocks. 
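That removal can also be scripted rather than done by hand in a text editor. The helper below is a hypothetical sketch (not from the original post) that mirrors the manual “JSON surgery”: it drops one entry from the state's resources array and bumps the serial so the backend accepts the new revision.

```python
import json

def remove_resource(state: dict, res_type: str, name: str) -> dict:
    """Drop one resource block from a Terraform state dict.

    Mirrors the manual surgery described above: filter the resources
    array, then bump the serial for the new state revision.
    """
    state["resources"] = [
        r for r in state["resources"]
        if not (r.get("type") == res_type and r.get("name") == name)
    ]
    state["serial"] += 1
    return state

# Trimmed-down state, shaped like the snippet in the post.
state = {
    "version": 4,
    "serial": 6,
    "outputs": {},
    "resources": [
        {"mode": "managed", "type": "aws_instance", "name": "web"},
        {"module": "module.buckets", "mode": "managed",
         "type": "aws_s3_bucket", "name": "demo-mod-1"},
    ],
}

state = remove_resource(state, "aws_instance", "web")
print(json.dumps([r["type"] for r in state["resources"]]))  # ["aws_s3_bucket"]
```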
If you are changing a value, then the value might also have to be changed manually on the real resource it represents. An example of this is an AWS account created through Terraform: if you want to change the account name, you have to change the name attribute value in the state file resource, as well as manually change it in the account. Why? Because otherwise terraform will observe the change and attempt to create another resource and that is not the desired behavior, therefore both terraform and manual actions have to be implemented. Note: ensure no outputs depend on the resources being removed; if so, remove those outputs as well. Outputs can be found at the beginning of the state file under the outputs code block outputs: {} . The right w","date":"2019-07-07","objectID":"/posts/cleaning-terraform-statefile/:0:0","tags":["terraform","devops","engineering","state"],"title":"Cleaning up a Terraform state file — the right way!","uri":"/posts/cleaning-terraform-statefile/"},{"categories":["posts"],"content":"If you don’t know the answer to this question don’t feel bad, engineers and IT professionals at all levels sometimes don’t know the answer to this question. In my daily job I often get asked, “What is a pipeline?” The follow-up question is 9/10 times, “How do I create a pipeline?” Today I would like to shed some light on the pipeline topic, mainly focusing on the first question but also why it is important to application development. This article was originally published on Medium. Link to the Medium article can be found here. The Past In simple terms a pipeline is a workflow, a workflow that application development teams use to release software. Note: Not limited to application development teams Start → do something → do something → … → Release Software In order to understand “what is a pipeline?” we have to go back in time and understand how application development was done. 
In the past, at least in larger organizations, software developers tended to only focus on writing the code for the application. Once the application was coded, it was handed over to a team that tested the code and conducted Q/A. If the testers discovered “bugs” they would document the errors and send them back to the developers for correction. Once the application passed testing it was handed over to an operations team. The operations team was responsible for standing up the application and making sure it was highly available and fault tolerant — through the use of enterprise datacenter infrastructure solutions. Let’s rewind here a bit. The operations team was traditionally composed of infrastructure analysts whose responsibility is to keep the gears of the data center turning. These fine men and women, the unsung heroes of IT, would often be handed over an application and through the means of standing up physical server racks and cabling would ensure the application came to life. This also included configuring the operating system, databases, the network, the storage, oh and let’s not forget disaster recovery responsibilities. As you can quickly identify, in the past (and unfortunately still in some organizations) there was a segregation of duties and responsibilities. This approach affects business agility and operational efficiencies negatively. From the developers’ perspective, a lot of time was spent waiting on other teams to complete work. From the non-application development teams’ perspective, they were lacking context and simply kept another team’s “baby” alive. Troubleshooting was also difficult as the operations teams oftentimes lacked context and understanding of the code. The flip side to this is the developers who also oftentimes did not understand the infrastructure keeping the application alive. When things broke, as they always do, perfect little storms were created due to the knowledge gap between the various teams. 
Now, at this point you might be asking yourself, “how does a pipeline improve the situation?” The answer: it doesn’t, at least not by itself! Enter DevOps! DevOps/DevSecOps In order to address the challenges that arise from the workflow of the past, serious changes have to be implemented. These changes range from behavioral changes, redefining responsibilities and duties, and upskilling. This is the goal of DevOps. DevOps is the art of integrating all the required steps of an application development workflow and application lifecycle management under a single small team. This means the same team that writes the application is also responsible for testing the code, deploying the application and maintaining it while in production! This is often described as “from cradle to grave”. If you add the security lens into the equation you get DevSecOps; unfortunately, security is oftentimes an afterthought when it should be part of the entire software development lifecycle. This is not as easy as it sounds; a lot of unique skills and knowledge are required for a team to truly practice DevSecOps. The ideal team that practices DevSec","date":"2019-05-19","objectID":"/posts/what-is-ci-cd/:0:0","tags":["devops","engineering","ci/cd"],"title":"What is a CI/CD pipeline?","uri":"/posts/what-is-ci-cd/"},{"categories":["posts"],"content":"If you manage AWS for an organization, big or small, chances are you have several Secure Shell (SSH) keys lying around that you hardly use, OR WORSE, you don’t recall the account the key was made for. SSH key management is a rabbit hole in itself and most people understand the security concerns that arise with improper SSH key hygiene. Luckily for us, there is a way to bid farewell to the cumbersome practice of using SSH to remote into an EC2 instance. Allow me to introduce you to the AWS service, Systems Manager (SSM). This article was originally published on Medium. Link to the Medium article can be found here. 
I will teach you the following in this guide: Identify SSM Remote Session Manager requirements, including for an enterprise Enable Remote Session Manager for all EC2 instances Enable Remote Session Manager logging Lock down Remote Session Manager through IAM User permissions 🔐 Debugging Remote Session Manager Enable SSM Remote Session Manager The AWS managed service, SSM, comes with a neat feature called Session Manager. Session Manager allows us to connect into an instance and get a shell session over HTTPS (TLS 1.2) on port 443, without having to use SSH keys. It’s important to understand that this is NOT an SSH connection but rather an HTTPS connection. The Session Manager allows us to use a terminal session from our web browser directly OR by using the AWS CLI. It’s really that easy… assuming you have everything configured correctly. Here are the core requirements to get SSM’s Remote Session Manager to work: The SSM Agent is installed on the instance(s). By default the SSM agent is installed on Amazon Linux, Amazon Linux 2, Ubuntu Server 16.04, Ubuntu Server 18.04, and all Windows Server AMIs. The agent can also be installed on your instance (if lacking) or be pre-baked into your AMI through the use of Packer The EC2 instance requires an IAM Instance Profile — an instance profile role of the type EC2. To create an IAM instance profile for SSM follow this link. The IAM Instance Profile requires proper SSM permissions. This is a step that often causes confusion or that is missed. In order for the SSM agent to communicate with the AWS SSM API endpoints, it needs the proper IAM permissions. AWS provides a default SSM policy for your convenience named AmazonEC2RoleforSSM. That being said, I recommend you make your own policy based on the provided default policy that does not include the s3:* permission (more on this later). 
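The advice above — clone the default policy but drop `s3:*` — might look roughly like the IAM policy document below. This is a hedged sketch: the exact action list in the AWS managed policy changes over time, so treat it as illustrative of the agent's core permissions rather than a copy of the managed policy.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "SSMCoreAgentPermissions",
      "Effect": "Allow",
      "Action": [
        "ssm:DescribeAssociation",
        "ssm:GetDocument",
        "ssm:ListAssociations",
        "ssm:UpdateInstanceInformation",
        "ssmmessages:CreateControlChannel",
        "ssmmessages:CreateDataChannel",
        "ssmmessages:OpenControlChannel",
        "ssmmessages:OpenDataChannel",
        "ec2messages:GetMessages",
        "ec2messages:SendReply"
      ],
      "Resource": "*"
    }
  ]
}
```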
When you have the three requirements checked off for your instance(s), you are ready to take off the training wheels and start using SSM’s Remote Session. Ta da! However, as with everything else in life, there are less obvious requirements than the three I mentioned above. Additionally, if you’re an enterprise user with Direct Connect — things can get a bit trickier. Let’s dive into the “less common” requirements. The VPC has access to the internet. This is not a luxury everyone has available. If you are in an organization that leverages Direct Connect with your AWS Infrastructure, chances are that you may have VPCs that have intentionally been created without an Internet Gateway (IGW) and NAT Gateways — thus your only access to the internet is through your corporate data center. This makes things more difficult, but luckily for us there is a solution — I’ll talk more about this later. The AWS CLI requires an SSM plugin. That’s right, if you want to use the remote session function from your workstation then you need to install an additional plugin for the AWS CLI. It’s a simple install and all directions can be found here. Addressing the VPC Internet challenge If you find yourself in the situation I described above — a VPC without internet connectivity, you need to enable six VPC Endpoints. Simple. If you’re not familiar with VPC endpoints, it’s a way of routing internet-bound traffic to AWS public API endpoints through the AWS internal networking infrastructure. Traditionally, if you ne","date":"2019-04-14","objectID":"/posts/aws-ssm-replace-ssh/:0:0","tags":["aws","automation","ci/cd"],"title":"Ditch your SSH keys and enable AWS SSM!","uri":"/posts/aws-ssm-replace-ssh/"},{"categories":["posts"],"content":"If you work for an organization/company that leverages the services of a public cloud provider such as AWS, chances are there is a customized image available in your environment. 
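The VPC endpoints the SSM post calls for can themselves be expressed as IaC. Below is a hedged Terraform sketch of one interface endpoint — the VPC, subnet, and security group IDs are placeholders, and the full set of required services includes at least `ssm`, `ssmmessages`, and `ec2messages` (repeat the resource per service):

```hcl
# One of the interface endpoints needed for Session Manager in a
# VPC without internet access. Repeat for the other SSM-related
# services (e.g. ssmmessages, ec2messages). IDs are placeholders.
resource "aws_vpc_endpoint" "ssm" {
  vpc_id              = "vpc-0123456789abcdef0"
  service_name        = "com.amazonaws.us-east-1.ssm"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = ["subnet-0c198d46"]
  security_group_ids  = ["sg-0123456789abcdef0"]
  private_dns_enabled = true
}
```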
Most companies today offer some sort of customized default image or images that contain baked-in security tools, proxy variables, repository URL overrides, SSL certificates and so on. This customized image is usually sourced from common images provided by the public cloud provider. Today, we’re going to look at how we can completely automate a customized image sourced from the Amazon Linux 2 AMI and deploy it to all accounts inside an organization, while maintaining a minimal infrastructure footprint. Code can be found in the following GitHub repository. This article was originally published on Medium. Link to the Medium article can be found here. Assumptions Accounts are under an AWS Organization. All accounts require the customized AMI. VPC ACLs and Security Groups allow Port 22 into the VPC (Packer) CI/CD has proper credentials to query AWS Services (Organizations, VPC, EC2). Gitlab and Gitlab Runner available. Tools Utilized Terraform Packer AWS SNS AWS Lambda AWS CLI Gitlab Gitlab CI Docker Architecture It all starts with the SNS topic provided by AWS for the Linux 2 AMI. This SNS topic is the anchor to the automation that occurs every time Amazon updates the AMI. The subscription to this SNS topic invokes a Lambda function; the function makes a REST call to a Gitlab repository that is configured with a Gitlab Runner. The REST call is what triggers the CI/CD pipeline. The pipeline delivers a newly minted AMI that is sourced from the Linux 2 AMI but includes baked-in tools and configurations per our discretion. The customized AMI is made available to all the AWS accounts. I will break down the automation into three sections: pre-pipeline, Terraform, and Packer. Pre-pipeline SNS AWS provides customers with SNS topics for both of its managed AMIs (Linux \u0026 Linux 2). The automation starts by subscribing to the following ARN (see below) and assigning it a Lambda function as an endpoint. 
When the AMI is updated, AWS publishes a message to the SNS topic and because the endpoint is a Lambda function, the function assigned will be triggered by the SNS publication. arn:aws:sns:us-east-1:137112412989:amazon-linux-2-ami-updates Note: For other SNS topics of interest visit this GitHub repository. Lambda The Python code (see below) is what powers the Lambda function. Its true purpose is to trigger a Gitlab CI pipeline. The Gitlab pipeline is started by using a Gitlab trigger deploy token. The deploy token can be created by going to a Gitlab repository’s settings; Settings -\u003e CI/CD -\u003e Pipeline triggers. Further documentation can be found here. The deploy token is added as an encrypted environment variable which is later decrypted during the Lambda invocation. The Gitlab repository’s ID is also added as an environment variable. The project ID can be found under a project’s settings; Settings -\u003e General -\u003e General Project. import json import boto3 import os from botocore.vendored import requests from base64 import b64decode projectId = os.environ['projectId'] token = os.environ['token'] tokenDecrypted = boto3.client('kms').decrypt(CiphertextBlob=b64decode(token))['Plaintext'].decode(\"utf-8\") def lambda_handler(event, context): try: r = requests.post('https://gitlab.com/api/v4/projects/%s/ref/master/trigger/pipeline?token=%s'%(projectId,tokenDecrypted)) return { 'statusCode': r.status_code, 'body': r.text } except: print(json.dumps('The event object: ' + str(event))) Note: Ensure the Lambda role has proper permissions for KMS related actions. IMPORTANT: If the customized AMI needs to be updated at a more frequent cadence, then a CloudWatch event rule can be attached to the Lambda so that the AMI is rebuilt on a schedule. Gitlab CI YML .gitlab-ci.yml is the file that controls the Gitlab CI Runner. 
This is where the pipeline is defined; images, stages, ste","date":"2019-03-22","objectID":"/posts/automate-custom-ec2-ami/:0:0","tags":["aws","automation","terraform","packer","ami","ci/cd"],"title":"Automate Custom EC2 AMIs","uri":"/posts/automate-custom-ec2-ami/"},{"categories":["posts"],"content":"Why? As awesome and powerful as Terraform is, there are times when you find yourself unable to execute certain actions for your automation. This could be due to many reasons including: no Terraform resource for the AWS service, the API action is only available through the CLI/SDK, or you find yourself in a situation where it might be easier to execute an action through the CLI. The situations go on and on; however, the point is we all work in varying environments with different resources and constraints. This article was originally published on Medium. Link to the Medium article can be found here. How? At the time of this writing the AWS Route53 resolver endpoint is lacking a Terraform resource(s). However, this does not mean we can’t create the desired resources through Terraform. Let’s take a peek at how we can create a route53 resolver endpoint through Terraform. Requirements: Ensure the AWS CLI can create the desired resource Have the AWS CLI and required version available in your environment Proper AWS credentials available and configured The AWS CLI is able to create route53 resolver endpoints, both inbound and outbound. Now, let’s get to it! Example code can be found at https://github.com/karl-cardenas-coding/route53resolver-endpoint The simplest way to have Terraform execute our CLI command is by leveraging the null_resource and the provisioner local-exec. The null_resource won’t create anything, but it allows us to invoke other provisioners. The local-exec provisioner allows us to execute a command on the instance where Terraform is currently running. 
resource \"null_resource\" \"create-endpoint\" { provisioner \"local-exec\" { command = \"aws route53resolver create-resolver-endpoint --creator-request-id ${var.creator-request-id} --security-group-ids ${local.security-groups} --direction ${var.direction} --ip-addresses ${local.list-ip-template} --name ${var.endpoint-name} --tags ${var.tags} --profile ${var.aws-profile} \u003e ${data.template_file.log_name.rendered}\" } } So what’s going on above? Well, in simple terms, I am passing in the AWS CLI command for route53resolver create-resolver-endpoint, but rather than hard coding values, I am using interpolation which allows us to turn this into reusable code (terraform module). I am also passing the output of the command into a data template file \u003e ${data.template_file.log_name.rendered} The reason I am passing the output into a data template_file is so that I may later reference the template in order to grab the output and use it as a Terraform output variable. The data local_file resource reads the data template_file rendered output, and the output variable aws-cli-output then uses that content as its value. The workflow: null_resource... → data.template_file → data.local_file → output data \"template_file\" \"log_name\" { template = \"${path.module}/output.log\" } data \"local_file\" \"create-endpoint\" { filename = \"${data.template_file.log_name.rendered}\" depends_on = [\"null_resource.create-endpoint\"] } # Now we can use the aws cli return text as an output variable output \"aws-cli-output\" { value = \"${data.local_file.create-endpoint.content}\" } Turning console output into a Terraform Output Variable If you take a closer look at the null_resource that invokes the AWS CLI, you’ll see that I am using local variables. I do this so the module has the flexibility to pass in multiple IP addresses, subnets and security groups. If we look at the AWS CLI documentation, this is what the syntax looks like for multiple IP addresses. 
--ip-addresses (list) SubnetId=string,Ip=string ... It would get pretty ugly if we had someone pass a string that looked something like this: SubnetId=subnet-0c198d46,Ip=10.1.1.6 SubnetId=subnet-0c198d58,Ip=10.1.1.7 Not to mention there is a space between the subnet entries in the second example (ugly indeed…) So how do we overcome this challenge? We can again leverage the data template_file resource to manipulate the string as needed, based on the count of subnet","date":"2019-02-17","objectID":"/posts/awscli-terraform/:0:0","tags":["aws","automation","terraform","ci/cd"],"title":"Invoking the AWS CLI with Terraform","uri":"/posts/awscli-terraform/"}]
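The string manipulation the last post describes — joining subnet/IP pairs into the exact format the `--ip-addresses` flag expects — can be sketched outside Terraform too. The helper below is hypothetical (the post builds the string with template_file interpolation instead), but it shows the target format using the post's own example values:

```python
# Hypothetical helper mirroring the string-building the post describes:
# join subnet/IP pairs into the space-separated format the AWS CLI
# expects for --ip-addresses.
def build_ip_addresses(subnets, ips):
    """Render 'SubnetId=...,Ip=...' entries separated by spaces."""
    return " ".join(f"SubnetId={s},Ip={ip}" for s, ip in zip(subnets, ips))

arg = build_ip_addresses(
    ["subnet-0c198d46", "subnet-0c198d58"],
    ["10.1.1.6", "10.1.1.7"],
)
print(arg)  # SubnetId=subnet-0c198d46,Ip=10.1.1.6 SubnetId=subnet-0c198d58,Ip=10.1.1.7
```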