<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>rbstp.dev</title>
    <link>https://rbstp.dev</link>
    <description>There and Back Again: A DevOps Engineer's Journey Through AI and Infrastructure</description>
    <language>en-us</language>
    <lastBuildDate>Mon, 06 Apr 2026 14:58:59 GMT</lastBuildDate>
    <pubDate>Mon, 03 Nov 2025 03:05:02 GMT</pubDate>
    <ttl>60</ttl>
    <atom:link href="https://rbstp.dev/feed.xml" rel="self" type="application/rss+xml"/>
    <generator>gist-blog-generator</generator>
    <image>
      <url>https://rbstp.dev/favicon.png</url>
      <title>rbstp.dev</title>
      <link>https://rbstp.dev</link>
      <width>128</width>
      <height>128</height>
    </image>
    <item>
      <title><![CDATA[How a Developer Uses Claude Code Effectively]]></title>
      <link>https://rbstp.dev/posts/0dcd257993e01fe322ebfac60ada9851.html</link>
      <guid>https://rbstp.dev/posts/0dcd257993e01fe322ebfac60ada9851.html</guid>
      <pubDate>Mon, 03 Nov 2025 03:05:02 GMT</pubDate>
      <description><![CDATA[<p><a href="https://blog.sshh.io/p/how-i-use-every-claude-code-feature">Read the full article on sshh.io</a></p>
<p>A breakdown of how one developer integrates Claude Code into their daily engineering workflow.</p>
<h2 id="core-setup-the-claudemd-file">Core Setup: The `CLAUDE.md` File</h2><p>The <code>CLAUDE.md</code> file acts as the repo’s operating manual for the agent. It defines the main tools, APIs, and conventions the team uses. Keep it concise and reference long documentation by link rather than embedding large sections. The goal is clarity, not verbosity.</p>
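<p>A minimal sketch of what such a file might contain (all command names and paths here are invented for illustration):</p>
<pre><code class="language-markdown"># CLAUDE.md

## Commands
- `make test` runs the unit tests; run it before any commit.
- `make lint` formats and lints.

## Conventions
- TypeScript strict mode; avoid `any`.
- Architecture details: see docs/architecture.md (linked, not inlined).
</code></pre>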
<h2 id="managing-context-and-sessions">Managing Context and Sessions</h2><p>Use <code>/context</code> to track memory size, and reset with <code>/clear + /catchup</code> to refresh the workspace after file changes. For larger projects, save progress to a <code>.md</code> file before clearing so you can easily resume later. This balance keeps the agent aware but prevents context overload.</p>
<h2 id="custom-slash-commands">Custom Slash Commands</h2><p>Commands like <code>/catchup</code> or <code>/pr</code> streamline recurring workflows. Simplicity is key—if you need a long list of commands, you’re probably adding friction rather than speed.</p>
<h2 id="subagents-and-collaboration">Subagents and Collaboration</h2><p>Instead of relying on multiple rigid subagents, it’s often better to let the main agent manage context and spawn lightweight clones when needed. This keeps workflows flexible and reduces coordination overhead.</p>
<h2 id="hooks-for-validation">Hooks for Validation</h2><p>Hooks enforce consistency and prevent bad commits. Two types are common:</p>
<ul>
<li><strong>Blocking hooks</strong> stop actions like <code>git commit</code> if tests fail.</li>
<li><strong>Hint hooks</strong> just log or suggest improvements.</li>
</ul>
<p>Run blocking hooks at commit time, not during editing, to keep feedback useful but non-disruptive.</p>
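<p>In Claude Code, hooks live in <code>settings.json</code>. A minimal sketch of a blocking hook, assuming the documented pattern where a <code>PreToolUse</code> hook command that exits with code 2 blocks the action (the script path is hypothetical):</p>
<pre><code class="language-json">{
  &quot;hooks&quot;: {
    &quot;PreToolUse&quot;: [
      {
        &quot;matcher&quot;: &quot;Bash&quot;,
        &quot;hooks&quot;: [
          { &quot;type&quot;: &quot;command&quot;, &quot;command&quot;: &quot;./scripts/block-risky-commits.sh&quot; }
        ]
      }
    ]
  }
}
</code></pre>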
<h2 id="skills-and-mcps">Skills and MCPs</h2><p>“Skills” formalize scripts and CLIs the agent can call directly. MCPs (Model Context Protocol) are best used as focused data gateways, exposing a few powerful entry points—like raw data or controlled execution—rather than huge APIs.</p>
<h2 id="integrations-and-configuration">Integrations and Configuration</h2><p>The Claude Code SDK works for automation and prototyping, while the GitHub Action plugin brings the agent into CI pipelines for PR automation and logging. Common <code>settings.json</code> options include proxy setup, timeout controls, and shared enterprise keys.</p>
<h2 id="takeaway">Takeaway</h2><p>Treat Claude Code like a teammate who follows your engineering principles: document its tools clearly, manage context deliberately, and build around stable, auditable workflows.</p>
]]></description>
      <category>ai</category>
      <category>claudecode</category>
      <category>summary</category>
    </item>
    <item>
      <title><![CDATA[How to Make Your LLM Not Average]]></title>
      <link>https://rbstp.dev/posts/da2bd8618efbd34741105931e15bf400.html</link>
      <guid>https://rbstp.dev/posts/da2bd8618efbd34741105931e15bf400.html</guid>
      <pubDate>Mon, 20 Oct 2025 11:33:41 GMT</pubDate>
      <description><![CDATA[<p>To make a language model less predictable and more insightful, apply the following strategies:</p>
<ul>
<li><p><strong>Use a negative style guide</strong><br>Clearly state what you <em>don’t</em> want the model to do.</p>
</li>
<li><p><strong>Force divergence in choice</strong><br>Push the model to break away from default phrasing or reasoning patterns.</p>
</li>
<li><p><strong>Avoid normal patterns</strong><br>Prevent it from falling into habits of:</p>
<ul>
<li>equivocation (vague hedging)</li>
<li>clichés</li>
<li>templated responses</li>
</ul>
</li>
<li><p><strong>Make the model self-aware</strong><br>Instruct it to identify and critique its own templates or stylistic patterns, then alter them.</p>
</li>
<li><p><strong>Encourage self-critique</strong><br>After an initial output, ask it to:</p>
<ul>
<li>review and critique its own response</li>
<li>generate a second, improved version</li>
</ul>
</li>
<li><p><strong>Switch models for perspective</strong><br>Use multiple models to critique and refine outputs.<br>Different models excel at different types of reasoning.</p>
</li>
<li><p><strong>Challenge consensus</strong><br>Provide examples that go against the norm and explain <em>why</em> the usual consensus is flawed.</p>
</li>
</ul>
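<p>The first point, a negative style guide, can be a short block of prohibitions prepended to the prompt; the wording below is purely illustrative:</p>
<pre><code class="language-markdown">Style guide (what NOT to do):
- Do not hedge (&quot;it depends&quot;, &quot;some might argue&quot;).
- Do not open by restating the question.
- Do not reach for stock transitions or clichés.
- Do not default to a templated three-bullet summary.
</code></pre>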
<p><strong>Source:</strong><br><em>The AI Daily Brief: Artificial Intelligence News and Analysis — “5 Prompting Tricks to Make Your AI Less Average,” Oct 19, 2025</em><br><a href="https://podcasts.apple.com/ca/podcast/the-ai-daily-brief-artificial-intelligence-news/id1680633614?i=1000732563655&r=1161">Apple Podcasts link</a></p>
]]></description>
      <category>ai</category>
      <category>howto</category>
      <category>quote</category>
    </item>
    <item>
      <title><![CDATA[Introducing Claude Skills]]></title>
      <link>https://rbstp.dev/posts/a7cec0a836c0b3bf62b22081c5a8f620.html</link>
      <guid>https://rbstp.dev/posts/a7cec0a836c0b3bf62b22081c5a8f620.html</guid>
      <pubDate>Fri, 17 Oct 2025 02:34:01 GMT</pubDate>
      <description><![CDATA[<h2 id="overview">Overview</h2><blockquote>
<p>At the start of a session Claude’s various harnesses can scan all available skill files and read a short explanation for each one from the frontmatter YAML in the Markdown file. This is very token efficient: each skill only takes up a few dozen extra tokens, with the full details only loaded in should the user request a task that the skill can help solve.</p>
</blockquote>
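<p>The frontmatter pattern described above might look like this (the skill name and description are invented for illustration):</p>
<pre><code class="language-markdown">---
name: pdf-extract
description: Extract text and tables from PDFs when the user asks to read or summarize one.
---

# PDF extraction

Full instructions go here, loaded only when the skill is triggered.
</code></pre>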
<blockquote>
<p>Claude Code is, with hindsight, poorly named. It’s not purely a coding tool: it’s a tool for general computer automation. Anything you can achieve by typing commands into a computer is something that can now be automated by Claude Code. It’s best described as a general agent. Skills make this a whole lot more obvious and explicit.</p>
</blockquote>
<h2 id="mcp">MCP</h2><blockquote>
<p>Over time the limitations of MCP have started to emerge. The most significant is in terms of token usage: GitHub’s official MCP on its own famously consumes tens of thousands of tokens of context, and once you’ve added a few more to that there’s precious little space left for the LLM to actually do useful work.</p>
<p>My own interest in MCPs has waned ever since I started taking coding agents seriously. Almost everything I might achieve with an MCP can be handled by a CLI tool instead. LLMs know how to call <code>cli-tool --help</code>, which means you don’t have to spend many tokens describing how to use them—the model can figure it out later when it needs to.</p>
<p>Skills have exactly the same advantage, only now I don’t even need to implement a new CLI tool. I can drop a Markdown file in describing how to do a task instead, adding extra scripts only if they’ll help make things more reliable or efficient.</p>
</blockquote>
<p>Source: <a href="https://simonwillison.net/2025/Oct/16/claude-skills/">Claude Skills are awesome, maybe a bigger deal than MCP</a></p>
<p>Additional reading:</p>
<ul>
<li><a href="https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills">Equipping agents for the real world with Agent Skills</a></li>
<li><a href="https://docs.claude.com/en/docs/agents-and-tools/agent-skills/overview">Agent Skills</a></li>
<li><a href="https://github.com/anthropics/claude-cookbooks/tree/main/skills">Claude Skills Cookbook</a></li>
</ul>
]]></description>
      <category>claude</category>
      <category>quote</category>
      <category>skills</category>
    </item>
    <item>
      <title><![CDATA[Agent Package Manager (APM) in action]]></title>
      <link>https://rbstp.dev/posts/3be64af71fd43cf2161304de111f75f2.html</link>
      <guid>https://rbstp.dev/posts/3be64af71fd43cf2161304de111f75f2.html</guid>
      <pubDate>Mon, 13 Oct 2025 16:59:13 GMT</pubDate>
      <description><![CDATA[<blockquote>
<ol>
<li>Agent primitives are software: Your .prompt.md and .instructions.md files represent executable natural language programs that deserve professional tooling infrastructure.</li>
<li>Runtime diversity enables scale: Agent CLI runtimes provide the execution environments that bridge development to production.</li>
<li>Package management is critical: APM provides the npm-equivalent layer that makes agent primitives truly portable and shareable.</li>
<li>Production ready today: This tooling stack enables automated AI workflows in CI/CD pipelines with enterprise-grade reliability.</li>
<li>Ecosystem growth pattern: Package management infrastructure creates the foundation for thriving ecosystems of shared workflows, tools, and community libraries</li>
</ol>
</blockquote>
<p><a href="https://github.blog/ai-and-ml/github-copilot/how-to-build-reliable-ai-workflows-with-agentic-primitives-and-context-engineering/">How to build reliable AI workflows with agentic primitives and context engineering</a></p>
]]></description>
      <category>apm</category>
      <category>github</category>
      <category>quote</category>
    </item>
    <item>
      <title><![CDATA[Talk, Tag, Build: How Three Prompts Automate Product Development]]></title>
      <link>https://rbstp.dev/posts/9fd7698bee0c925a3b0b297ff93dcd81.html</link>
      <guid>https://rbstp.dev/posts/9fd7698bee0c925a3b0b297ff93dcd81.html</guid>
      <pubDate>Mon, 13 Oct 2025 02:20:05 GMT</pubDate>
      <description><![CDATA[<h2 id="setup">Setup</h2><blockquote>
<p>it&#39;s three markdown files, and it&#39;s basically three prompts. What I figured out is, okay, if you want to build something, whether it&#39;s the whole app or just a feature of the app, you of course have to create a PRD, a product requirement doc, and again, it&#39;s like, duh. But a lot of people don&#39;t know this.</p>
<p>What I do is I have this file, you just tag it, and then you tag the filename, and then I now use Whisper Flow, and I&#39;m not an investor, I&#39;m not being paid to say that, but I just click the Alt key on my keyboard, and I just start blabbing about this feature and everything about it, and then I tag this file called Create PRD, and then the model goes to work, and the prompt is basically like, okay, take the user&#39;s requirements, ask them five clarifying questions, and then once you get the answer, then build out a pretty typical PRD. Yeah. Great.</p>
<p>You got a PRD. You specify a lot of things, but now what? So, then the second file is called generate tasks.</p>
<p>And so, you tag the PRD, you tag the generate task file, and you hit go on the agent. And the prompt is pretty simple. Look at this PRD, pretend that you&#39;re translating it for a junior developer.</p>
</blockquote>
<h2 id="break-it-down">Break it down.</h2><blockquote>
<p>Break it down as tasks, make every task atomic, use dot notation, put a list of the files that you think you&#39;re going to need to add at the top, and then say go. And then, there&#39;s the third one is, which is iterating the task, which is do one task at a time, ask for permissions when you don&#39;t understand things. And that process, believe it or not, most people don&#39;t do or understand or have the discipline to run.</p>
<p>But when you do, you can build big, big features pretty reliably.</p>
</blockquote>
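<p>The task file that process produces might look something like this (contents invented for illustration):</p>
<pre><code class="language-markdown">## Relevant files
- src/auth/session.ts
- src/auth/session.test.ts

## Tasks
- [ ] 1.0 Add session persistence
  - [ ] 1.1 Define the session model
  - [ ] 1.2 Write tests for session expiry
  - [ ] 1.3 Implement the store and wire it up
</code></pre>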
<p>Source: <a href="https://podcasts.apple.com/ca/podcast/deployed-the-ai-product-podcast/id1767974576?i=1000727543837&r=866">Deployed: The AI Product Podcast: Real Talk on Building Coding Agents: A Conversation with Amp&#39;s Builder-in-Residence</a></p>
]]></description>
      <category>amp</category>
      <category>quote</category>
      <category>spec</category>
    </item>
    <item>
      <title><![CDATA[Quote from "The Peterman Pod: Mozilla Firefox CTO on Browser War Stories and the Path to Distinguished Engineer"]]></title>
      <link>https://rbstp.dev/posts/23f25ce32f4213a835119364e4b2b1db.html</link>
      <guid>https://rbstp.dev/posts/23f25ce32f4213a835119364e4b2b1db.html</guid>
      <pubDate>Sat, 11 Oct 2025 17:52:26 GMT</pubDate>
      <description><![CDATA[<blockquote>
<p>I think it is one of the most powerful ways to create clarity of thinking for yourself, because clarity of thinking is, in my view, one of the most important things. And you only can really get that by forcing yourself to write down your ideas. And so I suppose one pertinent or timely piece of advice is, if you care about this, do not use an LLM to produce text for you.</p>
<p>Because first of all, if you&#39;re putting your name on this thing, at least today, the LLM is like, it&#39;s pretty easy to tell when somebody does that. But more importantly, you are losing the opportunity to do the thinking yourself. And that thinking that happens when you write is one of the most important types of thinking for growth.</p>
</blockquote>
<p><a href="https://podcasts.apple.com/ca/podcast/the-peterman-pod/id1777363835?i=1000731153535">Bobby Holley</a></p>
]]></description>
      <category>quote</category>
      <category>thinking</category>
      <category>writing</category>
    </item>
    <item>
      <title><![CDATA[Shipping a Non-Trivial Feature with Agentic AI: Notes from Ghostty]]></title>
      <link>https://rbstp.dev/posts/63efc8c96c586b4a89d553f2c00c3be4.html</link>
      <guid>https://rbstp.dev/posts/63efc8c96c586b4a89d553f2c00c3be4.html</guid>
      <pubDate>Sat, 11 Oct 2025 15:53:22 GMT</pubDate>
      <description><![CDATA[<p>Mitchell Hashimoto describes how he shipped Ghostty’s unobtrusive macOS update UI using agentic AI with deliberate human oversight.  ￼</p>
<h2 id="plan-before-you-prompt">Plan before you prompt</h2><p>Define the shape of the solution first. Hashimoto chose Sparkle’s custom UI path and a titlebar accessory concept, then used an agent to prototype UI states, not to “build everything.” Humans kept control of scope and taste.  ￼</p>
<h2 id="iterate-then-de-slop">Iterate, then de-slop</h2><p>Multiple cleanup passes moved code to sane places, added documentation, and elevated state to the app level. He emphasizes owning the code and performing a final manual review before shipping.  ￼</p>
<h2 id="pivot-when-blocked-simulate-to-harden-ux">Pivot when blocked, simulate to harden UX</h2><p>When titlebar constraints fought the design, he pivoted to an in-window overlay and added simulation scenarios to exercise error and “no update” paths before wiring the backend.  ￼</p>
<p>Outcome: 16 sessions, ~$15.98 token spend, ~8 hours of focused work. Live for tip users now and slated for Ghostty 1.3.</p>
<p>Source: <a href="https://mitchellh.com/writing/non-trivial-vibing">https://mitchellh.com/writing/non-trivial-vibing</a></p>
]]></description>
      <category>ai</category>
      <category>vibe</category>
    </item>
    <item>
      <title><![CDATA[Claude Code Plugins]]></title>
      <link>https://rbstp.dev/posts/f2d6f8cc990f1d58661dd685907733c7.html</link>
      <guid>https://rbstp.dev/posts/f2d6f8cc990f1d58661dd685907733c7.html</guid>
      <pubDate>Thu, 09 Oct 2025 23:52:53 GMT</pubDate>
      <description><![CDATA[<p>Following a similar release from Gemini CLI, Claude Code now offers plugins. It seems tailor-made for enterprises and teams that want to quickly share configurations between team members. Excited to try it out.</p>
<p>Source: <a href="https://www.anthropic.com/news/claude-code-plugins">Claude Code plugins</a></p>
]]></description>
      <category>anthropic</category>
      <category>claudecode</category>
      <category>plugins</category>
    </item>
    <item>
      <title><![CDATA[Integration in Slack of Claude Agent SDK]]></title>
      <link>https://rbstp.dev/posts/450f7041474e490cace7777c138ee4d2.html</link>
      <guid>https://rbstp.dev/posts/450f7041474e490cace7777c138ee4d2.html</guid>
      <pubDate>Thu, 09 Oct 2025 01:24:32 GMT</pubDate>
      <description><![CDATA[<p>Mike Krieger, the Instagram co-founder who is now at Anthropic, mentioned that they use the Claude Agent SDK in an automated Slack channel whenever there is a new incident. It is named Claude Pager, and it gathers all the context of the incident and can be poked for information when needed.</p>
<p>I have been waiting to implement something similar in our stack, once we are finally off the Datadog MCP waitlist...</p>
<p>Source: <a href="https://podcasts.apple.com/ca/podcast/big-technology-podcast/id1522960417?i=1000730866400">Big Technology Podcast</a></p>
]]></description>
      <category>agent</category>
      <category>anthropic</category>
      <category>podcast</category>
    </item>
    <item>
      <title><![CDATA[Vibe Engineering]]></title>
      <link>https://rbstp.dev/posts/5d4fd1909abb662d7fb53703f4ffd1cd.html</link>
      <guid>https://rbstp.dev/posts/5d4fd1909abb662d7fb53703f4ffd1cd.html</guid>
      <pubDate>Tue, 07 Oct 2025 20:58:01 GMT</pubDate>
      <description><![CDATA[<p>Following the vibe research, we now get vibe engineering from <a href="https://simonwillison.net/2025/Oct/7/vibe-engineering/">Simon Willison</a>:</p>
<h2 id="definition">Definition</h2><blockquote>
<p><strong>Vibe engineering</strong> establishes a clear distinction from vibe coding. It signals that this is a different, harder and more sophisticated way of working with AI tools to build production software.</p>
</blockquote>
<h2 id="requirements">Requirements</h2><blockquote>
<p>It’s also become clear to me that LLMs actively reward existing top tier software engineering practices:</p>
<ul>
<li><strong>Automated testing.</strong> If your project has a robust, comprehensive and stable test suite agentic coding tools can fly with it. Without tests? Your agent might claim something works without having actually tested it at all, plus any new change could break an unrelated feature without you realizing it. Test-first development is particularly effective with agents that can iterate in a loop.</li>
<li><strong>Planning in advance.</strong> Sitting down to hack something together goes much better if you start with a high level plan. Working with an agent makes this even more important—you can iterate on the plan first, then hand it off to the agent to write the code.</li>
<li><strong>Comprehensive documentation.</strong> Just like human programmers, an LLM can only keep a subset of the codebase in its context at once. Being able to feed in relevant documentation lets it use APIs from other areas without reading the code first. Write good documentation first and the model may be able to build the matching implementation from that input alone.</li>
<li><strong>Good version control habits.</strong> Being able to undo mistakes and understand when and how something was changed is even more important when a coding agent might have made the changes. LLMs are also fiercely competent at Git—they can navigate the history themselves to track down the origin of bugs, and they’re better than most developers at using git bisect. Use that to your advantage.</li>
<li>Having <strong>effective automation</strong> in place. Continuous integration, automated formatting and linting, continuous deployment to a preview environment—all things that agentic coding tools can benefit from too. LLMs make writing quick automation scripts easier as well, which can help them then repeat tasks accurately and consistently next time.</li>
<li>A <strong>culture of code review.</strong> This one explains itself. If you’re fast and productive at code review you’re going to have a much better time working with LLMs than if you’d rather write code yourself than review the same thing written by someone (or something) else.</li>
<li>A <strong>very weird form of management.</strong> Getting good results out of a coding agent feels uncomfortably close to getting good results out of a human collaborator. You need to provide clear instructions, ensure they have the necessary context and provide actionable feedback on what they produce. It’s a lot easier than working with actual people because you don’t have to worry about offending or discouraging them—but any existing management experience you have will prove surprisingly useful.</li>
<li>Really good <strong>manual QA (quality assurance).</strong> Beyond automated tests, you need to be really good at manually testing software, including predicting and digging into edge-cases.</li>
<li>Strong <strong>research skills.</strong> There are dozens of ways to solve any given coding problem. Figuring out the best options and proving an approach has always been important, and remains a blocker on unleashing an agent to write the actual code.</li>
<li>The ability to <strong>ship to a preview environment.</strong> If an agent builds a feature, having a way to safely preview that feature (without deploying it straight to production) makes reviews much more productive and greatly reduces the risk of shipping something broken.</li>
<li>An instinct for <strong>what can be outsourced</strong> to AI and what you need to manually handle yourself. This is constantly evolving as the models and tools become more effective. A big part of working effectively with LLMs is maintaining a strong intuition for when they can best be applied.</li>
<li>An updated <strong>sense of estimation.</strong> Estimating how long a project will take has always been one of the hardest but most important parts of being a senior engineer, especially in organizations where budget and strategy decisions are made based on those estimates. AI-assisted coding makes this even harder—things that used to take a long time are much faster, but estimations now depend on new factors which we’re all still trying to figure out.</li>
</ul>
</blockquote>
]]></description>
      <category>ai</category>
      <category>quote</category>
      <category>vibe</category>
    </item>
    <item>
      <title><![CDATA[Vibe Research]]></title>
      <link>https://rbstp.dev/posts/e9470cec6072c40b6bd2ee0b1ae6619f.html</link>
      <guid>https://rbstp.dev/posts/e9470cec6072c40b6bd2ee0b1ae6619f.html</guid>
      <pubDate>Tue, 07 Oct 2025 11:42:54 GMT</pubDate>
      <description><![CDATA[<blockquote>
<p>“vibe research”. Like “vibe coding”, it’s performing a difficult technical task by relying on the model. I have a broad intuitive sense of what approaches are being tried, but I definitely don’t have a deep enough understanding to do this research unassisted. A real AI researcher would get a lot more out of the tool.</p>
</blockquote>
<blockquote>
<p>The basic loop of doing AI research with Codex (at least as an enthusiastic amateur) looks something like this:</p>
<ol>
<li>Codex makes a change to the training script and does three or four runs (this takes ~20 minutes overall)</li>
<li>Based on the results, Codex suggests two or three things that you could try next</li>
<li>I pick one of them (or very occasionally suggest my own idea) and return to (1).</li>
</ol>
</blockquote>
<p>Source: <a href="https://www.seangoedecke.com/ai-research-with-codex/">https://www.seangoedecke.com/ai-research-with-codex/</a></p>
]]></description>
      <category>ai</category>
      <category>quote</category>
      <category>vibe</category>
    </item>
    <item>
      <title><![CDATA[Codex Goes GA: New Features]]></title>
      <link>https://rbstp.dev/posts/bed9e6dda01ee60b1c6aa83ca422a4b2.html</link>
      <guid>https://rbstp.dev/posts/bed9e6dda01ee60b1c6aa83ca422a4b2.html</guid>
      <pubDate>Tue, 07 Oct 2025 01:50:05 GMT</pubDate>
      <description><![CDATA[<p>OpenAI just announced that <a href="https://openai.com/index/codex-now-generally-available/">Codex is now generally available</a>, and they&#39;re shipping some really practical features that make it way more useful for engineering teams.</p>
<h2 id="whats-new">What's New</h2><p>The big additions are a Slack integration, the Codex SDK, and better admin tools. But honestly, the SDK and GitHub Actions integration are what really caught my attention.</p>
<h2 id="slack-integration-delegate-like-a-coworker">Slack Integration: Delegate Like a Coworker</h2><p>You can now tag Codex directly in Slack channels or threads to kick off tasks. Just treat it like another team member. Need someone to investigate a bug or refactor something? Drop it in Slack and let Codex handle it in the background.</p>
<p>Check out <a href="https://youtu.be/hS1YqcewH0c?t=2098">the demo from DevDay</a> to see how smooth this workflow is.</p>
<h2 id="the-sdk-embed-codex-anywhere">The SDK: Embed Codex Anywhere</h2><p>This is where it gets interesting. The Codex SDK lets you embed the same agent into your own tools and workflows with just a few lines of TypeScript code.</p>
<pre><code class="language-typescript">import { Codex } from &quot;@openai/codex-sdk&quot;;

const codex = new Codex();
const thread = codex.startThread();
const result = await thread.run(
  &quot;Make a plan to diagnose and fix the CI failures&quot;
);
console.log(result);
</code></pre>
<p>You can continue on the same thread or resume past threads. It handles context management automatically, and outputs are structured so you can parse agent responses easily.</p>
<h2 id="github-actions-auto-fix-your-ci">GitHub Actions: Auto-Fix Your CI</h2><p>Here&#39;s the really cool part. You can now add Codex directly into your CI pipeline to automatically fix failing tests. When your CI fails, Codex spins up, diagnoses the issue, applies a minimal fix, and opens a PR for review.</p>
<p>The setup is straightforward. Add this to <code>.github/workflows/codex-autofix.yml</code>:</p>
<pre><code class="language-yaml">name: Codex Auto-Fix on Failure

on:
  workflow_run:
    workflows: [&quot;CI&quot;]
    types: [completed]

permissions:
  contents: write
  pull-requests: write

jobs:
  auto-fix:
    if: ${{ github.event.workflow_run.conclusion == &#39;failure&#39; }}
    runs-on: ubuntu-latest
    
    steps:
      - uses: actions/checkout@v4
      - uses: openai/codex-action@main
        with:
          openai_api_key: ${{ secrets.OPENAI_API_KEY }}
          prompt: &gt;-
            Read the repo, run tests, identify minimal fix, 
            implement only that change. Keep it surgical.
</code></pre>
<p>That&#39;s it. Codex watches for failures, fixes them, and submits a PR. You review and merge like any other contribution.</p>
<p>The full setup guide is <a href="https://developers.openai.com/codex/autofix-ci#step-1-add-the-github-action-to-your-ci-pipeline">here on the OpenAI docs</a>.</p>
<h2 id="bottom-line">Bottom Line</h2><p>The GA release makes Codex feel less like an experiment and more like a production tool. The SDK and GitHub Actions integration are the standout features here. Being able to automate CI fixes and embed Codex into your own workflows opens up a lot of possibilities.</p>
]]></description>
      <category>ai</category>
      <category>openai</category>
      <category>sdk</category>
    </item>
    <item>
      <title><![CDATA[OpenAI's AgentKit]]></title>
      <link>https://rbstp.dev/posts/48eea91c430ddf7958d274891d1943fc.html</link>
      <guid>https://rbstp.dev/posts/48eea91c430ddf7958d274891d1943fc.html</guid>
      <pubDate>Tue, 07 Oct 2025 01:32:38 GMT</pubDate>
      <description><![CDATA[<p>OpenAI just dropped <a href="https://openai.com/index/introducing-agentkit/">AgentKit</a> at their DevDay event, and it&#39;s basically their answer to making AI agent development way less painful. Think of it as Zapier, but specifically designed for building and orchestrating AI agents.</p>
<h2 id="what-is-agentkit">What Is AgentKit?</h2><p>Until now, building AI agents meant juggling a bunch of different tools. Custom orchestration, manual evaluation pipelines, connector setup, prompt tuning, and weeks of frontend work. AgentKit bundles all of that into one unified platform.</p>
<p>The toolkit has four main pieces:</p>
<ul>
<li><strong>Agent Builder</strong> This is the visual canvas where you drag and drop nodes to create multi-agent workflows. You get versioning, guardrails, and the ability to preview runs right in the interface. No more writing complex orchestration code.</li>
<li><strong>Connector Registry</strong> Basically a central admin panel where you manage how your agents connect to internal tools and third-party systems. Keeps things secure while giving your agents access to what they need.</li>
<li><strong>ChatKit</strong> Pre-built UI components so you can embed chat-based agent experiences directly into your product. Way faster than building custom interfaces from scratch.</li>
<li><strong>Evals for Agents</strong> New evaluation tools that let you measure how well your agents are performing. You get trace grading, datasets, automated prompt optimization, and you can even test against other models like Claude or Gemini.</li>
</ul>
<h2 id="the-live-demo">The Live Demo</h2><p>During the keynote, OpenAI engineer Christina Huang <a href="https://youtu.be/hS1YqcewH0c?t=1312">built a complete AI agent workflow in under 8 minutes</a> to show how fast you can go from idea to working agent. Pretty impressive.</p>
<h2 id="why-it-matters">Why It Matters</h2><p>This is OpenAI&#39;s push to become more than just a model provider. They want to be the full platform for building agentic applications. It&#39;s a direct play against tools like Zapier&#39;s agent builders and other automation platforms.</p>
]]></description>
      <category>agent</category>
      <category>ai</category>
      <category>openai</category>
    </item>
    <item>
      <title><![CDATA[Vibe Coding Audiobook Review]]></title>
      <link>https://rbstp.dev/posts/6b4a80a7b06535327b2093d3d03ea004.html</link>
      <guid>https://rbstp.dev/posts/6b4a80a7b06535327b2093d3d03ea004.html</guid>
      <pubDate>Mon, 06 Oct 2025 02:02:38 GMT</pubDate>
      <description><![CDATA[<p><strong>Story: ⭐⭐⭐ | Narration: ⭐⭐⭐⭐⭐</strong></p>
<p>Gene Kim and Steve Yegge’s audiobook on vibe coding has excellent narration, but I’d recommend checking out <a href="https://youtube.com/@vibecodingwithsteveandgene">their YouTube channel</a> instead. Most of the book’s examples were already covered in their promotional videos, and the high-level approach makes you wonder how long the advice will stay relevant in this fast-moving space.</p>
<p>The core framework they present, the Three Developer Loops, is genuinely useful though:</p>
<h2 id="inner-loop-seconds-to-minutes">Inner Loop (Seconds to Minutes)</h2><p>This is where AI shines brightest. Get AI to write specifications and tests, use it as your “Git maestro” for quick iterations. The speed here is incredible.</p>
<h2 id="middle-loop-hours-to-days">Middle Loop (Hours to Days)</h2><p>Focus on getting code into users’ hands quickly, working with AI on real issues, and maintaining that human collaboration element. It’s about keeping your development cycle tight while staying grounded in actual user needs.</p>
<h2 id="outer-loop-weeks-to-months">Outer Loop (Weeks to Months)</h2><p>The bigger picture stuff: preventing “kitchen fires” through stress tests and automation, detecting issues with proper CI/CD pipelines, and correcting problems systematically. This is where the Prevent-Detect-Correct cycle really matters.</p>
<h2 id="other-notes">Other Notes</h2><p>Parts 3 and 4 had some solid insights, but nothing groundbreaking if you’re already working with AI tools. The kitchen analogy gets overused pretty quickly, which doesn’t help.</p>
<p>If you’re new to AI-assisted development, the YouTube series gives you the essentials without the time commitment. If you’re already familiar with these concepts, you won’t find much new here beyond the well-produced audio experience.</p>
]]></description>
      <category>ai</category>
      <category>book</category>
      <category>vibe</category>
    </item>
    <item>
      <title><![CDATA[Introduction]]></title>
      <link>https://rbstp.dev/posts/e6d4cca574a4754cddfa465a014cf905.html</link>
      <guid>https://rbstp.dev/posts/e6d4cca574a4754cddfa465a014cf905.html</guid>
      <pubDate>Sun, 05 Oct 2025 16:58:11 GMT</pubDate>
      <description><![CDATA[<p>Welcome to my vibe-coded blog, built on GitHub Gists. I&#39;ve been following Simon Willison&#39;s excellent blog for a few months, and this post in particular sparked something for me: <a href="https://simonwillison.net/2024/Dec/22/link-blog/">My approach to running a link blog</a>. A few months later, here we are.</p>
<h2 id="why">Why?</h2><p>Why not?
I&#39;m preparing for a talk at work about AI and going beyond &quot;vibe-coding&quot;. I figured I&#39;d make my notes and preparation public as part of the process.</p>
<p>This site was built for fun with a terminal-style theme; fitting, since I spend most of my time in a terminal as a DevOps engineer. I know the title &quot;DevOps engineer&quot; can be controversial, but it is what it is.</p>
<p>Comments and discussions are welcome on GitHub: <a href="https://github.com/rbstp/gist-blog">rbstp/gist-blog</a>.</p>
<h2 id="how">How?</h2><p>Since I work with Linux images all day, I decided to use Linux as my daily driver outside of work (where we use macOS).</p>
<p>My son loves dinosaurs, so I picked the <a href="https://projectbluefin.io/">Bluefin</a> distro (based on Fedora). It&#39;s been solid so far, especially the dev mode, and it was easy to get started. I used Ubuntu before, so this was a nice change and a chance to learn new things.</p>
<h2 id="whats-next">What's Next?</h2><p>A lot of reading, practice, and constant learning. I&#39;m aiming for a similar feel to Simon Willison&#39;s blog.<br>Feel free to tag along!</p>
]]></description>
      <category>ai</category>
      <category>blog</category>
      <category>vibe</category>
    </item>
  </channel>
</rss>