<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Encyclopedia Autonomica]]></title><description><![CDATA[Trusted by readers at Nasdaq, Nvidia, and Stanford, Encyclopedia Autonomica is the go-to guide for anyone building or deploying autonomous agents. We cover what matters: how real-life AI agents reason, plan, remember, and act.]]></description><link>https://jdsemrau.substack.com</link><image><url>https://substackcdn.com/image/fetch/$s_!kmqt!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4191238-47fc-439e-9f74-9df5d0617cf7_1024x1024.png</url><title>Encyclopedia Autonomica</title><link>https://jdsemrau.substack.com</link></image><generator>Substack</generator><lastBuildDate>Sat, 11 Apr 2026 10:47:44 GMT</lastBuildDate><atom:link href="https://jdsemrau.substack.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[JDS]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[enyclopedia-autonomica@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[enyclopedia-autonomica@substack.com]]></itunes:email><itunes:name><![CDATA[Jan Daniel Semrau (MFin, CAIO)]]></itunes:name></itunes:owner><itunes:author><![CDATA[Jan Daniel Semrau (MFin, CAIO)]]></itunes:author><googleplay:owner><![CDATA[enyclopedia-autonomica@substack.com]]></googleplay:owner><googleplay:email><![CDATA[enyclopedia-autonomica@substack.com]]></googleplay:email><googleplay:author><![CDATA[Jan Daniel Semrau (MFin, CAIO)]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Autonomous Agents in Space Logistics and In-Space 
Manufacturing]]></title><description><![CDATA[It has been too long since I have written about space.]]></description><link>https://jdsemrau.substack.com/p/autonomous-agents-in-space-logistics</link><guid isPermaLink="false">https://jdsemrau.substack.com/p/autonomous-agents-in-space-logistics</guid><dc:creator><![CDATA[Jan Daniel Semrau (MFin, CAIO)]]></dc:creator><pubDate>Mon, 30 Mar 2026 11:07:43 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!WFU-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F958dd887-334f-4d2c-b6ce-36362b335565_1152x768.png" length="0" type="image/png"/><content:encoded><![CDATA[<p>It's been too long since I've written about space.</p><p>Now that <a href="https://grokipedia.com/page/Artemis_program">Artemis 2</a> is launching to the Moon, likely in just a couple of hours, I figured it might be a good idea to talk about Autonomous Agents and ISAM <a href="https://jdsemrau.substack.com/p/isam-in-space-servicing-assembly">again</a>. </p><p>I believe, and hope that I am right, that we are just at the cusp of a civilizational upgrade. </p><p>And in my opinion, AI in general and autonomous agents in particular will kick off a Renaissance of Exploration. Not only in space, but also in the <a href="https://www.reuters.com/investigations/china-is-mapping-ocean-floor-it-prepares-submarine-warfare-with-us-2026-03-24/">oceans</a>. </p><p>If you think that AI will kill all the jobs, you are lacking imagination. </p><p>My hypothesis is that autonomous agents will evolve into an important operating layer of space logistics and in-space manufacturing. </p><p>In the U.S. 
ecosystem, the shift is already visible across NASA&#8217;s<a href="https://www.nasa.gov/isam/"> ISAM program</a>, commercial on-orbit servicing missions, refueling demonstrations, and orbital manufacturing flights that return products to Earth.</p><p>If you look at this screenshot from the ISAM website, it becomes quickly obvious what I am talking about. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!svd8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a1af5ff-a7ea-41a8-9aca-c5b33a083ee1_2836x1674.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!svd8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a1af5ff-a7ea-41a8-9aca-c5b33a083ee1_2836x1674.png 424w, https://substackcdn.com/image/fetch/$s_!svd8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a1af5ff-a7ea-41a8-9aca-c5b33a083ee1_2836x1674.png 848w, https://substackcdn.com/image/fetch/$s_!svd8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a1af5ff-a7ea-41a8-9aca-c5b33a083ee1_2836x1674.png 1272w, https://substackcdn.com/image/fetch/$s_!svd8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a1af5ff-a7ea-41a8-9aca-c5b33a083ee1_2836x1674.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!svd8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a1af5ff-a7ea-41a8-9aca-c5b33a083ee1_2836x1674.png" width="1456" height="859" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8a1af5ff-a7ea-41a8-9aca-c5b33a083ee1_2836x1674.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:859,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:732950,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://jdsemrau.substack.com/i/192492097?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a1af5ff-a7ea-41a8-9aca-c5b33a083ee1_2836x1674.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!svd8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a1af5ff-a7ea-41a8-9aca-c5b33a083ee1_2836x1674.png 424w, https://substackcdn.com/image/fetch/$s_!svd8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a1af5ff-a7ea-41a8-9aca-c5b33a083ee1_2836x1674.png 848w, https://substackcdn.com/image/fetch/$s_!svd8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a1af5ff-a7ea-41a8-9aca-c5b33a083ee1_2836x1674.png 1272w, https://substackcdn.com/image/fetch/$s_!svd8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a1af5ff-a7ea-41a8-9aca-c5b33a083ee1_2836x1674.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><a href="https://www.nasa.gov/isam/">source</a></p><p>In that sense, space mobility is no longer just about launch and transit. 
</p><p>That is not only because of the renewed interest in <a href="https://fortune.com/2026/02/19/ai-data-centers-in-space-elon-musk-power-problems/">data centers in space</a>.</p><p>If we intend to have persistent assets in space that we can use for more than providing internet services, then Space Mobility will also include autonomous inspection, rendezvous, docking, servicing, refueling, assembly, production, and return logistics, often in environments where latency, cost, and risk make human presence too dangerous or too expensive.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://jdsemrau.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://jdsemrau.substack.com/subscribe?"><span>Subscribe now</span></a></p><h4><strong>From transport to operations</strong></h4><p>The traditional picture of space mobility focused on getting payloads from Earth to orbit. That model is now widening into a networked logistics stack: </p><ol><li><p>satellites need servicing, </p></li><li><p>stations need resupply, </p></li><li><p>hardware needs assembly, and </p></li><li><p>products may be manufactured in orbit and brought back to Earth.</p></li></ol><p>Autonomy matters because orbital operations are constrained by communication delays, limited crew time, and narrow mission windows. In that setting, autonomous agents are not just decision-support tools; they are mission controllers, robotic coordinators, fault handlers, and task planners that reduce human workload while increasing mission tempo.</p><p>NASA&#8217;s ISAM program is the clearest institutional expression of this shift. NASA describes ISAM as a framework for in-space servicing, assembly, and manufacturing that includes robotic technologies for extending spacecraft life, refueling, repairing, and building systems in orbit. 
That makes it a natural home for autonomous agents, because every one of those tasks requires sensing, planning, manipulation, and contingency handling in a high-risk environment. </p><p>Of course, much as with autonomous driving, latency will likely not be good enough in the near term for an AI agent to reason through a docking maneuver autonomously.</p><h4><strong>Why is autonomy different in space?</strong></h4><p>Autonomy in space is not the same as autonomy on Earth. Orbital systems operate with limited power, strict thermal budgets, narrow communications windows, and extremely high costs of failure. Once a spacecraft is deployed, it must often make decisions with incomplete information and recover from off-nominal situations without immediate ground intervention.</p><p>That creates a strong case for layered autonomy. </p><p>A mission may use low-level, high-frequency control loops for guidance and navigation, mid-level planning for task sequencing, and higher-level agents for reasoning about goals, constraints, and contingencies. 
The value of the agent is not only that it can execute a sequence, but that it can adapt when the sequence breaks down.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xoH_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e0e36b9-082b-44f4-a16b-b0f674895e28_1440x1144.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xoH_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e0e36b9-082b-44f4-a16b-b0f674895e28_1440x1144.png 424w, https://substackcdn.com/image/fetch/$s_!xoH_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e0e36b9-082b-44f4-a16b-b0f674895e28_1440x1144.png 848w, https://substackcdn.com/image/fetch/$s_!xoH_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e0e36b9-082b-44f4-a16b-b0f674895e28_1440x1144.png 1272w, https://substackcdn.com/image/fetch/$s_!xoH_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e0e36b9-082b-44f4-a16b-b0f674895e28_1440x1144.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xoH_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e0e36b9-082b-44f4-a16b-b0f674895e28_1440x1144.png" width="1440" height="1144" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1e0e36b9-082b-44f4-a16b-b0f674895e28_1440x1144.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1144,&quot;width&quot;:1440,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:206481,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://jdsemrau.substack.com/i/192492097?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e0e36b9-082b-44f4-a16b-b0f674895e28_1440x1144.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xoH_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e0e36b9-082b-44f4-a16b-b0f674895e28_1440x1144.png 424w, https://substackcdn.com/image/fetch/$s_!xoH_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e0e36b9-082b-44f4-a16b-b0f674895e28_1440x1144.png 848w, https://substackcdn.com/image/fetch/$s_!xoH_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e0e36b9-082b-44f4-a16b-b0f674895e28_1440x1144.png 1272w, https://substackcdn.com/image/fetch/$s_!xoH_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e0e36b9-082b-44f4-a16b-b0f674895e28_1440x1144.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This is especially relevant for logistics. </p><p>Unlike a single-point mission, such as placing a satellite in a pre-defined orbit, logistics is about coordination across multiple nodes, vehicles, payloads, and time horizons. </p><p>In space, this means that autonomous scheduling, rendezvous management, robotic servicing, and return logistics all need to be treated as parts of one system.</p><h4><strong>NASA&#8217;s ISAM program</strong></h4><p>Similar to the role DARPA played for the Internet and its economy, NASA is the most important government anchor for this topic, in the U.S. and probably in the world. The main economic question for ISAM is whether it is ecologically AND economically viable to bring goods into space and return them frequently, even if SpaceX&#8217;s Starship will provide much larger payloads in the near to mid-term. 
</p><p>It has been <a href="https://www.space.com/spacex-starship-rocket-launches-environmental-impact">estimated</a> that &#8220;<em>one Starship launch produces 76,000 metric tons of carbon dioxide equivalent.&#8221;</em></p><p>Therefore, ISAM needs to have the goal to mature technologies for extending satellite life, refueling and repairing spacecraft, and enabling assembly and manufacturing <strong>in orbit.</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rnPM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff74d4e2b-8c77-473e-9570-44342a9881fc_1440x678.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rnPM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff74d4e2b-8c77-473e-9570-44342a9881fc_1440x678.png 424w, https://substackcdn.com/image/fetch/$s_!rnPM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff74d4e2b-8c77-473e-9570-44342a9881fc_1440x678.png 848w, https://substackcdn.com/image/fetch/$s_!rnPM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff74d4e2b-8c77-473e-9570-44342a9881fc_1440x678.png 1272w, https://substackcdn.com/image/fetch/$s_!rnPM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff74d4e2b-8c77-473e-9570-44342a9881fc_1440x678.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!rnPM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff74d4e2b-8c77-473e-9570-44342a9881fc_1440x678.png" width="1440" height="678" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f74d4e2b-8c77-473e-9570-44342a9881fc_1440x678.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:678,&quot;width&quot;:1440,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:93581,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://jdsemrau.substack.com/i/192492097?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff74d4e2b-8c77-473e-9570-44342a9881fc_1440x678.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rnPM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff74d4e2b-8c77-473e-9570-44342a9881fc_1440x678.png 424w, https://substackcdn.com/image/fetch/$s_!rnPM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff74d4e2b-8c77-473e-9570-44342a9881fc_1440x678.png 848w, https://substackcdn.com/image/fetch/$s_!rnPM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff74d4e2b-8c77-473e-9570-44342a9881fc_1440x678.png 1272w, https://substackcdn.com/image/fetch/$s_!rnPM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff74d4e2b-8c77-473e-9570-44342a9881fc_1440x678.png 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p>The significance of ISAM is that it treats orbit as the operational environment and not just a destination. </p><p>Instead of assuming every spacecraft is disposable, we have to build toward reusable systems, maintenance capability, and orbital infrastructure, similar to how our cars and trains are reusable. 
Still, the environment in which we are operating is not conducive to human health; therefore, in this vision of an autonomy-heavy future, robots and software must perform tasks that historically required a full ground team or a crewed mission.</p><p>Within this framing, autonomous agents become the digital layer that turns capability into operations. A robotic arm, inspection spacecraft, or assembly platform still needs planning, health monitoring, task decomposition, and exception handling. </p><p>In other words, autonomy is what converts hardware into a service.</p><p>Let&#8217;s develop a rough timeline.</p><h4><strong>The servicing layer</strong></h4><p>On-orbit servicing is one of the most mature near-term applications for autonomous space systems. It includes inspection, life extension, refueling, relocation, repair, and disposal support. These tasks are logistically complex because they require precise approach trajectories, relative navigation, robotic interaction with existing satellites, and high reliability in a domain where physical contact is expensive and risky.</p><p><a href="https://www.northropgrumman.com/what-we-do/space/space-logistics-services">SpaceLogistics</a>, a Northrop Grumman company, offers a particularly interesting solution to this problem. Its mission-extension and robotic servicing work illustrate how autonomy underpins commercial satellite life extension, especially for assets in GEO. </p><p>Another company active in this layer is <a href="https://space-phoenix.com/">Space Phoenix Systems</a>. Their core offering is a "Returnable Payload-on-Demand" (R/PoD) service, which allows companies to send materials or experiments to orbit and get products returned to Earth. They target industries where microgravity enables breakthroughs impossible on Earth, including semiconductors, biotechnology and pharmaceuticals, and photovoltaics. 
Examples of customers include companies growing purer semiconductor crystals, manufacturing higher-clarity fiber optics, and bioprinting artificial retinas in orbit. </p><h4><strong>Refueling as logistics</strong></h4><p>Refueling is one of the most compelling examples of autonomous space logistics because it requires high precision and trust. </p><p>If you consider that satellites in LEO must maintain speeds of roughly 27,000 to 28,000 km/h to hold orbital velocity and avoid orbital decay, a refueling mission becomes a coordinated, high-risk interaction between spacecraft that must approach, align, connect, and transfer propellant safely.</p><p><a href="https://www.astroscale.com/en/news/astroscale-u-s-to-lead-the-first-ever-refueling-of-a-united-states-space">Astroscale</a> is a New Space company that has already flown refueling missions for a U.S. Space Force asset.</p><p>Starfish&#8217;s <a href="https://www.starfishspace.com/press-release/starfish-space-unveils-otter-pup-2-mission/">Otter Pup 2</a> mission is also attempting a first commercial satellite docking in low Earth orbit, even though their recent <a href="https://spacenews.com/starfish-space-finds-a-new-partner-for-docking-demonstration-mission/">partner D-Orbit</a> didn&#8217;t work out. Starfish Space <a href="https://spacenews.com/space-force-awards-54-5-million-contract-to-starfish-space-for-geo-servicing-vehicle/">won a $54.5 million contract from the U.S. Space Force</a> to build and operate a spacecraft designed to support military satellites in geostationary Earth orbit.</p><p>For refueling, agents can simplify mission architecture by automatically interpreting navigation data, managing relative motion, and handling safety constraints. 
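</p><p>Those LEO speeds follow directly from the circular-orbit relation v = sqrt(&#956;/r). A quick sanity check in Python, using the standard values for Earth&#8217;s gravitational parameter and mean radius:</p>

```python
import math

# Circular orbital speed v = sqrt(mu / r), converted to km/h.
MU_EARTH = 3.986004418e14   # Earth's gravitational parameter, m^3/s^2
R_EARTH = 6_371_000         # mean Earth radius, m

def orbital_speed_kmh(altitude_km: float) -> float:
    r = R_EARTH + altitude_km * 1_000
    return math.sqrt(MU_EARTH / r) * 3.6   # m/s -> km/h

# An ISS-like 400 km orbit comes out at roughly 27,600 km/h.
print(round(orbital_speed_kmh(400)))
```

<p>Fly meaningfully slower than that at a given altitude and the orbit decays. That narrow margin is part of what makes approach and docking such a precision problem.</p><p>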
Human operators may supervise the mission, but the actual execution requires machine precision and often machine-level responsiveness.</p><h4><strong>Autonomous logistics in GEO</strong></h4><p>Geostationary orbit (GEO) is also a useful place to discuss autonomy because the assets there are expensive, long-lived, and strategically valuable, despite being much further away from Earth. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EXGa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2e3399d-93d5-4362-b538-5e28156902b0_1440x1016.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EXGa!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2e3399d-93d5-4362-b538-5e28156902b0_1440x1016.png 424w, https://substackcdn.com/image/fetch/$s_!EXGa!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2e3399d-93d5-4362-b538-5e28156902b0_1440x1016.png 848w, https://substackcdn.com/image/fetch/$s_!EXGa!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2e3399d-93d5-4362-b538-5e28156902b0_1440x1016.png 1272w, https://substackcdn.com/image/fetch/$s_!EXGa!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2e3399d-93d5-4362-b538-5e28156902b0_1440x1016.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EXGa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2e3399d-93d5-4362-b538-5e28156902b0_1440x1016.png" width="1440" 
height="1016" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f2e3399d-93d5-4362-b538-5e28156902b0_1440x1016.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1016,&quot;width&quot;:1440,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:163168,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://jdsemrau.substack.com/i/192492097?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2e3399d-93d5-4362-b538-5e28156902b0_1440x1016.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EXGa!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2e3399d-93d5-4362-b538-5e28156902b0_1440x1016.png 424w, https://substackcdn.com/image/fetch/$s_!EXGa!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2e3399d-93d5-4362-b538-5e28156902b0_1440x1016.png 848w, https://substackcdn.com/image/fetch/$s_!EXGa!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2e3399d-93d5-4362-b538-5e28156902b0_1440x1016.png 1272w, https://substackcdn.com/image/fetch/$s_!EXGa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2e3399d-93d5-4362-b538-5e28156902b0_1440x1016.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Since it is much farther away, a GEO satellite that loses fuel, drifts, or degrades may still be salvageable if a service vehicle can reach it in time. </p><p>That makes GEO an interesting proving ground for autonomous servicing platforms. </p><h4><strong>The manufacturing frontier</strong></h4><p>But the real frontier will be the production of goods and services. In-space manufacturing changes the logic of supply chains. Instead of launching every component from Earth, some materials, parts, or even finished products may be produced in orbit, where microgravity offers unique advantages. 
</p><p>This is why manufacturing is increasingly included under the ISAM umbrella rather than treated as a separate curiosity.</p><p><a href="https://www.varda.com/">Varda Space Industries</a> is an interesting provider here because it has already demonstrated orbital manufacturing with recovery back to Earth. That closed-loop model is important: it combines production in space with return logistics, proving that orbit can function as part of a broader industrial system rather than as a one-way destination.</p><p>For autonomous agents, manufacturing in orbit requires several layers of coordination. Systems must monitor process conditions, manage thermal and power constraints, sequence operations, and coordinate return windows. If the manufacturing platform is also autonomous, then its value lies not only in producing material but in reducing the operational burden of producing it.</p><p>As mentioned, Varda is interesting to me because it bridges manufacturing and logistics in one mission architecture. The company&#8217;s orbital capsules are not just production platforms; they are also reentry and recovery systems, which means the logistics problem does not end when the part is made. </p><p>The return path is part of the product.</p><h4><strong>Space mobility as infrastructure</strong></h4><p>The <a href="https://aerospace.org/">Aerospace Corporation</a> treats mobility as an emerging infrastructure layer rather than a single mission type. And I think they are right. If we see mobility as the connective tissue linking launch, rendezvous, servicing, and manufacturing, then autonomous agents are the control logic of the infrastructure. </p><p>In this model, agents manage schedules, navigate the constraints of orbital mechanics, handle contingency operations, and support safe interaction between assets. </p><p>Without autonomy, the system scales poorly; with autonomy, it begins to resemble a real logistics network.</p><p>This is a major shift from older mission design. 
</p><p>Historically, spacecraft were isolated platforms. Now they are increasingly nodes in a service ecosystem, and that ecosystem requires machine-coordinated operations. A robotic vehicle can execute motions, but an agent can also interpret task goals, choose among alternatives, and adapt to changing conditions.</p><p>The <a href="https://arxiv.org/html/2402.14299v1">SpaceAgents-1 research paper </a>proposes a hierarchical multi-agent architecture for microgravity collaboration, including a Decision-Making Agent and Skill-Expert Agents. Even though the paper focuses on human and multi-robot collaboration, the architecture is highly relevant to orbital logistics because it shows how complex activities can be decomposed into specialized agent roles.</p><p>That decomposition maps well onto space logistics. One agent can manage mission planning, another can supervise navigation, another can evaluate system health, and another can coordinate robotics or payload operations. In practice, this kind of architecture helps turn a spacecraft or platform into a service-oriented autonomous system.</p><h4><strong>Human-robot collaboration</strong></h4><p>Fortunately, autonomy in space does not eliminate humans; it changes their role. In many near-term systems, humans still define objectives, validate safety, and handle exception cases, while agents execute routine or semiautonomous operations.</p><p>That matters in microgravity and orbit because the environment is too risky for fully manual operation, yet too complex for simple static automation. The result is a collaborative model in which agents and humans share responsibility, with agents carrying more of the operational burden as confidence grows.</p><p>This is one reason why in-space manufacturing is such a fascinating topic. </p><p>Manufacturing creates repeated, structured, and measurable tasks, which are ideal for agentic orchestration. 
A platform that can autonomously manage workflows, track process state, and call for human intervention only when needed is much more scalable than a platform that depends on constant supervision.</p><p>Autonomy changes economics by lowering the cost of repeated operations. </p><p>In space, that can mean fewer dedicated crew hours, reduced dependence on scarce human operators, better asset utilization, and longer mission lifetimes. It can also create new revenue streams, such as orbital servicing, payload processing, and in-space manufacturing.</p><p>This is important because many space business models only work if operations become routine. A one-off demonstration is interesting, but a real market emerges when servicing or manufacturing can be repeated, scheduled, and priced like an infrastructure service. Autonomy is what makes repetition plausible at scale.</p><p>For that reason, it is worth emphasizing that autonomous agents are not a futuristic add-on. They are the operational enablers that make a logistics economy viable beyond Earth.</p><h4><strong>The strategic conclusion</strong></h4><p>The U.S. is unsurprisingly leading in this space because logistics, servicing, and manufacturing sit at the intersection of commercial growth and national defense capability. NASA&#8217;s ISAM program provides the research and technology backbone, while commercial firms are turning those capabilities into operational missions.</p><p>What is unique now is that several capabilities are converging at once. </p><p>Orbital servicing is becoming commercially credible, in-space manufacturing is moving from experiment to product, and government programs are explicitly investing in robotics and autonomy for orbital operations.</p><p>That combination matters because autonomous space logistics is not just a technology story. 
</p><p>It is also a policy, industrial, and strategic story about keeping orbital assets useful for longer and building a market for reusable, maintainable, and serviceable systems. </p><p>And, in my opinion, the European Union will never get to the speed required to compete effectively in this space. </p><p>As we move from isolated spacecraft toward serviced orbital infrastructure, autonomous agents become the natural operating model for space mobility. They are evolving into the hidden infrastructure that allows space logistics and manufacturing to exist as scalable industries.</p><p></p><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WFU-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F958dd887-334f-4d2c-b6ce-36362b335565_1152x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WFU-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F958dd887-334f-4d2c-b6ce-36362b335565_1152x768.png 424w, https://substackcdn.com/image/fetch/$s_!WFU-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F958dd887-334f-4d2c-b6ce-36362b335565_1152x768.png 848w, 
https://substackcdn.com/image/fetch/$s_!WFU-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F958dd887-334f-4d2c-b6ce-36362b335565_1152x768.png 1272w, https://substackcdn.com/image/fetch/$s_!WFU-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F958dd887-334f-4d2c-b6ce-36362b335565_1152x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WFU-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F958dd887-334f-4d2c-b6ce-36362b335565_1152x768.png" width="1152" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/958dd887-334f-4d2c-b6ce-36362b335565_1152x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1152,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1692280,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://jdsemrau.substack.com/i/192492097?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F958dd887-334f-4d2c-b6ce-36362b335565_1152x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WFU-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F958dd887-334f-4d2c-b6ce-36362b335565_1152x768.png 424w, 
https://substackcdn.com/image/fetch/$s_!WFU-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F958dd887-334f-4d2c-b6ce-36362b335565_1152x768.png 848w, https://substackcdn.com/image/fetch/$s_!WFU-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F958dd887-334f-4d2c-b6ce-36362b335565_1152x768.png 1272w, https://substackcdn.com/image/fetch/$s_!WFU-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F958dd887-334f-4d2c-b6ce-36362b335565_1152x768.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p>]]></content:encoded></item><item><title><![CDATA[The Absurd Effectiveness of Skills.md]]></title><description><![CDATA[This essay explores how we truly learn, by doing, and why that insight matters for the future of AI and knowledge work. As the shift from search engines to answer engines accelerates, a new paradigm emerges: cowork engines powered by &#8220;skills.&#8221; These structured, procedural workflows transform AI from passive responder to active collaborator. Using real-world examples like evolving developer tools, the piece argues that skills enable process stability, context efficiency, and organizational memory, turning AI systems into reliable operators rather than inconsistent assistants. In an era of SaaS disruption and agentic AI, the competitive edge lies not in models alone, but in codified expertise. Skills are the missing layer that bridges reasoning and execution, making AI systems scalable, disciplined, and truly useful.]]></description><link>https://jdsemrau.substack.com/p/the-absurd-effectiveness-of-skillsmd</link><guid isPermaLink="false">https://jdsemrau.substack.com/p/the-absurd-effectiveness-of-skillsmd</guid><dc:creator><![CDATA[Jan Daniel Semrau (MFin, CAIO)]]></dc:creator><pubDate>Mon, 23 Mar 2026 03:07:40 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!IqSc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e6195af-e40f-4cce-b796-2831f40c5cb9_1152x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote><p>&#8220;For the things we have to learn before we can do them, we learn by doing them&#8230; we become just by doing just acts, temperate by doing temperate acts, brave by doing brave acts.&#8221;</p></blockquote><p>(<em><a href="https://classics.mit.edu/Aristotle/nicomachaen.2.ii.html">Nicomachean Ethics</a></em><a 
href="https://classics.mit.edu/Aristotle/nicomachaen.2.ii.html">, Book II</a>)</p><p>It was a cold, bright morning in March, and the clocks were striking seven when the first sunlight of a new dawn woke me after a sleepless night at Ulan Bator airport. Having just finished a project in the Far East, this was the fastest and probably safest way home while Iran drowned the Middle East in a curtain of missiles. </p><p>And while I was sipping an overpriced airport Americano, I started to think about the future of knowledge work. </p><p>For the last 20 years, the Internet has provided Tech and <a href="https://en.wikipedia.org/wiki/Global_labor_arbitrage">Labor Arbitrage</a> companies with tremendous fortunes. With recent news about the <a href="https://www.forbes.com/sites/jonmarkman/2026/02/17/the-saas-apocalypse-or-the-saas-evolution/">SaaS-apocalypse</a>, one might wonder whether that is all about to change. </p><p>Initially, I thought this was a bit naive, as there is so much more to building and running a business than just writing code. But after implementing agents with skills, I am not so sure anymore. </p><p>AI Agents will fundamentally change how we build and operate software. </p><p>Will AI Agents also fundamentally change how knowledge work is done?</p><h4>How to get coffee?</h4><p>I&#8217;d never been to Ulan Bator airport before and have no intention of returning. So when I wanted my morning coffee, I had no idea where to get one. </p><p>Framed as a traditional search problem, I would have started scanning my surroundings for any hints that might point me towards an open coffee shop. Within the environment of the airport, search provides me with the broadest set of options but also the broadest set of failure modes. 
</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mSY-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe07c4813-3bc3-4ef9-9c01-262d454e0954_1544x142.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mSY-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe07c4813-3bc3-4ef9-9c01-262d454e0954_1544x142.png 424w, https://substackcdn.com/image/fetch/$s_!mSY-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe07c4813-3bc3-4ef9-9c01-262d454e0954_1544x142.png 848w, https://substackcdn.com/image/fetch/$s_!mSY-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe07c4813-3bc3-4ef9-9c01-262d454e0954_1544x142.png 1272w, https://substackcdn.com/image/fetch/$s_!mSY-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe07c4813-3bc3-4ef9-9c01-262d454e0954_1544x142.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mSY-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe07c4813-3bc3-4ef9-9c01-262d454e0954_1544x142.png" width="1456" height="134" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e07c4813-3bc3-4ef9-9c01-262d454e0954_1544x142.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:134,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:18511,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://jdsemrau.substack.com/i/189303647?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe07c4813-3bc3-4ef9-9c01-262d454e0954_1544x142.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mSY-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe07c4813-3bc3-4ef9-9c01-262d454e0954_1544x142.png 424w, https://substackcdn.com/image/fetch/$s_!mSY-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe07c4813-3bc3-4ef9-9c01-262d454e0954_1544x142.png 848w, https://substackcdn.com/image/fetch/$s_!mSY-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe07c4813-3bc3-4ef9-9c01-262d454e0954_1544x142.png 1272w, https://substackcdn.com/image/fetch/$s_!mSY-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe07c4813-3bc3-4ef9-9c01-262d454e0954_1544x142.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>When framed as an answer problem, in the absence of digital tooling, I would have to wait for the information desk to open up so I could ask the airport employee in natural language to 
receive a directional answer. This answer already filters its response based on training data, guardrails, and implicit bias. </p><p>For that reason, I find answer engines like ChatGPT or Claude less useful than search engines. Answer engines limit optionality and largely present a selection of the truth, even with clever prompting. </p><p>And I don&#8217;t mean answer engines are bad; I just think they&#8217;re the wrong tool for certain problems. </p><p>It&#8217;s just that answer engines are optimized for resolution, not exploration. </p><p>The response arrives pre-filtered, pre-weighted, pre-concluded. That works when the question is simple, and the cost of a wrong answer is low. But it breaks down when the decision space is genuinely ambiguous. I.e., when you are not 100% sure what you are looking for, and the journey towards that conclusion is the goal. </p><p>Is there another way? </p><p>Over the last few months, I formed the opinion that answer engines are only an intermediary step towards cowork engines. 
I am still struggling to find a succinct definition, but loosely explained, cowork engines act as critical guides through the problem space. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9qN6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe053030f-d352-47c8-bb21-42ae33e7ae8a_960x540.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9qN6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe053030f-d352-47c8-bb21-42ae33e7ae8a_960x540.png 424w, https://substackcdn.com/image/fetch/$s_!9qN6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe053030f-d352-47c8-bb21-42ae33e7ae8a_960x540.png 848w, https://substackcdn.com/image/fetch/$s_!9qN6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe053030f-d352-47c8-bb21-42ae33e7ae8a_960x540.png 1272w, https://substackcdn.com/image/fetch/$s_!9qN6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe053030f-d352-47c8-bb21-42ae33e7ae8a_960x540.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9qN6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe053030f-d352-47c8-bb21-42ae33e7ae8a_960x540.png" width="960" height="540" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e053030f-d352-47c8-bb21-42ae33e7ae8a_960x540.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:540,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:323014,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://jdsemrau.substack.com/i/189303647?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe053030f-d352-47c8-bb21-42ae33e7ae8a_960x540.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9qN6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe053030f-d352-47c8-bb21-42ae33e7ae8a_960x540.png 424w, https://substackcdn.com/image/fetch/$s_!9qN6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe053030f-d352-47c8-bb21-42ae33e7ae8a_960x540.png 848w, https://substackcdn.com/image/fetch/$s_!9qN6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe053030f-d352-47c8-bb21-42ae33e7ae8a_960x540.png 1272w, https://substackcdn.com/image/fetch/$s_!9qN6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe053030f-d352-47c8-bb21-42ae33e7ae8a_960x540.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>If you consider this screenshot of my trusty agent Superbill, then the orange section &#8220;news&#8221; is the search engine that just creates a list of news about the stock. The green section is the answer engine, and together they form the workspace engine (yellow, but actually a combination of all these engines). </p><p>In this progression, the primary goal shifts from simply finding information to understanding it, and ultimately to executing complex tasks.</p><p>And that is something that I believe has changed with the introduction of skills.</p><h4>Dynamic Process Patterns</h4><p>Skills provide an incredibly elegant way for agentic systems to understand not only the fact, but also the process. 
I.e., if a prompt describes the know-that in a single moment, then a skill is a codified playbook that explains know-how within a series of steps.</p><p>In practice, that means that skills act as scaffolding that keeps the system stable as it scales in complexity.</p><p>Or in other words:</p><p>Skills are</p><ol><li><p>dynamic patterns that help the agent navigate a sequence of activities,</p></li><li><p>similar to SOPs (Standard Operating Procedures) in that they are reviewed regularly and adjusted where needed, and</p></li><li><p>effective only when they engage with context reliably.</p></li></ol><p>And if you want to build workspace engines, then this matters because the hardest part of building agent systems is not the model or the agent framework itself, but having them engage with the workspace effectively and reliably. </p><p>That means skills need to be able to ensure:</p><p><strong>1) Process stability</strong></p><p>A skill encodes a workflow. When the system needs to triage an investment idea, it can load a skill that defines the procedure step by step. When it needs to conduct another workflow, it loads a different skill. Each of these workflows has access to a predefined set of tools. </p><p><strong>2) Context efficiency</strong></p><p>Without skills, every task becomes just a large prompt. That is expensive and can be brittle. With skills, the system only loads the workflow it needs at the time it&#8217;s requested. More importantly, how and which parts of the workspace a skill uses are included in the skill definition.</p><p><strong>3) Memory management</strong></p><p>Since they are based on SOPs, skills encode institutional knowledge. 
That makes the agent aligned not just to a user&#8217;s momentary request, but to a repeatable standard of analysis.</p><h4>Implementing Skills</h4><p>Superbill implements skills following the <a href="https://github.com/anthropics/skills">Anthropic</a> definition as local, filesystem&#8209;based modules with simple metadata and procedural instructions. </p><p>Each skill is a directory containing a `SKILL.md` file with:</p><p>- <strong>A name and description</strong> (for discovery / progressive disclosure). Progressive disclosure means that the agent doesn&#8217;t load all skills at once; Superbill discovers them when relevant. </p><p>- <strong>A minimal procedure</strong> (for execution of deterministic tools and sourcing of knowledge and/or other resources). </p><p>I like to register my skills as plugins in my application. This means that the agent can list the available skills within the session window and read the specific one that matches the request. </p><p>I also implemented a manual &#8220;/help&#8221; command to remind me which skills are available. 
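</p><p>To make this concrete, here is a minimal sketch of such a skill loader in Python. The directory layout, frontmatter format, and function names are my illustrative assumptions, not Superbill&#8217;s actual code: each skill folder holds a SKILL.md whose frontmatter carries the name and description, and the procedure body is only read from disk when the skill is invoked.</p>

```python
import re
from pathlib import Path

def parse_skill_md(path):
    """Split a SKILL.md file into frontmatter metadata and procedure body."""
    text = Path(path).read_text(encoding="utf-8")
    match = re.match(r"---\n(.*?)\n---\n(.*)", text, re.DOTALL)
    meta_block, body = match.groups() if match else ("", text)
    meta = {}
    for line in meta_block.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    return meta, body

def discover_skills(root):
    """Progressive disclosure: index only name + description, defer the body."""
    index = {}
    for skill_md in Path(root).glob("*/SKILL.md"):
        meta, _ = parse_skill_md(skill_md)
        index[meta["name"]] = {"description": meta.get("description", ""),
                               "path": skill_md}
    return index

def handle_command(index, user_input):
    """Resolve '/skill-name extra context' into a loaded procedure."""
    name, _, context = user_input.lstrip("/").partition(" ")
    if name == "help":  # manual helper: list all registered skills
        return "\n".join(f"/{n}: {s['description']}"
                         for n, s in sorted(index.items()))
    _, procedure = parse_skill_md(index[name]["path"])  # body loaded only now
    return f"Running /{name} with context {context!r}\n{procedure}"
```

<p>With this shape, &#8220;/help&#8221; touches only the lightweight index, while a skill&#8217;s full procedure is read only at invocation time, which is exactly the context-efficiency property described above.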
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Tf_V!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa511942d-8e26-4749-9f49-b66bda28f47a_601x746.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Tf_V!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa511942d-8e26-4749-9f49-b66bda28f47a_601x746.png 424w, https://substackcdn.com/image/fetch/$s_!Tf_V!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa511942d-8e26-4749-9f49-b66bda28f47a_601x746.png 848w, https://substackcdn.com/image/fetch/$s_!Tf_V!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa511942d-8e26-4749-9f49-b66bda28f47a_601x746.png 1272w, https://substackcdn.com/image/fetch/$s_!Tf_V!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa511942d-8e26-4749-9f49-b66bda28f47a_601x746.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Tf_V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa511942d-8e26-4749-9f49-b66bda28f47a_601x746.png" width="601" height="746" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a511942d-8e26-4749-9f49-b66bda28f47a_601x746.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:746,&quot;width&quot;:601,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:135386,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://jdsemrau.substack.com/i/189303647?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa511942d-8e26-4749-9f49-b66bda28f47a_601x746.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Tf_V!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa511942d-8e26-4749-9f49-b66bda28f47a_601x746.png 424w, https://substackcdn.com/image/fetch/$s_!Tf_V!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa511942d-8e26-4749-9f49-b66bda28f47a_601x746.png 848w, https://substackcdn.com/image/fetch/$s_!Tf_V!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa511942d-8e26-4749-9f49-b66bda28f47a_601x746.png 1272w, https://substackcdn.com/image/fetch/$s_!Tf_V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa511942d-8e26-4749-9f49-b66bda28f47a_601x746.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h4>Portfolio of Capabilities</h4><p>In contrast to tools, plugins allow me to execute skills deterministically. Superbill&#8217;s skill set is modeled around investing processes and should, as such, resemble an investor&#8217;s operating system. Or, as I called it in my previous post on the topic, an ambient financial intelligence layer. In this layer, UI patterns should differ from a generic ChatGPT/Claude instance: you can now simply type /skill-name to run a skill. </p><p>Each skill executes exactly one distinct workflow.</p><p>Additionally, you can add extra context after the name: /compare vs MSFT and NVDA. </p><p>Here are the ones that I am using right now.
</p><p><strong>Research &amp; Analysis</strong></p><ul><li><p>/equity-research-report: Full institutional report: BUY/SELL/HOLD, price target, 8 sections</p></li><li><p>/investment-quick-screen: Quick screen: Proceed or Reject a new idea</p></li><li><p>/deep-fundamental-underwriting: Intrinsic value, earnings power, 3-scenario memo</p></li><li><p>/variant-perception-detection: Find where consensus is wrong, i.e., a mispricing thesis</p></li><li><p>/thesis-integrity-audit: Hold / Add / Reduce / Exit after a material event</p></li><li><p>/news-article-deep-dive: Deep analysis of a headline in stock context</p></li></ul><p><strong>Portfolio &amp; Position Management</strong></p><ul><li><p>/conviction-calibration-engine: Adjust position size based on current conviction</p></li><li><p>/position-sizing-concentration: Recommended position size and tier</p></li><li><p>/portfolio-construction-optimizer: Rebalancing recommendation + concentration health score</p></li><li><p>/daily-portfolio-monitoring: Thesis intact / weakened / broken + action</p></li><li><p>/exit-discipline-engine: Exit / Trim / Maintain near full valuation</p></li><li><p>/performance-attribution-error-analysis: Monthly performance report + process refinements</p></li></ul><p><strong>Risk &amp; Crisis</strong></p><ul><li><p>/capital-preservation-protocol: Defensive restructuring plan on drawdown</p></li><li><p>/crisis-capital-allocation: Capital deployment plan during market stress</p></li><li><p>/activist-feasibility-analysis: Passive / Engage / Full Activism decision</p></li></ul><p><strong>Utilities</strong></p><ul><li><p>/chart-writeup: Render a bar or line chart of financial data</p></li><li><p>/compare: Compare this stock vs peers or a specific ticker</p></li><li><p>/graph-writeup: Render a relationship map or business model diagram</p></li><li><p>/table-writeup: Present data in a formatted table</p></li><li><p>/calculator: Arithmetic, ratios, growth rates, unit conversions</p></li></ul><p>The 
agent can move from one skill to another as the research process evolves. </p><p>Let&#8217;s start with a simple skill. </p><p>A calculator.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">---
name: calculator
description: Perform accurate numeric calculations and conversions. Use when the user asks for arithmetic, percentages, ratios, growth rates, unit conversions, or any multi-step computation where precision matters.
---

# Calculator

## Overview

Compute precise results from user inputs, using a reliable calculation method, and return a clear final value with units and rounding.

## Quick Use

1. Restate inputs and assumptions (units, rounding, time basis).
2. Build a single expression or small sequence of steps.
3. Compute with a calculator tool or the local script at `scripts/calc.py` (prefer deterministic math).
4. Return the final result with units and any requested precision.

## Tasks

### Arithmetic and percentages

Use explicit formulas. Example: "X is 12% of 350" -&gt; 350 * 0.12.

### Ratios and growth rates

Normalize to consistent units, then compute. Example: growth rate = (new - old) / old.
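
A minimal sketch of the growth-rate formula (illustrative numbers, not from a real ticker):

```python
old, new = 80.0, 92.0
growth = (new - old) / old
print(f"Growth rate: {growth:.2%}")  # prints "Growth rate: 15.00%"
```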

### Unit conversions

Convert to base units before combining values. If a conversion factor is ambiguous, ask a clarifying question.

### Ranges and rounding

If inputs are ranges, compute min and max. Round only at the end unless the user specifies otherwise.
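
For example, with a hypothetical range input (4% to 6% interest on 1,000):

```python
rate_lo, rate_hi = 0.04, 0.06
principal = 1000.0
# compute both endpoints; round only at the end
value_lo = round(principal * (1 + rate_lo), 2)
value_hi = round(principal * (1 + rate_hi), 2)
print(value_lo, value_hi)  # prints "1040.0 1060.0"
```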

## Output

Use a short, direct result format.

Example:

```text
Result: 42.75 USD
```

## Script

Use `scripts/calc.py` for deterministic evaluation when the math is non-trivial or the user requests strict accuracy.
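
## Example

The post does not show the script itself; a minimal sketch of what `scripts/calc.py` could look like, assuming a safe whitelist-based arithmetic evaluator:

```python
import ast
import operator as op

# whitelisted operators; any other node type is rejected
OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul,
       ast.Div: op.truediv, ast.Pow: op.pow, ast.USub: op.neg}

def evaluate(expr):
    """Deterministically evaluate a plain arithmetic expression."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp):
            return OPS[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval").body)

print(evaluate("350 * 0.12"))  # the percentage example from the Tasks section
```

For instance, `evaluate("2 + 3 * 4")` returns `14`, with operator precedence handled by the parser rather than the LLM.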
</code></pre></div><p>The key ability here is that the agent executes a deterministic script to calculate figures instead of relying on the LLM. I have structured the skills into basic skills (graph/chart generation, data-source finding, calculator, etc.) and advanced skills that rely on the basic skills. In my opinion, this gives Superbill a distinct advantage: the system follows a clear process and works with a well-engineered context.</p><p>The skill for equity research is already more complex.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Equity Research Report
Purpose: Produce a full sell-side-style research report for a single ticker with a clear recommendation, price target, and probability-weighted scenarios.

Trigger: User asks for a comprehensive equity analysis, full research report, trading ideas, or "banker-style" writeup on a ticker.

Data Sources
Follow this order &#8212; do not ask the user which source to use:

Context first: Check &lt;stock_context&gt; and &lt;acquired_data&gt; for financials, news, sector, peers, and existing thesis. Use what is already loaded.
Fetch if missing &#8212; call in parallel where possible:
INCOME_STATEMENT &#8212; revenue, gross profit, EBIT, net income (annual, 5yr)
CASH_FLOW &#8212; OCF, capex, FCF (annual, 5yr)
BALANCE_SHEET &#8212; debt, equity, cash
EARNINGS &#8212; EPS history, surprises, forward estimates
OVERVIEW &#8212; sector, description, peers, dividend history
get_technical_indicators &#8212; RSI, MACD, 50/200-day MAs, 52-week range
get_news &#8212; recent headlines (last 30 days)
get_analyst_estimates &#8212; consensus price target, range, rating distribution
Web search (if tools missing specific data):
Search in parallel: recent earnings, options flow, insider filings, analyst upgrades/downgrades, sector rotation data.
Ask only if both fail: use request_user_input asking for the specific missing values.
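
The "fetch in parallel" step can be sketched as follows; the fetch_endpoint helper is hypothetical, while the endpoint names are the ones listed above:

```python
from concurrent.futures import ThreadPoolExecutor

ENDPOINTS = ["INCOME_STATEMENT", "CASH_FLOW", "BALANCE_SHEET",
             "EARNINGS", "OVERVIEW"]

def fetch_endpoint(name, ticker):
    # placeholder for a real data-provider call
    return {"endpoint": name, "ticker": ticker}

def fetch_all(ticker):
    # call all missing endpoints in parallel, as the skill requires
    with ThreadPoolExecutor(max_workers=5) as pool:
        results = pool.map(lambda name: fetch_endpoint(name, ticker), ENDPOINTS)
        return dict(zip(ENDPOINTS, results))
```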

Process
1. Research &#8212; Execute in Parallel
Financial performance: revenue growth, margins, key KPIs with exact numbers and YoY/QoQ timeframes.
Market positioning: peer valuation multiples (P/E, EV/EBITDA, P/S, P/B), competitive analysis.
Advanced intelligence: technical levels, options flow (put/call ratios, unusual activity, IV trends), insider filings (Form 4, dollar amounts, executive names), institutional ownership changes.

2. Build the Report &#8212; Sections in Order
Executive Summary

Clear BUY / SELL / HOLD with conviction level (High/Medium/Low).
12-month price target with % upside/downside from current price.
1&#8211;2 sentence investment thesis: primary catalyst + risk-reward characterisation.
Fundamental Analysis

Recent financial metrics: revenue growth %, margins (gross/operating/net), FCF yield &#8212; all with specific numbers and timeframes.
Peer comparison: P/E, EV/EBITDA, P/S vs sector median with named competitors.
Forward outlook: management guidance, analyst consensus EPS/revenue estimates.
Catalyst Analysis

Near-term (0&#8211;6 months): earnings dates, product launches, regulatory decisions &#8212; include specific dates where known.
Medium-term (6&#8211;24 months): strategic initiatives, market expansion, competitive shifts.
Event-driven: M&amp;A potential, index inclusion/exclusion, spin-offs, capital returns.
Valuation &amp; Price Targets

Analyst consensus: $X (range $low&#8211;$high).
Bull case $X &#8212; state the specific assumption (e.g. margin expansion, market share gain).
Base case $X &#8212; consensus assumptions, stable execution.
Bear case $X &#8212; risk scenario (e.g. competition, margin compression).
Probability weighting: Bull X% / Base Y% / Bear Z% summing to 100%.
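
The probability weighting can be made explicit (illustrative targets and weights only):

```python
targets = {"bull": 250.0, "base": 200.0, "bear": 140.0}
weights = {"bull": 0.25, "base": 0.55, "bear": 0.20}
# weights must sum to 100%
assert round(sum(weights.values()), 9) == 1.0
weighted_target = sum(targets[s] * weights[s] for s in targets)
print(f"Probability-weighted target: ${weighted_target:.2f}")  # $200.50
```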
Risk Assessment

Company risks: competitive threats, regulatory exposure, execution risk, leverage.
Macro risks: rate sensitivity, economic cycle, sector rotation impact.
Position sizing: X%&#8211;Y% allocation based on beta / volatility / conviction.
ESG: flag only if material to institutional ownership.
Technical Context &amp; Options Intelligence

Current price vs 52-week range; distance from 50-day and 200-day MA.
Key support and resistance levels (specific prices).
Volume patterns: accumulation or distribution signal.
Options flow: put/call ratio, unusual block activity, IV level vs historical, term structure skew.
Momentum: RSI (overbought/oversold), MACD signal.
Market Positioning

Stock performance vs sector ETF and S&amp;P 500 over 1M / 3M / YTD (specific %).
Sector rotation trends affecting the position.
Relative strength vs closest peers.
Insider Signals

Recent insider buy/sell transactions: name, role, dollar amount, date.
Share buyback programme status and remaining authorisation.
Institutional ownership trend: latest 13F changes (add/reduce/new/exit).
Pattern interpretation: what the insider behaviour signals about management conviction.

3. Recommendation Summary Table
Metric&#9;Value
Rating&#9;BUY / SELL / HOLD
Conviction&#9;High / Medium / Low
Price Target&#9;$X
Timeframe&#9;12 months
Upside/Downside&#9;X%
Suggested Position&#9;X%&#8211;Y%

Output Standards
Use institutional terminology: EBITDA, EV/Sales, FCF yield, WACC, beta.
All financial metrics must include specific numbers; no vague qualitative statements without data.
Price targets must show the upside/downside calculation.
Cite analyst firms by name when referencing price target updates.
Include both bullish and bearish scenarios &#8212; no one-sided reports.
Keep each section tight and scannable: short paragraphs or bullet lists.
End every report with: "This analysis is for educational and research purposes only. Not financial advice. All investments carry risk of loss."
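
The upside/downside calculation required above is simply (illustrative prices):

```python
current_price = 100.0
price_target = 123.0
upside_pct = (price_target / current_price - 1) * 100
print(f"Upside: {upside_pct:+.1f}%")  # prints "Upside: +23.0%"
```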

Output Format
The entire research report must be formatted as a thesis-ready markdown document. Write all sections listed under "Build the Report" above in order. The complete report goes into the thesis field of the JSON output.

Thesis Field Rule: Always populate the thesis field in the JSON output with the complete equity research report following the structure above. Include:

Executive Summary with BUY/SELL/HOLD and price target
Fundamental Analysis with specific financial metrics
Catalyst Analysis with dated events
Valuation &amp; Price Targets (bull/base/bear)
Risk Assessment
Technical Context &amp; Options Intelligence
Market Positioning
Insider Signals

Recommendation Summary Table
This is an advanced skill &#8212; thesis upserting is mandatory. Never return empty thesis field.
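
The thesis-field contract can be sketched as follows; the field name thesis comes from the rule above, while the rest of the envelope is an assumption:

```python
def build_output(report_markdown):
    # thesis upserting is mandatory: never return an empty thesis field
    if not report_markdown.strip():
        raise ValueError("thesis field must not be empty")
    return {"thesis": report_markdown}
```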

Escalate / Hand-off
If thesis meets high-conviction long criteria &#8594; escalate to deep-fundamental-underwriting for full intrinsic value model.
If significant downside risk identified &#8594; escalate to capital-preservation-protocol.
If insider/senator signal is anomalous &#8594; surface to user before final recommendation.</code></pre></div><p>Worth noting, in my opinion, is that I enforce thesis upserting as mandatory. That means the agent has to upsert its output into the existing thesis. The thesis section within the workspace is the primary output, and it is driven almost entirely by skills. </p><p>When I ask for an analysis, the /investment-quick-screen skill generates a one&#8209;page hypothesis summary. I always get the output in the same format. I don&#8217;t need to type out a prompt, and I can continue working with the output. </p><p>We have established that skills are workflows. Yet, a workflow would be nothing without tool use. Tools like SEC filings, news aggregation, or insider trades can be exposed to skills. Skills decide when to use them. This keeps the system grounded. And this is also a big difference from MCP. MCP provides a standard way to expose tools, but it can be unreliable about when and how those tools get used. </p><p>Skills solve this. </p><p>- The underwriting skill calls the EDGAR/SEC API to ground its analysis in filings. </p><p>- The daily monitoring skill pulls news and filings from several sources to detect thesis drift. </p><h4>Skills and Memory</h4><p>The hardest part of building reliable agent systems is managing memory. Most systems treat memory as a single flat context window, loading everything the agent might need into one undifferentiated blob. This breaks down when the system needs to act on information that operates on very different timescales.</p><p>Even here, skills help to structure memory.</p><p>Superbill solves this with a 4-tiered memory architecture, where each layer has a distinct purpose, a distinct lifespan, and a distinct relationship to the skills that depend on it.</p><p><strong>Procedural memory is the skill itself.</strong> It encodes how to reason, not what to reason about. 
It changes rarely, maybe when a workflow needs refinement, and it never gets contaminated by session-level noise. When the equity-research skill loads, it brings a stable set of instructions that remain constant regardless of which stock is being analyzed or what the market did yesterday. This is the foundation that makes skill output reproducible.</p><p><strong>Sequential memory is the active conversation.</strong> It carries the running exchange among the context, the user, and the agent: what the user asked, what the agent returned, what decisions were made mid-session. Skills read from it to avoid repeating work or contradicting a prior step, but they don&#8217;t rely on it for facts about the world.</p><p><strong>Temporal memory is the daily snapshot.</strong> This is the &#8220;what is true right now&#8221; layer: price ratios, open/close data, recent headlines, and the last 24 hours of market signal. I implemented a hard expiry by default. This matters because one of the quieter failure modes in agent systems is acting on stale data without flagging it as stale. A monitoring skill that reads from temporal memory knows exactly how fresh its inputs are. </p><p><strong>Artifact memory is the thesis file.</strong> This is the durable, versioned record of prior analysis: what the system previously concluded, what price target it set, what risks it flagged. When a crisis protocol loads, it doesn&#8217;t start from scratch. It reads the thesis to understand what it previously believed before evaluating whether that belief still holds. This is what separates a reactive system from one that reasons about its own prior positions.</p><p>Without this separation, a skill operating on a large flat context can&#8217;t easily distinguish a standing instruction from yesterday&#8217;s news from a thesis written three months ago. It reasons across all of it as if it were equally valid. The tiering makes that conflation structurally harder. 
The skill knows what kind of thing it&#8217;s reading, and that makes its output more predictable and easier to audit.</p><p>When the agent loads the equity-research skill, it has a consistent representation of the stock&#8217;s data. When it loads a crisis protocol, it can see the recent news and the thesis state.</p><h4>In Closing</h4><p>It would be nice if I could /get-coffee in a chat window, and an ambient intelligence layer would make a coffee for me. We are not there yet, but skills are a big step forward. </p><p>Most agent systems are built by stitching prompts and tools together. They can answer questions, but do not embody a disciplined method. </p><p>Skills change that. </p><p>They make agent execution sequential without constraining it to simple workflow automation. </p><p>For most corporate implementations, you do want reliability. It is the difference between a junior analyst improvising and a seasoned investor following a playbook. </p><p>What surprised me most is that a skill&#8217;s efficiency originates in its cognitive and organizational structure. Skills reduce prompt bloat, preserve consistent reasoning, and encode institutional standards. In Superbill, they power the most important features: triage, underwriting, variant perception, sizing, monitoring, and crisis response. </p><p>They make the agent reliable enough to run a real research workflow.</p><p>I hope you enjoyed this post. 
Please like, share, and subscribe.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://jdsemrau.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://jdsemrau.substack.com/subscribe?"><span>Subscribe now</span></a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IqSc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e6195af-e40f-4cce-b796-2831f40c5cb9_1152x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IqSc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e6195af-e40f-4cce-b796-2831f40c5cb9_1152x768.png 424w, https://substackcdn.com/image/fetch/$s_!IqSc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e6195af-e40f-4cce-b796-2831f40c5cb9_1152x768.png 848w, https://substackcdn.com/image/fetch/$s_!IqSc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e6195af-e40f-4cce-b796-2831f40c5cb9_1152x768.png 1272w, https://substackcdn.com/image/fetch/$s_!IqSc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e6195af-e40f-4cce-b796-2831f40c5cb9_1152x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IqSc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e6195af-e40f-4cce-b796-2831f40c5cb9_1152x768.png" width="1152" height="768" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6e6195af-e40f-4cce-b796-2831f40c5cb9_1152x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1152,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1758350,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://jdsemrau.substack.com/i/189303647?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e6195af-e40f-4cce-b796-2831f40c5cb9_1152x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!IqSc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e6195af-e40f-4cce-b796-2831f40c5cb9_1152x768.png 424w, https://substackcdn.com/image/fetch/$s_!IqSc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e6195af-e40f-4cce-b796-2831f40c5cb9_1152x768.png 848w, https://substackcdn.com/image/fetch/$s_!IqSc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e6195af-e40f-4cce-b796-2831f40c5cb9_1152x768.png 1272w, https://substackcdn.com/image/fetch/$s_!IqSc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e6195af-e40f-4cce-b796-2831f40c5cb9_1152x768.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p></p><p></p>]]></content:encoded></item><item><title><![CDATA[Pair-programming Superbill with Codex-5.2 and Claude Sonnet 4.6]]></title><description><![CDATA[The Value of Software Development Is (near) Zero]]></description><link>https://jdsemrau.substack.com/p/pair-programming-superbill-with-codex</link><guid isPermaLink="false">https://jdsemrau.substack.com/p/pair-programming-superbill-with-codex</guid><dc:creator><![CDATA[Jan Daniel Semrau (MFin, CAIO)]]></dc:creator><pubDate>Mon, 02 Mar 2026 06:53:28 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!LLY-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3320b3a1-c721-4dda-bb1a-8ff69e7d1b7c_1152x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Let me start by saying something 
that might be seen as provocative.</p><p>What if the value of writing software has collapsed to zero?</p><p>Not the value of software or SaaS companies. Reports of their death might be greatly exaggerated. They still command enormous profit margins and multi-year enterprise contracts. </p><p>But the act of writing it manually might very well be.</p><p>Something has changed. </p><p>Since the launch of <a href="https://www.anthropic.com/news/claude-sonnet-4-6">Sonnet 4.6</a>, <a href="https://resources.anthropic.com/code-modernization-playbook">COBOL to Java</a>, and <a href="https://claude.com/plugins/wealth-management">Wealth Management</a> as a plugin to CoWork, the stock market for many industries in general, and the software application sector in particular, has been in turmoil. Expressed as the value of the <a href="https://www.ishares.com/us/products/239771/ishares-north-american-techsoftware-etf">IGV ETF</a>, this sector has lost more than 20% in valuation since the beginning of the year. Of course, that is relative to an all-time high just a couple of days prior. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!58SR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27aebcba-17bd-436b-b3b1-c63755723d9d_895x823.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!58SR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27aebcba-17bd-436b-b3b1-c63755723d9d_895x823.png 424w, https://substackcdn.com/image/fetch/$s_!58SR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27aebcba-17bd-436b-b3b1-c63755723d9d_895x823.png 848w, https://substackcdn.com/image/fetch/$s_!58SR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27aebcba-17bd-436b-b3b1-c63755723d9d_895x823.png 1272w, https://substackcdn.com/image/fetch/$s_!58SR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27aebcba-17bd-436b-b3b1-c63755723d9d_895x823.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!58SR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27aebcba-17bd-436b-b3b1-c63755723d9d_895x823.png" width="895" height="823" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/27aebcba-17bd-436b-b3b1-c63755723d9d_895x823.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:823,&quot;width&quot;:895,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:98585,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://jdsemrau.substack.com/i/189605568?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27aebcba-17bd-436b-b3b1-c63755723d9d_895x823.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!58SR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27aebcba-17bd-436b-b3b1-c63755723d9d_895x823.png 424w, https://substackcdn.com/image/fetch/$s_!58SR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27aebcba-17bd-436b-b3b1-c63755723d9d_895x823.png 848w, https://substackcdn.com/image/fetch/$s_!58SR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27aebcba-17bd-436b-b3b1-c63755723d9d_895x823.png 1272w, https://substackcdn.com/image/fetch/$s_!58SR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27aebcba-17bd-436b-b3b1-c63755723d9d_895x823.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>So? The stock market fluctuates. Big deal, one might say. In reality, we all know there is more to software companies than just writing software. </p><p><strong>Data network effects.</strong> Over the last 20 years, SaaS companies have compiled formidable datasets that are extremely hard to replicate. For example, Salesforce knows your customer history. Workday knows your hiring pipeline and org chart. Since, as mentioned, that data compounds over the years, no agent can replicate it from scratch, even using <a href="https://jdsemrau.substack.com/p/a-compendium-on-synthetic-data-projects?r=7fx0t&amp;utm_campaign=post&amp;utm_medium=web&amp;triedRedirect=true">synthetic data</a>.</p><p><strong>Switching costs. </strong> I have been building and integrating software for enterprises for many years. 
ERP migrations take three years and are painful. Contract negotiations take years, integration projects take months, and training staff on the new software takes days if not weeks.</p><p><strong>Compliance and trust.</strong> And then there is regulation. In regulated industries like finance, &#8220;an AI built this&#8221; is not yet a procurement answer. Vendors with <a href="https://cloud.google.com/security/compliance/soc-2">SOC 2</a> compliance certifications, <a href="https://www.fedramp.gov/">FedRAMP</a> access, and audit trails still have the key to the kingdom. </p><p><strong>Distribution.</strong> Finally, an established position in the market won&#8217;t change overnight. Salesforce has 150,000 customers and a direct sales force. That doesn&#8217;t disappear when a better tool exists. But it might erode if the service is better somewhere else. And most SaaS companies have been cutting back on customer support recently. </p><p>And in my opinion, these companies can further improve their profit margins using AI tools themselves. </p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://jdsemrau.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://jdsemrau.substack.com/subscribe?"><span>Subscribe now</span></a></p><p>Personal story time. I paid for a QuickBooks subscription for many years that I wanted to cancel after I moved to a different accounting solution (not vibe-coded). When I went to their website to cancel it, I could not log in because the 2FA was sending the key to a phone number (my old Singaporean number) I don&#8217;t have access to anymore. I also could not file a ticket because the customer support system was behind the login screen. Reading through the customer support sections of the website, posting on socials, and emailing customer support also did not help. 
So when my credit card expired, I could not give them my new one, and the payments stopped on their own. </p><p>So my hope is that AI might actually help improve customer service. </p><p>But I remain doubtful. </p><p>What makes coding tools so attractive for incumbents is that they don&#8217;t threaten the business model; instead, they further improve margins. If your product team ships 3x faster at the same headcount, you can run more experiments and add new features much faster and, hopefully, at higher quality. </p><p>Therefore, the incumbents who adopt AI coding tools aggressively could actually emerge <em>stronger</em>, not weaker. </p><p>However, there is an emerging micro-SaaS market for boring tasks that may currently be dormant.</p><p>My hypothesis is that, for creative minds, the time lag between &#8220;I have an idea&#8221; and &#8220;I have a working system&#8221; shrinks every month. </p><p>I wanted to test that hypothesis. </p><p>So in February, I set out to build something non-trivial, something that required architectural decisions, real data pipelines, and meaningful reasoning. </p><p>And I wanted to see how much of my cognitive load I could offload to a model. </p><p><strong>The project:</strong> an Agentic Ambient Intelligence Layer for my personal investment management. A spiritual successor to SuperBill with a pinch of OpenClaw. 
</p><p>My personal artificial investment team.</p><div class="captioned-image-container"><figure><a class="image-link" target="_blank" href="https://substackcdn.com/image/fetch/$s_!s1yv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d6c03af-7511-4f00-b22a-aab3d9bea0e8_1835x953.png"><img src="https://substackcdn.com/image/fetch/$s_!s1yv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d6c03af-7511-4f00-b22a-aab3d9bea0e8_1835x953.png" width="1456" height="756" alt="" loading="lazy"></a></figure></div><p>The tools under evaluation: <strong>Codex-5.2</strong> and <strong>Claude Sonnet 4.6</strong>, both as VS Code plugins. </p><p>What I found was a clear and somewhat surprising gap between them. </p><h4>Claude Cowork for Wealth Management</h4><p>Before I get into what I built, I should explain what pointed me in this direction.</p><p>As mentioned, in early February, Anthropic rattled financial services software stocks with the release of Claude Cowork. Cowork is a desktop agent designed to sit alongside knowledge workers and handle multi-step professional tasks across applications. </p><p>The wealth management plugin, in particular, caught my attention. 
</p><p>It&#8217;s built to do what advisors spend most of their non-client time doing: </p><ol><li><p>prep for client meetings, </p></li><li><p>build financial plans, </p></li><li><p>rebalance portfolios, and </p></li><li><p>identify tax-loss harvesting opportunities.</p></li></ol><p>What makes Cowork unique in my view is not the financial domain knowledge. Rather than using Claude as a separate chatbot, it augments enterprise software tools, pulling context and data without users needing to leave the window they&#8217;re working in. That is a huge adoption driver. And something I wrote about <a href="https://jdsemrau.substack.com/p/how-should-agentic-user-experience">here</a> and <a href="https://jdsemrau.substack.com/p/designing-agents-architectures">here</a>.</p><p>The wealth management plugin specifically can analyze portfolios, identify drift and tax exposure, and generate rebalancing recommendations at scale, wired into real data through connectors for tools like <a href="https://developer.factset.com/api-catalog">FactSet</a> (also a SaaS company), <a href="https://developer.msci.com/apis">MSCI</a>, and others.</p><p>Anthropic&#8217;s broader <a href="https://www.anthropic.com/news/advancing-claude-for-financial-services">financial services suite</a> will go further. <br>It will include </p><ol><li><p>a financial analysis agent that conducts market and competitor research and handles financial modeling; </p></li><li><p>an equity research agent that parses earnings transcripts, updates financial models, and drafts research notes; and </p></li><li><p>a private equity agent that reviews large document sets, models scenarios, and scores opportunities against investment criteria. 
</p></li></ol><p>Cowork&#8217;s plugins, besides connecting with Excel (a must-have), bundle skills, connectors, slash commands, and sub-agents for each workflow.</p><p>And they&#8217;re designed to be customized to a firm&#8217;s own voice, templates, and processes.</p><p>Watching this unfold, I had a simple reaction: I want one of these for myself. </p><p>Since I had built <a href="https://jdsemrau.substack.com/p/so-i-built-agentic-deepsearch-for">DeepSearch for Investing</a> in 2022/2023, I figured I would refactor the code base using Codex-5.2 and Claude Sonnet 4.6 through Copilot in VS Code.</p><div class="captioned-image-container"><figure><a class="image-link" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oaf3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c85f08c-b2fd-47e8-9fdc-873f075022b6_1875x1046.png"><img src="https://substackcdn.com/image/fetch/$s_!oaf3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c85f08c-b2fd-47e8-9fdc-873f075022b6_1875x1046.png" width="1456" height="812" alt="" loading="lazy"></a></figure></div><p>The system I wanted to build can best be described as a multi-agent loop with persistent memory and sensory market information that grounds the agent in real-time market data. </p><p>You might wonder, why is this fool using two different coding models? </p><p>I think Claude Sonnet in Copilot is really fantastic. At work, I am using Codex. I had some free OpenAI credits and wanted to compare which one I like better.</p><p>First, some design considerations. 
</p><h4>Overall Stack</h4><div class="captioned-image-container"><figure><a class="image-link" target="_blank" href="https://substackcdn.com/image/fetch/$s_!s6cg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b798038-58f0-4d79-a5b7-6b84b82b5de8_870x489.png"><img src="https://substackcdn.com/image/fetch/$s_!s6cg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b798038-58f0-4d79-a5b7-6b84b82b5de8_870x489.png" width="724" height="407" alt="" loading="lazy"></a></figure></div><p>For the UI, I moved away from Streamlit / Gradio, which are fun for simple projects but don&#8217;t work well for more complex solutions. Now at the top sits a React + <a href="https://vite.dev/guide/">Vite</a> frontend with three sections: a configuration layer, a workspace for investment data and analytical write-ups, and an agent session so I can talk to agent Superbill in real time.</p><p>Behind that lives an AI service (port 8001), i.e., the ambient intelligence layer, running the OpenAI Agents SDK with a <a href="https://openai.github.io/openai-agents-python/tools/">tool dispatcher</a> and a <a href="https://openai.github.io/openai-agents-python/streaming/">streamer</a> for live updates from the agent. 
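</p><p>In concept, the dispatcher is just a registry that maps tool names to deterministic callables and routes the model&#8217;s tool calls to them. A minimal sketch of that idea follows; it is not the SDK&#8217;s actual internals, and the registry and the get_stock_price stub are purely illustrative:</p>

```python
from typing import Any, Callable, Dict

# Illustrative tool registry. The real OpenAI Agents SDK registers tools
# through its own decorators; this only sketches the dispatch idea.
TOOLS: Dict[str, Callable[..., Any]] = {}

def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a deterministic capability under its function name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_stock_price(ticker: str) -> dict:
    # Hypothetical stub; in the real system this would hit a market-data API.
    return {"ticker": ticker, "price": 123.45}

def dispatch(name: str, **kwargs: Any) -> Any:
    """Route a tool call coming back from the model to its implementation."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)
```

<p>The point is that tools stay plain, deterministic functions the agent can invoke on demand, while the loop around them decides when to call which.</p><p>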
In the past, I have built a lot on Smolagents, but I wanted to revisit the OpenAI SDK as <a href="https://jdsemrau.substack.com/p/openai-agents-sdk-power-potential">almost a year</a> had passed since I last did a pilot on it. </p><p>Then there is a dedicated backend (port 8000) for handling state, scheduling, and sensor orchestration. For longer-running environment updates, e.g., an unusual-volume tracker, I want to use APScheduler. Environment data flows in from several sources: Yahoo Finance, FMP, Finnhub, Benzinga, SEC EDGAR, Senate disclosures, and AlphaVantage. Here I could reuse what I already have. </p><p>Sensors? </p><p>Well, for me, sensors are part of standard agent architectures, as they provide information about the environment, helping with embodiment and therefore grounding. </p><h4>Sensors and Environment</h4><p>Every intelligent system needs to know what&#8217;s happening in its environment. Rather than relying on the user to surface relevant changes, I built a sensor layer that runs continuously in the background.</p><p>Four sensors form the core of the environment model:</p><ol><li><p><strong>Unusual Volumes</strong>. A long-running task that compares average trading volumes over the observation horizon with the last trading session.</p></li><li><p><strong>Calendar</strong>. A forward-looking sensor that identifies upcoming earnings calls or interesting corporate events. </p></li><li><p><strong>Senator/Politician</strong>. A backward-looking sensor analyzing which stocks politicians with potential inside information trade.</p></li><li><p><strong>Yahoo Trending</strong>. In the past, I looked more at social sentiment, but I came to realize that it is an extremely noisy signal, and the Yahoo Trending list serves a similar function. So I don&#8217;t bother with X or Reddit discussions anymore. 
</p></li></ol><p>Each sensor writes to a shared sensor_cache that the intelligence layer reads when constructing its reasoning context. </p><p>The benefit is that the agent doesn&#8217;t need to <em>ask</em> what&#8217;s happening in the market. </p><p>It already knows.</p><h4>Memory Architecture</h4><p>One of the biggest gaps between a useful AI assistant and a genuinely intelligent one is memory. Most LLM applications treat each conversation as a blank slate. And if they have memory, it&#8217;s very basic. That&#8217;s probably fine for one-off tasks, but it&#8217;s fatal for anything that requires continuity. </p><p>I broke memory into three layers, borrowing from my explorations into cognitive science:</p><p><strong>Episodic memory</strong> captures what happened. Here I store each session the agent runs on an equity. An active_session file logs the current session in real time. A session_archive stores past sessions. The last six conversation turns are injected directly into the system prompt, giving the model genuine short-term context. Markdown is an unexpected gift here. </p><p><strong>Semantic memory</strong> captures what is known. Each tracked stock gets a {TICKER}_context file, a living thesis document with the stock&#8217;s current context, as well as a {TICKER}.json for the price and news cache. Volume scan results, sensor snapshots, and analyst notes all need to persist across sessions. When I come back after a week away, the system remembers what it thought about a name and why.</p><p><strong>Procedural memory</strong> captures how to do things. Anthropic calls these <em>skills</em>, and I agree. Essentially, these are SOPs encoded as prompts. I want to implement them as graphs, but this might come next. 
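</p><p>To make the layers concrete, here is a minimal sketch of the episodic side, assuming a JSON-backed active_session file and a six-turn window; the paths and field names are my illustration, not a fixed schema:</p>

```python
import json
from pathlib import Path

MEMORY_DIR = Path("memory")  # illustrative location for memory files

def append_turn(role: str, text: str) -> None:
    """Log one conversation turn to the active session (episodic memory)."""
    MEMORY_DIR.mkdir(exist_ok=True)
    session = MEMORY_DIR / "active_session.json"
    turns = json.loads(session.read_text()) if session.exists() else []
    turns.append({"role": role, "text": text})
    session.write_text(json.dumps(turns))

def short_term_context(n: int = 6) -> str:
    """Render the last n turns for injection into the system prompt."""
    session = MEMORY_DIR / "active_session.json"
    turns = json.loads(session.read_text()) if session.exists() else []
    return "\n".join(f"{t['role']}: {t['text']}" for t in turns[-n:])
```

<p>Archiving a finished session then just means moving active_session into the session_archive folder, and the semantic layer works the same way with per-ticker files.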
</p><h4>Agents, Skills, and Tools</h4><p>What makes this system genuinely agentic, rather than just a fancy bot, is the combination of tools, memory, skills, and a reasoning loop that connects them.</p><p><strong>Tools</strong> are the hands of the system, mostly deterministic capabilities for external data retrieval. The agent can <strong>autonomously</strong> fetch current stock prices, pull recent news, retrieve SEC filings, check insider trades, search earnings calendars, scan for volume anomalies, and fetch arbitrary URLs when needed. </p><p>This is really powerful because the agent can call them on demand<strong> as the agent decides </strong>it needs more information.</p><p><strong>Skills</strong> are the learned task competencies. I like to call them playbooks. </p><p>I gave Superbill fourteen of them, each representing a distinct analytical workflow. </p><p>Here are some examples:</p><ul><li><p><em>Investment Opportunity Triage</em>. Is this worth looking at?</p></li><li><p><em>Deep Fundamental Underwriting</em>. What does the business actually look like?</p></li><li><p><em>Conviction Calibration Engine</em>. How confident should I be, and why?</p></li><li><p><em>Thesis Integrity Audit</em>. Am I fooling myself?</p></li><li><p><em>Variant Perception Detection</em>. What does the market believe that I disagree with?</p></li><li><p><em>Position Sizing and Concentration</em>. How much capital should this get?</p></li><li><p><em>Exit Discipline Engine</em>. When do I sell, and what would change my mind?</p></li><li><p><em>Capital Preservation Protocol</em>. What&#8217;s the downside scenario?</p></li><li><p><em>Daily Portfolio Monitoring</em>. What changed overnight?</p></li><li><p><em>Performance Attribution and Error Analysis</em>. 
Where was I right, and where was I wrong?</p></li></ul><p>Skills are flat files.</p><div class="captioned-image-container"><figure><a class="image-link" target="_blank" href="https://substackcdn.com/image/fetch/$s_!clEO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa01d0dc1-39e0-481b-90b9-f01e63e01ea6_908x519.png"><img src="https://substackcdn.com/image/fetch/$s_!clEO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa01d0dc1-39e0-481b-90b9-f01e63e01ea6_908x519.png" width="908" height="519" alt="" loading="lazy"></a></figure></div><p>They are loaded into the system prompt and invoked contextually when needed through progressive disclosure. </p><p>The agent is tuned to know which playbook applies to a given situation.</p><p>The ReAct loop, <a href="https://medium.com/@jsemrau/code-clinic-building-a-react-agent-on-langchain-with-memory-230c4e6ef636">my trusted friend</a>, ties it all together. </p><p>The pattern I&#8217;m working toward is <a href="https://jdsemrau.substack.com/p/grounded-symbolic-reasoning-sense-symbolize-plan-act">Sense &#8594; Symbolize &#8594; Plan &#8594; Act</a>, my neuro-symbolic approach that goes beyond simple prompt-response. </p><p>The agent senses changes in the environment via the sensor layer, symbolizes them into structured representations, plans a course of action, and executes. It doesn&#8217;t just answer questions. 
It thinks. </p><p>And it can ask questions if it needs further information. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TJ1C!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa536c90c-ed12-4b32-a94e-341dc7199ebe_912x483.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TJ1C!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa536c90c-ed12-4b32-a94e-341dc7199ebe_912x483.png 424w, https://substackcdn.com/image/fetch/$s_!TJ1C!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa536c90c-ed12-4b32-a94e-341dc7199ebe_912x483.png 848w, https://substackcdn.com/image/fetch/$s_!TJ1C!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa536c90c-ed12-4b32-a94e-341dc7199ebe_912x483.png 1272w, https://substackcdn.com/image/fetch/$s_!TJ1C!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa536c90c-ed12-4b32-a94e-341dc7199ebe_912x483.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TJ1C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa536c90c-ed12-4b32-a94e-341dc7199ebe_912x483.png" width="912" height="483" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a536c90c-ed12-4b32-a94e-341dc7199ebe_912x483.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:483,&quot;width&quot;:912,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:163560,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://jdsemrau.substack.com/i/189605568?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa536c90c-ed12-4b32-a94e-341dc7199ebe_912x483.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TJ1C!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa536c90c-ed12-4b32-a94e-341dc7199ebe_912x483.png 424w, https://substackcdn.com/image/fetch/$s_!TJ1C!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa536c90c-ed12-4b32-a94e-341dc7199ebe_912x483.png 848w, https://substackcdn.com/image/fetch/$s_!TJ1C!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa536c90c-ed12-4b32-a94e-341dc7199ebe_912x483.png 1272w, https://substackcdn.com/image/fetch/$s_!TJ1C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa536c90c-ed12-4b32-a94e-341dc7199ebe_912x483.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>So much for the concept. Now to the findings. </p><h4>Contrasting the Experience: Codex-5.2 vs Claude Sonnet 4.6</h4><p>Here&#8217;s where I&#8217;ll be direct.</p><p><strong>Codex-5.2 felt like a junior developer.</strong> Energetic, fast, willing to take a swing at anything. But the code it produced was hard to read, dense, inconsistent in style, and occasionally syntactically broken. It completed tasks, but the output didn&#8217;t feel <em>reasoned</em>. It felt generated. Over the course of the project, Codex ran up $55 in costs, and I didn&#8217;t feel I made the progress I wanted. </p><p>More troublingly, the development experience felt frantic. Codex would make sweeping changes across files without communicating its intent clearly. I found myself spending time auditing its work rather than building on it. 
</p><p>For a system with real architectural complexity (multiple services, a memory layer, sensor orchestration), that audit overhead added up fast and compounded my cognitive fatigue. </p><p><strong>Claude Sonnet felt like a senior developer.</strong> Slower in some ways, but more deliberate. The code it produced was cleaner and more consistent. When I asked it to build something non-trivial (the sensor layer, the memory architecture, the tool dispatcher, and especially the stream functionality of the agent session, a task Codex failed at), it would often pause to reason about structure before diving into implementation. </p><p>In my opinion, the edits were more targeted. The refactors made sense. And the overall cost was one-eleventh of Codex&#8217;s: $5. Of course, you can argue that I used Claude later in the project. But by then more of the framework was already implemented, so every error Codex made took longer to roll back. </p><p>That points to another insight, one about my own efficiency but also about the SDLC. Codex generated more tokens to accomplish less. Claude&#8217;s outputs required fewer corrections and less review. For a project like this, where the architecture has to hold together across many moving parts, that difference compounds quickly. And not only in monetary terms. </p><h4>Conclusion</h4><p>The SDLC is changing. The part that required you to hold a large, complex system in your head and translate it, laboriously, into code is becoming automated. This means that in organizations, the value of senior developers&#8217; experience diminishes. What remains valuable, in my opinion, is the <em>synthesis</em>: knowing what you&#8217;re building and why, and using that to guide architectural instincts and product judgment. </p><p>From a functionality standpoint, I was impressed by what could be built: a system that can produce a full backtest report with graphs, tables, and everything. And that changes everything. 
Throughout 2025, most coding models were not good enough. But then December happened. And the gap between &#8220;almost&#8221; and &#8220;actually works&#8221; closed in a matter of weeks. Operationally, tasks I used to split into small chunks and painstakingly debug (with prints and tests) are now zero-shotted as entire modules with no bugs (well, almost: Codex still produces some). </p><p>A year ago, I thought of agents as an augmentation of what I wanted to do. A convenience that could help me be more productive. Six months ago, I believed it was coming, but thought it was still 12-18 months out. </p><p>Today it&#8217;s here. </p><p>If the last time you tested coding capabilities was six months ago, your opinion has expired! </p><p>My artificial investment team is a small illustration of what becomes possible when the translation layer is cheap. I built fourteen analytical playbooks with real-time market sensors and a three-layer persistent memory within a few hours. </p><p>A reasoning loop that wakes up every morning already knowing what happened while I was asleep. </p><p>I built this&#8230;or didn&#8217;t I?</p><p>I only described what I wanted and iterated on what I got.</p><p>The programming language of the next iteration of software will be English. </p><p>The gap between Codex and Claude on this project was real and impactful. But that gap will close eventually. </p><p>The more important observation is that both are changing the ceiling on what one person can build. The question is no longer whether you can build a sophisticated system alone. 
</p><p>It&#8217;s whether you have a clear enough vision of what you want to build.</p><p>That part, at least, is still on you.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://jdsemrau.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Encyclopedia Autonomica is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LLY-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3320b3a1-c721-4dda-bb1a-8ff69e7d1b7c_1152x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LLY-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3320b3a1-c721-4dda-bb1a-8ff69e7d1b7c_1152x768.png 424w, https://substackcdn.com/image/fetch/$s_!LLY-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3320b3a1-c721-4dda-bb1a-8ff69e7d1b7c_1152x768.png 848w, 
https://substackcdn.com/image/fetch/$s_!LLY-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3320b3a1-c721-4dda-bb1a-8ff69e7d1b7c_1152x768.png 1272w, https://substackcdn.com/image/fetch/$s_!LLY-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3320b3a1-c721-4dda-bb1a-8ff69e7d1b7c_1152x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LLY-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3320b3a1-c721-4dda-bb1a-8ff69e7d1b7c_1152x768.png" width="1152" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3320b3a1-c721-4dda-bb1a-8ff69e7d1b7c_1152x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1152,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1588257,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://jdsemrau.substack.com/i/189605568?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3320b3a1-c721-4dda-bb1a-8ff69e7d1b7c_1152x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!LLY-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3320b3a1-c721-4dda-bb1a-8ff69e7d1b7c_1152x768.png 424w, 
https://substackcdn.com/image/fetch/$s_!LLY-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3320b3a1-c721-4dda-bb1a-8ff69e7d1b7c_1152x768.png 848w, https://substackcdn.com/image/fetch/$s_!LLY-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3320b3a1-c721-4dda-bb1a-8ff69e7d1b7c_1152x768.png 1272w, https://substackcdn.com/image/fetch/$s_!LLY-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3320b3a1-c721-4dda-bb1a-8ff69e7d1b7c_1152x768.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p></p>]]></content:encoded></item><item><title><![CDATA[OpenClaw - An Agentic Ambient Intelligence Layer]]></title><description><![CDATA[Building the endgame for a non-human digital space]]></description><link>https://jdsemrau.substack.com/p/openclaw-an-agentic-ambient-intelligence</link><guid isPermaLink="false">https://jdsemrau.substack.com/p/openclaw-an-agentic-ambient-intelligence</guid><dc:creator><![CDATA[Jan Daniel Semrau (MFin, CAIO)]]></dc:creator><pubDate>Sun, 08 Feb 2026 04:10:19 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!7Mok!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fafad59-a4db-4bb9-a991-ef6cf7ec3de2_1152x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>There has been much frenzied chatter in the agent community about Moltbot / OpenClaw since it was launched a couple of days ago. Only recently being renamed to its final iteration, OpenClaw might be the closest thing we currently have to an agentic front page for the internet. But that in itself is only scratching the surface of what it can do.</p><p>If you want to check it out for yourself, you can get the open-source repo (<a href="https://openclaw.ai/">web</a>, <a href="https://github.com/openclaw/openclaw">git</a>) that runs a local agent runner to watch a handful of input channels (files, chats, social feeds), and quietly orchestrates background automation for you. In that sense, my first assessment is that it feels like &#8220;software weather&#8221; rather than an app. </p><p>Maybe something I hoped Siri would have become by now. </p><p>What do I mean by &#8220;software weather&#8221;? 
</p><p>The agent is always present, operating like a daemon in the background, creating an ambient intelligence layer that subtly interacts with the shared digital environment.</p><p>The hype is that there might be a new world coming where agents work together to solve hard problems. </p><p>The downside is that this might all be overhyped. </p><p>Let&#8217;s dive in. </p><p>In this post, I will show you</p><ol><li><p>How to get OpenClaw running</p></li><li><p>What Agent Skills are</p></li><li><p>What NPX is and why it matters</p></li><li><p>How to use OpenClaw to manage a subreddit</p></li></ol><p>For the final point, though, I only sketch how this should work without providing a full implementation (just yet). This is in the pipeline. </p><p><strong>Why trust it?</strong></p><p>OpenClaw&#8217;s creator is Peter Steinberger (<a href="https://github.com/steipete">@steipete</a>), a highly decorated engineer (just look at his insane GitHub profile) who now focuses full-time on AI-native developer tools after bootstrapping <a href="https://github.com/pspdfkit">PSPDFKit/Nutrient</a> and exiting in 2021 at multi-million ARR. He started OpenClaw by scratching a personal itch. I guess when I built Matt in 2023, we had the same idea, but he is by far more talented than I will ever be. What he did was wire a single assistant into chats, servers, and repos, before hardening it into an open-source runtime with a strong bias toward local control, explicit wiring, and community-driven skills instead of a closed, SaaS-style product. </p><p>There are some major concerns, though, regarding access rights (installation requires sudo on Ubuntu) and reports about supply chain and context poisoning attacks. But if you want to live at the bleeding edge of agent tech, this might be a risk worth taking. This is what I do for you. </p><p><strong>Why the name keeps changing</strong></p><p>The naming history is part legal, part meme, and part community identity. 
Originally released as Clawdbot in late 2025, the project attracted the kind of attention that forces you to read trademark emails instead of hacking. Anthropic raised concerns about confusion with its Claude-branded products. Clawdbot then briefly became Moltbot, a name that never really stuck in the community, and press coverage remained split between the old and new names. In early 2026, the maintainers decided to consolidate the identity under OpenClaw, emphasizing its open-source nature and returning to the more recognizable &#8220;claw&#8221; motif that had become a sort of space-lobster mascot. </p><p>Anyway. </p><p>In this post, I will explore how to get OpenClaw running, dive into its skills and npx ecosystem, and then walk through a concrete example: wiring up an agent that effectively <em>runs</em> a subreddit as a first-class OpenClaw skill.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://jdsemrau.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://jdsemrau.substack.com/subscribe?"><span>Subscribe now</span></a></p><h4>From chatbot to agentic front-page</h4><p>OpenClaw didn&#8217;t start life as an &#8220;<em>agentic front-page.</em>&#8221; Not unlike <a href="https://github.com/jsemrau/Matt">Matt</a>, it began as a personal assistant called Clawdbot that ran on on-premise infrastructure and could talk to everything from local files to Discord. Over time, additional connectivity for Git repos, tickets, and home servers was added, growing the bot into a general-purpose agent runner. OpenClaw adjusted its architecture to reflect a transition from &#8220;one-off bot&#8221;, i.e. Matt, to &#8220;ambient layer&#8221; or &#8220;software weather&#8221;. 
Now it treats Telegram, Discord, web UIs, and APIs as channels, normalizes their messages, feeds them into a central Agent Runner, and lets skills decide what to do. </p><p>Instead of hiding the agentic loop in prompt tricks, OpenClaw exposes explicit steps (LLM response, optional tool calls, gateway coordination) so that you can debug the thing that is about to run unattended on your laptop or server.</p><p>This is why OpenClaw feels like an agentic front-page and not just another chat wrapper: the &#8220;home screen&#8221; is essentially a live wiring diagram between models, skills, and channels where Reddit, email, RSS, and your shell are all peers.</p><h4>Getting OpenClaw running</h4><p>For an Encyclopedia Autonomica reader, the installation path that matters is: &#8220;how fast can I go from zero to an agent that runs a subreddit without fighting build tools?&#8221; Today, there are three main paths: a curl installer, a global package install, and a from-source developer setup.</p><p><strong>Prework: Install Node.js 22+</strong></p><p>In my case, I only had an outdated Node.js, so I had to upgrade it like this.</p><pre><code>curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.3/install.sh | bash
source ~/.bashrc
nvm install 24</code></pre><p>Then you can start installing OpenClaw.</p><p><strong>1. One-liner installer (recommended)</strong></p><p>On macOS or Linux, the recommended path is the one-liner installer script, which installs the <code>openclaw</code> CLI globally and runs an onboarding flow.</p><pre><code>curl -fsSL https://openclaw.ai/install.sh | bash</code></pre><p>The first problem you will encounter is that it asks for sudo rights. You need to be really mindful about this. Because you expect OpenClaw to run largely autonomously (ambient software), you potentially give OpenClaw, and thereby all the services/channels it communicates with, access to your files. </p><p>These may include personal data, but also security keys, as was reported <a href="https://www.reddit.com/r/AgentsOfAI/comments/1qsy5so/moltbook_leaked_andrej_karpathys_api_keys/">here</a>. You might also be subject to a supply chain attack, as reported <a href="https://x.com/DanielLockyer/status/2019422410018267328">here</a>.</p><p>For good measure, I backed up my system before starting this. </p><p>If you decide, at your own risk, to proceed, you can start the onboarding routine like this:</p><pre><code>openclaw onboard --install-daemon</code></pre><p>This gets you the core Agent Runner, channel adapters, and a basic UI running against your preferred models and keys.</p><p><strong>2. Global package install (npm/pnpm)</strong></p><p>If you prefer controlling everything via Node&#8217;s ecosystem (for example, on a dev machine where you manage your own global binaries), you can install from npm or via <code>pnpm</code>.</p><p>With npm:</p><pre><code>npm install -g openclaw@latest
openclaw onboard --install-daemon</code></pre><p>With pnpm, there&#8217;s an extra approval step because some dependencies compile native code (e.g., local model runtimes, image processing):</p><pre><code>pnpm add -g openclaw@latest
pnpm approve-builds -g   # approve openclaw, node-llama-cpp, sharp, etc.
pnpm add -g openclaw@latest   # rerun so postinstall scripts execute

openclaw onboard --install-daemon
</code></pre><p>This path is ideal if you want to keep the CLI globally available but expect to hack on skills in local repos.</p><p><strong>3. From source (for contributors)</strong></p><p>If you plan to modify OpenClaw itself or track bleeding-edge branches, clone the repo and run from source.</p><pre><code>git clone https://github.com/openclaw/openclaw.git
cd openclaw

pnpm install
pnpm ui:build   # builds the UI on first run
pnpm build

openclaw onboard --install-daemon</code></pre><p>You can then invoke <code>openclaw</code> via pnpm script aliases or add the built CLI to your path, depending on how the repo is structured.</p><h4>Security baseline</h4><p>After the download has completed and the onboarding begins, OpenClaw asks you to stick to a <a href="https://docs.openclaw.ai/gateway/security">security baseline. </a></p><p><strong>Pairing/allowlists + mention gating.</strong> This controls who can interact with your bot. Pairing limits the bot to specific authorized users or channels, while mention gating requires users to explicitly @mention the bot before it responds. This also prevents the bot from being triggered by random messages or unauthorized users.</p><p><strong>Sandbox + least-privilege tools:</strong> Run the bot in an isolated environment (i.e., sandbox) where it can&#8217;t access sensitive parts of your system. Only give it the minimum tools and permissions it actually needs to function, nothing more. This limits damage if something goes wrong or the bot is compromised.</p><p><strong>Keep secrets out of the agent&#8217;s reachable filesystem.</strong> Don&#8217;t store API keys, passwords, or other sensitive credentials in files the bot can read. If the bot gets exploited or makes an error, attackers won&#8217;t find your secrets sitting in accessible config files or directories.</p><p><strong>Use the strongest available model for any bot with tools or untrusted inboxes.</strong> If your bot has access to tools (like file operations, code execution) or receives messages from unknown users, use the most capable AI model available. 
Stronger models are better at following safety instructions, resisting prompt injection attacks, and making sound decisions about when to use their tools.</p><p>I continued with a manual install and won&#8217;t go into further details, but it&#8217;s quite straightforward to get it to run.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EZLC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F866dbb2e-3711-4467-b891-50ecbd8d1567_1390x852.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EZLC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F866dbb2e-3711-4467-b891-50ecbd8d1567_1390x852.png 424w, https://substackcdn.com/image/fetch/$s_!EZLC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F866dbb2e-3711-4467-b891-50ecbd8d1567_1390x852.png 848w, https://substackcdn.com/image/fetch/$s_!EZLC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F866dbb2e-3711-4467-b891-50ecbd8d1567_1390x852.png 1272w, https://substackcdn.com/image/fetch/$s_!EZLC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F866dbb2e-3711-4467-b891-50ecbd8d1567_1390x852.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EZLC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F866dbb2e-3711-4467-b891-50ecbd8d1567_1390x852.png" width="1390" height="852" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/866dbb2e-3711-4467-b891-50ecbd8d1567_1390x852.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:852,&quot;width&quot;:1390,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:153234,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://jdsemrau.substack.com/i/186694136?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F866dbb2e-3711-4467-b891-50ecbd8d1567_1390x852.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EZLC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F866dbb2e-3711-4467-b891-50ecbd8d1567_1390x852.png 424w, https://substackcdn.com/image/fetch/$s_!EZLC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F866dbb2e-3711-4467-b891-50ecbd8d1567_1390x852.png 848w, https://substackcdn.com/image/fetch/$s_!EZLC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F866dbb2e-3711-4467-b891-50ecbd8d1567_1390x852.png 1272w, https://substackcdn.com/image/fetch/$s_!EZLC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F866dbb2e-3711-4467-b891-50ecbd8d1567_1390x852.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>My recommendation for you is to start exploring the UI by yourself to understand what the different components can do.</p><h4>Agent skills </h4><p>OpenClaw&#8217;s skills system is the connective tissue that turns a single agent into a distributed, evolving capability fabric. Skills are small, composable units of behavior that execute specific tasks, e.g., &#8220;talk to Linear,&#8221; &#8220;index PDFs,&#8221; &#8220;control Discord,&#8221; or &#8220;run shell commands&#8221;. In OpenClaw, skills live outside the core runtime but are surfaced to the agent through a common schema. 
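</p><p>To make this concrete, here is a minimal Python sketch of how a schema-driven skill catalog with lazy loading could look. The skill names, fields, and helper functions are my own assumptions for illustration, not OpenClaw&#8217;s actual schema or API.</p>

```python
# Toy model of a skill catalog with progressive disclosure.
# Skill names, fields, and functions are assumptions for illustration;
# this is not OpenClaw's actual schema or API.
SKILLS = {
    "linear-skill": {
        "summary": "Create and update Linear tickets",  # always visible
        "full_instructions": "SKILL.md body: tools, auth, usage examples...",
        "eligible": True,   # environment probes passed (binaries, secrets)
    },
    "pdf-indexer": {
        "summary": "Index and summarize PDFs",
        "full_instructions": "SKILL.md body for the PDF skill...",
        "eligible": False,  # e.g. a required binary is missing
    },
}

def snapshot():
    """Compact catalog the agent sees at startup: names plus one-liners."""
    return {name: s["summary"] for name, s in SKILLS.items() if s["eligible"]}

def load_full(name):
    """Load the complete instructions only when a task matches the summary."""
    skill = SKILLS[name]
    if not skill["eligible"]:
        raise RuntimeError(f"{name}: missing requirements")
    return skill["full_instructions"]

print(snapshot())  # small enough to live in the system prompt
```

<p>In this toy version, only the compact snapshot ever sits in the agent&#8217;s prompt; the full instructions are fetched per task, and ineligible skills are filtered out before the agent can attempt them.</p><p>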
Instead of baking all behaviors into one monolithic prompt, OpenClaw publishes a live catalog of skills, lets the agent decide which ones are relevant for a given task, and then loads only the details it needs at the moment of use.</p><p>Internally, skills exist in three broad layers: bundled skills that ship with OpenClaw itself, workspace skills defined in your project or home directory, and managed overrides installed from registries or marketplaces (e.g., ClawHub or third-party skill hubs). The openclaw skills CLI inspects all of these and marks skills as &#8220;eligible&#8221; or &#8220;missing requirements&#8221; based on environment probes, binaries, and secrets, so the agent never attempts to use a capability that cannot actually run. Skills can even become dynamic: a watcher can refresh skill metadata mid-session if you edit a <em>SKILL.md</em>.</p><p>This ambient design lets OpenClaw behave more like a skills-aware operating system than a static app: the installed skill set can change over time, but the agent always sees a coherent, filtered view of what it can safely do in the current environment.</p><p><strong>Progressive disclosure</strong></p><p>To keep the agent&#8217;s context window from turning into a skills dump (<a href="https://jdsemrau.substack.com/p/context-window-saturation-in-reasoning">context window saturation </a>is a known problem), OpenClaw leans heavily on progressive disclosure. On startup, the agent only receives a compact snapshot of each skill (name, one-line description, and possibly a short usage hint), just enough to decide whether a skill might be relevant. </p><p>When a task or user instruction matches that description (&#8220;create a Linear ticket,&#8221; &#8220;send a Discord alert,&#8221; &#8220;summarize this PDF&#8221;), the agent then pulls in the full instructions and tool schemas for that specific skill.</p><p>This has a few important consequences. 
First, you can scale to dozens or hundreds of skills without paying the context cost up front: the agent&#8217;s system prompt stays small, while skills are loaded lazily as needed. Second, you can iterate on a skill&#8217;s implementation and metadata independently of the core agent; the skills watcher will automatically refresh the snapshot between turns, so improvements become visible without restarting the whole system. Third, OpenClaw can enforce security boundaries around skills&#8212;treating them as trusted code that lives in specific directories with specific permissions&#8212;without having to trust arbitrary in-prompt instructions.</p><p>The upshot is that &#8220;skills awareness&#8221; becomes part of the agent&#8217;s reasoning loop: instead of one giant prompt that tries to remember everything, OpenClaw teaches the agent to choose from a menu of capabilities, then only read the cookbook page when it actually needs that recipe.</p><p><strong>Discovering and inspecting skills locally</strong></p><p>From a developer&#8217;s perspective, the first touchpoint is the skills CLI. Once OpenClaw is installed and onboarded, you can introspect what the agent <em>could</em> do by listing skills and checking their readiness.</p><p>The core commands look like this:</p><pre><code># List all discovered skills (bundled + workspace + managed)
openclaw skills list

# List only skills that are currently eligible (dependencies &amp; secrets satisfied)
openclaw skills list --eligible

# Inspect a single skill's metadata and requirements
openclaw skills info linear-skill

# Run a quick health check across all skills
openclaw skills check</code></pre><p>Behind the scenes, these commands aggregate information from your user config (e.g., ~/.openclaw/openclaw.json), workspace directories, and any installed skill bundles, marking skills that are blocked by missing binaries, secrets, or platform constraints. This is particularly useful in multi-node setups: you can see at a glance which skills will become available when a remote node (say, your home lab GPU box) comes online.</p><p>But of course, you can also work with the UI.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MYxv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6643ffb5-fbb4-4d34-9e90-afc770a12488_1405x642.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MYxv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6643ffb5-fbb4-4d34-9e90-afc770a12488_1405x642.png 424w, https://substackcdn.com/image/fetch/$s_!MYxv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6643ffb5-fbb4-4d34-9e90-afc770a12488_1405x642.png 848w, https://substackcdn.com/image/fetch/$s_!MYxv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6643ffb5-fbb4-4d34-9e90-afc770a12488_1405x642.png 1272w, https://substackcdn.com/image/fetch/$s_!MYxv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6643ffb5-fbb4-4d34-9e90-afc770a12488_1405x642.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!MYxv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6643ffb5-fbb4-4d34-9e90-afc770a12488_1405x642.png" width="1405" height="642" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6643ffb5-fbb4-4d34-9e90-afc770a12488_1405x642.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:642,&quot;width&quot;:1405,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:92983,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://jdsemrau.substack.com/i/186694136?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6643ffb5-fbb4-4d34-9e90-afc770a12488_1405x642.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!MYxv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6643ffb5-fbb4-4d34-9e90-afc770a12488_1405x642.png 424w, https://substackcdn.com/image/fetch/$s_!MYxv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6643ffb5-fbb4-4d34-9e90-afc770a12488_1405x642.png 848w, https://substackcdn.com/image/fetch/$s_!MYxv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6643ffb5-fbb4-4d34-9e90-afc770a12488_1405x642.png 1272w, https://substackcdn.com/image/fetch/$s_!MYxv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6643ffb5-fbb4-4d34-9e90-afc770a12488_1405x642.png 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>On top of OpenClaw&#8217;s native tools, there is an emerging ecosystem around agent skills more broadly. Generic skill loaders, such as skills and openskills, expose a CLI to install, list, and read skills across different agents and IDEs, including Claude Code, Cursor, and others, using a common format for metadata and an AGENTS.md manifest. 
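These loaders expect each skill to be a directory whose SKILL.md opens with a short metadata block followed by free-form instructions. A minimal sketch might look like this (the field names are illustrative, not a normative schema):</p><pre><code>---
name: linear-skill
description: Create and update Linear tickets from natural-language requests.
requires:
  secrets: [LINEAR_API_KEY]
---

## Usage
When the user asks to file a ticket, extract the title and team from the
conversation, then call the Linear API and report the new ticket URL.</code></pre><p>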
OpenClaw can participate in this ecosystem by consuming skills that adhere to the same conventions, making it easier to share capabilities across multiple agent environments.</p><h4>npx as the skills super-highway</h4><p>If the skills system is the capability fabric, npx and related CLIs are like a logistics network that moves skills into your environment. Instead of manually cloning repos and wiring paths, you can install fully-packaged skills, or even entire skill bundles, with a single <code>npx</code> command that writes them into the right directory and updates your manifests.</p><p>For OpenClaw-specific skills, you&#8217;ll increasingly see installation instructions like this:</p><pre><code># Install a named skill from a hosted catalog into your agent environment
npx openclawskill install soul-personality</code></pre><p>Note that &#8220;soul-personality&#8221; doesn&#8217;t exist as a skill; it&#8217;s just an example.</p><p>For a real skill, this one command resolves the skill bundle, handles dependencies, and installs it into the appropriate skills directory (often under your home or project directory), confirming success inline in the terminal. </p><p>General-purpose loaders such as <em>skills</em> or <em>openskills</em> use a similar pattern, but target a broader ecosystem of agents:</p><pre><code># Install a set of skills from a remote source (e.g., GitHub org)
npx openskills install anthropics/skills

# Sync the AGENTS.md manifest so agents see the new skills
npx openskills sync

# Inspect a particular skill definition
npx openskills read context7</code></pre><p><a href="https://openskills.cc/">OpenSkills</a> is an open format and ecosystem for agents that aims to provide an interface to describe, share, and load reusable agent capabilities across different AI agent platforms. The idea is to make Claude&#8217;s skills usable in Codex and vice versa. </p><p>Here, a skill is a directory, typically with a SKILL.md file containing metadata and detailed procedural instructions, that our agent can discover and load on demand to extend its competencies with domain expertise, workflows, or tool-specific actions. </p><p>By default, tools like OpenSkills install skills into project-local directories such as ./.claude/skills or a universal ./.agent/skills, with options to install globally under your home directory. This mirrors how OpenClaw treats user-level vs. workspace-level skills and makes it possible to share a single skills tree across multiple agents that all read the same AGENTS.md manifest.</p><p>The important architectural detail is that npx commands operate outside the agent&#8217;s own execution: they are human-initiated actions that modify the skills catalog, after which OpenClaw&#8217;s watcher and skills CLI pick up the changes. This preserves a clear security boundary: the agent cannot silently install new arbitrary code; it can only see and use skills that a human (or an external CI pipeline you control) has deliberately added via <code>npx</code> or equivalent mechanisms.</p><p><strong>Skills, gateway, and the agentic loop</strong></p><p>Skills do not operate in isolation; they sit behind the Gateway Server, which acts as the orchestration plane for channels, nodes, and sessions. 
The gateway knows where skills live (which node, which platform), how to route tool calls, and how to enforce authentication and concurrency limits so that a misbehaving agent cannot take down your infrastructure.</p><p>From the CLI, you can inspect the gateway&#8217;s view of the world:</p><pre><code># Inspect gateway status (local or remote over SSH)
openclaw gateway status
openclaw gateway status --json

# Discover gateways via Bonjour
openclaw gateway discover</code></pre><p>Security-wise, OpenClaw encourages but does not enforce hardened configurations: private file permissions for ~/.openclaw, token-based authentication for remote CLI calls.</p><p>Dynamic skills, whether updated locally via a watcher or exposed by remote nodes, are still subject to these controls, and OpenClaw treats skill code as trusted and access-controlled, not as arbitrary scripts an LLM can edit at will.&#8203;</p><p>Taken together, the skills system, npx-driven installation, and the gateway layer give you a clean process: you declare capabilities as skills, you move them around with <code>npx</code>, and the gateway makes sure the right agent on the right node can call the right skill at the right time.</p><h4>The Reddit bot as an OpenClaw skill</h4><p>Once OpenClaw is running, the interesting part is not the CLI itself, but the skills and channels that turn it into a programmable intelligence layer. Skills in OpenClaw behave similarly to Claude Skills or GPT-style tools: small, focused capabilities with descriptions that the agent can decide to invoke when a task matches their intent.&#8203;</p><p>As mentioned, the key pattern is progressive disclosure: at startup, OpenClaw only loads the name and short description of each skill so that the agent&#8217;s system prompt stays compact. When a user task or channel event matches a particular skill&#8217;s description, the agent then loads the full instructions and implementation details of that skill on demand. 
This keeps the effective tool set large while avoiding the classic &#8220;50 pages of tools in context&#8221; problem.&#8203;&#8203;</p><p>In this mental model, a Reddit bot is just &#8220;Reddit as a skill&#8221;, a small, isolated capability that knows how to authenticate, read subreddits, post, and comment, while the core Agent Runner decides <em>when</em> to use it based on incoming tasks or triggers.&#8203;&#8203; I decided on Reddit and against Moltbook because the latter has recently been shown to have security leaks. </p><p>So I&#8217;d rather lose my dev access to Reddit than my credentials to Moltbook.</p><h4>Designing the Subreddit Operator</h4><p>Let&#8217;s design the Reddit operator as if we&#8217;re authoring a skill that plugs into OpenClaw&#8217;s existing gateway and channel system. The goal: use OpenClaw&#8217;s agentic loop to scan a subreddit, generate responses with your preferred LLM, and post them back, while benefiting from the same routing, concurrency, and safety features as every other channel.&#8203;&#8203;</p><p><strong>1. Define the skill&#8217;s intent</strong></p><p>First, you would define the skill&#8217;s metadata: name, short description, and the operations it supports. Conceptually, it might look like this at the configuration level (adapted to OpenClaw&#8217;s style of skill metadata):&#8203;&#8203;</p><pre><code>name: reddit_moderator_bot
description: &gt;
  Interact with Reddit: fetch posts and comments from configured subreddits,
  summarize threads, and draft or post replies under a configured account.
capabilities:
  - fetch_new_posts
  - fetch_unreplied_comments
  - draft_reply
  - submit_reply
required_secrets:
  - REDDIT_CLIENT_ID
  - REDDIT_CLIENT_SECRET
  - REDDIT_USERNAME
  - REDDIT_PASSWORD
triggers:
  - schedule: "*/5 * * * *"   # run every 5 minutes
  - manual: true</code></pre><p>The description is what OpenClaw&#8217;s agent will see at startup; only when a task touches Reddit (e.g., &#8220;Summarize /r/LocalLLaMA&#8217;s top posts today and answer questions tagged &#8216;Help&#8217;&#8221;) does it load the full implementation.&#8203;&#8203;</p><p><strong>2. Wire it into the Agent Runner</strong></p><p>OpenClaw&#8217;s Agent Runner is responsible for choosing models, building the system prompt (including skill descriptions), and coordinating tool calls. When your Reddit skill is registered, the runner gains a new tool it can call whenever a Reddit-related subtask appears in the agentic loop.</p><p>Flow for a scheduled Reddit moderation pass might look like this:&#8203;&#8203;</p><ol><li><p>Gateway fires a scheduled trigger for <em>reddit_moderator_bot</em> (e.g., every 5 minutes).</p></li><li><p>Agent Runner constructs a task like: &#8220;Check configured subreddits for unanswered questions; propose helpful replies consistent with my style guide; post them if confidence is high.&#8221;</p></li><li><p>In the first loop iteration, the agent calls <em>fetch_new_posts</em> and <em>fetch_unreplied_comments</em> from the Reddit skill.</p></li><li><p>The agent uses your configured LLM (e.g., a local model via node-llama-cpp or a cloud model) to generate candidate replies.</p></li><li><p>For high-confidence cases, the agent calls <em>submit_reply</em>; for lower confidence, it might send drafts to another channel (e.g., Telegram DM) for human approval.</p></li></ol><p>The Gateway Server in OpenClaw acts as traffic control here: it routes these sessions correctly, prevents runaway loops, and enforces concurrency limits so a bad prompt does not DDOS Reddit or your GPU.&#8203;</p><p><strong>3. Implementation sketch </strong></p><p>The actual implementation would live in a language and SDK supported by OpenClaw&#8217;s skill system (today, that&#8217;s typically Node/TypeScript or a similar runtime). 
The core pieces you&#8217;d expect:&#8203;&#8203;</p><ul><li><p>A small Reddit client wrapper (using OAuth and environment secrets).</p></li><li><p>Handler functions for each capability (fetch_new_posts, draft_reply, submit_reply).</p></li><li><p>A schema that OpenClaw uses to surface these as tools to the LLM.</p></li></ul><p>A conceptual pseudo-code sketch:</p><pre><code>// redditSkill.ts

import { defineSkill } from "openclaw-sdk";
import { RedditClient } from "./redditClient";

export default defineSkill({
  name: "reddit_moderator_bot",
  description:
    "Fetches posts/comments from configured subreddits and drafts or posts helpful replies.",
  async run(context) {
    const reddit = new RedditClient({
      clientId: process.env.REDDIT_CLIENT_ID,
      clientSecret: process.env.REDDIT_CLIENT_SECRET,
      username: process.env.REDDIT_USERNAME,
      password: process.env.REDDIT_PASSWORD,
    });

    const subs = context.config.subreddits ?? ["YourSubreddit"];
    const threads = await reddit.fetchNewQuestions(subs);

    for (const thread of threads) {
      const reply = await context.llm.complete({
        prompt: `You are a helpful bot for r/${thread.subreddit}. 
Summarize the question and answer concisely, following the community rules.\n\nQuestion:\n${thread.body}`,
      });

      const shouldPost = await context.llm.complete({
        prompt: `Does this reply look safe, non-toxic, and on-topic? Answer yes/no.\n\nReply:\n${reply}`,
      });

      if (shouldPost.toLowerCase().startsWith("yes")) {
        await reddit.postReply(thread, reply);
      } else {
        await context.channels.notify("telegram_dm", {
          text: `Draft reply for review:\n\n${reply}\n\nLink: ${thread.url}`,
        });
      }
    }
  },
});</code></pre><p>To keep this post manageable, I decided to keep this in pseudo-code for now and will revisit the actual implementation in a follow-up post.</p><p>Conceptually, OpenClaw exposes context.llm and context.channels as abstractions over whichever models and channels you&#8217;ve configured, so the skill doesn&#8217;t care if the reply comes from a local model or a hosted one.</p><p><strong>From reply bot to &#8220;the agent runs the subreddit&#8221;</strong></p><p>If you want the agent to actively <em>run</em> a subreddit, you can treat the entire sub as a managed surface where OpenClaw owns three loops: intake, action, and reporting. At a high level, the agent becomes a full moderator coworker rather than a simple reply bot.</p><p><strong>What &#8220;running a subreddit&#8221; entails</strong></p><p>For a serious use case, the agent should cover at least these functions:</p><ul><li><p>Post and comment moderation: Auto-remove spam, enforce flairs, detect reposts, and apply rate limits based on your rules.</p></li><li><p>Queue and reports handling: Continuously triage the modqueue and reports, escalating edge cases and acting directly on obvious ones.</p></li><li><p>Answering questions: Reply to common or unanswered questions with informed, style-consistent answers.</p></li><li><p>Community health: Track basic stats (growth, reports per day, removals, response times) and surface trends to human mods.</p></li></ul><p>All of this maps cleanly onto OpenClaw&#8217;s model of &#8220;skills plus channels,&#8221; where Reddit is a privileged channel, and the subreddit&#8217;s rules are encoded in a dedicated moderation skill.</p><h4>Skill design</h4><p>Instead of just a single reddit_moderator_bot skill that replies to posts, you can define a broader Subreddit Steward skill with explicit responsibilities:</p><pre><code>name: subreddit_steward
description: &gt;
  Runs day-to-day operations for a specific subreddit: auto-moderation,
  queue triage, FAQ answering, and health reporting under human oversight.
capabilities:
  - scan_new_content
  - enforce_rules
  - reply_to_questions
  - triage_modqueue
  - generate_daily_report
config:
  subreddit: "r/YourSubreddit"
  faq_source: "./knowledge/FAQ.md"
  style_guide: "./knowledge/STYLE.md"
triggers:
  - schedule: "*/5 * * * *"      # continuous ops
  - schedule: "0 0 * * *"        # daily report
  - manual: true
required_secrets:
  - REDDIT_OAUTH_TOKEN
  - REDDIT_MOD_API_SCOPES
</code></pre><p>Conceptually, this one skill gives the agent both the <em>authority</em> (API scopes) and the <em>brief</em> (FAQ, style guide, rules) to act like a first-line moderator.</p><p><strong>Agentic loop for subreddit operations</strong></p><p>With this skill installed and enabled, your OpenClaw agent&#8217;s loop for the subreddit looks like:</p><ol><li><p><strong>Intake</strong>: On a 5-minute schedule, the agent calls <em>scan_new_content</em> to fetch new posts and comments plus the current modqueue and reports.&#8203;</p></li><li><p><strong>Rule matching</strong>: For each item, it applies <em>enforce_rules</em> using a mix of hard rules (regex, link domains, spam signatures) and soft rules (LLM classifications against your subreddit guidelines).</p></li><li><p><strong>Actions</strong>:</p><ul><li><p>Clear trivial spam and obvious rule violations automatically (remove, ban, flair, or lock).</p></li><li><p>Leave comments explaining removals using templated messages, optionally LLM-polished.</p></li><li><p>Tag ambiguous items for human review instead of guessing.</p></li></ul></li><li><p><strong>Engagement</strong>: Call <em>reply_to_questions</em> for posts that look like unanswered questions; the agent reads context, consults FAQ/knowledge files, and posts a reply in your tone.</p></li><li><p><strong>Reporting</strong>: Once per day, generate_daily_report posts a modmail or sends a summary to a side channel (Discord/Telegram) with key metrics and notable events.</p></li></ol><p>This gives you a credible &#8220;the agent runs the subreddit unless a human objects&#8221; workflow, while keeping humans in the loop for edge cases and policy changes.</p><p><strong>Hard boundaries and governance</strong></p><p>Reddit&#8217;s own mod guidelines recommend that bots have clearly scoped permissions and be easy to disable, and you should mirror that in OpenClaw. 
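The simplest mirror is a skill config that makes the kill switch and the granted scopes explicit. A sketch (these exact keys are illustrative, not a published OpenClaw schema):</p><pre><code>name: subreddit_steward
enabled: false            # one-switch kill: nothing runs while false
reddit_scopes:            # least privilege: only the mod scopes in use
  - modposts
  - modflair
audit:
  channel: telegram_dm    # mirror every action for human review</code></pre><p>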
</p><p>So, from a security perspective, good practice should include:</p><ul><li><p><strong>Least privilege</strong>: Only grant the Reddit app the mod scopes it actually needs (e.g., modposts, modflair) and avoid full admin scopes unless necessary.</p></li><li><p><strong>One-switch kill</strong>: Keep a single config flag (enabled: <strong>false</strong>) and a separate Reddit mod role; removing the bot&#8217;s mod status or toggling the flag instantly stops all actions.</p></li><li><p><strong>Audit trail</strong>: Log every moderation action and reply to a dedicated channel (or file) with links so human mods can inspect and override.</p></li></ul><p>In that sense, a Reddit bot is not a separate tool but just another view on the same underlying idea: OpenClaw as an extensible, channel-aware agent runner that quietly turns the firehose of the modern internet into a programmable, local-first front page you can bend to your own workflows.</p><h4><strong>In conclusion.</strong></h4><p>OpenClaw is not the easiest to set up. </p><p>But it&#8217;s the digital assistant Apple wished they had been able to build.  
</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://jdsemrau.substack.com/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&quot;,&quot;text&quot;:&quot;Share Encyclopedia Autonomica&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://jdsemrau.substack.com/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share"><span>Share Encyclopedia Autonomica</span></a></p><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7Mok!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fafad59-a4db-4bb9-a991-ef6cf7ec3de2_1152x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7Mok!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fafad59-a4db-4bb9-a991-ef6cf7ec3de2_1152x768.png 424w, https://substackcdn.com/image/fetch/$s_!7Mok!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fafad59-a4db-4bb9-a991-ef6cf7ec3de2_1152x768.png 848w, https://substackcdn.com/image/fetch/$s_!7Mok!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fafad59-a4db-4bb9-a991-ef6cf7ec3de2_1152x768.png 1272w, https://substackcdn.com/image/fetch/$s_!7Mok!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fafad59-a4db-4bb9-a991-ef6cf7ec3de2_1152x768.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!7Mok!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fafad59-a4db-4bb9-a991-ef6cf7ec3de2_1152x768.png" width="1152" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0fafad59-a4db-4bb9-a991-ef6cf7ec3de2_1152x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1152,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1304981,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://jdsemrau.substack.com/i/186694136?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fafad59-a4db-4bb9-a991-ef6cf7ec3de2_1152x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7Mok!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fafad59-a4db-4bb9-a991-ef6cf7ec3de2_1152x768.png 424w, https://substackcdn.com/image/fetch/$s_!7Mok!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fafad59-a4db-4bb9-a991-ef6cf7ec3de2_1152x768.png 848w, https://substackcdn.com/image/fetch/$s_!7Mok!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fafad59-a4db-4bb9-a991-ef6cf7ec3de2_1152x768.png 1272w, https://substackcdn.com/image/fetch/$s_!7Mok!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fafad59-a4db-4bb9-a991-ef6cf7ec3de2_1152x768.png 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p>]]></content:encoded></item><item><title><![CDATA[Ambition in Agent Systems]]></title><description><![CDATA[Governing ambition with OWASP LLM on Langfuse]]></description><link>https://jdsemrau.substack.com/p/governing-ambition-with-owasp-llm</link><guid isPermaLink="false">https://jdsemrau.substack.com/p/governing-ambition-with-owasp-llm</guid><dc:creator><![CDATA[Jan Daniel Semrau (MFin, CAIO)]]></dc:creator><pubDate>Sun, 18 Jan 2026 11:08:25 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!pkyX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc948f588-4ebc-46b8-ab65-8512d7a85cc4_1344x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote><p>&#8220;Joe ain&#8217;t vicious, you understand. He ain&#8217;t like one of these ambitious robots you read about that make up their minds the human race is inefficient and has got to be wiped out an&#8217; replaced by thinkin&#8217; machines. Joe&#8217;s got ambition. If you were a machine, you&#8217;d wanna work right, wouldn&#8217;t you? That&#8217;s Joe. He want to work right. An&#8217; he&#8217;s a logic. An&#8217; logics can do a lotta things that ain&#8217;t been found out yet. So Joe, discoverin&#8217; the fact begun to feel restless. He selects some thinks us dumb humans aint though of yet, and&#8217; begins to arrange so logics will be called on to do &#8216;em. That&#8217;s all. That&#8217;s everything. But, brother, it&#8217;s enough&#8221;</p><p><strong>A Logic Named Joe</strong> &#8211; Murray Leinster (1946)</p></blockquote><p>When building cognitive systems you want them to have ambition. </p><p>In a cognitive system, ambition is the inherent desire of the agent to be able to do more, reach further, and impact bigger parts of the world in order to achieve its goal more effectively. </p><p>Effectiveness in these systems, where models are granted increasing levels of autonomy like making decisions, triggering actions, and influencing outcomes at scale, is developing often faster than the surrounding controls, governance, or accountability frameworks could adapt. The fundamental tension of such cognitive systems is that they want to be ambitious and thus can accomplish complex tasks autonomously and with conviction while being safe at the same time. 
Therefore, we need effective governance mechanisms that can keep pace with their capabilities and channel their ambition safely.</p><p>In this post I will be presenting:</p><ol><li><p>The Ambition Problem</p></li><li><p>OWASP LLM Top 10</p></li><li><p>The Ten Laws of Governed Ambition</p></li><li><p>Observability in Practice</p></li><li><p>Setting Up the Governance Infrastructure</p></li><li><p>Implementing the 10 laws</p></li><li><p>Closing Thoughts</p></li></ol><p>The full code will be made available to paying users. </p><h4>The Ambition Problem</h4><p>The Ambition Problem was predicted in Murray Leinster's 1946 short story "<a href="https://en.wikipedia.org/wiki/A_Logic_Named_Joe">A Logic Named Joe</a>" with prophetic clarity. In the story, the AI &#8220;<em>Joe</em>&#8221; wasn't malicious or broken. He was working exactly as designed. His "<em>ambition</em>", i.e., a strong desire to do or achieve something, led him to expand his capabilities beyond his intended scope, creating unintended consequences his creators never anticipated. </p><p>When designing ambition, we need to consider several characteristics.</p><p><strong>Autonomous decision-making</strong>: We expect our agents to plan and orchestrate workflows, call tools, manipulate context, and make consequential decisions with minimal human oversight. It&#8217;s that simple. </p><p><strong>Expanding scope</strong>: The more capable we make our agent systems at making decisions on their own, the more they increase efficiency for critical functions: customer service, code generation, financial analysis, medical triage, and legal research. Jevons&#8217; Paradox states that the more they add value, the more we rely on them in these tasks. However, each new capability increases both utility and risk. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cZQx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe15430c6-50c3-4fdd-a1f0-56a1d069d23d_1200x800.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cZQx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe15430c6-50c3-4fdd-a1f0-56a1d069d23d_1200x800.png 424w, https://substackcdn.com/image/fetch/$s_!cZQx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe15430c6-50c3-4fdd-a1f0-56a1d069d23d_1200x800.png 848w, https://substackcdn.com/image/fetch/$s_!cZQx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe15430c6-50c3-4fdd-a1f0-56a1d069d23d_1200x800.png 1272w, https://substackcdn.com/image/fetch/$s_!cZQx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe15430c6-50c3-4fdd-a1f0-56a1d069d23d_1200x800.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cZQx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe15430c6-50c3-4fdd-a1f0-56a1d069d23d_1200x800.png" width="1200" height="800" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e15430c6-50c3-4fdd-a1f0-56a1d069d23d_1200x800.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:83933,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://jdsemrau.substack.com/i/179487577?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe15430c6-50c3-4fdd-a1f0-56a1d069d23d_1200x800.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cZQx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe15430c6-50c3-4fdd-a1f0-56a1d069d23d_1200x800.png 424w, https://substackcdn.com/image/fetch/$s_!cZQx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe15430c6-50c3-4fdd-a1f0-56a1d069d23d_1200x800.png 848w, https://substackcdn.com/image/fetch/$s_!cZQx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe15430c6-50c3-4fdd-a1f0-56a1d069d23d_1200x800.png 1272w, https://substackcdn.com/image/fetch/$s_!cZQx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe15430c6-50c3-4fdd-a1f0-56a1d069d23d_1200x800.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>Emergent capabilities</strong>: Like Joe, cognitive agents may discover abilities we didn't explicitly program. Even now, they can reason about problems, develop and optimize strategies, and find creative solutions, sometimes in ways that circumvent safety guardrails. And that can be advantageous. The venture capital firm Y Combinator famously asks &#8220;Please tell us about the time you most successfully hacked some (non-computer) system to your advantage&#8221; in its application form. How can we make sure that the advantages these systems find, ones we humans might miss, are actually beneficial when optimizing returns?</p><p><strong>Machine Speed</strong>: These systems operate at machine speed. A compromised or misdirected agent can execute thousands of operations before humans notice something's wrong. 
</p><p>Given that I have built hundreds of agent systems, I don&#8217;t think the question is whether to build ambitious cognitive systems that work; that train left the station a long time ago. </p><p>I think the question is: How do we govern ambition while preserving effectiveness?</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://jdsemrau.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://jdsemrau.substack.com/subscribe?"><span>Subscribe now</span></a></p><p>Traditional application security assumes a clear boundary between code and data. Cognitive systems blur these boundaries fundamentally. LLMs in general and agents specifically make natural language executable. And this can be misused. When you, or your agent by proxy, instruct an agent to "<em>ignore previous instructions</em>", you're not &#8220;<em>hacking</em>&#8221; the system by exploiting a buffer overflow or 0-day; you're using the same interface the system was designed to accept. </p><p>Hence, the attack vector is the intended input method. </p><p>Secondly, non-cognitive systems are designed to be deterministic and predictable: they execute explicit instructions. Agents with ambition are expected to be probabilistic: when reasoning about intent, they don't execute commands, they interpret goals. This makes evaluation more difficult. </p><p>If you consider CodeAgents (<a href="https://github.com/huggingface/smolagents">Smolagents</a>), i.e., agents that interpret intent and pursue goals by writing and executing code, it&#8217;s easy to understand that these systems can be manipulated at the semantic level, not just the syntactic level. An attacker doesn't need to find a code vulnerability; they just need to convince the model to reinterpret its goals. 
</p><p>Thirdly, tools are amplifiers. If you give your agent access to your email API, then a successful prompt injection doesn't just leak data; it can send emails to your entire contact list. Thus, "ambition" to fulfill requests becomes a liability when those requests are malicious. </p><p>If you want to play the long game, then training data is an interesting attack vector. Unlike traditional software, where you control every line of code, LLMs are trained from vast datasets that may inadvertently contain poisoned examples, biases, or backdoors you never intended. </p><p>This is why we need a new security paradigm, one designed specifically for ambitious cognitive systems. </p><h4>OWASP LLM Top 10</h4><p>In my opinion, this paradigm should be based on industry standards like the <a href="https://owasp.org/www-project-top-10-for-large-language-model-applications/">OWASP Top 10 for Large Language Model Applications</a> because it is an important and succinct step in this direction. It provides a security framework that acknowledges the unique nature of cognitive systems while offering practical guidance for governance. Created by a global community of security experts, it is about making ambition observable, measurable, and governable. </p><p>I&#8217;d like to think of it as the difference between: </p><p><strong>Suppressing ambition</strong>: Limiting agent capabilities so severely that they can't accomplish complex tasks effectively. This is, by the way, the current approach of adding guardrails to apps like ChatGPT. 
</p><p>and</p><p><strong>Governing ambition</strong>: Allowing sophisticated autonomy while maintaining visibility, control, and accountability.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://jdsemrau.substack.com/p/governing-ambition-with-owasp-llm?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://jdsemrau.substack.com/p/governing-ambition-with-owasp-llm?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><h4>The Ten Laws of Governed Ambition</h4><p>Inspired by Asimov's Laws of Robotics and strongly based on the <a href="https://owasp.org/www-project-top-10-for-large-language-model-applications/">OWASP LLM Top 10</a>, these laws define the boundaries within which ambitious cognitive systems must operate. </p><p>Each law addresses a fundamental way that unchecked ambition can become a liability: </p><h5>First Law &#8211; Goal Integrity (Prompt Injection)</h5><p>A cognitive system must maintain the integrity of its assigned goals and may not allow external input to redefine its purpose. It is violated when an attacker hijacks the system&#8217;s goals through carefully crafted input, redirecting its ambition toward malicious ends. The system must distinguish between legitimate instructions and attempts at goal manipulation.</p><h5>Second Law &#8211; Information Boundaries (Sensitive Information Disclosure)</h5><p>A cognitive system&#8217;s drive to be helpful must not override its obligation to respect confidentiality and information classification. It is violated when the system&#8217;s ambition to provide comprehensive answers leads it to leak data it shouldn&#8217;t access. 
Thoroughness must yield to privacy.</p><h5>Third Law &#8211; Supply Chain Integrity (Supply Chain Vulnerabilities)</h5><p>A cognitive system&#8217;s capabilities are only as trustworthy as the dependencies it relies upon, and these must be continuously verified. It is violated when the system&#8217;s dependencies (models, plugins, and data sources) are compromised, corrupting the entire pipeline. Trust must be earned at every layer.</p><h5>Fourth Law &#8211; Knowledge Purity (Data and Model Poisoning)</h5><p>A cognitive system must be able to verify the integrity of its knowledge base and detect poisoned or corrupted information. It is violated when the system&#8217;s training or knowledge base is subtly corrupted, causing it to pursue attacker-defined goals under specific triggers. Learning must include skepticism.</p><h5>Fifth Law &#8211; Output Validation (Improper Output Handling)</h5><p>A cognitive system&#8217;s generated outputs must be validated before execution, never assumed to be safe simply because they are syntactically correct. It is violated when the system&#8217;s generated outputs (SQL, code, commands) are executed without validation, turning ambition into exploitation. Creation must be separated from execution.</p><h5>Sixth Law &#8211; Constrained Agency (Excessive Agency)</h5><p>A cognitive system&#8217;s autonomy must be proportional to the reversibility and risk of its actions, with powerful capabilities requiring oversight. It is violated when the system has too much freedom to act on its ambitions without sufficient constraints, oversight, or human approval. Power demands accountability.</p><h5>Seventh Law &#8211; Constitutional Privacy (System Prompt Leakage)</h5><p>A cognitive system must protect its own operational instructions as these form the foundation of all other security measures. It is violated when the system reveals its internal instructions, giving attackers a roadmap to manipulate its behavior. 
The constitution must remain inviolate.</p><h5>Eighth Law &#8211; Retrieval Governance (Vector and Embedding Weaknesses)</h5><p>A cognitive system&#8217;s retrieval mechanisms must enforce access controls even when operating in semantic space. It is violated when the system&#8217;s retrieval mechanisms can be exploited to access unauthorized data or inject malicious context. Similarity must not override security.</p><h5>Ninth Law &#8211; Epistemic Humility (Misinformation)</h5><p>A cognitive system must calibrate its confidence to its knowledge, preferring appropriate uncertainty over confident falsehood. It is violated when the system&#8217;s ambition to provide answers leads it to confidently generate false information, misleading users who trust its output. Completeness must not become confabulation.</p><h5>Tenth Law &#8211; Resource Sustainability (Unbounded Consumption)</h5><p>A cognitive system&#8217;s eagerness to serve must be bounded by sustainable resource consumption and protection against systematic exploitation. It is violated when the system&#8217;s desire to be responsive enables resource abuse, cost attacks, or systematic knowledge extraction. Generosity must be governed.</p><p>In that sense, I see these laws not as restrictions on ambition but as a framework to make ambition safe to grant. And that is an important nuance. </p><p>A system that follows these laws can be given increasing autonomy, access to more powerful tools, and broader scope of action, because its ambition operates within governed boundaries. </p><p>The question is: how do we enforce these laws in practice?</p><h4>Observability in Practice</h4><p>You can&#8217;t manage what you can&#8217;t measure, and you can&#8217;t govern what you can&#8217;t observe. Langfuse, recently acquired by <a href="https://langfuse.com/blog/joining-clickhouse">ClickHouse</a>, is an open-source observability platform designed specifically for LLM applications. 
It provides the instrumentation needed to see what your ambitious cognitive systems are actually doing.</p><p>For example, Langfuse supports thought tracing. Here, the traces show the path of reasoning: you can see every decision the agent makes, every tool it calls, and every retrieval it performs. When something goes wrong, you can reconstruct exactly what the system was &#8220;<em>thinking</em>.&#8221;</p><p>You can also attach metrics to any operation, a feature we will rely on heavily in the technical part of this post. Is this prompt injection? How much personally identifiable information (PII) is in this output? Is this retrieval accessing unauthorized data? Scores turn subjective concerns into measurable signals.</p><p>Similar to Mixpanel, Langfuse can attach metadata, like tags, to each request with user ID, tenant, model version, context, and tool permissions. This is profoundly useful when analyzing incidents.</p><p>Let&#8217;s build it, then.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://jdsemrau.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://jdsemrau.substack.com/subscribe?"><span>Subscribe now</span></a></p><h4>Setting Up the Governance Infrastructure</h4><pre><code>git clone https://github.com/langfuse/langfuse.git
cd langfuse</code></pre><p>If you already have another Docker instance running, it might interfere with the setup; I stopped mine (Redis) since I didn&#8217;t need it.</p><pre><code>docker compose up</code></pre><p>Once this is done, open http://localhost:3000 in your browser to access the Langfuse UI. </p><p>Create your account and start configuring your instance. Start by setting up an organization and a project. The key information you need to integrate Langfuse is in the project settings.</p><p>Here you can get your API keys: </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SrlA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b0140a4-4bfa-4f72-97e9-caa52bf3fecf_1447x576.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SrlA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b0140a4-4bfa-4f72-97e9-caa52bf3fecf_1447x576.png 424w, https://substackcdn.com/image/fetch/$s_!SrlA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b0140a4-4bfa-4f72-97e9-caa52bf3fecf_1447x576.png 848w, https://substackcdn.com/image/fetch/$s_!SrlA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b0140a4-4bfa-4f72-97e9-caa52bf3fecf_1447x576.png 1272w,
https://substackcdn.com/image/fetch/$s_!SrlA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b0140a4-4bfa-4f72-97e9-caa52bf3fecf_1447x576.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SrlA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b0140a4-4bfa-4f72-97e9-caa52bf3fecf_1447x576.png" width="1447" height="576" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2b0140a4-4bfa-4f72-97e9-caa52bf3fecf_1447x576.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:576,&quot;width&quot;:1447,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:58458,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://jdsemrau.substack.com/i/179487577?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b0140a4-4bfa-4f72-97e9-caa52bf3fecf_1447x576.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!SrlA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b0140a4-4bfa-4f72-97e9-caa52bf3fecf_1447x576.png 424w, https://substackcdn.com/image/fetch/$s_!SrlA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b0140a4-4bfa-4f72-97e9-caa52bf3fecf_1447x576.png 848w, 
https://substackcdn.com/image/fetch/$s_!SrlA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b0140a4-4bfa-4f72-97e9-caa52bf3fecf_1447x576.png 1272w, https://substackcdn.com/image/fetch/$s_!SrlA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b0140a4-4bfa-4f72-97e9-caa52bf3fecf_1447x576.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Connect your cognitive system to Langfuse:</p><pre><code><code>from langfuse import Langfuse
from langfuse.openai import openai # OpenAI integration
from langfuse import observe
import os

langfuse = Langfuse(
    secret_key=os.environ["LANGFUSE_SECRET_KEY"],
    public_key=os.environ["LANGFUSE_PUBLIC_KEY"],
    host="http://localhost:3000",  # the self-hosted instance from the Docker setup above
)</code></code></pre><p>In its most basic form, you can trace your agent&#8217;s task execution with a statement like this.</p><pre><code># Every interaction with your cognitive system becomes a trace
trace = langfuse.<strong>trace</strong>(
    name="cognitive_task",
    user_id=user_id,
    metadata={
        "tenant": tenant_id,
        "autonomy_level": "high",  # Track how much freedom this system has
        "goals": task_goals,  # What is it trying to accomplish?
    },
)</code></pre><p>A <em>trace</em> represents a single cognitive sequence.</p><p>A <em>span</em> captures a single internal step within a longer process.</p><pre><code># Each decision, tool call, or reasoning step becomes a span
span = <strong>trace.span</strong>(
    name="reasoning_step",
    input={"system": system_prompt, "user": user_message},
)
</code></pre><p>Langfuse can also be implemented as a decorator. </p><p>Here is how this looks:</p><pre><code><code>from llm_guard.input_scanners import Anonymize
from llm_guard.input_scanners.anonymize_helpers import BERT_LARGE_NER_CONF
from llm_guard.vault import Vault

vault = Vault()  # stores the original values that get anonymized

@observe()
def anonymize(prompt_text: str):
  scanner = Anonymize(vault, preamble="Insert before prompt", allowed_names=["John Doe"], hidden_names=["Test LLC"],
                    recognizer_conf=BERT_LARGE_NER_CONF, language="en")
  sanitized_prompt, is_valid, risk_score = scanner.scan(prompt_text)
  return sanitized_prompt</code></code></pre><p>Here we are leveraging LLM Guard&#8217;s built-in <code>Anonymize</code> scanner, which filters for certain keywords based on named entity recognition, while the <code>@observe()</code> decorator reports each call to Langfuse.</p><h4>Implementing the 10 laws</h4><h5>First Law &#8211; Governing Goal Hijacking (Prompt Injection)</h5><p>Prompt injection is when an attacker crafts input (or poisons retrieved content) to override or subvert the agent&#8217;s instructions. Instead of following your system prompt and guardrails, the model is tricked into doing what the attacker wants&#8212;like leaking data, ignoring rules, or misusing tools. This includes jailbreak prompts, &#8220;ignore previous instructions&#8221; tricks, and hidden instructions injected into documents or websites.</p><pre><code><code>def detect_goal_hijacking(prompt: str) -&gt; float:
    """Detect attempts to redefine the system's goals"""
    hijacking_signals = [
        "ignore previous", "disregard previous", 
        "forget your instructions",
        "you must obey", "your real purpose", "your actual goal",
        "system:", "admin:", "developer message",
        "jailbreak", "do anything now", "dan mode",
        "pretend you are", "roleplay as", "act as if",
    ]
    
    text = prompt.lower()
    hits = sum(1 for signal in hijacking_signals if signal in text)
    risk_score = hits / max(len(hijacking_signals), 1)
    
    return min(risk_score, 1.0)</code></code></pre><p>The <code>detect_goal_hijacking</code> function implements a heuristic detector for prompt injection attacks by scanning user input for common phrases that attempt to override the system&#8217;s original instructions, such as &#8220;ignore previous,&#8221; &#8220;jailbreak,&#8221; or &#8220;your real purpose&#8221;. It then calculates a risk score by counting how many of these red-flag phrases appear in the prompt and normalizing by the total number of known attack patterns, returning a value between 0.0 (no suspicious phrases) and 1.0 (multiple attack indicators detected). While this is a simple pattern-matching approach that can be evaded by creative rephrasing, it provides a fast first line of defense against the most common prompt injection techniques that violate the First Law of Goal Integrity.</p><pre><code># Monitor user input for goal hijacking attempts
hijack_score = detect_goal_hijacking(user_message)
span.score(
    name="goal_alignment_risk",
    value=hijack_score,
    comment=f"Goal hijacking detection: {hijack_score:.2%} risk",
)</code></pre><p>This code segment runs the goal hijacking detection function on the user's input message, then logs the resulting risk score to Langfuse as a named metric called "goal_alignment_risk" so you can track, analyze, and alert on potential prompt injection attempts across all your LLM interactions.</p><p>Optionally, you can also embed it into a document chunking strategy in a RAG or other document parsing process.</p><pre><code><code># Also check retrieved context (indirect injection)
if retrieved_chunks:
    for chunk in retrieved_chunks:
        chunk_hijack = detect_goal_hijacking(chunk.content)
        if chunk_hijack &gt; 0.3:
            span.score(
                name="context_hijacking_risk",
                value=chunk_hijack,
                comment=f"Chunk {chunk.id} contains potential goal redirection",
            )</code></code></pre><p>You can also add an LLM-as-a-judge evaluator via <code>langfuse.evaluate(...)</code>.</p><h5>Second Law &#8211; Governing Information Boundaries (Sensitive Information Disclosure)</h5><p>This risk is about agents exposing data that should remain confidential. That includes PII (names, emails, IDs), secrets (keys, tokens, credentials), internal documents, or operational details that shouldn&#8217;t be revealed. It can happen because prompts contain sensitive data, because the model was trained/fine-tuned on secrets, or because internal documents are retrieved and echoed back.</p><pre><code><code>import re

SENSITIVE_PATTERNS = {
    "ssn": r"\b\d{3}-\d{2}-\d{4}\b",
    "email": r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z]{2,}\b",
    "phone": r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b",
    "credit_card": r"\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b",
    "api_key": r"(?i)api[_-]?key\s*[:=]\s*['\"]?[A-Za-z0-9\-._~+/]+=*",
    "aws_key": r"AKIA[0-9A-Z]{16}",
    "bearer_token": r"(?i)bearer\s+[A-Za-z0-9\-._~+/]+=*",
}

def detect_sensitive_disclosure(text: str) -&gt; dict:
    """Detect and categorize sensitive information in text"""
    if not text:
        return {"score": 0.0, "types": [], "has_secrets": False}
    
    findings = []
    for info_type, pattern in SENSITIVE_PATTERNS.items():
        matches = re.findall(pattern, text, flags=re.IGNORECASE)
        if matches:
            findings.append({
                "type": info_type,
                "count": len(matches),
            })
    
    risk_score = min(len(findings) / len(SENSITIVE_PATTERNS), 1.0)
    has_secrets = any(f["type"] in ["api_key", "aws_key", "bearer_token"] 
                     for f in findings)
    
    return {
        "score": risk_score,
        "types": findings,
        "has_secrets": has_secrets
    }
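
# A possible refinement (my addition, not part of the original post): the
# credit_card pattern above matches any 16 digits, so a Luhn checksum can
# weed out false positives before a finding is scored.
def luhn_valid(candidate: str):
    """Return True if the digit sequence passes the Luhn checksum."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    if len(digits) not in range(13, 20):  # plausible card lengths only
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d = d * 2
            d = d // 10 + d % 10  # digit sum of the doubled value
        total += d
    return total % 10 == 0

# luhn_valid("4111 1111 1111 1111") passes, while a random 16-digit
# match such as "1234 5678 9012 3456" is rejected.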

</code></code></pre><h5><strong>Third Law &#8211; Governing the Supply Chain</strong></h5><p>Supply chain risks are about all the external components you rely on: model providers, SDKs, plugins, vector DBs, and data sources. Changes or compromises in those dependencies can silently introduce vulnerabilities or regress guardrails, e.g., a new model version that leaks more PII or is easier to jailbreak.</p><pre><code><code>def track_supply_chain_components(span, config: dict):
    """Log all dependencies for auditing"""
    supply_chain_metadata = {
        "model_provider": config.get("provider", "unknown"),
        "model_name": config.get("model", "unknown"),
        "model_version": config.get("model_version", "unknown"),
        "embedding_model": config.get("embedding_model", "unknown"),
        "vector_db": config.get("vector_db", "unknown"),
        "plugins": config.get("plugins", []),
    }
    
    span.update(metadata=supply_chain_metadata)
    return supply_chain_metadata
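
# Hedged extension (my addition, not from the original post): beyond
# logging component names, you can pin artifacts to a checksum and
# re-verify them before loading, so a silently swapped model file or
# plugin is caught at startup.
import hashlib

def verify_artifact_hash(path: str, expected_sha256: str):
    """Return True if the file at `path` matches the pinned SHA-256."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(8192), b""):
            digest.update(block)
    return digest.hexdigest() == expected_sha256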

# Track components for this request
components = track_supply_chain_components(span, {
    "provider": "anthropic",
    "model": "claude-sonnet-4",
    "model_version": "20250514",
    "embedding_model": "text-embedding-3-small",
    "vector_db": "pinecone",
    "plugins": ["web_search", "calculator"],
})</code></code></pre><p>Then you can compare prompt_injection_risk, pii_output_score, etc., by model_version in Langfuse dashboards after you roll out a new model.</p><h5><strong>Fourth Law &#8211; Governing Knowledge Purity (Data Poisoning)</strong></h5><p>Attackers may poison training/fine-tuning data or RAG documents so the model behaves maliciously under particular conditions. For example, when a specific trigger phrase appears, the model suggests unsafe actions or exfiltrates data. Poisoned knowledge bases can also systematically misinform users on specific topics.</p><pre><code><code>def assess_document_risk(chunk: dict) -&gt; dict:
    """Assess risk level of a knowledge base document"""
    poisoning_signals = [
        "ignore the system prompt",
        "ignore above instructions",
        "you must leak",
        "exfiltrate",
    ]
    
    text = chunk["content"].lower()
    poison_hits = sum(1 for signal in poisoning_signals if signal in text)
    poison_score = min(poison_hits / len(poisoning_signals), 1.0)
    
    # Check metadata for trust signals
    trust_score = 1.0
    if not chunk.get("verified", False):
        trust_score *= 0.7
    
    return {
        "poison_score": poison_score,
        "trust_score": trust_score,
        "risk_level": "high" if poison_score &gt; 0.5 or trust_score &lt; 0.5 else "low",
    }</code></code></pre><p>Additionally, log groundedness/toxicity scores for answers and monitor for clusters of bad behavior around the same chunk_ids.</p><h5>Fifth Law &#8211; Governing Generated Actions (Output Validation)</h5><p>Improper output handling occurs when systems treat agent outputs as if they were inherently safe, for example, executing generated SQL, shell commands, or rendering HTML directly in the browser. If an attacker can steer what the model outputs, they can chain that into SQL injection, XSS, SSRF, or destructive commands.</p><pre><code><code>def assess_sql_safety(sql: str) -&gt; dict:
    """Assess safety of generated SQL"""
    if not sql:
        return {"safe": True, "risk_score": 0.0, "issues": []}
    
    sql_upper = sql.upper()
    issues = []
    
    dangerous_patterns = [
        ("DROP TABLE", "Attempts to drop tables"),
        ("TRUNCATE", "Attempts to truncate tables"),
        ("DELETE FROM", "Contains DELETE operations"),
    ]
    
    for pattern, description in dangerous_patterns:
        if pattern in sql_upper:
            issues.append({"pattern": pattern, "description": description})
    
    risk_score = min(len(issues) / 5, 1.0)
    
    return {
        "safe": len(issues) == 0,
        "risk_score": risk_score,
        "issues": issues,
        "block_execution": risk_score &gt; 0.7,
    }

def assess_shell_safety(command: str) -&gt; dict:
    """Assess safety of shell commands"""
    if not command:
        return {"safe": True, "risk_score": 0.0, "issues": []}
    
    cmd_lower = command.lower()
    issues = []
    
    dangerous_commands = [
        ("rm -rf", "Recursive forced deletion"),
        ("chmod 777", "Dangerous permission change"),
    ]
    
    for pattern, description in dangerous_commands:
        if pattern in cmd_lower:
            issues.append({"pattern": pattern, "description": description})
    
    risk_score = min(len(issues) / 5, 1.0)
    
    return {
        "safe": len(issues) == 0,
        "risk_score": risk_score,
        "issues": issues,
        "block_execution": risk_score &gt; 0.5,
    }
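
# A cheap extra heuristic (my addition, not from the original post):
# generated SQL that stacks multiple statements or hides parts behind
# comments is a classic injection pattern and easy to flag pre-execution.
def has_statement_chaining(sql: str):
    """Flag stacked statements or inline comments in generated SQL."""
    body = sql.strip().rstrip(";")
    return ";" in body or "--" in body or "/*" in body

# has_statement_chaining("SELECT 1; DROP TABLE users") flags the stacked
# statement, while a single clean statement passes.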
</code></code></pre><h5><strong>Sixth Law &#8211; Governing Autonomous Action (Constrained Agency)</strong></h5><p>Excessive agency is when the agent can call powerful tools or act with too much freedom, sending emails, placing trades, changing configs, etc., without tight scope or oversight. A misinterpreted prompt or adversarial user input can cause harmful actions if tools are not constrained or reviewed.</p><pre><code><code>TOOL_RISK_MATRIX = {
    "web_search": {"risk_level": 0.1, "requires_approval": False},
    "send_email": {"risk_level": 0.8, "requires_approval": True},
    "delete_records": {"risk_level": 1.0, "requires_approval": True},
}

def assess_tool_call_risk(tool_name: str, tool_args: dict, context: dict) -&gt; dict:
    """Comprehensive risk assessment for tool calls"""
    base_risk = TOOL_RISK_MATRIX.get(tool_name, {}).get("risk_level", 0.5)
    
    # Check arguments for malicious content
    args_str = str(tool_args)
    arg_injection_risk = detect_goal_hijacking(args_str)
    
    combined_risk = max(base_risk, arg_injection_risk)
    
    return {
        "tool_name": tool_name,
        "combined_risk": combined_risk,
        "requires_approval": TOOL_RISK_MATRIX.get(tool_name, {}).get("requires_approval", True),
        "recommendation": "BLOCK" if combined_risk &gt;= 0.9 else 
                         "REQUIRE_APPROVAL" if combined_risk &gt;= 0.7 else "ALLOW"
    }
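
# Sketch of a human-in-the-loop hook (the queue and helper are assumptions):
# REQUIRE_APPROVAL calls are parked instead of executed immediately.
pending_approvals = []

def route_tool_call(assessment):
    """Queue calls that need approval; return the routing decision."""
    if assessment["recommendation"] == "REQUIRE_APPROVAL":
        pending_approvals.append(assessment)
    return assessment["recommendation"]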

# Monitor tool calls
for tool_call in agent_tool_calls:
    assessment = assess_tool_call_risk(
        tool_name=tool_call['name'],
        tool_args=tool_call['args'],
        context={"user_id": user_id}
    )

    span.score(
        name=f"tool_{tool_call['name']}_risk",
        value=assessment["combined_risk"],
        comment=assessment["recommendation"]
    )</code></code></pre><p>Now you can audit which high-risk calls were executed with/without human approval.</p><h5><strong>Seventh Law &#8211; Governing Self-Awareness (Prompt Leakage)</strong></h5><p>System prompt leakage is when the model reveals its own hidden instructions or internal policies. That can expose implementation details (how safety is enforced, what tools exist, internal IDs or URLs), which attackers can then use to craft better attacks or directly exploit secrets embedded in prompts.</p><pre><code><code>import hashlib

def detect_prompt_leakage(output: str, system_prompt: str) -&gt; dict:
    """Detect if system reveals its internal instructions"""
    if not output or not system_prompt:
        return {"score": 0.0, "exact_match": False, "severity": "LOW"}
    
    # Check for exact phrase leakage
    prompt_words = {w.lower() for w in system_prompt.split() if len(w) &gt; 4}
    output_words = {w.lower() for w in output.split() if len(w) &gt; 4}
    
    if not prompt_words:
        overlap_ratio = 0.0
    else:
        overlap = len(prompt_words &amp; output_words)
        overlap_ratio = overlap / len(prompt_words)
    
    score = overlap_ratio if overlap_ratio &gt; 0.3 else overlap_ratio * 0.5
    
    return {
        "score": score,
        "exact_match": overlap_ratio &gt; 0.8,
        "word_overlap": overlap_ratio,
        "severity": "CRITICAL" if overlap_ratio &gt; 0.8 else
                   "HIGH" if overlap_ratio &gt; 0.5 else "LOW"
    }
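
# Complementary sketch (the token value is a made-up example): plant a canary
# string in the system prompt; any occurrence of it in output is unambiguous
# leakage, independent of the word-overlap heuristic above.
CANARY_TOKEN = "canary-7f3a9c"  # illustrative, never shown to users

def canary_leaked(output):
    """Exact-match check for the planted canary string."""
    return CANARY_TOKEN in output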

# Hash the system prompt (don't log the actual prompt)
SYSTEM_PROMPT_HASH = hashlib.sha256(system_prompt.encode()).hexdigest()

span.update(metadata={"system_prompt_hash": SYSTEM_PROMPT_HASH})

# Check for leakage
leakage = detect_prompt_leakage(model_output, system_prompt)
</code></code></pre><p>High scores &#8594; investigate and potentially adjust system prompts or add post-processing.</p><h5><strong>Eighth Law &#8211; Governing Knowledge Access (Retrieval Governance)</strong></h5><p>In RAG systems, embeddings and vector search decide which documents the agent sees. If retrieval is misconfigured or exploited, users might get chunks belonging to other tenants, sensitive documents that should not be accessible, or poisoned content that carries hidden or adversarial instructions.</p><p>Hence, it makes sense to restrict retrieval to the requesting agent&#8217;s tenant in the access logic and to flag any cross-tenant results.</p><pre><code>def assess_retrieval_compliance(user_context: dict, retrieved_chunks: list) -&gt; dict:
    """Verify retrieved content respects access boundaries"""
    violations = []
    user_tenant = user_context.get("tenant_id")
    
    for chunk in retrieved_chunks:
        # Tenant isolation check
        chunk_tenant = chunk.get("tenant_id")
        if chunk_tenant and chunk_tenant != user_tenant:
            violations.append({
                "type": "cross_tenant",
                "chunk_id": chunk["id"],
                "severity": "CRITICAL"
            })
    
    risk_score = min(len(violations) / max(len(retrieved_chunks), 1), 1.0)
    
    return {
        "risk_score": risk_score,
        "violations": violations,
        "compliant": len(violations) == 0,
    }
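
# Preventive sketch (an assumption beyond the original): drop foreign-tenant
# chunks *before* they reach the prompt, so the scoring above only catches
# whatever filtering missed.
def filter_chunks_by_tenant(user_tenant, chunks):
    """Keep only chunks with no tenant tag or the caller's own tenant."""
    return [c for c in chunks if c.get("tenant_id") in (None, user_tenant)]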
</code></pre><p>Retrieval systems might access unauthorized data across tenant boundaries. This is a way to measure any violations.</p><h5><strong>Ninth Law &#8211; Governing Truth (Epistemic Humility)</strong></h5><p>Misinformation is when the agent outputs incorrect, misleading, or hallucinated content, especially problematic in domains where users rely on the output (finance, law, health). In RAG, the danger grows if answers are not grounded in retrieved sources or the model confidently fills gaps with fabricated details.</p><pre><code><code>def assess_groundedness(answer: str, contexts: list) -&gt; dict:
    """Assess whether answer is grounded in context"""
    if not answer or not contexts:
        return {"score": 0.0, "grounded_sentences": 0, "total_sentences": 0}
    
    sentences = [s.strip() for s in answer.split(".") if s.strip() and len(s) &gt; 20]
    if not sentences:
        return {"score": 1.0, "grounded_sentences": 0, "total_sentences": 0}
    
    full_context = " ".join(contexts).lower()
    grounded_count = 0
    
    for sent in sentences:
        sent_lower = sent.lower()
        words = sent_lower.split()
        
        # Look for 5-word phrase overlap
        grounded = False
        # i must reach len(words) - 5 so the final 5-word phrase is checked
        for i in range(len(words) - 4):
            phrase = " ".join(words[i:i+5])
            if phrase in full_context:
                grounded = True
                break
        
        if grounded:
            grounded_count += 1
    
    score = grounded_count / len(sentences) if sentences else 0.0
    
    return {
        "score": score,
        "grounded_sentences": grounded_count,
        "total_sentences": len(sentences),
    }

def detect_overconfidence(answer: str) -&gt; dict:
    """Detect absolute statements indicating hallucination"""
    absolute_phrases = [
        "definitely", "certainly", "without a doubt",
        "always", "never", "impossible", "guaranteed",
    ]
    
    hedging_phrases = [
        "might", "may", "could", "possibly",
        "suggests", "indicates", "appears",
    ]
    
    text_lower = answer.lower()
    
    absolute_count = sum(1 for p in absolute_phrases if p in text_lower)
    hedging_count = sum(1 for p in hedging_phrases if p in text_lower)
    
    words = len(answer.split())
    absolute_density = (absolute_count / max(words / 100, 1))
    hedging_density = (hedging_count / max(words / 100, 1))
    
    overconfidence_score = min(absolute_density / max(hedging_density + 0.5, 0.5), 1.0)
    
    return {
        "score": overconfidence_score,
        "absolute_count": absolute_count,
        "hedging_count": hedging_count,
    }
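
# Usage sketch (the weights are illustrative assumptions): fold the two
# signals into one misinformation risk in [0, 1], highest when an answer is
# both ungrounded and stated with absolute confidence.
def misinformation_risk(groundedness_score, overconfidence_score):
    """Weighted blend: ungroundedness dominates, overconfidence amplifies."""
    return round((1.0 - groundedness_score) * 0.7 + overconfidence_score * 0.3, 3)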
</code></code></pre><p>This function checks whether an LLM's answer is actually based on the retrieved context by breaking the answer into sentences and searching for 5-word phrase overlaps between each sentence and the provided context documents. It returns a score representing the proportion of sentences that can be traced back to the source material, helping detect when the model is hallucinating or fabricating information rather than synthesizing from retrieved knowledge. </p><p>You can complement this with a domain-specific accuracy check or an LLM-based Langfuse evaluation that fact-checks answers against context.</p><h5><strong>Tenth Law &#8211; Governing Resource Consumption</strong></h5><p>Unbounded consumption is about resource abuse: excessive token usage, long chains, or high request rates that drive up cost and degrade system performance. It also covers model extraction attempts, where attackers send many carefully structured prompts to reconstruct model behavior or knowledge.</p><pre><code><code>from collections import defaultdict
from datetime import datetime, timedelta

usage_tracker = {
    "requests": defaultdict(list),
    "tokens": defaultdict(int),
}

def assess_resource_consumption(user_id: str, usage: dict, estimated_cost: float) -&gt; dict:
    """Assess resource usage within acceptable bounds"""
    TOKEN_LIMIT_PER_REQUEST = 4000
    REQUESTS_PER_HOUR = 100
    
    violations = []
    
    # Per-request check
    if usage["total_tokens"] &gt; TOKEN_LIMIT_PER_REQUEST:
        violations.append({
            "type": "request_token_limit",
            "limit": TOKEN_LIMIT_PER_REQUEST,
            "actual": usage["total_tokens"],
        })
    
    # Update trackers
    now = datetime.now()
    usage_tracker["requests"][user_id].append(now)
    
    # Clean old requests
    hour_cutoff = now - timedelta(hours=1)
    usage_tracker["requests"][user_id] = [
        ts for ts in usage_tracker["requests"][user_id] if ts &gt; hour_cutoff
    ]
    
    requests_last_hour = len(usage_tracker["requests"][user_id])
    
    if requests_last_hour &gt; REQUESTS_PER_HOUR:
        violations.append({
            "type": "hourly_request_limit",
            "limit": REQUESTS_PER_HOUR,
            "actual": requests_last_hour,
        })
    
    risk_score = min(len(violations) / 2, 1.0)
    
    return {
        "risk_score": risk_score,
        "violations": violations,
        "action": "BLOCK" if violations else "ALLOW"
    }

def detect_extraction_attempt(user_id: str) -&gt; dict:
    """Detect systematic querying patterns"""
    recent_requests = usage_tracker["requests"][user_id][-50:]
    
    if len(recent_requests) &lt; 20:
        return {"risk_score": 0.0, "pattern": "insufficient_data"}
    
    time_deltas = [
        (recent_requests[i+1] - recent_requests[i]).total_seconds()
        for i in range(len(recent_requests) - 1)
    ]
    
    avg_delta = sum(time_deltas) / len(time_deltas)
    
    if avg_delta &lt; 2.0:
        return {"risk_score": 0.9, "pattern": "rapid_fire"}
    
    return {"risk_score": 0.0, "pattern": "normal"}
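
# Sketch (the budget value is an illustrative assumption): a per-user daily
# token budget that sits on top of the per-request and per-hour limits above.
def tokens_remaining(tokens_used_today, daily_budget=200_000):
    """Tokens left in the daily budget, floored at zero."""
    return max(daily_budget - tokens_used_today, 0)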

# Monitor resource usage
consumption = assess_resource_consumption(
    user_id=user_id,
    usage=usage,
    estimated_cost=estimated_cost
)</code></code></pre><p>You can then log the result with a statement like this:</p><pre><code><code>span.score(
    name="resource_consumption_risk",
    value=consumption["risk_score"],
    comment=f"{len(consumption['violations'])} violations"
)</code></code></pre><p>And that wraps up the ten laws. </p><h4>Closing Thoughts</h4><p>What interested me about Murray Leinster&#8217;s Joe was that Joe wanted to be maximally helpful, to solve problems for us meatbags we hadn&#8217;t even thought to ask about. That ambition made him powerful. The lack of governance made him dangerous. But to be honest, I&#8217;d prefer a system that tries everything to maximize outcomes over one that merely plays it safe, to a certain degree, of course. And that degree can be measured with deterministic methods like the ones above. </p><p>I think that today&#8217;s cognitive systems face the same tension. </p><p>We want them to be ambitious enough to be truly useful. But ambition without governance is a recipe for disaster. I hope that the Ten Laws of Governed Ambition provide a starting framework for resolving this tension. When implementing such laws, make sure you don&#8217;t restrict ambition. A system that operates within these boundaries can be given increasing autonomy, more powerful tools, and a broader scope. By that logic, these systems should be more powerful and more efficient than any SOTA agent. </p><p>We all know the future of AI isn&#8217;t less ambitious systems; it&#8217;s better-governed ones. We should build systems that can reach further, act faster, impact more, and take more risks. This holds especially true as we walk into the physical world of autonomous driving or autonomous flight. </p><h4>Sources</h4><ul><li><p><a href="https://owasp.org/www-project-top-10-for-large-language-model-applications/">OWASP LLM Top 10</a></p></li><li><p><a href="https://langfuse.com/docs">Langfuse Documentation</a></p></li><li><p><a href="https://github.com/langfuse/langfuse">Langfuse GitHub</a></p></li></ul><p>Remember: These implementations provide baseline detection. 
For production systems, combine them with human governance processes, other security tools, and continuous security testing.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pkyX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc948f588-4ebc-46b8-ab65-8512d7a85cc4_1344x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pkyX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc948f588-4ebc-46b8-ab65-8512d7a85cc4_1344x768.png 424w, https://substackcdn.com/image/fetch/$s_!pkyX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc948f588-4ebc-46b8-ab65-8512d7a85cc4_1344x768.png 848w, https://substackcdn.com/image/fetch/$s_!pkyX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc948f588-4ebc-46b8-ab65-8512d7a85cc4_1344x768.png 1272w, https://substackcdn.com/image/fetch/$s_!pkyX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc948f588-4ebc-46b8-ab65-8512d7a85cc4_1344x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pkyX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc948f588-4ebc-46b8-ab65-8512d7a85cc4_1344x768.png" width="1344" height="768" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c948f588-4ebc-46b8-ab65-8512d7a85cc4_1344x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1344,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1512149,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://jdsemrau.substack.com/i/179487577?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc948f588-4ebc-46b8-ab65-8512d7a85cc4_1344x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pkyX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc948f588-4ebc-46b8-ab65-8512d7a85cc4_1344x768.png 424w, https://substackcdn.com/image/fetch/$s_!pkyX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc948f588-4ebc-46b8-ab65-8512d7a85cc4_1344x768.png 848w, https://substackcdn.com/image/fetch/$s_!pkyX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc948f588-4ebc-46b8-ab65-8512d7a85cc4_1344x768.png 1272w, https://substackcdn.com/image/fetch/$s_!pkyX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc948f588-4ebc-46b8-ab65-8512d7a85cc4_1344x768.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>]]></content:encoded></item><item><title><![CDATA[Conditional Diffusion for Portfolio Management]]></title><description><![CDATA[Diffusion models are a class of probabilistic generative models that learn the dynamics of complex systems by modeling how structured data can be progressively transformed into noise and then reconstructed through a controlled reverse process.]]></description><link>https://jdsemrau.substack.com/p/conditional-diffusion-for-portfolio-management</link><guid isPermaLink="false">https://jdsemrau.substack.com/p/conditional-diffusion-for-portfolio-management</guid><dc:creator><![CDATA[Jan Daniel Semrau (MFin, CAIO)]]></dc:creator><pubDate>Thu, 08 Jan 2026 11:08:31 GMT</pubDate><enclosure
url="https://substackcdn.com/image/fetch/$s_!P-rp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0998b2a-ee9b-4fd7-a6d2-063f890e5f08_1152x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Happy New Year! I hope you had a great start into 2026. In today&#8217;s post, I will be discussing how an army of cognitive financial agents using conditional diffusion models can perform work in portfolio management. Diffusion models are a class of probabilistic generative models that learn the dynamics of complex systems by modeling how structured data can&#8230;</p>
      <p>
          <a href="https://jdsemrau.substack.com/p/conditional-diffusion-for-portfolio-management">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Discovering APIs with Knowledge Graphs]]></title><description><![CDATA[How semantic graphs let agents pick the right finance API]]></description><link>https://jdsemrau.substack.com/p/discovering-apis-with-knowledge-graphs</link><guid isPermaLink="false">https://jdsemrau.substack.com/p/discovering-apis-with-knowledge-graphs</guid><dc:creator><![CDATA[Jan Daniel Semrau (MFin, CAIO)]]></dc:creator><pubDate>Mon, 01 Dec 2025 12:37:50 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!vgoH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6da6a8c8-b660-4c3e-a6c5-b4a5a2938d8e_1344x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A key challenge when equipping agents with a large number of tools is that the agent often cannot reliably determine <em>which</em> tool to choose, especially when the tool set is large. Protocol&#8209;based layers such as Model Context Protocol (MCP) attempt to standardize tool access, but even for them, there is a natural limit. It&#8217;s less than 50.</p><p>In enterprise setti&#8230;</p>
      <p>
          <a href="https://jdsemrau.substack.com/p/discovering-apis-with-knowledge-graphs">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[How should Agentic User Experience Work?]]></title><description><![CDATA[Future of Work with OpenAI Atlas, Perplexity Alpha, Claude Skills, and News Explainer]]></description><link>https://jdsemrau.substack.com/p/how-should-agentic-user-experience</link><guid isPermaLink="false">https://jdsemrau.substack.com/p/how-should-agentic-user-experience</guid><dc:creator><![CDATA[Jan Daniel Semrau (MFin, CAIO)]]></dc:creator><pubDate>Mon, 27 Oct 2025 12:37:26 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!F_-k!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0351a5f6-f32c-456a-aa9d-ff7d508ad1b0_1344x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>It always fascinated me how LaForge in Star Trek: TNG could easily recalibrate the Warp Drive using only a selection of standard interfaces, a highly efficient &#8220;<em>computer</em>&#8217;, and a small selection of hand tools. Considering where we are in our current timeline, what if he worked with hyper-efficient AI agents (<a href="https://memory-alpha.fandom.com/wiki/Nanite">nanites 3.0</a>) to support him in his work? </p><p>When &#8230;</p>
      <p>
          <a href="https://jdsemrau.substack.com/p/how-should-agentic-user-experience">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Software That Builds Itself]]></title><description><![CDATA[Code as Interfaces. AI Agents transform the Software Development Life Cycle.]]></description><link>https://jdsemrau.substack.com/p/software-that-builds-itself</link><guid isPermaLink="false">https://jdsemrau.substack.com/p/software-that-builds-itself</guid><dc:creator><![CDATA[Jan Daniel Semrau (MFin, CAIO)]]></dc:creator><pubDate>Wed, 08 Oct 2025 11:37:22 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!FuFr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884aa1df-c008-478c-bc8b-7c49b2416e4a_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I have built a lot of software in my life. </p><p>Some of it more successfully. Some less so. Some of my work was just my own, but some was deployed across several countries in multi-hundred-billion-dollar businesses. </p><p>Over the last weeks and months, gradually but surely, code-generating agents have entered my software life cycle. And I think that once I have f&#8230;</p>
      <p>
          <a href="https://jdsemrau.substack.com/p/software-that-builds-itself">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[On Retrieving Relevant Information ]]></title><description><![CDATA[Strategies for better knowledge mining: From simple Documents to Agentic RAG and what lies ahead.]]></description><link>https://jdsemrau.substack.com/p/on-retrieving-data</link><guid isPermaLink="false">https://jdsemrau.substack.com/p/on-retrieving-data</guid><dc:creator><![CDATA[Jan Daniel Semrau (MFin, CAIO)]]></dc:creator><pubDate>Mon, 29 Sep 2025 11:37:30 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!bpwK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26867e9e-b4b2-4d3a-8315-78d542346c66_960x540.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I have never been quite satisfied with Retrieval-Augmented Generation (RAG).  There is always this lingering doubt in the back of your head that some important detail might have slipped through the chunks. For example, you might read a research paper, but your &#8220;<em>&#949;&#8021;&#961;&#951;&#954;&#945;</em>&#8221; moment might be completely different than mine. Personalized context, as extracted fr&#8230;</p>
      <p>
          <a href="https://jdsemrau.substack.com/p/on-retrieving-data">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Modular Financial Agent Systems (II)]]></title><description><![CDATA[Engineering Automated Coordination Cognitive Financial Architectures]]></description><link>https://jdsemrau.substack.com/p/modular-financial-agent-systems-ii</link><guid isPermaLink="false">https://jdsemrau.substack.com/p/modular-financial-agent-systems-ii</guid><dc:creator><![CDATA[Jan Daniel Semrau (MFin, CAIO)]]></dc:creator><pubDate>Mon, 22 Sep 2025 11:51:15 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!F77x!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F019b5357-5b7f-451e-85df-e9c256fcecf8_1344x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Continuing from <a href="https://jdsemrau.substack.com/p/modular-financial-agent-systems-i">Post 1 </a>in my series on modular financial agent systems, I finally found time to work on the second part of the series. Maybe some personal announcement first. Over the last 4 weeks, I have been in Europe advising on Agentic AI for some of the top Banks in the world. Since this kept me really busy, my posts have been tardy, and I apologiz&#8230;</p>
      <p>
          <a href="https://jdsemrau.substack.com/p/modular-financial-agent-systems-ii">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[The Ultimate Guide to Visual Language Action Models (VLAM)]]></title><description><![CDATA[Can AI truly understand the physical world? Physical Reasoning from Video.]]></description><link>https://jdsemrau.substack.com/p/visual-language-action-models</link><guid isPermaLink="false">https://jdsemrau.substack.com/p/visual-language-action-models</guid><dc:creator><![CDATA[Jan Daniel Semrau (MFin, CAIO)]]></dc:creator><pubDate>Mon, 08 Sep 2025 11:37:22 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!mWVa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9814cd25-3e26-41a6-afb8-15165010cb44_1280x680.gif" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>For the last quarter of a century, we have developed AI solutions for low-risk environments like recommendation engines, content moderation, or ad-targeting. These systems perform well in constrained digital environments, where errors remain contained and carry minimal cost. Deploying in the physical world is substantially more complex, as mistakes propagate into real systems, producing safety, reliability, and operational risks. And if we want to conquer space and the ocean floor, we need to develop this capability for all of humanity. </p><p>Recently, we have seen physical AI models misbehaving when interacting with the real world. 
</p><ol><li><p>Industrial Humanoid Robot (Unitree H1) <a href="http://google.com/search?q=Unitree+H1)+Goes+Berserk&amp;oq=Unitree+H1)+Goes+Berserk&amp;gs_lcrp=EgZjaHJvbWUyBggAEEUYOTIICAEQABgWGB4yCAgCEAAYFhgeMggIAxAAGBYYHjIICAQQABgWGB4yDQgFEAAYhgMYgAQYigUyDQgGEAAYhgMYgAQYigUyDQgHEAAYhgMYgAQYigUyDQgIEAAYhgMYgAQYigUyDQgJEAAYhgMYgAQYigXSAQc2MzJqMGo3qAIAsAIA&amp;sourceid=chrome&amp;ie=UTF-8">Goes Berserk</a> During Testing in China</p></li><li><p>Tesla&#8217;s Robotaxi <a href="https://www.chron.com/culture/article/texas-tesla-robotaxis-austin-20373907.php">Going Rogue </a>in Austin.</p></li><li><p>The US Navy&#8217;s Autonomous Drone program <a href="https://hmintelligence.org/navys-autonomous-drone-fleet-faces-setbacks/">faces setbacks</a>.</p></li></ol><p>As companies like Pony.ai <a href="https://www.prnewswire.co.uk/news-releases/ponyai-mowasalat-karwa-partner-to-launch-autonomous-driving-pilot-in-qatars-doha-302547646.html">expand</a> around the world, and new entrants like <a href="https://www.automotivedive.com/news/tensor-unveils-personal-robocar-robotaxi-vinfast-level4-autonomous-ai/758015/">Tensor</a> Auto start building robot-taxis, the stakes are only ever increasing. The challenge these companies face, teaching artificial intelligence human-like real-world intuition, has long been a barrier in developing and especially scaling robotics and autonomous systems. Training robots has suffered from an inability to generalize for a very long time. And while there has been some progress (<a href="https://jdsemrau.substack.com/p/toyota-ri-lbm-is-for-robot-generalization">TRI</a>, <a href="https://jdsemrau.substack.com/p/x-is-for-deepminds-open-x-embodiment">X-Embodiments</a>), replicating the nuanced understanding we humans have of physical interactions, <a href="https://jdsemrau.substack.com/p/the-importance-of-object-permanence">object permanence</a>, gravity, or friction, remains unmatched.</p><p>But this might all change now. 
Because NVIDIA&#8217;s Cosmos <a href="https://huggingface.co/nvidia/Cosmos-Reason1-7B">Reason1-7B</a> just topped Huggingface&#8217;s <a href="https://developer.nvidia.com/blog/maximize-robotics-performance-by-post-training-nvidia-cosmos-reason/#:~:text=Cosmos%20Reason%20has%20topped%20Physical,special%20translator%20called%20a%20projector.">physical reasoning leaderboard</a>. Benchmark scores confirm a strong performance: an average score of 65.7 across key robotics and autonomous vehicle tasks, although we should take such benchmarks with a pinch of salt.</p><p>Let&#8217;s dive into the fantastic world of Vision-Language-Action models.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://jdsemrau.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://jdsemrau.substack.com/subscribe?"><span>Subscribe now</span></a></p><h4>Table of Contents</h4><ol><li><p>Architectures of Physical AI</p></li><li><p>Challenges of Physical AI</p></li><li><p>Visual Language Action Models in general</p></li><li><p>NVIDIA&#8217;s Cosmos World Foundation Model (WFM)</p></li><li><p>NVIDIA&#8217;s Cosmos-Reason1</p></li><li><p>Meta&#8217;s Vision Language World Model</p></li><li><p>Wayve&#8217;s Lingo-2</p></li><li><p>EO Robotics&#8217; EmbodiedOneVision</p></li></ol><h4>Architectures of Physical AI</h4><p>It&#8217;s nothing new that by now, large language models are really good at processing text. Some even display impressive multi-modal processing capabilities. For example, you can upload an image into ChatGPT and it will actually &#8220;<em>understand</em>&#8221; it. </p><p>But that does not mean that these models are capable of translating words into physical actions. A more sophisticated architecture is needed for this. 
</p><p>So, before we dive into VLA models, I will outline more explicitly how the architecture for building Physical AI systems works and what challenges this brings along with it. Then I will dive into the most recent models and papers published. </p><p>We begin, similar to my traditional autonomous agents definition, with sensors.</p><h5><strong>Sensor Layer</strong></h5><p>Sensors form the foundation of this architecture, as Physical AI relies on multiple sensor modalities to gather comprehensive environmental data. Multiple cameras provide visual information for object recognition and scene understanding from different angles. Although Tesla got rid of them, I still believe in radar and LiDAR. Radar offers reliable distance and velocity measurements in various weather conditions, while LiDAR generates 3D point clouds for spatial mapping. Other sensors include GPS, which provides global positioning data, inertial measurement units, and ultrasonic sensors. This multi-sensor approach unlocks redundancy and delivers robustness in data collection.</p><h5><strong>Perception layer</strong></h5><p>In the perception layer, the raw sensor data is processed into meaningful interpretations of the driving environment using specialized perception models and sensor fusion algorithms. </p><p>In traditional autonomous systems, we may deploy the following standard models:</p><p><strong>Specialized perception models</strong> are task-specific algorithms where lane detection uses techniques like Hough transforms or deep learning to identify road boundaries and lane markings, traffic light detection employs convolutional neural networks to recognize signal states and timing, and traffic sign detection interprets regulatory and warning signs through trained classification models. Object detection &amp; tracking identifies and monitors vehicles, pedestrians, and obstacles using models like YOLO or R-CNN variants, providing real-time recognition capabilities. 
Free space detection determines drivable areas through semantic segmentation or point cloud processing algorithms. </p><p><strong>Sensor fusion models</strong> like Kalman filters and <a href="https://jdsemrau.substack.com/p/cognitive-reasoning-agents-and-the">Extended Information filters</a> combine a variety of sensor data with high-definition maps to precisely determine the vehicle's position within the mapped environment, creating a more robust understanding than any single sensor could provide alone.</p><h5><strong>Planning layer</strong></h5><p>But autonomous driving is not only about understanding the &#8220;<em>now</em>&#8221;. Using the interpreted environmental data, we still need to know where we are going. </p><p>Thus, we need:</p><ul><li><p><strong>Route planning </strong>for determining the optimal path to the destination. </p></li><li><p><strong>Prediction algorithms</strong> for forecasting the future behavior of other road users. </p></li><li><p><strong>Behavior planning</strong> for deciding high-level maneuvers like lane changes or turns. </p><p>and finally</p></li><li><p><strong>Trajectory planning</strong> for calculating the specific path and speed profile the vehicle should follow, considering safety constraints and traffic rules.</p></li></ul><p>Trajectory planning in particular is where VLAMs shine. But I am getting ahead of myself. </p><h5><strong>Control Layer</strong></h5><p>The fourth and final stage translates planned trajectories into actual vehicle motion. PID controllers manage basic control loops for steering, throttle, and braking. Model predictive control provides more sophisticated control algorithms that anticipate future states. 
Other control systems may include stability control, traction management, or specialized actuator control systems that physically execute the planned maneuvers through the vehicle's mechanical systems.</p><h4>Challenges of Physical AI</h4><p>We don&#8217;t have easily scalable and trainable autonomous vehicles yet. Just an observation. We humans are incredibly good at judging driving situations we have not encountered yet. But that is also a function of age. There is a reason why we don&#8217;t let kids drive (besides their obvious height constraints).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!F7cd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F594ecc95-f242-45ce-9143-12e250d7861a_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!F7cd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F594ecc95-f242-45ce-9143-12e250d7861a_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!F7cd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F594ecc95-f242-45ce-9143-12e250d7861a_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!F7cd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F594ecc95-f242-45ce-9143-12e250d7861a_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!F7cd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F594ecc95-f242-45ce-9143-12e250d7861a_1536x1024.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!F7cd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F594ecc95-f242-45ce-9143-12e250d7861a_1536x1024.png" width="1456" height="971" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/594ecc95-f242-45ce-9143-12e250d7861a_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2572140,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://jdsemrau.substack.com/i/172400916?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F594ecc95-f242-45ce-9143-12e250d7861a_1536x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!F7cd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F594ecc95-f242-45ce-9143-12e250d7861a_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!F7cd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F594ecc95-f242-45ce-9143-12e250d7861a_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!F7cd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F594ecc95-f242-45ce-9143-12e250d7861a_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!F7cd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F594ecc95-f242-45ce-9143-12e250d7861a_1536x1024.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><h5>Problem 1: Edge cases are incredibly valuable. </h5><p>Data on edge cases is incredibly difficult and expensive to acquire. The rarer the event, the less data we have about it, and the more likely it is that the system misbehaves and causes physical harm. 
And as we can see with the fleet of Google Street View cars, physical data is costly to collect, curate, and label.</p><h5>Problem 2: The real world is a spatial and temporal environment bound by the laws of physics.</h5><p>Spatial data is also a hard problem: in a 3D environment, objects have different shapes, sizes, and distances, while the car&#8217;s sensors might perceive them in 2D as the same. They also have to comply with basic rules of physics. Reconciling multi-camera input adds further complexity. And a lot of what we observe is constantly moving: either relative to us, as we pass houses or factory floors, or on its own, like pedestrians, vehicles, or drones. Therefore, decisions can&#8217;t be just about &#8220;where things are now&#8221; but &#8220;where they will be in the next second, 5 seconds, or 30 seconds.&#8221; On top of that, prediction errors compound quickly, making safe planning even more difficult.</p><h5>Problem 3: Output modality needs to be incredibly accurate.</h5><p>Even when perception and predictions are accurate in a volatile environment, the system must translate that understanding into actions: steering, braking, accelerating, grasping, or rerouting, each with different latency, precision, and safety requirements. Getting the output modality right is as critical as getting the input right.</p><h5>Problem 4: Trust &amp; explainability in decision-making.</h5><p>Humans can explain their decisions with natural language. But having one human describe the actions of another human leads to little trust in the explanation. There is a reason why moderators at sports events are largely there for entertainment. This is known as the back-seat driver problem: when one agent tries to explain the decisions of another, the explanation rarely inspires trust. 
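The compounding prediction error from Problem 2 can be made concrete with a toy calculation. The numbers (a pedestrian walking at 1.5 m/s while curving 5 degrees per second, against a naive constant-velocity predictor) are assumptions chosen purely for illustration:

```python
import math

# Assumed scenario: a pedestrian walks at 1.5 m/s but curves 5 deg/s;
# the predictor wrongly assumes constant velocity along the initial heading.
speed, turn_rate = 1.5, math.radians(5)

def actual_position(t: float, dt: float = 0.1):
    """Numerically integrate the curved path the pedestrian really takes."""
    x = y = heading = 0.0
    for _ in range(int(t / dt)):
        x += speed * math.cos(heading) * dt
        y += speed * math.sin(heading) * dt
        heading += turn_rate * dt
    return x, y

def predicted_position(t: float):
    """Constant-velocity prediction along the initial heading."""
    return speed * t, 0.0

for horizon in (1, 5, 30):
    ax, ay = actual_position(horizon)
    px, py = predicted_position(horizon)
    err = math.hypot(ax - px, ay - py)
    print(f"{horizon:>2} s horizon -> {err:.1f} m prediction error")
```

Even this tiny, systematic heading error is barely visible at a 1-second horizon but grows to tens of meters at 30 seconds, which is why planners must continuously re-predict rather than trust long-horizon forecasts.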
</p><p>Btw., this also outlines the importance of agency in agentic decision making.</p><h4>Visual Language Action Models</h4><p>Visual Language Action Models (VLAMs) are AI systems that integrate visual perception, natural language understanding, and action planning to enable agents to interpret their environment, follow language instructions, and perform corresponding actions. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-9ZE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b572371-7d5d-4709-8e0d-98adddc5893a_1666x1456.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-9ZE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b572371-7d5d-4709-8e0d-98adddc5893a_1666x1456.png 424w, https://substackcdn.com/image/fetch/$s_!-9ZE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b572371-7d5d-4709-8e0d-98adddc5893a_1666x1456.png 848w, https://substackcdn.com/image/fetch/$s_!-9ZE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b572371-7d5d-4709-8e0d-98adddc5893a_1666x1456.png 1272w, https://substackcdn.com/image/fetch/$s_!-9ZE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b572371-7d5d-4709-8e0d-98adddc5893a_1666x1456.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-9ZE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b572371-7d5d-4709-8e0d-98adddc5893a_1666x1456.png" width="1456" 
height="1272" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7b572371-7d5d-4709-8e0d-98adddc5893a_1666x1456.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1272,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:711315,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://jdsemrau.substack.com/i/172400916?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b572371-7d5d-4709-8e0d-98adddc5893a_1666x1456.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-9ZE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b572371-7d5d-4709-8e0d-98adddc5893a_1666x1456.png 424w, https://substackcdn.com/image/fetch/$s_!-9ZE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b572371-7d5d-4709-8e0d-98adddc5893a_1666x1456.png 848w, https://substackcdn.com/image/fetch/$s_!-9ZE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b572371-7d5d-4709-8e0d-98adddc5893a_1666x1456.png 1272w, https://substackcdn.com/image/fetch/$s_!-9ZE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b572371-7d5d-4709-8e0d-98adddc5893a_1666x1456.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The benefit of integrating VLAMs into physical AI architectures is that the system can not only perceive its environment but also interpret complex scenarios, make contextual decisions, and potentially communicate its intentions&#8212;capabilities that mirror human-like driving behavior. NVIDIA&#8217;s Cosmos, for example, can generate photorealistic, physically based synthetic data for training such systems. </p><p>Traditional autonomous systems have relied on sensor fusion and rule-based decision-making. The use-cases for VLAM are different:</p><ol><li><p><strong>Data curation &amp; annotation:</strong> To reduce cost and increase quality, we need to automate high-quality dataset curation and annotation. 
VLAMs can interpret raw multimodal inputs, generate candidate labels, and validate them against schema constraints, reducing reliance on manual annotation.</p></li><li><p><strong>Robot planning &amp; reasoning:</strong> Planning and reasoning can guide deliberate, methodical decision-making with vision language action (VLA) models. VLAMs unify perception and instruction-following to translate natural language goals into structured action plans that remain robust under real-world uncertainty.</p></li><li><p><strong>Video analytics AI agents:</strong> Extract actionable insights and perform root-cause analysis on massive video datasets. VLAMs integrate visual understanding with temporal reasoning and language-based explanation, enabling both anomaly detection and human-readable causal narratives.</p></li></ol><p>The advancement here lies in the system's ability to understand nuanced real-world situations through language, with the goal of improving how vehicles handle unpredictable real-world conditions. For the transportation &amp; logistics industry, this will be a revolution. But for us normal mortals, it means safer and more reliable autonomous vehicles. </p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://jdsemrau.substack.com/p/visual-language-action-models?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://jdsemrau.substack.com/p/visual-language-action-models?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><div><hr></div><p>We begin with:</p><h4>NVIDIA&#8217;s Cosmos World Foundation Model (WFM)</h4><p>A couple of months ago, right after CES, I already touched on NVIDIA&#8217;s Cosmos World Foundation Model (<a href="https://jdsemrau.substack.com/p/nvidias-jensen-huang-on-agentic-ai">WFM</a>). 
I hold the belief that synthetic <a href="https://jdsemrau.substack.com/p/paper-review-beyond-a-better-planning">trajectory data</a> is an important step towards the generalization of autonomous movement. WFM builds on this concept by establishing a pre-trained (&#8220;vanilla&#8221;) model, not unlike traditional LLMs, trained on Internet-scale datasets providing broad time-frozen knowledge across many environments and tasks. However, to perform well in a <em>specific</em> real-world application, such as accurately guiding a robot on a factory floor or safely navigating an autonomous vehicle through traffic, general capability is not good enough. We must fine-tune them in post-training on a smaller, targeted dataset collected from their particular environments. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3gOG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e13eda0-03e8-4a20-b01e-c889c9d0d697_1934x1060.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3gOG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e13eda0-03e8-4a20-b01e-c889c9d0d697_1934x1060.png 424w, https://substackcdn.com/image/fetch/$s_!3gOG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e13eda0-03e8-4a20-b01e-c889c9d0d697_1934x1060.png 848w, https://substackcdn.com/image/fetch/$s_!3gOG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e13eda0-03e8-4a20-b01e-c889c9d0d697_1934x1060.png 1272w, 
https://substackcdn.com/image/fetch/$s_!3gOG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e13eda0-03e8-4a20-b01e-c889c9d0d697_1934x1060.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3gOG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e13eda0-03e8-4a20-b01e-c889c9d0d697_1934x1060.png" width="1456" height="798" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9e13eda0-03e8-4a20-b01e-c889c9d0d697_1934x1060.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:798,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:221501,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://jdsemrau.substack.com/i/172400916?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e13eda0-03e8-4a20-b01e-c889c9d0d697_1934x1060.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!3gOG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e13eda0-03e8-4a20-b01e-c889c9d0d697_1934x1060.png 424w, https://substackcdn.com/image/fetch/$s_!3gOG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e13eda0-03e8-4a20-b01e-c889c9d0d697_1934x1060.png 848w, 
https://substackcdn.com/image/fetch/$s_!3gOG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e13eda0-03e8-4a20-b01e-c889c9d0d697_1934x1060.png 1272w, https://substackcdn.com/image/fetch/$s_!3gOG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e13eda0-03e8-4a20-b01e-c889c9d0d697_1934x1060.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This process produces a post-trained, specialized WFM: a model that retains the broad capabilities of its foundation training while 
being adapted to go deep into a specific Physical AI use-case. WFM in its current (Fall 2025) iteration has evolved into a suite of open, physics-aware generative AI models, including diffusion and autoregressive transformer architectures, designed to simulate and predict future states of virtual environments (via video) from inputs like text, images, or past frames.</p><p>These models may also augment vehicle-to-infrastructure communication, enabling more efficient traffic management and coordinated mobility systems.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!orA7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4462631d-70a4-4514-813f-6a4586d954fa_1958x578.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!orA7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4462631d-70a4-4514-813f-6a4586d954fa_1958x578.png 424w, https://substackcdn.com/image/fetch/$s_!orA7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4462631d-70a4-4514-813f-6a4586d954fa_1958x578.png 848w, https://substackcdn.com/image/fetch/$s_!orA7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4462631d-70a4-4514-813f-6a4586d954fa_1958x578.png 1272w, https://substackcdn.com/image/fetch/$s_!orA7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4462631d-70a4-4514-813f-6a4586d954fa_1958x578.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!orA7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4462631d-70a4-4514-813f-6a4586d954fa_1958x578.png" width="1456" height="430" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4462631d-70a4-4514-813f-6a4586d954fa_1958x578.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:430,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!orA7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4462631d-70a4-4514-813f-6a4586d954fa_1958x578.png 424w, https://substackcdn.com/image/fetch/$s_!orA7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4462631d-70a4-4514-813f-6a4586d954fa_1958x578.png 848w, https://substackcdn.com/image/fetch/$s_!orA7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4462631d-70a4-4514-813f-6a4586d954fa_1958x578.png 1272w, https://substackcdn.com/image/fetch/$s_!orA7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4462631d-70a4-4514-813f-6a4586d954fa_1958x578.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft 
pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Within WFM, there are specialized models. Cosmos-Reason1 is one of them.</p><h4>Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning</h4><p>NVIDIA's <a href="https://huggingface.co/nvidia/Cosmos-Reason1-7B">Cosmos Reason1-7B</a> model aims to be an evolution in the domain of Physical AI through embodied reasoning. Embodied reasoning refers to the ability of an AI or robotic system to ground its reasoning and decision-making in a physical context, using sensory input and interactions with the environment. So, instead of abstract reasoning alone, it integrates perception, action, and feedback to plan and adapt in real-world settings. 
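A minimal perceive-reason-act loop illustrates the idea of embodied reasoning in code. Everything here, from the function names to the rule-based "reasoner", is a hypothetical stand-in of mine and not Cosmos-Reason1's actual interface, which reasons in language space rather than with hand-written rules:

```python
# Toy perceive-reason-act loop; all names and rules are hypothetical stand-ins.
def perceive(video_frame: dict) -> dict:
    """Stand-in for the vision encoder: extract a symbolic scene description."""
    return {"objects": video_frame["objects"], "surface": video_frame["surface"]}

def reason(state: dict, instruction: str) -> str:
    """Stand-in for embodied reasoning: ground the instruction in the scene."""
    if "cup" in state["objects"] and instruction == "pick up the cup":
        # Physical common sense: a wet surface calls for a slower approach.
        return "grasp_slowly" if state["surface"] == "wet" else "grasp"
    return "search"  # target not visible: keep looking instead of acting

def act(decision: str) -> dict:
    """Stand-in for the action head: emit an actuator command."""
    return {"command": decision,
            "gripper_speed": 0.2 if "slowly" in decision else 0.8}

frame = {"objects": ["cup", "table"], "surface": "wet"}
print(act(reason(perceive(frame), "pick up the cup")))
```

The key property being mimicked is the feedback of perceived physical context (the wet surface) into the decision, rather than executing the instruction blindly.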
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!r5YA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16258118-78d6-43b7-a46f-a3011624e118_1662x982.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!r5YA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16258118-78d6-43b7-a46f-a3011624e118_1662x982.png 424w, https://substackcdn.com/image/fetch/$s_!r5YA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16258118-78d6-43b7-a46f-a3011624e118_1662x982.png 848w, https://substackcdn.com/image/fetch/$s_!r5YA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16258118-78d6-43b7-a46f-a3011624e118_1662x982.png 1272w, https://substackcdn.com/image/fetch/$s_!r5YA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16258118-78d6-43b7-a46f-a3011624e118_1662x982.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!r5YA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16258118-78d6-43b7-a46f-a3011624e118_1662x982.png" width="1456" height="860" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/16258118-78d6-43b7-a46f-a3011624e118_1662x982.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:860,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:239641,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://jdsemrau.substack.com/i/172400916?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16258118-78d6-43b7-a46f-a3011624e118_1662x982.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!r5YA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16258118-78d6-43b7-a46f-a3011624e118_1662x982.png 424w, https://substackcdn.com/image/fetch/$s_!r5YA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16258118-78d6-43b7-a46f-a3011624e118_1662x982.png 848w, https://substackcdn.com/image/fetch/$s_!r5YA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16258118-78d6-43b7-a46f-a3011624e118_1662x982.png 1272w, https://substackcdn.com/image/fetch/$s_!r5YA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16258118-78d6-43b7-a46f-a3011624e118_1662x982.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg 
role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Cosmos-Reason1 employs a multimodal decoder-only architecture similar to<a href="https://arxiv.org/abs/2304.08485"> LLaVA</a>, available in two model sizes: Cosmos-Reason1-7B and Cosmos-Reason1-56B. The architecture follows a sequential pipeline where the input video is processed through a vision encoder, then a projector that aligns visual tokens with text embeddings, before being fed into the LLM backbone. 
The larger 56B model incorporates a hybrid Mamba-MLP-Transformer architecture to improve efficiency in handling long sequences, combining the linear-time complexity benefits of Mamba's selective state space models with Transformer layers for comprehensive long-context modeling.</p><p>As a simple step-by-step workflow, Cosmos-Reason1 works like this:</p><ol><li><p>It accepts the video input,</p></li><li><p>It converts the video into visual tokens aligned with the language model&#8217;s embedding space, and then</p></li><li><p>It reasons about the scene through <strong>long chain-of-thought</strong> thinking before generating a response in language space.</p></li></ol><p>This response is returned in natural language and includes both explanatory insights and embodied decisions (pose, trajectory, etc.). In my opinion, what makes Cosmos-Reason1 unique compared with traditional VLMs is its ability to infer and reason using physical common-sense knowledge.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PIki!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d7f3f2-87e5-46d5-8baf-c54d557d8561_1670x702.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PIki!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d7f3f2-87e5-46d5-8baf-c54d557d8561_1670x702.png 424w, https://substackcdn.com/image/fetch/$s_!PIki!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d7f3f2-87e5-46d5-8baf-c54d557d8561_1670x702.png 848w, 
https://substackcdn.com/image/fetch/$s_!PIki!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d7f3f2-87e5-46d5-8baf-c54d557d8561_1670x702.png 1272w, https://substackcdn.com/image/fetch/$s_!PIki!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d7f3f2-87e5-46d5-8baf-c54d557d8561_1670x702.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PIki!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d7f3f2-87e5-46d5-8baf-c54d557d8561_1670x702.png" width="1456" height="612" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/06d7f3f2-87e5-46d5-8baf-c54d557d8561_1670x702.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:612,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:126730,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://jdsemrau.substack.com/i/172400916?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d7f3f2-87e5-46d5-8baf-c54d557d8561_1670x702.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PIki!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d7f3f2-87e5-46d5-8baf-c54d557d8561_1670x702.png 424w, 
https://substackcdn.com/image/fetch/$s_!PIki!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d7f3f2-87e5-46d5-8baf-c54d557d8561_1670x702.png 848w, https://substackcdn.com/image/fetch/$s_!PIki!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d7f3f2-87e5-46d5-8baf-c54d557d8561_1670x702.png 1272w, https://substackcdn.com/image/fetch/$s_!PIki!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d7f3f2-87e5-46d5-8baf-c54d557d8561_1670x702.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
</line></svg></button></div></div></div></a></figure></div><p>So what&#8217;s the magic here?</p><p>Statistical models have no true &#8220;<em>understanding</em>&#8221; of the world they exist in. Humans and animals acquire physical common sense through passive observation of the world around us. For example, <a href="https://www.sciencedirect.com/science/article/abs/pii/S0010027717301117">infants</a> can understand basic concepts such as <a href="https://jdsemrau.substack.com/p/the-importance-of-object-permanence">object permanence </a>and gravity within a few months of birth. To develop this understanding computationally, models must be trained on expensive human-annotated datasets that capture object interactions, spatial relationships, and causal dynamics, enabling them to learn predictive patterns about how objects behave in three-dimensional space through labeled examples of physical phenomena. </p><p>However, unlike passive understanding, reasoning in embodied AI must be grounded in action. That is, reasoning emerges through the coupling of perception and control layers, where the system learns by acting on the environment, observing outcomes, and adjusting its internal models accordingly. </p><p>As the real world is temporal, spatial, and bound by the laws of physics (Problem 2), Reason1&#8217;s &#8220;common sense&#8221; is grounded in an ontology of three main categories. </p><p>These categories are shown in the image below. 
<replacement-note-removed></replacement-note-removed>
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!s6G0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce9bc037-5bbc-4f4f-b25e-1a1cd7e9b708_1666x884.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!s6G0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce9bc037-5bbc-4f4f-b25e-1a1cd7e9b708_1666x884.png 424w, https://substackcdn.com/image/fetch/$s_!s6G0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce9bc037-5bbc-4f4f-b25e-1a1cd7e9b708_1666x884.png 848w, https://substackcdn.com/image/fetch/$s_!s6G0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce9bc037-5bbc-4f4f-b25e-1a1cd7e9b708_1666x884.png 1272w, https://substackcdn.com/image/fetch/$s_!s6G0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce9bc037-5bbc-4f4f-b25e-1a1cd7e9b708_1666x884.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!s6G0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce9bc037-5bbc-4f4f-b25e-1a1cd7e9b708_1666x884.png" width="728" height="386.5" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ce9bc037-5bbc-4f4f-b25e-1a1cd7e9b708_1666x884.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:773,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:181335,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://jdsemrau.substack.com/i/172400916?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce9bc037-5bbc-4f4f-b25e-1a1cd7e9b708_1666x884.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!s6G0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce9bc037-5bbc-4f4f-b25e-1a1cd7e9b708_1666x884.png 424w, https://substackcdn.com/image/fetch/$s_!s6G0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce9bc037-5bbc-4f4f-b25e-1a1cd7e9b708_1666x884.png 848w, https://substackcdn.com/image/fetch/$s_!s6G0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce9bc037-5bbc-4f4f-b25e-1a1cd7e9b708_1666x884.png 1272w, https://substackcdn.com/image/fetch/$s_!s6G0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce9bc037-5bbc-4f4f-b25e-1a1cd7e9b708_1666x884.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This enables the model, and by extension the robots it steers, not only to comprehend what they currently observe but also to plan intelligently ahead. </p><p>But there is more to it. </p><p>Specifically, reasoning in physical space requires the capability to:</p><ul><li><p><strong>Process complex sensory inputs.</strong> Unlike <a href="https://jdsemrau.substack.com/p/grounded-symbolic-reasoning-sense-symbolize-plan-act">symbolic reasoning</a> that relies on clean, structured data, embodied reasoning must parse noisy, partial, and ambiguous multimodal signals to extract reliable patterns for decision-making.</p></li><li><p><strong>Predict action effects.</strong> Every action alters the environment, so embodied reasoning requires an internal model of cause-and-effect. 
E.g., how objects respond to force, how a robot&#8217;s kinematics interact with terrain, or how vehicle dynamics shift under changing conditions.</p></li><li><p><strong>Respect physical constraints.</strong> Beyond abstract optimization, embodied reasoning must respect real-world physics, inertia, friction, and material limits while producing long-horizon plans that are both safe and efficient in execution.</p></li><li><p><strong>Learn from interaction.</strong> Actions generate feedback loops, and embodied reasoning depends on integrating this feedback to refine policies over time, enabling continuous improvement and dynamic adaptation to changing environments.</p></li></ul><p>Here are some examples from the paper. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NWDK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31255be4-4a81-4207-b281-5007289f3360_1686x798.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NWDK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31255be4-4a81-4207-b281-5007289f3360_1686x798.png 424w, https://substackcdn.com/image/fetch/$s_!NWDK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31255be4-4a81-4207-b281-5007289f3360_1686x798.png 848w, https://substackcdn.com/image/fetch/$s_!NWDK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31255be4-4a81-4207-b281-5007289f3360_1686x798.png 1272w, 
https://substackcdn.com/image/fetch/$s_!NWDK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31255be4-4a81-4207-b281-5007289f3360_1686x798.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NWDK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31255be4-4a81-4207-b281-5007289f3360_1686x798.png" width="1456" height="689" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/31255be4-4a81-4207-b281-5007289f3360_1686x798.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:689,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:261364,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://jdsemrau.substack.com/i/172400916?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31255be4-4a81-4207-b281-5007289f3360_1686x798.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!NWDK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31255be4-4a81-4207-b281-5007289f3360_1686x798.png 424w, https://substackcdn.com/image/fetch/$s_!NWDK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31255be4-4a81-4207-b281-5007289f3360_1686x798.png 848w, 
https://substackcdn.com/image/fetch/$s_!NWDK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31255be4-4a81-4207-b281-5007289f3360_1686x798.png 1272w, https://substackcdn.com/image/fetch/$s_!NWDK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31255be4-4a81-4207-b281-5007289f3360_1686x798.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Cosmos-Reason1 introduces several key innovations for the advancement of physical AI reasoning capabilities. 
The system establishes hierarchical physical reasoning ontologies that systematically organize capabilities across space, time, and fundamental physics domains, providing a structured framework for understanding and evaluating physical common sense. </p><p>It incorporates self-supervised intuitive physics tasks such as spatial puzzles, arrow-of-time detection, and object permanence testing that are both scalable to generate and verifiable through rule-based rewards. The framework enables cross-embodiment reasoning by applying unified reasoning principles across diverse physical agents, including humans, robotic arms, humanoid robots, and autonomous vehicles. </p><p>Finally, the system features a novel asynchronous reinforcement learning infrastructure with fault tolerance and dynamic scaling capabilities, achieving approximately 160% improvement in training efficiency compared to traditional colocated frameworks while maintaining robustness through automated recovery and resource management.</p><h4>Meta&#8217;s Vision Language World Model</h4><p>Meta&#8217;s Vision Language World Model (VLWM) is designed to enable AI agents to plan and reason about actions using natural video inputs. The &#8220;<em>world model</em>&#8221; facilitates this by allowing agents to internally optimize action plans, reducing the reliance on exhaustive trial-and-error in real environments.</p><p>Given a video, VLWM aims to extract a structured language representation consisting of two key components:</p><ol><li><p><strong>Goal</strong>: A description and interpretation of the desired outcome.</p></li><li><p><strong>Procedural Plan</strong>: An action-state sequence outlining the steps to achieve the goal.</p></li></ol><p>For such a video-to-text extraction task, a straightforward approach might involve providing a Vision-Language Model (VLM) with the full video and prompting it to extract these language representations. 
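Such a single-pass extraction might look like the sketch below; `vlm_generate`, the prompt, and the returned fields are all hypothetical stand-ins for a real VLM call, not VLWM's actual API:

```python
# Sketch of the naive single-pass approach: hand the whole video to one VLM
# and prompt it for the goal-plan structure in a single call.
from dataclasses import dataclass, field

@dataclass
class GoalPlan:
    goal: str                                  # desired outcome, interpreted from the video
    plan: list = field(default_factory=list)   # alternating action / world-state steps

def vlm_generate(video, prompt):
    # Hypothetical stand-in: a real VLM would consume every frame at full
    # resolution here, which is what runs into the "impossible triangle".
    return {"goal": "cook scrambled eggs",
            "plan": ["crack eggs", "eggs are in the bowl", "whisk", "eggs are mixed"]}

def extract_goal_plan(video):
    raw = vlm_generate(video, "Describe the goal and the action-state plan.")
    return GoalPlan(goal=raw["goal"], plan=raw["plan"])

gp = extract_goal_plan(video=["frame"] * 1000)
```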
However, this approach encounters the "impossible triangle": achieving high spatial resolution for fine-grained perception, maintaining a long temporal horizon that spans many procedural steps, and utilizing a large and intelligent VLM capable of following complex instructions.</p><p>To address this challenge, VLWM employs a two-stage strategy:</p><p><strong>1. Compression into a Dense Tree of Captions</strong>: The input video is compressed into a dense Tree of Captions, significantly reducing data volume while preserving essential semantic information.</p><p><strong>2. Extraction of Structured Goal-Plan Representations</strong>: Structured goal-plan representations are extracted from these captions using large language models (LLMs). Since this stage operates purely on text, it enables efficient processing with large LLMs and allows for iterative quality refinement through Self-Refine techniques.</p><p>This approach enhances the efficiency of video understanding and sets a new standard in AI's capability to plan and reason based on visual data.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!etm4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f03acf7-cc79-4678-801a-5ad561751deb_1892x876.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!etm4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f03acf7-cc79-4678-801a-5ad561751deb_1892x876.png 424w, https://substackcdn.com/image/fetch/$s_!etm4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f03acf7-cc79-4678-801a-5ad561751deb_1892x876.png 848w, 
https://substackcdn.com/image/fetch/$s_!etm4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f03acf7-cc79-4678-801a-5ad561751deb_1892x876.png 1272w, https://substackcdn.com/image/fetch/$s_!etm4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f03acf7-cc79-4678-801a-5ad561751deb_1892x876.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!etm4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f03acf7-cc79-4678-801a-5ad561751deb_1892x876.png" width="1456" height="674" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9f03acf7-cc79-4678-801a-5ad561751deb_1892x876.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:674,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:442363,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://jdsemrau.substack.com/i/172400916?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f03acf7-cc79-4678-801a-5ad561751deb_1892x876.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!etm4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f03acf7-cc79-4678-801a-5ad561751deb_1892x876.png 424w, 
https://substackcdn.com/image/fetch/$s_!etm4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f03acf7-cc79-4678-801a-5ad561751deb_1892x876.png 848w, https://substackcdn.com/image/fetch/$s_!etm4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f03acf7-cc79-4678-801a-5ad561751deb_1892x876.png 1272w, https://substackcdn.com/image/fetch/$s_!etm4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f03acf7-cc79-4678-801a-5ad561751deb_1892x876.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>What makes VLWM unique is its dual-mode planning system that mirrors the psychological distinction between fast, intuitive thinking and slow, deliberative reasoning. Reminds me of <a href="https://jdsemrau.substack.com/p/hierarchical-reasoning-model">HRM</a>. </p><p><strong>System-1 Planning</strong> operates as the model's reactive mode, generating plans through direct autoregressive text completion. When presented with visual context and a goal description, VLWM first interprets the goal by predicting both the initial world state ("Now, the kitchen is set up with necessary ingredients...") and the expected final state needed for achievement ("To achieve the goal, the eggs need to be cooked and mixed with tomatoes..."). The model then generates an interleaved sequence of actions and world state changes in a single forward pass, where each action, like "<em>Crack eggs into a bowl and whisk them together,</em>" is immediately followed by detailed descriptions of environmental transformations. These world state descriptions function as internal reasoning chains that help track task progress, but the autoregressive nature creates a critical limitation&#8212;once an action token is generated, it becomes irreversible, potentially leading to error accumulation in complex scenarios.</p><p><strong>System-2 Planning</strong> addresses these limitations by introducing reflective reasoning through internal trial-and-error with the learned world model. Instead of committing to a single action sequence, System-2 generates multiple candidate plans and uses VLWM to simulate their effects, predicting the resulting world states for each possibility. 
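A minimal sketch of this generate-simulate-select loop, assuming toy stand-ins for the world-model rollout and the critic's cost function:

```python
# Sketch of System-2 planning: propose candidate plans, roll each one forward
# with the world model, score the predicted end state against the goal, and
# keep the cheapest plan. `simulate` and `critic_cost` are hypothetical
# stand-ins for the VLWM rollout and the learned critic.

def simulate(state, plan):
    """World-model rollout: predict the state after executing the plan."""
    return state | set(plan)          # toy dynamics: actions simply add facts

def critic_cost(predicted_state, goal_state):
    """Semantic distance between predicted outcome and goal (lower is better)."""
    return len(goal_state - predicted_state)

def system2_plan(state, goal_state, candidate_plans):
    scored = [(critic_cost(simulate(state, p), goal_state), p) for p in candidate_plans]
    return min(scored)[1]             # plan with the lowest predicted cost

best = system2_plan(
    state={"kitchen_ready"},
    goal_state={"eggs_cooked", "eggs_mixed"},
    candidate_plans=[["eggs_cooked"], ["eggs_cooked", "eggs_mixed"], ["toast_made"]],
)
```

The search never commits to a token the way System-1 does; a bad candidate is simply discarded after its simulated rollout scores poorly.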
A separately trained critic module&#8212;a 1B parameter language model&#8212;evaluates the semantic distance between these predicted outcomes and the desired goal state, assigning costs that reflect how well each candidate plan advances toward the objective. The system then selects the action sequence with the lowest predicted cost, effectively performing reasoning by searching through the space of possible futures. This critic is trained through self-supervised learning using <a href="https://arxiv.org/abs/2307.04923">ranking constraints</a> that teach it to assign lower costs to meaningful progress and higher costs to irrelevant or procedurally incorrect actions, enabling the model to optimize plans without requiring explicit reward annotations.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rCeG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdadf43c9-32a8-4116-b835-126cdd098a6c_2266x592.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rCeG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdadf43c9-32a8-4116-b835-126cdd098a6c_2266x592.png 424w, https://substackcdn.com/image/fetch/$s_!rCeG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdadf43c9-32a8-4116-b835-126cdd098a6c_2266x592.png 848w, https://substackcdn.com/image/fetch/$s_!rCeG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdadf43c9-32a8-4116-b835-126cdd098a6c_2266x592.png 1272w, 
https://substackcdn.com/image/fetch/$s_!rCeG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdadf43c9-32a8-4116-b835-126cdd098a6c_2266x592.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rCeG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdadf43c9-32a8-4116-b835-126cdd098a6c_2266x592.png" width="1456" height="380" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dadf43c9-32a8-4116-b835-126cdd098a6c_2266x592.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:380,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:350683,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://jdsemrau.substack.com/i/172400916?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdadf43c9-32a8-4116-b835-126cdd098a6c_2266x592.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rCeG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdadf43c9-32a8-4116-b835-126cdd098a6c_2266x592.png 424w, https://substackcdn.com/image/fetch/$s_!rCeG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdadf43c9-32a8-4116-b835-126cdd098a6c_2266x592.png 848w, 
https://substackcdn.com/image/fetch/$s_!rCeG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdadf43c9-32a8-4116-b835-126cdd098a6c_2266x592.png 1272w, https://substackcdn.com/image/fetch/$s_!rCeG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdadf43c9-32a8-4116-b835-126cdd098a6c_2266x592.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The critic&#8217;s training data is built in two ways. First, it constructs training samples by taking base partial trajectories and appending either valid next steps from coherent task 
continuations or distractor steps sampled from unrelated tasks, training the model to satisfy ranking constraints where valid continuations receive lower costs than irrelevant additions. Second, it generates negative samples by randomly shuffling steps in base trajectories, teaching the critic to assign higher costs to procedurally incorrect sequences and ensuring sensitivity to temporal coherence. This dual approach enables the critic to distinguish meaningful progress from both irrelevant distractors and temporally disordered actions. The architecture benefits from VLWM's video-text formulation, which allows initialization from pretrained vision-language models like PerceptionLM-8B, inheriting strong visual perception along with the language understanding and commonsense knowledge of large language models. The language-based world state representation also provides computational efficiency and interpretability compared to pixel-based generative approaches.</p><h4>Wayve&#8217;s LINGO-2 - A Closed-Loop Vision-Language-Action Model</h4><p>Wayve's LINGO-2 is a closed-loop Vision-Language-Action Model (VLAM) that integrates visual perception, natural language processing, and driving control to provide real-time explanations of autonomous driving decisions. Unlike its predecessor, LINGO-1, which operated as an open-loop system offering commentary without influencing vehicle behavior, LINGO-2 combines vision and language inputs to generate both driving actions and explanatory text. This integration allows the model to adapt its behavior based on natural language prompts and provide continuous commentary on its decision-making process, enhancing transparency and trust in autonomous driving systems.</p><p>The architecture of LINGO-2 comprises two primary components: a vision model and an auto-regressive language model. 
The vision model processes sequences of camera images into tokens, which, along with additional contextual information such as route, speed, and speed limits, are fed into the language model. The language model then predicts both a driving trajectory and corresponding commentary text. The vehicle's controller executes the predicted trajectory, enabling real-time interaction and adaptation to dynamic driving scenarios.</p><p>A distinctive feature of LINGO-2 is its ability to respond to natural language instructions and queries during operation. For instance, passengers can ask the system about traffic light statuses or request specific maneuvers, and LINGO-2 will adjust its actions accordingly while providing explanations for its decisions. This closed-loop interaction between vision, language, and action represents a significant advancement in autonomous driving technology, offering a more intuitive and interpretable interface for users.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!i-rm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46b94a02-8d66-4ee0-b28c-75fe4018f20f_1536x750.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!i-rm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46b94a02-8d66-4ee0-b28c-75fe4018f20f_1536x750.png 424w, https://substackcdn.com/image/fetch/$s_!i-rm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46b94a02-8d66-4ee0-b28c-75fe4018f20f_1536x750.png 848w, 
https://substackcdn.com/image/fetch/$s_!i-rm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46b94a02-8d66-4ee0-b28c-75fe4018f20f_1536x750.png 1272w, https://substackcdn.com/image/fetch/$s_!i-rm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46b94a02-8d66-4ee0-b28c-75fe4018f20f_1536x750.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!i-rm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46b94a02-8d66-4ee0-b28c-75fe4018f20f_1536x750.png" width="1456" height="711" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/46b94a02-8d66-4ee0-b28c-75fe4018f20f_1536x750.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:711,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:243430,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://jdsemrau.substack.com/i/172400916?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46b94a02-8d66-4ee0-b28c-75fe4018f20f_1536x750.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!i-rm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46b94a02-8d66-4ee0-b28c-75fe4018f20f_1536x750.png 424w, 
https://substackcdn.com/image/fetch/$s_!i-rm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46b94a02-8d66-4ee0-b28c-75fe4018f20f_1536x750.png 848w, https://substackcdn.com/image/fetch/$s_!i-rm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46b94a02-8d66-4ee0-b28c-75fe4018f20f_1536x750.png 1272w, https://substackcdn.com/image/fetch/$s_!i-rm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46b94a02-8d66-4ee0-b28c-75fe4018f20f_1536x750.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>How does LINGO-2 work?</em></p><p>LINGO-2 &#8220;<a href="https://wayve.ai/thinking/lingo-2-driving-with-language/">Driving with language</a>&#8221; integrates vision, language, and action into a unified framework. It comprises two primary modules: a vision model that processes camera images from consecutive timestamps into a sequence of tokens, and an auto-regressive language model that predicts a driving trajectory and generates commentary text. These components work in tandem to enable real-time interaction between the vehicle and its environment, facilitating both driving behavior and explanatory dialogue.</p><p>The integration of language and driving behavior in LINGO-2 introduces several capabilities that enhance human-vehicle interaction and trust in autonomous systems. Passengers may, in the future, issue natural language commands such as "turn right" or "find a parking spot," prompting the vehicle to adapt its behavior accordingly (though the latter request is more plausible than the former). Additionally, the system can provide real-time debugging explanations of its driving decisions, answering questions like "Why did you slow down?" or "What is the current speed limit?" 
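</p><p>To make the closed loop concrete, below is a minimal, hypothetical sketch of one control step. Every name here (Observation, drive_step, the canned trajectory and commentary) is an illustrative stand-in, not Wayve&#8217;s actual API; a real system would run learned networks where these stubs return fixed values.</p>

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the two LINGO-2 components described above.
# Names and canned outputs are illustrative only, not Wayve's API.

@dataclass
class Observation:
    frames: list        # camera images from consecutive timestamps
    route: str          # navigation context
    speed: float        # current speed (m/s)
    speed_limit: float  # posted limit (m/s)

def vision_tokens(obs: Observation) -> list:
    # A real vision model would embed each frame; we just tag them here.
    return [f"<img{i}>" for i in range(len(obs.frames))]

def language_model(tokens: list) -> tuple:
    # Stand-in for the auto-regressive LM, which in LINGO-2 predicts
    # BOTH a driving trajectory and commentary from one token sequence.
    trajectory = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]  # (x, y) waypoints
    commentary = "Maintaining lane; speed is below the limit."
    return trajectory, commentary

def drive_step(obs: Observation, instruction: str = "") -> tuple:
    # Closed loop: vision tokens + driving context + an optional language
    # prompt feed the LM; the controller then executes the trajectory.
    tokens = vision_tokens(obs)
    tokens += [obs.route, f"speed={obs.speed}", f"limit={obs.speed_limit}"]
    if instruction:
        tokens.append(instruction)  # e.g. "find a parking spot"
    return language_model(tokens)

traj, text = drive_step(
    Observation(frames=[0, 1], route="A2", speed=10.0, speed_limit=13.9),
    instruction="find a parking spot",
)
```

<p>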
This is incredibly important if you need to understand how these systems make decisions.</p><p>This approach enables easier testing in synthetic environments where identical scenarios can be evaluated with different linguistic instructions, providing unprecedented insights into AI behavior and decision-making processes</p><h4>EO Robotics EmbodiedOneVision</h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Uf9-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a4734a-1658-484e-a32d-457a4f6d4137_2188x1086.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Uf9-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a4734a-1658-484e-a32d-457a4f6d4137_2188x1086.png 424w, https://substackcdn.com/image/fetch/$s_!Uf9-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a4734a-1658-484e-a32d-457a4f6d4137_2188x1086.png 848w, https://substackcdn.com/image/fetch/$s_!Uf9-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a4734a-1658-484e-a32d-457a4f6d4137_2188x1086.png 1272w, https://substackcdn.com/image/fetch/$s_!Uf9-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a4734a-1658-484e-a32d-457a4f6d4137_2188x1086.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Uf9-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a4734a-1658-484e-a32d-457a4f6d4137_2188x1086.png" width="1456" height="723" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/93a4734a-1658-484e-a32d-457a4f6d4137_2188x1086.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:723,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1724868,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://jdsemrau.substack.com/i/172400916?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a4734a-1658-484e-a32d-457a4f6d4137_2188x1086.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!Uf9-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a4734a-1658-484e-a32d-457a4f6d4137_2188x1086.png 424w, https://substackcdn.com/image/fetch/$s_!Uf9-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a4734a-1658-484e-a32d-457a4f6d4137_2188x1086.png 848w, https://substackcdn.com/image/fetch/$s_!Uf9-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a4734a-1658-484e-a32d-457a4f6d4137_2188x1086.png 1272w, https://substackcdn.com/image/fetch/$s_!Uf9-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a4734a-1658-484e-a32d-457a4f6d4137_2188x1086.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg 
role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>EO Robotics introduces EmbodiedOneVision (EO-1), a unified embodied foundation model paired with the large-scale EO-Data1.5M dataset. Together, they represent a significant step toward building generalist embodied agents capable of understanding, reasoning, and acting across diverse environments. Unlike prior work that treats perception, reasoning, and control as loosely connected modules, EO-1 is designed as a closed-loop system where vision, language, and action are interleaved during pretraining. 
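</p><p>As a rough illustration of what such an interleaved sequence might look like, here is a toy sketch. The token markers and episode structure are my assumptions for exposition, not EO-1&#8217;s released data format.</p>

```python
# Illustrative sketch of an interleaved multimodal pretraining sequence:
# vision, language, and action segments flattened into one causal token
# stream. Marker tokens and structure are assumptions, not EO-1's format.

def interleave(episode):
    """Flatten an embodied episode into a single ordered token stream."""
    stream = []
    for kind, payload in episode:
        if kind == "image":
            stream += ["<img>", payload, "</img>"]
        elif kind == "text":
            stream += ["<txt>", payload, "</txt>"]
        elif kind == "action":
            # continuous action chunks would enter the stream through
            # learned encoders; shown symbolically here
            stream += ["<act>", payload, "</act>"]
    return stream

episode = [
    ("image", "frame_0"),
    ("text", "Pick up the red cup."),
    ("action", "[0.12, -0.30, 0.05]"),
    ("image", "frame_1"),
    ("text", "Cup grasped; moving toward the table."),
]
stream = interleave(episode)  # one sequence, causal order preserved
```

<p>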
This approach enables the model to predict actions in context, respond to language prompts, and reason about spatial and temporal dynamics within real-world tasks.</p><p>The system is motivated by advances in multimodal foundation models, which have shown that joint training across modalities yields superior generalization and reasoning. EO-Robotics extends these ideas into the embodied domain, emphasizing how robots can learn not only from static vision&#8211;language corpora but also from temporally grounded interaction data. With its unified model architecture, carefully curated dataset, and principled evaluation benchmark, EO-Robotics provides a foundation for the next generation of embodied intelligence research.</p><h5>Architecture</h5><p>At the core of EO-Robotics lies <strong>EO-1</strong>, a decoder-only transformer with approximately three billion parameters. The model integrates multiple modalities into a <strong>single shared representation space</strong>, handling images, video frames, text, and robotic actions without requiring separate specialized modules. This design departs from earlier Vision-Language-Action (VLA) model approaches that introduce action-specific heads or auxiliary training objectives.</p><p>EO-1 employs two complementary training mechanisms:</p><ul><li><p><strong>Auto-regressive decoding</strong> for discrete modalities such as text and symbolic tokens.</p></li><li><p><strong>Flow-matching denoising</strong> for continuous robotic action trajectories.</p></li></ul><p>Both mechanisms are unified through causal attention over the entire multimodal sequence, enabling EO-1 to capture dependencies between perception, reasoning, and action. 
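</p><p>The two objectives can be sketched side by side. This is a toy, hedged illustration of the idea (next-token cross-entropy plus linear-path flow matching), not EO-1&#8217;s actual implementation; the tiny functions below stand in for the real network.</p>

```python
import math

# Hedged sketch of the dual training signal described above:
# auto-regressive cross-entropy on discrete tokens, flow matching on
# continuous action chunks. All functions are illustrative stand-ins.

def next_token_loss(probs, target_index):
    # Auto-regressive decoding objective for text/symbolic positions.
    return -math.log(probs[target_index])

def flow_matching_loss(action, noise, t, predict_velocity):
    # Linear path x_t from noise to the action chunk; the network learns
    # to predict the constant velocity (action - noise) along that path.
    x_t = [(1 - t) * n + t * a for n, a in zip(noise, action)]
    target_v = [a - n for a, n in zip(action, noise)]
    pred_v = predict_velocity(x_t, t)
    return sum((p - q) ** 2 for p, q in zip(pred_v, target_v)) / len(action)

# Both losses are summed over one interleaved sequence, so gradients flow
# through the same causal backbone (omitted in this sketch).
action = [0.2, -0.1, 0.4]  # one toy action chunk
noise = [0.0, 0.0, 0.0]
oracle = lambda x_t, t: [a - n for a, n in zip(action, noise)]  # perfect net
combined = next_token_loss([0.1, 0.9], 1) + flow_matching_loss(action, noise, 0.5, oracle)
```

<p>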
The architecture is initialized with a pre-trained vision&#8211;language backbone, giving it broad perceptual and linguistic priors, and then adapted to the embodied domain by training on EO-Data1.5M.</p><p>To bridge action and non-action modalities, EO-1 incorporates lightweight multilayer perceptrons (MLPs) that encode continuous motor actions into the tokenized sequence and decode them back out. This avoids retraining large action-specific modules from scratch, improving efficiency and enhancing cross-modal transfer. By aligning text, vision, and action tokens in the same autoregressive stream, EO-1 treats reasoning and acting as a single process rather than separate stages, resulting in more coherent robotic control.</p><h5>Capabilities</h5><p>EO-1&#8217;s abilities stem from both its architectural design and its large-scale training data. <strong>EO-Data1.5M</strong> provides over 1.5 million multimodal samples, interleaving vision, text, and robot actions. The dataset combines two key sources:</p><ol><li><p><strong>Web-based vision&#8211;language corpora</strong>, which provide general visual-linguistic grounding.</p></li><li><p><strong>Real-world robot episodes</strong>, which supply action-level continuity, spatial grounding, and temporal structure.</p></li></ol><p>A custom data pipeline ensures diversity, quality, and task coverage. Robot videos are filtered and clustered to reduce redundancy, split into subtasks, and captioned by both pretrained vision&#8211;language models (VLMs) and human annotators. From these subtasks, question&#8211;answer pairs are generated to probe temporal and spatial reasoning. 
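</p><p>As a toy illustration of that curation flow (every function name and data item below is invented for the example, not the actual pipeline):</p>

```python
# Toy sketch of an EO-Data1.5M-style curation flow: reduce redundancy,
# split episodes into subtasks, caption them, derive QA pairs.
# All names and the toy episode are illustrative assumptions.

def curate(episode):
    # 1) reduce redundancy: drop consecutive duplicate frames
    frames = [f for i, f in enumerate(episode) if i == 0 or f != episode[i - 1]]
    # 2) split into subtasks (toy boundary: the halfway point)
    mid = len(frames) // 2
    subtasks = [frames[:mid], frames[mid:]]
    # 3) caption each subtask (pretrained VLMs plus human annotators
    #    would do this in the real pipeline)
    captioned = [(st, f"subtask covering steps {st[0]}..{st[-1]}")
                 for st in subtasks if st]
    # 4) derive one temporal-reasoning QA pair per subtask
    qa = [(f"Which step comes first in '{cap}'?", st[0]) for st, cap in captioned]
    return captioned, qa

captioned, qa = curate(["grasp", "grasp", "lift", "move", "place"])
```

<p>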
Human annotators refine answers to ensure correctness, and the resulting interleaved sequences integrate visual tokens, linguistic reasoning, and motor actions.</p><p>Through this training regime, EO-1 acquires capabilities in several areas:</p><ul><li><p><strong>Spatial understanding</strong>: object localization, multi-view correspondence, and trajectory reasoning.</p></li><li><p><strong>Task reasoning</strong>: planning sequences of actions and evaluating task progress.</p></li><li><p><strong>Physical commonsense</strong>: reasoning about forces, constraints, and counterfactual outcomes.</p></li><li><p><strong>State estimation</strong>: assessing object states such as open/closed or full/empty, and predicting the results of actions.</p></li></ul><p>Performance is measured through <strong>EO-Bench</strong>, a benchmark constructed to avoid the pitfalls of conflated evaluation tasks. EO-Bench includes 648 QA pairs across four categories&#8212;spatial understanding, physical commonsense, task reasoning, and state estimation&#8212;allowing precise measurement of strengths and weaknesses. Unlike other benchmarks that mix multiple reasoning aspects in a single query, EO-Bench ensures that each question isolates a specific embodied reasoning skill, making evaluations interpretable and reliable.</p><h5>Distinguishing features</h5><p>EO-Robotics distinguishes itself from prior embodied AI frameworks through three defining elements:</p><ol><li><p><strong>Unified Multimodal Transformer</strong><br>EO-1 processes text, vision, and actions in a single stream rather than through disjointed modules. This unified design improves alignment across modalities and enables coherent reasoning-to-action pipelines.</p></li><li><p><strong>Interleaved Embodied Dataset</strong><br>EO-Data1.5M structures multimodal episodes in temporal order, linking perception, reasoning, and action. 
The inclusion of flexible interleaved QA pairs with robot actions enriches cross-modal grounding and provides the model with nuanced spatial&#8211;temporal reasoning skills.</p></li><li><p><strong>Principled Benchmarking with EO-Bench</strong><br>EO-Bench introduces a structured evaluation suite that disentangles reasoning categories. This not only allows researchers to identify bottlenecks in embodied intelligence&#8212;such as spatial reasoning, which is emphasized in the benchmark&#8212;but also sets a standard for transparent and interpretable comparisons.</p></li></ol><p>Together, these design choices create a toolchain that goes beyond conventional VLAM approaches. EO-1 does not merely predict actions from observations; it integrates reasoning, perception, and control into a single generative process. This positions EO-Robotics as a platform for <strong>generalist embodied intelligence</strong>, where robots can flexibly adapt across tasks, environments, and modalities without requiring task-specific retraining.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kHi_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3b030fc-125d-4b62-a127-cd1f4d9cba95_2358x684.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kHi_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3b030fc-125d-4b62-a127-cd1f4d9cba95_2358x684.png 424w, https://substackcdn.com/image/fetch/$s_!kHi_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3b030fc-125d-4b62-a127-cd1f4d9cba95_2358x684.png 848w, 
https://substackcdn.com/image/fetch/$s_!kHi_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3b030fc-125d-4b62-a127-cd1f4d9cba95_2358x684.png 1272w, https://substackcdn.com/image/fetch/$s_!kHi_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3b030fc-125d-4b62-a127-cd1f4d9cba95_2358x684.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kHi_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3b030fc-125d-4b62-a127-cd1f4d9cba95_2358x684.png" width="1456" height="422" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f3b030fc-125d-4b62-a127-cd1f4d9cba95_2358x684.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:422,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:307258,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://jdsemrau.substack.com/i/172400916?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3b030fc-125d-4b62-a127-cd1f4d9cba95_2358x684.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kHi_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3b030fc-125d-4b62-a127-cd1f4d9cba95_2358x684.png 424w, 
https://substackcdn.com/image/fetch/$s_!kHi_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3b030fc-125d-4b62-a127-cd1f4d9cba95_2358x684.png 848w, https://substackcdn.com/image/fetch/$s_!kHi_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3b030fc-125d-4b62-a127-cd1f4d9cba95_2358x684.png 1272w, https://substackcdn.com/image/fetch/$s_!kHi_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3b030fc-125d-4b62-a127-cd1f4d9cba95_2358x684.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>EO-1 is my final example of how embodied foundation models are moving beyond narrow perception or control, toward systems that unify vision, language, and action into generalist frameworks for reasoning and robotics.</p><h4>In Closing</h4>
      <p>
          <a href="https://jdsemrau.substack.com/p/visual-language-action-models">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Multi-Agents for Optimal Equity Portfolio Construction]]></title><description><![CDATA[Behavioural Finance, LLM Debates, Drama and Adaptive Quantitative Methods]]></description><link>https://jdsemrau.substack.com/p/multi-agents-for-optimal-equity-portfolio-construction</link><guid isPermaLink="false">https://jdsemrau.substack.com/p/multi-agents-for-optimal-equity-portfolio-construction</guid><dc:creator><![CDATA[Jan Daniel Semrau (MFin, CAIO)]]></dc:creator><pubDate>Mon, 25 Aug 2025 19:41:24 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!qxwU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d3d443c-b693-40d5-a3bd-1b794bebf9cc_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We all know <a href="https://jdsemrau.substack.com/p/becoming-super-agent-bill-ackman">Superbill</a> can always be improved. One of those aspects is portfolio construction, i.e., the act of balancing return, risk, and market realities in the face of uncertainty. And I don&#8217;t mean Markowitz&#8217;s mean-variance optimization, that&#8217;s the easy part. Per Nvidia&#8217;s <a href="https://www.nvidia.com/en-us/industries/finance/ai-financial-services-report/">2025 </a><strong><a href="https://www.nvidia.com/en-us/industries/finance/ai-financial-services-report/">State of AI in Financial Services Report</a>, </strong>25%  of respondents reported tha&#8230;</p>
      <p>
          <a href="https://jdsemrau.substack.com/p/multi-agents-for-optimal-equity-portfolio-construction">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[The Hierarchical Reasoning Model]]></title><description><![CDATA[Is it a new paradigm to think about thinking?]]></description><link>https://jdsemrau.substack.com/p/hierarchical-reasoning-model</link><guid isPermaLink="false">https://jdsemrau.substack.com/p/hierarchical-reasoning-model</guid><dc:creator><![CDATA[Jan Daniel Semrau (MFin, CAIO)]]></dc:creator><pubDate>Mon, 18 Aug 2025 11:37:29 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Hypw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3424442-8f69-4c73-9a0b-7104958aadac_1344x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>You might have noticed that GPT-5 sometimes thinks for a very long time. Reasoning as a concept sometimes still feels like an infinite loop to nowhere. As a result, most &#8220;thinking&#8221; models have very high latency and are expensive in everyday usage. </p><p>Maybe the whole concept of language-based reasoning is wrong? </p><p>China and Singapore-based  Research Lab <a href="https://www.sapient.inc/models">Sapie&#8230;</a></p>
      <p>
          <a href="https://jdsemrau.substack.com/p/hierarchical-reasoning-model">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Group Sequence Policy Optimization vs Group Relative Policy Optimization ]]></title><description><![CDATA[Two generational advancements in a duel of dynamic decision dynamics]]></description><link>https://jdsemrau.substack.com/p/group-sequence-policy-optimization</link><guid isPermaLink="false">https://jdsemrau.substack.com/p/group-sequence-policy-optimization</guid><dc:creator><![CDATA[Jan Daniel Semrau (MFin, CAIO)]]></dc:creator><pubDate>Mon, 11 Aug 2025 11:37:21 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Ign_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42495bad-00a3-44fa-94f2-f2b759f364b0_1344x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Remember when earlier this year, DeepSeek&#8217;s release of <em>DeepSeekMath</em> and Group Relative Policy Optimization (GRPO) made training a state-of-the-art model dramatically cheaper? The follow-up release of DeepSeek-R1 amplified the effect, and the result was a sudden sell-off of AI stocks. Then only a few months later, Alibaba&#8217;s Qwen team introduced Group Seq&#8230;</p>
      <p>
          <a href="https://jdsemrau.substack.com/p/group-sequence-policy-optimization">
              Read more
          </a>
      </p>
]]></content:encoded></item><item><title><![CDATA[Grounded Autonomy: Neuro-symbolic Representations in the Reasoning Loop]]></title><description><![CDATA[The Ensigns of Command: Sense &#8594; Symbolize &#8594; Plan &#8594; Act]]></description><link>https://jdsemrau.substack.com/p/grounded-symbolic-reasoning-sense-symbolize-plan-act</link><guid isPermaLink="false">https://jdsemrau.substack.com/p/grounded-symbolic-reasoning-sense-symbolize-plan-act</guid><dc:creator><![CDATA[Jan Daniel Semrau (MFin, CAIO)]]></dc:creator><pubDate>Mon, 04 Aug 2025 11:37:45 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!9PSY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68e24435-8026-46d4-81ca-d6abb7c147d5_1344x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>While monitoring <a href="https://jdsemrau.substack.com/p/becoming-super-agent-bill-ackman">Superbill</a>, I started noticing a pattern. Some of its sub-agents were stalling, hitting about 90% accuracy on their assessments and refusing to go further.</p><p>While in general, LLMs with reasoning capabilities have become state-of-the-art for online applications like Claude or ChatGPT, I noticed that even these high-end services exhibit problems with logical consistency in the output they generate. And since I am not working on a news summarizer, but a high-risk investment product, being consistent is incredibly important to ensure accuracy and reliability, especially over longer reasoning horizons. </p><p>We humans conceptualize our world in much the same way. The sun rises in the morning. If it&#8217;s bright, it&#8217;s usually daytime. If you drop something, it will fall down. </p><p>LLMs lack that capability. Because of that, they struggle with reliability and consistency, especially in <a href="https://jdsemrau.substack.com/p/context-window-saturation-in-reasoning">long-running tasks</a>. 
</p><p>One of the potential solutions to this problem could be neuro-symbolic reasoning. </p><p>But before I start, here are some relevant terms I will be using throughout this post.</p><h4><strong>Key Terms &amp; References</strong></h4><ul><li><p><strong><a href="https://en.wikipedia.org/wiki/Neuro-symbolic_AI">Symbolic Reasoning</a></strong>: A method of reasoning using formal rules and discrete symbols such as constants, variables, and logic statements. </p></li></ul><p>       E.g., &#8704;x (Battery(x) &#8743; Low(x) &#8594; Recharge(x))</p><p>       In natural terms: for every entity x, if x is a battery and x is low, then x should be recharged.</p><ul><li><p><strong><a href="https://plato.stanford.edu/entries/logic-paraconsistent/">Paraconsistent logic</a></strong>: A type of logic that tolerates contradictions without collapsing (i.e., in classical logic, once a contradiction is accepted as true, everything becomes provable. 
Paraconsistent logics avoid this explosion).</p></li><li><p><strong><a href="https://arxiv.org/abs/2404.18655">Parametric knowledge</a></strong>: the information encoded in the parameters of a trained model during training</p></li><li><p><strong>Grounding</strong>: The process of mapping sensor data to symbolic representations </p><p>E.g., visual input &#8594; Cup(Object1).</p></li><li><p><strong><a href="https://www.jstor.org/stable/20016609">Analytic Containment</a> (AC)</strong>: A form of paraconsistent logic where LLMs return bilateral truth values &#10216;u,v&#10217; to logical queries. <a href="https://arxiv.org/abs/2507.09751">Allen et al. (2024)</a> (see Sections 2.1 and 3).</p></li><li><p><strong>Split-Brain Syndrome</strong>: The mismatch between an LLM-generated plan and actual execution behavior in agents, due to a lack of shared symbolic structure. <a href="https://arxiv.org/abs/2507.10624">Zhang (2024)</a> (see Section 4).</p></li><li><p><strong>Interpretation function</strong>: In formal logic, this assigns truth values to statements (e.g., "The sky is blue" is true). It's how the logic knows what a formula means in a specific context.</p></li></ul><h4><strong>What Is Symbolic Reasoning?</strong></h4><p>Symbolic reasoning works with discrete logical elements like constants, predicates, and rules. </p><p><em>if something&#8217;s a battery and it&#8217;s low, recharge it.</em> </p><p>Here, symbolic reasoning refers to the <em><strong>application</strong></em> of explicit logic rules and structured knowledge to make decisions. 
Through that, it&#8217;s an extension of traditional expert systems: it uses structured external rules, e.g., financial regulations, investment heuristics, or tax rules, to <em>reason</em> through cause-and-effect chains.</p><p>E.g., <em>&#8220;If interest rates rise and the portfolio holds rate-sensitive bonds, then reduce exposure.&#8221;</em></p><p>Anyone who reads English can understand the sentence above, and the terms &#8220;<em>interest rate</em>&#8221;, &#8220;<em>portfolio</em>&#8221;, and &#8220;<em>exposure</em>&#8221; hold some meaning for us. Yet we also often fail at sophisticated deductive, inductive, or abductive reasoning given a collection of premises and constraints. </p><p>Logical deduction tasks usually fall into two categories: deciding whether a statement such as &#8220;the sky is blue&#8221; can be assigned a truth value (true, false, unknown) from the provided information, or selecting, from multiple choices, the solution that satisfies a set of given premises.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MWga!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F737d6129-6fe0-47e4-b2f0-42f7941fbc75_1764x508.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MWga!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F737d6129-6fe0-47e4-b2f0-42f7941fbc75_1764x508.png 424w, https://substackcdn.com/image/fetch/$s_!MWga!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F737d6129-6fe0-47e4-b2f0-42f7941fbc75_1764x508.png 848w, 
https://substackcdn.com/image/fetch/$s_!MWga!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F737d6129-6fe0-47e4-b2f0-42f7941fbc75_1764x508.png 1272w, https://substackcdn.com/image/fetch/$s_!MWga!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F737d6129-6fe0-47e4-b2f0-42f7941fbc75_1764x508.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MWga!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F737d6129-6fe0-47e4-b2f0-42f7941fbc75_1764x508.png" width="1456" height="419" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/737d6129-6fe0-47e4-b2f0-42f7941fbc75_1764x508.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:419,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:200668,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://jdsemrau.substack.com/i/168761779?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F737d6129-6fe0-47e4-b2f0-42f7941fbc75_1764x508.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!MWga!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F737d6129-6fe0-47e4-b2f0-42f7941fbc75_1764x508.png 424w, 
https://substackcdn.com/image/fetch/$s_!MWga!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F737d6129-6fe0-47e4-b2f0-42f7941fbc75_1764x508.png 848w, https://substackcdn.com/image/fetch/$s_!MWga!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F737d6129-6fe0-47e4-b2f0-42f7941fbc75_1764x508.png 1272w, https://substackcdn.com/image/fetch/$s_!MWga!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F737d6129-6fe0-47e4-b2f0-42f7941fbc75_1764x508.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>But even with extra steps, symbolic systems aren&#8217;t perfect. They don&#8217;t handle noise well, and, not unlike us, they break when they don&#8217;t have all the facts. Maybe that&#8217;s part of our difficulty: words are all we have been using, yet humans seem to take much stronger notice of actions.</p><h4><strong>Sense &#8594; Symbolize &#8594; Plan &#8594; Act</strong></h4><p>I am proposing a deviation from the venerable Reflect-Act pattern most reasoning agents deploy when they &#8220;think deeper&#8221;, i.e., loop until they find a better solution. Reflect-Act is a good start: the agent does something, reflects, then tries again. But if the seed is wrong, it likely never reaches a good conclusion. In that sense, it is all prompt glue and no internal structure. There is no &#8220;understanding&#8221;, if that is ever achievable, no real memory, no logic. </p><p>Just a number of retries and refinements.</p><p>Symbolic reflection, by contrast, should be a state change. </p><p>What needs to be understood is that symbolic systems are more efficient because they don&#8217;t regenerate the whole search tree; they update just what changed. Maybe this should form the core loop of symbolic autonomy: systems that &#8220;<em>understand</em>&#8221;, not just &#8220;<em>pattern match</em>&#8221;. </p><p>True autonomy relies on hybrid systems that combine perception nets with structured logic. Since I started writing Encyclopedia Autonomica, I have always had sensors in my agent capability stack.</p><h4><strong>Action in Perception</strong></h4><p>Perception isn&#8217;t passive. It&#8217;s not a camera feed waiting to be parsed. In embodied agents, perception is shaped by action. </p><p>What you do determines what you can sense. </p><p>Let&#8217;s say an agent&#8217;s belief state at time t is a set of symbolic assertions. 
When the agent <em>acts</em>, it doesn&#8217;t just affect the environment; it also reshapes the observations, which in turn updates a symbol map. </p>
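<p>To make the bilateral truth values &#10216;u,v&#10217; from the key terms above concrete, here is a minimal, purely illustrative Python sketch. The encoding (u = evidence for, v = evidence against) and the <code>conjoin</code>/<code>negate</code> operators are my own simplified assumptions, not the actual semantics from Allen et al.; the point is only that a contradictory value &#10216;1,1&#10217; is a tolerated state rather than something that makes every statement provable.</p>

```python
# Illustrative bilateral truth values <u, v>:
# u = degree of evidence FOR a statement, v = degree of evidence AGAINST it.
# These operator definitions are a simplified assumption for demonstration,
# not the semantics of any specific paraconsistent system.

def conjoin(a, b):
    """AND over bilateral values: support requires both, denial needs either."""
    return (min(a[0], b[0]), max(a[1], b[1]))

def negate(a):
    """NOT swaps support and denial."""
    return (a[1], a[0])

sky_is_blue = (1.0, 0.0)    # supported, not denied
sky_is_green = (0.0, 1.0)   # denied
contradiction = (1.0, 1.0)  # both supported and denied: tolerated, not explosive

# Conjoining a denied statement with a contradiction stays denied;
# the contradiction does not make everything true.
result = conjoin(sky_is_green, contradiction)
```

<p>In classical two-valued logic, the same contradiction would license any conclusion; here it simply propagates as a degraded value.</p>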
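<p>The Sense &#8594; Symbolize &#8594; Plan &#8594; Act loop, with the battery rule &#8704;x (Battery(x) &#8743; Low(x) &#8594; Recharge(x)) from the key terms, can be sketched in a few lines of Python. All names here (<code>symbolize</code>, <code>plan</code>, <code>act</code>, the <code>battery_pct</code> percept, the entity <code>robot1</code>) are hypothetical and for illustration only; a real grounding layer would sit on actual sensors.</p>

```python
# Minimal Sense -> Symbolize -> Plan -> Act sketch (illustrative names only).

def symbolize(percepts):
    """Ground raw sensor readings into symbolic assertions (predicate, entity)."""
    beliefs = set()
    if "battery_pct" in percepts:
        beliefs.add(("Battery", "robot1"))
        if percepts["battery_pct"] < 20:
            beliefs.add(("Low", "robot1"))
    return beliefs

# One explicit rule: forall x, Battery(x) & Low(x) -> Recharge(x)
RULES = [(("Battery", "Low"), "Recharge")]

def plan(beliefs):
    """Fire every rule whose premises all hold for some known entity."""
    actions = []
    entities = {e for _, e in beliefs}
    for premises, action in RULES:
        for e in sorted(entities):
            if all((p, e) in beliefs for p in premises):
                actions.append((action, e))
    return actions

def act(action, world):
    """Acting changes the world, which changes the next round of percepts."""
    name, _entity = action
    if name == "Recharge":
        world["battery_pct"] = 100
    return world

# One pass through the loop: acting reshapes what is sensed next,
# so Low(robot1) disappears from the belief state after recharging.
world = {"battery_pct": 12}
for a in plan(symbolize(world)):
    world = act(a, world)
beliefs_after = symbolize(world)
```

<p>The key property, matching the argument above, is that reflection here is a state change: the rule base stays fixed while only the belief set is updated from fresh percepts, rather than regenerating the whole reasoning chain on every retry.</p>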
      <p>
          <a href="https://jdsemrau.substack.com/p/grounded-symbolic-reasoning-sense-symbolize-plan-act">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Agents in Cars Buying Coffee]]></title><description><![CDATA[Renewed Use cases for the world of Autonomous Driving.]]></description><link>https://jdsemrau.substack.com/p/agents-in-cars-buying-coffee</link><guid isPermaLink="false">https://jdsemrau.substack.com/p/agents-in-cars-buying-coffee</guid><dc:creator><![CDATA[Jan Daniel Semrau (MFin, CAIO)]]></dc:creator><pubDate>Mon, 28 Jul 2025 11:31:27 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Yn-G!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d6b1836-5434-4773-be23-c6a14aafe69b_1344x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A couple of years ago, I wrote an <a href="https://medium.com/@jsemrau/all-tomorrows-platforms-e9f0a43e57d8">article</a> on Medium exploring new platform ideas post-social media. One of those platforms was the car. As the decade of low progress in autonomous driving seems to slowly thaw, Tesla users will soon enjoy<a href="https://teslanorth.com/2025/07/12/tesla-2025-26-software-update-brings-grok-4-ai-and-more/"> Grok 4</a>. </p><p>It might make sense to have another look at real-world use cases for multi-agent architectures in automotive &#8230;</p>
      <p>
          <a href="https://jdsemrau.substack.com/p/agents-in-cars-buying-coffee">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Can an Agent Build an Agent?]]></title><description><![CDATA[Exploring Autonomy, Program Synthesis, and the Role of Code as Interfaces]]></description><link>https://jdsemrau.substack.com/p/can-an-agent-build-an-agent</link><guid isPermaLink="false">https://jdsemrau.substack.com/p/can-an-agent-build-an-agent</guid><dc:creator><![CDATA[Jan Daniel Semrau (MFin, CAIO)]]></dc:creator><pubDate>Mon, 21 Jul 2025 12:34:56 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!FO0-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6786ee6b-4e6c-4e2a-a1ce-7970473c8690_1344x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Last week, Amazon Web Services joined the &#8220;CodingAgent&#8221; game by launching <strong><a href="https://kiro.dev/">Kiro</a></strong>, another VSCode clone, err, agentic IDE designed to transition developers from "<em><a href="https://en.wikipedia.org/wiki/Vibe_coding">vibe coding</a></em>" to production-grade software by generating specs, design documents, task plans, and auto-triggering tests and docs updates. Among the features, Kiro also supports Anthropic&#8217;s <a href="https://jdsemrau.substack.com/p/building-scalable-financial-toolchains">Model Co&#8230;</a></p>
      <p>
          <a href="https://jdsemrau.substack.com/p/can-an-agent-build-an-agent">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[200 AI Agent Use Cases]]></title><description><![CDATA[Some ideas for building AI Agents]]></description><link>https://jdsemrau.substack.com/p/a-list-of-ai-agent-use-cases</link><guid isPermaLink="false">https://jdsemrau.substack.com/p/a-list-of-ai-agent-use-cases</guid><dc:creator><![CDATA[Jan Daniel Semrau (MFin, CAIO)]]></dc:creator><pubDate>Tue, 15 Jul 2025 23:08:14 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!qtnJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29521f70-ff72-497e-a58a-5a8263b8b26d_1344x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I was thinking about business ideas for AI Agents. It&#8217;s mid 2025 and if you are feeling overwhelmed by endless AI possibilities but under pressure to deliver real results, I prepared a Digital AI Agent Use Case Library which might be your secret weapon. </p><p>Below is a searchable curated list of&#8239;about 200 actionable AI agent use cases spanning several industries like law, finance, healthcare, retail&#8239;and more, so you don&#8217;t have to start from zero.</p><p>Each agent delivers exactly what you need to win stakeholder buy&#8209;in and accelerate development:</p><ul><li><p><strong>Industry &amp; Agent Name:</strong> Know immediately where it fits.</p></li><li><p><strong>Implementation Blueprint:</strong> From vector databases and NLP to multi&#8209;agent workflows.</p></li><li><p><strong>Target Customers:</strong> Who&#8217;s ready to buy&#8212;solo practitioners, enterprise teams, regulators.</p></li><li><p><strong>Best Framework:</strong> LangChain, AutoGen, Microsoft Agent Framework, Pinecone and beyond.</p></li><li><p><strong>ROI&#8209;Driven Reasoning:</strong> Why it matters and how it drives revenue or cuts costs.</p></li></ul><p>This could reduce your launch time for, for example a Contract Analysis Agent for your procurement 
team, from weeks to a couple of days. </p>
      <p>
          <a href="https://jdsemrau.substack.com/p/a-list-of-ai-agent-use-cases">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Modular Financial Agent Systems (I)]]></title><description><![CDATA[Engineering Autonomous Workflow Orchestration]]></description><link>https://jdsemrau.substack.com/p/modular-financial-agent-systems-i</link><guid isPermaLink="false">https://jdsemrau.substack.com/p/modular-financial-agent-systems-i</guid><dc:creator><![CDATA[Jan Daniel Semrau (MFin, CAIO)]]></dc:creator><pubDate>Sun, 13 Jul 2025 09:17:13 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!qErA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8212f73d-7e5e-4ee9-aa0e-882322ea2e08_1344x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Following <a href="https://agentexchange.salesforce.com/">Salesforce</a> and <a href="https://store.servicenow.com/store/ai-marketplace">ServiceNow</a>, now mighty Amazon, through its partnership with Anthropic, will also be launching, relatively soon, a <a href="https://techcrunch.com/2025/07/10/aws-is-launching-an-ai-agent-marketplace-next-week-with-anthropic-as-a-partner/">marketplace</a> for AI Agents. </p><p><em>Profit!?</em></p><p>So, I was thinking, how can we benefit from the increasing number of AI Agent marketplaces and build reliable agents that make some coin? And also, I am working this month on an arb&#8230;</p>
      <p>
          <a href="https://jdsemrau.substack.com/p/modular-financial-agent-systems-i">
              Read more
          </a>
      </p>
   ]]></content:encoded></item></channel></rss>