RoboSystems

RoboSystems is a financial intelligence platform that connects disparate data sources, builds domain ontologies as knowledge graphs, and provides AI-powered tools for accounting, financial reporting, investment management, and analysis. It powers RoboLedger and RoboInvestor.

LadybugDB Graph Database: Embedded columnar graph database with native DuckDB staging, LanceDB vector search, and tiered infrastructure
Extensions: Domain schemas that drive OLTP tables, API routes, data pipelines, and dedicated frontend apps. Extensions share a single database with schema-per-tenant isolation and materialize to the graph
Document Search: Full-text and semantic search across SEC filings, uploaded documents, and connected sources via OpenSearch
AI-Native Architecture: Context graphs with embeddings, semantic enrichment, and confidence scoring for LLM-powered analytics
Model Context Protocol (MCP): Standardized server and client for LLM integration with schema-aware tools
Multi-Source Data Integration: SEC XBRL filings, QuickBooks accounting data via dbt pipelines, and custom financial datasets
Enterprise-Ready Infrastructure: Multi-tenant architecture with tiered scaling and production-grade query management
Developer-First API: RESTful API designed for integration with financial applications

Platform

The platform provides the core infrastructure that all extensions build on:

Dedicated Infrastructure: Tiered graph infrastructure with dedicated instances and configurable memory allocation
AI Agent System: Autonomous financial operations — graph queries, taxonomy mapping, report generation — with automatic credit tracking and SSE progress streaming
Shared Repositories: SEC XBRL filings knowledge graph for context mining and benchmarking
Document Management: Upload, index, and search documents with full-text and semantic search via OpenSearch
DuckDB Staging System: High-performance data validation and bulk ingestion pipeline
Dagster Orchestration: Data pipeline orchestration for SEC filings, QuickBooks sync, backups, billing, and scheduled jobs
Credit-Based Billing: Flexible credits for AI operations based on token usage
Subgraphs (Workspaces): AI memory graphs and isolated environments for development and team collaboration
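
Several of the features above report progress over Server-Sent Events, including agent runs and long-running extension commands through /v1/operations/{operation_id}/stream. That wire format is the standard text/event-stream format. Below is a minimal parser sketch. It assumes each event's data field carries JSON. The `progress` and `status` keys in the check are hypothetical payload fields, not a documented schema.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class SSEEvent:
    event: str = "message"  # default type when no "event:" field is sent
    data: str = ""
    id: Optional[str] = None

    def payload(self):
        """Decode the data field (assumes the server sends JSON payloads)."""
        return json.loads(self.data)


def parse_sse(lines: Iterable[str]) -> Iterator[SSEEvent]:
    """Yield events from text/event-stream lines (a minimal subset of the WHATWG rules)."""
    event_type, data_lines, last_id = "", [], None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if any data arrived.
            if data_lines:
                yield SSEEvent(event_type or "message", "\n".join(data_lines), last_id)
            event_type, data_lines = "", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        name, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if name == "event":
            event_type = value
        elif name == "data":
            data_lines.append(value)
        elif name == "id":
            last_id = value  # the last event ID persists across events
        # "retry" and unknown fields are ignored in this sketch.
```

Any HTTP client that yields decoded lines can feed parse_sse directly. Multi-line `data:` fields are joined with newlines, as the format requires.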

Extensions

Extensions are domain-specific subsystems that bring their own schema, OLTP tables, API routes, data pipelines, and dedicated frontend apps. They share a single PostgreSQL database with schema-per-tenant isolation and materialize to the graph for analytical queries. See Schema Extensions for the authoring contract.
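
Under schema-per-tenant isolation, each graph's OLTP tables live in their own PostgreSQL schema inside the shared database. The sketch below shows one way a request could be pinned to its tenant schema. It rests on three assumptions: a `tenant_` prefix, graph IDs restricted to lowercase letters, digits, and underscores, and the helper names themselves. All three are hypothetical. The real naming rule lives in the extensions OLTP models.

```python
import re

# Hypothetical convention; the real schema naming lives in the extensions OLTP models.
SCHEMA_PREFIX = "tenant_"
_GRAPH_ID_RE = re.compile(r"[a-z0-9_]+")
PG_MAX_IDENTIFIER = 63  # PostgreSQL truncates identifiers longer than 63 bytes by default


def tenant_schema(graph_id: str) -> str:
    """Map a graph_id to its tenant schema name, rejecting anything unsafe."""
    if not _GRAPH_ID_RE.fullmatch(graph_id):
        raise ValueError(f"invalid graph_id: {graph_id!r}")
    name = SCHEMA_PREFIX + graph_id
    if len(name) > PG_MAX_IDENTIFIER:  # ASCII only, so characters == bytes
        raise ValueError("schema name exceeds PostgreSQL's identifier limit")
    return name


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier: wrap in double quotes, double any embedded quote."""
    return '"' + name.replace('"', '""') + '"'


def set_search_path_sql(graph_id: str) -> str:
    """SQL that scopes the current transaction to one tenant schema.

    SET LOCAL reverts at COMMIT or ROLLBACK, so a pooled connection does not
    carry one tenant's search_path into the next request.
    """
    return f"SET LOCAL search_path TO {quote_ident(tenant_schema(graph_id))}"
```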

The extensions API surface is graph-scoped at the URL level — graph_id is always a path parameter, never a query argument — and splits reads from writes by transport:

Typed reads → POST /extensions/{graph_id}/graphql — Strawberry GraphQL endpoint with GraphiQL playground in dev. The schema is composed dynamically from enabled domains, so a ledger-only deployment exposes only ledger fields (no surprise runtime errors from disabled domains).
Command writes → POST /extensions/{roboledger|roboinvestor}/{graph_id}/operations/{operation_name} — named REST commands. Every command returns an OperationEnvelope with an op_<ULID> operation id, supports Idempotency-Key for safe retries, and is audit-logged. Long-running commands return status: "pending" and stream progress through /v1/operations/{operation_id}/stream.
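
The standard library is enough to exercise both transports. The sketch below only builds the requests and does not send them. It makes several assumptions:

- The base URL is the local Main API from the Quick Start.
- The operation name and GraphQL selection in the check are hypothetical placeholders.
- The envelope keys `status` and `operation_id` are assumed.
- Authentication headers are omitted because they are not described here.

```python
import json
import uuid
import urllib.request
from typing import Optional
from urllib.parse import quote

BASE_URL = "http://localhost:8000"  # assumption: local Main API serves the extensions routes


def graphql_request(graph_id: str, query: str, variables: Optional[dict] = None) -> urllib.request.Request:
    """Typed read: POST /extensions/{graph_id}/graphql with a standard GraphQL JSON body."""
    body = json.dumps({"query": query, "variables": variables or {}}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/extensions/{quote(graph_id, safe='')}/graphql",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def command_request(domain: str, graph_id: str, operation_name: str, payload: dict,
                    idempotency_key: Optional[str] = None) -> urllib.request.Request:
    """Command write: POST /extensions/{domain}/{graph_id}/operations/{operation_name}.

    Reuse the same Idempotency-Key when retrying, so the server can recognize
    the retry as the same command instead of executing it twice.
    """
    if domain not in ("roboledger", "roboinvestor"):
        raise ValueError(f"unknown extension domain: {domain}")
    url = (f"{BASE_URL}/extensions/{domain}/{quote(graph_id, safe='')}"
           f"/operations/{quote(operation_name, safe='')}")
    # urllib stores header names as "Idempotency-key"; HTTP header names are case-insensitive.
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Idempotency-Key": idempotency_key or str(uuid.uuid4()),
        },
    )


def stream_url(envelope: dict) -> Optional[str]:
    """For a pending OperationEnvelope, return its SSE progress URL; otherwise None."""
    if envelope.get("status") != "pending":
        return None
    return f"{BASE_URL}/v1/operations/{envelope['operation_id']}/stream"
```

Generate the idempotency key once per logical command and keep it across retries. A fresh key on every attempt would defeat the purpose.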

Behind the API is a CQRS-style operations kernel (reads/ + commands/ per domain) that's the single source of truth for business logic. GraphQL resolvers, REST operation routes, and MCP tools all delegate to the same functions, so wire shapes stay byte-identical across consumers. Per-domain feature flags (ROBOLEDGER_ENABLED, ROBOINVESTOR_ENABLED) gate both the routers and the GraphQL schema composition.
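
Here is a minimal sketch of that pattern. Each read or command has one kernel function. Two thin adapters, one REST-style and one MCP-style, both delegate to it. A per-domain environment flag decides which domains get mounted. The registry, the function names, the accepted truthy flag values, and the default of "off" for unset flags are all assumptions for illustration.

```python
import json
import os
from typing import Callable, Dict, Mapping

# domain -> {read/command name -> kernel function}
KERNEL: Dict[str, Dict[str, Callable[..., dict]]] = {"roboledger": {}, "roboinvestor": {}}


def kernel(domain: str, name: str):
    """Register a kernel function: the single place its business logic lives."""
    def decorator(fn):
        KERNEL[domain][name] = fn
        return fn
    return decorator


@kernel("roboledger", "trial_balance")
def trial_balance(graph_id: str) -> dict:
    # Hypothetical read; a real one would query the tenant's OLTP schema.
    return {"graph_id": graph_id, "rows": []}


def domain_enabled(domain: str, env: Mapping[str, str] = os.environ) -> bool:
    """Check e.g. ROBOLEDGER_ENABLED; accepted values and default-off are assumptions."""
    return env.get(f"{domain.upper()}_ENABLED", "false").strip().lower() in ("1", "true", "yes")


def enabled_domains(env: Mapping[str, str] = os.environ) -> list:
    """Domains whose routers and GraphQL fields would be composed."""
    return [d for d in KERNEL if domain_enabled(d, env)]


# Two consumers, one implementation: both serialize the same kernel result,
# so their wire shapes cannot drift apart.
def rest_handler(domain: str, name: str, graph_id: str) -> str:
    return json.dumps(KERNEL[domain][name](graph_id), sort_keys=True)


def mcp_tool(domain: str, name: str, arguments: dict) -> str:
    return json.dumps(KERNEL[domain][name](arguments["graph_id"]), sort_keys=True)
```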

See GraphQL Extensions for the read-path implementation details, the Strawberry-Pydantic auto-derivation pattern, and the walkthrough for adding a new read field.

RoboLedger

Accounting and financial reporting extension:

OLTP general ledger in schema-per-tenant PostgreSQL (accounts, transactions, journal entries, line items, dimensions)
29 GraphQL read fields covering entities, accounts, trial balance, fiscal calendar, schedules, taxonomies, mappings, reports, and publish lists
23 named command operations for closing periods, creating schedules and closing entries, managing CoA→GAAP mapping associations, and authoring multi-period reports
Analytical view operations over the materialized graph
QuickBooks ELT pipeline via dbt/Dagster
SEC XBRL financial reporting
AI-powered CoA→GAAP mapping via the MappingAgent
Dedicated frontend app
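
Trial balance is one of RoboLedger's GraphQL read fields. The sketch below illustrates what that computation involves. It aggregates journal-entry line items into per-account debit and credit totals, then checks the double-entry invariant. The LineItem shape is a simplified, hypothetical stand-in, not the extension's actual OLTP model.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class LineItem:
    # Simplified, hypothetical shape of a journal-entry line item.
    account: str
    debit: Decimal = Decimal("0")
    credit: Decimal = Decimal("0")


def trial_balance(items: Iterable[LineItem]) -> Dict[str, Tuple[Decimal, Decimal]]:
    """Return {account: (total_debits, total_credits)}, sorted by account name."""
    totals = defaultdict(lambda: [Decimal("0"), Decimal("0")])
    for item in items:
        totals[item.account][0] += item.debit
        totals[item.account][1] += item.credit
    return {acct: (d, c) for acct, (d, c) in sorted(totals.items())}


def is_balanced(tb: Dict[str, Tuple[Decimal, Decimal]]) -> bool:
    """Double-entry invariant: total debits equal total credits."""
    return sum(d for d, _ in tb.values()) == sum(c for _, c in tb.values())
```

The sketch uses Decimal rather than float so that cent amounts add exactly.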

RoboInvestor

Portfolio management and investment tracking extension. OLTP database with portfolios, securities, and positions in schema-per-tenant PostgreSQL; 7 GraphQL read fields (portfolios, securities, positions, holdings) and 9 named command operations for portfolio CRUD and position management. Securities can link to entities for cross-graph research between investor portfolios and SEC public-company data via the shared repository. Dedicated frontend app.
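
RoboInvestor lists positions and holdings as separate read fields. This sketch assumes a plausible relationship between them rather than taking it from the extension's code: a holding aggregates all positions in the same security within one portfolio. The Position and Holding shapes are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, Iterable


@dataclass(frozen=True)
class Position:
    # Hypothetical shape: one lot of a security held in a portfolio.
    portfolio_id: str
    security_id: str
    quantity: Decimal
    cost_basis: Decimal  # total cost of this lot


@dataclass(frozen=True)
class Holding:
    security_id: str
    quantity: Decimal
    cost_basis: Decimal

    @property
    def average_cost(self) -> Decimal:
        """Quantity-weighted average cost per unit."""
        return self.cost_basis / self.quantity if self.quantity else Decimal("0")


def holdings(positions: Iterable[Position], portfolio_id: str) -> Dict[str, Holding]:
    """Roll positions up to one Holding per security for a single portfolio."""
    qty: Dict[str, Decimal] = defaultdict(Decimal)
    cost: Dict[str, Decimal] = defaultdict(Decimal)
    for p in positions:
        if p.portfolio_id == portfolio_id:
            qty[p.security_id] += p.quantity
            cost[p.security_id] += p.cost_basis
    return {s: Holding(s, qty[s], cost[s]) for s in qty}
```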

Quick Start

Docker Development Environment

# Install uv and just
brew install uv just

# Start robosystems backend api
just start

# Start frontend apps - robosystems-app, roboledger-app, roboinvestor-app
just start apps

This initializes the .env file and starts the complete RoboSystems stack with:

Graph API with LadybugDB and DuckDB backends
Dagster for data pipeline orchestration
PostgreSQL for graph metadata, IAM and Dagster
Valkey for caching, SSE messaging, and rate limiting
OpenSearch for full-text and semantic document search
Localstack for S3 and DynamoDB emulation

Service URLs:

Service	URL
Main API	http://localhost:8000
Graph API	http://localhost:8001
Dagster UI	http://localhost:8002

Frontend apps (started with just start apps):

App	URL
RoboSystems App	http://localhost:3000
RoboLedger App	http://localhost:3001
RoboInvestor App	http://localhost:3002

Local Development

# Setup Python environment (uv automatically handles Python versions)
just init

Examples

See RoboSystems in action with runnable demos that create graphs, load data, and execute queries with the robosystems-client:

just demo-sec               # Loads NVIDIA's SEC XBRL data via Dagster pipeline
just demo-close             # Entity accounting month close demo
just demo-custom-graph      # Builds custom graph schema with relationship networks

Each demo has a corresponding Wiki article with detailed guides.

Development Commands

Testing

just test-all               # Tests plus code quality checks
just test                   # Default test suite
just test adapters          # Test specific module
just test-cov               # Tests with coverage

Log Monitoring

just logs api                 # View API logs (last 100 lines)
just logs graph-api           # View Graph API logs (last 100 lines)
just logs dagster-webserver   # View Dagster Webserver logs
just logs dagster-daemon      # View Dagster Daemon logs

See justfile for 50+ development commands including database migrations, CloudFormation linting, graph operations, administration, and more.

Prerequisites

System Requirements

Docker & Docker Compose
8GB RAM minimum
20GB free disk space

Required Tools

uv for Python package and version management
just for project command runner

Deployment Requirements

Fork this repo
AWS account with IAM Identity Center (SSO)
Run just bootstrap to configure OIDC and GitHub variables

See the Bootstrap Guide for complete instructions.

Architecture

RoboSystems is organized into the following layers:

Application Layer:

FastAPI REST API with versioned endpoints
Extension API routes feature-flagged per module
MCP Server for AI-powered graph database access with schema-aware tools
AI Agent System for autonomous financial operations with automatic credit tracking
Dagster for data pipeline orchestration and background jobs

LadybugDB Graph Database (see configuration):

Embedded columnar graph database purpose-built for financial analytics
Base + extension schema architecture — extensions define domain models
Native DuckDB integration for high-performance staging and ingestion
LanceDB vector search for semantic element resolution (IVF-PQ indexes, 384-dim embeddings)
Tiered infrastructure with configurable memory, rate limits, and subgraph allocations
Shared tier hosts public repositories with read replicas
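
IVF-PQ indexes make nearest-neighbour search over the 384-dimensional embeddings fast but approximate. Assuming cosine similarity is the configured metric, the exact computation they approximate is a top-k scan. The sketch below is a brute-force version in plain Python. It is not LanceDB's API. The check uses toy 3-dimensional vectors and hypothetical element names.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float], index: Dict[str, Sequence[float]], k: int = 5) -> List[Tuple[str, float]]:
    """Exact k-nearest neighbours by cosine similarity (the result IVF-PQ approximates)."""
    if any(len(v) != len(query) for v in index.values()):
        raise ValueError("all vectors must share the query's dimension")
    scored = ((element, cosine(query, vec)) for element, vec in index.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```

A brute-force scan costs O(n·d) per query. An IVF-PQ index instead probes a few clusters and compares compressed codes, trading a little recall for much lower latency at scale.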

Data Layer:

PostgreSQL for IAM, graph metadata, Dagster, and extension OLTP databases (schema-per-tenant)
OpenSearch for full-text and semantic document search (BM25 + KNN)
Valkey for caching, SSE messaging, and rate limiting
AWS S3 for data lake storage and static assets
DynamoDB for instance/graph/volume registry

Infrastructure:

ECS Fargate for API and Dagster
EC2 ASG for LadybugDB writer clusters
EC2 ALB + ASG for LadybugDB shared replica clusters
RDS PostgreSQL + ElastiCache Valkey
OpenSearch for full-text and semantic document search
CloudFormation infrastructure deployed via GitHub Actions with OIDC

For detailed architecture documentation, see the Architecture Overview in the Wiki.

SEC Shared Repository

A curated knowledge graph of US public company financial data from SEC EDGAR XBRL filings. Runs on the shared LadybugDB tier, accessible via MCP tools, Cypher queries, and the AI agent.

Pipeline: EDGAR → Download → Process (Parquet) → Stage (DuckDB) → Enrich (fastembed) → Materialize (LadybugDB) → Index + Embed (OpenSearch)
Graph: 14 node types and 24 relationship types modeling the full XBRL reporting hierarchy
Search: Hybrid BM25 + KNN vector search across XBRL text blocks, narrative sections, and iXBRL disclosures
Enrichment: Semantic element mapping, statement classification, and disclosure tagging via the Seattle Method taxonomy
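
Reciprocal rank fusion (RRF) is one common way to merge a lexical (BM25) ranking with a vector (KNN) ranking. The sketch below illustrates that technique. It is an assumption, not a description of how the SEC search actually scores results. The constant k=60 is the value commonly used for RRF. The document IDs in the check are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Merge ranked lists of document IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

RRF works on ranks rather than raw scores, so it never needs to reconcile BM25 scores with vector similarities that live on different scales.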

just sec-load NVDA 2025  # Load NVIDIA filings for 2025
just sec-health          # Check SEC database health

See SEC Adapter and SEC Pipeline for detailed documentation.

AI

Model Context Protocol (MCP)

Financial Analysis: Natural language queries across enterprise data and public benchmark data
Cross-Database Queries: Compare user graph data against SEC shared repository data
Tools: Rich toolkit for graph queries, schema introspection, fact discovery, financial analysis, document search, and AI memory operations
Handler Pool: Managed MCP handler instances with resource limits
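
MCP messages are JSON-RPC 2.0. A client invokes a server tool with the `tools/call` method, passing the tool name and an `arguments` object. The sketch below builds such a request and extracts the text content from a result. RoboSystems' actual tool names are not listed here; a client would normally discover them with `tools/list`. The tool name and argument keys in the check are therefore hypothetical placeholders.

```python
import itertools
import json
from typing import Any, Dict

_ids = itertools.count(1)  # JSON-RPC request ids, unique per client session


def tools_call(name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


def result_text(response: str) -> str:
    """Concatenate text content blocks from a tools/call response, raising on errors."""
    msg = json.loads(response)
    if "error" in msg:  # protocol-level JSON-RPC error
        raise RuntimeError(msg["error"].get("message", "unknown error"))
    result = msg["result"]
    blocks = [b["text"] for b in result.get("content", []) if b.get("type") == "text"]
    if result.get("isError"):  # tool-level error reported inside the result
        raise RuntimeError("\n".join(blocks) or "tool error")
    return "\n".join(blocks)
```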

Agent System

Unified architecture: stateless agents with protocol-based service injection
Dual execution: API (sync/SSE) and background worker (Valkey queue + SSE progress)
Automatic credit tracking per AI call — agents cannot forget billing
Extensible: new agents implement run(ctx) and register with a decorator
See Agent README for details
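
The agent contract described above is: implement run(ctx), register with a decorator, and leave billing to the platform. A sketch of its shape follows. Everything in it is hypothetical, including the registry, the Context fields, the LLM protocol, and the meter. Services reach the agent through ctx. The only LLM handle an agent receives records usage on every call, so no code path inside the agent can skip billing.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol, Tuple


class LLM(Protocol):
    def complete(self, prompt: str) -> Tuple[str, int]: ...  # returns (text, tokens_used)


@dataclass
class MeteredLLM:
    """Wraps an LLM so every call is recorded; agents only ever see this wrapper."""
    inner: LLM
    usage: List[int] = field(default_factory=list)

    def complete(self, prompt: str) -> Tuple[str, int]:
        text, tokens = self.inner.complete(prompt)
        self.usage.append(tokens)
        return text, tokens


@dataclass
class Context:
    graph_id: str
    llm: MeteredLLM


AGENTS: Dict[str, type] = {}


def register_agent(name: str):
    """Class decorator that adds an agent to the registry under `name`."""
    def decorator(cls):
        AGENTS[name] = cls
        return cls
    return decorator


@register_agent("summarizer")
class SummarizerAgent:
    """Stateless: everything it needs arrives via ctx."""
    def run(self, ctx: Context) -> str:
        text, _ = ctx.llm.complete(f"Summarize graph {ctx.graph_id}")
        return text


def execute(name: str, graph_id: str, llm: LLM) -> Tuple[str, int]:
    """Run a registered agent; return (output, total tokens to bill)."""
    metered = MeteredLLM(llm)
    output = AGENTS[name]().run(Context(graph_id, metered))
    return output, sum(metered.usage)
```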

Credit System

AI Operations Only: Credits are consumed exclusively by AI agent calls (Anthropic Claude via AWS Bedrock)
Token-Based Billing: Credits based on actual token usage and model cost
MCP Tool Access: No credits consumed for MCP calls or database operations
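
Under token-based billing, the credit charge for an AI call depends on the tokens consumed and the model's price. The sketch below shows that arithmetic under several assumptions:

- The price table, model key, and rates are entirely hypothetical.
- Input and output tokens are priced separately, as model pricing commonly does.
- Charges round up to a hypothetical precision of 0.01 credit.

MCP tool calls and database operations would simply not pass through this function.

```python
from decimal import Decimal, ROUND_CEILING
from typing import Dict, Tuple

# Hypothetical price table: credits per 1,000 (input, output) tokens for each model.
CREDITS_PER_1K: Dict[str, Tuple[Decimal, Decimal]] = {
    "example-model": (Decimal("0.3"), Decimal("1.5")),
}


def credits_for_call(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Credits for one AI call, rounded up to 0.01 credit (assumed precision)."""
    in_rate, out_rate = CREDITS_PER_1K[model]
    raw = (in_rate * input_tokens + out_rate * output_tokens) / 1000
    return raw.quantize(Decimal("0.01"), rounding=ROUND_CEILING)
```

For example, 2,000 input and 1,000 output tokens at these made-up rates cost (0.3 × 2000 + 1.5 × 1000) / 1000 = 2.10 credits.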

Client Libraries

RoboSystems provides comprehensive client libraries for building applications:

MCP (Model Context Protocol) Client

AI integration client for connecting Claude and other LLMs to RoboSystems.

npx -y @robosystems/mcp

Features: Claude Desktop integration, natural language queries, graph traversal, financial analysis
Use Cases: AI agents, chatbots, intelligent assistants, automated research
Documentation: npm | GitHub

TypeScript/JavaScript Client

Full-featured SDK for web and Node.js applications with TypeScript support.

npm install @robosystems/client

Features: Type-safe API calls, automatic retry logic, connection pooling, streaming support
Use Cases: Web applications, Node.js backends, React/Vue/Angular frontends
Documentation: npm | GitHub

Python Client

Native Python SDK for backend services and data science workflows.

pip install robosystems-client

Features: Async/await support, pandas integration, Jupyter compatibility, batch operations
Use Cases: Data pipelines, ML workflows, backend services, analytics
Documentation: PyPI | GitHub

Documentation

User Guides (Wiki)

Getting Started - Quick start and overview
Bootstrap Guide - Fork and deploy to your AWS account
Architecture Overview - System design and components
Data Pipeline Guide - Dagster data orchestration and custom integrations
SEC XBRL Pipeline - Working with SEC financial data
Custom Graph Demo - Guide for creating a custom schema graph demo

Developer Documentation (Codebase)

Core Services:

Adapters - External service integrations
Operations - Business workflow orchestration, CQRS reads/commands kernels for extensions
Schemas - Graph schema definitions
Extensions GraphQL - Strawberry GraphQL read surface, Pydantic auto-derivation, resolver patterns
Configuration - Configuration management
Dagster - Data pipeline and task orchestration

Database Models:

Platform Models - SQLAlchemy models for the platform database (users, orgs, graphs, billing, connections, documents)
Extensions OLTP Models - SQLAlchemy models for the extensions database (roboledger ledger, roboinvestor portfolios) with schema-per-graph tenancy
API Models - Pydantic request/response models for core platform and extensions surfaces

Graph Database System:

Graph API - Graph API overview
Client Factory - Client factory system
Core Services - Core services layer

Middleware Components:

Authentication - Authentication and authorization
Graph Routing - Graph routing layer
MCP - MCP tools and pooling
Billing - Subscription and billing management
Observability - OpenTelemetry observability
Robustness - Circuit breakers and retry policies

Infrastructure:

CloudFormation - AWS infrastructure templates
Setup Scripts - Bootstrap and configuration scripts

Development Resources:

Examples - Runnable demos and integration examples
Tests - Testing strategy and organization
Admin Tools - Administrative utilities and CLI

Security & Compliance:

SECURITY.md - Security features and compliance configuration

API Reference

Support

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
