Automated Architecture Diagram Generation from Source Code
Archmap is an AI-powered system that automatically analyzes Git repositories and generates enterprise-grade architecture diagrams. The system combines advanced static code analysis with large language models to provide deep insights into codebase structure, dependencies, and component relationships.
Archmap employs a multi-analyzer architecture that examines code from multiple perspectives:
- AST Analysis - Parse and analyze abstract syntax trees to extract classes, functions, and relationships
- Dependency Analysis - Build module dependency graphs, detect cycles, calculate centrality metrics
- Call Graph Analysis - Map function invocation patterns, identify entry points and hotspots
- Metrics Analysis - Calculate cyclomatic complexity, maintainability indices, and Halstead metrics
- Module Analysis - Examine package structure, identify features, measure cohesion
The system produces professional Mermaid flowchart diagrams with layered architecture visualization, smart component grouping, and configurable styling themes.
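As a rough illustration of the output shape, the sketch below renders components grouped by layer into Mermaid flowchart text. The `Component` dataclass and `render_mermaid` function are hypothetical names chosen for this example, not Archmap's actual formatter API, and styling directives are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical component record; not Archmap's real schema."""
    id: str
    name: str
    kind: str
    layer: str


def render_mermaid(components, relationships):
    """Render components grouped by layer, followed by labelled edges.

    relationships is an iterable of (source_id, label, target_id) tuples.
    """
    lines = ["flowchart LR"]
    # Group components by layer, preserving first-seen layer order.
    layers = {}
    for comp in components:
        layers.setdefault(comp.layer, []).append(comp)
    for layer, members in layers.items():
        lines.append(f'    subgraph {layer}["{layer.title()} Layer"]')
        for comp in members:
            lines.append(f'        {comp.id}["{comp.name} [{comp.kind}]"]')
        lines.append("    end")
    for source, label, target in relationships:
        lines.append(f"    {source} -->|{label}| {target}")
    return "\n".join(lines)


components = [
    Component("comp1", "Scheduler", "Engine", "application"),
    Component("comp5", "React Reconciler", "Core", "domain"),
]
diagram = render_mermaid(components, [("comp5", "depends", "comp1")])
print(diagram)
```

Grouping by layer before emitting nodes keeps each `subgraph` block contiguous, which is what produces the layered look in the example diagrams.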
Archmap runs its five specialized analyzers in parallel and synthesizes their results into enriched context for LLM-based architectural understanding. Combining these perspectives gives the LLM structural facts, such as dependency cycles, call hotspots, and complexity scores, that static analysis of any single kind would not provide on its own.
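One way to picture the parallel analyzer run is with `asyncio.gather`. This is a sketch under assumptions: `run_analyzers` and the two stand-in analyzers are hypothetical, and `asyncio` gives concurrency rather than true CPU parallelism, so the real orchestrator may use a different mechanism.

```python
import asyncio


async def run_analyzers(analyzers, repo_path, code_samples):
    """Run every analyzer concurrently and key the results by analyzer name.

    A failing analyzer is recorded as an error entry instead of aborting the run.
    """
    names = list(analyzers)
    outcomes = await asyncio.gather(
        *(analyzers[name](repo_path, code_samples) for name in names),
        return_exceptions=True,
    )
    results = {}
    for name, outcome in zip(names, outcomes):
        if isinstance(outcome, Exception):
            results[name] = {"error": str(outcome)}
        else:
            results[name] = outcome
    return results


# Stand-in analyzers for illustration only.
async def count_files(repo_path, code_samples):
    return {"file_count": len(code_samples)}


async def broken(repo_path, code_samples):
    raise RuntimeError("parser failed")


results = asyncio.run(
    run_analyzers({"ast": count_files, "metrics": broken}, "repo", {"a.py": "x = 1"})
)
print(results)
```

Passing `return_exceptions=True` means one analyzer failing on an unusual file does not discard the results of the others.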
Intelligent code sampling adjusts based on repository size:
- Standard repositories (< 500 files): 10 samples
- Medium repositories (500-1000 files): 20 samples
- Large repositories (1000-5000 files): 30 samples
- Very large repositories (5000+ files): 50 samples
Strategic sampling prioritizes core modules, entry points, and architecturally significant files.
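The size tiers above can be expressed as a small lookup function. `sample_count` is a hypothetical name, and the boundary handling is an assumption: the listed ranges overlap at 1000 and 5000, so this sketch assigns each boundary value to the larger tier.

```python
def sample_count(total_files: int) -> int:
    """Return how many files to sample for a repository of the given size.

    Assumption: boundary values (500, 1000, 5000) fall into the larger tier.
    """
    if total_files < 500:
        return 10
    if total_files < 1000:
        return 20
    if total_files < 5000:
        return 30
    return 50


for size in (37, 500, 1000, 6953):
    print(size, sample_count(size))
```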
Modular formatter design allows easy addition of new output formats. Current implementation supports Mermaid diagrams with GitHub/GitLab/Notion compatibility. PlantUML and Lucid formatters are planned.
- Python 3.13 or higher
- Git
- OpenAI API key or OpenRouter API key
git clone https://github.com/alexnicita/archmap.git
cd archmap
cd backend
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
# Configure your API keys in .env
uvicorn app.main:app --host 0.0.0.0 --port 8000
curl -X POST http://localhost:8000/analyze \
-H "Content-Type: application/json" \
-d '{
"repo_url": "https://github.com/facebook/react",
"output_format": "mermaid"
}'
import requests

response = requests.post(
    "http://localhost:8000/analyze",
    json={
        "repo_url": "https://github.com/facebook/react",
        "output_format": "mermaid"
    }
)
result = response.json()
diagram = result["diagram"]["content"]
Analysis of the React codebase (6,953 files) with 50 strategically sampled files:
- 14 components identified (Scheduler, Reconciler, Renderers, Compilers, Plugins)
- 9 relationships mapped (data flow, dependencies, synchronous calls)
- 3 architectural layers (Application, Domain, Infrastructure)
flowchart LR
%%{init: {'theme':'base', 'themeVariables': {'fontSize':'16px'}}}%%
subgraph application["Application Layer"]
comp1["`Scheduler
[Engine]`"]
comp2["`React Renderer
[Renderer]`"]
comp3["`React Native Renderer
[Renderer]`"]
comp4["`ReactTestRenderer
[Testing]`"]
end
subgraph domain["Domain Layer"]
comp5["`React Reconciler
[Core]`"]
comp6["`SchedulerPriorities
[Core]`"]
comp7["`BadMapPolyfill
[Core]`"]
comp8["`SchedulerFeatureFlags
[Core]`"]
comp9["`Custom Components
[Core]`"]
end
subgraph infrastructure["Infrastructure Layer"]
comp10["`Babel Plugin for React Compiler
[Compiler]`"]
comp11["`ESLint Plugin for React Hooks
[Plugin]`"]
comp12(["`Scripts Module
[CLI]`"])
comp13["`Error Codes
[Support]`"]
comp14["`Packages Module
[Support]`"]
end
comp12 ==>|orchestrates| comp10
comp2 -->|calls| comp5
comp1 -->|uses| comp6
comp1 -->|reads| comp8
comp13 -->|uses| comp5
comp11 -->|Ensures that custom | comp9
comp5 -.->|depends| comp1
comp10 -.->|depends| comp2
comp3 -.->|depends| comp2
classDef presentation fill:#f5f5f5,stroke:#333,stroke-width:2px,color:#000
classDef application fill:#e0e0e0,stroke:#333,stroke-width:2px,color:#000
classDef domain fill:#bdbdbd,stroke:#333,stroke-width:3px,color:#000
classDef infrastructure fill:#9e9e9e,stroke:#333,stroke-width:2px,color:#000
classDef external fill:#757575,stroke:#333,stroke-width:2px,color:#fff
classDef database fill:#fff9c4,stroke:#f57f17,stroke-width:3px,color:#000
class comp5 domain
class comp1 application
class comp2 application
class comp10 infrastructure
class comp11 infrastructure
class comp3 application
class comp6 domain
class comp12 infrastructure
class comp4 application
class comp7 domain
class comp8 domain
class comp9 domain
class comp13 infrastructure
class comp14 infrastructure
Analysis of Polymarket agents repository (37 files):
- 8 components (Agents, Connectors, Utils, GammaMarketClient, Documentation, Test Suite, Scripts, API Integration)
- 6 relationships (synchronous calls, dependencies, external integrations)
- 4 layers (Presentation, Application, Infrastructure, External Services)
flowchart LR
%%{init: {'theme':'base', 'themeVariables': {'fontSize':'16px'}}}%%
subgraph presentation["Presentation Layer"]
comp1["`Documentation
[Service]`"]
end
subgraph application["Application Layer"]
comp2["`Agents
[Core]`"]
comp3["`Connectors
[Core]`"]
comp4["`Utils
[Module]`"]
comp5["`GammaMarketClient
[Core]`"]
end
subgraph infrastructure["Infrastructure Layer"]
comp6["`Test Suite
[Service]`"]
comp7["`Scripts
[Service]`"]
end
subgraph external["External Services"]
comp8("`Polymarket API Integration
[External]`")
end
comp2 -->|calls| comp3
comp2 -->|Agents interact with| comp8
comp3 -->|calls| comp5
comp6 -->|Test Suite validates| comp2
comp2 -.->|depends| comp4
classDef presentation fill:#f5f5f5,stroke:#333,stroke-width:2px,color:#000
classDef application fill:#e0e0e0,stroke:#333,stroke-width:2px,color:#000
classDef domain fill:#bdbdbd,stroke:#333,stroke-width:3px,color:#000
classDef infrastructure fill:#9e9e9e,stroke:#333,stroke-width:2px,color:#000
classDef external fill:#757575,stroke:#333,stroke-width:2px,color:#fff
classDef database fill:#fff9c4,stroke:#f57f17,stroke-width:3px,color:#000
class comp2 application
class comp3 application
class comp6 infrastructure
class comp4 application
class comp8 external
class comp1 presentation
class comp7 infrastructure
class comp5 application
┌─────────────┐
│   FastAPI   │  REST API server
│   Backend   │  Request validation
└──────┬──────┘
       │
┌──────▼──────────────────────────────────┐
│          Analysis Orchestrator          │
│  ┌────────┐  ┌────────┐  ┌────────┐     │
│  │  AST   │  │  Deps  │  │  Call  │     │
│  │Analyzer│  │Analyzer│  │ Graph  │ ... │
│  └────────┘  └────────┘  └────────┘     │
└──────┬──────────────────────────────────┘
       │ Synthesized Context
┌──────▼──────┐     ┌─────────────┐
│LLM Analyzer │────▶│  OpenAI/    │
│             │     │  OpenRouter │
└──────┬──────┘     └─────────────┘
       │
┌──────▼──────────┐
│   Formatters    │
│  ┌──────────┐   │
│  │ Mermaid  │   │
│  └──────────┘   │
└─────────────────┘
The AST analyzer parses Python source code using the ast standard library module. It extracts structural information, including class definitions, function signatures, import statements, and inheritance hierarchies, and builds a map of method invocations across the codebase.
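A minimal sketch of this kind of extraction with the standard `ast` module follows. `summarize_source` and the shape of its result are illustrative assumptions, not the analyzer's actual interface.

```python
import ast


def summarize_source(source: str) -> dict:
    """Collect classes (with base names), function names, and imported modules."""
    tree = ast.parse(source)
    summary = {"classes": {}, "functions": [], "imports": []}
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            # ast.unparse turns each base expression back into source text.
            summary["classes"][node.name] = [ast.unparse(b) for b in node.bases]
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            summary["functions"].append(node.name)
        elif isinstance(node, ast.Import):
            summary["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for "from . import x"; node.level counts the dots.
            summary["imports"].append("." * node.level + (node.module or ""))
    return summary


sample = '''
import os
from .base_analyzer import BaseAnalyzer

class CustomAnalyzer(BaseAnalyzer):
    async def analyze(self, repo_path, code_samples):
        return {}
'''
summary = summarize_source(sample)
print(summary)
```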
The dependency analyzer constructs module dependency graphs using NetworkX. It detects cycles to flag circular dependencies that may indicate architectural issues, calculates centrality metrics to identify core modules, and applies topological sorting to infer architectural layers.
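The layering idea can be sketched without third-party packages. The example below deliberately swaps NetworkX for the standard library's `graphlib` to stay dependency-free, and uses a simple fan-in count as a crude stand-in for centrality. `infer_layers` and `fan_in` are hypothetical helper names.

```python
from graphlib import CycleError, TopologicalSorter


def infer_layers(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group modules into layers; the first layer has no dependencies.

    dependencies maps each module to the set of modules it imports.
    Raises graphlib.CycleError if the graph contains a circular dependency.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises CycleError when a cycle exists
    layers = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        layers.append(ready)
        sorter.done(*ready)
    return layers


def fan_in(dependencies: dict[str, set[str]]) -> dict[str, int]:
    """Count how many modules import each module."""
    counts = {module: 0 for module in dependencies}
    for imported in dependencies.values():
        for target in imported:
            counts[target] = counts.get(target, 0) + 1
    return counts


deps = {"api": {"core", "utils"}, "core": {"utils"}, "utils": set()}
print(infer_layers(deps))
print(fan_in(deps))

try:
    infer_layers({"a": {"b"}, "b": {"a"}})
except CycleError:
    print("circular dependency detected")
```

Modules that become ready in the same round depend only on earlier rounds, so each round can be read as one architectural layer.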
The call graph analyzer maps function and method call relationships throughout the codebase. It identifies entry points (functions with no callers that initiate execution flow), detects hotspots (frequently called functions that may be performance bottlenecks), traces call chains to understand execution paths, and identifies unreachable code.
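A simplified single-module sketch of entry point and hotspot detection is shown below. It tracks only direct calls by bare name between top-level functions; method calls, imports, and dynamic dispatch are ignored. `analyze_calls` is a hypothetical helper, not the analyzer's real API.

```python
import ast
from collections import Counter

_FUNCTION_NODES = (ast.FunctionDef, ast.AsyncFunctionDef)


def analyze_calls(source: str, top_n: int = 2) -> dict:
    """Find uncalled top-level functions and the most frequently called ones."""
    tree = ast.parse(source)
    functions = [node for node in tree.body if isinstance(node, _FUNCTION_NODES)]
    defined = {func.name for func in functions}
    call_counts = Counter()
    for func in functions:
        for node in ast.walk(func):
            # Count calls such as helper(); skip attribute calls like obj.helper().
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                if node.func.id in defined:
                    call_counts[node.func.id] += 1
    return {
        "entry_points": sorted(defined - set(call_counts)),
        "hotspots": [name for name, _ in call_counts.most_common(top_n)],
    }


sample = '''
def main():
    load()
    render()
    render()

def load():
    render()

def render():
    pass
'''
report = analyze_calls(sample)
print(report)
```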
The metrics analyzer calculates software quality metrics using the Radon library. It computes cyclomatic complexity (the McCabe metric) to assess code complexity and a maintainability index derived from Halstead volume, cyclomatic complexity, and lines of code, then reports how code quality is distributed across the codebase.
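To make the cyclomatic complexity metric concrete, here is an approximate count built on the standard `ast` module. It is not Radon's implementation and its rules are simplified (for example, `if` filters inside comprehensions are not counted separately), so the numbers may differ from Radon's in some cases.

```python
import ast

# Node types that each add one decision point in this simplified count.
_BRANCH_NODES = (
    ast.If, ast.For, ast.AsyncFor, ast.While,
    ast.ExceptHandler, ast.IfExp, ast.comprehension,
)


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: 1 plus the number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra paths.
            complexity += len(node.values) - 1
    return complexity


sample = '''
def classify(n):
    if n < 0 and n % 2:
        return "negative odd"
    for i in range(n):
        if i == 3:
            return "found"
    return "other"
'''
score = cyclomatic_complexity(sample)
print(score)
```

The sample has four decision points (two `if` statements, one `and`, one `for`), giving a complexity of 5.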
The module analyzer examines package structure and organization. It analyzes directory hierarchies to understand module relationships, identifies feature modules based on naming conventions and structure, calculates cohesion metrics to assess module organization quality, and determines relative module sizes and importance.
The orchestrator synthesizes results from all static analyzers and provides enriched context to the LLM:
{
"code_structure": {
"total_classes": 150,
"total_functions": 800,
"inheritance_relationships": 45,
"method_call_count": 1200
},
"dependencies": {
"module_count": 50,
"dependency_count": 180,
"has_cycles": false,
"central_modules": ["core", "scheduler", "renderer"],
"layers_detected": 4
},
"call_patterns": {
"entry_points": 12,
"hotspots": ["reconcile", "schedule", "render"],
"longest_chain_length": 8
},
"quality_metrics": {
"files_analyzed": 45,
"average_maintainability": 72.3,
"high_complexity_files": 3
},
"module_organization": {
"total_packages": 18,
"features_identified": 6
}
}
This comprehensive context enables the LLM to generate accurate, detailed architectural descriptions that go far beyond what would be possible from code samples alone.
POST /analyze: analyze a repository and generate an architecture diagram.
Request:
{
"repo_url": "https://github.com/user/repo",
"output_format": "mermaid",
"branch": "main"
}
Response:
{
"success": true,
"repository": {
"url": "https://github.com/user/repo",
"branch": "main",
"total_files": 1000,
"analyzed_files": 30,
"languages": {"py": 500, "js": 300}
},
"analysis": {
"architecture_summary": "...",
"components": [...],
"relationships": [...]
},
"diagram": {
"format": "mermaid",
"content": "flowchart LR..."
}
}
List supported output formats.
Response:
{
"formats": [
{
"name": "mermaid",
"description": "Mermaid diagram format (works in GitHub, GitLab, Notion)",
"status": "active"
},
{
"name": "lucid",
"description": "Lucid Standard Import JSON (for Lucidchart)",
"status": "coming_soon"
},
{
"name": "plantuml",
"description": "PlantUML format (enterprise standard)",
"status": "coming_soon"
}
]
}
Health check endpoint.
Response:
{
"status": "healthy"
}
Create a .env file in the backend directory:
# LLM Provider Configuration
LLM_PROVIDER=openai # or "openrouter"
# OpenAI Configuration
OPENAI_API_KEY=sk-your-openai-api-key-here
# OpenRouter Configuration (if using OpenRouter)
OPENROUTER_API_KEY=sk-or-your-openrouter-api-key-here
# Model Selection
DEFAULT_MODEL=gpt-4o
MAX_TOKENS=4000
# Application Configuration
APP_NAME=archmap
APP_VERSION=1.0.0
See .env.example for a complete template.
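For orientation, the variables above could be read and validated at startup along these lines. The `Settings` class and `load_settings` function are hypothetical, and the fallback defaults are assumptions copied from the example values; the project's own configuration code in app/core may work differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings holder mirroring the variables in .env."""
    llm_provider: str
    api_key: str
    default_model: str
    max_tokens: int


def load_settings(env=os.environ) -> Settings:
    """Read settings from an environment mapping and validate the provider."""
    provider = env.get("LLM_PROVIDER", "openai").lower()
    key_names = {"openai": "OPENAI_API_KEY", "openrouter": "OPENROUTER_API_KEY"}
    if provider not in key_names:
        raise ValueError(f"Unsupported LLM_PROVIDER: {provider!r}")
    api_key = env.get(key_names[provider], "")
    if not api_key:
        raise ValueError(f"{key_names[provider]} must be set for provider {provider!r}")
    return Settings(
        llm_provider=provider,
        api_key=api_key,
        # Defaults below are assumptions taken from the example .env values.
        default_model=env.get("DEFAULT_MODEL", "gpt-4o"),
        max_tokens=int(env.get("MAX_TOKENS", "4000")),
    )


settings = load_settings({"LLM_PROVIDER": "openrouter", "OPENROUTER_API_KEY": "sk-or-example"})
print(settings.llm_provider, settings.default_model, settings.max_tokens)
```

Failing fast on a missing key surfaces configuration mistakes at startup rather than on the first analysis request.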
archmap/
├── backend/
│   ├── app/
│   │   ├── analyzers/          # Analysis engines
│   │   │   ├── base_analyzer.py
│   │   │   ├── ast_analyzer.py
│   │   │   ├── dependency_analyzer.py
│   │   │   ├── callgraph_analyzer.py
│   │   │   ├── metrics_analyzer.py
│   │   │   ├── module_analyzer.py
│   │   │   ├── llm_analyzer.py
│   │   │   └── analysis_orchestrator.py
│   │   ├── formatters/         # Output formatters
│   │   │   ├── base_formatter.py
│   │   │   ├── mermaid_formatter.py
│   │   │   ├── plantuml_formatter.py
│   │   │   └── lucid_formatter.py
│   │   ├── scanners/           # Repository scanners
│   │   ├── models/             # Pydantic schemas
│   │   ├── core/               # Configuration
│   │   └── main.py             # FastAPI application
│   ├── requirements.txt
│   └── .env.example
└── tests/
    ├── examples/               # Example diagrams
    │   ├── react.mmd
    │   └── polymarket_agents.mmd
    └── test_enhanced_system.py
# Start the backend server
cd backend
source venv/bin/activate
uvicorn app.main:app --host 0.0.0.0 --port 8000
# In another terminal, run the test suite
python tests/test_enhanced_system.py
- Create a new analyzer class inheriting from BaseAnalyzer:
from .base_analyzer import BaseAnalyzer
from pathlib import Path
from typing import Any


class CustomAnalyzer(BaseAnalyzer):
    async def analyze(self, repo_path: Path, code_samples: dict[str, str]) -> dict[str, Any]:
        # Implement your analysis logic
        results = {}
        # ... analysis code ...
        return results

    def get_analysis_type(self) -> str:
        return "custom_analysis"
- Register the analyzer in AnalysisOrchestrator:
self.analyzers = {
    "ast": ASTAnalyzer(),
    "dependencies": DependencyAnalyzer(),
    # ... existing analyzers ...
    "custom": CustomAnalyzer(),
}
- Create a formatter class inheriting from BaseFormatter:
from .base_formatter import BaseFormatter, DetailLevel


class CustomFormatter(BaseFormatter):
    def format(self, analysis: dict) -> str:
        # Implement formatting logic
        components = self._filter_components_by_detail(analysis["components"])
        # ... formatting code that builds formatted_output from components ...
        return formatted_output

    def get_format_name(self) -> str:
        return "custom"
- Register in FormatterFactory:
if output_format == OutputFormat.CUSTOM:
    return CustomFormatter(detail_level=detail_level)
- PlantUML formatter implementation for enterprise environments
- Lucid JSON formatter for Lucidchart integration
- Enhanced multi-language support (JavaScript, TypeScript, Go, Rust, Java)
- Diagram detail level controls (minimal, standard, detailed, comprehensive)
- Interactive diagram editor with real-time updates
- Diagram comparison tools for visualizing architectural changes
- CI/CD pipeline integration for automated documentation
- Support for monorepo analysis with multiple service detection
- VSCode extension for in-editor diagram generation
- Diagram template library for common architectural patterns
- Custom analyzer plugin system for domain-specific analysis
- Real-time collaborative diagram editing
We welcome contributions from the community. Areas where contributions would be particularly valuable:
- Multi-language support - Extending analyzers to support JavaScript, TypeScript, Go, Rust, and other languages
- New formatters - Implementing PlantUML, Lucid, or other diagram format generators
- Enhanced analysis - Adding new analyzer types for specific architectural patterns or quality metrics
- Documentation - Improving documentation, adding examples, creating tutorials
- Testing - Expanding test coverage, adding test cases for edge cases
To contribute:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Please ensure your code follows the existing style and includes appropriate tests.
MIT License - see LICENSE file for details
Created by Alex Nicita
This project builds upon excellent open source tools: