This repository contains an open-source implementation of a small multi-modal neural network that incorporates double-loop learning mechanisms and integrates with external APIs for computational knowledge enhancement. The system is designed to train on consumer-grade hardware while maintaining acceptable performance and accuracy.
- Multi-Modal Architecture: Supports vision (images) and text modalities with early fusion.
- Double-Loop Learning: Implements meta-learning for structural adaptation during training (see the toy sketch after this feature list).
- API Integration Framework: Extensible framework for external knowledge sources (Wolfram Alpha, etc.).
- Consumer Hardware Optimized: Designed for single GPU systems (8-16GB VRAM, 16-32GB RAM).
- Parameter Efficient: Total parameter count kept between 100 and 500 million.
- Full Type Safety: Complete type annotations across all 23 source files, passing strict mypy analysis with zero type errors; this catches type-related bugs before runtime and improves the developer experience.
- Production Ready: Comprehensive configuration management and environment variable support.
- Hardware Acceleration: Automatic detection and support for NVIDIA GPUs (CUDA), AMD GPUs (ROCm), Apple Silicon (MPS), and NPUs (Intel AI Boost, AMD Ryzen AI, Apple Neural Engine).
- External Device Support: Detects and utilizes external GPUs (eGPU via Thunderbolt/USB-C) and external NPUs (Coral Edge TPU, Intel Movidius NCS, Hailo AI).
- Flexible Device Configuration: Auto-detection of optimal hardware or manual device selection with comprehensive fallback handling.
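Double-loop learning pairs a conventional inner loop of gradient-based weight updates with an outer loop that adapts the network's structure between rounds. The toy below is a hypothetical, self-contained illustration of that idea only; it is not the API of `src/models/double_loop_controller.py`.

```python
import torch
import torch.nn as nn

def fit(width: int, steps: int = 200) -> float:
    """Inner loop: fit weights for a fixed structure, return the final loss."""
    torch.manual_seed(0)
    x = torch.randn(256, 4)
    y = (x.sum(dim=1, keepdim=True) > 0).float()
    model = nn.Sequential(nn.Linear(4, width), nn.ReLU(), nn.Linear(width, 1))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()

# Outer loop: adapt a structural knob (hidden width) based on inner-loop outcomes
best_width = min((8, 16, 32), key=fit)
print(f"Outer loop selected hidden width {best_width}")
```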
- Python 3.10+
- GPU Support (Optional):
- NVIDIA: CUDA 12.1+ with RTX 3060 (12GB) or better
- AMD: ROCm 5.7+ with RX 6700 XT (12GB) or better
- Apple: M1/M2/M3 with Metal Performance Shaders (MPS)
- NPU Support (Optional):
- Intel AI Boost (Meteor Lake/Lunar Lake)
- AMD Ryzen AI (7040/8040 series)
- Apple Neural Engine (M1/M2/M3)
- Qualcomm Hexagon NPU (Snapdragon X)
- CPU: Works on CPU-only systems (slower training)
- Clone the repository:

  ```bash
  git clone https://github.com/tim-dickey/multi-modal-neural-network.git
  cd multi-modal-neural-network
  ```

- Create and activate virtual environment:

  ```bash
  python -m venv venv
  # Windows
  venv\Scripts\activate
  # Linux/Mac
  source venv/bin/activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- (Optional) Set up development tools:

  ```bash
  # Install pre-commit hooks for code quality
  pip install pre-commit
  pre-commit install
  # Verify installation
  make test  # Run tests
  make lint  # Check code quality
  ```

- (Optional) Install with Poetry:

  ```bash
  poetry install
  ```

- Check your hardware (detects internal and external devices):

  ```python
  # Check GPU availability (including eGPU via Thunderbolt/USB-C)
  from src.utils.gpu_utils import detect_gpu_info, print_gpu_info

  info = detect_gpu_info()
  print_gpu_info(info)  # Shows: GPU count, memory, external GPU detection, connection type
  if info['external_gpu_count'] > 0:
      print(f"External GPUs detected: {info['external_gpu_count']}")

  # Check NPU availability (including external NPUs like Coral Edge TPU)
  from src.utils.npu_utils import detect_npu_info, print_npu_info

  npu_info = detect_npu_info()
  print_npu_info(npu_info)  # Shows: NPU type, backend, internal/external status
  ```

- Configure your environment:

  ```bash
  cp configs/default.yaml configs/my_config.yaml
  # Edit my_config.yaml with your settings
  ```

- Set up environment variables (see Environment Setup)

- Run the getting started notebook:

  ```bash
  jupyter notebook notebooks/01_getting_started.ipynb
  ```

- Train the model:

  ```python
  from src.training.trainer import Trainer

  trainer = Trainer(config_path="configs/my_config.yaml")
  trainer.train()
  ```
For comprehensive documentation, see the User Guide which covers:
| Section | Description |
|---|---|
| Installation Guide | Step-by-step setup with verification commands |
| Configuration Guide | Hardware, model, training, and data configuration options |
| Hardware Detection | Automatic GPU/NPU detection with example outputs |
| Training Workflow | CLI, Python API, and Jupyter notebook training methods |
| Inference Guide | Single and batch inference with code examples |
| Development Tools | Make commands, testing, linting, and type checking |
| Troubleshooting | Common issues (CUDA, memory, imports) with solutions |
| Quick Reference | Essential commands and Python API cheat sheet |
The User Guide includes Mermaid diagrams for system architecture, training pipelines, and troubleshooting flowcharts.
| Document | Description |
|---|---|
| Software Development Best Practices | Coding standards, testing guidelines, and quality assurance practices |
| PRD Assessment | Product Requirements Document with project scope and specifications |
| Product Development Requirements | Original product development requirements and technical specifications |
Create a .env file in the project root with your API keys:
```bash
# Copy the example file
cp .env.example .env

# Edit .env with your actual keys
# WOLFRAM_API_KEY=your_wolfram_alpha_api_key_here
# OPENAI_API_KEY=your_openai_api_key_here  # For future integrations
```

**Important**: Never commit `.env` files to version control. They are automatically ignored by `.gitignore`.
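At runtime these keys can be read from the environment. A minimal sketch, assuming the python-dotenv package is available (its presence in requirements.txt is not confirmed here):

```python
import os

from dotenv import load_dotenv

load_dotenv()  # loads variables from .env in the project root
wolfram_key = os.environ.get("WOLFRAM_API_KEY")
if wolfram_key is None:
    raise RuntimeError("WOLFRAM_API_KEY is not set; copy .env.example to .env first")
```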
```text
multi-modal-neural-network/
├── README.md
├── LICENSE
├── requirements.txt
├── pyproject.toml
├── .env.example # Environment variable template
├── .gitignore # Comprehensive ignore patterns
├── configs/
│ └── default.yaml # Default configuration
├── src/
│ ├── models/ # Core model components (fully typed)
│ │ ├── multi_modal_model.py
│ │ ├── vision_encoder.py
│ │ ├── text_encoder.py
│ │ ├── fusion_layer.py
│ │ ├── double_loop_controller.py
│ │ └── heads.py
│ ├── training/ # Training infrastructure
│ │ ├── trainer.py
│ │ ├── optimizer.py
│ │ ├── losses.py
│ │ └── checkpointing.py
│ ├── data/ # Data processing pipeline
│ │ ├── dataset.py
│ │ ├── preprocessing.py
│ │ ├── augmentation.py
│ │ └── streaming.py
│ ├── integrations/ # API integration framework
│ │ ├── __init__.py
│ │ ├── wolfram_alpha.py # Wolfram Alpha integration
│ │ ├── validators.py # Response validation
│ │ └── knowledge_injection.py # Knowledge injection logic
│ ├── evaluation/ # Evaluation and benchmarking
│ │ ├── metrics.py
│ │ ├── benchmarks.py
│ │ └── api_comparison.py # API-based evaluation
│ └── utils/ # Utilities and helpers
│ ├── config.py # Configuration management
│ ├── logging.py # Logging utilities
│ ├── profiling.py # Performance profiling
│ ├── gpu_utils.py # GPU detection and configuration
│ └── npu_utils.py # NPU detection and configuration
├── notebooks/
│ ├── 01_getting_started.ipynb # Setup and basic usage
│ ├── 02_training.ipynb # Training workflows
│ └── 03_evaluation.ipynb # Evaluation and analysis
├── tests/ # Unit and integration tests
├── docs/ # Documentation
│ ├── GPU_TRAINING.md # GPU configuration guide
│ └── NPU_TRAINING.md # NPU configuration guide
└── examples/                        # Usage examples
```
```mermaid
graph TB
subgraph Input["Input Layer"]
IMG[🖼️ Image Input]
TXT[📝 Text Input]
end
subgraph Encoders["Encoders"]
VE[Vision Encoder<br/>ViT]
TE[Text Encoder<br/>BERT]
end
subgraph Fusion["Multi-Modal Fusion"]
FL[Fusion Layer<br/>Cross-Attention]
DLC[Double-Loop<br/>Controller]
end
subgraph Heads["Task Heads"]
CLS[Classification<br/>Head]
GEN[Generation<br/>Head]
RET[Retrieval<br/>Head]
end
IMG --> VE
TXT --> TE
VE --> FL
TE --> FL
FL <--> DLC
FL --> CLS
FL --> GEN
FL --> RET
subgraph External["External Knowledge"]
WA[🔗 Wolfram Alpha<br/>API]
end
DLC <-.-> WA
```
```mermaid
flowchart LR
subgraph Data["Data Pipeline"]
DS[(Dataset)]
DL[DataLoader]
AUG[Augmentation]
end
subgraph Training["Training Loop"]
FWD[Forward Pass]
LOSS[Loss Calculation]
BWD[Backward Pass]
OPT[Optimizer Step]
end
subgraph Monitoring["Monitoring"]
CKPT[Checkpointing]
LOG[Logging]
EVAL[Validation]
end
DS --> DL --> AUG --> FWD
FWD --> LOSS --> BWD --> OPT
OPT --> FWD
OPT --> CKPT
OPT --> LOG
OPT -.-> EVAL
```
```mermaid
flowchart TD
subgraph Local["Local Development"]
CODE[Write Code]
PRE[Pre-commit Hooks<br/>ruff, bandit, pytest]
TEST[Run Tests<br/>make test]
end
subgraph CI["CI/CD Pipeline"]
PUSH[Push to GitHub]
GHA[GitHub Actions]
PY311[Python 3.11]
PY312[Python 3.12]
PY313[Python 3.13]
COV[Coverage Report]
end
subgraph Review["Code Review"]
PR[Pull Request]
REV[Review]
MERGE[Merge to Main]
end
CODE --> PRE --> TEST --> PUSH
PUSH --> GHA
GHA --> PY311 & PY312 & PY313
PY311 & PY312 & PY313 --> COV
COV --> PR --> REV --> MERGE
```
The project includes a flexible API integration framework designed for external knowledge sources:
- Wolfram Alpha: Symbolic computation and mathematical verification
  - API key required: `WOLFRAM_API_KEY`
  - Used for ground truth validation and computational knowledge injection
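For orientation, the public Wolfram|Alpha Short Answers endpoint can be queried directly as sketched below; the repository's own wrapper lives in `src/integrations/wolfram_alpha.py`, and this direct call is illustrative only:

```python
import os

import requests

# Query the public Short Answers API directly (illustrative only)
resp = requests.get(
    "https://api.wolframalpha.com/v1/result",
    params={"appid": os.environ["WOLFRAM_API_KEY"], "i": "integrate x^2 dx"},
    timeout=30,
)
resp.raise_for_status()
print(resp.text)  # plain-text result string
```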
The framework is designed to easily accommodate additional APIs:
- OpenAI GPT: Text generation and reasoning augmentation
- Google PaLM: Multimodal understanding enhancement
- Hugging Face Inference: Specialized model access
- Custom APIs: Domain-specific knowledge sources
- Create a new module in `src/integrations/`:

  ```python
  # src/integrations/new_api.py
  from src.integrations.base import APIIntegration

  class NewAPIIntegration(APIIntegration):
      def __init__(self, api_key: str, config: dict):
          super().__init__(api_key, config)

      def query(self, prompt: str) -> dict:
          # Implementation here
          pass
  ```

- Add configuration to `configs/default.yaml`:

  ```yaml
  new_api:
    api_key: "${NEW_API_KEY}"
    endpoint: "https://api.example.com"
    timeout: 30
  ```

- Update environment variables in `.env.example`
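Once wired up, the integration can be used like any other; this usage sketch assumes the hypothetical `NewAPIIntegration` class and `NEW_API_KEY` variable from the steps above:

```python
import os

from src.integrations.new_api import NewAPIIntegration

api = NewAPIIntegration(api_key=os.environ["NEW_API_KEY"], config={"timeout": 30})
result = api.query("What is the speed of light in m/s?")
print(result)
```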
The model is configured via YAML files in the configs/ directory. Key parameters include:
- Model architecture (layer counts, dimensions, heads)
- Training hyperparameters (learning rates, batch sizes)
- Double-loop controller settings
- API integration configurations
- Hardware optimization settings
See configs/default.yaml for a complete example.
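These values can also be inspected programmatically; a minimal sketch assuming `load_config` returns a dict-like mapping (the keys shown mirror examples elsewhere in this README):

```python
from src.utils.config import load_config

config = load_config("configs/default.yaml")
# Keys mirror the examples in this README; the full schema is in configs/default.yaml
print(config["hardware"]["device"])
print(config["data"]["batch_size"])
```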
The system automatically detects and configures available hardware accelerators. Configure in configs/default.yaml:
```yaml
hardware:
  device: "auto"      # Auto-detect best device
  # OR specify manually:
  # device: "cuda"    # NVIDIA GPU
  # device: "mps"     # Apple Silicon
  # device: "npu"     # Neural Processing Unit
  # device: "cpu"     # CPU fallback
  gpu_id: null        # Specify GPU index for multi-GPU systems (e.g., 0, 1)
  prefer_npu: false   # Prefer NPU over GPU when both available
```

Device Options:

- `"auto"`: Automatically selects the best available device (GPU > NPU > CPU)
- `"cuda"` or `"cuda:0"`: NVIDIA GPU (specify index for multi-GPU)
- `"mps"`: Apple Silicon via Metal Performance Shaders
- `"npu"`: Generic NPU (Intel AI Boost, AMD Ryzen AI, etc.)
- `"openvino"`: Intel AI Boost via OpenVINO
- `"ryzenai"`: AMD Ryzen AI
- `"cpu"`: CPU-only mode
Hardware Detection (includes external devices):

```python
from src.utils.gpu_utils import detect_gpu_info
from src.utils.npu_utils import check_accelerator_availability, get_best_available_device

# Detect all GPUs (internal + external eGPU)
gpu_info = detect_gpu_info()
print(f"Total GPUs: {gpu_info['device_count']}")
print(f"External GPUs: {gpu_info['external_gpu_count']}")

# Check what accelerators are available
availability = check_accelerator_availability()
print(f"CUDA (NVIDIA GPU): {availability['cuda']}")
print(f"MPS (Apple Silicon): {availability['mps']}")
print(f"NPU (Internal/External): {availability['npu']}")

# Get recommended device
device = get_best_available_device(prefer_npu=False)
print(f"Recommended: {device}")
```

External Device Support:
- External GPUs (eGPU): Automatically detected via Thunderbolt 3/4, USB-C, or external PCIe
- Shows connection type and performance characteristics
- Works with all major eGPU enclosures (Razer Core, Sonnet, Akitio, etc.)
- External NPUs: Detects USB/PCIe AI accelerators
- Google Coral Edge TPU (USB/M.2/PCIe)
- Intel Movidius Neural Compute Stick 2
- Hailo-8 AI Accelerator (PCIe)
For detailed hardware setup guides:
- GPU Training: See docs/GPU_TRAINING.md - includes eGPU setup
- NPU Training: See docs/NPU_TRAINING.md - includes external NPU setup
You can assemble training/validation/test sets from multiple datasets declaratively in configs/default.yaml using the data.datasets list. Example:
```yaml
data:
  batch_size: 32
  num_workers: 4
  pin_memory: true
  datasets:
    - name: multimodal_core
      type: multimodal
      data_dir: ./data/multimodal
      splits: {train: 0.8, val: 0.1, test: 0.1}
      enabled: true
    - name: captions_aux
      type: coco_captions
      root: ./data/coco/images
      ann_file: ./data/coco/annotations/captions_train2017.json
      splits: {train: 1.0}
      use_in: [train]
      enabled: true
```

Key fields:

- `type`: One of `multimodal`, `coco_captions`, `imagenet` (mapped to internal dataset classes).
- `splits`: Mapping of split name to ratio; must sum to 1.0 (see the validation sketch below). Omit for an implicit `{train: 1.0}`.
- `use_in`: Optional restriction of which splits this dataset contributes to.
- `enabled`: Toggle inclusion without deleting the entry.
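As a quick sanity check of the splits constraint, a hypothetical validator (not part of the repository) might look like this:

```python
import math

def validate_datasets(datasets: list[dict]) -> None:
    """Check that each enabled dataset's split ratios sum to 1.0."""
    for entry in datasets:
        if not entry.get("enabled", True):
            continue  # disabled entries are skipped entirely
        splits = entry.get("splits", {"train": 1.0})  # implicit default
        total = sum(splits.values())
        if not math.isclose(total, 1.0, rel_tol=1e-9):
            raise ValueError(f"{entry['name']}: splits sum to {total}, expected 1.0")

validate_datasets([
    {"name": "multimodal_core", "splits": {"train": 0.8, "val": 0.1, "test": 0.1}},
    {"name": "captions_aux", "splits": {"train": 1.0}, "use_in": ["train"]},
])
```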
Disable a dataset:
```yaml
- name: captions_aux
  type: coco_captions
  # ...
  enabled: false
```

Programmatic usage inside notebooks or scripts:

```python
from src.utils.config import load_config
from src.data import build_dataloaders

config = load_config("configs/default.yaml")
train_loader, val_loader, test_loader = build_dataloaders(config)
print(len(train_loader), len(val_loader or []), len(test_loader or []))
```

If `data.datasets` is present, the Trainer automatically uses the selector; otherwise it falls back to the legacy single-dataset keys (`train_dataset`, `val_dataset`).
Minimum (GPU Training):
- GPU: NVIDIA RTX 3060 12GB or AMD RX 6700 XT 12GB
- CPU: 6-core / 12-thread
- RAM: 16GB
Recommended (GPU Training):
- GPU: NVIDIA RTX 4070 12GB or RTX 3080 16GB
- CPU: 8-core / 16-thread
- RAM: 32GB
CPU-Only Training:
- CPU: 8-core / 16-thread or better
- RAM: 32GB+
- Note: Training will be significantly slower (10-50x)
NPU Inference (After Training):
- NPU: Intel AI Boost, AMD Ryzen AI, Apple Neural Engine, or Qualcomm Hexagon
- RAM: 16GB+
- Note: NPUs are optimized for inference, not training. Train on GPU/CPU, then export to ONNX for NPU deployment.
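The export path from trained PyTorch model to ONNX can be sketched as follows; `TinyFusionModel` is a hypothetical stand-in for the real model in `src/models/multi_modal_model.py`, and the input names and shapes are illustrative:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the trained multi-modal model
class TinyFusionModel(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.vision = nn.Linear(3 * 224 * 224, 64)
        self.text = nn.Embedding(30522, 64)
        self.head = nn.Linear(64, 10)

    def forward(self, image: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        v = self.vision(image.flatten(1))
        t = self.text(tokens).mean(dim=1)
        return self.head(v + t)

model = TinyFusionModel().eval()
dummy_image = torch.randn(1, 3, 224, 224)
dummy_tokens = torch.randint(0, 30522, (1, 128))

# Export to ONNX for NPU runtimes (OpenVINO, Coral, Hailo, etc.)
torch.onnx.export(
    model,
    (dummy_image, dummy_tokens),
    "model.onnx",
    input_names=["image", "tokens"],
    output_names=["logits"],
    dynamic_axes={"image": {0: "batch"}, "tokens": {0: "batch"}},
    opset_version=17,
)
```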
External Device Training/Inference:
- eGPU: Any desktop GPU in Thunderbolt 3/4 or USB-C enclosure
- Thunderbolt bandwidth: 40 Gbps (expect 10-25% slower than internal)
- Supports both training and inference
- External NPU: Coral Edge TPU, Intel Movidius NCS2, Hailo-8
- USB 3.0/PCIe connection
- Inference only (export to ONNX/TFLite first)
- Ideal for prototyping edge deployments
NVIDIA GPUs (CUDA):
- RTX 40 Series: 4090, 4080, 4070 (Ada Lovelace)
- RTX 30 Series: 3090, 3080, 3070, 3060 (Ampere)
- RTX 20 Series: 2080 Ti, 2070 (Turing)
- GTX 16 Series: 1660 Ti (Turing)
- Data Center: A100, A40, V100, T4
AMD GPUs (ROCm):
- RX 7000 Series: 7900 XTX, 7900 XT (RDNA 3)
- RX 6000 Series: 6900 XT, 6800 XT, 6700 XT (RDNA 2)
- Instinct: MI250, MI100
Apple Silicon (MPS):
- M3 Max, M3 Pro, M3
- M2 Ultra, M2 Max, M2 Pro, M2
- M1 Ultra, M1 Max, M1 Pro, M1
Internal NPUs (Inference):
- Intel: AI Boost (Meteor Lake, Lunar Lake) - ~10 TOPS
- AMD: Ryzen AI (Phoenix, Hawk Point) - ~10-16 TOPS
- Apple: Neural Engine (M1/M2/M3) - up to 15.8 TOPS
- Qualcomm: Hexagon NPU (Snapdragon X Elite/Plus) - ~45 TOPS
External NPUs (Inference):
- Google Coral Edge TPU (USB/M.2/PCIe) - 4 TOPS, ~$25-75
- Intel Movidius Neural Compute Stick 2 (USB) - ~1 TOPS, ~$70-100
- Hailo-8 AI Accelerator (PCIe/M.2) - 26 TOPS, ~$200-300
External GPUs (eGPU Enclosures):
- Thunderbolt 3/4: Razer Core X, Sonnet eGFX, Akitio Node
- Compatible with any desktop GPU (NVIDIA/AMD)
- Expect 10-25% performance reduction vs internal GPU
```bash
python -m src.training.trainer --config configs/default.yaml
```

Expected training time: 100-200 hours on minimum hardware.
Run benchmarks:
```bash
python -m src.evaluation.benchmarks --config configs/default.yaml
```

Compare with API knowledge:

```bash
python -m src.evaluation.api_comparison --config configs/default.yaml
```

The codebase maintains complete type safety with comprehensive mypy integration. All 23 source files pass strict static type checking with zero type errors.
- Strict mypy Configuration: Python 3.10+ support with comprehensive type checking rules
- Complete Type Coverage: 100% type annotations across the entire codebase
- Type Stubs: Full type stub support for all major dependencies (PyTorch, Transformers, etc.)
- Protocol Usage: Proper typing protocols for interface definitions and polymorphism
- Generic Types: Extensive use of Union, Optional, Dict, and custom generic types
- Runtime Reliability: Catches type-related errors through static analysis before they surface at runtime
- Enhanced IDE Support: Full IntelliSense, autocomplete, and refactoring capabilities
- Documentation: Type annotations serve as inline documentation for function signatures
- Maintainability: Easier code maintenance and refactoring with type guarantees
- Developer Experience: Better error messages and debugging capabilities
```bash
# Check entire codebase
mypy src/ --show-error-codes

# Check specific file
mypy src/models/multi_modal_model.py --show-error-codes

# Use cache for faster subsequent runs
mypy src/ --cache-dir /tmp/mypy_cache --show-error-codes
```

Type checking is configured in `pyproject.toml` with strict settings including:

- `disallow_untyped_defs`: All functions must have type annotations
- `disallow_incomplete_defs`: All parameters must be typed
- `no_implicit_optional`: Optional types must be explicit
- `warn_return_any`: `Any` return types are flagged as warnings
- `strict_equality`: Strict type equality checking
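To make the effect concrete, here is a small illustrative file: the first function violates `disallow_untyped_defs` and would be flagged by mypy, while the others satisfy the strict settings.

```python
from typing import Optional

def scale(x):  # error: missing type annotations (disallow_untyped_defs)
    return x * 2

def scale_typed(x: int) -> int:  # fully annotated: passes strict checks
    return x * 2

def greet(name: Optional[str] = None) -> str:  # no_implicit_optional: Optional is explicit
    return f"Hello, {name or 'world'}!"
```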
Run the test suite:
```bash
# Quick test run (using make)
make test

# Run all tests with coverage
make test-cov

# Run tests with pytest directly
pytest tests/

# Run with coverage report
pytest --cov=src --cov-report=term-missing

# Run integration tests
pytest tests/test_integration.py -v
```

Test Coverage: The project maintains 93% test coverage (446 tests) across all modules.
We use automated code quality tools with pre-commit hooks:
```bash
# Install pre-commit hooks (one-time setup)
pip install pre-commit
pre-commit install

# Run all quality checks
make lint

# Format code (using make)
make format

# Manual formatting
black src/ tests/
isort src/ tests/

# Lint code
ruff check src/ tests/
flake8 src/ tests/

# Security scan
bandit -r src/
```

The project uses GitHub Actions for continuous integration:
- Multi-version testing: Python 3.11, 3.12, 3.13
- Coverage reporting: Automatic coverage reports on PRs
- Dependency caching: Fast CI builds with pip caching
We welcome contributions! Please see our Contributing Guide for details.
- Fork the repository
- Create a feature branch
- Make your changes with full type annotations
- Add tests for new functionality
- Ensure all tests pass and type checking succeeds
- Submit a pull request
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
If you use this code in your research, please cite:
```bibtex
@misc{multi-modal-neural-network,
  title={Multi-Modal Neural Network with Double-Loop Learning},
  author={Tim Dickey},
  year={2025},
  url={https://github.com/tim-dickey/multi-modal-neural-network}
}
```
- Built with PyTorch and Hugging Face Transformers
- Wolfram Alpha for symbolic computation
- Community contributors