Multi-Modal Neural Network with Double-Loop Learning

This repository contains an open-source implementation of a multi-modal small neural network that incorporates double-loop learning mechanisms and integrates with external APIs for computational knowledge enhancement. The system is designed to train on consumer-grade hardware while maintaining acceptable performance and accuracy.

Features

Multi-Modal Architecture: Supports vision (images) and text modalities with early fusion.
Double-Loop Learning: Implements meta-learning for structural adaptation during training.
API Integration Framework: Extensible framework for external knowledge sources (Wolfram Alpha, etc.).
Consumer Hardware Optimized: Designed for single GPU systems (8-16GB VRAM, 16-32GB RAM).
Parameter Efficient: Total parameters capped at 100-500 million.
Full Type Safety: Complete type annotations with mypy compliance across all 23 source files. Zero type errors with strict static analysis ensuring runtime reliability and enhanced developer experience.
Production Ready: Comprehensive configuration management and environment variable support.
Hardware Acceleration: Automatic detection and support for NVIDIA GPUs (CUDA), AMD GPUs (ROCm), Apple Silicon (MPS), and NPUs (Intel AI Boost, AMD Ryzen AI, Apple Neural Engine).
External Device Support: Detects and utilizes external GPUs (eGPU via Thunderbolt/USB-C) and external NPUs (Coral Edge TPU, Intel Movidius NCS, Hailo AI).
Flexible Device Configuration: Auto-detection of optimal hardware or manual device selection with comprehensive fallback handling.

Installation

Prerequisites

Python 3.10+
GPU Support (Optional):
- NVIDIA: CUDA 12.1+ with RTX 3060 (12GB) or better
- AMD: ROCm 5.7+ with RX 6700 XT (12GB) or better
- Apple: M1/M2/M3 with Metal Performance Shaders (MPS)
NPU Support (Optional):
- Intel AI Boost (Meteor Lake/Lunar Lake)
- AMD Ryzen AI (7040/8040 series)
- Apple Neural Engine (M1/M2/M3)
- Qualcomm Hexagon NPU (Snapdragon X)
CPU: Works on CPU-only systems (slower training)

Setup

Clone the repository:

git clone https://github.com/tim-dickey/multi-modal-neural-network.git
cd multi-modal-neural-network

Create and activate virtual environment:

python -m venv venv
# Windows
venv\Scripts\activate
# Linux/Mac
source venv/bin/activate

Install dependencies:
```
pip install -r requirements.txt
```

(Optional) Set up development tools:

# Install pre-commit hooks for code quality
pip install pre-commit
pre-commit install

# Verify installation
make test  # Run tests
make lint  # Check code quality

(Optional) Install with Poetry:
```
poetry install
```

Quick Start

Check your hardware (detects internal and external devices):

# Check GPU availability (including eGPU via Thunderbolt/USB-C)
from src.utils.gpu_utils import detect_gpu_info, print_gpu_info
info = detect_gpu_info()
print_gpu_info(info)

# Shows: GPU count, memory, external GPU detection, connection type
if info['external_gpu_count'] > 0:
    print(f"External GPUs detected: {info['external_gpu_count']}")

# Check NPU availability (including external NPUs like Coral Edge TPU)
from src.utils.npu_utils import detect_npu_info, print_npu_info
npu_info = detect_npu_info()
print_npu_info(npu_info)

# Shows: NPU type, backend, internal/external status

Configure your environment:

cp configs/default.yaml configs/my_config.yaml
# Edit my_config.yaml with your settings

Set up environment variables (see Environment Setup)

Run the getting started notebook:

jupyter notebook notebooks/01_getting_started.ipynb

Train the model:

from src.training.trainer import Trainer
trainer = Trainer(config_path="configs/my_config.yaml")
trainer.train()

📖 User Guide

For comprehensive documentation, see the User Guide which covers:

Section	Description
Installation Guide	Step-by-step setup with verification commands
Configuration Guide	Hardware, model, training, and data configuration options
Hardware Detection	Automatic GPU/NPU detection with example outputs
Training Workflow	CLI, Python API, and Jupyter notebook training methods
Inference Guide	Single and batch inference with code examples
Development Tools	Make commands, testing, linting, and type checking
Troubleshooting	Common issues (CUDA, memory, imports) with solutions
Quick Reference	Essential commands and Python API cheat sheet

The User Guide includes Mermaid diagrams for system architecture, training pipelines, and troubleshooting flowcharts.

📚 Additional Documentation

Document	Description
Software Development Best Practices	Coding standards, testing guidelines, and quality assurance practices
PRD Assessment	Product Requirements Document with project scope and specifications
Product Development Requirements	Original product development requirements and technical specifications

Environment Setup

Create a .env file in the project root with your API keys:

# Copy the example file
cp .env.example .env

# Edit .env with your actual keys
# WOLFRAM_API_KEY=your_wolfram_alpha_api_key_here
# OPENAI_API_KEY=your_openai_api_key_here  # For future integrations

Important: Never commit .env files to version control. They are automatically ignored by .gitignore.

Project Structure

multi-modal-neural-network/
├── README.md
├── LICENSE
├── requirements.txt
├── pyproject.toml
├── .env.example                    # Environment variable template
├── .gitignore                      # Comprehensive ignore patterns
├── configs/
│   └── default.yaml               # Default configuration
├── src/
│   ├── models/                    # Core model components (fully typed)
│   │   ├── multi_modal_model.py
│   │   ├── vision_encoder.py
│   │   ├── text_encoder.py
│   │   ├── fusion_layer.py
│   │   ├── double_loop_controller.py
│   │   └── heads.py
│   ├── training/                  # Training infrastructure
│   │   ├── trainer.py
│   │   ├── optimizer.py
│   │   ├── losses.py
│   │   └── checkpointing.py
│   ├── data/                      # Data processing pipeline
│   │   ├── dataset.py
│   │   ├── preprocessing.py
│   │   ├── augmentation.py
│   │   └── streaming.py
│   ├── integrations/              # API integration framework
│   │   ├── __init__.py
│   │   ├── wolfram_alpha.py       # Wolfram Alpha integration
│   │   ├── validators.py          # Response validation
│   │   └── knowledge_injection.py # Knowledge injection logic
│   ├── evaluation/                # Evaluation and benchmarking
│   │   ├── metrics.py
│   │   ├── benchmarks.py
│   │   └── api_comparison.py      # API-based evaluation
│   └── utils/                     # Utilities and helpers
│       ├── config.py              # Configuration management
│       ├── logging.py             # Logging utilities
│       ├── profiling.py           # Performance profiling
│       ├── gpu_utils.py           # GPU detection and configuration
│       └── npu_utils.py           # NPU detection and configuration
├── notebooks/
│   ├── 01_getting_started.ipynb   # Setup and basic usage
│   ├── 02_training.ipynb          # Training workflows
│   └── 03_evaluation.ipynb        # Evaluation and analysis
├── tests/                         # Unit and integration tests
├── docs/                          # Documentation
│   ├── GPU_TRAINING.md            # GPU configuration guide
│   └── NPU_TRAINING.md            # NPU configuration guide
└── examples/                      # Usage examples

Architecture Overview

Model Architecture

graph TB
    subgraph Input["Input Layer"]
        IMG[🖼️ Image Input]
        TXT[📝 Text Input]
    end
    
    subgraph Encoders["Encoders"]
        VE[Vision Encoder<br/>ViT]
        TE[Text Encoder<br/>BERT]
    end
    
    subgraph Fusion["Multi-Modal Fusion"]
        FL[Fusion Layer<br/>Cross-Attention]
        DLC[Double-Loop<br/>Controller]
    end
    
    subgraph Heads["Task Heads"]
        CLS[Classification<br/>Head]
        GEN[Generation<br/>Head]
        RET[Retrieval<br/>Head]
    end
    
    IMG --> VE
    TXT --> TE
    VE --> FL
    TE --> FL
    FL <--> DLC
    FL --> CLS
    FL --> GEN
    FL --> RET
    
    subgraph External["External Knowledge"]
        WA[🔗 Wolfram Alpha<br/>API]
    end
    
    DLC <-.-> WA

Training Pipeline

flowchart LR
    subgraph Data["Data Pipeline"]
        DS[(Dataset)]
        DL[DataLoader]
        AUG[Augmentation]
    end
    
    subgraph Training["Training Loop"]
        FWD[Forward Pass]
        LOSS[Loss Calculation]
        BWD[Backward Pass]
        OPT[Optimizer Step]
    end
    
    subgraph Monitoring["Monitoring"]
        CKPT[Checkpointing]
        LOG[Logging]
        EVAL[Validation]
    end
    
    DS --> DL --> AUG --> FWD
    FWD --> LOSS --> BWD --> OPT
    OPT --> FWD
    
    OPT --> CKPT
    OPT --> LOG
    OPT -.-> EVAL

Development Workflow

flowchart TD
    subgraph Local["Local Development"]
        CODE[Write Code]
        PRE[Pre-commit Hooks<br/>ruff, bandit, pytest]
        TEST[Run Tests<br/>make test]
    end
    
    subgraph CI["CI/CD Pipeline"]
        PUSH[Push to GitHub]
        GHA[GitHub Actions]
        PY311[Python 3.11]
        PY312[Python 3.12]
        PY313[Python 3.13]
        COV[Coverage Report]
    end
    
    subgraph Review["Code Review"]
        PR[Pull Request]
        REV[Review]
        MERGE[Merge to Main]
    end
    
    CODE --> PRE --> TEST --> PUSH
    PUSH --> GHA
    GHA --> PY311 & PY312 & PY313
    PY311 & PY312 & PY313 --> COV
    COV --> PR --> REV --> MERGE

API Integration Framework

The project includes a flexible API integration framework designed for external knowledge sources:

Current Integrations

Wolfram Alpha: Symbolic computation and mathematical verification
- API key required: WOLFRAM_API_KEY
- Used for ground truth validation and computational knowledge injection

Future Integrations

The framework is designed to easily accommodate additional APIs:

OpenAI GPT: Text generation and reasoning augmentation
Google PaLM: Multimodal understanding enhancement
Hugging Face Inference: Specialized model access
Custom APIs: Domain-specific knowledge sources

Adding New API Integrations

Create a new module in src/integrations/:

# src/integrations/new_api.py
from src.integrations.base import APIIntegration

class NewAPIIntegration(APIIntegration):
    def __init__(self, api_key: str, config: dict):
        super().__init__(api_key, config)

    def query(self, prompt: str) -> dict:
        # Implementation here
        pass

Add configuration to configs/default.yaml:

new_api:
  api_key: "${NEW_API_KEY}"
  endpoint: "https://api.example.com"
  timeout: 30

Update environment variables in .env.example

Configuration

The model is configured via YAML files in the configs/ directory. Key parameters include:

Model architecture (layer counts, dimensions, heads)
Training hyperparameters (learning rates, batch sizes)
Double-loop controller settings
API integration configurations
Hardware optimization settings

See configs/default.yaml for a complete example.

Hardware Configuration

The system automatically detects and configures available hardware accelerators. Configure in configs/default.yaml:

hardware:
  device: "auto"        # Auto-detect best device
  # OR specify manually:
  # device: "cuda"      # NVIDIA GPU
  # device: "mps"       # Apple Silicon
  # device: "npu"       # Neural Processing Unit
  # device: "cpu"       # CPU fallback
  
  gpu_id: null          # Specify GPU index for multi-GPU systems (e.g., 0, 1)
  prefer_npu: false     # Prefer NPU over GPU when both available

Device Options:

"auto": Automatically selects the best available device (GPU > NPU > CPU)
"cuda" or "cuda:0": NVIDIA GPU (specify index for multi-GPU)
"mps": Apple Silicon Neural Engine
"npu": Generic NPU (Intel AI Boost, AMD Ryzen AI, etc.)
"openvino": Intel AI Boost via OpenVINO
"ryzenai": AMD Ryzen AI
"cpu": CPU-only mode

Hardware Detection (includes external devices):

from src.utils.gpu_utils import detect_gpu_info
from src.utils.npu_utils import check_accelerator_availability, get_best_available_device

# Detect all GPUs (internal + external eGPU)
gpu_info = detect_gpu_info()
print(f"Total GPUs: {gpu_info['device_count']}")
print(f"External GPUs: {gpu_info['external_gpu_count']}")

# Check what accelerators are available
availability = check_accelerator_availability()
print(f"CUDA (NVIDIA GPU): {availability['cuda']}")
print(f"MPS (Apple Silicon): {availability['mps']}")
print(f"NPU (Internal/External): {availability['npu']}")

# Get recommended device
device = get_best_available_device(prefer_npu=False)
print(f"Recommended: {device}")

External Device Support:

External GPUs (eGPU): Automatically detected via Thunderbolt 3/4, USB-C, or external PCIe
- Shows connection type and performance characteristics
- Works with all major eGPU enclosures (Razer Core, Sonnet, Akitio, etc.)
External NPUs: Detects USB/PCIe AI accelerators
- Google Coral Edge TPU (USB/M.2/PCIe)
- Intel Movidius Neural Compute Stick 2
- Hailo-8 AI Accelerator (PCIe)

For detailed hardware setup guides:

GPU Training: See docs/GPU_TRAINING.md - includes eGPU setup
NPU Training: See docs/NPU_TRAINING.md - includes external NPU setup

Dataset Selection

You can assemble training/validation/test sets from multiple datasets declaratively in configs/default.yaml using the data.datasets list. Example:

data:
  batch_size: 32
  num_workers: 4
  pin_memory: true
  datasets:
    - name: multimodal_core
      type: multimodal
      data_dir: ./data/multimodal
      splits: {train: 0.8, val: 0.1, test: 0.1}
      enabled: true
    - name: captions_aux
      type: coco_captions
      root: ./data/coco/images
      ann_file: ./data/coco/annotations/captions_train2017.json
      splits: {train: 1.0}
      use_in: [train]
      enabled: true

Key fields:

type: One of multimodal, coco_captions, imagenet (mapped to internal dataset classes).
splits: Mapping of split name to ratio; must sum to 1.0. Omit for implicit {train: 1.0}.
use_in: Optional restriction of which splits this dataset contributes to.
enabled: Toggle inclusion without deleting entry.

Disable a dataset:

    - name: captions_aux
      type: coco_captions
      # ...
      enabled: false

Programmatic usage inside notebooks or scripts:

from src.utils.config import load_config
from src.data import build_dataloaders

config = load_config("configs/default.yaml")
train_loader, val_loader, test_loader = build_dataloaders(config)
print(len(train_loader), len(val_loader or []), len(test_loader or []))

If data.datasets is present, the Trainer automatically uses the selector; otherwise it falls back to legacy single-dataset keys (train_dataset, val_dataset).

Training

Hardware Requirements

Minimum (GPU Training):

GPU: NVIDIA RTX 3060 12GB or AMD RX 6700 XT 12GB
CPU: 6-core / 12-thread
RAM: 16GB

Recommended (GPU Training):

GPU: NVIDIA RTX 4070 12GB or RTX 3080 16GB
CPU: 8-core / 16-thread
RAM: 32GB

CPU-Only Training:

CPU: 8-core / 16-thread or better
RAM: 32GB+
Note: Training will be significantly slower (10-50x)

NPU Inference (After Training):

NPU: Intel AI Boost, AMD Ryzen AI, Apple Neural Engine, or Qualcomm Hexagon
RAM: 16GB+
Note: NPUs are optimized for inference, not training. Train on GPU/CPU, then export to ONNX for NPU deployment.

External Device Training/Inference:

eGPU: Any desktop GPU in Thunderbolt 3/4 or USB-C enclosure
- Thunderbolt bandwidth: 40 Gbps (expect 10-25% slower than internal)
- Supports both training and inference
External NPU: Coral Edge TPU, Intel Movidius NCS2, Hailo-8
- USB 3.0/PCIe connection
- Inference only (export to ONNX/TFLite first)
- Ideal for prototyping edge deployments

Supported Hardware

NVIDIA GPUs (CUDA):

RTX 40 Series: 4090, 4080, 4070 (Ada Lovelace)
RTX 30 Series: 3090, 3080, 3070, 3060 (Ampere)
RTX 20 Series: 2080 Ti, 2070 (Turing)
GTX 16 Series: 1660 Ti (Turing)
Data Center: A100, A40, V100, T4

AMD GPUs (ROCm):

RX 7000 Series: 7900 XTX, 7900 XT (RDNA 3)
RX 6000 Series: 6900 XT, 6800 XT, 6700 XT (RDNA 2)
Instinct: MI250, MI100

Apple Silicon (MPS):

M3 Max, M3 Pro, M3
M2 Ultra, M2 Max, M2 Pro, M2
M1 Ultra, M1 Max, M1 Pro, M1

Internal NPUs (Inference):

Intel: AI Boost (Meteor Lake, Lunar Lake) - ~10 TOPS
AMD: Ryzen AI (Phoenix, Hawk Point) - ~10-16 TOPS
Apple: Neural Engine (M1/M2/M3) - up to 15.8 TOPS
Qualcomm: Hexagon NPU (Snapdragon X Elite/Plus) - ~45 TOPS

External NPUs (Inference):

Google Coral Edge TPU (USB/M.2/PCIe) - 4 TOPS, ~$25-75
Intel Movidius Neural Compute Stick 2 (USB) - ~1 TOPS, ~$70-100
Hailo-8 AI Accelerator (PCIe/M.2) - 26 TOPS, ~$200-300

External GPUs (eGPU Enclosures):

Thunderbolt 3/4: Razer Core X, Sonnet eGFX, Akitio Node
Compatible with any desktop GPU (NVIDIA/AMD)
Expect 10-25% performance reduction vs internal GPU

Training Command

python -m src.training.trainer --config configs/default.yaml

Expected training time: 100-200 hours on minimum hardware.

Evaluation

Run benchmarks:

python -m src.evaluation.benchmarks --config configs/default.yaml

Compare with API knowledge:

python -m src.evaluation.api_comparison --config configs/default.yaml

Development

Type Safety

The codebase maintains complete type safety with comprehensive mypy integration. All 23 source files pass strict static type checking with zero type errors.

Type Checking Features

Strict mypy Configuration: Python 3.10+ support with comprehensive type checking rules
Complete Type Coverage: 100% type annotations across the entire codebase
Type Stubs: Full type stub support for all major dependencies (PyTorch, Transformers, etc.)
Protocol Usage: Proper typing protocols for interface definitions and polymorphism
Generic Types: Extensive use of Union, Optional, Dict, and custom generic types

Type Safety Benefits

Runtime Reliability: Prevents type-related runtime errors through static analysis
Enhanced IDE Support: Full IntelliSense, autocomplete, and refactoring capabilities
Documentation: Type annotations serve as inline documentation for function signatures
Maintainability: Easier code maintenance and refactoring with type guarantees
Developer Experience: Better error messages and debugging capabilities

Running Type Checks

# Check entire codebase
mypy src/ --show-error-codes

# Check specific file
mypy src/models/multi_modal_model.py --show-error-codes

# Use cache for faster subsequent runs
mypy src/ --cache-dir /tmp/mypy_cache --show-error-codes

Type Checking Configuration

The type checking is configured in pyproject.toml with strict settings including:

disallow_untyped_defs: All functions must have type annotations
disallow_incomplete_defs: All parameters must be typed
no_implicit_optional: Optional types must be explicit
warn_return_any: Any return types are flagged as warnings
strict_equality: Strict type equality checking

Testing

Run the test suite:

# Quick test run (using make)
make test

# Run all tests with coverage
make test-cov

# Run tests with pytest directly
pytest tests/

# Run with coverage report
pytest --cov=src --cov-report=term-missing

# Run integration tests
pytest tests/test_integration.py -v

Test Coverage: The project maintains 93% test coverage (446 tests) across all modules.

Code Quality

We use automated code quality tools with pre-commit hooks:

# Install pre-commit hooks (one-time setup)
pip install pre-commit
pre-commit install

# Run all quality checks
make lint

# Format code (using make)
make format

# Manual formatting
black src/ tests/
isort src/ tests/

# Lint code
ruff check src/ tests/
flake8 src/ tests/

# Security scan
bandit -r src/

CI/CD

The project uses GitHub Actions for continuous integration:

Multi-version testing: Python 3.11, 3.12, 3.13
Coverage reporting: Automatic coverage reports on PRs
Dependency caching: Fast CI builds with pip caching

Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Setup

Fork the repository
Create a feature branch
Make your changes with full type annotations
Add tests for new functionality
Ensure all tests pass and type checking succeeds
Submit a pull request

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Citation

If you use this code in your research, please cite:

@misc{multi-modal-neural-network,
  title={Multi-Modal Neural Network with Double-Loop Learning},
  author={Tim Dickey},
  year={2025},
  url={https://github.com/tim-dickey/multi-modal-neural-network}
}

Acknowledgments

Built with PyTorch and Hugging Face Transformers
Wolfram Alpha for symbolic computation
Community contributors

Name		Name	Last commit message	Last commit date
Latest commit History 225 Commits
.codacy-cli-v2/+/1.0.0-main.362.sha.6b54110		.codacy-cli-v2/+/1.0.0-main.362.sha.6b54110
.codacy		.codacy
.github		.github
configs		configs
docs		docs
notebooks		notebooks
scripts		scripts
src		src
tests		tests
.coveragerc		.coveragerc
.env.example		.env.example
.flake8		.flake8
.gh_pr_update_body.md		.gh_pr_update_body.md
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CONTRIBUTING.md		CONTRIBUTING.md
IMPLEMENTATION_SUMMARY.md		IMPLEMENTATION_SUMMARY.md
LICENSE		LICENSE
Makefile		Makefile
Open-source multi-modal small neural network v1.md		Open-source multi-modal small neural network v1.md
README.md		README.md
Software development best practices.md		Software development best practices.md
TRAINING_GUIDE.md		TRAINING_GUIDE.md
inference.py		inference.py
pip_packages.txt		pip_packages.txt
pyproject.toml		pyproject.toml
pytest.ini		pytest.ini
remove_trailing_whitespace.ps1		remove_trailing_whitespace.ps1
requirements.txt		requirements.txt
run_tests.ps1		run_tests.ps1
run_tests.sh		run_tests.sh
train.py		train.py

Folders and files

Latest commit

History

Repository files navigation

Multi-Modal Neural Network with Double-Loop Learning

Features

Installation

Prerequisites

Setup

Quick Start

📖 User Guide

📚 Additional Documentation

Environment Setup

Project Structure

Architecture Overview

Model Architecture

Training Pipeline

Development Workflow

API Integration Framework

Current Integrations

Future Integrations

Adding New API Integrations

Configuration

Hardware Configuration

Dataset Selection

Training

Hardware Requirements

Supported Hardware

Training Command

Evaluation

Development

Type Safety

Type Checking Features

Type Safety Benefits

Running Type Checks

Type Checking Configuration

Testing

Code Quality

CI/CD

Contributing

Development Setup

License

Citation

Acknowledgments

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Uh oh!

Contributors

Uh oh!

Languages