Language: 🇺🇸 English | 🇯🇵 日本語
A modern C++17 machine learning library designed for neural network training and model persistence with comprehensive testing and performance monitoring.
- 🧠 Neural Networks: Sequential models with customizable architectures
- 📊 Layers: Dense (fully connected), ReLU, Sigmoid, Tanh activation functions
- 🎯 Training: MSE loss function with SGD optimizer
- 🎯 Multi-GPU Support: NVIDIA CUDA, AMD ROCm, Intel oneAPI, Apple Metal with automatic detection
- 💾 Model I/O: Save/load models in binary, JSON, and config formats
- 📁 Auto Directory: Automatic directory creation with mkdir -p functionality
- 🔧 Type Safety: Enum-based format specification for improved reliability
- ⚡ Performance: Optimized C++17 implementation with NDArray backend
- 🧪 Testing: Comprehensive unit (76/76) and integration tests (3429/3429 assertions)
- 🖥️ GPU Testing: 145 GPU-specific assertions with fallback validation
- 🔄 Cross-platform: Linux, macOS, Windows support
- 📊 Benchmarking: Real-time performance metrics and execution time tracking
- 🎯 CI/CD Ready: 100% test success rate for production deployments
- C++17 compatible compiler (GCC 7+, Clang 5+, MSVC 2017+)
- Make
git clone https://github.com/shadowlink0122/CppML.git
cd CppML
make # Build with automatic dependency management
make test # Run all tests (unit + integration)
# Alternative build options:
make setup-deps # Download dependencies only
make minimal # Build without JSON loading (no dependencies)
make json-support # Build with full JSON I/O support
make deps-check # Check dependency status
make unit-test # Run unit tests (76/76 passing)
make integration-test # Run integration tests (3429/3429 assertions)
make simple-integration-test # Run simple integration tests
# CI-optimized testing (for faster execution)
make build-tests # Build test executables only (for CI artifacts)
make unit-test-run-only # Run unit tests with pre-built executables
make integration-test-run-only # Run integration tests with pre-built executables
make samples # Build all sample programs
make xor # Run XOR neural network example
make device-detection # Run GPU device detection sample
make gpu-vendor-detection # Run GPU vendor detection sample
MLLib automatically manages dependencies across all platforms:
git clone https://github.com/shadowlink0122/CppML.git
cd CppML
make # Automatically downloads dependencies and builds
make test # Full test suite
- Linux: Requires curl or wget (usually pre-installed)
- macOS: Uses built-in curl (no additional setup)
- Windows: Works with Git Bash, MSYS2, or WSL
# Download dependencies manually:
tools/setup-deps.sh # Cross-platform setup script
# Or minimal build without dependencies:
make minimal # JSON saving supported, loading disabled
See Dependencies Setup Guide for detailed instructions.
#include "MLLib.hpp"
using namespace MLLib;
// Create XOR neural network
model::Sequential model;
// Add layers
model.add(std::make_shared<layer::Dense>(2, 4));
model.add(std::make_shared<layer::activation::ReLU>());
model.add(std::make_shared<layer::Dense>(4, 1));
model.add(std::make_shared<layer::activation::Sigmoid>());
// Training data
std::vector<std::vector<double>> X = {{0,0}, {0,1}, {1,0}, {1,1}};
std::vector<std::vector<double>> Y = {{0}, {1}, {1}, {0}};
// Train model
loss::MSE loss;
optimizer::SGD optimizer(0.1);
model.train(X, Y, loss, optimizer, [](int epoch, double loss) {
printf("Epoch %d, Loss: %f\n", epoch, loss);
}, 1000);
// Make predictions (multiple syntax options supported)
auto pred1 = model.predict(std::vector<double>{1.0, 0.0}); // Vector syntax
auto pred2 = model.predict({1.0, 0.0}); // Initializer list syntax (C++17)
// Save model
model::ModelIO::save_model(model, "model.bin", model::ModelFormat::BINARY);
MLLib supports unified model serialization through GenericModelIO:
using namespace MLLib;
using namespace MLLib::model;
// GenericModelIO unified interface (type-safe)
GenericModelIO::save_model(*model, "model.bin", SaveFormat::BINARY);
GenericModelIO::save_model(*model, "model.json", SaveFormat::JSON);
GenericModelIO::save_model(*model, "model.config", SaveFormat::CONFIG);
// Automatic directory creation support
GenericModelIO::save_model(*model, "models/trained/my_model.bin", SaveFormat::BINARY);
// Type-safe template loading
auto sequential = GenericModelIO::load_model<Sequential>("model.bin", SaveFormat::BINARY);
auto autoencoder = GenericModelIO::load_model<autoencoder::DenseAutoencoder>("model.bin", SaveFormat::BINARY);
// Prediction accuracy is completely preserved (1e-10 level)
auto original_pred = model.predict({0.5, 0.5});
auto loaded_pred = sequential->predict({0.5, 0.5});
// original_pred ≈ loaded_pred (1e-10 precision)
- Sequential: Supports 8 activation functions (ReLU, Sigmoid, Tanh, LeakyReLU, ELU, Swish, GELU, Softmax)
- DenseAutoencoder: Encoder-decoder architecture
- Large-scale models: Tested up to 2048×2048 (4.2M parameters, 32MB)
- BINARY: Fast & compact (save 65ms, load 163ms)
- JSON: Human-readable & debug-friendly
- CONFIG: Architecture information only
// Automatically creates nested directories (like mkdir -p)
GenericModelIO::save_model(*model, "models/experiment_1/epoch_100.bin",
SaveFormat::BINARY);
MLLib includes a comprehensive testing framework with performance monitoring:
- Unit Tests: 76/76 tests passing across all components
- Integration Tests: End-to-end workflows and complex scenarios
- Performance Tests: Execution time monitoring and benchmarking
- Error Handling: Comprehensive error condition testing
# Unit tests with execution time monitoring
make unit-test
# Sample output:
# Running test: ConfigConstantsTest
# ✅ ConfigConstantsTest PASSED (10 assertions, 0.03ms)
# Running test: NDArrayMatmulTest
# ✅ NDArrayMatmulTest PASSED (11 assertions, 0.02ms)
#
# Total test execution time: 0.45ms
# Total suite time (including overhead): 0.89ms
- Unit Tests (76/76): Config, NDArray, Dense Layer, activation functions, Sequential Model, Model I/O
- Integration Tests (3429/3429 assertions): XOR problem learning and prediction accuracy validation
- Simple Integration Tests: Basic functionality verification
make integration-test # Comprehensive integration tests (3429/3429 assertions)
make simple-integration-test # Simple integration tests (basic functionality)
# Comprehensive Integration Test Coverage:
# 🎯 XOR Model Tests: Basic functionality + learning convergence (CI-safe)
# 🔧 Optimizer Integration: SGD + Adam fallback testing
# 📊 Loss Function Integration: MSE + CrossEntropy validation
# 💻 Backend Integration: CPU backend comprehensive testing (601 assertions)
# 🔗 Layer Integration: Dense layers + activation functions
# 🛠️ Utility Integration: Matrix, Random, Validation utilities (504 assertions)
# 📱 Device Integration: CPU device operations (2039 assertions)
# 📁 Data Integration: Loading, batching, validation (157 assertions)
# ⚡ Performance Testing: Stability + execution time monitoring
# Test Results Summary:
# ✅ 100% CI success rate (3429/3429 assertions passing)
# ✅ All tests deterministic and CI-ready
# ✅ Comprehensive component integration coverage
# Run all tests
make test # Complete test suite (Unit + Integration)
make unit-test # Run unit tests (76/76 passing)
make integration-test # Comprehensive integration tests (3429/3429 assertions)
make simple-integration-test # Simple integration tests (basic functionality)
# Test output includes execution time monitoring:
# ✅ BasicXORModelTest PASSED (5 assertions, 0.17ms)
# ✅ AdamOptimizerIntegrationTest PASSED (10 assertions, 1.04ms)
# ✅ BackendPerformanceIntegrationTest PASSED (551 assertions, 43.54ms)
# 🎉 ALL INTEGRATION TESTS PASSED! (3429/3429 assertions, 100% success rate)
# Format code
make fmt
# Check formatting
make fmt-check
# Run linting
make lint
# Static analysis
make check
# All quality checks
make lint-all
# Build library
make
# Debug build
make debug
# Build and run samples
make samples
make run-sample SAMPLE=xor # Run specific sample using generic runner
make xor # XOR neural network (alias)
make device-detection # GPU device detection (alias)
make gpu-vendor-detection # GPU vendor detection (alias)
# Clean (removes training outputs)
make clean
make install-tools
MLLib provides comprehensive multi-GPU vendor support with automatic detection and fallback:
- 🌍 Multi-Vendor: NVIDIA CUDA, AMD ROCm, Intel oneAPI, Apple Metal
- 🔄 Auto Detection: Runtime GPU detection and selection
- 💪 Default Support: All GPU vendors enabled by default (library design)
- 🖥️ CPU Fallback: Seamless CPU execution when GPU unavailable
- ⚠️ Smart Warnings: Informative messages about GPU status
- 🧪 Full Testing: 145 GPU assertions across unit and integration tests
| Vendor | API | Hardware Support |
|---|---|---|
| NVIDIA | CUDA, cuBLAS | GeForce, Quadro, Tesla, RTX |
| AMD | ROCm, HIP, hipBLAS | Radeon Instinct, Radeon Pro |
| Intel | oneAPI, SYCL, oneMKL | Arc, Iris Xe, UHD Graphics |
| Apple | Metal, MPS | M1, M1 Pro/Max/Ultra, M2 |
# Default build (all GPU vendors enabled)
make
# Disable specific GPU support
make DISABLE_CUDA=1 # Disable NVIDIA CUDA
make DISABLE_ROCM=1 # Disable AMD ROCm
make DISABLE_ONEAPI=1 # Disable Intel oneAPI
make DISABLE_METAL=1 # Disable Apple Metal
# CPU-only build
make DISABLE_CUDA=1 DISABLE_ROCM=1 DISABLE_ONEAPI=1 DISABLE_METAL=1
# GPU testing with environment control
FORCE_CPU_ONLY=1 make test # Force CPU-only testing
GPU_SIMULATION=1 make test # Enable GPU simulation mode
#include "MLLib.hpp"
int main() {
MLLib::model::Sequential model;
// Set GPU device (automatic vendor detection)
model.set_device(MLLib::DeviceType::GPU);
// Library outputs: ✅ GPU device successfully configured
// Or warnings: ⚠️ WARNING: GPU device requested but no GPU found!
// Build neural network
model.add_layer(new MLLib::layer::Dense(784, 128));
model.add_layer(new MLLib::layer::activation::ReLU());
model.add_layer(new MLLib::layer::Dense(128, 10));
// Training automatically uses optimal GPU
model.train(train_X, train_Y, loss, optimizer);
return 0;
}
// Check GPU availability
if (MLLib::Device::isGPUAvailable()) {
// Display detected GPUs
auto gpus = MLLib::Device::detectGPUs();
for (const auto& gpu : gpus) {
printf("GPU: %s (%s)\n", gpu.name.c_str(), gpu.api_support.c_str());
}
}
Comprehensive documentation is available in the docs/ directory:
- 📖 Overview - Documentation index and quick start
- 🇺🇸 English Documentation - Complete guide in English
- 🇯🇵 日本語ドキュメント - 日本語での完全ガイド
- 🇺🇸 Model I/O Guide (English) - Complete model serialization guide
- 🇯🇵 モデル I/O ガイド (日本語) - 日本語でのモデル保存・読み込み解説
- 🇺🇸 GPU CI Setup Guide (English) - GPU testing environment configuration
- 🇯🇵 GPU CI 設定ガイド (日本語) - GPU テスト環境の設定方法
- 🇺🇸 Testing Guide (English) - Testing framework documentation
- 🇯🇵 テストガイド (日本語) - テストフレームワークの説明
- 🧠 Models: Sequential, model I/O
- 📊 Layers: Dense (fully connected), activation functions (ReLU, Sigmoid, Tanh)
- 🎯 Optimizers: SGD (full implementation), Adam (fallback implementation)
- 📉 Loss Functions: MSE, CrossEntropy
- ⚡ Backend: CPU backend (NDArray)
- 🛠️ Utilities: Matrix, Random, Validation, I/O utilities
#include "MLLib.hpp"
using namespace MLLib;
// Create simple neural network
model::Sequential model;
model.add(std::make_shared<layer::Dense>(128, 64));
model.add(std::make_shared<layer::activation::ReLU>());
model.add(std::make_shared<layer::Dense>(64, 10));
model.add(std::make_shared<layer::activation::Sigmoid>());
// Prepare training data
std::vector<std::vector<double>> X, Y;
// ... set data ...
// Train model
loss::MSE loss;
optimizer::SGD optimizer(0.01);
model.train(X, Y, loss, optimizer, nullptr, 100);
// Execute prediction
auto prediction = model.predict({/* input data */});
MLLib/
├── NDArray # Multi-dimensional arrays (tensor operations)
├── Device # CPU/GPU device management
├── Layer/ # Neural network layers
│ ├── Dense # Fully connected layers
│ └── Activation/ # Activation functions (ReLU, Sigmoid)
├── Loss/ # Loss functions (MSE)
├── Optimizer/ # Optimization algorithms (SGD)
└── Model/ # Model definition and I/O
├── Sequential # Sequential model architecture
└── ModelIO # Model serialization (Binary/JSON/Config)
Q: Which C++ standard does MLLib use?
A: We use C++17, which provides modern language features with broad compiler support.
Q: Can I pass an initializer list directly to predict()?
A: Yes! You can use it like this:
auto result = model.predict({1.0, 2.0, 3.0});
Q: Can I implement custom layers or components?
A: Yes, you can implement custom components by inheriting from the provided base classes.
Q: Does MLLib support GPUs?
A: Yes, we provide comprehensive multi-GPU support for NVIDIA CUDA, AMD ROCm, Intel oneAPI, and Apple Metal. The library defaults to supporting all GPU vendors and automatically falls back to CPU when GPU is unavailable.
Q: Can MLLib handle large datasets?
A: Yes, we support efficient memory management and batch processing.
Q: Can I import models from other frameworks?
A: Currently we only support native formats, but broader format support is under consideration for future releases.
We welcome pull requests and issue reports! Before participating in development, please follow these guidelines:
- Testing: Please add appropriate tests for new features
- Documentation: Properly document API changes
- Code Style: Follow the existing code style
- CI: Ensure all CI tests pass successfully
# Clone repository
git clone <repository-url>
cd CppML
# Check dependencies
make check-deps
# Development build
make debug
# Run comprehensive tests
make integration-test
- Bug Reports: Report bugs in the GitHub Issues tab
- Feature Requests: Submit ideas and requests for new features
- Discussion: Use the Discussions tab for implementation discussions and questions
The included XOR example demonstrates:
- Model architecture definition
- Training with callback functions
- Epoch-based model saving
- Multiple file format support
make xor # Run the XOR neural network example
Test all model I/O formats and sample programs:
make samples # Build all sample programs
make run-sample # List available samples
make run-sample SAMPLE=xor # Run specific sample
make xor # Run XOR neural network example (alias)
MLLib features a comprehensive CI/CD pipeline with 100% test success rate:
# Complete CI pipeline workflow:
Format Validation → Linting → Static Analysis
↓
Build Test → Unit Tests → Integration Tests
↓
# CI Summary
- Unit Tests: 76/76 passing (100%)
- Integration Tests: 3429/3429 assertions passing (100%)
- Simple Integration Tests: Basic functionality verification
- CI Requirements: 100% deterministic test success rate
make fmt-check # Code formatting validation
make lint # Clang-tidy linting
make check # Cppcheck static analysis
make lint-all # All quality checks combined
- ✅ Deterministic Testing: All tests designed for CI reliability
- ✅ Comprehensive Coverage: End-to-end component integration testing
- ✅ Performance Monitoring: Execution time tracking for all tests
- ✅ Multi-platform Support: Ubuntu, macOS, Windows compatibility
- ✅ Production Ready: 100% test success rate for deployment confidence
CppML/
├── include/MLLib/ # Header files
│ ├── config.hpp # Library configuration
│ ├── ndarray.hpp # Tensor operations
│ ├── device/ # Device management
│ ├── layer/ # Neural network layers
│ ├── loss/ # Loss functions
│ ├── optimizer/ # Optimization algorithms
│ └── model/ # Model architecture & I/O
├── src/MLLib/ # Implementation files
├── sample/ # Example programs
├── docs/ # Documentation
└── README.md # This file
- Core Functionality: Neural network learning and inference
- GPU/CPU Backend: Computation backend supporting both GPU and CPU
- Layer Implementation: Currently Dense layers only (CNN, RNN layers planned for future releases)
- Optimizer Implementation: Currently SGD is fully implemented (Adam and others are partially implemented)
- Comprehensive unit testing framework (76/76 tests)
- Integration testing with performance monitoring
- Execution time measurement and benchmarking
- Dense layers with activation functions (ReLU, Sigmoid, Tanh)
- Sequential model architecture
- MSE loss and SGD optimizer
- Model save/load (binary, config formats)
- GPU acceleration support
- JSON model loading implementation
- Advanced error handling and validation
- Additional layer types (Convolutional, LSTM, Dropout)
- More optimizers (Adam, RMSprop, AdaGrad)
- Advanced loss functions (CrossEntropy, Huber)
- Model quantization and optimization
- Python bindings
- Multi-threading support
This project is licensed under the BSD 3-Clause License with Commercial Use Restriction - see the LICENSE file for details.
- Inspired by modern ML frameworks
- Built with modern C++ best practices
- Designed with focus on performance and usability
- Configuration Management: System-wide settings and configuration
- Multi-dimensional Arrays: NDArray implementation for efficient numerical computation
- Device Management: CPU/GPU computation abstraction
- Preprocessing: Data normalization, standardization, transformation
- Batch Processing: Efficient mini-batch learning
- Data Loaders: Loading data from various formats
- Layers: Dense, convolution, pooling, dropout, etc.
- Activation Functions: ReLU, Sigmoid, Tanh, etc.
- Loss Functions: Mean squared error, cross-entropy, etc.
- Optimizers: SGD, Adam, and other optimization algorithms
- Sequential: Layer-by-layer stacked models
- Functional: Support for complex network structures
- Custom: Support for implementing custom models
#include "MLLib.hpp"
#include <vector>
int main() {
// Library initialization
MLLib::initialize();
// Data preparation
std::vector<float> input_data = {1.0f, 2.0f, 3.0f, 4.0f};
// Simple linear regression model
MLLib::Sequential model;
model.add(new MLLib::Dense(4, 1)); // 4-dimensional input, 1-dimensional output
// Model compilation
model.compile(
MLLib::SGD(0.01), // SGD with learning rate 0.01
MLLib::MSE() // Mean squared error
);
// Cleanup
MLLib::cleanup();
return 0;
}
#include "MLLib.hpp"
int main() {
MLLib::initialize();
// Multi-layer neural network for classification
MLLib::Sequential model;
// Dense layer from input layer
model.add(new MLLib::Dense(784, 128)); // For MNIST (28x28=784)
model.add(new MLLib::ReLU()); // ReLU activation
// Hidden layer
model.add(new MLLib::Dense(128, 64));
model.add(new MLLib::ReLU());
model.add(new MLLib::Dropout(0.5)); // Dropout
// Output layer
model.add(new MLLib::Dense(64, 10)); // 10-class classification
model.add(new MLLib::Softmax()); // Softmax
// Compilation
model.compile(
MLLib::Adam(0.001), // Adam optimization
MLLib::CrossEntropy() // Cross-entropy loss
);
MLLib::cleanup();
return 0;
}- Efficient memory pool usage
- Reduction of unnecessary copies
- Adoption of RAII (Resource Acquisition Is Initialization) pattern
- SIMD instruction utilization
- Parallel processing support
- Cache-friendly data structures
Q: Compilation errors occur
# Check if necessary tools are installed
make install-tools
# Check compiler version
g++ --version
clang++ --version
Q: Formatting errors occur
# Run automatic formatting
make fmt
# Check formatting
make fmt-check
Q: Library not found
# Build library
make clean
make all
# Check include path
# -I./include option required
- K&R style brace placement
- 2-space indentation
- 80-character line length limit
- Const correctness enforcement
- Unit test creation
- Integration test implementation
- Performance test addition
- Doxygen comment writing
- Sample code provision
- API reference maintenance