ML Defender (aRGus NDR)

Open-source, embedded-ML network detection and response system protecting critical infrastructure from ransomware and DDoS attacks.

📜 Living contracts: Protobuf schema · Pipeline configs · RAG API

✅ main is tagged v0.5.0-preproduction — PHASE 4 complete. PRE-PRODUCTION: do not deploy in hospitals until ACRL (DEBT-PENTESTER-LOOP-001) is complete.

Current status: DAY 123 (2026-04-20)

Active tag: v0.5.0-preproduction

Pipeline

6/6 components RUNNING
make test-all: ALL TESTS PASSED

Recent milestones

DAY 122: PHASE 4 complete. XGBoost in-distribution performance validated (Precision=0.9945, Recall=0.9818). Wednesday OOD finding sealed. Paper Draft v16 (arXiv:2604.04952).
DAY 123: ADR-037 approved 7/7 by the Consejo de Sabios. safe_path utility designed (header-only, C++20, zero external dependencies). DEBT-PANDAS-001 closed.

In progress

ADR-037: Static Analysis Security Hardening (DAY 124). Branch: feature/adr037-snyk-hardening

Next frontier

DEBT-PENTESTER-LOOP-001 — ACRL: Caldera → eBPF capture → XGBoost retrain → Ed25519 sign → hot-swap

⚠️ Do NOT deploy to production until:

ADR-037 is CLOSED (safe_path hardening)
ADR-036 is CLOSED (Formal Verification Baseline)
DEBT-PENTESTER-LOOP-001 is complete (real ACRL data)

📄 Preprint

ML Defender (aRGus NDR) is documented in a preprint published on arXiv cs.CR (April 2026).

ML Defender (aRGus NDR): An Open-Source Embedded ML NIDS for Botnet and Anomalous Traffic Detection in Resource-Constrained Organizations — Alonso Isidoro Román

arXiv: arXiv:2604.04952 [cs.CR]
DOI: https://doi.org/10.48550/arXiv.2604.04952
Published: 3 April 2026 · Draft v16 (updated 19 April 2026) · MIT license
Code: https://github.com/alonsoir/argus

Draft v16 adds: XGBoost in-distribution evaluation (Prec=0.9945/Rec=0.9818), Wednesday OOD impossibility result, §10.13 structural bias in academic datasets, §11.18 Adversarial Capture-Retrain Loop (ACRL). Cites Sommer & Paxson 2010.

🎯 Mission

Democratize enterprise-grade cybersecurity for hospitals, schools, and small organizations that cannot afford commercial solutions. Built to last decades with scientific honesty and methodical development.

Philosophy: Via Appia Quality — Systems built like Roman roads, designed to endure.

"A shield that learns from its own shadow."

🛡️ Threat Model Scope

ML Defender is a Network Detection and Response (NDR) system. Its guiding principle is network surveillance: every component operates on network traffic — packet capture, flow-level feature extraction, ML classification, firewall response.

Physical and removable-media vectors are explicitly out of scope as a deliberate design decision. For file integrity monitoring, ML Defender is meant to run in a complementary mode alongside Wazuh.

📊 Validated Results (DAY 122 — 19 April 2026)

| Metric | Value | Notes |
|---|---|---|
| F1-score (CTU-13 Neris) | 0.9985 | Stable across 4 replay runs |
| Precision (CTU-13 Neris) | 0.9969 | |
| Recall (CTU-13 Neris) | 1.0000 | Zero missed attacks (FN=0) |
| XGBoost Precision (CIC-IDS-2017 val) | 0.9945 | In-distribution, threshold=0.8211 |
| XGBoost Recall (CIC-IDS-2017 val) | 0.9818 | In-distribution |
| XGBoost F1 (CIC-IDS-2017 val) | 0.9881 | Val-AUCPR=0.99846 |
| XGBoost Wednesday OOD | Documented impossibility | Structural covariate shift; see §8 of the paper |
| Inference latency (XGBoost) | 1.986 µs/sample | Gate <2 µs ✅ |
| Inference latency (RF) | 0.24–1.06 µs | Per-class, embedded C++20 |
| Throughput ceiling (virtualized) | ~33–38 Mbps | VirtualBox NIC limit, not the pipeline |
| Stress test | 2,374,845 packets, 0 drops | 100 Mbps requested, loop=3 |
| RAM (full pipeline) | ~1.28 GB | Stable under load |
| Pipeline components | 6/6 RUNNING | Reproducible from `make bootstrap` |
| Plugin integrity | ADR-025 MERGED | Ed25519 + TOCTOU-safe dlopen |
| AppArmor | 6/6 enforce | 0 denials |
| CI gate | TEST-PROVISION-1 8/8 | |
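The XGBoost F1 value in the table can be checked against its precision and recall using the standard harmonic-mean definition, F1 = 2PR/(P+R). The short Python check below does this. The helper name `f1_from_pr` is ours and is not part of the repository.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Standard F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# XGBoost on the CIC-IDS-2017 validation split (values from the table above)
xgb_f1 = f1_from_pr(0.9945, 0.9818)
print(f"XGBoost F1 = {xgb_f1:.4f}")  # prints: XGBoost F1 = 0.9881
```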

🔬 DAY 122 Scientific Finding

On DAY 122, a rigorous temporal holdout evaluation on CIC-IDS-2017 revealed a structural covariate shift: Wednesday contains exclusively application-layer DoS attacks (Hulk, GoldenEye, Slowloris) absent from all training days. No threshold can simultaneously satisfy Precision≥0.99 and Recall≥0.95 on Wednesday data. This is not an XGBoost failure — it is an empirical impossibility result caused by the dataset's day-specific attack segregation design.

This finding corroborates Sommer & Paxson (2010) and provides new quantitative evidence that static classifiers trained on academic benchmarks are structurally insufficient for production NDR.
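The Wednesday impossibility result can be phrased as a threshold sweep. For each candidate threshold t, compute the precision and recall of the rule `score >= t`, then ask whether any t meets both targets at once. The plain-Python sketch below illustrates the idea. The function name and the toy scores are illustrative only; they are not the repository's evaluation code or CIC-IDS-2017 data.

```python
from typing import List, Sequence, Tuple


def feasible_thresholds(
    scores: Sequence[float],
    labels: Sequence[int],
    min_precision: float = 0.99,
    min_recall: float = 0.95,
) -> List[Tuple[float, float, float]]:
    """Return (threshold, precision, recall) for every observed score t
    at which the rule `score >= t` meets both targets (labels: 1 = attack)."""
    positives = sum(labels)
    feasible = []
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        # t is an observed score, so at least one sample is flagged (tp + fp >= 1).
        precision = tp / (tp + fp)
        recall = tp / positives if positives else 0.0
        if precision >= min_precision and recall >= min_recall:
            feasible.append((t, precision, recall))
    return feasible


# Separable toy scores: a threshold meeting both targets exists.
print(feasible_thresholds([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]))  # [(0.8, 1.0, 1.0)]
# One attack scored like benign traffic: no threshold meets both targets.
print(feasible_thresholds([0.1, 0.3, 0.2, 0.9], [0, 0, 1, 1]))  # []
```

On real data, `sklearn.metrics.precision_recall_curve` computes the same curve in one sorted pass, which avoids this quadratic loop.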

The architectural response — the Adversarial Capture-Retrain Loop (ACRL) — is proposed in §11.18 of the paper. The XGBoost plugin was designed from day one to be hot-swappable and Ed25519-signed (ADR-025/026) for exactly this reason.

"We don't train on Wednesday because Wednesday doesn't exist in training. We train on Tuesday, and we learn to detect Wednesday in production." (Kimi, Consejo de Sabios, DAY 122)
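A hot-swappable, signed model slot can be illustrated with a verify-then-swap pattern. A candidate is accepted only if its signature verifies, and the active model reference is replaced in a single step, so callers never see a half-loaded model. This is a Python sketch, not the C++20 plugin-loader. The class `ModelSlot` is hypothetical, and signature checking is injected as a `verify` callable. In the real system that role belongs to Ed25519 verification (ADR-025/026); here it is a stub.

```python
import threading
from typing import Callable, Generic, TypeVar

M = TypeVar("M")


class ModelSlot(Generic[M]):
    """Holds the active model; a replacement is installed only after it verifies."""

    def __init__(self, initial: M) -> None:
        self._lock = threading.Lock()
        self._active = initial

    def current(self) -> M:
        with self._lock:
            return self._active

    def try_swap(
        self,
        payload: bytes,
        signature: bytes,
        verify: Callable[[bytes, bytes], bool],
        load: Callable[[bytes], M],
    ) -> bool:
        # Verify the exact in-memory bytes that will be loaded, so nothing is
        # re-read between check and use (a time-of-check/time-of-use gap).
        if not verify(payload, signature):
            return False  # keep serving the current model
        candidate = load(payload)  # parse outside the lock; readers are not blocked
        with self._lock:
            self._active = candidate  # a single reference assignment is the swap
        return True


def stub_verify(payload: bytes, signature: bytes) -> bool:
    # Illustration only: a real deployment would verify an Ed25519 signature.
    return signature == b"ok"


slot = ModelSlot("model-v1")
print(slot.try_swap(b"model-v2", b"forged", stub_verify, bytes.decode), slot.current())
# prints: False model-v1
print(slot.try_swap(b"model-v2", b"ok", stub_verify, bytes.decode), slot.current())
# prints: True model-v2
```

A rejected artifact leaves the previous model serving traffic. That fail-safe behaviour is what a retrain-and-redeploy loop needs.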

🏗️ Architecture

┌──────────────────────────────────────────────────────────────────┐
│                       ML Defender Pipeline                       │
├──────────────────────────────────────────────────────────────────┤
│  Network Traffic                                                 │
│         ↓                                                        │
│  ┌──────────────────┐                                            │
│  │  sniffer (C++20) │  eBPF/XDP zero-copy packet capture        │
│  │                  │  ShardedFlowManager (16 shards)           │
│  │                  │  Fast Detector (rule-based heuristics)    │
│  │                  │  plugin-loader PHASE 2c ✅ NORMAL         │
│  └──────────────────┘                                            │
│         ↓  ZeroMQ (ChaCha20-Poly1305 encrypted)                  │
│  ┌──────────────────┐                                            │
│  │  ml-detector     │  4× Embedded RandomForest classifiers     │
│  │  (C++20)         │  DDoS: 0.24 μs | Ransomware: 1.06 μs     │
│  │                  │  XGBoost plugin ADR-026 ✅ Prec=0.9945    │
│  │                  │  [PRE-PROD: ACRL pending]                 │
│  └──────────────────┘                                            │
│         ↓  ZeroMQ (encrypted)                                    │
│  ┌──────────────────┐                                            │
│  │  etcd-server     │  Component registration + JSON config     │
│  │  (C++20)         │  HMAC key management + seed distribution  │
│  └──────────────────┘                                            │
│         ↓                                                        │
│  ┌──────────────────┐                                            │
│  │ firewall-acl     │  Autonomous blocking via ipset/iptables   │
│  │ agent (C++20)    │  plugin-loader PHASE 2a ✅ NORMAL         │
│  └──────────────────┘                                            │
│         ↓                                                        │
│  ┌──────────────────┐                                            │
│  │  rag-ingester    │  FAISS + SQLite event ingestion           │
│  │  (C++20)         │  plugin-loader PHASE 2b ✅ READONLY       │
│  └──────────────────┘                                            │
│         ↓                                                        │
│  ┌──────────────────┐                                            │
│  │  rag-security    │  TinyLlama natural language interface      │
│  │  (C++20+LLM)     │  Local inference — no cloud exfiltration  │
│  │                  │  plugin-loader PHASE 2e ✅ READONLY       │
│  └──────────────────┘                                            │
└──────────────────────────────────────────────────────────────────┘
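The sniffer's ShardedFlowManager spreads flow state across 16 shards. A common way to do this is to hash a direction-normalized 5-tuple. Both directions of a connection then land in the same shard, and per-shard locks never contend over one flow. We assume this approach here; the C++ implementation may differ. In the Python sketch below, `FlowKey`, `canonical` and `shard_for` are hypothetical names, and CRC32 is an arbitrary choice of stable hash.

```python
import zlib
from typing import NamedTuple, Tuple

N_SHARDS = 16  # shard count shown in the sniffer box above


class FlowKey(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: int  # IANA protocol number, e.g. 6 = TCP, 17 = UDP


def canonical(key: FlowKey) -> Tuple[str, int, str, int, int]:
    """Order the two endpoints so A->B and B->A yield the same tuple."""
    a = (key.src_ip, key.src_port)
    b = (key.dst_ip, key.dst_port)
    lo, hi = (a, b) if a <= b else (b, a)
    return (lo[0], lo[1], hi[0], hi[1], key.proto)


def shard_for(key: FlowKey, n_shards: int = N_SHARDS) -> int:
    # zlib.crc32 is deterministic across runs, unlike Python's per-process
    # randomized str hash(), so a flow always maps to the same shard.
    return zlib.crc32(repr(canonical(key)).encode()) % n_shards


fwd = FlowKey("10.0.0.5", 51514, "192.168.1.10", 443, 6)
rev = FlowKey("192.168.1.10", 443, "10.0.0.5", 51514, 6)
assert shard_for(fwd) == shard_for(rev)  # both directions share one shard
print(shard_for(fwd))  # some shard index in 0..15
```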

🚀 Quick Start

Critical rules:

Always use make <target>. Never compile or install manually in the VM.

The Vagrantfile and Makefile are the single source of truth.

👶 First time — fresh clone

git clone https://github.com/alonsoir/argus.git
cd argus
make up          # vagrant up — full provisioning ~20-30 min
make bootstrap   # all 8 steps in one command

🔄 Daily workflow

make up              # if VM stopped
make pipeline-stop
make pipeline-build
make sign-plugins && make sign-models
make pipeline-start && make pipeline-status
make test-all

✅ CI Gate

make test-all
# Runs: libs + components + TEST-PROVISION-1 (8/8)
#       TEST-INVARIANT-SEED + plugin-integ-test (6/6 incl. TEST-INTEG-SIGN)
#       TEST-INTEG-XGBOOST-1

🗺️ Roadmap

✅ DONE: DAY 122 (19 Apr 2026), PHASE 4 COMPLETE 🎉

DEBT-PRECISION-GATE-001 — Closed with scientific finding. In-distribution: Prec=0.9945/Rec=0.9818 ✅
Wednesday OOD impossibility result — Documented, sealed (md5), permanent artifact ✅
train_xgboost_level1_v2.py — Temporal split + validation calibration + blind test ✅
xgboost_cicids2017_v2.ubj.sig — Ed25519 signed via sign-model.sh ✅
Paper Draft v16 — §8 XGBoost + §10.13 + §11.18 ACRL + sommer2010 ✅
feature/adr026-xgboost → main — Tag: v0.5.0-preproduction ✅
arXiv v16 submitted ✅
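The `train_xgboost_level1_v2.py` entry names three steps: a temporal split, threshold calibration on validation, and a blind test. The plain-Python sketch below shows that protocol shape under several assumptions:

- Each record already carries a score from a model trained on the training days.
- The day-to-split mapping is illustrative only.
- The calibration rule is "lowest threshold that reaches the precision target on validation".

None of this is taken from the script itself, and all function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Record = Tuple[str, float, int]  # (capture_day, model_score, label: 1 = attack)


def temporal_split(
    records: Sequence[Record], splits: Dict[str, Sequence[str]]
) -> Dict[str, List[Record]]:
    """Route each record to the split that owns its capture day.
    A day may belong to only one split, so no day leaks across splits."""
    owner: Dict[str, str] = {}
    for name, days in splits.items():
        for day in days:
            if day in owner:
                raise ValueError(f"day {day!r} assigned to two splits")
            owner[day] = name
    out: Dict[str, List[Record]] = {name: [] for name in splits}
    for rec in records:
        name = owner.get(rec[0])
        if name is not None:
            out[name].append(rec)
    return out


def calibrate_threshold(val: Sequence[Record], min_precision: float) -> Optional[float]:
    """Lowest validation threshold reaching the precision target. Recall can only
    fall as the threshold rises, so this is the highest-recall qualifying choice."""
    for t in sorted({score for _, score, _ in val}):
        flagged = [label for _, score, label in val if score >= t]
        if sum(flagged) / len(flagged) >= min_precision:
            return t
    return None


def blind_test(test: Sequence[Record], threshold: float) -> Tuple[float, float]:
    """Precision and recall on the held-out split, using the frozen threshold."""
    tp = sum(1 for _, s, y in test if s >= threshold and y == 1)
    fp = sum(1 for _, s, y in test if s >= threshold and y == 0)
    fn = sum(1 for _, s, y in test if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy records; the day-to-split assignment is illustrative, not the paper's.
records = [("Mon", 0.1, 0), ("Tue", 0.9, 1), ("Thu", 0.2, 0), ("Thu", 0.7, 1),
           ("Thu", 0.6, 0), ("Fri", 0.8, 1), ("Fri", 0.3, 0)]
parts = temporal_split(records, {"train": ["Mon", "Tue"], "val": ["Thu"], "test": ["Fri"]})
t = calibrate_threshold(parts["val"], min_precision=0.99)
print(t, blind_test(parts["test"], t))  # prints: 0.7 (1.0, 1.0)
```

The threshold is fixed on validation data before the test split is touched. That ordering is what keeps the final test "blind".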

✅ DONE — DAY 121 (18 Apr 2026)

DEBT-SEED-AUDIT-001 ✅ · DEBT-XGBOOST-TEST-REAL-001 ✅ (medical gate PASSED)
DEBT-XGBOOST-DDOS-001 ✅ (F1=1.0, 20× faster RF) · DEBT-XGBOOST-RANSOMWARE-001 ✅
vagrant destroy × 3 idempotency certification ✅

✅ DONE — DAY 120–118 (see git log)

🔜 NEXT — PHASE 5: Adversarial Capture-Retrain Loop

| Priority | Task |
|---|---|
| P0 | DEBT-PENTESTER-LOOP-001: MITRE Caldera Phase 1 → real adversarial flows → XGBoost retraining |
| P0 | ADR-038: ACRL formal design document |
| P1 | DEBT-CRYPTO-003a: mlock() + explicit_bzero() |
| P1 | ADR-037 Snyk C++ hardening |
| P2 | ADR-024 Noise_IKpsk3 · ADR-032 HSM · ADR-033 TPM |
| P3 | ADR-029 hardened variants · bare-metal stress test |

🧠 Consejo de Sabios — Multi-Model Peer Review

The following large language models serve as intellectual co-reviewers:

Claude (Anthropic) · Grok (xAI) · ChatGPT (OpenAI) · DeepSeek · Qwen (Alibaba) · Gemini (Google) · Kimi (Moonshot) · Mistral

Methodology: structured disagreement. Problems must be demonstrated with compilable tests or mathematics before fixes are proposed. Documented in the preprint §6.

🗺️ Milestones

✅ DAY 111: arXiv:2604.04952 PUBLISHED 🎉
✅ DAY 114: ADR-025 MERGED (v0.3.0-plugin-integrity) 🎉
✅ DAY 118: PHASE 3 COMPLETE, v0.4.0 MERGED 🎉
✅ DAY 120: make bootstrap + XGBoost F1=0.9978 🎉
✅ DAY 121: Blocking DEBTs closed + medical gate PASSED 🎉
✅ DAY 122: PHASE 4 COMPLETE (v0.5.0-preproduction) 🎉 · Wednesday OOD finding · arXiv v16

📄 License

MIT License — See LICENSE

Via Appia Quality 🏛️ — Built to last decades.
