ML Defender (aRGus NDR)

Open-source, embedded-ML network detection and response system protecting critical infrastructure from ransomware and DDoS attacks.

📜 Living contracts: Protobuf schema · Pipeline configs · RAG API


main is tagged v0.5.0-preproduction — PHASE 4 complete. PRE-PRODUCTION: do not deploy in hospitals until ACRL (DEBT-PENTESTER-LOOP-001) is complete.

Current status: DAY 123 (2026-04-20)

Active tag: v0.5.0-preproduction

Pipeline

  • 6/6 components RUNNING
  • make test-all: ALL TESTS PASSED

Recent milestones

  • DAY 122: PHASE 4 completed. XGBoost in-distribution validated (Precision=0.9945, Recall=0.9818). Wednesday OOD finding sealed. Paper Draft v16 (arXiv:2604.04952).
  • DAY 123: ADR-037 approved by the Consejo 7/7. safe_path utility designed (header-only, C++20, zero external dependencies). DEBT-PANDAS-001 closed.

In progress

  • ADR-037: Static Analysis Security Hardening (DAY 124). Branch: feature/adr037-snyk-hardening

Next frontier

  • DEBT-PENTESTER-LOOP-001 (ACRL): Caldera → eBPF capture → XGBoost retrain → Ed25519 sign → hot-swap

⚠️ Do NOT deploy to production until

  • ADR-037 is CLOSED (safe_path hardening)
  • ADR-036 is CLOSED (Formal Verification Baseline)
  • DEBT-PENTESTER-LOOP-001 is completed (real ACRL data)

📄 Preprint

ML Defender (aRGus NDR) is documented in a preprint published on arXiv cs.CR (April 2026).

ML Defender (aRGus NDR): An Open-Source Embedded ML NIDS for Botnet and Anomalous Traffic Detection in Resource-Constrained Organizations — Alonso Isidoro Román

arXiv: arXiv:2604.04952 [cs.CR]
DOI: https://doi.org/10.48550/arXiv.2604.04952
Published: 3 April 2026 · Draft v16 (updated 19 April 2026) · MIT license
Code: https://github.com/alonsoir/argus

Draft v16 adds: XGBoost in-distribution evaluation (Prec=0.9945/Rec=0.9818), Wednesday OOD impossibility result, §10.13 structural bias in academic datasets, §11.18 Adversarial Capture-Retrain Loop (ACRL). Cites Sommer & Paxson 2010.


🎯 Mission

Democratize enterprise-grade cybersecurity for hospitals, schools, and small organizations that cannot afford commercial solutions. Built to last decades with scientific honesty and methodical development.

Philosophy: Via Appia Quality. Systems built like Roman roads, designed to endure.

"A shield that learns from its own shadow."


🛡️ Threat Model Scope

ML Defender is a Network Detection and Response (NDR) system. Its guiding principle is network surveillance: every component operates on network traffic — packet capture, flow-level feature extraction, ML classification, firewall response.

Physical and removable-media vectors are explicitly out of scope by conscious design decision. For file integrity monitoring, ML Defender is intended to run in complementary mode alongside Wazuh.


📊 Validated Results (DAY 122 — 19 April 2026)

| Metric | Value | Notes |
|---|---|---|
| F1-score (CTU-13 Neris) | 0.9985 | Stable across 4 replay runs |
| Precision | 0.9969 | |
| Recall | 1.0000 | Zero missed attacks (FN=0) |
| XGBoost Precision (CIC-IDS-2017 val) | 0.9945 | In-distribution, threshold=0.8211 |
| XGBoost Recall (CIC-IDS-2017 val) | 0.9818 | In-distribution |
| XGBoost F1 (CIC-IDS-2017 val) | 0.9881 | Val-AUCPR=0.99846 |
| XGBoost Wednesday OOD | Documented impossibility | Structural covariate shift; see §8 of the paper |
| Inference latency (XGBoost) | 1.986 µs/sample | Gate <2 µs ✅ |
| Inference latency (RF) | 0.24–1.06 µs | Per-class, embedded C++20 |
| Throughput ceiling (virtualized) | ~33–38 Mbps | VirtualBox NIC limit, not pipeline |
| Stress test | 2,374,845 packets, 0 drops | 100 Mbps requested, loop=3 |
| RAM (full pipeline) | ~1.28 GB | Stable under load |
| Pipeline components | 6/6 RUNNING | Reproducible from make bootstrap |
| Plugin integrity | ADR-025 MERGED | Ed25519 + TOCTOU-safe dlopen |
| AppArmor | 6/6 enforce | 0 denials |
| CI gate | TEST-PROVISION-1 | 8/8 |
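As a quick consistency check on the table above, the XGBoost F1 value can be recomputed from the reported precision and recall. This is a minimal sketch; the helper name `f1_score` is illustrative and is not taken from the repository.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# XGBoost validation figures quoted in the results table.
xgb_f1 = f1_score(0.9945, 0.9818)
print(f"{xgb_f1:.4f}")  # prints 0.9881
```

The result agrees with the XGBoost F1 row to four decimal places.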

🔬 DAY 122 Scientific Finding

On DAY 122, a rigorous temporal holdout evaluation on CIC-IDS-2017 revealed a structural covariate shift: Wednesday contains exclusively application-layer DoS attacks (Hulk, GoldenEye, Slowloris) absent from all training days. No threshold can simultaneously satisfy Precision≥0.99 and Recall≥0.95 on Wednesday data. This is not an XGBoost failure — it is an empirical impossibility result caused by the dataset's day-specific attack segregation design.
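The claim that no threshold satisfies both gates can be checked mechanically by sweeping every distinct score as a candidate threshold. The sketch below illustrates that kind of check on tiny synthetic score lists. It is not the project's evaluation code, the function names are hypothetical, and the data is invented purely to show one feasible and one infeasible case.

```python
from typing import List, Optional, Tuple


def precision_recall_at(scores: List[float], labels: List[int],
                        threshold: float) -> Tuple[float, float]:
    """Precision and recall when every score >= threshold is flagged as attack."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def feasible_threshold(scores: List[float], labels: List[int],
                       min_precision: float = 0.99,
                       min_recall: float = 0.95) -> Optional[float]:
    """Return the lowest threshold meeting both gates, or None if none exists.

    Precision and recall only change at observed score values, so testing
    each distinct score is an exhaustive search.
    """
    for t in sorted(set(scores)):
        p, r = precision_recall_at(scores, labels, t)
        if p >= min_precision and r >= min_recall:
            return t
    return None


# Synthetic, illustrative data only (label 1 = attack, 0 = benign).
in_distribution = ([0.10, 0.20, 0.90, 0.95], [0, 0, 1, 1])
shifted = ([0.10, 0.60, 0.30, 0.90], [0, 0, 1, 1])  # one attack scores below a benign flow

print(feasible_threshold(*in_distribution))  # prints 0.9
print(feasible_threshold(*shifted))          # prints None
```

When attack scores overlap benign scores, as in the second list, raising the threshold trades recall for precision and no operating point clears both gates.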

This finding corroborates Sommer & Paxson (2010) and provides new quantitative evidence that static classifiers trained on academic benchmarks are structurally insufficient for production NDR.

The architectural response, the Adversarial Capture-Retrain Loop (ACRL), is proposed in §11.18 of the paper. The XGBoost plugin was designed from day one to be hot-swappable and Ed25519-signed (ADR-025/026) for exactly this reason.

"We do not train on Wednesday because Wednesday does not exist in training. We train on Tuesday, and we learn to detect Wednesday in production." (Kimi, Consejo de Sabios, DAY 122)


🏗️ Architecture

┌──────────────────────────────────────────────────────────────────┐
│                       ML Defender Pipeline                       │
├──────────────────────────────────────────────────────────────────┤
│  Network Traffic                                                 │
│         ↓                                                        │
│  ┌──────────────────┐                                            │
│  │  sniffer (C++20) │  eBPF/XDP zero-copy packet capture        │
│  │                  │  ShardedFlowManager (16 shards)           │
│  │                  │  Fast Detector (rule-based heuristics)    │
│  │                  │  plugin-loader PHASE 2c ✅ NORMAL         │
│  └──────────────────┘                                            │
│         ↓  ZeroMQ (ChaCha20-Poly1305 encrypted)                  │
│  ┌──────────────────┐                                            │
│  │  ml-detector     │  4× Embedded RandomForest classifiers     │
│  │  (C++20)         │  DDoS: 0.24 μs | Ransomware: 1.06 μs     │
│  │                  │  XGBoost plugin ADR-026 ✅ Prec=0.9945    │
│  │                  │  [PRE-PROD: ACRL pending]                 │
│  └──────────────────┘                                            │
│         ↓  ZeroMQ (encrypted)                                    │
│  ┌──────────────────┐                                            │
│  │  etcd-server     │  Component registration + JSON config     │
│  │  (C++20)         │  HMAC key management + seed distribution  │
│  └──────────────────┘                                            │
│         ↓                                                        │
│  ┌──────────────────┐                                            │
│  │ firewall-acl     │  Autonomous blocking via ipset/iptables   │
│  │ agent (C++20)    │  plugin-loader PHASE 2a ✅ NORMAL         │
│  └──────────────────┘                                            │
│         ↓                                                        │
│  ┌──────────────────┐                                            │
│  │  rag-ingester    │  FAISS + SQLite event ingestion           │
│  │  (C++20)         │  plugin-loader PHASE 2b ✅ READONLY       │
│  └──────────────────┘                                            │
│         ↓                                                        │
│  ┌──────────────────┐                                            │
│  │  rag-security    │  TinyLlama natural language interface      │
│  │  (C++20+LLM)     │  Local inference — no cloud exfiltration  │
│  │                  │  plugin-loader PHASE 2e ✅ READONLY       │
│  └──────────────────┘                                            │
└──────────────────────────────────────────────────────────────────┘
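The diagram names a ShardedFlowManager with 16 shards but does not describe how flows are assigned to shards. The sketch below shows one common approach, hashing the flow 5-tuple with a stable hash and taking it modulo the shard count. The `FlowKey` fields, the CRC32 hash, and the `ShardedFlowTable` class are assumptions made for illustration and are not taken from the sniffer's C++ source.

```python
import zlib
from typing import Dict, List, NamedTuple

NUM_SHARDS = 16  # shard count shown in the architecture diagram


class FlowKey(NamedTuple):
    """Hypothetical 5-tuple identifying a flow."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int


def shard_index(key: FlowKey, num_shards: int = NUM_SHARDS) -> int:
    """Map a flow key to a shard using CRC32 (an assumed, stable hash choice)."""
    raw = f"{key.src_ip}|{key.dst_ip}|{key.src_port}|{key.dst_port}|{key.protocol}"
    return zlib.crc32(raw.encode("utf-8")) % num_shards


class ShardedFlowTable:
    """Toy per-shard packet counters; in a real system each shard could own its lock."""

    def __init__(self, num_shards: int = NUM_SHARDS) -> None:
        self.shards: List[Dict[FlowKey, int]] = [dict() for _ in range(num_shards)]

    def record_packet(self, key: FlowKey) -> int:
        """Increment and return the packet count for the flow in its shard."""
        shard = self.shards[shard_index(key, len(self.shards))]
        shard[key] = shard.get(key, 0) + 1
        return shard[key]


table = ShardedFlowTable()
flow = FlowKey("10.0.0.5", "10.0.0.9", 51514, 443, 6)
table.record_packet(flow)
print(table.record_packet(flow))  # prints 2
```

Because every packet of a flow lands in the same shard, per-flow state never needs cross-shard coordination, which is the usual motivation for this layout.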

🚀 Quick Start

Critical rules:

  • Always use make <target>. Never compile or install manually in the VM.
  • The Vagrantfile and Makefile are the single source of truth.

👶 First time — fresh clone

git clone https://github.com/alonsoir/argus.git
cd argus
make up          # vagrant up — full provisioning ~20-30 min
make bootstrap   # all 8 steps in one command

🔄 Daily workflow

make up              # if VM stopped
make pipeline-stop
make pipeline-build
make sign-plugins && make sign-models
make pipeline-start && make pipeline-status
make test-all

✅ CI Gate

make test-all
# Runs: libs + components + TEST-PROVISION-1 (8/8)
#       TEST-INVARIANT-SEED + plugin-integ-test (6/6 incl. TEST-INTEG-SIGN)
#       TEST-INTEG-XGBOOST-1

🗺️ Roadmap

✅ DONE — DAY 122 (19 Apr 2026) — PHASE 4 COMPLETE 🎉

  • DEBT-PRECISION-GATE-001 — Closed with scientific finding. In-distribution: Prec=0.9945/Rec=0.9818 ✅
  • Wednesday OOD impossibility result — Documented, sealed (md5), permanent artifact ✅
  • train_xgboost_level1_v2.py — Temporal split + validation calibration + blind test ✅
  • xgboost_cicids2017_v2.ubj.sig — Ed25519 signed via sign-model.sh ✅
  • Paper Draft v16 — §8 XGBoost + §10.13 + §11.18 ACRL + sommer2010 ✅
  • feature/adr026-xgboost → main — Tag: v0.5.0-preproduction ✅
  • arXiv v16 submitted
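The temporal split listed for train_xgboost_level1_v2.py above partitions data by capture day rather than by random sampling, so that the blind test day is never seen during training or calibration. The sketch below illustrates the idea only. The row layout, the function name, and the day assignment (in particular the validation day) are hypothetical and are not read from the actual script.

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple[str, List[float], int]  # (capture day, feature vector, label)


def temporal_split(rows: Sequence[Row],
                   train_days: Sequence[str],
                   val_days: Sequence[str],
                   test_days: Sequence[str]) -> Dict[str, List[Row]]:
    """Partition rows by capture day; day sets must not overlap."""
    train, val, test = set(train_days), set(val_days), set(test_days)
    if train & val or train & test or val & test:
        raise ValueError("a capture day may belong to only one split")
    splits: Dict[str, List[Row]] = {"train": [], "val": [], "test": []}
    for row in rows:
        day = row[0]
        if day in train:
            splits["train"].append(row)
        elif day in val:
            splits["val"].append(row)
        elif day in test:
            splits["test"].append(row)
        # Rows from days outside all three sets are deliberately dropped.
    return splits


# Synthetic rows; the day-to-split assignment is an illustrative assumption.
rows: List[Row] = [
    ("Tuesday", [0.1, 0.2], 0),
    ("Tuesday", [0.9, 0.8], 1),
    ("Thursday", [0.2, 0.1], 0),
    ("Wednesday", [0.7, 0.3], 1),
]
splits = temporal_split(rows, ["Tuesday"], ["Thursday"], ["Wednesday"])
print({name: len(part) for name, part in splits.items()})
# prints {'train': 2, 'val': 1, 'test': 1}
```

Thresholds would then be calibrated on the validation split only, and the test split evaluated once, blind.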

✅ DONE — DAY 121 (18 Apr 2026)

  • DEBT-SEED-AUDIT-001 ✅ · DEBT-XGBOOST-TEST-REAL-001 ✅ (medical gate PASSED)
  • DEBT-XGBOOST-DDOS-001 ✅ (F1=1.0, 20× faster RF) · DEBT-XGBOOST-RANSOMWARE-001 ✅
  • vagrant destroy × 3 idempotency certification ✅

✅ DONE — DAY 120–118 (see git log)

🔜 NEXT — PHASE 5: Adversarial Capture-Retrain Loop

| Priority | Task |
|---|---|
| P0 | DEBT-PENTESTER-LOOP-001: MITRE Caldera Phase 1 → real adversarial flows → XGBoost retraining |
| P0 | ADR-038: ACRL formal design document |
| P1 | DEBT-CRYPTO-003a: mlock() + explicit_bzero() |
| P1 | ADR-037: Snyk C++ hardening |
| P2 | ADR-024 Noise_IKpsk3 · ADR-032 HSM · ADR-033 TPM |
| P3 | ADR-029 hardened variants · bare-metal stress test |

🧠 Consejo de Sabios — Multi-Model Peer Review

The following large language models serve as intellectual co-reviewers:

Claude (Anthropic) · Grok (xAI) · ChatGPT (OpenAI) · DeepSeek · Qwen (Alibaba) · Gemini (Google) · Kimi (Moonshot) · Mistral

Methodology: structured disagreement. Problems must be demonstrated with compilable tests or mathematics before fixes are proposed. The methodology is documented in §6 of the preprint.


🗺️ Milestones

  • ✅ DAY 111: arXiv:2604.04952 PUBLISHED 🎉
  • ✅ DAY 114: ADR-025 MERGED (v0.3.0-plugin-integrity) 🎉
  • ✅ DAY 118: PHASE 3 COMPLETE, v0.4.0 MERGED 🎉
  • ✅ DAY 120: make bootstrap + XGBoost F1=0.9978 🎉
  • ✅ DAY 121: Blocking DEBTs closed + medical gate PASSED 🎉
  • ✅ DAY 122: PHASE 4 COMPLETE (v0.5.0-preproduction) 🎉 · Wednesday OOD finding · arXiv v16

📄 License

MIT License — See LICENSE

Via Appia Quality 🏛️ — Built to last decades.

About

Distributed C++20 microservices architecture for real-time detection and correlation of DDoS and ransomware activity. Deterministic ingestion, idempotent replay, FAISS-based semantic indexing, and zero-coordination incident correlation via derived trace identifiers.
