
EAGF: Ethical AI Governance Framework

Joint Optimization of Fairness, Privacy, Explainability & Accountability in AI-Based Cybersecurity



📋 Overview

EAGF is a reproducible research framework that combines differential privacy (DP-SGD), fairness regularization (false positive rate parity), explainability (SHAP), and audit logging into a unified governance pipeline. It is evaluated on the real-world Edge-IIoTset dataset (IEEE 2022) for IoT anomaly detection, with multi-objective Pareto trade-off analysis.

Key Focus: Cybersecurity for IoT networks with resource-constrained devices. EAGF enables governance-aware AI deployment with minimal system overhead.
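Conceptually, the pipeline augments a task loss with per-pillar penalties. A minimal sketch, assuming the `lambda_rp`/`lambda_c` weights from the config (the actual training loop lives in `src/training/eagf_trainer.py`; the penalty inputs here are hypothetical):

```python
def governance_loss(task_loss, fpr_gap, shap_density,
                    lambda_rp=0.2, lambda_c=0.05):
    """Task loss plus fairness and clarity penalties (illustrative).

    fpr_gap:      FPR disparity across protected groups (fairness pillar)
    shap_density: SHAP attribution mass outside the top features (clarity)
    Privacy (DP-SGD) and accountability (audit logging) act on the
    training procedure itself rather than on this loss.
    """
    return task_loss + lambda_rp * fpr_gap + lambda_c * shap_density

# e.g. a model with a 0.10 FPR gap and 0.30 off-top-k SHAP mass
print(round(governance_loss(0.52, 0.10, 0.30), 3))  # 0.555
```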


🎯 Key Contributions

  • 🏭 Real-world dataset integration: Edge-IIoTset (IEEE 2022, 150K samples, 40 network flow features)
  • βš–οΈ Multi-objective governance: Joint optimization of four pillarsβ€”fairness, privacy, clarity, accountability
  • πŸ“Š Trust Index metric: Composite governance score for model selection and comparison
  • πŸ“ˆ Pareto trade-off analysis: Quantify accuracy-fairness-privacy trade-offs with front visualization
  • πŸ”„ Reproducible pipeline: Deterministic execution, fixed seeds, publication-ready results

📊 Key Results Summary

Dataset: Edge-IIoTset (150K samples, 3 protocol-type protected groups)
Baselines: Unregulated + Joint DP+Fair
Statistical Rigor: 5 independent seeds with 95% CI

| Metric | Baseline | EAGF | Δ | Improvement |
|---|---|---|---|---|
| Accuracy | 0.6481 ± 0.0251 | 0.6650 ± 0.0079 | +0.0168 | +2.6% |
| FPR Parity | 0.4931 ± 0.0849 | 0.7709 ± 0.0573 | +0.2779 | +56.4% ✓ |
| Clarity | 0.6918 ± 0.0432 | 0.7390 ± 0.0548 | +0.0472 | +6.8% |
| Privacy | 0.2475 ± 0.0030 | 0.2482 ± 0.0025 | +0.0007 | Preserved ✓ |
| Accountability | 0.0000 ± 0.0000 | 0.6667 ± 0.0000 | +0.6667 | Full coverage ✓ |
| Trust Index | 0.3581 ± 0.0129 | 0.6062 ± 0.0108 | +0.2481 | +69.3% ✓ |

🔬 Key Findings

✅ Fairness breakthrough: FPR parity improved +56.4% across protocol-type groups (web, IoT MQTT, misc)
✅ Trust Index surge: Composite governance metric +69.3%, indicating strong multi-objective alignment
✅ Privacy guarantee: Differential privacy (ε=2.4) maintained with negligible DP-ε change
✅ Edge deployment ready: +0.2ms latency (~11%), +5.8 MB memory—suitable for constrained IoT
✅ Calibration stable: ECE and Brier comparable (±0.05), no metric gaming
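For intuition on the headline fairness metric: FPR parity compares false positive rates across protected groups. The repo's exact formulation lives in src/metrics/fairness.py; the sketch below uses one common definition (min/max ratio, 1.0 = perfect parity) with made-up labels:

```python
import numpy as np

def fpr_by_group(y_true, y_pred, groups):
    """False positive rate per protected group (e.g. protocol_type)."""
    fprs = {}
    for g in np.unique(groups):
        mask = (groups == g) & (y_true == 0)   # negatives in this group
        fprs[g] = float(y_pred[mask].mean()) if mask.any() else 0.0
    return fprs

def fpr_parity(fprs):
    """Min/max ratio across groups (one common formulation; the repo's
    exact definition is in src/metrics/fairness.py)."""
    vals = list(fprs.values())
    return min(vals) / max(vals) if max(vals) > 0 else 1.0

# Toy example: all-benign traffic, two protocol groups
y_true = np.array([0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 0, 1, 1, 0, 0])
groups = np.array(["web", "web", "web", "mqtt", "mqtt", "mqtt"])
print(fpr_parity(fpr_by_group(y_true, y_pred, groups)))  # 0.5
```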


πŸ“ Repository Structure

eagf/
├── 📖 README.md                    # This file
├── 📄 requirements.txt             # Dependencies (numpy, pandas, scikit-learn, fairlearn, pyyaml)
├── 🔧 setup.py                     # Package setup
│
├── 🚀 run_eagf.py                  # Single-seed entry point
├── 🚀 run_full_pipeline.py         # Multi-seed experiment runner (MAIN)
│
├── 🧠 src/
│   ├── training/
│   │   ├── eagf_trainer.py         # Main EAGF training loop with governance
│   │   ├── fairness_loss.py        # Fairness penalty (FPR parity)
│   │   └── pareto_trainer.py       # Pareto front exploration
│   │
│   ├── evaluation/
│   │   ├── baseline.py             # Unregulated baseline + Joint DP+Fair
│   │   ├── ablation.py             # Single-pillar ablation study
│   │   ├── report_generator.py     # Multi-seed report + statistics
│   │   ├── audit_logger.py         # Compliance audit trail
│   │   ├── benchmark_suite.py      # System metrics (latency, memory, energy)
│   │   └── statistics.py           # 95% CI, statistical tests
│   │
│   ├── metrics/
│   │   ├── fairness.py             # FPR parity, recall parity, group metrics
│   │   ├── privacy.py              # DP-SGD evaluation, privacy accounting
│   │   ├── clarity.py              # SHAP-based explainability
│   │   ├── accountability.py       # Audit coverage, compliance scoring
│   │   └── trust_index.py          # Composite Trust Index aggregation
│   │
│   ├── utils/
│   │   ├── data_loader.py          # Generic dataset loading
│   │   ├── edge_iiot_loader.py     # Edge-IIoTset specific (protocol_type grouping)
│   │   ├── real_data_loader.py     # Real dataset pipeline
│   │   ├── preprocessing.py        # Feature engineering, normalization
│   │   ├── reiot_simulator.py      # RE-IoT synthetic simulator (optional)
│   │   ├── ahp.py                  # Analytic Hierarchy Process (Trust Index weights)
│   │   └── visualisation.py        # Pareto, trade-off plots
│   │
│   └── baselines/
│       ├── aif360_dp_pipeline.py   # AIF360 fairness baseline
│       └── joint_dp_fair_baseline.py # Combined DP + fairness baseline
│
├── ⚙️ configs/
│   ├── reiot_real.yaml             # Main: Edge-IIoTset + EAGF governance (RECOMMENDED)
│   ├── reiot_default.yaml          # Alternative RE-IoT config
│   ├── biometric_default.yaml      # Biometric (secondary validation)
│   ├── biometric_tuned_auto.yaml   # Tuned biometric
│   ├── eagf_thresholds.yaml        # Governance thresholds
│   └── compliance_checklist*.yaml  # Compliance templates
│
├── 📊 data/
│   ├── README.md                   # Data documentation
│   └── real_iot/
│       └── edge_iiot.csv           # Edge-IIoTset (150K rows, 40 features) [USER PROVIDED]
│
├── 📓 notebooks/
│   ├── 01_eagf_demo.ipynb          # Quick start demo
│   ├── 02_statistical_analysis.ipynb # Multi-seed statistics
│   ├── 03_reiot_fairness.ipynb     # Fairness deep-dive (protocol_type groups)
│   ├── 04_pareto_front.ipynb       # Pareto front visualization
│   └── 05_trust_index_sensitivity.ipynb # Sensitivity analysis
│
├── 🎨 figures/
│   ├── pareto_front.png            # Accuracy vs. Fairness vs. Privacy
│   ├── ti_vs_latency.png           # Trust Index vs. Inference Latency
│   └── ablation_comparison.png     # Single-pillar vs. multi-pillar
│
├── 📋 docs/
│   ├── metric_definitions.md       # Detailed metric documentation
│   ├── regulatory_mapping.md       # Compliance + GDPR/CCPA alignment
│   └── reproducibility.md          # Detailed reproducibility steps
│
├── 🧪 results/
│   ├── final_report.txt            # ✨ MAIN DELIVERABLE—aggregated results
│   ├── main_results.csv            # Baseline, EAGF, Joint metrics (5 seeds)
│   ├── pareto_results.csv          # Pareto exploration (25 runs)
│   └── [seed-specific subdirs]/    # Individual seed outputs
│
├── 🛠️ scripts/
│   ├── run_all.sh                  # Full pipeline (Edge-IIoT + ablation)
│   ├── run_reiot.sh                # Edge-IIoTset only
│   ├── run_baseline.sh             # Baseline only
│   ├── run_pareto_search.sh        # Pareto front exploration
│   ├── sweep_three_stage.py        # Hyperparameter sweep
│   └── verify_metrics.py           # Metric validation
│
├── ✅ tests/
│   ├── test_data.py                # Data loading & preprocessing
│   ├── test_metrics.py             # Metric computation
│   ├── conftest.py                 # Pytest fixtures
│   └── run_tests.py                # Test runner
│
├── 🐳 Dockerfile                   # Container setup
├── 🌍 environment.yml              # Conda environment (optional)
├── 📚 CONTRIBUTING.md              # Contribution guidelines
├── 📜 LICENSE                      # MIT License
├── 📝 CHANGELOG.md                 # Version history
└── 📋 CITATION.cff                 # Citation metadata (CFF format)

πŸ–ΌοΈ Visualizations & Results

Figure 1: Pareto Front β€” Trade-off Analysis

Pareto front: Accuracy vs Fairness vs Privacy
Multi-objective optimization frontier showing EAGF solutions (orange) vs. baseline (blue) across accuracy-fairness-privacy space.

Figure 2: Trust Index vs. Inference Latency

Trust Index vs inference latency comparison
System efficiency comparison: EAGF maintains high governance (TI=0.61) with minimal latency overhead (+0.2ms/sample).

Figure 3: Ablation Study β€” Pillar-by-Pillar Comparison

Ablation comparison of EAGF pillars
Impact of each governance pillar on Trust Index. Multi-pillar integration significantly outperforms single-pillar approaches.


📓 Interactive Notebooks

Run notebooks directly in Google Colab without local setup:

| Notebook | Purpose | Badge |
|---|---|---|
| 01_eagf_demo.ipynb | 5-minute quick start | Colab |
| 02_statistical_analysis.ipynb | Multi-seed statistics, hypothesis tests | Colab |
| 03_reiot_fairness.ipynb | Fairness deep-dive by protocol-type groups | Colab |
| 04_pareto_front.ipynb | Interactive Pareto front exploration | Colab |
| 05_trust_index_sensitivity.ipynb | Trust Index weight sensitivity analysis | Colab |

βš™οΈ Installation

Requirements

  • Python 3.9+
  • pip

Setup

git clone https://github.com/aliakarma/eagf.git
cd eagf

python -m venv .venv

# Windows
.venv\Scripts\activate

# Linux / macOS
source .venv/bin/activate

pip install -r requirements.txt

Dataset Setup

Edge-IIoTset (Required)

Dataset: ML-EdgeIIoT-dataset.csv (real-world IoT anomaly detection)
Source: IEEE Access 2022 (Ferrag et al.)
Size: ~78 MB (157.8K raw rows)
Features: 40 network flow + protocol-specific attributes
Labels: Normal vs. Attack (imbalanced: 23.1K vs. 126.9K)

Note: EAGF uses real Edge-IIoTset data only. No synthetic fallback.

Setup Instructions

Option A: Manual Download (Recommended)

  1. Download ML-EdgeIIoT-dataset.csv from IEEE DataPort
  2. Extract and place at:
    data/real_iot/edge_iiot.csv
    

Option B: Verify Existing Data

If you already have the dataset:

# Check file size (~78 MB)
ls -lh data/real_iot/edge_iiot.csv
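Beyond the size check, you can peek at the CSV header without reading the full ~78 MB file. A stdlib-only sketch (exact column names are not assumed here):

```python
import csv
import io

def peek_header(f):
    """Return the first (header) row from an open CSV file object."""
    return next(csv.reader(f))

# Usage on the dataset file:
# with open("data/real_iot/edge_iiot.csv", newline="") as f:
#     print(len(peek_header(f)))   # expect roughly 40+ columns
```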

📂 Output Files & Deliverables

| File | Description | Format |
|---|---|---|
| results/final_report.txt | ✨ Main deliverable—aggregated metrics, statistics, validation gates | TXT |
| results/main_results.csv | Summary table: baseline, EAGF, Joint DP+Fair across all metrics | CSV |
| results/pareto_results.csv | Pareto search results (25 multi-objective runs with trade-off scores) | CSV |
| figures/pareto_front.png | 3D visualization: accuracy vs. fairness vs. privacy | PNG |
| figures/ti_vs_latency.png | 2D scatter: Trust Index vs. inference latency | PNG |
| figures/ablation_comparison.png | Bar chart: pillar ablation study (single vs. multi-pillar) | PNG |
| [seed-specific]/predictions.json | Per-sample predictions, confidences, fairness group info | JSON |
| [seed-specific]/metrics.json | Detailed metrics for each seed | JSON |

🚀 Quick Start (Smoke Test)

Validate the entire pipeline in ~5 minutes using a single seed:

python run_full_pipeline.py \
  --real_dataset edge_iiot \
  --config configs/reiot_real.yaml \
  --seeds 42 \
  --fast

What happens:

  • ✓ Loads Edge-IIoTset and validates data
  • ✓ Trains baseline + EAGF + Joint DP+Fair models (1 seed)
  • ✓ Computes fairness, privacy, clarity, accountability metrics
  • ✓ Generates results/final_report.txt with summary
  • ✓ Produces figures in figures/

Expected runtime: ~5 min on CPU (Intel Core i7)


πŸ† Full Experiment (Publication Results)

Reproduce final results with 5 independent seeds (statistical rigor):

python run_full_pipeline.py \
  --real_dataset edge_iiot \
  --config configs/reiot_real.yaml \
  --seeds 42 43 44 45 46

Output:

  • Mean ± std for all governance metrics
  • 95% confidence intervals
  • Pareto front visualization (25 multi-objective runs)
  • Final report with ablation analysis
  • Seed-specific detailed logs

Expected runtime: ~10–15 min on CPU
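Aggregation across seeds presumably follows the usual Student-t recipe (the repo implements its own version in src/evaluation/statistics.py). A sketch for n=5 seeds with hypothetical per-seed accuracies:

```python
import math
import statistics

def mean_ci95(values, t_crit=2.776):
    """Mean and 95% CI half-width; t_crit is the Student-t quantile
    t(0.975, df=4) for n=5 seeds (swap in the matching quantile for other n)."""
    n = len(values)
    mean = statistics.fmean(values)
    half_width = t_crit * statistics.stdev(values) / math.sqrt(n)
    return mean, half_width

# Hypothetical per-seed accuracies (5 seeds):
m, hw = mean_ci95([0.6745, 0.6580, 0.6612, 0.6701, 0.6612])
print(f"{m:.4f} ± {hw:.4f}")
```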


🔬 Advanced: Pareto Front Search

Explore the full accuracy-fairness-privacy trade-off surface:

python -c "
from src.training.pareto_trainer import ParetoTrainer
from src.utils.edge_iiot_loader import EdgeIIoTLoader

loader = EdgeIIoTLoader('data/real_iot/edge_iiot.csv')
X_train, X_test, y_train, y_test, groups = loader.load()

trainer = ParetoTrainer(X_train, y_train, groups, seed=42)
trainer.search(n_objectives=3, n_runs=25)  # Explore 25 configurations
trainer.plot_pareto('figures/pareto_custom.png')
"

Key Results

Dataset: Edge-IIoTset (150K samples, protocol-type protected groups)
Baselines: Unregulated model, Joint DP+Fair
Seeds: 5 independent runs
Metrics: Accuracy, Fairness (FPR Parity), Clarity, Privacy, Accountability, Trust Index

Summary (Mean ± Std)

| Metric | Baseline | EAGF | Δ |
|---|---|---|---|
| Accuracy | 0.6481 ± 0.0251 | 0.6650 ± 0.0079 | +0.0168 (+2.6%) |
| FPR Parity | 0.4931 ± 0.0849 | 0.7709 ± 0.0573 | +0.2779 (+56.4%) |
| Clarity | 0.6918 ± 0.0432 | 0.7390 ± 0.0548 | +0.0472 (+6.8%) |
| Privacy | 0.2475 ± 0.0030 | 0.2482 ± 0.0025 | +0.0007 (+0.3%, preserved) |
| Accountability | 0.0000 ± 0.0000 | 0.6667 ± 0.0000 | +0.6667 ✓ |
| Trust Index | 0.3581 ± 0.0129 | 0.6062 ± 0.0108 | +0.2481 (+69.3%) |

Key Findings

  • Fairness via FPR Parity: EAGF achieves a +56.4% improvement in false positive rate fairness across protocol-type groups (web, IoT MQTT, misc); the false-alarm-rate disparity narrows from a 49.3% spread to a 23% spread.
  • Trust Index: Composite governance metric improves by +69.3%, indicating strong multi-objective alignment.
  • Privacy Preserved: Differential privacy (ε=2.4) maintained with negligible change vs. baseline. No privacy regression.
  • Minimal System Overhead: Inference latency +0.2ms/sample (~11% increase); memory +5.8 MB. Suitable for edge deployment.
  • Calibration Stability: ECE and Brier scores comparable (within ±0.05), no metric gaming.
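The calibration check uses standard definitions; here is a sketch of binned ECE and the Brier score for binary probabilities (the repo's own implementation is not shown in this README, so treat this as the textbook formulation):

```python
import numpy as np

def expected_calibration_error(y_true, probs, n_bins=10):
    """Standard binned ECE for binary probabilities (probs = P(attack))."""
    conf = np.maximum(probs, 1 - probs)        # confidence of predicted class
    pred = (probs >= 0.5).astype(int)
    bins = np.clip(((conf - 0.5) * 2 * n_bins).astype(int), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        m = bins == b
        if m.any():                            # |accuracy - confidence|, bin-mass weighted
            ece += m.mean() * abs((pred[m] == y_true[m]).mean() - conf[m].mean())
    return float(ece)

def brier_score(y_true, probs):
    """Mean squared error between probabilities and 0/1 labels."""
    return float(np.mean((probs - y_true) ** 2))

y = np.array([1, 1, 0, 0])
p = np.array([0.9, 0.6, 0.4, 0.2])
print(expected_calibration_error(y, p), brier_score(y, p))
```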

πŸ—οΈ Method Overview

Four-Pillar Governance Architecture

┌────────────────────────────────────────────────────────┐
│                     EAGF Framework                     │
├─────────┬──────────┬──────────┬────────────────────────┤
│ Clarity │ Fairness │ Privacy  │  Accountability        │
├─────────┼──────────┼──────────┼────────────────────────┤
│  SHAP   │ FPRP     │ DP-SGD   │  Audit Logging + Rules │
│  Loss   │ Loss     │ Gradient │  Compliance Coverage   │
└─────────┴──────────┴──────────┴────────────────────────┘
                      ↓
              Trust Index (TI)
         Weighted Aggregation via AHP
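The AHP aggregation can be sketched as: build a pairwise-comparison matrix over the four pillars, approximate its principal eigenvector to get weights, then take the weighted sum of pillar scores. The matrix values and scores below are hypothetical; the repo derives its own in src/utils/ahp.py:

```python
import numpy as np

# Illustrative pairwise-comparison matrix over the four pillars
# (hypothetical preferences: fairness > privacy > clarity ≈ accountability).
pillars = ["fairness", "privacy", "clarity", "accountability"]
A = np.array([
    [1,     2,   3, 3],
    [1/2,   1,   2, 2],
    [1/3, 1/2,   1, 1],
    [1/3, 1/2,   1, 1],
])

# Geometric-mean approximation of the principal eigenvector
w = np.prod(A, axis=1) ** (1 / len(A))
weights = w / w.sum()

scores = {"fairness": 0.77, "privacy": 0.25,
          "clarity": 0.74, "accountability": 0.67}
trust_index = float(np.dot(weights, [scores[p] for p in pillars]))
print(dict(zip(pillars, weights.round(3))), round(trust_index, 3))
```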

Key Components

| Pillar | Metric | Implementation | Config |
|---|---|---|---|
| Fairness | FPR Parity | src/metrics/fairness.py | lambda_rp: 0.2 |
| Privacy | DP Accounting | src/metrics/privacy.py | dp_epsilon: 2.4 |
| Clarity | SHAP Sparsity | src/metrics/clarity.py | lambda_c: 0.05 |
| Accountability | Audit Coverage | src/metrics/accountability.py | Compliance rules |
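For intuition on the privacy pillar: DP-SGD clips each per-sample gradient and adds Gaussian noise (Abadi et al. 2016). A numpy sketch of one update direction; the ε=2.4 budget comes from a separate privacy accountant not shown here, and `clip_norm`/`noise_multiplier` are illustrative values, not the repo's settings:

```python
import numpy as np

def dp_sgd_step(per_sample_grads, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """One noisy update direction: clip each per-sample gradient to clip_norm,
    average, then add Gaussian noise (the accountant that converts this into
    an overall epsilon is separate and omitted)."""
    rng = rng or np.random.default_rng(42)
    clipped = [g * min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
               for g in per_sample_grads]
    mean_grad = np.mean(clipped, axis=0)
    sigma = noise_multiplier * clip_norm / len(per_sample_grads)
    return mean_grad + rng.normal(0.0, sigma, size=mean_grad.shape)

grads = [np.array([3.0, 4.0]), np.array([0.3, 0.4])]  # norms 5.0 and 0.5
step = dp_sgd_step(grads)                             # first gradient gets clipped
```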

📚 Reproducibility & Validation

Deterministic Execution

  • Fixed seeds: 42–46 (5 independent runs)
  • Deterministic pipeline: Reproducible to ±0.005 variance (NumPy/PyTorch seeds)
  • No hidden preprocessing: All transformations logged in audit_logger.py
  • Hyperparameter justification: See configs/reiot_real.yaml
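A common seeding recipe matching the determinism claims above (the repo's actual seeding lives in its runners; the torch lines are guarded so this sketch also runs in a CPU-only environment without PyTorch installed):

```python
import os
import random
import numpy as np

def set_seed(seed: int):
    """Pin the RNGs the pipeline touches for reproducible runs."""
    random.seed(seed)
    np.random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import torch
        torch.manual_seed(seed)
        torch.use_deterministic_algorithms(True)
    except ImportError:
        pass  # sketch still works without PyTorch

set_seed(42)
print(np.random.randint(0, 100))  # identical on every run
```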

Validation Gates

All experiments must satisfy:

  • ✓ FPR Parity EAGF ≥ Baseline + 0.02 (fairness improvement)
  • ✓ Privacy EAGF ≥ Baseline (no regression)
  • ✓ Accuracy drop ≤ 2% (stability requirement)
  • ✓ Trust Index EAGF > Baseline (overall improvement)
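The gates translate directly into a boolean check. A sketch with assumed metric-dictionary keys, thresholds copied from the list above, and the accuracy gate read as an absolute 0.02 drop:

```python
def check_gates(baseline: dict, eagf: dict) -> dict:
    """Evaluate the four validation gates (key names are assumptions)."""
    return {
        "fairness_gain": eagf["fpr_parity"] >= baseline["fpr_parity"] + 0.02,
        "privacy_no_regression": eagf["privacy"] >= baseline["privacy"],
        "accuracy_stable": eagf["accuracy"] >= baseline["accuracy"] - 0.02,
        "trust_improved": eagf["trust_index"] > baseline["trust_index"],
    }

# The reported mean results pass every gate:
baseline = {"fpr_parity": 0.4931, "privacy": 0.2475,
            "accuracy": 0.6481, "trust_index": 0.3581}
eagf = {"fpr_parity": 0.7709, "privacy": 0.2482,
        "accuracy": 0.6650, "trust_index": 0.6062}
assert all(check_gates(baseline, eagf).values())
```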

📖 Documentation

| File | Purpose |
|---|---|
| docs/metric_definitions.md | Detailed fairness, privacy, clarity, accountability metrics |
| docs/regulatory_mapping.md | GDPR, CCPA, ISO alignment |
| docs/reproducibility.md | Step-by-step reproducibility guide |

🔗 Related Work

  • Fairness: Hardt et al. (2016), Moritz et al. (2020)
  • Privacy: Abadi et al. (2016) DP-SGD, Kairouz et al. (2021) DP survey
  • Explainability: Lundberg & Lee (2017) SHAP
  • IoT Security: Ferrag et al. (2022) Edge-IIoTset

📜 License

MIT License — See LICENSE for full terms.


🀝 Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.


💬 Support & Issues

For reproducibility help, issues, or questions:

  1. Check diagnostics: See results/final_report.txt for detailed error logs
  2. Verify dataset: Ensure data/real_iot/edge_iiot.csv exists (~78 MB)
  3. Check environment:
    python -m pip show scikit-learn fairlearn numpy pandas
  4. Open issue: Include OS, Python version, full error trace, and reproducibility steps

Last Updated: March 2026 | Python 3.9+ | PyTorch 2.0+
