ping2A/IronSift

IronSift 🔍🛡️

"Where's Waldo?" for Cybersecurity: fleet-wide anomaly detection powered by unsupervised machine learning.

Created with Claude.ai but supervised by a human (me apparently).


What is IronSift?

IronSift is a Rust-based security analyzer that finds anomalous machines in a fleet by comparing their process (and optionally file access) behavior. It does not rely on attack signatures or threat feeds: it learns what is “normal” from your own data and flags machines that stand out.

  • Fleet mode (default): You feed process logs from many machines (CSV, JSON, or JSONL). IronSift builds a behavioral profile per machine, turns them into vectors (TF-IDF), and runs DBSCAN clustering. Machines that end up alone (noise) or in a small minority cluster are reported as anomalies, with severity and risk factors (entropy, suspicious paths, unexpected root, etc.).
  • Temporal mode: For a single machine, you can compare two or more snapshots over time. IronSift reports new processes, new or modified files, and new IP connections between snapshots; no clustering is involved.
  • File mode (--files): Same idea as fleet mode, but using file access logs instead of process logs; supports mtime-based anomaly detection across the fleet.

Input can come from CSV, JSON, or JSONL (one JSON object per line; each file can be one machine). Output is a console report and an optional JSON forensic report for integration with other tools.


How it works (high-level)

  ┌────────────────────────────────────────────────────────────────────────────────────┐
  │  INPUTS                                                                            │
  │  Process logs (CSV / JSON / JSONL)  or  File access logs  or  Temporal snapshots   │
  └───────────────────────────────────────┬────────────────────────────────────────────┘
                                          │
            ┌─────────────────────────────┼─────────────────────────────┐
            │                             │                             │
            ▼                             ▼                             ▼
  ┌───────────────────┐         ┌───────────────────┐         ┌───────────────────┐
  │  FLEET ANALYSIS   │         │  FILE ANALYSIS    │         │  TEMPORAL         │
  │  (process logs)   │         │  (--files)        │         │  (same machine    │
  │                   │         │                   │         │   over time)      │
  └─────────┬─────────┘         └─────────┬─────────┘         └─────────┬─────────┘
            │                             │                             │
            │  Group by machine_id        │  Group by machine_id        │  Build snapshot
            │  Resolve parents,           │  Per-file mtime/risk        │  per time point
            │  compute entropy & paths    │                             │
            ▼                             ▼                             ▼
  ┌───────────────────┐         ┌───────────────────┐         ┌───────────────────┐
  │  One profile per  │         │  One file profile │         │  Diff snapshots:  │
  │  machine          │         │  per machine      │         │  new processes,   │
  │  (process counts) │         │  (file + mtime)   │         │  new/modified     │
  │                   │         │                   │         │  files, new IPs   │
  └─────────┬─────────┘         └─────────┬─────────┘         └───────────────────┘
            │                             │
            │  TF-IDF matrix              │  TF-IDF + mtime
            │  (machines × features)      │  anomaly checks
            ▼                             ▼
  ┌───────────────────┐         ┌───────────────────┐
  │  DBSCAN           │         │  DBSCAN +         │
  │  Noise = outlier  │         │  mtime/recent     │
  │  Small cluster =  │         │  file rules       │
  │  minority         │         │                   │
  └─────────┬─────────┘         └─────────┬─────────┘
            │                             │
            └──────────────┬──────────────┘
                           ▼
  ┌────────────────────────────────────────────────────────────────────────────────────┐
  │  OUTPUTS                                                                           │
  │  Console report (anomalies, severity, suspicious processes) + optional JSON export │
  └────────────────────────────────────────────────────────────────────────────────────┘

In short: Fleet and file modes turn many machines into profiles, then use TF-IDF + DBSCAN to find machines that don't match the majority. Temporal mode skips clustering and just diffs consecutive snapshots of one machine.
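
As a rough illustration of the "one profile per machine" step, the sketch below groups simplified log records by machine and counts process names. The record shape and the function name `build_count_profiles` are hypothetical and are not part of the IronSift API; the real profiles also carry parents, UIDs, paths, and entropy.

```rust
use std::collections::BTreeMap;

/// Builds one profile per machine: a map from process name to occurrence count.
/// Each record is a hypothetical, simplified (machine_id, process_name) pair.
fn build_count_profiles(records: &[(&str, &str)]) -> BTreeMap<String, BTreeMap<String, usize>> {
    let mut profiles: BTreeMap<String, BTreeMap<String, usize>> = BTreeMap::new();
    for (machine, process) in records {
        // Create the machine's profile on first sight, then bump the process counter.
        *profiles
            .entry(machine.to_string())
            .or_default()
            .entry(process.to_string())
            .or_insert(0) += 1;
    }
    profiles
}
```

These per-machine counts are the raw material that a TF-IDF step would later turn into one vector per machine.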

🎯 Quick Start (3 Ways)

Option 1: Super Simple API (Recommended for Getting Started)

use ironsift::{build_profiles_simple, analyze_fleet, DetectionConfig};

fn main() {
    let config = DetectionConfig::default();
    
    // Just provide (machine_id, process_name, parent_name) - PIDs handled automatically!
    let processes = vec![
        ("server1".to_string(), "nginx".to_string(), "systemd".to_string()),
        ("server1".to_string(), "worker".to_string(), "nginx".to_string()),
        ("server2".to_string(), "miner".to_string(), "systemd".to_string()),  // ⚠️ Anomaly
    ];
    
    let profiles = build_profiles_simple(processes, &config);
    let report = analyze_fleet(&profiles, &config).unwrap();
    report.print();
}

Option 2: ProcessBuilder API (More Control)

use ironsift::{ProcessBuilder, ProcessEntry, build_profiles, analyze_fleet, DetectionConfig};

fn main() {
    let config = DetectionConfig::default();
    let mut builder = ProcessBuilder::new();
    
    // Simple method
    builder.add_process("server1", "nginx", "systemd");
    
    // Or fluent API with full control
    builder.add(
        ProcessEntry::new("server1".to_string(), "worker".to_string())
            .parent("nginx")
            .uid(33)
            .path("/usr/sbin/nginx")
            .args("worker process")
    );
    
    // NEW: Automatic command line parsing!
    builder.add_command("server2", "/usr/bin/postgres -D /var/lib/postgresql/data", Some("systemd"));
    
    // NEW: Bare commands (no full path) work too!
    builder.add_command("server3", "ls /etc/", Some("bash"));
    
    // NEW: JSON log parsing!
    builder.add_json(r#"{"host": "server4", "cmd": "nginx", "uid": 33}"#);
    
    let profiles = build_profiles(builder.build(), &config);
    let report = analyze_fleet(&profiles, &config).unwrap();
    report.print();
}

Option 3: With Real PIDs (From System Logs)

use ironsift::{RawLogEntry, build_profiles, analyze_fleet, DetectionConfig};

fn main() {
    let config = DetectionConfig::default();
    
    let entries = vec![
        RawLogEntry {
            machine_id: "server1".to_string(),
            pid: 1, ppid: 0,
            name: "systemd".to_string(),
            uid: 0,
            path: "/usr/lib/systemd/systemd".to_string(),
            args: "--system".to_string(),
            timestamp: None,
        },
        // ... more entries
    ];
    
    let profiles = build_profiles(entries, &config);
    let report = analyze_fleet(&profiles, &config).unwrap();
    report.print();
}

See EXAMPLES.md for complete usage examples.


⏱️ Temporal comparison (same machine across time)

Compare multiple snapshots of the same machine over time to spot new processes, new or modified files, and new IP connections, without fleet-wide clustering.

| Concept | Description |
| --- | --- |
| MachineSnapshot | One point-in-time view: processes + file accesses + connections for a single machine |
| TemporalDiff | Diff between two snapshots: new_processes, new_files, modified_files (mtime), new_connections |
| RawConnectionEntry | Connection log: machine_id, remote_ip, optional local_ip, remote_port, process_name, timestamp |

Example: build a baseline snapshot (e.g. Monday 10:00), then a current snapshot (Monday 14:00); compare_temporal(&baseline, &current) yields new processes, files, and IPs.

use ironsift::{build_machine_snapshot, compare_temporal, compare_temporal_series,
               DetectionConfig, RawLogEntry, RawFileEntry, RawConnectionEntry};

let config = DetectionConfig::default();
let baseline = build_machine_snapshot("server1", "2024-01-01T10:00Z",
    process_entries_t1, file_entries_t1, connection_entries_t1, &config);
let current  = build_machine_snapshot("server1", "2024-01-01T14:00Z",
    process_entries_t2, file_entries_t2, connection_entries_t2, &config);

let diff = compare_temporal(&baseline, &current);
// diff.new_processes, diff.new_files, diff.modified_files, diff.new_connections

// Or compare a series of snapshots (T1 vs T2, T2 vs T3, ...)
let diffs = compare_temporal_series(&[snap1, snap2, snap3]);

Run the demo: cargo run --example temporal
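
Conceptually, a temporal diff is a set difference between two snapshots. The sketch below shows that idea for process names only. `Snapshot`, `snapshot`, `new_processes`, and `new_processes_series` are hypothetical names for illustration, not the types or functions exported by IronSift.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified snapshot: the set of process names seen at one time point.
struct Snapshot {
    processes: BTreeSet<String>,
}

/// Convenience constructor for the sketch.
fn snapshot(names: &[&str]) -> Snapshot {
    Snapshot { processes: names.iter().map(|n| n.to_string()).collect() }
}

/// Processes present in `current` but absent from `baseline`, in sorted order.
fn new_processes(baseline: &Snapshot, current: &Snapshot) -> Vec<String> {
    current.processes.difference(&baseline.processes).cloned().collect()
}

/// Diffs consecutive pairs of snapshots (T1 vs T2, T2 vs T3, ...).
fn new_processes_series(snapshots: &[Snapshot]) -> Vec<Vec<String>> {
    snapshots
        .windows(2)
        .map(|pair| new_processes(&pair[0], &pair[1]))
        .collect()
}
```

The same pattern would extend to files and connections by keeping one set per category.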


📜 Version History

v0.3.0 (Current) - Enhanced Analysis & Input Flexibility

  • ✨ Enhanced Detailed Console Output - Rich reporting with attack categorization
  • ✨ Automatic Command Line Parsing - Handles bare commands (ls /etc/) and full paths
  • ✨ Native JSON Log Parsing - Docker, Kubernetes, CloudWatch, Elasticsearch support
  • 📚 Comprehensive documentation (15+ guides)
  • 🧪 50+ tests covering all features

v0.2.0 - Flexible APIs & Automation

  • 🎯 Three flexible APIs (Simple, Builder, Direct)
  • 🔄 Automatic PID/PPID resolution
  • 📁 Reorganized project structure (CLI separated)
  • 📖 Extensive documentation

v0.1.0 - Initial Release

  • 🔍 Core DBSCAN clustering
  • 📊 TF-IDF feature engineering
  • 🚨 Anomaly detection
  • 📈 Basic reporting

📥 Multiple Input Methods

IronSift accepts data in various formats - choose what works for your logs:

Full Command Lines (with paths)

builder.add_command("server1", "/usr/bin/nginx -c /etc/nginx.conf", Some("systemd"));
// → Automatically extracts: name="nginx", path="/usr/bin/nginx", args="-c /etc/nginx.conf"

Bare Commands (no paths)

// Common in ps output, shell commands
builder.add_command("server1", "ls /etc/", Some("bash"));
builder.add_command("server1", "grep error app.log", Some("bash"));
// → Works too: name="ls", path="ls", args="/etc/"
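
The name/path/args split described in the comments above can be approximated in a few lines. This is a minimal sketch of the idea, not IronSift's actual parser: `split_command` is a hypothetical helper, and it assumes the executable path contains no spaces and ignores shell quoting.

```rust
/// Splits a command line into (name, path, args).
/// Assumption: the first whitespace-separated token is the executable path.
fn split_command(cmdline: &str) -> (String, String, String) {
    let trimmed = cmdline.trim();
    // Everything before the first whitespace is the path; the rest is the argument string.
    let (path, args) = match trimmed.split_once(char::is_whitespace) {
        Some((p, a)) => (p, a.trim_start()),
        None => (trimmed, ""),
    };
    // The process name is the last path component; a bare command is its own name.
    let name = path.rsplit('/').next().unwrap_or(path);
    (name.to_string(), path.to_string(), args.to_string())
}
```

For a bare command such as `ls /etc/`, the path and the name come out identical, which matches the behavior documented above.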

JSON Logs (Docker, Kubernetes, CloudWatch)

// Single JSON entry
builder.add_json(r#"{"host": "server1", "cmd": "/usr/bin/nginx", "uid": 33}"#);

// Batch (JSON array or NDJSON)
builder.add_json_batch(r#"[
    {"container": "web-1", "command": "nginx", "userid": 33},
    {"node": "worker-1", "cmd": "python3 app.py", "uid": 1000}
]"#);

Supported JSON key names:

  • Machine: machine_id, hostname, host, server, node, container, pod
  • Command: command, cmd, cmdline, commandline
  • User: uid, user_id, userid
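
One way to picture the key aliases is a lookup that tries each accepted name in turn. The sketch below works on an already-parsed, flat map of string fields so it needs no JSON library. `first_present` is a hypothetical helper, and the priority order when several aliases are present at once is an assumption (here: the order of the lists above).

```rust
use std::collections::HashMap;

// Alias lists copied from the supported key names above.
const MACHINE_KEYS: [&str; 7] = ["machine_id", "hostname", "host", "server", "node", "container", "pod"];
const COMMAND_KEYS: [&str; 4] = ["command", "cmd", "cmdline", "commandline"];
const USER_KEYS: [&str; 3] = ["uid", "user_id", "userid"];

/// Returns the value of the first alias found in a flat, already-parsed JSON object.
fn first_present<'a>(fields: &'a HashMap<String, String>, aliases: &[&str]) -> Option<&'a str> {
    aliases
        .iter()
        .find_map(|key| fields.get(*key).map(String::as_str))
}
```

With this shape, supporting a new log source mostly means adding its key names to the alias lists.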

See JSON_PARSING.md and COMMAND_PARSING.md for complete documentation.


🎯 Features

Core Detection Capabilities

| Feature | Description |
| --- | --- |
| Multivariate Analysis | Analyzes 6 dimensions: Process Name, Parent (auto-resolved), UID, Path, Entropy, Path Risk |
| PID Awareness | Automatically resolves parent processes from PID/PPID relationships |
| Unsupervised Learning | Zero-config detection; no signature database required |
| Scale Invariant | Works on 10 logs or 10 million logs |
| Minority Cluster Detection | Identifies coordinated attacks (botnets, APTs) |
| High Entropy Detection | Flags obfuscated commands and encoded payloads |
| Suspicious Path Analysis | Detects execution from /tmp, /dev/shm, hidden directories |
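
To make the entropy dimension concrete, here is a minimal sketch of Shannon entropy over the characters of an argument string, in bits per character. Whether IronSift counts characters or bytes is not stated here, so treat `shannon_entropy` as an illustrative stand-in rather than the crate's implementation.

```rust
use std::collections::HashMap;

/// Shannon entropy of a string in bits per character (0.0 for an empty string).
fn shannon_entropy(text: &str) -> f64 {
    let total = text.chars().count() as f64;
    if total == 0.0 {
        return 0.0;
    }
    // Count how often each character occurs.
    let mut counts: HashMap<char, usize> = HashMap::new();
    for ch in text.chars() {
        *counts.entry(ch).or_insert(0) += 1;
    }
    // Sum -p * log2(p) over the observed character frequencies.
    counts
        .values()
        .map(|&n| {
            let p = n as f64 / total;
            -p * p.log2()
        })
        .sum()
}
```

Repetitive arguments score near zero, while base64 blobs and random-looking strings score higher; a value would then be compared against a threshold such as `entropy_threshold`.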

Detection Scenarios

IronSift can identify:

  • Cryptominers: Unusual processes with high CPU, suspicious paths
  • Web Shells: PHP/Python processes with high-entropy eval() payloads
  • Privilege Escalation: Normal processes suddenly running as root (UID 0)
  • Lateral Movement: Unusual SSH/SCP activity with anomalous targets
  • Rootkits: Processes masquerading as system services
  • APT Campaigns: Small clusters of compromised machines with identical malware

📦 Installation

Prerequisites

  • Rust 1.70+ (rustup recommended)
  • 4GB+ RAM for large datasets

Build from Source

cd ironsift
cargo build --release

🔧 Quick Start

1. Generate Test Data

Create a realistic dataset with 100 machines and embedded attack scenarios:

cargo run --release --bin generator

Output: large_dataset.csv (100,000 logs with 10 compromised machines)

The generated data includes:

  • Realistic PID/PPID relationships
  • systemd as PID 1 on each machine
  • Normal processes as children of systemd
  • Attack processes with proper parent relationships

2. Run Analysis

Analyze the fleet and display results:

cargo run --release --bin ironsift

Sample Output:

================================================================================
                         IRONSIFT ANALYSIS REPORT                              
================================================================================
Fleet Size: 100 machines
Detection Sensitivity: High

--- Configuration ---
  DBSCAN Tolerance: 0.05
  Entropy Threshold: 4.5
  Minority Cluster Ratio: 10%

--- Cluster Distribution ---
  Cluster 0: 90 machines (90.0%)
  Noise (Outliers): 10 machines (10.0%)

================================================================================
Status: 🚨 ANOMALIES DETECTED
================================================================================
Suspicious Machines: 10

💀 CRITICAL (3):
   These machines are isolated outliers - likely compromised

  💀 machine_013 (Distance: 1.500)
     ├─ Cluster: Noise (isolated outlier)
     ├─ Total processes: 150
     ├─ Suspicious processes: 50 ⚠️
     ├─ Rare processes (< 5% of fleet):
     │  • kworker (path: /tmp/.X11-unix/kworker)
     │  • systemd (path: /var/tmp/.cache/systemd)
     ├─ Suspicious processes detected:
     │
     │  📛 kworker (count: 30)
     │     Parent: systemd
     │     Path: /tmp/.X11-unix/kworker
     │     UID: 0 (root) ⚠️
     │     Risk factors:
     │       🚨 High entropy arguments (possible obfuscation)
     │       🚨 Suspicious execution path: /tmp/.X11-unix/kworker
     │       🚨 Running as root (UID 0)
     │       🚨 Executing from temporary directory
     └─ Activity period: 2024-01-01 10:00:00 to 2024-01-07 15:30:00

🔴 HIGH (4):
   Strong deviation from baseline - investigate immediately

  🔴 machine_042 (Distance: 0.823)
     ├─ Suspicious processes: 15 ⚠️
     └─ Unusual: php-fpm (high entropy eval payloads)
  ...

--- Detected Attack Patterns ---
  ⛏️  Cryptomining (3 machines): machine_013, machine_027, machine_065
  🕸️  Web Shells (2 machines): machine_042, machine_088
  ⬆️  Privilege Escalation (4 machines): machine_019, machine_051, ...
  📂 Suspicious Execution Paths (5 machines): machine_013, machine_027, ...

================================================================================
Recommended Actions:
  1. Review flagged machines and investigate anomalous processes
  2. Check process execution paths and command arguments
  3. Verify parent-child process relationships
  4. Cross-reference with network logs and file access logs
  5. Export detailed report: cargo run --bin ironsift -- --export-json
================================================================================

See OUTPUT_EXAMPLES.md for complete output examples.

3. Export Forensic Report

Generate a detailed JSON report for incident response:

cargo run --release --bin ironsift -- --export-json

Output: forensic_report.json

4. Output control (scripts and pipelines)

For use by other tools or in scripts:

| Option | Effect |
| --- | --- |
| -q, --quiet | Minimal output: one-line summary only (e.g. CLEAN or ANOMALIES: 5 (Critical: 2, High: 1, …)). Progress and config are suppressed. |
| --export-json - | Write the JSON report to stdout (nothing else on stdout). Use 2>/dev/null to hide progress on stderr. |
| Progress messages | Loading/config/progress lines are sent to stderr so stdout can be piped or parsed. |

Examples:

# One-line result for scripting
ironsift -q --input data.csv

# JSON only on stdout (e.g. pipe to jq or another tool)
ironsift --export-json - --input data.csv 2>/dev/null | jq '.anomalies_detected'

# Quiet + export to file
ironsift -q --export-json report.json --input data.csv

βš™οΈ Configuration

Command Line Options

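ironsift [OPTIONS]

Options:
  --input <file>           Input log file (CSV, JSON, or JSONL)
  --files                  Analyze file access logs instead of process logs
  --config <file>          Load configuration from JSON file
  --export-json [file|-]   Export detailed forensic report (default: forensic_report.json; "-" writes to stdout)
  --tolerance <value>      Override DBSCAN tolerance (default: 0.05)
  -q, --quiet              Print a one-line summary only
  --help                   Show help message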

Custom Configuration

On first run, IronSift creates ironsift_config.json:

{
  "entropy_threshold": 4.5,
  "minority_cluster_ratio": 0.10,
  "dbscan_tolerance": 0.05,
  "dbscan_min_samples": 2,
  "normalize_features": true,
  "suspicious_path_patterns": [
    "/tmp/",
    "/dev/shm/",
    "/var/tmp/",
    "/home/[^/]+/\\.[^/]+"
  ]
}
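
The path patterns in this configuration can be approximated without a regex engine. The sketch below is a simplified, hand-rolled check: it anchors every pattern at the start of the path (an assumption; a regex search could match anywhere in the string) and replaces the `/home/[^/]+/\.[^/]+` pattern with an equivalent manual test. `is_suspicious_path` and `is_hidden_under_home` are hypothetical helpers.

```rust
/// Directory prefixes taken from the sample configuration above.
const SUSPICIOUS_PREFIXES: [&str; 3] = ["/tmp/", "/dev/shm/", "/var/tmp/"];

/// Manual stand-in for `/home/[^/]+/\.[^/]+`: a hidden entry directly under a home directory.
fn is_hidden_under_home(path: &str) -> bool {
    let mut parts = match path.strip_prefix("/home/") {
        Some(rest) => rest.split('/'),
        None => return false,
    };
    let user = parts.next().unwrap_or("");
    let entry = parts.next().unwrap_or("");
    // The entry must start with a dot and have at least one more character.
    !user.is_empty() && entry.len() > 1 && entry.starts_with('.')
}

/// True when the path starts in a temporary directory or a hidden home subdirectory.
fn is_suspicious_path(path: &str) -> bool {
    SUSPICIOUS_PREFIXES.iter().any(|prefix| path.starts_with(*prefix))
        || is_hidden_under_home(path)
}
```

Adding an entry to `suspicious_path_patterns` would correspond to adding another prefix or check in a sketch like this one.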

Tuning Guide

| Parameter | Effect | Recommended Range |
| --- | --- | --- |
| dbscan_tolerance | Detection sensitivity | 0.03 (strict) - 0.10 (loose) |
| minority_cluster_ratio | Botnet detection threshold | 0.05 - 0.15 |
| entropy_threshold | Obfuscation detection | 3.5 (sensitive) - 5.5 (strict) |

Example: Increase sensitivity for high-security environments:

cargo run --bin ironsift -- --tolerance 0.03

📊 Understanding Results

Anomaly Severity Levels

| Level | Score | Meaning | Action |
| --- | --- | --- | --- |
| 💀 Critical | > 1.0 | Isolated outlier, likely compromised | Immediate isolation |
| 🔴 High | 0.6-1.0 | Strong deviation, investigate ASAP | Priority investigation |
| 🟠 Medium | 0.3-0.6 | Moderate anomaly, worth reviewing | Schedule review |
| 🟡 Low | 0.0-0.3 | Minor deviation, may be benign | Monitor |
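
The score bands above translate directly into a small mapping function. This is a sketch under one assumption: the table does not say which level owns a boundary value such as 0.6, so the code below assigns each boundary to the higher band. `Severity` and `severity_for` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum Severity {
    Low,
    Medium,
    High,
    Critical,
}

/// Maps a distance score to a severity level using the bands from the table.
/// Assumption: boundary values (0.3, 0.6) belong to the higher band; 1.0 itself is High.
fn severity_for(score: f64) -> Severity {
    if score > 1.0 {
        Severity::Critical
    } else if score >= 0.6 {
        Severity::High
    } else if score >= 0.3 {
        Severity::Medium
    } else {
        Severity::Low
    }
}
```

With these bands, the sample report's machine_013 (1.500) lands in Critical and machine_042 (0.823) in High.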

Forensic Report Structure

The JSON export includes:

{
  "report_timestamp": "2024-12-10T15:30:00Z",
  "fleet_size": 100,
  "anomalies_detected": 10,
  "config": { ... },
  "investigation_targets": [
    {
      "machine_id": "machine_013",
      "severity": "Critical",
      "distance_score": 1.5,
      "suspicious_processes": [
        {
          "name": "kworker",
          "path": "/tmp/.X11-unix/kworker",
          "parent": "systemd",
          "risk_factors": [
            "High entropy arguments (possible obfuscation)",
            "Suspicious execution path: /tmp/.X11-unix/kworker",
            "Running as root (UID 0)"
          ]
        }
      ]
    }
  ]
}

🧪 Testing

Run the comprehensive test suite:

cargo test

Generator + CLI regression test

To check that the generator output is correctly analyzed by the CLI (catches regressions in ingestion or reporting):

./scripts/test_generator_ironsift.sh

This script builds release, generates process and file datasets, runs ironsift (and ironsift --files) on them, and verifies that anomalies are reported. Run from the repo root.

Test Coverage

  • Shannon entropy calculation
  • Suspicious path detection
  • Clean fleet (no false positives)
  • Single outlier detection
  • Minority cluster detection (botnet scenario)
  • Process risk factor analysis
  • PID/PPID parent resolution
  • Unknown parent handling

πŸ—οΈ Architecture

Data Flow

┌──────────────────────────────────────────────────────────────────────────────────┐
│                                IRONSIFT PIPELINE                                 │
└──────────────────────────────────────────────────────────────────────────────────┘

  Raw Input                    Profile Building              Analysis
  ─────────                    ────────────────              ────────

  ┌──────────────┐             ┌─────────────────┐           ┌─────────────────┐
  │ CSV / JSON   │             │ Group by        │           │ TF-IDF          │
  │ Process Logs │────────────►│ machine_id      │──────────►│ Vectorization   │
  │ or File      │   parse     │                 │  build    │ (rare = signal) │
  │ Access Logs  │             │ Resolve PPID →  │  profiles │                 │
  └──────────────┘             │ parent names    │           └────────┬────────┘
         │                     │                 │                    │
         │                     │ Whitelist /     │                    ▼
         │                     │ filter paths    │           ┌─────────────────┐
         └────────────────────►│                 │           │ L2 Normalize    │
                               └─────────────────┘           │ DBSCAN Cluster  │
                                                             └────────┬────────┘
                                                                      │
                                                                      ▼
  Output                     ┌─────────────────┐             ┌─────────────────┐
  ──────                     │ Anomaly Scoring │◄────────────│ Noise = outlier │
                             │ & Severity      │  cluster    │ Small cluster   │
  ┌──────────────┐           │ (Critical→Low)  │   ids       │ = minority      │
  │ Console      │◄──────────│                 │             │ Large cluster   │
  │ Report       │  print    └────────┬────────┘             │ = baseline      │
  └──────────────┘                    │                      └─────────────────┘
         ▲                            │
         │                            ▼
  ┌──────────────┐             ┌─────────────────┐
  │ forensic_    │◄────────────│ Risk factors    │
  │ report.json  │  export     │ (entropy, path, │
  └──────────────┘             │  root, mtime)   │
                               └─────────────────┘

Process vs File Analysis

  PROCESS MODE (default)              FILE MODE (--files)
  ──────────────────────              ───────────────────

  RawLogEntry                         RawFileEntry
  • machine_id, pid, ppid             • machine_id, path, uid
  • name, path, args, uid             • timestamp, mtime
  • timestamp
         │                                    │
         ▼                                    ▼
  ProcessSignature                    FileSignature
  • name + parent + uid + path        • path + uid
  • is_suspicious_path, entropy       • is_suspicious_path
         │                            • has_mtime_anomaly
         ▼                                    │
  MachineProfile                      MachineFileProfile
  (counts per process)                (counts per file + mtimes)
         │                                    │
         └──────────────┬─────────────────────┘
                        ▼
              analyze_fleet / analyze_files_fleet
                        │
                        ▼
              AnalysisReport (anomalies, severity)

Key Algorithms

  1. PID Resolution: Automatically maps PPID to parent process names
  2. TF-IDF Weighting: Boosts rare processes, reduces noise from common ones
  3. L2 Normalization: Scales each machine's vector to unit length so distances reflect the mix of behavior rather than the raw volume of logs per machine
  4. DBSCAN: Density-based clustering that naturally identifies outliers
  5. Shannon Entropy: Measures randomness in command arguments (detects obfuscation)
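
The first algorithm in this list, PID resolution, amounts to a lookup from PID to process name within one machine. The sketch below illustrates that idea. `Proc`, `entry`, and `resolve_parents` are hypothetical names, the "unknown" placeholder for unresolved parents is an assumption, and PID reuse over time is ignored.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified process record for a single machine.
struct Proc {
    pid: u32,
    ppid: u32,
    name: String,
}

/// Convenience constructor for the sketch.
fn entry(pid: u32, ppid: u32, name: &str) -> Proc {
    Proc { pid, ppid, name: name.to_string() }
}

/// Returns (process_name, parent_name) pairs; parents that cannot be found become "unknown".
fn resolve_parents(procs: &[Proc]) -> Vec<(String, String)> {
    // Index every process name by its PID.
    let names: HashMap<u32, &str> = procs.iter().map(|p| (p.pid, p.name.as_str())).collect();
    procs
        .iter()
        .map(|p| {
            let parent = names.get(&p.ppid).copied().unwrap_or("unknown");
            (p.name.clone(), parent.to_string())
        })
        .collect()
}
```

Because PIDs are only unique per machine, a lookup like this has to be built separately for each machine_id.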

🎓 How It Works

The "Iron Consensus" Principle

IronSift treats each machine as a vector in N-dimensional feature space:

  • Normal machines cluster tightly (distance ≈ 0)
  • Compromised machines drift away due to:
    • Rare processes not seen elsewhere
    • Unusual execution paths
    • High-entropy obfuscated commands
    • Privilege escalation patterns
    • Abnormal parent-child relationships
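
The "machines as vectors" view can be illustrated with two small helpers: L2 normalization and Euclidean distance. This is a generic sketch of the math, not IronSift's code; `l2_normalize` and `euclidean` are hypothetical names, and plain Euclidean distance is an assumption about the metric.

```rust
/// Scales a vector to unit length; an all-zero vector is returned unchanged.
fn l2_normalize(v: &[f64]) -> Vec<f64> {
    let norm = v.iter().map(|x| x * x).sum::<f64>().sqrt();
    if norm == 0.0 {
        return v.to_vec();
    }
    v.iter().map(|x| x / norm).collect()
}

/// Euclidean distance between two vectors of equal length.
fn euclidean(a: &[f64], b: &[f64]) -> f64 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).powi(2))
        .sum::<f64>()
        .sqrt()
}
```

After normalization, a machine with ten times the log volume but the same process mix sits at distance 0 from its quieter twin, while machines with no features in common sit about 1.41 apart.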

Clustering (Conceptual)

    Feature space (simplified 2D view)
    ──────────────────────────────────

         • • •  • •
       •   • • •   •          ← Normal machines (tight cluster)
        • •   • • •
          • • • •
              ★                 ← Isolated outlier (NOISE)
                                → 💀 CRITICAL: likely compromised

                    ◄ ─ ─ ─ ─ ►
                 small cluster
                 (minority)        ← 🔴 HIGH: botnet / APT pattern
                    △ △
                     △

    DBSCAN: density-based clustering
    • Points in dense regions → same cluster (baseline).
    • Points in sparse regions → "noise" = anomaly.
    • Small clusters → minority = coordinated deviance.
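
For readers who want to see the clustering rule spelled out, below is a compact, quadratic-time DBSCAN sketch. It is a textbook version written for illustration, not the implementation IronSift uses. It assumes Euclidean distance and that `min_samples` counts the point itself; `dbscan`, `neighbors`, and `distance` are hypothetical names.

```rust
/// Euclidean distance between two points.
fn distance(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f64>().sqrt()
}

/// Indices of all points within `eps` of point `i` (including `i` itself).
fn neighbors(points: &[Vec<f64>], i: usize, eps: f64) -> Vec<usize> {
    (0..points.len())
        .filter(|&j| distance(&points[i], &points[j]) <= eps)
        .collect()
}

/// Returns one label per point: Some(cluster_id), or None for noise.
fn dbscan(points: &[Vec<f64>], eps: f64, min_samples: usize) -> Vec<Option<usize>> {
    let mut labels = vec![None; points.len()];
    let mut visited = vec![false; points.len()];
    let mut next_cluster = 0;
    for i in 0..points.len() {
        if visited[i] {
            continue;
        }
        visited[i] = true;
        let seeds = neighbors(points, i, eps);
        if seeds.len() < min_samples {
            continue; // Not a core point: stays noise unless a cluster later claims it.
        }
        labels[i] = Some(next_cluster);
        let mut queue = seeds;
        while let Some(j) = queue.pop() {
            if labels[j].is_none() {
                labels[j] = Some(next_cluster); // Border or core point joins the cluster.
            }
            if visited[j] {
                continue;
            }
            visited[j] = true;
            let reachable = neighbors(points, j, eps);
            if reachable.len() >= min_samples {
                queue.extend(reachable); // Core point: keep expanding the cluster.
            }
        }
        next_cluster += 1;
    }
    labels
}
```

Points labeled `None` correspond to the isolated outliers in the picture, and a cluster holding only a small share of the fleet corresponds to the minority pattern.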

Example Detection

Fleet: 100 web servers running nginx, postgres, node

Anomaly: Machine #42 suddenly has:

php-fpm (PID 5432, PPID 108 [apache2]) β†’ eval(base64_decode('aGVsbG8gd29ybGQ='))

IronSift Analysis:

  Raw log                    Resolution              TF-IDF              DBSCAN
  ───────                    ──────────              ──────              ──────

  machine_42                 PPID 108    rare        Machine #42         Main cluster
  pid 5432, ppid 108   ───►  → apache2   process  ──► vector differs  ──► • • • • •
  name php-fpm               parent      (1/100)     from baseline         •
  args eval(base64…)         resolved    ▼            ▼                    ★  ← #42
                                │        IDF boost   distance ≈ 1.2        (outlier)
                                │        100×        ▼
                                │                    💀 CRITICAL severity
                                └─────────────────── anomaly

  1. Resolves parent: PPID 108 → apache2
  2. Computes TF-IDF: This exact process appears on 1/100 machines
  3. IDF boost: 100× signal amplification for this rare event
  4. DBSCAN: Machine #42 is 1.2 units away from the main cluster
  5. Result: 💀 CRITICAL severity anomaly detected (isolated outlier, score > 1.0)
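
The rarity weighting in steps 2 and 3 can be sketched with the textbook TF-IDF formula. This is one common variant, shown as an assumption: IronSift's exact weighting is not specified here and may differ. With natural-log IDF, a feature seen on 1 of 100 machines gets a weight of ln(100) ≈ 4.6, and a feature seen on every machine gets 0. `idf` and `tf_idf` are hypothetical helpers.

```rust
/// Inverse document frequency using the textbook formula ln(N / df).
/// Returns 0.0 when no machine has the feature, to avoid dividing by zero.
fn idf(fleet_size: usize, machines_with_feature: usize) -> f64 {
    if machines_with_feature == 0 {
        return 0.0;
    }
    (fleet_size as f64 / machines_with_feature as f64).ln()
}

/// TF-IDF weight of one feature on one machine.
/// `count` is how often the feature occurs on the machine; `total` is all occurrences there.
fn tf_idf(count: usize, total: usize, fleet_size: usize, machines_with_feature: usize) -> f64 {
    if total == 0 {
        return 0.0;
    }
    let tf = count as f64 / total as f64;
    tf * idf(fleet_size, machines_with_feature)
}
```

Under this variant, processes that every machine runs contribute nothing to the distance, so a single rare process can dominate a machine's vector.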

📈 Performance

Benchmarks on a 4-core CPU:

| Fleet Size | Logs | Processing Time | Memory |
| --- | --- | --- | --- |
| 100 machines | 100K | 0.8s | 45 MB |
| 1,000 machines | 1M | 6.2s | 320 MB |
| 10,000 machines | 10M | 58s | 2.8 GB |

With parallel processing enabled (Rayon)


πŸ› οΈ Use Cases

Production Monitoring

# Daily cron job at 02:00 (a crontab entry must fit on a single line)
0 2 * * * cd /opt/ironsift && ./ingest_logs.sh && cargo run --release --bin ironsift -- --export-json && ./alert_soc.sh forensic_report.json

Incident Response

# Quick triage after breach detection
cargo run --release --bin ironsift -- --tolerance 0.03 --export-json

Research & Red Team

# Test detection against custom malware
./inject_attack.sh && cargo run --bin ironsift

Stay secure. Sift the iron from the ore. 🔒
