gurvinny/Automated-Phish-Extractor

🎣 Phish Extractor

Automated IOC Extraction & Threat Intelligence Reporter

Version | Python Version | License: MIT | Code Style: Black | PRs Welcome | SOC Portfolio | Roadmap

Reducing SOC alert fatigue by automating the ingestion, parsing, and enrichment of malicious .eml files.


📖 Description

A major challenge for Tier 1 SOC analysts is the sheer volume of repetitive tasks, which frequently leads to alert fatigue. Phish Extractor is a Python-based automation tool that addresses this by:

  1. Ingesting raw .eml phishing reports.
  2. Extracting critical Indicators of Compromise (IOCs).
  3. Enriching them with threat intelligence from VirusTotal and AbuseIPDB.
  4. Generating human-readable Markdown or structured JSON reports.

By fully automating the manual labor of parsing headers, calculating file hashes, defanging links, and querying APIs, analysts can focus their time on triage, containment, and higher-value incident response tasks.
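
The ingestion and header-parsing steps above can be sketched with the Python standard library alone. This is a minimal illustration, not the code in phish_extractor.py: the function name `parse_eml`, the returned dictionary layout, and the "missing" placeholder are assumptions made for the example.

```python
import hashlib
import re
from email import message_from_bytes, policy


def parse_eml(raw: bytes) -> dict:
    """Parse raw .eml bytes into headers, auth results, and attachment hashes.

    Hypothetical helper: the name and return layout are illustrative only.
    """
    msg = message_from_bytes(raw, policy=policy.default)

    # Join every Authentication-Results header into one searchable string.
    auth_text = " ".join(str(h) for h in msg.get_all("Authentication-Results", []))
    auth = {}
    for mechanism in ("spf", "dkim", "dmarc"):
        match = re.search(rf"\b{mechanism}=(\w+)", auth_text, re.IGNORECASE)
        auth[mechanism] = match.group(1).lower() if match else "missing"

    # Walk the MIME tree and hash the decoded bytes of each attachment.
    attachments = []
    for part in msg.walk():
        if part.get_content_disposition() == "attachment":
            payload = part.get_payload(decode=True) or b""
            attachments.append(
                {
                    "filename": part.get_filename(),
                    "sha256": hashlib.sha256(payload).hexdigest(),
                }
            )

    return {
        "from": str(msg.get("From", "")),
        "to": str(msg.get("To", "")),
        "subject": str(msg.get("Subject", "")),
        "date": str(msg.get("Date", "")),
        "auth": auth,
        "attachments": attachments,
    }
```

Hashing the decoded payload rather than the Base64 text means the digest matches what an analyst would get by saving the attachment to disk and hashing the file.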


✨ Features

  • 🔍 Header Parsing: Automatically extracts sender, recipient, subject, dates, and, most importantly, the SPF, DKIM, and DMARC authentication results.
  • 🧬 Robust IOC Extraction: Pulls URLs, domains, and IPv4/IPv6 addresses using regex patterns, and calculates SHA-256 hashes for all file attachments via MIME tree traversal.
  • 🛡️ Defanging: Defangs URLs, IPs, and domains automatically (e.g., hxxps://evil[.]com) so that indicators can be shared safely across teams and SOAR platforms without accidental clicks or triggering enterprise perimeter alerts.
  • 🧠 Automated Threat Intel Enrichment: Queries the VirusTotal v3 and AbuseIPDB APIs to check extracted URLs, domains, IPs, and attachment hashes for malicious reputation.
  • 📊 Automated Risk Scoring: Derives an overall risk severity level (LOW, MEDIUM, HIGH, CRITICAL) based on DMARC failures and malicious threat intel hits, enabling analysts to prioritize their queues.
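
To make the extraction, defanging, and risk-scoring features above concrete, here is a small sketch. The regex patterns, the function names (`extract_iocs`, `defang`, `risk_level`), and especially the scoring thresholds are assumptions for illustration; the README only states that the level is derived from DMARC failures and malicious intel hits, not how they are weighted.

```python
import re

# Simplified patterns for illustration; IPv6 and bare domains are omitted here.
URL_RE = re.compile(r"https?://[^\s<>\"')]+", re.IGNORECASE)
IPV4_RE = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
)


def extract_iocs(text: str) -> dict:
    """Return de-duplicated, sorted URLs and IPv4 addresses found in text."""
    return {
        "urls": sorted(set(URL_RE.findall(text))),
        "ips": sorted(set(IPV4_RE.findall(text))),
    }


def defang(indicator: str) -> str:
    """Rewrite http(s) to hxxp(s) and every dot to [.] so nothing is clickable."""
    rewritten = re.sub(r"^http", "hxxp", indicator, flags=re.IGNORECASE)
    return rewritten.replace(".", "[.]")


def risk_level(dmarc_result: str, malicious_hits: int) -> str:
    """Map a DMARC result and intel hit count to a severity label.

    Hypothetical weighting: one point for a DMARC failure, up to two points
    for malicious hits, giving a score between 0 and 3.
    """
    score = (1 if dmarc_result.lower() == "fail" else 0) + min(max(malicious_hits, 0), 2)
    return ("LOW", "MEDIUM", "HIGH", "CRITICAL")[score]
```

For example, `defang("https://evil.com")` yields the `hxxps://evil[.]com` form shown in the feature list. Note that this simple version also brackets dots in URL paths, which is harmless for sharing but may differ from the tool's own output.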

🛠️ Prerequisites & Installation

  • Python: 3.10+
  • OS: Cross-platform (Windows, macOS, Linux)

For Windows/VS Code users, follow these commands to set up the environment:

1. Clone the repository:

git clone https://github.com/gurvinny/Automated-Phish-Extractor.git
cd Automated-Phish-Extractor

2. Create and activate a virtual environment:

python -m venv venv
.\venv\Scripts\Activate.ps1

3. Install the dependencies:

pip install -r requirements.txt

4. Configure your API keys: Copy the example environment file and fill in your keys:

copy .env.example .env

⚠️ Never commit your .env file. It is listed in .gitignore to prevent accidental exposure. If you accidentally push secrets, rotate your API keys immediately via the VirusTotal and AbuseIPDB dashboards, and treat any exposed key as compromised.


🔒 Secrets Management

  • .env is excluded from version control via .gitignore. Never remove this rule.
  • In CI/CD pipelines, use GitHub Actions Secrets (or your vault of choice) instead of .env files.
  • To proactively prevent secret leaks, consider adding a pre-commit hook using detect-secrets or gitleaks:
pip install detect-secrets
detect-secrets scan > .secrets.baseline
  • If you ever accidentally commit a real API key, rotate it immediately and assume it is compromised.
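
In line with the guidance above, a script can read its keys from the process environment (populated from .env locally or from CI secrets) and refuse to start when one is missing. The variable names `VT_API_KEY` and `ABUSEIPDB_API_KEY` and both helper functions are hypothetical; check .env.example for the names the tool actually expects.

```python
import os
from typing import Mapping

# Hypothetical variable names; the real ones are defined in .env.example.
REQUIRED_KEYS = ("VT_API_KEY", "ABUSEIPDB_API_KEY")


def load_api_keys(env: Mapping[str, str] = os.environ) -> dict:
    """Return the required API keys, failing fast if any are unset or empty."""
    missing = [name for name in REQUIRED_KEYS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing API keys: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_KEYS}


def mask(secret: str) -> str:
    """Hide all but the last four characters so keys never appear in logs."""
    if len(secret) <= 4:
        return "*" * len(secret)
    return "*" * (len(secret) - 4) + secret[-4:]
```

Passing the environment as a parameter keeps the function easy to test without touching real secrets.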

🚀 Usage

Run the tool against any raw .eml file to parse it and generate a threat report.

Standard Run

To perform a full analysis with external threat intelligence queries:

python phish_extractor.py samples/mock_phish.eml -o report.md

This extracts all IOCs, performs lookups against VirusTotal and AbuseIPDB, and outputs a formatted Markdown report.
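
As a rough picture of what the lookups involve, the sketch below builds (but does not send) requests for the VirusTotal v3 and AbuseIPDB v2 endpoints using only the standard library. The function names and the 90-day default are assumptions for this example, and the tool itself may use a different HTTP client or structure.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

VT_BASE = "https://www.virustotal.com/api/v3"
ABUSEIPDB_CHECK = "https://api.abuseipdb.com/api/v2/check"

# VirusTotal v3 collection names for each indicator type.
VT_COLLECTIONS = {"url": "urls", "file": "files", "domain": "domains", "ip": "ip_addresses"}


def vt_url_id(url: str) -> str:
    """VirusTotal identifies URLs by unpadded URL-safe Base64 of the URL."""
    return base64.urlsafe_b64encode(url.encode()).decode().rstrip("=")


def vt_request(kind: str, value: str, api_key: str) -> Request:
    """Build a GET request for one indicator; kind is url, file, domain, or ip."""
    identifier = vt_url_id(value) if kind == "url" else value
    return Request(
        f"{VT_BASE}/{VT_COLLECTIONS[kind]}/{identifier}",
        headers={"x-apikey": api_key},
    )


def abuseipdb_request(ip: str, api_key: str, max_age_days: int = 90) -> Request:
    """Build a GET request for the AbuseIPDB check endpoint."""
    query = urlencode({"ipAddress": ip, "maxAgeInDays": max_age_days})
    return Request(
        f"{ABUSEIPDB_CHECK}?{query}",
        headers={"Key": api_key, "Accept": "application/json"},
    )
```

A caller would pass the returned object to `urllib.request.urlopen` and decode the JSON body; keeping request construction separate from sending makes it possible to unit-test without network access or real keys.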

Offline Mode (--skip-intel)

To extract and defang IOCs without sending anything to external APIs (useful for highly confidential investigations or for OPSEC reasons):

python phish_extractor.py samples/mock_phish.eml --skip-intel

To see all available CLI options:

python phish_extractor.py --help
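
The command lines shown above imply a small argument parser. The following is one way such a CLI could be declared with argparse; the long option name `--output`, the destination names, and the help strings are assumptions, so rely on `--help` for the authoritative list.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the options used in the usage examples (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="phish_extractor.py",
        description="Extract, defang, and enrich IOCs from a raw .eml file.",
    )
    parser.add_argument("eml_path", help="path to the raw .eml file to analyze")
    parser.add_argument(
        "-o",
        "--output",  # hypothetical long form of -o
        help="write the report to this file instead of standard output",
    )
    parser.add_argument(
        "--skip-intel",
        action="store_true",
        help="extract and defang IOCs without calling external APIs",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```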

🎯 Detection Engineering

To bridge the gap between reactive analysis and proactive defense, this repository includes a detections/ folder. It contains detection rules built from the artifacts parsed by phish_extractor.py:

  • 🟡 yara_rule.yar: A YARA rule that hunts for the specific SHA-256 hash and Base64-encoded malicious payload of the fake invoice attachment in the mock_phish.eml sample.
  • 🟠 sigma_rule.yml: A Sigma rule designed to detect email gateway logs where DMARC fails and the subject contains the classic phishing lure "URGENT: Your account has been temporarily restricted". This can be integrated into SIEM platforms for real-time alerting.


🗺️ Roadmap

This project follows a versioned roadmap. See ROADMAP.md for the full breakdown.

🔧 Version 1.0 Beta: Stabilisation & Hardening (current)

Focused on fixing known bugs, closing security gaps, and building a test suite before adding new features.

Category | Highlights
🐛 Bug Fixes | IPv6 regex, false positive IOC extraction, inconsistent defanging, risk scoring gaps
🔒 Security | API key leak prevention, file-size limits, attachment filename sanitisation
✨ Enhancements | Tracking pixel filtering, parallelised API enrichment
📚 Docs & Testing | pytest suite, .env.example, secrets-management guidance

🚀 Version 2.0: Campaign Intelligence Platform (planned)

The defining upgrade: v1 analyzes one email, v2 analyzes a campaign.

Category | Highlights
🏗️ Architecture | Batch mode, async enrichment, IOC caching, installable package
🧠 Intelligence | Campaign clustering, WHOIS/domain age, URL unshortening, phishing lure scoring
🔗 Integrations | URLhaus, Shodan, MISP push, Webhook/API mode
📄 Output | HTML reports, STIX 2.1 export, GitHub Actions CI, Docker image

➡️ View the full roadmap →


Developed with ❀️ by Gurvin Singh

About

An automated triage tool for SOC analysts. Parses raw .eml files, extracts and defangs IOCs, analyzes SPF/DMARC headers, and generates standardized threat reports.
