🎣 Phish Extractor

Automated IOC Extraction & Threat Intelligence Reporter

Reducing SOC alert fatigue by automating the ingestion, parsing, and enrichment of malicious .eml files.

📖 Description

A major challenge for Tier 1 SOC analysts is the sheer volume of repetitive tasks, which frequently leads to alert fatigue. Phish Extractor is a Python-based automation tool that addresses this by:

Ingesting raw .eml phishing reports.
Extracting critical Indicators of Compromise (IOCs).
Enriching them with threat intelligence from VirusTotal and AbuseIPDB.
Generating human-readable Markdown or structured JSON reports.

Automating header parsing, file hashing, link defanging, and API queries frees analysts to spend their time on triage, containment, and higher-value incident response tasks.
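The parsing and hashing steps described above can be sketched with Python's standard library alone. This is a minimal illustration, not the tool's actual implementation: the function name summarize_eml, the returned dictionary layout, and the sample message are all assumptions made for the example.

```python
import hashlib
from email import message_from_bytes, policy
from email.message import EmailMessage


def summarize_eml(raw: bytes) -> dict:
    """Collect key headers and attachment hashes from raw .eml bytes."""
    msg = message_from_bytes(raw, policy=policy.default)
    attachments = []
    # walk() visits every part of the MIME tree, including nested parts.
    for part in msg.walk():
        if part.get_content_disposition() != "attachment":
            continue
        payload = part.get_payload(decode=True) or b""
        attachments.append(
            {
                "filename": part.get_filename(),
                "sha256": hashlib.sha256(payload).hexdigest(),
            }
        )
    return {
        "from": str(msg.get("From", "")),
        "to": str(msg.get("To", "")),
        "subject": str(msg.get("Subject", "")),
        "authentication_results": [
            str(value) for value in msg.get_all("Authentication-Results", [])
        ],
        "attachments": attachments,
    }


# Build a small synthetic message so the sketch runs without a sample file.
sample = EmailMessage()
sample["From"] = "billing@example.com"
sample["To"] = "analyst@example.org"
sample["Subject"] = "Invoice"
sample["Authentication-Results"] = "mx.example.org; spf=fail; dkim=none; dmarc=fail"
sample.set_content("See attached.")
sample.add_attachment(
    b"hello", maintype="application", subtype="octet-stream", filename="invoice.bin"
)
summary = summarize_eml(sample.as_bytes())
```

The Authentication-Results header is returned as raw strings here; a real parser would still need to split out the individual SPF, DKIM, and DMARC verdicts.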

✨ Features

🔍 Header Parsing: Automatically extracts the sender, recipient, subject, and dates, along with the SPF, DKIM, and DMARC authentication results that matter most for triage.
🧬 Robust IOC Extraction: Pulls URLs, domains, and IPv4/IPv6 addresses using regex patterns, and calculates SHA-256 hashes for all file attachments found by traversing the MIME tree.
🛡️ Defanging: Defangs URLs, IPs, and domains automatically (e.g., hxxps://evil[.]com) to ensure indicators can be safely shared across teams and SOAR platforms without accidental execution or triggering enterprise perimeter alerts.
🧠 Automated Threat Intel Enrichment: Interacts with the VirusTotal v3 and AbuseIPDB APIs to check extracted URLs, domains, IPs, and attachment hashes for malicious reputation.
📊 Automated Risk Scoring: Derives an overall risk severity level (LOW, MEDIUM, HIGH, CRITICAL) based on DMARC failures and malicious threat intel hits, enabling analysts to prioritize their queues.
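
To make the risk-scoring idea concrete, here is one way such a mapping could look. The README does not state the actual weights or thresholds, so everything numeric below is a hypothetical choice for illustration, as is the function name risk_level.

```python
def risk_level(dmarc_failed: bool, malicious_hits: int) -> str:
    """Map authentication and threat intel signals to a severity label."""
    # Hypothetical weighting: a DMARC failure adds one point and each
    # malicious intel hit adds two, with hits capped at three.
    points = (1 if dmarc_failed else 0) + 2 * min(max(malicious_hits, 0), 3)
    if points >= 5:
        return "CRITICAL"
    if points >= 3:
        return "HIGH"
    if points >= 1:
        return "MEDIUM"
    return "LOW"
```

Under these assumed weights, a DMARC failure alone yields MEDIUM, while a DMARC failure combined with two malicious hits yields CRITICAL.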

🛠️ Prerequisites & Installation

Python: 3.10+
OS: Cross-platform (Windows, macOS, Linux)

The commands below target Windows (PowerShell) and VS Code users. On macOS or Linux, activate the environment with source venv/bin/activate and use cp in place of copy.

1. Clone the repository:

git clone https://github.com/gurvinny/phish_extractor.git
cd phish_extractor

2. Create and activate a virtual environment:

python -m venv venv
.\venv\Scripts\Activate.ps1

3. Install the dependencies:

pip install -r requirements.txt

4. Configure your API keys: Copy the example environment file and fill in your keys:

copy .env.example .env

⚠️ Never commit your .env file. It is listed in .gitignore to prevent accidental exposure. If you accidentally push secrets, rotate your API keys immediately via the VirusTotal and AbuseIPDB dashboards — treat any exposed key as compromised.

🔒 Secrets Management

.env is excluded from version control via .gitignore — never remove this rule.
In CI/CD pipelines, use GitHub Actions Secrets (or your vault of choice) instead of .env files.
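To proactively prevent secret leaks, consider adding a pre-commit hook using detect-secrets or gitleaks. With detect-secrets, start by installing it and generating a baseline of the repository's current state: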

pip install detect-secrets
detect-secrets scan > .secrets.baseline

If you ever accidentally commit a real API key, rotate it immediately — assume it is compromised.
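
On the code side, keys can be read from the process environment and masked before they reach any log line. The sketch below assumes the values from .env have already been loaded into the environment, and the variable names VIRUSTOTAL_API_KEY and ABUSEIPDB_API_KEY are placeholders; check .env.example for the names the tool actually expects.

```python
import os


def load_api_keys(env=None) -> dict:
    """Read the API keys from the environment and fail loudly if any is missing."""
    env = os.environ if env is None else env
    keys = {}
    missing = []
    # Hypothetical variable names, used only for this illustration.
    for name in ("VIRUSTOTAL_API_KEY", "ABUSEIPDB_API_KEY"):
        value = env.get(name, "").strip()
        if value:
            keys[name] = value
        else:
            missing.append(name)
    if missing:
        raise RuntimeError("Missing API keys: " + ", ".join(missing))
    return keys


def mask(secret: str) -> str:
    """Hide all but the last four characters so a key is never logged whole."""
    if len(secret) <= 4:
        return "*" * len(secret)
    return "*" * (len(secret) - 4) + secret[-4:]
```

Passing a plain dictionary as env makes the loader easy to exercise in tests without touching real credentials.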

🚀 Usage

Run the tool against any raw .eml file to parse it and generate a threat report.

Standard Run

To perform a full analysis with external threat intelligence queries:

python phish_extractor.py samples/mock_phish.eml -o report.md

This extracts all IOCs, performs lookups against VirusTotal and AbuseIPDB, and outputs a formatted Markdown report.
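
For readers curious what the enrichment lookups involve, the sketch below builds (but does not send) requests against the public VirusTotal v3 and AbuseIPDB v2 endpoints. The helper names and the 90-day default window are assumptions for illustration; how phish_extractor.py structures its own calls is not shown in this README.

```python
import base64
import urllib.parse
import urllib.request

VT_BASE = "https://www.virustotal.com/api/v3"
ABUSEIPDB_CHECK = "https://api.abuseipdb.com/api/v2/check"


def vt_url_id(url: str) -> str:
    """VirusTotal v3 identifies a URL by its unpadded URL-safe base64 form."""
    return base64.urlsafe_b64encode(url.encode()).decode().rstrip("=")


def vt_file_request(sha256: str, api_key: str) -> urllib.request.Request:
    """Prepare a file-report lookup for an attachment hash."""
    return urllib.request.Request(
        f"{VT_BASE}/files/{sha256}", headers={"x-apikey": api_key}
    )


def vt_url_request(url: str, api_key: str) -> urllib.request.Request:
    """Prepare a URL-report lookup using the derived URL identifier."""
    return urllib.request.Request(
        f"{VT_BASE}/urls/{vt_url_id(url)}", headers={"x-apikey": api_key}
    )


def abuseipdb_request(
    ip: str, api_key: str, max_age_days: int = 90
) -> urllib.request.Request:
    """Prepare an IP reputation check limited to recent reports."""
    query = urllib.parse.urlencode({"ipAddress": ip, "maxAgeInDays": max_age_days})
    return urllib.request.Request(
        f"{ABUSEIPDB_CHECK}?{query}",
        headers={"Key": api_key, "Accept": "application/json"},
    )
```

Each prepared request can be passed to urllib.request.urlopen to perform the lookup. Both services enforce rate limits, so a real client would also need retry and backoff handling.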

Offline Mode (`--skip-intel`)

To extract and defang IOCs without sending anything to external APIs (useful for highly confidential investigations or for OPSEC reasons), pass --skip-intel:

python phish_extractor.py samples/mock_phish.eml --skip-intel
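
The extract-and-defang step that offline mode performs can be approximated in a few lines. The patterns below are deliberately simplified and are not the tool's own regexes; they cover only http(s) URLs and IPv4 addresses, and the function names are illustrative.

```python
import re

URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)
OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
IPV4_RE = re.compile(rf"\b(?:{OCTET}\.){{3}}{OCTET}\b")


def defang(indicator: str) -> str:
    """Rewrite an indicator so it is no longer clickable or resolvable."""
    out = re.sub(r"^http", "hxxp", indicator, flags=re.IGNORECASE)
    # Every dot is bracketed, including dots in a URL path.
    return out.replace(".", "[.]")


def extract_iocs(text: str) -> dict:
    """Find URLs and IPv4 addresses in text and return them defanged."""
    # Trailing punctuation is trimmed so sentence endings are not captured.
    urls = sorted({u.rstrip(".,;:!?)") for u in URL_RE.findall(text)})
    ips = sorted(set(IPV4_RE.findall(text)))
    return {
        "urls": [defang(u) for u in urls],
        "ipv4": [defang(i) for i in ips],
    }


body = "Verify at https://evil.com/login now. Sender IP 203.0.113.7."
iocs = extract_iocs(body)
# iocs["urls"] is ['hxxps://evil[.]com/login']
# iocs["ipv4"] is ['203[.]0[.]113[.]7']
```

Nothing in this sketch touches the network, which is the property offline mode relies on.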

To see all available CLI options:

python phish_extractor.py --help
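
The options used in the examples above could be declared with argparse roughly as follows. Only the positional .eml path, -o, and --skip-intel appear in this README; the long form --output and all help strings are assumptions, and the real CLI may expose more options.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the command-line options shown in the usage examples."""
    parser = argparse.ArgumentParser(
        prog="phish_extractor.py",
        description="Extract and enrich IOCs from a raw .eml file.",
    )
    parser.add_argument("eml_path", help="path to the raw .eml file to analyse")
    # The long option name --output is assumed; the README shows only -o.
    parser.add_argument("-o", "--output", help="file to write the report to")
    parser.add_argument(
        "--skip-intel",
        action="store_true",
        help="extract and defang IOCs without querying external APIs",
    )
    return parser


args = build_parser().parse_args(["samples/mock_phish.eml", "--skip-intel"])
```

With the arguments in the last line, args.skip_intel is True and args.output is None.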

🎯 Detection Engineering

To bridge the gap between reactive analysis and proactive defense, this repository includes a detections/ folder containing detection rules built from the artifacts that phish_extractor.py parses:

🟡 yara_rule.yar: A YARA rule that hunts for the specific SHA-256 hash and Base64-encoded malicious payload of the fake invoice attachment in the mock_phish.eml sample.
🟠 sigma_rule.yml: A Sigma rule designed to detect email gateway logs where DMARC fails and the subject contains the classic phishing lure "URGENT: Your account has been temporarily restricted". This can be integrated into SIEM platforms for real-time alerting.

🗺️ Roadmap

This project follows a versioned roadmap. See ROADMAP.md for the full breakdown.

🔧 Version 1.0 Beta — Stabilisation & Hardening (current)

Focused on fixing known bugs, closing security gaps, and building a test suite before adding new features.

Category	Highlights
🐛 Bug Fixes	IPv6 regex, false positive IOC extraction, inconsistent defanging, risk scoring gaps
🔒 Security	API key leak prevention, file-size limits, attachment filename sanitisation
✨ Enhancements	Tracking pixel filtering, parallelised API enrichment
📚 Docs & Testing	pytest suite, `.env.example`, secrets-management guidance

🚀 Version 2.0 — Campaign Intelligence Platform (planned)

The defining upgrade: v1 analyzes one email, v2 analyzes a campaign.

Category	Highlights
🏗️ Architecture	Batch mode, async enrichment, IOC caching, installable package
🧠 Intelligence	Campaign clustering, WHOIS/domain age, URL unshortening, phishing lure scoring
🔗 Integrations	URLhaus, Shodan, MISP push, Webhook/API mode
📄 Output	HTML reports, STIX 2.1 export, GitHub Actions CI, Docker image

➡️ View the full roadmap in ROADMAP.md

Developed with ❤️ by Gurvin Singh
