Automated IOC Extraction & Threat Intelligence Reporter
Reducing SOC alert fatigue by automating the ingestion, parsing, and enrichment of malicious .eml files.
A major challenge for Tier 1 SOC analysts is the sheer volume of repetitive tasks, which frequently leads to alert fatigue. Phish Extractor is a Python-based automation tool that addresses this by:
- Ingesting raw `.eml` phishing reports.
- Extracting critical Indicators of Compromise (IOCs).
- Enriching them with threat intelligence from VirusTotal and AbuseIPDB.
- Generating human-readable Markdown or structured JSON reports.
By fully automating the manual labor of parsing headers, calculating file hashes, defanging links, and querying APIs, analysts can focus their time on triage, containment, and higher-value incident response tasks.
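The header-parsing and hashing steps mentioned above can be sketched with Python's standard `email` and `hashlib` modules. This is a minimal illustration, not the tool's actual code: the function names `load_eml`, `parse_headers`, and `hash_attachments` are hypothetical, and the `"missing"` placeholder for absent verdicts is an assumption.

```python
import hashlib
import re
from email import message_from_bytes, policy
from email.message import EmailMessage

# Matches "spf=pass", "dkim=fail", "dmarc=none", ... inside Authentication-Results.
AUTH_RESULT_RE = re.compile(r"\b(spf|dkim|dmarc)=(\w+)", re.IGNORECASE)


def load_eml(raw: bytes) -> EmailMessage:
    """Parse raw .eml bytes with the modern email policy."""
    return message_from_bytes(raw, policy=policy.default)


def parse_headers(msg: EmailMessage) -> dict:
    """Collect basic headers plus SPF/DKIM/DMARC verdicts."""
    # "missing" is a placeholder chosen for this sketch, not a standard verdict.
    auth = {"spf": "missing", "dkim": "missing", "dmarc": "missing"}
    for value in msg.get_all("Authentication-Results", []):
        for mechanism, verdict in AUTH_RESULT_RE.findall(str(value)):
            auth[mechanism.lower()] = verdict.lower()
    return {
        "from": str(msg.get("From", "")),
        "to": str(msg.get("To", "")),
        "subject": str(msg.get("Subject", "")),
        "date": str(msg.get("Date", "")),
        "auth": auth,
    }


def hash_attachments(msg: EmailMessage) -> dict:
    """Walk the MIME tree and return {filename: SHA-256 hex digest}."""
    hashes = {}
    for part in msg.walk():
        filename = part.get_filename()
        if not filename:
            continue  # body text or a multipart container, not an attachment
        payload = part.get_payload(decode=True) or b""
        hashes[filename] = hashlib.sha256(payload).hexdigest()
    return hashes
```

Reading a file from disk would then look like `parse_headers(load_eml(open(path, "rb").read()))`.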
- Header Parsing: Automatically extracts sender, recipient, subject, dates, and most importantly, `SPF`, `DKIM`, and `DMARC` authentication results.
- Robust IOC Extraction: Efficiently pulls URLs, domains, IPv4/IPv6 addresses, and calculates `SHA-256` hashes for all file attachments using regex patterns and MIME tree traversal.
- Defanging: Defangs URLs, IPs, and domains automatically (e.g., `hxxps://evil[.]com`) to ensure indicators can be safely shared across teams and SOAR platforms without accidental execution or triggering enterprise perimeter alerts.
- Automated Threat Intel Enrichment: Interacts with the VirusTotal v3 and AbuseIPDB APIs to check extracted URLs, domains, IPs, and attachment hashes for malicious reputation.
- Automated Risk Scoring: Derives an overall risk severity level (`LOW`, `MEDIUM`, `HIGH`, `CRITICAL`) based on DMARC failures and malicious threat intel hits, enabling analysts to prioritize their queues.
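To make the extraction, defanging, and scoring features concrete, here is a small sketch. The regexes are simplified (IPv4 only, no IPv6 or bare domains), and the scoring thresholds in `risk_level` are assumptions made for illustration. The README does not state the tool's actual rules, and all function names here are hypothetical.

```python
import re

URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)
IPV4_RE = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
)


def extract_iocs(text: str) -> dict:
    """Return de-duplicated URLs and IPv4 addresses found in text."""
    # Trailing punctuation usually belongs to the sentence, not the URL.
    urls = {u.rstrip(".,;:!?)") for u in URL_RE.findall(text)}
    ips = set(IPV4_RE.findall(text))
    return {"urls": sorted(urls), "ipv4": sorted(ips)}


def defang(indicator: str) -> str:
    """Rewrite http -> hxxp and every '.' -> '[.]' so nothing is clickable."""
    out = re.sub(r"^http", "hxxp", indicator, flags=re.IGNORECASE)
    return out.replace(".", "[.]")


def risk_level(dmarc_result: str, malicious_hits: int) -> str:
    """Map a DMARC verdict and intel hit count to a severity (assumed rules)."""
    dmarc_failed = dmarc_result.lower() == "fail"
    if dmarc_failed and malicious_hits > 0:
        return "CRITICAL"
    if malicious_hits > 0:
        return "HIGH"
    if dmarc_failed:
        return "MEDIUM"
    return "LOW"
```

Note that this `defang` also brackets dots in the URL path, which is harmless for sharing but means a re-fanging step must reverse every `[.]`.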
- Python: 3.10+
- OS: Cross-platform (Windows, macOS, Linux)
For Windows/VS Code users, follow these commands to set up the environment:
1. Clone the repository:
git clone https://github.com/gurvinny/phish_extractor.git
cd phish_extractor
2. Create and activate a virtual environment:
python -m venv venv
.\venv\Scripts\Activate.ps1
3. Install the dependencies:
pip install -r requirements.txt
4. Configure your API keys. Copy the example environment file and fill in your keys:
copy .env.example .env
Warning: Never commit your `.env` file. It is listed in `.gitignore` to prevent accidental exposure. If you accidentally push secrets, rotate your API keys immediately via the VirusTotal and AbuseIPDB dashboards, and treat any exposed key as compromised.
- `.env` is excluded from version control via `.gitignore`; never remove this rule.
- In CI/CD pipelines, use GitHub Actions Secrets (or your vault of choice) instead of `.env` files.
- To proactively prevent secret leaks, consider adding a `pre-commit` hook using `detect-secrets` or `gitleaks`:
pip install detect-secrets
detect-secrets scan > .secrets.baseline
- If you ever accidentally commit a real API key, rotate it immediately and assume it is compromised.
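On the code side, the same guidance suggests reading keys from the process environment rather than hard-coding them, which works the same whether the values come from a local `.env` loader or from CI secrets. The sketch below is one way to do that. The variable names `VIRUSTOTAL_API_KEY` and `ABUSEIPDB_API_KEY` are assumptions, so check `.env.example` for the names the tool actually expects.

```python
import os
from typing import Mapping

# Hypothetical variable names; the real ones are defined in .env.example.
REQUIRED_KEYS = ("VIRUSTOTAL_API_KEY", "ABUSEIPDB_API_KEY")


def load_api_keys(env: Mapping[str, str] = os.environ) -> dict:
    """Return the required keys, failing early if any is missing or blank."""
    missing = [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError("Missing API keys: " + ", ".join(missing))
    return {name: env[name].strip() for name in REQUIRED_KEYS}


def mask(secret: str) -> str:
    """Hide all but the last four characters so keys never appear whole in logs."""
    if len(secret) <= 4:
        return "****"
    return "*" * (len(secret) - 4) + secret[-4:]
```

Passing the environment in as a parameter keeps the function easy to test without touching real secrets.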
Run the tool against any raw .eml file to parse and generate a threat report.
To perform a full analysis with external threat intelligence queries:
python phish_extractor.py samples/mock_phish.eml -o report.md
This extracts all IOCs, performs lookups against VirusTotal and AbuseIPDB, and outputs a formatted Markdown report.
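For readers curious what such lookups involve, the sketch below builds (but does not send) requests for the VirusTotal v3 and AbuseIPDB v2 `check` endpoints using only the standard library. It is an illustration under assumptions: the helper names are hypothetical, and the tool itself may use a different HTTP client or different parameters.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

VT_BASE = "https://www.virustotal.com/api/v3"
ABUSEIPDB_CHECK = "https://api.abuseipdb.com/api/v2/check"

# VirusTotal v3 collection names for each indicator type.
VT_PATHS = {"file": "files", "url": "urls", "domain": "domains", "ip": "ip_addresses"}


def vt_url_id(url: str) -> str:
    """VirusTotal identifies a URL by its unpadded URL-safe base64 encoding."""
    return base64.urlsafe_b64encode(url.encode()).decode().rstrip("=")


def vt_request(kind: str, value: str, api_key: str) -> Request:
    """Build a GET request for a file hash, URL, domain, or IP report."""
    ident = vt_url_id(value) if kind == "url" else value
    return Request(
        f"{VT_BASE}/{VT_PATHS[kind]}/{ident}",
        headers={"x-apikey": api_key, "Accept": "application/json"},
    )


def abuseipdb_request(ip: str, api_key: str, max_age_days: int = 90) -> Request:
    """Build a GET request for an IP reputation check."""
    query = urlencode({"ipAddress": ip, "maxAgeInDays": max_age_days})
    return Request(
        f"{ABUSEIPDB_CHECK}?{query}",
        headers={"Key": api_key, "Accept": "application/json"},
    )
```

A request built this way can be sent with `urllib.request.urlopen(req, timeout=10)`. Both services rate-limit API keys, so a real client would also need retry and back-off handling.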
If you want to extract IOCs and defang them without sending anything to external APIs (useful for highly confidential investigations or OPSEC reasons):
python phish_extractor.py samples/mock_phish.eml --skip-intel
To see all available CLI options:
python phish_extractor.py --help
To bridge the gap between reactive analysis and proactive defense, the `detections/` folder is included in this repository. It contains actionable detection rules derived from the artifacts parsed by `phish_extractor.py`:
- `yara_rule.yar`: A YARA rule that hunts for the specific SHA-256 hash and base64-encoded malicious payload of the fake invoice document attachment in our `mock_phish.eml` sample.
- `sigma_rule.yml`: A Sigma rule designed to detect email gateway logs where DMARC fails and the subject contains the classic phishing lure `"URGENT: Your account has been temporarily restricted"`. This can be integrated into SIEM platforms for real-time alerting.
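The Sigma rule's condition can be mirrored in a few lines of Python, which is handy for checking sample gateway events before deploying the rule to a SIEM. This sketch is not taken from `sigma_rule.yml`: the event field names `dmarc` and `subject` are assumptions, and real gateway logs will likely use different keys.

```python
LURE_SUBJECT = "URGENT: Your account has been temporarily restricted"


def matches_phish_rule(event: dict) -> bool:
    """True when DMARC failed and the subject contains the known lure."""
    # Field names "dmarc" and "subject" are hypothetical log keys.
    dmarc_failed = str(event.get("dmarc", "")).lower() == "fail"
    subject = str(event.get("subject", ""))
    # Case-insensitive substring match on the lure text.
    return dmarc_failed and LURE_SUBJECT.lower() in subject.lower()
```

Requiring both conditions keeps the check narrow: a DMARC failure alone is common for misconfigured legitimate senders, so the lure text is what ties a match to this specific campaign.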
This project follows a versioned roadmap. See ROADMAP.md for the full breakdown.
Focused on fixing known bugs, closing security gaps, and building a test suite before adding new features.
| Category | Highlights |
|---|---|
| Bug Fixes | IPv6 regex, false positive IOC extraction, inconsistent defanging, risk scoring gaps |
| Security | API key leak prevention, file-size limits, attachment filename sanitisation |
| Enhancements | Tracking pixel filtering, parallelised API enrichment |
| Docs & Testing | pytest suite, `.env.example`, secrets-management guidance |
The defining upgrade: v1 analyzes one email, v2 analyzes a campaign.
| Category | Highlights |
|---|---|
| Architecture | Batch mode, async enrichment, IOC caching, installable package |
| Intelligence | Campaign clustering, WHOIS/domain age, URL unshortening, phishing lure scoring |
| Integrations | URLhaus, Shodan, MISP push, Webhook/API mode |
| Output | HTML reports, STIX 2.1 export, GitHub Actions CI, Docker image |
View the full roadmap in ROADMAP.md.