sandeep0428/pdf-malware-analysis-toolkit


📄 PDF Malware Analysis Toolkit

A Python-based toolkit for analyzing potentially malicious PDF files using static analysis, IOC extraction, YARA scanning, and threat intelligence integrations.


🚀 Features

  • 📌 Metadata extraction (author, creator, timestamps)
  • 🔍 Suspicious keyword detection (/JavaScript, /OpenAction, etc.)
  • 📦 Embedded object extraction
  • ⚠️ JavaScript analysis inside PDFs
  • 🌐 IOC extraction (IPs, domains, URLs)
  • 🧬 YARA rule scanning
  • 🛡️ CVE pattern detection
  • 🧠 Risk scoring engine
  • 🔎 VirusTotal lookup (hash-based)
  • ☁️ Hybrid Analysis sandbox integration
  • 📊 Automated report generation (DOCX)
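The suspicious-keyword check can be sketched in the spirit of Didier Stevens' pdfid: tokenize PDF names in the raw bytes and count the risky ones. This is a minimal sketch, not the toolkit's code. The keyword list and the names `normalize_name` and `count_suspicious_keywords` are illustrative assumptions. It also decodes `#xx` hex escapes in names (e.g. `/J#61vaScript`), a common trick for hiding keywords from plain string matching.

```python
import re
from collections import Counter

# Illustrative keyword list (an assumption, modeled on common pdfid-style checks).
SUSPICIOUS_KEYWORDS = (
    "/JavaScript", "/JS", "/OpenAction", "/AA", "/Launch",
    "/EmbeddedFile", "/RichMedia", "/XFA", "/AcroForm", "/JBIG2Decode",
)

# A PDF name: "/" followed by regular characters (no whitespace or delimiters).
NAME_RE = re.compile(rb"/[^\s/<>\[\]()%{}]+")
HEX_ESCAPE_RE = re.compile(rb"#([0-9A-Fa-f]{2})")


def normalize_name(raw: bytes) -> str:
    """Decode #xx hex escapes, so /J#61vaScript becomes /JavaScript."""
    decoded = HEX_ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    return decoded.decode("latin-1")


def count_suspicious_keywords(data: bytes) -> dict[str, int]:
    """Count exact-name occurrences of each suspicious keyword in raw PDF bytes."""
    counts = Counter(normalize_name(m.group()) for m in NAME_RE.finditer(data))
    # Counter returns 0 for missing keys, so only keywords that occur are kept.
    return {kw: counts[kw] for kw in SUSPICIOUS_KEYWORDS if counts[kw]}


if __name__ == "__main__":
    sample = b"<< /Type /Catalog /OpenAction 5 0 R >> << /S /J#61vaScript /JS (app.alert(1)) >>"
    print(count_suspicious_keywords(sample))
```

Matching whole name tokens rather than substrings keeps `/JS` from being counted inside `/JavaScript`. Keywords inside compressed streams stay invisible to this scan until those streams are decoded.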

🏗️ Project Structure

PDF-Malware-Analysis-Toolkit/
│
├── analyzer/
├── yara_rules/
├── samples/
├── screenshots/
├── logs/
├── reports/
│
├── main.py
├── requirements.txt
├── README.md
├── LICENSE
└── .gitignore

⚙️ Installation

git clone https://github.com/sandeep0428/pdf-malware-analysis-toolkit.git
cd pdf-malware-analysis-toolkit
python -m venv venv
venv\Scripts\activate          # Windows
source venv/bin/activate       # Linux/macOS
pip install -r requirements.txt

🔑 API Configuration

Create a .env file:

VT_API_KEY=your_virustotal_api_key
HA_API_KEY=your_hybrid_analysis_api_key
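VT_API_KEY drives the hash-based VirusTotal lookup. Below is a minimal standard-library sketch of such a lookup against the public VirusTotal API v3 (`GET /api/v3/files/{sha256}` with an `x-apikey` header). The helper names are hypothetical, and reading the key from `os.environ` assumes the .env values have already been loaded into the process environment (for example with python-dotenv's `load_dotenv()`). The toolkit's own client may work differently.

```python
import hashlib
import json
import os
import urllib.request

VT_FILE_URL = "https://www.virustotal.com/api/v3/files/{}"


def sha256_of(path: str) -> str:
    """Hash the file in chunks so large PDFs are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_vt_request(file_hash: str, api_key: str) -> urllib.request.Request:
    """Hypothetical helper: build a VirusTotal v3 file-report request."""
    return urllib.request.Request(
        VT_FILE_URL.format(file_hash), headers={"x-apikey": api_key}
    )


def malicious_count(report: dict) -> int:
    """Read how many engines flagged the file from a v3 file report."""
    stats = report.get("data", {}).get("attributes", {}).get("last_analysis_stats", {})
    return stats.get("malicious", 0)


def lookup(path: str) -> int:
    """Query VirusTotal for a file's hash (network call; needs VT_API_KEY set)."""
    request = build_vt_request(sha256_of(path), os.environ["VT_API_KEY"])
    with urllib.request.urlopen(request, timeout=30) as response:
        return malicious_count(json.load(response))
```

Only the hash leaves the machine, not the PDF itself. If VirusTotal has never seen the hash, the API answers HTTP 404, which `urlopen` raises as `urllib.error.HTTPError`. The public API is also rate-limited, so a real client should handle both cases.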

▶️ Usage

python main.py samples/sample_cve.pdf    # analyze a single PDF
python main.py samples/                  # analyze the PDFs in a directory
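The two usage commands show that main.py takes either one PDF or a directory, so the argument has to be expanded into a list of files. A minimal sketch, assuming non-recursive matching on a case-insensitive `.pdf` extension (the toolkit's actual rule is not documented here). `collect_pdfs` is a hypothetical name.

```python
import sys
from pathlib import Path


def collect_pdfs(target: str) -> list[Path]:
    """Return [target] for a file, or the sorted *.pdf files in a directory."""
    path = Path(target)
    if path.is_file():
        return [path]
    if path.is_dir():
        # Non-recursive, case-insensitive extension match (an assumption).
        return sorted(
            p for p in path.iterdir() if p.is_file() and p.suffix.lower() == ".pdf"
        )
    raise FileNotFoundError(f"No such file or directory: {target}")


if __name__ == "__main__" and len(sys.argv) > 1:
    for pdf in collect_pdfs(sys.argv[1]):
        print(f"[+] Analyzing: {pdf}")
```

Sorting gives a stable processing order, which keeps logs and reports from batch runs comparable between runs.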

📊 Sample Output

[+] Analyzing: samples/sample_cve.pdf

--- CVE Detection ---
['CVE-2010-0188 Exploit']

--- VirusTotal ---
Malicious: 1

[!] WARNING: File is malicious!

--- Risk Score ---
{'score': 75, 'level': 'HIGH'}

[+] Analysis Complete
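The risk score in this output is a dict with a numeric `score` and a `level`. One simple way to produce that shape is a weighted sum of the findings that fired, capped at 100, with thresholds mapping the score to a level. The weights, thresholds, and finding names below are hypothetical and are not the toolkit's scoring rules, so this sketch will not necessarily reproduce the 75/HIGH result shown.

```python
# Hypothetical weights per finding type; tune to your own triage policy.
WEIGHTS = {
    "javascript": 20,
    "open_action": 10,
    "embedded_file": 15,
    "yara_match": 25,
    "cve_match": 30,
    "vt_malicious": 30,
}

# Hypothetical thresholds, checked from highest to lowest.
LEVELS = ((70, "HIGH"), (40, "MEDIUM"), (0, "LOW"))


def risk_score(findings: dict[str, bool]) -> dict:
    """Sum the weights of the findings that fired, cap at 100, and assign a level."""
    score = min(100, sum(WEIGHTS.get(name, 0) for name, hit in findings.items() if hit))
    level = next(label for threshold, label in LEVELS if score >= threshold)
    return {"score": score, "level": level}
```

For example, `risk_score({"cve_match": True, "vt_malicious": True})` returns `{'score': 60, 'level': 'MEDIUM'}` under these weights. Keeping weights and thresholds in plain tables makes the scoring easy to audit and adjust.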

⚙️ How It Works

  • Extract metadata from the PDF
  • Scan for suspicious keywords
  • Extract embedded objects
  • Analyze JavaScript content
  • Extract IOCs (URLs, IPs, domains)
  • Apply YARA rules
  • Detect CVE patterns
  • Query VirusTotal / Hybrid Analysis (if enabled)
  • Calculate risk score
  • Generate final report
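The IOC extraction step can be sketched with regular expressions over text recovered from the PDF (for example, extracted JavaScript or URI actions). The patterns are deliberately simple assumptions: they catch plain-text URLs and IPv4 addresses and will miss defanged or encoded indicators. IPv4 candidates are validated with the standard `ipaddress` module, and domains are taken from URL hosts instead of being matched free-standing, which avoids false positives such as JavaScript member access (`app.alert`) or file names. `extract_iocs` is a hypothetical name.

```python
import ipaddress
import re
from urllib.parse import urlsplit

# Deliberately simple patterns (assumptions): plain-text indicators only.
URL_RE = re.compile(r"https?://[^\s\"'<>()\[\]]+", re.IGNORECASE)
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def _is_ipv4(value: str) -> bool:
    try:
        ipaddress.IPv4Address(value)
        return True
    except ValueError:
        return False


def extract_iocs(text: str) -> dict[str, list[str]]:
    """Return sorted, de-duplicated URLs, IPv4 addresses, and domains in text."""
    urls = set(URL_RE.findall(text))
    # Validate candidates so strings like 999.1.1.1 are rejected.
    ips = {ip for ip in IPV4_RE.findall(text) if _is_ipv4(ip)}
    domains = set()
    for url in urls:
        host = urlsplit(url).hostname  # lowercased, port and userinfo stripped
        if host and not _is_ipv4(host):
            domains.add(host)
    return {"urls": sorted(urls), "ips": sorted(ips), "domains": sorted(domains)}
```

The sorted, de-duplicated lists drop straight into a report table. A more thorough extractor would also refang indicators such as `hxxp://` and `example[.]com` before matching.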

📸 Screenshots

🔍 Analysis Execution

[Screenshots: analysis execution]

📊 Generated Report

[Screenshots: generated report]

📊 Generated Logs

[Screenshot: generated logs]

⚠️ Limitations

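  • Local analysis is static only; runtime behavior is covered only by the optional Hybrid Analysis sandbox results
  • May miss obfuscated payloads (for example, encoded or heavily obfuscated JavaScript)
  • VirusTotal and Hybrid Analysis results depend on API availability, rate limits, and whether the service already has a report for the file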
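One common reason keyword and IOC scans miss content is that PDF streams are usually compressed, typically with /FlateDecode, so JavaScript or URLs inside them never appear in the raw bytes. The sketch below inflates zlib-compressed stream bodies with the standard `zlib` module so they can be scanned. It is an illustrative assumption, not part of the toolkit. It ignores other filters (ASCIIHex, LZW, etc.), filter chains, and /DecodeParms predictors, and it locates streams by keyword rather than by /Length. `inflate_streams` is a hypothetical name.

```python
import re
import zlib

# Body between "stream" + EOL (CRLF or LF) and the next "endstream".
# Any EOL before "endstream" stays in the body; decompressobj ignores
# trailing bytes after the end of the zlib data.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(data: bytes) -> list[bytes]:
    """Return the decompressed body of every stream that zlib can inflate."""
    decoded = []
    for match in STREAM_RE.finditer(data):
        try:
            decoded.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            continue  # not Flate-encoded, or corrupt beyond recovery
    return decoded
```

The decoded bytes can then be passed through the same keyword and IOC checks as the raw file. A `decompressobj` is used instead of `zlib.decompress` because it returns partial output for truncated streams, which are common in deliberately malformed PDFs, where `zlib.decompress` would raise.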

Acknowledgement

This project was developed as part of a cybersecurity problem statement and reflects my own implementation, design decisions, and enhancements for practical SOC use cases.

AI-assisted tools were used to support development, optimization, and code refinement.

📄 License

MIT License


👨‍💻 Author

Sandeep Kumar

