PCAP Hunter is an AI-enhanced threat hunting workbench that bridges manual packet analysis and automated security monitoring. It empowers SOC analysts and threat hunters to rapidly ingest, analyze, and extract actionable intelligence from raw PCAP files.
By combining industry-standard network analysis tools (Zeek, Tshark, PyShark) with Large Language Models (LLMs) and OSINT APIs, PCAP Hunter automates the tedious parts of packet analysis — parsing, correlation, and enrichment — so analysts can focus on detection and response.
📖 User Manual (English) | 中文說明 (Traditional Chinese)
- Visual Tour
- Key Features
- Architecture
- Installation
- Quick Start
- Usage Guide
- Configuration
- Docker
- Development
- Documentation
- License
Drag-and-drop a .pcap / .pcapng file (up to 200 MB each) or paste a container path.
Multiple files trigger batch mode with cross-file correlation.
Every stage of the analysis pipeline reports live progress with a skippable per-stage control. You always know what's running and how far it has to go.
The Dashboard surfaces the highest-signal findings first: overall risk level, alert count, beacon candidates, YARA hits, and certificate issues. A global traffic map, protocol distribution, and activity timeline put the capture in visual context.
An 8-section narrative (Executive Summary → Key Findings → Indicators & Evidence → OSINT Corroboration → Beaconing / C2 → DNS & TLS → Risk Assessment → Recommended Actions) with confidence qualifiers and MITRE ATT&CK mapping, generated locally via LM Studio or any OpenAI-compatible endpoint.
Prioritized IOC table with VirusTotal, AbuseIPDB, GreyNoise, Shodan, OTX, and VT Domain signals merged into one view. Sub-tabs expose Domains, Detail Cards, Geo Map, Infrastructure ASN clustering, Export, Devices, and Notes.
Every underlying data source is available: flow table, DNS and TLS analyses, NXDOMAIN analysis, JA3/JA3S fingerprints, Zeek conn.log/dns.log/http.log/ssl.log, carved HTTP payloads, and YARA scan results. Export any view as CSV or JSON with CSV-injection protection.
Promote any capture and its findings into a case. Cases carry IOCs, severity, tags, investigation notes, status, and search — stored in a local SQLite database.
LLM endpoint, API keys (PBKDF2-encrypted at rest), home location for the world map, OSINT provider toggles, binary paths, and pipeline thresholds — all in one place with per-section clear buttons.
- Automated Reporting — Generates professional, SOC-ready threat reports with severity-calibrated assessments, false-positive awareness, and structured analysis workflow (Characterize → Identify → Assess → Recommend).
- Local & Cloud LLM Support
- Local Privacy: Fully compatible with LM Studio (Llama 3, Mistral, etc.) for air-gapped or privacy-sensitive environments.
- Cloud Power: Supports any OpenAI-compatible API endpoint for leveraging larger models.
- Multi-Language Reports — 9 languages with region-specific terminology: English, Traditional Chinese (Taiwan), Simplified Chinese, Japanese, Korean, Italian, Spanish, French, German.
- MITRE ATT&CK Mapping — Automated mapping of detected behaviors and IOCs to ATT&CK techniques and Kill Chain phases.
- Attack Narrative Synthesis — Translates raw events into a coherent, actionable security story.
- Tiered Signal Architecture — Dynamically ranks indicators as Critical, High, Medium, or Low using a three-tier model:
- Tier 1 (Definitive): OSINT confirmations (VirusTotal, GreyNoise malicious) — any single Tier 1 hit sets a score floor.
- Tier 2 (Behavioral): C2 beaconing, flow asymmetry, DNS tunneling, DGA domains.
- Tier 3 (Contextual): AbuseIPDB, self-signed certs, expired certs, YARA matches.
- Tier 3 signals alone never exceed "medium"; corroboration from multiple tiers is required for "high" or "critical".
- Independence-complement formula — Uses 1 − Π(1 − wᵢsᵢ) (a Bayesian independence model) instead of linear summation, producing diminishing returns while allowing multiple weak signals to compound meaningfully.
- Strong-signal floors — A confirmed VirusTotal detection automatically sets a minimum score regardless of other factors.
- Aggregates signals across all analysis modules (OSINT, beaconing, DNS, TLS, YARA, flow analysis).
- Produces composite threat scores per indicator with verdict classification (critical / high / medium / low).
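As a minimal sketch of the independence-complement scoring with a strong-signal floor — the weights, signal values, and `floor` parameter here are illustrative, not the project's actual code:

```python
# Hypothetical sketch of 1 - prod(1 - w_i * s_i) scoring; weights and
# signal strengths are illustrative assumptions.
def composite_score(signals, floor=0.0):
    """Combine weighted signals s_i in [0, 1] as 1 - prod(1 - w_i * s_i)."""
    remaining = 1.0
    for weight, strength in signals:
        remaining *= 1.0 - weight * strength
    # A Tier 1 "strong signal" floor wins over a weak composite.
    return max(1.0 - remaining, floor)

# Two moderate signals compound with diminishing returns ...
weak = composite_score([(0.5, 0.6), (0.5, 0.6)])    # 1 - 0.7 * 0.7 = 0.51
# ... and a confirmed Tier 1 hit enforces a minimum score regardless.
floored = composite_score([(0.2, 0.1)], floor=0.9)  # 0.9
```

Unlike linear summation, the product form can never exceed 1.0, yet each additional independent signal still moves the score upward.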
- Data Exfiltration Detection — Identifies suspicious outbound:inbound byte ratios per src/dst pair (default threshold: 10:1, minimum 1 MB).
- Port Anomaly Detection — Flags non-standard port usage, C2 common ports (4444, 5555, 6666, etc.), and high port pairs.
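The exfiltration ratio check above can be sketched as follows; the flow field names (`bytes_out`, `bytes_in`, `src`, `dst`) are assumptions for illustration, not the project's actual schema:

```python
# Illustrative outbound:inbound asymmetry check (default 10:1, minimum 1 MB).
MIN_OUT_BYTES = 1_000_000   # 1 MB floor before a pair is considered
RATIO_THRESHOLD = 10        # default 10:1 outbound:inbound

def exfil_candidates(flows):
    """Flag (src, dst) pairs whose outbound bytes dwarf inbound bytes."""
    hits = []
    for f in flows:
        out_b, in_b = f["bytes_out"], f["bytes_in"]
        # max(in_b, 1) avoids division-style blowups for one-way flows.
        if out_b >= MIN_OUT_BYTES and out_b >= RATIO_THRESHOLD * max(in_b, 1):
            hits.append((f["src"], f["dst"]))
    return hits

flows = [
    {"src": "10.0.0.5", "dst": "203.0.113.9", "bytes_out": 50_000_000, "bytes_in": 120_000},
    {"src": "10.0.0.5", "dst": "8.8.8.8", "bytes_out": 4_000, "bytes_in": 4_200},
]
# The first pair exceeds both thresholds; the symmetric DNS chatter does not.
```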
- Multi-File Upload — Upload and analyze multiple PCAP files simultaneously.
- Cross-File Correlation — Detects shared IPs, domains, and JA3 fingerprints across files.
- Merged Dashboard — Aggregated results with per-file detail cards and batch summary.
- Resource Limits — Configurable limits: 1 GB per file, 50 files max, 5 GB total.
- PyShark + Zeek in parallel — The two heaviest stages run concurrently via ThreadPoolExecutor.
- HTTP Carving in parallel with DNS/TLS/Beaconing analysis.
- Tshark `-c` optimization — Packet limit enforced at the tshark level for zero-waste I/O.
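The concurrency pattern described above can be sketched with the standard library; `run_pyshark` and `run_zeek` here are hypothetical stand-ins for the real stages:

```python
# Sketch of running the two heaviest stages concurrently via
# ThreadPoolExecutor; stage functions are illustrative placeholders.
from concurrent.futures import ThreadPoolExecutor

def run_pyshark(pcap):
    return f"pyshark:{pcap}"   # placeholder for deep packet inspection

def run_zeek(pcap):
    return f"zeek:{pcap}"      # placeholder for Zeek log generation

def run_heavy_stages(pcap):
    with ThreadPoolExecutor(max_workers=2) as pool:
        fut_pyshark = pool.submit(run_pyshark, pcap)
        fut_zeek = pool.submit(run_zeek, pcap)
        # Both stages progress in parallel; results are joined here.
        return fut_pyshark.result(), fut_zeek.result()
```

Threads suit these stages because the heavy work happens in subprocesses and C extensions, where the GIL is released.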
- Multi-Engine Pipeline: PyShark for granular inspection, Tshark for high-speed statistics.
- Protocol Parsing: Automatically extracts metadata for HTTP, DNS, TLS/SSL, and SMB protocols.
- Automated Zeek execution on uploaded PCAPs — no manual CLI required.
- Parses and correlates core Zeek logs: `conn.log`, `dns.log`, `http.log`, `ssl.log`.
- DGA Detection — Shannon entropy-based Domain Generation Algorithm identification.
- DNS Tunneling — Detects high-volume / anomalous DNS payloads.
- Fast Flux Detection — Identifies domains resolving to rapidly changing IP addresses.
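Shannon-entropy DGA scoring can be sketched like this; the 4.0-bit threshold matches the documented default, while the label-splitting heuristic is an illustrative simplification:

```python
# Sketch of entropy-based DGA detection; 4.0 bits is the documented
# default threshold, the rest is an illustrative assumption.
import math
from collections import Counter

def shannon_entropy(label: str) -> float:
    counts = Counter(label)
    n = len(label)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def looks_like_dga(domain: str, threshold: float = 4.0) -> bool:
    # Score only the leftmost label; TLDs would skew the distribution.
    label = domain.split(".")[0]
    return shannon_entropy(label) >= threshold

# "google" scores ~1.9 bits; a random 16-char label scores 4.0 bits.
```

High-entropy labels alone are not proof of a DGA (CDN hostnames can score high), which is why these signals sit in the behavioral tier rather than the definitive one.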
- JA3/JA3S Fingerprinting — Matches TLS fingerprints against 90+ known malware signatures (Cobalt Strike, Trickbot, Emotet, QakBot, etc.).
- Certificate Analysis — Validates certificate chains; detects self-signed and expired certificates.
- Statistical algorithm scoring flows based on:
- Periodicity — Regularity of communication intervals (CV + entropy scoring).
- Jitter — Modal interval analysis with ±20% tolerance for detecting randomized C2.
- Volume — Packet count and payload size consistency.
- False-Positive Reduction — Multi-layered penalties to prevent benign traffic from triggering alerts:
- Infrastructure allowlist (major public DNS resolvers)
- Protocol awareness (ICMP, NTP, mDNS, SSDP, IGMP are inherently periodic)
- Service port penalties (HTTPS, IMAPS, Apple Push, MQTT, SIP)
- High-volume large-payload filtering (streaming/downloads vs. C2)
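The periodicity component above can be sketched via the coefficient of variation (CV) of inter-arrival times; the `1 − CV` scoring transform is an illustrative assumption, not the project's exact formula:

```python
# Sketch of CV-based periodicity scoring: low CV means regular
# intervals, which is beacon-like. The transform is illustrative.
import statistics

def periodicity_score(timestamps):
    """Return a 0..1 score; 1.0 means perfectly regular intervals."""
    if len(timestamps) < 3:
        return 0.0
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    mean = statistics.mean(gaps)
    if mean <= 0:
        return 0.0
    cv = statistics.pstdev(gaps) / mean   # coefficient of variation
    return max(0.0, 1.0 - cv)

# A ~60 s beacon with mild jitter scores near 1.0; bursty traffic near 0.
beacon = [0, 60, 121, 180, 242, 300]
```

A real implementation layers the allowlist and protocol penalties listed above on top of this raw score, since NTP and mDNS would otherwise score as beacons.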
- HTTP Payload Extraction — Carves payloads via `tshark` with automatic SHA256 hashing.
- YARA Scanner — Scan carved files with custom/community YARA rules.
- Safe Storage — Quarantined directory with path traversal and symlink protection.
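A path-traversal guard of the kind described can be sketched as follows; the quarantine-directory layout and function name are assumptions for illustration:

```python
# Sketch of a quarantine-path guard: strip directory components, then
# verify the resolved target stays inside the quarantine directory.
from pathlib import Path

def safe_carve_path(quarantine_dir: str, filename: str) -> Path:
    base = Path(quarantine_dir).resolve()
    # Path(...).name drops any "../" segments an attacker supplied;
    # resolve() then collapses symlinks so escapes are caught below.
    target = (base / Path(filename).name).resolve()
    if base not in target.parents:
        raise ValueError(f"refusing path outside quarantine: {filename}")
    return target
```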
- Threat Summary Panel — At-a-glance risk level (Critical/High/Medium/Low) with corroboration-based escalation, alert count, beacon candidates, YARA hits, and certificate issues.
- World Map — Threat-level coloring, connectivity arcs with volume-based thickness, configurable home location.
- Cross-Filtering — Unified drill-down across Map, Protocol Pie Chart, and Flow Timeline.
- Persistent View Options — "Exclude Private IPs" toggle persists during interactive exploration.
- TopN Charts — Top IPs, Ports, Protocols, Domains with aggregated bar charts, metrics, and reverse DNS hostnames.
- Dashboard Detections — Beaconing candidates, YARA matches, and TLS certificate risks surfaced directly on the dashboard.
- Network Communication Graph — Force-directed graph with threat-colored nodes and equal-aspect-ratio rendering.
Integrates with leading threat intelligence providers:
- VirusTotal — File hash and IP/Domain reputation.
- AbuseIPDB — Crowdsourced IP abuse reports.
- GreyNoise — Internet background noise and scanner identification.
- OTX (AlienVault) — Open Threat Exchange pulses and indicators.
- Shodan — Internet-facing device details and open ports.
- Smart Caching — SQLite-backed caching with configurable TTL to preserve API quotas.
- Bulk Reverse DNS — Parallel rDNS resolution for all public IPs with 7-day SQLite cache. Hostnames displayed throughout the dashboard.
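A TTL'd SQLite cache in the spirit of the OSINT and rDNS caches might look like this; the table and column names are illustrative, not the project's actual schema:

```python
# Sketch of an SQLite-backed cache with a time-to-live, used to avoid
# spending API quota on repeat lookups. Schema names are assumptions.
import json
import sqlite3
import time

class TTLCache:
    def __init__(self, path=":memory:", ttl=7 * 86400):   # 7-day default TTL
        self.ttl = ttl
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT, ts REAL)"
        )

    def get(self, key):
        row = self.db.execute(
            "SELECT value, ts FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row and time.time() - row[1] < self.ttl:
            return json.loads(row[0])   # fresh hit: no API call spent
        return None                     # miss or expired: caller re-queries

    def put(self, key, value):
        self.db.execute(
            "INSERT OR REPLACE INTO cache VALUES (?, ?, ?)",
            (key, json.dumps(value), time.time()),
        )
        self.db.commit()
```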
- Create, track, and close investigation cases.
- Store IOCs (IP, Domain, Hash, JA3, URL) with severity and context.
- Investigation notes, tag-based organization, and search.
- Multi-page PDF reports with executive summary, key findings, technical analysis, and recommendations.
- Embedded dashboard charts — protocol distribution, top talkers, flow timeline, network graph, world map — rendered to PNG via kaleido for static handoff.
- Configurable TLP classification and analyst metadata.
- CSV / JSON — Export any data table with CSV injection protection.
- STIX 2.0/2.1 — Export indicators in standard STIX format.
- ATT&CK Navigator — Export technique mappings for MITRE ATT&CK Navigator.
- CEF (ArcSight) — SIEM-ingestible events from correlations, beacons, DNS, and IOCs.
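The CSV-injection protection mentioned above follows a standard pattern: neutralize cells that a spreadsheet would interpret as formulas. A minimal sketch (function and constant names are illustrative):

```python
# Sketch of CSV-injection sanitization: cells starting with formula
# characters get a leading apostrophe so spreadsheets treat them as text.
FORMULA_PREFIXES = ("=", "+", "-", "@", "\t", "\r")

def sanitize_cell(value) -> str:
    text = str(value)
    if text.startswith(FORMULA_PREFIXES):
        return "'" + text   # neutralized: rendered as a literal string
    return text
```

This matters for threat data in particular, since attacker-controlled strings (User-Agents, domains, file names) flow straight into exported tables.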
```
app/
├── analysis/          # Correlation engine, flow analysis, IOC scorer, narrator
├── database/          # Case management (SQLite)
├── llm/               # LLM client & multi-language report generation
├── pipeline/          # 10-stage analysis pipeline
│   ├── beacon.py      # C2 beaconing detection
│   ├── carve.py       # HTTP payload carving
│   ├── dns_analysis.py # DGA, tunneling, fast flux
│   ├── geoip.py       # GeoIP resolution
│   ├── ja3.py         # JA3/JA3S fingerprinting
│   ├── batch.py       # Multi-PCAP batch processing & correlation
│   ├── osint.py       # OSINT provider queries (parallel)
│   ├── osint_cache.py # SQLite OSINT caching layer
│   ├── rdns_cache.py  # SQLite reverse-DNS caching layer
│   ├── tls_certs.py   # Certificate validation
│   └── yara_scan.py   # YARA rule scanning
├── reports/           # PDF report generation (WeasyPrint + kaleido charts)
├── security/          # OPSEC hardening & data sanitization
├── threat_intel/      # MITRE ATT&CK mapping
├── ui/                # Streamlit interface (8 tabs)
├── utils/             # Export, GeoIP, config, binary discovery, CEF
├── config.py          # Application defaults
└── main.py            # Streamlit entry point
```
- Packet Counting — Fast preliminary count via tshark
- Packet Parsing — Deep inspection up to 200,000 packets (configurable)
- Zeek Processing — Automated Zeek execution and log parsing
- DNS Analysis — DGA, tunneling, fast flux, NXDOMAIN, query velocity
- TLS Certificate Analysis — Chain validation, self-signed/expired detection
- Beaconing Ranking — Temporal pattern analysis for C2 detection
- HTTP Carving — Payload extraction with SHA256 hashing
- YARA Scanning — Rule-based file scanning
- OSINT Enrichment — Multi-provider reputation lookup
- LLM Report Generation — AI-powered threat synthesis
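A staged pipeline with skippable stages and live progress, as described above, can be driven roughly like this; the runner, stage names, and skip mechanism are illustrative assumptions, not the project's actual code:

```python
# Sketch of a sequential stage runner with per-stage skip and progress
# reporting; everything here is an illustrative simplification.
def run_pipeline(pcap, stages, skip=frozenset(), report=print):
    results = {}
    for i, (name, fn) in enumerate(stages, 1):
        if name in skip:
            report(f"[{i}/{len(stages)}] {name}: skipped")
            continue
        report(f"[{i}/{len(stages)}] {name}: running")
        results[name] = fn(pcap)   # each stage consumes the PCAP path
    return results

stages = [
    ("count", lambda p: {"packets": 1}),
    ("parse", lambda p: {"flows": 2}),
]
```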
PCAP Hunter has hard dependencies on system binaries — the pipeline cannot parse packets without them. The installer handles both system and Python dependencies, and verifies everything afterwards.
| Tool | Required? | Purpose |
|---|---|---|
| Python 3.10+ | required | Runtime |
| Tshark (Wireshark) | required | Packet parsing |
| Capinfos (Wireshark) | required | Fast packet counting (ships with tshark) |
| Zeek | required | Protocol analysis (conn.log, dns.log, http.log, ssl.log) |
| YARA | optional | Rule-based scanning of carved files |
| Pango + glib + cairo | required for PDF | WeasyPrint PDF report generation |
| LM Studio | optional | Local LLM (lmstudio.ai) |
All install logic lives in a single cross-platform Python script (`scripts/install.py`) that detects your OS and package manager automatically.
```
git clone https://github.com/ninedter/pcap-hunter.git
cd pcap-hunter
python3 scripts/install.py
```

This works identically on macOS (uses brew), Linux (uses apt), and Windows (uses winget → choco → scoop, in that order). It installs system binaries, installs Python packages, and runs the dependency check.
Prefer your platform's usual workflow? Use one of these — they all delegate to the same `install.py`:
| Platform | Command | What it does |
|---|---|---|
| macOS / Linux | `make install` | wrapper around `python3 scripts/install.py` |
| Windows (PowerShell) | `.\scripts\install.ps1` | bootstraps Python if missing, then delegates |
| Any platform | `python3 scripts/install.py` | the canonical entry point |
| Docker | `docker compose up --build` | all deps baked into the image |
```
python3 scripts/install.py               # full install + verification
python3 scripts/install.py --check-only  # just run the dependency checker
python3 scripts/install.py --skip-system # pip only
python3 scripts/install.py --skip-python # system binaries only
python3 scripts/install.py --dry-run     # preview commands without executing
python3 scripts/install.py --yes         # non-interactive (assume yes)
```
Zeek has no native Windows build. Native Windows installs will work for the tshark pipeline but skip the Zeek protocol-analysis stage. For the complete pipeline on Windows, use:
- Docker (simplest) — `docker compose up --build`
- WSL2 — `wsl --install -d Ubuntu`, then run `python3 scripts/install.py` inside Ubuntu
```
make doctor                              # macOS / Linux
python3 scripts/install.py --check-only  # any OS (including Windows)
```

The app also runs this check at startup and shows a red banner at the top of every page if any required binary is missing — you'll never get a silently empty dashboard.
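A startup check of this kind reduces to probing `PATH` for each required binary; the binary list below mirrors the requirements table, while the function itself is an illustrative sketch:

```python
# Sketch of a required-binary check using the standard library;
# the tool names mirror the requirements table above.
import shutil

REQUIRED = ("tshark", "capinfos", "zeek")
OPTIONAL = ("yara",)

def missing_binaries():
    """Return the required tools not found on PATH."""
    return [tool for tool in REQUIRED if shutil.which(tool) is None]

# A non-empty result would trigger the red startup banner.
missing = missing_binaries()
```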
```
make run
```

Open http://localhost:8501 in your browser.
- Upload — Drag and drop one or more `.pcap` files in the Upload tab. Multiple files trigger batch mode with cross-file correlation.
- Configure — Set your LLM endpoint, home location (Continent > Country > City), and OSINT API keys in the Config tab.
- Analyze — Click Extract & Analyze to start the pipeline.
- Monitor — Watch the Progress tab as stages execute: Packet Counting > Parsing > Zeek > DNS/TLS > Beaconing > Carving > YARA > OSINT > LLM Report.
- Review — Explore results across Dashboard, LLM Analysis, OSINT, Raw Data, and Cases tabs.
- Export — Download CSV/JSON data, PDF reports, STIX bundles, ATT&CK Navigator layers, or CEF syslog events.
Changed your LLM model or language? Click Re-run Report to regenerate only the AI report without re-processing the entire PCAP.
Use the granular Clear buttons in Config to independently wipe PCAP data, OSINT cache, or the Cases database.
- Defaults in `app/config.py` (thresholds, paths, URLs)
- Persistent config in `~/.pcap_hunter_config.json` (encrypted by `ConfigManager`)
- API keys encrypted at rest with a machine-derived PBKDF2 key
- Environment-variable overrides: `OTT_KEY`, `VT_KEY`, `SHODAN_KEY`, etc.
- LLM default endpoint: `http://localhost:1234/v1` (LM Studio)
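Deriving a machine-bound key with PBKDF2 can be sketched as follows; the salt, iteration count, and choice of machine identifier are illustrative assumptions, not the project's actual parameters:

```python
# Sketch of machine-derived PBKDF2 key derivation; salt and iteration
# count are illustrative, not the project's real values.
import hashlib
import platform

def derive_key(salt: bytes, iterations: int = 600_000) -> bytes:
    # Binding the key to a machine identifier means the config file is
    # not trivially decryptable if copied to another host.
    machine_secret = platform.node().encode()
    return hashlib.pbkdf2_hmac("sha256", machine_secret, salt, iterations)

key = derive_key(b"pcap-hunter-config")   # 32-byte key for symmetric encryption
```

Note that a hostname is guessable, so real implementations typically mix in additional per-machine entropy.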
| Setting | Default | Purpose |
|---|---|---|
| DGA entropy | 4.0 bits | Shannon entropy threshold for DGA detection |
| Fast flux | 10+ IPs | Minimum distinct IPs per domain |
| Flow asymmetry | 10:1 + ≥1 MB | Exfil candidate threshold |
| C2 common ports | 4444, 5555, 6666, 7777, 8888, 9999, 1337, 31337 | Port-anomaly match list |
| PyShark limit | 200,000 packets | Deep-parse cap |
The bundled Dockerfile ships tshark, zeek, libpcap, and all Python deps baked in — the simplest path to a fully-working environment.
```
docker compose up --build
# → http://localhost:8501
```

```
make verify   # format check + lint + full test suite
```

Required before every commit. CI runs the same three steps, so `make verify` passing locally means CI will pass too.
```
make test      # pytest with coverage
make test-pdf  # focused PDF + chart test suite
make lint      # ruff check
make format    # ruff format
make doctor    # dependency verification
make clean     # remove caches
```

```
python3 scripts/capture_screenshots.py                # captures all 8 tabs, auto-redacts IPs
python3 scripts/capture_screenshots.py --redact-only  # re-run OCR redaction on existing PNGs
python3 scripts/capture_screenshots.py --keep-ips     # keep IPs (internal use only)
```

The capture script uses Playwright headless Chromium, drives the UI, and runs a two-pass IP redaction: DOM-aware bounding-box extraction first, then multi-PSM tesseract OCR for any IPs rendered into canvas (Streamlit's `st.dataframe`). After redaction it auto-crops trailing whitespace and any post-content duplicate render so README screenshots stay tight to actual content.
PCAP Hunter uses production-shape test data, not simplified inputs. See `tests/test_pdf_integration.py` for the canonical pattern — real `CorrelationSignal` dataclasses, real pandas DataFrames, and the nested dict shapes the pipeline actually produces. When adding a new PDF section or chart, extend the corresponding integration test.
- User Manual (English) — end-user guide
- 中文說明 (Traditional Chinese) — 繁體中文版
- CLAUDE.md — contributor/AI guide: conventions, testing discipline, known bug patterns
- docs/roadmap.md — planned work
MIT License — see file for details.