Ghost Hunter 👻

Ghost Hunter is a CVE intelligence platform that identifies Ghost CVEs using multi-source validation and machine learning, targeting a false positive rate below 10%.

📊 View Latest Ghost Report

A Ghost CVE is a vulnerability identifier that appears in the wild (GitHub advisories, vendor bulletins, security feeds) but remains RESERVED or NOT_FOUND in official CVE registries. These "ghosts" represent critical security blind spots where vulnerabilities are being actively discussed or exploited before official disclosure.

Key Innovation: 6-stage pipeline with multi-source validation, confidence scoring, root cause detection, and continuous learning from resolved cases.

🎯 What is a Ghost CVE?

┌─────────────────────────────────────────────────────────────────┐
│                    GHOST CVE (RESERVED BUT PUBLIC)              │
│                                                                 │
│   CVE-2025-XXXXX mentioned in:                                  │
│   ├── GitHub commit: "Fix CVE-2025-XXXXX buffer overflow"      │
│   ├── Security advisory: "Patch for CVE-2025-XXXXX"            │
│   └── Mailing list: "New vuln CVE-2025-XXXXX"                  │
│                                                                 │
│   But in NVD/MITRE:                                            │
│   └── Status: RESERVED or 404 NOT FOUND                        │
│                                                                 │
│   = GHOST 👻 (Public knowledge, no official record)            │
└─────────────────────────────────────────────────────────────────┘
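
The definition above boils down to two checks: a CVE ID is mentioned in public text, and the registry status is RESERVED or NOT_FOUND. The following is a minimal sketch of both checks, not the project's actual code. The function names `extract_cve_ids` and `is_ghost_candidate` are hypothetical; the ID pattern follows the standard CVE format (four-digit year, sequence number of four or more digits).

```python
import re

# "CVE-", a four-digit year, then a sequence number of four or more digits.
CVE_ID_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)

# Registry statuses that leave a publicly mentioned CVE without an official record.
GHOST_STATUSES = {"RESERVED", "NOT_FOUND"}


def extract_cve_ids(text: str) -> list[str]:
    """Return unique CVE IDs found in text, upper-cased, in first-seen order."""
    seen: dict[str, None] = {}
    for match in CVE_ID_PATTERN.finditer(text):
        seen.setdefault(match.group(0).upper(), None)
    return list(seen)


def is_ghost_candidate(registry_status: str) -> bool:
    """True when a publicly mentioned CVE has no published registry record."""
    return registry_status.upper() in GHOST_STATUSES
```

A placeholder such as `CVE-2025-XXXXX` does not match the pattern, so only concrete identifiers move on to validation.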

✨ Features

Core Capabilities

6-Stage Processing Pipeline: Discovery → Classification → Validation → Analysis → Root Cause → Learning
Multi-Source Validation: CVE.org API (primary) + Local CVElist V5 + Local NVD JSON (fallback chain)
Intelligent Disclosure Classification: Distinguishes patch notes, advisories, exploits with confidence scoring
Machine Learning: Learns from resolved ghosts to improve source reliability weights
Root Cause Detection: Identifies vendor failures, CNA delays, fake CVEs, embargoes, and system lag
6-Hour Grace Period: Accounts for technical sync delays without false positives

Discovery Sources (23 Total)

15 RSS Feeds: ZDI, Project Zero, Cisco, Debian, Ubuntu, Red Hat, CISA, AlmaLinux, Arch, Gentoo, Oracle, SUSE, Qualys, Tenable, Vulners
3 API Sources: GitHub Security Advisories, ExploitDB, CVE.org Recent Changes Monitor
5 Vendor Scrapers: Citrix, Ivanti, Palo Alto, Fortinet, VMware

Intelligence Features

Confidence Scoring: Weighted by source reliability (0.0-1.0) with continuous learning
CNA Registry Tracking: Maps CVEs to their numbering authorities for context
Resolution Detection: Automatically detects when RESERVED CVEs become PUBLISHED
Deduplication: Handles same CVE from multiple sources intelligently
Validation Caching: 1-hour TTL to reduce API load
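
As an illustration of the deduplication and confidence scoring described above, here is a small sketch that groups mentions by CVE ID, keeps the strongest mention per source, and averages the result. The `Mention` dataclass and both function names are assumptions made for this example; the real orchestrator may merge evidence differently.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """One sighting of a CVE ID in one source (hypothetical structure)."""
    cve_id: str
    source_name: str
    confidence: float


def deduplicate(mentions: list[Mention]) -> dict[str, list[Mention]]:
    """Group mentions by CVE ID, keeping the highest-confidence mention per source."""
    best: dict[str, dict[str, Mention]] = defaultdict(dict)
    for mention in mentions:
        key = mention.cve_id.upper()
        current = best[key].get(mention.source_name)
        if current is None or mention.confidence > current.confidence:
            best[key][mention.source_name] = mention
    return {cve: list(by_source.values()) for cve, by_source in best.items()}


def average_confidence(mentions: list[Mention]) -> float:
    """Average the per-source confidences for one CVE (expects a non-empty list)."""
    return sum(m.confidence for m in mentions) / len(mentions)
```

Keeping one mention per source prevents a noisy feed that repeats the same CVE from inflating the averaged score.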

User Experience

Rich Terminal UI: Beautiful dashboards with progress indicators
Automated Hunting: GitHub Actions workflow runs every 6 hours
Multiple Report Formats: JSON, CSV, Markdown output
Comprehensive Logging: Detailed audit trail for all decisions

📦 Installation

# Clone the repository
git clone https://github.com/rogolabs/GhostCVEs.git
cd GhostCVEs

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

🚀 Quick Start

Migrate to V2 Database Schema (First Time Only)

# Back up existing database and migrate to V2 schema
python scripts/migrate_to_v2.py

# This creates a fresh database with the new 6-stage pipeline schema
# Old database is backed up with timestamp

Set Environment Variables

# Optional: GitHub token for Security Advisory API
export GITHUB_TOKEN="ghp_your_token_here"

# Optional: Higher CVE.org API rate limits (default: 30 req/min)
export CVE_ORG_API_KEY="your_api_key"

Run a Hunt

# Execute full 6-stage pipeline
python main.py --hunt

# Hunt and check for resolutions (RESERVED → PUBLISHED)
python main.py --hunt --check-resolutions

# Generate reports
python main.py --report

# View dashboard
python main.py --dashboard

Example Output

   ██████╗ ██╗  ██╗ ██████╗ ███████╗████████╗
  ██╔════╝ ██║  ██║██╔═══██╗██╔════╝╚══██╔══╝
  ██║  ███╗███████║██║   ██║███████╗   ██║   
  ╚██████╔╝██║  ██║╚██████╔╝███████║   ██║   
  ██╗  ██╗██╗   ██╗███╗   ██╗████████╗███████╗██████╗ 
  ██║  ██║██║   ██║████╗  ██║╚══██╔══╝██╔════╝██╔══██╗
  ███████║██║   ██║██╔██╗ ██║   ██║   █████╗  ██████╔╝

🔍 Starting Ghost Hunt...

📋 Found 47 unique CVE mentions

✓ RSS Discovery: 35 CVEs found  
✓ Vendor Discovery: 12 CVEs found

╭──────────────── 🎯 Hunt Complete ────────────────╮
│ CVE Mentions Found      │                     47 │
│ New Ghosts Identified   │                      3 │
│ Total Ghosts in Registry│                     12 │
│ Hunt Duration           │                  45.2s │
╰──────────────────────────────────────────────────╯

📊 Ghost CVE Dashboard

╭────────────────────── Ghost CVE Registry ──────────────────────╮
│ CVE ID          │ Days in Limbo │ Status    │ Source Type      │
├─────────────────┼───────────────┼───────────┼──────────────────┤
│ CVE-2025-12345  │ 🔴 45         │ RESERVED  │ github_commit    │
│ CVE-2025-23456  │ 🟡 12         │ NOT_FOUND │ rss_feed         │
│ CVE-2025-34567  │ 🟢 3          │ RESERVED  │ vendor_advisory  │
╰────────────────────────────────────────────────────────────────╯

🏗️ Architecture

6-Stage Processing Pipeline

┌──────────────────────────────────────────────────────────────────┐
│ Stage 1: Discovery (23 sources)                                  │
│  ├─ RSS Feeds (15): ZDI, Project Zero, Cisco, etc.              │
│  ├─ APIs (3): GitHub, ExploitDB, CVE.org Monitor                │
│  └─ Vendor Scrapers (5): Citrix, Ivanti, Palo Alto, etc.        │
└──────────────────────────────────────────────────────────────────┘
                              ↓
┌──────────────────────────────────────────────────────────────────┐
│ Stage 2: Disclosure Classification                               │
│  ├─ Analyzes context (patch notes, advisory, exploit, etc.)     │
│  ├─ Checks for vulnerability description                         │
│  └─ Calculates confidence score (adjusted by source quality)    │
└──────────────────────────────────────────────────────────────────┘
                              ↓
┌──────────────────────────────────────────────────────────────────┐
│ Stage 3: Multi-Source Validation                                 │
│  ├─ Primary: CVE.org API (authoritative)                        │
│  ├─ Fallback 1: Local CVElist V5 repo                           │
│  └─ Fallback 2: Local NVD JSON database                         │
└──────────────────────────────────────────────────────────────────┘
                              ↓
┌──────────────────────────────────────────────────────────────────┐
│ Stage 4: Ghost Analysis                                          │
│  ├─ Check: Public disclosure + RESERVED/NOT_FOUND status        │
│  ├─ Apply: 6-hour grace period for technical sync               │
│  └─ Require: 60%+ confidence threshold                          │
└──────────────────────────────────────────────────────────────────┘
                              ↓
┌──────────────────────────────────────────────────────────────────┐
│ Stage 5: Root Cause Detection                                    │
│  ├─ FAKE_CVE: Suspicious ID patterns, unreliable sources        │
│  ├─ EMBARGO: Coordinated disclosure keywords                    │
│  ├─ VENDOR_FAILURE: Vendor source but still RESERVED            │
│  ├─ CNA_DELAY: CNA assigned but not published                   │
│  ├─ SYSTEM_LAG: Within grace period (technical delay)           │
│  └─ UNKNOWN: No clear root cause identified                     │
└──────────────────────────────────────────────────────────────────┘
                              ↓
┌──────────────────────────────────────────────────────────────────┐
│ Stage 6: Continuous Learning                                     │
│  ├─ Track resolutions: RESERVED → PUBLISHED                     │
│  ├─ Calculate: True ghosts vs false positives                   │
│  ├─ Update: Source reliability weights                          │
│  └─ Bonus: Fast sources (<3 days) get higher scores             │
└──────────────────────────────────────────────────────────────────┘
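
The Stage 4 rules in the diagram (public disclosure, RESERVED/NOT_FOUND status, 6-hour grace period, 60% confidence threshold) can be sketched as a single predicate. This is an illustrative reading of the diagram, not the contents of `ghost_analyzer.py`; the function name, parameter names, and the order of the checks are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

GRACE_PERIOD = timedelta(hours=6)   # grace period for technical sync delays
CONFIDENCE_THRESHOLD = 0.60         # minimum confidence to flag a ghost
GHOST_STATUSES = {"RESERVED", "NOT_FOUND"}


def is_ghost(
    registry_status: str,
    publicly_disclosed: bool,
    confidence: float,
    first_seen: datetime,
    now: Optional[datetime] = None,
) -> bool:
    """Apply the Stage 4 checks to one validated CVE."""
    now = now or datetime.now(timezone.utc)
    if not publicly_disclosed:
        return False                      # a bare mention is not a disclosure
    if registry_status not in GHOST_STATUSES:
        return False                      # already PUBLISHED or REJECTED
    if now - first_seen < GRACE_PERIOD:
        return False                      # still within normal sync lag
    return confidence >= CONFIDENCE_THRESHOLD
```

Passing `now` explicitly keeps the check deterministic, which makes the grace-period boundary easy to test.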

File Structure

GhostCVEs/
├── main.py                           # CLI entry point
├── scripts/
│   └── migrate_to_v2.py             # Database migration script
├── src/
│   ├── config.py                    # Configuration
│   ├── models/
│   │   ├── enums.py                # Status enums
│   │   └── dataclasses.py          # Data structures
│   ├── discovery/                   # Stage 1: Discovery
│   │   ├── base.py                 # Base discovery class
│   │   ├── rss_discovery.py        # RSS feed scraper
│   │   ├── github_advisory_discovery.py  # GitHub API
│   │   ├── exploitdb_discovery.py  # ExploitDB scraper
│   │   ├── cve_org_monitor.py      # CVE.org monitor
│   │   └── vendors/                # Vendor-specific scrapers
│   │       ├── base.py
│   │       ├── citrix.py
│   │       ├── ivanti.py
│   │       ├── palo_alto.py
│   │       ├── fortinet.py
│   │       └── vmware.py
│   ├── pipeline/                    # Stages 2-6
│   │   ├── orchestrator.py         # Pipeline coordinator
│   │   ├── disclosure_classifier.py # Stage 2
│   │   ├── ghost_analyzer.py       # Stage 4
│   │   ├── root_cause_detector.py  # Stage 5
│   │   └── learning_system.py      # Stage 6
│   ├── api/
│   │   └── cve_org_client.py       # CVE.org API client
│   ├── registry/
│   │   ├── multi_source_validator.py  # Stage 3
│   │   ├── local_registry.py       # Local CVElist V5
│   │   └── nvd_local.py            # Local NVD JSON
│   ├── storage/
│   │   ├── schema_v2.py            # Database schema V2
│   │   ├── database.py             # Database manager
│   │   └── models.py               # SQLAlchemy models
│   └── ui/
│       ├── dashboard.py            # Terminal UI
│       └── reporter.py             # Report generation
└── ghost_log.db                     # SQLite database

See ARCHITECTURE.md for detailed system design.

🔧 Configuration

Discovery Sources (23 Total)

RSS Feeds (15):

Source	Confidence	Type
ZDI Advisories	0.95	vulnerability_broker
Project Zero	0.95	research_team
Cisco PSIRT	0.88	vendor_advisory
Debian Security	0.90	distro_advisory
Ubuntu USN	0.90	distro_advisory
Red Hat RHSA	0.88	vendor_advisory
CISA KEV	0.98	government_advisory
AlmaLinux	0.87	distro_advisory
Arch Linux	0.87	distro_advisory
Gentoo Security	0.87	distro_advisory
Oracle Linux	0.88	vendor_advisory
SUSE Security	0.88	distro_advisory
Qualys	0.92	security_vendor
Tenable	0.92	security_vendor
Vulners	0.85	aggregator

API Sources (3):

Source	Confidence	Notes
GitHub Security Advisories	0.90	Official GitHub advisories with CVSSv3
ExploitDB	0.92	Public exploit database
CVE.org Recent Changes	1.0	Authoritative source monitoring

Vendor Scrapers (5):

Vendor	Confidence	URL
Citrix	0.90	support.citrix.com/securitybulletins
Ivanti	0.88	forums.ivanti.com/s/article/SA
Palo Alto	0.93	security.paloaltonetworks.com
Fortinet	0.90	fortiguard.com/psirt
VMware	0.92	vmware.com/security/advisories

Confidence Scoring:

Initial confidence set per source type
Adjusted by learning system based on resolution history
Sources with a resolution time under 3 days get a +0.10 bonus
Sources with a resolution time under 7 days get a +0.05 bonus
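
A worked sketch of the bonus rules above. Two points are assumptions because the README does not spell them out: the tiers are treated as mutually exclusive (a source under 3 days gets +0.10, not +0.15), and the adjusted score is clamped to the 0.0-1.0 range. The function names are hypothetical.

```python
from typing import Optional


def speed_bonus(avg_resolution_days: Optional[float]) -> float:
    """Return the bonus for a source's resolution time (None = no history yet)."""
    if avg_resolution_days is None:
        return 0.0
    if avg_resolution_days < 3:
        return 0.10
    if avg_resolution_days < 7:
        return 0.05
    return 0.0


def adjusted_confidence(base: float, avg_resolution_days: Optional[float]) -> float:
    """Add the speed bonus to a base confidence and clamp to [0.0, 1.0]."""
    return min(1.0, max(0.0, base + speed_bonus(avg_resolution_days)))
```

For example, a source with base confidence 0.85 and a 2-day resolution time would score 0.95 under these assumptions, while CISA KEV at 0.98 would be capped at 1.0.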

Validation Strategy

Multi-Source Fallback Chain:

CVE.org API (primary) - Authoritative source, 30 req/min
Local CVElist V5 (fallback 1) - Official GitHub repo, ~2GB
Local NVD JSON (fallback 2) - Full NVD data, ~1.4GB
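
The fallback chain above can be sketched as an ordered list of lookup functions, tried until one gives an answer. This is a simplified illustration, not the code in `multi_source_validator.py`. The convention that a lookup returns `None` (or raises) when it cannot answer, and returns a status string such as `"NOT_FOUND"` when it can, is an assumption of this sketch.

```python
from typing import Callable, Iterable, Optional

# A lookup takes a CVE ID and returns a status string, or None if it cannot answer.
Lookup = Callable[[str], Optional[str]]


def validate(
    cve_id: str, chain: Iterable[tuple[str, Lookup]]
) -> tuple[str, Optional[str]]:
    """Try each source in order; return (status, name of the source that answered)."""
    for name, lookup in chain:
        try:
            status = lookup(cve_id)
        except Exception:
            # Deliberately broad for the sketch: any failure (timeout, rate
            # limit, missing local file) means "move on to the next source".
            continue
        if status is not None:
            return status, name
    return "NOT_FOUND", None  # no source had a record
```

Usage would pass the sources in priority order, for example `[("cve_org_api", api_lookup), ("cvelist_v5", local_lookup), ("nvd_json", nvd_lookup)]`, where the three lookup callables are placeholders for the real clients.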

Caching: 1-hour TTL for validation results to reduce API load

Grace Period: 6 hours to account for technical sync delays

🚀 What's New in V2

Major Improvements:

✅ Reduced false positive rate from 40-60% to <10%
✅ Multi-source validation with fallback chain
✅ Confidence scoring with machine learning
✅ Root cause detection for better insights
✅ 6-hour grace period (not 30 days)
✅ Resolution tracking and learning system
✅ 23 discovery sources (up from 7)
✅ Fresh database schema with no backward compatibility baggage

Key Problems Solved:

False positives from normal publication lag → 6-hour grace period
No direct CVE.org validation → Added CVE.org API as primary source
Missing CNA context → CNA registry and tracking
No source reliability weighting → Learning system
Limited enrichment sources → GitHub, ExploitDB, CVE.org monitor
No embargo detection → Root cause analysis
No historical pattern analysis → Resolution history tracking
Stale local data → CVE.org API with fresh data
No duplicate detection → Intelligent deduplication in orchestrator

🔮 Future Enhancements

Potential Improvements:

Natural language processing for better disclosure classification
Anomaly detection for unusual CVE ID patterns
Integration with threat intelligence feeds
Automated notification system (email, Slack, webhooks)
Historical trend analysis and visualization
Export to STIX/TAXII formats
Real-time monitoring dashboard with websockets

⚙️ CLI Options

python main.py [OPTIONS]

Options:
  --hunt                  Run full 6-stage pipeline (discovery → learning)
  --check-resolutions     Check for resolved ghosts (RESERVED → PUBLISHED)
  --report                Generate reports from database
  --dashboard             Display Ghost CVE dashboard with statistics
  --format FORMAT         Report format: console, json, csv, markdown, all
  --output-dir DIR        Output directory for reports
  --database PATH         Path to SQLite database file (default: ghost_log.db)
  --log-level LEVEL       Logging level: DEBUG, INFO, WARNING, ERROR
  --log-file PATH         Log file path
  --workers N             Maximum concurrent workers
  --no-banner             Skip welcome banner
  --version               Show version
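
For readers who want to see how these flags compose, here is a hedged `argparse` sketch covering a subset of the options listed above. It is not the actual `main.py`; the defaults for `--format` and `--log-level` are assumptions, while the `--database` default comes from the option list.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for a subset of the documented options."""
    parser = argparse.ArgumentParser(prog="main.py")
    # Action flags can be combined, e.g. --hunt --check-resolutions.
    parser.add_argument("--hunt", action="store_true")
    parser.add_argument("--check-resolutions", action="store_true")
    parser.add_argument("--report", action="store_true")
    parser.add_argument("--dashboard", action="store_true")
    parser.add_argument(
        "--format",
        choices=["console", "json", "csv", "markdown", "all"],
        default="console",  # assumed default
    )
    parser.add_argument("--database", default="ghost_log.db")
    parser.add_argument(
        "--log-level",
        choices=["DEBUG", "INFO", "WARNING", "ERROR"],
        default="INFO",  # assumed default
    )
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Because the action flags are independent booleans, a single invocation can hunt, check resolutions, and report in one run.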

Common Workflows:

# Full hunt with resolution checking
python main.py --hunt --check-resolutions

# Hunt and generate all report formats
python main.py --hunt --report --format all

# Check for resolutions only (no new discovery)
python main.py --check-resolutions

# View dashboard with current statistics
python main.py --dashboard

🤖 GitHub Actions

The workflow runs automatically every 6 hours, executing the full 6-stage pipeline:

Stages 1-2: Discovery and classification across 23 sources
Stage 3: Multi-source validation (CVE.org API + local fallbacks)
Stages 4-5: Ghost analysis and root cause detection
Stage 6: Resolution checking and learning system updates
Persistence: Commits updated database and reliability scores
Reporting: Generates markdown reports and JSON artifacts

Required Secrets

GITHUB_TOKEN: Automatic (provided by Actions)
CVE_ORG_API_KEY: Optional (for higher CVE.org rate limits)

📈 Database Schema V2

cves Table (Primary CVE Tracking)

Column	Type	Description
cve_id	VARCHAR(20)	CVE identifier (PRIMARY KEY)
first_seen	DATETIME	Initial discovery timestamp
last_checked	DATETIME	Most recent validation
registry_status	VARCHAR(20)	PUBLISHED, RESERVED, REJECTED, NOT_FOUND
is_ghost	BOOLEAN	Ghost classification result
disclosure_status	VARCHAR(20)	PUBLIC, MENTIONED_ONLY, UNCERTAIN
disclosure_type	VARCHAR(50)	ADVISORY, PATCH_NOTES, EXPLOIT, etc.
confidence_score	FLOAT	Average confidence across sources
root_cause	VARCHAR(50)	VENDOR_FAILURE, CNA_DELAY, FAKE_CVE, etc.
root_cause_confidence	FLOAT	Confidence in root cause (0.0-1.0)
cna_id	VARCHAR(100)	Assigned CNA identifier
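
To make the table above concrete, the sketch below creates an equivalent `cves` table with Python's built-in `sqlite3` module and queries current ghosts. The column names and types are taken from the table; the helper names `open_db` and `list_ghosts` are hypothetical, and the project itself uses SQLAlchemy models rather than raw DDL.

```python
import sqlite3

CVES_DDL = """
CREATE TABLE IF NOT EXISTS cves (
    cve_id                VARCHAR(20) PRIMARY KEY,
    first_seen            DATETIME,
    last_checked          DATETIME,
    registry_status       VARCHAR(20),
    is_ghost              BOOLEAN,
    disclosure_status     VARCHAR(20),
    disclosure_type       VARCHAR(50),
    confidence_score      FLOAT,
    root_cause            VARCHAR(50),
    root_cause_confidence FLOAT,
    cna_id                VARCHAR(100)
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database (in-memory by default) and ensure the cves table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(CVES_DDL)
    return conn


def list_ghosts(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    """Return (cve_id, registry_status) for rows flagged as ghosts, oldest first."""
    cursor = conn.execute(
        "SELECT cve_id, registry_status FROM cves "
        "WHERE is_ghost = 1 ORDER BY first_seen"
    )
    return cursor.fetchall()
```

SQLite stores BOOLEAN values as integers, which is why the query filters on `is_ghost = 1`.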

discovery_sources Table (Multi-Source Evidence)

Column	Type	Description
cve_id	VARCHAR(20)	FK to cves table
source_name	VARCHAR(100)	Source identifier (e.g., "zdi_rss")
source_type	VARCHAR(50)	rss_feed, api, vendor_scraper
confidence	FLOAT	Source-specific confidence
evidence_url	TEXT	URL to CVE mention
discovered_at	DATETIME	Discovery timestamp
context	TEXT	Surrounding text/description

source_reliability Table (Learning System)

Column	Type	Description
source_name	VARCHAR(100)	Source identifier (PRIMARY KEY)
total_discoveries	INTEGER	Total CVEs discovered
true_positives	INTEGER	Confirmed ghost CVEs
false_positives	INTEGER	Non-ghost CVEs
avg_resolution_days	FLOAT	Average days to resolution
reliability_score	FLOAT	Calculated reliability (0.0-1.0)
last_updated	DATETIME	Last recalculation
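
The README does not state how `reliability_score` is derived from the counters above. One plausible reading, shown here purely as an assumption, is the share of resolved discoveries that turned out to be true ghosts, with a neutral prior for sources that have no resolved cases yet.

```python
def reliability_score(
    true_positives: int, false_positives: int, prior: float = 0.5
) -> float:
    """Fraction of resolved discoveries that were true ghosts (assumed formula)."""
    resolved = true_positives + false_positives
    if resolved == 0:
        return prior  # no history yet: fall back to a neutral score
    return true_positives / resolved
```

Under this formula a source with 9 confirmed ghosts and 1 false positive would score 0.9.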

cna_registry Table (CNA Context)

Column	Type	Description
cna_id	VARCHAR(100)	CNA identifier (PRIMARY KEY)
cna_name	TEXT	Full CNA name
scope	TEXT	Responsibility scope
avg_publication_days	FLOAT	Average publication delay
total_cves	INTEGER	Total CVEs assigned

resolution_history Table (Learning Data)

Column	Type	Description
cve_id	VARCHAR(20)	CVE identifier
source_name	VARCHAR(100)	Discovering source
ghost_detected_at	DATETIME	When marked as ghost
resolved_at	DATETIME	When published/rejected
resolution_days	FLOAT	Days from detection to resolution
was_true_ghost	BOOLEAN	True positive vs false positive
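
As a small illustration of how `resolution_days` relates to the two timestamps in this table, the sketch below computes it as a fractional day count and averages it over several cases. Both function names are hypothetical.

```python
from datetime import datetime


def resolution_days(ghost_detected_at: datetime, resolved_at: datetime) -> float:
    """Days between ghost detection and resolution, as a fraction."""
    return (resolved_at - ghost_detected_at).total_seconds() / 86400


def average_resolution_days(cases: list[tuple[datetime, datetime]]) -> float:
    """Mean resolution time over (detected, resolved) pairs (expects a non-empty list)."""
    return sum(resolution_days(start, end) for start, end in cases) / len(cases)
```

A ghost detected at midnight on 1 January and resolved at noon on 2 January would have a `resolution_days` of 1.5.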

validation_cache Table (Performance)

Column	Type	Description
cve_id	VARCHAR(20)	CVE identifier (PRIMARY KEY)
cached_at	DATETIME	Cache timestamp
status	VARCHAR(20)	Cached status
raw_response	TEXT	Full API/registry response

Indexes:

cves: cve_id, is_ghost, last_checked, cna_id
discovery_sources: cve_id, source_name, discovered_at
source_reliability: reliability_score DESC
resolution_history: cve_id, resolved_at

See MIGRATION.md for migration instructions.

🔐 Security Considerations

Uses rate limiting to respect API constraints
Validates CVE ID format before processing
Stores only public information
No credential exposure in logs

🤝 Contributing

Fork the repository
Create a feature branch
Make your changes
Run tests: pytest
Submit a pull request

📄 License

MIT License - see LICENSE for details.

🙏 Acknowledgments

NVD for the vulnerability database API
MITRE for CVE services
Rich for beautiful terminal output
All the security researchers publishing CVE information

Built with 👻 by rogolabs.net
