AmirDanesh/ipv6-intelligent-crawler

IPv6 Intelligent Crawler

An intelligent system for discovering IPv6 web servers using Machine Learning and Metaheuristic algorithms.

🎯 Project Goal

The IPv6 address space holds 2^128 addresses, so exhaustive scanning of the kind used for IPv4 is infeasible. This project uses machine learning and optimization algorithms to learn address allocation patterns and predict which addresses are likely to be active.
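To make the scale concrete, here is a back-of-the-envelope estimate. The probe rate of one million addresses per second is an assumed figure chosen for illustration, not a measurement from this project.

```python
# Rough estimate of brute-force scan time.
# PROBES_PER_SECOND is a hypothetical throughput, not a project benchmark.
TOTAL_ADDRESSES = 2 ** 128
SUBNET_64_ADDRESSES = 2 ** 64      # interface IDs inside a single /64 subnet
PROBES_PER_SECOND = 1_000_000
SECONDS_PER_YEAR = 365 * 24 * 3600


def years_to_scan(address_count: int, rate: int = PROBES_PER_SECOND) -> float:
    """Return the years needed to probe address_count addresses at rate probes/s."""
    return address_count / rate / SECONDS_PER_YEAR


print(f"one /64 subnet: {years_to_scan(SUBNET_64_ADDRESSES):,.0f} years")
print(f"whole space:    {years_to_scan(TOTAL_ADDRESSES):.2e} years")
```

Even a single /64 subnet takes hundreds of thousands of years at this rate, which is why candidate addresses have to be predicted rather than enumerated.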

πŸ“ Project Structure

ipv6-crawler/
├── config.yaml                    # Configuration
├── requirements.txt               # Dependencies
├── main.py                        # Main entry point
├── src/
│   ├── __init__.py
│   ├── seed_collector.py          # Initial address collection
│   ├── feature_extractor.py       # Feature extraction
│   ├── ml_model.py                # Machine learning model
│   ├── address_generator.py       # Address generation (classic)
│   ├── metaheuristic_generator.py # Metaheuristic algorithms
│   ├── prober.py                  # Network scanner
│   ├── fingerprinter.py           # Infrastructure identification
│   ├── feedback_loop.py           # Feedback and model improvement
│   └── database.py                # Data management
├── data/
│   ├── seeds/                     # Initial seed addresses
│   ├── models/                    # Saved models
│   └── results/                   # Results
└── logs/                          # Logs

🧬 Address Generation Algorithms

Classic Methods

  • Prefix-based: Generate addresses in known active prefixes
  • Mutation-based: Mutate active addresses (increment, decrement, nearby)
  • Pattern Learning: Learn from Interface ID patterns
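The first two classic methods can be sketched with the standard ipaddress module. The function names mutate_nearby and prefix_candidates are hypothetical and are not taken from address_generator.py; this is a minimal illustration of the idea, not the project's implementation.

```python
import ipaddress


def mutate_nearby(seed: str, radius: int = 2) -> list[str]:
    """Return the addresses within `radius` of the seed, excluding the seed."""
    base = int(ipaddress.IPv6Address(seed))
    max_value = 2 ** 128 - 1
    neighbours = []
    for offset in range(-radius, radius + 1):
        candidate = base + offset
        # Skip the seed itself and anything outside the 128-bit range.
        if offset != 0 and 0 <= candidate <= max_value:
            neighbours.append(str(ipaddress.IPv6Address(candidate)))
    return neighbours


def prefix_candidates(prefix: str, count: int = 3) -> list[str]:
    """Return the first `count` low-numbered addresses inside a known prefix."""
    network = ipaddress.IPv6Network(prefix)
    return [str(network[i]) for i in range(1, count + 1)]


# 2001:db8::/32 is the documentation prefix, used here as a stand-in seed.
print(mutate_nearby("2001:db8::10"))
print(prefix_candidates("2001:db8::/64"))
```

Low-numbered interface IDs such as ::1 and ::2 are a common manual assignment pattern, which is why a prefix-based generator would typically try them first.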

Metaheuristic Algorithms

| Algorithm | Description | Advantage |
|---|---|---|
| Genetic Algorithm (GA) | Crossover and mutation of addresses | Combinatorial search space exploration |
| Ant Colony Optimization (ACO) | Pheromone-based path finding | Learning from successful paths |
| Cuckoo Search (CS) | Lévy flight for large jumps | Exploration/exploitation balance |
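As an illustration of the GA row, the sketch below treats an address as 32 hexadecimal nibbles and applies single-point crossover and random mutation. The representation, the function names, and the choice to lock the first 16 nibbles (the /64 prefix) during mutation are all assumptions made for this example; metaheuristic_generator.py may work differently.

```python
import ipaddress
import random


def to_nibbles(address: str) -> list[int]:
    """Split an IPv6 address into 32 four-bit values, most significant first."""
    value = int(ipaddress.IPv6Address(address))
    return [(value >> (4 * (31 - i))) & 0xF for i in range(32)]


def from_nibbles(nibbles: list[int]) -> str:
    """Rebuild the compressed address string from 32 nibbles."""
    value = 0
    for nibble in nibbles:
        value = (value << 4) | nibble
    return str(ipaddress.IPv6Address(value))


def crossover(parent_a: str, parent_b: str, point: int) -> str:
    """Single-point crossover: head from parent_a, tail from parent_b."""
    a, b = to_nibbles(parent_a), to_nibbles(parent_b)
    return from_nibbles(a[:point] + b[point:])


def mutate(address: str, rng: random.Random, rate: float = 0.05,
           locked: int = 16) -> str:
    """Randomly replace nibbles after the first `locked` ones."""
    nibbles = to_nibbles(address)
    for i in range(locked, 32):
        if rng.random() < rate:
            nibbles[i] = rng.randrange(16)
    return from_nibbles(nibbles)


rng = random.Random(42)  # seeded so runs are repeatable
child = crossover("2001:db8:1::a", "2001:db8:2::b", point=12)
print(child, mutate(child, rng))
```

Cutting at nibble 12 keeps the first 48 bits of one parent and takes the remaining bits from the other, so the child combines one parent's routing prefix with the other's interface ID.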

Hybrid Strategy

Combines all of the methods above, dynamically allocating resources to each algorithm according to its success rate.
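One way such an allocation could work is sketched below. The smoothing formula, the floor_share parameter, and the name allocate_budget are assumptions for illustration; the README does not specify the actual rule used in feedback_loop.py.

```python
def allocate_budget(stats: dict[str, tuple[int, int]], budget: int,
                    floor_share: float = 0.05) -> dict[str, int]:
    """Split `budget` candidate addresses across generators.

    `stats` maps a generator name to (hits, probes). Every generator keeps a
    small guaranteed share so that a currently weak one is still explored.
    """
    if floor_share * len(stats) > 1:
        raise ValueError("floor_share is too large for this many generators")

    # Smoothed success rate, so a generator with no probes yet is not zeroed.
    rates = {name: (hits + 1) / (probes + 2)
             for name, (hits, probes) in stats.items()}
    total_rate = sum(rates.values())
    remaining = 1 - floor_share * len(rates)

    allocation = {
        name: int(budget * (floor_share + remaining * rate / total_rate))
        for name, rate in rates.items()
    }
    # Hand any rounding leftovers to the best-performing generator.
    best = max(rates, key=rates.get)
    allocation[best] += budget - sum(allocation.values())
    return allocation


# Hypothetical counts from an earlier probing round.
history = {"ga": (30, 100), "aco": (10, 100), "cuckoo": (5, 100)}
print(allocate_budget(history, budget=1000))
```

The guaranteed floor is a design choice: without it, an algorithm that had a poor first round would never receive enough budget to recover.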

🚀 Installation & Usage

Prerequisites

  • Python 3.10 or higher
  • pip (Python package manager)
  • Git

Quick Start

1. Clone the repository

git clone https://github.com/AmirDanesh/ipv6-intelligent-crawler.git
cd ipv6-intelligent-crawler

2. Create and activate virtual environment

Windows (PowerShell):

python -m venv venv
.\venv\Scripts\Activate.ps1

Windows (CMD):

python -m venv venv
venv\Scripts\activate.bat

Linux/macOS:

python3 -m venv venv
source venv/bin/activate

3. Install dependencies

pip install -r requirements.txt

4. Run the crawler

# Full crawling pipeline
python main.py

# With custom config
python main.py --config custom_config.yaml

# Quick test run
python main.py --quick-test

Command Line Options

| Option | Description |
|---|---|
| --config | Path to configuration file (default: config.yaml) |
| --quick-test | Run a quick test with minimal addresses |
| --collect-only | Only collect seed addresses |
| --probe-only | Only probe existing addresses |
| --help | Show all available options |

Verify Installation

python -c "from src.ml_model import IPv6ActivePredictor; print('✅ Installation successful!')"

📊 System Workflow

  1. Seed Collection: Gather initial IPv6 addresses from various sources
  2. Feature Extraction: Convert addresses to feature vectors
  3. Model Training: Learn addressing patterns (Ensemble: RF + XGBoost + GB)
  4. Address Generation: Generate candidate addresses using GA, ACO, and Cuckoo Search
  5. ML Filtering: Select best candidates using prediction model
  6. Probing: Verify address activity
  7. Fingerprinting: Identify server characteristics
  8. Feedback Loop: Update algorithm weights based on success rates
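Step 2 in the workflow above can be sketched as follows. The specific features (32 nibbles plus three interface ID indicators) and the name extract_features are assumptions for this example; the real feature set in feature_extractor.py is not documented here.

```python
import ipaddress


def extract_features(address: str) -> list[int]:
    """Turn an IPv6 address into a fixed-length numeric feature vector.

    Layout: 32 nibble values, then the count of zero nibbles in the
    interface ID, a low-interface-ID flag, and a modified EUI-64 flag.
    """
    value = int(ipaddress.IPv6Address(address))
    nibbles = [(value >> (4 * (31 - i))) & 0xF for i in range(32)]

    iid = value & ((1 << 64) - 1)                  # lower 64 bits
    zero_nibbles = sum(1 for n in nibbles[16:] if n == 0)
    is_low_iid = int(iid < 0x10000)                # e.g. ::1, ::80
    # Modified EUI-64 interface IDs carry ff:fe in the middle two bytes.
    is_eui64 = int(((iid >> 24) & 0xFFFF) == 0xFFFE)

    return nibbles + [zero_nibbles, is_low_iid, is_eui64]


print(extract_features("2001:db8::1")[-3:])
print(extract_features("2001:db8::211:22ff:fe33:4455")[-3:])
```

A fixed-length integer vector like this can be passed directly to tree-based models such as the Random Forest and gradient boosting ensemble named in step 3.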

🔧 Configuration

Edit config.yaml to customize:

  • Scanning parameters
  • ML model settings
  • Metaheuristic algorithm parameters
  • Probe timeouts and concurrency

📈 Features

  • Ensemble ML Model: Random Forest + XGBoost + Gradient Boosting
  • Adaptive Algorithm Selection: Automatically favors better-performing algorithms
  • Closed-loop Learning: Continuously improves from probe results
  • Efficient Probing: Concurrent scanning with rate limiting
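The concurrent probing idea can be sketched with asyncio. The names probe_all, tcp_probe, and fake_probe, along with the default limits, are hypothetical and are not taken from prober.py. The sketch caps the number of in-flight probes with a semaphore and paces how quickly new probes are launched, which approximates rate limiting rather than enforcing it strictly.

```python
import asyncio


async def tcp_probe(address: str, port: int = 80, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to address:port succeeds in time."""
    try:
        _, writer = await asyncio.wait_for(
            asyncio.open_connection(address, port), timeout)
    except (OSError, asyncio.TimeoutError):
        return False
    writer.close()
    await writer.wait_closed()
    return True


async def probe_all(addresses, probe, max_concurrent: int = 100,
                    max_per_second: float = 500.0) -> dict[str, bool]:
    """Run `probe` on every address with bounded concurrency and paced launches."""
    semaphore = asyncio.Semaphore(max_concurrent)
    interval = 1.0 / max_per_second

    async def guarded(address):
        async with semaphore:              # at most max_concurrent in flight
            return address, await probe(address)

    tasks = []
    for address in addresses:
        tasks.append(asyncio.create_task(guarded(address)))
        await asyncio.sleep(interval)      # space out task launches
    return dict(await asyncio.gather(*tasks))


async def fake_probe(address: str) -> bool:
    """Offline stand-in for tcp_probe so the example needs no network."""
    await asyncio.sleep(0.01)
    return address.endswith("1")


results = asyncio.run(probe_all(["2001:db8::1", "2001:db8::2"], fake_probe))
print(results)
```

Passing tcp_probe instead of fake_probe would attempt real connections; only do that against hosts you are authorized to scan.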

πŸ“ License

MIT License
