Skip to content

nylar357/XScraper

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

62 Commits
 
 
 
 
 
 
 
 

Repository files navigation

XScraper - Recon Web Scraper

New Release! XScrap - Reconnoitre with your web scraper!

Video Below

Watch the video

Xscrap allows for proxy usage, user agent customizing, web service enumeration, scraps robots.txt, site related emails, and sub-domain enumeration.

  1. Install Deps:

    Virtual Env
    python3 -m venv venv && pipinstall bs4 dnspython lxml PySocks requests
    
    Pip
    pip install bs4 dnspython lxml PySocks requests
  2. Usage::

    python3 xscrap.py url --proxy proxyurl

License: MIT

Vuln Scraper - Cyber Security Web Scraper - Get up to date CVE NVD & News

A Python-based web scraper designed to gather publicly available information from specified web sources for cybersecurity intelligence, threat monitoring, and open-source intelligence (OSINT) gathering.

Disclaimer: This tool is intended for educational and ethical purposes only. Ensure you have explicit permission before scraping any website, respect robots.txt, and comply with all applicable laws and terms of service. The developers assume no liability and are not responsible for any misuse or damage caused by this tool.


Table of Contents


Description

This project provides a configurable web scraping tool focused on extracting data relevant to cybersecurity professionals. It can be adapted to monitor various online sources for indicators of compromise (IoCs), mentions of vulnerabilities, potential data leaks (on public sites like pastebins), or other security-related keywords and patterns.

preview

Purpose

In the realm of cybersecurity, staying informed about emerging threats, vulnerabilities, and potential exposures is crucial. Manually monitoring countless websites, forums, and paste sites is inefficient. This scraper aims to automate the collection of publicly accessible data from pre-defined sources to aid in:

  • Threat Intelligence: Identifying discussions or posts related to new threats, malware, or attack vectors.
  • Vulnerability Monitoring: Tracking mentions of specific CVEs or software weaknesses.
  • OSINT Gathering: Collecting public information related to specific domains, IPs, or organizations.
  • Brand Protection: Monitoring for mentions that might indicate phishing campaigns or reputational risks.
  • Data Leak Detection: Searching public paste sites or forums for potential exposure of sensitive keywords (e.g., company-specific terms, email formats - use responsibly!).

Features

  • Scrapes content from specified URLs or lists of URLs.
  • Searches for user-defined keywords, regular expressions, or patterns (e.g., CVE IDs, email formats, specific terms).
  • Extracts relevant text snippets or data points associated with matches.
  • Configurable scraping depth and scope (within ethical limits).
  • Outputs findings to structured formats (e.g., CSV, JSON, console).
  • Basic mechanisms to handle common scraping challenges (e.g., user-agent rotation, configurable delays - respect target sites!).
  • Modular design for easier extension to new data sources or parsing logic.

Technology Stack

  • Language: Python 3.x
  • Core Libraries:
    • requests (for fetching web pages)
    • BeautifulSoup4 or lxml (for parsing HTML)
    • (Add others specific to your project, e.g., Scrapy, Selenium if used, regex, argparse, configparser/pyyaml)
  • Data Handling: csv, json

Installation

  1. Clone the repository:

    git clone https://github.com/nylar357/VulnScraper.git                                                                                    
    cd VulnScraper                                                                                        
  2. Create a virtual environment (recommended):

    python -m venv venv                                                                                           
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`                                                                                                                                                               
  3. Install dependencies:

    pip install requests beautifulsoup4 lxml                                                                               

Usage

(Provide clear examples of how to run the script.)

Example 1: Using the built in configuration:

python3 vulnscraper.py                                                                            

Example 2: Specifying targets and keywords via command line (not yet supported):

python scraper.py --urls "https://site1.com,https://site2.org" --keywords "keyword1,keyword2" --output results.json                                                                                                                  

Ethical Considerations

Using this tool responsibly is paramount.

Using this tool responsibly is paramount.

  1. Legality & Permissions: Only scrape websites where you have explicit permission or where the robots.txt file permits scraping the intended sections. Always comply with the website's Terms of Service. Scraping private forums or restricted areas is illegal and unethical.
  2. Server Load: Implement significant delays between requests (time.sleep()). Do not overload the target servers. Set a descriptive and truthful User-Agent string that allows website administrators to identify your bot, potentially including contact information.
  3. Data Privacy: Be extremely cautious when searching for or handling potentially sensitive information (PII, credentials). Do not collect, store, or distribute private data found inadvertently. Focus on publicly acknowledged threats and vulnerabilities.
  4. Purpose: Use the gathered information ethically for defensive cybersecurity purposes only. Do not use it to facilitate unauthorized access, harassment, or any illegal activities.
  5. Transparency: If using this in an organizational context, ensure its use aligns with company policies and ethical guidelines.

Misuse of this tool can lead to legal consequences and blocking of your IP address.

Contributing

Contributions are welcome! If you'd like to improve the scraper, please follow these steps:

  1. Fork the repository.
  2. Create a new branch (git checkout -b feature/YourFeatureName).
  3. Make your changes.
  4. Commit your changes (git commit -m 'Add some feature').
  5. Push to the branch (git push origin feature/YourFeatureName).
  6. Open a Pull Request.

Please ensure your code adheres to basic coding standards and includes comments where necessary. Add a clear description of your changes in the Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.


Generated by [BryceG]

preview

About

CVE NVD Hacker News Scraper

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages