URL Checker

A Python utility that checks whether the URLs listed in a text file are online or broken. The script checks URLs concurrently for fast processing and provides detailed status reports.

Features

  • ✅ Concurrent URL checking for improved performance
  • 🔍 Validates URL format before checking
  • ⚡ Fast HEAD requests with GET fallback
  • 📊 Real-time progress tracking
  • 📝 Detailed summary and optional report file
  • ⚙️ Configurable timeout and worker threads
  • 🎯 Custom error messages for different exception types

Status Indicators

The script classifies URLs into the following categories:

  • ONLINE - URL is accessible (HTTP status < 400)
  • BROKEN - Connection failed or HTTP error (status ≥ 400)
  • TIMEOUT - Request exceeded timeout duration
  • INVALID - Malformed URL format
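
In code terms, the ONLINE/BROKEN split for a completed request is just a status-code threshold. A minimal illustration (the function name is illustrative, not part of the script):

def classify(status_code):
    # 2xx and 3xx responses count as reachable; 4xx/5xx count as broken
    return "ONLINE" if status_code < 400 else "BROKEN"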

Requirements

  • Python 3.6+
  • requests library

Installation

  1. Clone or download this repository

  2. Install required dependencies:

pip install requests

Usage

Basic Usage

python url_checker.py <input_file.txt>

Save Detailed Report

python url_checker.py urls.txt report.txt

Custom Timeout

python url_checker.py urls.txt --timeout 5

Adjust Concurrent Workers

python url_checker.py urls.txt --workers 20

Combined Options

python url_checker.py urls.txt report.txt --timeout 5 --workers 20

Input File Format

Create a text file with one URL per line:

https://www.google.com
https://www.github.com
https://example.com/page
https://broken-url-example.com
# Lines starting with # are treated as comments and ignored
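
Loading such a file amounts to skipping blank lines and comment lines. A minimal sketch (load_urls is an illustrative name; the script's actual parsing may differ):

def load_urls(path):
    urls = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            # Skip blank lines and comment lines starting with '#'
            if not line or line.startswith("#"):
                continue
            urls.append(line)
    return urls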

Output

Console Output

The script displays real-time progress:

Checking 4 URLs...

[1/4] ONLINE: https://www.google.com
[2/4] ONLINE: https://www.github.com
[3/4] BROKEN: https://broken-url-example.com
[4/4] TIMEOUT: https://slow-server.com

================================================================================
SUMMARY
================================================================================
Total URLs checked: 4
Online: 2
Broken/Timeout: 2
Invalid/Error: 0
================================================================================

Report File (Optional)

When an output file is specified, a detailed report is generated with:

  • Timestamp of the check
  • Summary statistics
  • URLs grouped by status (BROKEN, TIMEOUT, INVALID, ERROR, ONLINE)
  • HTTP status codes (where applicable)
  • Detailed error messages

Example report structure:

URL Check Report - 2024-10-22 14:30:45
================================================================================

Total URLs checked: 4
Online: 2
Broken/Timeout: 2
Invalid/Error: 0

================================================================================

BROKEN URLs (1):
--------------------------------------------------------------------------------
URL: https://broken-url-example.com
Message: Connection failed

TIMEOUT URLs (1):
--------------------------------------------------------------------------------
URL: https://slow-server.com
Message: Request timed out

Command-Line Options

Option               Description                                   Default
-------------------  --------------------------------------------  -------
<input_file.txt>     Path to the file containing URLs (required)   -
[output_report.txt]  Path to save the detailed report (optional)   None
--timeout SECONDS    Request timeout in seconds                    10
--workers NUM        Number of concurrent worker threads           10
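
This interface maps naturally onto argparse. A sketch of how the options could be declared (the script's actual parser may differ in details):

import argparse

parser = argparse.ArgumentParser(description="Check whether URLs are online or broken")
parser.add_argument("input_file", help="Path to the file containing URLs")
parser.add_argument("output_report", nargs="?", default=None,
                    help="Optional path for the detailed report")
parser.add_argument("--timeout", type=int, default=10,
                    help="Request timeout in seconds")
parser.add_argument("--workers", type=int, default=10,
                    help="Number of concurrent worker threads")
args = parser.parse_args()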

How It Works

  1. URL Validation: Uses urllib.parse to validate URL format
  2. HEAD Request: Sends a fast HEAD request first to check status
  3. GET Fallback: If HEAD fails or returns error, tries GET request
  4. Concurrent Processing: Uses ThreadPoolExecutor to check multiple URLs simultaneously
  5. Error Handling: Catches and categorizes various exceptions (timeout, connection error, redirects)
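
Taken together, steps 1-3 and 5 amount to a per-URL routine along these lines. This is a sketch with illustrative names (check_url, the returned tuples), not the script's exact code:

from urllib.parse import urlparse
import requests

def check_url(url, timeout=10):
    # Step 1: validate the URL format before making any request
    parsed = urlparse(url)
    if not parsed.scheme or not parsed.netloc:
        return ("INVALID", None, "Malformed URL format")
    try:
        # Step 2: send a fast HEAD request first
        resp = requests.head(url, timeout=timeout, allow_redirects=True)
        if resp.status_code >= 400:
            # Step 3: some servers mishandle HEAD, so fall back to GET
            resp = requests.get(url, timeout=timeout, allow_redirects=True)
        status = "ONLINE" if resp.status_code < 400 else "BROKEN"
        return (status, resp.status_code, resp.reason)
    except requests.exceptions.Timeout:
        return ("TIMEOUT", None, "Request timed out")
    except requests.exceptions.ConnectionError:
        return ("BROKEN", None, "Connection failed")
    except requests.exceptions.TooManyRedirects:
        return ("BROKEN", None, "Too many redirects")
    except requests.exceptions.RequestException as exc:
        return ("ERROR", None, str(exc))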

Exception Handling

The script handles the following exceptions with custom messages:

  • requests.exceptions.Timeout - Request timed out
  • requests.exceptions.ConnectionError - Connection failed
  • requests.exceptions.TooManyRedirects - Too many redirects
  • General exceptions - Displays the specific error message

Performance Tips

  • Increase workers for faster checking of large URL lists (e.g., --workers 50)
  • Reduce timeout if you want quicker results and don't mind skipping slow sites (e.g., --timeout 3)
  • For very large lists (1000+ URLs), consider processing in batches
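
For the batching tip, one workable shape is to run a bounded worker pool over one slice of the list at a time. A sketch (check_fn stands in for a per-URL checker like the one sketched above; the batch size is arbitrary):

from concurrent.futures import ThreadPoolExecutor

def check_in_batches(check_fn, urls, batch_size=500, workers=10):
    results = []
    for start in range(0, len(urls), batch_size):
        batch = urls[start:start + batch_size]
        # One pool per batch keeps the number of in-flight requests bounded
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results.extend(pool.map(check_fn, batch))
    return results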

Limitations

  • Some websites may block automated requests that lack a browser-like User-Agent header (see the sketch after this list)
  • Rate limiting: Checking many URLs from the same domain simultaneously may trigger rate limits
  • HEAD requests: Some servers don't handle HEAD requests correctly; the GET fallback covers this case
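
If blocked requests are an issue, sending a browser-like User-Agent often helps. A sketch of the workaround (the header string is just an example, and the script itself may not set one):

import requests

headers = {"User-Agent": "Mozilla/5.0 (compatible; url-checker)"}
resp = requests.head("https://example.com", headers=headers, timeout=10, allow_redirects=True)
print(resp.status_code)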