A Python-based project to scrape Chuck Norris quotes from various online databases and generate random quotes based on the scraped data.
- 🔍 Quote Scraper: ETL pipeline to scrape Chuck Norris quotes from multiple online sources
- 🎲 Quote Generator: Generate up to 10,000,000 unique random Chuck Norris quotes
- 💾 Efficient Storage: SQLite database optimized for quick access
- 🧪 Fully Tested: 95%+ code coverage with comprehensive unit tests
- 🎯 Type-Safe: Full type hints and mypy validation
- 🔧 CLI Interface: User-friendly command-line interface with extensive options
- Python 3.14 or higher
- pip (Python package installer)
- Clone the repository:

  ```bash
  git clone https://github.com/justincranford/chucknorris.git
  cd chucknorris
  ```

- Create a virtual environment (recommended):

  ```bash
  python -m venv venv
  # On Windows:
  venv\Scripts\activate
  # On macOS/Linux:
  source venv/bin/activate
  ```

- Install dependencies:

  ```bash
  pip install -e ".[dev]"
  ```

- Install pre-commit hooks:

  ```bash
  pre-commit install
  pre-commit install --hook-type commit-msg
  ```
Minimal one-liner to install dev dependencies and register the single pre-commit hook within `.githooks`:

```bash
python -m pip install -e .[dev] && dev-setup
```

Notes:

- `dev-setup` is a console script that installs dev dependencies, sets `core.hooksPath` to `.githooks`, and validates the hooks by running `pre-commit run --all-files -j 4`.
- We intentionally avoid `pre-commit install --install-hooks` to keep a single canonical hook script; `dev-setup` marks `.githooks/pre-commit` executable and registers it with Git.
- `pip install` cannot safely run arbitrary repo scripts automatically for security reasons, so an explicit `dev-setup` invocation is required.
Note: dev-setup now enforces Node.js (>= 24.11.1) as a required tool for developer environments, since the repo runs Pyright via npx in pre-commit hooks; the script will check for a compatible Node version and exit if missing.
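The Node.js check described above could be sketched like this in Python (function names are hypothetical; the actual `dev-setup` script may differ):

```python
import re
import shutil
import subprocess
import sys

MIN_NODE = (24, 11, 1)  # minimum version stated in the note above


def parse_node_version(text):
    """Parse 'v24.11.1' (as printed by `node --version`) into a tuple, or None."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    return tuple(map(int, match.groups())) if match else None


def check_node(min_version=MIN_NODE):
    """Return True if a compatible Node.js is on PATH; print a hint otherwise."""
    node = shutil.which("node")
    if node is None:
        print("error: Node.js not found on PATH", file=sys.stderr)
        return False
    reported = subprocess.run([node, "--version"], capture_output=True, text=True).stdout
    version = parse_node_version(reported)
    if version is None or version < min_version:
        print(f"error: Node.js {reported.strip()} is older than {min_version}", file=sys.stderr)
        return False
    return True
```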
This repository uses Pylance in the editor for fast, incremental static analysis and Pyright for CLI/CI checks. We've added a pre-commit hook that runs Pyright to ensure type errors and Pylance-relevant diagnostics block commits and pushes.
To run Pyright locally (requires Node.js):
```bash
npx --yes pyright
# or using pnpm:
pnpm dlx pyright
```

If you don't have Node.js, you can still use Pylance in VS Code for editor-based checks.
```bash
python scraper/scraper.py -v
```

This will download quotes from the Chuck Norris API and store them in `scraper/quotes.db`.
```bash
# Generate a single quote
python quotes/generator.py

# Generate 10 quotes
python quotes/generator.py --count 10

# Generate JSON output
python quotes/generator.py --count 5 --format json
```

```bash
# Run all tests with coverage
pytest --cov=scraper --cov=quotes

# Run specific test file
pytest tests/test_scraper.py -v

# Run tests with coverage report
pytest --cov=scraper --cov=quotes --cov-report=html
```

```text
Chuck Norris can divide by zero.
Chuck Norris counted to infinity. Twice.
```
```json
[
  {
    "id": 1,
    "quote": "Chuck Norris can divide by zero.",
    "source": "https://api.chucknorris.io/jokes/random"
  }
]
```

```csv
id,quote,source
1,"Chuck Norris can divide by zero.","https://api.chucknorris.io/jokes/random"
```

```bash
# Custom output location
python scraper/scraper.py --output ./data/quotes.db

# Scrape from custom sources
python scraper/scraper.py --sources https://example.com/api/quotes

# Enable verbose logging
python scraper/scraper.py -v
```

```bash
# Generate with specific seed for reproducibility
python quotes/generator.py --count 100 --seed 42

# Save to file
python quotes/generator.py --count 1000 --output quotes.txt

# Use custom database
python quotes/generator.py --database ./data/quotes.db --count 10
```

```bash
# Scraper help
python scraper/scraper.py --help

# Generator help
python quotes/generator.py --help
```

Scrape Chuck Norris quotes from online sources and store them in a database:
```bash
python scraper/scraper.py
```

- `-s, --sources`: List of URLs or sources to scrape (space-separated)
- `-o, --output`: Output file path base (default: `scraper/quotes.db`)
- `-f, --format`: Output format - `sqlite`, `csv`, or `both` (default: `both`)
- `-v, --verbose`: Enable verbose logging
- `-d, --dry-run, --dryrun`: Validate sources and simulate scraping without network calls
- `-t, --threads, --thread`: Number of concurrent threads for parallel processing (default: 4)
- `-h, --help`: Display help and usage examples
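The scraper's option set could be wired up with `argparse` roughly as in this sketch (hypothetical; the real `scraper.py` may declare its options differently):

```python
import argparse


def build_scraper_parser():
    """Build a CLI parser mirroring the documented scraper options (a sketch)."""
    parser = argparse.ArgumentParser(description="Scrape Chuck Norris quotes")
    parser.add_argument("-s", "--sources", nargs="+",
                        help="URLs or sources to scrape (space-separated)")
    parser.add_argument("-o", "--output", default="scraper/quotes.db",
                        help="output file path base")
    parser.add_argument("-f", "--format", choices=["sqlite", "csv", "both"],
                        default="both", help="output format")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="enable verbose logging")
    parser.add_argument("-d", "--dry-run", "--dryrun", dest="dry_run",
                        action="store_true",
                        help="validate sources and simulate scraping without network calls")
    parser.add_argument("-t", "--threads", "--thread", dest="threads",
                        type=int, default=4,
                        help="number of concurrent threads")
    return parser
```

Note that `argparse` accepts multiple long option strings per argument, which is how aliases like `--dry-run`/`--dryrun` can share one destination.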
```bash
# Scrape from default sources
python scraper/scraper.py

# Specify custom output location
python scraper/scraper.py --output ./my_quotes.db

# Enable verbose logging
python scraper/scraper.py --verbose

# Dry run to validate sources without scraping
python scraper/scraper.py --dry-run

# Use 8 threads for parallel processing
python scraper/scraper.py --threads 8

# Scrape from specific sources
python scraper/scraper.py --sources https://api.chucknorris.io/jokes/random
```

Generate random Chuck Norris quotes from the scraped database:
```bash
python quotes/generator.py
```

- `-c, --count`: Number of quotes to generate (default: 1, max: 10,000,000)
- `-s, --seed`: Random seed for reproducible output (default: None for truly random)
- `-o, --output`: Output file path (default: stdout)
- `-f, --format`: Output format - `text`, `json`, or `csv` (default: `text`)
- `-d, --database`: Path to the quotes database (default: `scraper/quotes.db`)
- `-v, --verbose`: Enable verbose logging
- `-h, --help`: Display help and usage examples
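Since `--seed` makes output reproducible, the selection logic presumably uses a seeded RNG. A minimal sketch of that idea (the function name is hypothetical, not the generator's actual API):

```python
import random


def pick_quotes(quotes, count, seed=None):
    """Pick `count` quotes at random; identical seeds yield identical sequences."""
    rng = random.Random(seed)  # seed=None draws from OS entropy (truly random)
    return [rng.choice(quotes) for _ in range(count)]
```

With a fixed seed the same database always produces the same quote sequence, which is what makes `--seed` useful for testing and reproducible demos.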
```bash
# Generate a single random quote
python quotes/generator.py

# Generate 10 random quotes
python quotes/generator.py --count 10

# Generate quotes with a specific seed for reproducibility
python quotes/generator.py --count 5 --seed 42

# Output to a file in JSON format
python quotes/generator.py --count 100 --format json --output quotes.json

# Generate CSV format
python quotes/generator.py --count 50 --format csv --output quotes.csv

# Use a custom database
python quotes/generator.py --database ./my_quotes.db --count 5
```

Run the test suite:

```bash
pytest
```

Run tests with coverage report:

```bash
pytest --cov=scraper --cov=quotes --cov-report=html
```

View coverage report:

```bash
# The HTML report will be in htmlcov/index.html
```

This project uses several tools to maintain code quality:
- Black: Code formatting
- isort: Import sorting
- flake8: Linting
- mypy: Type checking
- pre-commit: Git hooks for automated checks
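A `.pre-commit-config.yaml` wiring these tools together typically looks like the sketch below (the `rev` values are placeholders, and this repo's actual config may differ, e.g. by routing through its `.githooks` script):

```yaml
repos:
  - repo: https://github.com/psf/black
    rev: 24.10.0  # placeholder rev
    hooks:
      - id: black
  - repo: https://github.com/pycqa/isort
    rev: 5.13.2   # placeholder rev
    hooks:
      - id: isort
  - repo: https://github.com/pycqa/flake8
    rev: 7.1.1    # placeholder rev
    hooks:
      - id: flake8
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.13.0  # placeholder rev
    hooks:
      - id: mypy
```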
Run all checks manually:
```bash
pre-commit run --all-files
```

```text
chucknorris/
├── .github/
│   ├── actions/               # Custom GitHub Actions
│   ├── instructions/          # Copilot instruction files
│   ├── workflows/             # CI/CD pipelines
│   └── copilot-instructions.md
├── scraper/
│   ├── scraper.py             # Quote scraping script
│   ├── quotes.db              # Scraped quotes database
│   ├── quotes.csv             # Scraped quotes CSV
│   └── sources.txt            # List of sources to scrape
├── quotes/
│   └── generator.py           # Quote generation script
├── tests/                     # Unit tests
│   ├── conftest.py            # Pytest configuration
│   ├── test_scraper.py        # Scraper tests
│   ├── test_scraper_cli.py    # Scraper CLI tests
│   ├── test_generator.py      # Generator tests
│   └── test_generator_cli.py  # Generator CLI tests
├── .pre-commit-config.yaml    # Pre-commit hooks configuration
├── .gitignore
├── pyproject.toml             # Project configuration and dependencies
├── LICENSE                    # AGPL license
└── README.md
```
The scraper module provides functionality to extract, transform, and load Chuck Norris quotes from various online sources.
Key Functions:
- `scrape_quotes(sources, output_db)`: Main ETL pipeline
- `fetch_from_api(url)`: Fetch quotes from JSON APIs
- `parse_html(content)`: Parse quotes from HTML pages
- `save_to_database(quotes, db_path)`: Store quotes in SQLite database
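As an illustration of the load step, here is a minimal sketch of what `save_to_database` could look like using the standard `sqlite3` module; the schema is assumed from the JSON example earlier (`id`, `quote`, `source`), and the real implementation may differ:

```python
import os
import sqlite3
import tempfile


def save_to_database(quotes, db_path):
    """Store quote dicts in SQLite; a UNIQUE constraint on the text skips duplicates."""
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS quotes ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "quote TEXT UNIQUE NOT NULL, "
            "source TEXT)"
        )
        conn.executemany(
            "INSERT OR IGNORE INTO quotes (quote, source) VALUES (?, ?)",
            [(q["quote"], q["source"]) for q in quotes],
        )


# quick self-check against a throwaway file: the duplicate row is ignored
db = os.path.join(tempfile.mkdtemp(), "quotes.db")
save_to_database([{"quote": "Chuck Norris can divide by zero.", "source": "api"}] * 2, db)
with sqlite3.connect(db) as conn:
    stored = conn.execute("SELECT COUNT(*) FROM quotes").fetchone()[0]
```

`INSERT OR IGNORE` plus the `UNIQUE` constraint is one simple way to make re-running the scraper idempotent.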
The generator module provides functionality to generate random Chuck Norris quotes from the database.
Key Functions:
- `generate_quotes(count, seed, database)`: Generate random quotes
- `export_quotes(quotes, format, output)`: Export quotes in various formats
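A sketch of how `export_quotes` might serialize rows into the text/JSON/CSV formats shown earlier (the signature here is simplified and hypothetical; the actual function also takes an output target):

```python
import csv
import io
import json


def export_quotes(quotes, fmt="text"):
    """Render (id, quote, source) rows as text, JSON, or CSV (a sketch)."""
    if fmt == "text":
        return "\n".join(quote for _, quote, _ in quotes)
    if fmt == "json":
        return json.dumps(
            [{"id": i, "quote": q, "source": s} for i, q, s in quotes], indent=2
        )
    if fmt == "csv":
        buf = io.StringIO()
        csv.writer(buf).writerow(["id", "quote", "source"])
        # QUOTE_NONNUMERIC quotes the string fields, matching the sample CSV output
        csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC).writerows(quotes)
        return buf.getvalue()
    raise ValueError(f"unknown format: {fmt}")
```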
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests and ensure they pass
- Submit a pull request
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0). See the LICENSE file for details.
- Chuck Norris for being awesome
- The various Chuck Norris quote databases and APIs that make this project possible