A command-line utility for taking automated screenshots of websites, powered by nodriver for enhanced stealth and anti-detection capabilities.
This is a fork of Simon Willison's excellent shot-scraper, migrated from Playwright to nodriver. The migration provides powerful, built-in bypass capabilities for CAPTCHAs and services like Cloudflare.
Experimental: This tool works, but it is still a work in progress, and there appear to be some underlying bugs in the nodriver library that can cause Chrome to crash. Use with caution in production environments.
The easiest way to install this tool is with uv:
uv tool install git+https://github.com/elidickinson/shot-power-scraper.git
Then run the install command to test browser detection and set up the correct user agent for stealth mode:
shot-power-scraper install
Requirements: Google Chrome or Chromium must be installed on your system. No separate driver installation is required.
Testing: The test suite runs on macOS CI in GitHub Actions and includes both unit tests and browser integration tests.
You can take a screenshot of a web page like this:
shot-power-scraper https://datasette.io/
This will create a screenshot in a file called datasette-io.png.
Beyond screenshots, shot-power-scraper supports multiple output formats:
shot-power-scraper https://example.com/ # Creates example-com.png
shot-power-scraper pdf https://example.com/ # Creates example-com.pdf
shot-power-scraper html https://example.com/ # Outputs HTML to stdout
shot-power-scraper html https://example.com/ -o page.html
shot-power-scraper mhtml https://example.com/ # Creates example-com.mhtml
shot-power-scraper mhtml https://example.com/ -o archive.mhtml
MHTML (MIME HTML) archives contain the complete web page including all embedded resources like images, CSS, and JavaScript in a single file - perfect for offline viewing or archival purposes.
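The default filenames in the commands above (datasette-io.png, example-com.pdf) suggest that the name is derived from the URL's host and path. The following is a minimal sketch of one way to produce such names; `default_filename` is a hypothetical helper written for illustration, not the project's actual function in utils.py, and the real rules may differ for longer URLs or existing files.

```python
import re
from urllib.parse import urlparse


def default_filename(url: str, ext: str = "png") -> str:
    """Build a filename from a URL's host and path (illustrative sketch only)."""
    parsed = urlparse(url)
    base = parsed.netloc + parsed.path
    # Collapse every run of non-alphanumeric characters into a single hyphen,
    # then drop hyphens left over at either end (for example from a trailing slash).
    slug = re.sub(r"[^a-zA-Z0-9]+", "-", base).strip("-")
    return f"{slug}.{ext}"


print(default_filename("https://datasette.io/"))
print(default_filename("https://example.com/", ext="pdf"))
```

Passing -o, as in the html and mhtml examples, bypasses any derived name entirely.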
This fork includes comprehensive stealth capabilities that make it much harder to detect than standard automation tools.
Unlike Playwright and other automation frameworks, nodriver provides built-in anti-detection that bypasses most bot detection systems by removing automation markers, simulating natural browser behavior, and masking its fingerprint.
For stealth features to work when running in headless mode (the default) you must run the install command once to set up the correct user agent:
shot-power-scraper install
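One reason the user agent matters here: headless Chrome advertises itself with a HeadlessChrome token in its default user agent string, which is an easy signal for bot detection. The sketch below illustrates the general idea of replacing that token. The function name is hypothetical, and this is not necessarily how the install command derives or stores the user agent.

```python
def mask_headless_user_agent(user_agent: str) -> str:
    """Swap the HeadlessChrome product token for the regular Chrome token."""
    return user_agent.replace("HeadlessChrome/", "Chrome/")


# Example user agent in the shape headless Chrome reports; the version is arbitrary.
headless_ua = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) HeadlessChrome/120.0.0.0 Safari/537.36"
)
print(mask_headless_user_agent(headless_ua))
```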
By default, shot-power-scraper runs in headless mode (browser is invisible). You can run with a visible browser using:
# Run with visible browser (no interaction pause)
shot-power-scraper --headful https://example.com
# Or use the alias
shot-power-scraper --no-headless https://example.com
This is different from -i/--interactive mode, which shows the browser AND pauses for manual interaction before taking the screenshot.
This fork uses uBlock Origin Lite for content blocking during screenshot capture. Use --ad-block to enable blocking, and add --ublock-lists to enable additional filter lists for blocking popups, cookie notices, and other annoyances.
shot-power-scraper --ad-block https://example.com
shot-power-scraper --ad-block --ublock-lists annoyances-cookies,annoyances-overlays https://example.com
Ad blocking can be enabled by default using the config command. For detailed information about available filter lists and customization, see EXTENSIONS.md.
Our custom build includes "Complete" filtering mode (maximum blocking), annoyance filters, and custom rules support:
# Build extension with latest filter lists (2-3 minutes)
./shot_power_scraper/extensions/update-ublock.sh
# Build faster using cached filter lists
./shot_power_scraper/extensions/update-ublock.sh --use-cache
The script automatically clones/updates uBlock Origin, enables selected filter lists, sets "Complete" mode as default, and installs to shot_power_scraper/extensions/ublock-lite-custom/.
This fork has some important differences from the original. It only supports Chrome/Chromium and some features aren't fully implemented.
- shot-power-scraper accessibility - Not implemented.
- --log-requests option is not implemented.
- --quality to specify JPEG quality not implemented.
- Console logging (--log-console) - Basic CDP implementation, may miss some message types.
- Browser selection (--browser) - Only Chrome/Chromium is supported.
- HAR recording (har command) - Content bodies not included in the archive.
- ✅ shot: Fully Implemented (except --log-requests)
- ✅ multi: Fully Implemented
- ✅ pdf: Fully Implemented
- ✅ javascript: Fully Implemented
- ✅ html: Fully Implemented
- ✅ mhtml: Fully Implemented - Create MHTML web page archives
- ✅ har: Implemented (limited) - Record HTTP Archive files (content bodies not included)
- ✅ auth: Fully Implemented
- ✅ install: Fully Implemented - also sets up user agent for stealth mode
- ✅ config: Fully Implemented
shot-power-scraper stores default settings in ~/.config/shot-power-scraper/config.json. These settings are used unless overridden by command-line options.
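The precedence described above (config file values apply unless a command-line option overrides them) can be sketched as follows. This is an illustration under stated assumptions, not the project's code: `load_defaults`, `resolve_options`, and the `ad_block` key are hypothetical names, and the real logic lives in ShotConfig.

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".config" / "shot-power-scraper" / "config.json"


def load_defaults(path: Path = CONFIG_PATH) -> dict:
    """Read stored defaults, treating a missing file as no defaults."""
    try:
        return json.loads(path.read_text())
    except FileNotFoundError:
        return {}


def resolve_options(cli_options: dict, defaults: dict) -> dict:
    """Merge options so that explicitly set CLI values win over stored defaults."""
    merged = dict(defaults)
    # A value of None stands for "option not given on the command line".
    merged.update({key: value for key, value in cli_options.items() if value is not None})
    return merged


print(resolve_options({"ad_block": None, "width": 800}, {"ad_block": True}))
```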
# Set default ad blocking
shot-power-scraper config --ad-block true
# View current settings
shot-power-scraper config --show
# Clear all settings
shot-power-scraper config --clear
The following examples demonstrate concepts that can be adapted for shot-power-scraper.
- Examples of similar usage patterns can be found in projects that use the original shot-scraper as a reference
- The concepts demonstrated in shot-scraper-demo can be adapted for shot-power-scraper
- The Datasette Documentation shows how screenshots can be integrated into documentation workflows
- Projects like @newshomepages demonstrate automated screenshot workflows
- scrape-hacker-news-by-domain shows JavaScript execution patterns that can be adapted
This section outlines the major code path and functions called when executing shot-power-scraper shot ....
- CLI Entry (cli.py:shot()) - Parse arguments, create centralized ShotConfig object with all parameters
- Browser Command (cli.py:run_browser_command()) - Orchestrate browser lifecycle using shot_config
- Extension Setup (browser.py:setup_blocking_extensions()) - Configure ad blocking based on shot_config
- Browser Context (browser.py:create_browser_context()) - Initialize nodriver browser using shot_config parameters
- Screenshot Execution (cli.py:execute_shot()) - Handle interactive mode and viewport
- Core Screenshot (screenshot.py:take_shot()) - Main screenshot logic with shot_config
- Page Setup (page_utils.py:create_tab_context() + navigate_to_url()) - Create tab context, navigate, wait, handle errors using shot_config
- Screenshot Capture (screenshot.py:_save_screenshot()) - Take and save image
- Browser Cleanup (browser.py:cleanup_browser()) - Stop browser and clean up
- Async Wrapper (cli.py:run_nodriver_async()) - Set up nodriver event loop
- cli.py - Main entry point, CLI parsing, command orchestration
- shot_config.py - Centralized configuration object with all parameters (browser, screenshot, execution options) and config file management
- browser.py - Browser instance management, extension setup, cleanup
- screenshot.py - Core screenshot logic, selector handling, image capture
- page_utils.py - Page navigation, error detection, Cloudflare handling, JavaScript execution
- utils.py - Utility functions for filename generation, URL processing, GitHub script loading
- Centralized Configuration: All parameters (browser options, screenshot settings, execution flags) are consolidated in ShotConfig
- Simplified Interfaces: run_browser_command() takes just command_func and shot_config parameters
- Config File Integration: Configuration file loading and defaults are handled directly in ShotConfig.__init__()
- Consistent Pattern: All CLI commands follow the same ShotConfig → run_browser_command() pattern
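The ShotConfig → run_browser_command() pattern can be illustrated with a small stand-alone sketch. Everything here is a simplified stand-in: the ShotConfig fields, the FakeBrowser class, and the arguments passed to command_func are assumptions made for illustration, and no real browser is started. Only the two-parameter shape of run_browser_command() comes from the description above.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ShotConfig:
    """Stand-in config object; the real class has many more fields."""
    url: str
    ad_block: bool = False
    headless: bool = True


class FakeBrowser:
    """Placeholder for a browser handle so the sketch runs without Chrome."""

    def __init__(self) -> None:
        self.stopped = False

    def stop(self) -> None:
        self.stopped = True


async def create_browser_context(shot_config: ShotConfig) -> FakeBrowser:
    # The real function would start nodriver using shot_config here.
    return FakeBrowser()


async def run_browser_command(command_func, shot_config: ShotConfig):
    """Own the browser lifecycle so each command only implements its own work."""
    browser = await create_browser_context(shot_config)
    try:
        return await command_func(browser, shot_config)
    finally:
        # Cleanup runs even when the command raises.
        browser.stop()


async def fake_shot(browser: FakeBrowser, shot_config: ShotConfig) -> str:
    return f"captured {shot_config.url}"


result = asyncio.run(run_browser_command(fake_shot, ShotConfig(url="https://example.com/")))
print(result)
```

Keeping lifecycle handling in one function means a new command only needs to supply its own coroutine and a config object.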
- Configuration parsing, validation, and config file fallback handling
- Browser context initialization with anti-detection features using consolidated configuration
- Optional extension loading for ad blocking (via uBlock Origin Lite)
- Page navigation with error detection and Cloudflare bypass
- JavaScript execution and custom waiting conditions
- Element selector processing (CSS/JS selectors)
- Screenshot capture (full page or element-specific)
- Optional HTML content saving
- Comprehensive cleanup of browser and temporary files
The architecture is fully async-based using nodriver for enhanced stealth capabilities and automatic anti-detection. All configuration is centralized through ShotConfig for consistency and maintainability.
1. CLI Parsing (cli.py:shot()) - Parse command-line arguments and create ShotConfig
2. Browser Initialization (browser.py:create_browser_context()) - Start nodriver browser with stealth features
3. Tab Creation (page_utils.py:create_tab_context()) - Create new tab and configure user agent
4. Page Navigation (page_utils.py:navigate_to_url()) - Navigate to target URL and wait for load
5. Viewport Setup (page_utils.py:navigate_to_url()) - Set viewport dimensions if width/height explicitly specified (not full page)
6. Error Detection - Check for Chrome error pages and DNS failures
7. Cloudflare Handling - Detect and wait for Cloudflare challenge bypass
8. Wait Operations - Apply --wait delay and --wait-for conditions
9. JavaScript Execution - Execute any provided JavaScript code
10. Lazy Loading (page_utils.py:trigger_lazy_load()) - Trigger lazy-loaded content if requested
11. Viewport Expansion - Apply viewport expansion when blocking extensions are enabled
12. Screenshot Capture (screenshot.py:_save_screenshot()) - Set final viewport and capture screenshot
13. HTML Saving - Save HTML content if --save-html specified
14. Browser Cleanup (browser.py:cleanup_browser()) - Stop browser and clean up temporary files
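The wait step in the sequence above combines a fixed delay with a condition that is checked repeatedly. A generic polling loop of that kind might look like the sketch below. `wait_for_condition` and its parameters are hypothetical; in the real tool the check would presumably evaluate a JavaScript expression in the page, which is replaced here by a plain coroutine so the example runs on its own.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def wait_for_condition(
    check: Callable[[], Awaitable[bool]],
    timeout: float = 30.0,
    interval: float = 0.1,
) -> None:
    """Poll check() until it returns True, or raise TimeoutError after timeout seconds."""
    deadline = time.monotonic() + timeout
    while not await check():
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within timeout")
        await asyncio.sleep(interval)


# Demo condition that becomes true on the third poll.
polls = {"count": 0}


async def ready_after_three_polls() -> bool:
    polls["count"] += 1
    return polls["count"] >= 3


asyncio.run(wait_for_condition(ready_after_three_polls, timeout=2.0, interval=0.01))
print(polls["count"])
```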
- Dual Viewport Approach:
  - Window Size (set_window_size) - Controls physical browser window dimensions (important for --interactive, --headful, and --devtools modes)
  - Viewport Metrics (set_device_metrics_override) - Controls page layout dimensions for rendering and screenshot capture
- Viewport Timing: Viewport metrics are set immediately after navigation (step 5) if width/height explicitly specified (not full page), aiding lazy loading of images
- Extension Effects: Ad/popup blocking may require viewport expansion to fix intersection observer behavior (step 11)
- Lazy Loading: Only runs if --trigger-lazy-load is specified, after viewport setup but before final screenshot capture
- Full Page Screenshots: Skip early viewport setup; use calculated document height for viewport dimensions during screenshot capture (step 12)
- Selector Screenshots: Process JavaScript selectors before taking element-specific screenshots
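The viewport rules above can be condensed into two small decision functions: one for the early setup right after navigation and one for the final setup at capture time. This is a sketch of the described behavior only. The function names, the explicit `full_page` flag, and the default dimensions are assumptions, not values taken from the project.

```python
from typing import Optional, Tuple

DEFAULT_WIDTH = 1280   # placeholder default, not taken from the project
DEFAULT_HEIGHT = 720   # placeholder default, not taken from the project


def early_viewport(
    width: Optional[int], height: Optional[int], full_page: bool
) -> Optional[Tuple[int, int]]:
    """Viewport to apply right after navigation, or None to skip early setup."""
    if full_page or (width is None and height is None):
        return None
    return (width or DEFAULT_WIDTH, height or DEFAULT_HEIGHT)


def capture_viewport(
    width: Optional[int], height: Optional[int], full_page: bool, document_height: int
) -> Tuple[int, int]:
    """Viewport to apply just before capture; full-page shots use the document height."""
    if full_page:
        return (width or DEFAULT_WIDTH, document_height)
    return (width or DEFAULT_WIDTH, height or DEFAULT_HEIGHT)


print(early_viewport(800, 600, full_page=False))
print(capture_viewport(800, None, full_page=True, document_height=4200))
```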
- HTTP errors are checked after navigation and can trigger --skip (exit silently) or --fail (exit with error)
- Navigation errors are detected and can be handled with the same skip/fail logic
- Cloudflare challenges are automatically detected and waited for (unless disabled)
- All errors fail loudly with exceptions for debugging unless explicitly configured otherwise
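The skip/fail handling described above amounts to a small decision on the HTTP status. The sketch below shows one plausible shape for it. The `Action` enum and `decide_on_http_status` are hypothetical names; both the precedence of skip over fail when the two are combined and the behavior when neither flag is set are assumptions, since the text above does not specify them.

```python
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"  # go on and take the screenshot
    SKIP = "skip"          # exit silently
    FAIL = "fail"          # exit with an error


def decide_on_http_status(status: int, skip: bool = False, fail: bool = False) -> Action:
    """Map an HTTP status and the skip/fail flags to an action."""
    if status < 400:
        return Action.CONTINUE
    if skip:
        # Assumption: skip takes precedence if both flags are given.
        return Action.SKIP
    if fail:
        return Action.FAIL
    # Assumption: with neither flag set, the error page is still captured.
    return Action.CONTINUE


print(decide_on_http_status(404, skip=True))
print(decide_on_http_status(500, fail=True))
```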