
mx-scraper

Download image galleries or metadata on the web.

This rewrite is expected to support the previous implementation's metadata format.

The main idea is to separate the core (mx-scraper) from user-defined plugins, which was not possible in previous implementations.

Usage

# pip install beautifulsoup4

# Plugins can be specified with -p or --plugin
# By default, it will be inferred from the args
# Each plugin may have its own set of dependencies that are independent from mx-scraper
# Uses bs4
mx-scraper fetch --plugin images https://www.google.com
# Uses gallery-dl
mx-scraper fetch --meta-only -v https://x.com/afmikasenpai/status/1901323062949159354
mx-scraper fetch -p gallery-dl https://x.com/afmikasenpai/status/1901323062949159354

# Alternatively, when batching terms that target different sources/plugins, a prefix (e.g. an id or name) is often required so the right plugin can be inferred
# The prefix is plugin specific (refer to plugin_name/__init__.py :: mx_is_supported)
mx-scraper fetch --meta-only -v img:https://www.google.com https://mto.to/series/68737
mx-scraper fetch --meta-only -v nh:177013
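To illustrate the prefix convention described above, here is a rough sketch of what a plugin's `mx_is_supported` could look like. This is a hypothetical example for an images-style plugin; the prefix, function bodies, and helper below are illustrative assumptions, not the actual plugin API.

```python
# Hypothetical sketch of a plugin's term matcher, loosely modeled on the
# plugin_name/__init__.py :: mx_is_supported convention mentioned above.
# The "img:" prefix and strip_prefix helper are illustrative assumptions.

PREFIX = "img:"

def mx_is_supported(term: str) -> bool:
    """Return True when a term targets this plugin.

    A term matches either by carrying the plugin-specific prefix
    (e.g. "img:https://www.google.com") or by being a plain http(s) URL.
    """
    if term.startswith(PREFIX):
        return True
    return term.startswith("http://") or term.startswith("https://")

def strip_prefix(term: str) -> str:
    """Remove the plugin prefix so the remainder can be fetched."""
    return term[len(PREFIX):] if term.startswith(PREFIX) else term
```

With a matcher like this, `img:https://www.google.com` routes to the images plugin, while a term such as `nh:177013` is left for whichever plugin claims the `nh:` prefix.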

Commands

mx-scraper engine

Usage: mx-scraper <COMMAND>

Commands:
  fetch        Fetch a sequence of terms
  fetch-files  Fetch a sequence of terms from a collection of files
  request      Request a url
  infos        Display various information
  server       Spawn a graphql server interfacing mx-scraper
  help         Print this message or the help of the given subcommand(s)

Options:
  -h, --help  Print help

Each fetch strategy shares the same configuration.

Features

  • CLI

    • Fetch a list of terms
    • Fetch a list of terms from a collection of files
    • Generic URL Request
      • Print as text
      • Download (--dest flag)
    • Authentications (Basic, Bearer token)
  • Cookies

    • Loading from a file (Netscape format, key-value)
    • Loading from the config (key-value)
  • HTTP Client/Downloader

    • Support for the older mx-scraper book schema
    • Download
    • Cache support (can be disabled with --no-cache or from config)
    • Configurable HTTP client (default, FlareSolverr, cfworker)
  • Plugins

    • Python plugin
      • MxRequest with runtime context (headers, cookies, auth)
    • gallery-dl extractors
    • Subprocess (e.g. imgbrd-grabber)
  • Send context from an external source (e.g. browser)

    • Cookies and User-Agent (via --listen-cookies, which opens a callback URL that can receive a FetchContext object)
    • Rendered HTML page
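The Netscape cookie format mentioned in the cookies feature above is a plain tab-separated text file. The reader below is a minimal sketch illustrating the format itself, not mx-scraper's actual cookie loader.

```python
# Minimal sketch of a Netscape cookies.txt reader. This illustrates the
# file format only; it is not mx-scraper's actual loading code.

def parse_netscape_cookies(text: str) -> dict:
    """Extract name -> value pairs from Netscape-format cookie text.

    Each non-comment line carries 7 tab-separated fields:
    domain, include_subdomains, path, secure, expiry, name, value.
    """
    cookies = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) == 7:
            cookies[fields[5]] = fields[6]
    return cookies
```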

GraphQL server

You can also use the extractors through GraphQL queries, with the same options as the command-line interface.

Usage: mx-scraper server [OPTIONS]

Options:
      --port <PORT>  Server port
  -h, --help         Print help
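A GraphQL request is an ordinary HTTP POST carrying a JSON body. The sketch below only shows that body's shape; the `fetch` query field and its arguments are hypothetical placeholders, and the real schema is best explored through the playground.

```python
import json

def build_graphql_payload(query, variables=None):
    """Serialize a GraphQL query and its variables into a JSON POST body."""
    return json.dumps({"query": query, "variables": variables or {}}).encode("utf-8")

# Hypothetical query field for illustration; the actual schema is
# visible in the server's playground.
QUERY = """
query Fetch($term: String!) {
  fetch(term: $term) { title }
}
"""

payload = build_graphql_payload(QUERY, {"term": "nh:177013"})
```

The resulting bytes would be POSTed to the server (started with `mx-scraper server --port <PORT>`) with a `Content-Type: application/json` header.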

Playground Screenshot