mx-scraper
Download image galleries or metadata on the web.
This rewrite is expected to support the previous implementation's metadata format.
The main idea is to separate the core (mx-scraper) from the user-defined plugins, which was not possible in previous implementations.
Usage
# pip install beautifulsoup4
# Plugins can be specified with -p or --plugin
# By default, the plugin is inferred from the args
# Each plugin may have its own set of dependencies, independent of mx-scraper
# Uses bs4
mx-scraper fetch --plugin images https://www.google.com
# Uses gallery-dl
mx-scraper fetch --meta-only -v https://x.com/afmikasenpai/status/1901323062949159354
mx-scraper fetch -p gallery-dl https://x.com/afmikasenpai/status/1901323062949159354
# Alternatively, when batching terms that target different sources/plugins, a prefix (e.g. an id or name) is often required so the right plugin can be inferred
# The prefix is plugin-specific (refer to plugin_name/__init__.py :: mx_is_supported)
mx-scraper fetch --meta-only -v img:https://www.google.com https://mto.to/series/68737
mx-scraper fetch --meta-only -v nh:177013
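The inference above relies on each plugin's `mx_is_supported` hook in `plugin_name/__init__.py`. A minimal sketch of such a module, using the `img:` prefix from the example; note that only `mx_is_supported` is named in this README, so the helper below is an illustrative assumption, not the actual plugin API:

```python
# Hypothetical sketch of a plugin module (plugin_name/__init__.py).
# Only mx_is_supported is documented above; strip_prefix is illustrative.

PREFIX = "img:"

def mx_is_supported(term: str) -> bool:
    """Tell the engine whether this plugin can handle the given term."""
    return term.startswith(PREFIX) or term.startswith("http")

def strip_prefix(term: str) -> str:
    """Remove the plugin prefix, leaving the raw URL or id."""
    return term[len(PREFIX):] if term.startswith(PREFIX) else term
```

With this shape, `img:https://www.google.com` routes to this plugin, while `nh:177013` is rejected and left for another plugin to claim.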
Commands
mx-scraper engine
Usage: mx-scraper <COMMAND>
Commands:
fetch Fetch a sequence of terms
fetch-files Fetch a sequence of terms from a collection of files
request Request a url
infos Display various information
server Spawn a graphql server interfacing mx-scraper
help Print this message or the help of the given subcommand(s)
Options:
-h, --help Print help
Each fetch strategy shares the same configuration.
Features

- CLI
  - Fetch a list of terms
  - Fetch a list of terms from a collection of files
  - Generic URL request
  - Print as text
  - Download (`--dest` flag)
  - Authentication (Basic, Bearer token)
- Cookies
  - Loading from a file (Netscape format, key-value)
  - Loading from the config (key-value)
- Http Client/Downloader
  - Support of the older mx-scraper book schema
  - Download
  - Cache support (can be disabled with `--no-cache` or from the config)
  - Configurable Http client (default, Flaresolverr, cfworker)
- Plugins
  - Python plugin
    - `MxRequest` with runtime context (headers, cookies, auth)
  - gallery-dl extractors
  - Subprocess (e.g. imgbrd-grabber)
- Send context from an external source (e.g. browser)
  - Cookies, UA (through `--listen-cookies`, which opens a callback URL that can receive a `FetchContext` object)
  - Rendered HTML page
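The Netscape cookie format mentioned above is the classic tab-separated `cookies.txt` layout, which Python's standard library can parse directly. A small sketch of loading such a file (the file contents here are a made-up example):

```python
import http.cookiejar
import os
import tempfile

# Example Netscape-format cookie file: one header line, then
# tab-separated fields: domain, flag, path, secure, expires, name, value.
netscape_text = (
    "# Netscape HTTP Cookie File\n"
    ".example.com\tTRUE\t/\tFALSE\t2147483647\tsession\tabc123\n"
)

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write(netscape_text)
    path = f.name

# MozillaCookieJar understands the Netscape cookies.txt format.
jar = http.cookiejar.MozillaCookieJar(path)
jar.load()
cookies = {c.name: c.value for c in jar}
os.unlink(path)
print(cookies)  # {'session': 'abc123'}
```

This is the same file format that browser extensions typically export, so a file dumped from a logged-in browser session can be handed to the cookie option directly.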
GraphQL server
You can also use the extractors through GraphQL queries, with the same options as in the command-line interface.
Usage: mx-scraper server [OPTIONS]
Options:
--port <PORT> Server port
-h, --help Print help
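A query can be sent to the server as a standard GraphQL-over-HTTP POST. A hedged sketch using only the standard library; the `fetch` field, its arguments, the result shape, and the `/graphql` endpoint path are assumptions for illustration, since the actual schema is defined by the server and not documented here:

```python
import json
import urllib.request

# Hypothetical query shape mirroring `mx-scraper fetch --meta-only`;
# the field and argument names are assumptions, not the documented schema.
query = """
query Fetch($terms: [String!]!) {
  fetch(terms: $terms, metaOnly: true) {
    title
    urls
  }
}
"""

payload = json.dumps({
    "query": query,
    "variables": {"terms": ["nh:177013"]},
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:8080/graphql",  # assumed endpoint; port set via --port
    data=payload,
    headers={"Content-Type": "application/json"},
)

# Uncomment once a server is running (mx-scraper server --port 8080):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

The request body is the usual `{"query": ..., "variables": ...}` JSON envelope, so any GraphQL client (curl, a browser IDE, a typed client) works the same way.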
