mx-scraper
Download image galleries or metadata on the web.
This rewrite is expected to support the previous implementation's metadata format.
The main idea is to separate the core (mx-scraper) from the user-defined plugins, which was not possible in previous implementations.
Usage
# pip install beautifulsoup4
# Plugins can be specified with -p or --plugin
# By default, the plugin is inferred from the args
# Each plugin may have its own set of dependencies, independent of mx-scraper
# Uses bs4
mx-scraper fetch --plugin images https://www.google.com
# Uses gallery-dl
mx-scraper fetch --meta-only -v https://x.com/afmikasenpai/status/1901323062949159354
mx-scraper fetch -p gallery-dl https://x.com/afmikasenpai/status/1901323062949159354
# Alternatively, when batching terms that target different sources/plugins, a prefix (e.g. an id or name) is often required so the right plugin can be inferred
# The prefix is plugin-specific (refer to plugin_name/__init__.py :: mx_is_supported)
mx-scraper fetch --meta-only -v img:https://www.google.com https://mto.to/series/68737
mx-scraper fetch --meta-only -v nh:177013
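The inference above relies on each plugin's `mx_is_supported` hook in `plugin_name/__init__.py`. A minimal sketch of such a module, using the `img:` prefix from the example; note that only `mx_is_supported` is named in this README, so the helper below is an illustrative assumption, not the actual plugin API:

```python
# Hypothetical sketch of a plugin module (plugin_name/__init__.py).
# Only mx_is_supported is documented above; strip_prefix is illustrative.

PREFIX = "img:"

def mx_is_supported(term: str) -> bool:
    """Tell the engine whether this plugin can handle the given term."""
    return term.startswith(PREFIX) or term.startswith("http")

def strip_prefix(term: str) -> str:
    """Remove the plugin prefix, leaving the raw URL or id."""
    return term[len(PREFIX):] if term.startswith(PREFIX) else term
```

With this shape, `img:https://www.google.com` routes to this plugin, while `nh:177013` is rejected and left for another plugin to claim.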
Commands
mx-scraper engine
Usage: mx-scraper <COMMAND>
Commands:
fetch Fetch a sequence of terms
fetch-files Fetch a sequence of terms from a collection of files
request Request a url
infos Display various information
server Spawn a graphql server interfacing mx-scraper
help Print this message or the help of the given subcommand(s)
Options:
-h, --help Print help
Each fetch strategy shares the same configuration.
Features

- CLI
  - Fetch a list of terms
  - Fetch a list of terms from a collection of files
  - Generic URL request
  - Print as text
  - Download (`--dest` flag)
  - Authentication (Basic, Bearer token)
- Cookies
  - Loading from a file (Netscape format, key-value)
  - Loading from the config (key-value)
- Http Client/Downloader
  - Support of the older mx-scraper book schema
  - Download
  - Cache support (can be disabled with `--no-cache` or from the config)
  - Configurable Http client (default, Flaresolverr, cfworker)
- Plugins
  - Python plugin
    - `MxRequest` with runtime context (headers, cookies, auth)
  - gallery-dl extractors
  - Subprocess (e.g. imgbrd-grabber)
- Send context from an external source (e.g. browser)
  - Cookies, UA (through `--listen-cookies`, which opens a callback URL that can receive a `FetchContext` object)
  - Rendered HTML page
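The Netscape cookie format mentioned above is the classic tab-separated `cookies.txt` layout, which Python's standard library can parse directly. A small sketch of loading such a file (the file contents here are a made-up example):

```python
import http.cookiejar
import os
import tempfile

# Example Netscape-format cookie file: one header line, then
# tab-separated fields: domain, flag, path, secure, expires, name, value.
netscape_text = (
    "# Netscape HTTP Cookie File\n"
    ".example.com\tTRUE\t/\tFALSE\t2147483647\tsession\tabc123\n"
)

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write(netscape_text)
    path = f.name

# MozillaCookieJar understands the Netscape cookies.txt format.
jar = http.cookiejar.MozillaCookieJar(path)
jar.load()
cookies = {c.name: c.value for c in jar}
os.unlink(path)
print(cookies)  # {'session': 'abc123'}
```

This is the same file format that browser extensions typically export, so a file dumped from a logged-in browser session can be handed to the cookie option directly.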
GraphQL server
You can also use the extractors through GraphQL queries, with the same options as in the command-line interface.
Usage: mx-scraper server [OPTIONS]
Options:
--port <PORT> Server port
-h, --help Print help
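A query can be sent to the server as a standard GraphQL-over-HTTP POST. A hedged sketch using only the standard library; the `fetch` field, its arguments, the result shape, and the `/graphql` endpoint path are assumptions for illustration, since the actual schema is defined by the server and not documented here:

```python
import json
import urllib.request

# Hypothetical query shape mirroring `mx-scraper fetch --meta-only`;
# the field and argument names are assumptions, not the documented schema.
query = """
query Fetch($terms: [String!]!) {
  fetch(terms: $terms, metaOnly: true) {
    title
    urls
  }
}
"""

payload = json.dumps({
    "query": query,
    "variables": {"terms": ["nh:177013"]},
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:8080/graphql",  # assumed endpoint; port set via --port
    data=payload,
    headers={"Content-Type": "application/json"},
)

# Uncomment once a server is running (mx-scraper server --port 8080):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

The request body is the usual `{"query": ..., "variables": ...}` JSON envelope, so any GraphQL client (curl, a browser IDE, a typed client) works the same way.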
