A scrapy pipeline which stores files using folder trees. https://sp1thas.codeberg.page/scrapy-folder-tree

scrapy-folder-tree

This is a Scrapy pipeline that provides an easy way to store downloaded files and images using various folder structures.

Supported folder structures:

Given a scraped file named 05b40af07cb3284506acbf395452e0e93bfc94c8.jpg, you can choose one of the following folder structures:

Using the file name

class: scrapy_folder_tree.ImagesHashTreePipeline

full
├── 0
.   ├── 5
.   .   ├── b
.   .   .   ├── 05b40af07cb3284506acbf395452e0e93bfc94c8.jpg
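The nesting above takes successive leading characters of the file name as directory names. A minimal sketch of the idea (illustrative only; `depth` and `length` are assumed parameter names, not necessarily the pipeline's internals):

```python
import os

def hash_tree_path(filename: str, depth: int = 3, length: int = 1) -> str:
    """Build a nested path from the leading characters of the file name.

    Illustrative sketch; the pipeline's actual implementation may differ.
    """
    # Take `depth` slices of `length` characters each from the start of the name.
    parts = [filename[i * length:(i + 1) * length] for i in range(depth)]
    return os.path.join(*parts, filename)

print(hash_tree_path("05b40af07cb3284506acbf395452e0e93bfc94c8.jpg"))
# 0/5/b/05b40af07cb3284506acbf395452e0e93bfc94c8.jpg (on POSIX)
```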
Using the crawling time

class: scrapy_folder_tree.ImagesTimeTreePipeline

full
├── 0
.   ├── 11
.   .   ├── 48
.   .   .   ├── 05b40af07cb3284506acbf395452e0e93bfc94c8.jpg
Using the crawling date

class: scrapy_folder_tree.ImagesDateTreePipeline

full
├── 2022
.   ├── 1
.   .   ├── 24
.   .   .   ├── 05b40af07cb3284506acbf395452e0e93bfc94c8.jpg
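The time- and date-based trees derive the directory levels from the crawl timestamp: hour/minute/second in the first case, year/month/day in the second. A sketch of both (illustrative helper names, not the pipeline's internals):

```python
import os
from datetime import datetime

def time_tree_path(filename: str, now: datetime) -> str:
    # hour/minute/second nesting, as in the crawling-time example above
    return os.path.join(str(now.hour), str(now.minute), str(now.second), filename)

def date_tree_path(filename: str, now: datetime) -> str:
    # year/month/day nesting, as in the crawling-date example above
    return os.path.join(str(now.year), str(now.month), str(now.day), filename)

ts = datetime(2022, 1, 24, 0, 11, 48)
print(time_tree_path("05b40af07cb3284506acbf395452e0e93bfc94c8.jpg", ts))
print(date_tree_path("05b40af07cb3284506acbf395452e0e93bfc94c8.jpg", ts))
```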

Installation

pip install scrapy-folder-tree

Usage

Use the following settings in your project:

ITEM_PIPELINES = {
    'scrapy_folder_tree.FilesHashTreePipeline': 300
}
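Since the pipeline builds on Scrapy's standard file-handling pipelines, a storage location must also be configured. A fuller settings sketch (the `FILES_STORE` path is an example value, not a required one):

```python
# settings.py (sketch)
ITEM_PIPELINES = {
    # Any of the provided pipeline classes can be used here,
    # e.g. ImagesHashTreePipeline for image downloads.
    'scrapy_folder_tree.FilesHashTreePipeline': 300,
}

# Standard Scrapy setting: where downloaded files are written.
# The folder tree is created underneath this directory.
FILES_STORE = '/path/to/downloads'
```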