A Scrapy pipeline that stores files using folder trees.
https://sp1thas.codeberg.page/scrapy-folder-tree
scrapy-folder-tree
This is a Scrapy pipeline that provides an easy way to store files and images using various folder structures.
Supported folder structures:
Given this scraped file: 05b40af07cb3284506acbf395452e0e93bfc94c8.jpg, you can choose the following folder structures:
Using the file name
class: scrapy_folder_tree.ImagesHashTreePipeline
full
├── 0
.   ├── 5
.   .   ├── b
.   .   .   ├── 05b40af07cb3284506acbf395452e0e93bfc94c8.jpg
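The layout above nests the file under one folder per leading character of its name. A minimal sketch of that mapping, assuming a depth of three levels as in the tree shown; `hash_tree_path` is a hypothetical helper for illustration, not the pipeline's own code:

```python
from pathlib import PurePosixPath


def hash_tree_path(file_name: str, depth: int = 3, root: str = "full") -> str:
    """Hypothetical helper: one folder per leading character of the file name."""
    stem = PurePosixPath(file_name).stem
    # Assumption: the first `depth` characters each become a folder level.
    folders = list(stem[:depth])
    return str(PurePosixPath(root, *folders, file_name))


print(hash_tree_path("05b40af07cb3284506acbf395452e0e93bfc94c8.jpg"))
# full/0/5/b/05b40af07cb3284506acbf395452e0e93bfc94c8.jpg
```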
Using the crawling time
class: scrapy_folder_tree.ImagesTimeTreePipeline
full
├── 0
.   ├── 11
.   .   ├── 48
.   .   .   ├── 05b40af07cb3284506acbf395452e0e93bfc94c8.jpg
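The levels in this tree read like the hour, minute and second of the crawl (0, 11, 48). That reading is an assumption, as is the lack of zero padding. A minimal sketch with a hypothetical helper `time_tree_path`, not the pipeline's own code:

```python
from datetime import datetime
from pathlib import PurePosixPath


def time_tree_path(file_name: str, crawled_at: datetime, root: str = "full") -> str:
    """Hypothetical helper: nest the file under hour/minute/second folders."""
    # Assumption: levels are un-padded hour, minute and second.
    levels = (str(crawled_at.hour), str(crawled_at.minute), str(crawled_at.second))
    return str(PurePosixPath(root, *levels, file_name))


crawled_at = datetime(2022, 1, 24, 0, 11, 48)
print(time_tree_path("05b40af07cb3284506acbf395452e0e93bfc94c8.jpg", crawled_at))
# full/0/11/48/05b40af07cb3284506acbf395452e0e93bfc94c8.jpg
```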
Using the crawling date
class: scrapy_folder_tree.ImagesDateTreePipeline
full
├── 2022
.   ├── 1
.   .   ├── 24
.   .   .   ├── 05b40af07cb3284506acbf395452e0e93bfc94c8.jpg
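Here the levels appear to be the year, month and day of the crawl (2022, 1, 24), again without zero padding. A short sketch under that assumption; `date_tree_path` is a hypothetical name, not part of the package:

```python
from datetime import date
from pathlib import PurePosixPath


def date_tree_path(file_name: str, crawled_on: date, root: str = "full") -> str:
    """Hypothetical helper: nest the file under year/month/day folders."""
    # Assumption: levels are un-padded year, month and day.
    levels = (str(crawled_on.year), str(crawled_on.month), str(crawled_on.day))
    return str(PurePosixPath(root, *levels, file_name))


print(date_tree_path("05b40af07cb3284506acbf395452e0e93bfc94c8.jpg", date(2022, 1, 24)))
# full/2022/1/24/05b40af07cb3284506acbf395452e0e93bfc94c8.jpg
```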
Installation
pip install scrapy-folder-tree
Usage
Use the following settings in your project:
ITEM_PIPELINES = {
    'scrapy_folder_tree.FilesHashTreePipeline': 300,
}
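Scrapy's built-in files pipeline also needs a storage location (`FILES_STORE`) and items that carry a `file_urls` field. Assuming the tree pipelines extend the built-in ones and honour the same settings, a fuller `settings.py` might look like the sketch below. The `downloads` path is a placeholder.

```python
# settings.py (sketch; assumes the pipeline honours Scrapy's standard file settings)
ITEM_PIPELINES = {
    "scrapy_folder_tree.FilesHashTreePipeline": 300,
}

# Standard Scrapy setting: base directory for downloaded files (placeholder path).
FILES_STORE = "downloads"
```

For the image variants, the corresponding standard Scrapy names are `IMAGES_STORE` and the `image_urls` item field.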