ImgTorch is a minimal yet powerful image importer and preprocessor tailored for classification tasks in PyTorch. It supports both common and RAW image formats, applies consistent preprocessing, and enables fast dataset creation and visualization — all with minimal dependencies.
- Directory-based class labeling — each subfolder = one class
- Supports RAW and standard formats: .jpg, .png, .cr2, .nef, .dng, etc.
- Aspect-preserving resize and padding to uniform shape
- Converts to PyTorch tensors ready for training
- Save/load dataset as .pt files for fast reuse
- Live previews via matplotlib and terminal-friendly ASCII art
- Graceful handling of unreadable or corrupted files
- Minimal dependencies: Only uses PyTorch, Pillow, rawpy, matplotlib, tqdm
Your dataset should be organized by class subfolders:
your_dataset/
├── ClassA/
│ ├── iMg1.jpg
│ ├── ige2.cr2
│ ├── imG3.cr3
│ └── imag4.png
├── ClassB/
│ ├── imAg5.cr2
│ ├── imag6.nef
│ └── imge7.jpeg
├── ClassC/
│ ├── img8.dng
│ └── img9.jpeg
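A layout like the one above maps naturally onto (path, label) pairs, with the label taken from the position of the class folder in the list. The following is a minimal sketch of that idea using only the standard library. The function name `scan_class_folders` and the extension set `IMAGE_EXTS` are illustrative assumptions, not ImgTorch's actual internals.

```python
from pathlib import Path

# Hypothetical extension set for illustration; ImgTorch's real list may differ.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".cr2", ".cr3", ".nef", ".dng"}


def scan_class_folders(base_dir, class_dirs):
    """Return (path, label) pairs; label is the index of the class folder."""
    samples = []
    for label, name in enumerate(class_dirs):
        folder = Path(base_dir) / name
        if not folder.is_dir():
            continue  # skip missing class folders instead of raising
        for path in sorted(folder.iterdir()):
            # Compare the lowercased suffix so "IMG.JPG" and "img.jpg" both match.
            if path.is_file() and path.suffix.lower() in IMAGE_EXTS:
                samples.append((path, label))
    return samples
```

Calling `scan_class_folders("your_dataset", ["ClassA", "ClassB", "ClassC"])` would label every image in ClassA as 0, ClassB as 1, and ClassC as 2, regardless of how the file names are capitalized.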
from imgtorch import ImgTorch
imp = ImgTorch(
    baseDir="your_dataset",
    classDir=["ClassA", "ClassB", "ClassC"]
)
imp.collect_images() # Scan all images
imp.shuffle_images() # Optional: randomize order
imp.process_images(imageSize=(128,256)) # Load, resize, convert to tensor
imp.preview_images(max_images=6) # Matplotlib preview
imp.preview_ASCII(count=3, contrast=1.2) # Terminal-friendly ASCII visualization
imp.save_dataset("dataset.pt") # Save tensors to disk
X, Y = imp.get_dataset() # Retrieve processed data
print(X.shape, Y.shape)
- RAW formats are decoded using rawpy and converted to RGB using Pillow.
- Aspect ratio is preserved using thumbnail() and centered padding.
- Corrupted or unreadable files are skipped and listed.
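The resize-and-pad step described in the notes above can be sketched with Pillow alone. This is an illustration under stated assumptions, not ImgTorch's own code: the helper name `fit_and_pad`, the black fill color, and the (width, height) ordering of `size` (Pillow's convention) are all choices made for this example, and ImgTorch's `imageSize` argument may use a different ordering.

```python
from PIL import Image


def fit_and_pad(img, size, fill=(0, 0, 0)):
    """Shrink img to fit inside size=(width, height), then center it on a canvas."""
    # convert() returns a new image, so the caller's image is left untouched.
    img = img.convert("RGB")
    # thumbnail() resizes in place, keeps the aspect ratio, and never enlarges.
    img.thumbnail(size)
    canvas = Image.new("RGB", size, fill)
    # Offsets that center the resized image on the padded canvas.
    left = (size[0] - img.width) // 2
    top = (size[1] - img.height) // 2
    canvas.paste(img, (left, top))
    return canvas
```

Because `thumbnail()` only shrinks, an image already smaller than the target keeps its original pixels and simply receives more padding in this sketch.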
Install with:
pip install torch torchvision pillow rawpy matplotlib tqdm
Jesse Hng, 2025
A practical tool for quick dataset preparation in terminal or notebook environments.