ImgTorch: A Lightweight Image Dataset Loader for PyTorch

ImgTorch is a small image importer and preprocessor for classification tasks in PyTorch. It reads both standard and RAW image formats, applies the same resize-and-pad preprocessing to every image, and builds, saves, and previews datasets with only a handful of dependencies.

Key Features

Directory-based class labeling: each subfolder is one class
Supports RAW and standard formats: .jpg, .png, .cr2, .nef, .dng, etc.
Aspect-preserving resize and padding to uniform shape
Converts to PyTorch tensors ready for training
Save/load dataset as .pt files for fast reuse
Live previews via matplotlib and terminal-friendly ASCII art
Graceful handling of unreadable or corrupted files
Minimal dependencies: PyTorch, Pillow, rawpy, matplotlib, and tqdm

Folder Structure

Your dataset should be organized by class subfolders:

your_dataset/
├── ClassA/
│   ├── iMg1.jpg
│   ├── ige2.cr2
│   ├── imG3.cr3
│   └── imag4.png
├── ClassB/
│   ├── imAg5.cr2
│   ├── imag6.nef
│   └── imge7.jpeg
└── ClassC/
    ├── img8.dng
    └── img9.jpeg
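
To make the labeling rule concrete, here is a minimal sketch of how a layout like the one above can be turned into (path, label) pairs. It is not ImgTorch's own code: the function name scan_classes and the IMAGE_EXTS set are hypothetical, and the label is assumed to be the position of the subfolder in the class list.

```python
from pathlib import Path

# Extensions from the feature list and folder example above; extend as needed.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".cr2", ".cr3", ".nef", ".dng"}


def scan_classes(base_dir, class_dirs):
    """Return (path, label) pairs; label is the subfolder's index in class_dirs."""
    samples = []
    for label, name in enumerate(class_dirs):
        # Sort for a reproducible order, and ignore anything that is not an image.
        for path in sorted((Path(base_dir) / name).iterdir()):
            if path.is_file() and path.suffix.lower() in IMAGE_EXTS:
                samples.append((path, label))
    return samples
```

Calling scan_classes("your_dataset", ["ClassA", "ClassB", "ClassC"]) would then label every file under ClassA as 0, ClassB as 1, and ClassC as 2. The extension check is case-insensitive, so IMG1.JPG and img1.jpg are treated alike.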

Getting Started

1. Initialize

from imgtorch import ImgTorch

imp = ImgTorch(
    baseDir="your_dataset",
    classDir=["ClassA", "ClassB", "ClassC"]
)

2. Load and Preprocess

imp.collect_images()                          # Scan all images
imp.shuffle_images()                          # Optional: randomize order
imp.process_images(imageSize=(128, 256))      # Load, resize, convert to tensor

3. Preview

imp.preview_images(max_images=6)         # Matplotlib preview
imp.preview_ASCII(count=3, contrast=1.2) # Terminal-friendly ASCII visualization
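
For readers curious how a terminal preview of this kind can work, the sketch below maps brightness values to characters. It illustrates the general technique only and is not ImgTorch's implementation: the to_ascii name, the RAMP string, and the reading of contrast as a stretch around mid-gray are all assumptions.

```python
RAMP = " .:-=+*#%@"  # characters ordered from dark to bright


def to_ascii(rows, contrast=1.0):
    """Map rows of 0-255 brightness values to lines of ASCII characters."""
    lines = []
    for row in rows:
        chars = []
        for value in row:
            # Stretch around mid-gray, then clamp back into the 0-255 range.
            v = (value - 127.5) * contrast + 127.5
            v = min(255.0, max(0.0, v))
            chars.append(RAMP[int(v / 255 * (len(RAMP) - 1))])
        lines.append("".join(chars))
    return "\n".join(lines)
```

Feeding it a downsampled grayscale image, one list of pixel values per row, yields a string that can be printed directly in a terminal.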

4. Save and Use

imp.save_dataset("dataset.pt")    # Save tensors to disk
X, Y = imp.get_dataset()          # Retrieve processed data
print(X.shape, Y.shape)
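
Once X and Y are available, they can be batched for training with standard PyTorch utilities. This is a minimal sketch that assumes X is a float tensor of shape (N, C, H, W) and Y holds one integer label per image. The README does not state the exact shapes, so the random stand-in tensors below are illustrative only.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in tensors; in practice these would come from imp.get_dataset().
# The shapes here are an assumption, not ImgTorch's documented output.
X = torch.rand(10, 3, 256, 128)
Y = torch.randint(0, 3, (10,))

# TensorDataset pairs each image with its label; DataLoader batches and shuffles.
loader = DataLoader(TensorDataset(X, Y), batch_size=4, shuffle=True)

for images, labels in loader:
    print(images.shape, labels.shape)
```

With 10 samples and a batch size of 4, the loop runs three times and the last batch holds the remaining 2 samples.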

Additional Notes

RAW formats are decoded using rawpy and converted to RGB using Pillow.
Aspect ratio is preserved by shrinking each image with Pillow's thumbnail() and then padding the result, centered, to the target size.
Corrupted or unreadable files are skipped and listed.
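
The resize-and-pad and skip-on-failure behavior described in these notes can be sketched with Pillow alone. This is an approximation under stated assumptions, not ImgTorch's code: the names resize_and_pad and load_all are hypothetical, size is taken to be (width, height) as in Pillow, the padding color is assumed to be black, and RAW decoding through rawpy is left out.

```python
from PIL import Image


def resize_and_pad(img, size=(128, 256), fill=(0, 0, 0)):
    """Fit img inside size (width, height) and center it on a padded canvas."""
    img = img.convert("RGB")   # returns a copy, so the caller's image is untouched
    img.thumbnail(size)        # in place; keeps aspect ratio and never enlarges
    canvas = Image.new("RGB", size, fill)
    left = (size[0] - img.width) // 2
    top = (size[1] - img.height) // 2
    canvas.paste(img, (left, top))
    return canvas


def load_all(paths, size=(128, 256)):
    """Return (images, skipped): processed images plus the paths that failed."""
    images, skipped = [], []
    for path in paths:
        try:
            with Image.open(path) as img:
                images.append(resize_and_pad(img, size))
        except OSError:        # covers missing, truncated, and unidentified files
            skipped.append(path)
    return images, skipped
```

Because thumbnail() only shrinks, an image smaller than the target keeps its original pixels and is simply surrounded by padding. Collecting failed paths in a list rather than raising lets a long import finish and report the problem files at the end.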

Dependencies

Install with:

pip install torch torchvision pillow rawpy matplotlib tqdm

Author

Jesse Hng, 2025
A practical tool for quick dataset preparation in terminal or notebook environments.
