MambaOCR: Scene Text Recognition with LoRA

This project implements an OCR system built from a ResNet34 backbone, a Mamba sequence encoder, and CTC decoding. It uses adapter-based fine-tuning to train the vision backbone efficiently.

How it works

Preprocessing: Images are resized to 32px height, preserving aspect ratio, and padded.
Vision Backbone: Uses a pre-trained ResNet34 with injected adapters. It freezes most of the network and trains only the adapters, part of layer4, and the projection layers.
Mamba Encoder: Models the sequence of features to understand character context.
Decoding: Uses CTC to convert the sequence into text.
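
The first and last steps above can be sketched in plain Python. This is a minimal illustration, not the project's code: the function names, the right-side zero padding, and the use of index 0 as the CTC blank are all assumptions, and the decoder shown is simple best-path (greedy) decoding.

```python
from itertools import groupby

TARGET_HEIGHT = 32  # height stated in the preprocessing step
BLANK = 0           # assumption: CTC blank is class index 0


def resized_width(orig_w: int, orig_h: int, target_h: int = TARGET_HEIGHT) -> int:
    """Width that keeps the aspect ratio when the height becomes target_h."""
    return max(1, round(orig_w * target_h / orig_h))


def pad_row(row: list, target_w: int, pad_value: int = 0) -> list:
    """Right-pad one pixel row to target_w (pad side and value are assumptions)."""
    return row + [pad_value] * (target_w - len(row))


def ctc_greedy_decode(frame_ids: list, alphabet: str) -> str:
    """Collapse consecutive repeats, then drop blanks.

    Class index i (i >= 1) maps to alphabet[i - 1] because index 0 is the blank.
    """
    collapsed = [key for key, _ in groupby(frame_ids)]
    return "".join(alphabet[i - 1] for i in collapsed if i != BLANK)
```

For example, a 200x64 image would be resized to 100x32 and then padded up to the batch width, and the blank between two runs of the same class is what lets CTC emit a doubled character.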

Features

Mamba: Uses a state space model instead of an RNN for better scaling on long feature sequences.
Adapters: Fine-tunes the backbone efficiently without breaking pre-trained weights.
ResNet34: Strong baseline feature extractor.
FP16: Supports mixed precision training.

Project Structure

configs/: Configuration settings.
data/: Dataset loading and processing.
models/: Model architecture (ResNet + Mamba).
train.py: Main training script.
infer.py: Inference script.
utils.py: Metrics and decoding helpers.
gen_data.py: Synthetic data generator.

Installation

Clone the repo and install dependencies:

git clone <repo_url>
cd ocr_project
pip install -r requirements.txt

Note: mamba-ssm requires an NVIDIA GPU with CUDA.

Usage

Configuration

Check configs/config.py to set your data paths and batch size.
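
As a rough idea of what such a file might contain, here is a hypothetical sketch. Every field name and default value below is an assumption made for illustration; check the actual configs/config.py for the real names.

```python
from dataclasses import dataclass


@dataclass
class Config:
    # All field names and defaults are hypothetical placeholders.
    train_data_dir: str = "data/train"
    val_data_dir: str = "data/val"
    batch_size: int = 64
    img_height: int = 32   # matches the 32px preprocessing height
    use_fp16: bool = True  # mixed precision training


# Override only what differs from the defaults.
cfg = Config(batch_size=32)
```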

Training

Run the training script (it handles freezing automatically):

python train.py
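
For readers curious what "handles freezing" could look like, here is a minimal sketch of name-based freezing for the backbone. It is not the script's actual logic: the parameter name patterns are hypothetical, and "partial layer4" is interpreted here as the last of ResNet34's three layer4 blocks.

```python
def is_trainable(param_name: str) -> bool:
    """Decide from a backbone parameter name whether it stays trainable.

    The patterns are assumptions; real names depend on how the model
    registers its modules.
    """
    if "adapter" in param_name:
        return True
    if "proj" in param_name:
        return True
    # Assumption: only the last block of layer4 (index 2) is unfrozen.
    if param_name.startswith("layer4.2."):
        return True
    return False


def apply_freezing(backbone) -> int:
    """Set requires_grad on every parameter; return how many stay trainable.

    Works with any object exposing named_parameters(), such as a
    torch.nn.Module.
    """
    trainable = 0
    for name, param in backbone.named_parameters():
        param.requires_grad = is_trainable(name)
        trainable += int(param.requires_grad)
    return trainable
```

Keeping the rule in a small pure function makes it easy to check which parameters will be updated before starting a long training run.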

Inference

Test on images:

python infer.py

Data Generation

Use gen_data.py to create synthetic training data with random fonts and noise.

