This is our solution repo hosting the code and approach used for Mosaic'21 PS1 at Udyam, IIT BHU.
Character recognition is the identification of printed or handwritten characters in images such as book pages, handwritten notes, cheques, or letters, so that individual characters can be told apart and converted from image to text. With the growing need for character recognition models, we propose a solution based on deep learning and image processing for recognizing Devanagari characters, which form the basis of many languages.
The whole pipeline is a three-stage process:
- Preprocess the image to remove extra space and noise using image processing in OpenCV.
- Segment characters from the cropped word image using contour detection.
- Recognize each individual contour (handwritten character) using a CNN.
The captcha image is processed as follows:
- Remove extra blank regions.
- Remove noise and filter out extra lines.
- Remove the header line of the Hindi word to separate the characters.
- Segment the characters as contours.
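The blank-region and header-line steps above can be sketched with row and column projections. This is a minimal NumPy-only illustration, not the repository's OpenCV code: the function names `crop_to_ink` and `remove_header_line`, the `min_fill` threshold, and the convention that ink pixels are nonzero on a zero background are all assumptions made for the example.

```python
import numpy as np

def crop_to_ink(binary):
    """Crop a binary image (ink = nonzero) to the bounding box of its ink."""
    rows = np.flatnonzero(binary.any(axis=1))
    cols = np.flatnonzero(binary.any(axis=0))
    if rows.size == 0:
        return binary  # blank image: nothing to crop
    return binary[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

def remove_header_line(binary, min_fill=0.8):
    """Blank rows in the top half whose ink spans at least min_fill of the width."""
    out = binary.copy()
    fill = (out > 0).mean(axis=1)                       # ink fraction per row
    top_half = np.arange(out.shape[0]) < out.shape[0] // 2
    out[(fill >= min_fill) & top_half] = 0              # erase header rows only
    return out

# Synthetic word: a 2-pixel header line with two vertical strokes hanging below it.
img = np.zeros((20, 30), dtype=np.uint8)
img[5:7, 3:27] = 255
img[7:16, 8] = 255
img[7:16, 20] = 255

cropped = crop_to_ink(img)
no_header = remove_header_line(cropped)
ink_columns = np.flatnonzero(no_header.any(axis=0)).tolist()
```

Once the header rows are erased, the two strokes are no longer connected, which is what lets a later contour step treat them as separate characters.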
Rule-based character segmentation is performed using contours. Some Hindi characters, such as श and ग, are made up of more than one contour; these are removed, and the model is trained on 26 letters chosen by analysing the confusion matrix of all 36 characters (consonants). Our objective was to build a classifier that gives maximum performance on the characters given for prediction.
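The text does not say exactly how the confusion matrix was used to pick the letters. One plausible reading, sketched below purely as an assumption, is to rank classes by per-class recall and keep the best ones. The function name `pick_classes` and the toy 4-class matrix are hypothetical.

```python
import numpy as np

def pick_classes(confusion, keep):
    """Return indices of the `keep` classes with the highest per-class recall.

    confusion[i][j] = number of samples of true class i predicted as class j.
    """
    confusion = np.asarray(confusion, dtype=float)
    recall = np.diag(confusion) / confusion.sum(axis=1)
    # Stable sort on negated recall: ties keep their original class order.
    order = np.argsort(-recall, kind="stable")
    return sorted(order[:keep].tolist())

# Toy example: class 1 is often confused with class 0, so it is dropped.
toy = [
    [9, 1, 0, 0],
    [4, 5, 1, 0],
    [0, 0, 10, 0],
    [0, 3, 0, 7],
]
kept = pick_classes(toy, keep=3)
```

With the real 36x36 matrix, the same call with `keep=26` would give one candidate subset; the actual selection recorded in character.txt may have used different criteria.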
The chosen characters are listed in character.txt.
We add several checks on the expected dimensions of character blobs to select only those contours that are characters. The segmented characters are then passed through a CNN classifier for recognition.
Run locally
- Clone the project
  `git clone https://github.com/arch-raven/Hindi-Captcha-Recognition-openCV.git`
- Go to the project directory
  `cd Hindi-Captcha-Recognition-openCV`
- Install dependencies
  `pip install -r requirements.txt`
- Make predictions
  `python main.py --image_path path/to/image`

A very popular Devanagari handwritten character dataset.
Link: http://archive.ics.uci.edu/ml/datasets/Devanagari+Handwritten+Character+Dataset
Contains 92,000 images belonging to 46 Devanagari classes: 36 consonants and 10 digits (0-9). Image size: 32x32.
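Because the dataset images are 32x32, a segmented character crop presumably has to be brought to the same size before classification. The sketch below shows one way to do that with NumPy only; the function name `to_model_input`, the centered zero padding, the nearest-neighbour resize, and the scaling to [0, 1] are all assumptions, not the preprocessing confirmed by this repo.

```python
import numpy as np

def to_model_input(crop, size=32):
    """Pad a grayscale crop to a square, resize to size x size, scale to [0, 1]."""
    h, w = crop.shape
    side = max(h, w)
    # Center the crop on a zero (background) square to preserve aspect ratio.
    square = np.zeros((side, side), dtype=crop.dtype)
    top, left = (side - h) // 2, (side - w) // 2
    square[top:top + h, left:left + w] = crop
    # Nearest-neighbour sampling: one source pixel per output pixel.
    idx = np.arange(size) * side // size
    resized = square[np.ix_(idx, idx)]
    return resized.astype(np.float32) / 255.0

# A tall, narrow all-ink crop ends up as a vertical band in the 32x32 input.
crop = np.full((48, 20), 255, dtype=np.uint8)
x = to_model_input(crop)
```

Padding before resizing avoids stretching narrow characters, which matters if the classifier was trained on undistorted 32x32 samples.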
[Figures: Training Accuracy | Training Loss | Validation Accuracy | Validation Loss]
- Lavish Bansal
- Aditya Kumar
- Ayush Singh






