Inspiration
Libraries have lots of books. UIUC has the largest library of any public university in the United States, with over 13 million volumes and 24 million items. That's a lot of books! Nitya, a Student Library Assistant at the Main Stacks, knows how difficult it is to keep all these books organized. Nitya and all other UIUC library assistants are trained to find out-of-place books and return them to their proper location. Doing so takes a lot of time and close attention to the labels on the books. Our goal was to make this process more accurate and less time-consuming.
What it does
BookSortWizard lets a user take a photo of a bookshelf. Using optical character recognition, it checks the order of the books against the Library of Congress classification system. If a book is out of place, BookSortWizard tells the user which book it is and where it belongs relative to the other books. This makes finding out-of-place books a much simpler task.
How we built it
- The website lets a user upload an image and, once processing is complete, outputs detailed instructions about any out-of-place books.
- We used OpenCV to preprocess the images and isolate the book labels from the image of the shelf, which improves the accuracy of the optical character recognition.
  - The initial image is converted to grayscale and binarized.
  - Morphological transformations are applied to the image to isolate the labels.
  - Once the positions of the labels are found, each individual label is extracted from the original grayscale image to be used for optical character recognition.
- We used PyTesseract for the optical character recognition.
  - The text is extracted from each individual preprocessed label. Because the preprocessing is imperfect, extracted regions that are not labels or do not conform to the Library of Congress system are discarded.
  - The text is stored in Label objects, with values corresponding to the lines of a Library of Congress call number.
  - A list of Label objects is created in order of position on the shelf.
- We used a merge sort algorithm to sort the books and determine the position of out-of-place books.
  - A Library of Congress label has 4 lines, with precedence decreasing from top to bottom.
  - Label objects are compared line by line in order of precedence.
  - The result is a set of instructions telling the librarian which book at which location is out of place.
Challenges we ran into
- Preprocessing the images for successful character recognition proved very challenging. Images taken in different lighting and books with different cover backgrounds were difficult to process.
- None of us were very familiar with web development, and integrating the Python scripts with the website itself was challenging.
Accomplishments that we're proud of
- All of us are beginners and relatively new to programming. For many of us, this was our first hackathon, so putting together a project at all was a big accomplishment.
- PyTesseract proved very challenging to work with. We are proud to have been able to put the optical character recognition tool to use.
What we learned
- How a hackathon works!
- Optical character recognition
- Tools used for preprocessing images
- Basic web development
What's next for BookSortWizard
- The preprocessing of images can be improved. Our system struggles with books that have a light-colored cover.
- We would like to support other classification systems for books. For instance, the Dewey Decimal System is often used instead of the Library of Congress system. A conversion tool between different classification systems could also prove helpful.
- Running BookSortWizard as a mobile application instead of a website would likely make it easier to use and more effective.