Inspiration

Many students tend to copy material and simply ignore the essence of the project. This is an attempt to check how similar 2 documents are and to give a short summary of both, so that the user can understand the key points.

What it does

-The user selects 2 files, in either txt or pdf format.
-OCR is used so that the application can accept all types of PDF, by recognizing whether the content is text or images.
-Any other file type results in an error.
-After processing, it gives the degree of similarity between the 2 documents.
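The input handling described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `SUPPORTED_EXTENSIONS`, `check_supported`, and `needs_ocr`, and the `min_chars` threshold, are all hypothetical. The idea is that a PDF whose extracted text layer is nearly empty is probably a scanned image and should go through OCR.

```python
from pathlib import Path

# Assumption: only these two extensions are accepted, as described above.
SUPPORTED_EXTENSIONS = {".txt", ".pdf"}


def check_supported(path: str) -> str:
    """Return the lowercase extension, or raise ValueError for any other file type."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
    return suffix


def needs_ocr(extracted_text: str, min_chars: int = 20) -> bool:
    """Hypothetical heuristic: a near-empty text layer suggests an image-only PDF."""
    return len(extracted_text.strip()) < min_chars
```

Here `extracted_text` stands for whatever a PDF text extractor returns for the file; the extractor and the OCR engine themselves are left out, since the write-up does not name them.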

How I built it

I used several open source technologies to get the model going: OCR, concepts from NLP, and PyQt5 as the frontend technology for the project.
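The write-up does not say which similarity measure is used. As one common NLP approach, here is a minimal sketch of cosine similarity over word counts, using only the Python standard library. The function names `tokenize` and `cosine_similarity` are assumptions made for illustration, and the real project may well use a different technique.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity of the word-count vectors of two texts (0.0 to 1.0)."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Only words present in both texts contribute to the dot product.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[word] * counts_b[word] for word in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)
```

A score near 1.0 means the two documents use almost the same words in the same proportions, while a score near 0.0 means they share little vocabulary. Word counts ignore word order, so this is only a rough signal of copying.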

Challenges I ran into

-I was quite new to the field of NLP, but with the help of various YouTube tutorials and friends, I learned about these technologies and implemented them.
-The OCR part was a bit tough, as I didn't know how to do it.

Accomplishments that I am proud of

-I am happy that this works on all types of PDFs, whether image-based or searchable, and gives the correct output for each.

What I learned

-PyQt5 and various applications of NLP

What's next for Sirius

-Deploy it in various versions, such as web and app.
-Take direct input of images from a web or mobile camera to find similarity.
