Inspiration
The inspiration was the growing need to automatically extract useful information from unstructured text. As digitized documents become more important, identifying key entities such as names, locations, organizations, and dates can help automate tasks in domains such as healthcare, finance, and law. This project highlights and extracts such entities using several NLP libraries in Python.
What it does
This project uses NER to extract meaningful entities from a document, such as person names, locations, organizations, dates and times, monetary values, and percentages. These entities are then highlighted or tagged to provide structure and insight into the text.
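As a rough illustration of the tagging step, here is a minimal sketch that wraps character spans in inline markers. The function name `tag_entities`, the `[text](LABEL)` marker style, and the hand-written spans are all assumptions made for this example, not the project's actual output format.

```python
from typing import List, Tuple

# (start_char, end_char, label); this span layout is an assumption for the sketch
Span = Tuple[int, int, str]


def tag_entities(text: str, spans: List[Span]) -> str:
    """Wrap each entity span in a [text](LABEL) marker."""
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            continue  # skip a span that overlaps one already emitted
        pieces.append(text[cursor:start])
        pieces.append(f"[{text[start:end]}]({label})")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


# Hand-written spans for illustration; a real run would take them from an NER model.
text = "Alice paid $20 in Paris."
spans = [(0, 5, "PERSON"), (11, 14, "MONEY"), (18, 23, "GPE")]
print(tag_entities(text, spans))
# prints: [Alice](PERSON) paid [$20](MONEY) in [Paris](GPE).
```

Working with character offsets rather than token indices keeps the tagging step independent of whichever tokenizer produced the entities.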
How we built it
I used multiple Python libraries and techniques, including spaCy for fast, pre-trained NER tagging and NLTK for tokenization and preprocessing.
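The spaCy side of this can be sketched roughly as follows. The model name `en_core_web_sm` and the helper names `load_spacy_pipeline` and `extract_entities` are assumptions for illustration; the write-up does not say which model or wrapper code was used.

```python
from typing import Any, Callable, List, Tuple

# (entity text, label, start_char, end_char)
Entity = Tuple[str, str, int, int]


def load_spacy_pipeline(model_name: str = "en_core_web_sm") -> Any:
    """Load a pre-trained spaCy pipeline.

    Assumes spaCy and the named model are installed, e.g. via
    `python -m spacy download en_core_web_sm`.
    """
    import spacy  # imported lazily so extract_entities can be used without spaCy

    return spacy.load(model_name)


def extract_entities(text: str, nlp: Callable[[str], Any]) -> List[Entity]:
    """Run the pipeline and collect the entities it found with their offsets."""
    doc = nlp(text)
    return [(ent.text, ent.label_, ent.start_char, ent.end_char) for ent in doc.ents]
```

With spaCy and the model installed, `extract_entities(some_text, load_spacy_pipeline())` returns one tuple per detected entity. The labels depend on the model that is loaded.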
Challenges we ran into
Integrating different libraries and handling model compatibility was a challenge. Dealing with noisy or informal text was also difficult.
Accomplishments that we're proud of
I successfully combined different NER tools and their outputs, and built a clean, user-friendly interface for extracting information.
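One possible way to combine the outputs of two NER tools is sketched below. The write-up does not describe the actual merging rule, so the "primary tool wins on overlap" policy and the name `merge_spans` are assumptions chosen for simplicity.

```python
from typing import List, Tuple

# (start_char, end_char, label)
Span = Tuple[int, int, str]


def merge_spans(primary: List[Span], secondary: List[Span]) -> List[Span]:
    """Keep every primary span; add secondary spans that overlap none of the kept ones."""
    merged = list(primary)
    for start, end, label in secondary:
        overlaps = any(
            start < kept_end and kept_start < end
            for kept_start, kept_end, _ in merged
        )
        if not overlaps:
            merged.append((start, end, label))
    return sorted(merged)


# Hypothetical outputs from two tools run on the same text.
tool_a = [(0, 5, "PERSON")]
tool_b = [(0, 5, "ORG"), (18, 23, "GPE")]
print(merge_spans(tool_a, tool_b))
# prints: [(0, 5, 'PERSON'), (18, 23, 'GPE')]
```

A priority rule like this is easy to reason about. Voting across tools or mapping labels to a shared scheme would be alternatives when the tools disagree often.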
What we learned
I learned about the strengths and limitations of NER tools, how different models perform depending on the domain and language, and the importance of preprocessing and data cleaning before applying NER.
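A minimal sketch of the kind of cleaning that can be applied before NER is shown below. The specific steps (Unicode normalization, URL removal, whitespace collapsing) and the name `clean_text` are assumptions for this example, not the project's actual pipeline.

```python
import re
import unicodedata


def clean_text(text: str) -> str:
    """Light cleanup for noisy text before it is passed to an NER model."""
    # Fold compatibility characters (non-breaking spaces, full-width letters).
    text = unicodedata.normalize("NFKC", text)
    # Replace URLs with a space so they are not mistaken for entities.
    text = re.sub(r"https?://\S+", " ", text)
    # Collapse runs of whitespace, including newlines, into single spaces.
    text = re.sub(r"\s+", " ", text)
    # Case is deliberately left alone: capitalization is often a cue for NER models.
    return text.strip()


print(clean_text("Visit   Paris\n now  https://example.com"))
# prints: Visit Paris now
```

Because cleaning changes character offsets, entity spans found afterwards refer to the cleaned text rather than the original document.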
What's next for Named Entity Recognition
Implement entity linking to external databases such as Wikidata or Wikipedia to enhance contextual understanding.
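As a starting point for that linking step, here is a hedged sketch built around Wikidata's `wbsearchentities` search action. The helper names `build_search_url` and `parse_candidates` are hypothetical, and picking the right candidate for a mention (the actual disambiguation) is left out.

```python
from typing import Any, Dict, List, Tuple
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_search_url(mention: str, language: str = "en", limit: int = 3) -> str:
    """Build a Wikidata entity-search URL for an extracted mention."""
    params = {
        "action": "wbsearchentities",
        "search": mention,
        "language": language,
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def parse_candidates(payload: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Pull (entity id, label) pairs out of a decoded search response."""
    return [(item["id"], item.get("label", "")) for item in payload.get("search", [])]
```

The URL can be fetched with `urllib.request.urlopen` and decoded with `json.load`, then handed to `parse_candidates`. Candidates come back as Wikidata identifiers that could be stored next to each extracted entity.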