The final names dataset is compiled from three sources:
- https://www.acko.com/health-insurance/s/pregnancy/baby-names/modern-indian-baby-names-for-boys-and-girls-with-meanings/
- https://www.kaggle.com/datasets/jasleensondhi/indian-names-corpus-nltk-data/data
- https://www.kaggle.com/datasets/meemr5/indian-names-boys-girls

The data from the three sources above is combined and saved as names.txt in the data folder. The processing script is in the notebook prepare_names_data.qmd.

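
As a rough illustration of the combine-and-save step described above, here is a minimal sketch. It is not the code from prepare_names_data.qmd: the function names (`normalize`, `combine_names`, `save_names`) and the cleaning rules (lowercasing, keeping alphabetic names only, de-duplicating) are assumptions made for the example.

```python
from pathlib import Path


def normalize(name):
    """Lowercase a raw name and strip surrounding whitespace (assumed cleaning rule)."""
    return name.strip().lower()


def combine_names(*sources):
    """Merge several iterables of raw names into a sorted, de-duplicated list."""
    merged = set()
    for source in sources:
        for raw in source:
            name = normalize(raw)
            # Assumption: keep purely alphabetic names, drop empty or noisy entries.
            if name.isalpha():
                merged.add(name)
    return sorted(merged)


def save_names(names, path):
    """Write one name per line, creating the parent folder if it does not exist."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(names) + "\n", encoding="utf-8")
```

With the three sources loaded into lists, a call such as `save_names(combine_names(acko_names, nltk_names, kaggle_names), "data/names.txt")` would produce the combined file; the three list names here are hypothetical placeholders.
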
This work is part of learning to build neural networks, with a focus on natural language processing (NLP), from the following sources:
- Neural Networks: Zero to Hero series by Andrej Karpathy on YouTube.
- Notes, assignments, and video lectures from the Stanford course CS224N: Natural Language Processing with Deep Learning (Spring 2024).
- Early-release version of the book Hands-On Machine Learning with Scikit-Learn and PyTorch by Aurélien Géron, published by O’Reilly.