Data Scientist | NLP & LLM Engineer | NYU M.S. Data Science '26
📍 New York, NY · 📧 [email protected] · 💼 linkedin.com/in/deepali-bk · 🌐 Portfolio
I'm a Data Science graduate student at New York University (GPA: 3.6) with a background in software engineering and machine learning. My work sits at the intersection of NLP, LLMs, and production ML systems, from processing millions of unstructured documents to evaluating emotional intelligence in large language models.
I've built classifiers, information extraction pipelines, and multilingual NLP systems across healthcare, enterprise, and research domains. I care about models that work in the real world, not just on benchmarks.
- 🔬 Graduate Research Assistant @ NYU Rory Meyers College of Nursing
- 🏆 Violet Internship & Research Award 2025
- 🌟 Women in Data Science Ambassador 2026
Languages & Tools
Python · R · SQL · C++ · Git
Machine Learning & Deep Learning
Scikit-learn · PyTorch · TensorFlow · XGBoost · Flair · FastText · Computer Vision
NLP & Generative AI
HuggingFace Transformers · LangChain · spaCy · BERT · RoBERTa · GPT-4 · RAG Systems
Sentiment Analysis · Topic Modeling · Named Entity Recognition · Few-shot Prompting
Data & Visualization
Pandas · NumPy · Matplotlib · Seaborn · Tableau · Spark · Hadoop (HDFS)
Statistics
Statistical Modeling · Bayesian Methods · A/B Testing · Hypothesis Testing
Production-grade ML pipeline combining BeautifulSoup + XGBoost with rule-based heuristics for HTML quality detection.
- Achieved 0.98 precision and 0.92 F1 score
- Built a GPT-4.1-mini few-shot labeling pipeline to expand an imbalanced seed dataset
- Eliminated manual QA bottlenecks entirely
Python · XGBoost · BeautifulSoup · GPT-4 · Few-shot Learning · ML Pipelines
Benchmarked emotional intelligence capabilities of Gemma, Qwen, and Llama using zero-shot and few-shot prompt engineering.
- Demonstrated above-chance performance across all three models
- Leveraged LangChain and HuggingFace for evaluation orchestration
LangChain · HuggingFace · Prompt Engineering · Zero-shot · Few-shot · LLM Evaluation
- Applied statistical modeling to survey data from 90+ countries to identify COVID-19 healthcare patterns
- Built an interactive Tableau dashboard for the organization's public-facing website
- Built a BERT-based multi-class classifier that achieved 87% accuracy on 5,000+ survey responses about violence against medical professionals
- Analyzed 21,000+ survey responses across 40+ languages using multilingual RoBERTa
- Applied sentiment analysis and topic modeling to PPE usage data collected during COVID-19
- Led the NER implementation as subject-matter expert (SME) for a Fortune 500 telecom client, extracting entities from 2M+ unstructured documents with spaCy
- Reduced manual review time by 71% through automated document processing
- Built a classification system with Flair and FastText to categorize 10,000+ reviews for C-suite strategic planning
| Degree | Institution | Year |
|---|---|---|
| M.S. Data Science | New York University | 2024–2026 |
| B.E. Electronics & Communication | B.N.M Institute of Technology | 2016–2020 |
Relevant coursework: Deep Learning · Machine Learning · Natural Language Understanding · Reinforcement Learning · Big Data · AI Applications in Business (NYU Stern)
- 🥇 Violet Internship & Research Award 2025: competitive funding for excellence in research and internship performance
- 🌟 Women in Data Science Ambassador 2026: selected to represent and promote WiDS initiatives
Always open to research collaborations and data science opportunities. Feel free to reach out!