drnsmith/README.md

Hi 👋, I'm Natasha, a Quantitative Data Scientist working at the intersection of statistical modelling, machine learning, and complex systems.

My background spans business and economics, econometrics, risk analysis, and computer science. I started in business economics, moved into statistical modelling and uncertainty analysis during my PhD, and later added computer science to strengthen my engineering foundations.

I now build applied data and AI systems across the full workflow: data pipelines → modelling → evaluation → deployment.

I'm especially drawn to messy, high-stakes problems where the data is imperfect, the signal is partial, and the cost of getting it wrong is high. That includes work in healthcare, risk, forecasting, trust, and decision support.

My work typically focuses on:

  • statistical modelling, inference, and uncertainty analysis
  • machine learning across structured, unstructured, image, and time-series data
  • NLP and LLM-based systems where retrieval, evaluation, and reliability matter
  • data pipelines, analytics engineering, and reproducible workflows in Python and SQL
  • explainability, diagnostics, and decision support for real-world use

My approach is straightforward: strong statistical reasoning, practical engineering, and clear communication.

💼 Featured Projects

  • Spectral Drug Verification

    Built a verification workflow for comparing measured spectral signatures against reference compound libraries, with a focus on chemically similar classes, concentration differences, and classification reliability. A scientifically grounded project centred on signal quality, similarity structure, and decision confidence.

  • Histopathology AI for Breast Cancer Detection

    Developed deep learning pipelines for breast cancer image classification using transfer learning, class-balancing strategies, and interpretability methods. This work explores not just model performance, but the practical challenges of medical image analysis in imbalanced settings.

  • Air Quality Forecasting (PM10)

    Built forecasting models for PM10 pollution using regression and neural network approaches, including LSTM-based experiments. Focused on temporal modelling, feature design, and comparing methods for environmentally meaningful prediction.

  • Big Data Sentiment Analysis of NASDAQ Companies

    Analysed more than 4 million tweets about NASDAQ-listed firms using Hadoop, MapReduce, Hive, and NLP techniques. The project combines large-scale text processing with sentiment analysis to study public opinion in financial contexts.

  • Customer Churn Prediction & Retention Analytics

    Built an end-to-end churn modelling pipeline using XGBoost and SHAP, with an interactive analytics layer for interpreting risk drivers and supporting retention decisions. Designed to connect predictive modelling with usable business insight.

  • Analytics Data Warehouse & ETL Design

    Designed a structured analytics warehouse with ETL workflows and dimensional modelling principles to support reporting and downstream analysis. This project reflects the data engineering side of data science: clean inputs, reliable structure, and usable outputs.
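
As a rough illustration of the dimensional modelling idea behind the warehouse project, here is a minimal sketch of a star schema. It is not the project's actual design. It uses Python's built-in `sqlite3` module, and the table names (`dim_customer`, `dim_date`, `fact_orders`), columns, and sample rows are all hypothetical.

```python
import sqlite3


def build_star_schema(conn):
    """Create one fact table and two dimension tables (all names hypothetical)."""
    conn.executescript("""
        CREATE TABLE dim_customer (
            customer_key INTEGER PRIMARY KEY,
            segment      TEXT NOT NULL
        );
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY,  -- e.g. 20240115
            year     INTEGER NOT NULL,
            month    INTEGER NOT NULL
        );
        CREATE TABLE fact_orders (
            order_id     INTEGER PRIMARY KEY,
            customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key),
            date_key     INTEGER NOT NULL REFERENCES dim_date(date_key),
            amount       REAL NOT NULL
        );
    """)


def load_sample_rows(conn):
    """Load a few made-up rows: dimensions first, then the facts that point at them."""
    conn.executemany(
        "INSERT INTO dim_customer VALUES (?, ?)",
        [(1, "retail"), (2, "business")],
    )
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?, ?)",
        [(20240115, 2024, 1), (20240210, 2024, 2)],
    )
    conn.executemany(
        "INSERT INTO fact_orders VALUES (?, ?, ?, ?)",
        [(100, 1, 20240115, 40.0), (101, 2, 20240115, 250.0), (102, 1, 20240210, 60.0)],
    )
    conn.commit()


def revenue_by_segment(conn):
    """Join the fact table to a dimension and aggregate, as a report query would."""
    return conn.execute("""
        SELECT c.segment, SUM(f.amount)
        FROM fact_orders AS f
        JOIN dim_customer AS c ON c.customer_key = f.customer_key
        GROUP BY c.segment
        ORDER BY c.segment
    """).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_star_schema(connection)
    load_sample_rows(connection)
    print(revenue_by_segment(connection))
```

The split keeps measures (`amount`) in a narrow fact table and descriptive attributes in dimensions, so reporting queries reduce to joins plus aggregation.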

📝 How I Think & Where I Write

I write about systems under strain: data systems, decision systems, biological systems, and the human tendency to misunderstand all three.

My work sits somewhere between data science, AI, uncertainty, physiology, and philosophy of modern life. The themes vary, but the underlying interest is constant: complex systems, failure modes, and the gap between reality and the stories we tell about it.

I publish on Medium, Substack and LinkedIn. New connections are always welcome!


🛠 Tools & Stack

Languages

Python, R, MATLAB, SQL

ML & Data Science

TensorFlow, PyTorch, scikit-learn, Pandas, NumPy

Engineering & Infrastructure

Docker, FastAPI, Git, AWS, GCP

Visualisation & Dashboards

Plotly, Streamlit, Tableau

Popular repositories

  1. Histopathology-AI-BreastCancer (Public)

    Deep learning models for breast cancer diagnosis using histopathology images. Techniques include advanced CNN architectures, class balancing, and Grad-CAM interpretability.

    Jupyter Notebook, 1 star, 1 fork

  2. drnsmith (Public)

    Config files for my GitHub profile.

  3. ML-foundations (Public)

    Forked from jonkrohn/ML-foundations

    Machine Learning Foundations: Linear Algebra, Calculus, Statistics & Computer Science

    Jupyter Notebook

  4. Client-Server-Network-Socket-Programming (Public)

    This project builds a client-server network.

    Python

  5. SQLtoAPI-RBAC (Public)

    The purpose of this project is to create SQL queries to support functional requirements, create a business logic layer with an API for each functional requirement using NodeJS, test and demonstrate…

    JavaScript

  6. char-rnn (Public)

    Forked from karpathy/char-rnn

    Multi-layer Recurrent Neural Networks (LSTM, GRU, RNN) for character-level language models in Torch

    Lua