Hi, I'm Natasha, a Quantitative Data Scientist working at the intersection of statistical modelling, machine learning, and complex systems.
My background spans business and economics, econometrics, risk analysis, and computer science. I started in business economics, moved into statistical modelling and uncertainty analysis during my PhD, and later added computer science to strengthen my engineering foundations.
I now build applied data and AI systems across the full workflow: data pipelines → modelling → evaluation → deployment.
I'm especially drawn to messy, high-stakes problems where the data is imperfect, the signal is partial, and the cost of getting it wrong is not trivial. That includes work in areas such as healthcare, risk, forecasting, trust, and decision support.
My work typically focuses on:
- statistical modelling, inference, and uncertainty analysis
- machine learning across structured, unstructured, image, and time-series data
- NLP and LLM-based systems where retrieval, evaluation, and reliability matter
- data pipelines, analytics engineering, and reproducible workflows in Python and SQL
- explainability, diagnostics, and decision support for real-world use
My approach is straightforward: strong statistical reasoning, practical engineering, and clear communication.
-
Built a verification workflow for comparing measured spectral signatures against reference compound libraries, with a focus on chemically similar classes, concentration differences, and classification reliability. A scientifically grounded project centred on signal quality, similarity structure, and decision confidence.
-
Histopathology AI for Breast Cancer Detection
Developed deep learning pipelines for breast cancer image classification using transfer learning, class-balancing strategies, and interpretability methods. This work explores not just model performance, but the practical challenges of medical image analysis in imbalanced settings.
-
Air Quality Forecasting (PM10)
Built forecasting models for PM10 pollution using regression and neural network approaches, including LSTM-based experiments. Focused on temporal modelling, feature design, and comparing methods for environmentally meaningful prediction.
-
Big Data Sentiment Analysis of NASDAQ Companies
Analysed more than 4 million tweets about NASDAQ-listed firms using Hadoop, MapReduce, Hive, and NLP techniques. The project combines large-scale text processing with sentiment analysis to study public opinion in financial contexts.
-
Customer Churn Prediction & Retention Analytics
Built an end-to-end churn modelling pipeline using XGBoost and SHAP, with an interactive analytics layer for interpreting risk drivers and supporting retention decisions. Designed to connect predictive modelling with usable business insight.
-
Analytics Data Warehouse & ETL Design
Designed a structured analytics warehouse with ETL workflows and dimensional modelling principles to support reporting and downstream analysis. This project reflects the data engineering side of data science: clean inputs, reliable structure, and usable outputs.
I write about systems under strain: data systems, decision systems, biological systems, and the human tendency to misunderstand all three.
My work sits somewhere between data science, AI, uncertainty, physiology, and the philosophy of modern life. The themes vary, but the underlying interest is constant: complex systems, failure modes, and the gap between reality and the stories we tell about it.
I publish on Medium, Substack, and LinkedIn. New connections are always welcome!
Languages
ML & Data Science
Engineering & Infrastructure
Visualisation & Dashboards