I'm a bioengineer who got pulled into computational biology and never looked back. My background is wet lab (DNA extractions, PCR, microbial cultures), but somewhere along the way, I got fascinated by what you can learn from data that you simply cannot see at a bench.
I'm interested in building the computational tools that read that data accurately, and I'm currently looking for roles at the intersection of ML, multi-omics, and precision medicine.
Connect with me on LinkedIn: Mohiraa Shafreen | Check out my publications on Google Scholar | For research or project collaborations, feel free to reach out at [email protected] | 📄 Download CV
- Bioengineer with a Gold Medal (B.Tech Biotechnology) | M.Tech First Class | Bioinformatics Industrial Internship
- Published 8 peer-reviewed articles | 190+ citations
- Integrated expertise in computational workflows and wet lab (both sides of the bench😉)
Over the past few months, I built a 10-project ML portfolio, working through the techniques that keep showing up in computational biology research. I started from scratch, went in order, and tried to actually understand each method rather than just run the code.
| Project | What I learned | Techniques | Why I built it |
|---|---|---|---|
| Heart Disease EDA | How to actually read a dataset before touching a model | pandas, seaborn, statistical analysis, visualisation | Most tutorials skip straight to modelling. I wanted to get this part right first |
| Diabetes Data Cleaning | Real medical data is messy and cleaning it properly takes longer than modelling | Missing data imputation, IQR outlier capping, feature engineering, scaling | Dirty data breaks everything downstream and I wanted to understand how to fix it properly |
| Cancer Risk Classification | When the simplest model wins and why that is not a failure | Logistic regression, Random Forest, XGBoost, AUC-ROC, cross-validation | Needed to understand the core classification algorithms and how to evaluate them honestly |
| Survival Analysis | Time-to-event modelling has its own entirely different logic from classification | Kaplan-Meier, log-rank test, Cox Proportional Hazards, C-index | This comes up constantly in clinical research and I had no idea how it worked |
| Customer Segmentation | Finding structure in data without being told what to look for | K-Means, Elbow Method, Silhouette Score, PCA | Unsupervised learning is everywhere in omics research and I had never properly done it |
| Gene Expression Clustering | RNA-Seq data has its own preprocessing rules and skipping them breaks everything | Log transformation, variance selection, hierarchical clustering, heatmaps | I work with this kind of data and wanted to understand the pipeline from raw counts to clusters |
| Explainable AI with SHAP | A model nobody can explain is a model nobody will trust or use | TreeExplainer, beeswarm, waterfall plots, bootstrap stability | Interpretability matters a lot in clinical contexts and I wanted to go beyond feature importance |
| Counterfactual Explanations | Turning a risk score into something a person can actually act on | Actionable counterfactuals, diverse CF generation | SHAP tells you why. Counterfactuals tell you what to change. Both matter |
| Multi-Modal Data Fusion | Genomic, microbiome, and clinical data together tell a story none of them can tell alone | Early/late/intermediate fusion, stacking ensemble, ablation study | Multi-omics integration is the problem I most want to work on and this is its core technical challenge |
| Transfer Learning | When your target population is small, you need a model that borrows knowledge, not one that starts blind | Neural network pre-training, layer freezing, fine-tuning, learning curves | Small and underrepresented cohorts are a real problem in genomics research and this is how you address it |
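
To give a flavour of one technique from the table, here is a minimal sketch of IQR outlier capping (the Diabetes Data Cleaning row). It is not the project's actual code: the function name `iqr_cap`, the conventional multiplier `k=1.5`, and the sample glucose readings are all assumptions made for illustration, and it uses only the Python standard library.

```python
from statistics import quantiles


def iqr_cap(values, k=1.5):
    """Clip each value to the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # The "inclusive" method interpolates linearly between sorted data points.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Values outside the fences are pulled back to the nearest fence.
    return [min(max(v, lower), upper) for v in values]


# Hypothetical readings: 400 is an implausible outlier.
glucose = [85, 90, 95, 100, 105, 110, 400]
capped = iqr_cap(glucose)
print(capped)
```

Capping keeps the row in the dataset rather than dropping it, which matters when a medical dataset is small and every sample counts.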
- 🧬 NGS Pipelines: Quality control → Alignment → Quantification → Analysis
- 🔬 Variant Analysis: VCF processing, annotation, population genetics
- 🦠 Metagenomics: Taxonomic profiling, diversity analysis, phylogenetics
- 🧬 RNA-seq Analysis: Reference-based & De novo
| Domain | Techniques | Sample Types |
|---|---|---|
| Microbiology | Microbial isolation, antimicrobial screening, Monod modeling, growth kinetics | Bacterial cultures, environmental samples |
| Molecular Biology | DNA/RNA extraction, PCR, qPCR, RT-PCR | Bacteria, blood, feces, plant tissues, soil, water, fungi |
| Biochemistry | Enzymatic assays, protein quantification, metabolite extraction, purification & analysis | Cellular extracts |
🤓 Fun Fact: When I'm not sciencing, I read (and hoard) a lot of books, watch a lot of movies, and analyze film plots on Letterboxd @manicindisguise (^▽^) 📊➡️🎬
PS: Currently reading Project Hail Mary by Andy Weir (microbiologists would especially LOVE it). The movie trailer blew me away… can’t wait to see if the movie is better than the book, or if the book wins out as always.
𝗨𝗽𝗱𝗮𝘁𝗲: The movie is 𝘀𝗼𝗼𝗼𝗼 𝗴𝗼𝗼𝗱. Book = Movie. No notes.
If you like a lot of science in your science fiction, Andy Weir is the writer for you.