
Hi there, I'm Mohiraa Shafreen!

I'm a bioengineer who got pulled into computational biology and never looked back. My background is wet lab (DNA extractions, PCR, microbial cultures), but somewhere along the way, I got fascinated by what you can learn from data that you simply cannot see at a bench.

I'm interested in building computational tools that read the story in that data accurately, and I'm currently looking for roles at the intersection of ML, multi-omics, and precision medicine.

Connect with me on LinkedIn: Mohiraa Shafreen | Check out my publications on Google Scholar | For research or project collaborations, feel free to reach out at [email protected] | 📄 Download CV

📚 Background

  • Bioengineer with a Gold Medal (B.Tech Biotechnology) | M.Tech First Class | Bioinformatics Industrial Internship
  • Published 8 peer-reviewed articles | 190+ citations
  • Integrated expertise in computational workflows and wet-lab work (both sides of the bench 😉)

What I've been building

Over the past few months, I built a 10-project ML portfolio covering the techniques that keep showing up in computational biology research. I started from scratch, worked through the projects in order, and tried to actually understand each method rather than just run the code.

| Project | What I learned | Techniques | Why I built it |
|---|---|---|---|
| Heart Disease EDA | How to actually read a dataset before touching a model | pandas, seaborn, statistical analysis, visualisation | Most tutorials skip straight to modelling; I wanted to get this part right first |
| Diabetes Data Cleaning | Real medical data is messy, and cleaning it properly takes longer than modelling | Missing-data imputation, IQR outlier capping, feature engineering, scaling | Dirty data breaks everything downstream, and I wanted to understand how to fix it properly |
| Cancer Risk Classification | When the simplest model wins, and why that is not a failure | Logistic Regression, Random Forest, XGBoost, AUC-ROC, cross-validation | I needed to understand the core classification algorithms and how to evaluate them honestly |
| Survival Analysis | Time-to-event modelling follows an entirely different logic from classification | Kaplan-Meier, log-rank test, Cox Proportional Hazards, C-index | This comes up constantly in clinical research, and I had no idea how it worked |
| Customer Segmentation | Finding structure in data without being told what to look for | K-Means, Elbow Method, Silhouette Score, PCA | Unsupervised learning is everywhere in omics research, and I had never properly done it |
| Gene Expression Clustering | RNA-seq data has its own preprocessing rules, and skipping them breaks everything | Log transformation, variance selection, hierarchical clustering, heatmaps | I work with this kind of data and wanted to understand the pipeline from raw counts to clusters |
| Explainable AI with SHAP | A model nobody can explain is a model nobody will trust or use | TreeExplainer, beeswarm and waterfall plots, bootstrap stability | Interpretability matters a lot in clinical contexts, and I wanted to go beyond feature importance |
| Counterfactual Explanations | Turning a risk score into something a person can actually act on | Actionable counterfactuals, diverse CF generation | SHAP tells you why. Counterfactuals tell you what to change. Both matter |
| Multi-Modal Data Fusion | Genomic, microbiome, and clinical data together tell a story none of them can tell alone | Early/late/intermediate fusion, stacking ensemble, ablation study | Multi-omics integration is the problem I most want to work on, and this is its core technical challenge |
| Transfer Learning | When your target population is small, you need a model that borrows knowledge, not one that starts blind | Neural network pre-training, layer freezing, fine-tuning, learning curves | Small and underrepresented cohorts are a real problem in genomics research, and this is one way to address it |
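The Survival Analysis row notes that time-to-event modelling works differently from classification. The key difference is censoring: subjects who leave follow-up without the event still carry information. Here is a minimal plain-Python sketch of the Kaplan-Meier product-limit estimator, S(t) = ∏ over event times tᵢ ≤ t of (1 − dᵢ/nᵢ), where dᵢ is the number of events and nᵢ the number at risk at tᵢ. It is an illustration written for this README and not code from the project repository; the function name `kaplan_meier` and the toy data are hypothetical.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Kaplan-Meier survival curve (illustrative sketch).

    durations: follow-up time for each subject
    events:    1 if the event (e.g. death) was observed, 0 if the subject was censored
    Returns a list of (time, survival_probability) pairs, one per distinct event time.
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")

    # Number of observed events at each time point
    deaths = Counter(t for t, e in zip(durations, events) if e)
    # Number of subjects leaving the risk set (event or censoring) at each time point
    exits = Counter(durations)

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the conditional probability of surviving past t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Subjects censored at t still count as at risk at t and are removed only afterwards
        at_risk -= exits[t]
    return curve


curve = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
```

For this toy cohort the estimated survival is 5/6, 2/3, 4/9 and 0 at times 1, 2, 3 and 5. The subject censored at time 4 shrinks the risk set without creating a step in the curve. Ties follow the usual convention that subjects censored at time t are still counted as at risk at t.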

💻 Technical Skills

💡 Core Bioinformatics Expertise

  • 🧬 NGS Pipelines: Quality control → Alignment → Quantification → Analysis
  • 🔬 Variant Analysis: VCF processing, annotation, population genetics
  • 🦠 Metagenomics: Taxonomic profiling, diversity analysis, phylogenetics
  • 🧬 RNA-seq Analysis: Reference-based & de novo

🔬 Wet Lab Expertise

| Domain | Techniques | Sample Types |
|---|---|---|
| Microbiology | Microbial isolation, antimicrobial screening, Monod modeling, growth kinetics | Bacterial cultures, environmental samples |
| Molecular Biology | DNA/RNA extraction, PCR, qPCR, RT-PCR | Bacteria, blood, feces, plant tissues, soil, water, fungi |
| Biochemistry | Enzymatic assays, protein quantification, metabolite extraction, purification & analysis | Cellular extracts |
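The Microbiology row lists Monod modeling and growth kinetics. The Monod model relates the specific growth rate μ to the limiting-substrate concentration S as μ = μ_max · S / (K_s + S), so growth is half-maximal when S = K_s. Below is a minimal Python sketch, assuming an idealized batch culture with a constant biomass yield Y and simple forward-Euler integration. The function names and parameter values are hypothetical and not taken from any of the projects listed here.

```python
def monod_growth_rate(substrate, mu_max, k_s):
    """Monod specific growth rate: mu = mu_max * S / (K_s + S)."""
    if substrate < 0 or mu_max < 0 or k_s <= 0:
        raise ValueError("substrate and mu_max must be >= 0, and k_s must be > 0")
    return mu_max * substrate / (k_s + substrate)


def simulate_batch(x0, s0, mu_max, k_s, yield_xs, dt=0.01, t_end=24.0):
    """Forward-Euler simulation of a batch culture (illustrative sketch).

    x0, s0:   initial biomass and substrate concentrations (e.g. g/L)
    yield_xs: biomass formed per unit substrate consumed (assumed constant)
    Returns (biomass, substrate) at t_end.
    """
    x, s = x0, s0
    for _ in range(int(round(t_end / dt))):
        mu = monod_growth_rate(s, mu_max, k_s)
        growth = mu * x * dt                  # dX = mu * X * dt
        x += growth
        s = max(s - growth / yield_xs, 0.0)   # dS = -dX / Y, clipped so S never goes negative
    return x, s


biomass, substrate = simulate_batch(x0=0.1, s0=10.0, mu_max=0.5, k_s=2.0, yield_xs=0.5)
```

With these hypothetical values (0.1 g/L inoculum, 10 g/L substrate, μ_max = 0.5 h⁻¹, K_s = 2 g/L, Y = 0.5), the substrate is essentially exhausted after 24 h and biomass ends close to x0 + Y·s0 = 5.1 g/L, as the constant-yield mass balance predicts.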

🤓 Fun Fact: When I'm not sciencing, I read a lot of books (and hoard them), watch a lot of movies, and analyze even more film plots on Letterboxd @manicindisguise (^▽^) 📊➡️🎬

PS: Currently reading Project Hail Mary by Andy Weir (microbiologists would especially LOVE it). The movie trailer blew me away… can’t wait to see if the movie is better than the book, or if the book wins out as always.

𝗨𝗽𝗱𝗮𝘁𝗲: The movie is 𝘀𝗼𝗼𝗼𝗼 𝗴𝗼𝗼𝗱. Book = Movie. No notes.

And for all the Game of Thrones fans out there, this is actually from George R. R. Martin:

“If you like a lot of science in your science fiction, Andy Weir is the writer for you.”

Popular repositories

  1. knowledge-based-ml-analysis (Public)

    Knowledge-based ML for analysing and functionally interpreting upregulated genes involved in the co-morbidity of patients undergoing maintenance hemodialysis and heart failure

    R 2

  2. single_cell (Public)

    Forked from Munfred/scbb

    Single Cell Biology and Bioinformatics

    2

  3. cancer-classification (Public)

    Trains and compares Logistic Regression, Random Forest, and XGBoost to classify breast tumours as malignant or benign. Includes ROC curves, confusion matrices, 5-fold cross-validation, and feature …

    Python 2

  4. Shaflovescoffee19 (Public)

    Your Friendly Neighbourhood Bioinfo Engineer

    1

  5. cbioportal (Public)

    Forked from cBioPortal/cbioportal

    cBioPortal for Cancer Genomics

    Java 1

  6. TCGAWorkflow (Public)

    Forked from BioinformaticsFMRP/TCGAWorkflow

    TCGA Workflow: Analyze cancer genomics and epigenomics data using Bioconductor packages

    TeX 1