biomystery/README.md

👋 Hi, I'm Frank (Zhang Cheng)

Director, Bioinformatics · CAR-T Cell Therapy · San Diego 🌊

Building the data infrastructure that turns cells into cures.



🧬 About Me

Computational biologist and bioinformatics leader with 10+ years spanning genomics, epigenomics, single-cell multi-omics, and therapeutic data science. I build the pipelines, platforms, and AI-powered tools that make biological data reproducible and actionable, from academic research to clinical CAR-T development.

🔭 Current focus  →  CAR-T manufacturing QC | Multi-omics | LLM tooling
🌱 Growing in     →  Data engineering (Dagster/dbt) | AI-assisted pipelines
💬 Ask me about   →  scRNA-seq · snATAC-seq · Snakemake · R Shiny · Cell therapy
📍 Based in       →  San Diego, CA

πŸ› οΈ Tech Stack

Genomics & Bioinformatics
Python R MATLAB Shell Jupyter Nextflow

Data Engineering & Web
TypeScript SQL Django Docker Dagster dbt AWS

AI & Agentic
Claude LLM Obsidian


🌟 Highlighted Public Work

🔬 atacCNV

CNV calling from bulk ATAC-seq data in R.

Production RNA-seq pipeline: Snakemake + Kallisto.

Mathematical modeling of innate immune signaling (MATLAB/ODE).

Python microservice bridging LLMs and biomedical research workflows.


🏒 Industry Work @ Sonoma Biotherapeutics

Director of Bioinformatics building the company’s full data infrastructure for CAR-T cell therapy, from raw NGS to clinical dashboards, across 14+ internal repos.

| System | Description | Stack |
| --- | --- | --- |
| CMC Data Platform | Full-stack manufacturing analytics: ETL, Prisma/DBML schema, SDK, web frontend for CAR-T QC reporting | Python · TypeScript |
| Clinical LIMS | Sample tracking system (v0.15.2, 17 releases) with AI module, Dockerized | TypeScript · Docker |
| Clinical Data Pipelines | Dagster + dbt orchestrated pipelines for clinical trial data; R and Python variants | Python · R · Dagster · dbt |
| Patient Safety Monitor | R Shiny app for real-time safety monitoring with EDC and ADaM datasets | R · Shiny |
| Internal Datahub | Data sharing portal with Nextflow NGS pipelines and reproducible dataset registry | Nextflow · Shell |
| Translational Omics | scRNA-seq + CAR construct integration, Treg fingerprinting, clinical reports | R · Jupyter |
| Cloud Bioinformatics | AWS bioinformatics environment and document data extraction automation | Python · AWS |

🔒 All repositories above are private. Details available upon request.


πŸ›οΈ Academic Research @ Bing Ren Lab Β· UCSD (epigen-UCSD)

Contributed to the Bing Ren epigenomics lab at UCSD, building core NGS infrastructure and analysis pipelines for chromatin accessibility and gene regulation research.

| Project | Description | Stack |
| --- | --- | --- |
| Lab Management System | Django-based web app for NGS job tracking, barcode validation, sequencing status, and run management | Python · Django |
| ATAC-seq Pipeline | Production ATAC-seq processing pipeline (6 ★) | Python · Shell |
| ChIP-seq Pipeline | ChIP-seq analysis pipeline with BDS workflow engine | Java · Shell |
| BCL Demux QC | Parser for bcl2fastq undetermined barcodes (Illumina sequencing QC) | Python |
| snATAC Tools | Index designer, preprocessing pipeline, and motif analysis for single-nucleus ATAC-seq | Python · Shell · R |
| Hi-C Scripts | Processing scripts for Hi-C 3D genome data (3 ★) | Jupyter · Shell |
| Nat. Genetics 2021 | Analysis notebooks for peer-reviewed publication (4 ★) | Jupyter |
| C Bioinformatics Tools | Adapted lsgkm (gkm-SVM) and kentUtils for regulatory sequence analysis | C |

Work spans 2017–2021. Repos are public at github.com/epigen-UCSD.


🤖 Side Projects @ LolipopAI

Personal AI tooling lab: building agentic productivity systems with Claude AI, TypeScript, and Obsidian.

| Project | Description | Stack |
| --- | --- | --- |
| Job Applier Skill Suite | Claude Code agentic skill: CV → tailored resume PDF + cover letter + JD match + interview prep, fully automated | Shell · Claude AI |
| Job Seeker Copilot | Full-stack TypeScript UI for end-to-end job application management | TypeScript |
| AI-Powered Second Brain | Obsidian Cognitive OS: AI-collaborative PKM with Cursor rules. Published EN & CN editions (MIT) | TypeScript · Python · Obsidian |
| Journal-to-Graph | Python pipeline: journal entries → knowledge graph | Python |

🔗 github.com/lolipopai


🔒 Personal Private Research

~40 additional private repos: personal multi-omics clinical projects, systems biology manuscripts, and AI/LLM tooling experiments.


📊 GitHub Stats

Frank's GitHub Stats Top Languages


Open to collaborations in bioinformatics tooling, cell therapy data science, and AI × genomics

Pinned

  1. epigen-UCSD/epigen_ucsd_django Public

    Python 1 1

  2. CA-HIV-infection-model Public

    A stochastic agent-based model to demonstrate two types of HIV transmission.

    Python 2

  3. epigen-UCSD/atac_seq_pipeline Public

    Python 6

  4. epigen-UCSD/bcl2fastq_undetermined_parser Public

    Parse the bcl2fastq results and extract the undetermined barcode info.

    Python 1 1

  5. epigen-UCSD/chipseq_pipeline Public

    Java 1

  6. research_ai_service Public

    Python