CharlesDexterW/TCA-genomics-MM9

Genomic Analysis Pipeline: TCA Cycle Regulatory Signatures

A Bioinformatics Toolkit for Mouse Genome (mm9) Analysis

🧬 Project Overview

This project implements a bioinformatics pipeline designed to analyze genomic sequences with a specific focus on the Tricarboxylic Acid (TCA) Cycle enzymes. The tool aims to process large-scale genomic data (FASTA/GTF) to identify regulatory signatures, calculate thermodynamic properties of DNA, and predict primer suitability for biochemical assays.

Given my background in biochemical engineering, the project focuses on bridging the gap between raw sequence data and physical properties such as melting temperature ($T_m$) and the stabilizing effect of GC content.

🛠 Features

  • Optimized Data Loading: Efficiently parses zipped FASTA files and gene annotation data using memory-mapped logic and state-machine parsing.
  • TCA Pathway Focus: Isolates the Citrate Synthase (Cs) gene to analyze its promoter region.
  • DNA Thermodynamics: Includes a $T_m$ prediction engine for genomic sequences based on nucleotide composition.
  • PCR Primer Design: Automated scanning for candidate primers within regulatory regions.
  • Biochemical Visualization: Generates sliding-window GC content plots. Plotting of regulatory signature comparisons is still in progress.
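
As an illustration of the state-machine parsing mentioned under Optimized Data Loading, here is a minimal sketch of reading FASTA records straight from a zip archive. It is not the code in LoadFASTA_Function.py: the function names `iter_fasta` and `load_zipped_fasta` are hypothetical, and the sketch streams the archive line by line rather than using memory mapping.

```python
import io
import zipfile
from typing import Dict, Iterable, Iterator, Tuple


def iter_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA lines.

    Two-state machine: either no record is open yet (header is None),
    or sequence lines are being collected for the current header.
    """
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            parts = line[1:].split()
            header = parts[0] if parts else ""  # first token, e.g. "chr1"
            chunks = []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def load_zipped_fasta(zip_path) -> Dict[str, str]:
    """Read every file in a zip archive as FASTA and return {name: sequence}."""
    sequences = {}
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            if member.endswith("/"):
                continue  # skip directory entries
            with archive.open(member) as binary:
                text = io.TextIOWrapper(binary, encoding="ascii")
                sequences.update(iter_fasta(text))
    return sequences
```

Assuming the archive from the Execution section is present, `load_zipped_fasta("selChroms_mm9.fa.zip")` would return a dictionary keyed by the first token of each header line.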

📂 Project Structure

.
├── main.py                 # Core execution pipeline
├── LoadFASTA_Function.py   # High-performance sequence/gene loaders
├── sequence_analysis.py    # GC-content, Tm prediction, and Primer design logic
├── tca_analysis.py         # Enrichment and isolation of TCA cycle enzymes
├── visualization.py        # Matplotlib/Seaborn genomic plotting logic
├── Summary.py              # Automated biochemical report generation
└── README.md               # Documentation

🚀 Installation & Usage

Prerequisites

  • OS: Ubuntu 24.04 LTS
  • Environment: Python 3.12+
  • Dependencies:
pip install numpy matplotlib pandas seaborn

Execution

  1. Ensure mm9_sel_chroms_knownGene.txt and selChroms_mm9.fa.zip are in the project root.
  2. Run the main pipeline:
python3 main.py

📊 Biochemical Insights

1. Regulatory Signatures

The pipeline analyzes the upstream regions (promoters) of TCA cycle genes. It calculates their GC content, a major determinant of duplex stability:

$$GC\%=\frac{G+C}{A+T+G+C}\times 100$$
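
The formula above can be sketched in Python as follows, together with a strand-aware way to cut out an upstream region and a sliding-window scan. This is only an illustration: the function names, the 1000 bp default promoter length, and the window and step sizes are assumptions, not values taken from sequence_analysis.py. Coordinates are assumed to be 0-based and half-open, as in UCSC tables.

```python
# Translation table for complementing DNA bases (case preserved).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(chrom_seq, tx_start, tx_end, strand, length=1000):
    """Return up to `length` bases upstream of the transcription start site.

    On the minus strand the start site is at tx_end, so the flanking
    bases are taken from there and reverse-complemented.
    """
    if strand == "+":
        start = max(0, tx_start - length)
        return chrom_seq[start:tx_start]
    if strand == "-":
        end = min(len(chrom_seq), tx_end + length)
        return reverse_complement(chrom_seq[tx_end:end])
    raise ValueError(f"unknown strand: {strand!r}")


def gc_percent(seq):
    """GC% = (G + C) / (A + T + G + C) * 100; other symbols such as N are ignored."""
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 100.0 * (counts["G"] + counts["C"]) / total


def sliding_gc(seq, window=100, step=10):
    """Return (window_start, GC%) for each full window along the sequence."""
    return [
        (i, gc_percent(seq[i:i + window]))
        for i in range(0, len(seq) - window + 1, step)
    ]
```

The list returned by `sliding_gc` can be unpacked into x and y values for a Matplotlib line plot of GC content along the promoter.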

2. Melting Temperature Prediction

The predicted $T_m$ of each sequence helps characterize the thermal denaturation profile of metabolic gene promoters.
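
This README does not state which $T_m$ model the pipeline uses. As one possible composition-based approach, the sketch below applies two widely used approximations: the Wallace rule, $T_m = 2(A+T) + 4(G+C)$, for sequences shorter than 14 nt, and $T_m = 64.9 + 41\,(G+C-16.4)/N$ for longer ones. The function name `melting_temperature` is hypothetical, and both formulas ignore salt concentration and nearest-neighbor effects.

```python
def melting_temperature(seq):
    """Estimate Tm in degrees Celsius from base composition alone.

    Assumption: basic approximations, not necessarily the model used
    by the pipeline. Symbols other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    n = at + gc
    if n == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    if n < 14:
        # Wallace rule for short oligonucleotides
        return 2.0 * at + 4.0 * gc
    # Basic formula for sequences of 14 nt or longer
    return 64.9 + 41.0 * (gc - 16.4) / n
```

For example, a 20-mer with 50% GC content gives roughly 51.8 °C under the second formula.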

3. Citrate Synthase (Cs) Analysis

A deep dive into the Cs promoter allows for the identification of potential transcription factor binding sites and the design of PCR primers for experimental verification.
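
Candidate primer scanning of a promoter region could look roughly like the sketch below. The thresholds (18 to 24 nt, 40 to 60% GC, $T_m$ between 50 and 65 °C, no homopolymer run longer than 4, a G or C at the 3' end) are common rules of thumb chosen for illustration, and the function names are hypothetical. The criteria actually used in sequence_analysis.py may differ.

```python
import re


def _gc_percent(seq):
    """Percentage of G and C among the bases of an A/C/G/T-only string."""
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def _basic_tm(seq):
    """Composition-based Tm estimate (assumed formula, valid for >= 14 nt)."""
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def find_primer_candidates(region, min_len=18, max_len=24,
                           gc_range=(40.0, 60.0), tm_range=(50.0, 65.0),
                           max_run=4):
    """Scan every window of every allowed length and keep those passing all filters."""
    region = region.upper()
    # Matches any base repeated more than max_run times in a row.
    run_pattern = re.compile(r"(.)\1{%d,}" % max_run)
    candidates = []
    for length in range(min_len, max_len + 1):
        for start in range(0, len(region) - length + 1):
            primer = region[start:start + length]
            if set(primer) - set("ACGT"):
                continue  # skip windows containing N or other symbols
            if run_pattern.search(primer):
                continue  # skip long homopolymer runs
            if primer[-1] not in "GC":
                continue  # require a GC clamp at the 3' end
            gc = _gc_percent(primer)
            tm = _basic_tm(primer)
            if gc_range[0] <= gc <= gc_range[1] and tm_range[0] <= tm <= tm_range[1]:
                candidates.append(
                    {"start": start, "sequence": primer, "gc": gc, "tm": tm}
                )
    return candidates
```

Each returned dictionary records the window start, the primer sequence, its GC% and its estimated $T_m$, so candidates can be sorted or filtered further before experimental verification.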


🎓 Author

Andrés Benjamin Garcés Cifuentes, Biochemistry Engineer specializing in Genomic Data Science and Bioprocess Modeling.


About

This pipeline aims to analyze the citric acid cycle enzymes of mm9.
