teepean/WGSExtractv6

WGS Extract v6

A tool for processing whole genome sequencing (WGS) data files. Supports BAM/CRAM analysis, microarray file generation, and Y/MT haplogroup determination.

Features

  • BAM/CRAM file analysis with coverage statistics and reference genome detection
  • Microarray RAW file generation (23andMe, AncestryDNA, FTDNA, MyHeritage, etc.)
  • Y chromosome haplogroup determination using Yleaf
  • Mitochondrial haplogroup determination using Haplogrep
  • Reference genome library management with automatic downloading
  • MT/Y chromosome extraction to FASTA, BAM, or VCF
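
To illustrate the last feature, extracting a single chromosome to BAM can be pictured as a thin wrapper around `samtools view`. The sketch below only builds the command line and is not taken from the WGS Extract source. The function name `build_extract_command` and the file names are hypothetical. The region name (`chrM` or `MT`, `chrY` or `Y`) depends on the reference the file was aligned to, and the input is assumed to be indexed.

```python
def build_extract_command(alignment, region, output, reference=None):
    """Build a samtools command that copies one region into a new BAM file.

    alignment: path to an indexed BAM or CRAM file
    region:    sequence name as it appears in the file header, e.g. "chrM"
    output:    path of the BAM file to write
    reference: optional FASTA path, typically needed when reading CRAM
    """
    cmd = ["samtools", "view", "-b", "-o", str(output)]
    if reference is not None:
        # -T names the reference FASTA used to decode CRAM input
        cmd += ["-T", str(reference)]
    cmd += [str(alignment), region]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once samtools is installed.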

Requirements

System

  • Linux, macOS, or Windows
  • 8 GB RAM minimum (16+ GB recommended)
  • 50 GB free disk space (200+ GB recommended for reference genomes)

Software

  • Python 3.11 or later
  • samtools 1.10+
  • bcftools 1.10+
  • tabix 1.10+
  • Java 8+ (for Haplogrep)
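
Before running setup, it can help to confirm that the command-line tools are on the PATH and recent enough. The following is a minimal sketch, not part of WGS Extract: the helper names (`parse_version`, `tool_version`, `meets_minimum`) are hypothetical, and it assumes each tool prints its version number on the first line of `--version` output, as current samtools, bcftools, and tabix builds do.

```python
import re
import shutil
import subprocess

MIN_VERSION = (1, 10)  # minimum listed above for samtools, bcftools, tabix


def parse_version(text):
    """Return the first dotted version number in text as a tuple of ints, or None."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def tool_version(name):
    """Run `<name> --version` and parse its first output line.

    Returns None when the tool is not on the PATH or prints no version.
    """
    path = shutil.which(name)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    lines = (result.stdout or result.stderr).splitlines()
    return parse_version(lines[0]) if lines else None


def meets_minimum(version, minimum=MIN_VERSION):
    """True when a parsed version tuple is at least the required minimum."""
    return version is not None and version >= minimum
```

For example, `meets_minimum(tool_version("samtools"))` is false both when samtools is missing and when it is older than 1.10.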

Installing Dependencies

Ubuntu/Debian:

sudo apt install samtools bcftools tabix default-jre python3.11 python3.11-venv

Fedora/RHEL:

sudo dnf install samtools bcftools htslib java-latest-openjdk python3.11

macOS (Homebrew):

brew install samtools bcftools htslib openjdk python@3.11

Windows:

Installation

Linux / macOS

./setup.sh

Windows

setup.bat

The setup script creates a Python virtual environment, installs dependencies, and configures the bundled Yleaf tool.

Usage

Linux / macOS

./wgsextract.sh

Windows

wgsextract.bat

First Run

  1. Load a BAM/CRAM file from the Settings tab
  2. Download a reference genome matching your file (Settings → Reference Library)
  3. Use the Extract Data tab for microarray generation or chromosome extraction
  4. Use the Analyze tab for haplogroup determination
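
For step 2, one way to see which reference build a file was aligned to is to read the `@SQ` lines of its header (for example from `samtools view -H file.bam`) and compare the length of chromosome 1 with known assemblies. This is a sketch of the general idea under that assumption, not necessarily how WGS Extract detects the reference. The function names are hypothetical, and only the two chromosome 1 lengths for GRCh37 and GRCh38 are included.

```python
# Length of chromosome 1 in two common human assemblies
KNOWN_CHR1_LENGTHS = {
    249250621: "GRCh37 / hg19",
    248956422: "GRCh38 / hg38",
}


def parse_sq_lines(header_text):
    """Map sequence name to length using the @SQ lines of a SAM header."""
    lengths = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs
        fields = dict(
            field.split(":", 1) for field in line.split("\t")[1:] if ":" in field
        )
        if "SN" in fields and "LN" in fields:
            lengths[fields["SN"]] = int(fields["LN"])
    return lengths


def guess_build(header_text):
    """Guess the assembly from the chromosome 1 length, or return "unknown"."""
    lengths = parse_sq_lines(header_text)
    for name in ("chr1", "1"):  # UCSC-style and Ensembl-style naming
        if name in lengths:
            return KNOWN_CHR1_LENGTHS.get(lengths[name], "unknown")
    return "unknown"
```

A result of "unknown" simply means the header did not match either table entry; other assemblies and patched builds would need additional entries.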

Project Structure

wgsextract/
├── core/           # Configuration and logging
├── services/       # BAM analysis, microarray generation, haplogroups
├── gui/            # PyQt6 user interface
├── models/         # Data structures
└── utils/          # Utility functions

reference/          # Reference genome catalog
yleaf/              # Bundled Yleaf 3.2.1 (Y haplogroup tool)
tools/              # Haplogrep JAR file

Supported Microarray Formats

  • 23andMe (V3, V4, V5)
  • AncestryDNA (V1, V2)
  • FTDNA (V1, V2, V3)
  • MyHeritage
  • Living DNA
  • National Geographic (Geno 2.0)
  • CombinedKit (all available SNPs)
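
These formats are plain-text tables with one SNP per line. As a rough illustration, a 23andMe-style file uses four tab-separated columns (rsid, chromosome, position, genotype) with `#` comment lines and `--` for a no-call. The sketch below writes that layout; the function name `write_23andme_style` and the sample records in the usage note are made up for illustration, and the files WGS Extract generates may carry additional header lines. Other vendors use different column layouts.

```python
import io


def write_23andme_style(records, out):
    """Write (rsid, chromosome, position, genotype) tuples as tab-separated lines.

    A genotype of None or "" is written as the no-call marker "--".
    """
    out.write("# rsid\tchromosome\tposition\tgenotype\n")
    for rsid, chromosome, position, genotype in records:
        out.write(f"{rsid}\t{chromosome}\t{position}\t{genotype or '--'}\n")
```

For example, calling it with `io.StringIO()` and the placeholder records `[("rs0000001", "1", 1000, "AG"), ("rs0000002", "MT", 200, None)]` produces a header line followed by two data lines, the second ending in `--`.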

License

GNU General Public License v3 or later

Acknowledgments

  • Yleaf: Y-chromosomal haplogroup assignment tool (Ralf et al.)
  • Haplogrep: mtDNA haplogroup classification
  • htslib/samtools/bcftools: BAM/VCF processing libraries
