A tool for processing whole genome sequencing (WGS) data files. Supports BAM/CRAM analysis, microarray file generation, and Y/MT haplogroup determination.
- BAM/CRAM file analysis with coverage statistics and reference genome detection
- Microarray RAW file generation (23andMe, AncestryDNA, FTDNA, MyHeritage, etc.)
- Y chromosome haplogroup determination using Yleaf
- Mitochondrial haplogroup determination using Haplogrep
- Reference genome library management with automatic downloading
- MT/Y chromosome extraction to FASTA, BAM, or VCF
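The coverage statistics and reference genome detection listed above can be approximated from `samtools idxstats` output, which reports contig name, length, mapped reads and unmapped reads as tab-separated columns. The Python sketch below is a minimal illustration, not WGS Extract's own implementation. The helper names are hypothetical. It assumes an indexed BAM/CRAM and a fixed read length, and it fingerprints the build only by the length of chromosome 1 (249,250,621 bp in GRCh37/hg19, 248,956,422 bp in GRCh38/hg38).

```python
import subprocess

# Hypothetical helpers (not WGS Extract's code). Assumes an indexed BAM/CRAM.
# samtools idxstats prints tab-separated: contig, length, mapped, unmapped.

# Length of chromosome 1 used as a fingerprint for the human reference build
CHR1_LENGTHS = {
    249_250_621: "GRCh37/hg19",
    248_956_422: "GRCh38/hg38",
}


def run_idxstats(path):
    """Run samtools idxstats and return its text output."""
    result = subprocess.run(
        ["samtools", "idxstats", str(path)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_idxstats(text):
    """Map contig name -> (length, mapped reads), skipping the '*' line."""
    stats = {}
    for line in text.strip().splitlines():
        name, length, mapped, _unmapped = line.split("\t")
        if name == "*":  # reads not assigned to any contig
            continue
        stats[name] = (int(length), int(mapped))
    return stats


def guess_build(stats):
    """Guess the build from chr1's length; handles 'chr1' and '1' naming."""
    for name in ("chr1", "1"):
        if name in stats:
            return CHR1_LENGTHS.get(stats[name][0], "unknown")
    return "unknown"


def approx_depth(stats, contig, read_length=150):
    """Rough mean depth: mapped reads * read length / contig length."""
    length, mapped = stats[contig]
    return mapped * read_length / length


if __name__ == "__main__":
    # Illustrative idxstats text, not real sample data
    sample = "chr1\t248956422\t16000000\t0\nchrM\t16569\t50000\t0\n*\t0\t0\t1200\n"
    stats = parse_idxstats(sample)
    print(guess_build(stats))  # GRCh38/hg38
    print(f"chr1 ~{approx_depth(stats, 'chr1'):.1f}x")
```

This estimate ignores duplicates, clipping and variable read lengths. `samtools coverage` (available since samtools 1.10) computes a per-contig mean depth directly if you need more than a rough figure.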
- Linux, macOS, or Windows
- 8 GB RAM minimum (16+ GB recommended)
- 50 GB free disk space (200+ GB recommended for reference genomes)
- Python 3.11 or later
- samtools 1.10+
- bcftools 1.10+
- tabix 1.10+
- Java 8+ (for Haplogrep)
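The version floors above (samtools, bcftools and tabix 1.10+, Python 3.11+) can be checked before running setup. The following is a minimal Python sketch, not part of WGS Extract. It assumes each htslib-based tool prints its version on the first line of `--version` output (for example `samtools 1.17`). The function names are hypothetical. Java is only checked for presence, because `java -version` uses a different output format.

```python
import re
import shutil
import subprocess
import sys

MIN_HTSLIB_TOOLS = (1, 10)  # samtools, bcftools and tabix 1.10+


def parse_version(first_line):
    """Pull (major, minor) out of a line like 'samtools 1.17' or 'tabix (htslib) 1.17'."""
    match = re.search(r"(\d+)\.(\d+)", first_line)
    if match is None:
        raise ValueError(f"no version number in {first_line!r}")
    return int(match.group(1)), int(match.group(2))


def check_tool(name, minimum=MIN_HTSLIB_TOOLS):
    """Return a one-line status for a tool on PATH."""
    if shutil.which(name) is None:
        return f"{name}: not found on PATH"
    result = subprocess.run([name, "--version"], capture_output=True, text=True)
    lines = (result.stdout or result.stderr).splitlines()
    try:
        version = parse_version(lines[0])
    except (IndexError, ValueError):
        return f"{name}: could not read version"
    ok = version >= minimum  # tuple comparison, so (1, 9) < (1, 10)
    need = f"needs {minimum[0]}.{minimum[1]}+"
    return f"{name} {version[0]}.{version[1]}: {'ok' if ok else need}"


if __name__ == "__main__":
    py_ok = sys.version_info >= (3, 11)
    print(f"python {sys.version_info.major}.{sys.version_info.minor}: {'ok' if py_ok else 'needs 3.11+'}")
    for tool in ("samtools", "bcftools", "tabix"):
        print(check_tool(tool))
    # Java reports its version differently; only confirm it is installed
    print("java:", "found" if shutil.which("java") else "not found on PATH")
```

Comparing `(major, minor)` tuples rather than strings avoids the classic mistake of treating `"1.9"` as newer than `"1.10"`.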
Ubuntu/Debian:
sudo apt install samtools bcftools tabix default-jre python3.11 python3.11-venv
Fedora/RHEL:
sudo dnf install samtools bcftools htslib java-latest-openjdk python3.11
macOS (Homebrew):
brew install samtools bcftools htslib openjdk python@3.11
Windows:
- Python: https://www.python.org/downloads/
- samtools/bcftools: https://github.com/samtools/samtools/releases
- Java: https://adoptium.net/
Linux/macOS: ./setup.sh
Windows: setup.bat
The setup script creates a Python virtual environment, installs dependencies, and configures the bundled Yleaf tool.
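The virtual-environment part of setup can be pictured with Python's standard `venv` module. This is a hedged sketch of what such a script might do, not the contents of `setup.sh` or `setup.bat`. `create_env` and its requirements-file argument are assumptions. It also illustrates why the Ubuntu/Debian command installs `python3.11-venv`: without that package, creating an environment with pip typically fails because `ensurepip` is unavailable.

```python
import subprocess
import sys
import venv
from pathlib import Path


def create_env(env_dir, requirements=None, with_pip=True):
    """Create a virtual environment and optionally pip-install a requirements file.

    Returns the path to the environment's Python interpreter.
    """
    env_dir = Path(env_dir)
    venv.create(env_dir, with_pip=with_pip)
    # venv uses Scripts\python.exe on Windows and bin/python elsewhere
    if sys.platform == "win32":
        python = env_dir / "Scripts" / "python.exe"
    else:
        python = env_dir / "bin" / "python"
    if requirements is not None:
        if not with_pip:
            raise ValueError("installing requirements needs with_pip=True")
        # Run pip through the environment's own interpreter so packages land there
        subprocess.run(
            [str(python), "-m", "pip", "install", "-r", str(requirements)],
            check=True,
        )
    return python
```

For example, `create_env(".venv", "requirements.txt")` would create `.venv` and install into it. Both names are placeholders. Returning the interpreter path lets later steps call that interpreter directly, so no shell activation is needed.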
Linux/macOS: ./wgsextract.sh
Windows: wgsextract.bat
- Load a BAM/CRAM file from the Settings tab
- Download a reference genome matching your file (Settings → Reference Library)
- Use the Extract Data tab for microarray generation or chromosome extraction
- Use the Analyze tab for haplogroup determination
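Chromosome extraction with BAM output can be approximated with `samtools view` and a region argument. This sketch rests on three assumptions. First, the input is indexed (`.bai`/`.crai`). Second, contig names follow either UCSC style (`chrM`, `chrY`) or Ensembl/1000 Genomes style (`MT`, `Y`). Third, a reference FASTA is passed with `-T` when the input is CRAM. The helper names and candidate lists are hypothetical, FASTA and VCF output are not covered, and this is not necessarily how WGS Extract does it.

```python
import subprocess

# Hypothetical helpers; WGS Extract's own extraction code may differ.
MT_NAMES = ("chrM", "MT", "chrMT", "M")
Y_NAMES = ("chrY", "Y")


def pick_contig(contigs, candidates):
    """Return the first candidate name present in the file's contig list."""
    for name in candidates:
        if name in contigs:
            return name
    raise ValueError(f"none of {candidates} found")


def extract_argv(in_path, region, out_path, reference=None):
    """Build a samtools view command that writes one region's reads to BAM."""
    argv = ["samtools", "view", "-b", "-o", str(out_path)]
    if reference is not None:
        argv += ["-T", str(reference)]  # reference FASTA, usually needed to decode CRAM
    argv += [str(in_path), region]  # region lookup needs an index file
    return argv


def extract(in_path, region, out_path, reference=None):
    """Run the extraction; raises CalledProcessError if samtools fails."""
    subprocess.run(extract_argv(in_path, region, out_path, reference), check=True)
```

The contig list could come from the `@SQ` lines printed by `samtools view -H`. After that, `extract("sample.cram", pick_contig(contigs, MT_NAMES), "mt.bam", "ref.fa")` would write the mitochondrial reads. The file names are illustrative.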
wgsextract/
├── core/ # Configuration and logging
├── services/ # BAM analysis, microarray generation, haplogroups
├── gui/ # PyQt6 user interface
├── models/ # Data structures
└── utils/ # Utility functions
reference/ # Reference genome catalog
yleaf/ # Bundled Yleaf 3.2.1 (Y haplogroup tool)
tools/ # Haplogrep JAR file
- 23andMe (V3, V4, V5)
- AncestryDNA (V1, V2)
- FTDNA (V1, V2, V3)
- MyHeritage
- Living DNA
- National Geographic (Geno 2.0)
- CombinedKit (all available SNPs)
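As commonly seen in 23andMe raw downloads, the file is tab-separated. It begins with `#` comment lines, followed by the columns rsid, chromosome, position and genotype, with `--` marking a no-call. The sketch below assumes that layout, so verify it against a real export before relying on it. The `Call` class and function names are hypothetical. The sketch covers only the output formatting. It does not handle the harder parts of microarray generation, such as choosing each vendor's SNP list or matching its allele orientation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Call:
    """One genotype call at a microarray SNP position (hypothetical structure)."""
    rsid: str
    chrom: str
    pos: int
    genotype: Optional[str]  # e.g. "AG"; None for a no-call


def norm_chrom(chrom):
    """'chr1' -> '1', 'chrM'/'M' -> 'MT'; other names pass through unchanged."""
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    return "MT" if name in ("M", "MT") else name


def to_23andme(calls):
    """Render calls as a tab-separated, 23andMe-style raw file."""
    lines = ["# rsid\tchromosome\tposition\tgenotype"]
    for c in calls:
        genotype = c.genotype or "--"  # no-calls written as '--'
        lines.append(f"{c.rsid}\t{norm_chrom(c.chrom)}\t{c.pos}\t{genotype}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up IDs and positions, for illustration only
    calls = [Call("rs1", "chr1", 100, "AG"), Call("rs2", "chrM", 200, None)]
    print(to_23andme(calls), end="")
```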
GNU General Public License v3 or later
- Yleaf: Y-chromosomal haplogroup assignment tool (Ralf et al.)
- Haplogrep: mtDNA haplogroup classification
- htslib/samtools/bcftools: library and command-line tools for BAM/CRAM/VCF processing