Analysis code and workflows for Fernandez-Guerra et al. 2025 — ancient metagenomics of the Kap København Formation, Greenland.
.
├── analysis/
│   ├── scripts/          # R scripts for figures and supplementary tables (01–15)
│   │   └── extra/        # Auxiliary scripts (simulations, benchmarks, sourcetracker curation)
│   ├── libs/             # Shared R helper functions
│   ├── beast2/           # BEAST2 XMLs and MCC trees (Methanoflorens stasis)
│   └── .scripts/         # Packaging utilities (stage_erda.sh)
└── wf/
    ├── analysis/
    │   ├── function/     # DART/AGP functional profiling pipeline (KEGG + CAZy)
    │   └── taxonomy/     # Taxonomic profiling + metaDMG authentication pipeline
    ├── binning/          # MAG binning quality assessment (CheckM2, GUNC, GTDB-Tk)
    └── stasis/           # Methanoflorens evolutionary stasis pipeline (phylogenomics + BEAST2)
Input data and pre-computed results are distributed via ERDA (not included in this repository).
Download and extract into the analysis/ working directory:
cd analysis
wget http://files.metagenomics.eu/2025-kapk-microbial-v2/data.tar.gz
wget http://files.metagenomics.eu/2025-kapk-microbial-v2/results.tar.gz
tar xzf data.tar.gz
tar xzf results.tar.gz

The taxonomic database used for profiling is available at: https://doi.org/n6ks
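The archives are fetched over plain HTTP and no checksums are listed here, so it can be worth confirming that each download is complete before extracting it. The following is a minimal sketch, not part of the repository: `verify_archive` is a hypothetical helper that only checks the file is non-empty and that `tar` can list its contents.

```shell
#!/usr/bin/env bash
# Sketch: confirm a downloaded .tar.gz is readable before extracting it.
# verify_archive is a hypothetical helper, not a script shipped with this repository.

verify_archive() {
    local archive="$1"
    # -s is true only for files that exist and are not empty.
    if [ ! -s "$archive" ]; then
        echo "missing or empty: $archive" >&2
        return 1
    fi
    # Listing the members forces tar to read the whole gzip stream,
    # so a truncated or corrupt download makes it exit non-zero.
    if ! tar tzf "$archive" > /dev/null 2>&1; then
        echo "unreadable archive: $archive" >&2
        return 1
    fi
    echo "ok: $archive"
}
```

After downloading, `verify_archive data.tar.gz && tar xzf data.tar.gz` extracts only when the listing succeeds; the same pattern applies to `results.tar.gz`.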
Run from the analysis/ directory in order:
| Script | Description |
|---|---|
| 01--control-analysis.R | Control sample QC |
| 02--prepare-taxonomic-data.R | Taxonomic data processing |
| 03--dmg-threshold-selection.R | Damage threshold selection |
| 04--taxonomic-profiling.R | Main taxonomic profiling |
| 05--sourcetracker.R | Source tracking |
| 06--taxonomic-mag-distribution.R | MAG distribution analysis |
| 07--agp-functional.R | DART/AGP functional profiling (KEGG + CAZy) |
| 08--virome-figures.R | Viral community analysis |
| 09--biomarkers.R | Lipid biomarker analysis |
| 10--briggs.R | Briggs damage pattern analysis |
| 11--reextractions.R | Re-extraction comparison |
| 12--summary-tables.R | Supplementary tables generation |
| 13--methanoflorens-stasis.R | Methanoflorens evolutionary stasis figure |
| generate_dart_tables.R | DART supplementary tables |
| generate_mag_table.R | MAG quality table |
| rerender_heatmap.R | Functional heatmap figure |
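Running the numbered scripts in order can be automated with a small loop. The sketch below makes two assumptions: the scripts sit in `scripts/` relative to `analysis/` (as the repository tree suggests), and `Rscript` is on `PATH`. `run_numbered_scripts` and the `RSCRIPT` override are hypothetical names introduced for illustration; the unnumbered helper scripts are deliberately not covered.

```shell
#!/usr/bin/env bash
# Sketch: run the numbered analysis scripts in lexical (and therefore numeric) order.
# Assumption: scripts are in scripts/ relative to the analysis/ directory.
# RSCRIPT can be overridden, e.g. RSCRIPT=echo to list scripts without running them.

RSCRIPT="${RSCRIPT:-Rscript}"

run_numbered_scripts() {
    local dir="${1:-scripts}"
    local script
    # The glob expands in sorted order, so 01-- runs before 02--, and so on.
    for script in "$dir"/[0-9][0-9]--*.R; do
        # If nothing matches, the pattern is left unexpanded; skip it.
        [ -e "$script" ] || continue
        echo ">>> $script" >&2
        # Stop at the first failing script so later steps do not run on missing inputs.
        "$RSCRIPT" "$script" || return 1
    done
}
```

From `analysis/`, `run_numbered_scripts scripts` runs the full sequence, and `RSCRIPT=echo run_numbered_scripts scripts` prints the order without executing anything.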
All pipelines use Snakemake (≥7.0) with conda environments.
- wf/analysis/taxonomy/: Read preprocessing, mapping to the hires taxonomic database, filterBAM filtering, and metaDMG damage authentication.
- wf/analysis/function/: DART gene prediction on ancient reads, MMseqs2 search against KEGG/CAZy databases, damage annotation via AGP, and per-sample aggregation.
- wf/binning/: CheckM2 completeness/contamination, GUNC chimerism detection, and GTDB-Tk phylogenetic classification of recovered MAGs.
- wf/stasis/: Phylogenomics pipeline for Methanoflorens: marker gene extraction, alignment, IQ-TREE ML trees, BEAST2 tip-dating, SNP matrix construction, and pN/pS calculation.
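A workflow invocation might look like the sketch below. It assumes each workflow directory contains its own `Snakefile` and that any configuration the workflow expects is already in place; neither is confirmed here, so check the workflow directory first. `run_workflow` and the `SNAKEMAKE` override are hypothetical names; the Snakemake options used (`--directory`, `--snakefile`, `--use-conda`, `--cores`, `-n`) are standard.

```shell
#!/usr/bin/env bash
# Sketch: run one workflow with Snakemake and conda-managed environments.
# Assumption: the workflow directory holds a file named Snakefile.
# SNAKEMAKE can be overridden, e.g. SNAKEMAKE=echo to print the command line.

SNAKEMAKE="${SNAKEMAKE:-snakemake}"

run_workflow() {
    local workdir="$1"
    local cores="$2"
    shift 2
    # --directory sets the working directory for the run,
    # --use-conda creates the per-rule conda environments,
    # and any remaining arguments (such as -n) are passed through unchanged.
    "$SNAKEMAKE" --directory "$workdir" --snakefile "$workdir/Snakefile" \
        --use-conda --cores "$cores" "$@"
}
```

For example, `run_workflow wf/binning 8 -n` performs a dry run that lists the planned jobs without executing them; dropping `-n` starts the actual run.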
- R ≥ 4.0 with tidyverse, phyloseq, ape, ggtree, openxlsx, and related packages
- Snakemake ≥ 7.0, conda/mamba
- DART/AGP (see DART repository)
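The minimum versions above can be checked from the shell before starting. This is a sketch under stated assumptions: it relies on `sort -V` (available in GNU coreutils) for version comparison, and `version_ge`, `check_tool`, and `check_requirements` are hypothetical helper names. It does not check the R packages or DART/AGP.

```shell
#!/usr/bin/env bash
# Sketch: compare installed tool versions against the stated minimums.
# Assumption: `sort -V` (version sort) is available, as in GNU coreutils.

# version_ge A B: succeeds when version A is greater than or equal to version B.
version_ge() {
    # After a version sort, the smaller version comes first;
    # A >= B exactly when that first line is B.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

check_tool() {
    local name="$1" have="$2" need="$3"
    if version_ge "$have" "$need"; then
        echo "$name $have (>= $need): ok"
    else
        echo "$name $have (< $need): too old"
        return 1
    fi
}

check_requirements() {
    local status=0
    if command -v snakemake > /dev/null 2>&1; then
        check_tool snakemake "$(snakemake --version)" 7.0 || status=1
    else
        echo "snakemake: not found"; status=1
    fi
    if command -v Rscript > /dev/null 2>&1; then
        # getRversion() returns the running R version, e.g. 4.3.1.
        check_tool R "$(Rscript -e 'cat(as.character(getRversion()))')" 4.0 || status=1
    else
        echo "R: not found"; status=1
    fi
    return "$status"
}
```

Calling `check_requirements` prints one line per tool and returns non-zero if anything is missing or too old.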
Fernandez-Guerra et al. 2025. https://doi.org/10.1101/2023.06.10.544454
For questions or issues, please open an issue in this repository.