"Are we throwing away good data? Evaluation of chimera detection algorithms on long-read amplicons reveals high false positive rates across algorithms" (Hakimzadeh et al. 2025)
This repository contains the data and part of the analysis stack for the paper cited above. It is structured as follows:
Simulated data holds the scripts for the simulated dataset: generating the simulated data, creating chimeric sequences, quality filtering, and chimera filtering. It also holds the statistical analysis scripts used to calculate the F1 score.
Real data holds scripts related to real data analysis.
BlasCh contains the BLAST scripts for alignment and the BlasCh module, which processes the BLAST XML outputs to identify false positive and false negative chimeras.
Figures & tables contains the scripts used to generate the graphs and tables.
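For orientation, the F1 score mentioned for the simulated dataset can be computed from true positive, false positive, and false negative counts. The sketch below is a minimal illustration, not the repository's own script. It assumes chimeras are the positive class and that the known chimeric read IDs and the IDs flagged by a detection algorithm are available as sets. The function names `confusion_counts` and `f1_score` and the example IDs are hypothetical.

```python
from typing import Set, Tuple


def confusion_counts(true_chimeras: Set[str], flagged: Set[str]) -> Tuple[int, int, int]:
    """Return (TP, FP, FN), treating 'chimera' as the positive class (assumption)."""
    tp = len(true_chimeras & flagged)   # chimeras correctly flagged
    fp = len(flagged - true_chimeras)   # non-chimeric reads flagged as chimeras
    fn = len(true_chimeras - flagged)   # chimeras the algorithm missed
    return tp, fp, fn


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical read IDs, for illustration only.
    truth = {"read_a", "read_b", "read_c"}
    called = {"read_b", "read_c", "read_d"}
    tp, fp, fn = confusion_counts(truth, called)
    print(tp, fp, fn, round(f1_score(tp, fp, fn), 3))
```

Working with sets of read IDs keeps the comparison independent of read order in the input files.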
The workflow for the real dataset was as follows:
