
Question about size of the resulting fasta file of mitochondrial genome sequencing  #22

@Ponerinae

Dear Changwei,

I used autoMito to generate the two GFA files (raw and master) for the assembled Malus domestica genome (as in Demo2) with the following command:

~/PMAT-1.5.3/bin/PMAT autoMito -i Malus_domestica.540Mb.fa -o ./out.all -st hifi -g 703m -mm -tp all -cpu 20

After running ll, I found that both files are around 500,000 bytes in size:

-rw-rw-r-- 1 526388 2024-07-02 21:06:48 PMAT_mt_master.gfa
-rw-rw-r-- 1 557590 2024-07-02 21:06:48 PMAT_mt_raw.gfa

The contigs included in raw.gfa are:

1
2
3
2159
4834
15388
1233
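
When comparing against a reference, the sequence length of each contig in bp is more telling than the file size in bytes, because a GFA file also stores link lines and optional tags. Below is a minimal Python sketch for listing the segment lengths. It assumes the GFA 1 layout, where each S line holds the segment name followed by its sequence. The function name is hypothetical and not part of PMAT.

```python
def gfa_segment_lengths(path):
    """Return {segment_name: length_in_bp} for the S lines of a GFA 1 file."""
    lengths = {}
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            # Only segment records carry sequence; skip H, L, P and other lines.
            if fields[0] != "S" or len(fields) < 3:
                continue
            name, seq = fields[1], fields[2]
            if seq == "*":
                # Sequence omitted: fall back to the LN:i:<length> tag if present.
                ln_tags = [f for f in fields[3:] if f.startswith("LN:i:")]
                lengths[name] = int(ln_tags[0][5:]) if ln_tags else 0
            else:
                lengths[name] = len(seq)
    return lengths
```

Calling `gfa_segment_lengths("PMAT_mt_raw.gfa")` and summing the values would give the total assembled length in bp, which can then be compared with the length of the reference sequence.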

However, the reference mitochondrial sequence for apple that I downloaded from NCBI (https://www.ncbi.nlm.nih.gov/nuccore/NC_018554.1/) is only about 403,000 b in size, and the raw GFA file contains several contigs that are not in the reference sequence, e.g. contig 4834. I copied and pasted that contig into NCBI's search engine and found that it actually belongs to the apple chloroplast genome. In other words, the autoMito command above caused chloroplast sequences to be included in the mt GFA file, which is supposed to contain only the mitochondrial genome.
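
To check a single contig without copying it by hand, it can be written out as a FASTA record and submitted to a sequence search. The following is a rough Python sketch, assuming GFA 1 S lines and assuming that the contig IDs listed above (such as 4834) are the segment names in the file. The helper name is hypothetical and not part of PMAT.

```python
def gfa_segment_to_fasta(gfa_path, segment_name, width=60):
    """Return one GFA segment as a FASTA-formatted string."""
    with open(gfa_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 3 and fields[0] == "S" and fields[1] == segment_name:
                seq = fields[2]
                # Wrap the sequence at a fixed line width, as is usual for FASTA.
                wrapped = "\n".join(
                    seq[i:i + width] for i in range(0, len(seq), width)
                )
                return f">{segment_name}\n{wrapped}\n"
    raise KeyError(f"segment {segment_name!r} not found in {gfa_path}")
```

For example, `print(gfa_segment_to_fasta("PMAT_mt_raw.gfa", "4834"))` would print the record for contig 4834, if a segment with that name exists.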

Do you have any clue on this?

Thank you very much!
