Add checks for genotype file uploads #32

@Cristianetaniguti

Description

Add the vcf_sanity_check function to the following modules:

  • GS
  • GWAS
  • Diversity
  • PCA
  • Dosage Call
  • GSAcc
  • DAPC

Example of usage:

res <- vcf_sanity_check(vcf_path, n_data_lines = 100, max_markers = 10000, verbose = FALSE)

It checks:

| Check | FALSE Message | TRUE Message |
|---|---|---|
| VCF_header | VCF header is missing. Please check the file format. | VCF header is present. |
| VCF_compressed | VCF is compressed but filename doesn't have the extension or it has non-supported format | VCF is .gz compressed or uncompressed |
| VCF_columns | Required VCF columns are missing. Please check the file format. | Required VCF columns are present. |
| max_markers | More than 10,000 markers found. Consider subsampling or running in HPC. | Less than maximum number of markers found. |
| GT | Genotype information is not available in the VCF file. | Genotype information is available in the VCF file. |
| allele_counts | Allele counts are not available in the VCF file. | Allele counts are available in the VCF file. |
| samples | Sample information is not available in the VCF file. | Sample information is available in the VCF file. |
| chrom_info | Chromosome information is not available in the VCF file. | Chromosome information is available in the VCF file. |
| pos_info | Position information is not available in the VCF file. | Position information is available in the VCF file. |
| ref_alt | REF/ALT fields contain invalid nucleotide codes. | REF/ALT fields are valid. |
| multiallelics | Multiallelic sites not found in the VCF file. | Multiallelic sites found in the VCF file. |
| phased_GT | Phased genotypes (\|) are not present in the VCF file. | Phased genotypes (\|) are present in the VCF file. |
| duplicated_samples | No duplicated sample IDs found. | Duplicated sample IDs found. |
| duplicated_markers | No duplicated marker IDs found. | Duplicated marker IDs found. |
| mixed_ploidies | No mixed ploidies detected | Mixed ploidies detected. |
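
To make the table more concrete, here is a rough base-R sketch of how a handful of these checks could be computed from the lines of a VCF file. This is not the actual vcf_sanity_check code: `sketch_vcf_checks` and `example_lines` are hypothetical names, the input is assumed to be already read into a character vector, and GT is assumed to be the first FORMAT subfield.

```r
# Illustrative only: a few of the checks from the table, in base R.
# `sketch_vcf_checks` is a hypothetical helper, not part of the real function.
sketch_vcf_checks <- function(vcf_lines) {
  required <- c("#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO")

  meta <- vcf_lines[startsWith(vcf_lines, "##")]
  col_line <- vcf_lines[startsWith(vcf_lines, "#CHROM")]
  data_lines <- vcf_lines[!startsWith(vcf_lines, "#") & nzchar(vcf_lines)]

  cols <- character(0)
  if (length(col_line) > 0) {
    cols <- strsplit(col_line[1], "\t", fixed = TRUE)[[1]]
  }
  fields <- strsplit(data_lines, "\t", fixed = TRUE)

  # Sample IDs are every column after FORMAT (column 9)
  samples <- if (length(cols) > 9) cols[-(1:9)] else character(0)

  # ALT is column 5, FORMAT is column 9, genotype calls start at column 10
  alts <- vapply(fields, function(f) if (length(f) >= 5) f[5] else "",
                 character(1))
  formats <- vapply(fields, function(f) if (length(f) >= 9) f[9] else "",
                    character(1))
  calls <- as.character(unlist(lapply(fields, function(f) {
    if (length(f) >= 10) f[-(1:9)] else character(0)
  })))
  # Keep only the GT subfield (assumed to be the first one)
  gt <- sub(":.*", "", calls)

  c(
    VCF_header = any(startsWith(meta, "##fileformat=VCF")),
    VCF_columns = all(required %in% cols),
    GT = length(formats) > 0 && all(grepl("(^|:)GT(:|$)", formats)),
    samples = length(samples) > 0,
    multiallelics = any(grepl(",", alts, fixed = TRUE)),
    phased_GT = any(grepl("|", gt, fixed = TRUE)),
    duplicated_samples = anyDuplicated(samples) > 0
  )
}

# Tiny made-up VCF: two samples, one multiallelic site, one phased call
header_cols <- c("#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
                 "INFO", "FORMAT", "S1", "S2")
example_lines <- c(
  "##fileformat=VCFv4.2",
  paste(header_cols, collapse = "\t"),
  paste(c("chr1", "100", "m1", "A", "T", ".", "PASS", ".", "GT",
          "0/1", "1/1"), collapse = "\t"),
  paste(c("chr1", "200", "m2", "G", "C,T", ".", "PASS", ".", "GT",
          "0/2", "0|1"), collapse = "\t")
)
checks <- sketch_vcf_checks(example_lines)
print(checks)
```

Each entry of the returned named logical vector maps to one row of the table, so the matching TRUE or FALSE message can be looked up by name.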

The function should never throw an error. If the VCF file doesn't exist or is malformed, it returns a single message saying so.
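
One way to get this never-error behaviour is to test for the file first and wrap the rest in `tryCatch`. The sketch below is an assumption about how that might look, not the actual implementation: `safe_check`, `failing_check`, and the message texts are hypothetical.

```r
# Hypothetical wrapper: run a check function and turn any error into a message.
safe_check <- function(check_fun, vcf_path, ...) {
  if (!file.exists(vcf_path)) {
    return("VCF file does not exist.")
  }
  tryCatch(
    check_fun(vcf_path, ...),
    error = function(e) {
      # Report the problem as text instead of propagating the error
      paste("VCF file could not be read:", conditionMessage(e))
    }
  )
}

# Stand-in check that always fails, to exercise both branches
failing_check <- function(path) stop("malformed header")

missing_path <- file.path(tempdir(), "no_such_file.vcf")
broken_path <- tempfile(fileext = ".vcf")
writeLines("not a vcf", broken_path)

msg_missing <- safe_check(failing_check, missing_path)
msg_broken <- safe_check(failing_check, broken_path)
```

With this pattern, a Shiny module can always display the returned value to the user without needing its own error handling around the call.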
