Adding vcf_sanity_check function to modules
Example of usage:
res <- vcf_sanity_check(vcf_path, n_data_lines = 100, max_markers = 10000, verbose = FALSE)
It checks:
| Check | FALSE Message | TRUE Message |
|---|---|---|
| VCF_header | VCF header is missing. Please check the file format. | VCF header is present. |
| VCF_compressed | VCF is compressed but filename doesn't have the extension or it has non-supported format | VCF is .gz compressed or uncompressed |
| VCF_columns | Required VCF columns are missing. Please check the file format. | Required VCF columns are present. |
| max_markers | More than 10,000 markers found. Consider subsampling or running in HPC. | Less than maximum number of markers found. |
| GT | Genotype information is not available in the VCF file. | Genotype information is available in the VCF file. |
| allele_counts | Allele counts are not available in the VCF file. | Allele counts are available in the VCF file. |
| samples | Sample information is not available in the VCF file. | Sample information is available in the VCF file. |
| chrom_info | Chromosome information is not available in the VCF file. | Chromosome information is available in the VCF file. |
| pos_info | Position information is not available in the VCF file. | Position information is available in the VCF file. |
| ref_alt | REF/ALT fields contain invalid nucleotide codes. | REF/ALT fields are valid. |
| multiallelics | Multiallelic sites not found in the VCF file. | Multiallelic sites found in the VCF file. |
| phased_GT | Phased genotypes (\|) are not present in the VCF file. | Phased genotypes (\|) are present in the VCF file. |
| duplicated_samples | No duplicated sample IDs found. | Duplicated sample IDs found. |
| duplicated_markers | No duplicated marker IDs found. | Duplicated marker IDs found. |
| mixed_ploidies | No mixed ploidies detected | Mixed ploidies detected. |
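
To make the table more concrete, here is a rough sketch of how three of these checks (`VCF_header`, `VCF_columns`, `phased_GT`) could be computed from the first lines of an uncompressed VCF. This is not the code in this PR: `sketch_vcf_checks` is a hypothetical helper, and it assumes the input has already been read into a character vector (for example with `readLines()`).

```r
# Hypothetical helper: a sketch of three checks, NOT the actual vcf_sanity_check code.
sketch_vcf_checks <- function(lines) {
  required <- c("#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO")

  # VCF_header: the first line of a VCF declares the format version.
  has_header <- length(lines) > 0 && grepl("^##fileformat=VCF", lines[1])

  # VCF_columns: exactly one "#CHROM" line carrying the eight fixed columns.
  col_line <- grep("^#CHROM", lines, value = TRUE)
  has_columns <- length(col_line) == 1 &&
    all(required %in% strsplit(col_line, "\t", fixed = TRUE)[[1]])

  # phased_GT: look for "|" in the GT subfield of any sample column (10 onward).
  # GT is the first FORMAT key when present, so keep only the text before ":".
  data_lines <- lines[!grepl("^#", lines)]
  fields <- strsplit(data_lines, "\t", fixed = TRUE)
  phased <- any(vapply(fields, function(f) {
    if (length(f) < 10) return(FALSE)
    gt <- sub(":.*$", "", f[10:length(f)])
    any(grepl("|", gt, fixed = TRUE))
  }, logical(1)))

  c(VCF_header = has_header, VCF_columns = has_columns, phased_GT = phased)
}

# Tiny in-memory VCF used only for illustration.
vcf <- c(
  "##fileformat=VCFv4.2",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
  "chr1\t100\tm1\tA\tG\t.\tPASS\t.\tGT\t0|1\t1/1"
)
checks <- sketch_vcf_checks(vcf)
print(checks)
```

Each element of the returned logical vector would then select the FALSE or TRUE message from the table above.
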
The function should never raise an error: if the VCF file doesn't exist or is malformed, it returns a single message describing the problem.
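
One possible way to get this "never error" behaviour in R is to guard the file access and wrap the parsing in `tryCatch()`. The sketch below only illustrates the pattern: `safe_check`, its `check_fun` argument, the `ok`/`message` list shape, and the message texts are all hypothetical and do not reflect the actual return value of `vcf_sanity_check`.

```r
# Hypothetical wrapper illustrating the pattern; names and messages are assumptions.
safe_check <- function(vcf_path, check_fun) {
  # A missing file is reported as a message instead of an error.
  if (!file.exists(vcf_path)) {
    return(list(ok = FALSE, message = "VCF file does not exist."))
  }
  # Any error raised while parsing is converted into a single message.
  tryCatch(
    check_fun(vcf_path),
    error = function(e) {
      list(
        ok = FALSE,
        message = paste("VCF file could not be parsed:", conditionMessage(e))
      )
    }
  )
}

# The file does not exist, so check_fun is never called.
res <- safe_check("no_such_file.vcf", function(p) stop("unreachable"))
print(res$message)
```

With this shape, callers can branch on the returned value without needing their own `tryCatch()`.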