gNALI (gene nonessentiality and loss-of-function identifier) is a tool for finding and filtering high-confidence loss-of-function variants of genes. gNALI has built-in support for gnomADv2.1.1 and gnomADv3.1.1 and can be configured for use with other VCF databases.
NOTE: loss-of-function is influenced by the genome build. Not all variants available in gnomADv2.1.1 are available in gnomADv3 and vice versa.
- Website: https://phac-nml.github.io/gnali/
- Installation: https://phac-nml.github.io/gnali/install.html/
- Parameters: https://phac-nml.github.io/gnali/parameters.html/
Installation
We recommend installing gNALI as a Conda package. gNALI may be installed on any 64-bit Linux system using Bioconda (further details are available in the documentation):
- Install Bioconda
- Install the gnali Bioconda package (conda install gnali).
gNALI may also be installed directly; instructions are available in the documentation.
After installing, optionally run the command gnali_get_data <reference genome> to download reference files required to add loss-of-function annotations.
- For use with gnomADv2 or gnomADv3, you do not have to run gnali_get_data
- For use with custom databases WITH loss-of-function annotations, you do not have to run gnali_get_data
- For use with custom databases WITHOUT loss-of-function annotations, run gnali_get_data grch37 or gnali_get_data grch38, depending on the reference genome used
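The three cases above can be summarized as a small shell helper. This is only a sketch: the function name data_step_for and the labels gnomad, custom-annotated, and custom-unannotated are invented here for illustration and are not part of gNALI. The function prints the step to take rather than running it.

```shell
# data_step_for DB_KIND [REFERENCE]
# Hypothetical helper: prints whether gnali_get_data is needed.
# DB_KIND labels are assumptions made for this sketch, not gNALI options.
data_step_for() {
    case "$1" in
        gnomad|custom-annotated)
            # Built-in gnomAD databases and annotated custom databases
            # need no reference download.
            echo "no download needed" ;;
        custom-unannotated)
            case "$2" in
                grch37|grch38) echo "gnali_get_data $2" ;;
                *) echo "unknown reference genome: $2" >&2; return 1 ;;
            esac ;;
        *)
            echo "unknown database kind: $1" >&2; return 1 ;;
    esac
}

data_step_for custom-unannotated grch38
```

The last line prints the command to run for a custom database without loss-of-function annotations on GRCh38.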
The files downloaded by gnali_get_data for each reference genome require about 35GB of disk space and took 1.5 hours on our systems with 16GB of RAM and a 3.20GHz processor.
gNALI's command line arguments can be found by running:
gnali --help
Please refer to the documentation for more details.
Input
Your input file must be in .csv, .txt, or .tsv format and should contain a list of genes (as HGNC symbols) to test, separated by newline characters.
It should not contain any blank lines before the end of the list.
Here is an example of a valid input file:
CCR5
ALCAM
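A quick way to check the blank-line rule before a run is sketched below. The function name check_gene_list is hypothetical and the check is not part of gNALI. It only verifies that no gene follows a blank line; it does not verify that the symbols are valid HGNC symbols.

```shell
# check_gene_list FILE
# Hypothetical pre-flight check (not part of gNALI): succeeds when no
# non-blank line follows a blank line, i.e. blank lines appear only
# at the end of the list.
check_gene_list() {
    awk '
        NF == 0 { if (!blank) blank = NR; next }   # remember the first blank line
        blank   { printf "line %d: gene listed after blank line %d\n", NR, blank
                  bad = 1; exit }
        END     { exit bad + 0 }
    ' "$1"
}
```

For example, check_gene_list my_genes.txt returns a zero exit status for the file shown above and a non-zero status, with a message, if a blank line separates two genes.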
Output
You can specify an output folder name; otherwise, the output folder uses a default name (results-<id>, where <id> is a randomly generated unique ID). gNALI reports where your output was written after it finishes executing.
gNALI by default provides two output files:
- A basic output file, containing genes from your input file with high-confidence loss-of-function variants that pass filtering
- A detailed output file with additional information
Example commands
gNALI requires at minimum an input file. Here is a simple example of running gNALI:
gnali -i my_genes.txt -o my_results
- An input file called my_genes.txt is tested for high-confidence loss-of-function variants
- The output folder my_results will contain the output files
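Putting the input format and the command together, a minimal end-to-end sketch might look like the following. It writes the two example genes to my_genes.txt in the current directory and runs gNALI only if the gnali executable is on the PATH. The guard is an addition for illustration and is not required by gNALI.

```shell
# Create the example input file: one HGNC symbol per line.
printf 'CCR5\nALCAM\n' > my_genes.txt

# Run gNALI only when it is installed; otherwise print a hint.
if command -v gnali >/dev/null 2>&1; then
    gnali -i my_genes.txt -o my_results
else
    echo "gnali not found on PATH; install it first" >&2
fi
```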
Simple and advanced walkthroughs are available in the documentation.
Population Frequencies
When using the population frequencies feature (-P/--pop_freqs), the following abbreviations apply per population group:
- "AC" denotes allele count
- "AN" denotes allele number
- "AF" denotes allele frequency
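These three values are related. Assuming the usual gnomAD convention that allele frequency is allele count divided by allele number (AF = AC / AN), the relationship can be sketched as below. The helper name allele_frequency is hypothetical and is not a gNALI command.

```shell
# allele_frequency AC AN
# Hypothetical helper: prints AC / AN to six decimal places,
# or "NA" when AN is zero (no called alleles in the population group).
allele_frequency() {
    awk -v ac="$1" -v an="$2" 'BEGIN {
        if (an <= 0) { print "NA"; exit }
        printf "%.6f\n", ac / an
    }'
}

allele_frequency 3 1200   # prints 0.002500
```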
Before running tests, run gnali_get_data test to install required files. This is not necessary if you have already run gnali_get_data after the initial installation of gNALI.
Copyright Government of Canada 2020-2021
Written by: National Microbiology Laboratory, Public Health Agency of Canada
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this work except in compliance with the License. You may obtain a copy of the License at:
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Gary Van Domselaar: [email protected]
Xia Liu: [email protected]