NPStat implements population genetics tests and estimators for pooled next-generation sequencing (NGS) data. This documentation explains how to build and run NPStat with Docker, which provides a consistent and reproducible environment across different systems.
Before proceeding, ensure that Docker is installed on your system. You can download and install Docker from the official Docker website.
On Linux, you can install Docker with your distribution's package manager. On macOS and Windows, install the Docker Desktop application. For example, to install Docker on Ubuntu/Debian or CentOS, use the following commands:
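## Ubuntu/Debian (the docker-ce package requires Docker's apt repository to be configured first)
sudo apt-get update
sudo apt-get install docker-ce
## CentOS
sudo yum install docker
For more detailed installation instructions, refer to the official Docker documentation.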
To install Docker Desktop on macOS:
- Download Docker Desktop: Visit the Docker Desktop for Mac page.
- Run the Installer: Open the downloaded .dmg file and drag Docker to the Applications folder.
- Start Docker Desktop: Open Docker from the Applications folder.
- Verify Installation: Open a Terminal and run:
docker --version
To install Docker Desktop on Windows:
- Download Docker Desktop: Visit the Docker Desktop for Windows page.
- Run the Installer: Open the downloaded .exe file and follow the installation instructions.
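Whichever platform you installed on, a short check like the one below can confirm that Docker is usable before you continue. It is only a sketch: the function name `check_docker` is made up for this example, and it relies on nothing beyond `docker --version` and `docker info`.

```shell
#!/bin/sh
# Hypothetical helper: report whether the Docker CLI is installed
# and whether the Docker daemon can be reached.
check_docker() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker: not found on PATH"
        return 1
    fi
    docker --version
    # `docker info` talks to the daemon, so it fails when Docker is not running.
    if docker info >/dev/null 2>&1; then
        echo "docker daemon: reachable"
    else
        echo "docker daemon: not reachable (is Docker running?)"
        return 2
    fi
}

check_docker || echo "Docker is not ready yet"
```

If the daemon is reported as not reachable on macOS or Windows, starting Docker Desktop is usually the missing step.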
Before using the container, you need to pull the image from Docker Hub. The image is available at the following location: biotechvana/npstat.
docker pull biotechvana/npstat
Optionally, tag the image with a shorter name so that you do not have to type the full image name:
docker tag biotechvana/npstat npstat
Verify that the image has been pulled successfully by running the following command:
docker run --rm npstat
To run the NPStat application, use the following command:
docker run --rm npstat
This command will display the usage information for NPStat.
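If you set up NPStat on several machines, the pull and tag steps can be wrapped so that they only run when the short-named image is missing. The following is a minimal sketch; `ensure_npstat_image` is a hypothetical name, and the check uses `docker image inspect`, which exits with a non-zero status when the image is not present locally.

```shell
#!/bin/sh
# Hypothetical helper: make sure a local image named "npstat" exists,
# pulling and tagging biotechvana/npstat only when it is missing.
ensure_npstat_image() {
    if docker image inspect npstat >/dev/null 2>&1; then
        echo "npstat image already present"
        return 0
    fi
    docker pull biotechvana/npstat &&
        docker tag biotechvana/npstat npstat
}

# Usage (requires a running Docker daemon):
#   ensure_npstat_image && docker run --rm npstat
```

Because the function is safe to call repeatedly, it can sit at the top of an analysis script without triggering a download on every run.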
To analyze specific data, you can mount a local directory containing your data files to the container and specify the input files. For example:
docker run --rm -v /path/to/data:/data npstat -n sample_size -l window_length [options] FILE.pileup
- Replace sample_size with the haploid sample size.
- Replace window_length with the window length in bases.
- Replace /path/to/data with the path to your local directory containing the data files, and FILE.pileup with the name of your input file.
- Note that :/data specifies the mount point inside the container where the data files will be available. The mount point /data is fixed inside the container and should not be changed; it is also the default working directory inside the container.
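The substitutions listed above can be bundled into a small wrapper so that the mount syntax does not have to be retyped for every run. This is a sketch under assumptions: the function name `run_npstat` and the `DOCKER` override variable are hypothetical, and the argument order simply mirrors the command shown above.

```shell
#!/bin/sh
# Hypothetical wrapper around the documented command line.
# Usage: run_npstat DATA_DIR SAMPLE_SIZE WINDOW_LENGTH FILE [extra NPStat options...]
run_npstat() {
    if [ "$#" -lt 4 ]; then
        echo "usage: run_npstat DATA_DIR SAMPLE_SIZE WINDOW_LENGTH FILE [options]" >&2
        return 2
    fi
    data_dir=$1; sample_size=$2; window_length=$3; input_file=$4
    shift 4
    if [ ! -f "$data_dir/$input_file" ]; then
        echo "error: $data_dir/$input_file not found" >&2
        return 1
    fi
    # Resolve the host directory to an absolute path for the bind mount.
    abs_dir=$(cd "$data_dir" && pwd) || return 1
    # /data is the fixed mount point and the working directory in the container,
    # so the input file is passed by name only.
    # DOCKER is an assumption of this sketch: set DOCKER=echo to print the
    # arguments instead of running the container.
    ${DOCKER:-docker} run --rm -v "$abs_dir:/data" npstat \
        -n "$sample_size" -l "$window_length" "$@" "$input_file"
}

# Example (the values 20 and 10000 are placeholders, not recommendations):
#   run_npstat /home/user/npstat_data 20 10000 FILE.pileup
```

Checking that the input file exists on the host before starting the container gives a clearer error than a failure reported from inside it.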
For example, if you have a data file named FILE.pileup in a directory /home/user/npstat_data, you can run NPStat as follows:
docker run --rm -v /home/user/npstat_data:/data npstat -i /data/FILE.pileup
## Because /data is the working directory inside the container, you can also run it as follows
docker run --rm -v /home/user/npstat_data:/data npstat -i FILE.pileup
As another example, if you have a data file named FILE.pileup in your current directory, you can run NPStat as follows:
docker run --rm -v "$(pwd)":/data npstat -i FILE.pileup
## or, on Docker versions that accept relative bind-mount paths
docker run --rm -v ./:/data npstat -i FILE.pileup
Output: The output file will be generated in the same directory as the input file.
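Since output is written next to the input, one directory can hold several pileup files and be processed in a single pass. The loop below is a sketch only: `npstat_batch` and the `DOCKER` override are hypothetical names, and it uses the `-n sample_size -l window_length FILE.pileup` form from the usage line earlier in this document.

```shell
#!/bin/sh
# Hypothetical batch helper: run NPStat on every *.pileup file in a directory.
# Usage: npstat_batch DATA_DIR SAMPLE_SIZE WINDOW_LENGTH
npstat_batch() {
    abs_dir=$(cd "$1" && pwd) || return 1
    for pileup in "$abs_dir"/*.pileup; do
        # When nothing matches, the pattern is left unexpanded; skip it.
        [ -f "$pileup" ] || continue
        name=$(basename "$pileup")
        echo "Processing $name" >&2
        # DOCKER is an assumption of this sketch: set DOCKER=echo for a dry run.
        ${DOCKER:-docker} run --rm -v "$abs_dir:/data" npstat \
            -n "$2" -l "$3" "$name" || return 1
    done
}

# Example (placeholder values):
#   npstat_batch /home/user/npstat_data 20 10000
```

The loop stops at the first failing file, so a problem with one input is not buried under output from later runs.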
If you want to build the Docker image locally, you can follow these steps:
- Clone the NPStat repository from GitHub:
git clone https://github.com/ahmedihafez/npstat.git
- Change to the NPStat directory:
cd npstat
- Build the Docker image using the provided Dockerfile:
docker build -t npstat .
- Verify that the image has been successfully built by running the following command:
docker run --rm npstat
Using Docker to build and run NPStat provides a consistent and reproducible environment, minimizing issues related to dependencies and system configurations. For more information on NPStat and its applications, refer to the NPStat GitHub repository and the related literature on population genetics and NGS data analysis.
References
- Ferretti, L., Ramos-Onsins, S.E., & Pérez-Enciso, M. (2013). Population genomics from pool sequencing. Molecular Ecology, 22(22), 5561-5576. DOI: 10.1111/mec.12522.
- GitHub - lucaferretti/npstat: Population genetics tests and estimators for pooled NGS data. Retrieved from GitHub.