ShaunFChen/2020_FA_HPC

2020 FA HPC - Garibaldi (SLURM)

Author: Shaun Chen
First created: 2020/10/25
Last updated: 2021/06/29

Slide: 2021_summer_intern_boot_camp_HPC.pdf

A basic introduction to high-performance computing (HPC) at Scripps Research. This repository served as class material and was tested on Garibaldi only. The shared large datasets have been deprecated.

Cheat sheet

Shell commands

  1. man: format and display the on-line manual pages.

  2. ssh: OpenSSH SSH client (remote login program).

  3. scp: secure copy (remote file copy program).

  4. watch: execute a program periodically, showing output fullscreen.

  5. vim, nano, emacs: programmer's text editors.

  6. history: list command history.

  7. exit: exit the current session.

  8. Ctrl + C: interrupt a running command or script when something looks wrong.

  9. Ctrl + R: Recall the last command matching the characters you provide.

Git commands

Prerequisite on Garibaldi: module load git

  1. git init: Create an empty git repository or reinitialize an existing one.

  2. git clone: Clone a repository into a new directory.

  3. git status: Show the working tree status.

  4. git add: Add file contents to the index.

  5. git commit -m: Record changes to the repository.

  6. git push: Update remote refs along with associated objects.

  7. git pull: Fetch from and merge with another repository or a local branch.

  • Note: keep each file under 50 MB (recommended); files of 100 MB or more are rejected on push.
  • Use .gitignore to specify intentionally untracked files that Git should ignore.
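
The size note above can be checked before running git add. The following is a minimal sketch, not part of the original class material: the function name list_large_files and its arguments are hypothetical, and the 50 MiB default simply mirrors the recommendation above. Any path it prints is a candidate for .gitignore.

```shell
#!/bin/bash
# Sketch: list files above a size threshold (in MiB) before `git add`.
# list_large_files is a hypothetical helper name, not a git command.
list_large_files() {
    local dir="${1:-.}"        # directory to scan (default: current directory)
    local limit_mb="${2:-50}"  # threshold in MiB (default: 50)
    # Skip Git's own metadata; -size +NM matches files larger than N MiB.
    find "$dir" -path "$dir/.git" -prune -o -type f -size "+${limit_mb}M" -print
}

# Example (run from the repository root):
#   list_large_files . 50
```

find rounds file sizes up to whole MiB units, so the check is approximate near the threshold.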

Simple Linux Utility for Resource Management (SLURM) Workload Manager

  1. showuserjobs: display the current usage of the entire cluster.

  2. sinfo: view information about Slurm nodes and partitions.

  3. squeue -u [username]: list the jobs or job steps of the given user(s); accepts a comma-separated list of users.

  4. srun --pty bash -i: Request a job in interactive terminal mode.

  5. sbatch: submit slurm job
    --export=<environment variables [ALL] | NONE>: Identify which environment variables from the submission environment are propagated to the launched application.
    --mem=<size[units]>: Specify the real memory required per node.
    -t, --time=<time>: Set a limit on the total run time of the job allocation.
    -J, --job-name=<jobname>: Specify a name for the job allocation.
    -p, --partition=<partition_names>: Request a specific partition for the resource allocation.

  6. scancel [job_id]: used to signal or cancel jobs, job arrays or job steps.

  7. module: command interface to the Modules package
    av: Display all available modules.
    load: Load modulefile(s) into the shell environment.
    unload: Remove modulefile(s) from the shell environment.
    purge: Unload all loaded modulefiles.
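
The sbatch, squeue and scancel commands above are often used together. The sketch below shows one way to capture the job ID at submission so it can be passed to the other two. It assumes sbatch is available on the login node; submit_job is a hypothetical helper name, and run_slurm.qsub is the placeholder script name used in the job script template.

```shell
#!/bin/bash
# Sketch: submit a job script and print only its job ID.
# submit_job is a hypothetical wrapper; sbatch must be on PATH.
submit_job() {
    local script="$1"
    shift
    local out
    # --parsable makes sbatch print "jobid" or "jobid;cluster" and nothing else.
    out=$(sbatch --parsable "$@" "$script") || return 1
    # Drop the optional ";cluster" suffix.
    echo "${out%%;*}"
}

# Example on a login node (not executed here):
#   job_id=$(submit_job run_slurm.qsub --mem=16G --time=24:00:00 -J hello)
#   squeue -j "$job_id"    # check the job state
#   scancel "$job_id"      # cancel the job if needed
```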

Job script template

#!/bin/bash
#SBATCH --time=24:00:00
#SBATCH --mem=16G
## SBATCH --nodes=1                   ### Node count required for the job
## SBATCH --ntasks=1                  ### Number of tasks to be launched per Node
## SBATCH --cpus-per-task=16

# sbatch --export=month=,day= run_slurm.qsub
# By default, Slurm starts the job in the directory from which sbatch was run.

module purge
module load R 

# always print the path of your working directory
pwd

echo "Hello HPC!! Today is ${month} ${day}"
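
The month and day variables in the template only exist if they are passed at submission, for example with sbatch --export=ALL,month=June,day=29 run_slurm.qsub (the values are illustrative; ALL also propagates the rest of the submission environment). A minimal sketch of a defensive variant follows. The greet function and the "unset" fallback text are assumptions for illustration, not part of the original script.

```shell
#!/bin/bash
# Sketch: print the greeting with a visible fallback when a variable was not exported.
# "unset" is an arbitrary placeholder string, not a value supplied by Slurm.
greet() {
    echo "Hello HPC!! Today is ${month:-unset} ${day:-unset}"
}

greet
```

Run outside Slurm with no variables set, the line shows "unset unset", which makes a forgotten --export easy to spot in the job log.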

Case study on Garibaldi

Requirements

Modules and shell tools

  • R (module load R) with ggplot2
  • python >= 3.6.3 (module load python/3.6.3)
  • PLINK v2 (required_tools/plink2)
  • bcftools >= 1.9 (module load samtools)
  • vcftools >= 0.1.14 (module load vcftools)
  • ADMIXTURE (module load admixture)
    Alexander, David H., and Kenneth Lange. "Enhancements to the ADMIXTURE algorithm for individual ancestry estimation." BMC bioinformatics 12.1 (2011): 246.
  • RFMix2 (required_tools/rfmix2)
    Maples, Brian K., et al. "RFMix: a discriminative modeling approach for rapid and robust local-ancestry inference." The American Journal of Human Genetics 93.2 (2013): 278-288.

Datasets

Test case

Unsupervised ancestry inference using PCA

Run:

sbatch --job-name=PCA_TGP 0_TGP_PCA.slurm

Expected output: figure_tgp_pca.png

Extra practice: supervised ancestry inference

0. (Optional) Convert personal 23andMe genetic data to VCF

An interactive session (srun --pty bash -i) is sufficient for this step.

bcftools convert --tsv2vcf ${input_txt_path} -f ${ref_fasta_path} -s ${subject_ID} -Oz -o ${output_filename}.vcf.gz
tabix -p vcf ${output_filename}.vcf.gz
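
Before converting, it can help to confirm that the raw export has the expected shape. The sketch below assumes the usual 23andMe layout: comment lines starting with #, followed by tab-separated rows of rsid, chromosome, position and genotype. The function name count_malformed_rows is hypothetical.

```shell
#!/bin/bash
# Sketch: count data rows that do not have exactly 4 tab-separated fields.
# Assumes a 23andMe-style text export; lines starting with '#' are comments.
count_malformed_rows() {
    awk -F'\t' '!/^#/ && NF != 4 { bad++ } END { print bad + 0 }' "$1"
}

# Example:
#   count_malformed_rows "${input_txt_path}"
```

A result of 0 means every data row has four fields; any other count suggests the file was edited or truncated.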

GRCh37 on Garibaldi: /gpfs/work/sfchen/human_g1k_v37.fasta

1. Local ancestry inference with RFMix2

sample input: /gpfs/work/sfchen/HGDP

sbatch --export=input=[vcf_gz_path] 1_ancestry_inference.slurm

sample metadata:

Sample_ID       sub_pop          sup_pop
HGDP00001       Brahui           CENTRAL_SOUTH_ASIA
HGDP01041       Pima             AMERICA
HGDP00977       Han              EAST_ASIA
HGDP01062       Sardinian        EUROPE
HGDP00580       Druze            MIDDLE_EAST
HGDP01419       BantuKenya       AFRICA
HGDP00664       Bougainville     OCEANIA
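
RFMix2 takes a reference sample map as a tab-delimited file with the sample ID in the first column and a population label in the second. As a rough sketch, such a map could be derived from the metadata table above. The function name make_sample_map is hypothetical, and the choice of sup_pop rather than sub_pop is an assumption; the source does not say which label 1_ancestry_inference.slurm uses.

```shell
#!/bin/bash
# Sketch: build a two-column sample map (Sample_ID, sup_pop) from the metadata table.
# Assumes whitespace-separated columns with a single header row.
make_sample_map() {
    # Skip the header; print column 1 and column 3 separated by a tab.
    awk 'NR > 1 && NF >= 3 { print $1 "\t" $3 }' "$1"
}

# Example (file names are placeholders):
#   make_sample_map metadata.tsv > sample_map.tsv
```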

REFERENCE

  1. Scripps Research HPC (intranet only) https://intranet.scripps.edu/its/highperformancecomputing/index.html
  2. HPC Challenges — A Perspective for General Data Analysis and Visualization http://web.eecs.utk.edu/~huangj/hpc/hpc_intro.php
  3. Tutorial: Produce PCA bi-plot for 1000 Genomes Phase III - Version 2. Biostars https://www.biostars.org/p/335605/
  4. Converting from 23andMe to VCF https://samtools.github.io/bcftools/howtos/convert.html
  5. Bergström, Anders, et al. "Insights into human genetic variation and population history from 929 diverse genomes." Science 367.6484 (2020).
  6. Skoglund, Pontus, and Iain Mathieson. "Ancient genomics of modern humans: the first decade." Annual review of genomics and human genetics 19 (2018): 381-404.
  7. Martin, Alicia R., et al. "Clinical use of current polygenic risk scores may exacerbate health disparities." Nature genetics 51.4 (2019): 584-591.
  8. Tian, Rui, Malay K. Basu, and Emidio Capriotti. "Computational methods and resources for the interpretation of genomic variants in cancer." BMC genomics 16.S8 (2015): S7.
  9. Martin, Alicia R. "Ancestry pipeline". GitHub, https://github.com/armartin/ancestry_pipeline
  10. Pope, Bernie. "pbs2slurm". GitHub, https://github.com/bjpop/pbs2slurm
