Code accompanying the preprint: RNA genome conservation and secondary structure in SARS-CoV-2 and SARS-related viruses https://doi.org/10.1101/2020.03.27.012906
Top-level Python scripts run the conservation and secondary structure analyses:
- conservation.py finds conserved intervals in SARS-related viruses and SARS-CoV-2 sequences
- unstructured.py finds unstructured intervals and conserved unstructured intervals
- rnaz_analysis.py analyzes the RNAz screen data and compiles conserved structured intervals
- alifoldz_analysis.py prepares alignment windows for rscape and alifoldz analysis, and compares alifoldz hits with those from RNAz
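As an illustration of what "finding conserved intervals" can mean, here is a minimal sketch that scans the columns of a multiple sequence alignment and reports runs of conserved columns. It is not the logic of conservation.py: the function name `conserved_intervals`, the `min_length` and `min_identity` parameters, their defaults, and the rule that any gap makes a column non-conserved are all assumptions made for this example.

```python
from typing import List, Tuple


def conserved_intervals(
    aligned_seqs: List[str],
    min_length: int = 15,
    min_identity: float = 1.0,
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) column intervals that are conserved.

    Assumption for this sketch: a column is conserved when it has no gap
    and its most common character occurs in at least `min_identity` of
    the sequences. Runs shorter than `min_length` columns are dropped.
    """
    if not aligned_seqs:
        return []
    n_cols = len(aligned_seqs[0])
    if any(len(s) != n_cols for s in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")

    intervals = []
    start = None  # first column of the current conserved run, if any
    for col in range(n_cols):
        column = [s[col].upper() for s in aligned_seqs]
        if "-" in column:
            conserved = False
        else:
            top = max(column.count(c) for c in set(column))
            conserved = top / len(column) >= min_identity
        if conserved and start is None:
            start = col
        elif not conserved and start is not None:
            if col - start >= min_length:
                intervals.append((start, col))
            start = None
    # Close a run that extends to the last alignment column.
    if start is not None and n_cols - start >= min_length:
        intervals.append((start, n_cols))
    return intervals
```

The input is a list of equal-length aligned strings, for example the sequences of an alignment loaded with Biopython. Intervals are 0-based and half-open, so `(0, 7)` covers alignment columns 0 through 6.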
The alignments folder includes starting alignments of SARS-related and SARS-CoV-2 sequences.
The rnaz_data folder includes output from a genome-wide RNAz screen on SARS-related viruses.
The alifoldz folder includes output from alifoldz analysis.
The rscape folder includes output from rscape analysis.
The scanfold_data folder includes ScanFold output from Andrews et al., bioRxiv 2020.
The example_results folder includes example output files from the top-level python scripts, which should be reproduced by running the scripts.
Python packages (install with pip install -r requirements.txt):
- scipy
- numpy
- biopython
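A quick way to confirm that the packages above are available before running the scripts is sketched below. The helper name `missing_packages` and the `REQUIRED_MODULES` mapping are hypothetical and not part of this repository; the only detail taken as fact is that the biopython distribution is imported as `Bio`.

```python
import importlib.util

# pip distribution name -> module name used in `import` statements.
# Note that biopython is imported as `Bio`.
REQUIRED_MODULES = {"scipy": "scipy", "numpy": "numpy", "biopython": "Bio"}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip names of packages whose import module cannot be found."""
    return sorted(
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    )


if __name__ == "__main__":
    missing = missing_packages()
    print("missing packages:", ", ".join(missing) if missing else "none")
```

This only checks that each module can be located; it does not verify installed versions.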
External DasLab dependencies:
- arnie
- CONTRAfold 2.0 (used for secondary structure calculations)
External packages:
- R-scape v1.4.0