Machine learning methods to analyze and predict protein structure, dynamics and function
The content of this exercise aligns with the morning lecture "Learning models of complex dynamics from simulation data" held by Bettina Keller.
The exercise will make use of Jupyter notebooks and requires (a recent version of) the following Python packages:
- matplotlib
- numpy
- scipy
- scikit-learn (imported as sklearn)
- pyemma
- nglview
- cnnclustering
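To check whether these packages are already available in the current environment, something like the following minimal sketch can be run in a Python session or a notebook cell. The helper name `find_missing` is hypothetical, and the import names are assumed to match the package names listed above (scikit-learn is imported as `sklearn`).

```python
import importlib.util

# Import names of the required packages (assumption: each package is
# importable under the name used in the list above).
REQUIRED = [
    "matplotlib",
    "numpy",
    "scipy",
    "sklearn",
    "pyemma",
    "nglview",
    "cnnclustering",
]


def find_missing(modules):
    """Return the names in `modules` that cannot be located for import."""
    return [
        name for name in modules
        if importlib.util.find_spec(name) is None
    ]


missing = find_missing(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages were found.")
```

The check only locates the packages without importing them, so it says nothing about whether the installed versions are recent enough.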
We recommend using a Python 3.8 based (virtual) environment for this exercise. Using conda may be the easiest solution here.
If you want to read up on Python virtual environments, you could start with Python Virtual Environments: A Primer.
If you use conda, a ready-to-use conda virtual environment with all the requirements installed can be created using the provided environment.yml file:
conda env create -f environment.yml
Then activate the new environment:
conda activate AlgoSB
This is equivalent to a manual creation of a fresh environment followed by an installation of the needed packages:
conda create --name AlgoSB python=3.8 -y
conda activate AlgoSB
conda install matplotlib numpy scipy scikit-learn pyemma nglview -c conda-forge
Please note that the cnnclustering package is only available on PyPI:
pip install cnnclustering
or directly from the development repository on GitHub:
git clone https://github.com/janjoswig/CommonNNClustering.git
cd CommonNNClustering
pip install .
If you generally prefer pip over conda to manage packages, you can instead install all requirements from the provided requirements.txt file:
pip install -r requirements.txt
Please note, though, that installing pyemma via pip can sometimes be problematic.
If you would like to use the notebook in Colab, consider installing the requirements via condacolab. Open the notebook in Colab and add the following:
!pip install -q condacolab
import condacolab
condacolab.install()
Then install only the dependencies that are still missing:
!conda install pyemma nglview -c conda-forge
!pip install cnnclustering