There are two main schools of thought on how to run jobs on Alvis: the module system with virtual environments, and containers.
I recommend using virtual environments, limited to pip-installed packages when possible, as they make it easier to share code and environments with others, for example via requirements.txt files (pip freeze > requirements.txt to create one, pip install -r requirements.txt to install the packages listed in it).
Using containers allows for better local reproducibility and isolation of dependencies, but they can be more complex to set up initially and harder to share with others.
I recommend containers when the code to run has complex dependencies that are hard to install via pip or conda, for example non-Python libraries or packages that require specific system configurations.
To create and use a virtual environment on Alvis with Python 3.11, first ssh into the cluster and log in, then go to the directory where you want to create the virtual environment and run:
module load Python/3.11.3-GCCcore-12.3.0
module load PyTorch/2.1.2-foss-2023a-CUDA-12.1.1
# NOTE: If you try to run the following command without loading the above
# modules first, Alvis will complain that python3 is not even an available
# command.
python3 -m venv my-env-name
The environment will be created at my-env-name.
To install packages, run the following:
source path/to/my/environment/my-env-name/bin/activate
pip install --upgrade pip
pip install jupyter
pip install <what you need>
# For example:
pip install bitsandbytes optuna h5py rdkit
pip install pynvml tqdm jsonargparse nltk rouge_score evaluate
pip install pandas tensorboard tabulate scikit-learn
pip install datasets tokenizers accelerate huggingface
pip install trl
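Rather than re-typing this list for every new environment, the same packages can be collected once in a requirements.txt file (the package names below are taken from the commands above; pip freeze will instead produce version-pinned entries for exact reproducibility):

```text
jupyter
bitsandbytes
optuna
h5py
rdkit
pynvml
tqdm
jsonargparse
nltk
rouge_score
evaluate
pandas
tensorboard
tabulate
scikit-learn
datasets
tokenizers
accelerate
huggingface
trl
```

Then a single pip install -r requirements.txt inside the activated environment installs them all.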
Once everything is ready, one can “wrap” the above setup steps into a bash script, for example named setup_environment.sh, containing:
#!/bin/bash
# Load necessary modules
module load Python/3.11.3-GCCcore-12.3.0
module load PyTorch/2.1.2-foss-2023a-CUDA-12.1.1
# Source the virtual environment
source path/to/my/environment/my-env-name/bin/activate
The script can be run via the command:
source setup_environment.sh
One must use the source command; making the script executable and running it is not enough, because a normally executed script runs in a child shell, and its module and environment changes are lost when it exits.
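Here is a minimal, self-contained demonstration of the difference (using a throwaway script in /tmp, not the actual setup_environment.sh): a script run as a child process cannot modify the environment of your current shell, while a sourced one can.

```shell
# Create a tiny script that sets an environment variable
cat > /tmp/demo_setup.sh <<'EOF'
export MY_VAR=hello
EOF

# Running it as a child process: the variable does NOT survive
bash /tmp/demo_setup.sh
echo "after bash:   ${MY_VAR:-unset}"    # prints: after bash:   unset

# Sourcing it: the variable is set in the current shell
source /tmp/demo_setup.sh
echo "after source: ${MY_VAR:-unset}"    # prints: after source: hello
```

module load and the venv activate script work the same way: they only make sense when run in the current shell.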
Setting up the modules and virtual environment, i.e., source setup_environment.sh, can be placed directly in sbatch script files, for example:
#!/bin/bash
#SBATCH --account=my-account-number
#SBATCH --partition=alvis
#SBATCH --gpus-per-node=T4:1
#SBATCH --job-name=my-awesome-job
#SBATCH --time=2:00:00
source setup_environment.sh
python my_python_script.py
deactivate
Alternatively, in a less concise way:
#!/bin/bash
#SBATCH --account=my-account-number
#SBATCH --partition=alvis
#SBATCH --gpus-per-node=T4:1
#SBATCH --job-name=my-awesome-job
#SBATCH --time=2:00:00
# Load Python modules
module load Python/3.11.3-GCCcore-12.3.0
module load PyTorch/2.1.2-foss-2023a-CUDA-12.1.1
# Activate environment
source path/to/my/environment/my-env-name/bin/activate
# Run my stuff
python my_python_script.py
# Deactivate the environment
deactivate
To run Jupyter Notebooks on Alvis using the module system and virtual environments, first create a virtual environment as described above, install Jupyter via pip install jupyter, and then run:
source setup_environment.sh
jupyter notebook --no-browser
Copy the link printed by Jupyter Notebook and open it in your local browser to access the notebooks.
The idea is to have a container that contains all the dependencies needed for running the code, and use that to develop code and run jobs on Alvis.
Very roughly speaking, a container is like a virtual machine: one can open “its terminal” and run commands inside it, while the code inside can still access and interact with the host filesystem (i.e., the files on Alvis).
Alvis uses Apptainer for containerization.
Apptainer is the “manager” running the containers, but one can have different container files, each built in a different way (e.g., with different pip-installed packages).
To create a container, Alvis already provides some starting points, which are available in /apps/containers/Conda/. For example, one can use the miniconda-22.11.1.sif file as a base image.
Below is an example of the content of a recipe file (container_file.def) used to build a “container file”:
Bootstrap: localimage
From: /apps/containers/Conda/miniconda-22.11.1.sif
# NOTE: The line above can be modified to use a different base image if needed,
# for example: "From: <my/local/folders/container/file>.sif"
%post
# Put all your installation commands here, all the following commands will
# be executed when building the container. None of them are required, it's
# just an example of how to install packages and libraries.
apt-get -y update
apt-get -y install git-lfs
/opt/conda/bin/conda install -y pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
/opt/conda/bin/conda install -y -c huggingface transformers tokenizers datasets
/opt/conda/bin/conda install -y -c conda-forge accelerate pandas evaluate tensorboard tabulate scikit-learn
pip install pynvml tqdm jsonargparse nltk rouge_score
pip install rdkit
pip install -U "huggingface_hub[cli]"
conda remove transformers -y
pip install transformers
conda remove datasets -y
pip install datasets
pip install evaluate
pip install peft
[...]
To build the container, one shall run the following command:
apptainer build <path/to/destination/container/file>.sif <path/to/recipe/file>.def
Bash script (saved as, for example, run_jupyter_in_apptainer.sh) to run a Jupyter Notebook from a container:
#!/bin/bash
# Check that an Apptainer image path is provided
if [ "$#" -lt 1 ] || [ "$#" -gt 2 ]; then
    echo "Usage: $0 <path-to-apptainer-image> [path-to-work-directory]"
    exit 1
fi
# Path to the Apptainer image
IMAGE_PATH="$1"
# Optional path to the work directory
if [ -n "$2" ]; then
    WORK_DIR="$2"
    # Check if the directory exists
    if [ ! -d "$WORK_DIR" ]; then
        echo "The specified directory does not exist: $WORK_DIR"
        exit 1
    fi
    # Change to the specified directory
    cd "$WORK_DIR"
fi
# Check if Apptainer is installed
if ! command -v apptainer &> /dev/null; then
    echo "Apptainer is not installed. Please install it to continue."
    exit 1
fi
# Run the Apptainer image with Jupyter Notebook
apptainer exec "$IMAGE_PATH" jupyter notebook --no-browser --ip=0.0.0.0 --port=8889 --allow-root
Example of usage: ./run_jupyter_in_apptainer.sh ./containers/container_file.sif
It will output something like the following:
[ribes@alvis2-02 ~]$ ./run_jupyter_in_apptainer.sh ./containers/container_file.sif
[I 2025-03-27 09:21:33.125 ServerApp] jupyter_lsp | extension was successfully linked.
[I 2025-03-27 09:21:33.129 ServerApp] jupyter_server_terminals | extension was successfully linked.
[I 2025-03-27 09:21:33.133 ServerApp] jupyterlab | extension was successfully linked.
[I 2025-03-27 09:21:33.137 ServerApp] notebook | extension was successfully linked.
[I 2025-03-27 09:21:33.448 ServerApp] notebook_shim | extension was successfully linked.
[I 2025-03-27 09:21:33.469 ServerApp] notebook_shim | extension was successfully loaded.
[I 2025-03-27 09:21:33.471 ServerApp] jupyter_lsp | extension was successfully loaded.
[I 2025-03-27 09:21:33.472 ServerApp] jupyter_server_terminals | extension was successfully loaded.
[I 2025-03-27 09:21:33.474 LabApp] JupyterLab extension loaded from /opt/conda/lib/python3.10/site-packages/jupyterlab
[I 2025-03-27 09:21:33.474 LabApp] JupyterLab application directory is /opt/conda/share/jupyter/lab
[I 2025-03-27 09:21:33.474 LabApp] Extension Manager is 'pypi'.
[I 2025-03-27 09:21:33.483 ServerApp] jupyterlab | extension was successfully loaded.
[I 2025-03-27 09:21:33.486 ServerApp] notebook | extension was successfully loaded.
[I 2025-03-27 09:21:33.487 ServerApp] Serving notebooks from local directory: /cephyr/users/ribes/Alvis
[I 2025-03-27 09:21:33.487 ServerApp] Jupyter Server 2.14.2 is running at:
[I 2025-03-27 09:21:33.487 ServerApp] http://alvis2-02:8889/tree?token=d263b3a10fa6558d1f673a646128a5d847948f6972df27d8
[I 2025-03-27 09:21:33.487 ServerApp] http://127.0.0.1:8889/tree?token=d263b3a10fa6558d1f673a646128a5d847948f6972df27d8
[I 2025-03-27 09:21:33.487 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 2025-03-27 09:21:33.501 ServerApp]
To access the server, open this file in a browser:
file:///cephyr/users/ribes/Alvis/.local/share/jupyter/runtime/jpserver-65940-open.html
Or copy and paste one of these URLs:
http://alvis2-02:8889/tree?token=d263b3a10fa6558d1f673a646128a5d847948f6972df27d8
http://127.0.0.1:8889/tree?token=d263b3a10fa6558d1f673a646128a5d847948f6972df27d8
[I 2025-03-27 09:21:33.517 ServerApp] Skipped non-installed server(s): bash-language-server, dockerfile-language-server-nodejs, javascript-typescript-langserver, jedi-language-server, julia-language-server, pyright, python-language-server, python-lsp-server, r-languageserver, sql-language-server, texlab, typescript-language-server, unified-language-server, vscode-css-languageserver-bin, vscode-html-languageserver-bin, vscode-json-languageserver-bin, yaml-language-server
In VSCode, with a notebook open, use the Select Kernel button in the top-right corner to connect to the existing running kernel by pasting the link that shows up when starting the Jupyter server.
In practice, look in the output for a link like this: http://alvis2-02:8889/tree?token=d263b3a10fa6558d1f673a646128a5d847948f6972df27d8 (it’s in the example above).
To get a Jupyter notebook with GPU support, create a session via Alvis OnDemand, open a terminal there, and follow the same steps as above.
Example of sbatch script (CPU-only):
#!/usr/bin/env bash
#SBATCH -A NAISS2024-5-630 -p alvis
#SBATCH -N 1
#SBATCH -C NOGPU
#SBATCH --cpus-per-task=32
#SBATCH -t 0-2:00:00
#SBATCH -J "slurm-score-predictions"
#SBATCH -o slurm-score-predictions.log
cd $SLURM_SUBMIT_DIR
echo "Running score_predictions.py"
export PYTHONPATH=$PYTHONPATH:/my/local/dir/containing/code
apptainer exec ~/containers/container_file.sif python $SLURM_SUBMIT_DIR/scripts/score_predictions.py --num_proc=32 --skip_if_log_exists
Example of sbatch script with GPU:
#!/usr/bin/env bash
#SBATCH -N 1 --gpus-per-node=T4:1
#SBATCH -t 0-2:00:00
#SBATCH -J "slurm-predict-MyModel"
#SBATCH -o slurm-predict-MyModel.log # Output log file
echo "Starting at `date`"
echo "Running on hosts: $SLURM_NODELIST"
echo "Running on $SLURM_NNODES nodes."
echo "Running $SLURM_NTASKS tasks."
echo "Job id is $SLURM_JOBID"
echo "Job submission directory is: $SLURM_SUBMIT_DIR"
cd $SLURM_SUBMIT_DIR
local_dir=my/local/dir/containing/code
# echo "-------------------------------------------------------------------------"
# apptainer exec ~/containers/container_file.sif accelerate env
# echo "-------------------------------------------------------------------------"
export PYTHONPATH=$PYTHONPATH:my/local/dir/containing/code/
apptainer exec ~/containers/container_file.sif python ${local_dir}/scripts/collect_llm_predictions.py \
--model_name=MyModel \
--batch_size=32 \
--num_proc=16 \
--eval_gen_strategies=true
The goal is to ensure that all figures are clear, consistent, and suitable for publication. This is mainly a collection of personal notes coming from experience with various tools and formats, and it is not exhaustive nor a definitive guide (I’m still learning myself!).
The final figures shall be assembled in Adobe Illustrator (Ai), since it’s the tool that gives the best control over the final output.
Tools used:
| Tool | Version | Purpose |
|---|---|---|
| Matplotlib/Seaborn | v3.10.3 / v0.13.2 | Data visualization, e.g., bar plots, line plots, etc. |
| DrawIO | v27.0.9 | Diagrams, flowcharts, and simple figures (it supports LaTeX equations too) |
| ChemDraw | 20.1 | Chemical structures and reactions |
| Adobe Illustrator | 2025 | Final drawings, e.g., multi-panel figures |
The main idea is to work on “artboards” that are the same size as the final image, which is usually the text width of the paper. This allows for easy scaling and ensures that all elements are properly aligned and sized everywhere, including in the final manuscript (I’m using the Adobe Illustrator term for the area where the drawing is done, which might be called differently in other tools).
In general, use pt (points) as the unit of measurement for all drawings, to ease consistency.
To determine the artboard dimensions, check the LaTeX style (it can be in the main header, or in a separate class or style file) for the maximum image width, which is usually the same as the text width.
Finally, remember that a figure might be placed in a single column or have multiple panels, so use rulers or guides to work at the right size and alignment within the artboard, e.g., half of the text width for a single-column figure.
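Once the text width is known in points, it can be converted into the units other tools expect. A subtlety worth knowing: LaTeX points (1 in = 72.27 pt) differ slightly from the PostScript/DTP points (1 in = 72 pt) used by Illustrator and Matplotlib. A minimal sketch (the 455.24 pt value is a hypothetical \textwidth; replace it with the one from your LaTeX class):

```python
# Convert a (hypothetical) LaTeX \textwidth of 455.24 pt into other units.
LATEX_PT_PER_INCH = 72.27  # TeX points per inch
DTP_PT_PER_INCH = 72.0     # PostScript/DTP points per inch (Illustrator, Matplotlib)
MM_PER_INCH = 25.4

textwidth_latex_pt = 455.24  # e.g., from \the\textwidth in the LaTeX document
width_in = textwidth_latex_pt / LATEX_PT_PER_INCH
width_dtp_pt = width_in * DTP_PT_PER_INCH
width_mm = width_in * MM_PER_INCH

print(f"{width_in:.2f} in, {width_dtp_pt:.2f} pt (DTP), {width_mm:.2f} mm")
```

The inch value is also what Matplotlib's figsize argument expects, so the same conversion sizes plots exactly to the text width.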
Be consistent in every figure you make by fixing the following:
- Font: Sans Serif or Helvetica. NOTE: it can be different from the one used in the manuscript, but it should be consistent across all figures.
- Font size: 11pt
- Font color: #000000 (black)
I usually check the font used in the manuscript by inspecting the LaTeX style file (ChatGPT can help with this). If it is not available among my installed fonts, I download it and install it on my system.
Downloaded fonts can be in .otf or .ttf format, and they can be installed on macOS by double-clicking the file and clicking “Install Font” in the Font Book app.
Sometimes the font might not be available in the plotting library, even after installing it.
In this case, one can use the matplotlib.font_manager module to set the font globally for all plots.
Here’s an example of how to do this with Matplotlib and Seaborn:
import matplotlib.font_manager as fm
from matplotlib import pyplot as plt
import seaborn as sns
fontsize = 10
fontname = "Latin Modern Roman" # Change this to the font you want to use
# Set a publication-like style; the exact font size is set explicitly below
sns.set_context("paper", font_scale=1.0)
sns.set_style("white")
# Rebuild font cache (to be done only once after installing new fonts)
# This will make the font available for Matplotlib, comment it out after the
# first run, otherwise it will take a long time to run every time you start the
# script.
fm._load_fontmanager(try_read_cache=False)
# Double check the font is available
print([f.name for f in fm.fontManager.ttflist if fontname in f.name])
print(fm.findfont(fontname))
# Set global font settings for Matplotlib and Seaborn
# NOTE: This must happen after changing global settings in seaborn
plt.rcParams["font.family"] = fontname
plt.rcParams["mathtext.fontset"] = "custom"
plt.rcParams["mathtext.rm"] = fontname
plt.rcParams["font.size"] = fontsize
For plots with complex content, for example millions of markers in a scatter plot visualizing a chemical space, pass the rasterized=True argument to the plotting command, e.g., plt.scatter(..., rasterized=True).
This avoids a monstrously large PDF that cannot be imported anywhere else.
In this scenario, always specify the dpi (dots per inch) in the savefig command, e.g., dpi=300 or dpi=600 for high-resolution images, even when saving as PDF or SVG, as the default will make the rasterized part very, very blurry.
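A minimal sketch of this pattern (the random data and file name are arbitrary placeholders): only the marker layer is rasterized, while axes, labels, and text stay as vector graphics.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, safe on a cluster
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x, y = rng.normal(size=(2, 100_000))  # stand-in for a huge chemical space

fig, ax = plt.subplots(figsize=(3.5, 3.5))
# rasterized=True turns the markers into a single raster layer in the PDF
ax.scatter(x, y, s=1, alpha=0.2, rasterized=True)
ax.set_xlabel("dim 1")  # text and axes remain vector graphics
ax.set_ylabel("dim 2")
# dpi controls the resolution of the rasterized layer, even in a PDF
fig.savefig("chemical_space.pdf", dpi=300, bbox_inches="tight")
```

The resulting PDF stays small regardless of how many points are plotted, while the labels remain crisp at any zoom level.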
Fix a palette of HEX colors, and use it consistently across all figures. This will help maintain a consistent look and feel. Example:
# Dictionary of hue (for different categories to plot) and respective color
# NOTE: These colors might look weird, sorry, I won't give you my cool palette,
# so go and make your own awesome style! :)
palette_colors = {
'Train': '#FF2720',
'Validation': '#FE222E',
'Test': '#F42D27',
}
sns.scatterplot(..., hue='split', palette=palette_colors)
DrawIO is a quick and easy option for creating diagrams and flowcharts. Its main limitation is its inability to import PDF images, such as plots or chemical structures, so one has to export those images to SVG and then import them into DrawIO.
If one is skilled enough, DrawIO might be used as a standalone tool for creating the final figures for the paper, but it lacks some advanced features and flexibility compared to Adobe Illustrator.
DrawIO supports LaTeX equations.
To enable this, tick on Extras -> Mathematical Typesetting, and then encapsulate LaTeX equations in text boxes with $$ symbols, like $$E = mc^2$$.
Text boxes placed close to the image edge might also create “mysterious” extra white space in the exported PDF.
To overcome this, in the Style tab of the text box properties, set the property Text Overflow to Block and tick the Resizeable option, which allows you to resize the text box as needed.
Exporting DrawIO drawings to SVG might cause issues when importing those to Adobe Illustrator, so it’s better to export to PDF directly.
To export, first select the area to export, then go to File -> Export as -> PDF.... In the export dialog, make sure to set the border width to 0.
The final image might still have some white edges, so it might be necessary to crop it using some other tools.
I personally like this online website for automatically cropping and removing the white edges from PDF files.
Be aware that one needs to upload the PDF file to the website, so if the file contains sensitive information, use a local tool instead!
When exporting to PDF in DrawIO, one can also choose to include a copy of the diagram in the PDF file, which can be useful for further editing in DrawIO later on. This is done by checking the “Include a copy of my diagram” option in the export dialog.
One shall set the document size to the final image size, which is usually the text width of the paper. ChemDraw might not allow setting the document size in pt, so one can set it in mm and then convert it to pt (1 pt = 0.352778 mm).
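The conversion is a one-liner; as a sketch, a hypothetical 160 mm document corresponds to roughly 453.5 pt (using the DTP point, 1 pt = 0.352778 mm, as in the text):

```python
PT_IN_MM = 0.352778  # 1 DTP point expressed in millimetres

def mm_to_pt(mm: float) -> float:
    return mm / PT_IN_MM

def pt_to_mm(pt: float) -> float:
    return pt * PT_IN_MM

print(f"160 mm = {mm_to_pt(160):.1f} pt")   # prints: 160 mm = 453.5 pt
print(f"453.5 pt = {pt_to_mm(453.5):.1f} mm")
```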
Depending on the target of the manuscript, one shall set the molecule style to the appropriate one, for example, after selecting the molecule(s), go to Object -> Apply Object Settings from and choose the desired style, such as “ACS Document 1996” for American Chemical Society publications.
Set it default to the whole document if asked.
All chemistry-related figures must have the same scale everywhere in the manuscript (funny how attached code is usually unreadable, but chemical style is enforced…). Hence, scaling is allowed, as long as it doesn’t make the chemical structures too small and is applied everywhere with the same settings. When scaling, scale the labels too; 60% is usually a good compromise, but it depends on the final image size.
Kekulization is SMILES-dependent, so if a molecule looks odd, copy it as SMILES and paste it into an RDKit-powered script to fix it. For example:
from rdkit import Chem
# Convert a kekulized SMILES string back to its canonical aromatic form
smi = "COC1=CC2=C(OC[C@@H]3CCC(N3)=O)N=CC([*:1])=C2C=C1C(N)=O"
mol = Chem.MolFromSmiles(smi)
print(Chem.MolToSmiles(mol, kekuleSmiles=False))
I personally found that working on a single document and putting all required chemical structures there is the best way to ensure consistency and avoid issues with different styles or sizes.
When done, export the document using File -> Export... and choose SVG as the format, ready to be imported into Adobe Illustrator.
Color mode should be set to RGB, not CMYK, otherwise colors might not be rendered correctly in the final PDF.
Layers are very useful. Groups too, for example when importing an SVG file containing multiple chemical structures. In fact, every chemical structure is a collection of lines, shapes, and text, so grouping them together allows for easier manipulation and alignment.
Naively cropping PDF images in Ai will result in a rasterized image, which is bad, so it’s better to have the PDF properly sized before importing it into Ai.
When importing PDF files, make sure to check the “Link” option, so that the original PDF file is not embedded in the Ai file; this avoids large file sizes and is convenient when experimenting. For example, getting a plot right can require multiple iterations, and having the original PDF linked allows for quick updates without needing to re-import the file.
When exporting the final figure, use File -> Export As... and choose the desired format (PDF, SVG, PNG, etc.). For publication-quality figures, PDF is preferred as it retains vector graphics quality.
SVGs can be opened separately in Adobe Illustrator, and they allow for full editing of the vector graphics (i.e., you can edit every line, text, and shape).
If an Ai file includes rasterized images, when exporting to PDF, in the Compression tab, untick the Compress Text and Line Art option to avoid rasterizing the text and lines, which can lead to loss of quality.
When importing figures into LaTeX, specify the width as a fraction of the text width, e.g., \includegraphics[width=\textwidth]{figure.pdf}. If everything is done correctly, the figure should scale properly to fit the text width.
For example:
\begin{figure}
\centering
\includegraphics[width=0.99\textwidth]{my/awesome/picture.pdf}
\caption{My awesome picture showing the results of my research.}
\label{fig:my-awesome-picture}
\end{figure}