Graph Neural Networks (GNNs) have emerged as a prominent class of data-driven methods for molecular property prediction. However, a key limitation of typical GNN models is their inability to quantify uncertainties in the predictions. This capability is crucial for ensuring the trustworthy use and deployment of models in downstream tasks. To that end, we introduce AutoGNNUQ, an automated uncertainty quantification (UQ) approach for molecular property prediction. AutoGNNUQ leverages architecture search to generate an ensemble of high-performing GNNs, enabling the estimation of predictive uncertainties. Our approach employs variance decomposition to separate data (aleatoric) and model (epistemic) uncertainties, providing valuable insights for reducing them. In our computational experiments, we demonstrate that AutoGNNUQ outperforms existing UQ methods in terms of both prediction accuracy and UQ performance on multiple benchmark datasets. Additionally, we utilize t-SNE visualization to explore correlations between molecular features and uncertainty, offering insight for dataset improvement. AutoGNNUQ has broad applicability in domains such as drug discovery and materials science, where accurate uncertainty quantification is crucial for decision-making.
Presentation
- 👉 AIChE
Paper
- 👉 arXiv
The necessary data can be found in the data folder. It includes the csv files for Lipo, FreeSolv, ESOL, and QM7 datasets from Deepchem.
Download the folder from the git repository and install it as a pip package with the editable option.
$ git clone -b AUTOGNNUQ --single-branch https://github.com/zavalab/ML.git
$ cd ML/AUTOGNNUQ
$ pip install -e .The trained weights are available here.
python = 3.8
deephyper = 0.3.3
tensorflow = 2.5.0
tensorflow-probability = 0.13.0
ray = 2.0.1This package heavily relies on DeepHyper. Detailed documentation and installation tutorials for different machines can be found at https://deephyper.readthedocs.io/en/latest/index.html.
- The
datafolder includes all raw data files in CSV format, containing SMILES strings and properties. - The
gnn_uqfolder contains all Python files used for neural architecture search (NAS), post-training analysis, and result analysis. - The
notebookfolder contains three Jupyter Notebooks for the results of NAS, UQ, and tSNE, respectively. - The
resultfolder contains pickle files with all the results. Instructions on how to load them can be found in the notebooks.
python gnn_uq/train.pyruns neural architecture search for uncertainty quantification using multi-GPU support.python gnn_uq/post_training.pyexecutes post-training with the top 10 architectures discovered.python gnn_uq/post_training_random.pyexecutes post-training with random 10 architectures discovered.
To run NAS training with multiple GPUs, submit the following Slurm script on your server.
#!/bin/bash
#SBATCH --job-name=some.name
#SBATCH --output=some.out
#SBATCH --error=some.err
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-gpu=1
#SBATCH --mem-per-cpu=32G
#SBATCH --gres=gpu:32
#SBATCH --time=24:00:00
export TRAIN="/some/dir/autognnuq/gnn_uq/train.py"
conda activate your.python.env
srun python $TRAIN