This repository contains code to run the inference framework that estimates time-varying ascertainment rates of COVID-19 cases (Russell et al.). To do so, we use a Gaussian Process modelling framework, fit to the confirmed COVID-19 death time series for the country or region in question (see Russell et al. for more details on the methods and limitations involved).
To run the code, first clone this repository using the command

    git clone https://github.com/thimotei/CFR_calculation

The time-varying estimates result from fitting a Gaussian Process model, which
is implemented with the R libraries greta and greta.gp.
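
For readers unfamiliar with Gaussian Processes, the base-R sketch below draws one smooth function from a GP prior and maps it to the (0, 1) interval, the range an ascertainment rate must lie in. It does not use greta or greta.gp and is not the model fitted in this repository. The squared-exponential kernel, the hyperparameter values, the logistic link, and the names `sq_exp_kernel` and `draw_gp_prior` are all assumptions made for illustration.

```r
# Squared-exponential (RBF) covariance between two sets of time points.
# Kernel choice and default hyperparameters are illustrative assumptions.
sq_exp_kernel <- function(t1, t2, variance = 1, lengthscale = 10) {
  sq_dist <- outer(t1, t2, function(a, b) (a - b)^2)
  variance * exp(-sq_dist / (2 * lengthscale^2))
}

# Draw one function from a zero-mean GP prior at the given time points.
draw_gp_prior <- function(times, variance = 1, lengthscale = 10, jitter = 1e-6) {
  K <- sq_exp_kernel(times, times, variance, lengthscale)
  # A small jitter on the diagonal keeps the Cholesky factorisation stable.
  L <- t(chol(K + diag(jitter, length(times))))
  as.vector(L %*% rnorm(length(times)))
}

set.seed(1)
days <- 1:60
latent <- draw_gp_prior(days)
# The logistic transform maps the latent function to the (0, 1) interval.
ascertainment <- plogis(latent)
```

A larger `lengthscale` gives a curve that changes more slowly over time, while a smaller one allows faster day-to-day variation.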
These need to be run from a Python virtual environment, which is set up in the
script the model is run from. First, run the following command in R to ensure
the necessary packages are installed:

    install.packages(c("reticulate", "greta", "greta.gp"))

reticulate provides the interface between R and the Python virtual environment.
greta requires this environment because it calls tensorflow from within it.
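
Before moving on, it can help to confirm that the three packages are actually available. The following is a minimal sketch, not part of the repository; the helper name `missing_packages` is hypothetical.

```r
# Packages named in the installation step above.
required_pkgs <- c("reticulate", "greta", "greta.gp")

# Return the subset of packages whose namespace cannot be loaded on this machine.
missing_packages <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_installed]
}

to_install <- missing_packages(required_pkgs)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```

Any packages reported as missing can be passed directly to `install.packages()`.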
The user therefore needs to install the correct version of
tensorflow for greta. This is done from R with the
following commands (the same commands are in the main script, but commented out
and need only to be run once):
    library(reticulate)
    use_condaenv('r-reticulate', required = TRUE)
    library(greta)
    library(greta.gp)
    greta::install_tensorflow(method = "conda",
                              version = "1.14.0",
                              extra_packages = "tensorflow-probability==0.7")

Once tensorflow is installed, the model can be run from within the script
    scripts/main_script_GP.R

which runs the model for a single country or region, specified by its 3-letter ISO code. The script downloads the latest data from the Johns Hopkins COVID-19 dataset and munges it into the correct format using the function in

    R/jhu_data_import.R

To run the model at scale, an HPC cluster is used, with the scripts found in

    hpc_scripts
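
The data-munging step described above, selecting one country by ISO code and reshaping its death series, might look roughly like the base-R sketch below. It does not reproduce `R/jhu_data_import.R`. The toy data frame, its column names (`iso_code`, `date`, `cumulative_deaths`) and the function `daily_deaths_for` are hypothetical, and the sketch assumes the input holds cumulative counts that need converting to daily counts.

```r
# Toy stand-in for a downloaded cumulative death series (values are made up).
toy_data <- data.frame(
  iso_code = rep(c("GBR", "ITA"), each = 4),
  date = rep(as.Date("2020-03-01") + 0:3, times = 2),
  cumulative_deaths = c(0, 1, 3, 6, 2, 5, 5, 9)
)

# Hypothetical helper: daily deaths for one country, given its 3-letter ISO code.
daily_deaths_for <- function(data, iso) {
  country <- data[data$iso_code == iso, ]
  if (nrow(country) == 0) stop("No rows found for ISO code: ", iso)
  country <- country[order(country$date), ]
  # Differencing cumulative counts gives daily counts; the first day keeps its value.
  daily <- c(country$cumulative_deaths[1], diff(country$cumulative_deaths))
  # Downward corrections in the source data would give negatives; clamp to zero.
  data.frame(date = country$date, new_deaths = pmax(daily, 0))
}

gbr <- daily_deaths_for(toy_data, "GBR")
```

Clamping negative values to zero is one simple way to handle retrospective corrections; whether the repository does this is not stated here.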