Description
When updating one of the JupyterHub demos, it is hard to find a combination of versions that works. Everything has to match, down to the patch version: the Spark version, the Python version, and the requirements, which must be compatible with the libraries in the host image (e.g. pandas, scikit-learn). Otherwise de/serialization produces a lot of serial-class-ID errors, and a lot of trial and error is involved.
The older jupyter/pyspark-notebook:python-3.11 images have been discontinued and replaced with images from quay.io. Testing these illustrates some typical incompatibilities:
quay.io/jupyter/pyspark-notebook:python-3.11.10 (notebook: spark 3.5.3)
quay.io/jupyter/pyspark-notebook:python-3.11.9 (notebook: spark 3.5.2/python 3.11.9, SDP 3.5.2: python 3.11.7)
quay.io/jupyter/pyspark-notebook:python-3.11.8 (notebook: spark 3.5.1/python 3.11.8, SDP 3.5.1: python 3.11.7)
quay.io/jupyter/pyspark-notebook:python-3.11.7 (notebook: spark 3.5.0, but.... etc.etc.)
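The comparison being done by hand for each image above can be sketched as a small check. This is only an illustration, not part of the demo: the `Versions` tuple and the `find_mismatches` function are hypothetical names, and the values are the ones listed for the python-3.11.9 image.

```python
from typing import NamedTuple


class Versions(NamedTuple):
    """Versions found on one side (hypothetical helper type)."""
    spark: str   # e.g. "3.5.2"
    python: str  # e.g. "3.11.9"


def find_mismatches(notebook: Versions, sdp: Versions) -> list[str]:
    """Describe every field that differs, compared down to the patch version."""
    problems = []
    for field in Versions._fields:
        nb, other = getattr(notebook, field), getattr(sdp, field)
        if nb != other:
            problems.append(f"{field}: notebook {nb} != SDP {other}")
    return problems


# Values taken from the python-3.11.9 image listed above.
notebook = Versions(spark="3.5.2", python="3.11.9")
sdp = Versions(spark="3.5.2", python="3.11.7")
print(find_mismatches(notebook, sdp))
# → ['python: notebook 3.11.9 != SDP 3.11.7']
```

Inside a running notebook, the notebook side could be filled in from `platform.python_version()` and `pyspark.__version__` instead of hard-coded strings.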
Proposal
Build our own JupyterHub images so that we can coordinate these versions.
See if opendatahub images can be reused.
See also https://drive.google.com/file/d/1-f9wQutYNBQr2iyIXDQEsYZl2j4TWvqM/view?usp=sharing
Links
https://jupyter-docker-stacks.readthedocs.io/en/latest/using/selecting.html#jupyter-pyspark-notebook