Information Visualization Starter Kit
This repository contains companion materials for the Information Visualization course at Aalto CS.
Setup instructions below. Brief summary of folder contents:
.
├── data: Datasets needed for some visualizations (+ dataset license)
├── figures: Figures produced by the code in the notebooks folder
├── notebooks: Jupyter notebooks to produce example figures (many of them used on the lecture slides)
└── src: See below
Install uv (documentation)
On macOS and Linux:
curl -LsSf https://astral.sh/uv/install.sh | sh
On Windows:
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
Create a new folder (e.g., infoviz) and navigate to that folder in your shell.
Then run:
uv init --no-workspace --no-readme --lib --name infoviz
This will create the following structure in your current working directory:
.
├── pyproject.toml
└── src
    └── infoviz
        ├── __init__.py
        └── py.typed
The folder structure may look odd, but you don't need to worry about it. The setup has a number of workflow benefits (see below).
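If you want to confirm that the scaffold was created as described, a minimal sketch along these lines could be run from the project root. The constant `EXPECTED` and the function `missing_paths` are hypothetical names introduced for illustration; the file list is copied from the tree above.

```python
from pathlib import Path

# Files that the tree above shows for a project named "infoviz".
EXPECTED = ("pyproject.toml", "src/infoviz/__init__.py", "src/infoviz/py.typed")


def missing_paths(root="."):
    """Return the expected project files that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).is_file()]
```

Calling `missing_paths()` in the project root should return an empty list if all three files are present.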
uv add pandas seaborn scipy numpy jupyterlab colorcet cmasher scikit-learn
This will install the named packages and add them to your project's dependencies.
It will also create a uv.lock and a .python-version.
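To check that the packages can be found in the project environment, something like the following sketch may help. The names `IMPORT_NAMES` and `unavailable` are hypothetical; the only assumption beyond the package list above is that scikit-learn is imported under the name `sklearn`.

```python
from importlib.util import find_spec

# Import names for the packages added above; scikit-learn is imported as `sklearn`.
IMPORT_NAMES = ("pandas", "seaborn", "scipy", "numpy", "jupyterlab",
                "colorcet", "cmasher", "sklearn")


def unavailable(names=IMPORT_NAMES):
    """Return the names in `names` that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]
```

Saved as a script that prints `unavailable()` (say, a hypothetical check_env.py), it could be run with `uv run python check_env.py`; an empty list would mean every package was found.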
In the root of your main folder, run:
uv run jupyter lab
This will open an interactive coding environment in your browser.
I recommend creating a separate folder named notebooks for all your Jupyter notebooks (e.g., in your root directory).
For example, create a utils.py as a sibling of __init__.py (i.e., src/infoviz/utils.py), and put any functions you may want to reuse across plots there.
I often use the following (if you have LaTeX installed locally, you can uncomment the two commented-out plt.rcParams lines, which will allow you to typeset labels using LaTeX):
import matplotlib.pyplot as plt


def set_rcParams(**kwargs):
    """Apply default plot settings, then any overrides passed as keyword arguments."""
    plt.rcParams['pdf.fonttype'] = 42  # embed fonts as TrueType (Type 42) in PDFs
    # plt.rcParams['text.usetex'] = True
    plt.rcParams['font.family'] = 'serif'
    # plt.rcParams['text.latex.preamble'] = r"\usepackage{amssymb}\usepackage{amsmath}\usepackage{times}"
    for k, v in kwargs.items():
        try:
            plt.rcParams[k] = v
        except KeyError:
            # Skip keys that the installed matplotlib version does not recognize
            pass
Because you set up your project as a package, using set_rcParams in a Jupyter notebook is as simple as:
from infoviz.utils import set_rcParams
You can run scripts in your Jupyter notebooks using so-called line magics (%). For example, to run the script helper.py, you can do:
%run helper.py
This can be useful to outsource imports that recur across many of your Jupyter notebooks.
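To get a feel for what running a script this way does, here is a rough stand-in in plain Python using the standard-library `runpy` module. It is only an analogy, not how IPython implements `%run`, and the two-line script contents are hypothetical.

```python
import runpy
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Write a tiny stand-in for helper.py (hypothetical contents, for illustration only).
    script = Path(tmp) / "helper.py"
    script.write_text("import math\nfontsize = 20\n")

    # run_path executes the file as a script and returns its global namespace.
    namespace = runpy.run_path(str(script), run_name="__main__")

# Names defined or imported by the script can now be used by the caller,
# much like they would be available in the notebook after `%run helper.py`.
fontsize = namespace["fontsize"]
math = namespace["math"]
print(fontsize, math.sqrt(16))  # prints "20 4.0"
```

The point is that both the imports and the variables set by the script end up in the caller's hands, which is why a single `%run helper.py` can replace a block of repeated imports at the top of each notebook.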
For example, I might add the following to my helper.py:
import os
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
import colorcet as cc
from infoviz.utils import set_rcParams

# Create the output folder for figures if it does not exist yet
os.makedirs("../figures", exist_ok=True)

# Use a single font size for all text elements
fontsize = 20
rcParams = {
    'figure.labelsize': fontsize, 'axes.labelsize': fontsize, 'xtick.labelsize': fontsize,
    'ytick.labelsize': fontsize, 'legend.fontsize': fontsize, 'figure.titlesize': fontsize,
    'legend.title_fontsize': fontsize,
    'axes.titlesize': fontsize, 'legend.frameon': False,
}
set_rcParams(**rcParams)
Assuming you created utils.py and helper.py as in the workflow tips above, you can now do the following in a Jupyter notebook (say, notebooks/pca.ipynb) to perform dimensionality reduction on the iris dataset.
Run your helper script and import the necessary scikit-learn libraries:
%run helper.py
from sklearn import datasets
from sklearn.decomposition import PCA
Load the iris dataset, perform PCA, and pour the results into a dataframe:
iris = datasets.load_iris()
X = iris.data
y = iris.target
target_names = iris.target_names
pca = PCA(n_components=2)
X_r = pca.fit(X).transform(X)
df = pd.DataFrame(X_r, columns=["First Principal Component","Second Principal Component"])
df["y"] = y
Finally, plot the results using seaborn:
fig, ax = plt.subplots(1,1, figsize=(6,6))
sns.scatterplot(
    df, x="First Principal Component", y="Second Principal Component",
    hue="y", style="y", hue_order=[0,1,2], style_order=[0,1,2],
    palette=cc.glasbey_hv[:3], s=50, markers=["o","X","s"]
)
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles, target_names, markerscale=2, handletextpad=0.25, borderpad=0, borderaxespad=0)
sns.despine(fig)
plt.savefig("../figures/pca-iris.png", bbox_inches="tight", transparent=False)
- Use vector formats (e.g., .pdf) to save figures wherever possible to enable smooth zooming. Raster formats (e.g., .png) will become pixelated when zooming in (but you may need them to use your figures in some presentation software, e.g., Google Slides).
- Set transparent=True to disable the white background. This is useful, e.g., when making figures for slides that have a non-white background color.
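One way to follow both tips at once is to save every figure in a vector and a raster format. The helper below is a minimal sketch, not part of the course materials: `save_figure` and its parameters are hypothetical names, and `fig` is assumed to be a matplotlib Figure (any object with a compatible `savefig` method would work).

```python
from pathlib import Path


def save_figure(fig, stem, formats=("pdf", "png"), folder="../figures", transparent=False):
    """Save `fig` once per format and return the paths that were written."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    paths = []
    for fmt in formats:
        # The file extension determines the output format.
        path = folder / f"{stem}.{fmt}"
        fig.savefig(path, bbox_inches="tight", transparent=transparent)
        paths.append(path)
    return paths
```

In the PCA notebook above, `save_figure(fig, "pca-iris")` would then write ../figures/pca-iris.pdf and ../figures/pca-iris.png, and passing `transparent=True` would drop the white background in both.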
