Implementation of a regularized 2-layer multiclass classifier on nonlinear 2D benchmarks (flower, spiral) in NumPy (explicit backprop) and PyTorch (module-based baseline).
A single-hidden-layer network:

$$
z^{[1]} = W^{[1]} x + b^{[1]}, \qquad a^{[1]} = g\!\left(z^{[1]}\right), \qquad
z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}, \qquad \hat{y} = \mathrm{softmax}\!\left(z^{[2]}\right)
$$

with activation $g$ (sigmoid is used for gradient checking, since it is differentiable everywhere).
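A forward pass for this architecture might be sketched as follows (the function and parameter names, the tanh hidden activation, and the column-per-example layout are illustrative assumptions, not necessarily the repo's actual code):

```python
import numpy as np

def forward(X, W1, b1, W2, b2):
    """Forward pass. X: (n_features, m) with one example per column.

    Assumed shapes: W1 (n_hidden, n_features), W2 (n_classes, n_hidden).
    Returns intermediates needed by backprop plus softmax probabilities.
    """
    Z1 = W1 @ X + b1                               # hidden pre-activation
    A1 = np.tanh(Z1)                               # hidden activation g (assumed tanh)
    Z2 = W2 @ A1 + b2                              # output logits
    shifted = Z2 - Z2.max(axis=0, keepdims=True)   # log-sum-exp shift for stability
    expZ = np.exp(shifted)
    A2 = expZ / expZ.sum(axis=0, keepdims=True)    # softmax probabilities, columns sum to 1
    return Z1, A1, Z2, A2
```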
Empirical risk minimization with L2 regularization (weight decay):

$$
J = \frac{1}{m}\sum_{i=1}^{m} \mathrm{CE}\!\left(y^{(i)}, \hat{y}^{(i)}\right)
  + \frac{\lambda}{2m}\left(\lVert W^{[1]}\rVert_F^2 + \lVert W^{[2]}\rVert_F^2\right)
$$
Cross-entropy corresponds to the negative log-likelihood under a categorical model:

$$
\mathrm{CE}(y, \hat{y}) = -\sum_{k} y_k \log \hat{y}_k
$$
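The regularized objective could be computed along these lines (a sketch; the function name, one-hot column layout, and the small epsilon guarding the log are assumptions):

```python
import numpy as np

def regularized_loss(A2, Y, W1, W2, lam):
    """Average cross-entropy plus L2 penalty on both weight matrices.

    A2: softmax probabilities (n_classes, m); Y: one-hot labels, same shape.
    """
    m = Y.shape[1]
    ce = -np.sum(Y * np.log(A2 + 1e-12)) / m                 # mean cross-entropy (epsilon avoids log(0))
    l2 = (lam / (2 * m)) * (np.sum(W1**2) + np.sum(W2**2))   # weight-decay term
    return ce + l2
```

With uniform predictions over 3 classes and no regularization, the loss reduces to log 3.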
Full-batch gradient descent / SGD using analytically derived gradients; demonstrates the chain rule through softmax + cross-entropy (yielding the simple per-example logit gradient $\partial \mathrm{CE} / \partial z^{[2]} = \hat{y} - y$).
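The backward pass implied by that simplification can be sketched as follows (a tanh hidden activation and these parameter names are assumptions; the L2 terms match the $\frac{\lambda}{2m}$ penalty):

```python
import numpy as np

def backward(X, Y, A1, A2, W1, W2, lam):
    """Analytic gradients for the 2-layer net, one example per column.

    Starts from the softmax + cross-entropy shortcut dZ2 = A2 - Y.
    """
    m = Y.shape[1]
    dZ2 = A2 - Y                                   # gradient w.r.t. output logits
    dW2 = dZ2 @ A1.T / m + (lam / m) * W2          # data term + weight decay
    db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (1.0 - A1**2)             # tanh'(z) = 1 - tanh(z)^2
    dW1 = dZ1 @ X.T / m + (lam / m) * W1
    db1 = dZ1.sum(axis=1, keepdims=True) / m
    return dW1, db1, dW2, db2
```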
Softmax is computed with log-sum-exp shifting to prevent overflow.
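The shift is equivalent to subtracting the column-wise maximum from the logits before exponentiating, so `exp` never sees large positive inputs. A minimal sketch (the function name is illustrative):

```python
import numpy as np

def stable_softmax(Z):
    """Column-wise softmax with max-shift; invariant to adding a constant per column."""
    shifted = Z - Z.max(axis=0, keepdims=True)   # largest entry becomes 0
    expZ = np.exp(shifted)                       # no overflow even for huge logits
    return expZ / expZ.sum(axis=0, keepdims=True)
```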
Finite-difference gradient checking (using sigmoid to ensure differentiability everywhere) validates the NumPy backprop by comparing the analytic gradient against the central-difference estimate

$$
\frac{\partial J}{\partial \theta_i} \approx \frac{J(\theta + \varepsilon e_i) - J(\theta - \varepsilon e_i)}{2\varepsilon}
$$

and requiring their relative difference to stay below a small tolerance.
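A minimal central-difference checker, assuming the loss is exposed as a scalar function of a flat parameter vector (the function name and relative-difference formula below are illustrative):

```python
import numpy as np

def grad_check(f, theta, analytic_grad, eps=1e-7):
    """Compare an analytic gradient of f at theta with central differences.

    Returns ||g_num - g_ana|| / (||g_num|| + ||g_ana||); small values indicate agreement.
    """
    num = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        num[i] = (f(theta + e) - f(theta - e)) / (2 * eps)   # central difference per coordinate
    denom = np.linalg.norm(num) + np.linalg.norm(analytic_grad)
    return np.linalg.norm(num - analytic_grad) / denom
```

For a simple quadratic, the analytic gradient $2\theta$ passes easily.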
```
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

NumPy (Flower):

```
python scripts/train_flower_numpy.py
```

NumPy (Spiral):

```
python scripts/train_spiral_numpy.py
```

PyTorch (Flower):

```
python scripts/train_flower_torch.py
```

PyTorch (Spiral):

```
python scripts/train_spiral_torch.py
```

Plots are saved to:

```
outputs/figures/flower-boundary.jpg
outputs/figures/spiral-boundary.jpg
```
Gradient checking uses a sigmoid hidden activation so the network is differentiable everywhere (ReLU is not differentiable at 0, which makes numerical gradients disagree near 0).
Run:

```
python scripts/gradient_check.py
```

The printed output includes the relative difference

$$
\mathrm{diff} = \frac{\lVert g_{\text{analytic}} - g_{\text{numeric}} \rVert_2}{\lVert g_{\text{analytic}} \rVert_2 + \lVert g_{\text{numeric}} \rVert_2}
$$

A typical pass condition is `diff < 1e-6`.



